## Archive for October, 2015

### A Certain Tendency Of The Database Community

Tuesday, October 27th, 2015

From the post:

Abstract

We posit that striving for distributed systems that provide “single system image” semantics is fundamentally flawed and at odds with how systems operate in the physical world. We realize the database as an optimization of this system: a required, essential optimization in practice that facilitates central data placement and ease of access to participants in a system. We motivate a new model of computation that is designed to address the problems of computation over “eventually consistent” information in a large-scale distributed system.

Eventual Consistency

When we think about the world we live in, we do not usually say it is eventually consistent, for this is a term usually applied to computing systems, made up of multiple machines, that have to operate with shared information.

Eventual consistency is a consistency model for replicated, shared state. A consistency model is a contract between an application developer and a system that the application will run on. A contract between a developer and a system states the following: given that the developer follows the rules defined by the system, certain outcomes from the system are guaranteed. This makes it possible for developers to build successful applications, for without this contract, applications would have no guarantee that the actions they perform would have a correct outcome.

(italics in original)

A very accessible and excellent read on “eventual consistency.”
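The “contract” described above can be made concrete with a toy example. The sketch below (not from the post; the merge rule and names are my own) shows one of the simplest such contracts, last-writer-wins replication: follow the rule “tag every write with a timestamp and let the newest win,” and the system guarantees all replicas eventually converge.

```python
class Replica:
    """A toy replica using a last-writer-wins (LWW) merge rule.

    Each write is tagged with a logical timestamp; on merge, the value
    with the highest timestamp wins. Because the merge is commutative,
    associative, and idempotent, replicas converge no matter the order
    in which they exchange state.
    """
    def __init__(self):
        self.value = None
        self.stamp = -1

    def write(self, value, stamp):
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp

    def merge(self, other):
        # Merging is just "take the newer write."
        self.write(other.value, other.stamp)

replicas = [Replica() for _ in range(3)]
replicas[0].write("draft-1", stamp=1)   # a client writes to one replica
replicas[2].write("draft-2", stamp=2)   # a later write lands elsewhere

# The replicas are temporarily inconsistent...
assert replicas[0].value != replicas[2].value

# ...but after a round of pairwise gossip, all of them converge.
for a in replicas:
    for b in replicas:
        a.merge(b)

print([r.value for r in replicas])  # ['draft-2', 'draft-2', 'draft-2']
```

The interesting part is what the developer gives up: nothing in the contract says *when* convergence happens, only that it eventually does.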

Christopher points out that any “state” of knowledge is a snapshot under a given set of constraints:

For instance, if the leading researchers on breast cancer were to document the state-of-the-art in a book, as the document is being written it would no longer reflect the state-of-the-art. The collective knowledge of this group is always changing, and as long as we continue to rewrite the document it will only be approaching the combined knowledge of the group. We can think of this somewhat formally: if we had a way to view the group’s knowledge as an omniscient observer and we represent that knowledge as a linear function, the recorded text would be asymptotic to the function of the sum of global knowledge.

He concludes with this question:

…Can we build computational abstractions that allow devices to communicate peer-to-peer, acknowledging the true source of truth for a particular piece of information and scale to the amount of information that exists, not only between all computers in a planetary-scale distributed system, but all entities in the universe[?]

I’m not sure about “all entities in the universe,” or even a “planetary-scale distributed system,” but we do know that NetWare Directory Services (NDS) (now eDirectory) was a replicated, distributed, sharded database with eventual convergence that was written in 1993.

We have had the computational abstractions for a replicated, distributed, sharded database with eventual convergence for a number of years.

I would adjust Christopher’s “true source of truth” to “source of truth as defined by users,” to avoid the one-world-truth position that crippled the Semantic Web even before FOL and RDF syntax arrived.

Monday, October 26th, 2015

From the post:

MIT should issue a paperback version for $5.00 (or less in bulk), to put Obfuscation in the range of conference swag. The underlying principles and discussion are all very scholarly I’m sure (I haven’t read it yet), but obfuscation can only flourish when practiced in large numbers. Cf. “I’m Spartacus”: Spartacus (IMDB), Spartacus Film (Wikipedia).

To paraphrase the Capital One ad: How many different identities do you have in your wallet?

### Howler Monkeys with the Louder Voices have Smaller Testicles

Saturday, October 24th, 2015

Howler Monkeys with the Louder Voices have Smaller Testicles by Donald V. Morris.

This was too funny to pass up. It reminds me of pitch people for technologies who gloss over the details and distort reality beyond mere exaggeration. Claims of impending world domination when your entire slice of the market for a type of technology is less than one percent, for example. That’s not “impending” in any recognizable sense of the word.

Add your own commentary/remarks and pass this along to your co-workers.

I first saw this in a tweet by Violet Blue.

PS: Yes, I saw that howler monkeys with smaller testicles live with harems. Consider that a test of how many people will forward the article without reading it first. 😉

### Information Cartography

Friday, October 23rd, 2015

Information Cartography by Carlos Guestrin and Eric Horvitz. (cacm.acm.org/magazines/2015/11/193323)

Brief discussion of the CACM paper that I think will capture your interest.

From the introduction:

We demonstrate that metro maps can help people understand information in many areas, including news stories, research areas, legal cases, even works of literature. Metro maps can help them cope with information overload, framing a direction for research on automated extraction of information, as well as on new representations for summarizing and presenting complex sets of interrelated concepts.

Spend some time this weekend with this article and its references.

More to follow next week!
### “The first casualty, when war comes, is truth”

Thursday, October 22nd, 2015

The quote, “The first casualty, when war comes, is truth,” is commonly attributed to Hiram Johnson, a Republican politician from California, in 1917. Johnson died on August 6, 1945, the day the United States dropped an atomic bomb on Hiroshima.

ARCADE: Artillery Crater Analysis and Detection Engine is an effort to make it possible for anyone to rescue bits of the truth, even during war, at least with regard to the use of military ordnance.

From the post:

Destroyed buildings and infrastructure, temporary settlements, terrain disturbances and other signs of conflict can be seen in freely available satellite imagery. The ARtillery Crater Analysis and Detection Engine (ARCADE) is experimental computer vision software developed by Rudiment and the Centre for Visual Computing at the University of Bradford. ARCADE examines satellite imagery for signs of artillery bombardment, calculates the location of artillery craters and the inbound trajectory of projectiles to aid identification of their possible origins of fire. An early version of the tool that demonstrates the core capabilities is available here. The software currently runs on Windows with MATLAB, but if there is enough interest, it could be ported to an open toolset built around OpenCV.

Everyone who is interested in military actions anywhere in the world should be a supporter of this project. Given the poverty of Western reporting on bombings by the United States government around the world, I am very interested in the success of this project.

The post is a great introduction to the difficulties and potential uses of satellite data to uncover truths governments would prefer to remain hidden. That alone should be enough justification for supporting this project.
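ARCADE’s MATLAB pipeline isn’t shown in the post, but the basic kind of computation involved, locating a crater-like feature in an image, can be sketched in a few lines. This is a toy illustration with synthetic data and an invented threshold, not ARCADE’s method:

```python
import numpy as np

def find_dark_blob(img, threshold=0.5):
    """Return the centroid (row, col) of pixels darker than `threshold`.

    A crude stand-in for crater detection: a real system would segment
    candidate regions and fit ellipses to estimate an impact trajectory.
    Here we simply threshold and take the centroid of the dark pixels.
    """
    rows, cols = np.nonzero(img < threshold)
    if rows.size == 0:
        return None
    return rows.mean(), cols.mean()

# Synthetic 100x100 "satellite tile": bright terrain with a dark
# circular "crater" centered at (40, 60) with radius 8.
img = np.ones((100, 100))
rr, cc = np.mgrid[0:100, 0:100]
img[(rr - 40) ** 2 + (cc - 60) ** 2 <= 8 ** 2] = 0.1

centroid = find_dark_blob(img)
print(centroid)  # close to (40.0, 60.0)
```

The real problem is of course much harder: shadows, vegetation, and buildings all produce dark regions, which is why a dedicated computer vision pipeline (and eventually an OpenCV port) is worth supporting.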
### Learning Topic Map Concepts Through Topic Map Completion Puzzles

Wednesday, October 21st, 2015

From the post:

There are lots of puzzle programming tutorials currently in fashion: Code.org, Gidget and Parson’s programming puzzles. But we don’t really know if they work. There is work [1] that shows that completion exercises do work well, but what about puzzles? That is what Kyle wants to find out.

Felienne is live-blogging presentations from VL/HCC 2015, the IEEE Symposium on Visual Languages and Human-Centric Computing. The post is a quick read and should generate interest in both programming completion puzzles and similar puzzles for authoring topic maps.

Before you question the results based on the sample size, 27 students, realize that is 27 more test subjects than a database project to replace all the outward services for 5K+ users had. Fortunately, very fortunately, a group was able to convince management to tank the entire project. Quite a nightmare and a slur on “agile development.”

The lesson here is that puzzles are useful and some test subjects are better than no test subjects at all.

Suggestions for topic map puzzles?

### Query the Northwind Database as a Graph Using Gremlin

Wednesday, October 21st, 2015

From the post:

One of the most popular and interesting topics in the world of NoSQL databases is graph. At DataStax, we have invested in graph computing through the acquisition of Aurelius, the company behind TitanDB, and are especially committed to ensuring the success of the Gremlin graph traversal language. Gremlin is part of the open source Apache TinkerPop graph framework project and is a graph traversal language used by many different graph databases.

I wanted to introduce you to a superb web site that our own Daniel Kuppitz maintains called “SQL2Gremlin” (http://sql2gremlin.com), which I think is a great way to start learning how to query graph databases for those of us who come from the traditional relational database world.
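The relational-to-graph mapping that SQL2Gremlin teaches can be sketched with a toy in-memory graph. This is a Python stand-in for Gremlin’s chained steps, not real Gremlin; the data and the “inCategory” edge label are invented for illustration:

```python
class Traversal:
    """A toy, Gremlin-flavored traversal over an in-memory graph.

    Vertices are dicts; edges are (label, target_id) pairs. Only a few
    steps (hasLabel, has, out, values) are sketched, just enough to
    mirror the side-by-side flavor of the SQL2Gremlin examples.
    """
    def __init__(self, graph, ids):
        self.g, self.ids = graph, list(ids)

    def hasLabel(self, label):
        return Traversal(self.g, (i for i in self.ids
                                  if self.g[i]["label"] == label))

    def has(self, key, value):
        return Traversal(self.g, (i for i in self.ids
                                  if self.g[i].get(key) == value))

    def out(self, edge_label):
        return Traversal(self.g, (dst for i in self.ids
                                  for lbl, dst in self.g[i]["edges"]
                                  if lbl == edge_label))

    def values(self, key):
        return [self.g[i][key] for i in self.ids]

# A two-product sliver of a Northwind-like graph.
graph = {
    "c1": {"label": "category", "name": "Beverages", "edges": []},
    "p1": {"label": "product", "name": "Chai",
           "edges": [("inCategory", "c1")]},
    "p2": {"label": "product", "name": "Chang",
           "edges": [("inCategory", "c1")]},
}
g = Traversal(graph, graph.keys())

# SQL:     SELECT CategoryName FROM Categories
# Gremlin-style: g.V().hasLabel('category').values('name')
print(g.hasLabel("category").values("name"))  # ['Beverages']

# SQL:     SELECT c.CategoryName FROM Products p
#            JOIN Categories c ON p.CategoryID = c.CategoryID
#            WHERE p.ProductName = 'Chai'
# Gremlin-style: joins become edge hops.
print(g.hasLabel("product").has("name", "Chai")
       .out("inCategory").values("name"))     # ['Beverages']
```

The second query is the point: where SQL joins tables on keys, a graph traversal simply walks an edge.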
It is full of excellent sample SQL queries from the popular public domain RDBMS dataset Northwind and demonstrates how to produce the same results by using Gremlin. For me, learning by example has been a great way to get introduced to graph querying and I think that you’ll find it very useful as well. I’m only going to walk through a couple of examples here as an intro to what you will find at the full site. But if you are new to graph databases and Gremlin, then I highly encourage you to visit the sql2gremlin site for the rest of the complete samples.

There is also a nice example of an interactive visualization / filtering / search tool here that helps visualize the Northwind data set as it has been converted into a graph model.

I’ve worked with (and worked for) Microsoft SQL Server for a very long time. Since Daniel’s examples use T-SQL, we’ll stick with SQL Server for this blog post as an intro to Gremlin and we’ll use the Northwind samples for SQL Server 2014. You can download the entire Northwind sample database here. Load that database into your SQL Server if you wish to follow along.

When I first saw the title to this post, Query the Northwind Database as a Graph Using Gremlin (emphasis added), I thought this was something else. A database about the Northwind album. Little did I suspect that the Northwind Database is a test database for SQL Server 2005 and SQL Server 2008. Yikes!

Still, I thought some of you might have access to such legacy software and so I am pointing you to this post. 😉

PSA:

Support for SQL Server 2005 ends April 16, 2016 (that’s next April).

Support for SQL Server 2008 ended July 8, 2014. Ouch! You are more than a year into a dangerous place.

Upgrade, migrate or get another job. Hard times are coming and blame will be assigned.

### Clojure for the Brave and True Update!

Wednesday, October 21st, 2015

Clojure for the Brave and True by Daniel Higginbotham.

From the webpage:

Clojure for the Brave and True is now available in print!
You can use the coupon code ZOMBIEHUGS to get 30% off at No Starch (plus you’ll get a free sticker), or buy it from Amazon. The web site has been updated, too! (Don’t forget to force refresh.)

One of the reasons I went with No Starch as a publisher was that they supported the idea of keeping the entire book available for free online. It makes me super happy to release the professionally-edited, even better book for free. I hope it makes you laugh, cry, and give up on object-oriented programming forever.

Writing this book was one of the most ambitious projects of my life, and I appreciate all the support I’ve gotten from friends, family, and readers like you. Thank you from the bottom of my crusty heart!

[Update] I got asked for a list of the major differences. Here they are:

• Illustrations!
• Almost every chapter now has exercises
• The first macro chapter, Read and Eval, is massively improved. I’m hoping this will give readers an excellent conceptual foundation for working with macros
• There’s now a joke about melting faces
• There used to be two Emacs chapters (basic Emacs and using Emacs for Clojure dev); now there’s just one
• The concurrency chapter got split into two chapters
• Appendices on Leiningen and Boot were added
• The “Do Things” chapter is much friendlier
• I spend a lot more time explaining some of the more obscure topics, like lazy sequences
• Many of the chapters got massive overhauls. The functional programming chapter, for example, was turned completely inside out, and the result is that it’s much, much clearer
• Overall, everything should be clearer

Daniel has taken the plunge and quit his job to have more time for writing. If you can, buy a print copy and recommend Clojure for the Brave and True to a friend!

We need to encourage people like Daniel and publishers like No Starch. Vote with your feet and your pocketbooks.
Follow Daniel on twitter: @nonrecursive

### The Future Of News Is Not An Article

Wednesday, October 21st, 2015

The Future Of News Is Not An Article by Alexis Lloyd.

Alexis challenges readers to reconsider their assumptions about the nature of “articles,” beginning with the model for articles that was taken over from traditional print media. Whatever appeared in an article yesterday must be re-created today if there is a new article on the same subject. Not surprising, since print media lacks the means to transclude content from a prior article into a new one.

She saves her best argument for last:

A news organization publishes hundreds of articles a day, then starts all over the next day, recreating any redundant content each time. This approach is deeply shaped by the constraints of print media and seems unnecessary and strange when looked at from a natively digital perspective. Can you imagine if, every time something new happened in Syria, Wikipedia published a new Syria page, and in order to understand the bigger picture, you had to manually sift through hundreds of pages with overlapping information? The idea seems absurd in that context and yet, it is essentially what news publishers do every day.

While I agree fully with the advantages Alexis summarizes as Enhanced Tools for Journalists, Summarization and Synthesis, and Adaptive Content (see her post), there are technical and non-technical roadblocks to such changes.

First and foremost, people are being paid to re-create redundant content every day, and their comfort levels, to say nothing of their remuneration for repetitive reporting of the same content, will loom large in the adoption of the technology Alexis imagines. I recall a disturbing story from a major paper where reporters didn’t share leads or research for fear that other reporters would “scoop” them. That sort of protectionism isn’t limited to journalists.
Rumor has it that Oracle sales reps refused to enter potential sales leads in a company-wide database. I don’t understand why that sort of pettiness is tolerated, but be aware that it is, both in government and corporate environments.

Second, and almost as importantly, Alexis needs to raise the question of semantic ROI for any semantic technology. Take her point about adoption of the Semantic Web:

but have not seen universal adoption because of the labor costs involved in doing so.

To adopt a single level of semantic encoding for all content, without regard to its value, either historical or current use, is a sure budget buster. Perhaps the business community was paying closer attention to the Semantic Web than many of us thought, hence its adoption failure. Some content may need machine-driven encoding, more valuable content may require human supervision and/or encoding, and some content may not be worth encoding at all. It depends on your ROI model.

I should mention that the Semantic Web manages statements about statements (in its or other semantic systems) poorly. (AKA, “facts about facts.”) Although I hate to use the term “facts.” The very notion of “facts” is misleading and tricky under the best of circumstances. However universal (universal = among people you know) knowledge of a “fact” may seem, the better argument is that it is only a “fact” from a particular point of view. Semantic Web based systems have difficulty with such concepts.

Third, and not mentioned by Alexis, is that semantic systems should capture and preserve trails created by information explorers. Reporters at the New York Times use databases every day, but each search starts from scratch. If re-making redundant information over and over again is absurd, repeating the same searches (more or less successfully) over and over again is insane. Capturing search trails as data would enrich existing databases, especially if searchers could annotate their trails and the data they encounter along the way.
The more intensively searched a resource becomes, the richer its semantics. As it is today, all the effort of searchers is lost at the end of each search.

Alexis is right: let’s stop entombing knowledge in articles, papers, posts and books. It won’t be quick or easy, but worthwhile journeys rarely are.

I first saw this in a tweet by Tim Strehle.

### Who’s talking about what [BBC News Labs]

Wednesday, October 21st, 2015

Who’s talking about what – See who’s talking about what across hundreds of news sources.

Imagine comparing the coverage of news feeds from approximately 350 sources (you choose), with granular date ranges (instead of last 24 hours, last week, last month, last year), plus “…AND, OR, NOT and parenthesis in queries.” The interface shows co-occurring topics as well.

BBC News Labs did more than +1! a great idea; they implemented it and posted their code.

From the webpage:

Inspired by a concept created by Adam Ramsay, Zoe Blackler & Iain Collins at the Center for Investigative Reporting design sprint on Climate Change. Implementation by Iain Collins and Sylvia Tippmann, using data from the BBC News Labs Juicer | View Source

What conclusions would you draw from reports starting September 1, 2015 to date, for “violence AND Israel”? One story only illustrates the power of this tool to create comparisons between news sources. Drawing conclusions about news sources requires systematic study of sources across a range of stories. The ability to do precisely that has fallen into your lap.

I first saw this in a tweet by Nick Diakopoulos.
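The query language the tool advertises (AND, OR, NOT and parentheses) is small enough to sketch with a recursive-descent evaluator. This is a minimal illustration of how such queries are evaluated, not the BBC News Labs implementation:

```python
import re

def matches(query, text):
    """Evaluate a boolean query (AND, OR, NOT, parentheses) against text.

    Grammar (AND binds tighter than OR, NOT tighter than both):
        expr   := term (OR term)*
        term   := factor (AND factor)*
        factor := NOT factor | '(' expr ')' | word
    """
    words = set(re.findall(r"\w+", text.lower()))
    tokens = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = eat()
        if tok == "NOT":
            return not factor()
        if tok == "(":
            val = expr()
            eat()  # consume the closing ')'
            return val
        return tok.lower() in words

    def term():
        val = factor()
        while peek() == "AND":
            eat()
            val = factor() and val  # factor() runs first, keeping the parse moving
        return val

    def expr():
        val = term()
        while peek() == "OR":
            eat()
            val = term() or val
        return val

    return expr()

doc = "Reports of violence in Israel and Gaza continued this week."
print(matches("violence AND Israel", doc))                # True
print(matches("(peace OR violence) AND NOT Syria", doc))  # True
print(matches("violence AND Syria", doc))                 # False
```

Scale the same evaluation across ~350 feeds and a date index and you have the core of the comparison tool.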
### Pixar Online Library

Tuesday, October 20th, 2015

Pixar Online Library

The five most recent titles:

• Vector Field Processing on Triangle Meshes
• Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains
• Approximate Reflectance Profiles for Efficient Subsurface Scattering
• Subspace Condensation: Full Space Adaptivity for Subspace Deformations
• A Data-Driven Light Scattering Model for Hair

Even with help from Pixar, your app isn’t going to be compelling enough to make users forego breaks, etc. But, on the other hand, you won’t know until you try. 😉

I was surprised that a list of Pixar films didn’t have an edgy one in the bunch. The techniques valid for G-rated fare can be amped up for your app. What graphics or sounds would you program for bank apps?

I first saw this in a tweet by Ozge Ozcakir.

### Making Learning Easy by Design

Tuesday, October 20th, 2015

Making Learning Easy by Design – How Google’s Primer team approached UX by Sandra Nam.

From the post:

How can design make learning feel like less of a chore?

It’s not as easy as it sounds. Flat out, people usually won’t go out of their way to learn something new. Research shows that only 3% of adults in the U.S. spend time learning during their day.¹

Think about that for a second: Despite all the information available at our fingertips, and all the new technologies that emerge seemingly overnight, 97% of people won’t spend any time actively seeking out new knowledge for their own development.

That was the challenge at hand when our team at Google set out to create Primer, a new mobile app that helps people learn digital marketing concepts in 5 minutes or less.

UX was at the heart of this mission. Learning has several barriers to entry: you need to figure out what, where, how you want to learn, and then you need the time, money, and energy to follow through.

A short read that makes it clear that designing a learning experience is not easy or quick.
Take fair warning from:

only 3% of adults in the U.S. spend time learning during their day

when you plan on users “learning” a better way from your app or software. Targeting 3% of a potential audience isn’t a sound marketing strategy.

Google is targeting the other 97%. Shouldn’t you too?

### Python at the Large Hadron Collider and CERN

Tuesday, October 20th, 2015

From the webpage:

The largest machine ever built is the Large Hadron Collider at CERN. Its primary goal was the discovery of the Higgs boson: the fundamental particle which gives all objects mass. The LHC team of thousands of physicists achieved that goal in 2012, winning the Nobel Prize in Physics. Kyle Cranmer is here to share how Python was at the core of this amazing achievement!

You’ll learn about the different experiments, including ATLAS and CMS. We talk a bit about the physics involved in the discovery before digging into the software and computer technology used at CERN. The collisions generate a tremendous amount of data and the technology to filter, gather, and understand the data is super interesting. You’ll also learn about Crayfis, the app that turns your phone into a cosmic ray detector. No joke. Kyle is taking citizen science to a whole new level.

Bio of Kyle Cranmer:

Kyle Cranmer is an American physicist and a professor at New York University at the Center for Cosmology and Particle Physics and an Affiliated Faculty member at NYU’s Center for Data Science. He is an experimental particle physicist working, primarily, on the Large Hadron Collider, based in Geneva, Switzerland. Cranmer popularized a collaborative statistical modeling approach and developed statistical methodology which was used extensively for the discovery of the Higgs boson at the LHC in July 2012.

CRAYFIS – Join the first and only crowd-sourced cosmic ray detector. You might just help discover something big.

Not heavy with technical information but a nice glimpse into the computing side of CERN.
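The “filter” step mentioned above (a trigger, in LHC terms) is conceptually simple: apply boolean cuts to an enormous stream of events and keep only the interesting ones. A toy numpy sketch, with invented distributions and thresholds, conveys the shape of it:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy "events": each row is (energy_GeV, n_tracks). Real triggers
# reduce tens of millions of collisions per second to a tiny kept
# fraction; the numbers below are invented for illustration.
events = np.column_stack([
    rng.exponential(scale=30.0, size=100_000),  # falling energy spectrum
    rng.poisson(lam=8, size=100_000),           # track multiplicity
]).astype(float)

# A trigger is essentially a boolean cut applied at enormous rate.
kept = events[(events[:, 0] > 100.0) & (events[:, 1] >= 5)]

print(f"kept {len(kept)} of {len(events)} events "
      f"({100 * len(kept) / len(events):.2f}%)")
```

The hard engineering is doing this in real time, in hardware and software stages, without discarding the one-in-a-billion event you built the machine to find.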
Share with students to encourage them to pick up programming skills as we once did typing.

### Neural Networks Demystified (videos)

Tuesday, October 20th, 2015

I first saw this video series in a tweet by Jason Baldridge. You know what a pig’s breakfast YouTube’s related videos can be. No matter which part I looked at, there was no full listing of the other parts. To save you that annoyance, here are all the videos in this series. (That’s a partial definition of curation: saving other people time and expense in finding information.)

### Faster Graph Processing [Almost Linear Time Construction Of Spectral Sparsifier For Any Graph]

Tuesday, October 20th, 2015

Abstract:

We present the first almost-linear time algorithm for constructing linear-sized spectral sparsification for graphs. This improves all previous constructions of linear-sized spectral sparsification, which require $\Omega(n^2)$ time. A key ingredient in our algorithm is a novel combination of two techniques used in the literature for constructing spectral sparsification: random sampling by effective resistance, and adaptive constructions based on barrier functions.

Apologies to the paper’s authors for my liberties with their title, Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time, but I wanted to capture eyes that might glaze past their more formal title.

The PR release where I saw this article reads as follows:

In the second paper, Constructing linear-sized spectral sparsification in almost-linear time, Dr He Sun, Lecturer in Computer Science in the University’s Department of Computer Science, and Yin Tat Lee, a PhD student from MIT, have presented the first algorithm for constructing linear-sized spectral sparsifiers that runs in almost-linear time.

More and more applications from today’s big data scenario need to deal with graphs of millions of vertices.
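The “random sampling by effective resistance” ingredient named in the abstract has a concrete meaning: keep each edge with probability proportional to its effective resistance, treating the graph as an electrical network. A naive numpy sketch (my own illustration, using an O(n³) pseudoinverse that the paper’s almost-linear-time construction is precisely designed to avoid):

```python
import numpy as np

def effective_resistances(n, edges):
    """Effective resistance R(u,v) for each edge of an unweighted graph.

    R(u,v) = (e_u - e_v)^T L^+ (e_u - e_v), where L^+ is the
    Moore-Penrose pseudoinverse of the graph Laplacian. Edges with
    high effective resistance are "electrically important" and are
    the ones a spectral sparsifier must keep.
    """
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    Lpinv = np.linalg.pinv(L)
    out = []
    for u, v in edges:
        chi = np.zeros(n)
        chi[u], chi[v] = 1.0, -1.0
        out.append(chi @ Lpinv @ chi)
    return out

# On a triangle, each edge has effective resistance 2/3: the direct
# edge (1 ohm) in parallel with the two-edge path (2 ohms).
triangle = [(0, 1), (1, 2), (0, 2)]
print(effective_resistances(3, triangle))  # ~[0.667, 0.667, 0.667]
```

Making this computation (and the sampling built on it) run in almost-linear time on graphs with millions of vertices is the paper’s contribution.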
While traditional algorithms can be applied directly in these massive graphs, these algorithms are usually too slow to be practical when the graph contains millions of vertices. Also, storing these massive graphs is very expensive.

Dr He Sun said: “Over the past decade, there have been intensive studies in order to overcome these two bottlenecks. One notable approach is through the intermediate step called spectral sparsification, which is the approximation of any input graph by a very sparse graph that inherits many properties of the input graph. Since most algorithms run faster in sparse graphs, spectral sparsification is used as a key intermediate step in speeding up the runtime of many practical graph algorithms, including finding approximate maximum flows in an undirected graph, and approximately solving linear systems, among many others.”

Using spectral sparsification, the researchers ran many algorithms in a sparse graph, and obtained approximately the correct results as well. This general framework allowed them to speed up the runtime of a wide range of algorithms by an order of magnitude. However, to make the overall approach practical, a key issue was to find faster constructions of spectral sparsification with fewer edges in the resulting sparsifiers. There have been many studies looking at this area in the past decade.

The researchers have proved that, for any graph, they can construct its spectral sparsifier in almost-linear time, and in the output sparsifier every vertex has only a constant number of neighboring vertices. This result is almost optimal with respect to the time complexity of the algorithm and the number of edges in the spectral sparsifier.

Very heavy sledding in the paper, but you don’t have to be able to originate the insight in order to take advantage of the technique.

Enjoy!

### Introduction to Data Science (3rd Edition)

Monday, October 19th, 2015

Introduction to Data Science, 3rd Edition by Jeffrey Stanton.
From the webpage:

In this Introduction to Data Science eBook, a series of data problems of increasing complexity is used to illustrate the skills and capabilities needed by data scientists. The open source data analysis program known as “R” and its graphical user interface companion “R-Studio” are used to work with real data examples to illustrate both the challenges of data science and some of the techniques used to address those challenges. To the greatest extent possible, real datasets reflecting important contemporary issues are used as the basis of the discussions.

A very good introductory text on data science.

I originally saw a tweet about the second edition, but searching on the title and Stanton uncovered this later version. In the timeless world of the WWW, the amount of out-dated information vastly exceeds the latest. Check for updates before broadcasting your latest “find.”

### CrowdTruth

Monday, October 19th, 2015

CrowdTruth

From the webpage:

The CrowdTruth Framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The approach is focused specifically on collecting gold standard data for training and evaluation of cognitive computing systems. The original framework was inspired by the IBM Watson project for providing improved (multi-perspective) gold standard (medical) text annotation data for the training and evaluation of various IBM Watson components, such as Medical Relation Extraction, Medical Factor Extraction and Question-Answer passage alignment.

The CrowdTruth framework supports the composition of CrowdTruth gathering workflows, where a sequence of micro-annotation tasks can be configured and sent out to a number of crowdsourcing platforms (e.g. CrowdFlower and Amazon Mechanical Turk) and applications (e.g. the expert annotation game Dr. Detective). The CrowdTruth framework has a special focus on micro-tasks for knowledge extraction in medical text (e.g.
medical documents, from various sources such as Wikipedia articles or patient case reports). The main steps involved in the CrowdTruth workflow are: (1) exploring & processing of input data, (2) collecting of annotation data, and (3) applying disagreement analytics on the results. These steps are realised in an automatic end-to-end workflow that can support a continuous collection of high quality gold standard data with a feedback loop to all steps of the process. Have a look at our presentations and papers for more details on the research.

An encouraging quote from Truth is a Lie by Lora Aroyo:

the idea of truth is a fallacy for semantic interpretation and needs to be changed

I don’t disagree, but observe that a “crowdtruth” with disagreements is a variant of “truth.” What variant of “truth” is of interest to your client is an important issue. CIA analysts, for example, have little interest in crowdtruths that threaten their prestige and/or continued employment. “Accuracy” is only one aspect of any truth.

If your client is sold on crowdtruths, by all means take up the banner on their behalf. Always remembering: There are no facts, only interpretations. (Nietzsche)

Which interpretation interests you?

### Holographic Embeddings of Knowledge Graphs [Are You Blinding/Gelding Raw Data?]

Monday, October 19th, 2015

Abstract:

Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator, HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets.
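The circular correlation operator the abstract names is worth seeing concretely. Computed via the FFT it costs O(d log d) rather than O(d²), which is a large part of why HolE “remains efficient to compute.” A minimal numpy sketch (toy dimension, random vectors, my own illustration rather than the paper’s code):

```python
import numpy as np

def circular_correlation(a, b):
    """Circular correlation: [a * b]_k = sum_i a_i * b_{(i+k) mod d}.

    This is the compositional operator HolE uses to combine entity
    embeddings. By the correlation theorem it can be computed with
    FFTs in O(d log d) instead of the O(d^2) direct sum.
    """
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

rng = np.random.default_rng(0)
d = 8
a, b = rng.standard_normal(d), rng.standard_normal(d)

# Check the FFT version against the O(d^2) definition.
naive = np.array([sum(a[i] * b[(i + k) % d] for i in range(d))
                  for k in range(d)])
print(np.allclose(circular_correlation(a, b), naive))  # True
```

Note that, unlike circular convolution, circular correlation is not commutative, which lets the composed representation distinguish the roles of subject and object.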
In extensive experiments we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction in knowledge graphs and relational learning benchmark datasets.

Heavy sledding, but also a good candidate for practicing How to Read a Paper. I suggest that in part because of this comment by the authors in the conclusion:

In future work we plan to further exploit the fixed-width representations of holographic embeddings in complex scenarios, as they are especially suitable to model higher-arity relations (e.g., taughtAt(John, AI, MIT)) and facts about facts (e.g., believes(John, loves(Tom, Mary))).

Any representation where statements of “higher-arity relations” and “facts about facts” are not easily recorded and processed is seriously impaired when it comes to capturing human knowledge. Perhaps capturing only triples and “facts” explains the multiple failures of the U.S. intelligence community. It is working with tools that blind and geld its raw data. The rich nuances of intelligence data are lost in a grayish paste suitable for computer consumption.

A line of research worth following. Maximilian Nickel‘s homepage at MIT is a good place to start.

I first saw this in a tweet by Stefano Bertolo.

### Tracie Powell: “We’re supposed to challenge power…

Sunday, October 18th, 2015

From the post:

Tracie Powell tries not to use the word “diversity” anymore.

“When you talk about diversity, people’s eyes glaze over,” Powell, the founder of All Digitocracy, told me. The site covers tech, policy, and the impact of media on communities that Powell describes as “emerging audiences” — people of color and of different sexual orientations and gender identities.

I first heard Powell speak at the LION conference for hyperlocal publishers in Chicago earlier this month, where she stood in front of the almost entirely white audience to discuss how journalists and news organizations can get better at reporting for more people.
I followed up with Powell, who is currently a John S. Knight Journalism Fellow at Stanford, to hear more. “If we [as journalists] don’t do a better job at engaging with these audiences, we’re dead,” Powell said. “Our survival depends on reaching these emerging audiences.” Here’s a lightly condensed and edited version of our conversation.

Warning: Challenging power is far more risky than supporting fiery denunciations of the most vulnerable and least powerful in society. From women facing hard choices about pregnancy, rape victims, survivors of abuse both physical and emotional, to those who have lived with doubt, discrimination and deprivation as day-to-day realities, victims of power aren’t hard to find.

One of the powers that needs to be challenged is the news media itself. Take for example the near constant emphasis on gun violence and mass shootings. If you were to take the news media at face value, you would be frightened to go outside. But a 2013 Pew Center report, Gun Homicide Rate Down 49% Since 1993 Peak; Public Unaware, tells a different tale.

Not as satisfying as taking down a representative or senator, but in the long run, influencing the mass media may be a more reliable path to challenging power.

### Teaching Deep Convolutional Neural Networks to Play Go [Networks that can’t explain their play]

Sunday, October 18th, 2015

Abstract:

Mastering the game of Go has remained a long-standing challenge to the field of AI. Modern computer Go systems rely on processing millions of possible future positions to play well, but intuitively a stronger and more ‘humanlike’ way to play the game would be to rely on pattern recognition abilities rather than brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players.
To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ‘hard code’ symmetries that are expected to exist in the target function, and demonstrate in an ablation study that they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction programs have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go, indicating it is state of the art among programs that do not use Monte Carlo Tree Search. It is also able to win some games against state of the art Go playing program Fuego while using a fraction of the play time. This success at playing Go indicates high level principles of the game were learned.

The last line of the abstract caught my eye:

This success at playing Go indicates high level principles of the game were learned.

That statement is expanded in 4.3 Playing Go:

The results are very promising. Even though the networks are playing using a ‘zero step look ahead’ policy, and using a fraction of the computation time as their opponents, they are still able to play better than GNU Go and take some games away from Fuego. Under these settings GNU Go might play at around a 6-8 kyu ranking and Fuego at 2-3 kyu, which implies the networks are achieving a ranking of approximately 4-5 kyu. For a human player reaching this ranking would normally require years of study. This indicates that sophisticated knowledge of the game was acquired. This also indicates great potential for a Go program that integrates the information produced by such a network.

An interesting limitation is that the network can’t communicate what it has learned.
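That opacity is easy to see even in a toy version of such a move-prediction network. The sketch below is entirely hypothetical, not the paper's architecture: a single hand-set 3x3 filter standing in for a trained deep network, with the board assumed to be encoded as +1/-1/0 for black/white/empty. But it shows the shape of the interface the post is worried about: the network assigns a probability to every position and returns the best one, with nothing in its output that could serve as an explanation.

```python
import math

def conv2d(board, kernel):
    """Single 3x3 convolution with zero padding over a square 2D board."""
    n = len(board)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        total += board[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = total
    return out

def predict_move(board, kernel):
    """Softmax over convolved scores; returns only the argmax position.
    Every move gets a probability, but nothing explains the choice."""
    n = len(board)
    scores = conv2d(board, kernel)
    flat = [(scores[i][j], (i, j)) for i in range(n) for j in range(n)]
    m = max(s for s, _ in flat)  # subtract max for numerical stability
    exps = [(math.exp(s - m), pos) for s, pos in flat]
    z = sum(e for e, _ in exps)
    probs = [(e / z, pos) for e, pos in exps]
    return max(probs)[1]
```

Asking this function "why that move?" yields nothing beyond the probabilities themselves, and a real trained network with millions of weights is no more forthcoming.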
The network can only produce an answer for a given situation.

In gaming situations that opaqueness isn’t immediately objectionable. But what if the situation were fire/don’t fire in combat? Would the limitation that the network can only say yes or no, with no way to explain its answer, be acceptable?

Is that any worse than humans inventing explanations for decisions that weren’t the result of any rational thinking process?

Some additional Go resources you may find useful: American Go Association, Go Game Guru (with a printable Go board and stones), GoBase.org (has a Japanese dictionary). Those sites will lead you to many other Go sites.

### Text Analysis Without Programming

Sunday, October 18th, 2015

My favorite line in the slideshow reads:

PDFs are a sad text data reality

The slides give a good overview of a number of simple tools for text analysis. And Cherny doesn’t skimp on pointing out issues with tools such as word clouds, where she says:

People don’t know what they indicate

(and at the bottom of the slide: “But geez do people love them.”)

I suspect her observation on the uncertainty of what word clouds indicate is partially responsible for their popularity. No matter what conclusion you draw from a word cloud, how could anyone offer a contrary argument?

A coding talk is promised and I am looking forward to it. Enjoy!

### 16+ Free Data Science Books

Sunday, October 18th, 2015

From the webpage:

As a data scientist at Quora, I often get asked for my advice about becoming a data scientist. To help those people, I’ve taken some time to compile my top recommendations of quality data science books that are either available for free (by generosity of the author) or are Pay What You Want (PWYW) with $0 minimum.

Please bookmark this place and refer to it often! Click on the book covers to take yourself to the free versions of the book. I’ve also provided Amazon links (when applicable) in my descriptions in case you want to buy a physical copy. There’s actually more than 16 free books here since I’ve added a few since conception, but I’m keeping the name of this website for recognition.

The authors of these books have put in much effort to produce these free resources – please consider supporting them through avenues that the authors provide, such as contributing via PWYW or buying a hard copy [Disclosure: I get a small commission via the Amazon links, and I am co-author of one of these books].

Some of the usual suspects are here along with some unexpected titles, such as A First Course in Design and Analysis of Experiments by Gary W. Oehlert.

From the introduction:

Researchers use experiments to answer questions. Typical questions might be:

• Is a drug a safe, effective cure for a disease? This could be a test of how AZT affects the progress of AIDS.
• Which combination of protein and carbohydrate sources provides the best nutrition for growing lambs?
• How will long-distance telephone usage change if our company offers a different rate structure to our customers?
• Will an ice cream manufactured with a new kind of stabilizer be as palatable as our current ice cream?
• Does short-term incarceration of spouse abusers deter future assaults?
• Under what conditions should I operate my chemical refinery, given this month’s grade of raw material?

This book is meant to help decision makers and researchers design good experiments, analyze them properly, and answer their questions.

It isn’t short (659 pages), but taken in small doses it will teach you a great deal about experimental design: not only how to design experiments properly, but how to spot ones that aren’t well designed.
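Every one of Oehlert’s example questions comes down to comparing responses across controlled conditions. As a minimal illustration (my own sketch, not code from the book), here is the simplest design the book covers: completely randomized assignment of units to two groups, followed by Welch’s two-sample t statistic to compare the group means.

```python
import random
import statistics

def randomize(units, seed=0):
    """Completely randomized design: shuffle the units, split them
    into a treatment group and a control group of equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances):
    (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5
```

The randomization step is the part people skip and shouldn’t: it is what licenses the causal reading of the comparison, which is much of what the book is about.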

Think of it as training to go big-game hunting in the latest issue of Nature or Science. Adds a bit of competitiveness to the enterprise.

### Drone Registration Coming! Call the NRA!

Sunday, October 18th, 2015

From the post:

The US government plans to make it a mandatory requirement that all drone purchases, including those made by consumers, be formally registered. NBC News reports that the Department of Transportation will announce the new plan on Monday, with hopes to have this drone registry implemented by the holidays, when drones will likely prove a popular gift. The Obama administration and DoT have yet to announce any such press conference for Monday.

Chris promises more details so follow @chriswelch.

Registration of drones isn’t going to help regulate drones, unless of course the drones have identifying marks and/or broadcast their registration. Yes?

In other words, registration of drones is a means of further government surveillance on where and when you fly your drone.

If you want an unregistered drone, buy one before regulations requiring registration go into effect.

So long as you are obeying all aviation laws, the government has no right to know where and when you fly your drone.

Hopefully the NRA will realize that preserving gun ownership where the government tracks: