Archive for the ‘Inference’ Category

Drawing Causal Inference from Big Data

Wednesday, April 8th, 2015

Drawing Causal Inference from Big Data.


This colloquium was motivated by the exponentially growing amount of information collected about complex systems, colloquially referred to as “Big Data”. It was aimed at methods for drawing causal inference from these large data sets, most of which are not derived from carefully controlled experiments. Although correlations among observations are vast in number and often easy to obtain, causality is much harder to assess and establish, partly because causality is a vague and poorly specified construct for complex systems. Speakers discussed both the conceptual framework required to establish causal inference and the designs and computational methods that can allow causality to be inferred. The program illustrated state-of-the-art methods with approaches drawn from fields such as statistics, graph theory, machine learning, philosophy, and computer science, and the talks covered domains such as social networks, medicine, health, economics, business, internet data and usage, search engines, and genetics. The presentations also addressed the possibility of testing causality in large data settings, and raised certain basic questions: Will access to massive data be a key to understanding the fundamental questions of basic and applied science? Or does the vast increase in data confound analysis, produce computational bottlenecks, and decrease the ability to draw valid causal inferences?

Videos of the talks are available on the Sackler YouTube Channel. More videos will be added as they are approved by the speakers.

Great material but I’m in the David Hume camp when it comes to causality. Or more properly, the sceptical realist interpretation of David Hume. The contemporary claim that ISIS is a social media Svengali is a good case in point. The only two “facts” not in dispute are that ISIS has used social media and that some Westerners have in fact joined up with ISIS.

Both of those facts are true, but to assert a causal link between them borders on the bizarre. Joshua Berlinger reports in The names: Who has been recruited to ISIS from the West that some twenty thousand (20,000) foreign fighters have joined ISIS. That group of foreign fighters hails from ninety (90) countries; thirty-four hundred (3,400) are from Western states.

Even without Hume’s skepticism on causation, there is no evidence for the proposition that current foreign fighters read about ISIS on social media and therefore decided to join up. None, nada, the empty set. The causal link between social media and ISIS is wholly fictional and made to further other policy goals, like censoring ISIS content.

Be careful how you throw “causality” about when talking about big data or data in general.

The listing of the current videos at YouTube gives author names only; it does not include titles or abstracts. To make these slightly more accessible, I have created the following listing with author, title (linked to YouTube where available), and Abstract/Slides as appropriate, in alphabetical order by last name. Author names are hyperlinks identifying the authors.

Edo Airoldi, Harvard University, Optimal Design of Causal Experiments in the Presence of Social Interference. Abstract

Susan Athey, Stanford University, Estimating Heterogeneous Treatment Effects Using Machine Learning in Observational Studies. Slides.

Leon Bottou, Facebook AI Research, Causal Reasoning and Learning Systems Abstract

Peter Buhlmann, ETH Zurich, Causal Inference Based on Invariance: Exploiting the Power of Heterogeneous Data Slides

Dean Eckles, Facebook, Identifying Peer Effects in Social Networks Abstract

James Fowler, University of California, San Diego, An 85 Million Person Follow-up to a 61 Million Person Experiment in Social Influence and Political Mobilization. Abstract

Michael Hawrylycz, Allen Institute, Project MindScope:  From Big Data to Behavior in the Functioning Cortex Abstract

David Heckerman, Microsoft Corporation, Causal Inference in the Presence of Hidden Confounders in Genomics Slides.

Michael Jordan, University of California, Berkeley, On Computational Thinking, Inferential Thinking and Big Data . Abstract.

Steven Levitt, The University of Chicago, Thinking Differently About Big Data Abstract

David Madigan, Columbia University, Honest Inference From Observational Database Studies Abstract

Judea Pearl, University of California, Los Angeles, Taming the Challenge of Extrapolation: From Multiple Experiments and Observations to Valid Causal Conclusions Slides

Thomas Richardson, University of Washington, Non-parametric Causal Inference Abstract

James Robins, Harvard University, Personalized Medicine, Optimal Treatment Strategies, and First Do No Harm: Time Varying Treatments and Big Data Abstract

Bernhard Schölkopf, Max Planck Institute, Toward Causal Machine Learning Abstract.

Jasjeet Sekhon, University of California, Berkeley, Combining Experiments with Big Data to Estimate Treatment Effects Abstract.

Richard Shiffrin, Indiana University, The Big Data Sea Change Abstract.

I call your attention to this part of Shiffrin’s abstract:

Second, having found a pattern, how can we explain its causes?

This is the focus of the present Sackler Colloquium. If in a terabyte data base we notice factor A is correlated with factor B, there might be a direct causal connection between the two, but there might be something like 2**300 other potential causal loops to be considered. Things could be even more daunting: To infer probabilities of causes could require consideration of all distributions of probabilities assigned to the 2**300 possibilities. Such numbers are both fanciful and absurd, but are sufficient to show that inferring causality in Big Data requires new techniques. These are under development, and we will hear some of the promising approaches in the next two days.

John Stamatoyannopoulos, University of Washington, Decoding the Human Genome:  From Sequence to Knowledge.

Hal Varian, Google, Inc., Causal Inference, Econometrics, and Big Data Abstract.

Bin Yu, University of California, Berkeley, Lasso Adjustments of Treatment Effect Estimates in Randomized Experiments  Abstract.

If you are interested in starting an argument, watch the Steven Levitt video starting at timemark 46:20. 😉


Can recursive neural tensor networks learn logical reasoning?

Thursday, March 19th, 2015

Can recursive neural tensor networks learn logical reasoning? by Samuel R. Bowman.


Recursive neural network models and their accompanying vector representations for words have seen success in an array of increasingly semantically sophisticated tasks, but almost nothing is known about their ability to accurately capture the aspects of linguistic meaning that are necessary for interpretation or reasoning. To evaluate this, I train a recursive model on a new corpus of constructed examples of logical reasoning in short sentences, like the inference of “some animal walks” from “some dog walks” or “some cat walks,” given that dogs and cats are animals. This model learns representations that generalize well to new types of reasoning pattern in all but a few cases, a result which is promising for the ability of learned representation models to capture logical reasoning.

From the introduction:

Natural language inference (NLI), the ability to reason about the truth of a statement on the basis of some premise, is among the clearest examples of a task that requires comprehensive and accurate natural language understanding [6].

I stumbled over that line in Samuel’s introduction because it implies, at least to me, that there is a notion of truth that resides outside of ourselves as speakers and hearers.

Take his first example:

Consider the statement all dogs bark. From this, one can infer quite a number of other things. One can replace the first argument of all (the first of the two predicates following it, here dogs) with any more specific category that contains only dogs and get a valid inference: all puppies bark; all collies bark.
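Samuel’s monotonicity example is simple enough to mechanize in a few lines. Here is a toy sketch; the taxonomy and the `entails` helper are invented for illustration, not from the paper:

```python
# Toy illustration of the downward-monotone inference pattern described
# above: "all X bark" remains valid when X is replaced by any subcategory.
# The taxonomy and helper are invented for this example.

subcategories = {
    "dogs": {"puppies", "collies"},
    "animals": {"dogs", "cats"},
}

def entails(premise, conclusion):
    """premise/conclusion are ("all", category, predicate) triples.
    'all' licenses replacing its first argument by any subcategory."""
    q1, cat1, pred1 = premise
    q2, cat2, pred2 = conclusion
    return (q1 == q2 == "all" and pred1 == pred2
            and cat2 in subcategories.get(cat1, set()))

print(entails(("all", "dogs", "bark"), ("all", "puppies", "bark")))  # True
print(entails(("all", "dogs", "bark"), ("all", "cats", "bark")))     # False
```

The point of the sketch is how little machinery a single monotone quantifier needs, which is what makes the day-to-day reasoning below so striking by contrast.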

Contrast that with one of the premises that starts my day:

All governmental statements are lies of omission or commission.

Yet, firmly holding that as a “fact” of the world, I write to government officials, post ranty blog posts about government policies, urge others to attempt to persuade government to take certain positions.

Or as Leonard Cohen would say:

Everybody knows that the dice are loaded

Everybody rolls with their fingers crossed

It’s not that I think Samuel is incorrect about monotonicity for “logical reasoning” but monotonicity is a far cry from how people reason day to day.

Rather than creating “reasoning” that is such a departure from human inference, why not train a deep learning system to “reason” by exposing it to the same inputs and decisions made by human decision makers? Imitation doesn’t require understanding of human “reasoning,” just the ability to engage in the same behavior under similar circumstances.

That would reframe Samuel’s question to read: Can recursive neural tensor networks learn human reasoning?

I first saw this in a tweet by Sharon L. Bolding.

Stardog 2.0.0 (26 September 2013)

Friday, September 27th, 2013

Stardog 2.0.0 (26 September 2013)

From the docs page:

Introducing Stardog

Stardog is a graph database—fast, lightweight, pure Java storage for mission-critical apps—that supports:

  • the RDF data model
  • SPARQL 1.1 query language
  • HTTP and SNARL protocols for remote access and control
  • OWL 2 and rules for inference and data analytics
  • Java, JavaScript, Ruby, Python, .Net, Groovy, Spring, etc.

New features in 2.0:

I was amused to read in Stardog Rules Syntax:

Stardog supports two different syntaxes for defining rules. The first is native Stardog Rules syntax and is based on SPARQL, so you can re-use what you already know about SPARQL to write rules. Unless you have specific requirements otherwise, you should use this syntax for user-defined rules in Stardog. The second is the de facto standard RDF/XML syntax for SWRL. It has the advantage of being supported in many tools; but it’s not fun to read or to write. You probably don’t want to use it. Better: don’t use this syntax! (emphasis in the original)

Install and play with it over the weekend. It’s a good way to experience RDF and SPARQL.

Advances in Neural Information Processing Systems (NIPS)

Sunday, April 7th, 2013

Advances in Neural Information Processing Systems (NIPS)

From the homepage:

The Neural Information Processing Systems (NIPS) Foundation is a non-profit corporation whose purpose is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects. Neural information processing is a field which benefits from a combined view of biological, physical, mathematical, and computational sciences.

Links to videos from NIPS 2012 meetings are featured on the homepage. The topics are as wide ranging as the foundation’s description.

A tweet from Chris Diehl, wondering what to do with “old hardbound NIPS proceedings (NIPS 11)” led me to: Advances in Neural Information Processing Systems (NIPS) [Online Papers], which has the papers from 1987 to 2012 by volume and a search interface to the same.

Quite a remarkable collection just from a casual skim of some of the volumes.

Unless you need to fill book shelf space, suggest you bookmark the NIPS Online Papers.

The #NIPS2012 Videos are out

Monday, January 21st, 2013

The #NIPS2012 Videos are out by Igor Carron.

From the post:

Videolectures came through earlier than last year. woohoo! Presentations relevant to Nuit Blanche were featured earlier here. Videos for the presentations for the Posner Lectures, Invited Talks and Oral Sessions of the conference are here. Videos for the presentations for the different Workshops are here. Some videos are not available because the presenters have not given their permission to the good folks at Videolectures. If you know any of them, let them know the world is waiting.

Just in case Netflix is down. 😉

Analyzing Categorical Data

Saturday, December 29th, 2012

Analyzing Categorical Data by Jeffrey S. Simonoff.

Mentioned in My Intro to Multiple Classification… but thought it merited a more prominent mention.

From the webpage:

Welcome to the web site for the book Analyzing Categorical Data, published by Springer-Verlag in July 2003 as part of the Springer Texts in Statistics series. This site allows access to the data sets used in the book, S-PLUS/R and SAS code to perform the analyses in the book, some general information on statistical software for analyzing categorical data, and an errata list. I would be very happy to receive comments on this site, and on the book itself.

Data sets, code to duplicate the analysis in the book and other information at this site.

My Intro to Multiple Classification…

Saturday, December 29th, 2012

My Intro to Multiple Classification with Random Forests, Conditional Inference Trees, and Linear Discriminant Analysis

From the post:

After the work I did for my last post, I wanted to practice doing multiple classification. I first thought of using the famous iris dataset, but felt that was a little boring. Ideally, I wanted to look for a practice dataset where I could successfully classify data using both categorical and numeric predictors. Unfortunately it was tough for me to find such a dataset that was easy enough for me to understand.

The dataset I use in this post comes from a textbook called Analyzing Categorical Data by Jeffrey S Simonoff, and lends itself to basically the same kind of analysis done by blogger “Wingfeet” in his post predicting authorship of Wheel of Time books. In this case, the dataset contains counts of stop words (function words in English, such as “as”, “also”, “even”, etc.) in chapters, or scenes, from books or plays written by Jane Austen, Jack London (I’m not sure if “London” in the dataset might actually refer to another author), John Milton, and William Shakespeare. Being a textbook example, you just know there’s something worth analyzing in it!! The following table describes the numerical breakdown of books and chapters from each author:
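The kind of analysis the post describes is easy to sketch with synthetic data standing in for the Simonoff counts. The Poisson rates, stop words, and author labels below are invented, not from the book:

```python
# Sketch of authorship classification from stop-word counts with a random
# forest. The per-chapter counts are synthetic stand-ins, not the Simonoff
# dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Two "authors" with different mean rates for three stop words per chapter.
author_a = rng.poisson(lam=[20, 5, 12], size=(40, 3))
author_b = rng.poisson(lam=[8, 15, 3], size=(40, 3))
X = np.vstack([author_a, author_b])
y = np.array(["A"] * 40 + ["B"] * 40)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# New "chapters" near each author's typical rates:
print(clf.predict([[19, 6, 11], [9, 14, 4]]))
```

Conditional inference trees and LDA, the other two methods in the post’s title, slot into the same fit/predict pattern.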

Introduction to authorship studies as they were known (may still be) in the academic circles of my youth.

I wonder if the same techniques are as viable today as on the Federalist Papers?

The Wheel of Time example demonstrates the technique remains viable for novel authors.

But what about authorship more broadly?

Can we reliably distinguish between news commentary from multiple sources?

Or between statements by elected officials?

How would your topic map represent purported authorship versus attributed authorship?

Or even a common authorship for multiple purported authors? (speech writers)

Accelerating Inference: towards a full Language, Compiler and Hardware stack

Friday, December 14th, 2012

Accelerating Inference: towards a full Language, Compiler and Hardware stack by Shawn Hershey, Jeff Bernstein, Bill Bradley, Andrew Schweitzer, Noah Stein, Theo Weber, Ben Vigoda.


We introduce Dimple, a fully open-source API for probabilistic modeling. Dimple allows the user to specify probabilistic models in the form of graphical models, Bayesian networks, or factor graphs, and performs inference (by automatically deriving an inference engine from a variety of algorithms) on the model. Dimple also serves as a compiler for GP5, a hardware accelerator for inference.

From the introduction:

Graphical models alleviate the complexity inherent to large dimensional statistical models (the so-called curse of dimensionality) by dividing the problem into a series of logically (and statistically) independent components. By factoring the problem into subproblems with known and simple interdependencies, and by adopting a common language to describe each subproblem, one can considerably simplify the task of creating complex Bayesian models. Modularity can be taken advantage of further by leveraging this modeling hierarchy over several levels (e.g. a submodel can also be decomposed into a family of sub-submodels). Finally, by providing a framework which abstracts the key concepts underlying classes of models, graphical models allow the design of general algorithms which can be efficiently applied across completely different fields, and systematically derived from a model description.
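The factorization idea in that paragraph shows up already in the smallest possible example: a chain of two conditional-probability factors, with a marginal computed by summing out one variable at a time. The numbers are toy values of mine and this is not Dimple’s API:

```python
# A chain A -> B -> C with pairwise factors; the marginal P(C) follows by
# summing out variables one at a time (variable elimination). Toy numbers,
# not Dimple's API.
import numpy as np

p_a = np.array([0.6, 0.4])            # P(A)
p_b_given_a = np.array([[0.9, 0.1],   # P(B | A=0)
                        [0.2, 0.8]])  # P(B | A=1)
p_c_given_b = np.array([[0.7, 0.3],   # P(C | B=0)
                        [0.1, 0.9]])  # P(C | B=1)

p_b = p_a @ p_b_given_a               # sum out A
p_c = p_b @ p_c_given_b               # sum out B
print(p_c)                            # marginal P(C); entries sum to 1
```

Because the joint factors along the chain, the cost stays linear in the number of variables instead of exponential, which is the “curse of dimensionality” point the introduction makes.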

Suggestive of sub-models of merging?

I first saw this in a tweet from Stefano Bertolo.

Are Expert Semantic Rules so 1980’s?

Monday, October 8th, 2012

In The Geometry of Constrained Structured Prediction: Applications to Inference and Learning of Natural Language Syntax André Martins proposes advances in inferencing and learning for NLP processing. And it is important work for that reason.

But in his introduction to recent (and rapid) progress in language technologies, the following text caught my eye:

So, what is the driving force behind the aforementioned progress? Essentially, it is the alliance of two important factors: the massive amount of data that became available with the advent of the Web, and the success of machine learning techniques to extract statistical models from the data (Mitchell, 1997; Manning and Schütze, 1999; Schölkopf and Smola, 2002; Bishop, 2006; Smith, 2011). As a consequence, a new paradigm has emerged in the last couple of decades, which directs attention to the data itself, as opposed to the explicit representation of knowledge (Abney, 1996; Pereira, 2000; Halevy et al., 2009). This data-centric paradigm has been extremely fruitful in natural language processing (NLP), and came to replace the classic knowledge representation methodology which was prevalent until the 1980s, based on symbolic rules written by experts. (emphasis added)

Are RDF, Linked Data, topic maps, and other semantic technologies caught in a 1980’s “symbolic rules” paradigm?

Are we ready to make the same break that NLP did, what, thirty (30) years ago now?

To get started on the literature, consider André’s sources:

Abney, S. (1996). Statistical methods and linguistics. In The balancing act: Combining symbolic and statistical approaches to language, pages 1–26. MIT Press, Cambridge, MA.

A more complete citation: Steven Abney. Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA. 1996. (Link is to PDF of Abney’s paper.)

Pereira, F. (2000). Formal grammar and information theory: together again? Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 358(1769):1239–1253.

I added a pointer to the Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences abstract for the article. You can see it at: Formal grammar and information theory: together again? (PDF file).

Halevy, A., Norvig, P., and Pereira, F. (2009). The unreasonable effectiveness of data. Intelligent Systems, IEEE, 24(2):8–12.

I added a pointer to the Intelligent Systems, IEEE abstract for the article. You can see it at: The unreasonable effectiveness of data (PDF file).

The Halevy article doesn’t have an abstract per se but the ACM reports one as:

Problems that involve interacting with humans, such as natural language understanding, have not proven to be solvable by concise, neat formulas like F = ma. Instead, the best approach appears to be to embrace the complexity of the domain and address it by harnessing the power of data: if other humans engage in the tasks and generate large amounts of unlabeled, noisy data, new algorithms can be used to build high-quality models from the data. [ACM]

That sounds like a challenge to me. You?

PS: I saw the pointer to this thesis at Christophe Lalanne’s A bag of tweets / September 2012

Information Theory, Pattern Recognition, and Neural Networks

Friday, July 27th, 2012

Information Theory, Pattern Recognition, and Neural Networks by David MacKay.

David MacKay’s lectures with slides on information theory, inference and neural networks. Spring/Summer of 2012.

Just in time for the weekend!

I saw this in Christophe Lalanne’s Bag of Tweets for July 2012.

Sarcastic Computers?

Thursday, May 31st, 2012

You may have seen the headline: Could Sarcastic Computers Be in Our Future? New Math Model Can Help Computers Understand Inference.

And the lead for the article sounds promising:

In a new paper, the researchers describe a mathematical model they created that helps predict pragmatic reasoning and may eventually lead to the manufacture of machines that can better understand inference, context and social rules.

Language is so much more than a string of words. To understand what someone means, you need context.

Consider the phrase, “Man on first.” It doesn’t make much sense unless you’re at a baseball game. Or imagine a sign outside a children’s boutique that reads, “Baby sale — One week only!” You easily infer from the situation that the store isn’t selling babies but advertising bargains on gear for them.

Present these widely quoted scenarios to a computer, however, and there would likely be a communication breakdown. Computers aren’t very good at pragmatics — how language is used in social situations.

But a pair of Stanford psychologists has taken the first steps toward changing that.

Context being one of those things you can use semantic mapping techniques to capture, I was interested.

Jack Park pointed me to a public PDF of the article: Predicting pragmatic reasoning in language games

Be sure to read the entire file.

A blue square, a blue circle, a green square.

Not exactly a general model for context and inference.

Intent vs. Inference

Tuesday, May 8th, 2012

Intent vs. Inference by David Loshin.

David writes:

I think that the biggest issue with integrating external data into the organization (especially for business intelligence purposes) is related to the question of data repurposing. It is one thing to consider data sharing for cross-organization business processes (such as brokering transactions between two different trading partners) because those data exchanges are governed by well-defined standards. It is another when your organization is tapping into a data stream created for one purpose to use the data for another purpose, because there are no negotiated standards.

In the best of cases, you are working with some published metadata. In my previous post I referred to the public data at, and those data sets are sometimes accompanied by their data layouts or metadata. In the worst case, you are integrating a data stream with no provided metadata. In both cases, you, as the data consumer, must make some subjective judgments about how that data can be used.

A caution about “intent” or as I knew it, the intentional fallacy in literary criticism. It is popular in some legal circles in the United States as well.

One problem is that there is no common basis for determining authorial intent.

Another problem is that “intent” is often used to privilege one view over others as representing the “intent” of the author. The “original” view is beyond questioning or criticism because it is the “intent” of the original author.

It should come as no surprise that for law (Scalia and the constitution) and the Bible (you pick’em), “original intent” means agrees with the speaker.

It isn’t entirely clear where David is going with this thread but I would simply drop the question of intent and ask two questions:

  1. What is the purpose of this data?
  2. Is the data suited to that purpose?

Where #1 may include what inferences we want to make, etc.

Cuts to the chase as it were.

ParLearning 2012 (silos or maps?)

Friday, September 23rd, 2011

ParLearning 2012 : Workshop on Parallel and Distributed Computing for Machine Learning and Inference Problems


When May 25, 2012 – May 25, 2012
Where Shanghai, China
Submission Deadline Dec 19, 2011
Notification Due Feb 1, 2012
Final Version Due Feb 21, 2012

From the notice:


  • Foster collaboration between HPC community and AI community
  • Applying HPC techniques for learning problems
  • Identifying HPC challenges from learning and inference
  • Explore a critical emerging area with strong industry interest without overlapping with existing IPDPS workshops
  • Great opportunity for researchers worldwide for collaborating with Chinese Academia and Industry


Authors are invited to submit manuscripts of original unpublished research that demonstrate a strong interplay between parallel/distributed computing techniques and learning/inference applications, such as algorithm design and libraries/framework development on multicore/ manycore architectures, GPUs, clusters, supercomputers, cloud computing platforms that target applications including but not limited to:

  • Learning and inference using large scale Bayesian Networks
  • Large scale inference algorithms using parallel topic models, clustering, SVMs, etc.
  • Parallel natural language processing (NLP).
  • Semantic inference for disambiguation of content on web or social media
  • Discovering and searching for patterns in audio or video content
  • On-line analytics for streaming text and multimedia content
  • Comparison of various HPC infrastructures for learning
  • Large scale learning applications in search engine and social networks
  • Distributed machine learning tools (e.g., Mahout and IBM parallel tool)
  • Real-time solutions for learning algorithms on parallel platforms

If you are wondering what role topic maps have to play in this arena, ask yourself the following question:

Will the systems and techniques demonstrated at this conference use the same means to identify the same subjects?*

If your answer is no, what would you suggest is the solution for mapping different identifications of the same subjects together?

My answer to that question is to use topic maps.

*Whatever you ascribe as its origin, semantic diversity is part and parcel of the human condition. We can either develop silos or maps across silos. Which do you prefer?

A Uniform Fixpoint Approach to the Implementation of Inference Methods for Deductive Databases

Saturday, September 10th, 2011

A Uniform Fixpoint Approach to the Implementation of Inference Methods for Deductive Databases by Andreas Behrend.


Within the research area of deductive databases three different database tasks have been deeply investigated: query evaluation, update propagation and view updating. Over the last thirty years various inference mechanisms have been proposed for realizing these main functionalities of a rule-based system. However, these inference mechanisms have been rarely used in commercial DB systems until now. One important reason for this is the lack of a uniform approach well-suited for implementation in an SQL-based system. In this paper, we present such a uniform approach in the form of a new version of the soft consequence operator. Additionally, we present improved transformation-based approaches to query optimization, update propagation and view updating, all of which use this operator as the underlying evaluation mechanism.
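For readers as new to deductive databases as I am, the underlying fixpoint idea is small enough to sketch: apply a Datalog-style rule to the known facts until no new facts appear. This is only the naive fixpoint on a toy edge relation, not the paper’s soft consequence operator:

```python
# Naive bottom-up fixpoint evaluation of the Datalog-style rules
#   path(X,Y) :- edge(X,Y).
#   path(X,Z) :- edge(X,Y), path(Y,Z).
# The edge facts are invented for the example.
edge = {("a", "b"), ("b", "c"), ("c", "d")}

path = set(edge)            # base case: every edge is a path
while True:
    new = {(x, z) for (x, y) in edge for (y2, z) in path if y == y2}
    if new <= path:         # fixpoint: no rule derives anything new
        break
    path |= new

print(sorted(path))         # transitive closure of edge
```

Real systems replace this brute-force loop with semi-naive evaluation and the transformation-based optimizations the abstract mentions, but the fixpoint skeleton is the same.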

This one will take a while and discussions with people more familiar than I am with deductive databases.

But, having said that, it looks important. The approach has been validated for stock market data streams and management of airspace. Not to mention:

EU Project INFOMIX (IST-2001-33570)

Information system of University “La Sapienza” in Rome.

  • 14 global relations,
  • 29 integrity constraints,
  • 29 relations (in 3 legacy databases) and 12 web wrappers,

More than 24MB of data regarding students, professors and exams of the University.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition

Sunday, April 17th, 2011

The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition

by Trevor Hastie, Robert Tibshirani and Jerome Friedman.

The full pdf of the latest printing is available at this site.

Strongly recommend that if you find the text useful, that you ask your library to order the print version.

From the website:

During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting–the first comprehensive treatment of this topic in any book.

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Which Automatic Differentiation Tool for C/C++?

Tuesday, February 8th, 2011

Which Automatic Differentiation Tool for C/C++?

OK, not immediately obvious why this is relevant to topic maps.

Nor is Bob Carpenter’s references:

I’ve been playing with all sorts of fun new toys at the new job at Columbia and learning lots of new algorithms. In particular, I’m coming to grips with Hamiltonian (or hybrid) Monte Carlo, which isn’t as complicated as the physics-based motivations may suggest (see the discussion in David MacKay’s book and then move to the more detailed explanation in Christopher Bishop’s book).

particularly useful.

I suspect the two book references are:

but I haven’t asked. In part to illustrate the problem of resolving any entity reference. Both authors have authored other books touching on the same subjects so my guesses may or may not be correct.

Oh, relevance to topic maps. The technique of automatic differentiation is used in Hamiltonian Monte Carlo methods to generate gradients. Still not helpful? Isn’t to me either.
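If it helps, forward-mode automatic differentiation is less exotic than the name suggests: carry (value, derivative) pairs through arithmetic and the chain rule does the rest. A minimal dual-number sketch of my own, not any of the surveyed C/C++ tools:

```python
# Minimal forward-mode automatic differentiation via dual numbers: each
# Dual carries a value and its derivative, so arithmetic propagates exact
# derivatives automatically.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.der * o.val + self.val * o.der)  # product rule
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the derivative slot with 1.0 and read the result back out.
    return f(Dual(x, 1.0)).der

# d/dx (x*x + 3x) at x = 2 is 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

HMC needs exactly this: gradients of a log density that are exact (unlike finite differences) without anyone deriving them by hand.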

Ah, what about Bayesian models in IR? That made the light go on!

I will be discussing ways to show more immediate relevance to topic maps, at least for some posts, in post #1000.

It isn’t as far away as you might think.

Information Theory, Inference, and Learning Algorithms

Wednesday, January 12th, 2011

Information Theory, Inference, and Learning Algorithms Author: David J.C. MacKay, full text of the 2005 printing available for downloading. Software is also available.

From a review that I read, MacKay treats machine learning as the other side of the coin from information theory.
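The coin has a concrete face: the ideal code length for data under a probabilistic model is -log2 of the probability the model assigns, so a model that predicts better also compresses better. A small illustration (the toy data and model probabilities are mine, not MacKay’s):

```python
import math

def code_length_bits(data, model):
    """Ideal code length of a symbol sequence under a predictive model:
    sum of -log2 p(symbol). Shorter codes mean better prediction."""
    return sum(-math.log2(model[s]) for s in data)

data = "aaab"
learned = {"a": 0.75, "b": 0.25}   # matches the data's statistics
uniform = {"a": 0.5, "b": 0.5}     # knows nothing about the data

code_length_bits(data, uniform)    # 4.0 bits
code_length_bits(data, learned)    # about 3.25 bits -- prediction pays off
```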

Take the time to visit MacKay’s homepage.

There you will find his book Sustainable Energy – Without the Hot Air. Highly entertaining.


Monday, December 27th, 2010
Orange


From the website:

Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Components for machine learning. Extensions for bioinformatics and text mining. Packed with features for data analytics.

I had to look at the merge data widget.

Which is said to: Merges two data sets based on the values of selected attributes.

According to the documentation:

Merge Data widget is used to horizontally merge two data sets based on the values of selected attributes. On input, two data sets are required, A and B. The widget allows for selection of an attribute from each domain which will be used to perform the merging. When selected, the widget produces two outputs, A+B and B+A. The first output (A+B) corresponds to instances from input data A which are appended attributes from B, and the second output (B+A) to instances from B which are appended attributes from A.

The merging is done by the values of the selected (merging) attributes. For example, instances from A+B are constructed in the following way. First, the value of the merging attribute from A is taken and instances from B are searched with matching values of the merging attributes. If more than a single instance from B is found, the first one is taken and horizontally merged with the instance from A. If no instance from B matches the criterion, the unknown values are assigned to the appended attributes. Similarly, B+A is constructed.
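That first-match-wins behaviour is simple to state in code. A rough sketch of A+B outside Orange (plain dicts stand in for Orange data instances, “?” marks the unknown value, and the gene/expression columns are invented for illustration):

```python
def merge_a_plus_b(a_rows, b_rows, key_a, key_b):
    """Horizontal merge: each instance of A is appended the attributes of
    the FIRST instance of B whose key matches; unknowns ('?') otherwise."""
    b_attrs = sorted({k for row in b_rows for k in row if k != key_b})
    merged = []
    for row in a_rows:
        match = next((b for b in b_rows if b[key_b] == row[key_a]), None)
        extras = {k: (match[k] if match and k in match else "?") for k in b_attrs}
        merged.append({**row, **extras})
    return merged

a = [{"gene": "TP53", "expr": 1.2}, {"gene": "BRCA1", "expr": 0.4}]
b = [{"symbol": "TP53", "chrom": "17"}]
merge_a_plus_b(a, b, "gene", "symbol")
# [{'gene': 'TP53', 'expr': 1.2, 'chrom': '17'},
#  {'gene': 'BRCA1', 'expr': 0.4, 'chrom': '?'}]
```

Note that the merged output records neither which attributes served as merge keys nor why two rows were considered the same subject.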

Which illustrates the problem that topic maps solves rather neatly:

  1. How does a subsequent researcher reliably duplicate such a merger?
  2. How does a subsequent researcher reliably merge that data with other data?
  3. How do other researchers reliably merge that data with their own data?

Answer is: They can’t. Not enough information.

Question: How would you change the outcome for those three questions? In detail. (5-7 pages, citations)

TMVA Toolkit for Multivariate Data Analysis with ROOT

Monday, December 27th, 2010

TMVA Toolkit for Multivariate Data Analysis with ROOT

From the website:

The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated machine learning environment for the processing and parallel evaluation of multivariate classification and regression techniques. TMVA is specifically designed for the needs of high-energy physics (HEP) applications, but should not be restricted to these. The package includes:

TMVA consists of object-oriented implementations in C++ for each of these multivariate methods and provides training, testing and performance evaluation algorithms and visualization scripts. The MVA training and testing is performed with the use of user-supplied data sets in form of ROOT trees or text files, where each event can have an individual weight. The true event classification or target value (for regression problems) in these data sets must be known. Preselection requirements and transformations can be applied on this data. TMVA supports the use of variable combinations and formulas.


  1. Review TMVA documentation on one method in detail.
  2. Using a topic map, demonstrate supplementing that documentation with additional literature or examples.
  3. TMVA is not restricted to high energy physics but do you find citations of its use outside of high energy physics?


Sunday, December 26th, 2010

Waffles Author: Mike Gashler

From the website:

Waffles is a collection of command-line tools for performing machine learning tasks. These tools are divided into 4 script-friendly apps:

waffles_learn contains tools for supervised learning.
waffles_transform contains tools for manipulating data.
waffles_plot contains tools for visualizing data.
waffles_generate contains tools to generate certain types of data.

For people who prefer not to have to remember commands, waffles also includes a graphical tool which guides the user to generate a command that will perform the desired task.

While exploring the site I looked at the demo applications and:

At some point, it seems, almost every scholar has an idea for starting a new journal that operates in some a-typical manner. This demo is a framework for the back-end of an on-line journal, to help get you started.

The “…operates in some a-typical manner” was close enough to the truth that I just had to laugh out loud.

Care to nominate your favorite software project that “…operates in some a-typical manner?”

Update: Almost a year later I revisited the site to find:

Michael S. Gashler. Waffles: A machine learning toolkit. Journal of Machine Learning Research, MLOSS 12:2383-2387, July 2011. ISSN 1532-4435.


Graphical Models

Tuesday, December 21st, 2010

Graphical Models Author: Zoubin Ghahramani


An introduction to directed and undirected probabilistic graphical models, including inference (belief propagation and the junction tree algorithm), parameter learning and structure learning, variational approximations, and approximate inference.

  • Introduction to graphical models: (directed, undirected and factor graphs; conditional independence; d-separation; plate notation)
  • Inference and propagation algorithms: (belief propagation; factor graph propagation; forward-backward and Kalman smoothing; the junction tree algorithm)
  • Learning parameters and structure: (maximum likelihood and Bayesian parameter learning for complete and incomplete data; EM; Dirichlet distributions; score-based structure learning; Bayesian structural EM; brief comments on causality and on learning undirected models)
  • Approximate Inference: (Laplace approximation; BIC; variational Bayesian EM; variational message passing; VB for model selection)
  • Bayesian information retrieval using sets of items: (Bayesian Sets; Applications)
  • Foundations of Bayesian inference: (Cox Theorem; Dutch Book Theorem; Asymptotic consensus and certainty; choosing priors; limitations)

Start with this lecture before Dirichlet Processes: Tutorial and Practical Course
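Of the propagation algorithms in that list, forward-backward is the easiest to sketch. A minimal version for a discrete hidden Markov model, which is belief propagation on a chain (the two-state parameters in the usage example are invented for illustration):

```python
def forward_backward(init, trans, emit, obs):
    """Posterior state marginals for a discrete HMM: the simplest
    instance of belief propagation, run on a chain-structured graph."""
    n = len(init)
    # forward pass: alpha[t][s] = p(obs[0..t], state_t = s)
    alpha = [[init[s] * emit[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([emit[s][o] * sum(prev[r] * trans[r][s] for r in range(n))
                      for s in range(n)])
    # backward pass: beta[t][s] = p(obs[t+1..] | state_t = s)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans[s][r] * emit[r][o] * nxt[r] for r in range(n))
                        for s in range(n)])
    # combine the two messages and normalize at each position
    posteriors = []
    for a, b in zip(alpha, beta):
        w = [x * y for x, y in zip(a, b)]
        z = sum(w)
        posteriors.append([x / z for x in w])
    return posteriors

# usage: a sticky two-state chain where state 0 tends to emit symbol 0
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
marginals = forward_backward(init, trans, emit, [0, 0, 1])
```

The same message-passing pattern, generalized from chains to trees, is the junction tree algorithm mentioned above.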

Declared Instance Inferences (DI2)? (RDF, OWL, Semantic Web)

Friday, December 3rd, 2010

In recent discussions of identity, I have seen statements that OWL reasoners could infer that two or more representatives stood for the same subject.

That’s useful, but I wondered whether the inferencing overhead is necessary in all such cases.

If a user recognizes that a subject representative (a subject proxy in topic map terms) represents the same subject as another representative, a declarative statement avoids the need for artificial inferencing.

I am sure there are cases where inferencing is useful, particularly to suggest inferences to users, but declared inferences could reduce that need and the overhead.

Declarative information artifacts could be created that contain rules for known identifications.

For example, gene names found in PubMed. If two or more names are declared to refer to the same gene, where is the need for inferencing?

With such declarations in place, no reasoner has to “infer” anything about those names.
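A declaration table of this sort needs no reasoner at all: a union-find structure answers “same subject?” in effectively constant time. A sketch (the gene synonyms below are illustrative, not a real PubMed mapping):

```python
class DeclaredMerges:
    """Union-find over subject identifiers: a declaration that two names
    refer to the same subject replaces runtime inferencing."""
    def __init__(self):
        self.parent = {}

    def _find(self, name):
        self.parent.setdefault(name, name)
        while self.parent[name] != name:
            self.parent[name] = self.parent[self.parent[name]]  # path halving
            name = self.parent[name]
        return name

    def declare_same(self, a, b):
        """Record a declared identification between two names."""
        self.parent[self._find(a)] = self._find(b)

    def same_subject(self, a, b):
        """Lookup, not inference: just compare representatives."""
        return self._find(a) == self._find(b)

genes = DeclaredMerges()
genes.declare_same("TP53", "p53")
genes.declare_same("p53", "TRP53")
genes.same_subject("TP53", "TRP53")   # True -- by declaration, not reasoning
```

Transitivity (TP53 = p53 and p53 = TRP53, therefore TP53 = TRP53) falls out of the data structure; nothing has to be “inferred” at query time.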

Declared instance inferences (DI2) reduce semantic dissonance, inferencing overhead and uncertainty.

Looks like a win-win situation to me.

PS: It occurs to me that ontologies are also “declared instance inferences” upon which artificial reasoners rely. The instances happen to be classes and not individuals.


Friday, November 26th, 2010
Infer.NET


From the website:

Infer.NET is a framework for running Bayesian inference in graphical models. It can also be used for probabilistic programming as shown in this video.

You can use Infer.NET to solve many different kinds of machine learning problems, from standard problems like classification or clustering through to customised solutions to domain-specific problems. Infer.NET has been used in a wide variety of domains including information retrieval, bioinformatics, epidemiology, vision, and many others.

I should not have been surprised, but a “.NET” language is required to use Infer.NET.

I would appreciate comments from anyone who uses Infer.NET for inferencing to assist in the authoring of topic maps.