Archive for the ‘Probabilistic Programming’ Category

Probabilistic Programming and Bayesian Methods for Hackers

Saturday, March 30th, 2013

Probabilistic Programming and Bayesian Methods for Hackers

From the webpage:

Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author’s own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight-days of reading examples and trying to put the pieces together to understand the methods. There was simplely not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

DARPA (Logic and Probabilistic Programming) should be glad that someone else is working on probabilistic programming.

I first saw this at Nat Torkington’s Four short links: 29 March 2103.

Logic and Probabilistic Programming

Friday, March 29th, 2013

Programming Trends to Watch: Logic and Probabilistic Programming by Dean Wampler.

From the post:

I believe there are two other emerging trends in programming worth watching that will impact the data world.

Logic Programming, like FP, is actually not new at all, but it is seeing a resurgence of interest, especially in the Clojure community. Rules engines, like Drools, are an example category of logic programming that has been in use for a long time.

We’re on the verge of moving to the next level, probabilistic programming languages and systems that make it easier to build probabilistic models, where the modeling concepts are promoted to first-class primitives in new languages, with underlying runtimes that do the hard work of inferring answers, similar to the way that logic programming languages work already. The ultimate goal is to enable end users with limited programming skills, like domain experts, to build effective probabilistic models, without requiring the assistance of Ph.D.-level machine learning experts, much the way that SQL is widely used today.

DARPA, the research arm of the U.S. Department of Defense, considers this trend important enough that they are starting an initiative to promote it, called Probabilistic Programming for Advanced Machine Learning, which is also described in this Wired article.

Registration for the DARPA event (April 10, 2013) is closed but a video recording will be posted at: http://www.darpa.mil/Opportunities/Solicitations/I2O_Solicitations.aspx after April 10, 2013.

I suspect semantics are going to be at issue in any number of ways.

The ability to handle semantics robustly may be of value.

Probabilistic Programming

Friday, April 13th, 2012

Probabilistic Programming by Deniz Yuret.

From the post:

The probabilistic programming language Church brings together two of my favorite subjects: Scheme and Probability. I highly recommend this tutorial to graduate students interested in machine learning and statistical inference. The tutorial explains probabilistic inference through programming starting from simple generative models with biased coins and dice leading up to hierarchical, non-parametric, recursive and nested models. Even at the undergraduate level, I have long thought probability and statistics should be taught in an integrated manner instead of their current almost independent treatment. One roadblock is that even the simplest statistical inference (e.g. three tosses of a coin with an unknown (uniformly distributed) weight results in H, H, T; what is the fourth toss?) requires some calculus at the undergraduate level. Using a programming language like Church may allow an instructor to introduce basic concepts without students getting confused about the details of integration.

Good pointers on probabilistic programming resources. Enjoy!

Dr. Watson?

Wednesday, December 7th, 2011

I got up thinking that there needs to be a project for automated authoring of a topic map and the name, Dr. Watson suddenly occurred to me. After all, Dr. Watson was Sherlock Holmes’ sidekick so it would not be like saying it could stand on its own. Plus there would be some name recognition and/or confusion with the real Dr. Watson, or rather imaginary Dr. Watson of Sherlock Holmes’ fame.

And there would be confusion with the Dr. Watson that is the internal debugger for Windows (MS, I never can remember if the ™ goes on Windows or MS. Not that anyone else would want to call themselves MS. ;-) ) Plus the Watson research center at IBM.

Well, I suspect being an automated, probabilistic topic map authoring system will be enough to distinguish it from the foregoing uses.

Any others that I should be aware of?

I say probabilistic because even with the TMDM’s string matching on URIs, it is only probable that two or more topics actually represent the same topic. It is always possible that a URI has been incorrectly used to identity the subject that a topic represents. And in such cases, the error perpetuates itself across a topic map.

So we start off with the realization that even string matching results in a probability of less than 1.0 (where 1.0 is absolute certainty) that two or more topics represent the same subject.

Since we are already being probabilistic, why not be openly so?

But, before we get into the weeds and details, the project has to have a cool name. (As in not an acronym that is cool and we make up a long name to fit the acronym.)

All those in favor of Dr. Watson, please signify by raising your hands (or the beer you are holding).

More to follow.

MayBMS – A Probabilistic Database Management System

Thursday, June 30th, 2011

MayBMS – A Probabilistic Database Management System

From the homepage:

MayBMS is a state-of-the-art probabilistic database management system developed as an extension of the Postgres server backend (download).

The MayBMS project is founded on the thesis that a principled effort to use and extend mature relational database technology will be essential for creating robust and scalable systems for managing and querying large uncertain datasets.

MayBMS stands alone as a complete probabilistic database management system that supports a very powerful, compositional query language (examples) for which nevertheless worst-case efficiency and result quality guarantees can be made. Central to this is our choice of essentially using probabilistic versions of conditional tables as the representation system, but in a form engineered for admitting the efficient evaluation and automatic optimization of most operations of our language using robust and mature relational database technology.

Another probabilistic system.

I wonder about the consistency leg of CAP as a database principle. Is is a database principle only because we have had such locally located and small data sets that consistency was possible?

Think about any of the sensor arrays and memory banks located light seconds or even minutes away from data stores on Earth. As a practical matter they are always inconsistent with Earth bound data stores. Physical remoteness is the cause of inconsistency in that case. But what of something as simple as not all data having first priority for processing? Or varying priorities for processing depending upon system load? Or even analysis or processing of data that causes a lag between the states of data at different locations?

I’m not suggesting the usual cop-out of eventual consistency because the data may never be consistent. At least in the sense that we use the term for a database located on a single machine or local cluster. We may have to ask, “How consistent do you want the data to be upon delivery?,” knowing the data may be inconsistent on delivery with other data already in existence.

Scalable Query Processing in Probabilistic Databases

Monday, June 27th, 2011

Scalable Query Processing in Probabilistic Databases

From the webpage:

Today, uncertainty is commonplace in data management scenarios dealing with data integration, sensor readings, information extraction from unstructured sources, and whenever information is manually entered and therefore prone to inaccuracy or partiality. Key challenges in probabilistic data management are to design probabilistic database formalisms that can compactly represent large sets of possible interpretations of uncertain data together with their probability distributions, and to efficiently evaluate queries on very large probabilistic data. Such queries could ask for confidences in data patterns possibly in the presence of additional evidence. The problem of query evaluation in probabilistic databases is still in its infancy. Little is known about which queries can be evaluated in polynomial time, and the few existing evaluation methods employ expensive main-memory algorithms.

The aim of this project is to develop techniques for scalable query processing in probabilistic databases and use them to build a robust query engine called SPROUT ( Scalable PROcessing on Tables). We are currently exploring three main research directions.


  • We are investigating open problems in efficient query evaluation. In particular, we aim at discovering classes of tractable (i.e., computable in polynomial time wrt data complexity) queries on probabilistic databases. The query language under investigation is SQL (and its formal core, relational algebra) extended with uncertainty-aware query constructs to create probabilistic data under various probabilistic data models (such as tuple-independent databases, block-independent disjoint databases, or U-relations of MayBMS).
  • For the case of intractable queries, we investigate approximate query evaluation. In contrast to exact evaluation, which computes query answers together with their exact confidences, approximate evaluation computes the query answers with approximate confidences. We are working on new techniques for approximate query evaluation that are aware of the query and the input probabilistic database model (tuple-independent, block-independent disjoint, etc).
  • Our open-source query engine for probabilistic data management systems uses the insights gained from the first two directions. This engine is based on efficient secondary-storage exact and approximate evaluation algorithms for arbitrary queries.

As of June 2, 2011, order Probabilistic Databases by Dan Suciu, Dan Olteanu, Christopher Re, and Christoph Kock from Amazon.

Exciting work!

It occurs to me that semantics are always “probabilistic.”

What does that say about the origin of the semantics of a term?

If semantics are probabilistic, is it ever possible to fix the semantic of a term?

If so, how?

Comparative Study of Probabilistic Logic Languages and Systems

Thursday, April 7th, 2011

Comparative Study of Probabilistic Logic Languages and Systems

This was mentioned in a response to Chris Diehl’s post in his series.

Good source of software/information.

(I checked, all the links work. That is something these days.)

factorie: Probabilistic programming with imperatively-defined factor graphs

Friday, March 11th, 2011

factorie: Probabilistic programming with imperatively-defined factor graphs

The website says factorie has been applied to:

FACTORIE has been successfully applied to various tasks in natural language processing and information integration, including

  • named entity recognition
  • entity resolution
  • relation extraction
  • parsing
  • schema matching
  • ontology alignment
  • latent-variable generative models, including latent Dirichlet allocation.

Sound like topic map tasks to me!

Currently at version 0.90 but the website indicates the the project is planning on a 1.0 release in early 2011.

Just so you know what you are looking forward to:

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference. Key features:

  • It is object-oriented, enabling encapsulation, abstraction and inheritance in the definition of random variables, factors, inference and learning methods.
  • It is scalable, with demonstrated success on problems with many millions of variables and factors, and on models that have changing structure, such as case factor diagrams. It has also been plugged into a database back-end, representing a new approach to probabilistic databases capable of handling billions of variables.
  • It is flexible, supporting multiple modeling and inference paradigms. Its original emphasis was on conditional random fields, undirected graphical models, MCMC inference, online training, and discriminative parameter estimation. However, it now also supports directed generative models (such as latent Dirichlet allocation), and has preliminary support for variational inference, including belief propagation and mean-field methods.
  • It is embedded into a general purpose programming language, providing model authors with familiar and extensive resources for implementing the procedural aspects of their solution, including the ability to beneficially mix data pre-processing, diagnostics, evaluation, and other book-keeping code in the same files as the probabilistic model specification.
  • It allows the use of imperative (procedural) constructs to define the factor graph—an unusual and powerful facet that enables significant efficiencies and also supports the injection of both declarative and procedural domain knowledge into model design.

The structure of generative models can be expressed as a program that describes the generative storyline. The structure undirected graphical models can be specified in an entity-relationship language, in which the factor templates are expressed as compatibility functions on arbitrary entity-relationship expressions; alternatively, factor templates may also be specified as formulas in first-order logic. However, most generally, data can be stored in arbitrary data structures (much as one would in deterministic programming), and the connectivity patterns of factor templates can be specified in a Turing-complete imperative style. This usage of imperative programming to define various aspects of factor graph construction and operation is an innovation originated in FACTORIE; we term this approach imperatively-defined factor graphs. The above three methods for specifying relational factor graph structure can be mixed in the same model.