Archive for the ‘Probabilistic Programming’ Category

None/Some/All … Are Suicide Bombers & Probabilistic Programming Languages

Tuesday, November 8th, 2016

The Design and Implementation of Probabilistic Programming Languages by Noah D. Goodman and Andreas Stuhlmüller.

Abstract:

Probabilistic programming languages (PPLs) unify techniques for the formal description of computation and for the representation and use of uncertain knowledge. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science, and natural languages communities. This book explains how to implement PPLs by lightweight embedding into a host language. We illustrate this by designing and implementing WebPPL, a small PPL embedded in Javascript. We show how to implement several algorithms for universal probabilistic inference, including priority-based enumeration with caching, particle filtering, and Markov chain Monte Carlo. We use program transformations to expose the information required by these algorithms, including continuations and stack addresses. We illustrate these ideas with examples drawn from semantic parsing, natural language pragmatics, and procedural graphics.

If you want to sharpen the discussion of probabilistic programming languages, substitute in the pragmatics example:

‘none/some/all of the children are suicide bombers’,

The substitution raises the issue of how “certainty” can/should vary depending upon the gravity of results.

Who is a nice person?, has low stakes.

Who is a suicide bomber?, has high stakes.

Breaking the Similarity Bottleneck

Saturday, May 9th, 2015

Ultra-Fast Data-Mining Hardware Architecture Based on Stochastic Computing by Antoni Morro, et al.

Abstract:

Minimal hardware implementations able to cope with the processing of large amounts of data in reasonable times are highly desired in our information-driven society. In this work we review the application of stochastic computing to probabilistic-based pattern-recognition analysis of huge database sets. The proposed technique consists in the hardware implementation of a parallel architecture implementing a similarity search of data with respect to different pre-stored categories. We design pulse-based stochastic-logic blocks to obtain an efficient pattern recognition system. The proposed architecture speeds up the screening process of huge databases by a factor of 7 when compared to a conventional digital implementation using the same hardware area.

I haven’t included the hyperlinks, but:


In this work we present a highly efficient methodology for data mining based on probabilistic processing. High dimensional data is inherently complex in clustering, classification and similarity search [15]. The proposed approach is evaluated showing its application to a similarity search over a huge database. Most data mining algorithms use similarity search as a subroutine core [16–18], and thus the time taken for this task is the bottleneck of virtually all data mining algorithms [19]. Similarity search plays a fundamental role in many data mining and machine learning problems, e.g. text categorization [20], collaborative filtering [21], time-series analysis [22,23], protein sequencing [24] or any application-specific task as petroglyphs comparison [25]. At the same time, the mining of huge datasets implies the use of large computer clusters [26,27]. The proposed approach based on the use of probabilistic processing shows large improvements in terms of hardware resources when compared with conventional solutions.

Sorry they omitted topic maps but what is a merging criteria if it isn’t a type of “similarity?”

From the conclusion:


This implementation uses less hardware resources than conventional digital methodologies (based on binary and not probabilistic logic) and is able to process the order of 13GBytes of information per second (in contrast to the estimated 2GBytes/s of speed that could be achieved by the conventional implementation using the same hardware area). With the 12-dimensional space used to allocate each vector in the example shown in this paper we obtain the order of 1 billion of comparisons per second. A patent application has been done for this new mining methodology [32].

The patent was filed in Spanish but English and French auto-translations are available.

Hopefully the patent will be used in such a way as to promote widespread implementation of this technique.

I could stand 1 billion comparisons a second, quite easily. Interactive development of merging algorithms anyone?

I first saw this in a tweet by Stefano Bertolo.

The State of Probabilistic Programming

Sunday, April 12th, 2015

The State of Probabilistic Programming by Mohammed AlQuraishi.

From the post:

For two weeks last July, I cocooned myself in a hotel in Portland, OR, living and breathing probabilistic programming as a “student” in the probabilistic programming summer school run by DARPA. The school is part of the broader DARPA program on Probabilistic Programming for Advanced Machine Learning (PPAML), which has resulted in a great infusion of energy (and funding) into the probabilistic programming space. Last year was the inaugural one for the summer school, one that is meant to introduce and disseminate the languages and tools being developed to the broader scientific and technology communities. The school was graciously hosted by Galois Inc., which did a terrific job of organizing the event. Thankfully, they’re hosting the summer school again this year (there’s still time to apply!), which made me think that now is a good time to reflect on last year’s program and provide a snapshot of the state of the field. I will also take some liberty in prognosticating on the future of this space. Note that I am by no means a probabilistic programming expert, merely a curious outsider with a problem or two to solve.

A very engaging introduction to probabilistic programming and current work in the field.

It has the added advantage of addressing a subject of interest to a large investor with lots of cash (DARPA). You may have heard of another of their projects, ARPANET, predecessor to the Internet.

12 Free (as in beer) Data Mining Books

Sunday, May 18th, 2014

12 Free (as in beer) Data Mining Books by Chris Leonard.

While all of these volumes could be shelved under “data mining” in a bookstore, I would break them out into smaller categories:

  • Bayesian Analysis/Methods
  • Data Mining
  • Data Science
  • Machine Learning
  • R
  • Statistical Learning

Didn’t want you to skip over Chris’ post because it was “just about data mining.” 😉

Check your hard drive to see what you are missing.

I first saw this in a tweet by Carl Anderson.

Augur:…

Tuesday, December 31st, 2013

Augur: a Modeling Language for Data-Parallel Probabilistic Inference by Jean-Baptiste Tristan, et.al.

Abstract:

It is time-consuming and error-prone to implement inference procedures for each new probabilistic model. Probabilistic programming addresses this problem by allowing a user to specify the model and having a compiler automatically generate an inference procedure for it. For this approach to be practical, it is important to generate inference code that has reasonable performance. In this paper, we present a probabilistic programming language and compiler for Bayesian networks designed to make effective use of data-parallel architectures such as GPUs. Our language is fully integrated within the Scala programming language and benefits from tools such as IDE support, type-checking, and code completion. We show that the compiler can generate data-parallel inference code scalable to thousands of GPU cores by making use of the conditional independence relationships in the Bayesian network.

A very good paper but the authors should highlight the caveat in the introduction:

We claim that many MCMC inference algorithms are highly data-parallel (Hillis & Steele, 1986; Blelloch, 1996) if we take advantage of the conditional independence relationships of the input model (e.g. the assumption of i.i.d. data makes the likelihood independent across data points).

(Where i.i.d. = Independent and identically distributed random variables.)

That assumption does allow for parallel processing, but users should be cautious about accepting assumptions about data.

The algorithms will still work, even if your assumptions about the data are incorrect.

But the answer you get may not be as useful as you would like.

I first saw this in a tweet by Stefano Bertolo.

Bayesian Methods for Hackers

Thursday, July 11th, 2013

Bayesian Methods for Hackers by a community of authors!

From the readme:

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author’s own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight-days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

If Bayesian inference is the destination, then mathematical analysis is a particular path to towards it. On the other hand, computing power is cheap enough that we can afford to take an alternate route via probabilistic programming. The latter path is much more useful, as it denies the necessity of mathematical intervention at each step, that is, we remove often-intractable mathematical analysis as a prerequisite to Bayesian inference. Simply put, this latter computational path proceeds via small intermediate jumps from beginning to end, where as the first path proceeds by enormous leaps, often landing far away from our target. Furthermore, without a strong mathematical background, the analysis required by the first path cannot even take place.

Bayesian Methods for Hackers is designed as a introduction to Bayesian inference from a computational/understanding-first, and mathematics-second, point of view. Of course as an introductory book, we can only leave it at that: an introductory book. For the mathematically trained, they may cure the curiosity this text generates with other texts designed with mathematical analysis in mind. For the enthusiast with less mathematical-background, or one who is not interested in the mathematics but simply the practice of Bayesian methods, this text should be sufficient and entertaining.

(…)

Useful in case all the knowledge you want to put in a topic map is far from certain. 😉

Probabilistic Programming and Bayesian Methods for Hackers

Saturday, March 30th, 2013

Probabilistic Programming and Bayesian Methods for Hackers

From the webpage:

Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author’s own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight-days of reading examples and trying to put the pieces together to understand the methods. There was simplely not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.

DARPA (Logic and Probabilistic Programming) should be glad that someone else is working on probabilistic programming.

I first saw this at Nat Torkington’s Four short links: 29 March 2103.

Logic and Probabilistic Programming

Friday, March 29th, 2013

Programming Trends to Watch: Logic and Probabilistic Programming by Dean Wampler.

From the post:

I believe there are two other emerging trends in programming worth watching that will impact the data world.

Logic Programming, like FP, is actually not new at all, but it is seeing a resurgence of interest, especially in the Clojure community. Rules engines, like Drools, are an example category of logic programming that has been in use for a long time.

We’re on the verge of moving to the next level, probabilistic programming languages and systems that make it easier to build probabilistic models, where the modeling concepts are promoted to first-class primitives in new languages, with underlying runtimes that do the hard work of inferring answers, similar to the way that logic programming languages work already. The ultimate goal is to enable end users with limited programming skills, like domain experts, to build effective probabilistic models, without requiring the assistance of Ph.D.-level machine learning experts, much the way that SQL is widely used today.

DARPA, the research arm of the U.S. Department of Defense, considers this trend important enough that they are starting an initiative to promote it, called Probabilistic Programming for Advanced Machine Learning, which is also described in this Wired article.

Registration for the DARPA event (April 10, 2013) is closed but a video recording will be posted at: http://www.darpa.mil/Opportunities/Solicitations/I2O_Solicitations.aspx after April 10, 2013.

I suspect semantics are going to be at issue in any number of ways.

The ability to handle semantics robustly may be of value.

Probabilistic Programming

Friday, April 13th, 2012

Probabilistic Programming by Deniz Yuret.

From the post:

The probabilistic programming language Church brings together two of my favorite subjects: Scheme and Probability. I highly recommend this tutorial to graduate students interested in machine learning and statistical inference. The tutorial explains probabilistic inference through programming starting from simple generative models with biased coins and dice leading up to hierarchical, non-parametric, recursive and nested models. Even at the undergraduate level, I have long thought probability and statistics should be taught in an integrated manner instead of their current almost independent treatment. One roadblock is that even the simplest statistical inference (e.g. three tosses of a coin with an unknown (uniformly distributed) weight results in H, H, T; what is the fourth toss?) requires some calculus at the undergraduate level. Using a programming language like Church may allow an instructor to introduce basic concepts without students getting confused about the details of integration.

Good pointers on probabilistic programming resources. Enjoy!

Dr. Watson?

Wednesday, December 7th, 2011

I got up thinking that there needs to be a project for automated authoring of a topic map and the name, Dr. Watson suddenly occurred to me. After all, Dr. Watson was Sherlock Holmes’ sidekick so it would not be like saying it could stand on its own. Plus there would be some name recognition and/or confusion with the real Dr. Watson, or rather imaginary Dr. Watson of Sherlock Holmes’ fame.

And there would be confusion with the Dr. Watson that is the internal debugger for Windows (MS, I never can remember if the ™ goes on Windows or MS. Not that anyone else would want to call themselves MS. 😉 ) Plus the Watson research center at IBM.

Well, I suspect being an automated, probabilistic topic map authoring system will be enough to distinguish it from the foregoing uses.

Any others that I should be aware of?

I say probabilistic because even with the TMDM’s string matching on URIs, it is only probable that two or more topics actually represent the same topic. It is always possible that a URI has been incorrectly used to identity the subject that a topic represents. And in such cases, the error perpetuates itself across a topic map.

So we start off with the realization that even string matching results in a probability of less than 1.0 (where 1.0 is absolute certainty) that two or more topics represent the same subject.

Since we are already being probabilistic, why not be openly so?

But, before we get into the weeds and details, the project has to have a cool name. (As in not an acronym that is cool and we make up a long name to fit the acronym.)

All those in favor of Dr. Watson, please signify by raising your hands (or the beer you are holding).

More to follow.

MayBMS – A Probabilistic Database Management System

Thursday, June 30th, 2011

MayBMS – A Probabilistic Database Management System

From the homepage:

MayBMS is a state-of-the-art probabilistic database management system developed as an extension of the Postgres server backend (download).

The MayBMS project is founded on the thesis that a principled effort to use and extend mature relational database technology will be essential for creating robust and scalable systems for managing and querying large uncertain datasets.

MayBMS stands alone as a complete probabilistic database management system that supports a very powerful, compositional query language (examples) for which nevertheless worst-case efficiency and result quality guarantees can be made. Central to this is our choice of essentially using probabilistic versions of conditional tables as the representation system, but in a form engineered for admitting the efficient evaluation and automatic optimization of most operations of our language using robust and mature relational database technology.

Another probabilistic system.

I wonder about the consistency leg of CAP as a database principle. Is is a database principle only because we have had such locally located and small data sets that consistency was possible?

Think about any of the sensor arrays and memory banks located light seconds or even minutes away from data stores on Earth. As a practical matter they are always inconsistent with Earth bound data stores. Physical remoteness is the cause of inconsistency in that case. But what of something as simple as not all data having first priority for processing? Or varying priorities for processing depending upon system load? Or even analysis or processing of data that causes a lag between the states of data at different locations?

I’m not suggesting the usual cop-out of eventual consistency because the data may never be consistent. At least in the sense that we use the term for a database located on a single machine or local cluster. We may have to ask, “How consistent do you want the data to be upon delivery?,” knowing the data may be inconsistent on delivery with other data already in existence.

Scalable Query Processing in Probabilistic Databases

Monday, June 27th, 2011

Scalable Query Processing in Probabilistic Databases

From the webpage:

Today, uncertainty is commonplace in data management scenarios dealing with data integration, sensor readings, information extraction from unstructured sources, and whenever information is manually entered and therefore prone to inaccuracy or partiality. Key challenges in probabilistic data management are to design probabilistic database formalisms that can compactly represent large sets of possible interpretations of uncertain data together with their probability distributions, and to efficiently evaluate queries on very large probabilistic data. Such queries could ask for confidences in data patterns possibly in the presence of additional evidence. The problem of query evaluation in probabilistic databases is still in its infancy. Little is known about which queries can be evaluated in polynomial time, and the few existing evaluation methods employ expensive main-memory algorithms.

The aim of this project is to develop techniques for scalable query processing in probabilistic databases and use them to build a robust query engine called SPROUT ( Scalable PROcessing on Tables). We are currently exploring three main research directions.


  • We are investigating open problems in efficient query evaluation. In particular, we aim at discovering classes of tractable (i.e., computable in polynomial time wrt data complexity) queries on probabilistic databases. The query language under investigation is SQL (and its formal core, relational algebra) extended with uncertainty-aware query constructs to create probabilistic data under various probabilistic data models (such as tuple-independent databases, block-independent disjoint databases, or U-relations of MayBMS).
  • For the case of intractable queries, we investigate approximate query evaluation. In contrast to exact evaluation, which computes query answers together with their exact confidences, approximate evaluation computes the query answers with approximate confidences. We are working on new techniques for approximate query evaluation that are aware of the query and the input probabilistic database model (tuple-independent, block-independent disjoint, etc).
  • Our open-source query engine for probabilistic data management systems uses the insights gained from the first two directions. This engine is based on efficient secondary-storage exact and approximate evaluation algorithms for arbitrary queries.

As of June 2, 2011, order Probabilistic Databases by Dan Suciu, Dan Olteanu, Christopher Re, and Christoph Kock from Amazon.

Exciting work!

It occurs to me that semantics are always “probabilistic.”

What does that say about the origin of the semantics of a term?

If semantics are probabilistic, is it ever possible to fix the semantic of a term?

If so, how?

Comparative Study of Probabilistic Logic Languages and Systems

Thursday, April 7th, 2011

Comparative Study of Probabilistic Logic Languages and Systems

This was mentioned in a response to Chris Diehl’s post in his series.

Good source of software/information.

(I checked, all the links work. That is something these days.)

factorie: Probabilistic programming with imperatively-defined factor graphs

Friday, March 11th, 2011

factorie: Probabilistic programming with imperatively-defined factor graphs

The website says factorie has been applied to:

FACTORIE has been successfully applied to various tasks in natural language processing and information integration, including

  • named entity recognition
  • entity resolution
  • relation extraction
  • parsing
  • schema matching
  • ontology alignment
  • latent-variable generative models, including latent Dirichlet allocation.

Sound like topic map tasks to me!

Currently at version 0.90 but the website indicates the the project is planning on a 1.0 release in early 2011.

Just so you know what you are looking forward to:

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference. Key features:

  • It is object-oriented, enabling encapsulation, abstraction and inheritance in the definition of random variables, factors, inference and learning methods.
  • It is scalable, with demonstrated success on problems with many millions of variables and factors, and on models that have changing structure, such as case factor diagrams. It has also been plugged into a database back-end, representing a new approach to probabilistic databases capable of handling billions of variables.
  • It is flexible, supporting multiple modeling and inference paradigms. Its original emphasis was on conditional random fields, undirected graphical models, MCMC inference, online training, and discriminative parameter estimation. However, it now also supports directed generative models (such as latent Dirichlet allocation), and has preliminary support for variational inference, including belief propagation and mean-field methods.
  • It is embedded into a general purpose programming language, providing model authors with familiar and extensive resources for implementing the procedural aspects of their solution, including the ability to beneficially mix data pre-processing, diagnostics, evaluation, and other book-keeping code in the same files as the probabilistic model specification.
  • It allows the use of imperative (procedural) constructs to define the factor graph—an unusual and powerful facet that enables significant efficiencies and also supports the injection of both declarative and procedural domain knowledge into model design.

The structure of generative models can be expressed as a program that describes the generative storyline. The structure undirected graphical models can be specified in an entity-relationship language, in which the factor templates are expressed as compatibility functions on arbitrary entity-relationship expressions; alternatively, factor templates may also be specified as formulas in first-order logic. However, most generally, data can be stored in arbitrary data structures (much as one would in deterministic programming), and the connectivity patterns of factor templates can be specified in a Turing-complete imperative style. This usage of imperative programming to define various aspects of factor graph construction and operation is an innovation originated in FACTORIE; we term this approach imperatively-defined factor graphs. The above three methods for specifying relational factor graph structure can be mixed in the same model.