Archive for the ‘Probalistic Models’ Category

None/Some/All … Are Suicide Bombers & Probabilistic Programming Languages

Tuesday, November 8th, 2016

The Design and Implementation of Probabilistic Programming Languages by Noah D. Goodman and Andreas Stuhlmüller.


Probabilistic programming languages (PPLs) unify techniques for the formal description of computation and for the representation and use of uncertain knowledge. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science, and natural languages communities. This book explains how to implement PPLs by lightweight embedding into a host language. We illustrate this by designing and implementing WebPPL, a small PPL embedded in Javascript. We show how to implement several algorithms for universal probabilistic inference, including priority-based enumeration with caching, particle filtering, and Markov chain Monte Carlo. We use program transformations to expose the information required by these algorithms, including continuations and stack addresses. We illustrate these ideas with examples drawn from semantic parsing, natural language pragmatics, and procedural graphics.

If you want to sharpen the discussion of probabilistic programming languages, substitute in the pragmatics example:

‘none/some/all of the children are suicide bombers’,

The substitution raises the issue of how “certainty” can/should vary depending upon the gravity of results.

Who is a nice person?, has low stakes.

Who is a suicide bomber?, has high stakes.

The State of Probabilistic Programming

Sunday, April 12th, 2015

The State of Probabilistic Programming by Mohammed AlQuraishi.

From the post:

For two weeks last July, I cocooned myself in a hotel in Portland, OR, living and breathing probabilistic programming as a “student” in the probabilistic programming summer school run by DARPA. The school is part of the broader DARPA program on Probabilistic Programming for Advanced Machine Learning (PPAML), which has resulted in a great infusion of energy (and funding) into the probabilistic programming space. Last year was the inaugural one for the summer school, one that is meant to introduce and disseminate the languages and tools being developed to the broader scientific and technology communities. The school was graciously hosted by Galois Inc., which did a terrific job of organizing the event. Thankfully, they’re hosting the summer school again this year (there’s still time to apply!), which made me think that now is a good time to reflect on last year’s program and provide a snapshot of the state of the field. I will also take some liberty in prognosticating on the future of this space. Note that I am by no means a probabilistic programming expert, merely a curious outsider with a problem or two to solve.

A very engaging introduction to probabilistic programming and current work in the field.

It has the added advantage of addressing a subject of interest to a large investor with lots of cash (DARPA). You may have heard of another of their projects, ARPANET, predecessor to the Internet.

Open Sourcing Duckling, our probabilistic (date) parser [Clojure]

Friday, October 3rd, 2014

Open Sourcing Duckling, our probabilistic (date) parser

From the post:

We’ve previously discussed ambiguity in natural language. What’s really fascinating is that even the simplest, seemingly most structured parts of natural language, like the way we humans describe dates and times, are actually so difficult to turn into structured data.

The wild world of temporal expressions in human language

All the following expressions describe the same point in time (at least in some contexts):

  • “December 30th, at 3 in the afternoon”
  • “The day before New Year’s Eve at 3pm”
  • “At 1500 three weeks from now”
  • “The last Tuesday of December at 3pm”

But wait… is it really equivalent to say 3pm and 1500? In the latter case, it seems that speaker meant to be more precise. Is it OK to drop this information?

And what about “next Tuesday”? If today is Monday, is that tomorrow or in 8 days? When I say “last month”, is it the last full month or the last 30 days?

A last example: “one month” looks like a well defined duration. That is, until you try to normalize durations in seconds, and you realize different months have anywhere between 28 and 31 days! Even “one day” is difficult. Yes, a day can last between 23 and 25 hours, because of daylight savings. Oh, and did I mention that at midnight at the end of 1927 in Shanghai, the clocks went back 5 minutes and 52 seconds? So “1927-12-31 23:54:08” actually happened twice there.

There are hundreds of hard things like these, and the more you dig into this, believe me, the more you’ll encounter. But that’s out of the scope of this post.

An introduction to the vagaries of date statements in natural language, a probabilistic (date) parser in Clojure, and an opportunity to extend said parser to other data types.

Nice way to end the week!

An Introduction to Graphical Models

Sunday, August 31st, 2014

An Introduction to Graphical Models by Michael I. Jordan.

A bit dated (1997), slides, although “wordy” ones, that introduce you to graphical models.

Makes a nice outline to check your knowledge of graphical models.

I first saw this in a tweet by Data Tau.

Probabilistic Topic Maps?

Tuesday, August 26th, 2014

Probabilistic Soft Logic

From the webpage:

Probabilistic soft logic (PSL) is a modeling language (with accompanying implementation) for learning and predicting in relational domains. Such tasks occur in many areas such as natural language processing, social-network analysis, computer vision, and machine learning in general.

PSL allows users to describe their problems in an intuitive, logic-like language and then apply their models to data.


  • PSL models are templates for hinge-loss Markov random fields (HL-MRFs), a powerful class of probabilistic graphical models.
  • HL-MRFs are extremely scalable models because they are log-concave densities over continuous variables that can be optimized using the alternating direction method of multipliers.
  • See the publications page for more technical information and applications.

This homepage lists three introductory videos and has a set of slides on PSL.

Under entity resolution, the slides illustrate rules that govern the “evidence” that two entities represent the same person. You will also find link prediction, mapping of different ontologies, discussion of mapreduce implementations and other materials in the slides.

Probabilistic rules could be included in a TMDM instance but I don’t know of any topic map software that supports probabilistic merging. Would be a nice authoring feature to have.

The source code is on GitHub if you want to take a closer look.

Introduction to Information Retrieval

Wednesday, November 6th, 2013

Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze.

A bit dated now (2008) but the underlying principles of information retrieval remain the same.

I have a hard copy but the additional materials and ability to cut-n-paste will make this a welcome resource!

We’d be pleased to get feedback about how this book works out as a textbook, what is missing, or covered in too much detail, or what is simply wrong. Please send any feedback or comments to: informationretrieval (at) yahoogroups (dot) com

Online resources

Apart from small differences (mainly concerning copy editing and figures), the online editions should have the same content as the print edition.

The following materials are available online. The date of last update is given in parentheses.

Information retrieval resources

A list of information retrieval resources is also available.

Introduction to Information Retrieval: Table of Contents

Front matter (incl. table of notations) pdf

01   Boolean retrieval pdf html

02 The term vocabulary & postings lists pdf html

03 Dictionaries and tolerant retrieval pdf html

04 Index construction pdf html

05 Index compression pdf html

06 Scoring, term weighting & the vector space model pdf html

07 Computing scores in a complete search system pdf html

08 Evaluation in information retrieval pdf html

09 Relevance feedback & query expansion pdf html

10 XML retrieval pdf html

11 Probabilistic information retrieval pdf html

12 Language models for information retrieval pdf html

13 Text classification & Naive Bayes pdf html

14 Vector space classification pdf html

15 Support vector machines & machine learning on documents pdf html

16 Flat clustering pdf html Resources.

17 Hierarchical clustering pdf html

18 Matrix decompositions & latent semantic indexing pdf html

19 Web search basics pdf html

20 Web crawling and indexes pdf html

21 Link analysis pdf html

Bibliography & Index pdf

bibtex file bib

Inferring Social Rank in…

Wednesday, March 13th, 2013

Inferring Social Rank in an Old Assyrian Trade Network by David Bamman, Adam Anderson, Noah A. Smith.


We present work in jointly inferring the unique individuals as well as their social rank within a collection of letters from an Old Assyrian trade colony in K\”ultepe, Turkey, settled by merchants from the ancient city of Assur for approximately 200 years between 1950-1750 BCE, the height of the Middle Bronze Age. Using a probabilistic latent-variable model, we leverage pairwise social differences between names in cuneiform tablets to infer a single underlying social order that best explains the data we observe. Evaluating our output with published judgments by domain experts suggests that our method may be used for building informed hypotheses that are driven by data, and that may offer promising avenues for directed research by Assyriologists.

An example of how digitization of ancient texts enables research other than text searching.

Inferring identity and social rank may be instructive for creation of topic maps from both ancient and modern data sources.

I first saw this in a tweet by Stefano Bertolo.

Dr. Watson?

Wednesday, December 7th, 2011

I got up thinking that there needs to be a project for automated authoring of a topic map and the name, Dr. Watson suddenly occurred to me. After all, Dr. Watson was Sherlock Holmes’ sidekick so it would not be like saying it could stand on its own. Plus there would be some name recognition and/or confusion with the real Dr. Watson, or rather imaginary Dr. Watson of Sherlock Holmes’ fame.

And there would be confusion with the Dr. Watson that is the internal debugger for Windows (MS, I never can remember if the ™ goes on Windows or MS. Not that anyone else would want to call themselves MS. 😉 ) Plus the Watson research center at IBM.

Well, I suspect being an automated, probabilistic topic map authoring system will be enough to distinguish it from the foregoing uses.

Any others that I should be aware of?

I say probabilistic because even with the TMDM’s string matching on URIs, it is only probable that two or more topics actually represent the same topic. It is always possible that a URI has been incorrectly used to identity the subject that a topic represents. And in such cases, the error perpetuates itself across a topic map.

So we start off with the realization that even string matching results in a probability of less than 1.0 (where 1.0 is absolute certainty) that two or more topics represent the same subject.

Since we are already being probabilistic, why not be openly so?

But, before we get into the weeds and details, the project has to have a cool name. (As in not an acronym that is cool and we make up a long name to fit the acronym.)

All those in favor of Dr. Watson, please signify by raising your hands (or the beer you are holding).

More to follow.

Apache Flume incubation wiki

Sunday, October 2nd, 2011

Apache Flume incubation wiki

From the website:

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop’s HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.

A number of resources for Flume.

Will “data flows” as the dominant means of accessing data be a consequence of an environment where a “local copy” of data is no longer meaningful or an enabler of such an environment? Or both?

I think topic maps would do well to develop models for streaming and perhaps probabilistic merging or even probabilistic creation of topics/associations from data streams. Static structures only give the appearance of certainty.

SIGKDD 2011 Conference

Tuesday, September 6th, 2011

A pair of posts from Ryan Rosario on the SIGKDD 2011 Conference.

Day 1 (Graph Mining and David Blei/Topic Models)

Tough sledding on Probabilistic Topic Models but definitely worth the effort to follow.

Days 2/3/4 Summary

Useful summaries and pointers to many additional resources.

If you attended SIGKDD 2011, do you have pointers to other reviews of the conference or other resources?

I added a category for SIGKDD.

Probability Primer – Mathematicalmonk

Monday, August 8th, 2011

Probability Primer – Mathematicalmonk

From the description:

A series of videos giving an introduction to the definitions, notation, and basic concepts one would encounter in a 1st year graduate probability course.

More math videos from the Mathematicalmonk.

I don’t think a presentation style can be “copied” but surely there are lessons we can take from it to apply in other venues.


Machine Learning and Probabilistic Graphical Models

Thursday, May 12th, 2011

Machine Learning and Probabilistic Graphical Models

From the website:

Instructor: Sargur Srihari Department of Computer Science and Engineering, University at Buffalo

Machine learning is an exciting topic about designing machines that can learn from examples. The course covers the necessary theory, principles and algorithms for machine learning. The methods are based on statistics and probability– which have now become essential to designing systems exhibiting artificial intelligence. The course emphasisizes Bayesian techniques and probabilistic graphical models (PGMs). The material is complementary to a course on Data Mining where statistical concepts are used to analyze data for human, rather than machine, use.

The textbooks for different parts of the course are “Pattern Recognition and Machine Learning” by Chris Bishop (Springer 2006) and “Probabilistic Graphical Models” by Daphne Koller and Nir Friedman (MIT Press 2009).

Lecture slides and some videos of lectures.

Probabilistic Models in the Study of Language

Monday, April 11th, 2011

Probabilistic Models in the Study of Language

From the website:

I’m in the process of writing a textbook on the topic of using probabilistic models in scientific work on language ranging from experimental data analysis to corpus work to cognitive modeling. A current (partial) draft is available here. The intended audience is graduate students in linguistics, psychology, cognitive science, and computer science who are interested in using probabilistic models to study language. Feedback (both comments on existing drafts, and expressed desires for additional material to include!) is more than welcome — send it to

Just scanning the chapters titles this looks like a useful work for anyone concerned with language issues.

Comparative Study of Probabilistic Logic Languages and Systems

Thursday, April 7th, 2011

Comparative Study of Probabilistic Logic Languages and Systems

This was mentioned in a response to Chris Diehl’s post in his series.

Good source of software/information.

(I checked, all the links work. That is something these days.)

Toward Topic Search on the Web – Paper

Tuesday, March 8th, 2011

Toward Topic Search on the Web was reported by

Report from Microsoft researchers on “…framework that improves web search experiences through the use of a probabilistic knowledge base.”

Interesting report.

Even more so if you think about topic maps as declarative knowledge bases and consider the use of probabilistic knowledge bases as a means to add to the former.

BTW, user satisfaction was used as the criteria for success.

Now that is local semantics.

Probably at just about the right level as well.


Probability and Computing

Friday, January 7th, 2011

Probability and Computing

Lecture notes by Ryan O’Donnell for his Fall 2009 course at Carnegie Mellon University.

Probability is an important topic (sorry) in a number of CS areas.

Probabilistic merging is why I mention it here but understanding it more broadly will be useful in other CS areas as well.

A weighted similarity measure for non-conjoint rankings – Post

Monday, December 27th, 2010

A weighted similarity measure for non-conjoint rankings updates the recently published A Similarity Measure for Indefinite Rankings
and provides an implementation.

Abstract from the article:

Ranked lists are encountered in research and daily life, and it is often of interest to compare these lists, even when they are incomplete or have only some members in common. An example is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle non-conjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation; but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities, namely rank-biased overlap (RBO). The RBO measure is based on a simple probabilistic user model. It provides monotonicity by calculating, at a given depth of evaluation, a base score that is non-decreasing with additional evaluation, and a maximum score that is non-increasing. An extrapolated score can be calculated between these bounds if a point estimate is required. RBO has a parameter which determines the strength of the weighting to top ranks. We extend RBO to handle tied ranks and rankings of different lengths. Finally, we give examples of the use of the measure in comparing the results produced by public search engines, and in assessing retrieval systems in the laboratory.

There are also some comments about journal publishing delays that should serve as fair warning to other authors.