Archive for the ‘Markov Decision Processes’ Category

Document Summarization via Markov Chains

Saturday, October 17th, 2015

Document Summarization via Markov Chains by Atabey Kaygun.

From the post:

Description of the problem

Today’s question is this: we have a long text and we want a machine generated summary of the text. Below, I will describe a statistical (hence language agnostic) method to do just that.

Sentences, overlaps and Markov chains.

In my previous post I described a method to measure the overlap between two sentences in terms of common words. Today, we will use the same measure, or a variation, to develop a discrete Markov chain whose nodes are labeled by individual sentences appearing in our text. This is essentially page rank applied to sentences.

Atabey says the algorithm (code supplied) works well on:

news articles, opinion pieces and blog posts.

Not so hot on Supreme Court decisions.

In commenting on a story from the New York Times, Obama Won’t Seek Access to Encrypted User Data, I suspect, Atabey says that we have no reference for “what frustrated him” in the text summary.

If you consider the relevant paragraph from the New York Times story:

Mr. Comey had expressed alarm a year ago after Apple introduced an operating system that encrypted virtually everything contained in an iPhone. What frustrated him was that Apple had designed the system to ensure that the company never held on to the keys, putting them entirely in the hands of users through the codes or fingerprints they use to get into their phones. As a result, if Apple is handed a court order for data — until recently, it received hundreds every year — it could not open the coded information.

The reference is clear. Several other people are mentioned in the New York Times article but none rank high enough to appear in the summary.

Not a sure bet but with testing, try attribution to people who rank high enough to appear in the summary.

Markov Chains in Neo4j

Monday, October 7th, 2013

Markov Chains in Neo4j by Nicole White.

From the post:

My new favorite thing lately is Neo4j, a graph database. It’s simple yet powerful: a graph database contains nodes and relationships, each which have properties. I recently made this submission to Neo4j’s GraphGist Challenge, which I did pretty well in.

After discovering Neo4j and graph databases a little over a month and a half ago, I’ve become subject to this weird syndrome where I think to myself, “Could I put that into a graph database?” with literally everything I encounter. The answer is usually yes.

Markov Chains

I realized the other day that nodes can have relationships with themselves, and for some reason, this immediately reminded me of Markov chains. The term Markov chain sounds intimidating at first (it did to me when I first saw the term on a syllabus), but they’re actually pretty simple: Markov chains consist of states and probabilities. The number of possible states is finite, and the Markov chain is a stochastic process that transitions, with certain probabilities, from one state to another over what I like to call time-steps.

The most important property of a Markov chain is that it is memoryless; that is, the probability of entering the next state depends only on the current state. We don’t care about where the process has been, only about where it is now.

If you wander over to the Wikipedia page on Markov chains, you’ll see pretty quickly why they are an obvious candidate for a graph database. The main profile picture for the page shows a Markov chain in graph form, where the states are nodes and the probabilities of transitioning from one state to another are the relationships between those nodes. The reason my realization mentioned earlier was important is that there is often a non-zero probability, given a Markov chain is in state A, that it will ‘enter’ state A in the next time-step. This is represented by a node that has a relationship with itself.

Interesting use of Neo4j to create a transition model.

Curious what you think of Nicole’s use of queries to avoid matrix multiplication?

It works but how often do you want to know the probability of one element in one state of a system?

Or would you extend the one element probability query to query more elements in a particular state?

When will my computer understand me?

Monday, June 10th, 2013

When will my computer understand me?

From the post:

It’s not hard to tell the difference between the “charge” of a battery and criminal “charges.” But for computers, distinguishing between the various meanings of a word is difficult.

For more than 50 years, linguists and computer scientists have tried to get computers to understand human language by programming semantics as software. Driven initially by efforts to translate Russian scientific texts during the Cold War (and more recently by the value of information retrieval and data analysis tools), these efforts have met with mixed success. IBM’s Jeopardy-winning Watson system and Google Translate are high profile, successful applications of language technologies, but the humorous answers and mistranslations they sometimes produce are evidence of the continuing difficulty of the problem.

Our ability to easily distinguish between multiple word meanings is rooted in a lifetime of experience. Using the context in which a word is used, an intrinsic understanding of syntax and logic, and a sense of the speaker’s intention, we intuit what another person is telling us.

“In the past, people have tried to hand-code all of this knowledge,” explained Katrin Erk, a professor of linguistics at The University of Texas at Austin focusing on lexical semantics. “I think it’s fair to say that this hasn’t been successful. There are just too many little things that humans know.”

Other efforts have tried to use dictionary meanings to train computers to better understand language, but these attempts have also faced obstacles. Dictionaries have their own sense distinctions, which are crystal clear to the dictionary-maker but murky to the dictionary reader. Moreover, no two dictionaries provide the same set of meanings — frustrating, right?

Watching annotators struggle to make sense of conflicting definitions led Erk to try a different tactic. Instead of hard-coding human logic or deciphering dictionaries, why not mine a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a weighted map of relationships — a dictionary without a dictionary?

“An intuition for me was that you could visualize the different meanings of a word as points in space,” she said. “You could think of them as sometimes far apart, like a battery charge and criminal charges, and sometimes close together, like criminal charges and accusations (“the newspaper published charges…”). The meaning of a word in a particular context is a point in this space. Then we don’t have to say how many senses a word has. Instead we say: ‘This use of the word is close to this usage in another sentence, but far away from the third use.'”

Before you jump to the post looking for the code, Erk is working with a 10,000 dimension space to analyze her data.

The most recent paper: Montague Meets Markov: Deep Semantics with Probabilistic Logical Form (2013)


We combine logical and distributional representations of natural language meaning by transforming distributional similarity judgments into weighted inference rules using Markov Logic Networks (MLNs). We show that this framework supports both judging sentence similarity and recognizing textual entailment by appropriately adapting the MLN implementation of logical connectives. We also show that distributional phrase similarity, used as textual inference rules created on the fly, improves its performance.

Random Walks on the Click Graph

Monday, April 16th, 2012

Random Walks on the Click Graph by Nick Craswell and Martin Szummer.


Search engines can record which documents were clicked for which query, and use these query-document pairs as ‘soft’ relevance judgments. However, compared to the true judgments, click logs give noisy and sparse relevance information. We apply a Markov random walk model to a large click log, producing a probabilistic ranking of documents for a given query. A key advantage of the model is its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively. We conduct experiments on click logs from image search, comparing our (‘backward’) random walk model to a different (‘forward’) random walk, varying parameters such as walk length and self-transition probability. The most effective combination is a long backward walk with high self-transition probability.

Two points that may capture your interest:

  • The model does not consider query or document content. “Just the clicks, Ma’am.”
  • Image data is said to have “less noise” since users can see thumbnails before they follow a link. (True?)

I saw this cited quite recently but it is about five years old now (2007). Any recent literature on click graphs that you would point out?

Probabilistic Graphical Models (class)

Monday, November 21st, 2011

Probabilistic Graphical Models (class) by Daphne Koller. (Stanford University)

From the web page:

What are Probabilistic Graphical Models?

Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained. These beliefs can be combined with individual preferences to help guide our actions, and even in selecting which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many inter-related variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. The PGM framework provides an essential tool for anyone who wants to learn how to reason coherently from limited and noisy observations.

About The Course

In this class, you will learn the basics of the PGM representation and how to construct them, using both human knowledge and machine learning techniques; you will also learn algorithms for using a PGM to reach conclusions about the world from limited and noisy evidence, and for making good decisions under uncertainty. The class covers both the theoretical underpinnings of the PGM framework and practical skills needed to apply these techniques to new problems. Topics include: (i) The Bayesian network and Markov network representation, including extensions for reasoning over domains that change over time and over domains with a variable number of entities; (ii) reasoning and inference methods, including exact inference (variable elimination, clique trees) and approximate inference (belief propagation message passing, Markov chain Monte Carlo methods); (iii) learning methods for both parameters and structure in a PGM; (iv) using a PGM for decision making under uncertainty. The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply PGM methods to computer vision, text understanding, medical decision making, speech recognition, and many other areas.

Another very strong resource from Stanford.

Serious (or aspiring) data miners will be lining up for this course!

Partially Observable Markov Decision Processes

Sunday, October 16th, 2011

Partially Observable Markov Decision Processes

From the webpage:

This web site is devoted to information on partially observable Markov decision processes.

Choose a sub-topic below::

  • POMDP Tutorial – I made a simplified POMDP tutorial a while back. It is still in a somewhat crude form, but people tell me it has served a useful purpose.
  • POMDP Papers – For research papers on POMDPs, see this page.
  • POMDP Code – In addition to the format and examples, I have C-code for solving POMDPs that is available.
  • POMDP Examples – From other literature sources and our own work, we have accumulated a bunch of POMDP examples.
  • POMDP Talks – Miscellaneous material for POMDP talks.


Well, the site has not been undated since 2009.

But, given the timeless nature of the WWW, it shows up just after the Wikipedia page entry on “Partially Observable Markov Decision Processes.” That is to say it was #2 on the list of relevant resources.

Could be that no one has been talking about POMDPs for the last two years. Except that a quick search at Citeseer shows 18 papers there with POMDP in the text.

I understand interests changing, etc. but we need to develop ways to evaluate resources for the timely nature of their data and perhaps just as importantly, to be able to keep such resources updated.

Both of those are very open issues and I am interested in any suggestions for how to approach them.

POMDPs for Dummies

Sunday, October 16th, 2011

POMDPs for Dummies: partially observable Markov decision processes (POMDPs)

From the webpage:

This is a tutorial aimed at trying to build up the intuition behind solution procedures for partially observable Markov decision processes (POMDPs). It sacrifices completeness for clarity. It tries to present the main problems geometrically, rather than with a series of formulas. In fact, we avoid the actual formulas altogether, try to keep notation to a minimum and rely on pictures to build up the intuition.

I just found this today and even with pictures it is slow going. But, I thought you might appreciate something “different” for the week. Something to read, think about, then reread.

If you are taking the Stanford AI course you may remember the mentioning of “partially observable” in week 1. There was a promise of further treatment later in the course.