Archive for the ‘Bayesian Models’ Category

Bayesian identity resolution – Post

Friday, February 11th, 2011

Bayesian identity resolution

Lars Marius Garshol walks through finding duplicate records in data sets.

As Lars notes, there are commercial products for this same task but I think this is a useful exercise.

It isn’t that hard to imagine creating test data sets with a variety of conditions to underscore lessons about detecting duplicate records.

I suspect such training data may already be available.

Will have to see what I can find and post about it.
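To make the idea concrete, here is a minimal sketch of how per-field evidence might be combined with Bayes' rule to score a pair of records. The field names, the probabilities, and the exact-match comparison are all invented for illustration; none of them are taken from Lars's post or from any product.

```python
def combine(p1, p2):
    """Combine two independent match probabilities with Bayes' rule (0.5 is neutral)."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))

# Hypothetical settings per field:
# (probability of a match if the values agree, probability if they differ).
FIELDS = {
    "name": (0.9, 0.2),
    "email": (0.95, 0.4),
    "city": (0.6, 0.45),
}

def match_probability(rec_a, rec_b):
    """Start from a neutral 0.5 and fold in the evidence from each field."""
    prob = 0.5
    for field, (high, low) in FIELDS.items():
        value_a = rec_a.get(field, "").strip().lower()
        value_b = rec_b.get(field, "").strip().lower()
        prob = combine(prob, high if value_a == value_b else low)
    return prob

# Made-up records for illustration only.
a = {"name": "J. Smith", "email": "jsmith@example.org", "city": "Oslo"}
b = {"name": "j. smith", "email": "JSMITH@example.org", "city": "Oslo"}
c = {"name": "A. Jones", "email": "ajones@example.org", "city": "Atlanta"}

print(match_probability(a, b))  # well above 0.5: likely duplicates
print(match_probability(a, c))  # well below 0.5: likely distinct
```

A real test data set would vary the conditions (typos, missing fields, swapped values), which is where exact-match comparison stops being good enough.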

*****
PS: Lars is primary editor of the TMDM, working on TMCL and several other parts of the topic maps standard.

Which Automatic Differentiation Tool for C/C++?

Tuesday, February 8th, 2011

Which Automatic Differentiation Tool for C/C++?

OK, not immediately obvious why this is relevant to topic maps.

Nor are Bob Carpenter’s references:

I’ve been playing with all sorts of fun new toys at the new job at Columbia and learning lots of new algorithms. In particular, I’m coming to grips with Hamiltonian (or hybrid) Monte Carlo, which isn’t as complicated as the physics-based motivations may suggest (see the discussion in David MacKay’s book and then move to the more detailed explanation in Christopher Bishop’s book).

particularly useful.

I suspect the two book references are:

but I haven’t asked. In part to illustrate the problem of resolving any entity reference. Both authors have authored other books touching on the same subjects so my guesses may or may not be correct.

Oh, relevance to topic maps. The technique of automatic differentiation is used in Hamiltonian Monte Carlo methods to generate gradients. Still not helpful? It isn’t to me either.
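For readers who have not met automatic differentiation, a minimal sketch may help. It uses forward mode with dual numbers, which is only one way to do it and is not a description of how any particular C/C++ tool works. The `Dual` class, the `gradient` helper, and the function `f` are all hypothetical names made up for this example.

```python
import math

class Dual:
    """A number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative zero).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def exp(x):
    """exp for dual numbers: the chain rule applied to math.exp."""
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)

def gradient(f, point):
    """Partial derivatives of f at point, one forward pass per coordinate."""
    grads = []
    for i in range(len(point)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(f(*args).deriv)
    return grads

def f(x, y):
    return x * x * y + 3 * y + exp(x)

# Analytically: df/dx = 2*x*y + exp(x), df/dy = x*x + 3.
print(gradient(f, [2.0, 5.0]))
```

The point is that the derivative is computed alongside the value, exactly rather than by finite differences, which is what a sampler needs when it asks for the gradient of a log density.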

Ah, what about Bayesian models in IR? That made the light go on!

I will be discussing ways to show more immediate relevance to topic maps, at least for some posts, in post #1000.

It isn’t as far away as you might think.

PyBrain: The Python Machine Learning Library

Thursday, February 3rd, 2011

PyBrain: The Python Machine Learning Library

From the website:

PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive “Backronym”.

How is PyBrain different?

While there are a few machine learning libraries out there, PyBrain aims to be a very easy-to-use modular library that can be used by entry-level students but still offers the flexibility and algorithms for state-of-the-art research. We are constantly working on more and faster algorithms, developing new environments and improving usability.

What PyBrain can do

PyBrain, as its written-out name already suggests, contains algorithms for neural networks, for reinforcement learning (and the combination of the two), for unsupervised learning, and evolution. Since most of the current problems deal with continuous state and action spaces, function approximators (like neural networks) must be used to cope with the large dimensionality. Our library is built around neural networks in the kernel and all of the training methods accept a neural network as the to-be-trained instance. This makes PyBrain a powerful tool for real-life tasks.

Another tool kit to assist in the construction of topic maps.

And another likely contender for the Topic Map Competition!

MALLET: MAchine Learning for LanguagE Toolkit
Topic Map Competition (TMC) Contender?

Thursday, February 3rd, 2011

MALLET: MAchine Learning for LanguagE Toolkit

From the website:

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

MALLET includes sophisticated tools for document classification: efficient routines for converting text to “features”, a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

In addition to classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. These methods are implemented in an extensible system for finite state transducers.

Topic models are useful for analyzing large collections of unlabeled text. The MALLET topic modeling toolkit contains efficient, sampling-based implementations of Latent Dirichlet Allocation, Pachinko Allocation, and Hierarchical LDA.

Many of the algorithms in MALLET depend on numerical optimization. MALLET includes an efficient implementation of Limited Memory BFGS, among many other optimization methods.

In addition to sophisticated Machine Learning applications, MALLET includes routines for transforming text documents into numerical representations that can then be processed efficiently. This process is implemented through a flexible system of “pipes”, which handle distinct tasks such as tokenizing strings, removing stopwords, and converting sequences into count vectors.

An add-on package to MALLET, called GRMM, contains support for inference in general graphical models, and training of CRFs with arbitrary graphical structure.
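The “pipes” idea described above is easy to sketch. What follows is plain Python, not MALLET's Java API; the function names, the tiny stopword list, and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, made-up stopword list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def count_vector(tokens):
    """Turn a token sequence into a sparse count vector."""
    return Counter(tokens)

def run_pipes(data, pipes):
    """Feed the output of each pipe into the next one."""
    for pipe in pipes:
        data = pipe(data)
    return data

PIPES = [tokenize, remove_stopwords, count_vector]
counts = run_pipes("The map of the topic is a topic map.", PIPES)
print(counts)
```

Each pipe does one job, so swapping in a different tokenizer or stopword list means changing one entry in the list.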

Another tool to assist in the authoring of a topic map from a large data set.

It would be interesting, but beyond the scope of the topic maps class, to organize a competition around several of the natural language processing packages.

To have a common data set, to be released on X date, with topic maps due say within 24 hours (there is a TV show with that in the title or so I am told).

Will have to give that some thought.

Could be both interesting and entertaining.

Modeling Social Annotation: A Bayesian Approach

Monday, January 3rd, 2011

Modeling Social Annotation: A Bayesian Approach Authors: Anon Plangprasopchok, Kristina Lerman

Abstract:

Collaborative tagging systems, such as Delicious, CiteULike, and others, allow users to annotate resources, for example, Web pages or scientific papers, with descriptive labels called tags. The social annotations contributed by thousands of users can potentially be used to infer categorical knowledge, classify documents, or recommend new relevant information. Traditional text inference methods do not make the best use of social annotation, since they do not take into account variations in individual users’ perspectives and vocabulary. In a previous work, we introduced a simple probabilistic model that takes the interests of individual annotators into account in order to find hidden topics of annotated resources. Unfortunately, that approach had one major shortcoming: the number of topics and interests must be specified a priori. To address this drawback, we extend the model to a fully Bayesian framework, which offers a way to automatically estimate these numbers. In particular, the model allows the number of interests and topics to change as suggested by the structure of the data. We evaluate the proposed model in detail on the synthetic and real-world data by comparing its performance to Latent Dirichlet Allocation on the topic extraction task. For the latter evaluation, we apply the model to infer topics of Web resources from social annotations obtained from Delicious in order to discover new resources similar to a specified one. Our empirical results demonstrate that the proposed model is a promising method for exploiting social knowledge contained in user-generated annotations.

Questions:

  1. How does (if it does) a tagging vocabulary differ from a regular vocabulary? (3-5 pages, no citations)
  2. Would this technique be applicable to tracing vocabulary usage across cited papers? In other words, following an author backwards through materials they cite? (3-5 pages, no citations)
  3. What other characteristics do you think a paper would have where the usage of a term had shifted to a different meaning? (3-5 pages, no citations)

Inductive Logic Programming (and Martian Identifications)

Thursday, December 30th, 2010

Inductive Logic Programming: Theory and Methods Authors: Stephen Muggleton, Luc De Raedt

Abstract:

Inductive Logic Programming (ILP) is a new discipline which investigates the inductive construction of first-order clausal theories from examples and background knowledge. We survey the most important theories and methods of this new field. Firstly, various problem specifications of ILP are formalised in semantic settings for ILP, yielding a “model-theory” for ILP. Secondly, a generic ILP algorithm is presented. Thirdly, the inference rules and corresponding operators used in ILP are presented, resulting in a “proof-theory” for ILP. Fourthly, since inductive inference does not produce statements which are assured to follow from what is given, inductive inferences require an alternative form of justification. This can take the form of either probabilistic support or logical constraints on the hypothesis language. Information compression techniques used within ILP are presented within a unifying Bayesian approach to confirmation and corroboration of hypotheses. Also, different ways to constrain the hypothesis language, or specify the declarative bias are presented. Fifthly, some advanced topics in ILP are addressed. These include aspects of computational learning theory as applied to ILP, and the issue of predicate invention. Finally, we survey some applications and implementations of ILP. ILP applications fall under two different categories: firstly scientific discovery and knowledge acquisition, and secondly programming assistants.

A good survey of Inductive Logic Programming (ILP) if a bit dated. Feel free to suggest more recent surveys of the area.

As I mentioned under Mining Travel Resources on the Web Using L-Wrappers, the notion of interpretative domains is quite interesting.

I suspect, but cannot prove (at least at this point), that most useful mappings exist between closely related interpretative domains.

Closely related interpretative domains being composed of identifications of a subject that I will quickly recognize as alternative identifications.

Showing me a mapping that includes a Martian identification of my subject, which is not a closely related interpretative domain, is unlikely to be useful, at least to me. (I can’t speak for any potential Martians.)

Dirichlet Processes: Tutorial and Practical Course

Tuesday, December 21st, 2010

Dirichlet Processes: Tutorial and Practical Course Author: Yee Whye Teh
Slides
Paper

Abstract:

The Bayesian approach allows for a coherent framework for dealing with uncertainty in machine learning. By integrating out parameters, Bayesian models do not suffer from overfitting, thus it is conceivable to consider models with infinite numbers of parameters, aka Bayesian nonparametric models. An example of such models is the Gaussian process, which is a distribution over functions used in regression and classification problems. Another example is the Dirichlet process, which is a distribution over distributions. Dirichlet processes are used in density estimation, clustering, and nonparametric relaxations of parametric models. It has been gaining popularity in both the statistics and machine learning communities, due to its computational tractability and modelling flexibility.

In the tutorial I shall introduce Dirichlet processes, and describe different representations of Dirichlet processes, including the Blackwell-MacQueen urn scheme, Chinese restaurant processes, and the stick-breaking construction. I shall also go through various extensions of Dirichlet processes, and applications in machine learning, natural language processing, machine vision, computational biology and beyond.

In the practical course I shall describe inference algorithms for Dirichlet processes based on Markov chain Monte Carlo sampling, and we shall implement a Dirichlet process mixture model, hopefully applying it to discovering clusters of NIPS papers and authors.
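Of the representations named in the abstract, the Chinese restaurant process is the easiest to try at home. The sketch below is my own and is not taken from the tutorial; the function name, the `alpha` value, and the seed are arbitrary choices for illustration.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one at a time; return (assignments, table sizes)."""
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # assignments[i] = table index of customer i
    for i in range(n_customers):
        # Customer i joins existing table k with probability tables[k] / (i + alpha)
        # and opens a new table with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        choice = len(tables)  # default: open a new table
        for k, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                choice = k
                break
        if choice == len(tables):
            tables.append(0)
        tables[choice] += 1
        assignments.append(choice)
    return assignments, tables

assignments, tables = chinese_restaurant_process(100, alpha=2.0)
print(len(tables), "tables with sizes", tables)
```

The number of tables (clusters) is not fixed in advance; it grows as the data arrive, and a larger `alpha` tends to open more tables.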

With the last two posts, that is almost 8 hours of video for streaming to your new phone or other personal device.

That should get you past even a Christmas day sports marathon at your in-laws’ house (or your own should they be visiting).

Bayesian inference and Gaussian processes – In six (6) parts

Tuesday, December 21st, 2010

Bayesian inference and Gaussian processes Author: Carl Edward Rasmussen

Quite useful as the presenter concludes with disambiguating terminology used differently in the field. Same terms used to mean different things, different terms to mean the same thing. Hmmm, that sounds really familiar. ;-)

Start with this lecture before Dirichlet Processes: Tutorial and Practical Course

BTW, if this seems a bit AI-ish, consider it to be the reverse of supervised classification (person helps machine). That is, machine helps person, but the person should say when the answer is correct.

Graphical Models

Tuesday, December 21st, 2010

Graphical Models Author: Zoubin Ghahramani

Abstract:

An introduction to directed and undirected probabilistic graphical models, including inference (belief propagation and the junction tree algorithm), parameter learning and structure learning, variational approximations, and approximate inference.

  • Introduction to graphical models: (directed, undirected and factor graphs; conditional independence; d-separation; plate notation)
  • Inference and propagation algorithms: (belief propagation; factor graph propagation; forward-backward and Kalman smoothing; the junction tree algorithm)
  • Learning parameters and structure: (maximum likelihood and Bayesian parameter learning for complete and incomplete data; EM; Dirichlet distributions; score-based structure learning; Bayesian structural EM; brief comments on causality and on learning undirected models)
  • Approximate Inference: (Laplace approximation; BIC; variational Bayesian EM; variational message passing; VB for model selection)
  • Bayesian information retrieval using sets of items: (Bayesian Sets; Applications)
  • Foundations of Bayesian inference: (Cox Theorem; Dutch Book Theorem; Asymptotic consensus and certainty; choosing priors; limitations)
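As a taste of the propagation algorithms in the outline, here is a minimal sketch of forward-backward on the simplest directed model, a two-state chain with observations. It is my own illustration, not material from the lecture; all the probabilities are made up, and a brute-force sum over every state sequence is included as a sanity check.

```python
from itertools import product

STATES = (0, 1)
INIT = [0.6, 0.4]                  # P(z_1)
TRANS = [[0.7, 0.3], [0.2, 0.8]]   # TRANS[i][j] = P(z_t = j | z_{t-1} = i)
EMIT = [[0.9, 0.1], [0.3, 0.7]]    # EMIT[i][x] = P(x_t = x | z_t = i)

def forward_backward(obs):
    """Posterior P(z_t | all observations) for each t, by message passing."""
    n = len(obs)
    alpha = [[0.0, 0.0] for _ in range(n)]  # forward messages
    beta = [[1.0, 1.0] for _ in range(n)]   # backward messages
    for s in STATES:
        alpha[0][s] = INIT[s] * EMIT[s][obs[0]]
    for t in range(1, n):
        for s in STATES:
            alpha[t][s] = EMIT[s][obs[t]] * sum(
                alpha[t - 1][r] * TRANS[r][s] for r in STATES)
    for t in range(n - 2, -1, -1):
        for s in STATES:
            beta[t][s] = sum(
                TRANS[s][r] * EMIT[r][obs[t + 1]] * beta[t + 1][r] for r in STATES)
    evidence = sum(alpha[n - 1])  # P(all observations)
    return [[alpha[t][s] * beta[t][s] / evidence for s in STATES]
            for t in range(n)]

def brute_force(obs):
    """Same posteriors by summing over every state sequence (exponential cost)."""
    n = len(obs)
    marginals = [[0.0, 0.0] for _ in range(n)]
    total = 0.0
    for path in product(STATES, repeat=n):
        p = INIT[path[0]] * EMIT[path[0]][obs[0]]
        for t in range(1, n):
            p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][obs[t]]
        total += p
        for t, s in enumerate(path):
            marginals[t][s] += p
    return [[m / total for m in row] for row in marginals]

obs = [0, 0, 1, 1]
posteriors = forward_backward(obs)
for t, row in enumerate(posteriors):
    print(t, row)
```

Both functions give the same answers; the difference is that message passing scales linearly with the length of the chain while the brute-force sum doubles with every added step.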

Start with this lecture before Dirichlet Processes: Tutorial and Practical Course

Emergent Semantics

Thursday, December 16th, 2010

Philippe Cudré-Mauroux Video, Slides from SOKS: Self-Organising Knowledge Systems, Amsterdam, 29 April 2010

Abstract:

Emergent semantics refers to a set of principles and techniques analyzing the evolution of decentralized semantic structures in large scale distributed information systems. Emergent semantics approaches model the semantics of a distributed system as an ensemble of relationships between syntactic structures.

They consider both the representation of semantics and the discovery of the proper interpretation of symbols as the result of a self-organizing process performed by distributed agents exchanging symbols and having utilities dependent on the proper interpretation of the symbols. This is a complex systems perspective on the problem of dealing with semantics.

A “must see” presentation!

More comments/questions to follow.

*****
Apologies but content/postings will be slow starting today, for a few days. Diagnostic on left hand has me doing hunt-and-peck with my right.

Machine Learning and Data Mining with R – Post

Monday, December 13th, 2010

Machine Learning and Data Mining with R

Announcement of course notes and slides, plus live classes in San Francisco, January 2012, courtesy of the Revolutions blog from Revolution Analytics.

Check the post for details and links.

Bayesian Model Selection and Statistical Modeling – Review

Wednesday, December 8th, 2010

Bayesian Model Selection and Statistical Modeling by Tomohiro Ando, reviewed by Christian P. Robert.

If you are planning on using Bayesian models in your topic maps activities, read this review first.

You will thank the reviewer later.