factorie: Probabilistic programming with imperatively-defined factor graphs
The website says factorie has been applied to:
FACTORIE has been successfully applied to various tasks in natural language processing and information integration, including
- named entity recognition
- entity resolution
- relation extraction
- parsing
- schema matching
- ontology alignment
- latent-variable generative models, including latent Dirichlet allocation.
Sound like topic map tasks to me!
Currently at version 0.90 but the website indicates the the project is planning on a 1.0 release in early 2011.
Just so you know what you are looking forward to:
FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference. Key features:
- It is object-oriented, enabling encapsulation, abstraction and inheritance in the definition of random variables, factors, inference and learning methods.
- It is scalable, with demonstrated success on problems with many millions of variables and factors, and on models that have changing structure, such as case factor diagrams. It has also been plugged into a database back-end, representing a new approach to probabilistic databases capable of handling billions of variables.
- It is flexible, supporting multiple modeling and inference paradigms. Its original emphasis was on conditional random fields, undirected graphical models, MCMC inference, online training, and discriminative parameter estimation. However, it now also supports directed generative models (such as latent Dirichlet allocation), and has preliminary support for variational inference, including belief propagation and mean-field methods.
- It is embedded into a general purpose programming language, providing model authors with familiar and extensive resources for implementing the procedural aspects of their solution, including the ability to beneficially mix data pre-processing, diagnostics, evaluation, and other book-keeping code in the same files as the probabilistic model specification.
- It allows the use of imperative (procedural) constructs to define the factor graph—an unusual and powerful facet that enables significant efficiencies and also supports the injection of both declarative and procedural domain knowledge into model design.
The structure of generative models can be expressed as a program that describes the generative storyline. The structure undirected graphical models can be specified in an entity-relationship language, in which the factor templates are expressed as compatibility functions on arbitrary entity-relationship expressions; alternatively, factor templates may also be specified as formulas in first-order logic. However, most generally, data can be stored in arbitrary data structures (much as one would in deterministic programming), and the connectivity patterns of factor templates can be specified in a Turing-complete imperative style. This usage of imperative programming to define various aspects of factor graph construction and operation is an innovation originated in FACTORIE; we term this approach imperatively-defined factor graphs. The above three methods for specifying relational factor graph structure can be mixed in the same model.