Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 30, 2013

Probabilistic Programming and Bayesian Methods for Hackers

From the webpage:

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, and only then turns to what Bayesian inference is. Unfortunately, due to the mathematical intractability of most Bayesian models, the reader is shown only simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author’s own prior opinion.

After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The root of my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming. That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap.
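
To make “probabilistic programming” concrete: the book works in the PyMC library, but the flavor of the exercise can be shown with nothing more than NumPy. Here is a minimal sketch, mine rather than the book’s, of Bayesian updating for a coin’s unknown bias by grid approximation:

    import numpy as np

    # Observed data: 7 heads in 10 flips of a coin with unknown bias p.
    heads, flips = 7, 10

    # Grid of candidate values for p, with a flat (uniform) prior.
    p = np.linspace(0, 1, 1001)
    prior = np.ones_like(p)

    # Binomial likelihood of the data at each candidate value of p.
    likelihood = p**heads * (1 - p)**(flips - heads)

    # Bayes' rule: posterior is proportional to prior * likelihood.
    posterior = prior * likelihood
    posterior /= posterior.sum()

    # Posterior mean and a rough 95% credible interval.
    mean = (p * posterior).sum()
    cdf = posterior.cumsum()
    low, high = p[cdf.searchsorted(0.025)], p[cdf.searchsorted(0.975)]
    print(f"posterior mean {mean:.3f}, 95% interval [{low:.3f}, {high:.3f}]")

The answer is a distribution over the coin’s bias, not a point estimate; that shift in output is the “so what” of Bayesian inference.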

DARPA (Logic and Probabilistic Programming) should be glad that someone else is working on probabilistic programming.

I first saw this at Nat Torkington’s Four short links: 29 March 2013.

March 6, 2012

Stanford – Delayed Classes – Enroll Now!

If you have been waiting for notices about the delayed Stanford courses for Spring 2012, your wait is over!

Even if you signed up for more information, you must register at the course webpage to take the course.

Details as I have them on 6 March 2012 (check course pages for official information):

  • Cryptography: starts March 12th.
  • Design and Analysis of Algorithms Part 1: starts March 12th.
  • Game Theory: starts March 19th.
  • Natural Language Processing: starts March 12th.
  • Probabilistic Graphical Models: starts March 19th.

You may be asking yourself, “Are all these courses useful for topic maps?”

I would answer by pointing out that librarians and indexers have long relied on a broad knowledge of the world to make information more accessible to users.

By way of contrast, “big data” and Google have made it less accessible.

Something to think about while you are registering for one or more of these courses!

January 30, 2012

Topic maps and graphical structures

Filed under: Graphs,Probabilistic Graphical Models,Topic Maps — Patrick Durusau @ 8:02 pm

Interesting webpage that explores the potential for adding probabilistic measures and operators to topic maps.

Moreover, it points out the lack of benchmarks for topic maps.

You might want to note that the last update was 4 November 2000.

Anyone care to point out any work on benchmarks for topic maps?

Suggestions for how to formulate benchmarks for topic maps?

Questions to myself would include:

  • Is the topic map being generated from source or is this a pre-created topic map being loaded into a topic map engine?
  • If a pre-created topic map, what syntax and/or data model is being tested?
  • What information items in the topic map will meet merging requirements? (by overall percentage and per item)
  • If created from source, what set of subjects need to result in items?
  • Should a common memory size/setting be used for comparisons? (see the sketch after this list)
  • Can we use existing corpora and tests to bootstrap topic map benchmarks?
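
To make the timing and memory questions concrete, a first-cut harness could be as small as the sketch below. Note that load_topic_map and merge_all are hypothetical stand-ins for whatever engine is under test, not real APIs:

    import time
    import tracemalloc

    def benchmark(path):
        """Measure wall-clock time and peak memory for one load-and-merge run.
        load_topic_map and merge_all are hypothetical placeholders for the
        topic map engine under test; swap in the real API."""
        tracemalloc.start()
        start = time.perf_counter()

        topic_map = load_topic_map(path)     # hypothetical: parse XTM/CTM source
        items_merged = merge_all(topic_map)  # hypothetical: apply merging rules

        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        return {"file": path, "seconds": elapsed,
                "peak_bytes": peak, "items_merged": items_merged}

Running the same harness over shared corpora would go a long way toward making engine comparisons meaningful.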

What others would you ask?

December 7, 2011

Dr. Watson?

I got up thinking that there needs to be a project for automated authoring of a topic map, and the name Dr. Watson suddenly occurred to me. After all, Dr. Watson was Sherlock Holmes’ sidekick, so it would not be like claiming it could stand on its own. Plus there would be some name recognition and/or confusion with the real, or rather imaginary, Dr. Watson of Sherlock Holmes fame.

And there would be confusion with the Dr. Watson that is the internal debugger for Windows (MS, I never can remember if the ™ goes on Windows or MS. Not that anyone else would want to call themselves MS. 😉 ) Plus the Watson Research Center at IBM.

Well, I suspect being an automated, probabilistic topic map authoring system will be enough to distinguish it from the foregoing uses.

Any others that I should be aware of?

I say probabilistic because even with the TMDM’s string matching on URIs, it is only probable that two or more topics actually represent the same subject. It is always possible that a URI has been incorrectly used to identify the subject that a topic represents. And in such cases, the error perpetuates itself across a topic map.

So we start off with the realization that even string matching results in a probability of less than 1.0 (where 1.0 is absolute certainty) that two or more topics represent the same subject.

Since we are already being probabilistic, why not be openly so?
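
As an illustration only (my sketch, not anything the TMDM specifies), an openly probabilistic merge test might return a confidence score instead of a boolean:

    def merge_confidence(topic_a, topic_b):
        """Probability-like score that two topics represent the same subject.
        Topics here are plain dicts with 'identifiers' and 'names' sets;
        the weights are illustrative guesses, not calibrated values."""
        score = 0.0
        # A shared subject identifier is strong but not infallible evidence:
        # a URI may have been applied to the wrong subject.
        if set(topic_a["identifiers"]) & set(topic_b["identifiers"]):
            score = max(score, 0.95)
        # A shared name is weaker evidence, since homonyms abound.
        if set(topic_a["names"]) & set(topic_b["names"]):
            score = max(score, 0.60)
        return score

    a = {"identifiers": {"http://example.org/subject/watson"}, "names": {"Dr. Watson"}}
    b = {"identifiers": {"http://example.org/subject/watson"}, "names": {"John H. Watson"}}
    print(merge_confidence(a, b))  # 0.95: probable, never certain

A downstream application could then set its own merge threshold instead of inheriting an all-or-nothing decision.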

But, before we get into the weeds and details, the project has to have a cool name. (As in a name that is actually cool, not a cool acronym with a long name made up to fit it.)

All those in favor of Dr. Watson, please signify by raising your hands (or the beer you are holding).

More to follow.

November 21, 2011

Probabilistic Graphical Models (class)

Probabilistic Graphical Models (class) by Daphne Koller. (Stanford University)

From the web page:

What are Probabilistic Graphical Models?

Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained. These beliefs can be combined with individual preferences to help guide our actions, and even in selecting which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many inter-related variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. The PGM framework provides an essential tool for anyone who wants to learn how to reason coherently from limited and noisy observations.

About The Course

In this class, you will learn the basics of the PGM representation and how to construct them, using both human knowledge and machine learning techniques; you will also learn algorithms for using a PGM to reach conclusions about the world from limited and noisy evidence, and for making good decisions under uncertainty. The class covers both the theoretical underpinnings of the PGM framework and practical skills needed to apply these techniques to new problems. Topics include: (i) The Bayesian network and Markov network representation, including extensions for reasoning over domains that change over time and over domains with a variable number of entities; (ii) reasoning and inference methods, including exact inference (variable elimination, clique trees) and approximate inference (belief propagation message passing, Markov chain Monte Carlo methods); (iii) learning methods for both parameters and structure in a PGM; (iv) using a PGM for decision making under uncertainty. The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply PGM methods to computer vision, text understanding, medical decision making, speech recognition, and many other areas.
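
For a taste of the material, here is a minimal example of my own (not from the course): exact inference by enumeration on the classic rain/sprinkler/wet-grass Bayesian network, with illustrative probabilities:

    from itertools import product

    # Conditional probability tables for a three-node Bayesian network:
    # Rain -> Sprinkler, and (Rain, Sprinkler) -> WetGrass.
    P_rain = {True: 0.2, False: 0.8}
    P_sprinkler = {True: {True: 0.01, False: 0.99},   # given rain
                   False: {True: 0.4, False: 0.6}}    # given no rain
    P_wet = {(True, True): 0.99, (True, False): 0.8,  # keyed by (rain, sprinkler)
             (False, True): 0.9, (False, False): 0.0}

    def joint(rain, sprinkler, wet):
        """Chain rule: P(R, S, W) = P(R) * P(S | R) * P(W | R, S)."""
        p_w = P_wet[(rain, sprinkler)]
        return P_rain[rain] * P_sprinkler[rain][sprinkler] * (p_w if wet else 1 - p_w)

    # P(Rain | WetGrass = true), summing the sprinkler out of the joint.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    print(f"P(rain | wet grass) = {num / den:.3f}")  # about 0.358

Enumeration is exponential in the number of variables; the variable elimination and belief propagation methods on the syllabus are what make networks with thousands of variables tractable.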

Another very strong resource from Stanford.

Serious (or aspiring) data miners will be lining up for this course!
