Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

November 25, 2010

LingPipe Blog

Filed under: Data Mining,Natural Language Processing,Text Analytics — Patrick Durusau @ 11:07 am

LingPipe Blog: Natural Language Processing and Text Analytics

Blog for the LingPipe Toolkit.

If you want to move beyond hand-authored topic maps, NLP and other techniques are in your future.

Imagine using LingPipe to generate entity profiles that you then edit (or not) and market for particular data resources.

On entity profiles, see: Sig.ma.
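
To make that concrete, here is a minimal sketch of the first step: running a pretrained LingPipe named-entity chunker over text and collecting the typed mentions an entity profile would be built from. The model file name is illustrative (LingPipe ships several pretrained chunkers in its models/ directory); the API calls are from com.aliasi.chunk and com.aliasi.util as I understand LingPipe 4.

    import java.io.File;
    import com.aliasi.chunk.Chunk;
    import com.aliasi.chunk.Chunker;
    import com.aliasi.chunk.Chunking;
    import com.aliasi.util.AbstractExternalizable;

    public class EntityMentionSketch {
        public static void main(String[] args) throws Exception {
            // Illustrative model name; substitute a pretrained model
            // from your LingPipe distribution's models/ directory.
            File modelFile = new File("ne-en-news-muc6.AbstractCharLmChunker");
            Chunker chunker = (Chunker) AbstractExternalizable.readObject(modelFile);

            String text = "Patrick Durusau wrote about LingPipe and topic maps.";
            Chunking chunking = chunker.chunk(text);

            // Each chunk is a typed span; mentions grouped by entity
            // are the raw material for an editable entity profile.
            for (Chunk chunk : chunking.chunkSet()) {
                String mention = text.substring(chunk.start(), chunk.end());
                System.out.println(chunk.type() + "\t" + mention);
            }
        }
    }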

November 24, 2010

Text Analysis with LingPipe 4. Draft 0.2

Filed under: Data Mining,Natural Language Processing,Text Analytics — Patrick Durusau @ 9:53 am

Draft 0.2 is up to 363 pages.

Chapters:

  1. Getting Started
  2. Characters and Strings
  3. Regular Expressions
  4. Input and Output
  5. Handlers, Parsers, and Corpora
  6. Classifiers and Evaluation
  7. Naive Bayes Classifiers (not done)
  8. Tokenization
  9. Symbol Tables
  10. Sentence Boundary Detection (not done)
  11. Latent Dirichlet Allocation
  12. Singular Value Decomposition (not done)

Extensive annexes.
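
For a taste of the toolkit the book covers (chapter 8, Tokenization), a minimal sketch of a LingPipe tokenizer call, assuming the LingPipe 4 IndoEuropeanTokenizerFactory singleton:

    import com.aliasi.tokenizer.IndoEuropeanTokenizerFactory;
    import com.aliasi.tokenizer.Tokenizer;
    import com.aliasi.tokenizer.TokenizerFactory;

    public class TokenizeSketch {
        public static void main(String[] args) {
            TokenizerFactory factory = IndoEuropeanTokenizerFactory.INSTANCE;
            char[] cs = "Text analysis with LingPipe 4.".toCharArray();
            // Tokenize the whole character span; print one token per line.
            Tokenizer tokenizer = factory.tokenizer(cs, 0, cs.length);
            for (String token : tokenizer.tokenize())
                System.out.println(token);
        }
    }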

The book is projected to grow by another 1,000 or so pages, so the (not done) chapters will appear along with additional material in other chapters.

Readers welcome!

Christmas came early this year!

Questions:

  1. Class presentation demonstrating the use of one of the techniques on a library-related data set.
  2. Compare and contrast two of the techniques on a library-related data set. (Project)
  3. Annotated and updated bibliography for any chapter.

Update: Same questions as before, but look at the updated version of the book (now split into separate text processing and NLP parts): LingPipe and Text Processing Books.

October 18, 2010

The X Factor of Information Systems

Filed under: Information Retrieval,Natural Language Processing,Semantics — Patrick Durusau @ 5:02 am

David Segal’s “The X Factor of Economics,” NYT, Sunday, October 17, 2010, Week in Review, concludes that standard economic models don’t account for one critical factor.

Economics can be dressed up in mathematical garb, with after-the-fact precision, but the X factor deprives it of before-the-fact precision. Precision? That seems like an inadequate term for a profession that can't agree on what has happened or what is happening, much less what is about to happen.

But in any event, the X factor? That would be us, people.

People who gleefully buy, save, work, rest and generally live our lives without any regard for theories of economic behavior.

The same people who live without any regard for theories of semantics.

People are the X factor in information systems.

Just a caution to take into account when evaluating information, metadata or semantic systems.

October 17, 2010

IEEE Computer Society Technical Committee on Semantic Computing (TCSEM)

The IEEE Computer Society Technical Committee on Semantic Computing (TCSEM) addresses the derivation and matching of the semantics of computational content to that of naturally expressed user intentions in order to retrieve, manage, manipulate or even create content, where “content” may be anything including video, audio, text, software, hardware, network, process, etc.

The committee is being organized by Phillip C-Y Sheu (UC Irvine), psheu@uci.edu, phone +1 949 824 2660. Volunteers are needed for both organizational and technical committees.

This is a good way to meet people, make a positive contribution, and have a lot of fun.

October 4, 2010

Finding your way in a multi-dimensional semantic space with Luminoso

Filed under: Clustering,Interface Research/Design,Natural Language Processing — Patrick Durusau @ 4:53 am

Finding your way in a multi-dimensional semantic space with Luminoso

Authors: Robert H. Speer, Catherine Havasi, K. Nichole Treadway, Henry Lieberman

Keywords: common sense, n-dimensional visualization, natural language processing, SVD

Abstract:

In AI, we often need to make sense of data that can be measured in many different dimensions — thousands of dimensions or more — especially when this data represents natural language semantics. Dimensionality reduction techniques can make this kind of data more understandable and more powerful, by projecting the data into a space of many fewer dimensions, which are suggested by the computer. Still, frequently, these results require more dimensions than the human mind can grasp at once to represent all the meaningful distinctions in the data.

We present Luminoso, a tool that helps researchers to visualize and understand a multi-dimensional semantic space by exploring it interactively. It also streamlines the process of creating such a space, by inputting text documents and optionally including common-sense background information. This interface is based on the fundamental operation of “grabbing” a point, which simultaneously allows a user to rotate their view using that data point, view associated text and statistics, and compare it to other data points. This also highlights the point’s neighborhood of semantically-associated points, providing clues for reasons as to why the points were classified along the dimensions they were. We show how this interface can be used to discover trends in a text corpus, such as free-text responses to a survey.
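
For readers who want the underlying operation in miniature: SVD-style dimensionality reduction finds the directions along which the data varies most, and the dominant direction can be computed with nothing more than power iteration. The sketch below is self-contained Java, not Luminoso's code, and the toy document-term matrix is invented for illustration.

    import java.util.Random;

    /** Rank-1 SVD by power iteration: the dominant singular
     *  direction of a toy document-term matrix. */
    public class PowerIterationSvd {

        public static void main(String[] args) {
            // Invented document-term counts: rows = documents, columns = terms.
            double[][] a = {
                {2, 1, 0, 0},
                {1, 2, 0, 0},
                {0, 0, 1, 3},
                {0, 0, 2, 1}
            };
            int n = a[0].length;

            // Start from a random unit vector in term space.
            Random rnd = new Random(42);
            double[] v = new double[n];
            for (int j = 0; j < n; j++) v[j] = rnd.nextDouble();
            normalize(v);

            // Power iteration on A^T A converges to the top right
            // singular vector v, with sigma = ||A v||.
            double sigma = 0.0;
            for (int iter = 0; iter < 100; iter++) {
                double[] av = multiply(a, v);              // A v
                sigma = norm(av);
                double[] atav = multiplyTransposed(a, av); // A^T (A v)
                normalize(atav);
                v = atav;
            }

            System.out.printf("sigma ~ %.4f%n", sigma);
            // (A v)_i = sigma * u_i: each document's coordinate on the
            // first latent dimension, scaled by sigma.
            double[] scores = multiply(a, v);
            for (int i = 0; i < scores.length; i++)
                System.out.printf("doc %d -> %.4f%n", i, scores[i]);
        }

        static double[] multiply(double[][] a, double[] x) {
            double[] y = new double[a.length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < x.length; j++)
                    y[i] += a[i][j] * x[j];
            return y;
        }

        static double[] multiplyTransposed(double[][] a, double[] y) {
            double[] x = new double[a[0].length];
            for (int i = 0; i < a.length; i++)
                for (int j = 0; j < x.length; j++)
                    x[j] += a[i][j] * y[i];
            return x;
        }

        static double norm(double[] x) {
            double s = 0.0;
            for (double xi : x) s += xi * xi;
            return Math.sqrt(s);
        }

        static void normalize(double[] x) {
            double nrm = norm(x);
            for (int i = 0; i < x.length; i++) x[i] /= nrm;
        }
    }

Projecting every document onto the top few directions found this way is what makes a thousands-of-dimensions space small enough to explore in an interface like Luminoso's.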

I particularly like the interactive rotation about a data point.

It makes me think of rotating among identifications, or even within complexes of subjects.

I suspect the presentation of “rotation” will be domain specific.

The “geek” graph/node presentation probably isn't the best one for all audiences. It is an open question what might work better.

See: Luminoso (homepage) and Luminoso (GitHub)

September 29, 2010

Natural Language Toolkit

Natural Language Toolkit is a set of Python modules for natural language processing and text analytics. Brought to my attention by Kirk Lowery.

Two near term tasks come to mind:

  • Feature comparison to LingPipe
  • Finding linguistic software useful for topic maps

Suggestions of other toolkits welcome!
