Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 10, 2011

The Language Problem: Jaguars & The Turing Test

Filed under: Ambiguity,Authoring Topic Maps,Indexing,Language — Patrick Durusau @ 6:10 pm

The Language Problem: Jaguars & The Turing Test by Gord Hotchkiss.

The post begins innocently enough:

“I love Jaguars!”

When I ask you to understand that sentence, I’m requiring you to take on a pretty significant undertaking, although you do it hundreds of times each day without really thinking about it.

The problem comes with the ambiguity of words.

If you appreciate discussions of language, meaning and the shortfalls of our computing companions, you will really like this article and the promised follow-up posts.

It also brings into sharp relief the issues that topic map authors (or indexers) face when trying to specify a subject that will be recognized and used by N unknown users.

I suppose that is really the tricky part, or at least part of it: the communication channel for an index or topic map runs only one way. There is no opportunity for the author to correct a reading or mis-reading. All of that lies with the user/reader alone.

July 14, 2011

Computer learns language by playing games

Filed under: Artificial Intelligence,Language — Patrick Durusau @ 4:12 pm

Computer learns language by playing games

From the post:

Computers are great at treating words as data: Word-processing programs let you rearrange and format text however you like, and search engines can quickly find a word anywhere on the Web. But what would it mean for a computer to actually understand the meaning of a sentence written in ordinary English — or French, or Urdu, or Mandarin?

One test might be whether the computer could analyze and follow a set of instructions for an unfamiliar task. And indeed, in the last few years, researchers at MIT’s Computer Science and Artificial Intelligence Lab have begun designing machine-learning systems that do exactly that, with surprisingly good results.

The original paper, Learning to Win by Reading Manuals in a Monte-Carlo Framework by S.R.K. Branavan, David Silver, and Regina Barzilay, reports:

Abstract:

This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games. Our ultimate goal is to enrich a stochastic player with high level guidance expressed in text. Our model jointly learns to identify text that is relevant to a given game state in addition to learning game strategies guided by the selected text. Our method operates in the Monte-Carlo search framework, and learns both text analysis and game strategies based only on environment feedback. We apply our approach to the complex strategy game Civilization II using the official game manual as the text guide. Our results show that a linguistically-informed game-playing agent significantly outperforms its language-unaware counterpart, yielding a 27% absolute improvement and winning over 78% of games when playing against the built-in AI of Civilization II.
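The core loop the abstract describes, Monte-Carlo rollouts biased by text-derived guidance, can be caricatured in a few lines. This is a toy sketch only: the game, the payoffs, the manual text, and the keyword matching are all invented for illustration, and the paper *learns* text relevance rather than hard-coding keywords.

```python
import random

random.seed(0)

MANUAL = "build a city early and research pottery"

def rollout_value(action, trials=2000):
    # Stand-in for simulating the game forward from this action
    # and averaging the win/loss outcomes of the rollouts.
    payoff = {"build_city": 0.6, "research": 0.5, "attack": 0.4}
    return sum(random.random() < payoff[action] for _ in range(trials)) / trials

def text_bonus(action, manual=MANUAL):
    # Crude relevance test: does a keyword for the action occur in the manual?
    keywords = {"build_city": "city", "research": "research", "attack": "attack"}
    return 0.1 if keywords[action] in manual else 0.0

def choose_action(actions):
    # Combine simulated value with textual guidance.
    return max(actions, key=lambda a: rollout_value(a) + text_bonus(a))

print(choose_action(["build_city", "research", "attack"]))
```

Even this crude version shows the shape of the result: an action the manual endorses gets a nudge over one it ignores.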

Deeply interesting work, particularly as assistive authoring for topic maps.

Think about the number of recipes, manuals, IETMs (Interactive Electronic Technical Manuals), etc., that are already in electronic format. Identifying common steps, despite differences in description, could be quite useful.

Come to think of it, most regulations and laws are written that way. Imagine the difference between a strictly textual search of legal resources and a semantically aware one. Not today or tomorrow, but at the current rate of progress, that sort of killer app may not be far off.

Do you want to be selling it or buying it?


Sci-fi fans take note:

Games are won by gaining control of the entire world map.

July 8, 2011

Languages of the World (Wide Web)
(Google Research Blog)

Filed under: Language — Patrick Durusau @ 3:54 pm

Languages of the World (Wide Web)

Interesting post about linking between sites in different languages, albeit using somewhat outdated (2008) data. The authors allude to more recent data but give no specifics.

I mention it here as an example of treating different subjects (the websites in particular languages) as collective subjects for the purpose of examining links (associations, in topic map speak) between those collective subjects.

Or as described by the authors:

To see the connections between languages, start by taking the several billion most important pages on the web in 2008, including all pages in smaller languages, and look at the off-site links between these pages. The particular choice of pages in our corpus here reflects decisions about what is "important". For example, in a language with few pages every page is considered important, while for languages with more pages some selection method is required, based on pagerank for example.

We can use our corpus to draw a very simple graph of the web, with a node for each language and an edge between two languages if more than one percent of the offsite links in the first language land on pages in the second. To make things a little clearer, we only show the languages which have at least a hundred thousand pages and have a strong link with another language, meaning at least 1% of off-site links go to that language. We also leave out English, which we’ll discuss more in a moment. (Figure 1)
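The graph construction the authors describe, a node per language and a directed edge when at least 1% of a language's off-site links land in another language, is easy to sketch. The link counts below are invented for illustration, and the sketch skips the authors' other filters (the 100,000-page minimum and the exclusion of English).

```python
# Hypothetical off-site link counts: source language -> {target language: count}
off_site_links = {
    "fr": {"en": 500, "de": 30, "ar": 2},
    "de": {"en": 400, "fr": 50},
    "ar": {"en": 90, "fr": 15},
}

def language_graph(links, threshold=0.01):
    # Emit edge (src, dst) when dst receives >= threshold of src's off-site links.
    edges = []
    for src, targets in links.items():
        total = sum(targets.values())
        for dst, n in targets.items():
            if n / total >= threshold:
                edges.append((src, dst))
    return edges

print(language_graph(off_site_links))
```

With these made-up counts, French links to Arabic fall under the 1% threshold, so no fr → ar edge appears, while every other pair clears it.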

Being able to decompose the collective subjects to reveal numbers for particular sites, or for sites in particular locations, would have made this study more compelling.

June 12, 2011

A Few Subjects Go A Long Way

Filed under: Data Analysis,Language,Linguistics,Text Analytics — Patrick Durusau @ 4:11 pm

A post by Rich Cooper (Rich AT EnglishLogicKernel DOT com) Analyzing Patent Claims demonstrates the power of small vocabularies (sets of subjects) for the analysis of patent claims.

It is a reminder that a topic map author need not identify every possible subject, but only so many of those as necessary. Other subjects abound and await other authors who wish to formally recognize them.

It is also a reminder that a topic map need only be as complex or as complete as a particular task requires. My topic map may not be useful for Mongolian herdsmen or even the local bank. But the test isn't abstract, it's practical: does it meet the needs of its intended audience?

May 19, 2011

Kill Math

Filed under: Interface Research/Design,Language,Natural Language Processing,Semantics — Patrick Durusau @ 3:27 pm

Kill Math

Bret Victor writes:

The power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols.

When most people speak of Math, what they have in mind is more its mechanism than its essence. This “Math” consists of assigning meaning to a set of symbols, blindly shuffling around these symbols according to arcane rules, and then interpreting a meaning from the shuffled result. The process is not unlike casting lots.

This mechanism of math evolved for a reason: it was the most efficient means of modeling quantitative systems given the constraints of pencil and paper. Unfortunately, most people are not comfortable with bundling up meaning into abstract symbols and making them dance. Thus, the power of math beyond arithmetic is generally reserved for a clergy of scientists and engineers (many of whom struggle with symbolic abstractions more than they’ll actually admit).

We are no longer constrained by pencil and paper. The symbolic shuffle should no longer be taken for granted as the fundamental mechanism for understanding quantity and change. Math needs a new interface.

A deeply interesting post that argues that Math needs a new interface, one more accessible to more people.

After all, computers can present mathematical concepts and operations in visual representations.

It is ironic that the same computers gave rise to impoverished and, for most people, difficult-to-use representations of semantics, moving away from the widely adopted, easy-to-use and flexible representations of semantics found in natural languages.

Do we need an old interface for semantics?

May 16, 2011

Can robots create their own language?

Filed under: Artificial Intelligence,Language — Patrick Durusau @ 3:25 pm

Can robots create their own language?

Another Luc Steels item from Robert Cerny:

Note

An overview of Luc Steels' work, including video demonstrations of robots playing the Naming Game.

Quote

What do we need to put in [the robots] so that they would self-organize a symbolic communication system?

A video of robots playing the Naming Game.

The thought occurs to me: what if we had a video of people playing the naming game?

Or better yet, the subject identification game?

Or better still, the "two instances are the same subject" game?

Or are all three of those the same game?

The Recruitment Theory of Language Origins

Filed under: Artificial Intelligence,Language — Patrick Durusau @ 3:20 pm

The Recruitment Theory of Language Origins

An entry by Robert Cerny on Luc Steels' language-acquisition research:

Note

Section 5, titled “The Naming Challenge”, describes a game in the field of robotics where agents need to find a way to communicate about a set of objects. This game is known as the “Naming Game”. It is interesting to look at these insights with a Topic Mappish mindset. It also confirms my point that subject descriptions decay in space and time.


Quote

Clearly every human language has a way to name individual objects or more generally categories to identify classes of objects. Computer simulations have already been carried out to determine what strategy for tackling this naming challenge could have become innate through natural selection or how a shared lexicon could arise through a simulation of genetic evolution.


The recruitment theory argues instead that each agent should autonomously discover strategies that allow him to successfully build up and negotiate a shared lexicon in peer-to-peer interaction and that the emerging lexicon is a temporal cultural consensus which is culturally transmitted.

Follow the link at Robert's post to read Steels' paper in full. It's important.
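The Naming Game Steels describes can be simulated in a few lines. This is a toy sketch under invented parameters (population size, rounds, the single shared object), not his actual model: agents negotiate a name for one object purely through repeated pairwise games, inventing names when they have none and collapsing their inventories on success.

```python
import random

random.seed(42)

def naming_game(n_agents=20, rounds=5000):
    vocab = [set() for _ in range(n_agents)]          # each agent's candidate names
    for _ in range(rounds):
        speaker, hearer = random.sample(range(n_agents), 2)
        if not vocab[speaker]:
            vocab[speaker].add("w%06d" % random.randrange(10**6))  # invent a name
        word = random.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:
            vocab[speaker] = {word}                   # success: both collapse to it
            vocab[hearer] = {word}
        else:
            vocab[hearer].add(word)                   # failure: hearer learns it
    return vocab

final = naming_game()
# After enough rounds the population should share a single name.
print(len({frozenset(v) for v in final}))
```

The point of the simulation is exactly the "temporal cultural consensus" of the quote: no name is innate or imposed, yet the population converges on one.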

April 11, 2011

Probabilistic Models in the Study of Language

Filed under: Language,Probabilistic Models — Patrick Durusau @ 5:41 am

Probabilistic Models in the Study of Language

From the website:

I’m in the process of writing a textbook on the topic of using probabilistic models in scientific work on language ranging from experimental data analysis to corpus work to cognitive modeling. A current (partial) draft is available here. The intended audience is graduate students in linguistics, psychology, cognitive science, and computer science who are interested in using probabilistic models to study language. Feedback (both comments on existing drafts, and expressed desires for additional material to include!) is more than welcome — send it to rlevy@ucsd.edu.

Just scanning the chapter titles, this looks like a useful work for anyone concerned with language issues.
