Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 2, 2010

Apache Tika – a content analysis toolkit

Filed under: Authoring Topic Maps,Data Mining,Software — Patrick Durusau @ 7:57 pm

Apache Tika – a content analysis toolkit

From the website:

Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Formats include:

  • HyperText Markup Language
  • XML and derived formats
  • Microsoft Office document format
  • OpenDocument Format
  • Portable Document Format
  • Electronic Publication Format
  • Rich Text Format
  • Compression and packaging formats
  • Text formats
  • Audio formats
  • Image formats
  • Video formats
  • Java class files and archives
  • The mbox format

Sounds like we are getting close to pipelines for topic map production.

Comments?

Silverlight Pivotviewer – Addendum

Filed under: Interface Research/Design,Pivotviewer,Software — Patrick Durusau @ 11:19 am

I was amused to read:

At a high-level, CXML can be thought of as a set of property/value pairings. Facets are like property values on an item, and facet categories are groups of facets. For example: if a collection had a facet category called “U.S. State,” then “Georgia” could be a facet in that category. Depending on authoring choices, these facets may be displayed as filters in the PivotViewer collection experience, or included in the details of an item.

Source: Collection XML Schema

Sounds a lot like the Topic Maps Reference Model.

Or, the game of twenty questions.

That is, the subject you are identifying is broader or narrower depending upon the number of key/value pairs you specify.

The Pivotviewer allows you to go from a very broad subject to a very narrow, even a specific, one by adding on key/value pairs.

Legends enable users to arrive at the same broad or narrow subject, even if they have different key/value pairs for that subject.
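
To make the twenty-questions point concrete, here is a minimal sketch in Python. It is not CXML and not the PivotViewer API; the item data, the `narrow` and `apply_legend` helpers, and the `LEGEND` mapping are all hypothetical names invented for illustration. The "legend" here is only a loose stand-in: a table that rewrites one user's facet category names into another's.

```python
from typing import Dict, List

Item = Dict[str, str]

# Hypothetical collection: each item is a set of facet category -> facet pairs.
ITEMS: List[Item] = [
    {"name": "Atlanta", "U.S. State": "Georgia", "kind": "city"},
    {"name": "Savannah", "U.S. State": "Georgia", "kind": "city"},
    {"name": "Okefenokee", "U.S. State": "Georgia", "kind": "swamp"},
    {"name": "Austin", "U.S. State": "Texas", "kind": "city"},
]


def narrow(items: List[Item], facets: Item) -> List[Item]:
    """Keep only the items that match every key/value pair in `facets`."""
    return [it for it in items
            if all(it.get(key) == value for key, value in facets.items())]


def apply_legend(facets: Item, legend: Dict[str, str]) -> Item:
    """Rewrite a user's own facet category names into the collection's names."""
    return {legend.get(key, key): value for key, value in facets.items()}


# Assumption: a second user says "state" and "type" for the same categories.
LEGEND = {"state": "U.S. State", "type": "kind"}

# Each added key/value pair narrows the subject being identified.
broad = narrow(ITEMS, {"U.S. State": "Georgia"})
narrower = narrow(ITEMS, {"U.S. State": "Georgia", "kind": "city"})
specific = narrow(ITEMS, {"U.S. State": "Georgia", "kind": "city",
                          "name": "Savannah"})

# Different key names, same subject, once the legend is applied.
via_legend = narrow(ITEMS, apply_legend({"state": "Georgia", "type": "city"},
                                        LEGEND))

print(len(broad), len(narrower), len(specific))
print(via_legend == narrower)
```

Each extra pair passed to `narrow` shrinks the result, and the second user arrives at the same two items as the first despite using different key names.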

Hey, that is rather neat and practical isn’t it? (Take note Lars. He knows which one.)

Will have to investigate how to combine collection XML schemas to make that point.

More to follow on this topic (sorry) anon.

Silverlight Pivotviewer – A Turning Point For Visualizing Topic Maps?

Filed under: Interface Research/Design,Pivotviewer,Software — Patrick Durusau @ 9:32 am

Andrew Townley has suggested “Gary Flake: is Pivot a turning point for web exploration?” as an example of the “wow” factor.

From a search on Pivot I found: Silverlight Pivotviewer is no longer experimental.

On the issue of “wow” factor, I have to agree. This is truly awesome.

I am sure there are corner cases and bugs, but I think kudos are due to the developers of Silverlight Pivotviewer.

Now the question is what do “we,” as in the topic maps community, do with this nice shiny tool?

Questions:

  1. What are the factors you would consider for navigation of your topic map? (3-5 pages, no citations)
  2. How would you test your navigation choices? (3-5 pages, no citations)
  3. Demonstration of navigation of your topic map. (class demonstration)

OpenSecrets.org

Filed under: Data Source — Patrick Durusau @ 5:59 am

OpenSecrets.org

From the website:

OpenSecrets.org is your nonpartisan guide to money’s influence on U.S. elections and public policy. Whether you’re a voter, journalist, activist, student or interested citizen, use our free site to shine light on your government. Count cash and make change.

Of particular interest to topic mappers will be their OpenSecrets Developer Tools, which include:

  • APIs — for live mashups
  • OpenData — itemized tables for analysis and recombinations
  • Widgets — with the ease of cut and paste

Offers a number of interesting possibilities.

*****

I wonder if contract data is available that lists who approved contracts or was part of the approval process and the winners of those contracts, both in terms of organizations and individuals?

Seems to me that would be an interesting set of dots to put together. I will ask around.

Suggestions of data sources for other governments welcome!

Questions:

  1. Document sources of political funding for a non-US government.
  2. How would you apply topic maps to the OpenSecrets.org data? (3-5 pages, no citations)
  3. What additional data would you include in your topic map in #2? (3-5 pages, list sources of other data)

December 1, 2010

*Sparsity technologies – vendor

Filed under: Graphs,NoSQL — Patrick Durusau @ 2:43 pm

*Sparsity technologies

I encountered this site on a NoSQL database mailing list.

Of particular interest is their *dex graph database.

From the website:

A DEX graph is a Labeled Directed Attributed Multigraph. Labeled because nodes and edges in a graph belong to types. Directed because it supports directed edges as well as undirected. An attributed graph allows a variable list of attributes for each node and edge, where an attribute is a value associated to a name, simplifying the graph structure. A multigraph allows multiple edges between two nodes. This means that two nodes can be connected several times by different edges, even if two edges have the same tail, head and label.
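
For readers who want to see the four adjectives in that definition side by side, here is a small Python sketch of a labeled, directed, attributed multigraph. It is not the DEX API, which I have not looked at; the `Node`, `Edge` and `MultiGraph` names and the sample data are my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_type: str                                        # labeled: nodes have a type
    attrs: Dict[str, Any] = field(default_factory=dict)   # attributed


@dataclass
class Edge:
    edge_type: str                                        # labeled: edges have a type
    tail: int
    head: int
    directed: bool = True                                 # directed or undirected
    attrs: Dict[str, Any] = field(default_factory=dict)   # attributed


class MultiGraph:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.edges: List[Edge] = []   # a list, so parallel edges are allowed

    def add_node(self, node_type: str, **attrs: Any) -> int:
        self.nodes.append(Node(node_type, attrs))
        return len(self.nodes) - 1

    def add_edge(self, edge_type: str, tail: int, head: int,
                 directed: bool = True, **attrs: Any) -> int:
        self.edges.append(Edge(edge_type, tail, head, directed, attrs))
        return len(self.edges) - 1

    def edges_between(self, a: int, b: int) -> List[Edge]:
        """Edges usable from a to b; undirected edges count in both directions."""
        return [e for e in self.edges
                if (e.tail == a and e.head == b)
                or (not e.directed and e.tail == b and e.head == a)]


g = MultiGraph()
alice = g.add_node("person", name="Alice")
bob = g.add_node("person", name="Bob")
g.add_edge("emailed", alice, bob, year=2009)
g.add_edge("emailed", alice, bob, year=2010)   # same tail, head and label
g.add_edge("knows", bob, alice, directed=False)

print(len(g.edges_between(alice, bob)))
print(len(g.edges_between(bob, alice)))
```

Storing edges in a list rather than keying them by (tail, head, label) is what makes this a multigraph: the two "emailed" edges stay distinct and carry their own attributes.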

There is a free non-commercial use version that allows up to a million nodes and unlimited edges.

I haven’t looked at it yet nor do I have any relationship with the company. I mention it as an FYI item for the moment.

I will be suggesting to them that topic maps would allow them to be a little more specific than: Allows to bring together content from multiple sources.

Hidden Video Courses in Math, Science and Engineering

Filed under: CS Lectures — Patrick Durusau @ 2:08 pm

Hidden Video Courses in Math, Science and Engineering

Pete Skomoroch compiled this list of video lectures, some of which are not easy to find, including lectures by Knuth on the internals of TeX.

A number of the CS lectures are on topics relevant to topic maps.

The holiday season is coming on and you won’t be able to stay at your keyboard/terminal with relatives in the house.

You can always listen to a CS/Math lecture or two on your cellphone.

Your distracted state will convince everyone that you are really concentrating on the game on TV. 😉

Campaign Finance API (US-centric)

Filed under: Data Source — Patrick Durusau @ 1:48 pm

Campaign Finance API (US-centric)

The New York Times sponsors an API that accesses United States Federal Election Commission filings. Requires registration but is otherwise free. There are some limits on queries, etc.

I mention it because topic map applications that “tag” (in another sense of the word) candidates with particular contributions and legislation need to start sooner rather than later.

The 2012 election cycle (US) will be here sooner than you expect.

BTW, similar data sources for other countries would be good to bring to the attention of the topic mapping community.

Semantic Ambiguity and Perceived Ambiguity

Filed under: Ambiguity — Patrick Durusau @ 1:14 pm

Semantic Ambiguity and Perceived Ambiguity by Massimo Poesio.

Abstract:

I explore some of the issues that arise when trying to establish a connection between the underspecification hypothesis pursued in the NLP literature and work on ambiguity in semantics and in the psychological literature. A theory of underspecification is developed `from the first principles’, i.e., starting from a definition of what it means for a sentence to be semantically ambiguous and from what we know about the way humans deal with ambiguity. An underspecified language is specified as the translation language of a grammar covering sentences that display three classes of semantic ambiguity: lexical ambiguity, scopal ambiguity, and referential ambiguity. The expressions of this language denote sets of senses. A formalization of defeasible reasoning with underspecified representations is presented, based on Default Logic. Some issues to be confronted by such a formalization are discussed.

Practice is grounded on actual experience (“the burnt hand learns best”) and on understanding the nature of the task and applying that understanding. Neither is really complete without the other.

Poesio’s paper makes for good mental exercise and hopefully deeper insight into the difficulties that surround ambiguity and its reduction.

Semantic Overlay Networks for P2P Systems

Filed under: Semantic Overlay Network,TMQL — Patrick Durusau @ 1:06 pm

Semantic Overlay Networks for P2P Systems

Authors: Garcia-Molina, Hector and Crespo, Arturo

Date: 2003

Abstract:

In a peer-to-peer (P2P) system, nodes typically connect to a small set of random nodes (their neighbors), and queries are propagated along these connections. Such query flooding tends to be very expensive. We propose that node connections be influenced by content, so that for example, nodes having many “Jazz” files will connect to other similar nodes. Thus, semantically related nodes form a Semantic Overlay Network (SON). Queries are routed to the appropriate SONs, increasing the chances that matching files will be found quickly, and reducing the search load on nodes that have unrelated content. We have evaluated SONs by using an actual snapshot of music-sharing clients. Our results show that SONs can significantly improve query performance while at the same time allowing users to decide what content to put in their computers and to whom to connect.

This is the root article for the term Semantic Overlay Network, which I mentioned last summer in Semantic Overlay Networks.

The emphasis on query and query efficiency seems particularly relevant for work on TMQL.
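
The routing idea in the abstract can be sketched in a few lines of Python. This is a toy illustration, not the authors' algorithm: the peer data, the `build_sons` and `route` helpers, and the `min_files` threshold for joining a SON are all assumptions of mine.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical peers and the categories of the files they hold.
PEERS: Dict[str, List[str]] = {
    "p1": ["Jazz", "Jazz", "Jazz", "Rock"],
    "p2": ["Jazz", "Jazz", "Blues"],
    "p3": ["Rock", "Rock", "Rock"],
    "p4": ["Blues", "Blues", "Jazz"],
    "p5": ["Rock", "Rock", "Blues"],
}


def build_sons(peers: Dict[str, List[str]],
               min_files: int = 2) -> Dict[str, Set[str]]:
    """Put a peer in the SON of each category where it holds >= min_files files."""
    sons: Dict[str, Set[str]] = defaultdict(set)
    for peer, files in peers.items():
        for category in set(files):
            if files.count(category) >= min_files:
                sons[category].add(peer)
    return dict(sons)


def route(category: str, sons: Dict[str, Set[str]],
          peers: Dict[str, List[str]]) -> Set[str]:
    """Peers to query: the matching SON if one exists, otherwise flood everyone."""
    return sons.get(category, set(peers))


sons = build_sons(PEERS)
contacted = route("Jazz", sons, PEERS)
print(sorted(contacted), "contacted out of", len(PEERS), "peers")
```

Note the trade-off the threshold introduces: p4 holds one Jazz file but is not in the Jazz SON, so a Jazz query never reaches it. Fewer peers are bothered per query, at the cost of possibly missing a match.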
