Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 6, 2014

Why the Feds (U.S.) Need Topic Maps

Filed under: Data Mining, Project Management, Relevance, Text Mining — Patrick Durusau @ 7:29 pm

Earlier today I saw this offer to “license” technology for commercial development:

ORNL’s Piranha & Raptor Text Mining Technology

From the post:

UT-Battelle, LLC, acting under its Prime Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy (DOE) for the management and operation of the Oak Ridge National Laboratory (ORNL), is seeking a commercialization partner for the Piranha/Raptor text mining technologies. The ORNL Technology Transfer Office will accept licensing applications through January 31, 2014.

ORNL’s Piranha and Raptor text mining technology solves the challenge most users face: finding a way to sift through large amounts of data that provide accurate and relevant information. This requires software that can quickly filter, relate, and show documents and relationships. Piranha is JavaScript search, analysis, storage, and retrieval software for uncertain, vague, or complex information retrieval from multiple sources such as the Internet. With the Piranha suite, researchers have pioneered an agent approach to text analysis that uses a large number of agents distributed over very large computer clusters. Piranha is faster than conventional software and provides the capability to cluster massive amounts of textual information relatively quickly due to the scalability of the agent architecture.

While computers can analyze massive amounts of data, the sheer volume of data makes the most promising approaches impractical. Piranha works on hundreds of raw data formats, and can process data extremely fast, on typical computers. The technology enables advanced textual analysis to be accomplished with unprecedented accuracy on very large and dynamic data. For data already acquired, this design allows discovery of new opportunities or new areas of concern. Piranha has been vetted in the scientific community as well as in a number of real-world applications.

The Raptor technology enables Piranha to run on SharePoint and MS SQL servers and can also operate as a filter for Piranha to make processing more efficient for larger volumes of text. The Raptor technology uses a set of documents as seed documents to recommend documents of interest from a large, target set of documents. The computer code provides results that show the recommended documents with the highest similarity to the seed documents.

Gee, that sounds so very hard. Using seed documents to recommend documents "…from a large, target set of documents"?
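To make the point concrete, here is a minimal sketch of that kind of seed-document recommendation using nothing more exotic than scikit-learn's stock TF-IDF vectorizer and cosine similarity. This is not the ORNL Raptor code, and the documents are invented for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Seed documents describe what we are looking for (hypothetical examples).
seed_docs = [
    "text mining of large document collections",
    "clustering and retrieval of scientific text",
]

# Target documents stand in for the large set we want recommendations from.
target_docs = [
    "agent-based text analysis on compute clusters",
    "quarterly budget figures for fiscal year 2014",
    "information retrieval from heterogeneous sources",
]

# Put seeds and targets in the same TF-IDF vector space.
vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform(seed_docs + target_docs)
seed_vecs = vectors[: len(seed_docs)]
target_vecs = vectors[len(seed_docs):]

# Score each target by its best similarity to any seed, highest first.
scores = cosine_similarity(target_vecs, seed_vecs).max(axis=1)
for score, doc in sorted(zip(scores, target_docs), reverse=True):
    print(f"{score:.3f}  {doc}")

A production system would add better tokenization, filtering, and scale-out, but the core idea fits in twenty-odd lines of commodity code.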

There are many ways to do that, but a search for "Latent Dirichlet Allocation" restricted to ".gov" domains alone returns some 14,000 "hits."
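And LDA itself is commodity software at this point. A purely illustrative sketch, using scikit-learn's stock implementation on a toy corpus (not any agency's code):

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# A toy corpus; real use would feed thousands of documents.
docs = [
    "text mining of large document collections",
    "budget figures and procurement contracts",
    "retrieval and clustering of scientific text",
    "fiscal year contract awards and agency spending",
]

counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))  # per-document topic proportions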

If you were paying for search technology to be developed, how many times would you pay to develop the same technology?

Just curious.

To have a sensible technology development process, the government needs a topic map to track its development efforts. Not only to track them, but to prevent duplicate development.

Imagine if every web project had to develop its own HTTP server, instead of the vast majority of them using Apache HTTPD.

With a common server base, a community has developed to maintain and extend that base product. That can’t happen where the same technology is contracted for over and over again.

Suggestions on what might be an incentive for the Feds to change their acquisition processes?
