Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 27, 2011

Flapjax

Filed under: Software,Web Applications — Patrick Durusau @ 2:01 pm

Flapjax

From the website:

Flapjax is a new programming language designed around the demands of modern, client-based Web applications. Its principal features include:

  • Event-driven, reactive evaluation
  • An event-stream abstraction for communicating with web services
  • Interfaces to external web services

Flapjax is easy to learn: it is just a JavaScript framework. Furthermore, because Flapjax is built entirely atop JavaScript, it runs on traditional Web browsers without the need for plug-ins or other downloads. It integrates seamlessly with existing JavaScript code and other frameworks.
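
Flapjax itself is JavaScript, so nothing below is Flapjax code. But the core idea, event streams whose derived values update automatically as new events arrive, can be sketched in a few lines of Python (all names here are my own inventions):

    # A toy event stream: derived streams recompute whenever a value arrives.
    class Stream:
        def __init__(self):
            self._subscribers = []

        def subscribe(self, fn):
            self._subscribers.append(fn)

        def push(self, value):
            for fn in self._subscribers:
                fn(value)

        def map(self, fn):
            # Derived stream: fires fn(value) for every upstream value.
            out = Stream()
            self.subscribe(lambda v: out.push(fn(v)))
            return out

    clicks = Stream()                # stand-in for, say, mouse clicks
    labels = clicks.map(lambda n: "clicked %d times" % n)
    labels.subscribe(print)

    for i in range(1, 4):
        clicks.push(i)               # prints "clicked 1 times", etc.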

Don’t know if anyone will find this useful but some of the demos looked interesting.

Thought it would be worth mentioning for anyone looking to build client-based topic map applications.

Think Outside the (Comment) Box

Filed under: Database,Semantics,Software — Patrick Durusau @ 8:36 am

Think Outside the (Comment) Box: Social Applications for Publishers

From the announcement:

Learn about the next generation of social applications and how publishers are leveraging them for editorial and financial benefit.

I will spare you the rest of the breathless language.

Still, I will be there and suggest you be there as well.

Norm Walsh, who needs no introduction in markup circles, works at MarkLogic.

That gives me confidence this may be worth hearing.

Details:

February 9, 2011 – 8:00 am Pacific / 11:00 am Eastern / 4:00 pm GMT

*****
PS: For anyone who has been under a rock for the last several years, MarkLogic makes an excellent XML database solution.

See, for example, MarkMail, a collection of technical mailing lists from around the web.

Searching it also illustrates how much room there is for semantic improvement in search.

January 26, 2011

A Quick WebApp with Scala, MongoDB, Scalatra and Casbah – Practice for TMs

Filed under: MongoDB,Scala,Software,Topic Maps — Patrick Durusau @ 8:41 am

A Quick WebApp with Scala, MongoDB, Scalatra and Casbah

However clever, topic maps aren’t of much interest unless they are delivered to users.

In the general case that means a web based application.

This post is a short introduction to several tools you may find handy for building and/or delivering topic maps.
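
Casbah is MongoDB's Scala driver; to keep the code on this blog in one language, here is the same sort of round trip sketched with modern PyMongo. The collection name and document shape are my inventions, not anything from the linked post:

    # Store and retrieve a topic-like document in MongoDB.
    # Assumes a mongod running locally; the schema is invented for illustration.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    topics = client["tm_demo"]["topics"]

    topics.insert_one({
        "subject_identifier": "http://example.org/subject/scalatra",
        "names": ["Scalatra"],
        "occurrences": [{"type": "homepage", "value": "http://www.scalatra.org/"}],
    })

    doc = topics.find_one(
        {"subject_identifier": "http://example.org/subject/scalatra"})
    print(doc["names"])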

*****
PS: We will know topic maps have arrived when, even as the technology keeps changing, management of subject identity is inherent in both programming languages and application design. We have a ways to go yet.

January 25, 2011

COIN-OR

Filed under: Graphs,Software — Patrick Durusau @ 10:35 am

COmputational INfrastructure for Operations Research (COIN-OR)

From the website:

The Computational Infrastructure for Operations Research (COIN-OR, or simply COIN) project is an initiative to spur the development of open-source software for the operations research community.

Check the related resource page for a number of graph and other software packages.
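
One of those packages, the CBC solver, is what the Python PuLP library calls by default, which makes for a quick way to try COIN-OR without leaving Python. A toy linear program as a sketch (the model itself is invented):

    # A toy linear program solved with COIN-OR's CBC solver via PuLP.
    from pulp import LpMaximize, LpProblem, LpVariable, value

    prob = LpProblem("toy", LpMaximize)
    x = LpVariable("x", lowBound=0)
    y = LpVariable("y", lowBound=0)

    prob += x + 2 * y        # objective: maximize x + 2y
    prob += x + y <= 10      # constraint
    prob += x <= 6           # constraint

    prob.solve()             # PuLP's default backend is COIN-OR's CBC
    print(value(x), value(y))   # expect x = 0, y = 10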

January 21, 2011

GraphDB-Bench

Filed under: Graphs,Software — Patrick Durusau @ 4:11 pm

GraphDB-Bench

From the website:

GraphDB-Bench is an extensible graph database benchmarking tool. Its goal is to provide an easy-to-use library for defining and running application/domain-specific benchmarks against different graph database implementations. To achieve this the core code-base has been kept relatively simple, through extensive use of lower layers in the TinkerPop stack.

Should be useful for experimenting with different graph database implementations, as well as for generating test graphs with different topologies.
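
GraphDB-Bench itself lives in the Java/TinkerPop world, but as a rough illustration of what "test graphs with different topologies" means, here is how comparable synthetic graphs can be generated in Python with NetworkX (which turns up again later in these posts):

    # Three classic synthetic topologies often used as benchmark inputs.
    import networkx as nx

    random_g    = nx.erdos_renyi_graph(n=1000, p=0.01)         # uniform random
    scale_free  = nx.barabasi_albert_graph(n=1000, m=3)        # power-law degrees
    small_world = nx.watts_strogatz_graph(n=1000, k=6, p=0.1)  # clustered, with shortcuts

    for g in (random_g, scale_free, small_world):
        print(g.number_of_nodes(), g.number_of_edges())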

GRAPHITE: A Visual Query System for Large Graphs

Filed under: Graphs,Software,Visual Query Language,Visualization — Patrick Durusau @ 6:45 am

GRAPHITE: A Visual Query System for Large Graphs

Watch the video, then imagine not having to convert from one data model to another but being able to treat aspects of data models as subjects.

Then every user could query the graph using their own data model, yet retrieve information entered by others using different data models.
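
A crude sketch of that idea in Python: map each user's vocabulary onto shared subject identifiers, so a query phrased in one data model retrieves what was entered under another. Every name below is invented for illustration:

    # Two users' vocabularies mapped onto one shared subject identifier.
    SUBJECTS = {
        ("alice", "employee"):     "http://example.org/subject/person",
        ("bob",   "staff_member"): "http://example.org/subject/person",
    }

    DATA = {"http://example.org/subject/person": ["Durusau, Patrick"]}

    def query(user, term):
        subject = SUBJECTS.get((user, term))
        return DATA.get(subject, [])

    print(query("alice", "employee"))     # ['Durusau, Patrick']
    print(query("bob", "staff_member"))   # same data, different vocabulary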

For a more formal treatment, see: GRAPHITE: A Visual Query System for Large Graphs.

January 19, 2011

Scrapy

Filed under: Data Mining,Searching,Software — Patrick Durusau @ 1:34 pm

Scrapy

From the website:

Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Another tool to assist with data gathering for topic map authoring.
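
For the curious, a complete Scrapy spider is only a few lines. This sketch (the URL and selectors are placeholders) collects the links from a page; run it with "scrapy runspider spider.py -o links.json":

    # Minimal Scrapy spider: crawl a page and emit structured records.
    import scrapy

    class LinkSpider(scrapy.Spider):
        name = "links"
        start_urls = ["https://example.org/"]   # placeholder URL

        def parse(self, response):
            for href in response.css("a::attr(href)").getall():
                yield {"link": response.urljoin(href)}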

January 9, 2011

AutoMap – Extracting Topic Maps from Texts?

Filed under: Authoring Topic Maps,Entity Extraction,Networks,Semantics,Software — Patrick Durusau @ 10:59 am

AutoMap: Extract, Analyze and Represent Relational Data from Texts (according to its webpage).

From the webpage:

AutoMap is a text mining tool that enables the extraction of network data from texts. AutoMap can extract content analytic data (words and frequencies), semantic networks, and meta-networks from unstructured texts developed by CASOS at Carnegie Mellon. Pre-processors for handling PDFs and other text formats exist. Post-processors for linking to gazetteers and belief inference also exist. The main functions of AutoMap are to extract, analyze, and compare texts in terms of concepts, themes, sentiment, semantic networks and the meta-networks extracted from the texts. AutoMap exports data in DyNetML and can be used interoperably with *ORA.

AutoMap uses parts of speech tagging and proximity analysis to do computer-assisted Network Text Analysis (NTA). NTA encodes the links among words in a text and constructs a network of the linked words.

AutoMap subsumes classical Content Analysis by analyzing the existence, frequencies, and covariance of terms and themes.
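
AutoMap is a GUI tool, but the proximity idea is easy to sketch: link words that appear within a small window of one another. A rough Python version using NetworkX (window size and tokenization are my own choices):

    # Proximity-based network text analysis: connect words that co-occur
    # within a sliding window, in the spirit of AutoMap's NTA.
    import networkx as nx

    def text_network(text, window=3):
        words = text.lower().split()
        g = nx.Graph()
        for i, w in enumerate(words):
            for other in words[i + 1:i + window]:
                if other != w:
                    g.add_edge(w, other)
        return g

    g = text_network("topic maps help provide a context for understanding texts")
    print(g.number_of_nodes(), g.number_of_edges())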

For a rough cut at a topic map from a text, AutoMap looks like a useful tool.

In addition to the software, training material and other information are available.

My primary interest is the application of such a tool to legislative debates, legislation and court decisions.

None of those occur in a vacuum, and topic maps could help provide a context for understanding such material.

ORA – Topic Maps as Networks?

Filed under: Networks,Software — Patrick Durusau @ 10:28 am

ORA (Organization Risk Analyzer) is a toolkit developed for the analysis of organizational networks that could prove to be very useful for topic maps when viewed as networks.

From the website:

*ORA is a dynamic meta-network assessment and analysis tool developed by CASOS at Carnegie Mellon. It contains hundreds of social network, dynamic network metrics, trail metrics, procedures for grouping nodes, identifying local patterns, comparing and contrasting networks, groups, and individuals from a dynamic meta-network perspective. *ORA has been used to examine how networks change through space and time, contains procedures for moving back and forth between trail data (e.g. who was where when) and network data (who is connected to whom, who is connected to where …), and has a variety of geo-spatial network metrics, and change detection techniques. *ORA can handle multi-mode, multi-plex, multi-level networks. It can identify key players, groups and vulnerabilities, model network changes over time, and perform COA analysis. It has been tested with large networks (10^6 nodes per 5 entity classes). Distance based, algorithmic, and statistical procedures for comparing and contrasting networks are part of this toolkit.

Comments on which parts of this toolkit you find the most useful are welcome.

January 8, 2011

NetworkX

Filed under: Graphs,Maps,Networks,Software — Patrick Durusau @ 11:21 am

NetworkX

From the website:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
….
Features:

  • Standard graph-theoretic and statistical physics functions
  • Easy exchange of network algorithms between applications,
    disciplines, and platforms
  • Many classic graphs and synthetic networks
  • Nodes and edges can be "anything"
    (e.g. time-series, text, images, XML records)
  • Exploits existing code from high-quality legacy software in C,
    C++, Fortran, etc.
  • Open source (encourages community input)
  • Unit-tested

NetworkX is a nice way to display topic maps as graphs.

Its importance for topic maps lies in the ability to study properties of nodes (representatives of subjects, including relationships) and composition of nodes (merging in topic map speak).
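
For instance, when two nodes turn out to represent the same subject, NetworkX can collapse them while keeping their combined edges, a rough analogue of topic map merging. A minimal sketch (the node names are invented):

    # Merging two nodes that represent the same subject.
    import networkx as nx

    g = nx.Graph()
    g.add_edge("P. Durusau", "topic maps")    # one source's name for the subject
    g.add_edge("Patrick Durusau", "ODF")      # another source's name

    # Treat both names as one subject: contract the second node into the first.
    merged = nx.contracted_nodes(g, "Patrick Durusau", "P. Durusau",
                                 self_loops=False)
    print(sorted(merged.edges()))   # both edges now attach to "Patrick Durusau"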

January 7, 2011

Apache OODT – Top Level Project

Filed under: Data Integration,Data Mining,Data Models,OODT,Software — Patrick Durusau @ 6:02 am

Apache OODT is the first NASA-developed software to achieve ASF Top Level Project status.

From the website:

Just what is Apache™ OODT?

It’s metadata for middleware (and vice versa):

  • Transparent access to distributed resources
  • Data discovery and query optimization
  • Distributed processing and virtual archives

But it’s not just for science! It’s also a software architecture:

  • Models for information representation
  • Solutions to knowledge capture problems
  • Unification of technology, data, and metadata

Looks like a project that could benefit from having topic maps as part of its tool kit.

Check out the 0.1 OODT release and see what you think.

January 6, 2011

Moving Forward – Library Project Blog

Filed under: Interface Research/Design,Library,Library software,Software,Solr — Patrick Durusau @ 8:30 am

Moving Forward is a blog I discovered via all things cataloged.

From the Forward blog:

Forward is a Resource Discovery experiment that builds a unified search interface for library data.

Today Forward is 100% of the UW System Library catalogs and two UW digital collections. The project also experiments with additional search contextualization by using web service APIs.

Forward can be accessed at the URL:
http://forward.library.wisconsin.edu/.

Sounds like a great opportunity for topic map fans with an interest in library interfaces to make a contribution.

January 5, 2011

Bribing Statistics

Filed under: Data Source,Marketing,Software — Patrick Durusau @ 1:03 pm

Bribing Statistics by Aleks Jakulin.

Self-reporting of bribery (“I Paid a Bribe” is the name of the application) is uncommon in the United States, at least when it is characterized as a bribe.

There are campaign finance reports and analyses that link organizations/causes to particular candidates. Not surprisingly, candidates vote in line with their major sources of funding.

The reason I mention it here is to suggest that topic maps could be used to provide a more granular mapping between contributions, office holders (or agency staff) and beneficiaries of legislation or contracts.

None of those things exist in isolation or without identity.

While one researcher might only be interested in DARPA contracts (to use a U.S.-based example), their contract officers, and the beneficiaries of those contracts, another researcher may be collecting data on campaign contributions that include some of those same beneficiaries.

Topic maps are a great way to accumulate that sort of research over time.

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Filed under: Computation,Parallelism,Software — Patrick Durusau @ 12:36 pm

Is Parallel Programming Hard, And, If So, What Can You Do About It?, edited by Paul E. McKenney.

Kirk Lowery forwarded this link to my attention.

Just skimming the first couple of chapters, I have to say it has some of the most amusing graphics I have seen in any CS book.

Performance, productivity and generality are all goals of parallel programming, as cited by this book.

I have to wonder, though, whether subject recognition tasks, analogous to computer vision, are inherently parallel.

Doing them in parallel does not make them easier but not doing them in parallel certainly makes them harder.

For example, consider the last time you failed to recognize someone who wasn’t in the location or context where you normally see them.

Do you recognize the context in addition to, or in parallel with, your recognition of the person’s face?
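
To make the parallel-cues idea concrete, a toy sketch in Python: run independent "recognizers" concurrently and combine their answers. The recognizers here are stand-ins, not real vision code:

    # Recognize independent cues (face, context) in parallel, then combine.
    from concurrent.futures import ThreadPoolExecutor

    def recognize_face(image):
        return ("face", "P. Durusau")     # stand-in for a real recognizer

    def recognize_context(image):
        return ("context", "office")      # stand-in for a real recognizer

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, "photo.jpg")
                   for f in (recognize_face, recognize_context)]
        cues = dict(f.result() for f in futures)

    print(cues)   # {'face': 'P. Durusau', 'context': 'office'}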

Questions:

  1. What benefits/drawbacks do you see in parallel processing of TMDM instances? (3-5 pages, citations)
  2. How would you design subject identifications for processing in a parallel environment? (3-5 pages, citations)
  3. How would you evaluate the need for parallel processing of subject identifications? (3-5 pages, citations)

January 3, 2011

Zotero – Software

Filed under: Bibliography,Marketing,Software — Patrick Durusau @ 2:55 pm

Zotero

I don’t remember now how I stumbled across this interesting project.

Looks like fertile ground for the discussion of subject identity.

Particularly since shared bibliographies are nice but merged bibliographies would be better.

Drop in, introduce yourself, and bring some topic map thinking about subject identity.

December 30, 2010

How to Design Programs

Filed under: Software,Topic Map Software,Topic Map Systems — Patrick Durusau @ 4:03 pm

How to Design Programs: An Introduction to Computing and Programming, by Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi (2003 version)

Update: see How to Design Programs, Second Edition.

Website includes the complete text.

The Amazon product description reads:

This introduction to programming places computer science in the core of a liberal arts education. Unlike other introductory books, it focuses on the program design process. This approach fosters a variety of skills–critical reading, analytical thinking, creative synthesis, and attention to detail–that are important for everyone, not just future computer programmers. The book exposes readers to two fundamentally new ideas. First, it presents program design guidelines that show the reader how to analyze a problem statement; how to formulate concise goals; how to make up examples; how to develop an outline of the solution, based on the analysis; how to finish the program; and how to test. Each step produces a well-defined intermediate product. Second, the book comes with a novel programming environment, the first one explicitly designed for beginners. The environment grows with the readers as they master the material in the book until it supports a full-fledged language for the whole spectrum of programming tasks. All the book’s support materials are available for free on the Web. The Web site includes the environment, teacher guides, exercises for all levels, solutions, and additional projects.

If we are going to get around to solving the hard subject identity problems in addition to those that are computationally convenient, there will need to be more collaboration across the liberal arts.

The Amazon page for How to Design Programs is in error. I checked the ISBN numbers at http://www.books-by-isbn.com/. The ISBN-13 works, but the French, German and UK details point back to the 2001 printing. Bottom line: there is no 2008 edition of this work.

If you are interested, Matthias Felleisen, along with Robert Bruce Findler and Matthew Flatt, published Semantics Engineering with PLT Redex in 2009. It sounds interesting, but the only review I saw was on Amazon.

December 27, 2010

Data Management Slam Dunk – SPAM Warning

Filed under: Data Integration,Knowledge Management,Software — Patrick Durusau @ 2:19 pm

The Data Management Slam Dunk: A Unified Integration Platform is a spam message that landed in my inbox today.

I have heard good things about Talend software but gibberish like:

There will never be a silver bullet for marshalling the increasing volumes of data, but at least there is one irrefutable truth: a unified data management platform can solve most of the problems that information managers encounter. In fact, by creating a centralized repository for data definitions, lineage, transformations and movements, companies can avoid many troubles before they occur.

makes me wonder if any of it is true.

Did you notice that the irrefutable fact is a sort of magic incantation?

If everything is dumped in one place, troubles just melt away.

It isn’t that simple.

The “presentation” never gives a clue as to how anyone would achieve these benefits in practice. It just keeps repeating the benefits and oh, that Talend is the way to get them.

Not quite as annoying as one of those belly-buster infomercials but almost.

I have been planning on reviewing the Talend software from a topic map perspective.

Suggestions of issues, concerns or particularly elegant parts that I should be aware of are most welcome.

December 18, 2010

KNIME Version 2.3.0 released – News

Filed under: Heterogeneous Data,Mapping,Software,Subject Identity — Patrick Durusau @ 12:48 pm

KNIME Version 2.3.0 released

From the announcement:

The new version greatly enhances the usability of KNIME. It adds new features like workflow annotations, support for hotkeys, inclusion of R-views in reports, data flow switches, option to hide node labels, variable support in the database reader/connector and R-nodes, and the ability to export KNIME workflows as SVG graphics.

With the 2.3 release we are also introducing a community node repository, which includes KNIME extensions for bio- and chemoinformatics and an advanced R-scripting environment.

December 17, 2010

Google Books Ngram Viewer

Filed under: Dataset,Software — Patrick Durusau @ 4:33 pm

Google Books Ngram Viewer

From the website:

Scholars interested in topics such as philosophy, religion, politics, art and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it’s increasingly possible to apply quantitative methods to complement that research. So today Will Brockman and I are happy to announce a new visualization tool called the Google Books Ngram Viewer, available on Google Labs. We’re also making the datasets backing the Ngram Viewer, produced by Matthew Gray and intern Yuan K. Shen, freely downloadable so that scholars will be able to create replicable experiments in the style of traditional scientific discovery.

Since 2004, Google has digitized more than 15 million books worldwide. The datasets we’re making available today to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.
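
The downloadable files are plain tab-separated text, one row per (ngram, year) pair with occurrence counts. A sketch of tallying one phrase per year in Python; the exact column layout has varied across dataset versions, so treat the field positions here as an assumption:

    # Tally yearly counts for one phrase from a Google Books ngram file.
    # Assumed columns: ngram, year, match_count, ... (extra columns ignored)
    from collections import defaultdict

    counts = defaultdict(int)
    with open("ngrams-sample.tsv", encoding="utf-8") as f:   # placeholder file
        for line in f:
            parts = line.rstrip("\n").split("\t")
            ngram, year, match_count = parts[0], parts[1], parts[2]
            if ngram == "topic map":
                counts[int(year)] += int(match_count)

    for year in sorted(counts):
        print(year, counts[year])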

Tracing shifts in language usage will help topic map designers create maps for historical materials that require less correction by users.

One wonders if the extracts can be traced back to particular works.

That would enable a map developed for these extracts to be used with the scanned texts themselves.

December 14, 2010

Invenio – Library Software

Filed under: Library software,OPACS,Software — Patrick Durusau @ 7:51 am

Invenio (new release)

From the website:

Invenio is a free software suite enabling you to run your own digital library or document repository on the web. The technology offered by the software covers all aspects of digital library management from document ingestion through classification, indexing, and curation to dissemination. Invenio complies with standards such as the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic format. The flexibility and performance of Invenio make it a comprehensive solution for management of document repositories of moderate to large sizes (several millions of records).

Invenio has been originally developed at CERN to run the CERN document server, managing over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, and more. Invenio is being co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL, SLAC and is being used by about thirty scientific institutions worldwide (see demo).
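
Since Invenio speaks OAI-PMH, its records can be harvested with nothing more than an HTTP request. A minimal Python sketch; the base URL is a placeholder (Invenio installations typically expose the protocol at a path like /oai2d):

    # Minimal OAI-PMH ListRecords request against an Invenio-style endpoint.
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    BASE = "https://repository.example.org/oai2d"   # placeholder endpoint
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

    with urlopen(BASE + "?" + urlencode(params)) as resp:
        tree = ET.parse(resp)

    ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
    for record in tree.findall(".//oai:record", ns):
        identifier = record.find("oai:header/oai:identifier", ns)
        print(identifier.text)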

One of many open source library projects where topic maps are certainly relevant.

Questions:

Choose one site for review and one for comparison from General/Demo – Invenio

  1. What features of the site you are reviewing could be enhanced by the use of topic maps? Give five (5) specific search results that could be improved and then say how they could be improved. (3-5 pages, include search results)
  2. Are your improvements domain specific? Use the comparison site in answering this question. (3-5 pages, no citations)
  3. How would you go about making the case for altering the current distribution? What is the payoff for the end user? (Not the same as enhancement; this asks what end users would find easier/better/faster. Perhaps you should ask end users? How would you do that?) (3-5 pages, no citations)

December 12, 2010

Daylife Developer

Filed under: Data Source,Dataset,Software — Patrick Durusau @ 5:54 pm

Daylife Developer

News aggregation and analysis service.

Offers free developer access to their API, capped at 5,000 calls per day.

From the website:

Have an idea for the next big news application? Build a great app using the Daylife API, then we’ll market it to our clients and give you 70% of the proceeds from any sales. Learn more.

I started to not mention this site so I could keep the 70% to myself, but there is room for more than one great news app using topic maps. 😉

Oh, but that means creating an app.

An app that uses topic maps to deliver substantively different and useful aggregation of news.

Both of those are critical requirements.

The app must be substantively different, delivering a unique value-add from the use of topic maps. Something the user can’t get somewhere else.

The app must be useful, delivering a value-add that some community finds worthwhile. A community willing to pay for that usefulness.

See you at Daylife Developer?

*****
PS: Send pointers to similar resources to: patrick@durusau.net.

The more resources become available, including aggregation services, the greater the opportunity for topic maps!

Szl – A Compiler and Runtime for the Sawzall Language

Filed under: Data Mining,Software — Patrick Durusau @ 5:52 pm

Szl – A Compiler and Runtime for the Sawzall Language

From the website:

Szl is a compiler and runtime for the Sawzall language. It includes support for statistical aggregation of values read or computed from the input. Google uses Sawzall to process log data generated by Google’s servers.

Since a Sawzall program processes one record of input at a time and does not preserve any state (values of variables) between records, it is well suited for execution as the map phase of a map-reduce. The library also includes support for the statistical aggregation that would be done in the reduce phase of a map-reduce.
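
That model, a pure per-record function with aggregation pushed into "tables," can be mimicked in a few lines of Python. This is only an analogy for readers who have not seen Sawzall, not Szl code; the log format is invented:

    # Sawzall-style processing: a stateless per-record function (the "map"
    # phase) emits key/value pairs; aggregation happens separately ("reduce").
    from collections import Counter

    def process_record(record):
        # No state survives between calls; each record stands alone.
        url, status = record.split()
        yield ("status", status)

    log_lines = ["/index.html 200", "/missing 404", "/index.html 200"]

    table = Counter()                   # the aggregation "table"
    for line in log_lines:
        for key, value in process_record(line):
            table[(key, value)] += 1

    print(table)   # Counter({('status', '200'): 2, ('status', '404'): 1})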

The reading of one record at a time reminds me of the record linkage work that was developed in the late 1950s in medical epidemiology.

Of course, there the records were converted into a uniform presentation, losing their original equivalents to column headers, etc. So the technique began with semantic loss.

I suppose you could say it was a lossy semantic integration technique.

Of course, that’s true for any semantic integration technique that doesn’t preserve the original language of a data set.

I will have to dig out some record linkage software to compare to Szl.

December 11, 2010

Accidental Complexity

Filed under: Clojure,Data Mining,Software — Patrick Durusau @ 3:22 pm

Nathan Marz in Clojure at Backtype uses the term accidental complexity.

accidental complexity: Complexity caused by the tool to solve a problem rather than the problem itself

According to Nathan, Clojure helps avoid accidental complexity, something that would be useful in any semantic integration system.

The presentation is described as:

Clojure has led to a significant reduction in complexity in BackType’s systems. BackType uses Clojure all over the backend, from processing data on Hadoop to a custom database to realtime workers. In this talk Nathan will give a crash course on Clojure and using it to build data-driven systems.

Very much worth the time to view it, even more than once.

December 10, 2010

Scala in Depth

Filed under: Scala,Software — Patrick Durusau @ 7:18 am

Scala in Depth, by Josh Suereth

Abstract:

Scala is a unique and powerful new programming language for the JVM. Blending the strengths of the Functional and Imperative programming models, Scala is a great tool for building highly concurrent applications without sacrificing the benefits of an OO approach. While information about the Scala language is abundant, skilled practitioners, great examples, and insight into the best practices of the community are harder to find. Scala in Depth bridges that gap, preparing you to adopt Scala successfully for real world projects. Scala in Depth is a unique new book designed to help you integrate Scala effectively into your development process. By presenting the emerging best practices and designs from the Scala community, it guides you through dozens of powerful techniques example by example. There’s no heavy-handed theory here, just lots of crisp, practical guides for coding in Scala.

For example:

  • Discover the “sweet spots” where object-oriented and functional programming intersect.
  • Master advanced OO features of Scala, including type member inheritance, multiple inheritance and composition.
  • Employ functional programming concepts like tail recursion, immutability, and monadic operations.
  • Learn good Scala style to keep your code concise, expressive and readable.

As you dig into the book, you’ll start to appreciate what makes Scala really shine. For instance, the Scala type system is very, very powerful; this book provides use case approaches to manipulating the type system and covers how to use type constraints to enforce design constraints. Java developers love Scala’s deep integration with Java and the JVM Ecosystem, and this book shows you how to leverage it effectively and work around the rough spots.

There is little doubt that concurrent programming is a dawning reality. Which languages will be best for concurrent programming in general (if there is such a case) or for topic maps in particular isn’t as clear.

Only time and usage can answer those questions.

December 8, 2010

Bayesian Model Selection and Statistical Modeling – Review

Filed under: Authoring Topic Maps,Bayesian Models,Software — Patrick Durusau @ 9:47 am

Bayesian Model Selection and Statistical Modeling by Tomohiro Ando, reviewed by Christian P. Robert.

If you are planning on using Bayesian models in your topic maps activities, read this review first.

You will thank the reviewer later.

Webinar: Revolution R is 100% R and More
9 AM Pacific, 8 December 2010 (today)

Filed under: Authoring Topic Maps,R,Software — Patrick Durusau @ 7:59 am

Webinar: Revolution R is 100% R and More

Apologies for the short notice but this webinar may be of interest to those using R to mine data sets as part of topic map construction.

It was in my morning sweep of resources and was just posted yesterday.

I have a scheduling conflict but the webinar is said to be available for asynchronous viewing.

December 6, 2010

GT.M High end TP database engine

Filed under: Data Structures,GT.M,node-js,NoSQL,Software — Patrick Durusau @ 4:55 am

GT.M High end TP database engine (Sourceforge)

Description from the commercial version:

The GT.M data model is a hierarchical associative memory (i.e., multi-dimensional array) that imposes no restrictions on the data types of the indexes and the content – the application logic can impose any schema, dictionary or data organization suited to its problem domain. GT.M’s compiler for the standard M (also known as MUMPS) scripting language implements full support for ACID (Atomic, Consistent, Isolated, Durable) transactions, using optimistic concurrency control and software transactional memory (STM) that resolves the common mismatch between databases and programming languages. Its unique ability to create and deploy logical multi-site configurations of applications provides unrivaled continuity of business in the face of not just unplanned events, but also planned events, including planned events that include changes to application logic and schema.
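
For readers who have not met M-style globals: the data model is a sparse tree addressed by arbitrary subscript tuples. A loose Python analogy, nothing like actual GT.M syntax, just the shape of the model:

    # A loose analogy for GT.M's hierarchical associative arrays:
    # a sparse mapping from subscript tuples to values, with no type rules.
    db = {}

    db[("person", "durusau", "email")] = "patrick@durusau.net"
    db[("person", "durusau", "topics", 1)] = "subject identity"
    db[("count",)] = 42            # subscripts and values can be any type

    # $ORDER-style traversal: walk the keys under a prefix.
    prefix = ("person", "durusau")
    for key in sorted(k for k in db if k[:2] == prefix):
        print(key, "->", db[key])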

There are clients for node-js:

http://github.com/robtweed/node-mdbm
http://github.com/robtweed/node-mwire

Local topic map software is interesting and useful but for scaling to the enterprise level, something different is going to be required.

Reports of implementing the TMDM or other topic map legends with a GT.M based system are welcome.

December 5, 2010

Amazon Web Services

Filed under: Software,Topic Map Software — Patrick Durusau @ 8:27 pm

Amazon Web Services

The recent Wikileaks story drew my attention to the web services offered by Amazon. I knew they were there but had not paid as much attention as I should have.

I don’t know the details, but be aware that there is a one-year free service tier to introduce new users to the cloud.

Curious if anyone is already offering topic map services on something like Amazon Web Services?

Subject identity management as a service seems like a likely commodity in the cloud.

Data sets may expose different identity APIs, as it were, depending upon the degree of access required.
