Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 6, 2010

The RelFinder user interface: interactive exploration of relationships between objects of interest

Filed under: Associations,Interface Research/Design,RDF,Semantic Web,Software — Patrick Durusau @ 7:00 am

The RelFinder user interface: interactive exploration of relationships between objects of interest

Authors: Steffen Lohmann, Philipp Heim, Timo Stegemann, Jürgen Ziegler

Keywords: dbpedia, decision support, graph visualization, linked data, relationship discovery, relationship web, semantic user interfaces, semantic web, sparql, visual exploration

Abstract:

Being aware of the relationships that exist between objects of interest is crucial in many situations. The RelFinder user interface helps to get an overview: Even large amounts of relationships can be visualized, filtered, and analyzed by the user. Common concepts of knowledge representation are exploited in order to support interactive exploration both on the level of global filters and single relationships. The RelFinder is easy-to-use and works on every RDF knowledge base that provides standardized SPARQL access.

Software: RelFinder

RelFinder presents a way to leverage data already in RDF for the creation of associations in topic maps.

Or to explore data already available in RDF.

Exploration of relationships is important for “data” but even more important for the syntaxes that contain data.

Such as equivalence between subjects represented by syntax tokens.
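On the RDF side, here is a minimal Python sketch of the kind of SPARQL query that relationship discovery between two resources could rest on. It is not RelFinder's own query generation: the function name `path_query`, the `?p`/`?n` variable naming, and the restriction to directed paths of one fixed length are all assumptions made for illustration.

```python
def path_query(start: str, end: str, hops: int, limit: int = 10) -> str:
    """Build a SPARQL SELECT for directed paths of exactly `hops` edges.

    Hypothetical helper: `start` and `end` are resource IRIs; the query
    binds one predicate variable per edge and one node variable per
    intermediate resource.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Chain of nodes: start IRI, intermediate variables, end IRI.
    nodes = [f"<{start}>"] + [f"?n{i}" for i in range(1, hops)] + [f"<{end}>"]
    # One triple pattern per edge in the chain.
    triples = [f"{nodes[i]} ?p{i + 1} {nodes[i + 1]} ." for i in range(hops)]
    variables = [f"?p{i}" for i in range(1, hops + 1)] + nodes[1:-1]
    body = "\n  ".join(triples)
    return f"SELECT {' '.join(variables)} WHERE {{\n  {body}\n}} LIMIT {limit}"


if __name__ == "__main__":
    print(path_query("http://dbpedia.org/resource/A",
                     "http://dbpedia.org/resource/B", hops=2))
```

Sending one such query per path length to a SPARQL endpoint would return the chains of predicates that link the two resources. Each chain is a candidate association.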

September 28, 2010

International Workshop on Similarity Search and Applications (SISAP)

Filed under: Indexing,Information Retrieval,Search Engines,Searching,Software — Patrick Durusau @ 4:47 pm

International Workshop on Similarity Search and Applications (SISAP)

Website:

The International Workshop on Similarity Search and Applications (SISAP) is a conference devoted to similarity searching, with emphasis on metric space searching. It aims to fill in the gap left by the various scientific venues devoted to similarity searching in spaces with coordinates, by providing a common forum for theoreticians and practitioners around the problem of similarity searching in general spaces (metric and non-metric) or using distance-based (as opposed to coordinate-based) techniques in general.

SISAP aims to become an ideal forum to exchange real-world, challenging and exciting examples of applications, new indexing techniques, common testbeds and benchmarks, source code, and up-to-date literature through a Web page serving the similarity searching community. Authors are expected to use the testbeds and code from the SISAP Web site for comparing new applications, databases, indexes and algorithms.

Proceedings from prior years, source code, sample data, a real gem of a site.
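For readers new to distance-based searching, here is a minimal Python sketch of one classic idea: use precomputed distances to a pivot and the triangle inequality to skip distance computations. It is not taken from the SISAP code base. The class name `PivotIndex`, the single-pivot design, and the choice of Hamming distance for the demonstration are all assumptions.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Range search in a metric space using one pivot (illustrative only)."""

    def __init__(self, items: Sequence[T], dist: Callable[[T, T], float], pivot: T):
        self.items = list(items)
        self.dist = dist
        self.pivot = pivot
        # Distances to the pivot are computed once, at index time.
        self.to_pivot = [dist(pivot, x) for x in self.items]
        self.verified = 0  # items whose true distance was computed at query time

    def range_search(self, query: T, radius: float) -> List[T]:
        d_qp = self.dist(query, self.pivot)
        hits = []
        for item, d_xp in zip(self.items, self.to_pivot):
            # Triangle inequality: |d(q,p) - d(x,p)| <= d(q,x).
            # If that lower bound already exceeds the radius, skip the item.
            if abs(d_qp - d_xp) > radius:
                continue
            self.verified += 1
            if self.dist(query, item) <= radius:
                hits.append(item)
        return hits


if __name__ == "__main__":
    words = ["karolin", "kathrin", "kerstin", "zzzzzzz"]
    index = PivotIndex(words, hamming, pivot="karolin")
    print(index.range_search("kathrin", 2), "verified:", index.verified)
```

Nothing here depends on coordinates. Only the distance function is consulted, which is the point of searching in general metric spaces.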

Mining Billion-node Graphs: Patterns, Generators and Tools

Filed under: Authoring Topic Maps,Data Mining,Graphs,Software,Subject Identity — Patrick Durusau @ 9:38 am

Mining Billion-node Graphs: Patterns, Generators and Tools

Author: Christos Faloutsos (CMU)

Presentation on the Pegasus (PEta GrAph mining System) project.

If you have large amounts of real world data and need some motivation, take a look at this presentation.

September 25, 2010

SciDB – Numeric Array Database (NAD)

Filed under: Arrays,SciDB,Software — Patrick Durusau @ 8:25 pm

SciDB announced its first source-code release in an Open Letter to the SciDB Community on 24 September 2010.

In Overview of SciDB, Large Scale Array Storage, Processing and Analysis, the SciDB team says scientific data differs from business data because:

  1. scientific analysis typically requires mathematically and algorithmically sophisticated data processing methods
  2. data generated by modern scientific instruments is extremely large

I don’t find those convincing.

The article also claims: “…scientific data has a necessary and implicit ordering; for each element or data value there are other values left, right, up, down, next, previous, or adjacent to it.”

The content of such arrays is always numeric data, so you can talk about numeric array databases.

I find the overall approach refreshing because it isn’t aiming for a general solution to all data issues.

Instead, a solution for numeric data in an array.

Now if we can just get past the search for a general semantic solution.

September 23, 2010

HUGO Gene Nomenclature Committee

Filed under: Bioinformatics,Biomedical,Data Mining,Entity Extraction,Indexing,Software — Patrick Durusau @ 8:32 am

HUGO Gene Nomenclature Committee, a committee assigning unique names to genes.

Become familiar with the HUGO site, then read: The success (or not) of HUGO nomenclature (Genome Biology, 2006).

Now read: Moara: a Java library for extracting and normalizing gene and protein mentions (BMC Bioinformatics 2010)

Q: How would you apply the techniques in the Moara article to build a topic map? Would you keep/discard normalization?

PS: Moara Project (software, etc.)

KP-Lab Knowledge Practices Lab

Filed under: Interface Research/Design,RDF,Semantic Web,Software — Patrick Durusau @ 7:06 am

KP-Lab Knowledge Practices Lab.

The KP-Lab project designs and implements a modular, flexible, and extensible ICT system that supports pedagogical methods to foster knowledge creation in educational and workplace settings. The system provides tools for collaborative work around shared objects, and for knowledge practices in the various settings addressed by the project.

It offers the following tools:

  • Knowledge Practices Environment (KPE)
  • The Visual Modeling (Language) Editor
  • Activity System Design Tools (ASDT)
  • Semantic Multimedia Annotation tool (SMAT)
  • Map-It and M2T (meeting practices)
  • The CASS-Query tool
  • The CASS-Memo tool
  • Awareness Services
  • RDF Suite
  • KMS-Persistence API
  • Text Mining Services

Pick any one of these tools and name five (5) things you like about it and five (5) things you dislike about it. How would you change the things you dislike? (General prose description is sufficient.)

September 22, 2010

S-Match

Filed under: Semantics,Software — Patrick Durusau @ 7:33 pm

S-Match. Semantic “matching” software.

Announcement:

S-Match takes any two tree like structures (such as database schemas, classifications, lightweight ontologies) and returns a set of correspondences between those tree nodes which semantically correspond to one another.

It’s late so I have only installed it on an Ubuntu system and run the demo files. Not impressed so far. Run the demo and you will see what I mean.

I will try again over the weekend. In the meantime, if you try it, comments are welcome.

September 18, 2010

TREC 2010/2011

Filed under: Conferences,Heterogeneous Data,Information Retrieval,Searching,Software — Patrick Durusau @ 7:34 am

It’s too late to become a participant in TREC 2010 but everyone interested in building topic maps should be aware of this conference.

The seven tracks for this year are blog, chemical IR, entity, legal, relevance feedback, “session,” and web.

Prior TREC conferences are online, along with a host of other materials, at the Text REtrieval Conference (TREC) site.

The 2011 cycle isn’t that far away so consider being a participant next year.

SimMetrics

Filed under: Binary Distance,Binary Similarity,Software — Patrick Durusau @ 7:29 am

SimMetrics. An extensible Java library of thirty (30) distance or similarity measures.
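To give a feel for what such a measure computes, here is a minimal Python sketch of one well-known example, Levenshtein edit distance, normalized to a similarity between 0 and 1. It does not use SimMetrics or mirror its Java API. The function names and the normalization by the longer string's length are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled so 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"), similarity("kitten", "sitting"))
```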

September 8, 2010

CouchDB: Sell it to Your Boss – Post

Filed under: Graphs,NoSQL,Software — Patrick Durusau @ 8:49 am


CouchDB: Sell it to Your Boss from Alex Popescu.

CouchDB is one of the many options in the NoSQL world. As a distributed document repository, it is of interest to users of topic maps with document stores. It is written in Erlang, a language for distributed applications, including topic maps.

August 23, 2010

KNIME – Professional Open-Source Software

Filed under: Heterogeneous Data,Mapping,Software,Subject Identity — Patrick Durusau @ 7:27 pm

KNIME – Professional Open-Source Software is another effort by the domain-bridging folks I mentioned yesterday.

From the homepage:

KNIME (Konstanz Information Miner) is a user-friendly and comprehensive Open-Source data integration, processing, analysis, and exploration platform. From day one, KNIME has been developed using rigorous software engineering practices and is currently being used actively by over 6.000 professionals all over the world, both in industry and academia.

Read the KNIME features page for a very long list of potentially useful subject identity tests.

There is a place for string matching IRIs, but there is a world of subject identity beyond that as well.

August 8, 2010

Gephi – The Open Graph Viz Platform

Filed under: Gephi,Graphs,Information Retrieval,Interface Research/Design,Maps,Software — Patrick Durusau @ 3:51 pm

Gephi is an “interactive visualization and exploration platform” for graphs.

From the site:

  • Exploratory Data Analysis: intuition-oriented analysis by networks manipulations in real time.
  • Link Analysis: revealing the underlying structures of associations between objects, in particular in scale-free networks.
  • Social Network Analysis: easy creation of social data connectors to map community organizations and small-world networks.
  • Biological Network analysis: representing patterns of biological data.
  • Poster creation: scientific work promotion with hi-quality printable maps.

I find the notion of interaction with a graph, or in our case a topic map represented as a graph, quite fascinating.

Imagine selecting or even adding properties as the basis for merging and then examining those results in an interactive rather than batch process.

Can “drag-n-drop” topic map authoring be that far away?

August 3, 2010

Freebase?

Filed under: Semantic Web,Software — Patrick Durusau @ 6:53 pm

Freebase looks “lite” enough to actually be useful.

The problem of matching up the Freebase “unique” identifiers with the identifiers actually used by people in real communications remains.

Another problem is how to enable people to work in a highly distributed fashion in terms of authoring matches between identifiers.

Are you using Freebase?

Comments?

July 30, 2010

Neo4j 1.1 Released!

Filed under: NoSQL,Software — Patrick Durusau @ 1:39 pm

Neo4j 1.1 has arrived!

From Peter Neubauer’s blog entry:

The Neo4j graph database release 1.1 has just arrived, so here’s some information on the new things that have been included. The main points are the additions of monitoring support, an event framework and a new traversal framework to the kernel. Then two useful components have been added to the default distribution (called “Apoc”): graph algorithms and online backup.

Peter’s post has pointers to other Neo4j resources.

July 28, 2010

Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997)

Filed under: Data Integration,Database,Semantic Diversity,Software — Patrick Durusau @ 7:54 pm

Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997) by Mary Tork Roth isn’t the latest word on wrappers but is well written. (longer version, A Wrapper Architecture for Legacy Data Sources (1997) )

The wrapper idea is a good one, although Roth uses it in the context of a unified schema, which is then queried. With a topic map, you could query on the basis of any of the underlying schemas and get the data from all the underlying data sources.

That result is possible because a topic map has one representative for a subject and can have any number of sources for information about that single subject.
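A minimal Python sketch of that idea, under loud assumptions: each wrapped source hands back plain dictionaries with a `source` name, a set of `identifiers`, and some `properties`, and two records are taken to represent the same subject whenever they share an identifier. The record layout and the function names `merge_records` and `find_topic` are hypothetical. They are not drawn from Roth's paper or from any topic map engine.

```python
from typing import List, Optional


def merge_records(records: List[dict]) -> List[dict]:
    """Collapse records that share any identifier into one topic each.

    Every topic keeps the properties from all contributing sources,
    tagged with the source they came from.
    """
    topics: List[dict] = []
    for rec in records:
        merged = {
            "identifiers": set(rec["identifiers"]),
            "sources": [(rec["source"], dict(rec["properties"]))],
        }
        untouched = []
        for topic in topics:
            if topic["identifiers"] & merged["identifiers"]:
                # Same subject: absorb the existing topic.
                merged["identifiers"] |= topic["identifiers"]
                merged["sources"] = topic["sources"] + merged["sources"]
            else:
                untouched.append(topic)
        topics = untouched + [merged]
    return topics


def find_topic(topics: List[dict], identifier: str) -> Optional[dict]:
    """Look a subject up by any identifier from any underlying schema."""
    for topic in topics:
        if identifier in topic["identifiers"]:
            return topic
    return None


if __name__ == "__main__":
    records = [
        {"source": "hr", "identifiers": {"emp:42", "mailto:ann@example.org"},
         "properties": {"name": "Ann"}},
        {"source": "crm", "identifiers": {"crm:7", "mailto:ann@example.org"},
         "properties": {"account": "gold"}},
    ]
    print(find_topic(merge_records(records), "crm:7"))
```

A query phrased in either source's vocabulary (`emp:42` or `crm:7`) reaches the same topic and therefore the data from both sources.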

I haven’t done a user survey but suspect most users would prefer to search for/access data using familiar schemas rather than new “unified” schemas.

July 22, 2010

Introduction to Cassandra – Post

Filed under: NoSQL,Software — Patrick Durusau @ 3:26 pm

Introduction to Cassandra showed up on myNoSQL today with a nice set of further reading links on Cassandra.

Would a listing of resources on graph query languages be helpful to anyone preparing to discuss TMQL in Leipzig?

July 13, 2010

The FLAMINGO Project on Data Cleaning – Site

The FLAMINGO Project on Data Cleaning is the other project that has influenced the self-similarity work with MapReduce.

From the project description:

Supporting fuzzy queries is becoming increasingly more important in applications that need to deal with a variety of data inconsistencies in structures, representations, or semantics. Many existing algorithms require an offline analysis of data sets to construct an efficient index structure to support online query processing. Fuzzy join queries of data sets are more time consuming due to the computational complexity. The PI is studying three research problems: (1) constructing high-quality inverted lists for fuzzy search queries using Hadoop; (2) supporting fuzzy joins of large data sets using Hadoop; and (3) using the developed techniques to improve data quality of large collections of documents.

See the project webpage to learn more about their work on “us[ing] limited programming primitives in the cloud to implement index structures and search algorithms.”

The relationship between “dirty” data and the increase in data overall is at least linear, but probably worse. Far worse. Whether data is “dirty” depends on your perspective. The more data that appears in “***” format (fill in the one you like the least), the dirtier the universe of data becomes. “Dirty” data will be with you always.

ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis – SITE

ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis is one of the projects behind the self-similarity and MapReduce posting.

From the project page:

The ASTERIX project is developing new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to vast quantities of semi-structured information. The project is combining ideas from three distinct areas – semi-structured data, parallel databases, and data-intensive computing – to create a next-generation, open source software platform that scales by running on large, shared-nothing computing clusters.

Home of Hyrax: Demonstrating a New Foundation for Data-Parallel Computation, “out-of-the-box support for common distributed communication patterns and set-oriented data operators.” (Need I say more?)

July 11, 2010

Efficient Parallel Set-Similarity Joins Using MapReduce

Efficient Parallel Set-Similarity Joins Using MapReduce by Rares Vernica, Michael J. Carey, and Chen Li, Department of Computer Science, University of California, Irvine, used Citeseer (1.3M publications) and DBLP (1.2M publications) and “…increased their sizes as needed.”

The contributions of this paper are:

  • “We describe efficient ways to partition a large dataset across nodes in order to balance the workload and minimize the need for replication. Compared to the equi-join case, the set-similarity joins case requires “partitioning” the data based on set contents.
  • We describe efficient solutions that exploit the MapReduce framework. We show how to efficiently deal with problems such as partitioning, replication, and multiple inputs by manipulating the keys used to route the data in the framework.
  • We present methods for controlling the amount of data kept in memory during a join by exploiting the properties of the data that needs to be joined.
  • We provide algorithms for answering set-similarity self-join queries end-to-end, where we start from records containing more than just the join attribute and end with actual pairs of joined records.
  • We show how our set-similarity self-join algorithms can be extended to answer set-similarity R-S join queries.
  • We present strategies for exceptional situations where, even if we use the finest-granularity partitioning method, the data that needs to be held in the main memory of one node is too large to fit.”
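To make “partitioning the data based on set contents” concrete, here is a single-process Python sketch of a set-similarity self-join. Records are routed under a few of their rarest tokens (prefix filtering), and only records that land in the same group are compared. It is a simplification for illustration and not the authors' staged MapReduce algorithm. The function names, the Jaccard measure, the rarest-first token ordering, and the in-memory dictionary standing in for the shuffle are all assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def self_join(records: Dict[int, Iterable[str]],
              threshold: float) -> Dict[Tuple[int, int], float]:
    """Return pairs of record ids whose Jaccard similarity >= threshold.

    `threshold` is assumed to lie in (0, 1].
    """
    sets = {rid: set(tokens) for rid, tokens in records.items()}
    freq = Counter(tok for s in sets.values() for tok in s)

    # "Map": emit each record under every token in its prefix.
    groups = defaultdict(list)
    for rid, s in sets.items():
        ordered = sorted(s, key=lambda t: (freq[t], t))  # rarest tokens first
        # Two sets can only reach the threshold if their prefixes overlap.
        # The small epsilon guards against floating-point round-up.
        need = math.ceil(threshold * len(ordered) - 1e-9)
        for tok in ordered[:len(ordered) - need + 1]:
            groups[tok].append(rid)

    # "Reduce": verify candidate pairs within each group, once per pair.
    seen, results = set(), {}
    for rids in groups.values():
        for pair in combinations(sorted(rids), 2):
            if pair in seen:
                continue
            seen.add(pair)
            sim = jaccard(sets[pair[0]], sets[pair[1]])
            if sim >= threshold:
                results[pair] = sim
    return results


if __name__ == "__main__":
    docs = {1: "a b c d".split(), 2: "a b c e".split(), 3: "x y z".split()}
    print(self_join(docs, 0.5))
```

In a real MapReduce job the prefix token would be the key that routes a record to a reducer, which is the “manipulating the keys” the authors mention.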

A number of lessons and insights relevant to topic maps in this paper.

Makes me think of domain specific (as well as possibly one or more “general”) set-similarity join interchange languages! What are you thinking of?

NTCIR (NII Test Collection for IR Systems) Project

Filed under: Conferences,Heterogeneous Data,Information Retrieval,Search Engines,Software — Patrick Durusau @ 7:47 am

NTCIR (NII Test Collection for IR Systems) Project focuses on information retrieval tasks in Japanese, Chinese, Korean, English and cross-lingual information retrieval.

From the project description:

For the laboratory-typed testing, we have placed emphasis on (1) information retrieval (IR) with Japanese or other Asian languages and (2) cross-lingual information retrieval. For the challenging issues, (3) shift from document retrieval to “information” retrieval and technologies to utilizing information in the documents, and (4) investigation for realistic evaluation, including evaluation methods for summarization, multigrade relevance judgments and single-numbered averageable measures for such judgments, evaluation methods suitable for retrieval and processing of particular document-genre and its usage of the user group of the genre and so on.

I know there are active topic map communities in both Japan and Korea. Perhaps this is a place to meet researchers working on issues closely similar to those in topic maps and to discuss the contribution that topic maps have to offer.

Forum for Information Retrieval Evaluation (FIRE)

Filed under: Conferences,Heterogeneous Data,Information Retrieval,Search Engines,Software — Patrick Durusau @ 6:44 am

Forum for Information Retrieval Evaluation (FIRE)  aims:

  • to encourage research in South Asian language Information Access technologies by providing reusable large-scale test collections for ILIR experiments
  • to explore new Information Retrieval / Access tasks that arise as our information needs evolve, and new needs emerge
  • to provide a common evaluation infrastructure for comparing the performance of different IR systems
  • to investigate evaluation methods for Information Access techniques and methods for constructing a reusable large-scale data set for ILIR experiments.

I know there is a lot of topic map development in South Asia and this looks like a great place to meet current researchers and to interest others in topic maps.

INEX: Initiative for Evaluation of XML Retrieval

Filed under: Conferences,Heterogeneous Data,Information Retrieval,Search Engines,Software — Patrick Durusau @ 6:30 am

INEX: Initiative for Evaluation of XML Retrieval is another must-see for serious topic map researchers.

No surprise that my first stop was the INEX Publications page with proceedings from 2002 to date.

However, INEX offers an opportunity for evaluation of topic maps in the context of other solutions, providing that one or more of us participate in the initiative.

If you or your institution decide to participate, please let others in the community know. I for one would like to join such an effort.

July 10, 2010

Knowledge-Based Systems – Journal

Filed under: Information Retrieval,Software — Patrick Durusau @ 7:51 am

Knowledge-Based Systems is described on its homepage:

Knowledge-Based Systems is the international, interdisciplinary and applications-oriented journal on KBS.

Knowledge-Based Systems focuses on systems that use knowledge-based techniques to support human decision-making, learning and action. Such systems are capable of cooperating with human users and so the quality of support given and the manner of its presentation are important issues. The emphasis of the journal is on the practical significance of such systems in modern computer development and usage.

As well as being concerned with the implementation of knowledge-based systems, the journal covers the design process, the matching of requirements and needs to delivered systems and the organisational implications of introducing such technology into the workplace and public life, expert systems, application of knowledge-based methods, integration with conventional technologies, software tools for KBS construction, decision-support mechanisms, user interactions, organisational issues, knowledge acquisition, knowledge representation, languages and programming environments, knowledge-based implementation techniques and system architectures. Also included are publication reviews.

Forthcoming articles include:

  • Grammar-Based Geodesics in Semantic Networks
  • Hy-SN: Hyper-graph based Semantic Network
  • A Semantic Backend for Content Management Systems
  • Research on the Model of Rough Set over Dual-universes

Definitely should be on every topic map researcher’s current awareness list.

July 5, 2010

Data-Intensive Text Processing with MapReduce – Book

Filed under: Authoring Topic Maps,MapReduce,Software — Patrick Durusau @ 5:30 am

Data-Intensive Text Processing with MapReduce will help answer the question: What subjects are available in a given torrent of information?

Or, perhaps the more interesting question, What subjects did you find in a given torrent of information?

Not exactly the same question, is it?

The first presumes that we are going to find the same subjects and the second does not.

Download the Final Manuscript. Support the authors by buying a copy as well: publisher’s site.

Authored by Jimmy Lin and Chris Dyer.

Very interested in hearing from anyone using MapReduce to mine texts for use in topic map construction.

*****
Updated to insert the authors. Oops! 20 April 2011

June 4, 2010

Tinkerpop

Filed under: Graphs,NoSQL,Semantic Web,Software — Patrick Durusau @ 3:58 pm

Tinkerpop is worth a visit, whether you are into graph software (its focus) or not.

Home for:

Pipes: A Data Flow Framework Using Process Graphs

reXster: A Graph Based Ranking Engine

Blueprints (…collection of interfaces and implementations to common, complex data structures.)

Project Gargamel: Distributed Graph Computing

Gremlin: A Graph Based Programming Language

Twitlogic: Real Time #SemanticWeb in <= 140 Chars

Ripple: Semantic Web Scripting Language

LoPSideD: Implementing The Linked Process Protocol

Hadoop-HBase-Lucene-Mahout-Nutch-Solr Digests

Filed under: Indexing,MapReduce,Search Engines,Software — Patrick Durusau @ 5:40 am

More interests than time?

Digests of developments in May 2010:

Hadoop

HBase

Lucene

Mahout

Nutch

Solr

Suggestions of other digest type sources and/or comments on such sources deeply appreciated.

I do not think it means what you think it means

Filed under: Ontology,OWL,RDF,Semantic Web,Software — Patrick Durusau @ 4:30 am

I do not think it means what you think it means by Taylor Cowan is a deeply amusing take on Pellet, an OWL 2 Reasoner for Java.

I particularly liked the line:

I believe the semantic web community is falling into the same trap that the AI community fell into, which is to grossly underestimate the meaning of “reason”. As Inigo Montoya says in the Princess Bride, “You keep using that word. I do not think it means what you think it means.”

(For an extra 5 points, what is the word?)

Taylor’s point that Pellet will underscore unstated assumptions in an ontology and make sure that your ontology is consistent is a good one. If you are writing an ontology to support inferences that is a good thing.

Topic maps can support “consistent” ontologies but I find encouragement in their support for how people actually view the world as well. That some people “logically” infer from Boeing 767 -> “means of transportation” should not prevent me from capturing that some people “logically” infer -> “air-to-ground weapon.”

A formal reasoning system could be extended to include that case, but can that be done as soon as an analyst has that insight or must it be carefully crafted and tested to fit into a reasoning system when “the lights are blinking red?”

June 2, 2010

Restful Interface to Topic Maps

Filed under: Software,Topic Map Software,Topic Maps — Patrick Durusau @ 7:20 pm

ULISSE (USOCs KnowLedge Integration and dissemination for Space Science and Exploration) is a research project to…

describe space experiments and their results using Topic Maps. This allows us to create a knowledge base with innovative navigation, filtering and querying capabilities for the project. We chose Ontopia as the topic maps engine to power this knowledge base.

While designing the overall ULISSE system, we identified the need for a RESTful web interface to Ontopia. Currently, we have been designing this interface internally and plan to start implementing it during the summer at which point we will also make it available as open source under the same license as Ontopia and hopefully/ideally as a part of Ontopia.

We approached the design of this REST interface as a generic interface for accessing a topic maps engine and it is not Ontopia- or ULISSE-specific. It could conceivably be implemented over any Topic Maps engine. (David Damen, 2 June 2010, Time to put Topic Maps to REST?)

Further details are available as a Google doc, http://bit.ly/9NEP2x.

You might also want to consider subscribing to TopicMapMail to follow this and other topic map related discussions.

******
Update: 3 June 2010

I was reminded of Robert Barta’s 2005 presentation at Extreme Markup, TMIP, A RESTful Topic Maps Interaction Protocol, Extreme Conference archive copy. Includes performance analysis.

And Robert points to the specification TMIP, Topic Map Interaction Protocol 0.3, Specification.

May 15, 2010

Index of Relationships

Filed under: Database,Software — Patrick Durusau @ 8:11 pm

Index of Relationships

Documentation on relationships in Hibernate.

Understanding how others model relationships can influence our modeling of relationships.

(Pages are not dated. Suggestion on version(s) of Hibernate covered?)
