Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 20, 2011

Designing and Refining Schema Mappings via Data Examples

Filed under: Database,Mapping,Schema — Patrick Durusau @ 3:34 pm

Designing and Refining Schema Mappings via Data Examples by Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan, from SIGMOD ’11.

Abstract:

A schema mapping is a specification of the relationship between a source schema and a target schema. Schema mappings are fundamental building blocks in data integration and data exchange and, as such, obtaining the right schema mapping constitutes a major step towards the integration or exchange of data. Up to now, schema mappings have typically been specified manually or have been derived using mapping-design systems that automatically generate a schema mapping from a visual specification of the relationship between two schemas. We present a novel paradigm and develop a system for the interactive design of schema mappings via data examples. Each data example represents a partial specification of the semantics of the desired schema mapping. At the core of our system lies a sound and complete algorithm that, given a finite set of data examples, decides whether or not there exists a GLAV schema mapping (i.e., a schema mapping specified by Global-and-Local-As-View constraints) that “fits” these data examples. If such a fitting GLAV schema mapping exists, then our system constructs the “most general” one. We give a rigorous computational complexity analysis of the underlying decision problem concerning the existence of a fitting GLAV schema mapping, given a set of data examples. Specifically, we prove that this problem is complete for the second level of the polynomial hierarchy, hence, in a precise sense, harder than NP-complete. This worst-case complexity analysis notwithstanding, we conduct an experimental evaluation of our prototype implementation that demonstrates the feasibility of interactively designing schema mappings using data examples. In particular, our experiments show that our system achieves very good performance in real-life scenarios.
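To make the notion of a mapping that “fits” data examples concrete, here is a highly simplified sketch. This is not the paper’s algorithm — real GLAV constraints allow joins and existential variables — but it illustrates the core check: a data example is a (source, target) pair of instances, and a mapping fits if everything it derives from the source is present in the target. All names are illustrative.

```python
# Simplified sketch of "fitting" a mapping to data examples.
# An instance is a dict: relation name -> set of tuples.
# A rule here is (source_relation, target_relation, positions),
# projecting/reordering source columns into a target tuple.
# (GLAV mappings are far more general; this is illustration only.)

def apply_rule(rule, source):
    src_rel, _tgt_rel, positions = rule
    return {tuple(t[i] for i in positions) for t in source.get(src_rel, set())}

def fits(rules, examples):
    """The mapping fits if, for every data example, every tuple the
    rules derive from the source instance appears in the target instance."""
    for source, target in examples:
        for rule in rules:
            derived = apply_rule(rule, source)
            if not derived <= target.get(rule[1], set()):
                return False
    return True

# One data example: project Emp(name, dept) into Person(name).
source = {"Emp": {("ann", "sales"), ("bob", "hr")}}
target = {"Person": {("ann",), ("bob",)}}
rules = [("Emp", "Person", (0,))]  # keep only the first column
print(fits(rules, [(source, target)]))  # True
```

A user never writes the rule; in the paper’s system the user supplies only the (source, target) examples, and the system decides whether some fitting GLAV mapping exists and, if so, constructs the most general one.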

Two observations:

1) The use of data examples may help overcome the difficulty of getting users to articulate “why” a particular mapping should occur.

2) Data examples that support mappings, if preserved, could be used to illustrate for subsequent users “why” particular mappings were made or even should be followed in mappings to additional schemas.

Mapping across revisions of a particular schema or across multiple schemas at a particular time is likely to benefit from this technique.

June 7, 2011

Sterling: Isolated Storage on Windows Phone 7

Filed under: Database,Software,Topic Map Software — Patrick Durusau @ 6:18 pm

Sterling: Isolated Storage on Windows Phone 7

Not topic map specific, but if you need a backend for a topic map on Windows Phone 7, this might be of interest.

The launch of Windows Phone 7 provided an estimated 1 million Silverlight developers with the opportunity to become mobile coders practically overnight.

Applications for Windows Phone 7 are written in the same language (C# or Visual Basic) and on a framework that’s nearly identical to the browser version of Silverlight 3, which includes the ability to lay out screens using XAML and edit them with Expression Blend. Developing for the phone provides its own unique challenges, however, including special management required when the user switches applications (called “tombstoning”) combined with limited support for state management.

Sterling is an open source database project based on isolated storage that will help you manage local objects and resources in your Windows Phone 7 application as well as simplify the tombstoning process. The object-oriented database is designed to be lightweight, fast and easy to use, solving problems such as persistence, cache and state management. It’s non-intrusive and works with your existing type definitions without requiring that you alter or map them.

In this article, Windows Phone 7 developers will learn how to leverage the Sterling library to persist and query data locally on the phone with minimal effort, along with a simple strategy for managing state when an application is deactivated during tombstoning.

I use a basic cell phone about once a month. Someone else will have to comment on topic map apps on cell phones. 😉

May 16, 2011

Emerging multidisciplinary research across database management systems

Filed under: Conferences,Database — Patrick Durusau @ 3:19 pm

Emerging multidisciplinary research across database management systems by Anisoara Nica, Fabian Suchanek (INRIA Saclay – Ile de France), Aparna Varde.

Abstract:

The database community is exploring more and more multidisciplinary avenues: Data semantics overlaps with ontology management; reasoning tasks venture into the domain of artificial intelligence; and data stream management and information retrieval shake hands, e.g., when processing Web click-streams. These new research avenues become evident, for example, in the topics that doctoral students choose for their dissertations. This paper surveys the emerging multidisciplinary research by doctoral students in database systems and related areas. It is based on the PIKM 2010, which is the 3rd Ph.D. workshop at the International Conference on Information and Knowledge Management (CIKM). The topics addressed include ontology development, data streams, natural language processing, medical databases, green energy, cloud computing, and exploratory search. In addition to core ideas from the workshop, we list some open research questions in these multidisciplinary areas.

Good overview of papers from PIKM 2010, a number of which will be of interest to topic mappers.

PIKM 2010 (You will need to use the Table of Contents tab.)

March 29, 2011

MongoDB Manual

Filed under: Database,MongoDB,NoSQL — Patrick Durusau @ 12:46 pm

MongoDB Manual

More of a placeholder for myself than anything else.

I am going to create a page of links to the documentation for all the popular DB projects.

January 28, 2011

Why Command Helpers Suck – Post

Filed under: Database,Examples,Mapping — Patrick Durusau @ 6:53 am

Why Command Helpers Suck is an amusing rant by Kristina Chodorow (author of MongoDB: The Definitive Guide) on the different command helpers for the same underlying database commands.

Shades of X Window System documentation and the origins of topic maps. Same commands, different terminology.

If, as Robert Cerny has suggested, topic maps don’t offer something new, then I think it is fair to observe that the problems topic maps work to solve aren’t new either. 😉

A bit more seriously, topic maps could offer Kristina a partial solution.

Imagine a utility for command helpers that is actively maintained and that has a mapping between all the known command helpers and a given database command.

Just enter the command you know and the appropriate command is sent to the database.

That is the sort of helper application that could easily find a niche.

The master mapping could be maintained with full identifications, notes, etc. but there needs to be a compiled version for speed of response.
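A minimal sketch of that helper utility might look like the following. The helper names and canonical commands below are invented for illustration, not drawn from any real shell; the point is the shape: a master mapping from every known alias to one canonical command, “compiled” into a flat lookup table for fast dispatch.

```python
# Hypothetical sketch: map every known command-helper alias to one
# canonical database command. Names are illustrative, not a real API.

# The "master mapping": canonical command -> set of known helper aliases.
MASTER = {
    "dropDatabase": {"drop_db", "dropDatabase", "db.dropDatabase"},
    "serverStatus": {"server_status", "serverStatus", "db.serverStatus"},
}

# The "compiled version": a flat alias -> command table for fast lookup.
ALIAS_TO_COMMAND = {
    alias: cmd for cmd, aliases in MASTER.items() for alias in aliases
}

def translate(helper_name):
    """Return the canonical command for whatever alias the user knows."""
    try:
        return ALIAS_TO_COMMAND[helper_name]
    except KeyError:
        raise ValueError(f"unknown helper: {helper_name}")

print(translate("drop_db"))  # dropDatabase
```

The master mapping can carry full identifications and notes; only the compiled alias table needs to sit on the hot path.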

January 27, 2011

Think Outside the (Comment) Box

Filed under: Database,Semantics,Software — Patrick Durusau @ 8:36 am

Think Outside the (Comment) Box: Social Applications for Publishers

From the announcement:

Learn about the next generation of social applications and how publishers are leveraging them for editorial and financial benefit.

I will spare you the rest of the breathless language.

Still, I will be there and suggest you be there as well.

Norm Walsh, who needs no introduction in markup circles, works at MarkLogic.

That gives me confidence this may be worth hearing.

Details:

February 9, 2011 – 8:00 am pacific, 11:00 am eastern – 4:00 pm GMT

*****
PS: For anyone who has been under a rock for the last several years, MarkLogic makes an excellent XML database solution.

See for example, MarkMail, a collection of technical mailing lists from around the web.

Searching it also illustrates how much semantic improvement can be made to searching.

December 3, 2010

NoSQL Data Modeling

Filed under: Authoring Topic Maps,Database,Topic Maps — Patrick Durusau @ 4:06 pm

NoSQL Data Modeling

Alex Popescu emphasizes that data modeling is part and parcel of NoSQL database design.

Data modeling practice has something that topic maps practice does not: a wealth of material on data model patterns.

Rather, I should say: subject identification patterns (which subjects to identify) and subject identity patterns (how to identify those subjects).

Both of which if developed and written out, could help with the topic map authoring process.
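As a toy illustration of a subject identity pattern, consider topics that each carry a set of subject identifiers, merged whenever they share one. This single-pass sketch (which can miss transitive merges across badly ordered input) uses invented data, but it shows the kind of pattern that could be written up and reused.

```python
# Minimal sketch of one subject identity pattern: a topic is a dict with
# a set of subject identifiers and a set of names; topics sharing any
# identifier are merged into one representative. Illustrative only.

def merge_topics(topics):
    merged = []
    for topic in topics:
        for existing in merged:
            if existing["identifiers"] & topic["identifiers"]:
                existing["identifiers"] |= topic["identifiers"]
                existing["names"] |= topic["names"]
                break
        else:  # no overlap with anything merged so far: new subject
            merged.append({"identifiers": set(topic["identifiers"]),
                           "names": set(topic["names"])})
    return merged

topics = [
    {"identifiers": {"http://example.org/w3c"}, "names": {"W3C"}},
    {"identifiers": {"http://example.org/w3c"},
     "names": {"World Wide Web Consortium"}},
]
print(len(merge_topics(topics)))  # 1
```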

November 18, 2010

InfoGrid: The Web Graph Database

Filed under: Database,Graphs,Infogrid — Patrick Durusau @ 7:04 pm

InfoGrid: The Web Graph Database

From the website:

InfoGrid is a Web Graph Database with a many additional software components that make the development of REST-ful web applications on a graph foundation easy.

This looks like a very good introduction to graph databases.

Questions:

  1. Suggest any other introductions to graph databases you think would be suitable for library school students.
  2. Of the tutorials on graph databases you found, what would you change or do differently?
  3. What examples would you find compelling as a library school student for graph databases?

September 22, 2010

Journal of Cheminformatics

Filed under: Cheminformatics,Database,Subject Identity — Patrick Durusau @ 8:09 pm

Journal of Cheminformatics.

Journal of Cheminformatics is an open access, peer-reviewed, online journal encompassing all aspects of cheminformatics and molecular modeling including:

  • chemical information systems, software and databases, and molecular modelling
  • chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases
  • computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques

A good starting place for chemical subject identity issues.

September 10, 2010

LNCS Volume 6263: Data Warehousing and Knowledge Discovery

Filed under: Database,Graphs,Heterogeneous Data,Indexing,Merging — Patrick Durusau @ 8:20 pm

LNCS Volume 6263: Data Warehousing and Knowledge Discovery edited by Torben Bach Pedersen, Mukesh K. Mohania, A Min Tjoa, has a number of articles of interest to the topic map community.

Here are five (5) that caught my eye:

August 15, 2010

Index Merging

Filed under: Database,Indexing — Patrick Durusau @ 4:47 pm

Index Merging by Surajit Chaudhuri and Vivek Narasayya caught my eye for obvious reasons!

I must admit to some disappointment when I found it was collecting index columns and placing them together in a single table. I am sure that technique is quite valuable for data warehouses but isn’t what I think of when I use the phrase, “merging indexes.”

The article is well written and was worth reading. As I started to put it to one side, it occurred to me that perhaps I was too hasty in deciding it wasn’t relevant to topic maps.

What if I had a data warehouse with a “merged” index where collectively the columns supported queries based on subject identity? Or if I wanted to use a set of indexes from other applications (say Lucene for example), to query against for similar purposes?
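A toy sketch of that first idea: index several identifier columns together, so any identifier a user knows resolves to the same underlying record. The column names and data are invented for illustration.

```python
# Hypothetical sketch: a "merged" index whose columns collectively
# support subject-identity lookups -- any known identifier column
# leads to the same row. Data and column names are illustrative.

rows = [
    {"ssn": "123", "email": "ann@example.com", "payload": "Ann's record"},
    {"ssn": "456", "email": "bob@example.com", "payload": "Bob's record"},
]

ID_COLUMNS = ("ssn", "email")

# One index keyed by (column, value) across all identifier columns.
index = {}
for row in rows:
    for col in ID_COLUMNS:
        index[(col, row[col])] = row

# Either identifier reaches the same record.
print(index[("ssn", "123")] is index[("email", "ann@example.com")])  # True
```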

Whether you are into .Net or not, you should add this one to your reading list.

July 28, 2010

Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997)

Filed under: Data Integration,Database,Semantic Diversity,Software — Patrick Durusau @ 7:54 pm

Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources (1997) by Mary Tork Roth isn’t the latest word on wrappers but is well written. (A longer version, A Wrapper Architecture for Legacy Data Sources (1997), is also available.)

The wrapper idea is a good one, although Roth uses it in the context of a unified schema, which is then queried. With a topic map, you could query on the basis of any of the underlying schemas and get the data from all the underlying data sources.

That result is possible because a topic map has one representative for a subject and can have any number of sources for information about that single subject.

I haven’t done a user survey but suspect most users would prefer to search for/access data using familiar schemas rather than new “unified” schemas.
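A small sketch of the contrast: instead of forcing users onto one unified schema, map each source’s field names onto a single subject representative, so a query phrased in any familiar schema returns the pooled data. All schema and field names here are invented for illustration.

```python
# Hypothetical sketch: query by any underlying schema, answer from one
# subject representative pooling all sources. Names are illustrative.

# Per-source field -> canonical property of the subject representative.
FIELD_MAP = {
    ("hr", "emp_name"): "name",
    ("crm", "fullName"): "name",
}

# One representative per subject, with data drawn from every source.
SUBJECTS = [
    {"name": "Ann Smith", "sources": {"hr", "crm"}},
]

def query(schema, field, value):
    """Answer a query phrased in any source schema the user knows."""
    prop = FIELD_MAP[(schema, field)]
    return [s for s in SUBJECTS if s[prop] == value]

# The same subject is reachable through either schema's vocabulary.
print(query("hr", "emp_name", "Ann Smith")
      == query("crm", "fullName", "Ann Smith"))  # True
```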

July 17, 2010

Learning from the Web – Article

Filed under: Database,Topic Map Software,Topic Maps,Usability — Patrick Durusau @ 7:55 pm

Learning from the Web will be five (5) years old this coming December.

Adam Bosworth (then VP of Engineering at Google) outlines eight (8) lessons from the Web.

In brief:

  1. Simple, relaxed, sloppily extensible text formats and protocols often work better than complex and efficient binary ones.
  2. It is worth making things simple enough that one can harness Moore’s law in parallel.
  3. It is acceptable to be stale much of the time.
  4. The wisdom of crowds works amazingly well.
  5. People understand a graph composed of tree-like documents (HTML) related by links (URLs).
  6. Pay attention to physics.
  7. Be as loosely coupled as possible.
  8. KISS. Keep it (the design) simple and stupid.

You will need to read the article to get the full flavor of the lessons.

His comments on how databases have failed to heed almost all the lessons of the web are interesting in light of the recent surge of NoSQL projects.

After you read the article, ask yourself how topic maps have or have not heeded the lessons of the web. If you think not, what would it take for topic maps to heed those lessons?

May 30, 2010

Encyclopedia of Database Systems

Filed under: Database — Patrick Durusau @ 6:54 pm

Encyclopedia of Database Systems is a massive reference work on database systems, numbering some 3,752 pages and measuring 8 inches (20.3 centimeters) thick.

My first impression was favorable, particularly since entries included synonyms for entries, historical background materials, cross-references and recommended reading. All the things that I appreciate in a reference work.

The entry for record linkage was disappointing in several respects.

It focuses on statistical disclosure control (SDC), a current use of record linkage, but hardly covers the full range of record linkage uses. For a more accurate account of record linkage, see William Winkler’s Overview of Record Linkage and Current Research Directions.

Only two synonyms were given for record linkage, Record Matching and Re-identification. No mention of entity heterogeneity, list washing, entity reconciliation, co-reference resolution, etc.

The “synonyms” under Record Matching (the main article for record linkage) point back to the article Record Matching. Multiple terms that point to the main entry are useful. But to have the main entry point to terms that only point back to it wastes a reader’s time.

There was a quality control problem in terms of currency of cited research. For William Winkler, one of the leading researchers on record linkage, the most recent citation under Record Matching dates from 1999. Which omits Record Linkage References (Winkler, 2008), Overview of Record Linkage for Name Matching (Winkler, 2008), and, Overview of Record Linkage and Current Research Directions (Winkler, 2006).

My question becomes: What is missing from entries where I lack the familiarity to notice the loss?

Resources that are online should have hyperlinks. Under the record linkage entry, Winkler’s 1999 The state of record linkage and current research problems is listed, but without any link to the online version. Most of the cited resources are available either from commercial publishers (like the publisher of this tome) or freely online. Hyperlinks would be a value-add to readers.

The 10,696 bibliographic entries are scattered across 3,752 pages. In addition to listing the bibliographic entries with each entry (as hyperlinks when possible), there should be a comprehensive bibliography for the work as a whole. Such hyperlinks could be the basis for a cited-by value-add feature.

With clever use of the subject listings and more complete synonym lists, another value-add would be to provide readers with a dynamic “latest” research on each subject listing.

This review was of the electronic version, which was delivered as a series of separate PDF files. That quite naturally means hyperlinks between entries that occur in different sections do not work. Defeats part of the utility of having an electronic version, at least in my view.

To their credit, Springer has made the subject listing for this work available in XML. Perhaps some enterprising graduate student will use that as a basis for a “latest” research listing.

I will be doing a more systematic review but stumbled across the entry for the W3C. The synonym given for W3C is “World Wide Web consortium.” Note the lowercase “consortium.” It should read “World Wide Web Consortium.” And the “Recommended Reading” for that entry, “W3C. Available at: http://www.w3.org,” reinforces my point about quality control on references.

This is a very expensive work but I have no objection to commercial publishing, even expensive commercial publishing. I do have an expectation that I will find quality, innovation and value-add as the result of commercial publishing. So far, that expectation has been disappointed in this case.

PS: Every time an author’s name appears either for an entry or a cited work, there should be a hyperlink to the author’s entry in DBLP. That gives a reader access to a constantly updated bibliography of the author’s publications. Another value-add.

May 15, 2010

Index of Relationships

Filed under: Database,Software — Patrick Durusau @ 8:11 pm

Index of Relationships

Documentation on relationships in Hibernate.

Understanding how others model relationships can influence our modeling of relationships.

(The pages are not dated. A note on which version(s) of Hibernate are covered would be helpful.)
