Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 29, 2011

Beyond the Triple Count

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 6:38 pm

Beyond the Triple Count by Leigh Dodds.

From the post:

I’ve felt for a while now that the Linked Data community has an unhealthy fascination on triple counts, i.e. on the size of individual datasets.

This was quite natural in the boot-strapping phase of Linked Data in which we were primarily focused on communicating how much data was being gathered. But we’re now beyond that phase and need to start considering a more nuanced discussion around published data.

If you’re a triple store vendor then you definitely want to talk about the volume of data your store can hold. After all, potential users or customers are going to be very interested in how much data could be indexed in your product. Even so, no-one seriously takes a headline figure at face value. As users we’re much more interested in a variety of other factors. For example how long does it take to load my data? Or, how well does a store perform with my usage profile, taking into account my hardware investment? Etc. This is why we have benchmarks, so we can take into account additional factors and more easily compare stores across different environments.

But there’s not nearly enough attention paid to other factors when evaluating a dataset. A triple count alone tells us nothing. They’re not even a good indicator of the number of useful “facts” in a dataset.

Watch Leigh’s presentation (embedded with his post) and read the post.

I think his final paragraph sets the goal for a wide variety of approaches, however we might disagree about how to best get there! 😉

Very much worth your time to read and ponder.

September 21, 2011

Dydra

Filed under: Dydra,RDF,SPARQL — Patrick Durusau @ 7:07 pm

Dydra

From What is Dydra?:

Dydra is a cloud-based graph database. Whether you’re using existing social network APIs or want to build your own, Dydra treats your customers’ social graph as exactly that.

With Dydra, your data is natively stored as a property graph, directly representing the relationships in the underlying data.

Expressive

With Dydra, you access and update your data via an industry-standard query language specifically designed for graph processing, SPARQL. It’s easy to use and we provide a handy in-browser query editor to help you learn.

From the QuickStart

Dydra is an RDF store meant to be quick and easy for developers. Getting started quickly will require already being familiar with RDF and SPARQL.

OK, so yes a “graph database,” but in the sense of being an RDF store.

Under What is RDF? -> Overview, the site authors say:

The use of URIs allows multiple data sources to talk about the same entities using the same language.

Really? That must mean all the 303 stuff that no less than Tim Berners-Lee and others have been talking about is unnecessary. I understand that several years ago that was the W3C “position,” but leaving aside all my ranting, it isn’t quite the current position.

There is a fundamental ambiguity when an address is used as an identifier. Does it identify what you find at the location it specifies or is it simply an identifier and what is at the location is additional information about what the address has identified?
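For reference, the 303 convention the W3C TAG settled on (the httpRange-14 resolution) draws that distinction at the protocol level: a URI that names a thing, rather than a document, answers GET with a 303 redirect to a document about the thing. A minimal sketch in Python, using a placeholder server:

    import requests

    # Hypothetical URI minted to identify a person, not a web page.
    resp = requests.get("http://example.org/id/alice")

    # Under the 303 convention the server replies:
    #   HTTP/1.1 303 See Other
    #   Location: http://example.org/doc/alice
    # and requests follows the redirect automatically.
    if resp.history and resp.history[0].status_code == 303:
        print("URI names a thing; its description is at:", resp.url)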

The prose is out of date or the authors have a seriously dated view of RDF. Either way, it doesn’t inspire a lot of confidence.


September 12, 2011

LinkedGeoData Release 2

LinkedGeoData Release 2

From the webpage:

The aim of the LinkedGeoData (LGD) project is to make the OpenStreetMap (OSM) datasets easily available as RDF. As such the main target audience is the Semantic Web community, however it may turn out to be useful to a much larger audience. Additionally, we are providing interlinking with DBpedia and GeoNames and integration of class labels from translatewiki and icons from the Brian Quinion Icon Collection.

The result is a rich, open, and integrated dataset which we hope to be useful for research and application development. The datasets can be publicly accessed via downloads, Linked Data, and SPARQL-endpoints. We have also launched an experimental “Live-SPARQL-endpoint” that is synchronized with the minutely updates from OSM whereas the changes to our store are republished as RDF.

More geographic data.
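If you want to poke at it, a query against the public SPARQL endpoint might look like the sketch below. The endpoint URL and the lgdo:Airport class are assumptions based on the project's documentation, so verify both before relying on them:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Assumed public endpoint; see the LinkedGeoData site for the current URL.
    sparql = SPARQLWrapper("http://linkedgeodata.org/sparql")
    sparql.setQuery("""
        PREFIX lgdo: <http://linkedgeodata.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?place ?label WHERE {
            ?place a lgdo:Airport ;
                   rdfs:label ?label .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["place"]["value"], row["label"]["value"])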

September 9, 2011

Kasabi

Filed under: Data,Data as Service (DaaS),Data Source,RDF — Patrick Durusau @ 7:16 pm

Kasabi

A data-as-service site that offers access to data (no downloads) via API codes. It provides help for authors preparing their data, APIs for the data, etc. Currently in beta.

I mention it because data as service is one model for delivery of topic map content so the successes, problems and usage of Kasabi may be important milestones to watch.

True, Lexis/Nexis, WestLaw, and any number of other commercial vendors have sold access to data in the past but it was mostly dumb data. That is, you had to contribute something to it to make it meaningful. We are in the early stages but I think a data market for data that works with my data is developing.

The options to download citations in formats that fit particular bibliographic programs are an impoverished example of delivered data working with local data.

Not quite the vision for the Semantic Web but it isn’t hard to imagine your calendaring program showing links to current news stories about your appointments. You have to supply the reasoning to cancel the appointment with the bank president just arrested for securities fraud and to increase your airline reservations to two (2).

Authoritative URIs for Geo locations? Multi-lingual labels?

Filed under: Geographic Data,Linked Data,RDF — Patrick Durusau @ 7:14 pm

Some Geo location and label links that came up on the pub-lod list:

Not a complete list nor does it include historical references or designations used over the millennia. Still, you may find it useful.

September 8, 2011

InfiniteGraph and RDF tuples

Filed under: Graphs,InfiniteGraph,RDF — Patrick Durusau @ 5:56 pm

InfiniteGraph and RDF tuples

Short answer to the question: “[Does] InfiniteGraph support RDF (Resource Description Framework) tuples (triples), [does] it work like a triplestore, and/or [can it] easily work alongside a triple store?”

Yes.

It also raises the question: Why would you want to?

August 31, 2011

Semantic Web Journal – Vol. 2, Number 2 / 2011

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 7:48 pm

Semantic Web Journal – Vol. 2, Number 2 / 2011

Just in case you want to send someone the link to a particular article:

Semantic Web surveys and applications
DOI: 10.3233/SW-2011-0047. Authors: Pascal Hitzler and Krzysztof Janowicz.

Taking flight with OWL2
DOI: 10.3233/SW-2011-0048. Author: Michel Dumontier.

Comparison of reasoners for large ontologies in the OWL 2 EL profile
DOI: 10.3233/SW-2011-0034. Authors: Kathrin Dentler, Ronald Cornet, Annette ten Teije and Nicolette de Keizer.

Approaches to visualising Linked Data: A survey
DOI: 10.3233/SW-2011-0037. Authors: Aba-Sah Dadzie and Matthew Rowe.

Is Question Answering fit for the Semantic Web?: A survey
DOI: 10.3233/SW-2011-0041. Authors: Vanessa Lopez, Victoria Uren, Marta Sabou and Enrico Motta.

FactForge: A fast track to the Web of data
DOI: 10.3233/SW-2011-0040. Authors: Barry Bishop, Atanas Kiryakov, Damyan Ognyanov, Ivan Peikov, Zdravko Tashev and Ruslan Velkov.

August 25, 2011

SERIMI…. (Have you washed your data?)

Filed under: Linked Data,LOD,RDF,Similarity — Patrick Durusau @ 7:04 pm

SERIMI – Resource Description Similarity, RDF Instance Matching and Interlinking

From the website:

The interlinking of datasets published in the Linked Data Cloud is a challenging problem and a key factor for the success of the Semantic Web. Manual rule-based methods are the most effective solution for the problem, but they require skilled human data publishers going through a laborious, error-prone and time-consuming process for manually describing rules mapping instances between two datasets. Thus, an automatic approach for solving this problem is more than welcome. We propose a novel interlinking method, SERIMI, for solving this problem automatically. SERIMI matches instances between a source and a target dataset, without prior knowledge of the data, domain or schema of these datasets. Experiments conducted with benchmark collections demonstrate that our approach considerably outperforms state-of-the-art automatic approaches for solving the interlinking problem on the Linked Data Cloud.

SERIMI-TECH-REPORT-v2.pdf

From the Results section:

The poor performance of SERIMI in the Restaurant1-Restaurant2 pair is mainly due to missing alignment in the reference set. The poor performance in the Person21-Person22 pair is due to the nature of the data. These datasets were built by adding spelling mistakes to the properties and literal values of their original datasets. Also, only instances of class Person were retrieved into the pseudo-homonym sets during the interlinking process.

Impressive work overall but isn’t dirty data really the test? Just about any process can succeed with clean data.

Or is that really the weakness of the Semantic Web? That it requires clean data?

August 24, 2011

Sesame 2.5.0 Release

Filed under: RDF,Sesame,SPARQL — Patrick Durusau @ 7:00 pm

Sesame 2.5.0 Release

From the webpage:

  • SPARQL 1.1 Query Language support
    Sesame 2.5 features near-complete support for the SPARQL 1.1 Query Language Last Call Working Draft, including all new built-in functions and operators, improved aggregation behavior and more.
  • SPARQL 1.1 Update support
    Sesame 2.5 has full support for the new SPARQL 1.1 Update Working Draft. The Repository API has been extended to support creation of SPARQL Update operations, the SAIL API has been extended to allow Update operations to be passed directly to the underlying backend implementation for optimized execution. Also, the Sesame Workbench application has been extended to allow easy execution of SPARQL update operations on your repositories.
  • SPARQL 1.1 Protocol support
    Sesame 2.5 fully supports the SPARQL 1.1 Protocol for RDF Working Draft. The Sesame REST protocol has been extended to allow update operations via SPARQL on repositories. A Sesame server therefore now automatically publishes any repository as a fully compliant SPARQL endpoint.
  • Binary RDF support
    Sesame 2.5 includes a new binary RDF serialization format. This format has been derived from the existing binary tuple results format. Its main features are reduced parsing overhead and minimal memory requirements (for handling really long literals, among other things).
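On the SPARQL 1.1 Update support above: with the extended REST protocol, an update can be posted straight at a repository over HTTP. A minimal sketch, assuming a local Sesame server at the default path and a repository named “test” (the endpoint layout and parameter name are from memory; check the Sesame REST protocol documentation):

    import requests

    update = """
        PREFIX ex: <http://example.org/>
        INSERT DATA { ex:book1 ex:title "A new title" . }
    """

    # Assumed endpoint layout: /repositories/<id>/statements with an
    # "update" form parameter carrying the SPARQL Update string.
    resp = requests.post(
        "http://localhost:8080/openrdf-sesame/repositories/test/statements",
        data={"update": update},
    )
    print(resp.status_code)  # expect 204 No Content on success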

August 23, 2011

Chemical Entity Semantic Specification:…(article)

Filed under: Cheminformatics,RDF,Semantic Web — Patrick Durusau @ 6:37 pm

Chemical Entity Semantic Specification: Knowledge representation for efficient semantic cheminformatics and facile data integration by Leonid L Chepelev and Michel Dumontier, Journal of Cheminformatics 2011, 3:20, doi:10.1186/1758-2946-3-20.

Abstract

Background
Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts have been dedicated to unifying the computational representation of chemical information, disparities between the various chemical databases still persist and stand in the way of cross-domain, interdisciplinary investigations. Through a common syntax and formal semantics, Semantic Web technology offers the ability to accurately represent, integrate, reason about and query across diverse chemical information.

Results
Here we specify and implement the Chemical Entity Semantic Specification (CHESS) for the representation of polyatomic chemical entities, their substructures, bonds, atoms, and reactions using Semantic Web technologies. CHESS provides means to capture aspects of their corresponding chemical descriptors, connectivity, functional composition, and geometric structure while specifying mechanisms for data provenance. We demonstrate that using our readily extensible specification, it is possible to efficiently integrate multiple disparate chemical data sources, while retaining appropriate correspondence of chemical descriptors, with very little additional effort. We demonstrate the impact of some of our representational decisions on the performance of chemically-aware knowledgebase searching and rudimentary reaction candidate selection. Finally, we provide access to the tools necessary to carry out chemical entity encoding in CHESS, along with a sample knowledgebase.

Conclusions
By harnessing the power of Semantic Web technologies with CHESS, it is possible to provide a means of facile cross-domain chemical knowledge integration with full preservation of data correspondence and provenance. Our representation builds on existing cheminformatics technologies and, by the virtue of RDF specification, remains flexible and amenable to application- and domain-specific annotations without compromising chemical data integration. We conclude that the adoption of a consistent and semantically-enabled chemical specification is imperative for surviving the coming chemical data deluge and supporting systems science research.

Project homepage: Chemical Entity Semantic Specification

August 22, 2011

Public Dataset Catalogs Faceted Browser

Filed under: Dataset,Facets,Linked Data,RDF — Patrick Durusau @ 7:42 pm

Public Dataset Catalogs Faceted Browser

A faceted browser for the catalogs, not their content.

Filter on coverage, location, country (not sure how location and country usefully differ), catalog status (seems to mix status and data type), and managed by.

Do be aware that as the little green balloons disappear with your selection, more of the coloring of the map itself appears.

I mention that because at first it seemed the map was being colored based on the facets I chose. For example, Europe suddenly turned dark green when I chose the United States in the filter. Confusing at first, and it makes me wonder: why use a map with underlying coloration anyway? A white map with borders would be a better display background for the green balloons indicating catalog locations.

BTW, if you visit a catalog and then use the back button, all your filters are reset. Not a problem now with a small set of filters and only 100 catalogs but should this resource continue to grow, that could become a usability issue.

August 21, 2011

Trouble with 1 Trillion Triples?

Filed under: AllegroGraph,RDF — Patrick Durusau @ 7:09 pm

Franz’s AllegroGraph® Sets New Record – 1 Trillion RDF Triples

From the post:

OAKLAND, Calif. — August 16, 2011 — Franz Inc., a leading supplier of Graph Database technology, with critical support from Stillwater SuperComputing Inc. and Intel, today announced it has achieved its goal of being the first to load and query a NoSQL database with a trillion RDF statements. RDF (also known as triples or quads), the cornerstone of the Semantic Web, provides a more flexible way to represent data than relational databases and is at the heart of the W3C push for the Semantic Web.

A trillion RDF statements eclipses the current state of the art for Semantic Web data management but is of primary interest for companies like Amdocs that use triples to represent real-time knowledge about telecom customers. Per customer, Amdocs uses about 4,000 triples, so a large telecom like China Mobile would easily need 2 trillion triples to have detailed knowledge about each single customer.
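The arithmetic behind that claim: at 4,000 triples per customer, 2 trillion triples covers (2 × 10^12) / (4 × 10^3) = 5 × 10^8, i.e. 500 million customers, which is roughly China Mobile’s subscriber base.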

Impressive milestone for a NoSQL solution and the Semantic Web.

The unanswered Semantic Web management question is:

What to do with inconsistent semantics spread over 1 trillion (or more) triples?

August 13, 2011

Chemical Entity Semantic Specification

Filed under: Cheminformatics,RDF,Semantic Web — Patrick Durusau @ 3:46 pm

Chemical Entity Semantic Specification

From the website:

Chemical Entity Semantic Specification (CHESS) framework strives to provide a means of representing chemical data with the goal of facile chemical information federation and addressing increasingly rich and complex queries for biological, pharmaceutical, and synthetic chemistry applications. The principal emphasis of CHESS is data representation to assist in metabolic fate determination, synthetic pathway construction, and automatic chemical entity classification. With explicit semantic specification of reactions for example, CHESS allows the tracing of the mechanisms of chemical transformations on the level of individual atoms, bonds, functional groups, or molecules, as well as the individual “histories” of elements of chemical entities in a pathway. Further, the CHESS framework draws on CHEMINF and SIO ontologies to provide methods for specifying uncertainty, conformer-specific information, units, and circumstances for physical measurements at variable levels of granularity, permitting rich, cross-domain queries over this data. In addition to this, CHESS provides a set of specifications to address data federation through the adoption of unique, canonical identifiers for many classes of chemical entities.

Interesting project but appears to lack uptake.

As of 13 August 2011, I get nine (9) “hits” from a popular search engine on the name as a string.

Useful as a resource for existing ontologies and identification schemes.

July 30, 2011

GDB for the Data Driven Age (STI Summit Position Paper)

Filed under: Graphs,OWL,RDF,Semantic Diversity,Semantic Web — Patrick Durusau @ 9:10 pm

GDB for the Data Driven Age (STI Summit Position Paper) by Orri Erling.

From the post:

The Semantic Technology Institute (STI) is organizing a meeting around the questions of making semantic technology deliver on its promise. We were asked to present a position paper (reproduced below). This is another recap of our position on making graph databasing come of age. While the database technology matters are getting tackled, we are drawing closer to the question of deciding actually what kind of inference will be needed close to the data. My personal wish is to use this summit for clarifying exactly what is needed from the database in order to extract value from the data explosion. We have a good idea of what to do with queries but what is the exact requirement for transformation and alignment of schema and identifiers? What is the actual use case of inference, OWL or other, in this? It is time to get very concrete in terms of applications. We expect a mixed requirement but it is time to look closely at the details.

Interesting post that includes the following observation:

Real-world problems are however harder than just bundling properties, classes, or instances into sets of interchangeable equivalents, which is all we have mentioned thus far. There are differences of modeling (“address as many columns in customer table” vs. “address normalized away under a contact entity”), normalization (“first name” and “last name” as one or more properties; national conventions on person names; tags as comma-separated in a string or as a one-to-many), incomplete data (one customer table has family income bracket, the other does not), diversity in units of measurement (Imperial vs. metric), variability in the definition of units (seven different things all called blood pressure), variability in unit conversions (currency exchange rates), to name a few. What a world!

Yes, quite.

Worth a very close read.

July 27, 2011

Learning SPARQL

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 8:35 am

Learning SPARQL by Bob DuCharme.

From the author’s announcement (email):

It’s the only complete book on the W3C standard query language for linked data and the semantic web, and as far as I know the only book at all that covers the full range of SPARQL 1.1 features such as the ability to update data. The book steps you through simple examples that can all be performed with free software, and all sample queries, data, and output are available on the book’s website.

In the words of one reviewer, “It’s excellent—very well organized and written, a completely painless read. I not only feel like I understand SPARQL now, but I have a much better idea why RDF is useful (I was a little skeptical before!)” I’d like to thank everyone who helped in the review process and everyone who offered to help, especially those in the Charlottesville/UVa tech community.

You can follow news about the book and about SPARQL on Twitter at @learningsparql.

Remembering Bob’s “SGML CD,” I ordered a copy (electronic and print) of “Learning SPARQL” as soon as I saw the announcement in my inbox.

More comments to follow.

July 22, 2011

You Too Can Use Hadoop Inefficiently!!!

Filed under: Algorithms,Graphs,Hadoop,RDF,SPARQL — Patrick Durusau @ 6:15 pm

The headline Hadoop’s tremendous inefficiency on graph data management (and how to avoid it) certainly got my attention.

But when you read the paper, Scalable SPARQL Querying of Large RDF Graphs, it isn’t Hadoop’s “tremendous inefficiency,” but actually that of SHARD, an RDF triple store that uses flat text files for storage.

Or as the authors say in their paper (6.3 Performance Comparison):

Figure 6 shows the execution time for LUBM in the four benchmarked systems. Except for query 6, all queries take more time on SHARD than on the single-machine deployment of RDF-3X. This is because SHARD’s use of hash partitioning only allows it to optimize subject-subject joins. Every other type of join requires a complete redistribution of data over the network within a Hadoop job, which is extremely expensive. Furthermore, its storage layer is not at all optimized for RDF data (it stores data in flat files).
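The hash-partitioning point is easy to see in miniature: if each triple is assigned to a machine by hashing its subject, any two patterns that join on subject land on the same machine, while joining on any other position forces a network-wide reshuffle. A toy sketch (not SHARD’s actual code):

    # Assign triples to machines by hash of subject.
    N_MACHINES = 4

    def machine_for(triple):
        s, p, o = triple
        return hash(s) % N_MACHINES

    triples = [
        ("alice", "knows", "bob"),
        ("alice", "age", "42"),
        ("bob", "knows", "carol"),
    ]

    shards = {i: [] for i in range(N_MACHINES)}
    for t in triples:
        shards[machine_for(t)].append(t)

    # Both patterns (?x knows ?y) and (?x age ?z) for the same ?x sit in
    # the same shard, so a subject-subject join is machine-local. A join
    # like (?x knows ?y) . (?y age ?z) matches the object of one triple to
    # the subject of another; those triples generally live in different
    # shards, so the data must be redistributed (a full MapReduce shuffle).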

Saying that SHARD (not as well known as Hadoop) was using Hadoop inefficiently would not have the “draw” of allegations about Hadoop’s failure to process graph data efficiently.

Sure, I write blog lines for “draw” but let’s ‘fess up in the body of the blog article. Readers shouldn’t have to run down other sources to find the real facts.

July 17, 2011

RDF data in Neo4J: the Tinkerpop story

Filed under: RDF,Sail,TinkerPop — Patrick Durusau @ 7:25 pm

RDF data in Neo4J: the Tinkerpop story

From the post:

As mentioned in my previous blog post, I recently got asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. Traditional RDF stores are not really an option as my solution should also provide the ability to calculate shortest paths between random subjects. Calculating shortest path is however one of the strong selling points of Graph Databases and more specifically Neo4J. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop’s Sail implementation however, fills the void with an even better alternative!

Interesting if you are an RDF or biologicals fan, or even if you are not!

July 13, 2011

Storing and querying RDF data in Neo4J through Sail

Filed under: Neo4j,RDF,Sail — Patrick Durusau @ 7:27 pm

Storing and querying RDF data in Neo4J through Sail

Dave Suvee walks through importing RDF triple files into Neo4j.

Update (17 July 2011): RDF data in Neo4J: the Tinkerpop story advises that neo-rdf-sail is no longer under active development.

July 7, 2011

LAC Releases Government of Canada Core Subject Thesaurus

Filed under: Government Data,RDF,SKOS — Patrick Durusau @ 4:30 pm

LAC Releases Government of Canada Core Subject Thesaurus

From the post:

The government of Canada has released a new downloadable version of its Core Subject Thesaurus in SKOS/RDF format. According to Library and Archives Canada, “The Government of Canada Core Subject Thesaurus is a bilingual thesaurus consisting of terminology that represents all the fields covered in the information resources of the Government of Canada. Library and Archives Canada is exploring the potential for linked data and the semantic web with LAC vocabularies, metadata and open content.”

When you reach the post with links to the vocabulary you will find it is also available as XML and CSV.

There are changes from the 2009 version.

Here’s an example:

Old form: Adaptive aids (for persons with disabilities)
New form: Assistive Technologies
French equivalent: Technologie d’aide

Did you notice that the old form and new form don’t share a single word in common?

Imagine that, an unstable core subject thesaurus.

Over time, more terms will be added, changed and deleted. Is there a topic map in the house?
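For what it’s worth, a minimal sketch of the topic map answer (the subject identifier is hypothetical): make every form, old, new and French, a name of one subject, so a search on any of them resolves to the same records:

    # One subject, many names; hypothetical subject identifier.
    subject = {
        "id": "gc-core:assistive-technologies",
        "names": [
            "Adaptive aids (for persons with disabilities)",  # old form
            "Assistive Technologies",                         # new form
            "Technologie d'aide",                             # French equivalent
        ],
    }

    name_index = {name.lower(): subject["id"] for name in subject["names"]}

    def lookup(term):
        return name_index.get(term.lower())

    # Either form, either era, same subject.
    assert lookup("Assistive Technologies") == lookup(
        "Adaptive aids (for persons with disabilities)"
    )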

July 6, 2011

SERIMI

Filed under: Ontology,RDF — Patrick Durusau @ 2:13 pm

SERIMI (version 0.9), a tool for automatic RDF data interlinking

From the announcement:

SERIMI matches instances between a source and a target dataset, without prior knowledge of the data, domain or schema of these datasets. Experiments conducted with benchmark collections demonstrate that our approach considerably outperforms published state-of-the-art automatic approaches for solving the interlinking problem in the Linked Data Cloud. An updated reference alignment between Dailymed[1] and TCM[2] that can be used as a golden set is also available for download.

[1] http://code.google.com/p/junsbriefcase/wiki/TGDdataset
[2] http://www4.wiwiss.fu-berlin.de/dailymed/

For the details, see: SERIMI-TECH-REPORT-v2.pdf.

Just skimmed the paper before posting. Deeply interesting work based on Tversky’s contrast model. “Tversky, A. (1977). Features of similarity. Psychological Review 84 (4), 327–352.” As of today, Tversky’s work has been cited 1598 times, so it will take a while to look through the subsequent work.
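For readers new to it, the contrast model scores the similarity of two items as a weighted difference of their shared and distinctive features: sim(A,B) = theta*f(A ∩ B) - alpha*f(A - B) - beta*f(B - A). A minimal sketch (weights and feature sets are illustrative, not SERIMI’s):

    # Tversky's contrast model (1977), with f = set cardinality.
    def tversky_contrast(a, b, theta=1.0, alpha=0.5, beta=0.5):
        a, b = set(a), set(b)
        return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

    # Two instance descriptions to be matched (hypothetical features):
    paris_dbpedia = {"label:Paris", "type:City", "country:France"}
    paris_geonames = {"label:Paris", "type:City", "population:2.2M"}
    print(tversky_contrast(paris_dbpedia, paris_geonames))  # 1.0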

STI Innsbruck

Filed under: OWL,RDF,RDFa,Semantic Web — Patrick Durusau @ 2:12 pm

STI Innsbruck

From the about page:

The Semantic Technology Institute (STI) Innsbruck, formerly known as DERI Innsbruck, was founded by Univ.-Prof. Dr. Dieter Fensel in 2002 and has developed into a challenging and dynamic research institute of approximately 40 people. STI Innsbruck collaborates with an international network of institutes in Asia, Europe and the USA, as well as with a number of global industrial partners.

STI Innsbruck is a founding member of STI International, a collaborative association of leading European and world wide initiatives, ensuring the success and sustainability of semantic technology development. STI Innsbruck utilizes this network, as well as contributing to it, in order to increase the impact of the research conducted within the institute. For more details on Semantics, check this interview with Frank Van Harmelen: “Search and you will find“.

I won’t try to summarize the wealth of resources you will find at STI Innsbruck. From the reading list for the curriculum to the listing of tools and publications, you will certainly find material of interest at this site.

For an optimistic view of Semantic Web activity see the interview with Frank Van Harmelen.

A Survey On Data Interlinking Methods

Filed under: Linked Data,LOD,RDF — Patrick Durusau @ 2:11 pm

A Survey On Data Interlinking Methods by Stephan Wölger, Katharina Siorpaes, Tobias Bürger, Elena Simperl, Stefan Thaler and Christian Hofer.

From the introduction:

In 2007 the Linking Open Data (LOD) community project started an initiative which aims at increased use of Semantic Web applications. Such applications on the one hand provide new means to enrich a user’s web experience but on the other hand also require certain standards to be adhered to. Two important requirements when it comes to Semantic Web applications are the availability of RDF datasets on the web and having typed links between these datasets in order to be able to browse the data and to jump between them in various directions.

While there exist tools that create RDF output automatically from the application level and tools that create RDF from web sites, interlinking the resulting datasets is still a task that can be cumbersome for humans (either because there is a lack of incentives or due to the non-availability of user-friendly tools) or not doable for machines (due to the manifoldness of domains). Despite the fact that there are more and more interlinking tools available, those either can be applied only for certain domains of the real world (e.g. publications) or they can be used just for interlinking a specific type of data (e.g. multimedia data).

Another interesting survey article from the Semantic Technology Institute (STI) Innsbruck, University of Innsbruck.

I like the phrase “…manifoldness of domains.” RDF output is useful information about data. The problem I foresee is that the semantics it represents are local, hence the “manifoldness of domains.” Not always, though: some domains are so close as to be indistinguishable from one another, and there linking RDF will work quite well.

One imagines that RDF-based interlinking of OfficeDepot, Staples and OfficeMax should not be difficult. Tiresome, not terribly interesting, but not difficult. And that could prove to be useful for personal and corporate buyers seeking price breaks or competitors trying to decide on loss leaders. Not a lot of reasoning to be done except by the buyers and sellers.

I am sure there would still be some domain differences between those vendors but having a common mapping from one vendor number to all three vendor numbers could prove to be very useful for customers and distributors alike.
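A sketch of that common mapping, with made-up part numbers:

    # Hypothetical vendor part numbers for the same product.
    equivalents = [
        {"officedepot": "OD-55521", "staples": "ST-90210", "officemax": "OM-11873"},
    ]

    # Index every vendor number back to its record.
    by_number = {}
    for record in equivalents:
        for number in record.values():
            by_number[number] = record

    # Any one vendor's number retrieves all three.
    print(by_number["ST-90210"])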

For more complex/abstract domains, where the “manifoldness of domains” is an issue, you can use topic maps.

Joint International Semantic Technology Conference (JIST2011)

Filed under: Conferences,OWL,RDF,Semantic Web — Patrick Durusau @ 2:10 pm

Joint International Semantic Technology Conference (JIST2011) Dec. 4-7, 2011, Hangzhou, China

Important Dates:


– Submissions due: August 15, 2011, 23:59 (11:59pm) Hawaii time

– Notification: September 22, 2011, 23:59 (11:59pm) Hawaii time

– Camera ready: October 3, 2011, 23:59 (11:59pm) Hawaii time

– Conference dates: December 4-7, 2011

From the call:

The Joint International Semantic Technology Conference (JIST) is a regional federation of Semantic Web related conferences. The mission of JIST is to bring together researchers in disciplines related to the semantic technology from across the Asia-Pacific Region. JIST 2011 incorporates the Asian Semantic Web Conference 2011 (ASWC 2011) and Chinese Semantic Web Conference 2011 (CSWC 2011).

Prof. Ian Horrocks (Oxford University) is scheduled to present a keynote address.

June 29, 2011

Providing and discovering definitions of URIs

Filed under: Identifiers,Linked Data,LOD,OWL,RDF,Semantic Web — Patrick Durusau @ 9:10 am

Providing and discovering definitions of URIs by Jonathan A. Rees.

Abstract:

The specification governing Uniform Resource Identifiers (URIs) [rfc3986] allows URIs to mean anything at all, and this unbounded flexibility is exploited in a variety of contexts, notably the Semantic Web and Linked Data. To use a URI to mean something, an agent (a) selects a URI, (b) provides a definition of the URI in a manner that permits discovery by agents who encounter the URI, and (c) uses the URI. Subsequently other agents may not only understand the URI (by discovering and consulting the definition) but may also use the URI themselves.

A few widely known methods are in use to help agents provide and discover URI definitions, including RDF fragment identifier resolution and the HTTP 303 redirect. Difficulties in using these methods have led to a search for new methods that are easier to deploy, and perform better, than the established ones. However, some of the proposed methods introduce new problems, such as incompatible changes to the way metadata is written. This report brings together in one place information on current and proposed practices, with analysis of benefits and shortcomings of each.

The purpose of this report is not to make recommendations but rather to initiate a discussion that might lead to consensus on the use of current and/or new methods.

The criteria for success:

  1. Simple. Having too many options or too many things to remember makes discovery fragile and impedes uptake.
  2. Easy to deploy on Web hosting services. Uptake of linked data depends on the technology being accessible to as many Web publishers as possible, so should not require control over Web server behavior that is not provided by typical hosting services.
  3. Easy to deploy using existing Web client stacks. Discovery should employ a widely deployed network protocol in order to avoid the need to deploy new protocol stacks.
  4. Efficient. Accessing a definition should require at most one network round trip, and definitions should be cacheable.
  5. Browser-friendly. It should be possible to configure a URI that has a discoverable definition so that ‘browsing’ to it yields information useful to a human.
  6. Compatible with Web architecture. A URI should have a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.


I had to look it up to get the page number but I remembered Karl Wiegers in Software Requirements saying:

Feasible

It must be possible to implement each requirement within the known capabilities and limitations of the system and its environment.

The requirement of a “single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name” is not feasible. It will stymie this project, despite the array of talent on hand, until it is no longer a requirement.

Need proof? Name one URI with a single agreed meaning globally, whether it’s used as a protocol element, hyperlink, or name.

Not one that the W3C TAG, or TBL or anyone else thinks/wants/prays has a single agreed meaning globally, … but one that in fact has such a global meaning.

It’s been more than ten years. Let’s drop the last requirement and let the rather talented group working on this come up with a solution that meets the other five (5) requirements.

It won’t be a universal solution but then neither is the WWW.

LDIF – Linked Data Integration Framework

Filed under: LDIF,Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 9:02 am

LDIF – Linked Data Integration Framework 0.1

From the webpage:

The Web of Linked Data grows rapidly and contains data from a wide range of different domains, including life science data, geographic data, government data, library and media data, as well as cross-domain datasets such as DBpedia or Freebase. Linked Data applications that want to consume data from this global data space face the challenges that:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

This usage of different vocabularies as well as the usage of URI aliases makes it very cumbersome for an application developer to write SPARQL queries against Web data which originates from multiple sources. In order to ease using Web data in the application context, it is thus advisable to translate data to a single target vocabulary (vocabulary mapping) and to replace URI aliases with a single target URI on the client side (identity resolution), before starting to ask SPARQL queries against the data.

Up-till-now, there have not been any integrated tools that help application developers with these tasks. With LDIF, we try to fill this gap and provide an initial alpha version of an open-source Linked Data Integration Framework that can be used by Linked Data applications to translate Web data and normalize URI aliases.

More comments will follow, but…

Isn’t this the reverse of the well-known synonym table in IR?

Instead of substituting synonyms in the query expression, the underlying data is being transformed to produce…a lack of synonyms?

No, not the reverse of a synonym table. In synonym-table terms, we would lose the synonym table and transform the underlying textual data to use only a single term where before there were N terms, all of which occurred in the synonym table.

If I search for a term previously listed in the synonym table, but one replaced by a common term, my search result will be empty.

No more synonyms? That sounds like a bad plan to me.
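A toy sketch of the difference, with hypothetical terms throughout. Query-side expansion keeps every surface form searchable; normalizing the data ahead of time silently empties searches on a replaced term:

    # Query-side expansion: the synonym table rewrites the query.
    synonyms = {"car": {"car", "automobile", "motorcar"}}
    documents = ["the automobile was red", "the motorcar stalled"]

    def search(term):
        forms = synonyms.get(term, {term})
        return [d for d in documents if any(f in d for f in forms)]

    print(search("car"))  # finds both documents

    # Data-side normalization (the LDIF-style move): rewrite the data
    # to a single target term and discard the table.
    normalized = [d.replace("automobile", "car").replace("motorcar", "car")
                  for d in documents]

    def search_normalized(term):
        return [d for d in normalized if term in d]

    print(search_normalized("car"))         # finds both
    print(search_normalized("automobile"))  # empty: the synonym is gone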

June 27, 2011

Data-gov Wiki

Filed under: Government Data,Public Data,RDF — Patrick Durusau @ 6:32 pm

Data-gov Wiki

From the wiki:

The Data-gov Wiki is a project being pursued in the Tetherless World Constellation at Rensselaer Polytechnic Institute. We are investigating open government datasets using semantic web technologies. Currently, we are translating such datasets into RDF, getting them linked to the linked data cloud, and developing interesting applications and demos on linked government data. Most of the datasets shown on this page come from the US government’s data.gov Web site, although some are from other countries or non-government sources.

Try out their Drupal site with new demos:

Linking Open Government Data

Setting aside my misgivings about the “openness” that releasing government data brings, the Drupal site is a job well done and merits your attention.

June 17, 2011

Moma, What do URLs in RDF Mean?

Filed under: RDF,Semantic Web — Patrick Durusau @ 7:23 pm

Lars Marius Garshol says in a tweet:

The old “how to find what URIs represent information resources in RDF” issue returns, now with real consequences

pointing to: How to find what URLs in an RDF graph refer to information resources?.

You may also be interested in Jeni Tennison’s summary of a recent TAG meeting on the subject:

URI Definition Discovery and Metadata Architecture

The afternoon session on Tuesday was spent on Jonathan Rees’s work on the Architecture of the World Wide Semantic Web, which covers, amongst other things, what people in semantic web circles call httpRange-14. At core, this is about the kinds of URIs we can use to refer to real-world things, what the response to HTTP requests on those URIs should be, and how we find out information about these resources.

Jonathan has put together a document called Providing and discovering definitions of URIs which covers the various ways that have been suggested over time, including the 303 method that was recommended by the TAG in 2005 and methods that have been suggested by various people since that time.

It’s clear that the 303 method has lots of practical shortcomings for people deploying linked data, and isn’t the way in which URIs are commonly used by Facebook and schema.org, who don’t currently care about using separate URIs for documents and the things those documents are about. We discussed these alongside concerns that we continue to support people who want to do things like describe the license or provenance of a document (as well as the facts that it contains) and don’t introduce anything that is incompatible with the ways in which people who have been following recommended practice are publishing their linked data. The general mood was that we need to support some kind of ‘punning’, whereby a single URI could be used to refer to both a document and a real-world thing, with different properties being assigned to different ‘views’ of that resource.

Jonathan is going to continue to work on the draft, incorporating some other possible approaches. It’s a very contentious topic within the linked data community. My opinion is while we need to provide some ‘good practice’ guides for linked data publishers, we can’t just stick to a theoretical ideal that experience has shown not to be practical. What I’d hope is that the TAG can help to pull together the various arguments for and against different options, and document whatever approach the wider community supports.

My suggested “best practice” is to not trust linked data, RDF, or topic map data unless it has been tested (and passes) and you trust its point of origin.

Any more than you would print your credit card number and PIN on the side of your car. Blind trust in any data source is a bad idea.

June 13, 2011

Why Schema.org Will Win

Filed under: Ontology,OWL,RDF,Schema,Semantic Web — Patrick Durusau @ 7:04 pm

It isn’t hard to see why schema.org is going to win out over “other” semantic web efforts.

The first paragraph at the schema.org website says why:

This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages.

  • Easy: Uses HTML tags
  • Immediate Utility: Recognized by Bing, Google and Yahoo!
  • Immediate Payoff: People can find the right web pages (your web pages)

Ironic that when HTML came on the scene, any number of hypertext engines offered more complex and useful approaches to hypertext.

But the advantages of HTML were:

  • Easy: Used simple tags
  • Immediate Utility: Useful to the author
  • Immediate Payoff: Joins hypertext network for others to find (your web pages)

I think the third advantage in each case is the crucial one. We are vain enough that making our information more findable is a real incentive, if there is a reasonable expectation of it being found. Today or tomorrow. Not ten years from now.

Linking Science and Semantics… (webinar) – 15 June 2011, 10 AM PT (17:00 GMT)

Filed under: Bioinformatics,Biomedical,OWL,RDF,Semantics — Patrick Durusau @ 7:03 pm

Linking science and semantics with the Annotation Ontology and the SWAN Annotation Tool

Abstract:

The Annotation Ontology (AO) is an open ontology in OWL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation.

The SWAN Annotation Tool, recently renamed DOMEO (Document Metadata Exchange Organizer), is an extensible web application enabling users to visually and efficiently create and share ontology-based stand-off annotation metadata on HTML or XML document targets, using the Annotation Ontology RDF model. The tool supports manual, fully automated, and semi-automated annotation with complete provenance records, as well as personal or community annotation with access authorization and control.
[AO] http://code.google.com/p/annotation-ontology

I’m interested in how “stand-off” annotation is being handled, being an overlapping markup person myself. Also curious how close it comes to HyTime-like mechanisms.
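Stand-off annotation in miniature: the annotation lives apart from the target document and is anchored to it by a URI plus character offsets, so the target need not be under the annotator’s control. The field names below are illustrative, not the actual AO vocabulary:

    document_text = "The protein encoded by BRCA1 helps repair damaged DNA."

    annotation = {
        "target": "http://example.org/article.html",  # hypothetical target URI
        "selector": {"start": 23, "end": 28},          # character offsets
        "body": "gene mention",
        "creator": "http://example.org/people/reviewer-7",
    }

    # Resolve the anchor against a local copy of the target text.
    s = annotation["selector"]
    print(document_text[s["start"]:s["end"]])  # -> BRCA1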

More after the webinar.

June 8, 2011

Knoodl!

Filed under: OWL,RDF — Patrick Durusau @ 10:22 am

Knoodl!

From the webpage:

Knoodl is a tool for creating, managing, and analyzing RDF/OWL descriptions. Its many features support collaboration in all stages of these activities. Knoodl’s key component is a semantic software architecture that supports Emergent Analytics. Knoodl is hosted in the Amazon EC2 cloud and can be used for free. It may also be licensed for private use as MyKnoodl.

Mapping to or between the components of RDF/OWL descriptions as subjects will require analysis. Or simply use of RDF/OWL descriptions. In either case this could be a useful tool.
