Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

December 7, 2011

USEWOD 2012 Data Challenge

Filed under: Contest,Linked Data,RDF,Semantic Web,Semantics — Patrick Durusau @ 8:08 pm

USEWOD 2012 Data Challenge

From the website:

The USEWOD 2012 Data Challenge invites research and applications built on the basis of the USEWOD 2012 Dataset.

Accepted submissions will be presented at USEWOD2012, where a winner will be chosen. Examples of analyses and research that could be done with the dataset include (but are not limited to):

  • correlations between linked data requests and real-world events
  • types of structured queries
  • linked data access vs. conventional access
  • analysis of user agents visiting the sites
  • geographical analysis of requests
  • detection and visualisation of trends
  • correlations between site traffic and available datasets
  • etc. – let your imagination run wild!

USEWOD 2012 Dataset

The USEWOD dataset consists of server logs from several major web servers publishing datasets on the Web of linked data. In particular, the dataset contains logs from:

  • DBpedia: slices of log data spanning several months from the linked data twin of Wikipedia, one of the focal points of the Web of data. The logs were kindly made available to us for the challenge by OpenLink Software! Further details about this part of the dataset to follow.
  • SWDF: Semantic Web Dog Food is a constantly growing dataset of publications, people and organisations in the Web and Semantic Web area, covering several of the major conferences and workshops, including WWW, ISWC and ESWC. The logs contain two years of requests to the server, from about 12/2008 until 12/2010.
  • Linked Open Geo Data: a dataset about geographical data.
  • Bio2RDF: Linked Data for life sciences.

Data sets are still under construction. Organizers advise that data sets should be available next week.

Your results should be reported as short papers and are due by 15 February 2012.
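Most of the suggested analyses reduce to log mining. A minimal Python sketch of the first steps, assuming the common Apache combined log format and illustrative URL paths (the USEWOD logs may differ on both counts):

import re
from collections import Counter

# Classify requests in an Apache "combined" format access log.
# The /sparql and /resource/ paths are illustrative; adjust them to
# the actual layout of the USEWOD logs.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')

def classify(path):
    if "/sparql" in path:
        return "structured query"      # hits the SPARQL endpoint
    if path.startswith("/resource/") or path.startswith("/data/"):
        return "linked data access"    # dereferences an entity URI
    return "conventional access"       # ordinary page traffic

kinds, agents = Counter(), Counter()
with open("access.log") as log:
    for line in log:
        match = LOG_RE.match(line)
        if match:
            kinds[classify(match.group("path"))] += 1
            agents[match.group("agent")] += 1

print(kinds.most_common())
print(agents.most_common(10))          # most active user agents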

USEWOD 2012 : USAGE ANALYSIS AND THE WEB OF DATA

Filed under: Conferences,Semantic Web,Semantics — Patrick Durusau @ 8:06 pm

USEWOD 2012 : USAGE ANALYSIS AND THE WEB OF DATA

Location: Lyon, France (co-located with WWW2012)

Dates:

Release of Dataset for the USEWOD2012 Challenge: 15 December 2011
Paper submission deadline: 15 February 2012
Acceptance notification: 3 March 2012
Workshop and Prize for USEWOD Challenge: 16 or 17 April 2012

From the webpage:

The purpose of this workshop is to investigate new developments concerning the synergy between semantics and semantic-web technology on the one hand, and the analysis and mining of usage data on the other hand. As the first USEWOD workshop at WWW 2011 has shown, these two fields complement each other well. First, semantics can be used to enhance the analysis of usage data. Second, usage data analysis can enhance semantic resources as well as Semantic Web applications. Traces of users can be used to evaluate, adapt or personalise Semantic Web applications and logs can form valuable resources from which semantic knowledge can be extracted bottom-up.

The emerging Web of Data demands a re-evaluation of existing evaluation techniques: the Linked Data community is recognising that it needs to move beyond triple counts. Usage analysis is a key method for the evaluation of datasets and applications. New ways of accessing information enabled by the Web of Data require the development or adaptation of algorithms, methods, and techniques to analyse and interpret the usage of Web data instead of Web pages, a research endeavour that can profit from what has been learned in more than a decade of Web usage mining. The results can provide fine-grained insights into how semantic datasets and applications are being accessed and used by both humans and machines – insights that are needed for optimising the design and ultimately ensuring the success of semantic resources.

The primary goals of this workshop are to foster the emerging community of researchers from various fields sharing an interest in usage mining and semantics, to evaluate the developments of the past year, and to further develop a roadmap for future research in this direction.

November 27, 2011

Top Three Technologies to Tame the Big Data Beast

Filed under: Description Logic,RDF,Semantic Web — Patrick Durusau @ 8:51 pm

Top Three Technologies to Tame the Big Data Beast by Steve Hamby.

I would re-order some of Steve’s remarks. For example, on the Semantic Web, why not put those paragraphs first:

The first technology needed to tame Big Data — derived from the “memex” concept — is semantic technology, which loosely implements the concept of associative indexing. Dr. Bush is generally considered the godfather of hypertext based on the associative indexing concept, per his 1945 article. The Semantic Web, paraphrased from a definition by the World Wide Web Consortium (W3C), extends hyperlinked Web pages by adding machine-readable metadata about the Web page, including relationships across Web pages, thus allowing machine agents to process the hyperlinks automatically. The W3C provides a series of standards to implement the Semantic Web, such as Web Ontology Language (OWL), Resource Description Framework (RDF), Rule Interchange Format (RIF), and several others.

The May 2001 Scientific American article “The Semantic Web” by Tim Berners-Lee, Jim Hendler, and Ora Lassila described the Semantic Web as agents that query ontologies representing human knowledge to find information requested by a human. OWL ontology is based on Description Logics, which are both expressive and decidable, and provide a foundation for developing precise models about various domains of knowledge. These ontologies provide the “memory index” that enables searches across vast amounts of data to return relevant, actionable information, while addressing key data trust challenges as well. The ability to deliver semantics to a mobile device, such as what the recent release of the iPhone 4S does with Siri, is an excellent step in taming the Big Data beast, since users can get the data they need when and where they need it. Big Data continues to grow, but semantic technologies provide the needed check points to properly index vital information in methods that imitate the way humans think, as Dr. Bush aptly noted.

Follow that with the amount of data recitation and the comments about Vannevar Bush:

In the July 1945 issue of The Atlantic Monthly, Dr. Vannevar Bush’s famous essay, “As We May Think,” was published as one of the first articles addressing Big Data, information overload, or the “growing mountain of research” as stated in the article. The 2010 IOUG Database Growth Survey, conducted in July-August 2010, estimates that more than a zettabyte (or a trillion gigabytes) of data exists in databases, and that 16 percent of organizations surveyed reported a data growth rate in excess of 50 percent annually. A Gartner survey, also conducted in July-August 2010, reported that 47 percent of IT staffers surveyed ranked data growth as one of the top three challenges faced by their IT organization. Based on two recent IBM articles derived from their CIO Survey, one in three CIOs make decisions based on untrusted data; one in two feel they do not have the data they need to make an informed decision; and 83 percent cite better analytics as a top concern. A recent survey conducted for MarkLogic asserts that 35 percent of respondents believe their unstructured data sources will surpass their structured data sources in size in the next 36 months, while 86 percent of respondents claim that unstructured data is important to their organization. The survey further asserts that only 11 percent of those that consider unstructured data important have an infrastructure that addresses unstructured data.

Dr. Bush conceptualized a “private library,” coined “memex” (mem[ory ind]ex) in his essay, which could ingest the “mountain of research,” and use associative indexing — how we think — to correlate trusted data to support human decision making. Although Dr. Bush conceptualized “memex” as a desk-based device complete with levers, buttons, and a microfilm-based storage device, he recognized that future mechanisms and gadgetry would enhance the basic concepts. The core capabilities of “memex” were needed to allow man to “encompass the great record and to grow in the wisdom of race experience.”

That would allow exploration of questions and comments like:

1) With a zettabyte of data and more coming in every day, precisely how are we going to create/impose OWL ontologies to develop “…precise models about various domains of knowledge”?

2) Curious on what grounds hyperlinking is considered the equivalent of associative indexing? Hyperlinks can be used by indexes but hyperlinking isn’t indexing. Wasn’t then, isn’t now.

3) The act of indexing is collecting references to a list of subjects. Imposing RDF/OWL may be a preparatory step towards indexing but is not indexing in and of itself.

4) Description Logics are decidable but why does Steve think human knowledge can be expressed in decidable fashion? There is a vast amount of human knowledge in religion, philosophy, politics, ethics, economics, etc., that cannot be expressed in decidable fashion. Parking regulations can be expressed in decidable fashion, I think, but I don’t know if they are worth the trouble of RDF/OWL.

5) For that matter, where does Steve get the idea that human knowledge is precise? I suppose you could have made that argument in the 1890s, when, except for some odd cases, classical physics was sufficient. At least until 1905. (Hint: Think of Albert Einstein.) Human knowledge is always provisional, uncertain and subject to revision. CERN has apparently observed neutrinos going faster than the speed of light, for example. More revisions of physics are on the way.

Part of what we need to tame the big data “beast” is acceptance that we need information systems that are like ourselves.

That is to say information systems that are tolerant of imprecision, perhaps even inconsistency, that don’t offer a false sense of decidability and omniscience. Then at least we can talk about and recognize the parts of big data that remain to be tackled.

November 23, 2011

SUMMER SCHOOL ON ONTOLOGY ENGINEERING AND THE SEMANTIC WEB

Filed under: Ontology,Semantic Web — Patrick Durusau @ 5:39 pm

9TH SUMMER SCHOOL ON ONTOLOGY ENGINEERING AND THE SEMANTIC WEB (SSSW 2012), 8-14 July, 2012, Cercedilla, near Madrid, Spain.

Applications open: 30 January 2012, close: 31 March 2012

From the webpage:

The groundbreaking SSSW series of summer schools started in 2003. It is now a well-established event within the research community and a role model for several other initiatives. Presented by leading researchers in the field, it represents an opportunity for both students and practitioners to equip themselves with the range of theoretical, practical, and collaboration skills necessary for full engagement with the challenges involved in developing Ontologies and Semantic Web applications. To ensure a high ratio between tutors and students the school will be limited to 50 participants. Applications for the summer school will open on the 30th January 2012 and will close by the 31st March 2012.

From the very beginning the school pioneered an innovative pedagogical approach, combining the practical with the theoretical, and adding teamwork and a competitive element to the mix. Specifically, tutorial/lecture material is augmented with hands-on practical workshops and we ensure that the sessions complement each other by linking them to a group project. Work on developing and presenting a project in cooperation with other participants serves as a means of consolidating the knowledge and skills gained from lectures and practical sessions. It also introduces an element of competition among teams, as prizes are awarded to the best projects at the end of the week. Participants will be provided with electronic versions of all course lectures and all necessary tools and environments for the hands-on sessions. PC access with all tools pre-installed will be available on site as well. SSSW 2012 will provide a stimulating and enjoyable environment in which participants will benefit not only from the formal and practical sessions but also from informal and social interactions with established researchers and the other participants in the school. To further facilitate communication and feedback all attendees will present a poster on their research.

It may just be me but I never cared for conferences/meetings that were “near” major locations. Academic and professional meetings should be held at or near large international airports. People who want vacation junkets should become politicians.

November 7, 2011

Semantic Division of Labor

Filed under: Semantic Web,Semantics — Patrick Durusau @ 7:27 pm

I was talking to Sam Hunting the other day about identifying subjects in texts. Since we were talking about HTML pages, the use of an <a> element to surround PCDATA seems like a logical choice. Simple, easy, something users are accustomed to doing.

Sam mentioned that it is cleaner than RDFa or any of its kin, which require additional effort, a good bit of additional effort, on the part of users. Which made me wonder: why the extra effort? If a user has identified a subject, using an IRI, what more is necessary?

After all, if you identify a subject for me, you don’t have to push a lot of information along with the identification. If I want more information, in addition to the information I already have, it’s my responsibility to obtain it.

The scenario where you as a user contribute semantics, to the benefit of others, is a semantic division of labor.

What is really ironic is that you have to create the ontologies, invoke them in the correct way and then use a special syntax that works with their machines, to contribute your knowledge and/or identification of subjects. Not only do you get a beating, but you have to bring your own stick.

It isn’t hard to imagine a different division of labor. One where users identify their subjects using simple <a> elements and Wikipedia or other sources that seem useful to them. I am sure the chemistry folks have sources they would prefer as do other areas of activity.

If someone else wants to impose semantics on the identifications of those subjects, that is on their watch, not yours.
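The harvesting side of that bargain is almost trivial. A minimal sketch using only the Python standard library; the page snippet and the Wikipedia URL are examples, nothing more:

from html.parser import HTMLParser

# One reading of the division of labor above: authors just link, and a
# downstream consumer harvests (subject identifier, label) pairs from
# plain <a> elements -- no RDFa required of the author.
class SubjectHarvester(HTMLParser):
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.subjects = []          # (identifier IRI, anchor text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.subjects.append((self._href, "".join(self._text).strip()))
            self._href = None

harvester = SubjectHarvester()
harvester.feed('Read about <a href="http://en.wikipedia.org/wiki/Topic_map">'
               'topic maps</a> and more.')
print(harvester.subjects)
# [('http://en.wikipedia.org/wiki/Topic_map', 'topic maps')]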

True, people will argue that you are contributing to the rise of an intelligent web by your efforts, etc. Sure, and the click tracking done by Google, merchandisers and others is designed to get better products into my hands for my benefit. You know, there are some things I won’t even politely pretend to believe.

People are not tracking information or semantics to benefit you. Sorry, I wish I could say that were different but it’s not. To paraphrase Westley in The Princess Bride, “…anyone who says differently is selling something.”

October 28, 2011

Context and Semantics for Knowledge Management – … Personal Productivity [and Job Security]

Filed under: Context,Knowledge Management,Ontology,Semantic Web,Semantics — Patrick Durusau @ 3:13 pm

Context and Semantics for Knowledge Management – Technologies for Personal Productivity by Warren, Paul; Davies, John; Simperl, Elena (Eds.). 1st Edition, 2011, X, 392 p., 120 illus., 4 in color. Hardcover, ISBN 978-3-642-19509-9

I quite agree with the statement: “the fact that much corporate knowledge only resides in employees’ heads seriously hampers reuse.” True but it is also a source of job security. In organizations both large and small, in the U.S. and in other countries as well.

I don’t think any serious person believes the Pentagon (US) needs to have more than 6,000 HR systems. But, job security presents different requirements from say productivity, accomplishment of mission (aside from the mission of remaining employed), in this case, national defense, etc.

How one overcomes job security is going to vary from system to system. Be aware it is a non-technical issue and technology is not the answer to it. It is a management issue that management would like to treat as a technology problem. Treating personnel issues as problems that can be solved with technology nearly universally fails.

From the announcement:

Knowledge and information are among the biggest assets of enterprises and organizations. However, efficiently managing, maintaining, accessing, and reusing this intangible treasure is difficult. Information overload makes it difficult to focus on the information that really matters; the fact that much corporate knowledge only resides in employees’ heads seriously hampers reuse.

The work described in this book is motivated by the need to increase the productivity of knowledge work. Based on results from the EU-funded ACTIVE project and complemented by recent related results from other researchers, the application of three approaches is presented: the synergy of Web 2.0 and semantic technology; context-based information delivery; and the use of technology to support informal user processes. The contributions are organized in five parts. Part I comprises a general introduction and a description of the opportunities and challenges faced by organizations in exploiting Web 2.0 capabilities. Part II looks at the technologies, and also some methodologies, developed in ACTIVE. Part III describes how these technologies have been evaluated in three case studies within the project. Part IV starts with a chapter describing the principal market trends for knowledge management solutions, and then includes a number of chapters describing work complementary to ACTIVE. Finally, Part V draws conclusions and indicates further areas for research.

Overall, this book mainly aims at researchers in academia and industry looking for a state-of-the-art overview of the use of semantic and Web 2.0 technologies for knowledge management and personal productivity. Practitioners in industry will also benefit, in particular from the case studies which highlight cutting-edge applications in these fields.

October 22, 2011

DQM-Vocabulary

Filed under: Semantic Web,Vocabularies — Patrick Durusau @ 3:17 pm

DQM-Vocabulary announced by Christian Fürber:

The DQM-Vocabulary supports data quality management activities in Semantic Web architectures. Its major strength is the ability to represent data requirements, i.e. prescribed (individual) directives or consensual agreements that define the content and/or structure that constitute high quality data instances and values, so that computers can interpret the requirements and take further actions. Among other things, the DQM-Vocabulary supports the following tasks:

  • Automated creation of data quality monitoring and assessment reports based on previously specified data requirements
  • Exchange of data quality information and data requirements on web-scale
  • Automated consistency checks between data requirements

The DQM-Vocabulary is available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 license at http://purl.org/dqm-vocabulary/v1/dqm

A primer with examples on how to use the DQM-Vocabulary can be found at http://purl.org/dqm-vocabulary

A mailing list for issues and questions around the DQM-Vocabulary can be found at http://groups.google.com/group/dqm-vocabulary
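To make “machine-interpretable data requirements” concrete, a rough rdflib sketch of the shape such a statement might take. Fair warning: the class and property names below are invented for illustration; consult the primer for the vocabulary’s actual terms.

from rdflib import Graph, Literal, Namespace, RDF, URIRef

# Illustrative only: the class and property names below are NOT taken
# from the DQM-Vocabulary itself (see the primer for the real terms);
# they just show the shape of a machine-readable data requirement.
DQM = Namespace("http://purl.org/dqm-vocabulary/v1/dqm#")
EX = Namespace("http://example.org/")

g = Graph()
req = EX["req-product-price"]
g.add((req, RDF.type, DQM["DataRequirement"]))        # hypothetical class
g.add((req, DQM["appliesToProperty"], EX["price"]))   # hypothetical property
g.add((req, DQM["expectedDatatype"],                  # hypothetical property
       URIRef("http://www.w3.org/2001/XMLSchema#decimal")))
g.add((req, DQM["description"],
       Literal("Every product must carry a decimal price.")))

print(g.serialize(format="turtle"))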

Interesting work but I think pretty obviously only commercial interests are going to have an incentive to put in the time and effort to use such a system.

Which reminds me, how is this different from the OASIS Universal Business Language (UBL) activity? UBL has already been adopted in a number of countries, particularly for government contracts. They have specified the semantics that businesses need to automate some contractual matters.

I suppose more broadly, where is the commercial demand for the DQM-Vocabulary?

Are there identifiable activities that lack data quality management now, for which DQM will be a solution? If so, which ones?

If other data quality management solutions are in place, what advantages over current systems are offered by DQM? Are those sufficient to justify changing present systems?

Edelweiss

Filed under: Interface Research/Design,Ontology,Semantic Web — Patrick Durusau @ 3:17 pm

Edelweiss

From the website:

The research team Edelweiss aims at offering models, methods and techniques for supporting knowledge management and collaboration in virtual communities interacting with information resources through the Web. This research will result in an ergonomic graph-based and ontology-based platform.

Activity Report 2010
Located at INRIA Sophia Antipolis-Méditerranée, Edelweiss was previously known as Acacia.
Edelweiss stands for…

  • Exchanges : communication, diffusion, knowledge capitalization, reuse, learning.
  • Documents : texts, multimedia, XML.
  • Extraction : semi-automated information extraction from documents.
  • Languages : graphs, semantic web languages, logics.
  • Webs : architectures, diffusion, implementation.
  • Ergonomics : user interfaces, scenarios.
  • Interactions : interaction design, protocols, collaboration.
  • Semantics : ontologies, semantic annotations, formalisms, reasoning.
  • Servers : distributed services, distributed data, applications.

Good lists of projects, software, literature, etc.

Anyone care to share any longer acronyms in actual use at projects?

October 21, 2011

RDFa 1.1 Lite

Filed under: RDFa,Semantic Web — Patrick Durusau @ 7:27 pm

RDFa 1.1 Lite

From the post:

Summary: RDFa 1.1 Lite is a simple subset of RDFa consisting of the following attributes: vocab, typeof, property, rel, about and prefix.

During the schema.org workshop, a proposal was put forth by RDFa’s resident hero, Ben Adida, for a stripped down version of RDFa 1.1, called RDFa 1.1 Lite. The RDFa syntax is often criticized as having too much functionality, leaving first-time authors confused about the more advanced features. This lighter version of RDFa will help authors easily jump into the Linked Data world. The goal was to create a very minimal subset that will work for 80% of the folks out there doing simple markup for things like search engines.

I was struck by the line “…that will work for 80% of the folks out there doing simple markup for things like search engines.”

OK, so instead of people authoring content for the web, RDFa 1.1 Lite targets 80% of SEOs?

Targeting people who try to game search engine algorithms? Not a terribly sympathetic group.

October 14, 2011

MongoGraph – MongoDB Meets the Semantic Web

Filed under: MongoDB,RDF,Semantic Web,SPARQL — Patrick Durusau @ 6:24 pm

MongoGraph – MongoDB Meets the Semantic Web

From the post (Franz Inc.):

Recorded Webcast: MongoGraph – MongoDB Meets the Semantic Web (October 12, 2011)

MongoGraph is an effort to bring the Semantic Web to MongoDB developers. We implemented a MongoDB interface to AllegroGraph to give Javascript programmers both Joins and the Semantic Web. JSON objects are automatically translated into triples and both the MongoDB query language and SPARQL work against your objects.

Join us for this webcast to learn more about working on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject. We’ll discuss the simplicity of the MongoDB interface for working with objects and all the properties of an advanced triplestore, in this case joins through SPARQL queries, automatic indexing of all attributes/values, ACID properties all packaged to deliver a simple entry into the world of the Semantic Web.

I haven’t watched the video, yet, but:

working on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject.

certainly caught my eye.

Curious if this means simply using the triples as sources of values and not “reasoning” with them?
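The idea itself is easy to state in code. A toy sketch of “an object is all the triples with the same subject,” concept only, not AllegroGraph’s implementation:

from collections import defaultdict

# Group triples by subject into JSON-ish documents, the "object" view
# described in the webcast announcement.
triples = [
    ("ex:alice", "rdf:type",   "foaf:Person"),
    ("ex:alice", "foaf:name",  "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:name",  "Bob"),
]

objects = defaultdict(dict)
for s, p, o in triples:
    objects[s].setdefault(p, []).append(o)

print(dict(objects))
# {'ex:alice': {'rdf:type': ['foaf:Person'], 'foaf:name': ['Alice'],
#               'foaf:knows': ['ex:bob']},
#  'ex:bob': {'foaf:name': ['Bob']}}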

October 9, 2011

Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web

Filed under: Artificial Intelligence,P2P,Semantic Web — Patrick Durusau @ 6:43 pm

Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web by P. Adjiman, P. Chatalic, F. Goasdoue, M. C. Rousset, and L. Simon.

Abstract:

In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The main contribution of this paper is to provide the first consequence finding algorithm in a peer-to-peer setting: DeCA. It is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Another important contribution is to apply this general distributed reasoning setting to the setting of the Semantic Web through the Somewhere semantic peer-to-peer data management system. The last contribution of this paper is to provide an experimental analysis of the scalability of the peer-to-peer infrastructure that we propose, on large networks of 1000 peers.

Interesting research on its own but I was struck by the phrase: “but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary.”

Can we say that our “peers” share our “mappings”?

That is, mappings between terms and our expectations of others with regard to those terms.

Not the mapping between the label and the subject for which it is a label.

Or is the second mapping encompassed in the first? Or merely a partial expression of the first? (That seems more likely.)

Not immediately applicable to anything but may be important in terms of the mappings we are seeking to capture.

October 5, 2011

Catalog QUDT

Filed under: Measurement,Ontology,Semantic Web — Patrick Durusau @ 6:54 pm

Catalog QUDT

From the website:

The QUDT, or ‘Quantity, Unit, Dimension and Type’, collection of ontologies defines base classes, properties, and instances for modeling physical quantities, units of measure, and their dimensions in various measurement systems. The goal of the QUDT collection of models is to provide a machine-processable approach for specifying measurable quantities, units for measuring different kinds of quantities, the numerical values of quantities in different units of measure and the data structures and data types used to store and manipulate these objects in software. A simple treatment of units is separated from a full dimensional treatment of units. Vocabulary graphs will be used to organize units for different disciplines.

Useful in a number of domains. Comparison to other measurement ontology efforts should prove to be interesting.
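For a taste of the style of modeling QUDT enables, a rough rdflib sketch. The namespace and term spellings are from memory of QUDT 1.x; verify them against the published ontology before relying on them:

from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

# A measured quantity whose numeric value is tied to an explicit unit.
# Term spellings are from memory of QUDT 1.x -- check the ontology.
QUDT = Namespace("http://qudt.org/schema/qudt#")
UNIT = Namespace("http://qudt.org/vocab/unit#")
EX = Namespace("http://example.org/")

g = Graph()
obs = EX["observation-42"]
g.add((obs, RDF.type, QUDT["QuantityValue"]))
g.add((obs, QUDT["numericValue"], Literal("299792458", datatype=XSD.double)))
g.add((obs, QUDT["unit"], UNIT["MeterPerSecond"]))

print(g.serialize(format="turtle"))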

In Defense of Ambiguity

Filed under: Ambiguity,RDF,Semantic Web — Patrick Durusau @ 6:52 pm

In Defense of Ambiguity by Patrick J. Hayes and Harry A. Halpin.

Abstract:

URIs, a universal identification scheme, are different from human names insofar as they can provide the ability to reliably access the thing identified. URIs also can function to reference a non-accessible thing in a similar manner to how names function in natural language. There are two distinctly different relationships between names and things: access and reference. To confuse the two relations leads to underlying problems with Web architecture. Reference is by nature ambiguous in any language. So any attempts by Web architecture to make reference completely unambiguous will fail on the Web. Despite popular belief otherwise, making further ontological distinctions often leads to more ambiguity, not less. Contrary to appeals to Kripke for some sort of eternal and unique identification, reference on the Web uses descriptions and therefore there is no unambiguous resolution of reference. On the Web, what is needed is not just a simple redirection, but a uniform and logically consistent manner of associating descriptions with URIs that can be done in a number of practical ways that should be made consistent.

Highly readable critique with passages such as:

There are two distinct relationships between names and things: reference and access. The architecture of the Web determines access, but has no direct influence on reference. Identifiers like URIs can be considered types of names. It is important to distinguish these two possible different relationships between a name and a thing.

1. accesses, meaning that the name provides a causal pathway to the thing, perhaps mediated by the Web.

2. refers to, meaning that the name is being used to mention the thing.

Current practice in Web Architecture uses “identifies” to mean both or either of these, apparently in the belief that they are synonyms. They are not, and to think of them as being the same is to be profoundly confused. For example, when uttering the name “Eiffel Tower” one does not in anyway get magically transported to the Eiffel Tower. One can talk about it, have beliefs, plan a trip there, and otherwise have intentions about the Eiffel Tower, but the name has no causal path to the Eiffel Tower itself. In contrast, the URI http://www.tour-eiffel.fr/ offers us access to a group of Web pages via an HTTP-compliant agent. A great deal of the muddle Web architecture finds itself in can be directly traced to this confusion between access and reference.

The solution proffered by Hayes and Halpin:

Regardless of the details, the use of any technology in Web architecture to distinguish between access and reference, including our proposed ex:refersTo and ex:describedBy, does nothing more than allow the author of a URI to explain how they would like the URI to be used.

For those interested in previous recognitions of this distinction, see <resourceRef> and <subjectIndicatorRef> in XTM 1.0.

October 4, 2011

Efficient Multidimensional Blocking for Link Discovery without losing Recall

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 7:57 pm

Efficient Multidimensional Blocking for Link Discovery without losing Recall

Jack Park did due diligence on the SILK materials before I did and forwarded a link to this paper.

Abstract:

Over the last three years, an increasing number of data providers have started to publish structured data according to the Linked Data principles on the Web. The resulting Web of Data currently consists of over 28 billion RDF triples. As the Web of Data grows, there is an increasing need for link discovery tools which scale to very large datasets. In record linkage, many partitioning methods have been proposed which substantially reduce the number of required entity comparisons. Unfortunately, most of these methods either lead to a decrease in recall or only work on metric spaces. We propose a novel blocking method called Multi-Block which uses a multidimensional index in which similar objects are located near each other. In each dimension the entities are indexed by a different property increasing the efficiency of the index significantly. In addition, it guarantees that no false dismissals can occur. Our approach works on complex link specifications which aggregate several different similarity measures. MultiBlock has been implemented as part of the Silk Link Discovery Framework. The evaluation shows a speedup factor of several 100 for large datasets compared to the full evaluation without losing recall.

From deeper in the paper:

If the similarity between two entities exceeds a threshold θ, a link between these two entities is generated. sim is computed by evaluating a link specification s (in record linkage typically called linkage decision rule [23]) which specifies the conditions two entities must fulfill in order to be interlinked.

If I am reading this paper correctly, there isn’t a requirement (as in record linkage) that we normalize the data to a common format before writing the rule for comparisons. That in and of itself is a major boon, to say nothing of the other contributions of this paper.
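For readers new to record linkage, blocking in general is easy to illustrate. This naive single-key version is for intuition only; MultiBlock’s multidimensional index is precisely what avoids the false dismissals a scheme like this can produce:

from collections import defaultdict
from itertools import combinations

# Generic blocking: compare only records that share a cheap key, so the
# number of pairwise comparisons shrinks. A bad key can split true
# matches into different blocks, which is the recall loss MultiBlock
# is designed to avoid.
people = [
    {"id": 1, "name": "Jon Smith",  "city": "Lyon"},
    {"id": 2, "name": "John Smith", "city": "Lyon"},
    {"id": 3, "name": "Ann Lee",    "city": "Paris"},
]

blocks = defaultdict(list)
for p in people:
    key = (p["name"][0].lower(), p["city"])   # cheap blocking key
    blocks[key].append(p)

candidates = [pair
              for block in blocks.values()
              for pair in combinations(block, 2)]
print(candidates)   # only records 1 and 2 are ever compared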

SILK – Link Discovery Framework Version 2.5 released

Filed under: Linked Data,LOD,RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:54 pm

SILK – Link Discovery Framework Version 2.5 released

I was quite excited to see under “New Data Transformations”…”Merge Values of different inputs.”

But the documentation for Transformation must be lagging behind or I have a different understanding of what it means to “Merge Values of different inputs.”

Perhaps I should ask: What does SILK mean by “Merge Values of different inputs?”

Picking out an issue that is of particular interest to me is not meant to be a negative comment on the project. An impressive bit of work for any EU funded project.

Another question: Has anyone looked at the SILK- Link Specification Language (SILK-LSL) as an input into declaring equivalence/processing for arbitrary data objects? Just curious.

Robert Isele posted this announcement about SILK on October 3, 2011:

we are happy to announce version 2.5 of the Silk Link Discovery Framework for the Web of Data.

The Silk framework is a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify the linkage rules data items must fulfill in order to be interlinked. These linkage rules may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language.

Linkage rules can either be written manually or developed using the Silk Workbench. The Silk Workbench is a web application which guides the user through the process of interlinking different data sources.

Version 2.5 includes the following additions to the last major release 2.4:

(1) Silk Workbench now includes a function to learn linkage rules from the reference links. The learning function is based on genetic programming and capable of learning complex linkage rules. Similar to a genetic algorithm, genetic programming starts with a randomly created population of linkage rules. From that starting point, the algorithm iteratively transforms the population into a population with better linkage rules by applying a number of genetic operators. As soon as either a linkage rule with a full f-Measure has been found or a specified maximum number of iterations is reached, the algorithm stops and the user can select a linkage rule.

(2) A new sampling tab allows for fast creation of the reference link set. It can be used to bootstrap the learning algorithm by generating a number of links which are then rated by the user either as correct or incorrect. In this way positive and negative reference links are defined which in turn can be used to learn a linkage rule. If a previous learning run has already been executed, the sampling tries to generate links which contain features which are not yet covered by the current reference link set.

(3) The new help sidebar provides the user with a general description of the current tab as well as with suggestions for the next steps in the linking process. As new users are usually not familiar with the steps involved in interlinking two data sources, the help sidebar currently provides basic guidance to the user and will be extended in future versions.

(4) Introducing per-comparison thresholds:

  • On popular request, thresholds can now be specified on each comparison.
  • Backwards-compatible: Link specifications using a global threshold can still be executed.

(5) New distance measures:

  • Jaccard Similarity
  • Dice’s coefficient
  • DateTime Similarity
  • Tokenwise Similarity, contributed by Florian Kleedorfer, Research Studios Austria

(6) New data transformations:

  • RemoveEmptyValues
  • Tokenizer
  • Merge Values of multiple inputs

(7) New DataSources and Outputs

  • In addition to reading from SPARQL endpoints, Silk now also supports reading from RDF dumps in all common formats. Currently the data set is held in memory and it is not available in the Workbench yet, but future versions will improve this.
  • New SPARQL/Update Output: In addition to writing the links to a file, Silk now also supports writing directly to a triple store using SPARQL/Update.

(8) Various improvements and bugfixes

———————————————————————————

More information about the Silk Link Discovery Framework is available at:

http://www4.wiwiss.fu-berlin.de/bizer/silk/

The Silk framework is provided under the terms of the Apache License, Version 2.0 and can be downloaded from:

http://www4.wiwiss.fu-berlin.de/bizer/silk/releases/

The development of Silk was supported by Vulcan Inc. as part of its Project Halo (www.projecthalo.com) and by the EU FP7 project LOD2-Creating Knowledge out of Interlinked Data (http://lod2.eu/, Ref. No. 257943).

Thanks to Christian Becker, Michal Murawicki and Andrea Matteini for contributing to the Silk Workbench.
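The learning function in item (1) is the standard genetic programming loop. A toy Python version with a bare numeric threshold standing in for Silk’s rule trees, and invented sample data, just to show the shape of the search:

import random
from difflib import SequenceMatcher

# Toy version of the loop described above: the "linkage rule" is just a
# similarity threshold on names, evolved against labeled reference
# links. Silk's real learner evolves whole rule trees with genetic
# operators; this only shows the iterate/score/stop structure.
reference = [  # (record a, record b, should-link?)
    ("Jon Smith", "John Smith", True),
    ("Jon Smith", "Ann Lee", False),
    ("Anne Lee", "Ann Lee", True),
]

def sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def f_measure(threshold):
    tp = sum(1 for a, b, gold in reference if sim(a, b) >= threshold and gold)
    fp = sum(1 for a, b, gold in reference if sim(a, b) >= threshold and not gold)
    fn = sum(1 for a, b, gold in reference if sim(a, b) < threshold and gold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

population = [random.random() for _ in range(20)]      # random start
for _ in range(100):
    best = max(population, key=f_measure)
    if f_measure(best) == 1.0:                         # full f-Measure: stop
        break
    fittest = sorted(population, key=f_measure, reverse=True)[:10]
    population = [min(1.0, max(0.0,                    # "mutate" the fittest
                  random.choice(fittest) + random.gauss(0, 0.05)))
                  for _ in range(20)]

print("learned threshold:", round(best, 3), "f-measure:", f_measure(best))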

September 29, 2011

Beyond the Triple Count

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 6:38 pm

Beyond the Triple Count by Leigh Dodds.

From the post:

I’ve felt for a while now that the Linked Data community has an unhealthy fascination on triple counts, i.e. on the size of individual datasets.

This was quite natural in the boot-strapping phase of Linked Data in which we were primarily focused on communicating how much data was being gathered. But we’re now beyond that phase and need to start considering a more nuanced discussion around published data.

If you’re a triple store vendor then you definitely want to talk about the volume of data your store can hold. After all, potential users or customers are going to be very interested in how much data could be indexed in your product. Even so, no-one seriously takes a headline figure at face value. As users we’re much more interested in a variety of other factors. For example how long does it take to load my data? Or, how well does a store perform with my usage profile, taking into account my hardware investment? Etc. This is why we have benchmarks, so we can take into account additional factors and more easily compare stores across different environments.

But there’s not nearly enough attention paid to other factors when evaluating a dataset. A triple count alone tells us nothing. They’re not even a good indicator of the number of useful “facts” in a dataset.

Watch Leigh’s presentation (embedded with his post) and read the post.

I think his final paragraph sets the goal for a wide variety of approaches, however we might disagree about how to best get there! 😉

Very much worth your time to read and ponder.

September 24, 2011

SWJ-SoM 2012 : Semantic Web Journal – Special Issue on The Semantics of Microposts

Filed under: Semantic Web,Semantics — Patrick Durusau @ 6:58 pm

SWJ-SoM 2012 : Semantic Web Journal – Special Issue on The Semantics of Microposts

Dates:

Submission Deadline Nov 15, 2011
Notification Due Jan 15, 2012

From the call:

The aim of this special issue is to publish a collection of papers covering the range of topics relevant to the analysis, use and reuse of Micropost data. This should cover a wide scope of work that represents current efforts in the fields collaborating with the Semantic Web community to address the challenges identified for the extraction of semantics in Microposts, and the development of intuitive, effective tools that make use of the rich, collective knowledge. We especially solicit new research in the field that explores the particular challenges due to, and the influence of the mainstream user, as compared to publication and management by technical experts.

Additionally, we encourage revised versions of research papers and practical demonstrations presented at relevant workshops, symposia and conferences, extended to increase depth and review the authors’ own and other relevant work, and take into account also feedback from discussions and panels at such events.

Perhaps starting with “microposts” will allow researchers to work their way up to the semantics of full texts? Personally I am betting on semantics to continue to be the clear winner that refuses to “fit” into various models and categories. We can create useful solutions but that isn’t the same thing as mastering semantics.

September 23, 2011

Facebook and the Semantic Web

Filed under: Linked Data,Semantic Web — Patrick Durusau @ 7:42 am

Jesse Weaver, Ph.D. Student, Patroon Fellow, Tetherless World Constellation, Rensselaer Polytechnic Institute, http://www.cs.rpi.edu/~weavej3/, announces that:

I would like to bring to subscribers’ attention that Facebook now supports RDF with Linked Data URIs from its Graph API. The RDF is in Turtle syntax, and all of the HTTP(S) URIs in the RDF are dereferenceable in accordance with httpRange-14. Please take some time to check it out.

If you have a vanity URL (mine is jesserweaver), you can get RDF about you:

curl -H 'Accept: text/turtle' http://graph.facebook.com/<vanity-name>
curl -H 'Accept: text/turtle' http://graph.facebook.com/jesserweaver

If you don’t have a vanity URL but know your Facebook ID, you can use that instead (which is actually the fundamental method).

curl -H 'Accept: text/turtle' http://graph.facebook.com/<facebook-id>
curl -H 'Accept: text/turtle' http://graph.facebook.com/1340421292

From there, try dereferencing URIs in the Turtle. Have fun!
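If you would rather stay in Python than shell, the same request using the standard library plus rdflib (assuming rdflib is installed and the endpoint still serves Turtle as described):

import urllib.request
from rdflib import Graph

# Fetch Turtle from the Graph API, then load it into an rdflib graph.
request = urllib.request.Request("http://graph.facebook.com/jesserweaver",
                                 headers={"Accept": "text/turtle"})
data = urllib.request.urlopen(request).read()

graph = Graph()
graph.parse(data=data, format="turtle")   # parse the returned Turtle
for s, p, o in graph:                     # then walk or query the triples
    print(s, p, o)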

And I thought everyone had moved to that other service and someone left the lights on at Facebook. 😉

No flames! Just kidding.

September 13, 2011

3rd Canadian Semantic Web Symposium

Filed under: Biomedical,Concept Detection,Ontology,Semantic Web — Patrick Durusau @ 7:17 pm

CSWS2011: Proceedings of the 3rd Canadian Semantic Web Symposium
Vancouver, British Columbia, Canada, August 5, 2011

An interesting set of papers! I suppose I can be forgiven for looking at the text mining (Hassanpour & Das) and heterogeneous information systems (Khan, Doucette, and Cohen) papers first. 😉 More comments to follow on those.

What are your favorite papers in this batch and why?

The whole proceedings can also be downloaded as a single PDF file.

Edited by:

Christopher J. O. Baker *
Helen Chen **
Ebrahim Bagheri ***
Weichang Du ****

* University of New Brunswick, Saint John, NB, Canada, Department of Computer Science & Applied Statistics
** University of Waterloo, Waterloo, ON, Canada, School of Public Health and Health Systems
*** Athabasca University, School of Computing and Information Systems
**** University of New Brunswick, NB, Canada, Faculty of Computer Science

Table of Contents

Full Paper

  1. The Social Semantic Subweb of Virtual Patient Support Groups
    Harold Boley, Omair Shafiq, Derek Smith, Taylor Osmun
  2. Leveraging SADI Semantic Web Services to Exploit Fish Ecotoxicology Data
    Matthew M. Hindle, Alexandre Riazanov, Edward S. Goudreau, Christopher J. Martyniuk, Christopher J. O. Baker

Short Paper

  3. Towards Evaluating the Impact of Semantic Support for Curating the Fungus Scientific Literature
    Marie-Jean Meurs, Caitlin Murphy, Nona Naderi, Ingo Morgenstern, Carolina Cantu, Shary Semarjit, Greg Butler, Justin Powlowski, Adrian Tsang, René Witte
  4. Ontology based Text Mining of Concept Definitions in Biomedical Literature
    Saeed Hassanpour, Amar K. Das
  5. Social and Semantic Computing in Support of Citizen Science
    Joel Sachs, Tim Finin
  6. Unresolved Issues in Ontology Learning
    Amal Zouaq, Dragan Gaševic, Marek Hatala

Poster

  7. Towards Integration of Semantically Enabled Service Families in the Cloud
    Marko Boškovic, Ebrahim Bagheri, Georg Grossmann, Dragan Gaševic, Markus Stumptner
  8. SADI for GMOD: Semantic Web Services for Model Organism Databases
    Ben Vandervalk, Michel Dumontier, E Luke McCarthy, Mark D Wilkinson
  9. An Ontological Approach for Querying Distributed Heterogeneous Information Systems
    Atif Khan, John A. Doucette, Robin Cohen

Please see the CSWS2011 website for further details.

September 11, 2011

New Challenges in Distributed Information Filtering and Retrieval

New Challenges in Distributed Information Filtering and Retrieval

Proceedings of the 5th International Workshop on New Challenges in Distributed Information Filtering and Retrieval
Palermo, Italy, September 17, 2011.

Edited by:

Cristian Lai – CRS4, Loc. Piscina Manna, Building 1 – 09010 Pula (CA), Italy

Giovanni Semeraro – Dept. of Computer Science, University of Bari, Aldo Moro, Via E. Orabona, 4, 70125 Bari, Italy

Eloisa Vargiu – Dept. of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123 Cagliari, Italy

Table of Contents:

  1. Experimenting Text Summarization on Multimodal Aggregation
    Giuliano Armano, Alessandro Giuliani, Alberto Messina, Maurizio Montagnuolo, Eloisa Vargiu
  2. From Tags to Emotions: Ontology-driven Sentimental Analysis in the Social Semantic Web
    Matteo Baldoni, Cristina Baroglio, Viviana Patti, Paolo Rena
  3. A Multi-Agent Decision Support System for Dynamic Supply Chain Organization
    Luca Greco, Liliana Lo Presti, Agnese Augello, Giuseppe Lo Re, Marco La Cascia, Salvatore Gaglio
  4. A Formalism for Temporal Annotation and Reasoning of Complex Events in Natural Language
    Francesco Mele, Antonio Sorgente
  5. Interaction Mining: the new Frontier of Call Center Analytics
    Vincenzo Pallotta, Rodolfo Delmonte, Lammert Vrieling, David Walker
  6. Context-Aware Recommender Systems: A Comparison Of Three Approaches
    Umberto Panniello, Michele Gorgoglione
  7. A Multi-Agent System for Information Semantic Sharing
    Agostino Poggi, Michele Tomaiuolo
  8. Temporal characterization of the requests to Wikipedia
    Antonio J. Reinoso, Jesus M. Gonzalez-Barahona, Rocio Muñoz-Mansilla, Israel Herraiz
  9. From Logical Forms to SPARQL Query with GETARUN
    Rocco Tripodi, Rodolfo Delmonte
  10. ImageHunter: a Novel Tool for Relevance Feedback in Content Based Image Retrieval
    Roberto Tronci, Gabriele Murgia, Maurizio Pili, Luca Piras, Giorgio Giacinto

September 2, 2011

Mining Associations and Patterns from Semantic Data

Filed under: Conferences,Data Mining,Pattern Matching,Pattern Recognition,Semantic Web — Patrick Durusau @ 7:52 pm

The editors of a special issue of the International Journal on Semantic Web and Information Systems on Mining Associations and Patterns from Semantic Data have issued the following call for papers:

Guest editors: Kemafor Anyanwu, Ying Ding, Jie Tang, and Philip Yu

Large amounts of Semantic Data are being generated through semantic extractions from and annotation of traditional Web, social and sensor data. Linked Open Data has provided an excellent vehicle for representation and sharing of such data. The primary vehicle to get semantics useful for better integration, search and decision making is to find interesting relationships or associations, expressed as meaningful paths, subgraphs and patterns. This special issue seeks theories, algorithms and applications of extracting such semantic relationships from large amounts of semantic data. Example topics include:

  • Theories to ground associations and patterns with social, socioeconomic, biological semantics
  • Representation (e.g. language extensions) to express meaningful relationships and patterns
  • Algorithms to efficiently compute and mine semantic associations and patterns
  • Techniques for filtering, ranking and/or visualization of semantic associations and patterns
  • Application of semantic associations and patterns in a domain with significant social or society impact

IJSWIS is included in most major indices including CSI, with Thomson Scientific impact factor 2.345. We seek high quality manuscripts suitable for an archival journal based on original research. If the manuscript is based on a prior workshop or conference submission, submissions should reflect significant novel contribution/extension in conceptual terms and/or scale of implementation and evaluation (authors are highly encouraged to clarify new contributions in a cover letter or within the submission).

Important Dates:
Submission of full papers: Feb 29, 2012
Notification of paper acceptance: May 30, 2012
Publication target: 3Q 2012

Details of the journal, manuscript preparation, and recent articles are available on the website:
http://www.igi-global.com/bookstore/titledetails.aspx?titleid=1092 or http://ijswis.org

Guest Editors: Prof. Kemafor Anyanwu, North Carolina State University
Prof. Ying Ding, Indiana University
Prof. Jie Tang, Tsinghua University
Prof. Philip Yu, University of Illinois, Chicago
Contact Guest Editor: Ying Ding <dingying@indiana.edu>

August 31, 2011

Semantic Web Journal – Vol. 2, Number 2 / 2011

Filed under: OWL,RDF,Semantic Web — Patrick Durusau @ 7:48 pm

Semantic Web Journal – Vol. 2, Number 2 / 2011

Just in case you want to send someone the link to a particular article:

Semantic Web surveys and applications
DOI 10.3233/SW-2011-0047. Authors: Pascal Hitzler and Krzysztof Janowicz

Taking flight with OWL2
DOI 10.3233/SW-2011-0048. Author: Michel Dumontier

Comparison of reasoners for large ontologies in the OWL 2 EL profile
DOI 10.3233/SW-2011-0034. Authors: Kathrin Dentler, Ronald Cornet, Annette ten Teije and Nicolette de Keizer

Approaches to visualising Linked Data: A survey
DOI 10.3233/SW-2011-0037. Authors: Aba-Sah Dadzie and Matthew Rowe

Is Question Answering fit for the Semantic Web?: A survey
DOI 10.3233/SW-2011-0041. Authors: Vanessa Lopez, Victoria Uren, Marta Sabou and Enrico Motta

FactForge: A fast track to the Web of data
DOI 10.3233/SW-2011-0040. Authors: Barry Bishop, Atanas Kiryakov, Damyan Ognyanov, Ivan Peikov, Zdravko Tashev and Ruslan Velkov

August 23, 2011

Chemical Entity Semantic Specification:…(article)

Filed under: Cheminformatics,RDF,Semantic Web — Patrick Durusau @ 6:37 pm

Chemical Entity Semantic Specification: Knowledge representation for efficient semantic cheminformatics and facile data integration by Leonid L Chepelev and Michel Dumontier, Journal of Cheminformatics 2011, 3:20, doi:10.1186/1758-2946-3-20.

Abstract

Background
Over the past several centuries, chemistry has permeated virtually every facet of human lifestyle, enriching fields as diverse as medicine, agriculture, manufacturing, warfare, and electronics, among numerous others. Unfortunately, application-specific, incompatible chemical information formats and representation strategies have emerged as a result of such diverse adoption of chemistry. Although a number of efforts have been dedicated to unifying the computational representation of chemical information, disparities between the various chemical databases still persist and stand in the way of cross-domain, interdisciplinary investigations. Through a common syntax and formal semantics, Semantic Web technology offers the ability to accurately represent, integrate, reason about and query across diverse chemical information.

Results
Here we specify and implement the Chemical Entity Semantic Specification (CHESS) for the representation of polyatomic chemical entities, their substructures, bonds, atoms, and reactions using Semantic Web technologies. CHESS provides means to capture aspects of their corresponding chemical descriptors, connectivity, functional composition, and geometric structure while specifying mechanisms for data provenance. We demonstrate that using our readily extensible specification, it is possible to efficiently integrate multiple disparate chemical data sources, while retaining appropriate correspondence of chemical descriptors, with very little additional effort. We demonstrate the impact of some of our representational decisions on the performance of chemically-aware knowledgebase searching and rudimentary reaction candidate selection. Finally, we provide access to the tools necessary to carry out chemical entity encoding in CHESS, along with a sample knowledgebase.

Conclusions
By harnessing the power of Semantic Web technologies with CHESS, it is possible to provide a means of facile cross-domain chemical knowledge integration with full preservation of data correspondence and provenance. Our representation builds on existing cheminformatics technologies and, by the virtue of RDF specification, remains flexible and amenable to application- and domain-specific annotations without compromising chemical data integration. We conclude that the adoption of a consistent and semantically-enabled chemical specification is imperative for surviving the coming chemical data deluge and supporting systems science research.

Project homepage: Chemical Entity Semantic Specification

August 20, 2011

Linked Data Patterns – New Draft

Filed under: Linked Data,Semantic Web — Patrick Durusau @ 8:04 pm

Linked Data Patterns – New Draft – Leigh Dodds and Ian Davis have released a new draft.

From the website:

A pattern catalogue for modelling, publishing, and consuming Linked Data.

Think of it as Linked Data without all the “put your hand on your computer and feel the power of URI” stuff you hear in some quarters.

For example, the solution for “How do we publish non-global identifiers in RDF?” is:

Create a custom property, as a sub-property of the dc:identifier property, for relating the existing literal key value with the resource.

And the discussion reads:

While hackable URIs are a useful short-cut they don’t address all common circumstances. For example different departments within an organization may have different non-global identifiers for a resource; or the process and format for those identifiers may change over time. The ability to algorithmically derive a URI is useful but limiting in a global sense as knowledge of the algorithm has to be published separately to the data.

By publishing the original “raw” identifier as a literal property of the resource we allow systems to look-up the URI for the associated resource using a simple SPARQL query. If multiple identifiers have been created for a resource, or additional identifiers assigned over time, then these can be added as additional repeated properties.

For systems that may need to bridge between the Linked Data and non-Linked Data views of the world, e.g. integrating with legacy applications and databases that do not store the URI, then the ability to find the identifier for the resource provides a useful integration step.
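The pattern is small enough to run. A minimal rdflib sketch, with the ex: names mine for illustration:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, RDFS

# The pattern above, executed: a custom property declared as a
# sub-property of dc:identifier, then a SPARQL look-up from the legacy
# key back to the URI.
EX = Namespace("http://example.org/")

g = Graph()
g.add((EX["legacyId"], RDFS.subPropertyOf, DC.identifier))
g.add((EX["product/123"], EX["legacyId"], Literal("ABC-123")))

result = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?resource WHERE { ?resource ex:legacyId "ABC-123" }
""")
for row in result:
    print(row.resource)   # -> http://example.org/product/123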

If I aggregate the non-Linked Data identifiers as sub-properties of dc:identifier, isn’t that a useful integration step whether I am using Linked Data or not?

The act of aggregating identifiers is a useful integration step, by whatever syntax. Yes?

My principal disagreement with Linked Data and other “universal” identification systems is that none of them are truly universal or long lasting. Rhetoric to the contrary notwithstanding.

August 13, 2011

Chemical Entity Semantic Specification

Filed under: Cheminformatics,RDF,Semantic Web — Patrick Durusau @ 3:46 pm

Chemical Entity Semantic Specification

From the website:

Chemical Entity Semantic Specification (CHESS) framework strives to provide a means of representing chemical data with the goal of facile chemical information federation and addressing increasingly rich and complex queries for biological, pharmaceutical, and synthetic chemistry applications. The principal emphasis of CHESS is data representation to assist in metabolic fate determination, synthetic pathway construction, and automatic chemical entity classification. With explicit semantic specification of reactions for example, CHESS allows the tracing of the mechanisms of chemical transformations on the level of individual atoms, bonds, functional groups, or molecules, as well as the individual “histories” of elements of chemical entities in a pathway. Further, the CHESS framework draws on CHEMINF and SIO ontologies to provide methods for specifying uncertainty, conformer-specific information, units, and circumstances for physical measurements at variable levels of granularity, permitting rich, cross-domain queries over this data. In addition to this, CHESS provides a set of specifications to address data federation through the adoption of unique, canonical identifiers for many classes of chemical entities.

An interesting project, but it appears to lack uptake.

As of 13 August 2011, I get nine (9) “hits” from a popular search engine on the name as a string.

Useful as a resource for existing ontologies and identification schemes.

August 10, 2011

LOD cloud diagram – Next Version

Filed under: Linked Data,LOD,Semantic Web — Patrick Durusau @ 7:17 pm

Anja Jentsch posted the following call on the public-lod@w3.org list:

We would like to thank you for putting so much effort into curating the CKAN packages for Linked Data sets since our last call.

We have compiled statistics for the 256 data sets[1] on CKAN that will be included in the next LOD Cloud: http://lod-cloud.net/state

Altogether 446 data sets are currently tagged on CKAN as LOD [2]. But the description of many of these data sets is still incomplete, so we cannot find out whether they fulfil the minimal requirements for being included in the LOD cloud diagram (dereferenceable URIs and RDF links to or from other data sources).

A list of data sets that we could not include yet, and an explanation of what is missing, can be found here: http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

Starting next week we will generate the next LOD cloud diagram [3].

Therefore we would like to invite those of you who publish data sets that we could not include yet to please review and update your entries. Please finalize your dataset descriptions by August 15th to ensure that your data set will be part of the LOD Cloud.

In order to aid you in this quest, we have provided a validation page for your CKAN entry with step-by-step guidance for the information needed:
http://www4.wiwiss.fu-berlin.de/lodcloud/ckan/validator/

You can use the CKAN entry for DBpedia as an example:
http://ckan.net/package/dbpedia

Thank you for helping!

Cheers,
Anja, Chris and Richard

[1] http://ckan.net/package/search?q=groups:lodcloud+AND+-tags:lodcloud.unconnected+AND+-tags:lodcloud.needsfixing
[2] http://ckan.net/tag/lod
[3] http://lod-cloud.net/

Just a reminder: today is the 10th of August, so don’t wait to review your entry.

Whatever your approach, we all benefit from cleaner data.
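
On the first of those minimal requirements, dereferenceable URIs, a self-check costs almost nothing. Here is a sketch of one (mine, not part of the LOD cloud tooling), using Python and the requests library to ask for RDF via content negotiation:

    import requests

    def looks_dereferenceable(uri):
        """Request RDF via content negotiation and report what comes back."""
        resp = requests.get(
            uri,
            headers={"Accept": "application/rdf+xml, text/turtle"},
            allow_redirects=True,
            timeout=10,
        )
        ctype = resp.headers.get("Content-Type", "")
        ok = resp.status_code == 200 and ("rdf" in ctype or "turtle" in ctype)
        print(uri, resp.status_code, ctype, "OK" if ok else "no RDF served")
        return ok

    looks_dereferenceable("http://dbpedia.org/resource/Berlin")

A URI that only ever comes back as HTML is a likely candidate for the validator’s fix-it list.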

July 31, 2011

4th International SWAT4LS Workshop

Filed under: Conferences,Semantic Web — Patrick Durusau @ 7:50 pm

4th International SWAT4LS Workshop: Semantic Web Applications and Tools for Life Sciences

December 9th, 2011 London, UK

Important Dates:

  • Expression of interest for tutorials: 10 June 2011
  • Submission opening: 12 September 2011
  • Papers submission deadline: 7 October 2011
  • Posters and demo submission deadline: 31 October 2011
  • Communication of acceptance: 7 November 2011
  • Camera ready: 21 November 2011

From the Call for Papers:

Since 2008, SWAT4LS has provided a platform for the presentation and discussion of the benefits and limits of applying web-based information systems and semantic technologies in Biomedical Informatics and Computational Biology.

Growing steadily each year as Semantic Web applications become more widespread, SWAT4LS has been held in Edinburgh (2008), Amsterdam (2009), and Berlin (2010), with London planned for 2011. The Berlin edition, held on December 10th, 2010, was preceded by two days of tutorials and other associated events.

We are confident that the next edition of SWAT4LS will provide the same open and stimulating environment that brought together researchers, both developers and users, from the various fields of Biology, Bioinformatics and Computer Science, to discuss goals, current limits and real experiences in the use of Semantic Web technologies in Life Sciences.

Proceedings from earlier workshops:

1st International SWAT4LS Workshop (Edinburgh, 2008)

2nd International SWAT4LS Workshop (Amsterdam, 2009) Be aware that selected papers were revised and extended to appear in the Journal of Biomedical Semantics, Volume 2, Supplement 1.

3rd International SWAT4LS Workshop (Berlin, 2010)

Take it as fair warning: there is a lot of interesting material here. Come prepared to stay a while.

July 30, 2011

GDB for the Data Driven Age (STI Summit Position Paper)

Filed under: Graphs,OWL,RDF,Semantic Diversity,Semantic Web — Patrick Durusau @ 9:10 pm

GDB for the Data Driven Age (STI Summit Position Paper) by Orri Erling.

From the post:

The Semantic Technology Institute (STI) is organizing a meeting around the questions of making semantic technology deliver on its promise. We were asked to present a position paper (reproduced below). This is another recap of our position on making graph databasing come of age. While the database technology matters are getting tackled, we are drawing closer to the question of deciding actually what kind of inference will be needed close to the data. My personal wish is to use this summit for clarifying exactly what is needed from the database in order to extract value from the data explosion. We have a good idea of what to do with queries but what is the exact requirement for transformation and alignment of schema and identifiers? What is the actual use case of inference, OWL or other, in this? It is time to get very concrete in terms of applications. We expect a mixed requirement but it is time to look closely at the details.

Interesting post that includes the following observation:

Real-world problems are however harder than just bundling properties, classes, or instances into sets of interchangeable equivalents, which is all we have mentioned thus far. There are differences of modeling (“address as many columns in customer table” vs. “address normalized away under a contact entity”), normalization (“first name” and “last name” as one or more properties; national conventions on person names; tags as comma-separated in a string or as a one-to-many), incomplete data (one customer table has family income bracket, the other does not), diversity in units of measurement (Imperial vs. metric), variability in the definition of units (seven different things all called blood pressure), variability in unit conversions (currency exchange rates), to name a few. What a world!

Yes, quite.
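
To make just one of those differences concrete, take the “first name/last name as one or more properties” case. A sketch (all vocabulary invented for illustration) of a SPARQL CONSTRUCT that aligns the two modelings onto a single target property:

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix crm: <http://example.org/crm#> .
    @prefix erp: <http://example.org/erp#> .

    <http://example.org/people/1> crm:firstName "Ada" ; crm:lastName "Lovelace" .
    <http://example.org/people/2> erp:fullName "Alan Turing" .
    """, format="turtle")

    # Align both modelings onto a single target property, ex:name.
    q = """
    PREFIX crm: <http://example.org/crm#>
    PREFIX erp: <http://example.org/erp#>
    PREFIX ex:  <http://example.org/ns#>

    CONSTRUCT { ?p ex:name ?name }
    WHERE {
      { ?p erp:fullName ?name }
      UNION
      { ?p crm:firstName ?f ; crm:lastName ?l .
        BIND(CONCAT(?f, " ", ?l) AS ?name) }
    }
    """
    for triple in g.query(q):
        print(triple)

That handles one modeling difference; Erling’s point is that real integration faces dozens of these at once, and the mapping knowledge has to live somewhere near the data.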

Worth a very close read.

July 27, 2011

Learning SPARQL

Filed under: RDF,Semantic Web,SPARQL — Patrick Durusau @ 8:35 am

Learning SPARQL by Bob DuCharme.

From the author’s announcement (email):

It’s the only complete book on the W3C standard query language for linked data and the semantic web, and as far as I know the only book at all that covers the full range of SPARQL 1.1 features such as the ability to update data. The book steps you through simple examples that can all be performed with free software, and all sample queries, data, and output are available on the book’s website.

In the words of one reviewer, “It’s excellent—very well organized and written, a completely painless read. I not only feel like I understand SPARQL now, but I have a much better idea why RDF is useful (I was a little skeptical before!)” I’d like to thank everyone who helped in the review process and everyone who offered to help, especially those in the Charlottesville/UVa tech community.

You can follow news about the book and about SPARQL on Twitter at @learningsparql.

Remembering Bob’s “SGML CD,” I ordered a copy (electronic and print) of “Learning SPARQL” as soon as I saw the announcement in my inbox.

More comments to follow.
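
In the meantime, the book’s promise that its examples can all be performed with free software is easy to believe. A first query in that spirit (my sketch, not an example from the book), runnable with nothing but Python and rdflib:

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix ex: <http://example.org/ns#> .

    ex:book ex:title "Learning SPARQL" ;
            ex:author "Bob DuCharme" .
    """, format="turtle")

    q = """
    PREFIX ex: <http://example.org/ns#>
    SELECT ?title ?author WHERE {
        ?book ex:title ?title ;
              ex:author ?author .
    }
    """
    for row in g.query(q):
        print(row.title, row.author)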
