Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 12, 2012

Cross Domain Search by Exploiting Wikipedia

Filed under: Linked Data,Searching,Wikipedia — Patrick Durusau @ 8:04 pm

Cross Domain Search by Exploiting Wikipedia by Chen Liu, Sai Wu, Shouxu Jiang, and Anthony K. H. Tung.

Abstract:

The abundance of Web 2.0 resources in various media formats calls for better resource integration to enrich user experience. This naturally leads to a new cross domain resource search requirement, in which a query is a resource in one modality and the results are closely related resources in other modalities. With cross domain search, we can better exploit existing resources.

Intuitively, tags associated with Web 2.0 resources are a straightforward medium to link resources of different modalities together. However, tagging is by nature an ad hoc activity. Tags often contain noise and are affected by the subjective inclination of the tagger. Consequently, linking resources simply by tags will not be reliable. In this paper, we propose an approach for linking tagged resources to concepts extracted from Wikipedia, which has become a fairly reliable reference over the last few years. Compared to the tags, the concepts are therefore of higher quality. We develop effective methods for cross-modal search based on the concepts associated with resources. Extensive experiments were conducted, and the results show that our solution achieves good performance.

When the authors say “cross domain,” they are referring to different types of resources, say text vs. images or images vs. sound or any of those three vs. some other type of resource. One search can return “related” resources of different resource types.

Although the “cross domain” searching is interesting, I am more interested in the mapping that was performed on Wikipedia. The authors define three semantic relationships:

  • Link between Tag and Concept
  • Correlation of Concepts
  • Semantic Distance

It seems to me that the authors are attacking “big data,” which has unbounded semantics, from the “other” end. That is, they are mapping a finite universe of semantics (Wikipedia) and then using that finite mapping to mine a much larger, unbounded semantic universe.

Or perhaps creating a semantic lens through which to view “related resources” in a much larger semantic universe. And without the overhead of Linked Data, which is mentioned under other work.
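
As a toy illustration of the third relationship above, semantic distance, here is a minimal sketch of a link-overlap distance between two Wikipedia concepts, in the style of Milne and Witten's Wikipedia Link-based Measure. This is not necessarily the formula the authors use, and the inlink sets below are made up:

```python
import math

def link_distance(inlinks_a, inlinks_b, total_articles):
    """Distance between two Wikipedia concepts based on the overlap of the
    articles that link to them (smaller means more closely related)."""
    a, b = len(inlinks_a), len(inlinks_b)
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 1.0  # no shared inlinks: treat as maximally distant
    return (math.log(max(a, b)) - math.log(common)) / (
        math.log(total_articles) - math.log(min(a, b)))

# Toy, made-up inlink sets for two senses of "Jaguar":
jaguar_car = {"Car", "Tata Motors", "Coventry", "Luxury vehicle"}
jaguar_cat = {"Felidae", "Panthera", "Big cat", "Luxury vehicle"}
print(link_distance(jaguar_car, jaguar_cat, total_articles=4_000_000))
```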

February 22, 2012

Meronymy SPARQL Database Server To Debut With Emphasis on High Performance

Filed under: Linked Data,Meronymy,SPARQL — Patrick Durusau @ 4:47 pm

Meronymy SPARQL Database Server To Debut With Emphasis on High Performance

From the post:

Coming in June from start-up Meronymy is a new RDF enterprise database management system, the Meronymy SPARQL Database Server. The company, founded by Inge Henriksen, began life because of the need he saw for a high-performance and more scalable RDF database server.

The idea to focus on a database server exclusively oriented to Linked Data and the Semantic Web came as a result of Henriksen’s work over the last decade as an IT consultant implementing many semantic solutions for customers in sectors such as government and education. “One issue that always came up was performance,” he explains, especially when performing more advanced SPARQL queries against triple stores using filters, for example.

“Once the data reached a certain size, which it often did very quickly, the size of the data became unmanageable and we had to fall back on caching and the like to resolve these performance issues.” The problem there is that caching isn’t compatible with situations where there is a need for real-time data.

A closed beta is due out soon. Register at Meronymy.
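
For readers who have not hit the problem Henriksen describes, the query below is the kind of filtered SPARQL request that tends to get slow as a triple store grows. A minimal sketch using the SPARQLWrapper Python library against a placeholder endpoint:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")  # placeholder endpoint URL
sparql.setQuery("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name WHERE {
        ?person a foaf:Person ;
                foaf:name ?name .
        # string filters like the one below are a common source of slow queries
        FILTER (REGEX(?name, "^Hen", "i"))
    }
    LIMIT 100
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["person"]["value"], row["name"]["value"])
```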

February 15, 2012

From Relational Databases to Linked Data in Epigraphy: Hispania Epigraphica Online

Filed under: Linked Data — Patrick Durusau @ 8:34 pm

From Relational Databases to Linked Data in Epigraphy: Hispania Epigraphica Online by Fernando-Luis Álvarez, Joaquín-L. Gómez-Pantoja and Elena García-Barriocanal.

Abstract:

Epigraphic databases store metadata and digital representations of inscriptions for information purposes, heritage conservation or scientific use. At present, there are several such databases available, but our focus is on those that are part of the EAGLE consortium, which aims to make available the epigraphy from the ancient classical civilization. Right now, the EAGLE partners share a basic data schema and an agreement on workload and responsibilities, but each repository has its own storage structure, data identification system and even its own idea of what an epigraphic database is or should be. Any of these aspects may lead to redundancy and hamper search and linking. This paper describes a system implementation for epigraphic data sharing as linked data. Although the described system was tested on a specific database, i.e. Hispania Epigraphica Online, it could be easily tailored to other systems, enabling the advantage of semantic search on several disparate databases.

Good work but isn’t it true that most approaches, “…could be easily tailored to other systems, enabling the advantage of semantic search over several disparate databases”?

It is the ability to query disparate databases as disparate databases that continues to elude us.

Isn’t that the question that we need to answer? Yes?

OWL: Yet to Arrive on the Web of Data?

Filed under: Linked Data,OWL,Semantic Web — Patrick Durusau @ 8:33 pm

OWL: Yet to Arrive on the Web of Data? by Angela Guess.

From the post:

A new paper is currently available for download entitled OWL: Yet to arrive on the Web of Data? The paper was written by Birte Glimm, Aidan Hogan, Markus Krötzsch, and Axel Polleres. The abstract states, “Seven years on from OWL becoming a W3C recommendation, and two years on from the more recent OWL 2 W3C recommendation, OWL has still experienced only patchy uptake on the Web. Although certain OWL features (like owl:sameAs) are very popular, other features of OWL are largely neglected by publishers in the Linked Data world.”

It continues, “This may suggest that despite the promise of easy implementations and the proposal of tractable profiles suggested in OWL’s second version, there is still no “right” standard fragment for the Linked Data community. In this paper, we (1) analyse uptake of OWL on the Web of Data, (2) gain insights into the OWL fragment that is actually used/usable on the Web, where we arrive at the conclusion that this fragment is likely to be a simplified profile based on OWL RL, (3) propose and discuss such a new fragment, which we call OWL LD (for Linked Data).”

Interesting and perhaps valuable data about the use of RDFS/OWL primitives on the Web.

I find it curious that the authors don’t survey users about what OWL capabilities they would find compelling. It could be that users are interested in and willing to support some subset of OWL that hasn’t been considered by the authors or others.

It might not be the Semantic Web as the authors envision it, but without broad user support, the authors’ Semantic Web will never come to pass.
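
Returning to the data side, here is a rough sketch of the kind of uptake analysis the paper describes: counting how often OWL and RDFS terms actually appear in crawled data. The input file name is hypothetical:

```python
from collections import Counter
from rdflib import Graph
from rdflib.namespace import OWL, RDFS

g = Graph()
g.parse("crawl-sample.nt", format="nt")    # hypothetical N-Triples slice of a crawl

counts = Counter()
for s, p, o in g:
    for term in (p, o):                    # OWL terms show up as predicates
        t = str(term)                      # (owl:sameAs) or objects (owl:Class)
        if t.startswith(str(OWL)) or t.startswith(str(RDFS)):
            counts[t] += 1

for term, n in counts.most_common(10):
    print(n, term)
```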

February 2, 2012

Introducing Hypernotation, an alternative to Linked Data

Filed under: Hypernotation,Linked Data,Semantic Web — Patrick Durusau @ 3:49 pm

Introducing Hypernotation, an alternative to Linked Data

A competing notation to Linked Data:

From the post:

URL, URI, IRI, URIref, CURIE, QName, slash URIs, hash URIs, bnodes, information resources, non-information resources, dereferencability, HTTP 303, redirection, content-negotiation, RDF model, RDF syntax, RDFa core, RDFa lite, Microdata, Turtle, N3, RDF/XML, JSON-LD, RDF/JSON…

Want to publish some data? Well, these are some of the things you will have to learn and understand to do so. Is the concept of data really so hard that you can’t publish it without understanding the concepts of information and non-information resources? Do you really need to deal with the HTTP 303 redirection and a number of different syntaxes? It’s just data, damn it!

Really, how have we got to this?

I did a detailed analysis on the problems of Linked Data, but it seems that I missed the most important thing. It’s not about the Web technologies but about economics. The key Linked Data problem is that it holds a monopoly in the market. One can’t compare it to anything else, and thus one can’t be objective about it. There is no competition, and without competition, there is no real progress. Without competition, it’s possible for many odd ideas to survive, such as requiring people to implement HTTP 303 redirection.

As a competitor to Linked Data, this proposal should lead to a re-examination of many of the decisions that have led to and sustain Linked Data. I say “should,” not that it will lead to such a re-examination. At least not now. Perhaps when the next “universal” semantic syntax comes along.

You may find An example of Hypernotation useful in reading the Hypernotation post.

January 27, 2012

Alarum – World Wide Shortage of Logicians and Ontologists

Filed under: BigData,Linked Data,Logic,Ontology — Patrick Durusau @ 4:32 pm

Did you know there is an alarming shortage of logicians and ontologists around the world? Apparently everywhere, in all countries.

Came as a complete shock/surprise to me.

I was reading ‘Digital Universe’ to Add 1.8 Zettabytes in 2011 by Rich Miller which says:

More than 1.8 zettabytes of information will be created and stored in 2011, according to the latest IDC Digital Universe Study sponsored by EMC. That’s a mind-boggling figure, equivalent to 1.8 trillion gigabytes – enough information to fill 57.5 billion 32GB Apple iPads. It also illustrates the challenge in storing and managing all that data.

But then I remembered the “state of the Semantic Web” report of 31,634,213,770 triples.

I know it is apples and oranges to some degree but compare the figures for data and linked data:

Data 1,800,000,000,000,000,000,000
Triples 31,634,213,770
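
Treating the figures loosely (bytes and triples are not the same unit, as noted), the gap is easy to make concrete:

```python
data_bytes = 1_800_000_000_000_000_000_000   # 1.8 zettabytes
triples = 31_634_213_770
print(f"{data_bytes / triples:.1e}")          # roughly 5.7e10, i.e. tens of
                                              # billions of bytes per existing triple
```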

Not to mention that the semantics of data is constantly evolving. If that seems untrue of business and scientific data, recall that “texting” was unknown little more than a decade ago.

It is clear that we don’t have enough logicians and ontologists (who have yet to agree on a common upper ontology) to keep up with the increasing flow of data. For that matter, the truth is they have been constantly falling behind for centuries. Systems are proposed, cover some data, only to become data that has to be covered by subsequent systems.

Some options to deal with this crisis:

  • Universal Logician/Ontologist Conscription Act – All 18 year olds world wide have to spend 6 years in the LogoOnto Corps. First four years learning the local flavor of linked data and the last two years coding data.
  • Excess data to /dev/null – Pipe all non-Linked data to /dev/null until logicians/ontologists can catch up. Projected to be sometime after 5500, perhaps late 5500’s. (According to Zager and Evans.)
  • ???

There are other options. Propose yours and/or wait for some suggestions here next week!

January 25, 2012

Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine

Filed under: Linked Data,RDF,Search Engines,Semantic Web — Patrick Durusau @ 3:30 pm

Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine by Aidan Hogan, Andreas Harth, Jürgen Umbrich, Sheila Kinsella, Axel Polleres and Stefan Decker.

Abstract:

In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data – loosely also known as Linked Data – which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web – in terms of scale, unreliability, inconsistency and noise – are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.

This is the paper that Ivan Herman mentions at Nice reading on Semantic Search.

It covers a lot of ground in fifty-five (55) pages but it doesn’t take long to hit an issue I wanted to ask you about.

At page 2, Google is described as follows:

In the general case, Google is not suitable for complex information gathering tasks requiring aggregation from multiple indexed documents: for such tasks, users must manually aggregate tidbits of pertinent information from various recommended heterogeneous sites, each such site presenting information in its own formatting and using its own navigation system. In effect, Google’s limitations are predicated on the lack of structure in HTML documents, whose machine interpretability is limited to the use of generic markup-tags mainly concerned with document rendering and linking. Although Google arguably makes the best of the limited structure available in such documents, most of the real content is contained in prose text which is inherently difficult for machines to interpret. Addressing this inherent problem with HTML Web data, the Semantic Web movement provides a stack of technologies for publishing machine-readable data on the Web, the core of the stack being the Resource Description Framework (RDF).

A couple of observations:

Although Google needs no defense from me, I would argue that Google never set itself the task of aggregating information from indexed documents. Historically speaking, IR has always been concerned with returning relevant documents and not returning irrelevant documents.

Second, the lack of structure in HTML documents (although the article mixes in sites with different formatting) is no deterrent to a human reader aggregating “tidbits of pertinent information….” I rather doubt that writing all the documents in valid Springer LaTeX would make that much difference on the “tidbits of pertinent information” score.

This is my first pass through the article and I suspect it will take three or more to become comfortable with it.

Do you agree/disagree that the task of IR is to retrieve documents, not “tidbits of pertinent information?”

Do you agree/disagree that HTML structure (or lack thereof) is that much of an issue for the interpretation of documents?

Thanks!

Nice reading on Semantic Search

Filed under: Linked Data,RDF,Semantic Web — Patrick Durusau @ 3:25 pm

Nice reading on Semantic Search by Ivan Herman.

From the post:

I had a great time reading a paper on Semantic Search[1]. Although the paper is on the details of a specific Semantic Web search engine (DERI’s SWSE), I was reading it as somebody not really familiar with all the intricate details of such a search engine setup and operation (i.e., I would not dare to give an opinion on whether the choice taken by this group is better or worse than the ones taken by the developers of other engines) and wanting to gain a good image of what is happening in general. And, for that purpose, this paper was really interesting and instructive. It is long (cca. 50 pages), i.e., I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

Interested to hear your take on Ivan’s comments on owl:sameAs.

The semantics of words, terms, and ontology classes are not stable over time and/or across users. If you doubt that statement, leaf through the Oxford English Dictionary for ten (10) minutes.

Moreover, the only semantics we “see” in words, terms or ontology classes are those we assign them. We can discuss the semantics of Hebrew words in the Dead Sea Scrolls but those are our semantics, not those of the original users of those words. May be close to what they meant, may not. Can’t say for sure because we can’t ask and would lack the context to understand the answer if we could.

Adding more terms to use as supplements to owl:sameAs just increases the chances for variation. And error if anyone is going to enforce their vision of broadMatch on usages of that term by others.

January 24, 2012

LDIF – Linked Data Integration Framework (0.4)

Filed under: Hadoop,Heterogeneous Data,LDIF,Linked Data — Patrick Durusau @ 3:43 pm

LDIF – Linked Data Integration Framework (0.4)

Version 0.4 News:

Up till now, LDIF stored data purely in-memory which restricted the amount of data that could be processed. Version 0.4 provides two alternative implementations of the LDIF runtime environment which allow LDIF to scale to large data sets: 1. The new triple store backed implementation scales to larger data sets on a single machine with lower memory consumption at the expense of processing time. 2. The new Hadoop-based implementation provides for processing very large data sets on a Hadoop cluster, for instance within Amazon EC2. A comparison of the performance of all three implementations of the runtime environment is found on the LDIF benchmark page.

From the “About LDIF:”

The Web of Linked Data grows rapidly and contains data from a wide range of different domains, including life science data, geographic data, government data, library and media data, as well as cross-domain data sets such as DBpedia or Freebase. Linked Data applications that want to consume data from this global data space face the challenges that:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

This usage of different vocabularies as well as the usage of URI aliases makes it very cumbersome for an application developer to write SPARQL queries against Web data which originates from multiple sources. In order to ease using Web data in the application context, it is thus advisable to translate data to a single target vocabulary (vocabulary mapping) and to replace URI aliases with a single target URI on the client side (identity resolution), before starting to ask SPARQL queries against the data.

Up till now, there have not been any integrated tools that help application developers with these tasks. With LDIF, we try to fill this gap and provide an open-source Linked Data Integration Framework that can be used by Linked Data applications to translate Web data and normalize URIs while keeping track of data provenance.
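
Not LDIF itself, but a minimal rdflib sketch of the second task it automates, replacing URI aliases with a single target URI on the client side. The URIs and the input file name are illustrative only:

```python
from rdflib import Graph, URIRef

g = Graph()
g.parse("web-data.nt", format="nt")   # hypothetical dump of gathered Web data

# alias -> canonical URI table, however obtained (owl:sameAs links,
# matching heuristics, a hand-built mapping, ...)
canonical = {
    URIRef("http://dbpedia.org/resource/Berlin"):
        URIRef("http://example.org/id/berlin"),
    URIRef("http://sws.geonames.org/2950159/"):
        URIRef("http://example.org/id/berlin"),
}

def resolve(term):
    return canonical.get(term, term)

# rewrite every triple so that aliases collapse onto the target URI
normalized = Graph()
for s, p, o in g:
    normalized.add((resolve(s), p, resolve(o)))
```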

With the addition of Hadoop-based processing, it is definitely worth your time to download and see what you think of it.

It is ironic that the problems it solves:

  1. data sources use a wide range of different RDF vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified with different URIs within different data sources.

already existed, prior to Linked Data, as:

  1. data sources use a wide range of different vocabularies to represent data about the same type of entity.
  2. the same real-world entity, for instance a person or a place, is identified differently within different data sources.

So the Linked Data drill is to convert data, which already has these problems, into Linked Data, which will still have these problems, and then solve the problem of differing identifications.

Yes?

Did I miss a step?

January 20, 2012

Semantic Tech the Key to Finding Meaning in the Media

Filed under: Ambiguity,Linked Data — Patrick Durusau @ 9:21 pm

Semantic Tech the Key to Finding Meaning in the Media by Chris Lamb.

Chris starts off well enough:

News volume has moved from infoscarcity to infobesity. For the last hundred years, news in print was delivered in a container, called a newspaper, periodically, typically every twenty-four hours. The container constrained the product. The biggest constraints of the old paradigm were periodic delivery and limitations of column inches.

Now information continually bursts through our Google Readers, our cell phones, our tablets, display screens in elevators and grocery stores. Do we really need to read all 88,731 articles on the Bernie Madoff trial? Probably not. And that’s the dilemma for news organizations.

In the old metaphor, column-inches was the constraint. In the new metaphor, reader attention span becomes the constraint.

But, then quickly starts to fade:

Disambiguation is a technique to uniquely identify named entities: people, cities, and subjects. Disambiguation can identify that one article is about George Herbert Walker Bush, the 41st President of the US, and another article is about George Walker Bush, number 43. Similarly, the technology can distinguish between Lincoln Continental, the car, and Lincoln, Nebraska, the town. As part of the metadata, many tagging engines that disambiguate return unique identifiers called Uniform Resource Identifiers (URI). A URI is a pointer into a database.

If tagging creates machine readable assets, disambiguation is the connective tissue between these assets. Leveraging tagging and disambiguation technologies, applications can now connect content with very disparate origins. Today’s article on George W. Bush can be automatically linked to an article he wrote when he owned the Texas Rangers baseball team. Similarly the online bio of Bill Gates can be automatically tied to his online New Mexico arrest record in April 1975.

Apparently he didn’t read the paper The communicative function of ambiguity in language.

The problem with disambiguation is that you and I may well set up systems that disambiguate named entities differently. To be sure, we will get some of them the same, but the question becomes: which ones? Is 80% agreement enough?

Depends on the application, doesn’t it? What if we are looking for a terrorist who may have fissionable material? Does 80% look good enough?

Ironic. Disambiguation is subject to the same ambiguity it set out to solve.

PS: URIs aren’t necessarily pointers into databases.

January 19, 2012

RDF silos

Filed under: Linked Data,RDF — Patrick Durusau @ 7:35 pm

Bibliographic Framework: RDF and Linked Data

Karen Coyle writes:

With the newly developed enthusiasm for RDF as the basis for library bibliographic data we are seeing a number of efforts to transform library data into this modern, web-friendly format. This is a positive development in many ways, but we need to be careful to make this transition cleanly without bringing along baggage from our past.

Recent efforts have focused on translating library record formats into RDF with the result that we now have:
    ISBD in RDF
    FRBR in RDF
    RDA in RDF

and will soon have
    MODS in RDF

In addition there are various applications that convert MARC21 to RDF, although none is “official.” That is, none has been endorsed by an appropriate standards body.

Each of these efforts takes a single library standard and, using RDF as its underlying technology, creates a full metadata schema that defines each element of the standard in RDF. The result is that we now have a series of RDF silos, each defining data elements as if they belong uniquely to that standard. We have, for example, at least four different declarations of “place of publication”: in ISBD, RDA, FRBR and MODS, each with its own URI. There are some differences between them (e.g. RDA separates place of publication, manufacture, production while ISBD does not) but clearly they should descend from a common ancestor:
(emphasis added)
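
One minimal way to express Karen's "common ancestor" point is to declare each silo property a sub-property of a shared one. The URIs below are placeholders, not the real ISBD/RDA/FRBR/MODS property URIs:

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/common/")       # hypothetical shared vocabulary
silo_props = [
    URIRef("http://example.org/isbd/placeOfPublication"),
    URIRef("http://example.org/rda/placeOfPublication"),
    URIRef("http://example.org/frbr/placeOfPublication"),
    URIRef("http://example.org/mods/placeOfPublication"),
]

g = Graph()
for prop in silo_props:
    # each silo-specific property descends from one common ancestor property
    g.add((prop, RDFS.subPropertyOf, EX.placeOfPublication))

print(g.serialize(format="turtle"))
```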

Karen makes a very convincing argument about RDF silos and libraries.

I am less certain about her prescription that libraries concentrate on creating data and build records for that data separately.

In part because there aren’t any systems where data exists separate from either an implied or explicit structure to access it. And those structures are just as much “data” as the “data” they enclose. We may not often think of it that way but shortcomings on our part don’t alter our data and the “data” that encloses it.

January 6, 2012

I-CHALLENGE 2012 : Linked Data Cup

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 11:39 am

I-CHALLENGE 2012 : Linked Data Cup

Dates:

When Sep 5, 2012 – Sep 7, 2012
Where Graz, Austria
Submission Deadline Apr 2, 2012
Notification Due May 7, 2012
Final Version Due Jun 4, 2012

From the call for submissions:

The yearly organised Linked Data Cup (formerly Triplification Challenge) awards prizes to the most promising innovation involving linked data. Four different technological topics are addressed: triplification, interlinking, cleansing, and application mash-ups. The Linked Data Cup invites scientists and practitioners to submit novel and innovative (5 star) linked data sets and applications built on linked data technology.

Although more and more data is triplified and published as RDF and linked data, the question arises how to evaluate the usefulness of such approaches. The Linked Data Cup therefore requires all submissions to include a concrete use case and problem statement alongside a solution (triplified data set, interlinking/cleansing approach, linked data application) that showcases the usefulness of linked data. Submissions that can provide measurable benefits of employing linked data over traditional methods are preferred.
Note that the call is not limited to any domain or target group. We accept submissions ranging from value-added business intelligence use cases to scientific networks to the longest tail of information domains. The only strict requirement is that the employment of linked data is very well motivated and also justified (i.e. we rank approaches higher that provide solutions, which could not have been realised without linked data, even if they lack technical or scientific brilliance). (emphasis added)

I don’t know what the submissions are going to look like but the conference organizers should get high marks for academic honesty. I don’t think I have ever seen anyone say:

we rank approaches higher that provide solutions, which could not have been realised without linked data, even if they lack technical or scientific brilliance

We have all seen challenges with qualifying requirements but I don’t recall any that would privilege lesser work because of a greater dependence on a requirement. Or at least that would publicly claim that was the contest policy. Have there been complaints from technically or scientifically brilliant approaches about judging in the past?

Will have to watch the submissions and results to see if technically or scientifically brilliant approaches get passed over in favor of lesser approaches. If so, that will be a signal to all first-rate competitors to seek recognition elsewhere.

December 29, 2011

Linked Data Paradigm Can Fuel Linked Cities

Filed under: Linked Data,LOD,Search Requirements,Searching — Patrick Durusau @ 9:16 pm

Linked Data Paradigm Can Fuel Linked Cities

The small city of Cluj in Romania, of some half-million inhabitants, is responsible for a 2.5 million triple store, as part of a Recognos-led project to develop a “Linked City” community portal. The project was submitted for this year’s ICT Call – SME initiative on Digital Content and Languages, FP7-ICT-2011-SME-DCL. While it didn’t receive funding from that competition, Recognos semantic web researcher Dia Miron, is hopeful of securing help from alternate sources in the coming year to expand the project, including potentially bringing the concept of linked cities to other communities in Romania or elsewhere in Europe.

The idea was to publish information from sources such as local businesses about their services and products, as well as data related to the local government and city events, points of interest and projects, using the Linked Data paradigm, says Miron. Data would also be geolocated. “So we take all the information we can get about a city so that people can exploit it in a uniform manner,” she says.

The first step was to gather the data and publish it in a standard format using RDF and OWL; the next phase, which hasn’t taken place yet (it’s funding-dependent), is to build exportation channels for the data. “First we wanted a simple query engine that will exploit the data, and then we wanted to build a faceted search mechanism for those who don’t know the data structure to exploit and navigate through the data,” she says. “We wanted to make it easier for someone not very acquainted with the models. Then we wanted also to provide some kind of SMS querying because people may not always be at their desks. And also the final query service was an augmented reality application to be used to explore the city or to navigate through the city to points of interest or business locations.”

Local Cluj authorities don’t have the budgets to support the continuation of the project on their own, but Miron says the applications will be very generic and can easily be transferred to support other cities, if they’re interested in helping to support the effort. Other collaborators on the project include Ontotext and STI Innsbruck, as well as the local Cluj council.

I don’t doubt this would be useful information for users but is this the delivery model that is going to work for users, assuming it is funded? Here or elsewhere?

How hard do users work with searches? See Keyword and Search Engines Statistics to get an idea by country.

Some users can be trained to perform fairly complex searches, but I suspect they are a distinct minority. And the types of searches that need to be performed vary by domain.

For example, earlier today, I was searching for information on “spectral graph theory,” which I suspect has different requirements than searching for 24-hour sushi bars within a given geographic area.

I am not sure how to isolate those different requirements, much less test how close any approach is to satisfying them, but I do think both areas merit serious investigation.

December 9, 2011

British Museum Semantic Web Collection Online

Filed under: British Museum,Linked Data,SPARQL — Patrick Durusau @ 8:24 pm

British Museum Semantic Web Collection Online

From the webpage:

Welcome to this Linked Data and SPARQL service. It provides access to the same collection data available through the Museum’s web presented Collection Online, but in a computer readable format. The use of the W3C open data standard, RDF, allows the Museum’s collection data to join and relate to a growing body of linked data published by other organisations around the world interested in promoting accessibility and collaboration.

The data has also been organised using the CIDOC-CRM (Conceptual Reference Model) crucial for harmonising with other cultural heritage data. The current version is beta and development work continues to improve the service. We hope that the service will be used by the community to develop friendly web applications that are freely available to the community.

Please use the SPARQL menu item to use the SPARQL user interface or click here.

With both the British National Bibliography and the British Museum accessible via SPARQL, and Bob DuCharme’s Learning SPARQL book, the excuses for not knowing SPARQL cold are few and far between.
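
As a starting point, a first query against the service might look like the sketch below. Both the endpoint URL and the CIDOC-CRM class/property used are assumptions for illustration; check the Museum's SPARQL page for the current address and model documentation:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://collection.britishmuseum.org/sparql")  # assumed endpoint
sparql.setQuery("""
    PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
    SELECT ?object ?note WHERE {
        ?object a crm:E22_Man-Made_Object ;
                crm:P3_has_note ?note .
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["object"]["value"], row["note"]["value"][:60])
```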

December 7, 2011

USEWOD 2012 Data Challenge

Filed under: Contest,Linked Data,RDF,Semantic Web,Semantics — Patrick Durusau @ 8:08 pm

USEWOD 2012 Data Challenge

From the website:

The USEWOD 2012 Data Challenge invites research and applications built on the basis of USEWOD 2012 Dataset.

Accepted submissions will be presented at USEWOD2012, where a winner will be chosen. Examples of analyses and research that could be done with the dataset are the following (but not limited to those):

  • correlations between linked data requests and real-world events
  • types of structured queries
  • linked data access vs. conventional access
  • analysis of user agents visiting the sites
  • geographical analysis of requests
  • detection and visualisation of trends
  • correlations between site traffic and available datasets
  • etc. – let your imagination run wild!

USEWOD 2012 Dataset

The USEWOD dataset consists of server logs from two major web servers publishing datasets on the Web of linked data. In particular, the dataset contains logs from:

  • DBPedia: slices of log data spanning several months from the linked data twin of Wikipedia, one of the focal points of the Web of data. The logs were kindly made available to us for the challenge by OpenLink Software! Further details about this part of the dataset to follow.
  • SWDF: Semantic Web Dog Food is a constantly growing dataset of publications, people and organisations in the Web and Semantic Web area, covering several of the major conferences and workshops, including WWW, ISWC and ESWC. The logs contain two years of requests to the server from about 12/2008 until 12/2010.
  • Linked Open Geo Data: a dataset about geographical data.
  • Bio2RDF: Linked Data for life sciences.

Data sets are still under construction. Organizers advise that data sets should be available next week.

Your results should be reported as short papers and are due by 15 February 2012.
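
As a sketch of the "linked data access vs. conventional access" analysis suggested above, assuming the logs arrive in the usual Apache combined format; the file name and URL path prefixes are hypothetical:

```python
import re
from collections import Counter

# pull the request path out of an Apache combined-format log line
request_re = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

kinds = Counter()
with open("access.log") as log:                     # hypothetical log file
    for line in log:
        m = request_re.search(line)
        if not m:
            continue
        path = m.group(1)
        if path.startswith("/sparql"):
            kinds["structured query"] += 1
        elif path.startswith(("/resource/", "/data/", "/page/")):
            kinds["linked data lookup"] += 1
        else:
            kinds["other"] += 1

print(kinds)
```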

December 3, 2011

How to Execute the Research Paper

Filed under: Annotation,Biomedical,Dynamic Updating,Linked Data,RDF — Patrick Durusau @ 8:21 pm

How to Execute the Research Paper by Anita de Waard.

I had to create the category, “dynamic updating,” to at least partially capture what Anita describes in this presentation. I would have loved to be present to see it in person!

The gist of the presentation is that we need to create mechanisms to support research papers being dynamically linked to the literature and other resources. One example that Anita uses is linking a patient’s medical records to reports in literature with professional tools for the diagnostician.

It isn’t clear how Linked Data (no matter how generously described by Jeni Tennison) could be the second technology for making research papers linked to other data. In part because as Jeni points out, URIs are simply more names for some subject. We don’t know if that name is for the resource or something the resource represents. Makes reliable linking rather difficult.

BTW, the web lost its ability to grow in a “gradual and sustainable way” when RDF/Linked Data introduced the notion that URIs cannot be allowed to fail. If you try to reason based on something that fails, the reasoner falls on its side. Not nearly as robust as allowing semantic 404’s.

Anita’s third step, an integrated workflow, is certainly the goal toward which we should be striving. I am less convinced that the mechanisms, such as generating linked data stores in addition to the documents we already have, are the way forward. For documents, for instance, why do we need to repeat data they already possess? Why can’t documents represent their contents themselves? Oh, because that isn’t how Linked Data/RDF stores work.

Still, I would highly recommend this slide deck and that you catch any presentation by Anita that you can.

November 19, 2011

A Look Into Linking Government Data

Filed under: Linked Data,LOD — Patrick Durusau @ 10:24 pm

A Look Into Linking Government Data

From the post:

Due out next month from Springer Publishing is Linking Government Data, a book that highlights some of the leading-edge applications of Linked Data to problems of government operations and transparency. David Wood, CTO of 3RoundStones and co-chair of the W3C RDF Working Group, writes and edits the volume, which includes contributions from others exploring the intersection of government and the Semantic Web.

….

Some of the hot spots for this are down under, in Australia and New Zealand. The U.K., of course, also has done extremely well, with the data.gov.uk portal an acknowledged leader in LOD efforts – and certainly comfortably ahead of the U.S. data.gov site.

He also thinks it’s worth noting that, just because you might not see a country openly publishing its data as Linked Data, it doesn’t mean that it’s not there. Often someone, somewhere – even if it’s just at one government agency — is using Linked Data principles, experimentally or in real projects. “Like commercial organizations, governments often use them internally and not publish externally,” he notes. “The spectrum of adoption can be individual or a trans-government mandate or everything in between.”

OK, but you would think if there were some major adoption, it would be mentioned in a post promoting the book. Australia, New Zealand and Nixon’s “Silent Majority” in the U.S. are using linked data. Can’t see them but they are there. That’s like RIAA music piracy estimates, just sheer fiction for all but true believers.

But as far as the U.S.A., the rhetoric shifts from tangible benefit to “can lead to,” “can save money,” etc.:

The economics of the Linked Data approach, Wood says, show unambiguous benefit. Think of how it can save money in the U.S. on current expenditures for data warehousing. And think of the time- and productivity-savings, for example, of having government information freely available on the web in a standard format in a way that can be reused and recombined with other data. In the U.S., “government employees wouldn’t have to divert their daily work to answer Freedom of Information requests because the information is proactively published,” he says. It can lead to better policy decisions because government researchers wouldn’t have to spend enormous amounts of time trying to integrate data from multiple agencies in varying formats to combine it and find connections between, for example, places where people live and certain kinds of health problems that may be prevalent there.

And democracy and society also profit when it’s easy for citizens to access published information on where the government is spending their money, or when it’s easy for scientists and researchers to get data the government collects around scientific efforts so that it can be reused for purposes not originally envisioned.

“Unambiguous benefit” means that we have two systems, one using practice Y and the other using practice X, and when they are compared (assuming the systems are comparable), there is a clear difference of Z% on some measurable metric that can be attributed to the different practices.

Yes?

Personally I think linked data can be beneficial but that is subject to measurement and demonstration in some particular context.

As soon as this work is released, I would appreciate pointers to unambiguous benefit shown by comparison of agencies in the U.S.A. doing comparable work with some metric that makes that demonstration. But that has to be more than speculation or “can.”

October 7, 2011

LDIF – Linked Data Integration Framework Version 0.3.

Filed under: Data Integration,Linked Data,LOD — Patrick Durusau @ 6:17 pm

LDIF – Linked Data Integration Framework Version 0.3

From the email announcement:

The LDIF – Linked Data Integration Framework can be used within Linked Data applications to translate heterogeneous data from the Web of Linked Data into a clean local target representation while keeping track of data provenance. LDIF provides an expressive mapping language for translating data from the various vocabularies that are used on the Web into a consistent, local target vocabulary. LDIF includes an identity resolution component which discovers URI aliases in the input data and replaces them with a single target URI based on user-provided matching heuristics. For provenance tracking, the LDIF framework employs the Named Graphs data model.

Compared to the previous release 0.2, the new LDIF release provides:

  • data access modules for gathering data from the Web via file download, crawling and accessing SPARQL endpoints. Web data is cached locally for further processing.
  • a scheduler for launching data import and integration jobs as well as for regularly updating the local cache with data from remote sources.
  • a second use case that shows how LDIF is used to gather and integrate data from several music-related Web data sources.

More information about LDIF, concrete usage examples and performance details are available at http://www4.wiwiss.fu-berlin.de/bizer/ldif/

Over the next months, we plan to extend LDIF along the following lines:

  1. Implement a Hadoop Version of the Runtime Environment in order to be able to scale to really large amounts of input data. Processes and data will be distributed over a cluster of machines.
  2. Add a Data Quality Evaluation and Data Fusion Module which allows Web data to be filtered according to different data quality assessment policies and provides for fusing Web data according to different conflict resolution methods.

Uses SILK (SILK – Link Discovery Framework Version 2.5) identity resolution semantics.

Uberblic

Filed under: Linked Data,Semantic Diversity — Patrick Durusau @ 6:17 pm

Uberblic

From the about documentation:

The Doppelganger service translates between IDs of entities in third party APIs. When you query Doppelganger with an entity ID, you’ll get back IDs of that same entity in other APIs. In addition, a persistent Uberblic ID serves as an anchor for your application that you can use for subsequent queries.

The question “So why link APIs?” is answered in a blog entry:

There is an ever-increasing amount of data available on the Web via APIs, waiting to be integrated by product developers. But actually integrating more than just one API into a product poses a problem to developers and their product managers: how do we make the data sources interoperable, both with one another and with our existing databases? Uberblic launches a service today to make that easy.

A location based product, for example, would aim to pull in information like checkins from Foursquare, reviews from Lonely Planet, concerts from LastFM and social connections from Facebook, and display all that along one place’s description. To do that, one would need to identify this particular place in all the APIs – identify the place’s ‘doppelgangers’, if you will. Uberblic does exactly that, mapping doppelgangers across APIs, as a web service. It’s like a dictionary for IDs, the Rosetta Stone of APIs. And geolocation is just the beginning.

Uberblic’s doppelganger engine links data across a variety of data APIs. By matching equivalent records, the engine connects an entity graph that spans APIs and data services. This entity graph provides rich contextual data for product developers, and Uberblic’s APIs serve as a switchboard and broker between data sources.

See the full post at: http://uberblic.com/2011/08/one-api-to-link-them-all/

Useful. But as you have already noticed, no associations, no types, no way to map to other identifiers.

Not that a topic map could not use Uberblic data if available; it is just not all that is possible.
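
Since the Doppelganger API itself is not documented in the post, here is only a conceptual sketch of the "Rosetta Stone of IDs" idea, with entirely made-up identifiers: one anchor ID per real-world entity, mapped to that entity's IDs in third-party APIs, and nothing more.

```python
DOPPELGANGERS = {
    "uberblic:place/42": {                          # hypothetical anchor ID
        "foursquare": "4b5461f2f964a520c7ce27e3",   # made-up third-party IDs
        "lastfm": "Berghain",
        "facebook": "113744741989167",
    },
}

def lookup(anchor_id, target_api):
    """Return the entity's ID in the target API, if a doppelganger is known."""
    return DOPPELGANGERS.get(anchor_id, {}).get(target_api)

print(lookup("uberblic:place/42", "foursquare"))
```

A flat ID cross-walk like this carries no associations, no types and no record of how the mapping was justified, which is exactly the limitation noted above.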

October 4, 2011

Efficient Multidimensional Blocking for Link Discovery without losing Recall

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 7:57 pm

Efficient Multidimensional Blocking for Link Discovery without losing Recall

Jack Park did due diligence on the SILK materials before I did and forwarded a link to this paper.

Abstract:

Over the last three years, an increasing number of data providers have started to publish structured data according to the Linked Data principles on the Web. The resulting Web of Data currently consists of over 28 billion RDF triples. As the Web of Data grows, there is an increasing need for link discovery tools which scale to very large datasets. In record linkage, many partitioning methods have been proposed which substantially reduce the number of required entity comparisons. Unfortunately, most of these methods either lead to a decrease in recall or only work on metric spaces. We propose a novel blocking method called MultiBlock which uses a multidimensional index in which similar objects are located near each other. In each dimension the entities are indexed by a different property increasing the efficiency of the index significantly. In addition, it guarantees that no false dismissals can occur. Our approach works on complex link specifications which aggregate several different similarity measures. MultiBlock has been implemented as part of the Silk Link Discovery Framework. The evaluation shows a speedup factor of several 100 for large datasets compared to the full evaluation without losing recall.

From deeper in the paper:

If the similarity between two entities exceeds a threshold $\theta$, a link between these two entities is generated. $sim$ is computed by evaluating a link specification $s$ (in record linkage typically called linkage decision rule [23]) which specifies the conditions two entities must fulfill in order to be interlinked.

If I am reading this paper correctly, there isn’t a requirement (as in record linkage) that we normalize the data to a common format before writing the rule for comparisons. That in and of itself is a major boon. To say nothing of the other contributions of this paper.
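
Not MultiBlock itself, just a toy sketch of the two ingredients the quoted passage names: a linkage rule that links two entities when their similarity reaches a threshold theta, plus a (one-dimensional) blocking key so that only entities in the same block are ever compared:

```python
from collections import defaultdict
from difflib import SequenceMatcher

def sim(a, b):
    """Aggregate similarity; here just a string ratio over the label."""
    return SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()

def block_key(e):
    """Blocking on the label's first letter (MultiBlock indexes on several
    properties at once; this sketch uses only one)."""
    return e["label"][:1].lower()

def link(source, target, theta=0.82):
    blocks = defaultdict(list)
    for t in target:
        blocks[block_key(t)].append(t)
    for s in source:
        for t in blocks[block_key(s)]:          # compare within the block only
            if sim(s, t) >= theta:              # linkage rule: link iff sim >= theta
                yield s["uri"], t["uri"]

src = [{"uri": "a:1", "label": "Berlin"}]
tgt = [{"uri": "b:7", "label": "Berlín"}, {"uri": "b:9", "label": "Bern"}]
print(list(link(src, tgt)))                     # [('a:1', 'b:7')]
```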

SILK – Link Discovery Framework Version 2.5 released

Filed under: Linked Data,LOD,RDF,Semantic Web,SPARQL — Patrick Durusau @ 7:54 pm

SILK – Link Discovery Framework Version 2.5 released

I was quite excited to see under “New Data Transformations”…”Merge Values of different inputs.”

But the documentation for Transformation must be lagging behind or I have a different understanding of what it means to “Merge Values of different inputs.”

Perhaps I should ask: What does SILK mean by “Merge Values of different inputs?”

Picking out an issue that is of particular interest to me is not meant to be a negative comment on the project. An impressive bit of work for any EU funded project.

Another question: Has anyone looked at the Silk – Link Specification Language (Silk-LSL) as an input into declaring equivalence/processing for arbitrary data objects? Just curious.

Robert Isele posted this announcement about SILK on October 3, 2011:

we are happy to announce version 2.5 of the Silk Link Discovery Framework for the Web of Data.

The Silk framework is a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web. Using the declarative Silk – Link Specification Language (Silk-LSL), developers can specify the linkage rules data items must fulfill in order to be interlinked. These linkage rules may combine various similarity metrics and can take the graph around a data item into account, which is addressed using an RDF path language.

Linkage rules can either be written manually or developed using the Silk Workbench. The Silk Workbench is a web application which guides the user through the process of interlinking different data sources.

Version 2.5 includes the following additions to the last major release 2.4:

(1) Silk Workbench now includes a function to learn linkage rules from the reference links. The learning function is based on genetic programming and capable of learning complex linkage rules. Similar to a genetic algorithm, genetic programming starts with a randomly created population of linkage rules. From that starting point, the algorithm iteratively transforms the population into a population with better linkage rules by applying a number of genetic operators. As soon as either a linkage rule with a full f-Measure has been found or a specified maximum number of iterations is reached, the algorithm stops and the user can select a linkage rule.

(2) A new sampling tab allows for fast creation of the reference link set. It can be used to bootstrap the learning algorithm by generating a number of links which are then rated by the user either as correct or incorrect. In this way positive and negative reference links are defined which in turn can be used to learn a linkage rule. If a previous learning run has already been executed, the sampling tries to generate links which contain features which are not yet covered by the current reference link set.

(2) The new help sidebar provides the user with a general description of the current tab as well as with suggestions for the next steps in the linking process. As new users are usually not familiar with the steps involved in interlinking two data sources, the help sidebar currently provides basic guidance to the user and will be extended in future versions.

(3) Introducing per-comparison thresholds:

  • On popular request, thresholds can now be specified on each comparison.
  • Backwards-compatible: Link specifications using a global threshold can still be executed.

(4) New distance measures:

  • Jaccard Similarity
  • Dice’s coefficient
  • DateTime Similarity
  • Tokenwise Similarity, contributed by Florian Kleedorfer, Research Studios Austria

(5) New data transformations:

  • RemoveEmptyValues
  • Tokenizer
  • Merge Values of multiple inputs

(6) New DataSources and Outputs

  • In addition to reading from SPARQL endpoints, Silk now also supports reading from RDF dumps in all common formats. Currently the data set is held in memory and it is not available in the Workbench yet, but future versions will improve this.
  • New SPARQL/Update Output: In addition to writing the links to a file, Silk now also supports writing directly to a triple store using SPARQL/Update.

(7) Various improvements and bugfixes

———————————————————————————

More information about the Silk Link Discovery Framework is available at:

http://www4.wiwiss.fu-berlin.de/bizer/silk/

The Silk framework is provided under the terms of the Apache License, Version 2.0 and can be downloaded from:

http://www4.wiwiss.fu-berlin.de/bizer/silk/releases/

The development of Silk was supported by Vulcan Inc. as part of its Project Halo (www.projecthalo.com) and by the EU FP7 project LOD2-Creating Knowledge out of Interlinked Data (http://lod2.eu/, Ref. No. 257943).

Thanks to Christian Becker, Michal Murawicki and Andrea Matteini for contributing to the Silk Workbench.
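
As a quick illustration of two of the new distance measures listed under (4), here is what Jaccard and Dice similarity look like on token sets; a sketch of the general idea, not Silk's own implementation:

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def dice(a, b):
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

x = "the british museum london".split()
y = "british museum of london".split()
print(jaccard(x, y), dice(x, y))   # 0.6 0.75
```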

October 1, 2011

Linked Data for Education and Technology-Enhanced Learning (TEL)

Filed under: Education,Linked Data,LOD — Patrick Durusau @ 8:27 pm

Linked Data for Education and Technology-Enhanced Learning (TEL)

From the website:

Interactive Learning Environments special issue on Linked Data for Education and Technology-Enhanced Learning (TEL)

IMPORTANT DATES
================

  • 30 November 2011: Paper submission deadline (11:59pm Hawaiian time)
  • 30 March 2012: Notification of first review round
  • 30 April 2012: Submission of major revisions
  • 15 July 2012: Notification of major revision reviews
  • 15 August 2012: Submission of minor revisions
  • 30 August 2012: Notification of acceptance
  • late 2012 : Publication

OVERVIEW
=========

While sharing of open learning and educational resources on the Web became common practice throughout the last years a large amount of research was dedicated to interoperability between educational repositories based on semantic technologies. However, although the Semantic Web has seen large-scale success in its recent incarnation as a Web of Linked Data, there is still only little adoption of the successful Linked Data principles in the domains of education and technology-enhanced learning (TEL). This special issue builds on the fundamental belief that the Linked Data approach has the potential to fulfill the TEL vision of Web-scale interoperability of educational resources as well as highly personalised and adaptive educational applications. The special issue solicits research contributions exploring the promises of the Web of Linked Data in TEL by gathering researchers from the areas of the Semantic Web and educational science and technology.

TOPICS OF INTEREST
=================

We welcome papers describing current trends in research on (a) how technology-enhanced learning approaches take advantage of Linked Data on the Web and (b) how Linked Data principles and semantic technologies are being applied in technology-enhanced learning contexts. Both application-oriented and theoretical papers are welcome. Relevant topics include but are not limited to the following:

  • Using Linked Data to support interoperability of educational resources
  • Linked Data for informal learning
  • Personalisation and context-awareness in TEL
  • Usability and advanced user interfaces in learning environments and Linked Data
  • Light-weight TEL metadata schemas
  • Exposing learning object metadata via RDF/SPARQL & service-oriented approaches
  • Semantic & syntactic mappings between educational metadata schemas and standards
  • Controlled vocabularies, ontologies and terminologies for TEL
  • Personal & mobile learning environments and Linked Data
  • Learning flows and designs and Linked Data
  • Linked Data in (visual) learning analytics and educational data mining
  • Linked Data in organizational learning and learning organizations
  • Linked Data for harmonizing individual learning goals and organizational objectives
  • Competency management and Linked Data
  • Collaborative learning and Linked Data
  • Linked-data driven social networking collaborative learning

September 30, 2011

DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents

Filed under: DBpedia,Linked Data,LOD — Patrick Durusau @ 7:07 pm

DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents by Pablo Mendes (email announcement)

We are happy to announce the release of DBpedia Spotlight v0.5 – Shedding Light on the Web of Documents.

DBpedia Spotlight is a tool for annotating mentions of DBpedia entities and concepts in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. The DBpedia Spotlight architecture is composed of the following modules:

  • Web application, a demonstration client (HTML/Javascript UI) that allows users to enter/paste text into a Web browser and visualize the resulting annotated text.
  • Web Service, a RESTful Web API that exposes the functionality of annotating and/or disambiguating resources in text. The service returns XML, JSON or XHTML+RDFa.
  • Annotation Java / Scala API, exposing the underlying logic that performs the annotation/disambiguation.
  • Indexing Java / Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.

In this release we have provided many enhancements to the Web Service, installation process, as well as the spotting, candidate selection, disambiguation and annotation stages. More details on the enhancements are provided below.

The new version is deployed at:

Instructions on how to use the Web Service are available at: http://spotlight.dbpedia.org

We invite your comments on the new version before we deploy it on our production server. We will keep it on the “dev” server until October 6th, when we will finally make the switch to the production server at http://spotlight.dbpedia.org/demo/ and http://spotlight.dbpedia.org/rest/

If you are a user of DBpedia Spotlight, please join dbp-spotlight-users@lists.sourceforge.net for announcements and other discussions.

Warning: I think they are serious about the requirement of Firefox 6.0.2 and Chromium 12.0.

I tried it on an older version of Firefox on Ubuntu and got no results at all. Will upgrade Firefox but only in my directory.
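
For anyone who prefers to skip the browser demo, calling the Web Service directly might look like the sketch below. The /rest/annotate path, the text and confidence parameters and the JSON field names reflect the Spotlight documentation of the time and should be treated as assumptions to check against the current docs:

```python
import requests

resp = requests.get(
    "http://spotlight.dbpedia.org/rest/annotate",      # assumed endpoint path
    params={"text": "Berlin is the capital of Germany.", "confidence": 0.4},
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])      # assumed response fields
```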

September 29, 2011

Beyond the Triple Count

Filed under: Linked Data,LOD,RDF,Semantic Web — Patrick Durusau @ 6:38 pm

Beyond the Triple Count by Leigh Dodds.

From the post:

I’ve felt for a while now that the Linked Data community has an unhealthy fascination on triple counts, i.e. on the size of individual datasets.

This was quite natural in the boot-strapping phase of Linked Data in which we were primarily focused on communicating how much data was being gathered. But we’re now beyond that phase and need to start considering a more nuanced discussion around published data.

If you’re a triple store vendor then you definitely want to talk about the volume of data your store can hold. After all, potential users or customers are going to be very interested in how much data could be indexed in your product. Even so, no-one seriously takes a headline figure at face value. As users we’re much more interested in a variety of other factors. For example how long does it take to load my data? Or, how well does a store perform with my usage profile, taking into account my hardware investment? Etc. This is why we have benchmarks, so we can take into account additional factors and more easily compare stores across different environments.

But there’s not nearly enough attention paid to other factors when evaluating a dataset. A triple count alone tells us nothing. It’s not even a good indicator of the number of useful “facts” in a dataset.

Watch Leigh’s presentation (embedded with his post) and read the post.

I think his final paragraph sets the goal for a wide variety of approaches, however we might disagree about how to best get there! 😉

Very much worth your time to read and ponder.
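As an aside, the headline figure Leigh is talking about is usually nothing more than the result of a count query. Even adding counts of distinct subjects and predicates gives a slightly less misleading first impression of a dataset. A sketch using Python and SPARQLWrapper against a hypothetical endpoint:

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint; substitute the SPARQL endpoint of the dataset in question.
sparql = SPARQLWrapper("http://example.org/sparql")
sparql.setReturnFormat(JSON)

# The headline figure: a bare triple count.
sparql.setQuery("SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }")
triples = sparql.query().convert()["results"]["bindings"][0]["triples"]["value"]

# A little more context: how many distinct subjects and predicates are there?
sparql.setQuery("""
    SELECT (COUNT(DISTINCT ?s) AS ?subjects) (COUNT(DISTINCT ?p) AS ?predicates)
    WHERE { ?s ?p ?o }
""")
row = sparql.query().convert()["results"]["bindings"][0]

print("triples:   ", triples)
print("subjects:  ", row["subjects"]["value"])
print("predicates:", row["predicates"]["value"])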

September 27, 2011

Linked Data Semantic Issues (same for topic maps?)

Filed under: Linked Data,LOD,Marketing,Merging,Topic Maps — Patrick Durusau @ 6:51 pm

Sebastian Schaffert posted a message on the pub-lod@w3c.org list that raised several issues about Linked Data. Issues that sound relevant to topic maps. See what you think.

From the post:

We are working together with many IT companies (with excellent software developers) and trying to convince them that Semantic Web technologies are superior for information integration. They are already overwhelmed when they have to understand that a database ID for an object is not enough. If they have to start distinguishing between the data object and the real world entity the object might be representing, they will be lost completely.

I guess being told that a “real world entity” may have different ways to be identified must seem to be the road to perdition.

Curious because the “real world” is a messy place. Or is that the problem? That the world of developers is artificially “clean,” at least as far as identification and reference.

Perhaps CS programs need to train developers for encounters with the messy “real world.”

From the post:

> When you dereference the URL for a person (such as …/561666514#), you get back RDF. Our _expectation_, of course, is that that RDF will include some remarks about that person (…/561666514#), but there can be no guarantee of this, and no guarantee that it won’t include more information than you asked for. All you can reliably expect is that _something_ will come back, which the service believes to be true and hopes will be useful. You add this to your knowledge of the world, and move on.

There I have my main problem. If I ask for “A”, I am not really interested in “B”. What our client implementation therefore does is to throw away everything that is about B and only keeps data about A. Which is – in case of the FB data – nothing. The reason why we do this is that often you will get back a large amount of irrelevant (to us) data even if you only requested information about a specific resource. I am not interested in the 999 other resources the service might also want to offer information about, I am only interested in the data I asked for. Also, you need to have some kind of “handle” on how to start working with the data you get back, like:
1. I ask for information about A, and the server gives me back what it knows about A (there, my expectation again …)
2. From the data I get, I specifically ask for some common properties, like A foaf:name ?N and do something with the bindings of N. Now how would I know how to even formulate the query if I ask for A but get back B?

Ouch! That one cuts a little close. 😉

What about the folks who are “…not really interested in ‘B’”?

How do topic maps serve their interests?

Or have we decided for them that more information about a subject is better?

Or is that a matter of topic map design? What information to include?

That “merging” and what gets “merged” is a user/client decision?

That is how it works in practice simply due to time, resources, and other constraints.
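For concreteness, the “throw away everything that is about B” strategy Sebastian describes looks roughly like this when done client-side. A sketch using Python and rdflib, with a hypothetical resource URI standing in for “A”:

from rdflib import Graph, URIRef
from rdflib.namespace import FOAF

# Hypothetical Linked Data URI for "A"; the document it dereferences to may
# describe many other resources besides A.
A = URIRef("http://example.org/person/561666514#")

fetched = Graph()
fetched.parse("http://example.org/person/561666514")  # content negotiation picks an RDF serialization

# Keep only the statements whose subject is A; everything "about B" is dropped.
about_a = Graph()
for s, p, o in fetched.triples((A, None, None)):
    about_a.add((s, p, o))

# Now the query Sebastian mentions has a handle to start from: A foaf:name ?N
for name in about_a.objects(A, FOAF.name):
    print(name)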

Marketing questions:

How do we discover the data that users would like to see appear alongside other data, prior to having a contract to do so?

Can we re-purpose search logs for that?

September 23, 2011

Facebook and the Semantic Web

Filed under: Linked Data,Semantic Web — Patrick Durusau @ 7:42 am

Jesse Weaver, Ph.D. Student, Patroon Fellow, Tetherless World Constellation, Rensselaer Polytechnic Institute, http://www.cs.rpi.edu/~weavej3/, announces that:

I would like to bring to subscribers’ attention that Facebook now supports RDF with Linked Data URIs from its Graph API. The RDF is in Turtle syntax, and all of the HTTP(S) URIs in the RDF are dereferenceable in accordance with httpRange-14. Please take some time to check it out.

If you have a vanity URL (mine is jesserweaver), you can get RDF about you:

curl -H 'Accept: text/turtle' http://graph.facebook.com/jesserweaver

If you don’t have a vanity URL but know your Facebook ID, you can use that instead (which is actually the fundamental method).

curl -H 'Accept: text/turtle' http://graph.facebook.com/1340421292

From there, try dereferencing URIs in the Turtle. Have fun!

And I thought everyone had moved to that other service and someone left the lights on at Facebook. 😉

No flames! Just kidding.
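If you would rather dereference programmatically than with curl, something along these lines should work. A sketch with Python and rdflib; whether the Graph API keeps serving Turtle this way is for you to verify:

from rdflib import Graph, URIRef

# Fetch the Turtle the Graph API returns for a profile and list the URIs it
# mentions, so you can keep following your nose.
profile = "http://graph.facebook.com/jesserweaver"

g = Graph()
g.parse(profile, format="turtle")  # same fetch as: curl -H 'Accept: text/turtle' ...

print(len(g), "triples about (or around)", profile)

# Candidate URIs to dereference next.
for obj in set(g.objects()):
    if isinstance(obj, URIRef) and str(obj).startswith("http"):
        print(obj)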

September 12, 2011

Linking linked data to U.S. law

Filed under: Law - Sources,Linked Data — Patrick Durusau @ 8:27 pm

Linking linked data to U.S. law

Bob DuCharme does an excellent job of covering resources that will help you create meaningful links to US court decisions, laws and regulations.

That will be useful for readers/researchers but I can’t shake the feeling that it is very impoverished linking.

You can link out to a court decision, law or regulation but you can’t say why, in any computer processable way, a link is being made.

Even worse, if I start from the court decision, law or regulation, all I can search for are invocations of that court decision, law or regulation, but I won’t know why it was being invoked.

There are specialized resources in the legal community (Shepard’s Citations) that alter that result, but the need for a general solution offering more robust linking remains.
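To make the complaint concrete: what is missing is a way to type the link itself. Something along the lines of the sketch below would do it, here using Python and rdflib with CiTO-style citation properties. The vocabulary choice and the instance URIs are illustrative assumptions on my part, not anything Bob’s post or the courts actually use:

from rdflib import Graph, Namespace, URIRef

# Illustrative only: a blog post that cites a court decision *and says why*.
# CiTO (the Citation Typing Ontology) offers properties such as cito:citesAsEvidence;
# the instance URIs here are made up.
CITO = Namespace("http://purl.org/spar/cito/")

post = URIRef("http://example.org/blog/linking-to-us-law")
decision = URIRef("http://example.org/cases/supreme-court/2011/10-1234")

g = Graph()
g.bind("cito", CITO)

# Not just "this links to that", but "this cites that as evidence".
g.add((post, CITO.citesAsEvidence, decision))

print(g.serialize(format="turtle"))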

LinkedGeoData Release 2

LinkedGeoData Release 2

From the webpage:

The aim of the LinkedGeoData (LGD) project is to make the OpenStreetMap (OSM) datasets easily available as RDF. As such the main target audience is the Semantic Web community, however it may turn out to be useful to a much larger audience. Additionally, we are providing interlinking with DBpedia and GeoNames and integration of class labels from translatewiki and icons from the Brian Quinion Icon Collection.

The result is a rich, open, and integrated dataset which we hope will be useful for research and application development. The datasets can be publicly accessed via downloads, Linked Data, and SPARQL endpoints. We have also launched an experimental “Live SPARQL endpoint” that is synchronized with the minutely updates from OSM, whereby the changes to our store are republished as RDF.

More geographic data.
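If you want to try the SPARQL endpoint, a minimal query might look like the following. A sketch with Python and SPARQLWrapper; the endpoint URL is the one I believe the project uses, so double-check it on the project page:

from SPARQLWrapper import SPARQLWrapper, JSON

# Endpoint URL as I understand it from the LinkedGeoData project page; verify before use.
sparql = SPARQLWrapper("http://linkedgeodata.org/sparql")
sparql.setReturnFormat(JSON)

# Grab a handful of labelled resources just to see what the data looks like.
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?thing ?label
    WHERE { ?thing rdfs:label ?label }
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["thing"]["value"], "-", row["label"]["value"])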

September 9, 2011

Authoritative URIs for Geo locations? Multi-lingual labels?

Filed under: Geographic Data,Linked Data,RDF — Patrick Durusau @ 7:14 pm

Some Geo location and label links that came up on the pub-lod list:

Not a complete list, nor does it include historical references or designations used over the millennia. Still, you may find it useful.

LATC – Linked Open Data Around-the-Clock

Filed under: Government Data,Linked Data,LOD — Patrick Durusau @ 7:10 pm

LATC – Linked Open Data Around-the-Clock

This appears to be an early release of the site because it has an “unfinished” feel to it. For example, you have to poke around a bit to find the tools link. And it isn’t clear how the project intends to promote the use of those tools or originate others to promote the use of linked data.

I suppose it is too late to avoid the grandiose “around-the-clock” project name? Web servers, barring some technical issue, are up 24 x 7. They keep going even as we sleep. Promise.

Objectives:

  • increase the number, the quality and the accuracy of data links between LOD datasets. LATC contributes to the evolution of the World Wide Web into a global data space that can be exploited by applications similar to a local database today. By increasing the number and quality of data links, LATC makes it easier for European Commission-funded projects to use the Linked Data Web for research purposes.

  • support institutions as well as individuals with Linked Data publication and consumption. Many of the practical problems that a European Commission-funded project may discover when interacting with the Web of Data are solved on the conceptual level, and the solutions have been implemented in freely available data publication and consumption tools. What is still missing is the dissemination of knowledge about how to use these tools to interact with the Web of Linked Data. We aim to provide this knowledge.

  • create an in-depth test-bed for data intensive applications by publishing datasets produced by the European Commission, the European Parliament, and other European institutions as Linked Data on the Web and by interlinking them with other governmental data, such as found in the UK and elsewhere.
