Semantic Web technologies are emerging as an increasingly important approach for distributing and integrating scientific data. These technologies include the trio of the Resource Description Framework (RDF), the Web Ontology Language (OWL), and the SPARQL query language. The PubChemRDF project provides RDF-formatted information for the PubChem Compound, Substance, and BioAssay databases.

This document provides detailed technical information (release notes) about the PubChemRDF project. Downloadable RDF data is available on the PubChemRDF FTP site. Past presentations on the PubChemRDF project are available, covering both an introduction to PubChemRDF and its details. The PubChem Blog may provide the most recent updates on the PubChemRDF project. Please note that PubChemRDF is evolving over time. However, we intend for such enhancements to be backward compatible, by adding information and annotations.

A Twitter post commented on there being 59 billion triples.

Nothing to sneeze at, but I was more impressed with the types of connections on page 8 of one of the PubChemRDF presentations.

I am sure there are others, but just on that slide:

  • sio:has_component
  • sio:is_stereoisomer_of
  • sio:is_isotopologue_of
  • sio:has_same_connectivity_as
  • sio:similar_to_by_PubChem_2D_similarity_algorithm
  • sio:similar_to_by_PubChem_3D_similarity_algorithm

Using such annotations, the user could decide on what basis to consider compounds “similar” or not.
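To make the point about user-chosen similarity concrete, here is a minimal Python sketch that treats the relations as plain (subject, predicate, object) triples and lets the caller decide which relations count as "similar." The relation names are copied from the slide as prefixed strings; the IRIs actually used in PubChemRDF may differ. The function name `related_compounds`, the compound IDs (`ex:A` and so on), and the toy triples are hypothetical and are not real PubChem data. Against the real data you would likely express the same filter as a SPARQL query.

```python
from typing import Iterable, Set, Tuple

# A triple as (subject, predicate, object) strings. Prefixed names such as
# "sio:is_stereoisomer_of" are taken from the slide; real PubChemRDF IRIs
# may differ (assumption).
Triple = Tuple[str, str, str]


def related_compounds(triples: Iterable[Triple], compound: str,
                      accepted: Set[str]) -> Set[str]:
    """Return compounds linked to `compound` by any relation in `accepted`.

    Links are treated as undirected, an assumption that suits symmetric
    relations such as sio:is_stereoisomer_of but not sio:has_component.
    """
    result: Set[str] = set()
    for s, p, o in triples:
        if p not in accepted:
            continue  # this relation does not count as "similar" for the caller
        if s == compound:
            result.add(o)
        elif o == compound:
            result.add(s)
    return result


# Toy triples with hypothetical compound IDs (not real PubChem data).
triples = [
    ("ex:A", "sio:is_stereoisomer_of", "ex:B"),
    ("ex:A", "sio:similar_to_by_PubChem_2D_similarity_algorithm", "ex:C"),
    ("ex:D", "sio:is_isotopologue_of", "ex:A"),
]

# A strict notion of similarity: only structural-variant relations.
strict = {"sio:is_stereoisomer_of", "sio:is_isotopologue_of"}
# A looser notion that also accepts PubChem 2D similarity.
loose = strict | {"sio:similar_to_by_PubChem_2D_similarity_algorithm"}

print(sorted(related_compounds(triples, "ex:A", strict)))  # ['ex:B', 'ex:D']
print(sorted(related_compounds(triples, "ex:A", loose)))   # ['ex:B', 'ex:C', 'ex:D']
```

The design choice is simply to make the accepted relation set a parameter: the data records several distinct kinds of connection, and the caller, not the data, decides which of them amount to "similar."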

True, it is not obvious how I would offer an alternative vocabulary for “isotopologue,” but in this domain that may not be a requirement.

That we can offer alternative vocabularies for any domain does not mean there is a requirement for alternative vocabularies in any particular domain.

A great source of data!

I first saw this in a tweet by Paul Groth.

