Patent database of 15 million chemical structures goes public

Patent database of 15 million chemical structures goes public by Richard Van Noorden.

From the post:

The internet’s wealth of free chemistry data just got significantly larger. Today, the European Bioinformatics Institute (EBI) has launched a website that allows anyone to search through 15 million chemical structures, extracted automatically by data-mining software from world patents.

The initiative makes public a 4-terabyte database that until now had been sold on a commercial basis by a software firm, SureChem, which is folding. SureChem has agreed to transfer its information over to the EBI — and to allow the institute to use its software to continue extracting data from patents.

“It is the first time a world patent chemistry collection has been made publicly available, marking a significant advance in open data for use in drug discovery,” says a statement from Digital Science — the company that owned SureChem, and which itself is owned by Macmillan Publishers, the parent company of Nature Publishing Group.

This is one of those Selling Data opportunities that Vincent Granville was talking about.

You can harvest data here, combine it (hopefully using a topic map) with other data, and market the results. Not everyone who needs the data has the time or skills to repackage it.

What seems problematic to me is how to reach potential buyers of information.

If you produce data and license it to one of the large data vendors, what’s the likelihood your data will get noticed?

On the other hand, direct sale of data seems like a low-percentage deal.

