Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 24, 2014

CERMINE: Content ExtRactor and MINEr

Filed under: Bibliography,PDF — Patrick Durusau @ 9:20 am

From the webpage:

CERMINE is a Java library and a web service for extracting metadata and content from scientific articles in born-digital form. The system analyses the content of a PDF file and attempts to extract information such as:

  • Title of the article
  • Journal information (title, etc.)
  • Bibliographic information (volume, issue, page numbers, etc.)
  • Authors and affiliations
  • Keywords
  • Abstract
  • Bibliographic references
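
As a rough illustration of what that list amounts to, here is a minimal sketch of a record holding those fields. The class name `ArticleRecord`, its field names, and the `missing_fields` helper are all hypothetical, invented for this example; they are not CERMINE's API or its output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArticleRecord:
    """Hypothetical container for the fields listed above (not a CERMINE type)."""
    title: Optional[str] = None
    journal: Optional[str] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    pages: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    affiliations: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    abstract: Optional[str] = None
    references: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]
```

Something like `missing_fields()` would be one simple way to see at a glance which parts of an article an extractor failed to recover.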

CERMINE at GitHub

I ran a very subjective test of the online interface, http://cermine.ceon.pl/cermine/index.html, using three files.

I am mostly interested in extraction of bibliographic entries and can report that while CERMINE made some mistakes, it is quite useful.

I first saw this in a tweet by Docear.
