Deja vu: a Database of Highly Similar Citations

Friday, November 18th, 2011

From the webpage:

Deja vu is a database of extremely similar Medline citations. Many, but not all, of which contain instances of duplicate publication and potential plagiarism. Deja vu is a dynamic resource for the community, with manual curation ongoing continuously, and we welcome input and comments.

In the scientific research community plagiarism and multiple publications of the same data are considered unacceptable practices and can result in tremendous misunderstanding and waste of time and energy. Our peers and the public have high expectations for the performance and behavior of scientists during the execution and reporting of research. With little chance for discovery and decreasing budgets, yet sustained pressure to publish, or without a clear understanding of acceptable publication practices, the unethical practices of duplicate publication and plagiarism can be enticing to some. Until now, discovery has been through serendipity alone, so these practices have largely gone unchecked.

The application of text similarity searching can robustly detect highly similar text records, offering a new tool for ensuring integrity in scientific publications. Deja vu is a database of computationally identified, manually confirmed highly similar citations (abstracts and titles), as well as user provided commentary and evidence to affirm or deny a given documents putative categorization. It is available via the web and to other database curators for tagging of their indexed articles. The availability of a search tool, eTBLAST, by which journal submissions can be compared to existing databases to identify potential duplicate citations and intercept them before they are published, and this database of highly similar citations (or exhaustive searching and tagging within Medline and other databases) could be deterrents to this questionable scientific behavior and excellent examples of citations that are highly similar but represent very distinct research publications.
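To make "text similarity searching" a little more concrete, here is a minimal sketch of one generic way to flag highly similar abstracts: Jaccard similarity over three-word shingles. This is not eTBLAST's algorithm, which the passage above does not describe. The function names, the shingle size, the 0.5 threshold, and the sample citations are all assumptions made up for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(citations, threshold=0.5, k=3):
    """Compare every pair of citations and keep those scoring at or above threshold.

    citations maps a (hypothetical) citation id to its title or abstract text.
    Returns a list of (id_a, id_b, score) tuples, highest score first.
    """
    sets = {cid: shingles(text, k) for cid, text in citations.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Invented sample data, not real Medline records.
sample = {
    "A": "Effects of aspirin on platelet aggregation in healthy adult volunteers",
    "B": "Effects of aspirin on platelet aggregation in healthy elderly volunteers",
    "C": "Deep sea sediment cores reveal ancient climate shifts",
}
candidates = similar_pairs(sample)
```

The all-pairs comparison is quadratic in the number of citations, so it only suits small collections; a Medline-scale search would need indexing of some kind. As the quoted passage notes, a high score only marks a pair as a candidate, and manual review still has to decide whether it is a duplicate or simply two similar but distinct studies.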

I would broaden the statement:

“…multiple publications of the same data are considered unacceptable practices and can result in tremendous misunderstanding and waste of time and energy.”

to include repeating the same analysis or discoveries out of sheer ignorance of prior work.

Not as an ethical issue, but as one of “…waste of time and energy.”

Given the semantic diversity in all fields, work is repeated simply because “tribes,” as Jack Park calls them, use different terminology.

I will be using Deja vu to explore topics in *informatics and to discover related materials.

If you are already using Deja vu that way, your experiences, observations, and comments would be deeply appreciated.