Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 17, 2015

What’s New for 2016 MeSH

Filed under: MeSH,Thesaurus,Topic Maps,Vocabularies — Patrick Durusau @ 3:41 pm

What’s New for 2016 MeSH by Jacque-Lynne Schulman.

From the post:

MeSH is the National Library of Medicine controlled vocabulary thesaurus which is updated annually. NLM uses the MeSH thesaurus to index articles from thousands of biomedical journals for the MEDLINE/PubMed database and for the cataloging of books, documents, and audiovisuals acquired by the Library.

MeSH experts and users will need to absorb the details, but some of the changes include:

Overview of Vocabulary Development and Changes for 2016 MeSH

  • 438 Descriptors added
  • 17 Descriptor terms replaced with more up-to-date terminology
  • 9 Descriptors deleted
  • 1 Qualifier (Subheading) deleted

and,

MeSH Tree Changes: Uncle vs. Nephew Project

In the past, MeSH headings were loosely organized in trees and could appear in multiple locations depending upon the importance and specificity. In some cases the heading would appear two or more times in the same tree at higher and lower levels. This arrangement led to some headings appearing as a sibling (uncle) next to the heading under which they were treed as a nephew. In other cases a heading was included at a top level so it could be seen more readily in printed material. We reviewed these headings in MeSH and removed either the Uncle or Nephew depending upon the judgement of our Internal and External reviewers. There were over 1,000 tree changes resulting from this work, many of which will affect search retrieval in MEDLINE/PubMed and the NLM Catalog.
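The uncle/nephew pattern can be spotted mechanically from tree numbers. Below is a minimal sketch, assuming MeSH-style dotted tree numbers (for example "C04.557") and treating a top-level number's parent as its category letter. It flags a heading that sits at one position and also one level lower, under a sibling of that position. The tree numbers are hypothetical, not taken from MeSH, and this is not NLM's review procedure.

```python
from typing import List, Optional, Tuple


def parent(tree_number: str) -> Optional[str]:
    """Parent of a MeSH-style tree number (assumed dotted layout)."""
    if "." in tree_number:
        return tree_number.rsplit(".", 1)[0]
    # Top-level node such as "C04": treat the category letter "C" as its parent.
    return tree_number[0] if len(tree_number) > 1 else None


def uncle_nephew_pairs(tree_numbers: List[str]) -> List[Tuple[str, str]]:
    """Return (uncle, nephew) pairs for one heading's tree numbers: the heading
    appears at `uncle` and again as a child of one of `uncle`'s siblings."""
    pairs = []
    for uncle in tree_numbers:
        for nephew in tree_numbers:
            p = parent(nephew)
            if p is None or p == uncle:
                continue  # no parent, or nephew is filed directly under uncle
            grandparent = parent(p)
            # p is a sibling of uncle when both share the same parent.
            if grandparent is not None and grandparent == parent(uncle):
                pairs.append((uncle, nephew))
    return pairs


# Hypothetical tree numbers for a single heading.
print(uncle_nephew_pairs(["C04.557", "C04.588.274"]))
# -> [('C04.557', 'C04.588.274')]
```

Either member of each pair is a candidate for removal; deciding which one goes was left to NLM's reviewers.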

and,

MeSH Scope Notes

MeSH had a policy that each descriptor should have a scope note regardless of how obvious its meaning. There were many legacy headings that were created without scope notes before this rule came into effect. This year we initiated a project to write scope notes for all existing headings. Thus far 481 scope notes to MeSH were added and the project continues for 2017 MeSH.
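Finding descriptors that still lack scope notes is a short query over the descriptor file. A minimal sketch, assuming the element names of NLM's descriptor XML (DescriptorRecord, DescriptorUI, DescriptorName/String, and a ScopeNote on the Concept flagged PreferredConceptYN="Y"); check these against the DTD for the year you download. The inline records and their IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record sample in the assumed descriptor XML layout.
SAMPLE = """<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D900001</DescriptorUI>
    <DescriptorName><String>Heading With Note</String></DescriptorName>
    <ConceptList>
      <Concept PreferredConceptYN="Y"><ScopeNote>An illustrative note.</ScopeNote></Concept>
    </ConceptList>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>D900002</DescriptorUI>
    <DescriptorName><String>Heading Without Note</String></DescriptorName>
    <ConceptList>
      <Concept PreferredConceptYN="Y"/>
    </ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def missing_scope_notes(xml_text):
    """Return (DescriptorUI, name) for records whose preferred concept has no scope note."""
    root = ET.fromstring(xml_text)
    missing = []
    for record in root.iter("DescriptorRecord"):
        preferred = record.find("ConceptList/Concept[@PreferredConceptYN='Y']")
        note = preferred.findtext("ScopeNote") if preferred is not None else None
        if not (note and note.strip()):
            missing.append((record.findtext("DescriptorUI"),
                            record.findtext("DescriptorName/String")))
    return missing


print(missing_scope_notes(SAMPLE))  # -> [('D900002', 'Heading Without Note')]
```

For a full-year file, which is large, `ET.iterparse` would keep memory use down.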

Echoes of Heraclitus:

It is not possible to step twice into the same river according to Heraclitus, or to come into contact twice with a mortal being in the same state. (Plutarch) (Heraclitus)

Semantics and the words we use to invoke them are always in a state of flux. Sometimes more, sometimes less.

The lesson here is that anyone who says you can have a fixed and stable vocabulary is not only selling something, they are selling you a broken something. If not broken on the day you start to use it, then fairly soon thereafter.

It took me a while to realize that the same is true of information systems that attempt to capture changing semantics at any given point in time.

Topic maps in the sense of ISO 13250-2, for example, can capture and map changing semantics, but if and only if you are willing to accept its data model.

Which is good as far as it goes, but what if I want a different data model? That is, I still want to capture changing semantics and map between them, just using a different data model.

We may have a use case to map back to ISO 13250-2 or to some other data model. The point being that we should not privilege any data model or syntax in advance, at least not absolutely.

Not only do communities change but their preferences for technologies change as well. It seems just a bit odd to be selling an approach on the basis of capturing change only to build a dike to prevent change in your implementation.

Yes?

November 5, 2014

MeSH on Demand Update: How to Find Citations Related to Your Text

Filed under: Indexing,Medical Informatics,MeSH — Patrick Durusau @ 7:51 pm

MeSH on Demand Update: How to Find Citations Related to Your Text

From the post:

In May 2014, NLM introduced MeSH on Demand, a Web-based tool that suggests MeSH terms from your text such as an abstract or grant summary up to 10,000 characters using the MTI (Medical Text Indexer) software. For more background information, see the article, MeSH on Demand Tool: An Easy Way to Identify Relevant MeSH Terms.

New Feature

A new MeSH on Demand feature displays the PubMed ID (PMID) for the top ten related citations in PubMed that were also used in computing the MeSH term recommendations.

To access this new feature start from the MeSH on Demand homepage (see Figure 1), add your text, such as a project summary, into the box labeled “Text to be Processed.” Then, click the “Find MeSH Terms” button.

Results page:

[Image: MeSH on Demand results page]

A clever way to deal with the problem of a searcher not knowing the specialized vocabulary of an indexing system.
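The post says the ten related citations were "also used in computing the MeSH term recommendations." MTI itself is considerably more elaborate, but the neighbour idea can be sketched as a nearest-neighbour vote: rank indexed citations by similarity to your text, keep the top k, and propose MeSH terms that several of them share. Everything below (bag-of-words cosine similarity, the `min_votes` cutoff, the toy corpus and its IDs) is an assumption for illustration, not NLM's algorithm.

```python
import math
import re
from collections import Counter

MAX_CHARS = 10_000  # input limit stated for MeSH on Demand


def vectorize(text):
    """Bag-of-words counts over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_terms(text, corpus, k=10, min_votes=2):
    """Return IDs of the k most similar citations and the MeSH terms that
    at least `min_votes` of those neighbours carry."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    query = vectorize(text)
    neighbours = sorted(corpus, key=lambda c: cosine(query, vectorize(c["text"])),
                        reverse=True)[:k]
    votes = Counter(term for c in neighbours for term in c["mesh"])
    return [c["id"] for c in neighbours], [t for t, n in votes.most_common() if n >= min_votes]


# Hypothetical mini-corpus standing in for indexed PubMed citations.
CORPUS = [
    {"id": "cit-1", "text": "insulin resistance in type 2 diabetes",
     "mesh": ["Diabetes Mellitus, Type 2", "Insulin Resistance"]},
    {"id": "cit-2", "text": "diabetes and insulin therapy outcomes",
     "mesh": ["Diabetes Mellitus, Type 2", "Insulin"]},
    {"id": "cit-3", "text": "bacterial infection of the lung",
     "mesh": ["Pneumonia, Bacterial"]},
]

print(suggest_terms("insulin and diabetes", CORPUS, k=2))
# -> (['cit-2', 'cit-1'], ['Diabetes Mellitus, Type 2'])
```

The searcher never has to know the vocabulary: the neighbours' existing index terms supply it.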

Have you seen this method used outside of MeSH?

September 18, 2014

2015 Medical Subject Headings (MeSH) Now Available

Filed under: Bioinformatics,Biomedical,MARC,MeSH — Patrick Durusau @ 10:24 am

2015 Medical Subject Headings (MeSH) Now Available

From the post:

Introduction to MeSH 2015
The Introduction to MeSH 2015 is now available, including information on its use and structure, as well as recent updates and availability of data.

MeSH Browser
The default year in the MeSH Browser remains 2014 MeSH for now, but the alternate link provides access to 2015 MeSH. The MeSH Section will continue to provide access via the MeSH Browser for two years of the vocabulary: the current year and an alternate year. Sometime in November or December, the default year will change to 2015 MeSH and the alternate link will provide access to the 2014 MeSH.

Download MeSH
Download 2015 MeSH in XML and ASCII formats. Also available for 2015 from the same MeSH download page are:

  • Pharmacologic Actions (Forthcoming)
  • New Headings with Scope Notes
  • MeSH Replaced Headings
  • MeSH MN (tree number) changes
  • 2015 MeSH in MARC format
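The Replaced Headings file is what lets older index terms be carried forward. Its layout is not described here, so this minimal sketch assumes its entries have already been loaded into an old-to-new dictionary (the heading names are hypothetical). It keeps the whole chain of names rather than only the newest one, preserving the history a changing vocabulary produces.

```python
from typing import Dict, List


def replacement_chain(term: str, replaced: Dict[str, str]) -> List[str]:
    """Follow old -> new replacements; return every name the heading has had, oldest first."""
    chain = [term]
    while chain[-1] in replaced:
        nxt = replaced[chain[-1]]
        if nxt in chain:
            raise ValueError(f"replacement cycle at {nxt!r}")
        chain.append(nxt)
    return chain


def update_index_terms(terms: List[str], replaced: Dict[str, str]) -> List[str]:
    """Map stored index terms to their current headings (unreplaced terms pass through)."""
    return [replacement_chain(t, replaced)[-1] for t in terms]


# Hypothetical replacements accumulated from more than one year's file.
REPLACED = {"Old Heading": "Interim Heading", "Interim Heading": "Current Heading"}

print(replacement_chain("Old Heading", REPLACED))
# -> ['Old Heading', 'Interim Heading', 'Current Heading']
```

Deleted descriptors, as opposed to replaced ones, would need separate handling.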

Enjoy!

August 23, 2014

NLM RSS Feeds

Filed under: Medical Informatics,MeSH — Patrick Durusau @ 10:19 am

National Library of Medicine RSS Feeds

RSS feeds covering a broad range of National Library of Medicine activities.

I am reporting it here because as soon as I don’t, I will need the listing.
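For anyone who wants to pull those feeds into scripts rather than a reader, a minimal sketch using only the Python standard library follows. It assumes the feeds are plain RSS 2.0 (channel/item/title/link); the sample feed is hypothetical, and no NLM feed URL is hard-coded.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical RSS 2.0 document for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First item</title><link>https://example.org/1</link></item>
  <item><title>Second item</title><link>https://example.org/2</link></item>
</channel></rss>"""


def feed_items(xml_text):
    """Return (title, link) for each <item> in an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]


def fetch_items(url):
    """Download a feed (any URL from the NLM listing) and parse its items."""
    with urlopen(url) as response:
        return feed_items(response.read())


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```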

NLM Technical Bulletin

Filed under: Medical Informatics,MeSH — Patrick Durusau @ 10:07 am

NLM Technical Bulletin

A publication of the U.S. National Library of Medicine (NLM). The about page for NLM gives the following overview:

The National Library of Medicine (NLM), on the campus of the National Institutes of Health in Bethesda, Maryland, has been a center of information innovation since its founding in 1836. The world’s largest biomedical library, NLM maintains and makes available a vast print collection and produces electronic information resources on a wide range of topics that are searched billions of times each year by millions of people around the globe. It also supports and conducts research, development, and training in biomedical informatics and health information technology. In addition, the Library coordinates a 6,000-member National Network of Libraries of Medicine that promotes and provides access to health information in communities across the United States.

The bulletin about page says:

The NLM Technical Bulletin, your source for the latest searching information, is produced by: MEDLARS Management Section, National Library of Medicine, Bethesda, Maryland, USA.

Which is true but seems inadequate to describe the richness of what you can find at the bulletin.

For example, in the July–August 2014 issue (No. 399) you will find:

MeSH on Demand Update: How to Find Citations Related to Your Text

New CMT Subsets Available

New Tutorial: Searching Drugs or Chemicals in PubMed

If medical terminology touches your field of interest, this is a must read.

MeSH on Demand Tool:…

Filed under: Authoring Topic Maps,Bioinformatics,Medical Informatics,MeSH — Patrick Durusau @ 9:43 am

MeSH on Demand Tool: An Easy Way to Identify Relevant MeSH Terms by Dan Cho.

From the post:

Currently, the MeSH Browser allows for searches of MeSH terms, text-word searches of the Annotation and Scope Note, and searches of various fields for chemicals. These searches assume that users are familiar with MeSH terms and using the MeSH Browser.

Wouldn’t it be great if you could find MeSH terms directly from your text such as an abstract or grant summary? MeSH on Demand has been developed in close collaboration among MeSH Section, NLM Index Section, and the Lister Hill National Center for Biomedical Communications to address this need.

Using MeSH on Demand

Use MeSH on Demand to find MeSH terms relevant to your text up to 10,000 characters. One of the strengths of MeSH on Demand is its ease of use without any prior knowledge of the MeSH vocabulary and without any downloads.

Now there’s a clever idea!

Imagine extending it just a bit so that it produces topics for the subjects it detects in your text, plus associations between those topics, the text, and its author. I would call that assisted topic map authoring. You?
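As a rough illustration of that idea, the sketch below turns a list of detected MeSH terms (from MeSH on Demand or anything similar) into draft topics and associations. The `Topic`/`Association` classes, the slug-based identifiers, and the "authored-by"/"is-about" association types are all hypothetical choices. They deliberately avoid committing to any particular topic map syntax or data model.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Topic:
    id: str
    name: str


@dataclass
class Association:
    type: str
    roles: Dict[str, str]  # role type -> topic id


def slug(s: str) -> str:
    """Lower-case identifier built from a name (hypothetical ID scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")


def draft_topic_map(title: str, author: str,
                    mesh_terms: List[str]) -> Tuple[List[Topic], List[Association]]:
    """Draft topics for the text, its author, and each detected subject,
    plus associations linking them, for a human author to review."""
    doc = Topic("doc-" + slug(title), title)
    person = Topic("person-" + slug(author), author)
    topics = [doc, person]
    assocs = [Association("authored-by", {"work": doc.id, "author": person.id})]
    for term in mesh_terms:
        subject = Topic("mesh-" + slug(term), term)
        topics.append(subject)
        assocs.append(Association("is-about", {"work": doc.id, "subject": subject.id}))
    return topics, assocs


topics, assocs = draft_topic_map("Insulin study", "A. Author",
                                 ["Insulin", "Diabetes Mellitus, Type 2"])
for a in assocs:
    print(a.type, a.roles)
```

The output is a draft; the "assisted" part is that a person still confirms or rejects each proposed topic and association.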

I followed a tweet by Michael Hoffman, which led to MeSH on Demand Update: How to Find Citations Related to Your Text, a post describing an enhancement to MeSH on Demand that finds ten related citations based on your text.

The enhanced version mimics the traditional method of writing court opinions. A judge writes his decision and then a law clerk finds cases that support the positions taken in the opinion. You really thought it worked some other way? 😉

January 3, 2012

Topical Classification of Biomedical Research Papers – Details

Filed under: Bioinformatics,Biomedical,Medical Informatics,MeSH,PubMed,Topic Maps — Patrick Durusau @ 5:11 pm

OK, I registered both on the site and for the contest.

From the Task:

Our team has invested a significant amount of time and effort to gather a corpus of documents containing 20,000 journal articles from the PubMed Central open-access subset. Each of those documents was labeled by biomedical experts from PubMed with several MeSH subheadings that can be viewed as different contexts or topics discussed in the text. With a use of our automatic tagging algorithm, which we will describe in details after completion of the contest, we associated all the documents with the most related MeSH terms (headings). The competition data consists of information about strengths of those bonds, expressed as numerical value. Intuitively, they can be interpreted as values of a rough membership function that measures a degree in which a term is present in a given text. The task for the participants is to devise algorithms capable of accurately predicting MeSH subheadings (topics) assigned by the experts, based on the association strengths of the automatically generated tags. Each document can be labeled with several subheadings and this number is not fixed. In order to ensure that participants who are not familiar with biomedicine, and with the MeSH ontology in particular, have equal chances as domain experts, the names of concepts and topical classifications are removed from data. Those names and relations between data columns, as well as a dictionary translating decision class identifiers into MeSH subheadings, can be provided on request after completion of the challenge.
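The task description reads the association strengths "intuitively" as values of a rough membership function. In Pawlak's rough set theory that function is defined, for a set of objects X and attribute set B, as mu_X^B(x) = |[x]_B ∩ X| / |[x]_B|, where [x]_B is the class of objects indistinguishable from x on B. The sketch below computes exactly that on a hypothetical toy table. It does not reproduce the organizers' tagging algorithm, which the source says will be described only after the contest, and reading the 0 to 1000 integers as memberships scaled by 1000 is an assumption the source does not state.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


def rough_membership(objects: Dict[str, Dict[str, Hashable]],
                     attrs: List[str], target: Set[str]) -> Dict[str, float]:
    """mu(x) = |[x]_B & X| / |[x]_B| for every object x.

    objects: object id -> {attribute: value}; attrs: the attribute set B;
    target: the concept X as a set of object ids."""
    classes = defaultdict(set)  # indiscernibility classes keyed by B-values
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attrs)].add(oid)
    return {
        oid: len(classes[key] & target) / len(classes[key])
        for oid, values in objects.items()
        for key in [tuple(values[a] for a in attrs)]
    }


# Hypothetical table: d1 and d2 are indistinguishable on "fever", only d1 is in X.
objs = {"d1": {"fever": "yes"}, "d2": {"fever": "yes"}, "d3": {"fever": "no"}}
print(rough_membership(objs, ["fever"], {"d1"}))  # -> {'d1': 0.5, 'd2': 0.5, 'd3': 0.0}
```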

Data format: The data set is provided in a tabular form as two tab-separated values files, namely trainingData.csv (the training set) and testData.csv (the test set). They can be downloaded only after a successful registration to the competition. Each row of those data files represents a single document and, in the consecutive columns, it contains integers ranging from 0 to 1000, expressing association strengths to corresponding MeSH terms. Additionally, there is a trainingLables.txt file, whose consecutive rows correspond to entries in the training set (trainingData.csv). Each row of that file is a list of topic identifiers (integers ranging from 1 to 83), separated by commas, which can be regarded as a generalized classification of a journal article. This information is not available for the test set and has to be predicted by participants.
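Given that description (tab-separated integer rows despite the .csv extension, and a labels file of comma-separated topic IDs from 1 to 83), loading the data is straightforward. A minimal sketch: feature rows are kept sparse as {column: value} dictionaries, since most entries are zero, and labels become 83-element binary vectors. The sample strings stand in for the real files, which are only available after registration.

```python
import csv
import io

NUM_TOPICS = 83  # topic identifiers range from 1 to 83


def read_features(lines):
    """Parse tab-separated rows of integers (0..1000) into sparse {column: value} dicts."""
    rows = []
    for record in csv.reader(lines, delimiter="\t"):
        rows.append({j: int(v) for j, v in enumerate(record) if v and int(v) != 0})
    return rows


def read_labels(lines):
    """Parse comma-separated topic IDs per line into binary label vectors."""
    vectors = []
    for line in lines:
        vec = [0] * NUM_TOPICS
        for x in line.strip().split(","):
            if x.strip():
                topic = int(x)
                if not 1 <= topic <= NUM_TOPICS:
                    raise ValueError(f"topic id out of range: {topic}")
                vec[topic - 1] = 1
        vectors.append(vec)
    return vectors


# Stand-ins for trainingData.csv and trainingLables.txt.
rows = read_features(io.StringIO("0\t12\t0\t1000\n5\t0\t0\t0\n"))
labels = read_labels(io.StringIO("3,17\n83\n"))
print(rows)  # -> [{1: 12, 3: 1000}, {0: 5}]
```

With the real files, pass an open file object, for example `open("trainingData.csv", newline="")`.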

It is worth noting that, due to nature of the considered problem, the data sets are highly dimensional – the number of columns roughly corresponds to the MeSH ontology size. The data sets are also sparse, since usually only a small fraction of the MeSH terms is assigned to a particular document by our tagging algorithm. Finally, a large number of data columns have little (or even none) non-zero values (corresponding concepts are rarely assigned to documents). It is up to participants to decide which of them are still useful for the task.
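One simple response to columns with few non-zero values is to drop those supported by fewer than some minimum number of documents. The sketch below does that over sparse {column: value} rows. The `min_nonzero` threshold is a hypothetical parameter, and this is only one possible filter: the source leaves the choice to participants, and rare columns can still be informative.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def column_support(rows: List[Dict[int, int]]) -> Counter:
    """Number of documents with a non-zero value in each column."""
    support = Counter()
    for row in rows:
        support.update(row.keys())
    return support


def prune_sparse_columns(rows: List[Dict[int, int]],
                         min_nonzero: int = 2) -> Tuple[List[Dict[int, int]], Set[int]]:
    """Keep only columns that are non-zero in at least `min_nonzero` documents."""
    support = column_support(rows)
    keep = {c for c, n in support.items() if n >= min_nonzero}
    return [{c: v for c, v in row.items() if c in keep} for row in rows], keep


pruned, keep = prune_sparse_columns([{1: 12, 3: 1000}, {0: 5, 3: 7}, {3: 1}])
print(keep, pruned)  # -> {3} [{3: 1000}, {3: 7}, {3: 1}]
```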

I am looking at it as an opportunity to learn a good bit about automatic text classification and what role, if any, topic maps can play in such a scenario.

Suggestions as well as team members are most welcome!

January 2, 2012

Topical Classification of Biomedical Research Papers

Filed under: Bioinformatics,Biomedical,Contest,Medical Informatics,MeSH,PubMed — Patrick Durusau @ 6:36 pm

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers

From the webpage:

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers, is a special event of Joint Rough Sets Symposium (JRS 2012, http://sist.swjtu.edu.cn/JRS2012/) that will take place in Chengdu, China, August 17-20, 2012. The task is related to the problem of predicting topical classification of scientific publications in a field of biomedicine. Money prizes worth 1,500 USD will be awarded to the most successful teams. The contest is funded by the organizers of the JRS 2012 conference, Southwest Jiaotong University, with support from University of Warsaw, SYNAT project and TunedIT.

Introduction: Development of freely available biomedical databases allows users to search for documents containing highly specialized biomedical knowledge. Rapidly increasing size of scientific article meta-data and text repositories, such as MEDLINE [1] or PubMed Central (PMC) [2], emphasizes the growing need for accurate and scalable methods for automatic tagging and classification of textual data. For example, medical doctors often search through biomedical documents for information regarding diagnostics, drugs dosage and effect or possible complications resulting from specific treatments. In the queries, they use highly sophisticated terminology, that can be properly interpreted only with a use of a domain ontology, such as Medical Subject Headings (MeSH) [3]. In order to facilitate the searching process, documents in a database should be indexed with concepts from the ontology. Additionally, the search results could be grouped into clusters of documents, that correspond to meaningful topics matching different information needs. Such clusters should not necessarily be disjoint since one document may contain information related to several topics. In this data mining competition, we would like to raise both of the above mentioned problems, i.e. we are interested in identification of efficient algorithms for topical classification of biomedical research papers based on information about concepts from the MeSH ontology, that were automatically assigned by our tagging algorithm. In our opinion, this challenge may be appealing to all members of the Rough Set Community, as well as other data mining practitioners, due to its strong relations to well-founded subjects, such as generalized decision rules induction [4], feature extraction [5], soft and rough computing [6], semantic text mining [7], and scalable classification methods [8]. 
In order to ensure scientific value of this challenge, each of participating teams will be required to prepare a short report describing their approach. Those reports can be used for further validation of the results. Apart from prizes for top three teams, authors of selected solutions will be invited to prepare a paper for presentation at JRS 2012 special session devoted to the competition. Chosen papers will be published in the conference proceedings.
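The competition page stresses that topic clusters need not be disjoint, and the task notes that the number of subheadings per document is not fixed. A minimal sketch of turning per-topic scores into such overlapping assignments: keep every topic at or above a threshold, and fall back to the single best topic so no document goes unlabelled. The scores, topic names, and 0.5 threshold are assumptions; the scores would come from whatever classifier a team trains.

```python
from typing import Dict, List


def assign_topics(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return all topics scoring at or above `threshold`, best first; if none
    qualify, return only the highest-scoring topic (empty input gives [])."""
    chosen = sorted((t for t, s in scores.items() if s >= threshold),
                    key=lambda t: -scores[t])
    if not chosen and scores:
        chosen = [max(scores, key=scores.get)]
    return chosen


# Hypothetical scores for one document.
print(assign_topics({"A": 0.9, "B": 0.6, "C": 0.1}))  # -> ['A', 'B']
```

Tuning the threshold trades off how many labels each document receives against their precision.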

Data sets became available today.

This is one of those “praxis” opportunities for topic maps.
