Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 19, 2011

MIMI Merge Process

Filed under: Bioinformatics,Biomedical,Data Source,Merging — Patrick Durusau @ 2:01 pm

Michigan Molecular Interactions

From the website:

MiMI provides access to the knowledge and data merged and integrated from numerous protein interactions databases. It augments this information from many other biological sources. MiMI merges data from these sources with “deep integration” (see The MiMI Merge Process section) into its single database. A simple yet powerful user interface enables you to query the database, freeing you from the onerous task of having to know the data format or having to learn a query language. MiMI allows you to query all data, whether corroborative or contradictory, and specify which sources to utilize.

MiMI displays results of your queries in easy-to-browse interfaces and provides you with workspaces to explore and analyze the results. Among these workspaces is an interactive network of protein-protein interactions displayed in Cytoscape and accessed through MiMI via a MiMI Cytoscape plug-in.

MiMI gives you access to more information than you can get from any one protein interaction source such as:

  • Vetted data on genes, attributes, interactions, literature citations, compounds, and annotated text extracts through natural language processing (NLP)
  • Linkouts to integrated NCIBI tools to: analyze overrepresented MeSH terms for genes of interest, read additional NLP-mined text passages, and explore interactive graphics of networks of interactions
  • Linkouts to PubMed and NCIBI’s MiSearch interface to PubMed for better relevance rankings
  • Querying by keywords, genes, lists or interactions
  • Provenance tracking
  • Quick views of missing information across databases

I found the site while looking for provenance tracking after merging, and then saw the following description of merging:

    MIMI Merge Process

    Protein interaction data exists in a number of repositories. Each repository has its own data format, molecule identifier, and supplementary information. MiMI assists scientists searching through this overwhelming amount of protein interaction data. MiMI gathers data from well-known protein interaction databases and deep-merges the information.

    Utilizing an identity function, molecules that may have different identifiers but represent the same real-world object are merged. Thus, MiMI allows the user to retrieve information from many different databases at once, highlighting complementary and contradictory information.

    There are several steps needed to create the final MiMI dataset. They are:

    1. The original source datasets are obtained, and transformed into the MiMI schema, except KEGG, NCBI Gene, Uniprot, Ensembl.
    2. Molecules that can be rolled into a gene are annotated to that gene record.
    3. Using all known identifiers of a merged molecule, sources such as Organelle DB or miBLAST, are queried to annotate specific molecular fields.
    4. The resulting dataset is loaded into a relational database.

    Because this is an automated process, and no curation occurs, any errors or misnomers in the original data sources will also exist in MiMI. For example, if a source indicates that the organism is unknown, MiMI will as well.

    If you find that a molecule has been incorrectly merged under a gene record, please contact us immediately. Because MiMI is completely automatically generated, and there is no data curation, it is possible that we have merged molecules with gene records incorrectly. If made aware of the error, we can and will correct the situation. Please report any problems of this kind to mimi-help@umich.edu.

Tracking provenance is going to be a serious requirement for topic maps in mission-critical, financial, and medical use.
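To make that concrete, here is a minimal Python sketch of merging while keeping provenance. It is not MiMI's code. The identity function is assumed to be "two records share at least one identifier", and the names `MergedMolecule`, `absorb`, and `merge_records`, along with the record layout, source labels, and identifiers, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MergedMolecule:
    """One merged record; every value remembers which sources asserted it."""
    identifiers: set = field(default_factory=set)
    # attribute name -> value -> set of source names that reported that value
    attributes: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(set))
    )

    def conflicts(self):
        """Return the attributes for which the sources disagree."""
        return {
            name: dict(values)
            for name, values in self.attributes.items()
            if len(values) > 1
        }


def absorb(target, other):
    """Fold `other` into `target` without losing any source attribution."""
    target.identifiers |= other.identifiers
    for name, values in other.attributes.items():
        for value, sources in values.items():
            target.attributes[name][value] |= sources


def merge_records(records):
    """Merge records that share an identifier (the assumed identity function).

    Each record is a dict with hypothetical keys "source", "ids", and "attrs".
    """
    merged = []
    for rec in records:
        target = MergedMolecule(identifiers=set(rec["ids"]))
        for name, value in rec["attrs"].items():
            target.attributes[name][value].add(rec["source"])
        keep = []
        for mol in merged:
            if mol.identifiers & target.identifiers:
                absorb(target, mol)   # same real-world object: merge it
            else:
                keep.append(mol)      # unrelated molecule: leave it alone
        keep.append(target)
        merged = keep
    return merged


if __name__ == "__main__":
    # Hypothetical records from three made-up sources.
    records = [
        {"source": "source_a", "ids": ["a:1", "x:100"], "attrs": {"organism": "human"}},
        {"source": "source_b", "ids": ["b:7", "x:100"], "attrs": {"organism": "unknown"}},
        {"source": "source_c", "ids": ["c:9"], "attrs": {"organism": "mouse"}},
    ]
    for molecule in merge_records(records):
        print(sorted(molecule.identifiers), molecule.conflicts())
```

Because each value carries the set of sources that reported it, contradictory information (one source saying "human", another "unknown") survives the merge and can be traced back. A wrongly merged molecule can likewise be tracked to the record that introduced the shared identifier.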
