Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 22, 2011

DocumentLens

Filed under: Bioinformatics,Biomedical,DocumentLens,Navigation — Patrick Durusau @ 6:37 pm

DocumentLens – A Revolution In How Researchers Access Information & Colleagues

From the post:

Keeping up with the flood of scientific information has been challenging…Spotting patterns and extracting useful information has been even harder. DocumentLens™ has just made it easier to gain insightful knowledge from information and to share ideas with collaborators.

Praxeon, Inc., the award-winning Boston-based leader in delivering knowledge solutions for the Healthcare and Life Science communities, today announced the launch of DocumentLens™. Their cloud-based web application helps scientific researchers deal with the ever increasing deluge of online and electronic data and information from peer-reviewed journals, regulatory sites, patents and proprietary sources. DocumentLens provides an easy-to-utilize environment to enrich discovery, enhance idea generation, shorten the investigation time, improve productivity and engage collaboration.

“One of the most challenging problems researchers face is collecting, integrating and understanding new information. Keeping up with peer-reviewed journals, regulatory sites, patents and proprietary sources, even in a single area of research, is time consuming. But failure to keep up with information from many different sources results in knowledge gaps and lost opportunities,” stated Dr. Dennis Underwood, Praxeon CEO.

“DocumentLens is a web-based tool that enables you to ask the research question you want to ask – just as you would ask a colleague,” Underwood went on to say. “You can also dive deeper into research articles, explore the content and ideas using DocumentLens and integrate them with sources that you trust and rely on. DocumentLens takes you not only to the relevant documents, but to the most relevant sections saving an immense amount of time and effort. Our DocumentLens Navigators open up your content, using images and figures, chemistry and important topics. Storylines provide a place to accumulate and share insights with colleagues.”

Praxeon has created www.documentlens.com, a website devoted to the new application that contains background on the use of the software, the Eye of the Lens blog (http://www.documentlens.com/blog), and a live version of DocumentLens™ for visitors to try out free-of-charge to see for themselves firsthand the value of the application.

OK, so I do one of the sandbox pre-composed queries: “What is the incidence and prevalence of dementia?”

and DocumentLens reports back that page 15 of a document has relevant information (note: not the entire document but a particular page), with the highlighted material including:

conducting a collaborative, multicentre trial in FTLD. Such a collaborative effort will certainly be necessary to recruit the cohort of over 200 FTLD patients per trial that may be needed to demonstrate treatment effects in FTLD.[194]

3. Ratnavalli E, Brayne C, Dawson K, et al. The prevalence of frontotemporal dementia. Neurology 2002;58:1615–21. [PubMed: 12058088]

4. Mercy L, Hodges JR, Dawson K, et al. Incidence of early-onset dementias in Cambridgeshire,

8. Gislason TB, Sjogren M, Larsson L, et al. The prevalence of frontal variant frontotemporal dementia and the frontal lobe syndrome in a population based sample of 85 year olds. J Neurol Neurosurg

The first text block has no obvious (or other) relevance to the question of incidence or prevalence of dementia.

The incomplete marking of citations 4 and 8 occurs for no apparent reason.

Like any indexing resource, its value depends on the skill of the indexers.

There are the usual issues: how do I reliably share information with other DocumentLens users, or even non-DocumentLens users? Can I and other users create interoperable files in parallel? Do we need, or are we required to have, a common vocabulary? How do we integrate materials that use other vocabularies?

(Do send a note to the topic map naysayers. Product first, then start selling it to customers.)

June 13, 2011

Linking Science and Semantics… (webinar)
15 June 2011 – 10 AM PT (17:00 GMT)

Filed under: Bioinformatics,Biomedical,OWL,RDF,Semantics — Patrick Durusau @ 7:03 pm

Linking science and semantics with the Annotation Ontology and the SWAN Annotation Tool

Abstract:

The Annotation Ontology (AO) is an open ontology in OWL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation.

The SWAN Annotation Tool, recently renamed DOMEO (Document Metadata Exchange Organizer), is an extensible web application enabling users to visually and efficiently create and share ontology-based stand-off annotation metadata on HTML or XML document targets, using the Annotation Ontology RDF model. The tool supports manual, fully automated, and semi-automated annotation with complete provenance records, as well as personal or community annotation with access authorization and control.
[AO] http://code.google.com/p/annotation-ontology

Being an overlapping markup person myself, I’m interested in how “stand-off” annotation is being handled. Also curious how close it comes to HyTime-like mechanisms.

More after the webinar.
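In the meantime, for readers who haven’t met the idea: stand-off annotation keeps the metadata in its own store and points into the target document by position, so the document itself never has to change. A minimal sketch of that shape in Python with rdflib — the URIs and property names below are placeholders for illustration, not the actual AO vocabulary:

```python
# Sketch of stand-off annotation: the annotation lives in its own graph and
# points into the target document by character offsets, so the document is
# never modified. All URIs and property names here are hypothetical.
from rdflib import Graph, Namespace, URIRef, Literal

AO = Namespace("http://example.org/ao#")  # placeholder, not the real AO namespace
g = Graph()

ann = URIRef("http://example.org/annotations/1")
doc = URIRef("http://example.org/papers/some-paper.html")

g.add((ann, AO.annotatesResource, doc))
g.add((ann, AO.selectorStart, Literal(1042)))  # character offset into the document
g.add((ann, AO.selectorEnd, Literal(1077)))
g.add((ann, AO.body, Literal("frontotemporal dementia prevalence")))
g.add((ann, AO.createdBy, URIRef("http://example.org/people/curator-7")))

print(g.serialize(format="turtle"))
```

Because annotations only reference positions, any number of them can overlap freely — which is exactly why the overlapping markup crowd should care.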

June 10, 2011

Improvements in Bio4j Go Tools
(Graph visualization)

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 6:32 pm

Improvements in Bio4j Go Tools (Graph visualization)

From the website:

A new version of Bio4j Go Tools viewer is available, it includes improvements in the graph visualization of GO annotation results.
These are the new features:

  • Load GO annotation results from URL: There’s no need anymore to upload the XML file with the results every time you want to see the graph visualization. Just enter the publicly accessible URL of the file and the server will directly get the file for you.
  • Restrict the visualization to only one GO sub-ontology at a time: Terms belonging to different sub-ontologies (cellular component, biological process, molecular function) are not mixed up anymore.
  • Choice of layout algorithms: You can choose between two different layout algorithms for the visualization (Yifan Hu and Fruchterman Reingold).
  • Customizable layout algorithm time: Range of 1-10 minutes.

A tutorial is also linked from this page that demonstrates the features of Bio4j.

June 6, 2011

OBML 2011 – 3rd Workshop on
Ontologies in Biomedicine and Life Sciences

Filed under: Biomedical,Conferences,Ontology — Patrick Durusau @ 2:00 pm

OBML 2011 – 3rd Workshop on Ontologies in Biomedicine and Life Sciences

Important Dates

Submission of papers: June 30, 2011
Notification of review results: August 10, 2011
Deadline for revised versions: September 9, 2011
Workshop: October 6-7, 2011

Goals of the OBML

The series “Ontologies in Biomedicine and Life Sciences” (OBML workshop) was initiated by the workgroup for OBML of the German Society for Computer Science in 2009. The OBML aims to bring together scientists who are working in this area to exchange ideas and discuss new results, to start collaborations and to initiate new projects. The OBML workshop is held once annually and deals with all fundamental aspects of biomedical ontologies as well as additional “hot” topics.

Submissions are requested especially for the following topics:

  • Ontologies and terminologies in biology, medicine, and clinical research;
  • Ontologies for knowledge representation, methods of reasoning, integration and interoperability of ontologies;
  • Methods and tools for the construction and management of ontologies; and 
  • Applications of the Semantic Web in biomedicine and the life sciences.

The focus of OBML 2011 is phenotype ontologies in medicine and biomedical research.

“Integration” and “interoperability,” it sounds like they are singing the topic map song! 😉

June 5, 2011

Bio4j includes RefSeq data now!

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 3:20 pm

Bio4j includes RefSeq data now!

A word about the RefSeq data (I haven’t reproduced all the hyperlinks, which are many):

NCBI’s Reference Sequence (RefSeq) database is a collection of taxonomically diverse, non-redundant and richly annotated sequences representing naturally occurring molecules of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC). Similar to a review article, a RefSeq is a synthesis of information integrated across multiple sources at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and functional information. They are generated to provide reference standards for multiple purposes ranging from genome annotation to reporting locations of sequence variation in medical records. The RefSeq collection is available without restriction and can be retrieved in several different ways, such as by searching or by available links in NCBI resources, including PubMed, Nucleotide, Protein, Gene, and Map Viewer, searching with a sequence via BLAST, and downloading from the RefSeq FTP site.

Source: http://www.ncbi.nlm.nih.gov/books/NBK21091/

BTW, note that the RefSeq information is stored in the Bio4j DB but the sequences are held as separate files on S3. See the blog post for details. (Thanks to @pablopareja for the correction on storage of RefSeq information in the Bio4j DB.)

June 2, 2011

InterMine

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 7:45 pm

InterMine

From the website:

InterMine is a powerful open source data warehouse system. Using InterMine, you can create databases of biological data accessed by sophisticated web query tools. Parsers are provided for integrating data from several common biological formats and there is a framework for adding your own data. InterMine includes an attractive, user-friendly web interface that works ‘out of the box’ and can be easily customised for your specific needs, as well as a powerful, scriptable web-service API to allow programmatic access to your data.

InterMine is biological data integration software, the uses of which provide a window into the complexities of data integration.

A list of databases “Powered by InterMine” is available on the website.

Definitely a project where topic mappers can see what has been done already for integration of biological data as well as find places where topic maps can contribute to further solutions.
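To give a feel for the “scriptable web-service API,” here is a hedged sketch using the intermine Python client against FlyMine. Method names are from memory and the paths are illustrative, so check the InterMine documentation before relying on them:

```python
# Sketch of programmatic access to an InterMine instance (FlyMine here).
# API details from memory; treat as a starting point, not gospel.
from intermine.webservice import Service

service = Service("http://www.flymine.org/query/service")

query = service.new_query("Gene")
query.add_view("Gene.symbol", "Gene.organism.name")
query.add_constraint("Gene.symbol", "=", "zen")

for row in query.rows(size=10):
    print(row["Gene.symbol"], row["Gene.organism.name"])
```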

Annotation Ontology and the SWAN Annotation Tool Webinar – 15 June 2011 10 AM PT

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 7:44 pm

Linking science and semantics with Annotation Ontology and the SWAN Annotation Tool

From the website:

ABSTRACT:

Annotation Ontology (AO) is an open ontology in OWL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation.

The SWAN Annotation Tool, recently renamed DOMEO (Document Metadata Exchange Organizer), is an extensible web application enabling users to visually and efficiently create and share ontology-based stand-off annotation metadata on HTML or XML document targets, using the Annotation Ontology RDF model. The tool supports manual, fully automated, and semi-automated annotation with complete provenance records, as well as personal or community annotation with access authorization and control.

[AO] http://code.google.com/p/annotation-ontology/

SPEAKER BIO:

Paolo Ciccarese is Instructor at the Harvard Medical School and Assistant in Neurology at the Massachusetts General Hospital. After obtaining his MS in Computer Science, he started his career as a freelance consultant in knowledge management software development. Soon after, Paolo received a PhD in Bioengineering and Bioinformatics from the University of Pavia, Italy, where he was also a teaching assistant for five years in courses on artificial intelligence in medicine and object-oriented programming. Outside of his doctorate work, Paolo co-developed the RDF visualizer Welkin for the SIMILE project and founded the JDPF (Java Data Processing Framework) project, a modular and extensible open source infrastructure for processing large quantities of heterogeneous data.

Immediately following the completion of his PhD, Paolo became a research fellow in the Department of Neurology at Massachusetts General Hospital, where he co-developed the SWAN (Semantic Web Applications in Neuromedicine) platform. Paolo has authored several ontologies, including the SWAN Ontology, the PAV (Provenance, Authoring and Versioning) ontology, and the Annotation Ontology (AO). For the past three years, he has also served as coordinator of several subtasks of the Scientific Discourse task force at the W3C HCLS Interest Group. Currently, Paolo is focusing on the design and development of knowledge management tools that leverage Semantic Web technologies for integrating the annotation of online resources.

Extra credit: What does “stand-off” annotation have in common with HyTime? 😉

May 31, 2011

Biomedical Annotation…Webinar
1 June 2011 – 10 AM PT (17:00 GMT)

Filed under: Bioinformatics,Biomedical,Ontology — Patrick Durusau @ 6:42 pm

Biomedical Annotation by Humans and Computers in a Keyword-Driven World

From the website:

Abstract:

As part of our project with the NCBO we have been curating expression experiments housed in NCBI’s GEO database and annotating a variety of rat-related records using the NCBO Annotator and, more recently, mining data from the NCBO Resource Index. The annotation pipelines and curation tools that we have built have demonstrated some strengths and shortfalls of automated ontology annotation. Similarly, our manual curation of these records highlights areas where human involvement could be improved to better address the fact that we are living in the Google era, where findability is King.

Speaker Bio:

Simon Twigger currently splits his time between being an Assistant Professor in the Human and Molecular Genetics Center at the Medical College of Wisconsin in Milwaukee and exploring the iPhone and iPad as mobile platforms for education and interaction. At MCW he has been an investigator on the Rat Genome Database project for the past 10 years, has worked with the Gene Ontology project, and has been active in the BioCuration community as co-organizer of the past three International BioCuration meetings. He is the former Director of Bioinformatics for the MCW Proteomics Center and was previously the Biomedical Informatics Key Function Director for the MCW Clinical & Translational Science Institute. He is a Semantic Web enthusiast and is eagerly awaiting the rapture of Web 3.0, when all the data will be taken up into the Linked Data cloud and its true potential realized.

Annotation, useful annotation anyway, is based on recognition of the subject of annotation. Should prove to be an interesting presentation.


Notes from the webinar:

(My personal notes while viewing the webinar in real time. The webinar controls in all cases of conflict. Posted to interest others in viewing the stored version of the webinar.)

  • Rat Genome Database: http://rgd.mcw.edu
  • Interesting questions that researchers ask
  • Where to find answers: PubMed, 20 million+ citations, almost 1 per minute
  • Search is the critical thing – in all interfaces. “Being able to find information is of great importance to researchers.”
  • NCBO Annotator: www.bioontology.org/wiki/index.php/Annotator_Web_service
  • Records annotated – curated the raw annotations – manual effort needed to track it down
  • Rat strain synonyms have issues
  • Work flow description
  • Mouse gut maps to course (ex. of mapping issue)
  • Linking annotations to data
  • RatMine: faceted search + Lucene text indexing, interesting widgets
  • Driving “Biological” Problem Part 2 – 55.6% of researchers rarely use archival databases, 56.0% rarely use published literature
  • 3rd International BioCuration meeting, Amos Bairoch – “trying to second guess what the authors really did and found.”
  • Post-publication effort to make content findable – different from the academic model where publication simply floats along
  • Illustration of where the annotation path fails and the consequences of that failure
  • Very cool visualization of how annotations can be visualized and the value thereof
  • Put in keywords and don’t care about it being found (paper)
  • NCBO Resource Index could be a “semantic warehouse” of connections
  • Websites: gminer.mcw.edu, github.com/mcwbbc/, bioportal.bioontology.org, simont -at- mcw.edu, @simon_t
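Since the NCBO Annotator figures so heavily in the notes, a sketch of what calling it looks like. The endpoint and parameter names below are illustrative — the service has changed over time — so consult the current NCBO documentation and supply your own API key:

```python
# Hedged sketch of a call to the NCBO Annotator REST service.
# Endpoint and parameters are illustrative, not authoritative.
import requests

ANNOTATOR_URL = "http://data.bioontology.org/annotator"
params = {
    "text": "Incidence of early-onset dementias in Cambridgeshire",
    "ontologies": "MESH,DOID",  # restrict matching to particular ontologies
    "apikey": "YOUR_API_KEY",
}

resp = requests.get(ANNOTATOR_URL, params=params)
for ann in resp.json():
    cls = ann["annotatedClass"]
    print(cls["@id"], [a["text"] for a in ann["annotations"]])
```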

April 25, 2011

GO Annotation Tools

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 3:36 pm

GO Annotation Tools

Sponsored by the Gene Ontology Consortium that I covered here, but I thought the tools merited separate mention.

Many of these tools will be directly applicable to bioinformatics use of topic maps and/or will give you ideas for similar tools in other domains.

Gene Ontology

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 3:35 pm

Gene Ontology

From the website:

The Gene Ontology project is a major bioinformatics initiative with the aim of standardizing the representation of gene and gene product attributes across species and databases. The project provides a controlled vocabulary of terms for describing gene product characteristics and gene product annotation data from GO Consortium members, as well as tools to access and process this data.

I was encouraged by the following in the description of the project:

GO is not a way to unify biological databases (i.e. GO is not a ‘federated solution’). Sharing vocabulary is a step towards unification, but is not, in itself, sufficient. Reasons for this include the following:

Knowledge changes and updates lag behind.

Individual curators evaluate data differently. While we can agree to use the word ‘kinase’, we must also agree to support this by stating how and why we use ‘kinase’, and consistently apply it. Only in this way can we hope to compare gene products and determine whether they are related. GO does not attempt to describe every aspect of biology; its scope is limited to the domains described above. (emphasis added)

It is refreshing to see a project that acknowledges that sharing vocabulary is a major and worthwhile step.

One that falls short of universal unification.

GO Annotation graph …

Filed under: Bioinformatics,Biomedical,Graphs — Patrick Durusau @ 3:34 pm

GO Annotation graph visualizations with Bio4j Go Tools + Gephi Toolkit + SiGMa project

Interactive graph visualization for protein GO annotations. (GO = Gene Ontology)

From the post:

Bio4j Go Tools now includes a new feature providing you with an interactive graph visualization for protein GO annotations.
The URL of the app is still the same old one.

On the server side, we’re using the Gephi Toolkit for applying layout algorithms, while the corresponding Gexf file is generated with the class GephiExporter from the BioinfoUtil project. The service is included in the project Bio4jTestServer, specifically the servlet GetGoAnnotationGexfServlet.

On the client side, we’re using the open-source project SiGMa for graph visualization.

Interesting for the visualization aspects as well as the subject matter.
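The pipeline, in other words: build the annotation graph, lay it out server-side with the Gephi Toolkit, write GEXF, and hand the file to SiGMa in the browser. A minimal sketch of the GEXF-writing step, with Python and networkx standing in for the Java tooling:

```python
# Minimal GEXF generation, standing in for the Java GephiExporter step.
# Node ids and labels below are illustrative GO terms.
import networkx as nx

g = nx.DiGraph()
g.add_node("GO:0008150", label="biological_process")
g.add_node("GO:0009987", label="cellular process")
g.add_edge("GO:0009987", "GO:0008150")  # is_a relationship

# Gephi and SiGMa can both consume the resulting file.
nx.write_gexf(g, "go_annotations.gexf")
```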

April 18, 2011

Sage Bionetworks Commons Congress – 2011

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 1:42 pm

Sage Bionetworks Commons Congress – 2011

Presentations and videos.

If you are at all concerned with semantic integration in healthcare contexts, this is a must see resource.

It is going to take a while to work through all the individual presentations.

Biomedical Machine Learning Classifiers

A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources by Raul Pollán, Miguel Angel Guevara Lopez and Eugénio da Costa Oliveira.

Abstract:

This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifiers configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables explorations of large parameter sweeps for training third party classifiers such as artificial neural networks and support vector machines, offering the capability to harness the vast amount of computing power serviced by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization and functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations for representative biomedical UCI datasets reaching in little time classification levels comparable to those reported in existing literature. The comprehensive method herewith presented represents an improvement to biomedical data analysis in both methodology and potential reach of machine learning based experimentation.

I recommend a close reading of the article but the concluding lines caught my eye:

…tuning classifier parameters is mostly a heuristic task, there being no existing rules providing knowledge about what parameters to choose when training a classifier. Through BiomedTK we are gathering data about the performance of many classifiers, each one trained with different parameters, ANNs, SVMs, etc. This by itself constitutes a dataset that can be data mined to understand what set of parameters yields better classifiers for given situations, or even generally. Therefore, we intend to use BiomedTK on this bulk of classifier data to gain insight on classifier parameter tuning.

The dataset about training classifiers may be as important as, if not more important than, the use of the framework in harnessing Grid computing resources for biomedical analysis. Looking forward to reports on that dataset.
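To make the “massive exploration” concrete in miniature: the sketch below sweeps SVM configurations on a UCI biomedical dataset and scores each configuration by ROC AUC, with scikit-learn standing in for BiomedTK’s third-party trainers and a single machine standing in for the Grid:

```python
# A parameter sweep scored by ROC AUC, in the spirit of BiomedTK
# (which does this at Grid scale across many classifier families).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # a UCI biomedical dataset

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", 0.01, 0.001],
    "kernel": ["rbf", "linear"],
}
search = GridSearchCV(SVC(), param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)

# search.cv_results_ is one row per configuration with its score --
# exactly the kind of "dataset about classifiers" the authors propose mining.
print(search.best_params_, search.best_score_)
```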

April 2, 2011

Pathogen Portal

Filed under: Bioinformatics,Biomedical,Dataset — Patrick Durusau @ 5:34 pm

Pathogen Portal, The Bioinformatics Resource Centers Portal.

From the website:

Pathogen Portal is a repository linking to the Bioinformatics Resource Centers (BRCs) sponsored by the National Institute of Allergy and Infectious Diseases (NIAID) and maintained by the Virginia Bioinformatics Institute. The BRCs provide web-based resources to the scientific community conducting basic and applied research on organisms considered potential agents of biowarfare or bioterrorism or causing emerging or re-emerging diseases.

Motherlode of resources and datasets on “…potential agents of biowarfare or bioterrorism….”

I read an article years ago in Popular Science about smearing punji stakes with water buffalo excrement. A primitive, but effective, form of biowarfare.

I suppose that would fall in the realm of applied research for purposes of a topic map.

EuPathDB

Filed under: Bioinformatics,Biomedical,TMQL — Patrick Durusau @ 5:29 pm

EuPathDB

From the website:

EuPathDB Bioinformatics Resource Center for Biodefense and Emerging/Re-emerging Infectious Diseases is a portal for accessing genomic-scale datasets associated with the eukaryotic pathogens (Cryptosporidium, Encephalitozoon, Entamoeba, Enterocytozoon, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma).

OK, other than being cited in the previous post about integration using ontologies, why is this relevant to topic maps?

Check out the web tutorial on search strategies.

Now imagine being able to select/revise/view results for a TMQL query.

It would take some work and no doubt be domain specific, but I thought the example would be worth bringing to your attention.

Not to mention that these are data sets where improved access using topic maps could attract attention.
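The select/revise/view idea is simple enough to sketch: a search strategy is an ordered list of named steps, each a function from a result set to a result set, so revising any step means swapping its function and re-running from that point. A toy version (names and data made up):

```python
# Toy EuPathDB-style "search strategy": ordered, named, revisable steps.
def run_strategy(steps, initial):
    """Apply each (name, fn) step in order, keeping intermediate results."""
    results, current = [], initial
    for name, fn in steps:
        current = fn(current)
        results.append((name, set(current)))
    return results

genes = {"pfA", "pfB", "pfC", "pfD"}
steps = [
    ("expressed in blood stage", lambda s: {g for g in s if g != "pfD"}),
    ("has signal peptide", lambda s: {g for g in s if g in {"pfA", "pfC"}}),
]

for name, result in run_strategy(steps, genes):
    print(f"{name}: {sorted(result)}")

# Revising step 2 = replacing its function and re-running from that step.
```

The same shape would serve a TMQL front end: each step’s result set stays visible, and the user edits the step rather than the whole query.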

Using ontologies in integrative tools for protozoan parasite research

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 5:26 pm

Using ontologies in integrative tools for protozoan parasite research

Abstract:

Protozoan parasites such as those that cause malaria and toxoplasmosis remain major threats to global health, and a significant biodefense concern. Current treatments are limited and sometimes compromised by acquired resistance. Solutions will come from the integration and mining of ongoing research. The need for data integration is common among research communities tackling complex topics such as the biology of eukaryotic pathogens, their interaction with hosts, and the search for druggable targets and vaccine candidates. Biomedical researchers have greatly benefited from the Gene Ontology (GO) that provides standardized terms for annotating protein function, location, and participation in processes. GO and other relevant ontologies have largely been developed to support human and model organism biology with only limited representation of protozoan parasite biology. In addition, the availability and use of standard terms is also very limited for the inputs and outputs of bioinformatic tools that are commonly used to analyze protozoan parasite datasets and is a barrier for linking these tools together. In the Integrative Tools for Protozoan Parasite Research (ITPPR) project, we have started addressing these areas by developing tools needed by the communities served by EuPathDB (http://eupathdb.org/). We are using ontology-based models as part of our process to build tools for collecting information on isolates, describing phenotypic outcomes of transgenic parasites, and for joining web services running sequence similarity and alignment analysis. Ontologies are drawn from the OBO Foundry and include the Infectious Disease Ontology (IDO) and OBI (Ontology for Biomedical Investigations).

Topic: NCBO Webinar Series
Date: Wednesday, April 6, 2011
Time: 10:00 am, Pacific Daylight Time (San Francisco, GMT-07:00)

That’s the Wednesday following this post.

An area where integration of data can make a difference.

March 30, 2011

Playing with Gephi, Bio4j and Go

Filed under: Biomedical,Gephi,Visualization — Patrick Durusau @ 12:36 pm

Playing with Gephi, Bio4j and Go

From the blog:

It had already been some time without having some fun with Gephi, so today I told myself: why not try visualizing the whole Gene Ontology and see what happens?

First of all I had to generate the corresponding file in gexf format containing all the terms and relationships belonging to the ontology.

For that I did a small program (GenerateGexfGo.java) which uses Bio4j for terms/relationships info retrieval and a couple of XML Gexf wrapper classes from the github project Era7BioinfoXML.

This looks like fun!

And a good way to look at an important data set, that could benefit from a topic map.

March 19, 2011

Bio4j

Filed under: Bioinformatics,Biomedical,Graphs,Neo4j — Patrick Durusau @ 6:10 pm

Bio4j

From the website:

Bio4j is a bioinformatics graph based DB including most data available in UniProt (SwissProt + Trembl), Gene Ontology (GO) and UniRef (50,90,100).

Bio4j provides a completely new and powerful framework for protein related information querying and management. Since it relies on a high-performance graph engine, data is stored in a way that semantically represents its own structure. On the contrary, traditional relational databases must flatten the data they represent into tables, creating “artificial” ids in order to connect the different tuples; which can in some cases eventually lead to domain models that have almost nothing to do with the actual structure of data.

I am particularly interested in the “incorporate your own data” feature:

New data sources and features will be added from time to time and, what’s more important, the Java API allows you to easily incorporate your own data into Bio4j so you can make the best out of it.
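Bio4j’s own API is Java, but the gesture of incorporating your own data into a graph of biological entities is easy to sketch. Here is the equivalent move through Neo4j’s Python driver (Bio4j sits on a graph engine); the labels and properties are hypothetical, not Bio4j’s actual schema:

```python
# Sketch: attaching in-house data to an existing protein node in a Neo4j
# graph. Labels, properties, and credentials are all illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        """
        MERGE (p:Protein {accession: $acc})
        MERGE (a:AssayResult {id: $assay})
        SET a.readout = $readout
        MERGE (p)-[:MEASURED_IN]->(a)
        """,
        acc="P04637", assay="lab-2011-042", readout=0.83,
    )
driver.close()
```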

March 10, 2011

PSB 2012

Filed under: Bioinformatics,Biomedical,Conferences — Patrick Durusau @ 11:49 am

PSB 2012

From the website:

The Pacific Symposium on Biocomputing (PSB) 2012 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Papers and presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2012 will be held January 3-7, 2012 at the Fairmont Orchid on the Big Island of Hawaii. Tutorials will be offered prior to the start of the conference.

PSB 2012 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. PSB is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology.

The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders in the emerging areas and targeted to provide a forum for publication and discussion of research in biocomputing’s “hot topics.” In this way, PSB provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.

Proceedings from 1996 onward are available online (approx. 90%).

I will be looking through the proceedings to pull out ones that may be of particular interest to the topic maps community.

March 9, 2011

Getting Genetics Done

Filed under: Bioinformatics,Biomedical,Uncategorized — Patrick Durusau @ 4:28 pm

Getting Genetics Done

Interesting blog site for anyone interested in genetics research and/or data mining issues related to genetics.

If you are looking for a community building exercise, see the Journal club entries.

February 20, 2011

AllegroMCODE: GPU-accelerated Cytoscape Plugin
TM Explorer?

Filed under: Bioinformatics,Biomedical,Graphic Processors,Networks — Patrick Durusau @ 10:45 am

AllegroMCODE: GPU-accelerated Cytoscape Plugin

From the website:

AllegroMCODE is a high-performance Cytoscape plugin to find clusters, or highly interconnected groups of nodes, in a huge complex network such as a protein interaction network or a social network in real time. AllegroMCODE finds the same clusters as the MCODE plugin does, but the analysis usually takes less than a second even for a large complex network. The plugin user interface of AllegroMCODE is based on MCODE and has additional features. AllegroMCODE is open source software and freely available under the LGPL.

Cluster has various meanings according to the sources of networks. For instance, a protein-protein interaction network is represented as proteins are nodes and interactions between proteins are edges. Clusters in the network can be considered as protein complexes and functional modules, which can be identified as highly interconnected subgraphs. For social networks, people and their relationships are represented as nodes and edges, respectively. A cluster in the network can be considered as a community which has strong inter-relationship among their members.

AllegroMCODE exploits our high-performance GPU computing architecture to make your analysis task faster than ever. The analysis task of the MCODE algorithm to find the clusters can be long for large complex networks, even though MCODE is a relatively fast method of clustering. AllegroMCODE provides our parallel algorithm implementation based on the original sequential MCODE algorithm. It can achieve two orders of magnitude speedup for the analysis of a large complex network by using the latest graphics card. You can also exploit the GPU acceleration without any special graphics hardware since it provides seamless remote processing in our free GPU computing server.

You do not need to purchase any special GPU hardware or systems, nor worry about the tedious task of installing them. All you have to do is install the AllegroMCODE plugin module on your computer and create a free account on our server.

Simply awesome!

The ability to dynamically explore and configure topic maps will be priceless.

A greater gap than between hot-lead type and a modern word processor.

Will take weeks/months to fully explore but wanted to bring it to your attention.
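In the meantime, for a feel for what MCODE computes, here is a drastically simplified, sequential sketch: weight each node by the density of its neighborhood, seed a cluster at the heaviest unvisited node, and greedily absorb neighbors whose weight is close enough to the seed’s. The real algorithm uses core-clustering coefficients and several more controls, so treat this as a cartoon:

```python
# Cartoon of MCODE-style clustering: neighborhood-density weights,
# greedy expansion from high-weight seeds. Not the real algorithm.
import networkx as nx

def simple_mcode(G, cutoff=0.5):
    weight = {
        n: nx.density(G.subgraph(list(G.neighbors(n)) + [n]))
        for n in G.nodes
    }
    seen, clusters = set(), []
    for seed in sorted(G.nodes, key=weight.get, reverse=True):
        if seed in seen:
            continue
        cluster, frontier = {seed}, [seed]
        while frontier:
            node = frontier.pop()
            for nbr in G.neighbors(node):
                if nbr not in cluster and weight[nbr] >= cutoff * weight[seed]:
                    cluster.add(nbr)
                    frontier.append(nbr)
        seen |= cluster
        clusters.append(cluster)
    return clusters

G = nx.karate_club_graph()
for cluster in simple_mcode(G):
    print(sorted(cluster))
```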

February 19, 2011

IOM-NAE Health Data Collegiate Challenge – 27 April 2011 Deadline

Filed under: Biomedical,Challenges — Patrick Durusau @ 4:23 pm

IOM-NAE Health Data Collegiate Challenge

From the website:

The IOM and the National Academy of Engineering (NAE) of the National Academies invite college and university students to participate in an exciting, new initiative to transform health data into innovative, effective new applications (apps) and tools that take on the nation’s pressing health issues. With reams of U.S. Department of Health and Human Services (HHS) data and other health data newly available as part of the Health Data Initiative (HDI), students have an unprecedented opportunity to create interactive apps and other tools that engage and empower people in ways that lead to better health. Working in interdisciplinary teams that meld technological skills with health knowledge, the IOM and NAE believe that college students can generate powerful new products—the next “viral app”— to improve health for communities and individuals.

Along with spreading this one on college campuses, we need to also point out the advantages of topic maps!

February 18, 2011

NECOBELAC

Filed under: Biomedical,Marketing,Medical Informatics — Patrick Durusau @ 5:37 am

NECOBELAC

From the webpage:

NECOBELAC is a Network of Collaboration Between Europe & Latin American-Caribbean countries. The project works in the field of public health. NECOBELAC aims to improve scientific writing, promote open access publication models, and foster technical and scientific cooperation between Europe and Latin American-Caribbean (LAC) countries.

NECOBELAC acts through training activities in scientific writing and open access by organizing courses for trainers in European and LAC institutions.

Topic maps get mentioned in the faqs for the project: NECOBELAC Project FAQs

Is there any material (i.e. introductory manuals) explaining how the topic maps have been generated as knowledge representation and how can be optimally used?

Yes, a reliable tool introducing the scope and use of the topic maps is represented by the “TAO of topic maps” by Steve Pepper. This document clearly describes the characteristics of this model, and provides useful examples to understand how it actually works.

Well…, but this is 2011 and NECOBELAC is a specific project focused on public health.

Perhaps using the “TAO of topic maps” as a touchstone, but surely we can produce more project-specific guidance. Yes?

Please post a link about your efforts or a comment here if you decide to help out.

February 8, 2011

Stochastic Modelling for Systems Biology

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 11:07 am

Stochastic Modelling for Systems Biology

I stumbled across this while running down material on Monte Carlo models.

From the website:

Although stochastic kinetic models are increasingly accepted as the best way to represent and simulate genetic and biochemical networks, most researchers in the field have limited knowledge of stochastic process theory. Stochastic Modeling for Systems Biology provides an accessible introduction to this theory using examples that are familiar to systems biology researchers. Focusing on computer simulation, the author examines the use of stochastic processes for modeling biological systems. Along with the latest simulation techniques and research material, such as parameter inference, the text includes many examples and figures as well as software code in R for various applications.

The art of constructing or at least reading models is an important skill for topic map authors.

Systems biology has been, is and will continue to be a hot property.

Bringing the advantages of topic maps to models in systems biology would be a win-win situation.
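The book’s examples are in R; for a quick taste of what stochastic kinetic simulation involves, here is a minimal Gillespie direct-method sketch (my example, not the book’s) for a birth-death process:

```python
# Gillespie's direct method for a birth-death process:
# X -> X+1 at constant rate `birth`, X -> X-1 at rate `death * X`.
import random

def gillespie_birth_death(birth=2.0, death=0.1, x0=10, t_end=50.0):
    t, x, trajectory = 0.0, x0, [(0.0, x0)]
    while t < t_end:
        rates = [birth, death * x]
        total = sum(rates)
        t += random.expovariate(total)          # waiting time to next event
        if random.random() < rates[0] / total:  # which reaction fired?
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory

for t, x in gillespie_birth_death()[-5:]:
    print(f"t={t:.2f}  X={x}")
```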

February 3, 2011

Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS) 2010

Filed under: Biomedical,Conferences,Neural Networks — Patrick Durusau @ 3:18 pm

Twenty-Fourth Annual Conference on Neural Information Processing Systems (NIPS) 2010

Another treasure trove of conference presentations, tutorials and other materials of interest to anyone working on information systems.

From the website:

You are invited to participate in the Twenty-Fourth Annual Conference on Neural Information Processing Systems, which is the premier scientific meeting on Neural Computation.

A one-day Tutorial Program offered a choice of six two-hour tutorials by leading scientists. The topics span a wide range of subjects including Neuroscience, Learning Algorithms and Theory, Bioinformatics, Image Processing, and Data Mining.

The NIPS Conference featured a single track program, with contributions from a large number of intellectual communities. Presentation topics include: Algorithms and Architectures; Applications; Brain Imaging; Cognitive Science and Artificial Intelligence; Control and Reinforcement Learning; Emerging Technologies; Learning Theory; Neuroscience; Speech and Signal Processing; and Visual Processing.

There were two Posner Lectures named in honor of Ed Posner, who founded NIPS. Ed worked on communications and information theory at Caltech and was an early pioneer in neural networks. He organized the first NIPS conference and workshop in Denver in 1987 and incorporated the NIPS Foundation in 1992. He was an inspiring teacher and an effective leader. His untimely death in a bicycle accident in 1993 was a great loss to our community. Posner Lecturers were Josh Tenenbaum and Michael Jordan.

The Poster Sessions offered high-quality posters and an opportunity for researchers to share their work and exchange ideas in a collegial setting. The majority of contributions accepted at NIPS were presented as posters.

The Demonstrations enabled researchers to highlight scientific advances, systems, and technologies in ways that go beyond conventional poster presentations. It provided a unique forum for demonstrating advanced technologies — both hardware and software — and fostering the direct exchange of knowledge.

February 1, 2011

STRING – Known and Predicted Protein-Protein Interactions

Filed under: Associations,Bioinformatics,Biomedical — Patrick Durusau @ 7:43 pm

STRING – Known and Predicted Protein-Protein Interactions

From the website:

STRING is a database of known and predicted protein interactions.

The interactions include direct (physical) and indirect (functional) associations; they are derived from four sources:

  • Genomic Context
  • High-throughput Experiments
  • (Conserved) Coexpression
  • Previous Knowledge

STRING quantitatively integrates interaction data from these sources for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 2,590,259 proteins from 630 organisms. (Note: I altered the presentation from the website, turning the table of interaction sources into the list above.)

Looks like fertile ground for research on associations.
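On the quantitative integration: a common way to combine independent evidence channels — and, as I understand it, close in spirit to STRING’s combined score, ignoring its prior correction — is to say an association fails only if every channel’s evidence fails:

```python
# Naive-Bayes-style combination of per-channel confidence scores.
# Close in spirit to STRING's combined score (prior correction omitted).
def combined_score(channel_scores):
    p_fail = 1.0
    for s in channel_scores:
        p_fail *= (1.0 - s)
    return 1.0 - p_fail

# Genomic context, experiments, coexpression, previous knowledge:
print(combined_score([0.3, 0.7, 0.2, 0.5]))  # 0.916
```

Two weak channels that agree can outscore one strong channel — which is much of the value of integrating sources in the first place.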

January 19, 2011

NCIBI – National Center for Integrative Biomedical Informatics

Filed under: Bioinformatics,Biomedical,Heterogeneous Data,Merging — Patrick Durusau @ 2:13 pm

NCIBI – National Center for Integrative Biomedical Informatics

From the website:

The National Center for Integrative Biomedical Informatics (NCIBI) is one of seven National Centers for Biomedical Computing (NCBC) within the NIH Roadmap. The NCBC program is focused on building a universal computing infrastructure designed to speed progress in biomedical research. NCIBI was founded in September 2005 and is based at the University of Michigan as part of the Center for Computational Medicine and Bioinformatics (CCMB).

Note the use of integrative in the name of the center.

They “get” that part.

They are in fact working on mappings to support integration of data even as I write these lines.

There is a lot to be learned about their strategies for integration and to better understand the integration issues they face in this domain. This site is a good starting place to do both.

MiMI Merge Process

Filed under: Bioinformatics,Biomedical,Data Source,Merging — Patrick Durusau @ 2:01 pm

Michigan Molecular Interactions

From the website:

MiMI provides access to the knowledge and data merged and integrated from numerous protein interaction databases. It augments this information from many other biological sources. MiMI merges data from these sources with “deep integration” (see the MiMI Merge Process section) into its single database. A simple yet powerful user interface enables you to query the database, freeing you from the onerous task of having to know the data format or having to learn a query language. MiMI allows you to query all data, whether corroborative or contradictory, and specify which sources to utilize.

MiMI displays results of your queries in easy-to-browse interfaces and provides you with workspaces to explore and analyze the results. Among these workspaces is an interactive network of protein-protein interactions displayed in Cytoscape and accessed through MiMI via a MiMI Cytoscape plug-in.

MiMI gives you access to more information than you can get from any one protein interaction source such as:

  • Vetted data on genes, attributes, interactions, literature citations, compounds, and annotated text extracts through natural language processing (NLP)
  • Linkouts to integrated NCIBI tools to: analyze overrepresented MeSH terms for genes of interest, read additional NLP-mined text passages, and explore interactive graphics of networks of interactions
  • Linkouts to PubMed and NCIBI’s MiSearch interface to PubMed for better relevance rankings
  • Querying by keywords, genes, lists or interactions
  • Provenance tracking
  • Quick views of missing information across databases.
I found the site looking for tracking of provenance after merging and then saw the following description of merging:

MiMI Merge Process

Protein interaction data exists in a number of repositories. Each repository has its own data format, molecule identifier, and supplementary information. MiMI assists scientists searching through this overwhelming amount of protein interaction data. MiMI gathers data from well-known protein interaction databases and deep-merges the information.

Utilizing an identity function, molecules that may have different identifiers but represent the same real-world object are merged. Thus, MiMI allows the user to retrieve information from many different databases at once, highlighting complementary and contradictory information.

There are several steps needed to create the final MiMI dataset. They are:

1. The original source datasets are obtained, and transformed into the MiMI schema, except KEGG, NCBI Gene, Uniprot, Ensembl.
2. Molecules that can be rolled into a gene are annotated to that gene record.
3. Using all known identifiers of a merged molecule, sources such as Organelle DB or miBLAST are queried to annotate specific molecular fields.
4. The resulting dataset is loaded into a relational database.

Because this is an automated process, and no curation occurs, any errors or misnomers in the original data sources will also exist in MiMI. For example, if a source indicates that the organism is unknown, MiMI will as well.

If you find that a molecule has been incorrectly merged under a gene record, please contact us immediately. Because MiMI is completely automatically generated, and there is no data curation, it is possible that we have merged molecules with gene records incorrectly. If made aware of the error, we can and will correct the situation. Please report any problems of this kind to mimi-help@umich.edu.

Tracking provenance is going to be a serious requirement for topic maps in mission-critical, financial, and medical use.
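The identity-function merge plus provenance is worth sketching, since it is so close to topic map merging. In the toy version below (data and sources made up), records sharing any identifier are treated as the same subject, and every merged field keeps its source, so contradictory values stay inspectable:

```python
# Toy MiMI-style "deep merge": shared identifiers collapse records into one
# subject; field values are kept per-source, so contradictions survive.
from collections import defaultdict

records = [
    {"source": "SourceA", "ids": {"P04637", "TP53_HUMAN"}, "organism": "human"},
    {"source": "SourceB", "ids": {"TP53_HUMAN"}, "organism": "human"},
    {"source": "SourceC", "ids": {"P04637"}, "organism": "unknown"},
]

def merge_by_identity(records):
    merged = []
    for rec in records:
        # Find an existing merged record sharing any identifier.
        # (A full version would also union clusters a record bridges.)
        home = next((m for m in merged if m["ids"] & rec["ids"]), None)
        if home is None:
            home = {"ids": set(), "fields": defaultdict(list)}
            merged.append(home)
        home["ids"] |= rec["ids"]
        for key, value in rec.items():
            if key not in ("ids", "source"):
                home["fields"][key].append((rec["source"], value))
    return merged

for m in merge_by_identity(records):
    print(sorted(m["ids"]), dict(m["fields"]))
```

Note how the “unknown” organism from SourceC rides along with its source attached — exactly the provenance MiMI needs when someone reports a bad merge.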

December 5, 2010

International Conference on Biomedical Ontology

Filed under: Bioinformatics,Biomedical,Conferences — Patrick Durusau @ 8:17 pm

International Conference on Biomedical Ontology, Buffalo, NY, July 26-30, 2011

February 1st: Deadline for workshop and tutorial proposals
March 1st: Deadline for papers

Call for Paper Details

Emphasis on:

• Techniques and technologies for collaborative ontology development
• Reasoning with biomedical ontologies
• Evaluation of biomedical ontologies
• Biomedical ontology and the Semantic Web

Ontologies for:

• Biomedical imaging
• Biochemistry and drug discovery
• Biomedical investigations, experimentation, clinical trials
• Clinical and translational research
• Development and anatomy
• Electronic health records
• Evolution and phylogeny
• Metagenomics
• Neuroscience, psychiatry, cognition

Questions:

1. What role (if any) do you see for topic maps in biomedical ontology development, review or use? (3-5 pages, no citations)
2. Choose a biomedical ontology or some aspect of its use and describe how you would apply a topic map to it. (3-5 pages, citations)
3. How would you use a topic map to assist in the creation of a biomedical ontology? (3-5 pages, citations)

SIMCOMP: A Hybrid Soft Clustering of Metagenome Reads

Filed under: Bioinformatics,Biomedical,Subject Identity — Patrick Durusau @ 6:54 pm

SIMCOMP: A Hybrid Soft Clustering of Metagenome Reads

Authors: Shruthi Prabhakara, Raj Acharya

Abstract:

A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. In this paper, we present a two pass semi-supervised algorithm, SimComp, for soft clustering of short metagenome reads, that is a hybrid of comparative and composition based methods. In the first pass, a comparative analysis of the metagenome reads against BLASTx extracts the reference sequences from within the metagenome to form an initial set of seeded clusters. Those reads that have a significant match to the database are clustered by their phylogenetic provenance. In the second pass, the remaining fraction of reads are characterized by their species-specific composition based characteristics. SimComp groups the reads into overlapping clusters, each with its read leader. We make no assumptions about the taxonomic distribution of the dataset. The overlap between the clusters elegantly handles the challenges posed by the nature of the metagenomic data. The resulting cluster leaders can be used as an accurate estimate of the phylogenetic composition of the metagenomic dataset. Our method enriches the dataset into a small number of clusters, while accurately assigning fragments as small as 100 base pairs.

I cite this article for the proposition that subject identity may be a multi-pass thing. 😉

Seriously, as topic maps spread out we are going to encounter any number of subject identity practices that don’t involve string matching.

Not only do we need a passing familiarity with such practices, we also need the flexibility to incorporate the user’s expectations about subject identity into our topic maps.
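The multi-pass idea in miniature: pass one seeds clusters from an authority (a toy stand-in for the BLASTx comparison), pass two soft-assigns the remaining reads by composition similarity. Sequences, thresholds, and the choice of 4-mers below are all made up for illustration:

```python
# Two-pass clustering sketch: database hits seed clusters, then the rest
# are assigned by k-mer composition. A cartoon of SimComp's strategy.
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=4):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

reads = {
    "r1": "ACGTACGTACGTACGTACGT",
    "r2": "ACGTACGTACGTACGAACGT",
    "r3": "GGCCGGCCGGCCGGCCGGCC",
}

# Pass 1: reads with a database hit seed the clusters (stand-in for BLASTx).
db_hits = {"r1": "speciesA", "r3": "speciesB"}
clusters = {}
for read, species in db_hits.items():
    clusters.setdefault(species, []).append(read)

# Pass 2: soft assignment by composition; a read may join several clusters.
for read in reads:
    if read in db_hits:
        continue
    for members in clusters.values():
        sim = max(cosine(kmer_profile(reads[read]), kmer_profile(reads[m]))
                  for m in members)
        if sim > 0.5:
            members.append(read)

print(clusters)
```

Two passes, two different identity tests — which is the point worth carrying over to topic maps.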

Questions:

1. Search on the phrase “metagenomic analysis software”.
2. Become familiar with any one of the software packages listed.
3. Of the techniques used by the software in #2, which one would you use in another context and why? (3-5 pages, no citations)

PS: I realize that some students have little or no interest in bioinformatics. The important lesson is learning to generalize the application of a technique in one area to its application in apparently dissimilar areas.

