Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 2, 2013

ExpressionBlast:… [Value of Mapping and Interchanging Mappings]

Filed under: Genomics,Merging,Topic Maps — Patrick Durusau @ 3:25 pm

ExpressionBlast: mining large, unstructured expression databases by Guy E Zinman, Shoshana Naiman, Yariv Kanfi, Haim Cohen and Ziv Bar-Joseph. (Nature Methods 10, 925–926 (2013))

From a letter to the editor:

To the Editor: The amount of gene expression data deposited in public repositories has grown exponentially over the last decade (Supplementary Fig. 1). Specifically, Gene Expression Omnibus (GEO)1 is one of largest expression-data repositories (Supplementary Table 1), containing hundreds of thousands of microarray and RNA-seq experiment results grouped into tens of thousands of series. Although accessible, data deposited in GEO are not well organized. Even among data sets for a single species there are many different platforms with different probe identifiers, different value scales and very limited annotations of the condition profiled by each array. Current methods for using GEO data to study signaling and other cellular networks either do not scale or cannot fully use the available information (Supplementary Table 2 and Supplementary Results).

To enable queries of such large expression databases, we developed ExpressionBlast (http://www.expression.cs.cmu.edu/): a computational method that uses automated text analysis to identify and merge replicates and determine the type of each array in the series (treatment or control; Fig. 1a and Supplementary Methods). Using this information, ExpressionBlast uniformly processes expression data sets in GEO across all experiments, species and platforms. This is achieved by standardizing the data in terms of gene identifiers, the meaning of the expression values (log ratios) and the distribution of these values (Fig. 1b and Supplementary Methods). Our processing steps achieved a high accuracy in identifying replicates and treatment control cases (Supplementary Results and Supplementary Table 3). We applied these processing steps to arrays from more than 900,000 individual samples collected from >40,000 studies in GEO (new series are updated on a weekly basis), which allowed us to create, to our knowledge, the largest collection of computationally annotated expression data currently available (Supplementary Results and Supplementary Table 4) (emphasis in original).

Now there is a letter to the editor!

Your first question:

How did the team create:

to our knowledge, the largest collection of computationally annotated expression data currently available…?

Hint: It wasn’t by creating a new naming system and then convincing the authors of more than 40,000 studies to adopt it.

In their own words:

This is achieved by standardizing the data in terms of gene identifiers, the meaning of the expression values (log ratios) and the distribution of these values (Fig. 1b and Supplementary Methods).
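To make those three standardization steps concrete, here is a minimal sketch in Python. It is not the ExpressionBlast code: the probe names, gene names, helper functions, and the choice of z-scores for the "distribution" step are all my assumptions for illustration. The letter does not say how the authors rescale values.

```python
import math
from statistics import mean, pstdev

# Invented probe-to-gene tables for two hypothetical platforms.
PLATFORM_A = {"a_001": "GENE1", "a_002": "GENE2", "a_003": "GENE3"}
PLATFORM_B = {"b_901": "GENE1", "b_902": "GENE2", "b_903": "GENE3"}


def to_gene_ids(values, probe_to_gene):
    """Re-key probe-level values by gene, averaging probes that map to the same gene."""
    grouped = {}
    for probe, value in values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # unmapped probes are simply dropped in this sketch
        grouped.setdefault(gene, []).append(value)
    return {gene: mean(vals) for gene, vals in grouped.items()}


def log_ratios(treatment, control):
    """log2(treatment / control) for genes with positive values in both arrays."""
    return {
        gene: math.log2(treatment[gene] / control[gene])
        for gene in treatment
        if gene in control and treatment[gene] > 0 and control[gene] > 0
    }


def standardize(ratios):
    """Rescale to zero mean and unit variance (z-scores; an assumed choice)."""
    mu = mean(ratios.values())
    sigma = pstdev(ratios.values())
    if sigma == 0:
        return {gene: 0.0 for gene in ratios}
    return {gene: (value - mu) / sigma for gene, value in ratios.items()}


# A treatment array and a control array measured on different platforms.
treatment = to_gene_ids({"a_001": 8.0, "a_002": 4.0, "a_003": 2.0}, PLATFORM_A)
control = to_gene_ids({"b_901": 4.0, "b_902": 4.0, "b_903": 4.0}, PLATFORM_B)
ratios = log_ratios(treatment, control)
scaled = standardize(ratios)
```

Note that nothing in the source data is renamed. Each platform keeps its own probe identifiers, and the mapping tables carry the whole burden of making the arrays comparable.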

The benefit from this work begins where “merging” in the topic map sense ends.

One point of curiosity, among many: how interchangeable are their rule-based pattern expressions for merging replicates?

Even if the pattern expression language left execution up to the user, reliably exchanging mappings would be quite useful.

Perhaps a profile of an existing pattern expression language, to avoid having to write one from scratch?
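As a rough illustration of what an exchangeable mapping could look like, the sketch below treats regular expressions as the "existing pattern expression language" and JSON as the interchange format. Everything here is hypothetical: the rule names, the document layout, the sample titles, and the functions are mine, not the authors'. Their actual rules are described only in the Supplementary Methods.

```python
import json
import re
from collections import defaultdict

# A hypothetical rule set: each rule rewrites a sample title toward a merge key.
RULES = {
    "version": 1,
    "rules": [
        {
            "name": "strip-replicate-suffix",
            "pattern": r"[\s_-]*(?:replicate|rep)[\s_.-]*\d+$",
            "replace": "",
        },
        {"name": "collapse-whitespace", "pattern": r"\s+", "replace": " "},
    ],
}


def load_rules(text):
    """Parse an exchanged rule document and compile each pattern."""
    doc = json.loads(text)
    return [
        (rule["name"], re.compile(rule["pattern"], re.IGNORECASE), rule["replace"])
        for rule in doc["rules"]
    ]


def merge_key(title, rules):
    """Apply every rule in order; titles with equal keys count as replicates."""
    key = title.strip()
    for _name, pattern, replacement in rules:
        key = pattern.sub(replacement, key)
    return key.strip().lower()


def merge_replicates(titles, rules):
    """Group sample titles by their merge key."""
    groups = defaultdict(list)
    for title in titles:
        groups[merge_key(title, rules)].append(title)
    return dict(groups)


# The JSON text is the thing that would be exchanged between users.
exchanged = json.dumps(RULES, indent=2)
rules = load_rules(exchanged)

titles = [
    "Heat shock 2h rep1",
    "heat shock 2h_Rep 2",
    "Heat  shock 2h replicate-3",
    "Control rep1",
    "Control rep2",
]
groups = merge_replicates(titles, rules)
```

The rule document says nothing about how it must be executed, so a recipient is free to run it with any engine that supports the agreed subset of pattern syntax. Pinning down that subset is what a "profile" would have to do.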
