Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 15, 2013

Using molecular networks to assess molecular similarity

Systems chemistry: Using molecular networks to assess molecular similarity by Bailey Fallon.

From the post:

In new research published in Journal of Systems Chemistry, Sijbren Otto and colleagues have provided the first experimental approach towards molecular networks that can predict bioactivity based on an assessment of molecular similarity.

Molecular similarity is an important concept in drug discovery. Molecules that share certain features such as shape, structure or hydrogen bond donor/acceptor groups may have similar properties that make them common to a particular target. Assessment of molecular similarity has so far relied almost exclusively on computational approaches, but Dr Otto reasoned that a measure of similarity might be obtained by interrogating the molecules in solution experimentally.

Important work for drug discovery but there are semantic lessons here as well:

Tests for similarity/sameness are domain specific.

Which means there are no universal tests for similarity/sameness.

Lacking universal tests for similarity/sameness, we should focus on developing documented and domain specific tests for similarity/sameness.

Domain specific tests provide quicker ROI than less useful and doomed universal solutions.

Documented domain specific tests may, no guarantees, enable us to find commonalities between domain measures of similarity/sameness.

But our conclusions will be based on domain experience and not on projection from our own domain onto other, less well-known domains.
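A minimal sketch of what "documented, domain-specific tests" might look like in code. Everything here is hypothetical and chosen only for illustration: the `REGISTRY` name, the `register` decorator, and the two example domains are assumptions, not anything taken from the research or from a topic map standard.

```python
from typing import Callable, Dict

# A sameness test takes two values from one domain and says whether they match.
SimilarityTest = Callable[[str, str], bool]

# Hypothetical registry: one documented test per domain, no universal fallback.
REGISTRY: Dict[str, SimilarityTest] = {}


def register(domain: str):
    """Decorator that files a test under the domain it was written for."""
    def wrap(fn: SimilarityTest) -> SimilarityTest:
        REGISTRY[domain] = fn
        return fn
    return wrap


@register("species-name")
def same_species_name(a: str, b: str) -> bool:
    """Same if the names match, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()


@register("dna-sequence")
def same_dna(a: str, b: str) -> bool:
    """Same if the sequences are identical or reverse complements."""
    complement = str.maketrans("ACGT", "TGCA")
    a, b = a.upper(), b.upper()
    return a == b or a == b.translate(complement)[::-1]


def is_same(domain: str, a: str, b: str) -> bool:
    """Apply the documented test for a domain; refuse to guess otherwise."""
    if domain not in REGISTRY:
        raise KeyError(f"no documented test for domain {domain!r}")
    return REGISTRY[domain](a, b)
```

The docstring on each test is the documentation: it states the rule in terms a domain expert could dispute. Asking for an unregistered domain raises an error rather than falling back to a universal comparison.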

February 10, 2013

The Power of Semantic Diversity

Filed under: Bioinformatics,Biology,Contest,Crowd Sourcing — Patrick Durusau @ 3:10 pm

Prize-based contests can provide solutions to computational biology problems by Karim R Lakhani, et al. (Nature Biotechnology 31, 108–111 (2013) doi:10.1038/nbt.2495)

From the article:

Advances in biotechnology have fueled the generation of unprecedented quantities of data across the life sciences. However, finding analysts who can address such ‘big data’ problems effectively has become a significant research bottleneck. Historically, prize-based contests have had striking success in attracting unconventional individuals who can overcome difficult challenges. To determine whether this approach could solve a real big-data biologic algorithm problem, we used a complex immunogenomics problem as the basis for a two-week online contest broadcast to participants outside academia and biomedical disciplines. Participants in our contest produced over 600 submissions containing 89 novel computational approaches to the problem. Thirty submissions exceeded the benchmark performance of the US National Institutes of Health’s MegaBLAST. The best achieved both greater accuracy and speed (1,000 times greater). Here we show the potential of using online prize-based contests to access individuals without domain-specific backgrounds to address big-data challenges in the life sciences.

….

Over the last ten years, online prize-based contest platforms have emerged to solve specific scientific and computational problems for the commercial sector. These platforms, with solvers in the range of tens to hundreds of thousands, have achieved considerable success by exposing thousands of problems to larger numbers of heterogeneous problem-solvers and by appealing to a wide range of motivations to exert effort and create innovative solutions [18, 19]. The large number of entrants in prize-based contests increases the probability that an ‘extreme-value’ (or maximally performing) solution can be found through multiple independent trials; this is also known as a parallel-search process [19]. In contrast to traditional approaches, in which experts are predefined and preselected, contest participants self-select to address problems and typically have diverse knowledge, skills and experience that would be virtually impossible to duplicate locally [18]. Thus, the contest sponsor can identify an appropriate solution by allowing many individuals to participate and observing the best performance. This is particularly useful for highly uncertain innovation problems in which prediction of the best solver or approach may be difficult and the best person to solve one problem may be unsuitable for another [19].
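The "parallel-search" argument in the quoted passage can be illustrated with a toy simulation. This is only a sketch under loud assumptions: each entrant's solution quality is modeled as an independent draw from a standard normal distribution, which is an illustrative choice and not the model used in the article, and the function names are hypothetical.

```python
import random


def best_of(n_solvers: int, rng: random.Random) -> float:
    """Best score among n independent attempts (assumed standard normal)."""
    return max(rng.gauss(0.0, 1.0) for _ in range(n_solvers))


def mean_best(n_solvers: int, trials: int = 500, seed: int = 0) -> float:
    """Average, over many simulated contests, of the winning score."""
    rng = random.Random(seed)
    return sum(best_of(n_solvers, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # Compare a small hand-picked panel with a large open contest.
    for n in (5, 50, 500):
        print(n, round(mean_best(n), 2))
```

Under these assumptions the expected winning score rises with the number of entrants even though no individual entrant is any better on average, which is the extreme-value effect the authors describe.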

An article that merits wider reading than it is likely to get behind a pay-wall.

A semantically diverse universe of potential solvers is more effective than a semantically monotone group of selected experts.

An indicator of what to expect from the monotone logic of the Semantic Web.

Good for scheduling tennis matches with Tim Berners-Lee.

For more complex tasks, rely on semantically diverse groups of humans.

I first saw this at: Solving Big-Data Bottleneck: Scientists Team With Business Innovators to Tackle Research Hurdles.

February 1, 2013

Ocean Biogeographic Information System (OBIS)

Filed under: Biology,Oceanography — Patrick Durusau @ 8:04 pm

Ocean Biogeographic Information System (OBIS)

Someone suggested to me recently that pointers to data for topic maps would be quite useful.

In that vein, consider the records held by the OBIS system:

Below is an overview of some of the vital statistics of OBIS, including number of records available through the search interface, number of species and number of datasets; the numbers between brackets are those for the last two data loads, and show progress booked since then. The graph shows how the number of records increased over time.

  • Number of records: 35.5 (33.6, 32.7, 32.3) million
    • Number of records identified to species or infraspecies: 27.32 (26.3, 25.52, 25.19) million
    • Number of records identified to genus or better: 31.1 (29.8, 28.5, 28.4) million
  • Number of valid species with data reported to OBIS: 146,496 (145,899; 145,317; 145,153)
  • Number of valid marine taxa in OBIS: 163,313 (162,139; 161,620; 161,493)
    • Number of valid marine species: 120,259 (119,337; 118,937; 118,801)
    • Number of valid marine genera: 27,333 (27,228; 27,154; 27,086)
  • Number of datasets: 1,130 (1,125; 1,072; 1,056)

Talk about an opportunity to integrate data into the historical records of marine biology!
