Archive for the ‘Science’ Category

Scientific Lenses over Linked Data… [Operational Equivalence]

Sunday, April 28th, 2013

Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision. by Christian Brenninkmeijer, Chris Evelo, Carole Goble, Alasdair J G Gray, Paul Groth, Steve Pettifer, Robert Stevens, Antony J Williams, and Egon L Willighagen.

Abstract:

Within complex scientific domains such as pharmacology, operational equivalence between two concepts is often context-, user- and task-specific. Existing Linked Data integration procedures and equivalence services do not take the context and task of the user into account. We present a vision for enabling users to control the notion of operational equivalence by applying scientific lenses over Linked Data. The scientific lenses vary the links that are activated between the datasets which affects the data returned to the user.

Two additional quotes from this paper should convince you of the importance of this work:

We aim to support users in controlling and varying their view of the data by applying a scientific lens which govern the notions of equivalence applied to the data. Users will be able to change their lens based on the task and role they are performing rather than having one fixed lens. To support this requirement, we propose an approach that applies context dependent sets of equality links. These links are stored in a stand-off fashion so that they are not intermingled with the datasets. This allows for multiple, context-dependent, linksets that can evolve without impact on the underlying datasets and support differing opinions on the relationships between data instances. This flexibility is in contrast to both Linked Data and traditional data integration approaches. We look at the role personae can play in guiding the nature of relationships between the data resources and the desired affects of applying scientific lenses over Linked Data.

and,

Within scientific datasets it is common to find links to the “equivalent” record in another dataset. However, there is no declaration of the form of the relationship. There is a great deal of variation in the notion of equivalence implied by the links both within a dataset’s usage and particularly across datasets, which degrades the quality of the data. The scientific user personae have very different needs about the notion of equivalence that should be applied between datasets. The users need a simple mechanism by which they can change the operational equivalence applied between datasets. We propose the use of scientific lenses.

Obvious questions:

Does your topic map software support multiple operational equivalences?

Does your topic map interface enable users to choose “lenses” (I like lenses better than roles) to view equivalence?

Does your topic map software support declaring the nature of equivalence?

I first saw this in the slide deck: Scientific Lenses: Supporting Alternative Views of the Data by Alasdair J G Gray at: 4th Open PHACTS Community Workshop.

BTW, the notion of equivalence being represented by “links” reminds me of a comment Peter Neubauer (Neo4j) once made to me, saying that equivalence could be modeled as edges. Imagine typing equivalence edges. Will have to think about that some more.
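A rough way to picture typed equivalence edges combined with lenses is the Python sketch below. It is only an illustration under stated assumptions, not the approach from the paper or from Open PHACTS: the link types, lens names, and record identifiers are all invented for the example. The links live in a stand-off list, and a lens is nothing more than the set of link types it chooses to treat as equivalence.

```python
from collections import defaultdict

# Hypothetical stand-off linkset: (source, target, link_type) triples kept
# apart from the datasets themselves. All identifiers are invented.
LINKS = [
    ("chembl:A", "drugbank:A", "exact_structure"),
    ("drugbank:A", "chemspider:A-salt", "same_parent_compound"),
    ("chemspider:A-salt", "wiki:A", "same_drug_name"),
]

# Hypothetical lenses: each one names the link types it treats as equivalence.
LENSES = {
    "strict": {"exact_structure"},
    "pharmacology": {"exact_structure", "same_parent_compound"},
    "browse": {"exact_structure", "same_parent_compound", "same_drug_name"},
}


def equivalents(start, lens):
    """Return every record reachable from start over edges the lens activates."""
    active = LENSES[lens]
    graph = defaultdict(set)
    for source, target, link_type in LINKS:
        if link_type in active:
            # Equivalence is treated as symmetric, so add both directions.
            graph[source].add(target)
            graph[target].add(source)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


if __name__ == "__main__":
    for name in LENSES:
        print(name, sorted(equivalents("chembl:A", name)))
```

Switching the lens changes which records count as "the same" without touching the datasets or the links themselves, which is the property the stand-off linksets are meant to provide.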

Miriam Registry [More Identifiers For Science]

Monday, April 15th, 2013

Miriam Registry

From the homepage:

Persistent identification for life science data

The MIRIAM Registry provides a set of online services for the generation of unique and perennial identifiers, in the form of URIs. It provides the core data which is used by the Identifiers.org resolver.

The core of the Registry is a catalogue of data collections (corresponding to controlled vocabularies or databases), their URIs and the corresponding physical URLs or resources. Access to this data is made available via exports (XML) and Web Services (SOAP).

And from the FAQ:

What is MIRIAM, and what does it stand for?

MIRIAM is an acronym for the Minimal Information Required In the Annotation of Models. It is important to distinguish between the MIRIAM Guidelines, and the MIRIAM Registry. Both being part of the wider BioModels.net initiative.

What are the ‘MIRIAM Guidelines’?

The MIRIAM Guidelines are an effort to standardise upon the essential, minimal set of information that is sufficient to annotate a model in such a way as to enable its reuse. This includes a means to identify the model itself, the components of which it is composed, and formalises a means by which unambiguous annotation of components should be encoded. This is essential to allow collaborative working by different groups which may not be spatially co-located, and facilitates model sharing and reuse by the general modelling community. The goal of the project, initiated by the BioModels.net effort, was to produce a set of guidelines suitable for model annotation. These guidelines can be implemented in any structured format used to encode computational models, for example SBML, CellML, or NeuroML. MIRIAM is a member of the MIBBI family of community-developed ‘minimum information’ reporting guidelines for the biosciences.

More information on the requirements to achieve MIRIAM Guideline compliance is available on the MIRIAM Guidelines page.

What is the MIRIAM Registry?

The MIRIAM Registry provides the necessary information for the generation and resolving of unique and perennial identifiers for life science data. Those identifiers are of the URI form and make use of Identifiers.org for providing access to the identified data records on the Web. Examples of such identifiers: http://identifiers.org/pubmed/22140103, http://identifiers.org/uniprot/P01308, …
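The two example identifiers share a visible pattern: the resolver address, then a data collection name, then the record's own identifier. A minimal Python sketch of building and splitting URIs of that shape follows. The function names are hypothetical, the pattern is inferred only from the two examples quoted above, and nothing here calls the Registry's actual web services.

```python
from urllib.parse import urlparse

# Resolver prefix as it appears in the quoted examples.
BASE = "http://identifiers.org"


def make_uri(collection, record_id):
    """Build a URI of the form BASE/collection/record_id (hypothetical helper)."""
    return f"{BASE}/{collection}/{record_id}"


def split_uri(uri):
    """Split such a URI back into (collection, record_id) (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.netloc != "identifiers.org":
        raise ValueError(f"not an identifiers.org URI: {uri}")
    # The first path segment is the collection; the remainder is the record id.
    collection, _, record_id = parsed.path.lstrip("/").partition("/")
    if not collection or not record_id:
        raise ValueError(f"expected /collection/id in: {uri}")
    return collection, record_id


if __name__ == "__main__":
    print(make_uri("pubmed", "22140103"))
    print(split_uri("http://identifiers.org/uniprot/P01308"))
```

Splitting the collection from the record identifier is the step that makes mapping to other identifier schemes possible, since a mapping table can then be keyed on the collection name.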

More identifiers for the life sciences, for those who choose to use them.

The curation may be helpful in terms of mappings to other identifiers.

American Geophysical Union (AGU)

Friday, March 22nd, 2013

American Geophysical Union (AGU)

The mission of the AGU:

The purpose of the American Geophysical Union is to promote discovery in Earth and space science for the benefit of humanity.

While I was hunting down information on DataONE, I ran across the AGU site.

Like all disciplines, data analysis, collection, collation, sharing, etc. are ongoing concerns at the AGU.

My interest is more in the data techniques than the subject matter.

I am seeking to avoid re-inventing the wheel and to learn new insights that have yet to reach more familiar areas.

App-lifying USGS Earth Science Data

Thursday, January 10th, 2013

App-lifying USGS Earth Science Data

Challenge Dates:

Submissions: January 9, 2013 at 9:00am EST – Ends April 1, 2013 at 11:00pm EDT.

Public Voting: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Judging: April 5, 2013 at 5:00pm EDT – Ends April 25, 2013 at 11:00pm EDT.

Winners Announced: April 26, 2013 at 5:00pm EDT.

From the webpage:

USGS scientists are looking for your help in addressing some of today’s most perplexing scientific challenges, such as climate change and biodiversity loss. To do so requires a partnership between the best and the brightest in Government and the public to guide research and identify solutions.

The USGS is seeking help via this platform from many of the Nation’s premier application developers and data visualization specialists in developing new visualizations and applications for datasets.

USGS datasets for the contest consist of a range of earth science data types, including:

  • several million biological occurrence records (terrestrial and marine);
  • thousands of metadata records related to research studies, ecosystems, and species;
  • vegetation and land cover data for the United States, including detailed vegetation maps for the National Parks; and
  • authoritative taxonomic nomenclature for plants and animals of North America and the world.

Collectively, these datasets are key to a better understanding of many scientific challenges we face globally. Identifying new, innovative ways to represent, apply, and make these data available is a high priority.

Submissions will be judged on their relevance to today’s scientific challenges, innovative use of the datasets, and overall ease of use of the application. Prizes will be awarded to the best overall app, the best student app, and the people’s choice.

Of particular interest for the topic maps crowd:

Data used – The app must utilize a minimum of 1 DOI USGS Core Science and Analytics (CSAS) data source, though they need not include all data fields available in a particular resource. A list of CSAS databases and resources is available at: http://www.usgs.gov/core_science_systems/csas/activities.html. The use of data from other sources in conjunction with CSAS data is encouraged.

CSAS has a number of very interesting data sources. Classifications, thesauri, data integration, metadata and more.

Winning the contest earns you recognition and bragging rights, not to mention visibility for your approach.

LinkedScience.org

Monday, November 12th, 2012

LinkedScience.org

From the about page:

Linked Science is an approach to interconnect scientific assets to enable transparent, reproducible and transdisciplinary research. LinkedScience.org is a community driven-effort to show what this means in practice.

LinkedScience.org was founded early 2011 and is led by Tomi Kauppinen affiliated with the Institute for Geoinformatics at the University of Muenster (Germany). The term Linked Science was coined in the early paper about Linked Open Science co-authored with Giovana Mira de Espindola from The Brazil’s National Institute for Space Research (INPE) with a reference to LinkedScience.org. At Oxford in March 2011 in discussions between Tomi Kauppinen and Jun Zhao it became evident that a workshop on Linked Science—which was then realized as a collocated event with ISWC 2011 and organized with a big team—would be a perfect start for creating a community for opening and linking science.

Since then LinkedScience.org has grown step by step, or person by person, to include international activities (check the events organized so far), publications about—and related to—Linked Science, the developed vocabularies, tools such as the SPARQL Package for R (please check also the tutorial), and already one sub community, that of spatial@linkedscience to illustrate the benefits and results of linking science.

A large number of resources and projects related to Linked Data and the Sciences.

Linked Science Core Vocabulary Specification

Monday, November 12th, 2012

Linked Science Core Vocabulary Specification (revision 0.91)

Abstract:

LSC, the Linked Science Core Vocabulary, is a lightweight vocabulary providing terms to enable publishers and researchers to relate things in science to time, space, and themes. More precisely, LSC is designed for describing scientific resources including elements of research, their context, and for interconnecting them. We introduce LSC as an example of building blocks for Linked Science to communicate the linkage between scientific resources in a machine-understandable way. The “core” in the name refers to the fact that LSC only defines the basic terms for science. We argue that the success of Linked Science—or Linked Data in general—lies in interconnected, yet distributed vocabularies that minimize ontological commitments. More specific terms needed by different scientific communities can therefore be introduced as extensions of LSC. LSC is hosted at LinkedScience.org; please check also other available vocabularies at LinkedScience.org/vocabularies.

A Linked Data vocabulary that you may encounter.

I first saw this in a tweet by Ivan Herman.

The Units Ontology: a tool for integrating units of measurement in science

Sunday, October 14th, 2012

The Units Ontology: a tool for integrating units of measurement in science by Georgios V. Gkoutos, Paul N. Schofield, and Robert Hoehndorf. (Database (2012) 2012: bas033, doi: 10.1093/database/bas033)

Abstract:

Units are basic scientific tools that render meaning to numerical data. Their standardization and formalization caters for the report, exchange, process, reproducibility and integration of quantitative measurements. Ontologies are means that facilitate the integration of data and knowledge allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurements.

As the paper acknowledges, there are many measurement systems in use today.

That leaves me puzzled as to what happens to data that follows some other drummer than this one.

I assume any coherent system has no difficulty integrating data written in that system.

So how does adding another coherent system assist in that integration?

Unless everyone universally moves to the new system. Unlikely, don’t you think?

2013 Workshop on Interoperability in Scientific Computing

Friday, September 28th, 2012

2013 Workshop on Interoperability in Scientific Computing

From the post:

The 13th annual International Conference on Computational Science (ICCS 2013) will be held in Barcelona, Spain from 5th – 7th June 2013. ICCS is an ERA 2010 ‘A’-ranked conference series. For more details on the main conference, please visit www.iccs-meeting.org. The 2nd Workshop on Interoperability in Scientific Computing (WISC ’13) will be co-located with ICCS 2013.

Approaches to modelling take many forms. The mathematical, computational and encapsulated components of models can be diverse in terms of complexity and scale, as well as in published implementation (mathematics, source code, and executable files). Many of these systems are attempting to solve real-world problems in isolation. However the long-term scientific interest is in allowing greater access to models and their data, and to enable simulations to be combined in order to address ever more complex issues. Markup languages, metadata specifications, and ontologies for different scientific domains have emerged as pathways to greater interoperability. Domain specific modelling languages allow for a declarative development process to be achieved. Metadata specifications enable coupling while ontologies allow cross platform integration of data.

The goal of this workshop is to bring together researchers from across scientific disciplines whose computational models require interoperability. This may arise through interactions between different domains, systems being modelled, connecting model repositories, or coupling models themselves, for instance in multi-scale or hybrid simulations. The outcomes of this workshop will be to better understand the nature of multidisciplinary computational modelling and data handling. Moreover we hope to identify common abstractions and cross-cutting themes in future interoperability research applied to the broader domain of scientific computing.

How is your topic map information product going to make the lives of scientists simpler?

Physics as a geographic map

Thursday, August 30th, 2012

Physics as a geographic map

Nathan Yau of Flowing Data points to a rendering of the subject area physics as a geographic map.

Somewhat dated (1939), but it shows a lot of creativity and no small amount of cartographic skill.

Rather than calling it a “fictional” map I would prefer to say it is an intellectual map of physics.

Like all maps, the objects appear in explicit relationships to each other and there are no doubt as many implicit relationships as there are viewers of the map.

What continuum or dimensions would you use to create a map of modern ontologies?

That could make a very interesting exercise for the topic maps class: have students create maps and then attempt to draw out what unspoken dimensions were driving the layout between parts of the map.

Suggestions for mapping software, anyone?

7 Habits of the Open Scientist

Thursday, August 30th, 2012

7 Habits of the Open Scientist

A series of posts by David Ketcheson that begins:

Science has always been based on a fundamental culture of openness. The scientific community rewards individuals for sharing their discoveries through perpetual attribution, and the community benefits through the ability to build on discoveries made by individuals. Furthermore, scientific discoveries are not generally accepted until they have been verified or reproduced independently, which requires open communication.

Historically, openness simply meant publishing one’s methods and results in the scientific literature. This enabled scientists all over the world to learn about essential advances made by their colleagues, modulo a few barriers. One needed to have access to expensive library collections, to spend substantial time and effort searching the literature, and to wait while research conducted by other groups was refereed, published, and distributed.

Nowadays it is possible to practice a fundamentally more open kind of research — one in which we have immediate, free, indexed, universal access to scientific discoveries. The new vision of open science is painted in lucid tones in Michael Nielsen’s Reinventing Discovery. After reading Nielsen’s book, I was hungry to begin practicing open science, but not exactly sure where to start. Here are seven ways I’m aware of. Each will be the subject of a longer forthcoming post.

The seven principles are:

  1. Freely accessible publications.
  2. Reproducible research.
  3. Pre-publication dissemination of research.
  4. Open collaboration through social media.
  5. Live open science.
  6. Open expository writing.
  7. Open bibliographies and reviews.

What are your habits for research on topic maps or other semantic technologies?

I first saw this at: Igor Carron’s Around the blogs in 80 summer hours.

Semantic physical science

Sunday, August 12th, 2012

Semantic physical science by Peter Murray-Rust and Henry S Rzepa. (Journal of Cheminformatics 2012, 4:14 doi:10.1186/1758-2946-4-14)

Abstract:

The articles in this special issue arise from a workshop and symposium held in January 2012 (‘Semantic Physical Science’). We invited people who shared our vision for the potential of the web to support chemical and related subjects. Other than the initial invitations, we have not exercised any control over the content of the contributed articles.

There are pointers to videos and other materials for the following workshop presentations:

  • Introduction – Peter Murray-Rust [11]
  • Why we (PNNL) are supporting semantic science – Bill Shelton
  • Adventures in Semantic Materials Informatics – Nico Adams
  • Semantic Crystallographic Publishing – Brian McMahon [12]
  • Service-oriented science: why good code matters and why a fundamental change in thinking is required – Cameron Neylon [13]
  • On the use of CML in computational materials research – Martin Dove [14]
  • FoX, CML and semantic tools for atomistic simulation – Andrew Walker [15]
  • Semantic Physical Science: the CML roadmap – Marcus Hanwell [16]
  • CMLisation of NWChem and development strategy for FoXification and dictionaries – Bert de Jong
  • NMR working group – Nancy Washton

A remarkable workshop with which I have only one minor difference:

There was remarkable and exciting unanimity that semantics should and could be introduced now and rapidly into the practice of large areas of chemistry. We agreed that we should concentrate on the three main areas of crystallography, computation and NMR spectroscopy. In crystallography, this is primarily a strategy of working very closely with the IUCr, being able to translate crystallographic data automatically into semantic form and exploring the value of semantic publication and repositories. The continued development of Chempound for crystal structures is Open and so can be fed back regularly into mainstream crystallography.

When computers were being introduced to indexing chemistry and other physical sciences in the 1950s and 60s, the practitioners of the day were under the impression that their data already had semantics, and that it did not have to await the next turn of the century to acquire them.

Not to take anything away from the remarkable progress that CML and related efforts have made, but they are not the advent of semantics for chemistry.

Clarification of semantics, documentation of semantics, refinement of semantics, all true.

But chemistry (and data) has always had semantics.

First BOSS Data: 3-D Map of 500,000 Galaxies, 100,000 Quasars

Friday, August 10th, 2012

First BOSS Data: 3-D Map of 500,000 Galaxies, 100,000 Quasars

From the post:

The Third Sloan Digital Sky Survey (SDSS-III) has issued Data Release 9 (DR9), the first public release of data from the Baryon Oscillation Spectroscopic Survey (BOSS). In this release BOSS, the largest of SDSS-III’s four surveys, provides spectra for 535,995 newly observed galaxies, 102,100 quasars, and 116,474 stars, plus new information about objects in previous Sloan surveys (SDSS-I and II).

“This is just the first of three data releases from BOSS,” says David Schlegel of the U.S. Department of Energy’s Lawrence Berkeley National Laboratory (Berkeley Lab), an astrophysicist in the Lab’s Physics Division and BOSS’s principal investigator. “By the time BOSS is complete, we will have surveyed more of the sky, out to a distance twice as deep, for a volume more than five times greater than SDSS has surveyed before — a larger volume of the universe than all previous spectroscopic surveys combined.”

Spectroscopy yields a wealth of information about astronomical objects including their motion (called redshift and written “z”), their composition, and sometimes also the density of the gas and other material that lies between them and observers on Earth. The BOSS spectra are now freely available to a public that includes amateur astronomers, astronomy professionals who are not members of the SDSS-III collaboration, and high-school science teachers and their students.

The new release lists spectra for galaxies with redshifts up to z = 0.8 (roughly 7 billion light years away) and quasars with redshifts between z = 2.1 and 3.5 (from 10 to 11.5 billion light years away). When BOSS is complete it will have measured 1.5 million galaxies and at least 150,000 quasars, as well as many thousands of stars and other “ancillary” objects for scientific projects other than BOSS’s main goal.

For data access, software tools, tutorials, etc., see: http://sdss3.org/

Interesting data set but also instructive for the sharing of data and development of tools for operations on shared data. You don’t have to have a local supercomputer to process the data. Dare I say a forerunner of the “cloud?”

Be the alpha geek at your local astronomy club this weekend!

Hard science, soft science, hardware, software

Thursday, March 8th, 2012

Hard science, soft science, hardware, software by John D. Cook.

The post starts:

The hard sciences — physics, chemistry, astronomy, etc. — boasted remarkable achievements in the 20th century. The credibility and prestige of all science went up as a result. Academic disciplines outside the sciences rushed to append “science” to their names to share in the glory.

Science has an image of infallibility based on the success of the hard sciences. When someone says “You can’t argue with science,” I’d rather they said “It’s difficult to argue with hard science.”

Read on….

I think…, well, you decide on John’s basic point for yourself.

Personally, I think the world is complicated: historically, linguistically, semantically, theologically, etc. I am much happier searching in hopes of answers that seem adequate for the moment, as opposed to seeking certitudes, particularly for others.

Seismic Data Science: Reflection Seismology and Hadoop

Friday, January 27th, 2012

Seismic Data Science: Reflection Seismology and Hadoop by Josh Wills.

From the post:

When most people first hear about data science, it’s usually in the context of how prominent web companies work with very large data sets in order to predict clickthrough rates, make personalized recommendations, or analyze UI experiments. The solutions to these problems require expertise with statistics and machine learning, and so there is a general perception that data science is intimately tied to these fields. However, in my conversations at academic conferences and with Cloudera customers, I have found that many kinds of scientists– such as astronomers, geneticists, and geophysicists– are working with very large data sets in order to build models that do not involve statistics or machine learning, and that these scientists encounter data challenges that would be familiar to data scientists at Facebook, Twitter, and LinkedIn.

A nice overview of areas of science using “big data” decades before the current flurry of activity. The use of Hadoop in reflection seismology is only one fuller example of that use.

My takeaway from this post is that Hadoop skills are going to be in demand across business, science and, one would hope, the humanities.