Data Visualization: Exploring Biodiversity by Sean Gonzalez.
From the post:
When you have a few hundred years worth of data on biological records, as the Smithsonian does, from journals to preserved specimens to field notes to sensor data, even the most diligently kept records don’t perfectly align over the years, and in some cases there is outright conflicting information. This data is important, it is our civilization’s best minds giving their all to capture and record the biological diversity of our planet. Unfortunately, as it stands today, if you or I were to decide we wanted to learn more, or if we wanted to research a specific species or subject, accessing and making sense of that data effectively becomes a career. Earlier this year an executive order was given which generally stated that federally funded research had to comply with certain data management rules, and the Smithsonian took that order to heart, event though it didn’t necessarily directly apply to them, and has embarked to make their treasure of information more easily accessible. This is a laudable goal, but how do we actually go about accomplishing this? Starting with digitized information, which is a challenge in and of itself, we have a real Big Data challenge, setting the stage for data visualization.
The Smithsonian has already gone a long way in curating their biodiversity data on the Biodiversity Heritage Library (BHL) website, where you can find ever increasing sources. However, we know this curation challenge can not be met by simply wrapping the data with a single structure or taxonomy. When we search and explore the BHL data we may not know precisely what we’re looking for, and we don’t want a scavenger hunt to ensue where we’re forced to find clues and hidden secrets in hopes of reaching our national treasure; maybe the Gates family can help us out…
People see relationships in the data differently, so when we go exploring one person may do better with a tree structure, others prefer a classic title/subject style search, or we may be interested in reference types and frequencies. Why we don’t think about it as one monolithic system is akin to discussing the number of Angels that fit on the head of a pin, we’ll never be able to test our theories. Our best course is to accept that we all dive into data from different perspectives, and we must therefore make available different methods of exploration.
What would you do beyond visualization?