“Big data are not about data,” Djorgovski says. “It’s all about discovery.” [Not re-discovery]

I first saw this quote in a tweet by Kirk Borne. It is the concluding line from “George Djorgovski looks for knowledge hidden in data” by Rebecca Fairley Raney.

From the post:

When you sit down to talk with an astronomer, you might expect to learn about galaxies, gravity, quasars or spectroscopy. George Djorgovski could certainly talk about all those topics.

But Djorgovski, a professor of astronomy at the California Institute of Technology, would prefer to talk about data.

The AAAS Fellow has spent more than three decades watching scientists struggle to find needles in massive digital haystacks. Now, he is director of the Center for Data-Driven Discovery at Caltech, where staff scientists are developing advanced data analysis techniques and applying them to fields as disparate as plant biology, disaster response, genetics and neurobiology.

The descriptions of the projects at the center are filled with esoteric phrases like “hyper-dimensional data spaces” and “datascape geometry.”

Astronomy was “always advanced as a digital field,” Djorgovski says, and in recent decades, important discoveries in the field have been driven by novel uses of data.

Take the discovery of quasars.

In the early 20th century, astronomers using radio telescopes thought quasars were stars. But by merging data from different types of observations, they discovered that quasars were rare objects that are powered by gas that spirals into black holes in the center of galaxies.

Quasars were discovered not by a single observation, but by a fusion of data.

Djorgovski and his readers assume that future researchers won’t have to start from scratch when researching quasars. They can, but don’t have to, re-mine all the data that supported the original discovery of quasars or their association with black holes.

Can you say the same for discoveries you make in your data? Are those discoveries preserved for others or just tossed back into the sea of big data?

Contemporary searching is a form of catch-n-release. You start with your question and, whether it takes a few minutes or an hour, you find something resembling an answer.

The data is then tossed back to await the next searcher who has the same or similar question.

How are you capturing your search results to benefit the next searcher?

