Data Science – Getting Started With Open Data
23 Resources for Finding Open Data
Ryan Swanstrom has put together two posts that will have you finding and using open data.
“Open data” can be a boon to researchers and others, but you should ask the following questions (among others) of any data set:
- Who collected the data?
- Why was the data collected?
- How was the recorded data selected?
- How large was the potential data pool?
- Was the original data cleaned after collection?
- If the original data was cleaned, by what criteria?
- How was the accuracy of the data measured?
- What instruments were used to collect the data?
- How were the instruments used to collect the data developed?
- How were the instruments used to collect the data validated?
- What publications have relied upon the data?
- How did you determine the semantics of the data?
That’s not a complete set, but it is a good starting point.
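One way to keep yourself honest about these questions is to record the answers alongside the data set and see which ones are still blank. The following is only a minimal sketch in Python; the class name, the field names, and the sample values are my own hypothetical choices, not part of any standard or of the posts mentioned above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProvenanceChecklist:
    """One field per question in the list above; None means 'not yet answered'.

    All field names are hypothetical labels chosen for this sketch.
    """
    collector: Optional[str] = None               # Who collected the data?
    purpose: Optional[str] = None                 # Why was it collected?
    selection_method: Optional[str] = None        # How was the recorded data selected?
    potential_pool_size: Optional[str] = None     # How large was the potential pool?
    was_cleaned: Optional[str] = None             # Was it cleaned after collection?
    cleaning_criteria: Optional[str] = None       # If cleaned, by what criteria?
    accuracy_measure: Optional[str] = None        # How was accuracy measured?
    instruments: Optional[str] = None             # What instruments were used?
    instrument_development: Optional[str] = None  # How were they developed?
    instrument_validation: Optional[str] = None   # How were they validated?
    dependent_publications: Optional[str] = None  # What publications rely on it?
    semantics_source: Optional[str] = None        # How were the semantics determined?


def unanswered(checklist: ProvenanceChecklist) -> List[str]:
    """Return the names of the questions that have no answer recorded yet."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


# Hypothetical example: only two of the twelve questions have been answered.
record = ProvenanceChecklist(
    collector="Example Statistics Office (made up)",
    purpose="annual household survey (made up)",
)
print(unanswered(record))
```

A long list of unanswered questions does not make a data set useless, but it does tell you how much you are taking on faith.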
Just because data is available, open, free, etc. doesn’t mean that it is useful. The best example is the still-in-print Budge translation, The Book of the Dead: The Papyrus of Ani in the British Museum. The original was published in 1895, making the current reprints more than a century out of date.
It is a very attractive reproduction of the papyrus of Ani (it is rare to see hieroglyphic text with interlinear transliteration and translation in modern editions), but it gives a misleading impression of the state of modern knowledge and translation of Middle Egyptian.
Of course, some readers are satisfied with century-old encyclopedias as well, but I would not rely upon them or their sources for advice.