Yes, a one-million-song dataset.
It weighs in at 280 GB. The site suggests you ask someone you know whether they already have a copy. Not your average music download.
Amendment: There is no music included in this download. My reference to music download was sarcasm.
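To put the 280 GB figure in perspective, here is a rough back-of-the-envelope sketch. The link speeds are hypothetical, and I am assuming 1 GB = 10^9 bytes; neither comes from the dataset site.

```python
# Back-of-the-envelope numbers for a 280 GB dataset covering one million tracks.
# Assumptions (not from the dataset site): 1 GB = 10**9 bytes, and the link
# speeds below are hypothetical sustained rates.

DATASET_BYTES = 280 * 10**9
TRACK_COUNT = 1_000_000


def bytes_per_track(total_bytes: int = DATASET_BYTES, tracks: int = TRACK_COUNT) -> float:
    """Average size of the features and metadata stored per track."""
    return total_bytes / tracks


def download_hours(link_mbit_per_s: float, total_bytes: int = DATASET_BYTES) -> float:
    """Hours needed to transfer the dataset at a sustained link speed."""
    seconds = total_bytes * 8 / (link_mbit_per_s * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    print(f"Average per track: {bytes_per_track() / 1000:.0f} kB")
    for speed in (10, 100, 1000):  # hypothetical link speeds in Mbit/s
        print(f"{speed:>5} Mbit/s: {download_hours(speed):6.1f} hours")
```

Under those assumptions each track averages about 280 kB of features and metadata, and a sustained 10 Mbit/s link would need roughly two and a half days for the full transfer. That makes the advice to ask around for an existing copy look sensible.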
From the website:
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
Its purposes are:
- To encourage research on algorithms that scale to commercial sizes
- To provide a reference dataset for evaluating research
- As a shortcut alternative to creating a large dataset with The Echo Nest’s API
- To help new researchers get started in the MIR field
The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.
The Million Song Dataset is a collaborative project between The Echo Nest and LabROSA. It is supported in part by the NSF.
Two things to notice:
- Not a small data set (remember the post about dealing with data?)
- National Science Foundation funding for that data set.
Note the combination: big data + funding. Nuff said?
Not a music download at all. I wonder how many people will download it because they think there’s audio data included.
Comment by Benjamin Bock — February 11, 2011 @ 2:24 pm
Benjamin, I was being sarcastic about the music download but you may be right. I will insert a clarification.
Thanks!
Comment by Patrick Durusau — February 11, 2011 @ 5:28 pm