Ken Price: Organising Data (Task 4)


The Million Song Dataset at https://labrosa.ee.columbia.edu/millionsong/ holds (amazingly enough) metadata for about a million popular songs. The full dataset is 180 GB, so a smaller, randomly chosen subset is also available (still about 2 GB), or you may be able to find an even smaller version. Be prepared to wait a while for the data to download, and then think about how to share it. USB drives are cheap.

A warning – song titles and band names are not censored, so you may wish to check your random subset for offensive words. Screening for these words is itself a task that students could take on.
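One way to start that screening is to match each title against a blocklist, looking for whole words only. The sketch below assumes the titles are already in a Python list. The words in `BLOCKLIST` are hypothetical placeholders, and `flag_titles` is an illustrative helper name, not part of any dataset tool.

```python
import re

# Hypothetical placeholder words: replace them with the list your class agrees on.
BLOCKLIST = {"badword", "rudeword"}

def flag_titles(titles, blocklist):
    """Return the titles that contain any blocklisted word as a whole word."""
    if not blocklist:
        return []
    # Build one case-insensitive pattern such as \b(badword|rudeword)\b.
    # re.escape stops characters like "." or "*" in a word acting as regex syntax.
    words = "|".join(re.escape(word) for word in sorted(blocklist))
    pattern = re.compile(r"\b(" + words + r")\b", re.IGNORECASE)
    return [title for title in titles if pattern.search(title)]

sample_titles = ["Happy Song", "BadWord Blues", "Nothing to See Here"]
print(flag_titles(sample_titles, BLOCKLIST))  # ['BadWord Blues']
```

Whole-word matching will miss plurals and deliberate misspellings, so a flagged list is a starting point for a human check rather than a guarantee.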

Once you have your collection, there are a few data tasks worth considering.
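Each of these tasks starts from a list of song titles and a list of artist names. The sketch below assumes a plain-text file where each line reads `track_id<SEP>song_id<SEP>artist<SEP>title`; check this against the files you actually downloaded before relying on it. The file name mentioned in the comment and the sample track IDs are made up for illustration.

```python
import io

SEP = "<SEP>"  # assumed field separator; check your download

def load_tracks(lines):
    """Split each 'track_id<SEP>song_id<SEP>artist<SEP>title' line into (artist, title)."""
    tracks = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # maxsplit=3 keeps the title whole even if it happens to contain the separator
        parts = line.split(SEP, 3)
        if len(parts) != 4:
            continue  # skip lines that don't match the assumed layout
        _track_id, _song_id, artist, title = parts
        tracks.append((artist, title))
    return tracks

# Stand-in for open("unique_tracks.txt", encoding="utf-8"); the file name is hypothetical.
sample = io.StringIO(
    "TR0001<SEP>SO0001<SEP>Some Artist<SEP>Some Title\n"
    "TR0002<SEP>SO0002<SEP>Another Artist<SEP>Another Title\n"
)
tracks = load_tracks(sample)
titles = [title for _, title in tracks]
artists = [artist for artist, _ in tracks]
print(len(tracks))  # 2
```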

For example, what proportion of the song titles contain "blues" or "rap"? And how do you make sure you find "rap" on its own rather than inside another word, such as autobiogRAPhy?
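A regular expression with word boundaries (`\b`) is one way to answer both questions at once. This is a minimal sketch assuming the titles are in a Python list; the sample titles are invented.

```python
import re

# \b is a word boundary, so "rap" matches "Rap Battle" or "rap-rock"
# but not "Autobiography" or "Trapped".
PATTERN = re.compile(r"\b(blues|rap)\b", re.IGNORECASE)

def proportion_matching(titles, pattern=PATTERN):
    """Fraction of titles containing at least one match of the pattern."""
    if not titles:
        return 0.0
    hits = sum(1 for title in titles if pattern.search(title))
    return hits / len(titles)

titles = [
    "Crossroad Blues",
    "Autobiography",
    "Rap Battle",
    "Trapped in the Closet",
    "Rhythm and Blues Boogie",
    "Quiet Night",
]
print(proportion_matching(titles))  # 0.5
```

Note the design choice: this pattern also ignores "rapper" and "bluesy". Whether those should count is a good discussion point for students, and changing the pattern (for example to `\brap\w*`) changes the answer.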

Is one artist represented more than others? If so, can you get a list of the most frequently occurring artists? Can you graph this?
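Python's `collections.Counter` can tally songs per artist, and `most_common` returns the top entries. To keep the sketch free of extra installs, the graph below is a simple text bar chart; a plotting library such as matplotlib would give a proper chart. The artist names and the helper names `top_artists` and `text_bar_chart` are illustrative only.

```python
from collections import Counter

def top_artists(artist_names, n=5):
    """Return the n most frequent artist names with their song counts."""
    return Counter(artist_names).most_common(n)

def text_bar_chart(counts, width=40):
    """Build a horizontal bar chart as text, scaled so the largest count fills `width`."""
    if not counts:
        return ""
    largest = max(count for _, count in counts)
    label_width = max(len(name) for name, _ in counts)
    lines = []
    for name, count in counts:
        # Every artist gets at least one "#" so small counts stay visible.
        bar = "#" * max(1, round(count * width / largest))
        lines.append(f"{name:<{label_width}} | {bar} {count}")
    return "\n".join(lines)

artists = ["Artist A", "Artist B", "Artist A", "Artist C", "Artist A", "Artist B"]
counts = top_artists(artists, n=3)
print(counts)  # [('Artist A', 3), ('Artist B', 2), ('Artist C', 1)]
print(text_bar_chart(counts, width=6))
```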

This task addresses:
Collecting, Managing and Analysing Data (ACTDIP025) and
Collecting, Managing and Analysing Data (ACTDIP026)
#cserApps4


