Archive for the ‘Music Retrieval’ Category

Musical Genres Classified Using the Entropy of MIDI Files

Thursday, October 15th, 2015

Musical Genres Classified Using the Entropy of MIDI Files (Emerging Technology from the arXiv, October 15, 2015)

Music analysis

Communication is the process of reproducing a message created at one point in space at another point in space. It has been studied in depth by numerous scientists and engineers but it is the mathematical treatment of communication that has had the most profound influence.

To mathematicians, the details of a message are of no concern. All that matters is that the message can be thought of as an ordered set of symbols. Mathematicians have long known that this set is governed by fundamental laws first outlined by Claude Shannon in his mathematical theory of communication.

Shannon’s work revolutionized the way engineers think about communication but it has far-reaching consequences in other areas, too. Language involves the transmission of information from one individual to another and information theory provides a window through which to study and understand its nature. In computing, data is transmitted from one location to another and information theory provides the theoretical bedrock that allows this to be done most efficiently. And in biology, reproduction can be thought of as the transmission of genetic information from one generation to the next.

Music too can be thought of as the transmission of information from one location to another, but scientists have had much less success in using information theory to characterize music and study its nature.

Today, that changes thanks to the work of Gerardo Febres and Klaus Jaffé at Simon Bolivar University in Venezuela. These guys have found a way to use information theory to tease apart the nature of certain types of music and to automatically classify different musical genres, a famously difficult task in computer science.

One reason why music is so hard to study is that it does not easily translate into an ordered set of symbols. Music often consists of many instruments playing different notes at the same time. Each of these can have various qualities of timbre, loudness, and so on.

Music viewed by its Entropy content: A novel window for comparative analysis by Gerardo Febres and Klaus Jaffe.


Texts of polyphonic music MIDI files were analyzed using the set of symbols that produced the Fundamental Scale (a set of symbols leading to the Minimal Entropy Description). We created a space to represent music pieces by developing: (a) a method to adjust a description from its original scale of observation to a general scale, (b) the concept of higher order entropy as the entropy associated to the deviations of a frequency ranked symbol profile from a perfect Zipf profile. We called this diversity index the “2nd Order Entropy”. Applying these methods to a variety of musical pieces showed how the space “symbolic specific diversity-entropy – 2nd order entropy” captures some of the essence of music types, styles, composers and genres. Some clustering around each musical category is shown. We also observed the historic trajectory of music across this space, from medieval to contemporary academic music. We show that description of musical structures using entropy allows to characterize traditional and popular expressions of music. These classification techniques promise to be useful in other disciplines for pattern recognition, machine learning, and automated experimental design for example.

The process simplifies the data stream, much like you choose which subjects you want to talk about in a topic map.

Purists will object, but realize that the objection arises because they have chosen a different (and much more complex) set of subjects to talk about in the analysis of music.

The important point is to realize we are always choosing different degrees of granularity of subjects and their identifications, for some specific purpose. Change that purpose and the degree of granularity will change.
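For a rough feel of what “entropy of a symbol sequence” means in the abstract above, here is a minimal sketch. It is not the authors’ method: the toy melody, the function names, and the choice of a plain 1/rank Zipf profile are my assumptions, and the paper’s “2nd Order Entropy” is defined more carefully than the simple deviations computed here.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def ranked_profile(symbols):
    """Observed symbol probabilities, sorted from most to least frequent."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sorted((n / total for n in counts.values()), reverse=True)


def zipf_profile(num_symbols):
    """Idealized Zipf probabilities: rank r gets weight proportional to 1/r."""
    weights = [1.0 / rank for rank in range(1, num_symbols + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical toy "piece": pitch names only, ignoring duration, timbre, etc.
melody = ["C", "C", "C", "C", "G", "G", "E", "A"]

observed = ranked_profile(melody)
ideal = zipf_profile(len(observed))
# Per-rank deviation of the observed profile from the idealized Zipf profile.
deviations = [o - z for o, z in zip(observed, ideal)]

print(shannon_entropy(melody))
print(deviations)
```

Note how much was thrown away to get a symbol sequence at all: that is the granularity choice discussed above.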

Perceptual feature-based song genre classification using RANSAC [Published?]

Tuesday, June 30th, 2015

Perceptual feature-based song genre classification using RANSAC by Arijit Ghosal; Rudrasis Chakraborty; Bibhas Chandra Dhara; Sanjoy Kumar Saha. International Journal of Computational Intelligence Studies (IJCISTUDIES), Vol. 4, No. 1, 2015.


In the context of a content-based music retrieval system or archiving digital audio data, genre-based classification of song may serve as a fundamental step. In the earlier attempts, researchers have described the song content by a combination of different types of features. Such features include various frequency and time domain descriptors depicting the signal aspects. Perceptual aspects also have been combined along with. A listener perceives a song mostly in terms of its tempo (rhythm), periodicity, pitch and their variation and based on those recognises the genre of the song. Motivated by this observation, in this work, instead of dealing with wide range of features we have focused only on the perceptual aspect like melody and rhythm. In order to do so audio content is described based on pitch, tempo, amplitude variation pattern and periodicity. Dimensionality of descriptor vector is reduced and finally, random sample and consensus (RANSAC) is used as the classifier. Experimental result indicates the effectiveness of the proposed scheme.

A new approach to classification of music, but that’s all I can say since the content is behind a pay-wall.

One way to increase the accessibility of texts would be for tenure committees to not consider publications as “published” until they are freely available from the author’s webpage.

That one change could encourage authors to press for the right to post their own materials and to follow through with posting them as soon as possible.

Feel free to forward this post to members of your local tenure committee.

Digital Libraries For Musicology

Thursday, May 15th, 2014

The 1st International Digital Libraries for Musicology workshop (DLfM 2014)

12th September 2014 (full day), London, UK

in conjunction with the ACM/IEEE Digital Libraries conference 2014

From the call for papers:


Many Digital Libraries have long offered facilities to provide multimedia content, including music. However there is now an ever more urgent need to specifically support the distinct multiple forms of music, the links between them, and the surrounding scholarly context, as required by the transformed and extended methods being applied to musicology and the wider Digital Humanities.

The Digital Libraries for Musicology (DLfM) workshop presents a venue specifically for those working on, and with, Digital Library systems and content in the domain of music and musicology. This includes Music Digital Library systems, their application and use in musicology, technologies for enhanced access and organisation of musics in Digital Libraries, bibliographic and metadata for music, intersections with music Linked Data, and the challenges of working with the multiple representations of music across large-scale digital collections such as the Internet Archive and HathiTrust.


Paper submission deadline: 27th June 2014 (23:59 UTC-11)
Notification of acceptance: 30th July 2014
Registration deadline for one author per paper: 11th August 2014 (14:00 UTC)
Camera ready submission deadline: 11th August 2014 (14:00 UTC)

If you want a feel for the complexity of music as a retrieval subject, consult the various proposals at: Music markup languages, which are only some of the possible music encoding languages.

It is hard to say which domains are more “complex” than others in terms of encoding and subject identity, but it is safe to say that music falls towards the complex end of the scale. (sorry)

I first saw this in a tweet by Misanderasaurus Rex.

All of Bach

Monday, May 5th, 2014

All of Bach

From the webpage:

Every week, you will find a new recording here of one of Johann Sebastian Bach’s 1080 works, performed by The Netherlands Bach Society and many guest musicians.

Six (6) works posted, only another one thousand and seventy-four (1074) to go. 😉

Music is an area with well-known connections to many other domains: people, places, history, literature, religion and more. Not that other domains lack such connections, but music seems particularly rich in them. Those connections also include performers, places of performance, reactions to performances, reviews of performances, to say nothing of the instruments and the music itself.

A consequence of this tapestry of connections is that annotating music can draw from almost all known forms of recorded knowledge from an unlimited number of domains and perspectives.

Rather than the clamor of arbitrary links one after the other about a performance or its music, a topic map can support multiple, coherent views of any particular work. Perhaps ranging from the most recent review to the oldest known review of a work. Or exploding one review into historical context. Or exploring the richness of the composition proper.

The advantage of a topic map is that you don’t have to favor one view to the exclusion of another.

Algorithmic Music Discovery at Spotify

Tuesday, January 14th, 2014

Algorithmic Music Discovery at Spotify by Chris Johnson.

From the description:

In this presentation I introduce various Machine Learning methods that we utilize for music recommendations and discovery at Spotify. Specifically, I focus on Implicit Matrix Factorization for Collaborative Filtering, how to implement a small scale version using python, numpy, and scipy, as well as how to scale up to 20 Million users and 24 Million songs using Hadoop and Spark.

Among a number of interesting points, Chris points out differences between movie and music data.

One difference is that songs are consumed over and over again. Another is that users rate movies but “vote” by their streaming behavior on songs.*

Which leads to Chris’ main point, implicit matrix factorization. Code. The source code page points to: Collaborative Filtering for Implicit Feedback Datasets by Yifan Hu, Yehuda Koren, and Chris Volinsky.

Scaling that process is represented in blocks for Hadoop and Spark.
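To make the “votes by streaming behavior” idea concrete, here is a small sketch of the objective from the Hu, Koren, and Volinsky paper: play counts become a binary preference plus a confidence weight, and the loss is summed over every user/song pair, played or not. This is not Chris’ code; the function names, the toy play counts, and the values of `alpha` and `reg` are my assumptions, and the sketch only evaluates the loss rather than minimizing it.

```python
def preference(play_count):
    """Binary preference p_ui: 1 if the user streamed the song at all, else 0."""
    return 1.0 if play_count > 0 else 0.0


def confidence(play_count, alpha=40.0):
    """Confidence c_ui = 1 + alpha * r_ui; more plays means more confidence."""
    return 1.0 + alpha * play_count


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def implicit_loss(plays, user_factors, song_factors, alpha=40.0, reg=0.1):
    """Confidence-weighted squared error over ALL pairs, plus L2 regularization."""
    loss = 0.0
    for u, x_u in enumerate(user_factors):
        for i, y_i in enumerate(song_factors):
            r_ui = plays[u][i]
            err = preference(r_ui) - dot(x_u, y_i)
            loss += confidence(r_ui, alpha) * err * err
    loss += reg * sum(dot(x, x) for x in user_factors)
    loss += reg * sum(dot(y, y) for y in song_factors)
    return loss


# Hypothetical data: 2 users x 2 songs, entries are play counts.
plays = [[3, 0],
         [0, 1]]
user_factors = [[1.0, 0.0], [0.0, 1.0]]
song_factors = [[1.0, 0.0], [0.0, 1.0]]

print(implicit_loss(plays, user_factors, song_factors))
```

An alternating least squares solver would hold the song factors fixed, solve for the user factors, and then swap, which is the step that has to be distributed at Spotify’s scale.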

* I suspect that “behavior” is more reliable than “ratings” from the same user. My reasoning: ratings are more likely to be subject to social influences. I don’t have any research at my fingertips on that issue. Do you?

Overtone 0.9.0

Friday, November 29th, 2013

Overtone 0.9.0

From the webpage:

Overtone is an Open Source toolkit for designing synthesizers and collaborating with music. It provides:

  • A Clojure API to the SuperCollider synthesis engine
  • A growing library of musical functions (scales, chords, rhythms, arpeggiators, etc.)
  • Metronome and timing system to support live-programming and sequencing
  • Plug and play MIDI device I/O
  • A full Open Sound Control (OSC) client and server implementation.
  • Pre-cache – a system for locally caching external assets such as .wav files
  • An API for querying and fetching sounds from
  • A global concurrent event stream

When I saw the announcement for Overtone 0.9.0 I was reminded it was almost a year ago that I posted: Functional Composition [Overtone/Clojure].

Hard to say if Overtone will be of more interest to musicians who want to learn functional programming or functional programmers who want a deeper understanding of music or people for whom the usual baseball, book publishing, web pages, etc., examples just don’t cut it. 😉

While looking for holiday music for Overtone, I did stumble across:

Music: a Mathematical Offering by Dave Benson.

At over 500 pages, this living text is also for sale in hard copy by Cambridge University Press. Do us all a favor and if the electronic version proves useful to you, ask your library to order a hard copy. And/or recommend it to others. That will encourage presses to continue to allow electronic versions of hard copy materials to circulate freely.

If you are interested in the mathematics that underlie music or need to know more for use in music retrieval, this is a good place to start.

I struck out on finding Christmas music written with Overtone.

I did find this video:

I would deeply appreciate a pointer to Christmas music with or for Overtone.


Update: @Overtone tweeted this link for Christmas music: …/overtone/examples/compositions/bells.clj.


Musicbrainz in Neo4j – Part 1

Thursday, November 7th, 2013

Musicbrainz in Neo4j – Part 1 by Paul Tremberth.

From the post:

What is MusicBrainz?

Quoting Wikipedia, MusicBrainz is an “open content music database [that] was founded in response to the restrictions placed on the CDDB.(…) MusicBrainz captures information about artists, their recorded works, and the relationships between them.”

Anyone can browse the database at If you create an account with them you can contribute new data or fix existing records details, track lengths, send in cover art scans of your favorite albums etc. Edits are peer reviewed, and any member can vote up or down. There are a lot of similarities with Wikipedia.

With this first post, we want to show you how to import the Musicbrainz data into Neo4j for some further analysis with Cypher in the second post. See below for what we will end up with:

MusicBrainz data

MusicBrainz currently has around 1000 active users, nearly 800,000 artists, 75,000 record labels, around 1,200,000 releases, more than 12,000,000 tracks, and short under 2,000,000 URLs for these entities (Wikipedia pages, official homepages, YouTube channels etc.) Daily fixes by the community makes their data probably the freshest and most accurate on the web.
You can check the current numbers here and here.

This rocks!

Interesting data, a walk-through of how to load the data into Neo4j, and the promise of more interesting activities to follow.

However, I urge caution on showing this to family members. 😉

You may wind up scripting daily data updates and teaching Cypher to family members and no doubt their friends.

Up to you.

I first saw this in a tweet by Peter Neubauer.

Parsing arbitrary Text-based Guitar Tab…

Thursday, August 29th, 2013

RiffBank – Parsing arbitrary Text-based Guitar Tab into an Indexable and Queryable “RiffCode” for ElasticSearch by Ryan Robitalle.

Guitar tab is a form of tablature, a form of music notation that records finger positions.

Surfing just briefly, there appears to be a lot of music available in “tab” format.

Deeply interesting post that will take some time to work through.

It is one of those odd things that may suddenly turn out to be very relevant (or not) in another domain.

Looking forward to spending some time with tablature data.

Semantic Computing of Moods…

Friday, August 16th, 2013

Semantic Computing of Moods Based on Tags in Social Media of Music by Pasi Saari, Tuomas Eerola. (IEEE Transactions on Knowledge and Data Engineering, 2013; DOI: 10.1109/TKDE.2013.128)


Social tags inherent in online music services such as provide a rich source of information on musical moods. The abundance of social tags makes this data highly beneficial for developing techniques to manage and retrieve mood information, and enables study of the relationships between music content and mood representations with data substantially larger than that available for conventional emotion research. However, no systematic assessment has been done on the accuracy of social tags and derived semantic models at capturing mood information in music. We propose a novel technique called Affective Circumplex Transformation (ACT) for representing the moods of music tracks in an interpretable and robust fashion based on semantic computing of social tags and research in emotion modeling. We validate the technique by predicting listener ratings of moods in music tracks, and compare the results to prediction with the Vector Space Model (VSM), Singular Value Decomposition (SVD), Nonnegative Matrix Factorization (NMF), and Probabilistic Latent Semantic Analysis (PLSA). The results show that ACT consistently outperforms the baseline techniques, and its performance is robust against a low number of track-level mood tags. The results give validity and analytical insights for harnessing millions of music tracks and associated mood data available through social tags in application development.

These results make me wonder whether the results of tagging represent the average semantic resolution that users want.

Obviously a musician or musicologist would want far finer and sharper distinctions, at least for music of interest to them. Or substitute the domain of your choice. Domain experts want precision, while the average user muddles along with coarser divisions.

We already know from Karen Drabenstott’s work (Subject Headings and the Semantic Web) that library classification systems are too complex for the average user and even most librarians.

On the other hand, we all have some sense of the wasted time and effort caused by the uncharted semantic sea where Google and others practice catch and release with semantic data.

Some of the unanswered questions that remain:

How much semantic detail is enough?

For which domains?

Who will pay for gathering it?

What economic model is best?

Crafting Linked Open Data for Cultural Heritage:…

Wednesday, July 24th, 2013

Crafting Linked Open Data for Cultural Heritage: Mapping and Curation Tools for the Linked Jazz Project by M. Cristina Pattuelli, Matt Miller, Leanora Lange, Sean Fitzell, and Carolyn Li-Madeo.


This paper describes tools and methods developed as part of Linked Jazz, a project that uses Linked Open Data (LOD) to reveal personal and professional relationships among jazz musicians based on interviews from jazz archives. The overarching aim of Linked Jazz is to explore the possibilities offered by LOD to enhance the visibility of cultural heritage materials and enrich the semantics that describe them. While the full Linked Jazz dataset is still under development, this paper presents two applications that have laid the foundation for the creation of this dataset: the Mapping and Curator Tool, and the Transcript Analyzer. These applications have served primarily for data preparation, analysis, and curation and are representative of the types of tools and methods needed to craft linked data from digital content available on the web. This paper discusses these two domain-agnostic tools developed to create LOD from digital textual documents and offers insight into the process behind the creation of LOD in general.

The Linked Data Jazz Name Directory:

consists of 8,725 unique names of jazz musicians as N-Triples.

It’s a starting place if you want to create a topic map about Jazz.

Although, do be aware the Center for Arts and Cultural Policy Studies at Princeton University reports:

Although national estimates of the number of jazz musicians are unavailable, the Study of Jazz Artists 2001 estimated the number of jazz musicians in three metropolitan jazz hubs — New York, San Francisco, and New Orleans — at 33,003, 18,733, and 1,723, respectively. [A total of 53,459. How Many Jazz Musicians Are There?]

And that is only for one point in time. It does not include jazz musicians who perished before the estimate was made.

Much work remains to be done.

Music Information Research Based on Machine Learning

Sunday, June 16th, 2013

Music Information Research Based on Machine Learning by Masataka Goto and Kazuyoshi Yoshii.

From the webpage:

Music information research is gaining a lot of attention after 2000 when the general public started listening to music on computers in daily life. It is widely known as an important research field, and new researchers are continually joining the field worldwide. Academically, one of the reasons many researchers are involved in this field is that the essential unresolved issue is the understanding of complex musical audio signals that convey content by forming a temporal structure while multiple sounds are interrelated. Additionally, there are still appealing unresolved issues that have not been touched yet, and the field is a treasure trove of research topics that could be tackled with state-of-the-art machine learning techniques.

This tutorial is intended for an audience interested in the application of machine learning techniques to such music domains. Audience members who are not familiar with music information research are welcome, and researchers working on music technologies are likely to find something new to study.

First, the tutorial serves as a showcase of music information research. The audience can enjoy and study many state-of-the-art demonstrations of music information research based on signal processing and machine learning. This tutorial highlights timely topics such as active music listening interfaces, singing information processing systems, web-related music technologies, crowdsourcing, and consumer-generated media (CGM).

Second, this tutorial explains the music technologies behind the demonstrations. The audience can learn how to analyze and understand musical audio signals, process singing voices, and model polyphonic sound mixtures. As a new approach to advanced music modeling, this tutorial introduces unsupervised music understanding based on nonparametric Bayesian models.

Third, this tutorial provides a practical guide to getting started in music information research. The audience can try available research tools such as music feature extraction, machine learning, and music editors. Music databases and corpora are then introduced. As a hint towards research topics, this tutorial also discusses open problems and grand challenges that the audience members are encouraged to tackle.

In the future, music technologies, together with image, video, and speech technologies, are expected to contribute toward all-around media content technologies based on machine learning.

Download tutorial slides.

Always nice to start the week with something different.

I first saw this in a tweet by Masataka Goto.

Every Band On Spotify Gets A Soundrop Listening Room [Almost a topic map]

Sunday, May 12th, 2013

Every Band On Spotify Gets A Soundrop Listening Room by Eliot Van Buskirk.

From the post:

Soundrop, a Spotify app that shares a big investor with Spotify, says it alone has the ability to scale listening rooms up so that thousands of people can listen to the same song together at the same time, using a secret sauce called Erlang — a hyper-efficient coding language developed by Ericsson for use on big telecom infrastructures (updated).

Starting today, Soundrop will offer a new way to listen: individual rooms dedicated to any single artist or band, so that fans of (or newcomers to) their music can gather to listen to that band’s music. The rooms are filled with tunes already, but anyone in the room can edit the playlist, add new songs (only from that artist or their collaborations), and of course talk to other listeners in the chatroom.

“The rooms are made automatically whenever someone clicks on the artist,” Soundrop head of partnerships Cortney Harding told “No one owns the rooms, though. Artists, labels and management have to come to us to get admin rights.”

In topic map terminology, what I hear is:

Using the Soundrop app, Spotify listeners can create topics for any single artist or band with a single click. Associations between the artist/band and their albums, individual songs, etc., are created automatically.

What I don’t hear is the exposure of subject identifiers to allow fans to merge in information from other resources, such as fan zines, concert reports and of course, covers from the Rolling Stone.

Perhaps Soundrop will offer subject identifiers and merging as a separate, perhaps subscription feature.

Could be a win-win if the Rolling Stone, for example, were to start exposing their subject identifiers for articles, artists and bands.

Some content producers will follow others, some will invent their own subject identifiers.

The important point is that with topic maps we can merge based on their identifiers.

Not on some uniform-identifier-in-the-sky-by-and-by, which stymies progress until universal agreement arrives.
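As a sketch of what merging on identifiers looks like in practice: topics that share at least one subject identifier collapse into one, and nobody has to agree on a single identifier first. The record layout, the function name, and the `.example` URLs are hypothetical, and this is a simplification rather than a full implementation of the topic map merging rules.

```python
def merge_topics(topics):
    """Merge topic records that share at least one subject identifier.

    Each topic is a dict with a set of 'identifiers' and a set of 'names'.
    """
    merged = []
    for topic in topics:
        ids = set(topic["identifiers"])
        names = set(topic["names"])
        remaining = []
        for other in merged:
            if other["identifiers"] & ids:
                # Shared identifier: same subject, so absorb the other topic.
                ids |= other["identifiers"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"identifiers": ids, "names": names})
        merged = remaining
    return merged


# Hypothetical identifiers from two different content producers.
topics = [
    {"identifiers": {"http://soundrop.example/artist/42"},
     "names": {"The Band"}},
    {"identifiers": {"http://magazine.example/artists/the-band",
                     "http://soundrop.example/artist/42"},
     "names": {"Band, The"}},
    {"identifiers": {"http://magazine.example/artists/other"},
     "names": {"Other Act"}},
]

result = merge_topics(topics)
print(len(result))
```

The first two records merge because one of them carries both identifiers; the third stays separate until someone asserts a shared identifier for it.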

Distributed Multimedia Systems (Archives)

Tuesday, February 12th, 2013

Proceedings of the International Conference on Distributed Multimedia Systems

From the webpage:

DMS 2012 Proceedings August 9 to August 11, 2012 Eden Roc Renaissance Miami Beach, USA
DMS 2011 Proceedings August 18 to August 19, 2011 Convitto della Calza, Florence, Italy
DMS 2010 Proceedings October 14 to October 16, 2010 Hyatt Lodge at McDonald’s Campus, Oak Brook, Illinois, USA
DMS 2009 Proceedings September 10 to September 12, 2009 Hotel Sofitel, Redwood City, San Francisco Bay, USA
DMS 2008 Proceedings September 4 to September 6, 2008 Hyatt Harborside at Logan Int’l Airport, Boston, USA
DMS 2007 Proceedings September 6 to September 8, 2007 Hotel Sofitel, Redwood City, San Francisco Bay, USA

For coverage, see the Call for Papers, DMS 2013.

Another archive with topic map related papers!

DMS 2013

Tuesday, February 12th, 2013

DMS 2013: The 19th International Conference on Distributed Multimedia Systems


Paper submission due: April 29, 2013
Notification of acceptance: May 31, 2013
Camera-ready copy: June 15, 2013
Early conference registration due: June 15, 2013
Conference: August 8 – 10, 2013

From the call for papers:

With today’s proliferation of multimedia data (e.g., images, animations, video, and sound), comes the challenge of using such information to facilitate data analysis, modeling, presentation, interaction and programming, particularly for end-users who are domain experts, but not IT professionals. The main theme of the 19th International Conference on Distributed Multimedia Systems (DMS’2013) is multimedia inspired computing. The conference organizers seek contributions of high quality papers, panels or tutorials, addressing any novel aspect of computing (e.g., programming language or environment, data analysis, scientific visualization, etc.) that significantly benefits from the incorporation/integration of multimedia data (e.g., visual, audio, pen, voice, image, etc.), for presentation at the conference and publication in the proceedings. Both research and case study papers or demonstrations describing results in research area as well as industrial development cases and experiences are solicited. The use of prototypes and demonstration video for presentations is encouraged.


Topics of interest include, but are not limited to:

Distributed Multimedia Technology

  • media coding, acquisition and standards
  • QoS and Quality of Experience control
  • digital rights management and conditional access solutions
  • privacy and security issues
  • mobile devices and wireless networks
  • mobile intelligent applications
  • sensor networks, environment control and management

Distributed Multimedia Models and Systems

  • human-computer interaction
  • languages for distributed multimedia
  • multimedia software engineering issues
  • semantic computing and processing
  • media grid computing, cloud and virtualization
  • web services and multi-agent systems
  • multimedia databases and information systems
  • multimedia indexing and retrieval systems
  • multimedia and cross media authoring

Applications of Distributed Multimedia Systems

  • collaborative and social multimedia systems and solutions
  • humanities and cultural heritage applications, management and fruition
  • multimedia preservation
  • cultural heritage preservation, management and fruition
  • distance and lifelong learning
  • emergency and safety management
  • e-commerce and e-government applications
  • health care management and disability assistance
  • intelligent multimedia computing
  • internet multimedia computing
  • virtual, mixed and augmented reality
  • user profiling, reasoning and recommendations

The presence of information/data doesn’t mean topic maps return good ROI.

On the other hand, the presence of information/data does mean semantic impedance is present.

The question is what need you have to overcome semantic impedance and at what cost?

OneMusicAPI Simplifies Music Metadata Collection

Friday, February 8th, 2013

OneMusicAPI Simplifies Music Metadata Collection by Eric Carter.

From the post:

Elsten software, digital music organizer, has announced OneMusicAPI. Proclaimed to be “OneMusicAPI to rule them all,” the API acts as a music metadata aggregator that pulls from multiple sources across the web through a single interface. Elsten founder and OneMusicAPI creator, Dan Gravell, found keeping pace with constant changes from individual sources became too tedious a process to adequately organize music.

Currently covers over three million albums but only returns cover art.

Other data will be added but when and to what degree isn’t clear.

When launched, pricing plans will be available.

A lesson that will need to be reinforced from time to time.

Collation of data/information consumes time and resources.

To encourage collation, collators need to be paid.

If you need an example of what happens without paid collators, search your favorite search engine for the term “collator.”

Depending on how you count “sameness,” I get eight or nine different notions of collator from mine.

Multi-tasking with joint semantic spaces

Saturday, January 26th, 2013

Paper of the Day (Po’D): Multi-tasking with joint semantic spaces by Bob L. Sturm.

From the post:

Hello, and welcome to the Paper of the Day (Po’D): Multi-tasking with joint semantic spaces edition. Today’s paper is: J. Weston, S. Bengio and P. Hamel, “Multi-tasking with joint semantic spaces for large-scale music annotation and retrieval,” J. New Music Research, vol. 40, no. 4, pp. 337-348, 2011.

This article proposes and tests a novel approach (pronounced MUSCLES but written MUSLSE) for describing a music signal along multiple directions, including semantically meaningful ones. This work is especially relevant since it applies to problems that remain unsolved, such as artist identification and music recommendation (in fact the first two authors are employees of Google). The method proposed in this article models a song (or a short excerpt of a song) as a triple in three vector spaces learned from a training dataset: one vector space is created from artists, one created from tags, and the last created from features of the audio. The benefit of using vector spaces is that they bring quantitative and well-defined machinery, e.g., projections and distances.

MUSCLES attempts to learn each vector space together so as to preserve (dis)similarity. For instance, vectors mapped from artists that are similar (e.g., Brittney Spears and Christina Aguilera) should point in nearly the same direction; while those that are not similar (e.g., Engelbert Humperdink and The Rubberbandits), should be nearly orthogonal. Similarly, so should vectors mapped from tags that are semantically close (e.g., “dark” and “moody”), and semantically disjoint (e.g., “teenage death song” and “NYC”). For features extracted from the audio, one hopes the features themselves are comparable, and are able to reflect some notion of similarity at least at the surface level of the audio. MUSCLES takes this a step further to learn the vector spaces so that one can take inner products between vectors from different spaces — which is definitely a novel concept in music information retrieval.

Bob raises a number of interesting issues but here’s one that bites:

A further problem is that MUSCLES judges similarity by magnitude inner product. In such a case, if “sad” and “happy” point in exact opposite directions, then MUSCLES will say they are highly similar.

Ouch! For all the “precision” of vector spaces, there are non-apparent biases lurking therein.
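A tiny illustration of the bias Bob describes, using made-up two-dimensional vectors (the tag vectors and function name are my assumptions, and I am relying on Bob’s description of how similarity is scored rather than on the MUSCLES code itself):

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors (normalized inner product)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical tag embeddings: "happy" points exactly opposite to "sad",
# while "moody" is orthogonal to "sad".
sad = [1.0, 2.0]
happy = [-1.0, -2.0]
moody = [2.0, -1.0]

signed = cosine(sad, happy)      # close to -1: opposite directions
by_magnitude = abs(signed)       # close to +1: looks "highly similar"
unrelated = abs(cosine(sad, moody))  # 0: orthogonal

print(signed, by_magnitude, unrelated)
```

Taking the magnitude discards the sign, so exact opposites score the same as exact matches, while merely unrelated tags score near zero.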

For your convenience:

Multi-tasking with joint semantic spaces for large-scale music annotation and retrieval (full text)


Music prediction tasks range from predicting tags given a song or clip of audio, predicting the name of the artist, or predicting related songs given a song, clip, artist name or tag. That is, we are interested in every semantic relationship between the different musical concepts in our database. In realistically sized databases, the number of songs is measured in the hundreds of thousands or more, and the number of artists in the tens of thousands or more, providing a considerable challenge to standard machine learning techniques. In this work, we propose a method that scales to such datasets which attempts to capture the semantic similarities between the database items by modelling audio, artist names, and tags in a single low-dimensional semantic embedding space. This choice of space is learnt by optimizing the set of prediction tasks of interest jointly using multi-task learning. Our single model learnt by training on the joint objective function is shown experimentally to have improved accuracy over training on each task alone. Our method also outperforms the baseline methods tried and, in comparison to them, is faster and consumes less memory. We also demonstrate how our method learns an interpretable model, where the semantic space captures well the similarities of interest.

Just to tempt you into reading the article, consider the following passage:

Artist and song similarity is at the core of most music recommendation or playlist generation systems. However, music similarity measures are subjective, which makes it difficult to rely on ground truth. This makes the evaluation of such systems more complex. This issue is addressed in Berenzweig (2004) and Ellis, Whitman, Berenzweig, and Lawrence (2002). These tasks can be tackled using content-based features or meta-data from human sources. Features commonly used to predict music similarity include audio features, tags and collaborative filtering information.

Meta-data such as tags and collaborative filtering data have the advantage of considering human perception and opinions. These concepts are important to consider when building a music similarity space. However, meta-data suffers from a popularity bias, because a lot of data is available for popular music, but very little information can be found on new or less known artists. In consequence, in systems that rely solely upon meta-data, everything tends to be similar to popular artists. Another problem, known as the cold-start problem, arises with new artists or songs for which no human annotation exists yet. It is then impossible to get a reliable similarity measure, and is thus difficult to correctly recommend new or less known artists.

“…[H]uman perception[?]…” Is there some other form I am unaware of? Some other measure of similarity than our own? Recalling that vector spaces are a pale mockery of our more subtle judgments.


Introduction to Recommendations with Map-Reduce and mrjob [Ode to Similarity, Music]

Saturday, August 25th, 2012

Introduction to Recommendations with Map-Reduce and mrjob by Marcel Caraciolo

From the post:

In this post I will present how can we use map-reduce programming model for making recommendations. Recommender systems are quite popular among shopping sites and social network thee days. How do they do it ? Generally, the user interaction data available from items and products in shopping sites and social networks are enough information to build a recommendation engine using classic techniques such as Collaborative Filtering.

Usual recommendation post except for the emphasis on multiple tests of similarity.

Useful because simply reporting that two (or more) items are “similar” isn’t all that helpful. At least unless or until you know the basis for the comparison.

And have the expectation that a similar notion of “similarity” works for your audience.
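
To make the point concrete, here is a small sketch in plain Python (not the mrjob code from the post) that scores one hypothetical pair of albums with three common similarity tests. The ratings, user names and album names are assumptions made up for the example.

```python
import math

# Hypothetical ratings, invented for illustration: item -> {user: rating}.
ratings = {
    "album_a": {"ann": 5, "bob": 4, "cat": 1},
    "album_b": {"ann": 4, "bob": 5, "cat": 2, "dan": 3},
}

def jaccard(x, y):
    """Overlap of the sets of users who rated each item; ignores the ratings."""
    users_x, users_y = set(x), set(y)
    return len(users_x & users_y) / len(users_x | users_y)

def cosine(x, y):
    """Cosine of the rating vectors, restricted to users who rated both items."""
    common = set(x) & set(y)
    dot = sum(x[u] * y[u] for u in common)
    norm_x = math.sqrt(sum(x[u] ** 2 for u in common))
    norm_y = math.sqrt(sum(y[u] ** 2 for u in common))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def pearson(x, y):
    """Pearson correlation of the ratings from users who rated both items."""
    common = set(x) & set(y)
    n = len(common)
    if n < 2:
        return 0.0
    mean_x = sum(x[u] for u in common) / n
    mean_y = sum(y[u] for u in common) / n
    cov = sum((x[u] - mean_x) * (y[u] - mean_y) for u in common)
    dev_x = math.sqrt(sum((x[u] - mean_x) ** 2 for u in common))
    dev_y = math.sqrt(sum((y[u] - mean_y) ** 2 for u in common))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0

a, b = ratings["album_a"], ratings["album_b"]
scores = {"jaccard": jaccard(a, b), "cosine": cosine(a, b), "pearson": pearson(a, b)}
```

The same pair of albums gets three different scores, so “similar” means little until you say which test produced it.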

For example, I read an article this morning about a “new” invention that will change the face of sheet music publishing, in three to five years: Invention Will Strike a Chord With Musicians.

Despite the lack of terms like “markup,” “HyTime,” “SGML,” “XML,” “Music Encoding Initiative (MEI),” or “MusicXML,” all of those seemed quite “similar” to me. That may not be the “typical” experience but it is mine.

If you don’t want to wait three to five years for the sheet music revolution, you can check out MusicXML. It has been reported that more than 150 applications support MusicXML. Oh, that would be today, not three to five years from now.

You might want to pass the word along in the music industry before the next “revolution” in sheet music starts up.

Autocompletion and Heavy Metal

Wednesday, June 13th, 2012

Building an Autocompletion on GWT with RPC, ContextListener and a Suggest Tree: Part 0

René Pickhardt has started a series of posts that should interest anyone with search applications (or an interest in metal bands).

From the post:

Over the last weeks there was quite some quality programming time for me. First of all I built some indices on the typology data base in which way I was able to increase the retrieval speed of typology by a factor of over 1000 which is something that rarely happens in computer science. I will blog about this soon. But heaving those techniques at hand I also used them to built a better auto completion for the search function of my online social network

The search functionality is not deployed to the real site yet. But on the demo page you can find a demo showing how the completion is helping you typing. Right now the network requests are faster than google search (which I admit it is quite easy if you only have to handle a request a second and also have a much smaller concept space). Still I was amazed by the ease and beauty of the program and the fact that the suggestions for autocompletion are actually more accurate than our current data base search. So feel free to have a look at the demo:

Right now it consists of about 150 thousand concepts which come from 4 different data sources (Metal Bands, Metal records, Tracks and Germen venues for Heavy metal) I am pretty sure that increasing the size of the concept space by 2 orders of magnitude should not be a problem. And if everything works out fine I will be able to test this hypothesis on my joint project related work which will have a data base with at least 1 mio. concepts that need to be autocompleted.

Well, I must admit that 150,000 concepts sounds a bit “lite” for heavy metal but then being an admirer of the same, that comes as no real surprise. 😉

Still, it also sounds like a very good starting place.
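
For a feel of what prefix completion involves, here is a minimal sketch using a sorted list and binary search. This is a swapped-in technique and not the suggest tree from the series: it returns matches in alphabetical order and does no ranking. The concept list is a made-up sample.

```python
import bisect

def complete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms that start with `prefix`.

    `sorted_terms` must already be sorted; matching is case-sensitive.
    """
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:start + limit]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
    return matches

# Hypothetical sample of concepts, invented for illustration.
concepts = sorted(["iron maiden", "iron savior", "in flames", "metallica", "megadeth"])

iron = complete(concepts, "iron")
me = complete(concepts, "me")
nothing = complete(concepts, "x")
```

The lookup is one binary search plus a short scan, which is part of why completion can stay fast as the concept list grows.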


Machine See, Machine Do

Friday, May 4th, 2012

While we wait for maid service robots, there is news that computers can be trained as human mimics for labeling multimedia resources. Game-powered machine learning reports success with game-based training for music labeling.

The authors, Luke Barrington, Douglas Turnbull, and Gert Lanckriet, neatly summarize music labeling as a problem of volume:

…Pandora, a popular Internet radio service, employs musicologists to annotate songs with a fixed vocabulary of about five hundred tags. Pandora then creates personalized music playlists by finding songs that share a large number of tags with a user-specified seed song. After 10 y of effort by up to 50 full time musicologists, less than 1 million songs have been manually annotated (5), representing less than 5% of the current iTunes catalog.

A problem that extends to the “…7 billion images are uploaded to Facebook each month (1), YouTube users upload 24 h of video content per minute….”

The authors created Herd It to:

… investigate and answer two important questions. First, we demonstrate that the collective wisdom of Herd It’s crowd of nonexperts can train machine learning algorithms as well as expert annotations by paid musicologists. In addition, our approach offers distinct advantages over training based on static expert annotations: it is cost-effective, scalable, and has the flexibility to model demographic and temporal changes in the semantics of music. Second, we show that integrating Herd It in an active learning loop trains accurate tag models more effectively; i.e., with less human effort, compared to a passive approach.

The approach promises an augmentation (not replacement) of human judgement with regard to classification of music. An augmentation that would enable human judgement to reach further across the musical corpus than ever before:

…while a human-only approach requires the same labeling effort for the first song as for the millionth, our game-powered machine learning solution needs only a small, reliable training set before all future examples can be labeled automatically, improving efficiency and cost by orders of magnitude. Tagging a new song takes 4 s on a modern CPU: in just a week, eight parallel processors could tag 1 million songs or annotate Pandora’s complete song collection, which required a decade of effort from dozens of trained musicologists.
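
The arithmetic in that passage is easy to check. A quick sketch, assuming perfect parallel scaling across the eight processors (my assumption, the excerpt does not say so):

```python
SECONDS_PER_SONG = 4      # "Tagging a new song takes 4 s on a modern CPU"
SONGS = 1_000_000         # "1 million songs"
PROCESSORS = 8            # "eight parallel processors"
SECONDS_PER_DAY = 24 * 60 * 60

total_cpu_seconds = SECONDS_PER_SONG * SONGS    # 4,000,000 CPU-seconds
wall_seconds = total_cpu_seconds / PROCESSORS   # 500,000 s of wall-clock time
wall_days = wall_seconds / SECONDS_PER_DAY      # a little under 6 days
```

That comes in under seven days, so the “in just a week” claim holds on these numbers.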

A promising technique for IR with regard to multimedia resources.

What I wonder about is the extension of the technique, games designed to train machine learning for:

  • e-discovery in legal proceedings
  • “tagging” or indexing if you will, text resources
  • vocabulary expansion for searching
  • contexts for semantic matching
  • etc.

A first person shooter game that annotates the New York Times archives would be really cool!

Amazed by neo4j, gwt and my apache tomcat webserver

Monday, September 19th, 2011

Amazed by neo4j, gwt and my apache tomcat webserver

From the post:

Besides reading papers I am currently implementing the infrastructure of my social news stream for the new metalcon version. For the very first time I was really using neo4j on a remote webserver in a real webapplication built on gwt. This combined the advantages of all these technologies and our new fast server! After seeing the results I am so excited I almost couldn’t sleep last night!


I selected a very small bipartit subgraph of metalcon which means just the fans and bands together with the fanship relation between them. This graph consists of 12’198 nodes (6’870 Bands and 5’328 Users). and 119’379 edges.


  • For every user I displayed all the favourite bands
  • for each of those band I calculated similar bands (on the fly while page request!)
  • this was done by breadth first search (depth 2) and counting nodes on the fly

A page load for a random user with 56 favourite bands ends up in a traversal of 555’372. Together with sending the result via GWT over the web this was done in about 0.9 seconds!
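
The traversal described above (band to fans, then fans to their other bands, counting as you go) can be sketched without a graph database. This is not the neo4j code from the post; plain Python dictionaries stand in for the graph, and the users and fanships are invented for illustration.

```python
from collections import Counter

# Hypothetical fanship relation: user -> set of favourite bands.
likes = {
    "u1": {"Slayer", "Kreator", "Sodom"},
    "u2": {"Slayer", "Kreator"},
    "u3": {"Slayer", "Opeth"},
    "u4": {"Opeth"},
}

def similar_bands(band, likes):
    """Count the bands reachable in two hops: band -> fan -> other band."""
    # Invert the relation so we can step from a band to its fans.
    fans = {}
    for user, bands in likes.items():
        for b in bands:
            fans.setdefault(b, set()).add(user)

    counts = Counter()
    for user in fans.get(band, ()):   # hop 1: the band's fans
        for other in likes[user]:     # hop 2: each fan's favourite bands
            if other != band:
                counts[other] += 1    # one shared fan = one vote
    return counts.most_common()

ranking = similar_bands("Slayer", likes)
```

Bands that share more fans with the starting band rise to the top of the ranking, here Kreator with two shared fans.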

See the post to see how MySQL fared.

And yes, I thought about you, Mary Jane, when I saw this post!

Music Linked Data Workshop (JISC, London, 12 May 2011)

Tuesday, May 24th, 2011

Slides from the Music Linked Data Workshop (JISC, London, 12 May 2011)

Here you will find:

  • MusicNet: Aligning Musicology’s Metadata – David Bretherton, Daniel Alexander Smith, Joe Lambert and mc schraefel (Music, and Electronics and Computer Science, University of Southampton)
  • Towards Web-Scale Analysis of Musical Structure – J. Stephen Downie (Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign), David De Roure (Oxford e-Research Centre, University of Oxford) and Kevin Page (Oxford e-Research Centre, University of Oxford)
  • LinkedBrainz Live – Simon Dixon, Cedric Mesnage and Barry Norton (Centre for Digital Music, Queen Mary University of London)
  • BBC Music – Using the Web as our Content Management System – Nicholas Humfrey (BBC)
  • Early Music Online: Opening up the British Library’s 16th-Century Music Books – Sandra Tuppen (British Library)
  • Musonto – A Semantic Search Engine Dedicated to Music and Musicians – Jean-Philippe Fauconnier (Université Catholique de Louvain, Belgium) and Joseph Roumier (CETIC, Belgium)
  • Listening to Movies – Creating a User-Centred Catalogue of Music for Films – Charlie Inskip (freelance music consultant)

Look like good candidates for the further review inbox!


Friday, May 20th, 2011

Seevl: Reinventing Music Discovery

If you are interested in music or interfaces, this is a must stop location!

Simple search box.

I tried searching for artists, albums, types of music.

In addition to search results you also get suggestions of related information.

The “Why is this related?” link for related information was particularly interesting. It explains why additional information was offered for a particular search result.

Developers can access their data for non-commercial uses for free.

The simplicity of the interface was a real plus.

The Wekinator

Monday, April 11th, 2011

The Wekinator: Software for using machine learning to build real-time interactive systems

This looks very cool!

I can imagine topic maps of sounds/gestures in a number of contexts that would be very interesting.

From the website:

The Wekinator is a free software package to facilitate rapid development of and experimentation with machine learning in live music performance and other real-time domains. The Wekinator allows users to build interactive systems by demonstrating human actions and computer responses, rather than by programming.

Example applications:

  • Creation of new musical instruments
    • Create mappings between gesture and computer sounds. Control a drum machine using your webcam! Play Ableton using a Kinect!

  • Creation of gesturally-controlled animations and games
    • Control interactive visual environments like Processing or Quartz Composer, or game engines like Unity, using gestures sensed from webcam, Kinect, Arduino, etc.

  • Creation of systems for gesture analysis and feedback
    • Build classifiers to detect which gesture a user is performing. Use the identified gesture to control the computer or to inform the user how he’s doing.

  • Creation of real-time music information retrieval and audio analysis systems
    • Detect instrument, genre, pitch, rhythm, etc. of audio coming into the mic, and use this to control computer audio, visuals, etc.

  • Creation of other interactive systems in which the computer responds in real-time to some action performed by a human user (or users)
    • Anything that can output OSC can be used as a controller
    • Anything that can be controlled by OSC can be controlled by Wekinator
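
For readers who have not met OSC, the wire format is simple enough to build by hand. A minimal sketch, assuming the OSC 1.0 message layout (padded address string, padded type tag string, big-endian float32 arguments). The address `/example/inputs`, the host and the port are placeholders of mine and not taken from the Wekinator site, so check its documentation for the real values.

```python
import struct

def osc_pad(data: bytes) -> bytes:
    """Null-terminate, then pad with nulls to a multiple of 4 bytes."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC message whose arguments are all 32-bit floats."""
    type_tags = "," + "f" * len(values)   # e.g. ",ff" for two floats
    payload = b"".join(struct.pack(">f", v) for v in values)
    return osc_pad(address.encode("ascii")) + osc_pad(type_tags.encode("ascii")) + payload

# Hypothetical address and feature values, for illustration only.
packet = osc_message("/example/inputs", 0.25, 0.75)

# Sending is a single UDP datagram; host and port below are placeholders.
# import socket
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(packet, ("127.0.0.1", 9000))
```

Anything that can emit a datagram like this, from a webcam tracker to an Arduino bridge, fits the “anything that can output OSC” description above.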


Monday, March 21st, 2011


From the website:

People have been fascinated by music since the dawn of humanity. A wide variety of music genres and styles has evolved, reflecting diversity in personalities, cultures and age groups. It comes as no surprise that human tastes in music are remarkably diverse, as nicely exhibited by the famous quotation: “We don’t like their sound, and guitar music is on the way out” (Decca Recording Co. rejecting the Beatles, 1962).

Yahoo! Music has amassed billions of user ratings for musical pieces. When properly analyzed, the raw ratings encode information on how songs are grouped, which hidden patterns link various albums, which artists complement each other, and above all, which songs users would like to listen to.

Such an exciting analysis introduces new scientific challenges. The KDD Cup contest releases over 300 million ratings performed by over 1 million anonymized users. The ratings are given to different types of items-songs, albums, artists, genres-all tied together within a known taxonomy.

Important dates:

March 15, 2011 Competition begins

June 30, 2011 Competition ends

July 3, 2011 Winners notified

August 21, 2011 Workshop

An interesting data set that focuses on machine learning and prediction.

Equally interesting would be merging this data set with other music data sets.

Thomaner Project

Tuesday, March 8th, 2011

Thomaner Project

Press coverage of a project in connection with the 800th anniversary of the famous boys’ choir, the Thomaner.

The topic map project is a database for the choir’s repertoire from 1808 to 2008.

The German newspaper report notes that only 20 years of the 200-year span are complete.

Funding is being sought to complete the remainder.

Not exactly the Rolling Stone or Lady Gaga, is it?

Challenge to the Opera Topic Map?

Thursday, February 24th, 2011

Well, not quite. It needs a topic mapping step, but you are a little closer than before.

Data mining & Hip Hop reports:

Tahir Hemphil data mined 30 years of hip-hop lyrics to provide a searchable index of the genre’s lexicon.

The project analyzes the lyrics of over 40,000 songs for metaphors, similes, cultural references, phrases, memes and socio-political ideas.[Project] The project is one of its kind with a huge potential offering to the hip hop world, not only can you visualize the artists career’s but also have deeper analysis into their world where you can potential patternize their music.

See the post for more material and links.

How It Works – The “Musical Brain”

Sunday, February 13th, 2011

How It Works – The “Musical Brain”

I found this following the links in the Million Song Dataset post.

One aspect, among others, that I found interesting, was the support for multiple ID spaces.

I am curious about the claim it works by:

Analyzing every song on the web to extract key, tempo, rhythm and timbre and other attributes — understanding every song in the same way a musician would describe it

Leaving aside the ambitious claims about NLP processing made elsewhere on that page, I find it curious that there is a uniform method for describing music.

Or perhaps they mean that the “Musical Brain” uses only one description uniformly across the music it evaluates. I can buy that. And it could well be a useful exercise.

At least from the perspective of generating raw data that could then be mapped to other nomenclatures used by musicians.

I wonder if the Rolling Stone uses the same nomenclature as the “Musical Brain?” Will have to check.

Suggestions for other music description languages? Mappings to the one used by the “Musical Brain?”

BTW, before I forget, the “Musical Brain” offers a free API (for non-commercial use) to its data.

Would appreciate hearing about your experiences with the API.

Finding What You Want

Sunday, October 17th, 2010

The Known World, a column/blog by David Alan Grier, appears both online and in Computer, a publication of the IEEE Computer Society. Finding What You Want appears in the September, 2010 issue of Computer.

Grier explores how Pandora augments our ability to explore the vastness of musical space. For years, music retrieval systems had static categories imposed upon them, and those work for some purposes, but they also impose requirements upon users for retrieval.

According to Grier, the “Great Napster Crisis of 1999-2001” resulted in a new field of music retrieval systems because current areas did not quite fit.

I find Grier’s analysis interesting because of his suggestion that the methods by which we find information of interest can shape what we consider as fitting our search criteria.

Perhaps, just perhaps, identifying subjects isn’t quite the cut-and-dried string matching exercise it is commonly taken to be. Music retrieval systems may be a fruitful area to look for clues as to how to improve more traditional information systems.


  1. Review Music Retrieval: A Tutorial and Review. (Somewhat dated, can you suggest a replacement?)
  2. Pick two or three techniques used for retrieval of music. How would you adapt those for texts?
  3. How would you test your adapted techniques against a text collection?