Content-based Recommendations

by: David Jennings

I get a fair number of people approaching me to tell me that their music recommender system is the best because of [insert special secret sauce here]. Usually this doesn’t go much further: after all, the sauce is secret and can’t be shared; so I say I’ll be interested to keep in touch with their progress, and I bite my lip to resist repeating my sceptical view that any recommender system only has to be good enough to keep people coming back for more recommendations.

In the case of Berlin-based mufin, launching in private beta today, the story is slightly different, as they sent me all their publicity release information, and Petar Djekic was willing to talk to me on the record, as it were. They even gave me an invite code to give away — that’s my disclosure out of the way!

mufin grew out of the Fraunhofer Institute, also the birthplace of the MP3 format, and the technically interesting part of what they’re doing builds on that strong research base in audio and acoustics. What mufin does is known as content-based filtering, rather than collaborative user-based filtering. In other words, rather than saying "people who like this artist/song also like this artist/song", it says "this song is similar in important ways to this song, so if you like one, you may like the other". Or, to put it another way, think more like Pandora than a collaborative-filtering service.

mufin’s secret sauce is that they analyse tracks automatically and extract 40 characteristics for each one. So whereas Pandora’s human-driven analysis is labour-intensive, mufin’s can quite easily scale up to enormous numbers of tracks, just by throwing more computing power at them. (Other services that use this automated acoustic fingerprinting as the basis for recommendations include the MusicIP Mixer and AMG’s Tapestry — see my interview with AMG’s Zac Johnson.) [Update, 8 October 2008: Zac corrects me, pointing out that while AMG’s LASSO technology does automated song recognition, it is not used for the recommendations in Tapestry, which are informed by human analysis more akin to Pandora’s.]
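To make the distinction concrete: content-based filtering of this kind boils down to describing each track as a vector of acoustic features and recommending nearest neighbours in that space. mufin’s actual analysis is proprietary; the sketch below just illustrates the general idea, with random numbers standing in for real acoustic measurements and cosine similarity as an assumed (not confirmed) distance measure.

```python
import math
import random

NUM_FEATURES = 40  # mufin reportedly extracts 40 characteristics per track


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def recommend(seed, catalogue, k=3):
    """Rank catalogue tracks by acoustic similarity to the seed track."""
    ranked = sorted(catalogue.items(),
                    key=lambda item: cosine_similarity(seed, item[1]),
                    reverse=True)
    return [title for title, _ in ranked[:k]]


# Toy catalogue: random stand-ins for the output of real audio analysis.
random.seed(0)
catalogue = {title: [random.random() for _ in range(NUM_FEATURES)]
             for title in ["Track A", "Track B", "Track C", "Track D"]}

print(recommend(catalogue["Track A"], catalogue))
```

Note that no listening data is involved anywhere: the recommendations come entirely from the signal, which is why such a system can score an obscure track the moment it is analysed, with no audience history at all.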

Pandora’s analysis is based on literally hundreds of attributes, which, according to Wikipedia, make very fine-grained distinctions between, for example, "Lyrics by a Famous Rap Artist", "Lyrics by a Rap Icon" and "Lyrics by a Respected Rap Artist". Petar Djekic explained that mufin’s characteristics are determined from mathematical, rather than critical/aesthetic, analysis — so while some of the 40 dimensions map clearly onto culturally established terms like vocals, harmony, and tempo, some are ‘purely statistical’, which means there is no simple way of naming them without inventing new terms.
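How can a dimension be ‘purely statistical’? One common way such unnamed axes arise (and this is only a generic illustration, not mufin’s method) is principal component analysis: each derived dimension is a weighted blend of many raw measurements, and the blend rarely corresponds to a single musical concept you could name.

```python
import numpy as np

# Toy data: 6 tracks x 5 raw audio measurements (entirely made up).
rng = np.random.default_rng(0)
raw = rng.random((6, 5))

# Centre the data and take eigenvectors of the covariance matrix (PCA).
centred = raw - raw.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# The top components are the "purely statistical" dimensions: each is a
# weighted mixture of all the raw measurements, with no ready-made name.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]
projected = centred @ components

print(projected.shape)  # (6, 2): each track described by two unnamed axes
```

A dimension like "0.6 x measurement 1 minus 0.3 x measurement 4 plus …" can be highly discriminating for similarity purposes while defying any label short of inventing new vocabulary, which matches Petar’s description.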

Petar argues that recommender systems that work only at artist level are often ineffective. If someone says they like David Bowie, what should you recommend to them? Something that sounds like Hunky Dory or Low? Tin Machine?! I tried different starting points for Neil Young recommendations. I started with Young’s Southern Man, a guitar-heavy song, which turned out to be a dead end: "Sorry, no tracks similar to Southern man available". So I switched to Young’s soft piano ballad, Philadelphia. Startlingly, the majority of the recommendations were piano-led classical lieder. But, yes, I could discern a similarity to Philadelphia, even if these pieces had little or no connection to the rest of Young’s oeuvre — and I might even have liked them if I’d had a chance to hear more. I tried again on the guitar angle with Rockin’ in the Free World. There must be some glitches in mufin’s metadata, because I was recommended a version of the same song attributed to The Alarm — but on listening to the 30-second sample, it was evidently Neil Young himself.

Yes, 30-second samples. Petar told me that the team is sticking to the competences they know best, and not entering the music licensing minefield at the moment. Which is understandable, and possibly a wise move, but what does it mean for their service? As I’ve argued before, the user experience of hopping between 30-second samples of multiple tracks is not a satisfying one. You’d have to be a hardcore music forager to put up with it (even with my professional interest, I don’t think I’ve ever lasted more than ten minutes on such a service).

Therefore the most promising applications of mufin seem to me to be, first, licensing their recommender technology. Petar told me that their Software Development Kits work cross-platform (though the downloadable Mufin MusicFinder app is Windows-only for the foreseeable future), and licensing is an avenue they will be exploring actively. The second route is embedding the music recommender application in other sites: Petar talked me through a demo of Mufin embedded in a MySpace profile, and it was impressive how the full functionality was available while still retaining reasonable usability (in my short experience of it). According to the media release, users can also share new discoveries with friends in their social network.
