On Algorithmic Curation


The use of algorithms is, I think, one of the three big elements of content curation. Yesterday I happened across two interesting posts on the subject.

The first came from ‘Iphoneographer‘ Chris Smith, who (late last year) attempted to reverse engineer the algorithm that selects photos for the Instagram popular page, in order to determine the ‘rules’ for how photos get onto it. The goal of the algorithm, says Instagram, is to ‘surface the most recently interesting photos based on a variety of variables’, deliberately based on more than the simple number of likes, not least to enable new people with fewer followers to be discovered. The secret sauce, Smith concludes (albeit from a small sample of his own photos), is a minimum threshold in the number of likes, combined with a likes-to-followers ratio within a defined period after posting (he pegs this at better than 10% likes to followers, on average, over the first 40 minutes). Once a photo is on the popular page, he suggests, how long it stays there depends on the number of likes it attracts once there.
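To make Smith’s inferred rules concrete, here is a minimal sketch in Python. The 10% ratio and 40-minute window are his small-sample estimates rather than anything Instagram has confirmed, and the specific minimum-likes value is a placeholder of my own (he doesn’t publish one).

```python
from dataclasses import dataclass

# A sketch of the popular-page heuristic as Chris Smith infers it.
# All thresholds are estimates or placeholders, not confirmed by Instagram.
MIN_LIKES = 50          # hypothetical minimum-likes threshold
RATIO_THRESHOLD = 0.10  # likes-to-followers ratio Smith pegs at ~10%
WINDOW_MINUTES = 40     # the early window his data suggests matters

@dataclass
class Photo:
    likes_in_window: int  # likes received in the first WINDOW_MINUTES
    followers: int        # the poster's follower count

def qualifies_for_popular_page(photo: Photo) -> bool:
    """Return True if the photo clears both inferred hurdles."""
    if photo.followers == 0 or photo.likes_in_window < MIN_LIKES:
        return False
    # Ratio test: engagement relative to audience size, which is how
    # accounts with few followers can still qualify.
    return photo.likes_in_window / photo.followers >= RATIO_THRESHOLD

# Example: 120 likes in the first 40 minutes against 1,000 followers
print(qualifies_for_popular_page(Photo(likes_in_window=120, followers=1000)))  # True
```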

The second post was on the Netflix Tech Blog, which ‘opened the doors’ on their recommendation system and talked about the myriad data points around interest, consumption history, title popularity, context, novelty, diversity and ‘freshness’ that are designed to continuously optimise the member experience. Through a Facebook Connect integration, it is even beginning to incorporate signals from your friends. The recommendation engine is clearly critical: Netflix famously offered $1m a few years ago to anyone who could improve the accuracy of their existing system by 10%. Since there is a high correlation between maximising consumption of video content and both member satisfaction and subscription retention, the goal of their ranking system is to ‘find the best possible ordering of a set of items for a member, within a specific context, in real-time’ and give the highest scores to the titles a member is most likely to enjoy. They have now reached the point where 75% of what people watch comes from some sort of recommendation.
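As a rough illustration of what ‘the best possible ordering of a set of items for a member’ might look like in practice, here is a toy ranking sketch: score each candidate title as a weighted combination of signals and sort by that score. The signal names and weights below are assumptions for illustration, not Netflix’s actual model.

```python
# Toy ranking in the spirit of the Netflix Tech Blog's description:
# combine per-title signals into one score, then order titles by it.
# Signals and weights here are illustrative assumptions.

def score(title: dict, weights: dict) -> float:
    """Linear combination of ranking signals for one title."""
    return sum(w * title.get(name, 0.0) for name, w in weights.items())

def rank_titles(candidates: list, weights: dict) -> list:
    """Order a set of titles for a member, highest score first."""
    return sorted(candidates, key=lambda t: score(t, weights), reverse=True)

# Hypothetical signals: overall popularity, the member's predicted
# rating for the title, and a freshness boost for newer titles.
weights = {"popularity": 0.3, "predicted_rating": 0.5, "freshness": 0.2}
candidates = [
    {"id": "A", "popularity": 0.9, "predicted_rating": 0.4, "freshness": 0.1},
    {"id": "B", "popularity": 0.5, "predicted_rating": 0.9, "freshness": 0.6},
]
print([t["id"] for t in rank_titles(candidates, weights)])  # ['B', 'A']
```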

Surely it is only a matter of time before a significant proportion of media is perceptive, using machine learning of some kind to surface content that anticipates what we want. One of the comments on this Econsultancy post on the subject notes that “the great thing about data is that it does not need to ask questions, just observe behaviours and learn from them and apply these learnings in real-time”.
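That ‘observe, learn, apply in real-time’ loop can be as simple as an exponential moving average over observed behaviour. The sketch below shows the idea; the topics and the learning rate are assumptions purely for illustration.

```python
# A minimal sketch of the observe-learn-apply loop: nudge a stored
# interest score toward each new observed behaviour. Topic names and
# the learning rate are illustrative assumptions.

ALPHA = 0.1  # learning rate: how fast new behaviour outweighs old

def update_interest(profile: dict, topic: str, engaged: bool) -> None:
    """Move the interest score for a topic toward what was just observed."""
    observed = 1.0 if engaged else 0.0
    current = profile.get(topic, 0.5)  # start from a neutral prior
    profile[topic] = (1 - ALPHA) * current + ALPHA * observed

profile = {}
update_interest(profile, "photography", engaged=True)
update_interest(profile, "finance", engaged=False)
print(profile)  # roughly {'photography': 0.55, 'finance': 0.45}
```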

But of course, whilst improving all the time, such content algorithms are far from perfect. Since Netflix is often watched on a shared device, they have to optimise for a household whose members are likely to have different tastes. Similar problems occur with Amazon’s recommendation system when more than one person shops through the same browser. But the personal nature of mobile devices offers up a different level of potential (look at Zite). As more consumption shifts to these devices, and as algorithms get more sophisticated at seamlessly recognising individuals across multiple platforms, machine learning will inevitably come into its own.
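One way to picture the household problem: if per-member predicted-enjoyment scores existed, a shared account might simply average them, which favours broadly acceptable titles over anything polarising. Everything in this toy sketch (the members, the scores, the averaging strategy) is an assumption for illustration.

```python
# Toy sketch of scoring a title for a shared household account by
# averaging each member's (hypothetical) predicted enjoyment.

def household_score(member_scores: list) -> float:
    """Average the predicted enjoyment of everyone in the household."""
    return sum(member_scores) / len(member_scores)

# Two members with opposite tastes: averaging mutes a polarising title,
# which is one reason shared-device recommendations feel blunter than
# personal ones.
print(household_score([0.9, 0.2]))  # polarising title -> ~0.55
print(household_score([0.6, 0.6]))  # broadly liked title -> 0.6
```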


Original post: http://neilperkin.typepad.com/only_dead_fish/2012/04/on-algorithmic-curation.html