"This is the first time the world has seen this scale and quality of data about human communication" Cameron Marlow, Facebook

Social networks are, of course, giant data-gathering machines, and Facebook is the bucket-wheel excavator of data. I wonder if we're even coming close to imagining the potential of how it all might be applied.

This fascinating Technology Review piece by Tom Simonite at least hints at this potential in looking at the work of Facebook's Data Science Team, a team that is not short of raw material to mine. Facebook, he says, has 'the most extensive data set ever assembled on human social behaviour'. If it were a country, it might be the third largest in the world, but it would also 'far outstrip any regime past or present in how intimately it records the lives of its citizens'.

Facebook has not only the best mapping of human connections we've ever had (over 125 billion of them) and the (sometimes in-depth) profile data of almost a billion people, but also data from the daily interactions and social activity of a large proportion of those users (including what Grant McCracken once called 'exhaust data'): the (now more than) 30 billion pieces of content shared each month, the 300 million photos uploaded per day, the facial recognition, the location tagging, the historical timelines, the gaming, the app store. Then there are the 3.2 billion Likes and Comments generated by Facebook users every day, and the huge distributed platform formed by the millions of Like buttons served up daily around the web (including, potentially, the capability to track the browsing history of Facebook users even when they're not logged on). And the frictionless sharing apps that have neatly switched the sharing default from active to passive and dramatically increased the data potential in the process. When an Austrian law student asked Facebook to send him all the data it had stored about him, he was sent a CD with 1,222 PDF files on it.

So Facebook knows a lot. What I find most interesting, though, about studies that are conducted using Facebook data is their sheer scale. The University of Milan 'four degrees of separation' study, for example, which concluded that any two people on Facebook are, on average, separated by no more than 4.74 connections, was based on analysis of 69 billion friend connections among 721 million active Facebook users. Similarly, the "Gross National Happiness" behavioural model developed by the Facebook data team analysed positive and negative words in status updates from hundreds of millions of people to estimate the happiness of people on Facebook in a wide variety of different countries, with some fascinating results. And the Information Diversity in Networks work (which considered the role of strong and weak ties in the spread of information) used a large-scale field experiment involving over 250 million people.
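
To make the 'degrees of separation' idea a little more concrete, here is a minimal sketch of what an average separation figure means, worked out on a tiny made-up friendship graph. It uses a plain breadth-first search, which is my own simplification and would not scale to hundreds of millions of users; it is not the method the study used. The names `bfs_distances`, `average_separation` and `toy_friends` are all hypothetical, invented for this illustration.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return the number of hops from `start` to every reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_separation(graph):
    """Mean hop count over all connected pairs of distinct people."""
    total = 0
    pairs = 0
    for person in graph:
        for other, hops in bfs_distances(graph, person).items():
            if other != person:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy data (an assumption for illustration): four people in a chain.
toy_friends = {
    "ana": {"ben"},
    "ben": {"ana", "cat"},
    "cat": {"ben", "dev"},
    "dev": {"cat"},
}

print(round(average_separation(toy_friends), 2))
```

On this four-person chain the six pairs are 1, 1, 1, 2, 2 and 3 hops apart, so the average comes out at 10/6, or about 1.67. The 4.74 figure is the same kind of average, just taken over a vastly bigger graph.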

Clearly a big part of this is about developing new revenue-generating models for Facebook, so it's small wonder that the company is planning to double the headcount of the Data Science team over the next year. For me, though, the really exciting potential of this unprecedented scale of data, and of Facebook's capability to store, structure and interrogate it, lies not just in the development of new kinds of business and communications models, but in how it might help advance social science and what we know about human behaviour itself.

Original post: http://neilperkin.typepad.com/only_dead_fish/2012/06/what-facebook-knows.html