The Biases of Links

by: danah boyd

I have a hard time respecting anyone who believes that science or technology is neutral. Unfortunately, even when people consciously know that they are not, they give credence to the biased outputs without questioning the underlying assumptions. This is why i'm an academic – nothing gives me greater joy than to think about what biases go into the creation of a particular system.

After reminding folks at Blogher that there are gender differences in networking habits, i decided to do some investigation into the network structures of blogs. Kevin Marks of Technorati kindly gave me a random sample of 500 blogs to play with. I began coding them based on gender (which is surprisingly easy to do given the amount of personal information people put about themselves) and looking for patterns in links and blogrolls.

I decided to do the same for non-group blogs in the Technorati Top 100. I hadn't looked at the Top 100 in a while and was floored to realize that most of those blogs are group blogs and/or professional blogs (with "editors" and clear financial backing). Most are covered in advertisements and other things meant to make them money. It's very clear that their creators have worked hard to reach many eyes (for fame, power or money?).

Here are some of the patterns that i saw*:

Blogrolls:

  • All MSNSpaces users have a list of "Updated Spaces" that looks like a blogroll. It's not. It's a random list of 10 blogs on MSNSpaces that have been recently updated. As a result, without special code (like in Technorati), search engines see MSNSpaces bloggers as connecting to lots of other blogs. This creates the impression of high network density among MSNSpaces blogs, which is inaccurate.
  • Few LiveJournals have a blogroll but almost all have a list of friends one click away. This is not considered by search tools that look only at the front page.
  • Bloggers who use hosting services tend to link only to others on the same hosting service (from the blogrolls on Xanga and Rakuten to the friend links on LJ). The blogroll structure on these is often set up to only accept lists of blogs from that service.
  • Blogrolls seem to be very common on politically-oriented blogs and always connect to blogs with similar political views (or to mainstream media).
  • Blogrolls by group blogging companies (like Weblogs, Inc.) always link to other blogs in the domain, using collective link power to help all.
  • A fraction of the Top 100 have blogrolls of blogs. Some have blogrolls that are a link away (like Crooked Timber). Quite a few use that space to advertise or link to mainstream media or companies.
  • Male bloggers who write about technology (particularly social software) seem to be the most likely to keep blogrolls. Their blogrolls tend to be predominantly male, even when few of the blogs they link to are about technology. I haven't found one with >25% female bloggers (and most seem to be closer to 10%).
  • On LJ (even though it doesn't count) and Xanga, there's a gender division in blogrolls whereby female bloggers have mostly female "friends" and vice versa.
  • I was also fascinated that most of the mommy bloggers that i met at Blogher link to Dooce (in Top 100) but Dooce links to no one. This seems to be true of a lot of topical sites – there's a consensus on who is in the "top" and everyone links to them but they link to no one.
  • I also get the impression that blogrolls are not frequently updated (although i have to imagine that the blogs one reads are). I wonder how static blogrolls are.
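
The MSNSpaces point above can be made concrete with a small sketch. It assumes a crawler that cannot tell authored blogroll links from auto-generated "Updated Spaces" links, and it compares directed network density with and without the generated links. The blog names and link lists are invented for illustration, not measured data, and `density` is a hypothetical helper rather than anything Technorati is known to compute.

```python
def density(nodes, edges):
    """Directed density: distinct links present / links possible (no self-links)."""
    n = len(nodes)
    possible = n * (n - 1)
    # Keep only distinct links between known blogs, ignoring self-links.
    real = {(a, b) for a, b in edges if a != b and a in nodes and b in nodes}
    return len(real) / possible if possible else 0.0


# Hypothetical toy data: four blogs hosted on the same service.
blogs = {"a", "b", "c", "d"}

# Links the bloggers actually chose to make.
authored = [("a", "b"), ("b", "a")]

# Links injected by the service's "recently updated" widget (assumed).
auto_generated = [
    ("a", "c"), ("a", "d"), ("b", "c"),
    ("c", "d"), ("d", "a"), ("c", "a"),
]

density_authored = density(blogs, authored)
density_crawled = density(blogs, authored + auto_generated)

print(f"authored links only: {density_authored:.2f}")
print(f"as a naive crawler sees it: {density_crawled:.2f}")
```

In this toy case the generated links make the network look several times denser than the links people actually chose to make, which is the kind of distortion described in the first bullet.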

Linking patterns:

  • The Top 100 tend to link to mainstream media, companies or websites (like Wikipedia, IMDB) more than to other blogs (Boing Boing is an exception).
  • Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends' list). It looks like there's a gender split in tool use; Mena said that LJ is like 75% female, while Typepad and Movable Type have far fewer women.
  • Bloggers often talk about other people without linking to their blog (as though the audience would know the blog based on the person). For example, a blogger might talk about Halley Suitt's presence or comments at Blogher but never link to her. This is much rarer in the Top 100 who tend to link to people when they reference them.
  • Content type is correlated with link structure (personal blogs contain few links, politics blogs contain lots of links). There's a gender split in content type.
  • When bloggers link to another blog, it is more likely to be same gender.
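
For anyone who wants to compute the same-gender pattern in the last bullet rather than eyeball it, here is a minimal sketch. It assumes each blog has been hand-coded with a gender label and that links are available as (source, target) pairs. The labels and links below are made up for illustration, and `same_gender_share` is a hypothetical helper name.

```python
def same_gender_share(gender, links):
    """Share of links whose source and target blogs carry the same gender code.

    Links touching an uncoded blog are skipped. Returns None if nothing is left.
    """
    coded = [(s, t) for s, t in links if s in gender and t in gender]
    if not coded:
        return None
    same = sum(1 for s, t in coded if gender[s] == gender[t])
    return same / len(coded)


# Hypothetical hand-coded sample ("f" / "m" are the coder's labels).
gender = {"a": "f", "b": "f", "c": "m", "d": "m"}

# Hypothetical links; "x" was never coded, so ("d", "x") is ignored.
links = [("a", "b"), ("b", "a"), ("c", "d"), ("a", "c"), ("d", "x")]

share = same_gender_share(gender, links)
print(share)
```

A raw share like this is only a starting point. It would need to be compared against what random linking would produce given the gender mix of the sample, and broken out by content type and service, before it could count as a finding.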

I began this investigation curious about gender differences. There are a few things that we know about social networks. First, our social networks are frequently split by gender (from childhood on). Second, men tend to have large numbers of weak ties and women tend to have fewer, but stronger ties. This means that in traditional social networks, men tend to know far more people but not nearly as intimately as women know the people in their networks. (This is a huge advantage for men in professional spheres but tends to wreak havoc when social support becomes more necessary, and it is often linked to depression later in life.)

While blog linking tends to be gender-dependent, the number of links seems to be primarily correlated with content type and service. Of course, since content type and service are correlated with gender, gender is likely a secondary effect.

Interestingly, there are distinct clusters of norms wrt linking in blogging, not one coherent and consistent set. The search engines (and the Technorati 100 and PubSub's Daily 100 Top Links) are validating one of those clusters, regardless of whether or not that is what searchers are looking for. The Top 100 is a list of blogs that either fit into those norms or have adopted those norms in their patterns (most commonly the companies).

I also want to point out a few other issues in link biases that are relevant here:

  • All links are created equal. Not all relationships are. Treating everything like a consistent weak tie is quantity over quality, and in social networks, that means male over female.
  • When the data being measured has inconsistent structure rules, any ranking metric is inherently flawed. In blogs, there's no consistency for what a link means, no consistent social norms for blogrolls, no agreed-upon linking norms. Metrics inherently squish out this nuance and force all of the square pegs into the round holes.
  • Links indicate no weight, no valence, no attributes. I know Technorati has asked folks to indicate positive/negative in their links or to use nofollow, but few do this. And even if people did, that kind of articulation is a social disaster (::cough:: think Friendster).
  • Traditionally, there is power in keeping your black book shut; one's position in a network can be quite powerful. You get kudos by helping two unconnected people. You can limit information flow and acquire credit when you take something from one group to another. (This is the basis for some interesting work on creativity – creativity is when bridges connect information from disparate worlds.) While some think that transparency is good, some hide their network to maintain power. For example, if as a blogger, you provide "cool links," you want others to read you, not the collection of people you read. Of course, a reasonable counter argument is that this person is no longer needed as a bridge, but as a curator. Still, some people hide so that they must be asked for recommendations directly and thus can control who they send people to. (Note: this is a particular kind of power move; transparency can also be a power move through gifting.)
  • There are social consequences to linking structures and those who have a lot of eyes on them are probably more aware of the consequences of their linking habits. This is another reason why people with a lot of eyes may get rid of blogrolls. Having to negotiate lots of requests for links can be a real turn-off.
  • People will try to manipulate any ranking if there is an advantage to being up top. Static measurement algorithms cause harm to the entire community that is being measured. Web search engines know this, but it's equally critical for blog search.
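
To illustrate how much a ranking depends on what a link is taken to mean, here is a rough sketch that ranks the same toy link data under two counting rules: one that treats every link as equal, and one that ignores blogroll links. The blogs, links, link kinds and weights are all invented for the example, and `rank_by_inbound` is a hypothetical stand-in, not the algorithm any real blog search service uses.

```python
from collections import Counter


def rank_by_inbound(links, weights):
    """Rank targets by weighted inbound links; ties break alphabetically."""
    scores = Counter()
    for source, target, kind in links:
        # Link kinds missing from `weights` contribute nothing.
        scores[target] += weights.get(kind, 0.0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical links: (source, target, kind of link).
links = [
    ("a", "x", "blogroll"), ("b", "x", "blogroll"), ("c", "x", "blogroll"),
    ("a", "y", "post"), ("b", "y", "post"),
    ("c", "z", "post"),
]

# Rule 1: every link counts the same.
all_equal = rank_by_inbound(links, {"blogroll": 1.0, "post": 1.0})

# Rule 2: only links made inside posts count.
posts_only = rank_by_inbound(links, {"blogroll": 0.0, "post": 1.0})

print("all links equal:", [name for name, _ in all_equal])
print("post links only:", [name for name, _ in posts_only])
```

Neither ordering is the neutral one. Each encodes a decision about which linking norm counts, which is the point the bullets above are making.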

These services are definitely measuring something but what they're measuring is what their algorithms are designed to do, not necessarily influence or prestige or anything else. They're very effectively measuring the available link structure. The difficulty is that there is nothing consistent whatsoever with that link structure. There are disparate norms, varied uses of links and linking artifacts controlled by external sources (like the hosting company). There is power in defining the norms, but one should question whether companies or collectives should define them. By squishing everyone into the same rule set so that something can be measured, the people behind an algorithm are exerting authority and power, not of the collective, but of their biased view of what should be. This is inherently why there's nothing neutral about an algorithm.

While i've been looking into the linking patterns, Mary Hodder has been thinking through new metrics for measurement. These are very important but not because one is better than the other. In fact, if we all switched to any of her metrics, we'd have just as many biases as we have now. And many of the Top blogs would try to figure out how to get rank in that system. The significance lies in the ability to offer choice.

Of course, choice is difficult. Lots of people want to know what the "best" one is and don't want to think about the metrics behind it (yes, these are the "neutral" people). Unfortunately, many of those types have a lot of power, which motivates people to want their attention. The press want a list of the best and many bloggers want the attention of the press and thus want to be listed among the best. Breaking this cycle is virtually impossible, but it is how power maintains power. And in our current system, we are doing a damn fine job of replicating the power structures that pervade everyday life under the auspices of creating a new system that usurps power. Ah, what fun.

Still, i think it's critical to work on new metrics so that we can at least start showing alternate ways of organizing information if for no other reason than to push back against the conception of neutrality. And thus, i'm stoked to help Mary out and i would encourage everyone else interested in altering the power structure to do so as well.

At the least, i do think we need to really think about what is at stake and what we're inadvertently supporting through our current systems. Are these the power structures that we want to maintain? Because there's nothing neutral about our technological choices.

* Note: these are patterns, not findings. The methodology used here is not solid enough for findings. I am not offering quantitative data because i want it to be clear that these are trends based on tracking patterns. Think of them as guesstimated hypotheses (and i'd be ecstatic if someone would compute them).

Updated: Related Links

Note: i don't agree with the points of all of the related posts but i do think they're important to consider and i want to respond more broadly when i can. In the meantime, i figured that those interested in this post should know about them.

Original Post: http://www.zephoria.org/thoughts/archives/2005/08/07/the_biases_of_links.html