The Problem with Automated Sentiment Analysis

Sentiment analysis is a complex beast, even for humans. Consider this statement: “The hotel room is on the ground floor right by the reception”. Is that neutral, positive or negative? The answer is probably that it is different things to different people. If you want a high room with a view, away from the noise of the reception, the review is negative. If you have mobility issues and need a room with easy access, it is positive.

And for many people it would just be information, and so neutral. Sentiment analysis is difficult even for human analysts in ambiguous or more complex situations. For social media monitoring tools it is just as complicated, and not always as simple or as clear-cut as we might like or expect.

As part of our review of social media monitoring tools we compared their automated sentiment analysis with the findings of a human analyst, looking at seven of the leading social media monitoring tools: Alterian, Brandwatch, Biz360, Nielsen Buzzmetrics, Radian6, Scoutlabs and Sysomos. The outcome suggests that automated sentiment analysis cannot be trusted to accurately reflect and report on the sentiment of conversations online.

Understanding where automated sentiment analysis fails

On aggregate, automated sentiment analysis looks good, with accuracy levels of between 70% and 80%, which compares very favourably with the levels of accuracy we would expect from a human analyst. However, this masks what is really going on. In our test case on the Starbucks brand, approximately 80% of all comments we found were neutral in nature. They were mere statements of fact or information, not expressing either positivity or negativity. This proportion is common to many brands and terms we have analysed: we would typically expect the majority of discussions online to be neutral. These discussions are typically of less interest to a brand that wants to make a decision or perform an action on the basis of what is being said online. For brands, the positive and negative conversations are of most importance, and it is here that automated sentiment analysis really fails.
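To see how a large neutral majority can flatter the headline figure, here is a minimal Python sketch. The counts are hypothetical and chosen only to illustrate the effect; they are not the data from our Starbucks test, and the `accuracy` helper is our own illustration rather than anything provided by the tools reviewed.

```python
def accuracy(pairs):
    """Fraction of (human_label, tool_label) pairs where the two labels agree."""
    pairs = list(pairs)
    return sum(human == tool for human, tool in pairs) / len(pairs)

# Hypothetical sample of 1,000 comments (NOT our test data): 80% neutral.
# Each pair is (label from the human analyst, label from the tool).
sample = (
    [("neutral", "neutral")] * 720
    + [("neutral", "positive")] * 40
    + [("neutral", "negative")] * 40
    + [("positive", "positive")] * 30
    + [("positive", "negative")] * 40
    + [("positive", "neutral")] * 30
    + [("negative", "negative")] * 30
    + [("negative", "positive")] * 40
    + [("negative", "neutral")] * 30
)

# Headline accuracy across every comment.
overall = accuracy(sample)

# Accuracy on only the comments the human judged positive or negative.
non_neutral = accuracy((h, t) for h, t in sample if h != "neutral")

print(f"Overall accuracy:     {overall:.0%}")
print(f"Non-neutral accuracy: {non_neutral:.0%}")
```

With these made-up counts the overall figure is 78%, yet only 30% of the positive and negative comments are classified correctly. The neutral comments dominate the total, so the headline number says very little about the conversations a brand most needs to act on.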

No tool consistently distinguishes between positive and negative conversations

When you remove the neutral statements, automated tools typically analyse sentiment incorrectly. In our tests when comparing with a human analyst, the tools were typically about 30% accurate at deciding if a statement was positive or negative. In one case the accuracy was as low as 7% and the best tool was still only 48% accurate when compared to a human. For any brand looking to use social media monitoring to help them interact with and respond to positive or negative comments this is disastrous. More often than not, a positive comment will be classified as negative or vice-versa. In fact no tool managed to get all the positive statements correctly classified. And no tool got all the negative statements right either.

Why this failing matters to brands

This failing of automated sentiment analysis can cause real problems for brands, especially if they base any internal workflow or process on their social media monitoring. For example, imagine that you send all your negative conversations to your Customer Care team to respond to where relevant. If two-thirds (or maybe more) of the ‘negative’ conversations sent over are actually positive then this process starts to break down. Perhaps more importantly, a lot of the negative conversations will never make it to the Customer Care team in the first place (having been incorrectly classified as positive). Unhappy customers don’t get routed to the right people and don’t get their problems dealt with. This is the complete opposite of why many of our clients want to use social media monitoring in the first place.
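The Customer Care scenario can be put into numbers with a short Python sketch. Again, the counts are hypothetical and are not taken from our test; `negative_queue_stats` is an illustrative helper of our own. It measures two things: what share of the conversations routed as ‘negative’ really are negative, and what share of the truly negative conversations reach the team at all.

```python
def negative_queue_stats(pairs):
    """Return (precision, recall) for the 'negative' queue.

    pairs: iterable of (human_label, tool_label).
    precision: share of tool-flagged negatives that the human also called negative.
    recall: share of human-labelled negatives that the tool flagged as negative.
    """
    pairs = list(pairs)
    routed = [human for human, tool in pairs if tool == "negative"]
    truly_negative = [tool for human, tool in pairs if human == "negative"]
    precision = routed.count("negative") / len(routed)
    recall = truly_negative.count("negative") / len(truly_negative)
    return precision, recall

# Hypothetical non-neutral conversations (NOT our test data).
conversations = (
    [("negative", "negative")] * 30   # unhappy customers correctly routed
    + [("negative", "positive")] * 70 # unhappy customers who are missed
    + [("positive", "negative")] * 60 # happy customers sent to Customer Care
    + [("positive", "positive")] * 40
)

precision, recall = negative_queue_stats(conversations)
print(f"Routed conversations that are really negative: {precision:.0%}")
print(f"Negative conversations that reach the team:    {recall:.0%}")
```

In this made-up case two-thirds of what lands with Customer Care is actually positive, and 70% of the unhappy customers never reach the team. Checking both figures against a human-labelled sample is one way to judge whether a tool's output is safe to build a workflow on.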

So what can we do?

As with any test, our experiment with the Starbucks brand won’t necessarily reflect findings for every brand and term monitored online. Our test was for a relatively short time period and we only put a randomised, but relatively representative, sample of conversations through human analysis. However, even with these limitations, we were surprised by the very high level of inaccuracy shown by the social media monitoring tools investigated. For businesses looking to make decisions or perform actions on the basis of a conversation being positive or negative this is potentially quite dangerous.

Of course there is much that can be done here, and over time the tools can be trained to learn and to improve how they assess conversations about a given brand. But the overall message remains: automated sentiment analysis fails in its role of helping brands to make real decisions and to react to conversations about them online.

Read the other posts from our social media monitoring review 2010.

Original Post: http://www.freshnetworks.com/blog/2010/05/the-problem-with-automated-sentiment-analysis/

data analysis, Matt Rhodes, social media, social media monitorization
May 31, 2010

Matt Rhodes

Occupation: Head of Client Services
Organisation: FreshNetworks

Profile

Entrepreneurial consultant and social media expert, building an online communities business as part of the award-winning UK research firm, FreshMinds
Specialises in sustainable customer engagement, using social networking and online communities to bring customers and stakeholders closer to the brand
Building models for word of mouth and advocacy measurement
Background in market research and strategy consulting – working across the public and private sectors
Regular conference speaker on insight, customer engagement and social media
Linguist and travel-obsessive (currently learning Japanese…though please don’t test me just yet!)

Personal credo: Online communities, insight, innovation, word of mouth, social networks, customer engagement, research 2.0

Blog(s)

FreshNetworks Blog

About this blog

This blog reflects the personal opinions of individual contributors and does not represent the views of Futurelab, Futurelab’s clients, or the contributors’ respective employers or clients.
