Marketing & Strategy Innovation

The Problem with Automated Sentiment Analysis

by Matt Rhodes on 31 May, 2010 - 18:38

Sentiment analysis is a complex beast. Even for humans. Consider this statement: “The hotel room is on the ground floor right by the reception”. Is that neutral, positive or negative? The answer is probably that it means different things to different people. If you want a high room with a view, away from the noise of reception, the review is negative. If you have mobility issues and need a room with easy access, it is positive.

And for many people it would just be information, and so neutral. Sentiment analysis is difficult even for human analysts in ambiguous or more complex situations. For social media monitoring tools it is also complicated, and not always as simple or as clear-cut as we might like or expect.

As part of our review of social media monitoring tools, we compared the automated sentiment analysis of seven of the leading tools (Alterian, Brandwatch, Biz360, Nielsen BuzzMetrics, Radian6, Scoutlabs and Sysomos) with the findings of a human analyst. The outcome suggests that automated sentiment analysis cannot be trusted to accurately reflect and report on the sentiment of conversations online.

Understanding where automated sentiment analysis fails

On aggregate, automated sentiment analysis looks good, with accuracy levels of between 70% and 80%, which compares very favourably with the levels of accuracy we would expect from a human analyst. However, this masks what is really going on. In our test case on the Starbucks brand, approximately 80% of all the comments we found were neutral in nature: mere statements of fact or information, expressing neither positivity nor negativity. This proportion is common to many of the brands and terms we have analysed, and we would typically expect the majority of online discussion to be neutral. These discussions are usually of less interest to a brand that wants to make a decision or take action on the basis of what is being said online. For brands, the positive and negative conversations matter most, and it is here that automated sentiment analysis really fails.
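To see how a high share of neutral comments can hide poor performance on the comments that matter, here is a minimal Python sketch built on an invented sample of 100 comments (not our Starbucks data). Each comment is a hypothetical (human label, tool label) pair, and the human analyst's label is treated as correct. The tool gets most neutral comments right, so its overall accuracy lands in the 70-80% range, yet it is right on only 30% of the positive and negative comments.

```python
def accuracy(pairs):
    """Share of (human_label, tool_label) pairs where the tool agrees with the human analyst."""
    if not pairs:
        return 0.0
    return sum(human == tool for human, tool in pairs) / len(pairs)


def polar_only(pairs):
    """Keep only the comments the human analyst judged positive or negative."""
    return [(human, tool) for human, tool in pairs if human != "neutral"]


# Hypothetical sample of 100 comments (illustrative counts, not the review's data).
# 80 neutral comments: the tool labels 70 correctly and calls 10 positive.
pairs = [("neutral", "neutral")] * 70 + [("neutral", "positive")] * 10
# 10 positive comments: 3 correct, 5 flipped to negative, 2 called neutral.
pairs += [("positive", "positive")] * 3 + [("positive", "negative")] * 5 + [("positive", "neutral")] * 2
# 10 negative comments: 3 correct, 5 flipped to positive, 2 called neutral.
pairs += [("negative", "negative")] * 3 + [("negative", "positive")] * 5 + [("negative", "neutral")] * 2

print(f"overall accuracy:           {accuracy(pairs):.0%}")              # 76%
print(f"positive/negative accuracy: {accuracy(polar_only(pairs)):.0%}")  # 30%
```

The same tool, on the same data, scores 76% or 30% depending only on whether the neutral comments are counted.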

No tool consistently distinguishes between positive and negative conversations

When you remove the neutral statements, automated tools typically analyse sentiment incorrectly. In our tests, compared with a human analyst, the tools were typically only about 30% accurate at deciding whether a statement was positive or negative. In one case accuracy was as low as 7%, and even the best tool was only 48% accurate. For any brand looking to use social media monitoring to help it interact with and respond to positive or negative comments, this is disastrous. More often than not, a positive comment will be classified as negative, or vice versa. No tool managed to classify all the positive statements correctly, and no tool got all the negative statements right either.

Why this failing matters to brands

This failing of automated sentiment analysis can cause real problems for brands, especially if they base any internal workflows or processes on their social media monitoring. For example, imagine that you send all your negative conversations to your Customer Care team to respond to where relevant. If two-thirds (or more) of the ‘negative’ conversations sent over are actually positive, this process starts to break down. Perhaps more importantly, many genuinely negative conversations will never reach the Customer Care team in the first place, having been incorrectly classified as positive. Unhappy customers don’t get routed to the right people and don’t get their problems dealt with: the complete opposite of why many of our clients want to use social media monitoring in the first place.
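The routing problem can be put into numbers with a similar sketch. It assumes a simple rule that forwards every comment the tool labels negative to Customer Care; the routing_report function and the sample counts below are hypothetical, chosen only for illustration. It reports how much of the team's queue is actually positive and what share of genuinely negative comments ever reach the team.

```python
from collections import Counter


def routing_report(pairs):
    """Summarise a hypothetical rule that forwards every tool-labelled 'negative' comment.

    pairs: list of (human_label, tool_label) tuples; the human label is treated as correct.
    """
    routed = [human for human, tool in pairs if tool == "negative"]  # true labels of the team's queue
    truly_negative = sum(1 for human, _ in pairs if human == "negative")
    received = Counter(routed)  # missing labels count as 0
    return {
        "routed": len(routed),
        "share_actually_positive": received["positive"] / len(routed) if routed else 0.0,
        "share_of_negatives_reaching_team": (
            received["negative"] / truly_negative if truly_negative else 0.0
        ),
    }


# Hypothetical positive and negative comments (illustrative only, not the review's data).
pairs = (
    [("positive", "negative")] * 6 + [("positive", "positive")] * 3 + [("positive", "neutral")] * 1
    + [("negative", "negative")] * 3 + [("negative", "positive")] * 5 + [("negative", "neutral")] * 2
)

report = routing_report(pairs)
print(report["routed"])                                      # 9
print(f"{report['share_actually_positive']:.0%}")            # 67%
print(f"{report['share_of_negatives_reaching_team']:.0%}")   # 30%
```

With these made-up counts, two-thirds of the team's queue is actually positive, and 70% of the unhappy customers never reach the team at all.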

So what can we do?

As with any test, our experiment with the Starbucks brand won’t necessarily reflect findings for every brand and term monitored online. Our test was for a relatively short time period and we only put a randomised, but relatively representative, sample of conversations through human analysis. However, even with these limitations, we were surprised by the very high level of inaccuracy shown by the social media monitoring tools investigated. For businesses looking to make decisions or perform actions on the basis of a conversation being positive or negative this is potentially quite dangerous.

Of course, much can be done here, and over time the tools can be trained to learn and to improve how they assess conversations about a given brand. But the overall message remains: automated sentiment analysis fails in its role of helping brands to make real decisions and to react to conversations about them online.

Read the other posts from our social media monitoring review 2010.

Original Post: http://www.freshnetworks.com/blog/2010/05/the-problem-with-automated-sentiment-analysis/


3 comments

Brad Einarsen says:

15 Nov 2010, 16:46

This is a great post with significant findings.

My use of tools for clients has led me to also believe that sentiment analysis isn't just wrong, it's nearly reversed (so many "positive" comments are negative because of sarcasm, etc.). The most troubling finding was that the advertised accuracy rests on the 70-80% of comments that are neutral, which seems fairly misleading to me.

Thanks for adding to the discussion.

Frank Strong says:

15 Nov 2010, 16:48

You make a great point in your lead: "Sentiment analysis is a complex beast. Even for humans." Neither technology nor people always interpret sarcasm or humor well in the written word. My own experience with NLP shows it's accurate about 80% of the time, and for the cases where it isn't, a tool that enables manual or batch edits is a good remedy.

Jan Van den Bergh says:

02 Jun 2010, 06:28

There are sites like Holaba.com.cn (in China and only in Chinese) that ask reviewers to first give a score on the "One Question" and then write a review if they want. That way the reviewers themselves indicate whether their comment is positive, negative or neutral. The neutrals do not influence the recommendation power of the brand.
On the other hand, measuring online comments captures only 10-15% of recommendations. The vast majority of recommendations are made face to face: at the office, among friends, in bars and restaurants, and at family gatherings. They are immeasurable.

This blog reflects the personal opinions of individual contributors and does not represent the views of Futurelab, Futurelab's clients, or the contributors' respective employers or clients.