They Said What?
06 Aug 2015 | by James Alty
5 Ways to Squeeze Information out of Tweets, Comments and Reviews
Through posted comments, tweets and reviews, your customers can provide you with a lot of useful feedback. It has become essential to rapidly glean information from textual data in order to respond to emerging issues and trending topics.
In this short post I’ll summarise different approaches to analysing text. The follow-up article “Words Worth – Extracting Meaning and Sentiment from Textual Data” shows how to apply these techniques using FastStats.
1. Selection
Selection uses a query to count or extract all the text that includes a certain word, a set of words, or a wildcard pattern.
For example, an airline might wish to count, prioritise and respond to any tweets that mention “delay” or “delayed”.
My text is from movie reviews but you get the idea. Of course this is most useful when you know the words that you are looking for.
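Outside FastStats, the same idea can be sketched in a few lines of Python. This is only an illustration, not FastStats query syntax: the sample tweets and the `select` helper are invented for the example, and a regular expression stands in for the wildcard pattern.

```python
import re

# Hypothetical sample tweets, invented for illustration.
tweets = [
    "Flight delayed again, two hours and counting",
    "Lovely crew and a smooth landing",
    "Is there a delay on the 9am to Madrid?",
    "No delays today, great service",
]

# Wildcard-style pattern: "delay" followed by any word characters,
# so it matches delay, delays and delayed, in any letter case.
pattern = re.compile(r"\bdelay\w*", re.IGNORECASE)


def select(texts, pattern):
    """Return the texts that contain at least one match for the pattern."""
    return [t for t in texts if pattern.search(t)]


matches = select(tweets, pattern)
print(len(matches), "of", len(tweets), "tweets mention a delay")
for tweet in matches:
    print("-", tweet)
```

Note that the last tweet is selected even though it says there were no delays, which is a reminder that selection finds words, not meaning.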
2. Word Frequencies
When there is a large volume of text to consider, the first task is to find which words occur most frequently, in order to get an idea of the important topics. This involves “shredding” the text into words, calculating the word frequencies and ranking them. The results can be displayed in a table or a word cloud, which uses font size to reflect the relative frequencies.
This can work well when comparing reviews of two products or trying to distil the essence of a large number of reviews. The two word clouds above are for horror films and comedies. In practice you need to be able to exclude very common words and also those that are generic in your topic area e.g. “hotel” for a holiday company. To analyse hashtags you obviously need to filter to only include words prefixed by #.
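The shredding, counting and ranking steps might look something like this in Python. It is a rough sketch under stated assumptions: the reviews, the stop-word list and the `word_frequencies` helper are all made up for illustration, and a real stop-word list would be much longer.

```python
import re
from collections import Counter

# Hypothetical reviews and word lists, invented for illustration.
reviews = [
    "The hotel pool was great and the staff were great",
    "Great location but the hotel room was noisy",
    "Noisy room, friendly staff",
]
STOP_WORDS = {"the", "and", "was", "were", "but", "a"}
DOMAIN_WORDS = {"hotel"}  # generic in this topic area, so excluded


def word_frequencies(texts, exclude=frozenset(), hashtags_only=False):
    """Shred texts into lower-case words and count each one."""
    # Keep only "#..." tokens when analysing hashtags.
    token = r"#\w+" if hashtags_only else r"[a-z']+"
    counts = Counter()
    for text in texts:
        for word in re.findall(token, text.lower()):
            if word not in exclude:
                counts[word] += 1
    return counts


freq = word_frequencies(reviews, exclude=STOP_WORDS | DOMAIN_WORDS)

# Rank by frequency; these counts could feed a table or a word cloud.
for word, n in freq.most_common(5):
    print(word, n)
```

Passing `hashtags_only=True` restricts the count to words prefixed by #.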
3. Scoring Words
When we read a review we can use our knowledge of language, the subject area and the audience to quickly judge whether it expresses a good, bad or neutral opinion. Faced with large quantities of reviews it would be useful to automate this process. One approach is to train a model to recognise the keywords associated with known good or bad reviews then use this model to “score” new reviews.
The keywords and coefficients define the model. For example, using our knowledge of language and some gut feeling, we could assign the following values to keywords:
great = 8
unique = 8
funny = 6
laugh = 6
overrated = -5
bad = -8
boring = -10
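A hand-built model like this is easy to try out in code. The sketch below uses the values listed above; the `score` function and the sample review are my own illustration. Matching on the start of each word is a crude stand-in for proper stemming (it lets “laughs” count as “laugh”, but would also let “badge” count as “bad”).

```python
import re

# Keyword values taken from the hand-assigned list above.
WEIGHTS = {
    "great": 8,
    "unique": 8,
    "funny": 6,
    "laugh": 6,
    "overrated": -5,
    "bad": -8,
    "boring": -10,
}


def score(review, weights=WEIGHTS):
    """Sum the weight of every word that starts with a keyword."""
    total = 0
    for word in re.findall(r"[a-z]+", review.lower()):
        for keyword, weight in weights.items():
            if word.startswith(keyword):
                total += weight
                break  # count each word at most once
    return total


# Hypothetical review, invented for illustration.
print(score("a funny but overrated film"))  # 6 - 5 = 1
```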
So with this simple model, and treating “uniquely” and “laughs” as matches for the keywords “unique” and “laugh”, a review that reads:
"great film with a uniquely funny twist" scores 8 + 8 + 6 = 22
while
"overrated - a few laughs but so boring" scores -5 + 6 - 10 = -9
The model is improved by basing the coefficients on the odds of the keyword appearing in good vs. bad reviews. Using this technique FastStats can calculate the coefficient estimates from a training set of reviews.
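I don't describe the exact FastStats calculation here, so the following is only one plausible way to turn odds into coefficients: a smoothed log odds ratio. The training reviews, the function names and the smoothing constant of 0.5 are all assumptions made for the sake of the example.

```python
import math
import re

# Hypothetical labelled training reviews, invented for illustration.
good_reviews = ["great fun", "funny and great", "a great laugh"]
bad_reviews = ["boring and bad", "so boring", "great cast but boring"]


def doc_count(keyword, reviews):
    """Number of reviews that contain the keyword as a whole word."""
    return sum(
        1 for r in reviews if keyword in re.findall(r"[a-z]+", r.lower())
    )


def log_odds_coefficient(keyword, good, bad, smoothing=0.5):
    """Smoothed log of (odds in good reviews / odds in bad reviews)."""
    g = doc_count(keyword, good)
    b = doc_count(keyword, bad)
    # Smoothing keeps the odds finite when a keyword appears in
    # every review of one class or in none of them.
    odds_good = (g + smoothing) / (len(good) - g + smoothing)
    odds_bad = (b + smoothing) / (len(bad) - b + smoothing)
    return math.log(odds_good / odds_bad)


for kw in ("great", "boring", "and"):
    coef = log_odds_coefficient(kw, good_reviews, bad_reviews)
    print(kw, round(coef, 2))
```

A keyword that is more likely to appear in good reviews gets a positive coefficient, one more likely to appear in bad reviews gets a negative coefficient, and one that appears equally often in both gets zero.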
4. Modelling Language
The problem with scoring individual words is that we are missing all the subtlety of language. Scoring “bad” as -8, for example, might make sense overall but there is a big difference between “exceptionally bad”, “plain bad” and “really not bad at all”.
The model can therefore be improved by recognising “qualifier” words such as “very”, “rather”, and “not” which amplify, reduce or even negate the scoring impact of the keyword.
The FastStats text model allows for a pre-scored table of qualifier words. Of course, we can’t hope to model all the complexity of language and will certainly be defeated by sarcasm. However, within a limited subject area where reasonably simple language is the norm, it is possible to get a reasonable level of automation.
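To make the idea of a qualifier table concrete, here is a minimal sketch in Python. It is not how FastStats implements it: the multiplier values, the small keyword table and the rule that qualifiers only affect the keyword immediately following them are all assumptions chosen for illustration.

```python
import re

# Hypothetical keyword scores and qualifier multipliers.
KEYWORDS = {"great": 8, "bad": -8, "boring": -10}
QUALIFIERS = {
    "exceptionally": 2.0,  # amplifies
    "very": 1.5,
    "really": 1.5,
    "rather": 0.5,         # reduces
    "not": -0.5,           # negates and softens
}


def score_with_qualifiers(review):
    """Score keywords, scaled by the qualifiers directly before them."""
    total = 0.0
    multiplier = 1.0
    for word in re.findall(r"[a-z]+", review.lower()):
        if word in QUALIFIERS:
            # Consecutive qualifiers combine, e.g. "really not".
            multiplier *= QUALIFIERS[word]
        else:
            if word in KEYWORDS:
                total += multiplier * KEYWORDS[word]
            # Any other word ends the run of qualifiers.
            multiplier = 1.0
    return total


for text in ("exceptionally bad", "plain bad", "really not bad at all"):
    print(text, "->", score_with_qualifiers(text))
```

With these made-up values “exceptionally bad” scores -16, “plain bad” scores -8 and “really not bad at all” comes out mildly positive at 6.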
5. Subject Sentiment
A similar word scoring strategy can be used when you are interested in the reviewer’s sentiment around a particular subject. For example, if your company produces bicycles and your latest model has changed from an aluminium to a steel frame, you might want to know whether the adjectives and qualifier words near “frame” express positive or negative sentiment.
If you detect a lot of negative opinion then you might try to counter that with your own expert reviews and news releases.
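One simple way to approximate “near” is a fixed window of words on either side of the subject. The sketch below assumes a window of three words; the sentiment values, the sample review and the `subject_sentiment` helper are invented for illustration.

```python
import re

# Hypothetical sentiment scores, invented for illustration.
SENTIMENT = {
    "great": 8,
    "sturdy": 5,
    "light": 4,
    "harsh": -4,
    "heavy": -5,
}


def subject_sentiment(review, subject, window=3):
    """Sum the sentiment of words within `window` words of the subject."""
    words = re.findall(r"[a-z]+", review.lower())
    total = 0
    for i, word in enumerate(words):
        if word == subject:
            start = max(0, i - window)
            # Words just before and just after the subject word.
            nearby = words[start:i] + words[i + 1:i + 1 + window]
            total += sum(SENTIMENT.get(w, 0) for w in nearby)
    return total


review = "Great brakes, but the new steel frame feels heavy and harsh"
print(subject_sentiment(review, "frame"))
```

Here “great” refers to the brakes and falls outside the window, so only “heavy” contributes and the sentiment around “frame” is negative even though the review contains a strongly positive word.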
Takeaways:
- Extracting the most frequently used words in reviews can give you actionable insight into customer trends and sentiment.
- Through automated word scoring you can judge the sentiment behind customer reviews and respond accordingly.
- Word scoring around subject sentiment enables you to judge how well a new product or service has been received in the market.