When talking about Big Data, social media data is one of the more interesting data sources. In this short two-part series I will show three approaches to Social Media Analytics based on available Microsoft BI technologies.
In this second part I will cover two different approaches to Sentiment Analysis, the automated assessment of the tone of a piece of content. First I will show how PowerBI supports a Self-Service approach to sentiment analysis based on Facebook user postings, and then I will sketch a scaled-out server solution based on Twitter postings.
Let’s start with PowerBI and the Self-Service approach. PowerBI is a new BI suite from Microsoft covering the whole BI process: gathering data from different sources, optionally transforming it, building In-Memory Cubes and displaying the data, e.g. in highly interactive dashboards.
To get data from Facebook you can use the PowerBI component PowerQuery which has a Facebook adapter:
For this example we chose one of the Facebook fan pages of E-Plus, a German telecommunications company. With PowerQuery you can find and filter the user postings. After this you need the sentiment, which can be provided via an external API. The API request can be wrapped in a PowerQuery Function:
The API returns the sentiment for a complete posting as a floating-point value ranging from -1 (negative) to 1 (positive). You could also dig deeper and show the key words that influence the score and so on, but for this demonstration we will keep it simple.
One of the nice features of PowerQuery is that you can add custom columns, and such a column can call the function for each posting:
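To make the idea of wrapping the API request in a function more concrete, here is a minimal sketch in Python rather than in PowerQuery. The article does not name the sentiment service, so the endpoint URL, the `text` request parameter and the `score` response field are all assumptions made for illustration; only the value range of -1 to 1 comes from the description above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: the article does not name the sentiment service.
SENTIMENT_URL = "https://sentiment.example.com/score"


def http_fetch(text):
    """POST the posting text to the (assumed) API and return the raw JSON body."""
    data = urllib.parse.urlencode({"text": text}).encode("utf-8")
    with urllib.request.urlopen(SENTIMENT_URL, data=data, timeout=10) as resp:
        return resp.read().decode("utf-8")


def get_sentiment(text, fetch=http_fetch):
    """Return the sentiment of one complete posting as a float in [-1, 1].

    `fetch` is injectable so the parsing logic can be tried without a network.
    """
    payload = json.loads(fetch(text))
    score = float(payload["score"])  # assumed name of the response field
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"sentiment score out of range: {score}")
    return score


def fake_fetch(text):
    """Stand-in for the real service: always answers with a fixed score."""
    return json.dumps({"score": -0.5})


demo_score = get_sentiment("The network is down again", fetch=fake_fetch)
```

Passing the transport in as a parameter keeps the function usable offline; against a real service you would call `get_sentiment(text)` and let it use `http_fetch`.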
After this you can build an In-Memory Cube with PowerPivot and analyze the data via Excel-Pivot or PowerView:
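The two steps just described, adding a sentiment column and then aggregating it in a pivot, can be sketched outside of Excel as well. The following Python example is only an illustration under stated assumptions: the column names `month` and `message`, the sample postings and the keyword-counting `toy_score` function are invented here, and `toy_score` is a stand-in for the external API, not a real sentiment model.

```python
from collections import defaultdict
from statistics import mean


def add_sentiment_column(rows, score_fn):
    """Return new rows with a 'sentiment' column computed from 'message'."""
    return [{**row, "sentiment": score_fn(row["message"])} for row in rows]


def average_by(rows, key):
    """Pivot-style aggregation: average sentiment per distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["sentiment"])
    return {value: mean(scores) for value, scores in groups.items()}


def toy_score(message):
    """Stand-in scorer (NOT a real sentiment model): counts a few keywords."""
    text = message.lower()
    pos = sum(word in text for word in ("great", "thanks", "good"))
    neg = sum(word in text for word in ("bad", "slow", "broken"))
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Made-up sample postings, for illustration only.
postings = [
    {"month": "Jan", "message": "Great service, thanks!"},
    {"month": "Jan", "message": "Network is slow and broken"},
    {"month": "Feb", "message": "Good offer"},
]

scored = add_sentiment_column(postings, toy_score)
monthly = average_by(scored, "month")
```

Here `scored` corresponds to the table with the custom column and `monthly` to a pivot with the month on rows and the average sentiment as the measure.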
The second scenario I would like to present is a scaled-out solution. Of course this needs servers and databases that reside on them, but if you make use of the Microsoft cloud offerings on Azure you can set up the following scenario within one day! The scenario goes as follows: you want to monitor public opinion on your products. Let’s take the Microsoft products Bing, Xbox, Skype and SharePoint and the brand name “Microsoft” as an example. With the Twitter Streaming API it is possible to get all the Twitter postings related to these key words as they are posted. These Twitter postings can then be enriched with sentiment via an external API (equivalent to the PowerBI scenario described above) and then be routed through the Microsoft product StreamInsight. This complex event processing tool can evaluate data on the fly without the need to store it first, and it feeds a real-time dashboard. Because the dashboard is based on HTML5 and WebSockets, it updates in real time without the page having to be refreshed.
The data stream is then stored in Azure Blob Storage and in Hadoop on Azure for historical analysis.
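The on-the-fly evaluation step of this scenario can be illustrated with a small sketch. This is plain Python and not the StreamInsight API; it only shows the general idea of matching incoming postings against the key words and averaging their sentiment per fixed time window without keeping the raw postings. The 60-second window, the event tuple layout and the sample events are assumptions for illustration.

```python
from collections import defaultdict

KEYWORDS = ("bing", "xbox", "skype", "sharepoint", "microsoft")


def match_keywords(text):
    """Return the monitored key words that occur in a posting."""
    lowered = text.lower()
    return [keyword for keyword in KEYWORDS if keyword in lowered]


def tumbling_window_averages(events, window_seconds=60):
    """Yield (window_start, {keyword: average sentiment}) per time window.

    `events` is an iterable of (timestamp_seconds, text, sentiment) tuples
    ordered by time. Only running sums are kept, not the postings themselves.
    """
    current = None
    sums = defaultdict(lambda: [0.0, 0])  # keyword -> [sum, count]
    for ts, text, sentiment in events:
        start = int(ts // window_seconds) * window_seconds
        if current is not None and start != current:
            # The previous window is complete: emit it and start over.
            yield current, {k: s / n for k, (s, n) in sums.items()}
            sums.clear()
        current = start
        for keyword in match_keywords(text):
            sums[keyword][0] += sentiment
            sums[keyword][1] += 1
    if current is not None:
        yield current, {k: s / n for k, (s, n) in sums.items()}


# Made-up events that are already enriched with a sentiment value.
events = [
    (0, "Love my Xbox", 0.5),
    (30, "Xbox update broke Skype", -1.0),
    (65, "Skype call was fine", 0.25),
]

windows = list(tumbling_window_averages(events))
```

In a real deployment each emitted window would be pushed to the dashboard, while the raw enriched postings go to storage for historical analysis.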
If you are interested in the possibilities of Twitter analysis, there is a shortcut ready for you: you can download an Excel file in which you can play with search terms and the resulting sample extract of Twitter postings, making use of PowerPivot and PowerView.