Jens Kröhnert

HDInsight is Microsoft’s distribution of Apache Hadoop, an open source product and one of the key drivers of the Big Data hype. Microsoft partnered with Hortonworks, and HDInsight builds on Hortonworks’ distribution of Hadoop, the Hortonworks Data Platform (HDP). It is available as an on-premise installation or in Microsoft’s cloud, Azure.

Tableau is one of the key players in the Visual BI trend: a fairly new player in data analytics and visualization, but already in the Leaders quadrant of Gartner’s Magic Quadrant.

One strength of Hadoop is that it can store huge amounts of data (possibly semi-structured or unstructured) at far lower cost than a relational database. That lets you keep data whose concrete value you don’t know yet: just store it and search for potential value afterwards.

So you need a tool to analyze unknown data and search for patterns or other relevant information. This is a key strength of Tableau: ad-hoc queries are built by drag and drop, and most steps are represented visually right away, so you quickly gain an understanding of the data.

Sounds as if the two belong together, so let’s try to connect them. The demo scenario is based on a web log file that has been imported into HDInsight and can be queried via Hive.


First Attempt:

Let’s check the long list of supported data sources in Tableau 8.0. Some Hadoop distributions are listed, such as Cloudera and Hortonworks, but not HDInsight. But wait… the Hortonworks Data Platform is the basis for HDInsight. So let’s try the Hortonworks data source… without success.

Second Attempt:

Searching the web, I found this announcement: Simba Provides HDInsight Big Data Connectivity and enables BI Tools like … Tableau to access HDInsight (news from June 26, 2013). That doesn’t seem to help right now: there is no (free) download and no other information on how to implement it.

Third Attempt: (works in the end)

Try the ODBC drivers Microsoft ships with its HDInsight platform. For Microsoft clients like Excel, they are used as follows:

1) Install the Microsoft HiveODBCDriver (64-bit or 32-bit)

2) Set up a System DSN based on the driver

3) Connect from e.g. Excel via ODBC and choose the System DSN
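The DSN set up in the steps above can also be checked from a script before opening any BI client. The following is a minimal sketch, not an official procedure: it assumes the third-party `pyodbc` package is installed, and the DSN name `HDInsightHive`, the user name and the password are hypothetical placeholders.

```python
def build_connection_string(dsn, user=None, password=None):
    """Build an ODBC connection string that refers to an existing DSN."""
    parts = ["DSN=" + dsn]
    if user is not None:
        parts.append("UID=" + user)
    if password is not None:
        parts.append("PWD=" + password)
    return ";".join(parts)


def dsn_is_registered(dsn, sources):
    """Return True if `dsn` is a key in a mapping of DSN name -> driver name."""
    return dsn in sources


def list_registered_dsns():
    """Ask the ODBC driver manager for the DSNs it can see.

    pyodbc is imported here, not at module level, so the helper functions
    above can be used on machines without an ODBC setup.
    """
    import pyodbc  # third-party package, assumed to be installed
    return pyodbc.dataSources()


# Example usage on a machine with the Hive ODBC driver and a DSN configured
# ("HDInsightHive" is a hypothetical DSN name):
#
#   sources = list_registered_dsns()
#   print(dsn_is_registered("HDInsightHive", sources))
#   print(build_connection_string("HDInsightHive", "admin", "secret"))
```

If the DSN is missing from the returned mapping, the driver installation or the DSN type (User vs. System) is the first thing to look at.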

Here is what I experienced when connecting Tableau to HDInsight:

1) Install the Microsoft HiveODBCDriver (64-bit or 32-bit)

2) Set up a User DSN based on the driver (a System DSN doesn’t show up in the ODBC selection box)

3) Connect from Tableau via ODBC and choose the User DSN and the relevant Hadoop/Hive “table”

4) Receive a warning about limited support of this driver and potential problems

5) Successfully connect to the table

6) Try a live connection to the data: only dimensions can be dragged to the result pane

7) Try working on an extract of the data: works well, even with a filtered extract
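The extract idea from step 7 can be imitated outside Tableau: query Hive once through the DSN, store the (optionally filtered) result locally, and analyze the local copy. The sketch below only illustrates that idea and is not how Tableau builds its extracts internally. It assumes the third-party `pyodbc` package; the table name `weblog`, its columns and the DSN name are hypothetical. `autocommit=True` is passed on the assumption that the Hive driver does not support transactions.

```python
import csv


def build_extract_query(table, columns, where=None, limit=None):
    """Build a simple HiveQL SELECT for a (possibly filtered) extract."""
    query = "SELECT " + ", ".join(columns) + " FROM " + table
    if where:
        query += " WHERE " + where
    if limit is not None:
        query += " LIMIT " + str(int(limit))
    return query


def fetch_rows(connection_string, query):
    """Run the query through ODBC and return all rows as a list of tuples."""
    import pyodbc  # third-party package, assumed to be installed
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        connection.close()


def write_extract(path, header, rows):
    """Write the header and rows to a local CSV file; return the row count."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Example usage (hypothetical DSN, table and column names):
#
#   query = build_extract_query("weblog", ["url", "status"],
#                               where="status = 404", limit=1000)
#   rows = fetch_rows("DSN=HDInsightHive", query)
#   write_extract("weblog_extract.csv", ["url", "status"], rows)
```

Keeping the query builder separate from the ODBC call makes it easy to inspect the generated HiveQL before sending it to the cluster.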


With the Microsoft ODBC drivers for Hive that ship with HDInsight, Tableau can connect to HDInsight, and data stored in Hadoop’s distributed file system can be analyzed (but only via Tableau data extracts). It works, but I guess there will eventually be a specialized Tableau data connector for HDInsight, like there is for Cloudera or Hortonworks today.