Microsoft has deprecated the Hive plug-in for Excel (which is a little disappointing—it was never production ready, but it did offer some really nice functionality). If you download and install the latest driver the Hive plugin will actually go away. So I started attempting to use the connection via ODBC in Excel.
So, I configured a new System DSN in ODBC settings using the default Hive driver and port. This is a local HDInsight installation—I’m still having issues connecting to my Linux cluster, but will work that out in a separate post.
I accept the default port which is 10000. This has worked in the past with HDInsight (I thought—then I reread my documentation). One thing to note—there is no “Test Connection” button in the DSN configuration, there is one in Excel, but not here. If anyone from Microsoft is reading this, a connection test would be a really nice feature to include in the next revision of the driver.
Once in Excel, I go through the process of creating a new ODBC connection though the Data Connection Wizard.
The wizard finds my connection called “hive”. However, a Data Link properties box is launched when the connection is unable to be made directly. If I execute a “Test Connection” I get the error batch size is not set or invalid.
That error was pretty cryptic—fortunately I had tracing turned on for the connection. The log revealed the below:
[DEBUG]-> ThreadID 10716 # 08/15/13 09:31:51 # src\Conn.c # SQLDriverConnectW # 1065 # DB = default, HOST = 10.10.50.111, PORT = 10001, PATH = , FRAMED = 0
[ERROR]-> ThreadID 10716 # 08/15/13 09:31:52 # src\Conn.c # ConnectToHive # 346 # 08001 : Could not connect client socket. Details: <Host: 10.10.50.111 Port: 10001> endpoint: 10.10.50.111, error: connection_refused
[ERROR]-> ThreadID 10716 # 08/15/13 09:31:52 # src\Conn.c # ConnectToHive # 348 # 08001 : Unable to establish connection with hive server
When I googled the error, I found this post which suggested pointing to port 10001 instead of the default 10000. SoI fixed it—changing port to 10001.
And success—we have Hive data into Excel and we can load into Power Pivot if needed. However, my one bit of disappointment with the Hive ODBC driver compared the Hive plugin for Excel is that there is no interactive query mode, it’s pretty much an all or nothing proposition. Ideally, I would like to see that functionality return into the driver.