I gave a presentation on Hadoop to the Toronto SQL Server User’s Group this week. It was a really good time, and I got a ton of questions covering a wide range of areas. One thing to note: distribution of data in the cluster is managed by the name node, while job and task management is handled by a separate service (the JobTracker in Hadoop 1.x) that typically runs on the same master node in a small cluster. I did get one question that I think was more theoretical than anything, but I wanted to put it here for posterity.
“Can we mix Windows and Linux nodes in a Hadoop cluster?”
One of the things I love about Hadoop is ease of deployment. The clustering is all network (no shared storage) and is inherent in the software, so there is next to nothing to configure. So you won’t see tips from me on how to add disk to your Hadoop cluster: you just tell the cluster where your new node is and what its IP address is, and you’re set.
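To make "tell the cluster where your new node is" a bit more concrete, here is a minimal sketch, assuming a classic setup where the master keeps a plain-text list of worker hosts, one per line (`conf/slaves` in older releases, `workers` in Hadoop 3; the exact name and location depend on your version and distribution). The function name `register_worker` is hypothetical, and treating `#` lines as comments is an assumption of this sketch. The new node still needs Hadoop installed and its daemons started.

```python
from pathlib import Path


def register_worker(workers_file: Path, host: str) -> bool:
    """Append host to a one-host-per-line worker list.

    Returns True if the file was changed, False if the host was already listed.
    Hypothetical helper for illustration; not part of Hadoop itself.
    """
    host = host.strip()
    if not host:
        raise ValueError("host must not be empty")

    text = workers_file.read_text() if workers_file.exists() else ""

    # Collect the hosts already listed, skipping blank lines and '#' comments.
    listed = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if host in listed:
        return False

    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with workers_file.open("a") as handle:
        handle.write(f"{separator}{host}\n")
    return True
```

Calling `register_worker(Path("conf/slaves"), "10.0.0.13")` a second time with the same address is a no-op, so it is safe to rerun from a provisioning script.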
Back to the Question
I haven’t worked with Hortonworks’ Windows distribution of Hadoop much, so I don’t even know if it’s theoretically possible to mix nodes. But just don’t do it. a) I would never mix even versions of the same O/S (e.g. Windows 2003/2008, and yes, I know you can’t) in any cluster computing scenario, so mixing different O/S platforms strikes me as insanity. b) No one else is doing this, so support will be painful. One of the beauties of open source software is the community support: forums, email lists, and blogs. If you are using a system configuration that virtually no one else is using, it will be much harder to get support.
The Answer
No. A million times no.
Another thing not to try with Hadoop: using Sqoop2 within Oozie and expecting to be able to load directly into Hive tables. Even though the documentation says that it’s supposed to integrate with Hive on the server side, it does not have full integration yet. We’ve been finding that out the hard way.