What Not To Do With Hadoop

I gave a presentation on Hadoop to the Toronto SQL Server User’s Group this week. It was a really good time, and I got a ton of questions covering a wide range of areas. One thing to note: distribution of data in the cluster is managed by the name node, while job task management is handled by a separate master service (the JobTracker in Hadoop 1.x). I did get one question that I think was more theoretical than anything, but I wanted to put it here for posterity.

“Can we mix Windows and Linux nodes in Hadoop cluster?”

One of the things I love about Hadoop is ease of deployment. The clustering is all over the network (no shared storage) and is inherent in the software, so there is nothing to configure. That's why you won't see tips from me on how to add disk to your Hadoop cluster: you just tell the cluster where your new node is and what its IP address is, and you're set.

Back To the Question

I haven’t worked with Hortonworks’ Windows distribution of Hadoop much, so I don’t even know if it’s theoretically possible to mix nodes. But just don’t do it. Point a): I would never mix even versions of the same O/S (e.g. Windows 2003/2008, and yes, I know you can’t) in any cluster computing scenario, so mixing different O/S platforms strikes me as insanity. Point b): no one else is doing this, so support will be painful. One of the beauties of open source software is the community support: forums, email lists, and blogs. If you are using a system configuration that virtually no one else is using, it will be much harder to get support.

The Answer

No. A million times no.


