Where Should I Use a Columnstore Index in SQL Server?

I’ve been blogging a lot about columnstore indexes in SQL Server 2014—for good reason, I think: it is a great feature that adds a lot of value to SQL Server for analytic workloads. One of the other (and probably more talked about) features in SQL Server 2014 is In-Memory OLTP (formerly known as Hekaton)—the SQL team added a neat new report, accessible to users of the Management Data Warehouse (MDW), that shows which tables and procedures in a given database might be well suited for In-Memory OLTP. There is nothing like that yet for columnstore indexes.

 

Figure 1 SQL 2014 Management Data Warehouse In-Memory Tables Assistant

 

Microsoft’s official guidance is to use columnstore indexes on large fact tables and on dimensions greater than 1,000,000 rows. Typically (and this applies to regular page compression as well), you are going to look at tables that don’t have a very high update percentage as columnstore candidates. Despite the ability to update clustered columnstore indexes, this feature remains largely focused on the append-only workloads present in data warehousing systems. The insert process has been designed around doing bulk inserts very efficiently, not singleton updates and inserts.

So I wanted to put some logic together to identify candidate tables. The script below checks for tables with more than 1,000,000 rows and an update percentage below 20—it would be easy enough to modify those thresholds. The output is the list of candidate tables for columnstore indexes, and it won’t return any tables that already have a columnstore index on them.

WITH widetables AS
(
    SELECT a.name,
        COUNT(b.column_id) AS ColumnCount,
        SUM(c.row_count) AS Row_Count
    FROM sys.objects a
    INNER JOIN sys.columns b ON b.object_id = a.object_id
    INNER JOIN sys.dm_db_partition_stats c ON c.object_id = a.object_id
    INNER JOIN sys.partitions d ON d.object_id = a.object_id
    WHERE d.data_compression NOT IN (3, 4) --exclude tables that already have a columnstore index
    GROUP BY a.name
    HAVING COUNT(b.column_id) > 5
        AND COUNT(b.column_id) < 1200
        AND SUM(c.row_count) > 1000000
),

--Check Update %
--Checks for percentage of update statements
updaterate AS
(
    SELECT o.name AS [Table_Name]
        ,x.name AS [Index_Name]
        ,i.partition_number AS [Partition]
        ,i.index_id AS [Index_ID]
        ,x.type_desc AS [Index_Type]
        ,i.leaf_update_count * 100.0 /
            (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) AS [Percent_Update]
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) i
    INNER JOIN sys.objects o ON o.object_id = i.object_id
    INNER JOIN sys.indexes x ON x.object_id = i.object_id
        AND x.index_id = i.index_id
    WHERE (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) != 0
        AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
        AND i.leaf_update_count * 100.0 /
            (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) > 20
)

SELECT name AS CandidateTables
FROM widetables

EXCEPT

SELECT table_name
FROM updaterate;
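
Once the script returns a candidate, building the index itself is a single statement. Here is a minimal sketch, assuming a hypothetical candidate table named dbo.FactSales (any existing clustered index on the table would need to be dropped or converted first):

--dbo.FactSales is a hypothetical candidate returned by the script above
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;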

 

I hope this script is useful, and can provide some nice guidance as to where to use a columnstore index.

 

Columnstore Indexes and Transparent Data Encryption in SQL Server 2014

I spoke on columnstore indexes in SQL 2014 last weekend at SQL Saturday #262 in Boston, and an audience question came up about using Transparent Data Encryption (TDE) in conjunction with columnstore indexes. Off the top of my head, I couldn’t think of any reason why it would be a problem—I was correct, but I also wanted to test it out and check the performance and compression levels. I used the standard BigTransactionHistory table (courtesy of Adam Machanic b|t) for all testing.

Testing

The first thing I did was create a new database with Transparent Data Encryption turned on.

USE master;
GO

CREATE DATABASE TDE_TEST;
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Demo4Blog';
GO

CREATE CERTIFICATE DemoBlog WITH SUBJECT = 'DemoBlogCert';
GO

USE TDE_Test;
GO

CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE DemoBlog;
GO

ALTER DATABASE TDE_Test
SET ENCRYPTION ON;
GO

 

Next, I created a table and a clustered columnstore index in my new database.

SELECT *
INTO tde_test.dbo.TDE_CS_Test
FROM bigtransaction_cs;

CREATE CLUSTERED COLUMNSTORE INDEX CCI_TDE_Test
ON tde_test.dbo.TDE_CS_Test;

 

BigTransaction_CS is my source table which lives in the unencrypted AdventureWorks2012 database—I’m just making a copy of it in my new encrypted TDE_Test database which was created above.

Size Matters

Using a query to find the sizes of objects in my databases, I examined the size of each table and index.
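
A query along these lines against sys.dm_db_partition_stats will return the same columns—a sketch, not necessarily the exact query behind the numbers below:

--Row counts, page counts, and space used per table and index
SELECT o.name AS TableName,
    i.name AS indexName,
    SUM(ps.row_count) AS RowCounts,
    SUM(ps.used_page_count) AS TotalPages,
    CAST(SUM(ps.used_page_count) * 8 / 1024.0 AS DECIMAL(18, 0)) AS TotalSpaceMB
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.objects o ON o.object_id = ps.object_id
INNER JOIN sys.indexes i ON i.object_id = ps.object_id
    AND i.index_id = ps.index_id
WHERE o.is_ms_shipped = 0
GROUP BY o.name, i.name
ORDER BY TotalSpaceMB DESC;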

TableName             | indexName                                 | RowCounts | TotalPages | TotalSpaceMB
Columnstore           | ClusteredColumnStoreIndex-20140219-151205 | 62527202  | 38570      | 301
Encrypted_Columnstore | CCI_TDE_Test                              | 62527202  | 43322      | 338
TDE_NoCompression     | NULL                                      | 31263601  | 143441     | 1120
Clustered_NoEncrypt   | pk_bigTransactionHistory                  | 31263601  | 143693     | 1122

 

As you can see, the row counts are the same, but the page counts and space used are higher for the encrypted table—this is one of the penalties of TDE: the data won’t compress as well. However, the difference in compression is only about 10%. Despite this, the compression effects are rather dramatic, even with encryption. The chart below shows the compressed versus uncompressed sizes of the data.

Figure 1 Size of uncompressed and compressed tables with and without encryption

Performance

In my demos on this topic, I have a fairly standard query that I use. Each query was run 6 times, with the last 5 runs counting for the data.

SELECT transactiondate
    ,AVG(quantity)
    ,AVG(actualcost)
FROM bigtransaction_cs
WHERE TransactionDate < '2007-07-01'
    AND Quantity > 70
    AND Quantity < 92
GROUP BY TransactionDate
ORDER BY transactionDate;

 

Encryption typically has a negative impact on query performance due to the CPU overhead, and that is clearly illustrated below.

Figure 2 Query Performance with Encryption

There’s roughly a 36% performance degradation using TDE. Based on the logical read counts from the queries, the slowdown is all related to the CPU overhead of decryption, not to the small difference in compression.
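
To capture those numbers yourself, one simple approach (a sketch using the same test query against the encrypted copy) is to turn on STATISTICS IO and TIME for the session and compare the output between the encrypted and unencrypted tables:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

--Run against both tde_test.dbo.TDE_CS_Test and the unencrypted copy, then compare logical reads and CPU time
SELECT transactiondate
    ,AVG(quantity)
    ,AVG(actualcost)
FROM tde_test.dbo.TDE_CS_Test
WHERE TransactionDate < '2007-07-01'
    AND Quantity > 70
    AND Quantity < 92
GROUP BY TransactionDate
ORDER BY transactionDate;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;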

Summary

Generally speaking, if you are using encryption there is some legal or regulatory reason for it, and you don’t have any other options. Just be aware that you can still get pretty good compression levels using it in conjunction with clustered columnstore indexes in SQL Server 2014, even though performance may not be as impressive as it otherwise would be.

Building Perfect SQL Servers, Every Time

Tomorrow I’ll be presenting to the DBA Fundamentals VC on Building Perfect SQL Servers, Every Time. This is a session I’ve been wanting to build for a while—as many of you know, SQL Server does not install with best practices configured out of the box, and in my role as a consultant I see a lot of installations where the installer simply clicked Next. What I really wanted to call this presentation was “Don’t Just #$%^ing Click Next When You Install SQL Server.”

Additionally, you can grab the script files here.

In tomorrow’s session I will cover the following:

  • General best practices for configuring SQL Server
  • What you should change after install
  • How to manage a larger environment
  • How you can automate this process, so you get the same server every time.

We will talk about how to automate physical builds, and a little bit about building your own private SQL Server cloud—what it takes, and what your next steps are.
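
As a preview of the post-install settings the session covers, here is a minimal sketch of a few common sp_configure changes—the specific values are illustrative assumptions, not recommendations for every server:

--Illustrative values only; tune for your workload and hardware
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 4;
EXEC sp_configure 'backup compression default', 1;
EXEC sp_configure 'max server memory (MB)', 28672; --leave headroom for the OS
RECONFIGURE;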

 

Getting Started with Azure Hybrid Models—Point to Site Networking

I’ve done several presentations on hybrid disaster recovery (DR) using SQL Server and Windows Azure, and configuring the network has been a headache every time. Some SQL DR methods do not require a direct network connection (log shipping, mirroring, backups) and may be secured using certificates for encryption, but more advanced configurations require a VPN connection. Also, if domain services are required (either to connect back to an on-premises Windows Domain Controller (DC) (good) or to replicate your on-premises Domain Controller to a read-only DC (best)), you will need a VPN connection.

If you would like a little background on networking in Azure, I highly recommend Mark Russinovich’s (b|t) session from TechEd last year on Windows Azure internals.

There are three types of direct connections to Windows Azure:

  1. Site to Site Network—an approved VPN device in your DMZ connects via VPN (encrypted, but not dedicated) to a network gateway in Windows Azure. Best for low-volume, but regularly used, scenarios.
  2. Point to Site Network—the focus of this post. A certificate-based VPN allows individual machines to connect directly to the network gateway in Windows Azure using a certificate and a VPN client; each machine must have the certificate. This is designed for demos and admin users who may need to connect directly to Azure virtual networks.
  3. ExpressRoute—this is the true enterprise-class solution for connecting to Windows Azure. You can connect either at your ISP (participating ISPs are on this list) or via a dedicated MPLS VPN from your WAN to Windows Azure. This guarantees a fixed amount of bandwidth and performance.

Figure 1 ExpressRoute Networking — http://www.windowsazure.com/en-us/services/expressroute/

Configuring Point to Site Networking

So if you are reading this, I am going to assume you have a Windows Azure account—if you don’t, get one—there’s lots of interesting things to play with. Connect to the Azure management portal and scroll down to networks.

Figure 2 Add new network in Windows Azure

From that menu select Virtual Network > Custom Create. You will be prompted to give your network a name and select an existing or create a new affinity group. I’m not going to write about affinity groups today, but they are an important part of Windows Azure—you can read more about them here.

Optionally, you can add a DNS server name and address. If you do not, Windows Azure will provide DNS services for your VMs. In my experience, if you want your VMs to connect to the internet, do not set up a DNS server (this mainly applies to point to site demo work—if you are doing this in production, set up a DNS server). There is also an option to configure point to site networking—check this box, and if you have a real VPN device, you may also check site to site.

Figure 4 Configuring Point to Site VPN in Network

The next screen just covers your IP address space; for limited demos you can accept the default and click next. Finally, click the “Add Gateway Subnet” option.

Figure 5 Add Gateway Subnet

You now have a new network, but with no VPN gateway to connect to. From the network dashboard click the Create Gateway button on the bottom of the screen. This will take about 20-25 minutes—don’t worry, there is other stuff to do.

Figure 6 Network without gateway

You have two other tasks. The first is to create an Azure VM that is on this network (see the screenshot below—you can choose the network on the third screen of the VM creation process).

Figure 7 Creating a new Azure VM

While your VM is being created, you will need to create your certificates, to both upload to your Azure gateway and to install on the client machine. In my opinion, this isn’t clear in the Microsoft documentation, so I’ll draw a picture.

So you will create the root certificate on the main machine you want to connect to the VPN (in my case it is my on-premises Domain Controller) and then upload it to your Azure network.

Figure 8 Cert uploaded to network

You will also create a client certificate using this root certificate. You will install the client certificate on any machine you want to connect to your VPN—do not recreate and upload a new root certificate for each machine that you want to connect. In order to create the certificates, you will need makecert.exe—you can get it as part of the Windows SDK. For the detailed instructions on creating the certificates, go to this MSDN page and find “Create a self-signed root certificate.” Execute those instructions all the way through “Install the client certificate.”

Now you are ready to connect. On your gateway page (it’s probably finished creating by now), on the right side of the screen you will see “Download the 64-bit Client VPN Package” (you aren’t really using a 32-bit OS, are you?)—click that link. It’s an exe that is not signed by Microsoft, so Windows will yell at you for installing it. It will install quickly—if you expand the network pane in Windows, you will see a new connection.

Figure 9 Connect to Azure VPN

Click connect, and then you will see…

Figure 10 Azure VPN Connection

If you run an IPConfig—you will see the connection to Azure.

Figure 11 IP Record from Azure VPN

One warning here—I’ve had this happen with a client, and it happened to me in the office: when I connect to the VPN, I can’t get to the internet on any of my connections. Oddly, this has only happened on corporate networks, and hasn’t happened when I’ve been on the same connection through a virtual machine on my laptop. More to come later.

Update: This behavior is tied to Cisco firewall equipment. I will address in a follow up post.

Now (if you’ve turned Windows Firewall off) you can ping your VMs, and get started using SQL Server HA and DR functions with them.

Into the Blue—Building DR in Windows Azure

Tomorrow (at 1300 EDT/1800 GMT), I’ll be presenting to the PASS High Availability and Disaster Recovery VC on one of my favorite topics of late—building disaster recovery from your data center into Windows Azure Infrastructure as a Service. You can register for the session here. While I’m not screaming “cloud, cloud, cloud” from the rooftops, I do feel that DR is a really great point of entry to the cloud for many smaller organizations, who may lack the resources for a second data center, or perhaps only need their applications to be highly available during certain parts of the year. Additionally, the ability to back up SQL Server databases directly to Azure Blob Storage can meet your offsite backup requirements with very few headaches.
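
On that last point, backing up straight to Azure Blob Storage is plain T-SQL once a credential exists. Here is a minimal sketch with hypothetical storage account, container, and key values:

--Storage account name, container, and access key below are hypothetical placeholders
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',
    SECRET = '<storage account access key>';

BACKUP DATABASE AdventureWorks2012
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/AdventureWorks2012.bak'
WITH CREDENTIAL = 'AzureBackupCredential', COMPRESSION, STATS = 10;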

The cloud isn’t all unicorns and rainbows however—there are some definite challenges in getting these solutions to work properly. I’ll discuss that and the following in this session:

  • Choosing the right DR solution
  • Networking and the cloud
  • Economics of cloud DR
  • Implementation concerns for AlwaysOn Availability Groups

I hope you can join me on Tuesday.

Stupid Interview Questions—Trying to Find the Right Employees

I was driving into the office this morning, and as I do whenever I’m in the car, I was alternating between NPR and Mike and Mike on ESPN Radio (professional sports are an interest of mine). Anyway, Mike and Mike were discussing a recent story about the Cleveland Browns asking their potential draft picks questions such as “Name all of the things you could do with a single brick in one minute.” I’m not sure about you, but I don’t see how that translates into hiring the most effective person to snap a football and then block a raging defensive lineman, but hey, I don’t work in football.

What does this have to do with football?

Bad IT Interviews

I do, however, work in IT—and I’ve interviewed for a lot of roles, as well as interviewed a lot of people for roles. Fortunately, for my current role I was hired through people I had worked with in the past, and there was barely a formal interview process. Even for my previous role at the internet provider many of you are likely reading this post on, the interview process was straightforward and consisted mostly of conversations about technology and design style. I actually have to think back many moons to one particularly bad interview, with a director who thought he was the be-all and end-all of IT management. Some of the questions were:

  • How many gas/petrol stations are there in the United States?
  • Why is a manhole cover round?
  • How many pieces of bubble gum are in this jar? (Ok, I made this one up, but you get the idea)

To this list I would like to add the following questions, which I hope are destined for the waste bin of history:

  • What is your biggest weakness?
  • Where do you see yourself in five years?
  • How did you hear about this position?

None of the above questions really helps me (as the hiring manager) determine whether a person is qualified for a role as a data professional (or frankly any other job). They are filler, and any halfway prepared candidate is going to have canned answers for them.

Building the Better Interview

I’ve been trained on a large number of interview techniques, between business school and corporate America. There is no magic set of interview questions—however, my favorite way to interview a candidate is to get them talking about a technology they’ve worked on and are passionate about. This serves two purposes: it lets me see whether they are really into technology or it’s just a job to them, and I can pretty quickly gauge their technical level with follow-up questions. Conversations are much better than trivia, e.g., “What is the third line item when you right click on a database in SQL Server Management Studio?”

One other thing I’ll add: make sure you have qualified people doing the interview. If you are trying to hire a DBA and you don’t have any on staff, consider bringing in a consultant to do the interviews—it’s a small investment that could save you a lot of money down the road.

So what stupid interview questions have you heard? Answer in the comments.

AlwaysOn Availability Groups and Automation

Last week I encountered a unique application requirement—we have a database environment configured with AlwaysOn Availability Groups for high availability (HA) and disaster recovery (DR), and our application was going to be creating and dropping databases on the server on a regular basis. So I had to develop some code to handle this process automatically. I had a couple of things going in my favor: it was our custom-developed application doing this, so I knew the procedures I was writing would be called as part of the process, and our availability group was only two nodes at the moment, so my version 1 code could be relatively simplistic and still work. Aaron Bertrand (b|t) posted on this Stack Exchange thread, and his code is a good start. I’m not going to put all of my code in this post—it works for our purposes, but I have a few more fixes I’d like to make before releasing the code into the wild.

Dropping a Database

First of all, I need to say it’s very important to secure these procedures—they can do bad things to your environment if run out of context, particularly this one. I denied execute to everyone except the user who would be calling the procedure; I didn’t want any accidents happening. Dropping a database from an availability group is slightly different than doing it on a standalone server. The process is as follows:

  1. Remove the database from the availability group (on the primary)
  2. Drop the database from the primary
  3. Drop the database from the secondary node(s)

Since we have to initiate this from the primary instance, we need to find out two pieces of information: 1) which availability group we are removing the database from, and 2) whether the instance we are on is the primary. In my code, I didn’t actually have to drop the database from the primary server—that piece was being called from another proc—so I just had to remove the database from the availability group and drop it on the secondary. There are a number of ways to connect to the secondary—this needs to happen in SQLCMD mode, which isn’t possible in a stored procedure. We could use a linked server, or we could enable xp_cmdshell, run a SQLCMD script, and then disable xp_cmdshell. The latter isn’t my favorite technique from a security perspective, but I was under a time crunch and the procedure is locked down; in the future I will probably rebuild this with linked servers (created within the procedure).

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[RemoveDBfromAG] @dbname VARCHAR(50)
AS
BEGIN
    --Temporarily enable xp_cmdshell so we can call sqlcmd against the secondary
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE WITH OVERRIDE

    EXEC sp_configure 'xp_cmdshell', 1
    RECONFIGURE WITH OVERRIDE

    DECLARE @dbid INT
    DECLARE @groupid VARCHAR(50)
    DECLARE @groupname VARCHAR(50)
    DECLARE @server VARCHAR(50)
    DECLARE @primary VARCHAR(50)
    DECLARE @secondary VARCHAR(50)
    DECLARE @sql NVARCHAR(MAX)
    DECLARE @dropsecondary VARCHAR(500)

    --Look up the database, its availability group, and the current primary and secondary replicas
    SELECT @dbid = database_id
    FROM sys.databases
    WHERE name = @dbname

    SELECT @groupid = group_id
    FROM sys.availability_databases_cluster
    WHERE database_name = @dbname

    SELECT @groupname = name
    FROM sys.availability_groups
    WHERE group_id = @groupid

    SELECT @server = @@SERVERNAME

    SELECT @primary = primary_replica
    FROM sys.dm_hadr_availability_group_states
    WHERE group_id = @groupid

    --Assumes a two-node availability group: the one node that is not the primary is the secondary
    SELECT @secondary = node_name
    FROM sys.dm_hadr_availability_replica_cluster_nodes
    WHERE node_name != @primary

    SELECT @sql = 'ALTER AVAILABILITY GROUP ' + @groupname + ' REMOVE DATABASE [' + @dbname + ']'

    SELECT @dropsecondary = 'sqlcmd -S "' + @secondary + '" -E -Q "exec ReturnsTestInstanceDropSecondaryDB [' + @dbname + ']"'

    --Only proceed if this instance is the primary replica for the availability group
    IF NOT EXISTS (
            SELECT 1
            FROM sys.dm_hadr_availability_group_states
            WHERE group_id = @groupid
                AND primary_replica = @server
            )
    BEGIN
        RETURN
    END
    ELSE
    BEGIN
        --Remove the database from the availability group on the primary
        EXECUTE sp_executesql @sql

        --The secondary copy spends a few seconds in a recovering state after removal;
        --wait before asking the secondary to drop it
        WAITFOR DELAY '00:00:25'

        --Call the drop procedure on the secondary, then turn xp_cmdshell back off
        EXEC xp_cmdshell @dropsecondary

        EXEC sp_configure 'xp_cmdshell', 0
        RECONFIGURE WITH OVERRIDE
    END
END
GO

The one particularly unique thing you will notice in the code is the WAITFOR DELAY. What I observed is that after the database is removed from the availability group, the copy on the secondary goes into a recovering state for about 10 seconds, and we are unable to drop a database while it’s in recovery. By implementing that wait (the T-SQL equivalent of a sleep command), the database could be dropped.
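
A fixed delay works, but a sturdier approach is to retry the drop from the secondary-side procedure until it succeeds or a retry limit is hit. Here is a minimal sketch of that idea (the database name below is a hypothetical placeholder; in practice it would be the procedure’s parameter):

--Retry the drop instead of relying on a single fixed delay
DECLARE @dbname SYSNAME = N'SomeDatabase'; --hypothetical; would be the proc's @dbname parameter
DECLARE @sql NVARCHAR(MAX) = N'DROP DATABASE ' + QUOTENAME(@dbname) + N';';
DECLARE @tries INT = 0;
DECLARE @done BIT = 0;

WHILE @done = 0 AND @tries < 12
BEGIN
    BEGIN TRY
        EXEC sp_executesql @sql;
        SET @done = 1;
    END TRY
    BEGIN CATCH
        --Database is still transitioning; wait a few seconds and try again
        SET @tries += 1;
        WAITFOR DELAY '00:00:05';
    END CATCH
END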

Adding a New Database

Adding a new database has similar requirements, with a couple of additional factors. We have to verify that the instance we are on is the primary for the availability group we are adding the database to. This is where I really need to fix my code—it assumes there are only two nodes in our availability group. I need to refactor the code to potentially loop through the other secondaries (up to four in SQL Server 2012, or eight in SQL Server 2014). Also, I’m using a linked server connection, and the code assumes that the drive letter and path to the data file directory on the secondary are the same as on the primary. To summarize, the process is as follows:

  1. Accept the availability group and database names as input parameters
  2. Verify that the node you are working on is the primary for the availability group
  3. Back up the database and transaction log of the database you’d like to add to the availability group to a file share
  4. Add the database to the availability group on the primary
  5. Restore the database and transaction log to the secondary with norecovery
  6. Alter the database to set the availability group

My code for this isn’t totally fleshed out—it works in my environment, but I don’t think it’s ready for sharing. I’ll share it later, I promise.
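
In the meantime, here is a minimal sketch of the raw T-SQL behind those steps for a two-replica setup. The names are hypothetical (availability group AG1, database NewDB, share \\fileserver\backups, and a secondary reachable as SECONDARY01), and this is the manual form of the process rather than the procedure described above:

--On the primary: back up the database and log to a share both replicas can reach
BACKUP DATABASE NewDB TO DISK = '\\fileserver\backups\NewDB.bak' WITH INIT;
BACKUP LOG NewDB TO DISK = '\\fileserver\backups\NewDB.trn' WITH INIT;

--On the primary: add the database to the availability group
ALTER AVAILABILITY GROUP AG1 ADD DATABASE NewDB;

--On the secondary (SECONDARY01): restore with NORECOVERY, then join the database to the group
RESTORE DATABASE NewDB FROM DISK = '\\fileserver\backups\NewDB.bak' WITH NORECOVERY;
RESTORE LOG NewDB FROM DISK = '\\fileserver\backups\NewDB.trn' WITH NORECOVERY;
ALTER DATABASE NewDB SET HADR AVAILABILITY GROUP = AG1;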

Pitfalls to Avoid

This isn’t that different from just building out an availability group, and many of the same caveats apply: you need to ensure that agent jobs and logins affiliated with a database exist on all nodes in the availability group. Additionally, the procedures to add and remove databases from your availability group need to exist on all of the nodes. Also, if you are doing this programmatically, the connection should use the listener, so that your application is always connecting to the primary instance.
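
As a quick sanity check on the login point, here is a sketch of a query to run in the user database on each replica (once it is readable there) to find database users whose SID has no matching server login:

--Database users with no matching server login on this instance
SELECT dp.name AS OrphanedUser,
    dp.type_desc
FROM sys.database_principals dp
LEFT JOIN sys.server_principals sp ON sp.sid = dp.sid
WHERE sp.sid IS NULL
    AND dp.type IN ('S', 'U', 'G')   --SQL users, Windows users, Windows groups
    AND dp.principal_id > 4          --skip dbo, guest, and other built-in principals
    AND dp.authentication_type <> 0; --skip users without logins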

On Presenting—Making Bad Days Good

Last weekend I had the pleasure of speaking at a wonderful technology event, in the Caribbean. Ok, so it wasn’t the Caribbean, it was Cleveland, but a lot of my friends were there, I had a good time and the event was awesome. Thanks Adam (b|t) and crew for running a really fantastic SQL Saturday.

Anyway, on Friday my good friend Erin Stellato (b|t) asked if I could do an additional presentation that I’ve done frequently in the past (it’s on high availability and disaster recovery in SQL Server—one of my favorite topics). I accepted the offer; the only issue was that I didn’t have a demo environment, and if there’s anything I’ve learned about SQL Server audiences in my years of speaking, it’s that they love demos. So I spent a decent amount of time on Friday and Saturday morning getting a virtualization environment built, and didn’t pay much attention to my other presentation, on columnstore indexes.

The HA and DR session went great, but I got a little flustered in my columnstore session—I knew the material, but I hadn’t touched the demos in a while, and I just didn’t like the general flow of the session. Combining the way I felt with the attendee feedback (attendees—always give feedback, speakers love it, even if it’s bad), I decided to refactor both my demos and some visualizations in my slides, in an effort to make things clearer. Fortunately, I get to present the session again this week (thanks, Edmonton SQL Server UG), so I’ll get to test the new slides.

The moral of this story is that sometimes we have bad days—everyone does. However, getting overly upset about it doesn’t do a whole lot of good—think about where you can improve, and do it.

 

T-SQL Tuesday—Bad Bets

“The year was 94, in my trunk is raw, and in the rear view mirror was the mother$%^in’ law. Got two choices y’all, pull over the car or bounce on the devil, put the pedal to the floor” – Jay-Z, “99 Problems”

Ok, so the situation I’m going to be talking about isn’t as extreme as Jay’s—being chased by the cops with a large load of uncut cocaine in my trunk. The year was actually 2010, and things had been going well for me in my job: I had just completed a major SQL consolidation project, built out DR and HA where there was none before, and was speaking on a regular basis at SQL community events. A strange thing happened, though—I got turned down to go to PASS Summit (in retrospect, I should have just paid out of pocket), and I wasn’t happy about it. All of the resources in the firm were being sucked into the forthcoming SAP project. So, back when I thought staying with a company for a long time was a very good thing (here’s a major career hint—focus on your best interest first, because the company will focus on its best interest ahead of yours), I decided to apply for the role of Infrastructure Lead for the SAP project. I’ve blogged about this in great detail here, so if you want to read the gory details you can, but I’ll summarize below.

The project was a nightmare—my manager was quite possibly mental, the CIO was an idiot who had no idea how to set a technical direction, and as an infrastructure team we were terribly under-resourced. The only cool part was that, due to the poor planning of the CIO and my manager, the servers ended up in Switzerland, so I got to spend about a month there and work with some really good, dedicated folks. I was working 60-70 hours a week, gaining weight, and just generally unhappy with life. I had a couple of speaking engagements that quarter, the biggest of which was SQL Rally in Orlando—I caught up with friends (thank you Karen, Kevin, and Jen) and started talking about career prospects. I realized that what I was doing was useless and wasn’t going to move my career forward in any meaningful sense. So I started applying for jobs, and I ended up in a great position.

I’ve since left that role, and am consulting for a great company now, but almost every day I look back fondly at all the experiences I got to have in that job (the day after I met my boss, he offered to pay my travel to PASS Summit 2011) and how much it improved my technical skills and gave me tons of confidence to face any system.

The moral of this story is to think about your life ahead of your firm’s. The job market is great for data pros—if you are unhappy, leave. Seriously—I had 5 recruiters contact me today.

What Not To Do With Hadoop

I gave a presentation on Hadoop to the Toronto SQL Server Users Group this week—it was a really good time, and I got a ton of questions covering a wide range of areas. One thing to note—distribution of data in the cluster is managed by the name node, which also does the job task management. I did get one question—more theoretical than anything, I think—that I wanted to put here for posterity.

“Can we mix Windows and Linux nodes in Hadoop cluster?”

One of the things I love about Hadoop is ease of deployment—the clustering is all network based (no shared storage), and clustering is inherent in the software, so there is nothing to configure. So you won’t see tips from me on how to add disk to your Hadoop cluster—you just tell the cluster where your new node is and what its IP address is, and you’re set.

Back To the Question

I haven’t worked with Hortonworks’ Windows distribution of Hadoop much, so I don’t even know if it’s theoretically possible to mix nodes. But just don’t do it. First, I would never mix even Windows versions (e.g., Windows 2003/2008—and yes, I know you can’t) in any cluster computing scenario, so mixing different OS platforms strikes me as insanity. Second, no one else is doing this, so support will be painful. One of the beauties of open source software is the community support—forums, mailing lists, and blogs. If you are using a configuration that virtually no one else is using, it will be much harder to get support.

The Answer

No. A million times no.

 

 
