Columnstore Indexes and Transparent Data Encryption in SQL Server 2014

I spoke on columnstore indexes in SQL Server 2014 last weekend at SQL Saturday #262 in Boston, and an audience question came up about using Transparent Data Encryption (TDE) in conjunction with columnstore indexes. Off the top of my head, I couldn't think of any reason why it would be a problem. I was correct, but I also wanted to test it out and check the performance and compression levels. I used the standard BigTransactionHistory table (courtesy of Adam Machanic b|t) for all testing.

Testing

The first thing I did was create a new database with Transparent Data Encryption turned on.

USE master;
GO

CREATE DATABASE TDE_Test;
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Demo4Blog';
GO

CREATE CERTIFICATE DemoBlog WITH SUBJECT = 'DemoBlogCert';
GO

USE TDE_Test;
GO

CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE DemoBlog;
GO

ALTER DATABASE TDE_Test SET ENCRYPTION ON;
GO
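A quick check that isn't in the original demo script, but is handy before testing: you can confirm the database has finished its background encryption scan by querying the encryption DMV.

-- encryption_state = 3 means the database is fully encrypted;
-- percent_complete shows progress while the encryption scan is still running.
SELECT DB_NAME(database_id) AS database_name,
       encryption_state,
       percent_complete
FROM sys.dm_database_encryption_keys;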

 

Next, I created a table and a clustered columnstore index in my new database.

SELECT *
INTO tde_test.dbo.TDE_CS_Test
FROM bigtransaction_cs;

CREATE CLUSTERED COLUMNSTORE INDEX CCI_TDE_Test
ON tde_test.dbo.TDE_CS_Test;

 

BigTransaction_CS is my source table, which lives in the unencrypted AdventureWorks2012 database; I'm simply making a copy of it in the new encrypted TDE_Test database created above.

Size Matters

Using a query against the system catalog views to find the sizes of objects in my databases, I examined the size of each object.
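The exact query isn't reproduced here, but a minimal sketch along the usual sys.tables/sys.indexes/sys.partitions/sys.allocation_units pattern produces the columns shown in the results below:

-- Rough sketch of an object-size query; row counts can be inflated when a
-- partition has more than one allocation unit (LOB or row-overflow data).
SELECT t.name AS TableName,
       i.name AS IndexName,
       SUM(p.rows) AS RowCounts,
       SUM(a.total_pages) AS TotalPages,
       CAST(SUM(a.total_pages) * 8 / 1024.0 AS DECIMAL(10, 2)) AS TotalSpaceMB
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.object_id = i.object_id
JOIN sys.partitions AS p
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
JOIN sys.allocation_units AS a
    ON p.partition_id = a.container_id
GROUP BY t.name, i.name
ORDER BY TotalSpaceMB DESC;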

TableName              IndexName                                   RowCounts   TotalPages   TotalSpaceMB
Columnstore            ClusteredColumnStoreIndex-20140219-151205   62527202    38570        301
Encrypted_Columnstore  CCI_TDE_Test                                62527202    43322        338
TDE_NoCompression      NULL                                        31263601    143441       1120
Clustered_NoEncrypt    pk_bigTransactionHistory                    31263601    143693       1122

 

As you can see, the row counts match between the encrypted and unencrypted copies, but the page counts and space used are higher with encryption. This is one of the penalties of TDE: the data doesn't compress as well. However, the difference in compression is only a bit over 10%, and the compression effects are still rather dramatic even with encryption. The chart below shows the compressed versus uncompressed sizes of the data.

Figure 1 Size of uncompressed and compressed tables with and without encryption

Performance

In my demos on this topic, I have a fairly standard query that I use. Each query was run 6 times, with the last 5 runs counting for the data.

SELECT TransactionDate
    ,AVG(Quantity)
    ,AVG(ActualCost)
FROM bigtransaction_cs
WHERE TransactionDate < '2007-07-01'
    AND Quantity > 70
    AND Quantity < 92
GROUP BY TransactionDate
ORDER BY TransactionDate;

 

Encryption typically does negatively impact query performance due to the CPU overhead, as is clearly illustrated below.

Figure 2 Query Performance with Encryption

There's roughly a 36% performance degradation using TDE. Based on the logical read counts from the queries, it is all related to the encryption overhead and not really affected by the smaller difference in compression.
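The measurement harness isn't shown in the post; one common way to capture the elapsed time, CPU time, and logical reads referenced above is SET STATISTICS TIME and SET STATISTICS IO, run around the test query against each copy of the table. A sketch against the encrypted copy:

-- Emit elapsed/CPU time and logical read counts for the test query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT TransactionDate
    ,AVG(Quantity)
    ,AVG(ActualCost)
FROM tde_test.dbo.TDE_CS_Test
WHERE TransactionDate < '2007-07-01'
    AND Quantity > 70
    AND Quantity < 92
GROUP BY TransactionDate
ORDER BY TransactionDate;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;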

Summary

Generally speaking, if you are using encryption there is a legal or regulatory reason for it, and you don't have any other options. Just be aware that you can still get pretty good compression levels using TDE in conjunction with clustered columnstore indexes in SQL Server 2014, even though your performance may not be as impressive as it otherwise would be.

Performance of SQL Server 2014 Columnstore Archival Mode

I blogged about this in not a whole lot of detail a couple of months ago, but as a follow-up to my recent testing of columnstore index update and insert performance, I decided to take another shot at testing the new Columnstore Archival mode of compression in SQL Server 2014 CTP2. What this allows us to do is take either an entire columnstore index (a really bad idea) or selected, relatively unused partitions of our columnstore index, and compress them into an even smaller archive than a typical columnstore index. Microsoft, per conversations I had at PASS Summit, mentioned seeing compression gains of up to 26%.
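The exact statements I ran aren't shown here, but rebuilding an individual partition with archival compression looks roughly like this, using the table name that appears in the results below:

-- Rebuild a single "cold" partition with archival columnstore compression.
ALTER TABLE dbo.BigTransaction_Arch_Columnstore
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Going back to standard columnstore compression for a "hot" partition works the same way.
ALTER TABLE dbo.BigTransaction_Arch_Columnstore
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE);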

The Test

Like I did in my insert and update testing, I used the Make Big AdventureWorks script, which you can download here, courtesy of Adam Machanic (b|t). Specifically, my focus was in and around the BigTransaction table, which I partitioned based on date (a sketch of the partitioning setup appears after the first table below). Before I compressed anything, the partitions were the following sizes:

Table                             Partition   Rows      Data KB
BigTransaction_Reg_Columnstore    1           3774500   29144
BigTransaction_Reg_Columnstore    2           3152574   24200
BigTransaction_Reg_Columnstore    3           2909218   22200
BigTransaction_Reg_Columnstore    4           2812282   21720
BigTransaction_Reg_Columnstore    5           2660245   20664
BigTransaction_Reg_Columnstore    6           2627890   30008
BigTransaction_Reg_Columnstore    7           2461343   28064
BigTransaction_Reg_Columnstore    8           2568732   29352
BigTransaction_Reg_Columnstore    9           8296817   52512
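The partitioning DDL isn't shown in the post, and the boundary dates below are purely hypothetical, but the shape of the setup (eight boundaries yielding nine date-based partitions, with the clustered columnstore index built on the partition scheme) would look something like this:

-- Hypothetical boundary dates: eight boundaries = nine partitions, matching the tables here.
CREATE PARTITION FUNCTION pf_BigTransactionDate (DATETIME)
AS RANGE RIGHT FOR VALUES ('2005-07-01', '2006-01-01', '2006-07-01', '2007-01-01',
                           '2007-07-01', '2008-01-01', '2008-07-01', '2009-01-01');
GO

CREATE PARTITION SCHEME ps_BigTransactionDate
AS PARTITION pf_BigTransactionDate ALL TO ([PRIMARY]);
GO

-- Clustered columnstore index created on the partition scheme, keyed on the date column.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_BigTransaction_Reg
ON dbo.BigTransaction_Reg_Columnstore
ON ps_BigTransactionDate (TransactionDate);
GO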

 

After I rebuilt partitions 1-5 with archival Columnstore compression, I saw the following savings in compression:

Table                             Partition   Rows      Data KB   Compression %
BigTransaction_Arch_Columnstore   1           3774500   42480     31.39%
BigTransaction_Arch_Columnstore   2           3152574   35552     31.93%
BigTransaction_Arch_Columnstore   3           2909218   32840     32.40%
BigTransaction_Arch_Columnstore   4           2812282   32072     32.28%
BigTransaction_Arch_Columnstore   5           2660245   30520     32.29%
BigTransaction_Arch_Columnstore   6           2627890   30008     N/A
BigTransaction_Arch_Columnstore   7           2461343   28064     N/A
BigTransaction_Arch_Columnstore   8           2568732   29352     N/A
BigTransaction_Arch_Columnstore   9           8296817   52512     N/A
Average                                                           32.06%

 

So for the partitions I rebuilt, I see an average space savings of 32%, but at what cost to SELECT performance?

Select Performance

The range query I ran selects the average transaction cost grouped by date, with a filter that only pulls in dates from the archived partitions (a sketch of its shape appears below). In terms of pure time to execute, the columnstore archival query ran about 18% longer; however, it consumed 51% more CPU time. Compression is CPU costly, and the more we compress data, the more CPU cycles are required to decompress it. One other point of note: when I partitioned the table (with either form of compression), I see a lot more logical reads for the same query plans. Here the numbers were around 85k and 120k, but if I run the same query against an unpartitioned table with a columnstore index, the count is much closer to 40k. Aaron Bertrand (b|t) pointed out that this might be related to a bug with STATISTICS IO.
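The exact range query isn't reproduced in the post; a sketch of its shape, with a hypothetical cutoff date standing in for the real boundary of the archived partitions:

SELECT TransactionDate
    ,AVG(ActualCost)
FROM dbo.BigTransaction_Arch_Columnstore
WHERE TransactionDate < '2007-01-01'   -- hypothetical cutoff; actual archived-partition boundary not given
GROUP BY TransactionDate
ORDER BY TransactionDate;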

 
So there is some benefit to this feature: used as a cold archive, it can yield storage gains of potentially 30+% with a relatively minor penalty for querying. It is best utilized on data that you don't plan on accessing on a regular basis.
