Performance of SQL Server 2014 Columnstore Archival Mode

I blogged about this in brief a couple of months ago, but as a follow-up to my recent testing of columnstore index update and insert performance, I decided to take another shot at testing the new Columnstore Archival mode of compression in SQL 2014 CTP2. This feature lets you take either an entire columnstore index (a really bad idea) or selected, relatively unused partitions of your columnstore index and compress them into an even smaller archive than a typical columnstore index. Per conversations I had at PASS Summit, Microsoft mentioned seeing compression gains of up to 26%.
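For reference, the rebuild itself looks roughly like this. This is a sketch rather than the exact script I used; the table name matches the archival test table below, and the partition number is whichever cold partition you want to archive:

-- Rebuild a single cold partition into archival columnstore compression
-- (table name and partition number are illustrative)
ALTER TABLE dbo.BigTransaction_Arch_Columnstore
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Rebuilding the entire index this way is also possible but, as noted
-- above, rarely a good idea:
-- ALTER TABLE dbo.BigTransaction_Arch_Columnstore
--     REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);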

The Test

As in my insert and update testing, I used the Make Big AdventureWorks script, courtesy of Adam Machanic (b|t), which you can download here. Specifically, my focus was on the BigTransaction table, which I partitioned by date. Before I compressed anything, the partitions were the following sizes:

| Table                          | Partition | Rows    | Data KB |
|--------------------------------|-----------|---------|---------|
| BigTransaction_Reg_Columnstore | 1         | 3774500 | 42480   |
| BigTransaction_Reg_Columnstore | 2         | 3152574 | 35552   |
| BigTransaction_Reg_Columnstore | 3         | 2909218 | 32840   |
| BigTransaction_Reg_Columnstore | 4         | 2812282 | 32072   |
| BigTransaction_Reg_Columnstore | 5         | 2660245 | 30520   |
| BigTransaction_Reg_Columnstore | 6         | 2627890 | 30008   |
| BigTransaction_Reg_Columnstore | 7         | 2461343 | 28064   |
| BigTransaction_Reg_Columnstore | 8         | 2568732 | 29352   |
| BigTransaction_Reg_Columnstore | 9         | 8296817 | 52512   |
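Those row counts and sizes come from the partition metadata. A query along these lines produces similar numbers; this is a sketch, and the exact KB figure depends on which page-count column of sys.dm_db_partition_stats you choose to report:

-- Rows and space used per partition (used_page_count * 8 gives KB)
SELECT  OBJECT_NAME(ps.object_id) AS table_name,
        ps.partition_number,
        ps.row_count,
        ps.used_page_count * 8 AS data_kb
FROM    sys.dm_db_partition_stats AS ps
WHERE   ps.object_id = OBJECT_ID('dbo.BigTransaction_Reg_Columnstore')
ORDER BY ps.partition_number;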

 

After I rebuilt partitions 1-5 with archival columnstore compression, I saw the following savings:

| Table                           | Partition | Rows    | Data KB | Compression % |
|---------------------------------|-----------|---------|---------|---------------|
| BigTransaction_Arch_Columnstore | 1         | 3774500 | 29144   | 31.39%        |
| BigTransaction_Arch_Columnstore | 2         | 3152574 | 24200   | 31.93%        |
| BigTransaction_Arch_Columnstore | 3         | 2909218 | 22200   | 32.40%        |
| BigTransaction_Arch_Columnstore | 4         | 2812282 | 21720   | 32.28%        |
| BigTransaction_Arch_Columnstore | 5         | 2660245 | 20664   | 32.29%        |
| BigTransaction_Arch_Columnstore | 6         | 2627890 | 30008   | N/A           |
| BigTransaction_Arch_Columnstore | 7         | 2461343 | 28064   | N/A           |
| BigTransaction_Arch_Columnstore | 8         | 2568732 | 29352   | N/A           |
| BigTransaction_Arch_Columnstore | 9         | 8296817 | 52512   | N/A           |
| Average                         |           |         |         | 32.06%        |

 

So for the partitions I rebuilt, I see an average space savings of 32%, but at what cost to select performance?

Select Performance

The range query I ran selected the average transaction cost by date, grouping on the date, with a filter that only pulled in dates from the archived partitions. In terms of pure elapsed time, the columnstore archival query ran about 18% longer; however, it consumed 51% more CPU time. Compression is CPU costly, and the more we compress data, the more CPU cycles are required to decompress it. One other point of note: when I partition the table (with either form of compression), I see a lot more logical reads for the same query plans. Here the numbers were around 85k and 120k, but if I run the same query against an unpartitioned table with a columnstore index, the count is much closer to 40k. Aaron Bertrand (b|t) pointed out that this might be related to a bug in STATISTICS IO.
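The query was shaped roughly like the following. This is a sketch; the date boundaries are illustrative and simply need to fall entirely within the archived partitions:

-- Average transaction cost by date, restricted to the archived date range
SELECT  TransactionDate,
        AVG(ActualCost) AS avg_transaction_cost
FROM    dbo.BigTransaction_Arch_Columnstore
WHERE   TransactionDate >= '20050101'
  AND   TransactionDate <  '20080101'
GROUP BY TransactionDate;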


So there is some benefit to this feature: used as a cold archive, it can yield storage savings of potentially 30+% with a relatively minor penalty for querying. It is best utilized on data that you don't plan on accessing on a regular basis.
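If you do adopt this pattern, it is worth confirming which partitions actually ended up with the archival setting. Something along these lines works (a sketch):

-- data_compression_desc distinguishes COLUMNSTORE from COLUMNSTORE_ARCHIVE
SELECT  p.partition_number,
        p.rows,
        p.data_compression_desc
FROM    sys.partitions AS p
WHERE   p.object_id = OBJECT_ID('dbo.BigTransaction_Arch_Columnstore')
  AND   p.index_id = 1   -- the clustered columnstore index
ORDER BY p.partition_number;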

Columnstore Insert and Update Performance

I had been holding off on writing this blog post while SQL 2014 was in CTP1, since updateable columnstores were definitely a work in progress. I did get several questions about updateable columnstores during my presentation at the recent PASS Summit, so with a fresh VM running SQL 2014 CTP2 on my laptop, I dove right in.

In order to really test any sort of compression technology, you need to work with a large volume of data. For the purposes of my presentations, I’ve been using Adam Machanic’s (b|t) Make Big AdventureWorks script, which you can download here. Specifically, the table I’m working with is the BigTransaction table, which is about 1.1 GB uncompressed and consists of 31,263,601 rows.

Read Performance

Just in case you aren't up to speed on columnstore indexes: they offer much better read performance than traditional compression. A full comparison is outside the scope of this post, but running a test query to pull the average transaction value for a given date, this graph shows the comparison between page compression and columnstore. In case you can't make out the elapsed time for the columnstore query in the chart, it's 227 milliseconds (ms), as opposed to 5627 ms for the page-compressed version of the query.
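The test query was of this general shape, run once against a page-compressed copy of the table and once against the columnstore copy. This is a sketch; the table name and date are illustrative:

-- Average transaction value for a single date
SELECT  AVG(ActualCost) AS avg_transaction_value
FROM    dbo.BigTransaction
WHERE   TransactionDate = '20080615';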


First Things First

One of the caveats of using a clustered columnstore index is that it must include all of the columns in the table, so the syntax is very simple:

CREATE TABLE [dbo].[cluster_Columnstore_load_test](
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [TransactionDate] [datetime] NULL,
    [Quantity] [int] NULL,
    [ActualCost] [money] NULL);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_Test ON cluster_Columnstore_load_test;

Loading the Data

I decided to do a pretty basic test: I took the data from my uncompressed BigTransaction table and did an INSERT INTO ... SELECT. I also decided to fire up an Extended Events session that captured the following events (column_store_index_build_process_segment, columnstore_tuple_mover_begin_compress, columnstore_tuple_mover_end_compress), so I could watch the process as it happened.
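A minimal Extended Events session for this looks roughly like the following. It's a sketch; verify the exact event names against sys.dm_xe_objects on your build:

-- Capture segment builds and tuple mover compression during the load
CREATE EVENT SESSION ColumnstoreLoad ON SERVER
    ADD EVENT sqlserver.column_store_index_build_process_segment,
    ADD EVENT sqlserver.columnstore_tuple_mover_begin_compress,
    ADD EVENT sqlserver.columnstore_tuple_mover_end_compress
    ADD TARGET package0.event_file (SET filename = N'ColumnstoreLoad.xel');
GO

ALTER EVENT SESSION ColumnstoreLoad ON SERVER STATE = START;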

INSERT INTO cluster_Columnstore_load_test SELECT * FROM bigTransaction_UC;

Here were the results.

Table ‘cluster_Columnstore_load_test’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘bigTransaction_UC’. Scan count 1, logical reads 131820, physical reads 473, read-ahead reads 131816, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 63500 ms, elapsed time = 69404 ms.

(31263601 row(s) affected)

I was really surprised at how quickly the data loaded (69 seconds). This morning I was also doing benchmark testing with another vendor's columnar database, and the same load took nearly five minutes. In the Extended Events session, I saw the build_process_segment event firing. Each rowgroup built this way contains a minimum of 102,400 rows, and I observed highly variable row counts (as low as ~120k, as high as ~524k) per rowgroup, which I suspect depends on how well each one compressed.
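Those rowgroup sizes are also visible in the sys.column_store_row_groups DMV, which is new in SQL 2014; a query like this shows them (a sketch):

-- One row per rowgroup: state, row count, deleted rows, and size
SELECT  rg.row_group_id,
        rg.state_description,   -- OPEN / CLOSED (delta store) or COMPRESSED
        rg.total_rows,
        rg.deleted_rows,
        rg.size_in_bytes
FROM    sys.column_store_row_groups AS rg
WHERE   rg.object_id = OBJECT_ID('dbo.cluster_Columnstore_load_test')
ORDER BY rg.row_group_id;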

Here was my CPU during the process: high, but not greater than 80%.


Updating the Data

So my next test was to perform a large update against the data set.

    update cluster_Columnstore_load_test set actualcost = '$50.00'
    where transactionDate > '01-Jan-2009' and transactionDate < '01-Jan-2012';

The results were as follows:

Table ‘cluster_Columnstore_load_test’. Scan count 3, logical reads 156967, physical reads 3636, read-ahead reads 332077, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 176609 ms, elapsed time = 213016 ms.

(8296817 row(s) affected)

From a process (well, Extended Events) perspective, nothing happens at the moment the update is performed. The new row versions go into a special type of b-tree called a delta store, and the old versions are flagged as deleted. Rows accumulate in a delta rowgroup until it fills (at 1,048,576 rows) and is closed, at which point the background tuple mover compresses it. This is what lets the columnstore index be updated asynchronously; in my case, about 5 minutes later I saw all three of my events fire: the begin compress, followed by build_process_segment, followed by the end compress. This was accompanied by some spiky CPU, which I would expect.
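Running the rowgroup query from above immediately after the update, and again once the tuple mover has run, makes this visible: right after the update the new versions sit in OPEN or CLOSED delta rowgroups, and the compressed rowgroups that held the old versions show a non-zero deleted_rows count. A sketch:

-- Rowgroups touched by the update: delta store entries plus delete-bitmap hits
SELECT  rg.row_group_id,
        rg.state_description,
        rg.total_rows,
        rg.deleted_rows
FROM    sys.column_store_row_groups AS rg
WHERE   rg.object_id = OBJECT_ID('dbo.cluster_Columnstore_load_test')
  AND   (rg.state_description <> 'COMPRESSED' OR rg.deleted_rows > 0);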


So How Does Page Compression Compare?

So I ran the same series of tests using a page-compressed version of the table, with a clustered index on transactionid, productid, and transactiondate; a sketch of that setup is below, followed by what happened on the initial insert.
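Here is roughly what that setup looked like. This is a sketch, not my exact script; it assumes the clustered index key was TransactionID, ProductID, TransactionDate and mirrors the column list of the columnstore test table above:

-- Page-compressed copy of the table with a conventional clustered index
CREATE TABLE [dbo].[page_test_load](
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [TransactionDate] [datetime] NULL,
    [Quantity] [int] NULL,
    [ActualCost] [money] NULL);

CREATE CLUSTERED INDEX CI_page_test_load
    ON dbo.page_test_load (TransactionID, ProductID, TransactionDate)
    WITH (DATA_COMPRESSION = PAGE);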

insert into page_test_load select * from bigtransaction_uc;

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 2 ms.

Table ‘page_test_load’. Scan count 0, logical reads 101434452, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘bigTransaction_UC’. Scan count 1, logical reads 143648, physical reads 1, read-ahead reads 143658, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(31263601 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 220750 ms, elapsed time = 235712 ms.

This is roughly 3.4x slower (235712 ms vs. 69404 ms) than the same insert into the clustered columnstore. The reason is that the load path into a clustered columnstore has been heavily optimized for bulk inserts. From the Microsoft SIGMOD white paper on the subject:

Given the significant performance advantage of the bulk insert API over traditional insert statements, the query execution engine will internally use the bulk insert API for “insert into … select from …” statements that are targeting a column store. This operation is fairly frequent in ETL pipelines and this change allows it to create directly the highly efficient compressed format.

The page compression algorithm isn't actually applied during this load, since I'm not doing an INSERT INTO ... WITH (TABLOCK), so these rows aren't being page compressed as they are loaded. But even with only row compression being applied on the way in, performance is considerably worse than with the columnstore.
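For completeness, the bulk-optimized variant alluded to above would look roughly like this. A sketch; the TABLOCK hint is what lets the insert take the bulk-load code path:

-- Same load, but with TABLOCK so the bulk insert path can kick in and
-- compression can be applied as the data is loaded
INSERT INTO dbo.page_test_load WITH (TABLOCK)
SELECT  *
FROM    dbo.bigTransaction_UC;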

I did do a test with the Import/Export Wizard (which uses the bulk insert API), and performance was far worse: 464444 ms (or 7 min 44 sec).

So let’s take a look at the update on the page compressed table.

update page_test_load set actualcost = '$50.00'
where transactionDate > '01-Jan-2009' and transactionDate < '01-Jan-2012';

SQL Server parse and compile time:

CPU time = 579 ms, elapsed time = 809 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Table ‘page_test_load’. Scan count 1, logical reads 247587, physical reads 1, read-ahead reads 42147, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(8296817 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 53781 ms, elapsed time = 57899 ms.

Interestingly, the update was much faster here (just under 4x). That's mainly because the columnstore processes an update as a delete plus an insert.

The performance comparison is summarized in the chart below.

What's interesting about this data is that the proportion of CPU time to total time is roughly the same in each case, and that updating 8M rows in the columnstore takes roughly as long as inserting 31M rows into the page-compressed table, and vice versa.

So what does this tell us? Even though we can now update columnstore indexes, they are probably still best suited to insert-only workloads. Anything that requires frequent, quick updates isn't a great candidate for a columnstore index.


 
