I blogged about this briefly a couple of months ago, but as a follow-up to my recent testing of columnstore index update and insert performance, I decided to take another look at the new columnstore archival compression mode in SQL Server 2014 CTP2. This feature lets you take either an entire columnstore index (generally a bad idea) or selected, relatively unused partitions of your columnstore index, and compress them into an even smaller archive than a typical columnstore index provides. In conversations I had at PASS Summit, Microsoft mentioned seeing compression gains of up to 26%.
The Test
As in my insert and update testing, I used the Make Big AdventureWorks script, which you can download here courtesy of Adam Machanic (b|t). Specifically, my focus was on the BigTransaction table, which I partitioned by date. Before I compressed, the partitions were the following sizes:
| Table | Partition | Rows | Data KB |
|---|---|---|---|
| BigTransaction_Reg_Columnstore | 1 | 3774500 | 29144 |
| BigTransaction_Reg_Columnstore | 2 | 3152574 | 24200 |
| BigTransaction_Reg_Columnstore | 3 | 2909218 | 22200 |
| BigTransaction_Reg_Columnstore | 4 | 2812282 | 21720 |
| BigTransaction_Reg_Columnstore | 5 | 2660245 | 20664 |
| BigTransaction_Reg_Columnstore | 6 | 2627890 | 30008 |
| BigTransaction_Reg_Columnstore | 7 | 2461343 | 28064 |
| BigTransaction_Reg_Columnstore | 8 | 2568732 | 29352 |
| BigTransaction_Reg_Columnstore | 9 | 8296817 | 52512 |
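To gather per-partition row counts and sizes like those above, a query against `sys.dm_db_partition_stats` works; this is a minimal sketch (the table name matches the test table above, and `used_page_count * 8` converts 8 KB pages to KB):

```sql
-- Per-partition row counts and used space, in KB
SELECT  partition_number,
        row_count,
        used_page_count * 8 AS data_kb   -- pages are 8 KB each
FROM    sys.dm_db_partition_stats
WHERE   object_id = OBJECT_ID('dbo.BigTransaction_Reg_Columnstore')
ORDER BY partition_number;
```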
After I rebuilt partitions 1-5 with archival columnstore compression, I saw the following compression savings:
| Table | Partition | Rows | Data KB | Compression % |
|---|---|---|---|---|
| BigTransaction_Arch_Columnstore | 1 | 3774500 | 42480 | 31.39% |
| BigTransaction_Arch_Columnstore | 2 | 3152574 | 35552 | 31.93% |
| BigTransaction_Arch_Columnstore | 3 | 2909218 | 32840 | 32.40% |
| BigTransaction_Arch_Columnstore | 4 | 2812282 | 32072 | 32.28% |
| BigTransaction_Arch_Columnstore | 5 | 2660245 | 30520 | 32.29% |
| BigTransaction_Arch_Columnstore | 6 | 2627890 | 30008 | N/A |
| BigTransaction_Arch_Columnstore | 7 | 2461343 | 28064 | N/A |
| BigTransaction_Arch_Columnstore | 8 | 2568732 | 29352 | N/A |
| BigTransaction_Arch_Columnstore | 9 | 8296817 | 52512 | N/A |
| **Average** | | | | 32.06% |
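For reference, rebuilding a range of partitions with archival compression uses the `COLUMNSTORE_ARCHIVE` data compression option; this is a sketch against the test table (partition numbers match the ones rebuilt above):

```sql
-- Rebuild only partitions 1-5 with archival columnstore compression,
-- leaving partitions 6-9 with regular columnstore compression
ALTER TABLE dbo.BigTransaction_Arch_Columnstore
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (1 TO 5));
```

To move a partition back to regular columnstore compression later, rebuild it with `DATA_COMPRESSION = COLUMNSTORE`.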
So for the partitions I rebuilt, I see an average space savings of 32%, but at what cost to select performance?
Select Performance
The range query I ran selected average transaction cost and date, grouped by date, with a filter to pull in only dates in the archived partitions. In terms of pure execution time, the query against the archival columnstore ran about 18% longer; however, it consumed 51% more CPU time. Compression is CPU costly, and the more we compress data, the more CPU cycles are required to decompress it. One other point of note: when I partitioned the table (with either form of compression), I saw far more logical reads for the same query plans. Here the numbers were around 85k and 120k, but if I run the same query against an unpartitioned table with a columnstore index, the count is much closer to 40k. Aaron Bertrand (b|t) pointed out that this might be related to a bug in STATISTICS IO.
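The range query described above would look roughly like this; the column names (`Amount`, `TransactionDate`) come from Adam Machanic's bigTransaction schema, and the date boundary is a hypothetical value covering the archived partitions:

```sql
-- Sketch of the test range query: average transaction cost by date,
-- filtered to dates that fall in the archival-compressed partitions
SELECT  TransactionDate,
        AVG(Amount) AS AvgTransactionCost
FROM    dbo.BigTransaction_Arch_Columnstore
WHERE   TransactionDate < '20080101'   -- hypothetical partition boundary
GROUP BY TransactionDate;
```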
So there is some benefit to this feature: used as a cold archive, it can yield storage gains of potentially 30+% with a relatively minor query penalty. It is best used on data you don't plan to access on a regular basis.