ColumnStore Archival Compression–SQL Server 2014

Spread the love

Since I’m fortunate enough to be presenting on Data Compression at the PASS Summit this year, so in addition to doing a lot of work with both data compression and ColumnStore indexes which were a new feature starting in SQL Server 2012 Enterprise Edition. In 2012, while these column oriented indexes gave us a great deal of both compression and performance (via both deep compression and a new query processing mode called batch) they had a great deal of limitations—the biggest one being that they were non-updateable, and the second biggest one being that they could not be clustered indexes, so we had to store duplicated copies of our data.

Before I go any further, for a great deal more detail I have to recommend my colleague Niko Neugebauer’s in-depth blog series on clustered ColumnStore indexes—Niko is going through the ins and outs of this feature that is new for SQL Server 2014, I am just briefly touching in this post. Additionally, there is a really great white paper from Microsoft research that also goes pretty in-depth on the enhancements to this feature for SQL 2014 here. One thing to note—some of the features the white paper promises (such as removal of some of the data type restrictions) are not implemented in CTP1 of SQL 2014 as far as I can tell.

However, the interesting feature I wanted to touch on the new columnstore_archive compression mode option. From the above white paper:

For example, the data may be partitioned by date and the most recent data is accessed much more frequently than older data. In such cases the older data can benefit from additional compression at the cost of slower query performance. To enable this scenario we added support for archival compression of SQL Server column stores. … The further reduction obtained by archival compression is substantial, ranging from 37% to 66% depending on the data. We also compared with GZIP compression of the raw data. Archival compression consistently achieved a better compression ratio, sometimes considerably better.

–source “Enhancements to SQL Server Column Stores, Larson et al. June 2013”

So this feature looks like it could really attractive, particularly for scenarios where we have archival data that we are really only keeping around for regulatory reasons. Additionally, this has been configured in such a way that we can partition a table with a clustered columnstore index, and then change the compression mode for individual partitions, limiting our performance impact to the older records that we are querying less. There’s not a lot of documentation on this that I’ve seen, so I will show how to do I here.

The first thing I did was load up a copy of the AdventureWorksDW2012 and made a copy of dbo.FactResellerSales—note, clustered columnstore indexes do not currently support foreign key relationships (which was why I created a copy of the table—to get rid of the FKs). I then partitioned the table on date range, and then proceeded to create a clustered ColumnStore index on it.

 

By default all of the columns in the table get added to the index. The syntax for creation is below—note we don’t any compression options in the index build process.

 

 

However, after we build the ColumnStore, if we right click on the table it belongs to in SSMS and select storage, we get the following menu:

I’ve chosen to choose Archival Columnstore compression for the last three partitions in that table. In T-SQL that looks like this:

ALTER
TABLE [dbo].[fact_Cs_test] REBUILD
PARTITION
= 5 WITH(DATA_COMPRESSION = COLUMNSTORE_ARCHIVE )

USE [AdventureWorksDW2012]

ALTER
TABLE [dbo].[fact_Cs_test] REBUILD
PARTITION
= 6 WITH(DATA_COMPRESSION = COLUMNSTORE_ARCHIVE )

USE [AdventureWorksDW2012]

ALTER
TABLE [dbo].[fact_Cs_test] REBUILD
PARTITION
= 7 WITH(DATA_COMPRESSION = COLUMNSTORE_ARCHIVE )

USE [AdventureWorksDW2012]

ALTER
TABLE [dbo].[fact_Cs_test] REBUILD
PARTITION
= 8 WITH(DATA_COMPRESSION = COLUMNSTORE_ARCHIVE )

 

So let’s see what kind of space savings we get from this process. First we’ll look at the space used by the basic ColumnStore compressed partitions:

I used the below query to show the space used per partition:

select b.name, a.partition_number, a.rows,

((c.in_row_data_page_count+c.lob_used_page_count+c.row_overflow_used_page_count)*8)
as
‘Data KB’


from
sys.partitions a, sys.objects b, sys.dm_db_partition_stats c

where a.object_id=b.object_id and b.name=‘Fact_CS_Test’ and a.partition_id=c.partition_id;

 

So here is the space, before I changed the partitions over to archival ColumnStore compression.

After I compressed partitions 5-8 using archive compression the space utilization was as below:

 

So we averaged 10.5% more compression over existing ColumnStore compression, this is lower than Microsoft observed, but I also didn’t really evaluate my data to see how well it would compress. Archival compression has the potential to be a really useful feature, particularly for firms who’s regulatory requirements mean they need to keep data online for extended periods of time.

4 thoughts on “ColumnStore Archival Compression–SQL Server 2014

  1. Pingback: SQL Server 2014: Columnstore Index improvements | James Serra's Blog

  2. Pingback: Compression Presentation Resources | The SQL Herald

  3. Pingback: Performance of SQL Server 2014 Columnstore Archival Mode | The SQL Herald

  4. Pingback: Data Compression | Simple SQL Server

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.