Columnstore Insert and Update Performance


I had been holding off on writing this blog post while SQL 2014 was in CTP1, since updateable columnstores were definitely still a work in progress. I did get several questions about updateable columnstores during my presentation at the recent SQL PASS Summit, so with a fresh VM running SQL 2014 CTP2 on my laptop, I dove right in.

In order to really test any sort of compression technology, you need to work with a large volume of data. For the purposes of my presentations, I’ve been using Adam Machanic’s (b|t) Make Big AdventureWorks script, which you can download here. Specifically, the table I’m working with is the BigTransaction table, which is about 1.1 GB uncompressed and consists of 31,263,601 rows.

Read Performance

Just in case you aren't up to speed on columnstore indexes: they offer much better read performance than traditional compression. A full read-performance comparison is outside the scope of this post, but as a quick test I ran a query to pull the average transaction value for a given date, and the chart below compares page compression to the columnstore. You can't really see the elapsed time for the columnstore query in the chart; it's 227 milliseconds (ms), as opposed to 5,627 ms for the page compressed version of the same query.
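The test query was shaped something like this; this is a sketch rather than the exact statement I ran, the date is arbitrary, and the table name stands in for whichever copy of BigTransaction (page compressed or columnstore) is being tested:

SELECT AVG(ActualCost) AS AvgTransactionValue
FROM dbo.bigTransaction            -- swap in the page compressed or columnstore copy
WHERE TransactionDate >= '2009-06-01'
  AND TransactionDate <  '2009-06-02';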

[Chart: query elapsed time, page compression vs. clustered columnstore]

First Things First

One of the caveats of using a clustered columnstore index is that it must include all of the columns in the table, so the syntax is very simple.

CREATE TABLE [dbo].[cluster_Columnstore_load_test] (
    [TransactionID] [int] NOT NULL,
    [ProductID] [int] NOT NULL,
    [TransactionDate] [datetime] NULL,
    [Quantity] [int] NULL,
    [ActualCost] [money] NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_Test ON cluster_Columnstore_load_test;

Loading the Data

I decided to do a pretty basic test: I took the data from my uncompressed BigTransaction table and did an INSERT INTO ... SELECT. I also decided to fire up an Extended Events session that captured the following events (column_store_index_build_process_segment, columnstore_tuple_mover_begin_compress, columnstore_tuple_mover_end_compress), so I could see the process as it happened.
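The session can be scripted roughly like this; this is a sketch, and the session name and file target here are placeholders rather than what I actually used:

CREATE EVENT SESSION columnstore_load_watch ON SERVER
ADD EVENT sqlserver.column_store_index_build_process_segment,
ADD EVENT sqlserver.columnstore_tuple_mover_begin_compress,
ADD EVENT sqlserver.columnstore_tuple_mover_end_compress
ADD TARGET package0.event_file (SET filename = N'columnstore_load_watch.xel');

ALTER EVENT SESSION columnstore_load_watch ON SERVER STATE = START;

The load itself was just an INSERT ... SELECT: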

INSERT INTO cluster_Columnstore_load_test SELECT * FROM bigTransaction_UC;

Here were the results.

Table ‘cluster_Columnstore_load_test’. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘bigTransaction_UC’. Scan count 1, logical reads 131820, physical reads 473, read-ahead reads 131816, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 63500 ms, elapsed time = 69404 ms.

(31263601 row(s) affected)

I was really surprised at how quickly the data loaded (69 seconds). This morning I was also doing benchmark testing with another vendor's columnar database, and the same load took nearly five minutes. In the xEvents session, I saw the build_process_segment event firing. Each rowgroup contains a minimum of 102,400 rows, and I observed highly variable sizes (as low as about 120,000 rows, as high as about 524,000) per rowgroup, which I suspect depends on the compression achieved.
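If you want to check the rowgroup sizes yourself, the sys.column_store_row_groups DMV (new in SQL 2014) exposes the row count and state of each rowgroup; a quick sketch against the test table:

SELECT row_group_id,
       state_description,
       total_rows,
       size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.cluster_Columnstore_load_test')
ORDER BY row_group_id;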

Here was my CPU during the process: high, but never greater than 80%.

[Chart: CPU utilization during the columnstore load]

Updating the Data

So my next test was to perform a large update against the data set.

    update cluster_Columnstore_load_test set actualcost = '$50.00'
    where transactionDate > '01-Jan-2009' and transactionDate < '01-Jan-2012'

The results were as follows:

Table ‘cluster_Columnstore_load_test’. Scan count 3, logical reads 156967, physical reads 3636, read-ahead reads 332077, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 176609 ms, elapsed time = 213016 ms.

(8296817 row(s) affected)

From a process (well, xEvent) perspective, nothing happens at the moment the update is performed. The new row versions are written into a special type of b-tree called a delta store, and the old versions are marked as deleted; a delta store rowgroup is closed once it fills (at 1,048,576 rows). This gives the performance benefit of updating the columnstore index asynchronously. In my case, about five minutes later I saw all three of my events fire: the begin compress, followed by build_process_segment, followed by the end compress. This was also accompanied by some spiky CPU, which I would expect.

[Chart: CPU utilization during tuple mover compression]
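If you don't want to wait for the tuple mover, you can watch the delta store in the same DMV, and in SQL 2014 an ALTER INDEX ... REORGANIZE will compress any CLOSED delta rowgroups on demand; a sketch:

-- Delta store rowgroups show up as OPEN or CLOSED; compressed rowgroups as COMPRESSED.
SELECT row_group_id, state_description, total_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.cluster_Columnstore_load_test');

-- Compress CLOSED delta rowgroups now rather than waiting for the tuple mover.
ALTER INDEX CCI_Test ON dbo.cluster_Columnstore_load_test REORGANIZE;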

So How Does Page Compression Compare?

So I ran the same series of tests using a page compressed version of the table, with a clustered index on TransactionID, ProductID, and TransactionDate. Here's what happened on the initial insert:
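For reference, the page compressed table was built along these lines; this is a reconstruction rather than the exact script, and the index name is a placeholder:

CREATE TABLE dbo.page_test_load (
    TransactionID   int      NOT NULL,
    ProductID       int      NOT NULL,
    TransactionDate datetime NULL,
    Quantity        int      NULL,
    ActualCost      money    NULL
);

CREATE CLUSTERED INDEX CIX_page_test_load
ON dbo.page_test_load (TransactionID, ProductID, TransactionDate)
WITH (DATA_COMPRESSION = PAGE);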

insert into page_test_load select * from bigtransaction_uc;

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 2 ms.

Table ‘page_test_load’. Scan count 0, logical reads 101434452, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘bigTransaction_UC’. Scan count 1, logical reads 143648, physical reads 1, read-ahead reads 143658, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(31263601 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 220750 ms, elapsed time = 235712 ms.

This is roughly 3.4x slower than the same insert into the clustered columnstore. The reason is that the columnstore load path has been heavily optimized around the bulk insert API. From the Microsoft SIGMOD white paper on the subject:

Given the significant performance advantage of the bulk insert API over traditional insert statements, the query execution engine will internally use the bulk insert API for “insert into … select from …” statements that are targeting a column store. This operation is fairly frequent in ETL pipelines and this change allows it to create directly the highly efficient compressed format.

The page compression algorithm isn't actually applied here, since I'm not doing an INSERT INTO ... WITH (TABLOCK), so these rows aren't being page compressed as they are loaded. But even with only row compression being applied, the performance is worse than with the columnstore.
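If you do want to engage the bulk load path against the page compressed table, the statement would look something like this; a sketch, and whether the rows actually come out page compressed during the load also depends on the table structure and recovery model:

INSERT INTO dbo.page_test_load WITH (TABLOCK)
SELECT * FROM dbo.bigTransaction_UC;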

I also ran a test with the Import/Export Wizard (which uses the bulk insert API), and the performance was far worse: about 7 minutes and 44 seconds.

So let’s take a look at the update on the page compressed table.

update page_test_load set actualcost = '$50.00'
where transactionDate > '01-Jan-2009' and transactionDate < '01-Jan-2012'

SQL Server parse and compile time:

CPU time = 579 ms, elapsed time = 809 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Table ‘page_test_load’. Scan count 1, logical reads 247587, physical reads 1, read-ahead reads 42147, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(8296817 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 53781 ms, elapsed time = 57899 ms

Interestingly, the update on the page compressed table was much faster (just under 4x). That's mainly because the columnstore processes an update as a delete followed by an insert.
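You can actually see the delete-plus-insert behavior in the rowgroup metadata: after the update, the compressed rowgroups show a deleted_rows count, and the new versions of the rows sit in the delta store until the tuple mover gets to them. A quick sketch:

SELECT row_group_id, state_description, total_rows, deleted_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.cluster_Columnstore_load_test')
  AND (deleted_rows > 0 OR state_description <> 'COMPRESSED');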

The performance comparison is summarized in the chart below.

What's interesting about this data is that the proportion of CPU time to total elapsed time is roughly the same in each case, and that updating 8 million rows in the columnstore takes about as long as inserting 31 million rows into the page compressed table, and vice versa.

So what does this tell us? Even though we can update columnstore indexes now, they are probably still best suited to insert-only workloads. Anything that requires frequent, quick updates isn't a great candidate for a columnstore index.


 

4 thoughts on “Columnstore Insert and Update Performance”

  1. Pingback: Resources from Columnstore Index Presentation | The SQL Herald

  2. nishant

    Thanks for your wonderful article. I would also like to share a few of my observations. I did a POC inserting/updating 500 rows from stage to live and found that performance degraded by around 2% with a clustered columnstore index in SQL 2016.
    I am really disappointed that the performance MS claims doesn't show up on the ground; the load still requires more time.

  3. jdanton1

    Nishant,

    You are testing the wrong type of load. The insert process for loads of fewer than 102,400 rows is exactly the same as loading a regular b-tree table; that code has not changed in many versions of SQL Server. I recreated your test on both 2014 and 2016 and saw no significant difference in performance. In fact, those loads are small enough that you will always see some degree of variability, especially in a test environment. Try bumping your load volume to 102,500 rows and you should see an improvement in performance.

  4. Pingback: What is the best way to store optional values in table?

