The Curious Case of the HTDELETE Wait Type

I was working with a client this week and we encountered very long HTDELETE waits on “insert as select” queries that were part of their data delivery process. This wait type isn’t documented very well; SQLskills documents it here and mentions this:

“Typically these waits occur when queries involve columnstore indexes, but they can also occur without columnstore indexes being involved if a hash operator runs in batch mode.”

Nacho from Microsoft also has a blog post on it here. My theory was that a bad hash join was taking place and causing the wait.

Isolating the Plan

The thing that was very curious about the situation is that the waits were only occurring on the readable secondary replica. At first, I tried to examine the query store to understand whether there were multiple execution plans for a given query. The one problem with that is that the readable secondary copy of the data is read-only, which means on that secondary replica you only see the query store data from the primary replica. If there was a plan that was specific to the secondary, I’d have to gather it from the plan cache on the secondary (thanks to Erin Stellato (b|t) for this idea). There was one other problem—the code in question was executing as dynamic SQL from a stored procedure, which meant it was always getting a new execution plan.
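Pulling plans from the plan cache on the secondary is simple enough. A minimal sketch (the text filter is hypothetical; you would match on some distinctive fragment of the dynamic SQL):

-- Run on the readable secondary
SELECT  cp.usecounts,
        cp.objtype,
        st.text,
        qp.query_plan
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE   st.text LIKE N'%INSERT%SELECT%';  -- hypothetical filter for the dynamic statement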

Ultimately, I think any thought of the readable secondary having a vastly different plan was a red herring. Statistics are going to be the same on both instances, and if there were a missing statistic on the secondary, SQL Server would create it in TempDB. Besides, columnstore indexes don’t use statistics in the traditional sense.

Fortunately I was able to catch a query in the process of waiting on HTDELETE, so I no longer had to look for the needle in the haystack, and I could get to tuning the plans. I was able to grab the SELECT part of the query and generate an estimated plan on both the primary and secondary nodes. The plans were virtually the same on both nodes, with just a minor difference in memory grant between them.
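Catching it in the act just means watching active requests for that wait type. Something along these lines (a sketch):

-- Sessions currently waiting on HTDELETE, with their text and plans
SELECT  r.session_id,
        r.wait_type,
        r.wait_time,
        t.text,
        p.query_plan
FROM    sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS p
WHERE   r.wait_type = N'HTDELETE';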

 


The anonymized query is below:

SELECT    Object4.Column1
FROM    Database1.Schema1.Object5 Object4
    INNER JOIN Database2.Schema1.Object6(?, ?) Object7 ON Object4.Column2 = Object7.Column3
    INNER JOIN Database2.Schema1.Object8(?) Object9 ON Object4.Column4 = Object9.Column4 
    INNER JOIN Database1.Schema1.Object10 Object11 ON Object4.Column5 = Object11.Column6
    INNER JOIN Database2.Schema1.Object12(?) Object13 ON Object11.Column7 = Object13.Column7
WHERE    1 = 1
    AND Object4.Column8 >= '01-Jan-2017'

The pattern here was that we were taking all of the rows of an ID field in a columnstore index with about 350MM rows and joining them to a function that returns about 3,500 rows. My gut instinct was that this was a bad match for batch mode hashing. Additionally, SQL Server was recommending that I create a b-tree index on the large columnstore table. There was a key lookup in the plan that I wanted to eliminate, but my hunch was that this join was causing the waits.


 

The Solution

Before I created the index, the query was taking at least 2-4 minutes when it wasn’t getting hung up on the HTDELETE wait. After I created the first index, we got down to about 15 seconds. SQL Server then recommended that I create another index on one of the join tables, which brought my query time down to sub-second. The plan looked a lot more traditional and had lots of my favorite operator, the index seek.
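The indexes themselves were nothing exotic: ordinary nonclustered b-tree indexes layered on top of the columnstore data (supported alongside clustered columnstore indexes from SQL Server 2016 onward). A sketch with hypothetical table and column names, keyed on the join and lookup columns from the plan:

-- Hypothetical names; the real index covered the join column and the lookup columns
CREATE NONCLUSTERED INDEX IX_BigFact_Column2
ON dbo.BigFact (Column2)
INCLUDE (Column1, Column8);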


 

The Moral of the Story

Sometimes you need nonclustered indexes on columnstore indexes. It stinks, because they do add space, but it’s hard to argue with a performance gain like this. I need to email some friends on the product team to confirm, but I’m theorizing that the hash join was super expensive and was causing the query to hang. Either way, the real answer is to never stop tuning and to trust your instincts.

Thanks to Sentry One for making Plan Explorer free. I used it for the screenshots and anonymization in this post.

Where Should I Use a Columnstore Index in SQL Server?

I’ve been blogging a lot about columnstore indexes in SQL Server 2014—for good reason; I think it is a great feature that adds a lot of value to SQL Server for analytic workloads. One of the other (and probably more talked about) features in SQL Server 2014 is In-Memory OLTP (formerly known as Hekaton). The SQL team added a neat new report, accessible to users of the Management Data Warehouse (MDW), that shows which tables and procedures in a given database might be well suited for In-Memory OLTP. There is nothing like that yet for columnstore indexes.

 

Figure 1 SQL 2014 Management Data Warehouse In-Memory Tables Assistant

 

Microsoft’s official guidance is to use columnstore indexes on large fact tables and on dimensions greater than 1,000,000 rows. Typically (and this applies to regular page compression as well), you are going to look at tables that don’t have a very high update percentage as columnstore candidates. Despite the ability to update clustered columnstore indexes, this feature remains largely focused on the append-only workloads present in data warehousing systems. The insert process has been designed around doing bulk inserts very efficiently, not around singleton updates and inserts.

So I wanted to put together some logic to identify candidate tables. The script below checks for tables with more than 1,000,000 rows and an update percentage below 20 (both thresholds are easy enough to modify); the output is the list of candidate tables for columnstore indexes. Additionally, it won’t return any tables that already have a columnstore index on them.

with widetables as
(
    select a.name,
        count(b.column_id) as ColumnCount,
        sum(c.row_count) as Row_Count
    from sys.objects a
        inner join sys.columns b on b.object_id = a.object_id
        inner join sys.dm_db_partition_stats c on c.object_id = a.object_id
        inner join sys.partitions d on d.object_id = a.object_id
    where d.data_compression not in (3, 4) -- exclude tables that already have a columnstore index
    group by a.name
    having count(b.column_id) > 5
        and count(b.column_id) < 1200
        and sum(c.row_count) > 1000000
),
--Check Update %
--Checks for Percentage of Update Statements
updaterate as
(
    SELECT o.NAME AS [Table_Name]
        ,x.NAME AS [Index_Name]
        ,i.partition_number AS [Partition]
        ,i.index_id AS [Index_ID]
        ,x.type_desc AS [Index_Type]
        ,i.leaf_update_count * 100.0 /
            (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count
             + i.leaf_page_merge_count + i.singleton_lookup_count) AS [Percent_Update]
    FROM sys.dm_db_index_operational_stats(db_id(), NULL, NULL, NULL) i
        INNER JOIN sys.objects o ON o.object_id = i.object_id
        INNER JOIN sys.indexes x ON x.object_id = i.object_id
            AND x.index_id = i.index_id
    WHERE (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count
           + i.leaf_page_merge_count + i.singleton_lookup_count) != 0
        AND objectproperty(i.object_id, 'IsUserTable') = 1
        AND i.leaf_update_count * 100.0 /
            (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count
             + i.leaf_page_merge_count + i.singleton_lookup_count) > 20
)
select name as CandidateTables from widetables
except
select table_name from updaterate;

 

I hope this script is useful, and can provide some nice guidance as to where to use a columnstore index.

 

Columnstore Indexes and Transparent Data Encryption in SQL Server 2014

I spoke on columnstore indexes in SQL 2014 last weekend at SQL Saturday #262 in Boston, and an audience question came up about using Transparent Data Encryption (TDE) in conjunction with columnstore indexes. Off the top of my head, I couldn’t think of any reason why it would be a problem (I was correct), but I also wanted to test it out and check the performance and compression levels. I used the standard BigTransactionHistory table (courtesy of Adam Machanic, b|t) for all testing.

Testing

The first thing I did was create a new database with Transparent Data Encryption turned on.

USE master;
GO

CREATE DATABASE TDE_TEST;
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Demo4Blog';
GO

CREATE CERTIFICATE DemoBlog WITH SUBJECT = 'DemoBlogCert';
GO

USE TDE_Test;
GO

CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE DemoBlog;
GO

ALTER DATABASE TDE_Test SET ENCRYPTION ON;
GO

 

Next, I created a table and a clustered columnstore index in my new database.

select *
into tde_test.dbo.TDE_CS_Test
from bigtransaction_cs;

create clustered columnstore index CCI_TDE_Test
on tde_test.dbo.TDE_CS_Test;

 

BigTransaction_CS is my source table, which lives in the unencrypted AdventureWorks2012 database—I’m just making a copy of it in the encrypted TDE_Test database created above.

Size Matters

Using a query to find the sizes of objects in my databases I examined the size of each object.
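The size query is the usual pattern over sys.tables, sys.indexes, sys.partitions, and sys.allocation_units. A minimal sketch of that pattern (not the exact script I ran):

SELECT  t.name AS TableName,
        i.name AS indexName,
        SUM(p.rows) AS RowCounts,   -- can be overstated when a partition has multiple allocation units
        SUM(a.total_pages) AS TotalPages,
        SUM(a.total_pages) * 8 / 1024 AS TotalSpaceMB
FROM sys.tables t
    INNER JOIN sys.indexes i ON t.object_id = i.object_id
    INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
    INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name, i.name
ORDER BY TotalSpaceMB DESC;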

TableName               IndexName                                   RowCounts   TotalPages   TotalSpaceMB
Columnstore             ClusteredColumnStoreIndex-20140219-151205   62527202    38570        301
Encrypted_Columnstore   CCI_TDE_Test                                62527202    43322        338
TDE_NoCompression       NULL                                        31263601    143441       1120
Clustered_NoEncrypt     pk_bigTransactionHistory                    31263601    143693       1122

 

As you can see, the row counts are the same, but the page counts and space used are higher for the encrypted copy—this is one of the penalties of TDE: the data won’t compress as well. However, the difference in compression is only about 10%. Despite this, the compression effects are still rather dramatic, even with encryption. The chart below shows the compressed versus uncompressed sizes of the data.

Figure 1 Size of uncompressed and compressed tables with and without encryption

Performance

In my demos on this topic, I have a fairly standard query that I use. Each query was run 6 times, with the last 5 runs counting for the data.

SELECT transactiondate
    ,avg(quantity)
    ,avg(actualcost)
FROM bigtransaction_cs
WHERE TransactionDate < '2007-07-01'
    AND Quantity > 70
    AND Quantity < 92
GROUP BY TransactionDate
ORDER BY transactionDate;

 

Encryption typically does have a negative impact on query performance due to the CPU overhead, and that is clearly illustrated below.

Figure 2 Query Performance with Encryption

There’s roughly a 36% performance degradation when using TDE—based on the logical read counts from the queries, it is all related to the encryption overhead and not really affected by the smaller difference in compression.

Summary

Generally speaking, if you are using encryption, there is some legal or regulatory reason for it and you don’t have any other options. Just be aware that you can still get pretty good compression levels using it in conjunction with clustered columnstore indexes in SQL Server 2014, even though your performance may not be as amazing as it otherwise would be.

A Little Bit More on Columnstore (SQL 2014) Internals

I wrote a post on Monday with some details about the columnstore index insert and update process. I mistakenly posted initially (I’ve since updated the post) that the tuple mover process kicked in when 1,000 rows were loaded (thanks to Jimmy May from Microsoft, b|t)—the correct answer is 2^20, or 1,048,576, rows. Remus confirms this here, but I wanted to test out the process for myself, and I found some interesting things.

Before I go too deep into my lab work, here is why this matters to the columnstore index performance story. First, when a columnstore index has open delta stores, it needs to scan those b-tree structures in order to answer the query. This is an expensive operation, as it prevents the query from being completely executed in batch mode, and in general inserts that land in a delta store are slower, because they miss the optimized bulk insert API Microsoft has written for columnstore. Additionally, we would like our columnstore row groups to be as large as possible to optimize segment elimination when answering the query. Let’s examine some of the impact.

Batch Loading

I’m not sure that this is documented anywhere, but the threshold to trigger the bulk insert API appears to be 102,400 rows—if we do an insert of only 102,399 rows, we see an open delta store. Let’s see it in action.
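The test load was just an INSERT from the source table with a TOP to control the row count. A sketch, assuming a copy of the big transaction table with a clustered columnstore index and a matching column list (names are hypothetical):

-- 102,400 rows is enough to trigger the bulk insert API and bypass the delta store
INSERT INTO dbo.bigtransaction_cs
SELECT TOP (102400) *
FROM dbo.bigTransactionHistory;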

Note the message “Column 0 was omitted from VertiPaq during column index build.”—that’s indicative of the data being processed directly into the columnstore. Also note the time: 279 ms to load 102,400 rows. If we check sys.column_store_row_groups for the table, we can see where those rows landed.
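A sketch of that check (the table name is hypothetical):

-- Row group states for the target table
SELECT  row_group_id,
        state_description,
        total_rows,
        deleted_rows,
        size_in_bytes
FROM    sys.column_store_row_groups
WHERE   object_id = OBJECT_ID('dbo.bigtransaction_cs');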

As we can see, that load was added to the columnstore and compressed. Note that 102,400 rows in our row group is not optimal for performance; we’d really like it to be much larger for better segment elimination. In fact, if we do a rebuild on the index, we will see our rows get distributed more evenly.

 

Now let’s see what happens if we only load 102,399 rows.

 

 

So first of all, we see something huge—the execution time is 994 ms compared to 279 ms. That means it most likely got loaded to a b-tree delta store. Let’s confirm.

Sure enough—we see an open delta store. So I run my standard warehouse query against the table:

 

If I rebuild the index (so the delta store gets compressed), my logical reads drop, and my execution time drops slightly. Note that this is a small dataset in columnstore terms—against a much larger data set the effects would be much more dramatic.
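The rebuild itself is nothing special. A sketch with hypothetical names:

-- Compresses the delta store contents into columnstore segments
ALTER INDEX CCI_bigtransaction_cs ON dbo.bigtransaction_cs REBUILD;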

 

 

One Last Thing

While I was playing with the tuple mover process yesterday, I noticed something unique—a row group with a state of “invisible”.

This state is documented in this Connect item, and will be in Books Online in the near future. According to Remus:

“The Tuple Mover reads one closed delta store at a time and starts building the corresponding compressed segments. During this time scans continue to see and read the delta store. When the Tuple Mover has finished compressing the delta store, the newly created segments are made visible and the delta store is made invisible in a single, atomic operation. New scans will see and scan the compressed format. The Tuple Mover then waits for all scans still operating in the delta store to drain after which the delta store is removed.” 

I just happened to run the row groups query while the tuple mover was firing, and got to see this.

Summary

Ideally, your data warehouse loads come in perfect 1,048,576-row increments, so your row groups are as large as possible and you avoid having open delta stores. Since the world is not ideal, you will probably want to monitor for open delta stores (a sketch of such a check is below), as they can greatly impact performance. Additionally, if your row groups get too “fragmented” (I use that word hesitantly—it’s not really fragmentation, just a large number of row groups with less-than-optimal row counts), you may want to consider rebuilding the columnstore index periodically—or better yet, just the partition it’s in.
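Monitoring for open delta stores is a quick query against sys.column_store_row_groups, aggregated by table. A sketch:

-- One row per table that currently has open delta stores
SELECT  OBJECT_NAME(object_id) AS table_name,
        COUNT(*) AS open_delta_stores,
        SUM(total_rows) AS rows_in_delta_stores
FROM    sys.column_store_row_groups
WHERE   state_description = 'OPEN'
GROUP BY OBJECT_NAME(object_id);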

Performance of SQL Server 2014 Columnstore Archival Mode

I blogged about this in not a whole lot of detail a couple of months ago, but as an addition to my recent testing of columnstore index update and insert performance, I decided to take another shot at testing the new columnstore archival mode of compression in SQL 2014 CTP2. This allows you to take either an entire columnstore index (a really bad idea) or selected, relatively unused partitions of your columnstore index and compress them into an even smaller archive than a typical columnstore index. Microsoft, per conversations I had at PASS Summit, mentioned seeing compression gains of up to 26%.
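For reference, rebuilding just the colder partitions into archival compression looks roughly like this (a sketch using the table and partition numbers from my test, which would obviously differ in your environment):

-- Rebuild partition 1 with the heavier archival compression
ALTER TABLE dbo.BigTransaction_Arch_Columnstore
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
-- ...and repeat for partitions 2 through 5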

The Test

As I did in my insert and update testing, I used the Make Big AdventureWorks script, which you can download here, courtesy of Adam Machanic (b|t). Specifically, my focus was in and around the BigTransaction table, which I partitioned by date. Before I compressed anything, the partitions were the following sizes:

Table                            Partition   Rows      Data KB
BigTransaction_Reg_Columnstore   1           3774500   29144
BigTransaction_Reg_Columnstore   2           3152574   24200
BigTransaction_Reg_Columnstore   3           2909218   22200
BigTransaction_Reg_Columnstore   4           2812282   21720
BigTransaction_Reg_Columnstore   5           2660245   20664
BigTransaction_Reg_Columnstore   6           2627890   30008
BigTransaction_Reg_Columnstore   7           2461343   28064
BigTransaction_Reg_Columnstore   8           2568732   29352
BigTransaction_Reg_Columnstore   9           8296817   52512

 

After I rebuilt partitions 1-5 with archival Columnstore compression, I saw the following savings in compression:

Table                             Partition   Rows      Data KB   Compression %
BigTransaction_Arch_Columnstore   1           3774500   42480     31.39%
BigTransaction_Arch_Columnstore   2           3152574   35552     31.93%
BigTransaction_Arch_Columnstore   3           2909218   32840     32.40%
BigTransaction_Arch_Columnstore   4           2812282   32072     32.28%
BigTransaction_Arch_Columnstore   5           2660245   30520     32.29%
BigTransaction_Arch_Columnstore   6           2627890   30008     N/A
BigTransaction_Arch_Columnstore   7           2461343   28064     N/A
BigTransaction_Arch_Columnstore   8           2568732   29352     N/A
BigTransaction_Arch_Columnstore   9           8296817   52512     N/A
Average                                                           32.06%

 

So for the partitions I rebuilt, I see an average space savings of about 32%, but at what cost to select performance?

Select Performance

The range query I used selects the average transaction cost by date, with a group by on the date and a filter to only pull in dates that fall in the archived partitions. In terms of pure time to execute the query, the columnstore archival version ran about 18% longer; however, it consumed 51% more CPU time. Compression is CPU costly, and the more we compress data, the more CPU cycles are required to decompress it. One other point of note: when I partitioned the table (with either form of compression), I saw a lot more logical reads for the same query plans. Here the numbers were around 85k and 120k, but if I run the same query against an unpartitioned table with a columnstore index, the count is much closer to 40k. Aaron Bertrand (b|t) pointed out that this might be related to a bug with statistics IO.
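For reference, the range query was of this general shape (a sketch; the date boundary is hypothetical and chosen to fall entirely within the archived partitions):

SELECT  TransactionDate,
        AVG(ActualCost) AS AvgActualCost
FROM    dbo.BigTransaction_Arch_Columnstore
WHERE   TransactionDate < '2006-01-01'  -- hypothetical cutoff inside partitions 1-5
GROUP BY TransactionDate
ORDER BY TransactionDate;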

 

 

 

 

 

So there is some benefit to this feature—used as a cold archive, it can yield storage gains of potentially 30+% with a relatively minor penalty for querying. It is best utilized on data that you don’t plan to access on a regular basis.
