Where Should I Use a Columnstore Index in SQL Server?

Spread the love

I’ve been blogging a lot about columnstore indexes in SQL Server 2014—for good reason, I think it is a great feature that adds a lot of value to SQL Server for analytic workloads. One of the other (and probably more talked about) features in SQL Server 2014 is In-Memory OLTP (formerly known as Hekaton)—the SQL team added a neat new report that’s accessible to users of Management Data Warehouse (MDW) to show which tables and procedures in a given database might be well suited for In-Memory. There is nothing like that yet for a columnstore index.

 

Figure 1 SQL 2014 Management Data Warehouse In-Memory Tables Assistant

 

Microsoft’s official guidance is to use columnstores on large fact tables and dimensions greater than 1,000,000 rows. Typically (and this applies to regular page compression as well, you are going to look to tables that don’t have a very high update percentage as columnstore candidates. Despite the ability to update the clustered columnstores, this feature remains largely focused on the append-only type workloads present in data warehousing system. The insert process has been designed around doing bulk inserts very efficiently, and not so much doing singleton updates and inserts.

So I wanted to put logic together to identify candidate tables. The below script checks for tables with > 1,000,000 rows and an update percentage below 20—it would be easy enough to modify, the output list is a the list of candidate tables for columnstore indexes. Additionally, it won’t return any tables that already have a columnstore index on them.

with widetables as

(select a.name,
count(b.column_id)
as ColumnCount, sum(c.row_count)
as Row_Count

from
sys.objects a

inner
join
sys.columns b on b.object_id=a.object_id

inner
join
sys.dm_db_partition_stats c on c.object_id=a.object_id

inner
join
sys.partitions d on d.object_id=a.object_id

where d.data_compression not
in
(3,4)

group
by a.name

having
count(b.column_id)
> 5 and count(b.column_id)
< 1200 and sum(c.row_count)
>1000000)

,

 

–Check Update %

–Checks for Percentage of Update Statements

updaterate as

(

SELECT o.NAME AS [Table_Name]

    ,x.NAME AS [Index_Name]

    ,i.partition_number AS [Partition]

    ,i.index_id AS [Index_ID]

    ,x.type_desc AS [Index_Type]

    ,i.leaf_update_count * 100.0 /
(i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) AS
[Percent_Update]

FROM
sys.dm_db_index_operational_stats(db_id(),
NULL,
NULL,
NULL) i

INNER
JOIN
sys.objects o ON o.object_id = i.object_id

INNER
JOIN
sys.indexes x ON x.object_id = i.object_id

    AND x.index_id = i.index_id

WHERE (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count)
!= 0

    AND
objectproperty(i.object_id,
‘IsUserTable’)
= 1 and i.leaf_update_count * 100.0 /
(i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count)
> 20

)

 

select name as CandidateTables from widetables

except

select table_name from updaterate

 

I hope this script is useful, and can provide some nice guidance as to where to use a columnstore index.

 

3 thoughts on “Where Should I Use a Columnstore Index in SQL Server?

  1. tur lombok

    I will right away take hold of your rss feed as I can’t find your email subscription link or newsletter service.

    Do you have any? Kindly let me realize so that I may just subscribe.
    Thanks.

    Reply
  2. Pingback: 5 Things I Learned at SQLSaturday – nate_the_dba

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.