How Do I Show Memory Usage From A Columnstore Index in SQL Server?

Columnstore indexes in SQL Server are a great feature from an analytic and data warehouse perspective—you can get excellent compression ratios, and batch mode processing allows for excellent performance. One of the ways these performance gains happen is that columnstore indexes (mostly—I'll touch on that in a bit) bypass the 8k pages in the SQL Server buffer pool. Columnstore indexes get their own area of memory, which uses large pages (stored adjacently) to support the underlying LOBs in which the columnstore index is stored. So I was curious to see how much memory my columnstore index was actually using—before I knew better, I looked in the buffer cache, which returned:

This shows my index has 106 pages in the buffer cache—which is interesting, because columnstore indexes aren't supposed to use the buffer pool, right? So I took a look at sys.dm_os_buffer_descriptors:

select *
from sys.dm_os_buffer_descriptors
where database_id = 6
    and allocation_unit_id = 72057594085113856;



So there are some pages in the buffer pool from the columnstore—I suspect these are pages from the columnstore dictionary, but I can’t say for certain. So how do we find our actual columnstore memory usage? After chasing around DMVs a bit, I centered on sys.dm_os_memory_cache_entries:
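The query I ran was along these lines (a minimal sketch; CACHESTORE_COLUMNSTOREOBJECTPOOL is the cache store type for the columnstore object pool, and pages_kb is the SQL Server 2012-and-later column name):

```sql
-- Look at entries in the columnstore object pool cache store
select name, type, entry_data, pages_kb
from sys.dm_os_memory_cache_entries
where type = 'CACHESTORE_COLUMNSTOREOBJECTPOOL';
```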



Unfortunately, none of the columns in that DMV tie back directly to a specific object or partition id—except for entry_data, which presents data in this wonderful format (and no, it's not XML—that would have been easier):

<object_type='1' db_id='12' hobt_id ='72057594042646528' column_id='30' object_id ='0' object_size='22760'/>

I knew that if I could get that hobt_id, I could show the memory utilization of the columnstore. So, with much thanks to Jeremiah Peschka (b|t), who helped me work through the myriad of string functions and concatenation I had to do to make this work, I present columnstorebuffers.sql. This procedure will return the name of the source object (the table here) and the amount of memory it's using in the columnstore object pool:
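The core of the approach looks roughly like this (a simplified sketch, not the actual procedure; the string offsets assume the entry_data format shown above, and pages_kb is assumed as the size column):

```sql
-- Sketch: parse hobt_id out of entry_data, then join back to name the table.
-- The offsets assume the literal format: hobt_id ='72057594042646528'
select o.name as table_name,
       sum(e.pages_kb) as columnstore_pool_kb
from sys.dm_os_memory_cache_entries e
join sys.partitions p
    on p.hobt_id = cast(substring(e.entry_data,
           charindex('hobt_id =''', e.entry_data) + 10,
           charindex('''', e.entry_data, charindex('hobt_id =''', e.entry_data) + 10)
         - (charindex('hobt_id =''', e.entry_data) + 10)) as bigint)
join sys.objects o
    on o.object_id = p.object_id
where e.type = 'CACHESTORE_COLUMNSTOREOBJECTPOOL'
group by o.name;
```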

Note—you'll have to run this procedure at a database level: even though dm_os_memory_cache_entries is global, the other views joined to bring back the object's name are not.

The Modern Analytics Architecture—My PASS BA Conference Session

In a couple of weeks, I will be presenting at the PASS Business Analytics Conference in San Jose, CA (and this weekend at SQL Saturday Chicago). This topic feels like it has been a couple of years in the making—I got into "big data" when I was at Comcast, where we had a large number of effectively append-only applications running on very expensive relational database engines, on top of a lot of equally expensive enterprise SAN space. Even as recently as two years ago, the solutions weren't fully enterprise-ready unless you had a really specialized team of engineers. Now, however, Hadoop is much easier to build and manage, and there are SQL and other tools that make connecting to the data very easy. I had been wanting to build this session for a while, to illustrate from the infrastructure, application, and business perspectives why the toolsets are changing, and how you can use these new toolsets to gain richer analysis of your data.

This session isn't just going to be about Hadoop—you will learn how SQL Server (and other platforms) can be used as a very powerful analytics platform across large volumes of data. You will learn about the infrastructure that drives big data and analytical solutions, and how to decide whether to deploy on-premises or to the cloud. You will learn about the skills needed to work with Hadoop and some of the other toolsets, and I might even crack a few jokes about Oracle.


Office 365—Are You Done Yet?

Power BI in Office 365 has some new and exciting features like Power Map and Power Q&A, and it is also required to use the Power BI "modern" app (that's Metro to most of us). In general, I've been really happy with the way things work with Office 365, but from time to time I run into odd behavior. This week was one of those weeks.

The upper limit for a file that you can load into Power BI is 250 MB—and if your file is above 30 MB, you need to enable it for Power BI. I was building a demo this week that has a good amount of data and ends up being right around 200 MB. I was a little nervous about uploading a file that "big" to the cloud over a regular internet connection. As it happened, I was working from home yesterday, and kicked off the upload process right before I ran an errand over lunch. When I got back an hour or so later, I saw this:


Figure 1 Upload Dialog from Power BI

However, when I checked resource monitor for network activity, I saw none from Internet Explorer.

Figure 2 Network Activity

I opened a new window, and my file was actually in Power BI, so I left the upload dialog alone overnight. When I logged in this morning, the dialog box was still there. While this isn't a big deal to IT folks (we are used to things not working correctly), I can imagine it being disconcerting to a business user trying to upload a report.


Does Power View Have a Problem with Shrinkage?

I do a decent amount of work with Power View—I think it is a pretty nice tool for doing visualizations across disparate data sets, and it is mostly easy enough to be accessible to power users. However, there are a few things I feel the need to complain about. My biggest complaint is that the user doesn't have the ability to change a chart title—the only way to avoid the system-generated chart title is to hide it and then add a text box. However, the other day I accidentally found a new bug—chart shrinkage.

I was building a very simple chart to test something in Power BI (SharePoint Online)—so I used Power Query and one of my favorite test data sites (Pro Football Reference—they have good HTML tables that work really nicely with Power Query). I grabbed the 2013 NFL defensive stats and threw them into a clustered bar chart.

Figure 1 Power View Chart of NFL Defensive Stats 2013

Look at that—the Saints were pretty good on defense last year, particularly pass defense (thanks, Rob Ryan and Keenan Lewis). This chart was fine and served my purpose, but in order to demo something to someone, I tiled it by team.


Figure 2 Tiled by Team


When I remove the tile-by, the chart now has a scroll bar (note that the last visible team is the Arizona Cardinals, rather than the Washington NFL team in Figure 1).


Figure 3 Shrunken Chart

Repeating this process several times, the chart ends up looking like this:

Figure 4 Who shrunk my chart?

It is easy to drag the chart down to make the data visible again, but this clearly isn't the intended behavior. Hopefully, like Power Query (which seemingly gets new fixes every week), this issue (and the chart title one) will get resolved quickly.


When the Cloud is Definitely Right for You

A lot of my colleagues with IT experience on some very large workloads (and I generally count myself in this number) have balked at the notion that all IT workloads can be moved to cloud computing models. In general, being in a multi-tenant environment is not the ideal scenario for very large, intensive database workloads, and cloud computing doesn't change that. The vendors are incentivized to provide you adequate performance while maximizing the utilization of their infrastructure. So if you want dedicated performance, you are going to pay (at least at Amazon—as of this writing, Microsoft doesn't have a performance storage offering in Azure). For the purposes of this post, I'm referring to "cloud computing" as Infrastructure as a Service (IaaS); Platform and Software as a Service are different scenarios with different requirements.

When the Cloud is Right

So right now I see two real scenarios where IaaS is an ideal fit—there are others (quick dev environments, testing new platforms, restore testing), but my focus here is on these two:

  • Disaster Recovery
  • Small Business

Disaster Recovery

If you are working for a major multinational corporation with three or four data centers, feel free to ignore this section (actually, ignore this post and go read a licensing one). If you work for a small to medium-sized business and your DR strategy is to take your backups home with you—you are my target audience. The cloud brings you a lot of really nice options:

  • Backup directly to Azure (or other providers through API)
  • All SQL Server DR options are available in hybrid models

I'm not suggesting you stop taking backups—just that you think about the cloud as your offsite backup repository. You can also think of the cloud as a second data center that you only have to pay for when you use it. For example, maybe you only need DR for an application during certain times of the business year—IaaS makes that a viable solution.
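As an example of the first option, backing up directly to Azure blob storage looks something like this (a sketch: the storage account, container, and credential names are placeholders):

```sql
-- Create a credential holding the storage account name and access key
create credential AzureBackupCred
    with identity = 'mystorageaccount',   -- storage account name (placeholder)
    secret = '<storage access key>';

-- Back up directly to a URL in blob storage (SQL Server 2012 SP1 CU2 and later)
backup database AdventureWorks2012
    to url = 'https://mystorageaccount.blob.core.windows.net/backups/AdventureWorks2012.bak'
    with credential = 'AzureBackupCred', compression;
```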

Small Business

The real genesis of this post was an email I received following the presentation I did last week called "Building Perfect SQL Servers, Every Time." My goal was to teach people how to install and configure SQL Server in an optimal and repeatable fashion. The session went well, and I had a number of follow-up questions via email and Twitter—but I noticed something in a few of them: people were asking how to optimally configure hardware that simply wasn't enough to serve their needs (e.g. a server with only 2 physical drives running SQL Server), or they were a part-time SA and DBA (and probably a couple of other things).

If you have fewer than 10 servers, and they are running in a closet, you should definitely consider a cloud model. The networking is getting easier to set up, and you pay for exactly what you need. There are other things to think about, such as traffic and bandwidth, but the more important thing is that your data is far more protected than it would be running on a server with two drives. In Azure, there are three copies of your data (this doesn't mean you shouldn't take database backups to Azure—but when you do, there are also three copies of those). You may be concerned about security—but Microsoft has a long list of security certifications, and your servers are in a closet.


Cloud computing isn’t for everyone—but if you are supporting a few servers, it can make both financial and technical sense. Focus your efforts on your core business needs, instead of trying to manage infrastructure with band aids and duct tape.




Where Should I Use a Columnstore Index in SQL Server?

I've been blogging a lot about columnstore indexes in SQL Server 2014—for good reason, I think it is a great feature that adds a lot of value to SQL Server for analytic workloads. One of the other (and probably more talked about) features in SQL Server 2014 is In-Memory OLTP (formerly known as Hekaton)—the SQL team added a neat new report, accessible to users of the Management Data Warehouse (MDW), to show which tables and procedures in a given database might be well suited for In-Memory OLTP. There is nothing like that yet for columnstore indexes.


Figure 1 SQL 2014 Management Data Warehouse In-Memory Tables Assistant


Microsoft's official guidance is to use columnstore indexes on large fact tables, and on dimensions greater than 1,000,000 rows. Typically (and this applies to regular page compression as well), you are going to look to tables that don't have a very high update percentage as columnstore candidates. Despite the ability to update clustered columnstores, this feature remains largely focused on the append-only type workloads present in data warehousing systems. The insert process has been designed around doing bulk inserts very efficiently, not doing singleton updates and inserts.
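For reference, creating a clustered columnstore index on a candidate table is a one-liner (the table and index names below are placeholders for illustration):

```sql
-- Hypothetical example: convert a large fact table to a clustered columnstore index
create clustered columnstore index CCI_FactSales
    on dbo.FactSales;
```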

So I wanted to put some logic together to identify candidate tables. The script below checks for tables with more than 1,000,000 rows and an update percentage below 20—it would be easy enough to modify, and the output is the list of candidate tables for columnstore indexes. Additionally, it won't return any tables that already have a columnstore index on them.

with widetables as
(
    select a.name, count(b.column_id) as ColumnCount, sum(c.row_count) as Row_Count
    from sys.objects a
    inner join sys.columns b on b.object_id = a.object_id
    inner join sys.dm_db_partition_stats c on c.object_id = a.object_id
    inner join sys.partitions d on d.object_id = a.object_id
    where a.type = 'U'
        and d.data_compression not in (3, 4) --exclude tables that already have a columnstore index
    group by a.name
    having count(b.column_id) > 5
        and count(b.column_id) < 1200
        and sum(c.row_count) > 1000000
),
--Check Update %
--Checks for Percentage of Update Statements
updaterate as
(
    SELECT o.NAME AS [Table_Name]
        ,x.NAME AS [Index_Name]
        ,i.partition_number AS [Partition]
        ,i.index_id AS [Index_ID]
        ,x.type_desc AS [Index_Type]
        ,i.leaf_update_count * 100.0 /
            (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
             + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) AS [Update_Percentage]
    FROM sys.dm_db_index_operational_stats(db_id(), NULL, NULL, NULL) i
    INNER JOIN sys.objects o ON o.object_id = i.object_id
    INNER JOIN sys.indexes x ON x.object_id = i.object_id
        AND x.index_id = i.index_id
    WHERE (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
           + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) != 0
        AND i.index_id = 1
        AND i.leaf_update_count * 100.0 /
            (i.range_scan_count + i.leaf_insert_count + i.leaf_delete_count
             + i.leaf_update_count + i.leaf_page_merge_count + i.singleton_lookup_count) > 20
)
--Candidate tables: wide, large tables that are not update-heavy
select name as CandidateTables from widetables
where name not in (select table_name from updaterate);


I hope this script is useful, and can provide some nice guidance as to where to use a columnstore index.


Columnstore Indexes and Transparent Data Encryption in SQL Server 2014

I spoke on columnstore indexes in SQL Server 2014 last weekend at SQL Saturday #262 in Boston—an audience question came up about using Transparent Data Encryption (TDE) in conjunction with columnstore indexes. Off the top of my head, I couldn't think of any reason why it would be a problem—I was correct, but I also wanted to test it out and check the performance and compression levels. I used the standard BigTransactionHistory table (courtesy of Adam Machanic b|t) for all testing.


The first thing I did was create a new database with Transparent Data Encryption turned on.
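The setup looked roughly like this (a sketch: the certificate name, database name, and password are placeholders):

```sql
use master;
-- Create a master key and a certificate to protect the database encryption key
create master key encryption by password = '<StrongPassword1!>';
create certificate TDECert with subject = 'TDE Test Certificate';

create database TDE_Test;

use TDE_Test;
-- Create the database encryption key and turn encryption on
create database encryption key
    with algorithm = AES_256
    encryption by server certificate TDECert;

alter database TDE_Test set encryption on;
```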

Next, I created a table and a clustered columnstore index in my new database.

select * into tde_test.dbo.TDE_CS_Test from bigtransaction_cs;

create clustered columnstore index CCI_TDE_Test on tde_test.dbo.TDE_CS_Test;


BigTransaction_CS is my source table which lives in the unencrypted AdventureWorks2012 database—I’m just making a copy of it in my new encrypted TDE_Test database which was created above.

Size Matters

Using a query that returns the sizes of objects in my databases, I examined the size of each object.
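The query I used is along the lines of the common pattern below (a sketch; the column aliases match the output table that follows):

```sql
-- Approximate object-size query: rows, pages, and space per index
select t.name as TableName,
       i.name as indexName,
       sum(p.rows) as RowCounts,
       sum(au.total_pages) as TotalPages,
       sum(au.total_pages) * 8 / 1024.0 as TotalSpaceMB
from sys.tables t
inner join sys.indexes i on i.object_id = t.object_id
inner join sys.partitions p
    on p.object_id = i.object_id and p.index_id = i.index_id
cross apply (select sum(a.total_pages) as total_pages
             from sys.allocation_units a
             where a.container_id = p.partition_id) au
group by t.name, i.name;
```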

TableName              indexName                                  RowCounts  TotalPages  TotalSpaceMB
Columnstore            ClusteredColumnStoreIndex-20140219-151205
Encrypted_Columnstore  CCI_TDE_Test
TDE_NoCompression      NULL
Clustered_NoEncrypt    pk_bigTransactionHistory

As you can see, the row counts are the same, but the page counts and space used are higher—this is one of the penalties of TDE: the data won't compress as well. However, the difference in compression is only about 10%, and the compression effects are still rather dramatic, even with encryption. The chart below shows the compressed versus uncompressed sizes of the data.

Figure 1 Size of uncompressed and compressed tables with and without encryption


In my demos on this topic, I have a fairly standard query that I use. Each query was run six times, with the last five runs counting toward the data.

SELECT transactiondate
    ,count(*) AS transaction_count --placeholder aggregate; the original select list was lost in formatting
FROM bigtransaction_cs
WHERE TransactionDate < '20070101' --placeholder date; the original literal was lost in formatting
    AND Quantity > 70
    AND Quantity < 92
GROUP BY TransactionDate
ORDER BY transactionDate;


Encryption typically does negatively impact query performance due to the CPU overhead, as is clearly illustrated below.

Figure 2 Query Performance with Encryption

There's roughly a 36% performance degradation using TDE—based on the logical read counts from the queries, it is almost all related to the CPU overhead of encryption, and not to the small difference in compression.


Generally speaking, if you are using encryption, there is some legal or regulatory reason for it, and you don't have any other options. Just be aware that you can still get pretty good compression levels using TDE in conjunction with clustered columnstore indexes in SQL Server 2014, even though your performance may not be as amazing as it otherwise would be.
