November 10, 2014 Leave a comment
At PASS Summit last week (and if you weren’t there, you missed an amazing conference as usual) Stacia Misner (b|t), whom I co-authored the white paper “Using Power BI in a Hybrid Environment” asked me a question that she was asked in her session. If a user is using Microsoft Azure SQL Database as a data source in Power BI, does that data source incur data egress charges? A couple of background notes here—generally speaking in most cloud computing scenarios data ingress (that is loading your data) is free, whereas data egress (retrieving your data into, say a reporting solution) costs money (pricing details are here, and not terribly expensive). Another generality around most cloud scenarios is that data transfer within the same data center also does not count against your data egress charges.
Another thing to note is that Office 365/Power BI and Microsoft Azure are two completely different services—I tried to be as clear about this as I could in the white paper, and sometimes it does strike me as odd, that two services, quite possibly living in the same data center are not aware of each other, but this does again seem to be the case. So let’s walk through the scenario for this.
Power BI and Microsoft Azure SQL DB
Data refresh in Power BI, is only directly supported by two data sources, SQL Server and Oracle (either on-premises, or running in Azure), if you want to connect to Azure DB (or a vast array of other data sources) the method to use, is to bring the data into your Excel data model using Power Query, and then pass the connection string into the data sources within Power BI. (Note—for full details on this, read the above white paper). So to test this I built a very simple Azure DB using the Adventure Works (AW) 2012 database for Azure (currently, Azure DB is not fully feature compatible with in-box SQL Server, so a special edition of AW is needed). I then generated a data model, against two tables, just enough that auto refresh would work normally.
Figure 1 Workbook in Power BI
In case you were wondering why I called the workbook “Bingo” I was having some issues related to the build of Power Query I was on that may have caused a little bit of frustration (or maybe that was just Drew Brees’ performance in Sunday’s Saints game). Bingo was the workbook that finally worked correctly. If I go to the data refresh screen, and execute “refresh report now”
Figure 2 Data Refresh Screen
From here—you can execute a refresh on demand. There is no concept of an incremental refresh—it seems like all data is refreshed when data refresh happens. So how do we see if this incurs data egress?
Azure DB and DMVs
There is a DMV in AzureDB named sys.bandwith_usage that tracks bandwidth usage for a given database by the hour. So an entry is made for each hour, where there is data usage against a given DB. I haven’t figured out an easy way to show space used in Azure DB for a given table, but from looking at my on-premises version of AW, I can about 12 MB of data in the tables I am using in this workbook.
So let’s check the DMV.
Figure 3 Query Results
As you can see in the results—refreshing my workbook seems to “cost” about 17 MB. From what I can tell there is no differentiation between an Azure data source and an on-premises one (some of that data use, particularly master, was from an on-premises SSMS session). I was hoping to provide a little more detail, but DMVs and netstat let me down—any help in the comments would be greatly appreciated.
In the grand scheme of things this isn’t a big deal (you get 5 GB of egress for free a month), but it is something to think about when designing BI solutions in the cloud. A couple of things that would be great from a Power BI perspective would be the ability to incrementally refresh data, and some sort of Azure direct connect, which didn’t include data charges.