Couchbase at Cloud Field Day 17

Last month, I had the pleasure of attending Cloud Field Day 17 in Boston. If you aren’t familiar with the Tech Field Day events, a bunch of independent technology experts get to meet with a wide array of technology companies. This allows the companies to tell us about their offerings, and we give them feedback and ask them questions. I’ve been to a number of Field Day events, and always enjoy them and leave with great knowledge and excellent contacts.

One of the companies we met with in Boston was Couchbase. (Full disclosure: I have done some promotional work for Couchbase in the past.) Couchbase is a NoSQL database that is primarily used as a JSON document database, but it also supports key-value, wide-column, and graph data models, giving developers flexibility in how they build their data stores. Couchbase has a cloud database-as-a-service offering called Couchbase Capella, which is available on all of the major clouds.

We met with Couchbase on “National Cloud Database Day” (June 1st), and they talked about Capella. Capella is particularly interesting because, beyond running in the cloud as a database service, you can also run it on mobile and edge devices with the Couchbase Lite offering, which allows for a more modern replacement for the classic SQLite. The other aspect of the Couchbase story is more affordable costs, especially when compared to some of the native cloud offerings on Azure and AWS. Like many other cloud and NoSQL databases, high availability is built into the architecture of the solution, as is the ability to scale horizontally without downtime.

Couchbase’s larger goal is developer flexibility–having all of those data models available, at a very nice price point, and in any cloud or on any device, makes Capella a nice option for applications. The ability to run the same data store on mobile and IoT devices is also a very nice feature. You can learn more about Capella here.

Syncing Data Between Azure SQL Database and Amazon RDS SQL Server

A while back, a client who hosts user-facing databases in Azure SQL Database had a novel problem. One of their customers had all of their infrastructure in AWS and wanted to be able to access my client’s data in an RDS instance. There aren’t many options for doing this–replication doesn’t work with Azure SQL Database as a publisher because there’s no SQL Agent. Managed Instance would have been messy from a network perspective, as well as cost prohibitive compared to Azure SQL DB serverless. Using an ETL tool like Azure Data Factory would have worked, but it would have required a rather large amount of dev cycles to check for changed data. Enter Azure Data Sync.

If you’ve ever noticed this button on the Azure SQL Database blade in the portal:

you may have wondered what it does. Azure Data Sync works conceptually like transactional replication. If you go to the docs page for Azure Data Sync, you can see it works just like replication, except a little worse. The big advantage data sync has is that it does not rely on a distribution database, which means you can use an Azure SQL DB as a source for data sync, which you can’t do with replication. Data sync uses a system of triggers and tables to identify which rows it needs to send to the receivers. Typically an additional database is used for that metadata; however, in my scenario, I keep all of the data sync objects in my user database, because it’s only being used for syncing purposes.

A customer of one of our customers needed to receive data from Azure SQL DB in Amazon RDS–this is very possible with data sync, but it was challenging due to vague documentation. I’m not going to cover the networking in great detail, but the basic idea is that the VM your data sync agent runs on needs to be able to talk to your RDS instance on port 1433. There are numerous ways to make that happen, and that part is not that hard to figure out.

The sync agent gets installed on a VM. I ran into one major challenge with the VM–my database was about 200 GB, with a lot of compressed and columnstore data. The way data sync works, data is extracted onto the agent VM rather than streamed directly into the destination database. Also, for whatever reason, that data seems to be extracted twice–which means you need to size the data drive on that VM accordingly. In my case, I had to increase the size of the drive to a terabyte to allow the process to complete. You will also want to use a premium managed disk, because the performance of the drive directly impacts the duration of the sync process.
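
If you need to grow the agent VM’s data disk and move it to premium storage, a few lines of Az PowerShell will do it. This is a minimal sketch, assuming a hypothetical resource group and disk name; changing the SKU or size of an attached disk generally requires the VM to be deallocated first.

# Assumed names for illustration only; deallocate the VM before running this.
$rg       = 'rg-datasync'          # assumption: your resource group
$diskName = 'datasync-agent-data'  # assumption: the agent VM's data disk

# Grow the disk to 1 TB and move it to premium storage in one update.
New-AzDiskUpdateConfig -SkuName Premium_LRS -DiskSizeGB 1024 |
    Update-AzDisk -ResourceGroupName $rg -DiskName $diskName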

My client’s data sync scenario pushes data once a day rather than continuously. In order to save money, I used Azure Automation to power the agent VM on and off, and after powering it down, to change the premium drive to a standard one to further reduce costs. The data sync process can be started via PowerShell; however, the process is asynchronous, which means you need a separate automation runbook to monitor the progress of that job (which is what I used to power down the VM when the process is complete).
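
Here is a minimal sketch of that runbook logic, assuming hypothetical resource group, server, database, and sync group names; the exact SyncState values you check for may need adjusting, and you will want to add your own error handling and alerting.

# Assumed names for illustration only.
$rg        = 'rg-datasync'
$server    = 'myazuresqlserver'
$database  = 'SalesDB'
$syncGroup = 'sales-to-rds'

# Kick off the sync (the call returns before the sync finishes).
Start-AzSqlSyncGroupSync -ResourceGroupName $rg -ServerName $server `
    -DatabaseName $database -SyncGroupName $syncGroup

# Poll until the sync group is no longer reporting progress.
do {
    Start-Sleep -Seconds 300
    $state = (Get-AzSqlSyncGroup -ResourceGroupName $rg -ServerName $server `
        -DatabaseName $database -Name $syncGroup).SyncState
} while ($state -eq 'Progressing')

# Once the sync is done, deallocate the agent VM to stop paying for compute.
Stop-AzVM -ResourceGroupName $rg -Name 'datasync-agent-vm' -Force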

Blocking .zip and .mov Top Level Domains from Office 365 Email

Last week, Google announced that they were selling domain registrations for the .zip and .mov top-level domains (TLDs). Google registered these TLDs as part of ICANN’s generic top-level domain program. Spammers and threat actors everywhere have rejoiced at this notion–.zip and .mov files are very common malware vectors. While there haven’t been any real-world observations of attacks yet, the SANS Institute is recommending proactively blocking these domains on your network until we better understand their behavior.

There are a number of places to block these domains (and you will see various blogs from DCAC consultants this week about the different areas). I have become our de facto email admin, so I decided to handle the Office 365 side of this.

The first thing you need to do is log in to the Exchange Admin Center, which is at admin.exchange.microsoft.com.

The way to block a whole TLD is with mail flow rules. You can also block an entire domain (hiya, Chris Beaver) using the accepted domains feature, but that feature doesn’t allow you to block a TLD. So on the left, expand Mail flow in the navigation menu, click Rules, and then click “Create a Rule”.

In your rule, you will first need to give it a name–this is just metadata–I used “Blocked Spammy Domains Demo”. For where to apply this rule, select “The sender” and then “address matches any of these text patterns”, and then add the patterns \.zip$ and \.mov$ as shown below.

Next you have to specify an action–here I’m going to reject the message and include an explanation that gets sent back to the sender: “Buy a better domain, spammer”. I’m also going to notify the recipient that a spammy domain tried to email them.

After that, you can click Next, and you will be on the set rule settings page. Select “Enforce”, activate this rule, and then click Next again.

On the final screen, click finish to complete the rule.
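
If you prefer to script this rather than click through the portal, the same rule can be created with Exchange Online PowerShell. This is a sketch under the assumption that the sender-address pattern condition and notification text match what you configured in the wizard; adjust the rule name and reject text to taste.

# Requires the ExchangeOnlineManagement module and Exchange admin rights.
Connect-ExchangeOnline

# Reject mail from .zip and .mov sender domains and notify the intended recipient.
New-TransportRule -Name 'Blocked Spammy Domains Demo' `
    -FromAddressMatchesPatterns '\.zip$', '\.mov$' `
    -RejectMessageReasonText 'Buy a better domain, spammer' `
    -GenerateNotification 'A message from a blocked .zip or .mov domain was rejected.' `
    -Mode Enforce `
    -Enabled $true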

Your email is now protected from these spammy domains, which could be nefarious.

The Importance of In-Person User Group Meetings

I really, really miss speaking at in-person user group meetings. Between the death of PASS and the long tail of the Covid pandemic, while larger-scale conferences have returned to some degree, local user groups have been extremely slow to come back in person. Some user groups are still doing remote sessions (I did a really poorly promoted talk on Azure Active Directory for 3 people last month), but they absolutely suck for the speaker, and I suspect they are not much fun for the attendees. I have given a couple of talks, but you miss so much in a virtual session.

It’s All About the Networking

When you attend a virtual user group, you might get to talk to a couple of other members of the group for a few minutes before the speaker gives their talk. However, for someone who is new to the group and doesn’t know anyone, it can be really hard to introduce themselves and make connections. I know as a speaker that virtual attendees are far more hesitant to ask questions than they are in a physical session. Additionally, it limits the growth of our community to new members–while virtual events are easy to put on, it’s the continued relationship building that helps retain new membership.

Let’s Not Forget About New Speakers

One of the challenges of the early days of the pandemic was that, with virtual user group meetings, any UG organizer had their choice of Microsoft MVPs to present to their group. Previously, limits on travel and such meant user groups would typically have local speakers, with the occasional MVP-type speaker who happened to be in town. This made it intimidating for newer speakers (the investment in microphones and cameras didn’t help either). Speaking at a user group is a traditional first step for a lot of speakers; additionally, user groups often held an open mic night once or twice a year, giving new speakers a chance to give a short talk to get started with public speaking. Beyond that, it can really help young professionals better understand technology beyond just their jobs.

Note: virtual groups are still valuable–some locations may not have enough population to get speakers every month, or may want to cover specialty topics. One of the things that sucked the most about C&C bankrupting PASS was that the virtual chapters of PASS were thrown out like dust in the wind; they always did a great job of matching speakers with attendees. But in your local area, there is nothing that beats having a user group to augment your network, learn more about jobs and tech, and make new friendships.

The benefits include:

  • Networking opportunities
  • A start for new speakers
  • Everyone loves pizza.

Finally, if you are having an in-person user group meeting and need a speaker, reach out. I’m not promising anything, but if I’m in the area, or I can make a quick side trip, I’d be happy to speak to your group.

AZ-700 Designing and Implementing Microsoft Azure Networking Solutions

One of the fun parts of working for a small-ish Microsoft Partner is that you have to take a lot of exams, some of which aren’t in your direct comfort areas. Last year I took a couple of security exams (which was mainly my own doing) and even the Cosmos DB developer exam, which was a bit of a stretch, but I’m pretty familiar with NoSQL, and I just had to understand the specific Cosmos API calls and methods. Azure networking was something I was more comfortable with, but the breadth of this exam, and some services I hadn’t worked with, meant I had to take a different tack.

You should note–I do a ton of work with Azure and am very familiar with a lot of the services, so your preparation may need to be different than mine. I like to write these posts to talk about what I did to pass, and let you decide your path from there. Denny asked me to take this exam about a month ago, and I scheduled it over the holidays. As I have been doing for the other exams I’ve taken recently, I purchased access to the practice exam from MeasureUp during that process.

I’m of two minds on the practice exams. I feel like, if you think you can pass a given exam cold, you should just take the exam and use it as your practice. If you fail, you most likely know the areas where you struggled and lacked confidence in your answers, and you can work to improve those. However, the practice exams (while not being exactly the same format as the real exam) can help you identify your weaknesses and let you address them in real time. My strategy for most of these recent exams has been to take the practice exam in training mode (where it instantly shows you wrong answers). If I know nothing about a question, I refer to learn.microsoft.com for that subject area and try to gain understanding. If I think I know the answer but get it wrong, I read the notes on the answer. I repeat this process until I’m scoring 85-90% on the exam (note that you’ll have the right answers memorized after a few times, but as long as you know why they are correct, I don’t think this is an issue). This strategy has worked well for me for the exams I’ve taken in the last couple of years.

Specific to the exam–you can find the study guide here. I’m not a network engineer, but because of Azure I’ve had to pick up a lot of skills in this area. You need to be familiar with basic concepts like:

  • IP ranges and public and private IPs
  • Virtual Networks and Subnets
  • Routing
  • VPNs–the different flavors available in Azure

Some of the specific Azure topics you should know include:

  • Azure Firewall and how it works
  • The various load balancing solutions, and when you use them, and how to configure them
  • Virtual WAN
  • Azure Front Door and App Gateway
  • Azure Network Security

Overall, I thought the exam was very fair, and a reasonable test of knowledge. I took it at a training center, and my only complaint was that some of the diagrams (or exhibits) were very complex, and on a small-ish monitor, it was hard to get the image and the relevant text on the same screen as the question.

Tech Field Day 26: ZPE Systems

I recently attended Tech Field Day 26 in Santa Clara. While we spent most of our time at the Open Compute Summit and discussing CXL, we also got to meet with hardware and software networking vendor ZPE Systems. ZPE has a number of solutions in the edge and data center spaces that allow you to do secure network management and, just as importantly, automate large-scale data center deployments.

In my opinion there are two main concerns with building out network automation—security and device accessibility. ZPE aims to mitigate both of these risks. For security, it builds a secure ring around your infrastructure to limit direct exposure. This ring security model can apply to the OS, but in highly secured networks, access through a secured network boundary layer can provide similar functionality; ZPE recommends a three-ring model, with their hardware solution as the outer boundary. They provide support for multiple network providers, as well as for the latest continuous integration and deployment patterns.

The real secret sauce to ZPE’s solution is that it integrates this automation with a central repository in their cloud service to support:

  1. Device (servers, network gear, storage) upgrades, setup, and patching
  2. Out-of-band access
  3. Access control
  4. Logging
  5. Monitoring

These configurations are all pushed from the centralized cloud data store. Conceptually, this is like a jump host, but way smarter, with a lot of connectors and support for automation processes. ZPE showcased their Network Automation Blueprint, seen on the slide above, which launches from their out-of-band devices.

Understanding CXL for Servers–Tech Field Day #tfd26 at OCP Summit

Last week, I was fortunate enough to be able to attend Tech Field Day 26 in San Jose. While we met with several companies, this was a bit of a special edition of Tech Field Day, as we attended the CXL Forum at the Open Compute Project conference. In case you don’t know, the Open Compute Project is a project from a consortium of large-scale compute providers like AWS, Microsoft, Meta, and Google, amongst others. They aim to optimize hyperscale data centers in terms of power and cooling, deployments, and automation. So how does CXL fit into that equation?

CXL stands for Compute Express Link, a standard developed by Intel with a large number of cloud providers and hardware manufacturers also participating. The CXL standard defines three separate protocols (definitions sourced from Wikipedia):

  • CXL.io – based on PCIe 5.0 with a few enhancements, it provides configuration, link initialization and management, device discovery and enumeration, interrupts, DMA, and register I/O access using non-coherent loads/stores
  • CXL.cache – allows peripheral devices to coherently access and cache host CPU memory with a low latency request/response interface
  • CXL.mem – allows host CPU to coherently access cached device memory with load/store commands for both volatile (RAM) and persistent non-volatile (flash memory) storage

The main area of focus for cloud vendors like Microsoft and Amazon is CXL.mem, which would allow them to add additional memory to cloud VM hosts. Why is this such a big deal? Memory represents the largest expense for cloud providers, and the requirements for memory keep increasing.

Beyond that—supporting a mix of workloads means memory can become “stranded”. If you are a database administrator, you can think of this like index fragmentation—which leads to wasted space. Ideally, cloud vendors would like to completely disaggregate memory and CPU, which is one of the goals of CXL (memory being tied to a rack and not a specific host), but that will likely not occur for 3-5 years.

However, CXL is real, and on-board CXL memory sockets are coming soon. The best explanation of CXL’s use cases I saw last week was from Ryan Baxter, Senior Director at Micron (Micron has some interesting solutions in this space). You can see a version of that talk here. Effectively, you can have additional memory on a server on a CXL bus (which uses PCIe for its transport mechanism)—this memory will be slightly slower than main memory, but still much faster than any other persistent storage.

Another interesting talk was from Meta, who described their performance testing with CXL. Since the memory is remote, there is a performance cost, which was around 15% with no optimizations to their software. However, Meta wrote a memory management application (on Linux) which reduced the overhead to < 2%.

You might imagine a database engine that is aware of this remote memory configuration, which could age pages it did not expect to reuse out of main memory and into remote memory.

I learned a lot last week—hardware is still a very robust business, even though most of the focus is still on the needs of the cloud providers. CXL promises some foundational changes to the way servers get built, and I think it will be exciting. Stay tuned for more posts from Tech Field Day 26.

Using the Dedicated Administrator Connection with SQL Server in a Docker Container

I use a Mac (Intel–on Apple silicon you can only run Azure SQL Edge) as my primary workstation. While I have a Windows VM locally, and several Azure VMs running Windows, I can do most of my demo, testing, and development SQL Server work locally using Azure Data Studio, sqlcmd, and SQL Server on Docker. Docker allows me to quickly run any version or edition of SQL Server from 2017-2022 natively on my Mac, and it has been nearly 100% compatible with anything I’ve needed Windows for in terms of core database functionality. And then this #sqlhelp query came up this morning.

The one reference I found to “ForkID” on the internet was in this DBATools issue. Given that, and the fact that the tweet also referenced backup and restore, my first thought was to query sys.columns in msdb. So I did, and there were a couple of tables:

As shown in the image above, the table in question is a system table, so in order to query it directly, you need to use the dedicated administrator connection (DAC) in SQL Server. The DAC is a piece of SQL Server that dedicates a CPU scheduler and some memory to a single admin session. It isn’t designed for ordinary use–you should only use it when your server is hosed and you are trying to kill a process, or when you need to query a system table to answer a Twitter post. The DAC is on by default, with a caveat–it can only be accessed locally on the server by default. That means being connected to a server console or RDP session on Windows, or in the case of a container, shelling into the container itself. However, Microsoft gives you the ability to turn it on for remote access (and you should–DCAC recommends this as a best practice) by using the following T-SQL:

exec sp_configure 'remote admin connections', 1 
GO
RECONFIGURE
GO

This change does not require a restart. However, when I tried this on my Mac, I got the following error:

Basically–that’s a network error. In my container definition, I had only defined port 1433 as being open, and the DAC uses port 1434. If I were using Kubernetes for this container, I could open another port on a running container; however, in Docker, I can only do this by killing and redeploying the container.

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=P@ssw0rd!' -e 'MSSQL_PID=Developer' -p 1433:1433 -p 1434:1434 -v /Users/joey/mssql:/mssql -d mcr.microsoft.com/mssql/server:2022-latest

I simply exposed port 1434 (with the second -p switch in the deployment script) and now I can connect using the DAC. Sadly, there was nothing interesting in sysbrickfiles.
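
For reference, this is roughly what that DAC connection looks like from the terminal. It is a sketch that assumes the container maps port 1434 (the DAC port, as configured in the docker run above) and that the table lives in the sys schema, which you should confirm against sys.columns before relying on it.

# Connect over the DAC port exposed by the container and peek at the system table.
# The password matches the container definition above; the table's schema is assumed.
sqlcmd -S "localhost,1434" -U sa -P "P@ssw0rd!" -d msdb -Q "SELECT TOP (10) * FROM sys.sysbrickfiles;"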

How to Remove a Data Disk from an Azure VM (How not to blow your leg off)

I was working with a client recently where we had to reconfigure storage within a VM (which is always a messy proposition). In doing so, we were adding and removing disks from the VM. This all happened mostly during a downtime window, so it wasn’t a big deal to down a VM, which is how I assumed you had to remove a disk from a VM via the portal. However, upon further research, I learned that through the portal you can remove a disk from a running VM.

For the purposes of this demo, I’ve built a SQL Server VM with two data disks and a single disk for transaction log files. The SQL VMs use Storage Spaces in Windows, which is a recommended best practice–but even if you are not using Storage Spaces, most of this will apply.

How To Identify Your Disk

This is the really important part of this post–how to identify which disk is which, both in the portal and within your VM. When you define a data disk in the portal, either you or the system will assign a LUN number to the individual disk. You can see it in the portal in the below screenshot.

This number is mostly meaningless, except that within Windows, it lets you identify the disk. If you open up Server Manager and navigate to Storage Pools > Physical Disk, you can see where this LUN number shows up.

That number maps back to the values you see in the Azure portal, and it is the most reliable way to identify a disk, since your disks will otherwise look alike unless you size each of them differently (which you shouldn’t do, for performance reasons). If you aren’t using Storage Spaces, you can also see the LUN number in Disk Management in Windows, as shown below.

You can also get this information in PowerShell using the Windows cmdlet Get-PhysicalDisk.
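
Here is a minimal sketch of that, run inside the Windows VM. On Azure VMs the LUN typically shows up in the PhysicalLocation property, but verify it against the portal before trusting it.

# List the physical disks with the properties that help map them back to Azure LUNs.
Get-PhysicalDisk |
    Select-Object DeviceId, FriendlyName, MediaType, Size, PhysicalLocation |
    Sort-Object DeviceId |
    Format-Table -AutoSize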

It is very important to ensure that you have identified the correct disk before you remove it.

Removing the Disk

Azure will let you remove a disk from a running VM, even if that disk is mounted in the VM and has data on it. Yes, I just did this on my test VM.

If you click one of those highlighted Xs and then click Save, the disk will be removed from your VM. There’s also a series of PowerShell commands you can use to do this, as shown below. It is also important to note that at this point your disk is still an Azure resource. Even though you have removed it from the VM, the disk still exists and has all the data it had the moment you detached it from the VM.
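
This is roughly what the PowerShell path looks like, using assumed resource group, VM, and disk names; double-check the disk name against its LUN before you run Update-AzVM.

# Assumed names for illustration; verify the disk name/LUN first.
$rg = 'rg-sqlvm'
$vm = Get-AzVM -ResourceGroupName $rg -Name 'sqlvm01'

# Detach the data disk from the VM object, then push the change to Azure.
Remove-AzVMDataDisk -VM $vm -DataDiskNames 'sqlvm01-data-disk-2'
Update-AzVM -ResourceGroupName $rg -VM $vm

# The managed disk itself still exists (and still has its data) until you delete it.
Get-AzDisk -ResourceGroupName $rg -DiskName 'sqlvm01-data-disk-2' |
    Select-Object Name, DiskState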

If you chose the correct disk to remove from your VM, and you have confirmed that your VM is healthy, you can navigate into the resource group for your VM where you will see your disks.

The important thing to note is that the state of the disk is “Unattached”, which means it’s not connected to a VM, so it can be deleted from Azure–though I don’t recommend doing so until you have validated your VMs are running as expected.

You may ask how you can prevent disks from being removed from running VMs. I’ll write a post about this next week, but while you are waiting, read up on resource locks in Azure.
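
As a teaser, a delete lock on the resource group is a one-liner. This is only a sketch with an assumed resource group name, and a CanNotDelete lock guards against deletion rather than against detaching a disk; I’ll cover the details in that follow-up post.

# Prevent accidental deletion of anything in the VM's resource group (assumed name).
New-AzResourceLock -LockName 'no-delete' -LockLevel CanNotDelete `
    -ResourceGroupName 'rg-sqlvm' -LockNotes 'Protect SQL VM disks from accidental deletion'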

Azure SQL Managed Instance versus Amazon RDS for SQL Server—Which Should You Choose? (Or why Managed Instance is faster)

Microsoft, in conjunction with Principled Technologies, recently produced a benchmark comparing the performance of Azure SQL Managed Instance and Amazon RDS for SQL Server. I normally really dislike these benchmarks—it can be really hard to build proper comparisons, and the services frequently don’t have perfectly equivalent service tiers, which makes their performance really hard to compare. In fact, when I was reading this benchmark, I saw something in the comparison of the two services that made my eyes light up. And then I realized it was a limitation of RDS.

I immediately saw that Azure SQL Managed Instance had 320,000 IOPS while AWS only had 64,000. Obviously Azure is going to crush any database benchmark with that difference. And then I did a bit more research and visited the AWS docs.

You’ll note that while Oracle on RDS does get up to 256,000 IOPS (I guess those customers have more money), SQL Server on RDS has a maximum of 64,000 IOPS. Needless to say, in this benchmark comparing the price/performance ratio, Managed Instance crushes RDS before you even add in the hybrid licensing benefits that Microsoft supports for Azure SQL services.

But Wait There’s More

While Managed Instance is by no means a perfect service, there are a number of reasons why I strongly recommend against running your database on RDS. Here are the main ones:

  • You can’t migrate a TDE-encrypted database using backup and restore—you have to extract a BACPAC and import it into a database in the service
  • The native backup solution doesn’t support restoring a database to a point in time
  • You can’t deploy cross-region, meaning there is no near real-time option for disaster recovery
  • There is no instant file initialization, which can make some restore and file growth operations extra painful

Those are the major concerns; add the licensing concern of not being able to use Developer Edition for your workloads, and your overall costs to run an environment are going to be a lot higher if you use RDS.

When Not to Choose PaaS?

While RDS has a lot of costs associated with it, and its performance is limited, there is a price/performance/data volume curve that I feel applies to both the Azure and AWS platforms. If you need high-end storage performance on Managed Instance (which means using the Business Critical service tier) and your data volume is more than a terabyte, you have to scale your Managed Instance to 24 cores. If your volume is more than 2 terabytes, you need to scale to 32 cores, and if you need more than 8-16 TB, you will need to scale to 80 cores, which will cost close to $30,000/month. I perfectly understand why this is the cost model—the storage is stored locally on the VM itself rather than remote—so Microsoft can’t put other VMs on that piece of physical hardware.
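
For context, here is a minimal sketch of what that kind of scale-up looks like with Az PowerShell, using assumed instance and resource group names; the operation itself can take a long time on Business Critical.

# Assumed names; scaling a Business Critical instance to 32 vCores and 2 TB of storage.
Set-AzSqlInstance -Name 'my-managed-instance' -ResourceGroupName 'rg-sqlmi' `
    -Edition 'BusinessCritical' -VCore 32 -StorageSizeInGB 2048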

What Should You Do for SQL Server on AWS?

If you need to run SQL Server on AWS, what should you do? The answer is to use EC2 VMs. Sure, you lose the minor benefits of having your servers patched and the limited benefits of the backup feature, but you have more granular control over your IO performance and your overall configuration.

Tl;dr Azure SQL Managed Instance delivers a lot more throughput than Amazon RDS for SQL Server, so your workloads will run a lot faster on Azure.