Here’s my notes from this morning’s SQL Server 2016 security session by Jakub Szymaszek and Joachim Hammer at Ignite 2015 in Chicago:
The first previews of SQL Server 2016 this summer will include the Always Encrypted, row-level security, and dynamic data masking features.
Dynamic Data Masking
Say you need to give developers access to your production database, but you’ve got personally identifiable data in there like social security number or credit card number. For example, say you want to refresh your development server by restoring the production backups onto it.
The old way was to immediately scramble (or “mask”) the data after you restore the backup. That’s time-intensive and hard work, ripping through all of that data on disk. Generates a lot of writes and transaction log activity.
The new way, Dynamic Data Masking, will mask the data on the fly as your queries run. It’ll be available in both SQL Server 2016 and Azure SQL Database, but the demo was done in Azure SQL Database. (Very nice-looking UI for it, by the way.)
You define privileged logins, and everybody else gets the masked data. This is set at the login level, unlike row-level security (which is done via functions and application settings like context_info.) This does mean that web apps with connection pools using a single login for all database access won’t work as-is with dynamic masking.
For Azure SQL Database, only SQL logins are supported, not Azure Active Directory. For SQL Server 2016, both SQL logins and AD logins (and groups) will be supported.
Then you define masking rules – for a table, for a column, you can use a masking field format. Masking formats let you preserve data formats similar to the source, such as the current date/time for date/time fields, or firstname.lastname@example.org for email addresses, or you can build your own masking formats.
Dynamic masking does not affect your ability to query data – for example, if you select all users with a birthdate of 1975/06/06, the rows returned will match that date, but the birthdates shown will be masked.
Always Encrypted: Handling PII in the Database
Transparent Data Encryption (TDE) encrypts data at rest, before you write it to disk. That’s good, but the evil, nasty, corrupt database administrator can still query it and sell it to fund that beautiful classic Singer he’s been eyeing.
SQL Server 2016’s new Always Encrypted feature lets the developers encrypt and decrypt personally identifiable data (email, SSN, credit card number, etc) in the application, automatically, in ADO.NET. This way, only the developers get to sell the personally identifiable data. This also marks my last day as a DBA, because I’ll be learning C# to finance my car collection. But I digress.
The keys are never given to SQL Server. All encryption and decryption are done in the application. The “trust boundary” stops with ADO.NET – SQL Server is considered untrusted.
To implement it, the security admin (not necessarily the DBA) will:
- Generate a column encryption key (CEK) and a column master key (CMK)
- Encrypt the CEK and store them application-side (a certificate store, HSM, Azure Key Vault, etc)
- Store the master key securely (under your bed)
- Hand the encrypted CEK to the DBA, who will upload it to the database (it’s inside the user database, so it will automatically propagate to AlwaysOn AG replicas)
The encrypted keys will be exposed in new system tables, plus in SSMS under the database’s Security tree, under Always Encrypted Keys.
There will be two kinds of encryption available in SQL Server 2016:
- Randomized: if you encrypt the same value (“12345”) twice, you’ll get two different encrypted values. Great for super-duper-security, but useless if you need to join between tables or do GROUP BYs.
- Deterministic encryption: if you encrypt the same value multiple times, you’ll always get the same encrypted values. Less secure, but lets you do equality operations, WHERE clauses, SELECT DISTINCT, GROUP BY, etc.
If you try unsupported operations like range searches, arithmetic, pattern operations, etc., the query will fail. When they say unsupported, they mean the query simply won’t work, not that it will be slow.
When creating a table, there’s a new field-level parameter like this:
ENCRYPTED WITH (ENCRYPTION_TYPE = DETERMINISTIC, ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256', COLUMN_ENCRYPTION_KEY 'MYSUPERSEEKRET'
You get to pick the encryption type (deterministic or random) at the field level because in a single table, you might need deterministic for birthdate, and randomized for SSN. You have to know your workload and how your app queries the fields, since you can’t join/group/distinct the randomized ones.
Migrating to Always Encrypted, and Dealing with the Results
To encrypt your existing data, you have to build a new table, export the data out of SQL Server, and push the data back in. The encryption is simply not done in SQL Server, full stop – it’s only done in the client libraries. This means you’ll be looking at an outage to implement this in an existing database, and most likely using a tool like SSIS 2016, which will have the new ADO.NET driver and thus the encryption/decryption capabilities.
Once the data is encrypted, and you query it from an application that doesn’t have the encryption set up, the returned datatype will be a varbinary.
This is a really important concept, so I’ll put it another way:
When you run SELECT BirthDate FROM dbo.Users:
- Run from an encryption-aware app with the keys: BirthDate will be a DATETIME
- Run any other application: BirthDate will be a VARBINARY
Make sure you’re ready for that when you move to Always Encrypted.
Row-Level Security in SQL Server 2016
I blogged about RLS a few months back when it was announced for Azure SQL Database, and the same concepts apply here. Go read that if you’re new to the concept first, and I’ll only cover the updates and changes here.
RLS’s security policies have to use schema binding to make sure malicious users don’t drop the columns used by the filter predicate.
The security identifier in the demos uses CONTEXT_INFO(), which brings some security risks and limitations. From the stage, they acknowledged CONTEXT_INFO has been limiting in the past, but they’ll have announcements about that in the next couple of weeks. It’ll be more like an XML property bag.
Transparent Data Encryption in Azure SQL Database
Available on v12 servers, all SQL DB editions. “Security shouldn’t be a differentiator between editions – every edition of SQL Server should be secure.” Woohoo! Wonder if that will carry through to the boxed product.
Uses Intel’s AES-NI hardware acceleration, so it’s fast.
Azure manages your keys, so it’s a service-managed TDE. The development team got nicknamed the click-click-done team because implementation is so easy. (They showed a demo, and it’s just a GUI switch – turn it on, encryption happens, you’re done. Nice.) You can also do it with T-SQL in two lines – create the encryption key, and alter database set encryption on. You don’t get access to the keys – and you don’t need them, since you can’t restore an Azure SQL Database onto your own on-premise instances.
The gotcha there is that you can’t rotate the keys yourself. They’ve heard that complaint, and they have a new version coming later this summer to help.
There’s a performance hit, but it’s hard to quantify because it depends on your workload. As with the boxed product, when you turn on TDE, TempDB is also encrypted.
Summary: Lots of Neat Security Stuff Coming.
It’s hard to get me excited about security, but there’s a lot of neat stuff happening here, and they’re features that real world users will appreciate. There’s implementation challenges and performance monitoring challenges across the board, but it looks like a heck of a v1.