The Trouble with Keys

Last Updated September 25, 2012

Scaling up is hard: big hardware gets expensive fast. Scaling out is equally difficult; interesting design problems creep in to scale out solutions. One of the more troublesome issues architects face when scaling out is the issue of distributing identity. It’s often advantageous for object identity to be unique and portable across all database instances in an application – we may need to move a user from server A to server B. Once multiple database servers are involved, a centralized sequence generator can be come a single bottleneck or may even run out of values under heavy load. A solution is needed to generate unique values outside of the database while avoiding the performance problems of random, or semi-random, GUIDs.

Sequential Beginnings: Identity and IDENTITY

In the beginning, there was an empty schema. At this point an architect returned from the coffee pot and made a decision about database identities or GUIDs. There many valid reasons to make a decision in either direction – identities or sequences are controlled in the database and can be used to ensure a physical order to data. Who knew that spinning hard disks work better when data is sequential? (By the way, sequential GUIDs may not be sequential.)

You can make a lot of arguments for the right or wrong way to do things from a logical perspective, but DBAs do have a pretty good point when they say that randomness can cause problems for database performance. Generating sequential identifiers in the database may not be the most elegant solution to identity problems, but it does ensure that data is written in an order that makes sense in a world of spinning disk drives.

Database controlled sequential identity has one problem: sequential identities will need to be generated on a single server. Under sufficient load, that server will become a bottleneck for application scalability. To move past a single server as a bottleneck, a more robust and load tolerant solution is needed.

Distributing Identity: The Case for GUIDs

Architects and developers may be thinking that identities are great, but what happens when everything gets further apart?

As applications grow (or even by design), it becomes common to see teams become worried about distributing workload across many servers, using queues, or even splitting the data out into multiple databases or database servers. It’s at this point in the discussion that things get heated and people start throwing around the idea that GUIDs are the only way to solve this problem. Or that you just can’t rely on identity from the database and application generated identity is the only identity that matters.

This is where the war about numbers vs GUIDs gets nasty and someone gets their feelings hurt. Ignoring the size of GUIDs, I can say that I’ve witnessed several GUID collisions in production systems. GUIDs are only theoretically unique – they may even create a problem that you didn’t know you had.

A Better Solution for Distributed Identity

Combining distributed identity and ordered data seems like it’s a hard problem. Random GUIDs can’t be guaranteed to be unique, sequential GUIDs can’t be guaranteed to be non-overlapping, and database generated identities require a persistent connection to a database (or else they require a looser idea of identity than some folks are comfortable with).

Moving away from the database as the central keeper of all knowledge and identity is difficult and many teams seem to equate moving identity out of the database with moving all logic and functionality out of the database. This doesn’t have to be the case. The database can still be used to provide a considerable amount of declarative functionality and logic but identity generation can be moved outside of the database.

Twitter and Boundary have solved the problems of distributed sequencing by moving the work away from the data tier. Both solutions solve the problem by treating a number as if it were an array of information. The first portion of a sufficiently large number is a timestamp; the timestamp is stored as the number of milliseconds since a previous point in time. The next number is a worker identifier – this can be anything that uniquely identifies the device generating the sequence. Finally there’s a sequence itself. The sequence is typically small (between 8 and 16 bits) and it starts counting again from 0 every time the millisecond counter changes.

The machine identifier, usually the MAC address, doesn’t matter specifically, but we do need to be able to reliably generate separate sequences of IDs. This doesn’t have to be a MAC address, it could be any number of bytes that identify a unique source of sequences. By storing milliseconds since epoch as the first portion of the key, we’re able to produce a sequence that’s mostly ordered with some random jitter at the intermediate levels. On the whole, though, our inserts will be ordered.

If you’re on the .NET side of the house, I solved this problem with a library called Rustflakes. The implementation is lifted wholesale from Boundary’s flake and the generated sequence values are a .NET decimal which lines up with SQL Server’s DECIMAL data type – it’s nothing more than an ordered 128-bit number. Which, coincidentally, is the same size as a GUID.

Wrapping it Up

There’s no easy solution to this problem. GUIDs are an easier approach, but they introduce additional load and maintenance overhead on the data storage mechanism. Distributed sequences don’t solve all of the problems of GUIDs, but they provide additional flexibility for database administrators, developers, and application architects alike.

Microsoft SQL Server 2012 Always On AGs at StackOverflow

Why Your SQL Server Cluster Shouldn’t Be Virtualized

22 Comments. Leave new

Kenneth Fisher
September 25, 2012 9:38 am

The solution we came up with many years ago was very similar to the distributed identity. We knew we would never have more than 10-20k values in a given location so we started the identity column in each location at a multiple of 100k.

So location 1 had numbers like 100001, 100002, 100003 etc
Location 2 200001, 200002, 200003 etc

We were able to set this up on a couple of dozen machines and never had a crossover.

Obviously this wouldn’t work for someone like twitter where they have such large numbers even splitting out a bigint might not be safe but for smaller values it worked quite well.

Reply
Michael J Swart
September 25, 2012 9:55 am

You’ve witnessed several GUID collisions. Did you ever figure out why? Although the effect is the same, I understand that when something like that happens, it’s more likely that the algorithm creating the GUID was not random versus the odds that two random GUIDS collided.

In other words, something weird’s going on. I guess when using rust flakes, an admin has to be vigilant that two machines have different MAC addresses? (I’ve seen that in production!)

Reply
- Jeremiah Peschka
  September 25, 2012 10:05 am
  
  In theory, the GUID generation algorithm should be random. In practice, we were using NEWID() as a default value. In practice, random isn’t always random.
  
  You would have to make sure that machines have different MAC addresses or, if you can’t guarantee that’s the case, pull a different form of arbitrary identity that’s 6-bytes in size. This could be a random number generated at boot, 6 bytes of the CPU identifier, anything at all that you can use to provide an arbitrary machine specific identifier.
  
  Reply
  - Michael J Swart
    September 25, 2012 10:14 am
    
    Thanks Jeremiah. That’s good info!
    
    Reply
  - Jeremy Huppatz
    October 1, 2012 9:16 pm
    
    Actually… there’s nothing unrandom about GUID collisions. If I give you 10 random numbers between 1 and 10, the chances of at least one duplicate are pretty high. Randomness of GUIDs isn’t the problem. It’s UNIQUENESS (or lack thereof) that creates issues.
    
    For a GUID-driven identity solution to work well, there’s a simple check you can do after you call NEWID() – check the key against a distributed (federated) view of all of your keys and see if the GUID you’ve created already exists somewhere in your system and call NEWID() again if you need to. Yes, it’s a less than perfect system, as this will have some performance issues compared with blindly trusting the algorithm. However, it makes more sense than trying to debug a key collision later.
    
    For the record, my preferred approach is to use a compound key where you have an IDENTITY constraint or a rowversion key that you combine with a server ID (that can be made specific to each DB server in your federation group – this can typically be a small-int lookup to a “DatabaseServers” federated view in which each server shares it’s own ID in the view) and a date field. This gives you some flexibility on how you order your reads from the federated view (date, ServerID + ID) and requires very little local overhead on each SQL Server within your federation group.
    
    Reply
Jay Beta
September 25, 2012 10:02 am

Great point and method. Thanks for sharing!

Reply
Jay Beta
September 25, 2012 10:05 am

@Michael, By default MAC addresses are unique. While theoretically applications exist that can alter the MAC address of a network interface they are usually reserved for special troubleshooting or configurations. Of course its always prudent to verify all configuration settings but just FYI, MAC addresses should be in the latter part of that list.

Reply
- Michael J Swart
  September 25, 2012 10:10 am
  
  I hear you Jay, but I would say instead “By default, MAC addresses should be unique”.
  
  A manufacturer could mess up the production of network cards for example.
  
  I would agree that MAC addresses are probably the best bet for uniqueness, but like you said, it’s “prudent to verify”.
  
  Cheers
  
  Reply
Darren
September 26, 2012 6:13 pm

Interesting Article.

I implemented an extremely similar Identity format from the client side using a Int64 (bigint) that I refered to as an LSID (Long Sequential IDentifier)
The first 32bits were a UTC timeoffset from #1/1/2075#, followed by 16 bits being a machine identifier, followed by 16 bits effectively being an increment that loop when the max is hit.

It works great because the items generated from a single machine are always sequential and guaranteed to be unique due to how the Increment is done.
From a server wide perspective (or multiple servers) the identifiers are very close to sequential due to the timestamp part of the ID so re-indexing and fragmentation is minimal.
It is also very handy to pre-generate ID’s instead of having the database generate them.
The Identifiers are still quite efficient and fast, and easy to work with in .NET since they are int64.

The limits come to that a single client can only create 64,000 identifiers per second (50x more then plenty) and that there could still be a potential overlap if the Machine IDs are duplicated (which can also be corrected by allocating them).

Generating them from the SQL Server is another matter, but under a FEW mostly manual cases you can generate ID’s using a fairly simple psudo random function …
CAST(CAST(DATEDIFF(second,’2075-01-01′,GETUTCDATE()) AS BINARY(4))+CAST(NEWID() AS BINARY(4)) AS bigint)

You can also cast the first 4 bytes of the int64 back to an offset and add to the date 1/1/2075 to see the created time of the ID itself.

Reply
Idan
October 17, 2012 2:42 pm

I don’t get it, if the data that is inserted as a primary key is already there, the database will throw an exception, so just try to insert the data again. Then you can use much shorter guids and not worry about anything :] Am I right or nor?

Reply
- Jeremiah Peschka
  October 17, 2012 3:08 pm
  
  The database will throw an error and, hopefully, the programmer is catching the exception. At the point the programmer catches the exception they would then need to examine the exception, look at the error code, and then determine the best course of action – retry or bail out. Most code just says “There’s an error, oops!”
  
  I’m not following you about “shorter GUIDs” – A GUID is a 128-bit number that’s generated using a SHA-1 hash, preferably. There’s no way to make a shorter GUID.
  
  Reply
  - Idan
    October 17, 2012 4:25 pm
    
    I mean a sequence of characters like one youtube uses or short URL services, it’s not guid but I like the short sequence of characters for friendly URLs
    
    Reply
Idan
October 17, 2012 2:47 pm

Another option is to match the generated number based on some of your data. For example, if an article is submitted by a user and the username is fixed in your application, you can translate the username to numerical data or even email address (but ecrypt it in some way) and you can achieve 100% unique id as long as the user doesn’t write millions of articles per second :]

Reply
- Jeremiah Peschka
  October 17, 2012 3:15 pm
  
  This would be a good first attempt at generating unique identifiers. This approach is somewhat naïve in that you’re effectively going to be converting a somewhat random string into a somewhat random number and then adding some arbitrary chaos to it. Even then, so called smart keys like this can be problematic because your mechanism for translating a username to numerical data (a hashing function, if you will) needs to avoid collisions. The hash function will further randomize your insert order and that’s going to cause some performance problems inserting into a B-tree data structure. In other systems, where order isn’t a factor, hashing keys can be a good approach, but you need to be very careful before considering it for SQL Server.
  
  Reply
Idan
October 17, 2012 4:22 pm

Can you take a look at how pinterest does that in the pin URL, I want to implement the same way.

Reply
Idan
October 17, 2012 7:27 pm

I’ve decided to go with uuid_short() as primary key. I need to make sure that each MySQL server has different server-id (<=256). What do you think? – Furthermore, I am still having trouble decide whether to use bigint or binary(16) for storing the uuid_short number.

Reply
- Jeremiah Peschka
  October 17, 2012 8:27 pm
  
  I think you’re on your own here. There’s a lot to factor in about your application design (which I know nothing about), database design (likewise), and MySQL expertise. A discussion about those things is too long for the comments of a blog post. If you’d like, you can hit me up via the contact form and I can let you know what a discussion around those things would look like.
  
  Reply
Jesse Reich
October 18, 2012 6:39 pm

This is very interesting. Do you risk the possibility of collisions if somehow the time on the server is set backwards?

Reply
- Idan
  October 18, 2012 7:21 pm
  
  Why does the time of the server is set backwards?
  
  Scalability most important here and with AI I just can’t see how my DB scales. Many companies use this method produce sequenced keys. IF you have any other good idea how to generate sequence keys without the internal clock please.. suggest something better.
  
  Reply
- Jeremiah Peschka
  October 18, 2012 8:31 pm
  
  There’s definitely a possibility of collisions; clock drift is a very real and nasty thing. In RustFlakes I throw an ApplicationException when it’s detected that the clock is running backwards.
  
  In addition to checking with code, it’s best to elect one local server as the NTP server and have all other key generating servers sync with the time master. It’s a difficult problem, but not attempting to solve it results in some potentially nasty scenarios.
  
  Reply
Idan
October 19, 2012 5:21 am

1) What the difference between Rustlakes and UUID_SHORT() function in MySQL?

2) IS the code is good when running on Amazon EC2? ,what happens when I relaunch a server stop/start, I get a new machine, does it matter or it’s ok, because the Mac address will change
‘
Thanks

Reply
- Jeremiah Peschka
  October 19, 2012 8:26 am
  
  1) I don’t know anything about MySQL. The fantastic manual explains exactly how this code works. I’d be worried that under heavy load the database will become the bottleneck.
  2) Correct. As long as you aren’t hard coding however you identify a machine, you are safe in a virtualized/cloud environment.
  
  Reply