Method 1, Bad: ORDER BY NEWID()
Easy to write, but it performs like hot, hot garbage because it scans the entire clustered index, calculating NEWID() on every row:
That took 6 seconds on my machine, going parallel across multiple threads, using tens of seconds of CPU for all that computing and sorting. (And the Users table isn’t even 1GB.)
Method 2, Better but Strange: TABLESAMPLE
This came out in 2005, and has a ton of gotchas. It’s kinda picking a random page, and then returning a bunch of rows from that page. The first row is kinda random, but the rest aren’t.
SELECT * FROM StackOverflow.dbo.Users TABLESAMPLE (.01 PERCENT);
The plan looks like it’s doing a table scan, but it’s only doing 7 logical reads:
But here’s the results – you can see that it jumps to a random 8K page and then starts reading out rows in order. They’re not really random rows.
You can use the ROWS sample size instead, but it has some rather odd results. For example, in the Stack Overflow Users table, when I said TABLESAMPLE (50 ROWS), I actually got 75 rows back. That’s because SQL Server converts your row size over to a percentage instead.
Method 3, Best but Requires Code: Random Primary Key
Get the top ID field in the table, generate a random number, and look for that ID. Here, we’re sorting by the ID because we wanna find the top record that actually exists (whereas a random number might have been deleted.) Pretty fast, but is only good for a single random row. If you wanted 10 rows, you’d have to call code like this 10 times (or generate 10 random numbers and use an IN clause.)
/* Get a random number smaller than the table's top ID */
DECLARE @rand BIGINT;
DECLARE @maxid INT = (SELECT MAX(Id) FROM dbo.Users);
SELECT @rand = ABS((CHECKSUM(NEWID()))) % @maxid;
/* Get the first row around that ID */
SELECT TOP 1 *
FROM dbo.Users AS u
WHERE u.Id >= @rand;
The execution plan shows a clustered index scan, but it’s only grabbing one row – we’re only talking 6 logical reads for everything you see here, and it finishes near instantaneously:
There’s one gotcha: if the Id has negative numbers, it won’t work as expected. (For example, say you start your identity field at -1 and step -1, heading ever downwards, like my morals.)
Method 4, OFFSET-FETCH (2012+)
Daniel Hutmacher added this one in the comments:
DECLARE @row bigint=(
SELECT RAND(CHECKSUM(NEWID()))*SUM([rows]) FROM sys.partitions
WHERE index_id IN (0, 1) AND [object_id]=OBJECT_ID(‘dbo.thetable’));
ORDER BY (SELECT NULL)
OFFSET @row ROWS FETCH NEXT 1 ROWS ONLY;
And said, “But it only performs properly with a clustered index. I’m guessing that’s because it’ll scan for (@rows) rows in a heap instead of doing an index seek.”
Bonus Track #1: Watch Us Discussing This
Ever wonder what it’s like to be in our company’s chat room? This 10-minute Slack discussion will give you a pretty good idea:
Spoiler alert: there was not. I just took screenshots.
Bonus Track #2: Mitch Wheat Digs Deeper
Want an in-depth analysis of the randomness of several different techniques? Mitch Wheat dives really deep, complete with graphs!