How does ORDER BY ABS(CHECKSUM(NEWID())) work?
I have stumbled upon a random code ID generator and I can't figure out how it works.
SELECT TOP (5) c1 FROM ( VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('3'), ('4'), ('5'), ('6'), ('7'), ('8'), ('9') ) AS T1(c1) ORDER BY ABS(CHECKSUM(NEWID()))
I have looked at the following:
- select NEWID() -> E8E142CC-A918-4776-AA99-2D33DC80FE28
- select CHECKSUM('E8E142CC-A918-4776-AA99-2D33DC80FE28') -> -2089106226
- select ABS(-2089106226) -> 2089106226
If I then try and do an "Order by 2089106226" it obviously fails
So what is "ABS(CHECKSUM(NEWID()))" returning and what is this type of ordering called?
An ORDER BY clause doesn't have to sort on a specific column as you've discovered, but rather it can contain any arbitrary expression. I don't believe there is any special name for this, it's just a dynamic value in the order_by_expression part of the ORDER BY documented here
A non-column expression in the ORDER BY might be something like
ORDER BY a_column % 5
to sort rows based on a column's value modulus 5, or like
ORDER BY CASE WHEN LEFT(a_column, 1) = 'Z' THEN 0 ELSE 1 END ASC, a_column ASC
to force all the values in column beginning with Z to sort ahead of all others, but otherwise be alphabetical, by ascribing a 0 to Z rows and 1 otherwise (where zero sorts ahead of on).
In this case, the expression is a common method of randomizing row order in T-SQL:
The NEWID() function will be called for each row returned by the SELECT statement, which results in 5 separate GUID values. From there, each of those 5 is passed to CHECKSUM() which returns an index hash value which happens to be an integer. ABS() finally forces it to be a positive integer.
So the values that are sent ultimately to ORDER BY is just a list of integers derived from multiple calls to NEWID(), and ORDER BY has no problem sorting integers. In the end it isn't really different from sorting a column of integer values - it's just that these are generated at query time.
If you run this demonstration on SQLFiddle a few times, you'll get a different sort order for the otherwise incrementing id on each execution. So it looks like the reason the author of the random code generator SQL in question used that ORDER BY is to shuffle a random set of 5 characters chosen from the VALUES () list. If you run the code generator SQL and comment out the ORDER BY, it will always return the rowset A,B,C,D,E. With the ORDER BY in place, it returns a randomized set.