An interesting question came up in a private conversation. If you want to pick a random value from a random generated result set and want to make sure that the row is really random, is the known ORDER BY NEWID() really random enough ? The point is that once you have one value and knowing the algorithm you could calculate the next value derived from the first value. So, what could you do to make this even more random ? To state something upfront, making it harder to guess the right value will have to be paid by extra computing cycles. No matter which way you choose, you will have to make sure to find the best way (in terms of computing investment and grade of randomness).

One possible solution is the following:

  • Generate two different NewId()s, order by these and generate a RowNumber from both of them
  • Afterwards join on the RowNumbers (as the highest values are the same) and pick one the value from the Joined result set order by another unique value.
  • If that is not random enough, you could create another RowNumber from another NewId() and join this to the second RowNumber and so on…. (Depending on the grade of paranoia :-)

A sample could be something like the following:

 

 

DROP TABLE dbo.Participant

GO

CREATE TABLE dbo.Participant
(
    participantId INT,
    FirstName VARCHAR(255),
    LastName VARCHAR(255),
    Speciality VARCHAR(MAX)
)

GO

INSERT INTO dbo.Participant 
VALUES 
    (1,'Luke', 'Skywalker', 'The force is strong with him') ,
    (2,'Hang', 'Solo', 'Made the Kessel run in less than 12 parsecs (But isn´t that a unit of distance and not time ?!)'),
    (3, 'Chewbacca', 'n/a', 'Doing something like R-R-R-R-R all the time') 


WITH ParticipantCTE(participantId,RowNumber1,RowNumber2,ThedeterministicGuid)
AS
(
    SELECT 
        ParticipantId,
        ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNumber1, 
        ROW_NUMBER() OVER (ORDER BY NEWID()) AS RowNumber2,
        NEWID() AS ThedeterministicGuid
    FROM dbo.Participant
)
SELECT TOP(1) P.ParticipantId
FROM dbo.Participant P
INNER JOIN ParticipantCTE N2
ON N2.participantId = P.participantId
INNER JOIN ParticipantCTE N1
ON N1.RowNumber1 = N2.RowNumber2
Order by N2.ThedeterministicGuid


/*

A sample run with 100000 iterations showed that is will be evenly dispersed.

ParticipantId Counted
------------- -----------
1             33712
2             33226
3             33062
*/

-Jens