Here’s a quick tip if you don’t want to try to use random and figure out seed, etc. If you’ve got a lot of columns in a table, it is pretty likely that the checksum for all those data values are going to be random across your primary keys.
I recently needed to create a 5% random sample of my equities, so just ran the following query:
select TradingSymbol from dbo.EquitySymbol where EquityType=0 and CHECKSUM(*) % 20 = 0
That got me 367 out of about 7300 possible rows. If you’re doing random sampling, I suggest the following table structure.
Then, you can create a random sample by inserting a sample ID into Data Sample and then insert from your selection into the DataSampleValue table:
insert into olap.DataSample select 'Random Sample 5% for EquityType=0' select @@IDENTITY
insert into olap.DataSampleEquity select 18, TradingSymbol from dbo.EquitySymbol where EquityType=0 and CHECKSUM(*) % 20 = 0
select * from olap.DataSample