Last year we worked with a customer on a proof of concept that involved populating and running some performance tests against a 20TB database. The POC was conducted using the following software and hardware:
As it turned out, the bulk of the data was contained within one partitioned table in particular and we ended up having to re-load that table a few times as we optimized the system and made changes to the disk layout.
During the POC, we documented a few tips and lessons along the way in a list that we've been maintaining internally. Now that we've started this blog, I thought it may be useful to publish this information for others that are building or working on very large database systems with SQL Server. Note this is not intended to be a complete list of VLDB considerations but just a list of configuration settings, commands and tips that were helpful in our POC.
Configuration
Partitioning
Parallelism / Performance
select t.session_id, r.command, s.parent_node_id, count(*) as tasks, sum(pending_io_count) as [IO],
sum(case t.task_state
when 'RUNNING' then 1
else 0
end) as [RunningTasks]
from sys.dm_os_tasks t
join sys.dm_exec_requests r on t.session_id = r.session_id
join sys.dm_os_schedulers s on t.scheduler_id = s.scheduler_id
where t.session_id > 50
group by s.parent_node_id, t.session_id, r.command
order by 3
-Scott Hulke, Microsoft Technology Center – Dallas