Just left the final shiproom meeting for SQL Server 2008, and am happy to say every team has signed off, so the product is now in the hands of manufacturing, and in process toward web and media availability for you. We've shipped! MSDN and TechNet subscriber downloads are now live, with more to come.
It's been less than three years since SQL Server 2005, but we're pleased to give you a great new release of SQL Server which not only adds fantastic new capabilities to your data platform, but also delivers broadly better performance. We used new industry standard benchmark workloads, and customer workloads, to drive us toward delivering better real-world performance...and we didn't take the easy road.
The best example of that is our use of TPC-E, a far more modern, realistic, and challenging benchmark workload than its predecessor. We used TPC-E to improve the scalability of our relational engine, in ways that should be more relevant to your own OLTP database workloads. We're proud of partners, like IBM, NEC, and Unisys, who were able to demonstrate great scalability, up to 64 cores, using SQL Server 2008 running TPC-E, and like Fujitsu-Siemens and Dell, who have shown industry leading price/performance. Results like these should also be more useful for system sizing.
Almost five years of effort by TPC members were invested in the development of TPC-E, and it shows. The workload uses synthetic data which is far more realistic, by modeling real-world data. And compared to its antique predecessor, TPC-E's schema has ~3X more tables and primary keys, 2X as many columns, and 4X more foreign keys. And here's a radical thought for a modern OLTP benchmark workload: include check constraints, referential integrity, and reliable storage. Don't customers actually put DBMS servers into production expecting that?
Because we believe so strongly TPC-E drives us to better meet your needs, I am announcing today this is the first release of SQL Server which will not include published TPC-C benchmark results. Like other great thoroughbreds, TPC-C had a great run, and we were proud to ride it while it was still relevant to customers. But today, we're turning that great old race horse out to pasture for a well-deserved rest.
In SQL Server 2008, we have also invested lots of effort to improve our data warehousing performance, and the performance of our BI services. SSIS, SSRS, and SSAS, each show many double-digit gains in performance, which we hope you will enjoy. The new world record we set with SSIS, loading 1 TB of data in under 30 minutes, gives you a sense of this commitment to BI performance. We are also proud of our first-ever TPC-H 10 TB result.
There are a couple other ways in which this release improves on our past work. First, we've focused more energy on improving 64-bit SQL Server's performance on x64 AMD Opteron and Intel Xeon architectures. Given the price and density of RAM, and great new x64 CPUs available, this is a perfect time to take a closer look at the performance of your database servers, and consider doing fresh deployments, or migrations, on x64 SQL Server, which can use the additional RAM for everything, not just the buffer cache. Second, we invested more heavily in performance regression testing, both in automation and in the breadth of our test coverage. These investments have been reflected in broadly positive feedback from the community as well as internal and external beta sites. While we'll always have more to do, we feel this release marks an important step forward.
Welcome to the beginning of a new era for SQL Server. We hope you enjoy working with SQL Server 2008 and look forward to your feedback.
-David Powell
SQL Performance Engineering
Today, as the old saying goes, is a red-letter day, with the launch of Windows Server 2008, Visual Studio 2008, and SQL Server 2008.
Our team has been heads down, working to ensure SQL Server 2008 is delivered to you with great performance and scalability. You’ll see signs of this in the new SQL Server February 2008 Community Technology Preview, which includes great new performance features in the engine, SSRS, SSAS, and SSIS, as well as just thumping good performance.
But don’t take my word for it: ask NEC, IBM, SAP, HP, and Unisys. Today, our partners are delivering proof this is the best release yet of SQL Server!
Here’s a quick round-up of the industry standard benchmark results our partners published today, Feb 27, 2008, using Windows Server 2008 and SQL Server 2008. Details of the Transaction Processing Performance Council (TPC) results can be found on www.tpc.org. More information on the SAP SD result is available on SAP’s web site.
- #1 TPC-E result of 1,126 tpsE at a cost of $2,771/tpsE, using a 64-core Intel Itanium-powered Express5800/1320Xf system. This result demonstrates the power of NEC’s architecture and the scalability of SQL Server 2008 for enterprise OLTP workloads since it is our first 64C TPC-E result
- TPC-E result of 479 tpsE at a cost of $1,591/tpsE, using a quad-socket, 16-core IBM x-Series x3850 M2 system. This is a 14% gain over IBM’s previous x3850 M2 TPC-E result with SQL Server 2005
- HP’s newest SD three-tier result of 34,000 users, which is #1 on quad-processor industry-standard servers, and is 88% faster than a previous quad-processor result on SQL Server 2005. This result shows the power of HP’s BL680C blade servers and Intel Xeon 7300 series processors. Did you ever think you’d see the day a blade could be expected to handle the workload volume of 97% of SAP deployments worldwide?
- And last, but not least: HP published today the first-ever TPC-H result on SQL Server at the 10 TB scale factor: 63,650 QphH at $38.54/QphH, using a powerful 64-core Integrity Superdome server with HP SAS storage. Unless you’re Walmart, the odds are high your DW is smaller than this!
Industry standard benchmarks are great, but unfortunately they don’t yet cover all usage scenarios customers care about. ETL is a key part of any production DW workflow, and we’ve been paying special attention to the performance of SQL Server Integration Services, our ETL tool included with SQL Server. With improvements to the core SSIS processing engine in SQL Server 2008, and improvements in 64-bit connectivity, we decided to take SSIS out for a spin, to show what it could really do. Along the way, we and Unisys set a new world record for loading over 1 TB of data in under 30 minutes, beating a previous result posted by Informatica. Check out Len Wyatt’s more detailed blog post on this. We’d sure love to see the industry come together to create a standardized ETL benchmark workload.
Lastly, some leading ISVs put SQL Server 2008 through its paces, and were very pleased with the result:
- Camstar showed world-record scale of 205 MES transactions/second and 60% space reduction when using SQL Server 2008’s database compression on Windows Server 2008
- Microsoft Dynamics CRM 4.0 demonstrated record scale at 24,000 concurrent users, with sub-second response rate, using SQL Server 2008 on Windows Server 2008
- Siemens Teamcenter 2007, SQL Server 2008, and Windows Server 2008 ran with 5,000 concurrent users, and 50% space reduction from database compression
- Microsoft Dynamics AX showed up to 70% improvement in throughput scalability and response time, maximizing performance while reducing database growth using SQL Server 2008 and database compression
Look for even more performance gains in the final SQL Server CTP, before we ship this summer!
Today at the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS. Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool. Over 1 TB of TPC-H data was loaded in under 30 minutes. I wanted to provide some background material in the form of a Q&A on the record, since it’s hard to give many details in the context of a launch event. We are also planning a paper that talks about all this, so think of this article as a place-holder until the full paper comes along. I hope you find this background information useful.
- Len Wyatt
How fast was the data load?
More than one terabyte of data was parsed from flat files, transferred over the network and loaded into the destination database in less than 30 minutes, a world record beating all previously published results using an ETL tool. That is a rate in excess of 2 TB per hour (650+ MB/second). To be precise, 1.18TB of flat file data was loaded in 1794 seconds. This is equivalent to 1.00TB in 25 minutes 20 seconds or 2.36TB per hour.
Why is this important?
Businesses have ever-increasing volumes of data stored in many heterogeneous systems. Thay want to know that an ETL tool they choose will be able to support any data volumes they might require. Microsoft has been making a significant investment in SQL Server Integration Services (SSIS), and this record illustrates the capability of SQL Server Integration Services 2008, SQL Server 2008 and the Unisys ES7000 to handle a significant volume of data at a dramatic speed.
Why not just do a bulk load of the data?
It is rare in businesses today that data is always available on the destination system, and does not need to be standardized or corrected for errors before loading. These rare cases are the times that bulk loading data makes sense. Data integration can involve complex transformation rules, error checking and data standardization techniques. ETL tools like SSIS can perform these functions such as moving data between systems, reformatting data, integrity checking, key lookups, tracking lineage, and more. SSIS has proven itself to be a versatile ETL tool, and now it is shown to be the fastest one as well.
What data did you choose to load?
DBGEN tool from the TPC-H benchmark was used to generate 1.18 TB of source data. The data were partitioned by DBGEN, allowing it to be loaded in parallel from multiple systems. DBGEN generates data on customers, parts, suppliers, orders and line items. It is broadly representative of a wholesale business. The data contain a variety of data types, including dates, money amounts, integers, strings and flags.
Please note that the ETL loading results are not TPC-H benchmark results and should not be compared to TPC-H benchmark results.
Was this a certified benchmark?
There is no commonly accepted benchmark for ETL tools. Microsoft thinks there should be. Industry standard benchmarks can lead to healthy competition, better products, and better publication of the techniques used to get high performance. Microsoft would welcome the opportunity to join with others in the industry to define a common benchmark that reflects the real-world uses of ETL tools.
The use of TPC-H data for this project was a convenience. This is not a TPC-H benchmark result.
How does this compare to your competitors?
Multiple competitors have published results based on TPC-H data. Informatica has the fastest time previously reported, loading 1 TB in over 45 minutes. SSIS has now beaten that time by more than 15 minutes.
There are other claims of fast times that have been made, but on non-standard data sets and without enough information to allow any meaningful comparison. This is part of the reason Microsoft would support the creation of an industry standard ETL benchmark.
What system configuration was used?
The database server ran on a Unisys ES7000/one Enterprise Server , with 32 socket dual core Intel® XeonTM 3.4 Ghz (7140M) processors , 256 GB RAM and 8 dual port 4Gbit HBA’s . The SQL Server data was stored on an EMC Clariion CX3-80 SAN with 165 (146 GB/15 krpm) spindles. The database server ran a pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4, built just before the “February 2008 CTP”) on the Windows Server 2008 x64 Datacenter Edition operating system.

Four servers acted as data sources, modeling the fact that data comes from a variety of systems in a modern enterprise. Each source server ran SSIS packages that sent data across the network to the database server. The source servers ran SSIS from SQL Server build V10.0.1300.4, on the Windows Server 2008 operating system. Source data came from flat files, as it was generated by DBGEN.
For the source servers, 4 Unisys ES3220L servers with Windows2008 x64 Enterprise Edition were used. Each server is equipped with 2 x 2.0GHz quad core Intel® processors, 4GB RAM, a dual port 4Gbit Emulex HBA and Intel PRO1000/PT network card. The source data was read from 2 x EMC Clariion CX600 SAN’s with 45 spindles each.
The Source servers were connected to the ES7000/one server database server with private dual port 1Gb Ethernet connections.
Why use multiple source systems?
Modern large businesses are complex operations. Large data sets are often the result of multiple data feeds. This made the test more realistic by mimicking a real world ETL scenario.
What do the SSIS packages look like?
There was just one package, though the source systems ran multiple instances of it. It is quite simple: There is one control flow for each “stream” of data generated by DBGEN. The control flow has one data flow for each table, each data flow reading data from a flat file source and writing to the SQL Server database via OLEDB. Using this data set there is a one-to-one column mapping between the flat file data and the database tables.

Did Windows Server 2008 figure in to this?
A lot of innovative engineering work in Windows Server 2008, including significant improvements in memory management, PCI and block storage I/O, and core networking, helped achieve this great performance. Because of these advances, Windows Server 2008 sustained about 960 megabytes per second over the Ethernet network, during processing of one large table.
Were secret internal tricks were needed to make this work?
No secret internal tricks or special builds were needed. Although this project used a pre-release version, it was a regular SQL2008 Enterprise Edition build. No special code in the product was used. Everything we did could be replicated by others.
The main thing done in the relational database was to use “soft NUMA” and port mapping to get a good distribution of work within the system. This is a published technique; you can find articles about it on MSDN. We also set the –x flag on starting SQL Server. This reduces the time SQL Server spends collecting performance statistics at run-time.
In SSIS we made sure the data types used in the SSIS data flows matched the types used in SQL Server, so the data did not need to be converted again after the initial conversion of strings read from flat files. Fast Parse is set on the text file fields where it applied.
The network connections on the server used the built-in Intel PRO/1000 GbE controllers. Released versions of network drivers were used, and Ethernet jumbo frames were configured to better support this bulk streaming scenario. Window Server 2008’s new TCP/IP receive window autotuning was set to “restricted”. The IntPolicy tool was used to ensure the ES7000 server NICs’s interrupts & DPCs occurred on a CPU affinitized to the same NUMA node as the NIC.
A complete list of settings and optimizations will be included in the paper when it is released.
I came across a question in the relationalserver.performance newsgroup where a customer was wondering about the spools seen in a recursive query execution plan. The query is shown below:
USE Northwind;
Go
WITH EmpChart AS
(
SELECT EmployeeId, ReportsTo, 1 AS treelevel
FROM Employees
WHERE (Employees.ReportsTo = 2)
UNION ALL
SELECT e.EmployeeId, e.ReportsTo, treelevel +1
FROM Employees e
JOIN EmpChart ec
ON e.ReportsTo=ec.EmployeeID
)
SELECT * FROM EmpChart;
The plan for the above query shows an index spool and a table spool. They are one and the same. The plan is shown below:
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1013]=(0)))
| |--Compute Scalar(DEFINE:([Expr1003]=(1)))
| |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo]=(2)))
|--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
|--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
| |--Table Spool(WITH STACK)
|--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo] as [e].[ReportsTo]=[Recr1006]))
The index spool is also a lazy spool here meaning rows get inserted into the spool during execution of the recursive part also. Additionally, the rows from the index spool is read using a stack-like mechanism otherwise the recursive part may visit the same rows again. Here is how to read the plan with the recursive query:
1. Start with the anchor / top-most part
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1013]=(0)))
| |--Compute Scalar(DEFINE:([Expr1003]=(1)))
| |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]),
WHERE:([Northwind].[dbo].[Employees].[ReportsTo]=(2)))
The anchor part of the recursive CTE first gets executed and the spool is created with index. Note the stack option also in the spool. This indicates that rows are read in a FIFO manner.
2. Next the recursive part of the query
|--Nested Loops(Inner Join, OUTER
REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
|--Compute
Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
| |--Table Spool(WITH STACK)
|--Compute
Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
|--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e]),
WHERE:([Northwind].[dbo].[Employees].[ReportsTo] as
[e].[ReportsTo]=[Recr1006]))
This is the nested loop join between the spool (created in step #1 for the anchor query) and the recursive part of the query. Since this is eager spool, rows will be populated into the spool also until the recursion is completed or the maximum level is reached. The maximum level check is done using the assert operator above the nested loop join:
|--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE
NULL END))
3. Now, the way to tell which spools are related or the same is to look at the properties of the spool operator in the query plan output. The index spool has a property called NodeId which will referenced by the table spool as PrimaryNodeId property in another part of the plan.
Lastly, SQL Server can also create a plan with an eager spool which can be seen below for the query. In case of eager spool, query execution can continue only after the eager spool has been fully created. This is different from the lazy spool.
select count(distinct ShipVia), count(distinct ShipCountry)
from Orders as o, Customers as c
where o.CustomerID = c.CustomerID;
1. To read, the plan we will start again from the top part which contains the eager spool population.
| |--Table Spool
| |--Hash Match(Inner Join,
HASH:([c].[CustomerID])=([o].[CustomerID]),
RESIDUAL:([Northwind].[dbo].[Customers].[CustomerID] as
[c].[CustomerID]=[Northwind].[dbo].[Orders].[CustomerID] as
[o].[CustomerID]))
| |--Index
Scan(OBJECT:([Northwind].[dbo].[Customers].[Region] AS [c]))
| |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]))
Here you can see the hash join between customers and orders table that populates the eager spool table.
2. The ShipVia distinct count is computed as follows by reading from the spool.
|--Compute
Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
| |--Stream
Aggregate(DEFINE:([Expr1010]=COUNT([Northwind].[dbo].[Orders].[ShipVia] as
[o].[ShipVia])))
| |--Hash Match(Aggregate, HASH:([o].[ShipVia]),
RESIDUAL:([Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia] =
[Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia]))
| |--Table Spool
3. Similarly, the ShipCountry distinct count is computed using the same spool. You can see this by looking at the NodeId and PrimaryNodeId properties of the spool operators in the query plan or showplan xml.
|--Compute
Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
|--Stream
Aggregate(DEFINE:([Expr1011]=COUNT([Northwind].[dbo].[Orders].[ShipCountry]
as [o].[ShipCountry])))
|--Hash Match(Aggregate, HASH:([o].[ShipCountry]),
RESIDUAL:([Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry] =
[Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry]))
|--Table Spool
4. Finally, since COUNT() aggregate always returns one row, we just do a nested loop join between the two parts of the tree above to return a row.
|--Nested Loops(Inner Join)
|--Compute
Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
....
|--Compute
Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
Hope this helps you read execution plans that contain the various spool operators.
--
Umachandar Jayachandran
When doing a performance investigation a useful thing to do is look at what data is present in the buffer pool. This can be used to analyze impact of running a query on the state of data pages in buffer pool. By collecting the pre and post picture of buffer pool, you can see the cost of running a query in terms of physical IOs that happened. You may argue that this can be done by looking at statistics IO output; however if you are running a series of queries and want to see a consolidated data and not data about individual queries, this query is a great help.
The contents of the buffer pool can also reveal which pages are accessed most frequently by your applications and often reflect the actual I/O that is happening. How can frequently access pages in memory also cause disk I/O? When lots of different objects are accessed overtime the proportion of data in the buffer pool reflects the frequency of access. This happens because data pages of infrequently accessed objects get kicked out of main memory over time.
If you are not familiar with the buffer pool, it contains several types of objects such as data pages and plans. For more information on the buffer pool see Buffer Management http://msdn2.microsoft.com/en-us/library/aa337525.aspx
The following query can be used to look at the contents of the buffer pool -
select
count(*)as cached_pages_count,
obj.name as objectname,
ind.name as indexname,
obj.index_id as indexid
from sys.dm_os_buffer_descriptors as bd
inner join
(
select object_id as objectid,
object_name(object_id) as name,
index_id,allocation_unit_id
from sys.allocation_units as au
inner join sys.partitions as p
on au.container_id = p.hobt_id
and (au.type = 1 or au.type = 3)
union all
select object_id as objectid,
object_name(object_id) as name,
index_id,allocation_unit_id
from sys.allocation_units as au
inner join sys.partitions as p
on au.container_id = p.partition_id
and au.type = 2
) as obj
on bd.allocation_unit_id = obj.allocation_unit_id
left outer join sys.indexes ind
on obj.objectid = ind.object_id
and obj.index_id = ind.index_id
where bd.database_id = db_id()
and bd.page_type in ('data_page', 'index_page')
group by obj.name, ind.name, obj.index_id
order by cached_pages_count desc
An example of what it returns –
1. Run following command to remove all clean data pages from the buffer pool – (DO NOT TRY THIS COMMAND ON PRODUCTION MACHINES)
DBCC DROPCLEANBUFFERS
Running buffer pool analysis query had following results –
cached_pages_count ObjectName IndexName IndexId
------------------ ------------------ ---------------------- -----------
15 sysobjvalues clst 1
3 sysallocunits clust 1
2 syshobtcolumns clust 1
2 sysrowsetcolumns clust 1
2 sysrowsets clust 1
2 sysschobjs clst 1
…
2. Run the following query on AdventureWorks database –
select * from Person.Address
where city like 'Bothell'
This is going to read from disk the data pages needed to execute the query. Run the buffer pool analysis query again to see the change.
cached_pages_count ObjectName IndexName IndexId
------------------ ------------------ ---------------------- -----------
278 Address PK_Address_AddressID 1
15 sysobjvalues clst 1
4 sysmultiobjrefs clst 1
3 sysallocunits clust 1
2 syshobtcolumns clust 1
2 sysrowsetcolumns clust 1
…
As you can see now there are data pages in buffer pool from the Address table. Additionally since only clustered index pages for Address table are present, no other indexes were used in the query.
Another tool which can help in this case is DBCC MEMORYSTATUS output. The advantage of the query in the entry is nice result set which can be stored in a temp table.
Authors:
Tony Voellm
Gaurav Bindlish
The data flow task in SSIS (SQL Server Integration Services) sends data in series of buffers. How much data does one buffer hold? This is bounded by DefaultBufferMaxRows and DefaultBufferMaxSize, two Data Flow properties. They have default values of 10,000 and 10,485,760 (10 MB), respectively. That means, one buffer will contain either 10,000 rows or 10 MB of data, whichever is less.
You can adjust these two properties based on your scenario. Setting them to a higher value can boost performance, but only as long as all buffers fit in memory. In other words, no swapping please!
- Runying Mao
Multi tasking is a double-edge sword. In SSIS, we don’t affinitize the threads that we create to any of the processors. So if the number of threads exceeds the number of available processors, you might end up hurting throughput due to an excessive amount of context switches. Be cautious!
- Runying Mao
SQL Server Integration Services provide a set of performance counters. Among them the following few are helpful when you tune or debug your package:
- Buffers in use
- Flat buffers in use
- Private buffers in use
- Buffers spooled
- Rows read
- Rows written
“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful to discover leaks. During package execution time, you will see these counters fluctuating. But once the package finishes execution, their values should return to the same value as what they were before the execution. Otherwise, buffers are leaked. In occasions like that, please contact Microsoft PSS.
“Buffers spooled” has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping. In a case like that, please follow my previous blog (“Set BLOBTempStoragePath and BufferTempStoragePath to Fast Drives”) to set Data Flow Task properties BLOBTempStoragePath and BufferTempStoragePath appropriately for maximal I/O bandwidth.
“Rows read” and “Rows written” show how many rows the entire Data Flow has processed. They give you an overall idea about the execution progress.
- Runying Mao
BLOBTempStoragePath and BufferTempStoragePath are two properties on Data Flow Task. They define where on the hard disk(s) to page buffer data to. BLOBs (text, next, image) are written to BLOBTempStoragePath. All other buffer data will be swapped to BufferTempStoragePath. See the figure below for where these two properties are exposed in designer.

Now what could cause a buffer to swap? There are two possible causes. The first one is when a memory allocation fails. The second one is when Windows signals the low memory resource notification event. Both will trigger SSIS to reduce its working set. SSIS does so by moving buffer data to disk.
When BLOBTempStoragePath is not set, the paths as defined by the system variables TEMP and TMP will be used. The same rule applies to BufferTempStoragePath.
For better performance, we recommend both BLOBTempStoragePath and BufferTempStoragePath point to fast drives. We also suggest that the drives for BufferTempStoragePath and BlobTempStoragePath be on separate spindles in order to maximize I/O throughput.
- Runying Mao
Introduction
OLE DB source adapter is one of the most commonly used components in SSIS data flow task. In this article, we will discuss a very important performance observation about this adapter.
Use “SQL Command” to pull data from a view
OLE DB source adapter can be set up to work in either “SQL command” data access mode or “Table or view” data access mode. See the figure below for where to set Data Access Mode in OLE DB Source Editor.
In most cases, whether you use “SQL command” data access mode or “Table or view” data access mode does not make any difference performance wise. But when you are setting up OLE DB source adapter to read data from a view, the performance difference can be huge. In such an occasion, we suggest you set data access mode to “SQL command” and specify the command as, for example, “SELECT * FROM view_name”. In our in-house testing, we have seen “SQL command” data access mode runs about 17 times faster than “Table or view” data access mode in certain scenario.
If you are interested in knowing the cause for such a difference, read on.
In “Table or view” access mode, the OLE DB source adapter calls OpenRowset to get column metadata at Validate phase. The data returned from OpenRowset include more than just column metadata. Thus the adapter issues “SET ROWCOUNT 1” statement to retrieve column metadata information. “SET ROWCOUNT 1” causes an inefficient execution plan (i.e. Nested Loop) to be cached and later used in the subsequent Execute phase.
In “SQL command” access mode, the OLE DB source adapter calls “sp_prepare” to get column metadata at Validate phase, and “sp_execute” at Execute phase. The execution plan used at Execute phase is Hash Match which is more efficient than Nested Loop.
Conclusion
This article has described a performance tip when setting up OLE DB source adapter to read data from a view. We suggest user to set OLE DB source adapter in “SQL command” data access mode for better performance in such a case. We hope you will find this tip useful and helpful in your package design.
- Runying Mao
Introduction
Most users see good performance from SSIS packages using out-of-the-box configurations and with little tuning. Sometimes, though, it is necessary to do tuning to get optimal performance. One of the most commonly used transformations is the Lookup transformation. There are several techniques for getting optimal performance from a Lookup transform in SSIS. A few of these are surprising the first time you see them. This article discusses some tuning techniques that we have found can be important in different situations.
A common need in a data warehousing context would be to look up a surrogate key based on one or more business keys, before a new row is inserted into the warehouse. The screen shot below shows a highly simplified example: An arriving order record contains a customer name and phone number. Before an order can be inserted in the database, the customer name and phone number must be looked up to find the correct customer key. This simplified example will be used as the basis for the discussion in this article.
Most designers would use the obvious default behavior of the Lookup transform, selecting a table or view to look up in, as shown below.
Be selective about the lookup columns
The example scenario showed selecting data from a table. After naming the table, the designer also chooses what columns will be used in the lookup and what columns will be returned as a result of the lookup. This is done in the Columns tab of the Lookup Transformation Editor. The figure below shows that the columns C_NAME and C_PHONE will be referenced from the table, and the data from C_CUSTKEY will be returned.
Only three columns from the ORDERS table are used in this example. However, choosing a table will be interpreted as “SELECT *”. This will cause the transformation to fetch un-needed data in the other columns that do not participate in lookup operation. That is a waste of effort and memory. We recommend choosing “Use results of an SQL query” instead of naming a table, and in the query selecting only the columns that are used.
This in turn changes the column mappings. Now it is clear that only the desired columns have been selected. In our experience this can lead to a substantial gain in Lookup performance. In one case we saw recently, using this technique resulted in a doubling of the package speed.
Enable memory restriction
Reducing the data returned by the Lookup has the advantage that less work is required to return the data, and it also reduces memory usage by SSIS. This is the next area for discussion: memory usage. Being more selective about the rows returned is one technique for saving memory; another is to enable memory restriction in the Lookup transform.
Two things happen when memory restriction is enabled on a Lookup: First, the amount of memory that the Lookup is allowed to use is limited. A cache policy is used and new rows are added to the cache on demand. Second, the new rows are added to the cache individually. In other words, SSIS will query the relational database whenever data for a row cannot be located in the internal cache kept by SSIS. These are singleton (single-row) queries, unlike the large set-based table queries that occur when memory restriction is not enabled. The next figure shows where in the Lookup Transformation Editor the setting is made. In the example, 20 megabytes are allowed for the Lookup cache. This is user-settable and must be adjusted to the needs of each lookup scenario.
The benefit of enabling memory restriction is that large lookups can be performed which might not be possible otherwise. The cost is that singleton queries are used, which collectively are usually slower than a single table query. However, we have found that this performance cost is not always as bad as one might expect, if appropriate indexes are present on the lookup table. In one recent experiment, we ran a package with and without memory restriction in a lookup. The lookup table for this package had millions of rows, and about 10% of them were actually referenced by the incoming data. When there was an index on the lookup table, the package with memory restriction ran only 25% longer than the package without memory restriction. When there was not an index on the lookup table, the package with memory restriction failed to complete after 24 hours.
Consider a case where incoming records must be looked up against a large table, but the incoming records match only a few of the records in the lookup table. It might actually be faster to enable memory restriction, thus avoiding a load of the entire table, and letting the few records be fetched individually. A decision to use memory restriction for this reason must be considered in the specific context of each application.
Conclusion
Most of the time you will get good performance from Lookup transforms with default settings. Sometimes, lookups can become significant in the performance of an SSIS package. This article has given a few tips that we have found make the biggest difference in lookup performance. We hope you find them useful in your package designs.
- Be selective about the columns you request in a lookup
- Enable memory restriction if the lookup is using too much memory
- Be sure you have an appropriate index on the lookup table when using memory restriction
- Consider enabling memory restriction if the lookup refers to a small number of rows from the lookup table
- Runying Mao and Len Wyatt
Other than #temptables, @localvariables and many other objects being stored in tempdb what’s so temporary about it? TempDB is always present on each SQL Server instance and can be a determining factor of your overall performance. In many ways tempdb could be named scratchdb or pagefiledb. The following papers will help you understand what goes into tempdb and how to plan your deployments. These articles are for both developers and administrators.
Working with tempdb in SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx
Capacity Planning for tempdb
http://msdn2.microsoft.com/en-us/library/ms345368.aspx
Enjoy..
Tony Voellm
Bob Ward from PSS has a wonderful blog article that explains the details about the re-release of SQL Server 2005 SP2 and fixes posted later. This is a must read for anyone deploying SQL Server 2005 SP2 to understand the various hotfixes, GDRs and procedures. Please visit his link for more details.
SQL Server 2005 SP2 Re-release and post fixes
-- Umachandar
Microbenchmarking is the art of testing your machine's basic configuration like disk latency, memory latency, CPU performance using series of tests. You can find more details at:
Types of benchmarks
How do you assess the performance of a new machine with SQL Server? Do you run industry standard benchmarks like TPCC and TPCH? If you cannot run those yourself then what are the tests that you run. What are some of the tests that can can assess metrics like disk IO performance, CPU speed or memory speed? You could use simple TSQL batches to access some of the metrics without running your workload. Below are some of the simple tests.
Disk Performance
Below are some of the steps you could take to determine the disk performance of your new hardware.
- Run DBCC DROPCLEANBUFFERS to ensure data is not resident in memory
- Do a SELECT COUNT(*) query using the clustered index of a table that is spread across the LUNs you want to test
- Run DBCC DROPCLEANBUFFERS again to ensure data is not resident in memory
- Next do a SELECT COUNT(*) query from the clustered index of a table that is spread across the LUNs you want to test, but now with a DOP 1
Memory Subsystem Performance
To determine the performance of your memory subsystem, you can run the query with MAXDOP 1 as referenced in step #4 above after all of the data you scan is resident in memory. This will give you the memory subsystem performance.
CPU Performance
To determine the CPU difference, you can select the same single row from a table in a loop or do some simple increment operation in a loop. This should also be done when all of the data you scan is resident in memory.
Network Performance
To determine the network latency, you can run the same CPU test from a remote client and send requests from the client in a tight loop.
During all of these tests, monitor the following using the PerfMon counters:
- Response Time
- CPU usage (on client and server)
- IO throughput
- Network throughput
These are simple tests to perform and metrics to keep track of could be a valuable tool in your toolkit. Each individual test above and also in comparison with each other constitutes a micro benchmark and could be a useful guide in optimizing queries and making tradeoffs.
One test missing form above is something that tests seek performance of the disk subsystem. There are several tests that can be added.
What other tests do you use and why? What metrics do you keep track of? Please share them with us and the rest of the SQL community.
-- Ashit
Check out the new Books Online search functionality online. The link below provides a scoped search of Books Online that returns a more precise and targeted result set. You can use it to search Books Online content quickly.
http://search.live.com/macros/sql_server_user_education/booksonline
Please direct your feedback to SQLServerUE@hotmail.com.
-- Umachandar