
SQL Server Performance

Best Practices, Tips, Benchmarks, Troubleshooting and Monitoring - SQL Server, ADO.NET, Analysis Services, and SSIS
New TPC Results on SQL Server 2008 R2

On November 2, 2009, Unisys published two new TPC results (TPC-E and TPC-H) using the newly announced SQL Server 2008 R2* and Windows Server 2008 R2**.  These results were published on the 96-core Unisys ES7000 system.  These are the first SQL Server TPC results using more than 64 cores and they demonstrate the outstanding scalability of the Windows and SQL Server platform. 

The TPC-E publication was 2,012 tpsE with a price performance of $958.23 USD/tpsE.  The configuration will be available by May 6, 2010.  The TPC-H publication was 102,778 QphH@3000GB with a price performance of $21.05 USD/QphH@3000GB.  This configuration will also be available by May 6, 2010. 

You can see all the details for these outstanding results on the TPC web site at the following links:

Unisys ES7000 7600R Enterprise Server – TPC-E

Unisys ES7000 7600R Enterprise Server – TPC-H

* – SQL Server 2008 R2 will be generally available by May 6, 2010.

** – Windows Server 2008 R2 is generally available now.

- Jamie

Great New TPC-H Results with SQL Server 2008
 

Our partners at Dell and HP have published two new TPC-H results with SQL Server 2008 using Fusion-io’s ioDrive solid-state technology.  These are the first TPC results using solid-state technology, and they illustrate the high performance and total cost-of-ownership savings you can achieve.  They dramatically improve price/performance for systems built from industry-standard components, showing that solid-state technology can reduce costs without sacrificing performance.

HP also published a 300GB result on their ProLiant DL785 platform with SQL Server 2008.  This publication illustrates the high performance and solid price/performance using industry standard components from HP and Microsoft.   

Please follow the links for all the details on each benchmark result.

- Jamie

Using SSIS to get data out of Oracle: A big surprise!

Since late last year, Microsoft has made the Attunity connectors to Oracle databases available to Enterprise Edition customers. We all recognized right away that these connectors were faster than the pre-existing options from either Microsoft or Oracle for moving data into or out of an Oracle database. It wasn’t immediately obvious what speeds we could expect from the connectors, so I ran some experiments to find out. This posting summarizes findings from those experiments, done earlier this year; rather than report every detail, I will zero in on the key lessons and one big surprise that came out of the work.

Before getting into my findings, let me share an anecdote I heard from a colleague: at least one SSIS customer now uses SSIS and the Attunity connectors to move data from one Oracle database to another, because SSIS with Attunity outperforms all of the Oracle options! While I can’t corroborate that, the information that follows comes from my own measurements.

To do this work, I used two machines each with 24 cores (64-bit, 2.4 GHz), one for SSIS and one for Oracle. The machines were practically an embarrassment of riches for this simple benchmark. The SSIS machine had flat files to read (when loading data into Oracle) or write (when extracting data from Oracle). The SSIS packages were super simple, just a flat file source and an Oracle destination, or an Oracle source and a flat file destination. The data was 75 million rows, 133 bytes per row, of the LINEITEM table from the TPC-H benchmark, as generated by the DBGEN utility.

Some basic findings (remember, your mileage will vary):

  • Putting data into an Oracle database using the Attunity connectors clocked 30,000 rows per second, on the order of 20 times faster than using the OLE DB connectors to Oracle.
  • Extracting data from Oracle moved 36,000 rows per second, about 20% faster than using the OLE DB connectors.
  • The above measurements were taken using “mixed” data types: numbers were put in NUMBER fields, dates in DATE fields, and so on. A funny thing happened, though, when all the data was put in string fields (VARCHAR2 was used for everything): we could now hit 42,000 rows per second loading data into Oracle, and 76,000 rows per second extracting from Oracle!
  • The Fast Load option is supposed to get higher performance through the use of the DirectPath API. In my experiments, I didn’t see a consistent advantage of Fast Load over non-Fast Load. What Fast Load did seem to do was shift more of the CPU time from the Oracle process to the SSIS process. This could mean that if you have multiple concurrent SSIS packages sending data to Oracle, using Fast Load might let Oracle receive the data faster. Given my experience with Fast Load, I can only recommend that you check its performance in your own situation. Note: I’ve been told that Fast Load will be fixed in a maintenance release later this calendar year. So while I’m not promising anything, it’s likely that this will change.
  • The default batch size for the Oracle destination connector is 100 rows. Setting the batch size to 10,000 rows gave a boost of 10% to 50%, depending on other elements of the configuration. (When using Fast Load, you specify the buffer size instead of the row count. So estimate the buffer size needed to hold the number of rows you want, and use that number.)
  • When using the Oracle source, setting batch size to 10,000 rows gave a boost of around 10%, depending on other elements of the configuration.
  • I wanted to know how important it was for SSIS to be on a separate machine from the Oracle database. There was a good network connecting the source and destination servers, and also plenty of CPUs and memory on the servers. What I saw was a negligible difference between the case where SSIS and Oracle were on the same server and the case where SSIS and Oracle were on separate systems. My recommendation: Look at what resource is the most loaded in your environment, and configure to lighten the load on that resource.
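To put these rates in perspective, the arithmetic below converts them into elapsed time for the 75-million-row LINEITEM data set, and applies the buffer-size rule of thumb from the Fast Load bullet. This is a minimal sketch: it assumes a steady rate over the whole run and uses the 133-byte flat-file row size, which may differ from the row size the connector actually buffers in memory.

```python
# Back-of-the-envelope load planning from the measured rates above.
# Assumption: every row is the 133-byte average LINEITEM row from the test.
ROWS = 75_000_000
BYTES_PER_ROW = 133

# Rows per second reported in the measurements (mixed types vs. all VARCHAR2).
RATES = {
    "load, mixed types": 30_000,
    "extract, mixed types": 36_000,
    "load, all strings": 42_000,
    "extract, all strings": 76_000,
}


def elapsed_minutes(rows: int, rows_per_sec: int) -> float:
    """Time to move `rows` rows at a steady `rows_per_sec`, in minutes."""
    return rows / rows_per_sec / 60


def fast_load_buffer_bytes(target_rows: int, bytes_per_row: int = BYTES_PER_ROW) -> int:
    """Rough Fast Load buffer size needed to hold `target_rows` rows."""
    return target_rows * bytes_per_row


if __name__ == "__main__":
    print(f"Data volume: {ROWS * BYTES_PER_ROW / 1e9:.2f} GB")
    for label, rate in RATES.items():
        print(f"{label:>22}: {elapsed_minutes(ROWS, rate):6.1f} min")
    print("Buffer for 10,000 rows:", fast_load_buffer_bytes(10_000), "bytes")
```

For example, a 10,000-row batch at 133 bytes per row works out to a buffer of roughly 1.3 MB; treat that as a starting point to tune, not a precise setting.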

The idea that performance with string data would be so different from performance with natural data types was a big surprise. The difference was especially pronounced when extracting data from Oracle. Now let’s face it, we would prefer to see data extracted from Oracle and placed in SQL Server databases! Given the big speed disparity and the fact that most real-world data needs to be in natural data types, I wondered if the same thing would happen if data was cast to string types in the query that SSIS issues against Oracle. So instead of having SSIS simply read the table, I gave it this query to run:

select
    TO_CHAR(L_SHIPDATE),
    TO_CHAR(L_ORDERKEY),
    TO_CHAR(L_DISCOUNT),
    TO_CHAR(L_EXTENDEDPRICE),
    TO_CHAR(L_SUPPKEY),
    TO_CHAR(L_QUANTITY),
    L_RETURNFLAG,
    TO_CHAR(L_PARTKEY),
    L_LINESTATUS,
    TO_CHAR(L_TAX),
    TO_CHAR(L_COMMITDATE),
    TO_CHAR(L_RECEIPTDATE),
    L_SHIPMODE,
    TO_CHAR(L_LINENUMBER),
    L_SHIPINSTRUCT,
    L_COMMENT
from ATTUSER.LINEITEM

Then, before inserting the data into SQL Server using the SQL Server destination, I added a Data Conversion transformation to get all the data into the correct types.

[Figure: SSIS data flow for this test]
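In SSIS this step is a Data Conversion transformation configured in the designer, not code. Purely to show what "converting back" involves, the Python sketch below mimics it for a few LINEITEM columns. Two assumptions are worth stating: dates are taken to arrive as 'YYYY-MM-DD', which would need an explicit mask such as TO_CHAR(L_SHIPDATE, 'YYYY-MM-DD'), because TO_CHAR on a DATE without a mask follows the session's NLS_DATE_FORMAT; and numbers below 1 may arrive without a leading zero (Oracle typically renders 0.04 as '.04').

```python
from datetime import date, datetime
from decimal import Decimal


def parse_date(text: str) -> date:
    """Parse a date column; assumes the query used a 'YYYY-MM-DD' format mask."""
    return datetime.strptime(text, "%Y-%m-%d").date()


# Target type for a few LINEITEM columns returned as strings by the TO_CHAR query.
CONVERTERS = {
    "L_ORDERKEY": int,
    "L_QUANTITY": Decimal,
    "L_EXTENDEDPRICE": Decimal,
    "L_DISCOUNT": Decimal,  # Decimal(".04") also accepts Oracle's leading-zero-free output
    "L_SHIPDATE": parse_date,
    "L_RETURNFLAG": str,
}


def convert_row(row: dict[str, str]) -> dict[str, object]:
    """Convert one row of string columns back to native Python types."""
    return {column: CONVERTERS[column](value) for column, value in row.items()}


if __name__ == "__main__":
    print(convert_row({"L_ORDERKEY": "1", "L_DISCOUNT": ".04", "L_SHIPDATE": "1996-03-13"}))
```

Decimal rather than float is used for the money and rate columns so the converted values stay exact, as they would in a SQL Server decimal column.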

At this point you must be thinking, “Surely converting the data twice can’t be the fastest way!” Well, here are the results: The first run below read the mixed data types using the Attunity Oracle source with default settings, converted to SQL Server types, then wrote to the SQL Server destination. The second run was like the first, with the addition of setting the batch size larger. The third run was like the first, but on reading from Oracle all the columns were converted to text as discussed above. The last test was like the third, with the addition of setting the batch size larger. Using the string conversion and larger batches, the run was over two times faster than the obvious out-of-the-box configuration.

[Figure: results of the four runs]

Overall, the Attunity connectors for Oracle really were fast, as expected. In doing this work a few lessons turned up that hopefully help you get optimal performance.

- Len Wyatt

Great new SQL Server performance on Intel's Xeon 5500 series, aka Nehalem-EP

Yesterday, Intel launched its new Xeon 5500 series processors, code named Nehalem-EP. This is essentially the first server-class version of what Intel launched on the desktop as Core i7 last fall. With no frontside bus, and an embedded, multi-channel memory controller in the processor package, it is smoking fast.

Pat Gelsinger did a side-by-side performance demo that launched an SSRS report, running reporting queries against a 1.5 TB SSAS OLAP cube built using a Microsoft adCenter data set. The demo showed Nehalem-EP running 2X faster than a Xeon 5400 on the same workload, with the same DRAM and I/O configuration. Not too shabby, but we've seen even faster results (~3-4X faster) on workloads that are more memory bandwidth-intensive, like data warehousing or in-memory OLAP workloads.

Also yesterday, our good friends at Fujitsu, Dell, and IBM used SQL Server 2008 to show off the performance of Intel's new processor in their latest 2P server platforms, delivering great new TPC-E results with breakthrough per-processor performance and price/performance. Please follow the links for the full scoop on each benchmark result. Congratulations to each!

-David

An ETL World Record Revealed (Finally)

We suppose a more appropriate title would have been: Better Late Than Never. 

David should begin by apologizing. Last month was the first anniversary of the ETL world record we set last year with SSIS, loading 1 TB of data in under 30 minutes. While Len did a nice blog post on the project at the time, we had promised to return to our favorite ETL practitioners (that's you) with more details, pulling back the curtain on Oz so to speak.

It only took a mere year to get around to that, and a lot of water has gone under the bridge since then. Happily, the paper's now done and published to the web for your reading pleasure. Whew!

-David Powell, Len Wyatt & Tim Shea

Windows Server 2008 + SQL Server 2008 = Geo-replication Benefits

After running an experiment using datacenters on each coast of the U.S., our friends in MSCOM ops verified that the combination of Windows Server 2008 and SQL Server 2008 is great for enabling database geo-replication, thanks to a 100X gain in performance. The experiment provided sufficient evidence to motivate a more ambitious proof-of-concept case study with MSDN.

A new whitepaper, published this week, explains how SQL Server 2008 was able to take advantage of the all-new core networking stack in Windows Server 2008 to deliver revolutionary gains in replication performance across wide-area networks. The paper includes information on an MSDN case study, in which MSCOM ops significantly lowered MSDN page load times, thanks to replication of MSDN content from the U.S. to Europe.
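Why would a networking stack matter so much over a WAN? One general factor is the bandwidth-delay product: a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is capped at window / RTT no matter how fast the link is. Windows Server 2008's new stack includes receive window auto-tuning, which lets that window grow on high-latency paths. The whitepaper is the place to see which factors actually produced the reported gains; the sketch below only illustrates the arithmetic, with a hypothetical 70 ms round trip and hypothetical window sizes.

```python
def max_throughput_mbit_s(window_bytes: int, rtt_ms: float) -> float:
    """Ceiling for one TCP connection: one receive window per round trip, in Mbit/s."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


RTT_MS = 70  # hypothetical coast-to-coast round-trip time

if __name__ == "__main__":
    for window in (64 * 1024, 16 * 1024 * 1024):  # hypothetical window sizes
        print(f"{window // 1024:>6} KB window: "
              f"{max_throughput_mbit_s(window, RTT_MS):8.1f} Mbit/s ceiling")
```

The effective limit is the lower of this ceiling and the link bandwidth; the point is that with a small fixed window, a fast WAN link sits mostly idle.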

-David

SQL Server is Movin' On Up!

At last week's WinHEC in Los Angeles, SQL GM Quentin Clark joined Windows Server VP Bill Laing on stage, to announce our upcoming support for more than 64 logical processors, the current limit in Windows and SQL Server. This expanded scale-up capability is planned to be released in SQL Server code named "Kilimanjaro", when running on top of Windows Server 2008 R2.

We expect to support up to 256 logical processors in this release, though that's a soft limit. The hard limits are much higher than this, but we won't support what we can't test. You'll likely see this 256LP soft ceiling get higher in each release, now that we've finally done the heavy lifting of raising the roof on Windows and SQL Server.

The keynote demo (@33:26), which Bill and Quentin did together, went well. Many thanks to our friends at HP and IBM for all their support and for the use of these great servers.

In case you're wondering, a logical processor is a sub-unit of a physical processor/socket/package. Today an LP typically means a core, but it can also mean a hardware thread (for example, with Intel Hyper-Threading enabled).
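Counting LPs is simple multiplication, which makes it easy to see how quickly large servers pass the 64-LP limit once hardware threads are counted. A minimal sketch; both server configurations below are hypothetical examples.

```python
def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int = 1) -> int:
    """Logical processors = sockets x cores per socket x hardware threads per core."""
    return sockets * cores_per_socket * threads_per_core


PREVIOUS_LIMIT = 64    # the current limit mentioned above
PLANNED_SUPPORT = 256  # the planned soft ceiling mentioned above

if __name__ == "__main__":
    # Hypothetical servers: (sockets, cores per socket, threads per core).
    for config in [(4, 6, 1), (8, 8, 2)]:
        lps = logical_processors(*config)
        print(config, f"{lps} LPs,",
              "over 64" if lps > PREVIOUS_LIMIT else "within 64,",
              "within 256" if lps <= PLANNED_SUPPORT else "over 256")
```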

-David Powell

TPC-E – Raising the Bar in OLTP Performance

Glenn Paulley, Director of Engineering at Sybase iAnywhere, posted a commentary titled “The State of TPC-E” on his blog three weeks ago (10/3/08).  A better title would have been “All TPC-E Results Are On Microsoft SQL Server.  Why?”  Mr. Paulley takes issue with Brian Moran’s statement that “the most rational answer is that Oracle and IBM have tried to top Microsoft’s numbers and simply can’t”.  He says that while it may be true, he doubts it and says there are other plausible reasons why DB2 and Oracle have yet to publish any TPC-E results.  Curiously, he doesn’t say why Sybase hasn’t published TPC-E results.  Since he is, presumably, in a position to know, one can only conclude that he would rather not say.  Readers can reach their own conclusions about what that might mean.

 

To his credit, he cites an IBM whitepaper explaining that TPC-E was designed to be more realistic than TPC-C.  The whitepaper details numerous ways in which TPC-E is far superior to TPC-C.  Let’s compare the two.  As the table below shows, in TPC-E the schema is substantially richer and more complex, there are twice as many transactions, and only TPC-E requires essential capabilities such as referential integrity and RAID-protected storage.

 

| | TPC-C | TPC-E |
|---|---|---|
| Schema | | |
| Number of database tables | 9 | 33 |
| Foreign keys | 9 | 50 |
| Tables with foreign keys | 7 | 27 |
| Check constraints | 0 | 22 |
| Partitioning characteristic | Unrealistic; single dimension common to 8 of 9 tables | Realistic; two independent dimensions |
| Transactions | | |
| Number of transactions | 5 | 10 |
| Database round trips per transaction | 1 | Min 1; max 5 |
| Capabilities | | |
| Referential integrity required | No | Yes |
| Storage protection (e.g., RAID) for database required | Log only | Everything |
| Timed database recovery test | No | Yes |

Mr. Paulley chooses to focus on the query complexity of TPC-E.  While that’s somewhat interesting, a comparison to TPC-C would have provided important context.  For example, TPC-E has 156 DML statements.  TPC-C doesn’t include pseudo-SQL the way TPC-E does, but if it did, following the TPC-E style, it would have fewer than 30 DML statements.  By this measure, TPC-E has more than five times as many distinct DML operations as TPC-C.
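The ratios behind "more than five times" and the schema comparison can be recomputed from the figures already given. A minimal sketch; because the TPC-C DML count is only an upper bound ("fewer than 30"), the DML ratio it prints is a floor.

```python
# (TPC-C, TPC-E) pairs taken from the comparison table and the paragraph above.
METRICS = {
    "database tables": (9, 33),
    "foreign keys": (9, 50),
    "tables with foreign keys": (7, 27),
    "transactions": (5, 10),
    # TPC-C's figure is an upper bound ("fewer than 30"), so this ratio is a floor.
    "DML statements": (30, 156),
}


def growth(metric: str) -> float:
    """How many times larger the TPC-E figure is than the TPC-C figure."""
    tpcc, tpce = METRICS[metric]
    return tpce / tpcc


if __name__ == "__main__":
    for name in METRICS:
        print(f"{name:>25}: {growth(name):.1f}x")
```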

 

But more importantly, TPC-E is not and was never intended to be a query optimizer test.  The pseudo-SQL code in TPC-E is an example, not a requirement.  Unlike TPC-H, which strictly limits changes to the specified SQL, TPC-E lets test sponsors rewrite the SQL any way they like as long as it is functionally equivalent.  One vendor might rewrite it to remove all joins while another might rewrite it to include more joins or more complex joins.  The same is true of group by and order by clauses.  In our view, Mr. Paulley’s objection that TPC-E isn’t a good optimizer test is misplaced.

After discussing query complexity, Mr. Paulley offers four reasons why Microsoft is the only database vendor publishing TPC-E results.

 

  • “TPC-E is a moving target” – While it’s true that the TPC-E spec is up to version 1.6.0, the assertion that the workload has changed significantly is unsupported by the facts.  None of the transactions has changed in any way that impacts performance.  All spec revisions have been classified as “minor” changes by the TPC and results across all spec revisions are comparable.  The number of revisions to the spec since it was first released actually reflects a deep commitment by the members of the TPC-E committee to clean up rough edges and address areas of ambiguity before they become issues in published results.  A better gauge of the high quality of the TPC-E spec is that to-date 18 results have been published by six vendors spanning 15 months, but there have been no compliance challenges.

  • Both DBMS vendors and hardware suppliers have a substantial investment in TPC-C expertise – On this point we agree with Mr. Paulley, but we draw different conclusions.  All of the major DBMS companies have spent years picking through every detail of TPC-C.  It has been optimized to such a degree that it long ago stopped driving customer-relevant engineering improvements.  TPC-C is 16 years old and has changed little since 1992.  Saying that we should keep using TPC-C because we know it so well is like saying that we should drive horses and buggies because we have a lot of expertise in blacksmithing.  This is a mindset trapped in the past, and it doesn’t serve our customers.

  • TPC-E isn’t that cheap – In fact, TPC-E is substantially less expensive to configure and run than TPC-C.  Two results from IBM within the last month prove the point.  As the table below shows, running on the same server, the TPC-C configuration was more than five times as expensive as the TPC-E configuration.  Further, on the four-processor server, the TPC-C result used 1,361 disks with no data protection, while the TPC-E result used 400 disks with RAID-5.  Which is the more customer-relevant configuration?

| | TPC-C | TPC-E |
|---|---|---|
| Hardware | IBM System x3850 M2 | IBM System x3850 M2 |
| Processors / cores / threads | 4 / 24 / 24 | 4 / 24 / 24 |
| Performance | 684,508 tpmC | 729 tpsE |
| Price/performance | $2.58/tpmC | $457/tpsE |
| Total system cost | $1,763,438 | $333,646 |
| Publication date | 9/15/08 | 9/15/08 |
| Availability date | 10/31/08 | 10/10/08 |
| Memory | 256 GB | 128 GB |
| Storage | 1,344 x 73.4 GB disks; 16 x 500 GB disks; 1 x 73 GB disk | 400 x 73.4 GB disks |
| Data storage protection | None | RAID-5 |

  • “Customers continue to desire and reference TPC-C results.” Granted, TPC-C has stood the test of time.  But today it is outdated, over-optimized, and of questionable relevance.  Customers hold onto TPC-C because it is familiar and available, not because it is better.  Database vendors need to exercise leadership.  As Mr. Paulley says, “Microsoft is an early adopter of TPC-E”.  At this point, though, the early-adopter window has passed.  TPC-E was ratified 20 months ago.  The first result was published 15 months ago.  There are 18 published results.  We believe that customers will readily embrace TPC-E as a superior benchmark as more results become available.
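The cost and disk-count claims about the two IBM results can be checked directly against the table. A minimal sketch using only the published figures; TPC price/performance is total system cost divided by throughput, and the recomputed $/tpsE lands just above the $457 shown because the table drops the cents.

```python
# Published figures from the IBM System x3850 M2 comparison table above.
TPCC = {"cost": 1_763_438, "throughput": 684_508, "disks": 1_344 + 16 + 1}
TPCE = {"cost": 333_646, "throughput": 729, "disks": 400}


def price_performance(result: dict) -> float:
    """TPC price/performance: total system cost divided by throughput."""
    return result["cost"] / result["throughput"]


if __name__ == "__main__":
    print(f"cost ratio (TPC-C / TPC-E): {TPCC['cost'] / TPCE['cost']:.2f}")
    print(f"$/tpmC: {price_performance(TPCC):.2f}")
    print(f"$/tpsE: {price_performance(TPCE):.2f}")
    print(f"disks: {TPCC['disks']} vs {TPCE['disks']}")
```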

 

The more time that goes by, the more one is inclined to believe that Brian Moran is right – other database vendors aren’t publishing because they can’t beat the existing SQL Server results.  We invite Sybase and Mr. Paulley to prove us wrong.  We are confident that once Sybase runs TPC-E instead of just writing about it, Mr. Paulley will gain a new appreciation for just how challenging and technically rigorous TPC-E is compared with TPC-C. 

 

Charles Levine

SQL Server Performance Engineering

SQL Server 2008 is on its way with great performance

Just left the final shiproom meeting for SQL Server 2008, and am happy to say every team has signed off, so the product is now in the hands of manufacturing, and in process toward web and media availability for you. We've shipped! MSDN and TechNet subscriber downloads are now live, with more to come.

It's been less than three years since SQL Server 2005, but we're pleased to give you a great new release of SQL Server which not only adds fantastic new capabilities to your data platform, but also delivers broadly better performance. We used new industry standard benchmark workloads, and customer workloads, to drive us toward delivering better real-world performance...and we didn't take the easy road.

The best example of that is our use of TPC-E, a far more modern, realistic, and challenging benchmark workload than its predecessor. We used TPC-E to improve the scalability of our relational engine, in ways that should be more relevant to your own OLTP database workloads. We're proud of partners, like IBM, NEC, and Unisys, who were able to demonstrate great scalability, up to 64 cores, using SQL Server 2008 running TPC-E, and like Fujitsu-Siemens and Dell, who have shown industry leading price/performance. Results like these should also be more useful for system sizing.

Almost five years of effort by TPC members were invested in the development of TPC-E, and it shows. The workload uses synthetic data which is far more realistic, by modeling real-world data. And compared to its antique predecessor, TPC-E's schema has ~3X more tables and primary keys, 2X as many columns, and 4X more foreign keys. And here's a radical thought for a modern OLTP benchmark workload: include check constraints, referential integrity, and reliable storage. Don't customers actually put DBMS servers into production expecting that?

Because we believe so strongly that TPC-E drives us to better meet your needs, I am announcing today that this is the first release of SQL Server which will not include published TPC-C benchmark results. Like other great thoroughbreds, TPC-C had a great run, and we were proud to ride it while it was still relevant to customers. But today, we're turning that great old race horse out to pasture for a well-deserved rest.

In SQL Server 2008, we have also invested lots of effort to improve our data warehousing performance, and the performance of our BI services. SSIS, SSRS, and SSAS, each show many double-digit gains in performance, which we hope you will enjoy. The new world record we set with SSIS, loading 1 TB of data in under 30 minutes, gives you a sense of this commitment to BI performance. We are also proud of our first-ever TPC-H 10 TB result.

There are a couple other ways in which this release improves on our past work. First, we've focused more energy on improving 64-bit SQL Server's performance on x64 AMD Opteron and Intel Xeon architectures. Given the price and density of RAM, and great new x64 CPUs available, this is a perfect time to take a closer look at the performance of your database servers, and consider doing fresh deployments, or migrations, on x64 SQL Server, which can use the additional RAM for everything, not just the buffer cache. Second, we invested more heavily in performance regression testing, both in automation and in the breadth of our test coverage. These investments have been reflected in broadly positive feedback from the community as well as internal and external beta sites. While we'll always have more to do, we feel this release marks an important step forward.

Welcome to the beginning of a new era for SQL Server. We hope you enjoy working with SQL Server 2008 and look forward to your feedback.

-David Powell
SQL Performance Engineering

SQL Server 2008 launched today, with great performance and scalability

Today, as the old saying goes, is a red-letter day, with the launch of Windows Server 2008, Visual Studio 2008, and SQL Server 2008.

Our team has been heads down, working to ensure SQL Server 2008 is delivered to you with great performance and scalability. You’ll see signs of this in the new SQL Server February 2008 Community Technology Preview, which includes great new performance features in the engine, SSRS, SSAS, and SSIS, as well as just thumping good performance.

But don’t take my word for it: ask NEC, IBM, SAP, HP, and Unisys. Today, our partners are delivering proof this is the best release yet of SQL Server!

Here’s a quick round-up of the industry standard benchmark results our partners published today, Feb 27, 2008, using Windows Server 2008 and SQL Server 2008. Details of the Transaction Processing Performance Council (TPC) results can be found on www.tpc.org. More information on the SAP SD result is available on SAP’s web site.

  • #1 TPC-E result of 1,126 tpsE at a cost of $2,771/tpsE, using a 64-core Intel Itanium-powered Express5800/1320Xf system. This result, our first 64-core TPC-E result, demonstrates the power of NEC’s architecture and the scalability of SQL Server 2008 for enterprise OLTP workloads.
  • TPC-E result of 479 tpsE at a cost of $1,591/tpsE, using a quad-socket, 16-core IBM x-Series x3850 M2 system. This is a 14% gain over IBM’s previous x3850 M2 TPC-E result with SQL Server 2005.
  • HP’s newest SD three-tier result of 34,000 users, which is #1 on quad-processor industry-standard servers, and is 88% faster than a previous quad-processor result on SQL Server 2005. This result shows the power of HP’s BL680C blade servers and Intel Xeon 7300 series processors. Did you ever think you’d see the day a blade could be expected to handle the workload volume of 97% of SAP deployments worldwide?
  • And last, but not least: HP published today the first-ever TPC-H result on SQL Server at the 10 TB scale factor: 63,650 QphH at $38.54/QphH, using a powerful 64-core Integrity Superdome server with HP SAS storage. Unless you’re Walmart, the odds are high your DW is smaller than this!

Industry standard benchmarks are great, but unfortunately they don’t yet cover all usage scenarios customers care about. ETL is a key part of any production DW workflow, and we’ve been paying special attention to the performance of SQL Server Integration Services, our ETL tool included with SQL Server. With improvements to the core SSIS processing engine in SQL Server 2008, and improvements in 64-bit connectivity, we decided to take SSIS out for a spin, to show what it could really do. Along the way, we and Unisys set a new world record for loading over 1 TB of data in under 30 minutes, beating a previous result posted by Informatica. Check out Len Wyatt’s more detailed blog post on this. We’d sure love to see the industry come together to create a standardized ETL benchmark workload.

Lastly, some leading ISVs put SQL Server 2008 through its paces, and were very pleased with the result:

  • Camstar showed world-record scale of 205 MES transactions/second and 60% space reduction when using SQL Server 2008’s database compression on Windows Server 2008
  • Microsoft Dynamics CRM 4.0 demonstrated record scale at 24,000 concurrent users, with sub-second response times, using SQL Server 2008 on Windows Server 2008
  • Siemens Teamcenter 2007, SQL Server 2008, and Windows Server 2008 ran with 5,000 concurrent users, and 50% space reduction from database compression
  • Microsoft Dynamics AX showed up to 70% improvement in throughput scalability and response time, maximizing performance while reducing database growth using SQL Server 2008 and database compression

Look for even more performance gains in the final SQL Server CTP, before we ship this summer!

ETL World Record!

Today at the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS.  Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool.  Over 1 TB of TPC-H data was loaded in under 30 minutes.  I wanted to provide some background material in the form of a Q&A on the record, since it’s hard to give many details in the context of a launch event.  We are also planning a paper that talks about all this, so think of this article as a place-holder until the full paper comes along.  I hope you find this background information useful.

- Len Wyatt

How fast was the data load?

More than one terabyte of data was parsed from flat files, transferred over the network and loaded into the destination database in less than 30 minutes, a world record beating all previously published results using an ETL tool.  That is a rate in excess of 2 TB per hour (650+ MB/second).   To be precise, 1.18TB of flat file data was loaded in 1794 seconds.  This is equivalent to 1.00TB in 25 minutes 20 seconds or 2.36TB per hour.
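The figures in this answer are related by simple unit conversions, reproduced below as a check. A minimal sketch, assuming decimal terabytes (10^12 bytes), which the post does not state; the hourly rate comes out at about 2.37 TB/hour, in line with the 2.36 quoted.

```python
LOADED_TB = 1.18     # flat-file data loaded
ELAPSED_S = 1794     # elapsed seconds
BYTES_PER_TB = 1e12  # assumption: decimal terabytes


def per_hour(tb: float, seconds: float) -> float:
    """Terabytes per hour for `tb` terabytes moved in `seconds`."""
    return tb / seconds * 3600


seconds_per_tb = ELAPSED_S / LOADED_TB             # time to load exactly 1 TB
minutes, seconds = divmod(round(seconds_per_tb), 60)
mb_per_s = LOADED_TB * BYTES_PER_TB / ELAPSED_S / 1e6

if __name__ == "__main__":
    print(f"{per_hour(LOADED_TB, ELAPSED_S):.3f} TB/hour")
    print(f"1 TB in {minutes} min {seconds} s")
    print(f"{mb_per_s:.0f} MB/s")
```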

Why is this important?

Businesses have ever-increasing volumes of data stored in many heterogeneous systems.  They want to know that the ETL tool they choose will be able to support any data volume they might require.  Microsoft has been making a significant investment in SQL Server Integration Services (SSIS), and this record illustrates the capability of SQL Server Integration Services 2008, SQL Server 2008 and the Unisys ES7000 to handle a significant volume of data at a dramatic speed.

Why not just do a bulk load of the data?

It is rare in business today for data to already be available on the destination system and to need no standardization or error correction before loading. Only in those rare cases does bulk loading make sense. Data integration can involve complex transformation rules, error checking and data standardization techniques. ETL tools like SSIS perform these functions: moving data between systems, reformatting data, integrity checking, key lookups, tracking lineage, and more. SSIS has proven itself to be a versatile ETL tool, and now it has shown itself to be the fastest one as well.

What data did you choose to load?

The DBGEN tool from the TPC-H benchmark was used to generate 1.18 TB of source data.  The data were partitioned by DBGEN, allowing them to be loaded in parallel from multiple systems.  DBGEN generates data on customers, parts, suppliers, orders and line items.  It is broadly representative of a wholesale business.  The data contain a variety of data types, including dates, money amounts, integers, strings and flags.

Please note that the ETL loading results are not TPC-H benchmark results and should not be compared to TPC-H benchmark results. 

Was this a certified benchmark?

There is no commonly accepted benchmark for ETL tools.  Microsoft thinks there should be.  Industry standard benchmarks can lead to healthy competition, better products, and better publication of the techniques used to get high performance.  Microsoft would welcome the opportunity to join with others in the industry to define a common benchmark that reflects the real-world uses of ETL tools.

The use of TPC-H data for this project was a convenience.  This is not a TPC-H benchmark result.

How does this compare to your competitors?

Multiple competitors have published results based on TPC-H data.  Informatica has the fastest time previously reported, loading 1 TB in over 45 minutes.  SSIS has now beaten that time by more than 15 minutes.

There are other claims of fast times that have been made, but on non-standard data sets and without enough information to allow any meaningful comparison.  This is part of the reason Microsoft would support the creation of an industry standard ETL benchmark.

What system configuration was used?

The database server was a 32-socket Unisys ES7000/one Enterprise Server with dual-core Intel® Xeon™ 3.4 GHz (7140M) processors, 256 GB RAM and 8 dual-port 4 Gbit HBAs.  The SQL Server data was stored on an EMC Clariion CX3-80 SAN with 165 (146 GB/15 krpm) spindles. The database server ran a pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4, built just before the “February 2008 CTP”) on the Windows Server 2008 x64 Datacenter Edition operating system.

Four servers acted as data sources, modeling the fact that data comes from a variety of systems in a modern enterprise.  Each source server ran SSIS packages that sent data across the network to the database server. The source servers ran SSIS from SQL Server build V10.0.1300.4, on the Windows Server 2008 operating system.  Source data came from flat files, as it was generated by DBGEN.

The four source servers were Unisys ES3220L servers running Windows Server 2008 x64 Enterprise Edition. Each server was equipped with two 2.0 GHz quad-core Intel® processors, 4 GB RAM, a dual-port 4 Gbit Emulex HBA and an Intel PRO/1000 PT network card. The source data was read from two EMC Clariion CX600 SANs with 45 spindles each.

The source servers were connected to the ES7000/one database server over private dual-port 1 Gb Ethernet connections.

Why use multiple source systems?

Modern large businesses are complex operations.  Large data sets are often the result of multiple data feeds.  This made the test more realistic by mimicking a real world ETL scenario.

What do the SSIS packages look like?

There was just one package, though the source systems ran multiple instances of it.  It is quite simple:  There is one control flow for each “stream” of data generated by DBGEN.  The control flow has one data flow for each table, each data flow reading data from a flat file source and writing to the SQL Server database via OLEDB.  Using this data set there is a one-to-one column mapping between the flat file data and the database tables.

Did Windows Server 2008 figure into this?

A lot of innovative engineering work in Windows Server 2008, including significant improvements in memory management, PCI and block storage I/O, and core networking, helped achieve this great performance. Because of these advances, Windows Server 2008 sustained about 960 megabytes per second over the Ethernet network during processing of one large table.
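As a rough sanity check on the 960 MB/s figure, the sketch below computes the raw line rate of the links between the source servers and the database server. It assumes that each of the 4 source servers used both ports of its dual-port 1 Gb connection (8 links in total) and that "megabytes" means 10^6 bytes; it ignores Ethernet, IP, and TCP overhead, so real payload throughput is somewhat lower than the raw figure.

```python
SOURCE_SERVERS = 4       # source servers described above
PORTS_PER_SERVER = 2     # assumption: both ports of each dual-port link carried data
LINK_GBITS_PER_S = 1.0   # 1 Gb Ethernet per port


def raw_capacity_mb_per_s(servers, ports_per_server, gbits_per_port):
    """Aggregate raw line rate in decimal megabytes (10**6 bytes) per second."""
    bits_per_s = servers * ports_per_server * gbits_per_port * 1e9
    return bits_per_s / 8 / 1e6


capacity = raw_capacity_mb_per_s(SOURCE_SERVERS, PORTS_PER_SERVER, LINK_GBITS_PER_S)
utilization = 960 / capacity   # observed rate from the text vs. raw capacity
print(f"raw capacity: {capacity:.0f} MB/s, utilization: {utilization:.0%}")
```

Under these assumptions, 960 MB/s is about 96% of the raw line rate of the eight links.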

Were secret internal tricks needed to make this work?

No secret internal tricks or special builds were needed. Although this project used a pre-release version, it was a regular SQL Server 2008 Enterprise Edition build. No special code in the product was used. Everything we did could be replicated by others.

The main thing done in the relational database was to use “soft NUMA” and to map TCP ports to NUMA nodes, to get a good distribution of work within the system. This is a published technique; you can find articles about it on MSDN. We also started SQL Server with the -x startup option, which disables several monitoring features (such as keeping CPU time and cache-hit ratio statistics) and so reduces the time SQL Server spends collecting performance statistics at run time.
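The actual configuration steps for soft-NUMA nodes and port mapping are in the MSDN articles. As a rough sketch of the values involved, the helpers below compute one CPU affinity bitmask per soft-NUMA node and a port list in the `port[0xNodeMask]` form used when mapping TCP ports to NUMA nodes. The 64 CPUs (32 sockets x 2 cores, hyperthreading ignored), the 8 equal contiguous nodes, the base port 1500, and the helper names are all hypothetical; check the MSDN articles for the exact syntax and where each value is entered.

```python
from typing import List


def soft_numa_cpu_masks(cpu_count: int, node_count: int) -> List[int]:
    """Split CPUs 0..cpu_count-1 into equal, contiguous soft-NUMA nodes.

    Returns one affinity bitmask per node (bit i set = CPU i is in that node).
    """
    if node_count <= 0 or cpu_count % node_count:
        raise ValueError("cpu_count must be a positive multiple of node_count")
    per_node = cpu_count // node_count
    block = (1 << per_node) - 1              # per_node consecutive 1 bits
    return [block << (node * per_node) for node in range(node_count)]


def port_to_node_list(base_port: int, node_count: int) -> str:
    """Hypothetical helper: give each node its own port, as 'port[0xNodeMask]'."""
    return ",".join(f"{base_port + node}[{hex(1 << node)}]" for node in range(node_count))


masks = soft_numa_cpu_masks(cpu_count=64, node_count=8)
print([hex(m) for m in masks[:2]])
print(port_to_node_list(1500, 4))
```

Giving each source connection its own port, and each port its own node, is one way to spread the incoming load evenly; whether that matches the exact layout used in this project is not stated.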

In SSIS we made sure the data types used in the SSIS data flows matched the types used in SQL Server, so the data did not need to be converted again after the initial conversion of the strings read from the flat files. FastParse was set on the flat file columns where it applied.
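To illustrate the idea of converting each flat-file string exactly once, directly into the type the destination column expects, here is a small Python sketch. It is not SSIS code: the column layout, the sample row, and the function name are hypothetical, and the trailing-delimiter handling assumes DBGEN-style pipe-delimited rows.

```python
from datetime import date
from decimal import Decimal

# Hypothetical column layout: each converter stands in for the SQL Server
# column type (int, int, char, decimal, date) the value is loaded into.
SCHEMA = [int, int, str, Decimal, date.fromisoformat]


def parse_row(line, converters, delimiter="|"):
    """Convert one delimited text row into typed values in a single pass."""
    text = line.rstrip("\r\n")
    if text.endswith(delimiter):
        text = text[: -len(delimiter)]   # DBGEN-style rows end with a delimiter
    fields = text.split(delimiter)
    if len(fields) != len(converters):
        raise ValueError(f"expected {len(converters)} fields, got {len(fields)}")
    return tuple(convert(value) for convert, value in zip(converters, fields))


row = parse_row("42|7|F|1234.50|1995-03-15|\n", SCHEMA)
print(row)
```

Because every value leaves the parser already in its final type, nothing downstream has to convert it again, which is the point of matching the data flow types to the table types.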

The network connections on the server used the built-in Intel PRO/1000 GbE controllers. Released versions of the network drivers were used, and Ethernet jumbo frames were configured to better support this bulk streaming scenario. Windows Server 2008’s new TCP/IP receive window autotuning was set to “restricted”. The IntPolicy tool was used to ensure that the ES7000 server NICs’ interrupts and DPCs occurred on a CPU affinitized to the same NUMA node as the NIC.
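Jumbo frames help bulk streaming mainly because each frame carries more payload relative to its fixed per-frame overhead, and fewer frames mean less per-packet processing. The arithmetic sketch below compares a standard 1500-byte MTU with an assumed 9000-byte jumbo MTU (the MTU actually used is not stated). It assumes IPv4 and TCP headers without options, plus the usual on-the-wire Ethernet overhead of preamble, header, frame check sequence, and inter-frame gap.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + start delimiter (8), MAC header (14), FCS (4), inter-frame gap (12)
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 (20) + TCP (20) headers, assuming no options; these sit inside the MTU
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu):
    """Fraction of link bandwidth carrying TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_per_second(mtu, link_bits_per_s=1e9):
    """Full-size frames per second needed to saturate the link."""
    return link_bits_per_s / 8 / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):  # 9000 is an assumed jumbo MTU
    print(f"MTU {mtu}: efficiency {payload_efficiency(mtu):.1%}, "
          f"{frames_per_second(mtu):,.0f} frames/s per 1 Gb link")
```

Under these assumptions, jumbo frames raise the payload share from roughly 95% to over 99% and cut the frame rate on each link by a factor of about 5.9.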

A complete list of settings and optimizations will be included in the paper when it is released.

 

Spool operators in query plan...

I came across a question in the relationalserver.performance newsgroup where a customer was wondering about the spools seen in a recursive query execution plan. The query is shown below:

USE Northwind;
GO

WITH EmpChart AS
(
    SELECT EmployeeId, ReportsTo, 1 AS treelevel
    FROM Employees
    WHERE Employees.ReportsTo = 2
    UNION ALL
    SELECT e.EmployeeId, e.ReportsTo, treelevel + 1
    FROM Employees e
    JOIN EmpChart ec
        ON e.ReportsTo = ec.EmployeeID
)
SELECT * FROM EmpChart;

The plan for the above query shows an index spool and a table spool. They are actually one and the same spool. The plan is shown below:

|--Index Spool(WITH STACK)
     |--Concatenation
          |--Compute Scalar(DEFINE:([Expr1013]=(0)))
          |    |--Compute Scalar(DEFINE:([Expr1003]=(1)))
          |         |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo]=(2)))
          |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
               |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
                    |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
                    |    |--Table Spool(WITH STACK)
                    |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
                         |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo] as [e].[ReportsTo]=[Recr1006]))

The index spool here is also a lazy spool, meaning that the rows produced by the recursive part are also inserted into the spool during execution. Additionally, rows are read from the spool using a stack-like mechanism; otherwise the recursive part might visit the same rows again. Here is how to read the plan for the recursive query:

1. Start with the anchor / top-most part

|--Index Spool(WITH STACK)
     |--Concatenation
          |--Compute Scalar(DEFINE:([Expr1013]=(0)))
          |    |--Compute Scalar(DEFINE:([Expr1003]=(1)))
          |         |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo]=(2)))

The anchor part of the recursive CTE is executed first, and its rows are inserted into the index spool. Note the WITH STACK option on the spool: it indicates that rows are read in a LIFO (last-in, first-out) manner.

2. Next the recursive part of the query

|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
     |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
     |    |--Table Spool(WITH STACK)
     |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
          |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e]), WHERE:([Northwind].[dbo].[Employees].[ReportsTo] as [e].[ReportsTo]=[Recr1006]))

This is the nested loops join between the spool (populated in step 1 by the anchor query) and the recursive part of the query. Because this is a lazy spool, the rows produced by the recursive part are also inserted into the spool, until the recursion completes or the maximum recursion level is reached. The maximum level check is done by the Assert operator above the nested loops join:


|--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))

3. To tell which spools are related (or are the same), look at the properties of the spool operators in the query plan output. The index spool has a property called NodeId, which is referenced by the table spool as its PrimaryNodeId property in another part of the plan.
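The evaluation order described in these steps can be mimicked with a small Python simulation: anchor rows are emitted and pushed onto a stack, then the most recently pushed row is popped (LIFO) and joined to the employee list, and each new row is emitted and pushed in turn. This is only a model of the plan's logic, not the engine itself; the (EmployeeID, ReportsTo) pairs are hypothetical sample rows modeled on Northwind's Employees table, and SQL Server does not guarantee any output order without ORDER BY.

```python
# Hypothetical (EmployeeID, ReportsTo) pairs modeled on Northwind's Employees table.
EMPLOYEES = [(1, 2), (2, None), (3, 2), (4, 2), (5, 2), (6, 5), (7, 5), (8, 2), (9, 5)]


def run_recursive_cte(employees, manager_id, max_recursion=100):
    """Evaluate the EmpChart CTE with a LIFO work stack, like the WITH STACK spool."""
    output = []
    stack = []  # stands in for the spool's worktable

    # Anchor member: direct reports of manager_id get treelevel 1.
    for emp_id, reports_to in employees:
        if reports_to == manager_id:
            row = (emp_id, reports_to, 1)
            output.append(row)   # the row flows up through the spool...
            stack.append(row)    # ...and is also kept for the recursive part

    # Recursive member: pop the most recently added row and join it to Employees.
    while stack:
        parent_id, _, level = stack.pop()          # LIFO read
        for emp_id, reports_to in employees:
            if reports_to == parent_id:
                # A child of a level-L row is L recursion steps deep; this check
                # plays the role of the Assert operator (default limit 100).
                if level > max_recursion:
                    raise RuntimeError(f"maximum recursion {max_recursion} exhausted")
                row = (emp_id, reports_to, level + 1)
                output.append(row)
                stack.append(row)
    return output


for row in run_recursive_cte(EMPLOYEES, manager_id=2):
    print(row)
```

A cyclic hierarchy (for example, employee 1 reporting to 2 and 2 reporting to 1) never empties the stack, which is exactly the case the Assert operator guards against.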

Lastly, SQL Server can also create a plan with an eager spool, as in the plan for the following query. With an eager spool, query execution can continue only after the spool has been fully populated. This is different from a lazy spool, which is populated as rows flow through it.

select count(distinct ShipVia), count(distinct ShipCountry)
from Orders as o, Customers as c
where o.CustomerID = c.CustomerID;

1. To read the plan, we again start from the top part, which contains the eager spool population.

|    |--Table Spool
|         |--Hash Match(Inner Join, HASH:([c].[CustomerID])=([o].[CustomerID]), RESIDUAL:([Northwind].[dbo].[Customers].[CustomerID] as [c].[CustomerID]=[Northwind].[dbo].[Orders].[CustomerID] as [o].[CustomerID]))
|              |--Index Scan(OBJECT:([Northwind].[dbo].[Customers].[Region] AS [c]))
|              |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o]))

Here you can see the hash join between the Customers and Orders tables that populates the eager spool.

2. The ShipVia distinct count is computed as follows by reading from the spool.

|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
|    |--Stream Aggregate(DEFINE:([Expr1010]=COUNT([Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia])))
|         |--Hash Match(Aggregate, HASH:([o].[ShipVia]), RESIDUAL:([Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia] = [Northwind].[dbo].[Orders].[ShipVia] as [o].[ShipVia]))
|              |--Table Spool

3. Similarly, the ShipCountry distinct count is computed using the same spool. You can see this by looking at the NodeId and PrimaryNodeId properties of the spool operators in the query plan or showplan xml.

|--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
     |--Stream Aggregate(DEFINE:([Expr1011]=COUNT([Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry])))
          |--Hash Match(Aggregate, HASH:([o].[ShipCountry]), RESIDUAL:([Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry] = [Northwind].[dbo].[Orders].[ShipCountry] as [o].[ShipCountry]))
               |--Table Spool

4. Finally, since a COUNT() aggregate without GROUP BY always returns exactly one row, a nested loops join between the two branches of the tree above returns a single result row.

|--Nested Loops(Inner Join)
     |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
     |    ....
     |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))

Hope this helps you read execution plans that contain the various spool operators.

--

Umachandar Jayachandran

What's swimming in your bufferpool?

When doing a performance investigation, it is useful to look at what data is present in the buffer pool. This can be used to analyze the impact of running a query on the state of the data pages in the buffer pool. By capturing a before-and-after picture of the buffer pool, you can see the cost of running a query in terms of the physical I/Os that happened. You may argue that this can be done by looking at the SET STATISTICS IO output; however, if you are running a series of queries and want consolidated data rather than data about individual queries, this query is a great help.

The contents of the buffer pool can also reveal which pages your applications access most frequently, and they often reflect the actual I/O that is happening. How can the pages that are in memory tell you anything about disk I/O? When many different objects are accessed over time, the proportion of each object's data in the buffer pool reflects how frequently it is accessed. This happens because the data pages of infrequently accessed objects get evicted from memory over time and have to be read from disk again the next time they are needed.
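To see why, here is a toy simulation. It uses a plain least-recently-used (LRU) cache as a stand-in for the buffer pool; SQL Server's real page replacement policy is more sophisticated. The workload (a 10-page hot table and a 1000-page cold table, with every tenth access going to the cold table) and the class name are hypothetical.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy page cache with least-recently-used eviction (a simplification)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # (object, page_no) -> None, oldest first
        self.physical_reads = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # hit: now most recently used
            return
        self.physical_reads += 1                # miss: read the page from "disk"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)      # evict the least recently used page
        self.pages[page] = None

    def resident_fraction(self, obj, total_pages):
        """Share of an object's pages currently held in the pool."""
        cached = sum(1 for owner, _ in self.pages if owner == obj)
        return cached / total_pages


# Nine of every ten accesses go round-robin over the hot pages;
# the tenth reads the next cold page in sequence.
pool = LRUBufferPool(capacity=20)
hot_i = cold_i = 0
for i in range(2000):
    if i % 10 == 9:
        pool.access(("cold", cold_i % 1000))
        cold_i += 1
    else:
        pool.access(("hot", hot_i % 10))
        hot_i += 1

print("hot table cached:", pool.resident_fraction("hot", 10))
print("cold table cached:", pool.resident_fraction("cold", 1000))
print("physical reads:", pool.physical_reads)
```

In this run every page of the hot table stays cached, while only the 10 most recently read cold pages survive. The 210 physical reads are the 10 initial hot-page reads plus one read for every cold access.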

 

If you are not familiar with the buffer pool, it contains several types of objects, such as data pages and plans. For more information on the buffer pool, see Buffer Management: http://msdn2.microsoft.com/en-us/library/aa337525.aspx

The following query can be used to look at the contents of the buffer pool:

select
    count(*) as cached_pages_count,
    obj.name as objectname,
    ind.name as indexname,
    obj.index_id as indexid
from sys.dm_os_buffer_descriptors as bd
    inner join
    (
        -- in-row data (type 1) and row-overflow data (type 3): container_id = hobt_id
        select object_id as objectid,
               object_name(object_id) as name,
               index_id, allocation_unit_id
        from sys.allocation_units as au
            inner join sys.partitions as p
                on au.container_id = p.hobt_id
                    and (au.type = 1 or au.type = 3)
        union all
        -- LOB data (type 2): container_id = partition_id
        select object_id as objectid,
               object_name(object_id) as name,
               index_id, allocation_unit_id
        from sys.allocation_units as au
            inner join sys.partitions as p
                on au.container_id = p.partition_id
                    and au.type = 2
    ) as obj
        on bd.allocation_unit_id = obj.allocation_unit_id
    left outer join sys.indexes as ind
        on obj.objectid = ind.object_id
        and obj.index_id = ind.index_id
where bd.database_id = db_id()          -- current database only
    and bd.page_type in ('data_page', 'index_page')
group by obj.name, ind.name, obj.index_id
order by cached_pages_count desc

 

An example of what it returns:

1. Run the following command to remove all clean data pages from the buffer pool (DO NOT RUN THIS COMMAND ON PRODUCTION MACHINES):

DBCC DROPCLEANBUFFERS

Running the buffer pool analysis query returned the following results:

cached_pages_count ObjectName         IndexName              IndexId
------------------ ------------------ ---------------------- -----------
15                 sysobjvalues       clst                   1
3                  sysallocunits      clust                  1
2                  syshobtcolumns     clust                  1
2                  sysrowsetcolumns   clust                  1
2                  sysrowsets         clust                  1
2                  sysschobjs         clst                   1

2. Run the following query on the AdventureWorks database:

select * from Person.Address
where city like 'Bothell'

This reads the data pages that the query needs from disk into the buffer pool. Run the buffer pool analysis query again to see the change.

cached_pages_count ObjectName         IndexName              IndexId
------------------ ------------------ ---------------------- -----------
278                Address            PK_Address_AddressID   1
15                 sysobjvalues       clst                   1
4                  sysmultiobjrefs    clst                   1
3                  sysallocunits      clust                  1
2                  syshobtcolumns     clust                  1
2                  sysrowsetcolumns   clust                  1

As you can see, the buffer pool now contains data pages from the Address table. Additionally, since only pages of the Address table's clustered index are present, no other indexes were used by the query.
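Comparing the two result sets by eye works for a handful of rows; for larger captures, the before and after snapshots can be diffed mechanically. The Python sketch below uses the rows shown above, keyed by (object name, index name), and reports only objects whose cached page count grew, because the listings may not include every row and a missing row does not necessarily mean its pages were evicted. It converts pages to kilobytes using SQL Server's 8 KB page size; the function name is hypothetical.

```python
PAGE_SIZE_KB = 8  # SQL Server data and index pages are 8 KB

before = {
    ("sysobjvalues", "clst"): 15,
    ("sysallocunits", "clust"): 3,
    ("syshobtcolumns", "clust"): 2,
    ("sysrowsetcolumns", "clust"): 2,
    ("sysrowsets", "clust"): 2,
    ("sysschobjs", "clst"): 2,
}
after = {
    ("Address", "PK_Address_AddressID"): 278,
    ("sysobjvalues", "clst"): 15,
    ("sysmultiobjrefs", "clst"): 4,
    ("sysallocunits", "clust"): 3,
    ("syshobtcolumns", "clust"): 2,
    ("sysrowsetcolumns", "clust"): 2,
}


def pages_added(before, after):
    """Objects whose cached page count increased between two snapshots."""
    growth = {}
    for key, count in after.items():
        delta = count - before.get(key, 0)
        if delta > 0:
            growth[key] = delta
    return growth


for (obj, index), pages in sorted(pages_added(before, after).items(), key=lambda kv: -kv[1]):
    print(f"{obj}.{index}: +{pages} pages ({pages * PAGE_SIZE_KB} KB)")
```

For the query above this attributes 278 new pages (2,224 KB) to the Address clustered index, plus a few system-table pages.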

 

Another tool that can help here is the output of DBCC MEMORYSTATUS. The advantage of the query in this entry is that it returns a result set, which can be stored in a temp table.

Authors:

Tony Voellm

Gaurav Bindlish
Adjust buffer size in SSIS data flow task

The data flow task in SSIS (SQL Server Integration Services) sends data through a series of buffers. How much data does one buffer hold? This is bounded by two Data Flow properties, DefaultBufferMaxRows and DefaultBufferMaxSize, which have default values of 10,000 rows and 10,485,760 bytes (10 MB), respectively. That means one buffer holds at most 10,000 rows or 10 MB of data, whichever limit is reached first.

You can adjust these two properties based on your scenario. Setting them to a higher value can boost performance, but only as long as all buffers fit in memory. In other words, no swapping please!
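A simplified model of the two limits, as a Python sketch: rows per buffer is the row limit or the number of rows that fit in the size limit, whichever is smaller, and the memory needed grows with the number of buffers allocated at once. The row sizes, the count of buffers in flight, and the function names are hypothetical, and the real data flow engine estimates row width from column metadata and applies its own internal adjustments, which this sketch does not model.

```python
DEFAULT_BUFFER_MAX_ROWS = 10_000
DEFAULT_BUFFER_MAX_SIZE = 10_485_760   # bytes (10 MB)


def rows_per_buffer(row_size_bytes,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS,
                    max_size=DEFAULT_BUFFER_MAX_SIZE):
    """Rows in one buffer: whichever of the two limits is reached first."""
    if row_size_bytes <= 0:
        raise ValueError("row size must be positive")
    return min(max_rows, max_size // row_size_bytes)


def buffer_memory_mb(buffers_in_flight, max_size=DEFAULT_BUFFER_MAX_SIZE):
    """Upper-bound memory, in MB, for buffers allocated at the same time."""
    return buffers_in_flight * max_size / 2**20


print(rows_per_buffer(500))     # narrow rows: the row limit applies
print(rows_per_buffer(2_000))   # wide rows: the size limit applies
print(buffer_memory_mb(50))     # hypothetical 50 buffers in flight
```

Raising either property only helps while the resulting buffers, multiplied by how many are in flight, still fit comfortably in memory.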

 

-          Runying Mao

Implement Parallel Execution in SSIS

SQL Server Integration Services (SSIS) allows parallel execution in two different ways. These are controlled by two properties as outlined below.

 

The first one is MaxConcurrentExecutables, a property of the package. It defines how many tasks (executables) can run simultaneously. It defaults to -1, which translates to the number of processors plus 2. Note that if hyperthreading is turned on, logical processors, rather than physical processors, are counted.
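The default rule can be written out as a small Python helper. This illustrates the arithmetic described above and is not SSIS code; the function name is hypothetical, and rejecting values below 1 (other than -1) is an assumption of this sketch.

```python
import os


def effective_max_concurrent_executables(setting, logical_processors=None):
    """Resolve a MaxConcurrentExecutables value as described above."""
    if setting == -1:
        if logical_processors is None:
            # os.cpu_count() reports logical processors (hyperthreads included)
            logical_processors = os.cpu_count() or 1
        return logical_processors + 2
    if setting < 1:
        raise ValueError("expected -1 or a positive number")
    return setting


print(effective_max_concurrent_executables(-1, logical_processors=8))
print(effective_max_concurrent_executables(3))
```

For example, a box with 4 hyperthreaded cores (8 logical processors) resolves the default to 10 concurrent executables.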

 

Example:

Suppose we have a package with 3 Data Flow Tasks. Each task has 10 flows in the form of “OLE DB Source -> SQL Server Destination”.

If MaxConcurrentExecutables is set to 3, all 3 Data Flow Tasks will run simultaneously.

Whether all 10 flows in each individual Data Flow Task start concurrently is a different story. This is controlled by the second property: EngineThreads.

The EngineThreads is a property of the Data Flow Task that defines how many work threads the scheduler will create and run in parallel. Its default value is 5.

 

Example:

Again, let’s use the above example.

If we set EngineThreads to 10 on all 3 Data Flow Tasks, all 30 flows will start at once.

One thing to be clear about is that EngineThreads governs both source threads (for source components) and work threads (for transformation and destination components). Source threads and work threads are both engine threads created by the Data Flow’s scheduler. So in the above example, an EngineThreads value of 10 means up to 10 source threads and 10 work threads.

Multitasking is a double-edged sword. In SSIS, we don’t affinitize the threads that we create to any of the processors. So if the number of threads exceeds the number of available processors, you might end up hurting throughput due to an excessive number of context switches. Be cautious!
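A quick way to gauge the risk is to compare the worst-case thread count with the processor count. The Python sketch below applies the rule stated above (an EngineThreads value of N allows up to N source threads plus N work threads per Data Flow Task) to the running example; the 8-processor box and the function names are hypothetical, and the result is an upper bound, since actual thread counts depend on the components in each data flow.

```python
def max_data_flow_threads(data_flow_tasks, engine_threads):
    """Upper bound: up to engine_threads source threads plus engine_threads
    work threads for each concurrently running Data Flow Task."""
    return data_flow_tasks * engine_threads * 2


def threads_per_processor(threads, processors):
    """Average number of runnable threads competing for each processor."""
    return threads / processors


threads = max_data_flow_threads(data_flow_tasks=3, engine_threads=10)
ratio = threads_per_processor(threads, processors=8)   # hypothetical 8-processor box
print(threads, ratio)
```

A ratio well above 1 does not by itself mean throughput will drop, since many threads spend time waiting on I/O, but it is a signal to measure before raising EngineThreads further.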

 

-          Runying Mao
