The Case for Moving From TPC to Database Throughput Units in Database Performance Comparisons

Scientific testing is based on controls, transparency, and repeatability. Whenever we as technical professionals want to test the performance of a database system, we search for a series of tests that show the system’s metrics against a standard.

But the scientific basis for the most common standard, the Transaction Processing Performance Council (TPC) measurements (http://www.tpc.org/), is difficult for most database professionals to evaluate. The TPC metrics are divided into "Benchmarks," classified as C, DS, E, H and "Energy" as of this writing. These cover everything from OLTP (in multiple forms) and virtualization technology to business-intelligence-style workloads. It takes no small amount of study to understand what these measurements show and how they apply to the systems that are tested.

And that is the main issue with TPC numbers – the testing is done by and for the various database vendors (Microsoft included), which creates problems in the other areas: controls, transparency and repeatability. While the TPC standard itself is public (and lengthy, and sound), each vendor tunes the hardware, platform and workloads as much as possible to favor its own database (controls), often doesn't disclose those parameters (transparency), which in turn makes it difficult for you to reproduce the results and verify them (repeatability).

And in the end, none of this matters much anyway – your workloads don't resemble those controls at all. TPC benchmarks are a statistically distributed, standardized way of measuring various vendor systems and hardware using transactions. What you want is something that resembles your current and future workloads, along with a standard way of reproducing the results. So in many shops where I've worked, I created my own tests. That works, but I was never sure I had covered all of the areas I needed to ensure the workloads were representative.

So at Microsoft we're starting to focus more on a scientific methodology that more closely resembles real-world workloads, is repeatable on your own systems, and is measured (starting with our SQL Database offering in Microsoft Azure) in a published document. We call this new measurement "Database Throughput Units," or DTU. You can find the complete document here: http://msdn.microsoft.com/en-us/library/azure/dn741327.aspx. It's short – and that's on purpose. A simpler description allows you to replicate what we've done and change it to be more relevant to your own workloads. Almost all parts of the process are under your control. And while we have published standards based on our testing, we recommend you use the same methodology on all of your systems and ours to get a true benchmark. The culmination of the process is throughput – the time it takes a user to make a request for a database operation and get a result. That's all your users care about, and in the end it's what your final decision will be judged on.
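To make those two end metrics concrete, here is a minimal sketch – not the published DTU benchmark itself – of how you might measure throughput and response time against one of your own databases. The connection string, query and duration below are placeholders you would swap for your own workload; it assumes Python with the pyodbc driver installed.

```python
# Minimal sketch: measure throughput and response time for a sample query.
# The connection string and query are placeholders, not part of the published
# DTU benchmark - substitute your own workload.
import statistics
import time

import pyodbc  # assumes the pyodbc driver is installed

CONN_STR = "Driver={SQL Server};Server=myserver;Database=mydb;Trusted_Connection=yes;"  # placeholder
SAMPLE_QUERY = "SELECT TOP (1) OrderID FROM dbo.Orders ORDER BY OrderDate DESC;"        # placeholder
DURATION_SECONDS = 60  # the standard recommends an hour or more; shortened here

def run_test():
    conn = pyodbc.connect(CONN_STR, autocommit=True)
    cursor = conn.cursor()
    response_times = []
    start = time.monotonic()

    # Issue the sample transaction repeatedly for the test duration,
    # recording each response time.
    while time.monotonic() - start < DURATION_SECONDS:
        t0 = time.monotonic()
        cursor.execute(SAMPLE_QUERY)
        cursor.fetchall()
        response_times.append(time.monotonic() - t0)

    elapsed = time.monotonic() - start
    conn.close()

    throughput = len(response_times) / elapsed            # transactions per second
    p95 = statistics.quantiles(response_times, n=20)[18]  # 95th-percentile response time
    print(f"Throughput: {throughput:.1f} transactions/sec")
    print(f"95th percentile response time: {p95 * 1000:.1f} ms")

if __name__ == "__main__":
    run_test()
```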

There are multiple areas in the standard, including:

  • The Schema – Enough variety and complexity in the structure to exercise the broadest range of operations.
  • Transactions – A mix of types within the CREATE, READ, UPDATE and DELETE operations (CRUD Matrix) that can be tuned to a real-world observation.
  • Workload Mix – A distribution of the above operations that more accurately resembles your environment (a rough sketch of this appears after the list).
  • Users and Pacing – The number of virtual "users" the test should simulate, and how often each user performs each action, to reproduce the spikes, lulls and other anomalies faced in real-world systems.
  • Scaling Rules – A scale factor applied to the number of virtual users per database.
  • Duration – The length of time for the test run – one hour is considered the minimum; longer is better for a statistically sound result.
  • Metrics – DTU focuses on only two end measurements for simplicity: throughput and response time.
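As referenced above, here is a rough sketch of how a workload mix, virtual users and pacing might be expressed in a simple harness of your own. The CRUD weights, user count and think times are illustrative assumptions for the example, not the values from our published benchmark, and the transaction body is a stand-in for your real database calls.

```python
# Illustrative sketch of a workload mix with virtual users and pacing.
# The CRUD weights, user count and think times are assumptions - tune them
# to match what you observe in your own environment.
import random
import threading
import time
from collections import Counter

WORKLOAD_MIX = {                # weighted CRUD mix, tuned from real-world observation
    "create": 0.10,
    "read":   0.70,
    "update": 0.15,
    "delete": 0.05,
}
VIRTUAL_USERS = 25              # scale per database according to your scaling rules
THINK_TIME_RANGE = (0.1, 2.0)   # seconds between actions, to simulate pacing
DURATION_SECONDS = 60           # the standard suggests an hour or more

counts = Counter()
lock = threading.Lock()

def do_transaction(kind):
    """Placeholder for the real CRUD statement against your schema."""
    time.sleep(0.01)  # stand-in for an actual database call

def virtual_user(stop_at):
    ops, weights = zip(*WORKLOAD_MIX.items())
    while time.monotonic() < stop_at:
        kind = random.choices(ops, weights=weights)[0]   # pick an operation per the mix
        do_transaction(kind)
        with lock:
            counts[kind] += 1
        time.sleep(random.uniform(*THINK_TIME_RANGE))    # pacing: spikes and lulls

stop_at = time.monotonic() + DURATION_SECONDS
threads = [threading.Thread(target=virtual_user, args=(stop_at,)) for _ in range(VIRTUAL_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(counts.values())
print(f"Total transactions: {total} ({total / DURATION_SECONDS:.1f}/sec)")
for kind, n in counts.items():
    print(f"  {kind}: {n}")
```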

You can read the full document at the link above. As always, all comments are welcomed.

 

Comments
  • We cannot very well replicate the actual DTU benchmark until Microsoft actually publishes more complete details about it.

    I agree that conceptually, the standard is a useful starting point for what to think about when trying to create your own benchmark, based on your own workload.

  • Good point, Glenn. We will be coming out with more prescriptive guidance soon on the specifics of how we implemented our test run. The bigger picture is the methodology rather than the specific measurements. We wanted to ensure customers could implement something that is repeatable on real-world workloads.

  • I'm excited that Microsoft are putting in the work to help us understand OLTP performance in Azure. The full disclosure of the benchmark will make it possible to reproduce the workload locally so we can compare our own hardware and workloads to what we're getting in Der Cloud(TM).

    Even this document is a great first step along the way. It has made a number of customer decisions easier.

  • Thanks, Jeremiah! Yes, I think the transparency around the methodology is really important. You don't have to trust what a vendor says - you can simply run the tests yourself, get a unit of measure, and use that to compare. We hope this is the way forward.

  • Sad trombone, Shawn Bice doesn't like transparency: blogs.technet.com/.../azure-sql-database-service-tiers-amp-performance-q-amp-a.aspx

  • Hi Buck, any update on when partners can get their hands on the benchmark itself?

    RGds

    Mat

  • Stay tuned and follow the SQL Database team's blog (and this one). They'll announce as soon as they publish.

  • I'm interested in the details of DTU benchmark, but I wonder how portable such a beast would be? How would one compare it to other Azure offerings that don't use SQL Server? Or other cloud providers? Or on premises networks? Without open, replicable benchmarks, we are talking marketing numbers, not science.
