Ed Draper

scalability != performance

Harry Pierson raises the question, “Is the middle tier dead?”

He goes on to say:

Why do we distribute applications across multiple systems today? Is it because we like managing multiple systems? No! It's for scalability. We exploit the fact that tiers of a multi-tier app have different processing loads and scalability methods. Typically, we scale the web/app tier by throwing more servers at the farm while we scale the data tier with bigger servers. However, as Moore's law increases the performance of these machines, the need to scale becomes reduced.

I disagree with this assertion.  Moore's law, as it is usually invoked, is all about CPU speed, and CPU speed has relatively little to do with scalability.  I see this issue come up again and again.  Let's be clear: scalability != performance.  Yes, a faster CPU will run any one thread faster than a slower CPU on the same task.  So what?  The issue of scalability is all about lots of “things” happening simultaneously.  Highly scalable systems can have lots of “things” happening at the same time, and can be scaled both horizontally and vertically to have even more “things” happen simultaneously.

How is a faster processor going to help prevent lock contention over memory, disk, threads, and transactions?  Holding locks for shorter periods of time does not obviate the need to have them in the first place.  They exist.  And yes, everything waiting on a lock comes to a screeching halt when one is encountered, regardless of how fast the CPU is.  Scaling means being able to deal with this problem.  For example, if you are CPU-bound and you’ve got numerous threads piling up one on top of the other, is the answer going to be to get a faster CPU or to get a second CPU?  Which design would yield the ability to have more “things” happen simultaneously?  Let’s look at this scenario’s concurrency and state management characteristics given that:

Threads have state

CPUs have state in terms of their instruction cache, registers, and stack

Memory space has state

Your storage device is all about state

Now, say your app needs to do A LOT of “things” concurrently.  What happens when your app has only one of each of the above?  The first thing you’d do is increase your thread count.  Okay, you can do that on one CPU.  Now you’ve got multiple threads vying for the same RAM, IO, and CPU.  Each time the OS allocates a thread its time slice, what needs to be done?  The CPU needs to save the outgoing thread’s state and set up its registers and stack for the current thread.  During the course of the current thread’s execution, if it is using any shared resource that another thread might access, it will need to place a lock on it.  Any other thread that is allotted a time slice and needs that resource will block until the lock is released.  This ties up a valuable thread that could be doing work elsewhere.  So, how do you fix this problem?  A faster processor?  More likely you’d do whatever you could to remove the need to use shared resources across your threads.

And what would you be doing in the process?  Making your app more scalable by making more “things” happen simultaneously.
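Removing a shared resource can be sketched in a few lines of Python.  This is an illustration only, not code from any real system: the function names and the sample data are made up.  The first version has every thread update one shared total under a lock; the second gives each thread its own private subtotal and merges the results once the threads are done.  Note that in standard CPython the global interpreter lock keeps threads from running bytecode in parallel, so this shows the shape of the two designs rather than a benchmark.

```python
import threading


def sum_with_shared_lock(chunks):
    """Every thread updates one shared total, so every update takes the lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # all threads serialize on this one lock
                total += value

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_without_sharing(chunks):
    """Each thread keeps private state; results are merged after the joins."""
    partials = [0] * len(chunks)

    def worker(index, chunk):
        subtotal = 0
        for value in chunk:
            subtotal += value  # thread-private, so no lock is needed
        partials[index] = subtotal  # a slot that no other thread writes to

    threads = [
        threading.Thread(target=worker, args=(i, c)) for i, c in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

Both functions return the same answer for the same input.  The difference is that the second one never asks a thread to wait on another thread's lock while it works.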

Now, let’s throw in another CPU.  Now, when the OS needs to schedule the execution of a thread, it has the option to run that thread on another CPU.  This means that both threads can execute truly concurrently rather than time sliced across one another with the associated context switching. 
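A toy model makes the difference visible.  The sketch below is an assumption-laden simplification, not a description of any real scheduler: threads are handed to CPUs in turn, each CPU runs its threads round-robin, and the slice length and context-switch cost are invented numbers in arbitrary time units.

```python
from collections import deque


def cpu_finish_time(work_items, slice_len, switch_cost):
    """Time for one CPU to finish its threads under round-robin time slicing."""
    queue = deque(work_items)
    clock = 0
    while queue:
        remaining = queue.popleft()
        ran = min(slice_len, remaining)
        clock += ran
        remaining -= ran
        if remaining > 0:
            queue.append(remaining)  # unfinished thread goes to the back
        # A context switch is paid only when a different thread runs next.
        only_self_left = len(queue) == 1 and remaining > 0
        if queue and not only_self_left:
            clock += switch_cost
    return clock


def makespan(thread_work, cpus, slice_len=10, switch_cost=1):
    """Time until every thread is done when threads are dealt out across CPUs."""
    per_cpu = [[] for _ in range(cpus)]
    for i, work in enumerate(thread_work):
        per_cpu[i % cpus].append(work)
    return max(cpu_finish_time(items, slice_len, switch_cost) for items in per_cpu)
```

With two threads of 100 units each, `makespan([100, 100], cpus=1)` comes to 219 in this model (200 units of work plus 19 context switches), while `makespan([100, 100], cpus=2)` comes to 100, because each thread has a CPU to itself and nothing is ever switched out.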

Adding CPUs is all about vertical scaling.  But you’re always going to run into the limits of that one machine’s resources, principally your process’s RAM.  Horizontal scaling is all about breaking down those barriers by scaling out across multiple machines.

Just like the OS’s thread scheduler, a load-balancer serves to allocate tasks to workers.  So the more workers (computers) that are available, the more that can be done at the same time.  Regardless of how fast one super-worker can handle jobs, it is always going to be bound by its physical constraints.  Multiple workers, each with their own physical constraints, will be able to out-produce the super-worker for work done simultaneously.
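The simplest form of that allocation is round-robin.  Here is a minimal sketch, assuming a fixed list of workers and ignoring everything a real load-balancer worries about (health checks, uneven task sizes, sticky sessions).  The function name and the worker names are hypothetical.

```python
from itertools import cycle


def round_robin_assign(tasks, workers):
    """Deal tasks out to workers in turn and return {worker: [tasks]}."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignments = {worker: [] for worker in workers}
    # cycle() repeats the worker list forever; zip() stops when tasks run out.
    for task, worker in zip(tasks, cycle(workers)):
        assignments[worker].append(task)
    return assignments
```

For example, `round_robin_assign(range(6), ["web1", "web2", "web3"])` gives each of the three workers two tasks.  Adding a fourth name to the list is all it takes to spread the same tasks across one more machine.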

Yes, distributed computing is a good thing.

 

Published Wednesday, April 14, 2004 1:28 AM by draper

Comments

 

Marco Trova said:

We are currently evaluating some PDF products, because our current unmanaged toolkit sometimes blocks the entire server (we have a web farm of 4 servers with 4 CPUs each).

Now the discussion is about migrating to a .NET product. We have tested most of them, and they use all the CPU resources available. So for some of us they are not scalable, while for me the point is that we have tested only the product, not the whole system.
April 15, 2004 2:35 AM
 

Philip Wheat said:

Ed, you might want to drop a link to the Architecture center - I know you tell everyone about it, but people who stumble across your blog (like me) might find it useful to see what's out there for them. http://msdn.microsoft.com/architecture/default.aspx
April 15, 2004 2:29 PM
 

Tiago Pascoal's WebLog said:

April 21, 2004 4:17 AM
 

Narayana Pakala said:

Scalability != performance. It is quite correct. There are several misconceptions which have been addressed by the Gurus. For instance, Don Box focused on two-phase commit (2PC) and performance bottlenecks when discussing the prospect of .NET objects as part of the SQL Server database.

This is what he observed:
It is interesting to look at vendors like Oracle, who
have this monolithic database solution, and in those
cases two-phase commit is not that critical. If you
look at IBM and Microsoft, who went with these
multi-tier architectures, they produced two-phase
commit and in many ways lost performance. I think
you’re seeing a move away from two-phase commit at
Microsoft and I think Yukon [the next version of SQL
Server] is the ultimate expression of that, this
notion that everything is going to run inside the
database. Basically it says that the nineties were
wrong. The eighties and nineties, where we have all
these heavyweight two-phase commit transactions, just
isn’t going to cut it.

You can find the complete discussion at this URL:
http://www.dnjonline.com/TechEd2001/DonBox_head2head.html

I would like to thank Roger Sessions, who clarified the performance and scalability issues in the context of 2PC in an e-mail communication.

Best Regards
Narayana
May 19, 2004 2:52 AM

© 2009 Microsoft Corporation. All rights reserved.