Today on ARCast.TV we have another episode from Kuala Lumpur on Performance and Scalability, so I thought I would jot down a few notes on the subject.

When it comes to performance what is the role of the solution architect?

The hardware guys will take care of getting a really fast server and tweaking it out – is that a part of the role?

The developers will optimize the code using the best patterns and techniques to ensure that the code functions as quickly as possible – is that a part of the role?

The test team will test the system under load – is that a part of the role?

The answer is a resounding YES to all of these.  Of course I’m not arguing that the architect needs to be tweaking servers or writing code and tests, but the architect’s direction and influence will be felt in all of these areas, because the architect’s primary job is to describe the performance and scalability goals of the system in very clear and testable terms.  If you don’t do this, how can you tweak servers, optimize code or test for a goal that is undefined?

When I ask project teams what their performance and scalability targets are, they sometimes look at me as though I came from Mars.  “Well, we want our system to be really fast and highly scalable,” they say – of course you do, but do you realize that depending on how you define terms like “fast” and “scalable” these goals could actually be in conflict?

In my view, performance goals should be stated in three dimensions: throughput, response time and workload.


Throughput

The throughput goal answers the question “How many”.  In this case the focus is usually on the backend of the system as a whole. 

·         “How many requests per second does the server farm handle?”

·         “How many transactions per second can it sustain?”

For non-technical people you can state these goals in business terms as well

·         “How many widgets per hour can we sell?”

·         “How many orders per second can the system accept?”

Optimizing for throughput means making server resources (threads, memory, etc.) very efficient, and in some cases this can actually work against the second dimension, response time.
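Throughput is easy to measure once you define the operation you are counting.  Here is a minimal sketch (in Python, purely for illustration – the `operation` here is a hypothetical stand-in for a real transaction against the server farm):

```python
import time

def measure_throughput(operation, duration_seconds=1.0):
    """Run `operation` repeatedly for a fixed window and report operations/sec."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_seconds:
        operation()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

# A stand-in "transaction" that just does a little CPU work.
tps = measure_throughput(lambda: sum(range(1000)), duration_seconds=0.5)
print(f"Sustained throughput: {tps:.0f} transactions/sec")
```

A real load test would drive the servers over the network rather than call a local function, but the principle – count completed operations per unit of time – is the same.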

Response Time

Response time is the client’s perception.  When people say a system is “fast” or “slow”, they mean that it feels that way – the time from the moment they clicked the button to the moment they saw a response felt pretty fast (or didn’t).  You’ll notice that I am using non-specific terms like “perception” and “feel” because this is how people speak about these things.  A very specific measure for response time is typically time to last byte (TTLB), and you should optimize this as much as possible (while balancing the throughput dimension), but there are many things you can do to make a system “feel” faster than it is.
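Because response time is about perception, averages can mislead – the few very slow requests are what users remember, so report percentiles as well.  A small sketch of timing individual requests and summarizing the median and 95th percentile (Python for illustration; the handler is a hypothetical stand-in for a real request):

```python
import time
import statistics

def time_request(handler):
    """Time one call end to end - a rough stand-in for time to last byte (TTLB)."""
    start = time.perf_counter()
    handler()
    return time.perf_counter() - start

def summarize(samples):
    """Return (median, 95th percentile) of a list of latency samples."""
    ordered = sorted(samples)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return statistics.median(ordered), p95

latencies = [time_request(lambda: sum(range(5000))) for _ in range(100)]
median, p95 = summarize(latencies)
print(f"median={median * 1000:.2f} ms  p95={p95 * 1000:.2f} ms")
```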


Workload

Workload defines “How much” in some given slice of time.  Typically we speak of the number of concurrent users and the volume of data they are sending or receiving.  To get accurate test results you must create an environment in which the code is being exercised to the same degree as it will be in the production environment.

·         “How many concurrent users do you expect on average?”

·         “What about peak processing times?”

·         “How much data will a typical user retrieve, store, search etc.?”
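The questions above translate directly into a load model.  As a minimal sketch of driving a concurrent workload (Python threads; `user_session` is a hypothetical stand-in for a real scripted user):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def user_session(user_id):
    """One simulated user: a few requests with think time between clicks."""
    for _ in range(3):
        sum(range(2000))   # stand-in for a request round trip
        time.sleep(0.01)   # think time between requests
    return user_id

def run_workload(concurrent_users):
    """Drive `concurrent_users` sessions in parallel and count completions."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(user_session, range(concurrent_users)))
    return len(results)

completed = run_workload(20)
print(f"{completed} simulated users completed their sessions")
```

Real load-generation tools do this at far greater scale, but the shape of the model – N concurrent sessions, each with realistic data volumes and think time – is exactly what the architect must specify.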

Once the architect has established targets in these areas, you can create a testable goal: something very specific that the test team can take to the lab, run as a scenario, and use to tell you whether the code can meet the goal or not.  There is no sense arguing about “slow”, “fast” or anything else until you know the numbers. 
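Once the targets are numbers, the lab verdict can be a simple pass/fail.  A sketch, with entirely hypothetical target values:

```python
# Hypothetical targets the architect hands to the test team.
GOALS = {
    "throughput_tps": 500,   # sustained transactions per second
    "p95_response_s": 2.0,   # 95th-percentile response time in seconds
}

def meets_goals(measured):
    """Pass/fail verdict - no arguing about 'slow' once the numbers are in."""
    return (measured["throughput_tps"] >= GOALS["throughput_tps"]
            and measured["p95_response_s"] <= GOALS["p95_response_s"])

lab_run = {"throughput_tps": 540, "p95_response_s": 1.7}
print("PASS" if meets_goals(lab_run) else "FAIL")
```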

As Lord Kelvin said:

When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science.

Define, Measure, Rinse and Repeat

You can do it – now go out there and start architecting…