Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
Suppose a city has a whole bunch of bank branches, each of which has a whole bunch of tellers and one gofer. There are a whole bunch of customers in the city, each of whom wants to withdraw a whole bunch of money from the bank at some varying time throughout the day.
The algorithm goes like this:
A customer finds the nearest bank branch and evaluates its line. If the line is out the door, then the customer goes to another bank branch. This continues until they either find one with a short enough line, or they give up and go home.
Suppose they find one with a short enough line. The customer queues up. (Perhaps they queue up using the M model or using the W model; it doesn't particularly matter for the purpose of this analogy. Let's suppose its the W model.) When the customer reaches a teller, the transaction goes like this: the customer requests a certain number of bills of various denominations, and the teller counts them out: one, two, three, four, five, there you go. The customer leaves and the teller services the next customer.
This seems perfectly reasonable. But suppose the teller runs out of the right denomination of cash halfway through the transaction. The teller tells the gofer to go get more fifties out of the vault, and then has a little nap while the gofer is running to the vault and back. Now the customer, and every customer behind them, has to wait for the teller to wake up, which doesn't happen until the gofer gets back from the vault.
In case it's not clear, in this analogy a city is a server farm, a bank branch is a machine, a teller is a worker thread, the gofer is an I/O completion thread, and a customer is a client. The money is the computation that the server performs on behalf of the client, and waiting for the gofer to go to the vault is a synchronous delay waiting on I/O completion. Deciding whether the line is too long to choose a branch is load balancing the server farm.
Now, you might say that this could be more efficient. That time when the teller is asleep waiting for the gofer is troubling. Ideally you want one thread per core going full out all the time. The teller could be servicing the next customer in line, and then pick up with the first customer when the gofer gets back. This seems like an ideal use of task-based asynchrony.
You have to be really careful to understand the implications on the whole system when you make a change like this. It is really easy to accidentally implement the following system:
A customer finds the nearest bank branch. There is no line, so they go right in. There are no tellers. There's just a "take a number" machine and an arrow pointing to a door. The customer takes a number and walks through the door into an enormous warehouse containing tens of thousands of customers. There are a bunch of tellers running at top speed from customer to customer, giving them each one bill at a time. If a teller runs out of a needed denomination then they yell at the gofer, and keep right on going to the next customer who they can serve. Eventually the gofer brings them the needed denomination. The tellers keep track of who all they are servicing "at the same time", and no one leaves until they get all their money. If the gofer can't keep up then the warehouse just gets more and more crowded, and the more crowded it gets, the longer it takes for a customer to get all their money, and the more customers who arrive while they're waiting, and it just snowballs.
In this scheme the tellers are almost never asleep, so the CPU cores are being heavily utilized, and everyone gets their first bill reasonably fast. "Time to first byte of results" is potentially excellent. But the load balancing just went out the window; because there is never a line, there's no way for the load balancer to know that there are too many unserved customers. And time to last byte served can be potentially bad.
This sounds like a silly story, but we accidentally did something like this to ourselves just the other day. We built a code analyzer that uses task based asynchrony on the UI thread. The idea was that on every keystroke we would start an asynchronous task that then itself started other asynchronous tasks. The tasks would be :
1) Check to see if there's been a subsequent keystroke recently; if so, then cancel any unfinished tasks associated with this keystroke.2) Recolour the user interface text based on the differential analysis of the change to the syntax tree induced by the keystroke.3) Set a timer to see if there has been half a second of time with no keystrokes. If so, the user may have paused typing and we can kick off a background worker to perform deeper program analysis.
See the problem? The tasks are created so fast between keystrokes that if you're typing quickly soon you end up with a warehouse full of tens of thousands of tasks, 99.99% of which have not yet run in order to cancel themselves. And half the ones that do manage to run are creating timers, the vast majority of which are going to be deleted before they ever tick. The garbage collection pressure from all those timers alone was enough to destroy performance. Asynchrony is awesome, but you need to make sure that you get the granularity of the asynchrony at the appropriate level. The code analyzer was rewritten to enqueue tasks that referred to a single, global timer, and to be less aggressive about enqueueing tasks that were highly likely to need cancellation a tenth of a second later. Performance improved drastically.
Task-based asynchrony is awesome but it is not a panacea; be careful and measure as you go. The fact that there is hardly any waiting between the time you make the request and the time the asynchronous request enqueues itself onto the work queue means that there is hardly any restriction on the number of tasks you can create quickly.
Next time: More thoughts on syntactic concerns
The warehouse analogy is great. I now want to open a bank that works that way, just to see how it works.
Hah! Well, I am always relieved to see a post like this, reassuring me that I'm not the only person who doesn't get the algorithm quite right the first try. :)
Thank you very much for sharing.
The Rx design guidelines show a similar situation of capturing keystrokes and kicking off a dictionary search. However, they use a Throttle which appears to be a more clever approach this time round :)
The more I learn about implementations of asynchrony the more I love it.
"More thoughts on syntactic concerns" desperately waiting :)
I think typing 100 wpm is nearly 10 keystrokes per second. It's hard to imagine somebody typing that fast for long enough that a few garbage objects for every keystroke could cause enough memory pressure to make a noticeable performance difference. Clearly making a single timer that gets reset on each keystroke was the right decision, but it's still hard to imagine that creating 10 timers per second for brief periods causes performance problems.
BTW, using the W style of queueing is the right thing to do here because then the line doesn't stall unless all the tellers are asleep waiting for the gofer. And if more tellers can open up their stations when too many other tellers are waiting for the gofer, the line will never stall until the bank runs out of teller stations.
AN excellent blog on Asynchrony!
Rob Reckless - RecklessDevelopment
You kind of left out the punchline - what's the *correct* approach to this situation, in either the bank analogy or the world of asynchrony?
My guess is that you would send each client into some sort of limited "wait pool" (waiting area?) every time they need to wait on I/O (or the gofer), and whenever the I/O thread (gofer) comes back, you make a note to check the pool after the current request is done and finish off whatever you can. Presumably if the pool has a limited size then eventually the line will start to snake out the door again if you get too busy.
But, maybe this is way too simplistic an approach, or too close in concept to the warehouse situation. Is there a better way?
How about just bump people back to the front of the queue if the teller is unable to finish servicing their request due to lack of bills? Given that we're using the W system (one shared queue) that should work fine as long as we have a way for the tellers to skip people that they know in advance they can't service.
Have you considered replicating Mono's coroutine approach?
Apart from still being buggy in Mono, this allows for easy suspension of function execution.
No need to expand into a state machine.
My sample implementation of coroutines usting tasklets (circa 50 lines of code):
(did I mention, it's still buggy? :P)
Followup: Coroutine implementation
MZ: Doesn't the use of tasklets require modifications to the CLR? Granted, I'm all for modifying the CLR to support that...
Yes it certainly does. Still CLR unlike JVM does evolve (example: con- and contra- variance in 4.0) so that shouldn't be a blocker.
Though your point is well-taken, I note that generic variance was added to the CLR in v2, not v4. No mainstream languages took advantage of that feature until C# 4.0 and VB 10, but it has been there since generics were implemented in the first place. - Eric