This is an interesting story of how the twitter datacenter improved their deployment time from 40 minutes to 12 seconds using the right tool for the job. And twitter are not alone. Let that inspire you!
Since there was some requests to get the code for the parallel QuickSort mentioned here I decided to post the code I used. This is not a complete listing. You need the samples that can be downloaded from here. This listing is an extension to the Sort class in Chapter6/ParallelSort/Sort.cs.
1: private static DispatcherQueue queue =
2: new DispatcherQueue("CcrQueue");
3: private static Port<SortInfo> infoPort = new Port<SortInfo>();
4: private static Port<int> resultPort = new Port<int>();
5: private static EventWaitHandle done =
6: new EventWaitHandle(false, EventResetMode.ManualReset);
7: private static int remaining = 0;
8: private static int theArray = null;
10: class SortInfo
12: public int from;
13: public int to;
14: public int depth;
17: public static void CcrQuickSort(int array)
19: theArray = array;
20: remaining = array.Length;
22: Arbiter.Receive(true, infoPort, CcrSortPart));
24: Arbiter.Receive(false, resultPort, CcrSortPartDone));
26: new SortInfo
28: from = 0,
29: to = array.Length,
30: depth =
31: (int) Math.Log(Environment.ProcessorCount,2)+4
36: private static void CcrSortPartDone(int count)
38: remaining -= count;
39: if (remaining <= 0)
43: Arbiter.Receive(false, resultPort, CcrSortPartDone));
46: private static void CcrSortPart(SortInfo info)
48: if (info.from == info.to)
50: if (info.to - info.from <= Threshold)
52: InsertionSort(theArray, info.from, info.to);
53: resultPort.Post(info.to - info.from);
57: int pivot = info.from;
58: pivot = Partition(theArray, info.from, info.to, pivot);
59: if (info.depth > 0)
61: infoPort.Post(new SortInfo
63: from = info.from,
64: to = pivot,
65: depth = info.depth - 1
67: infoPort.Post(new SortInfo
69: from = pivot + 1,
70: to = info.to,
71: depth = info.depth - 1
76: CcrSortPart(new SortInfo
78: from = info.from,
79: to = pivot
81: CcrSortPart(new SortInfo
83: from = pivot + 1,
84: to = info.to
The use of a static array and remaining counter are obviously not a great design for your typical application, but sneaked in there in the hunt for a saving a few microseconds. Didn't really matter...
A while ago I stumbled over this article. It talks about how teams working on important projects tend to revert to less effective ways of communicating and executing their work. Ithink it makes sense. Among amateurs I think it is quite common to fail a task under pressure that they would have no problem completing otherwise. Everybody knows how easy it is to get all answers right watching Jeopardy, but the people actually participating make all kinds of stupid mistakes. And in my work I've seen projects where the most important pieces have been late while less important pieces are completed ahead of schedule. I've also seen small teams doing proof of concepts that exceeds the accomplishments of much larger teams in the same time frame. Previously I've explained that with the overhead of large teams but this article made me think if there is another dimension to it because underdog projects always seems to be a success.
I was reading this about profiling gotchas and couldn't resist the urge to download the samples from the book and implement a few of them using CCR to compare performance against TPL. I played around mostly with the parallel quick-sort. The first observation is that CCR takes a little more setup than using TPL but most of that would be something you do once in your application anyway so I would consider them similar in readability. Both implementations looked very similar and straight forward. I guess that is what I like with both TPL and CCR is that your code looks almost like regular single threaded code, but with the benefits of concurrent execution.
While executing these two variants and tweaking thresholds I noticed that CCR almost always ended up executing 10-20% slower than the equivalentTPL implementation. Looking a little closer at what was happening I think it all made sense. First of all this was on a twin core machine so there are really only two threads running. CCR is built around message passing and the way I implemented the sort algorithm was to post a message when a part of the array was sorted signaling when all parts had been sorted. Since quick-sort uses a pivot number putting it in the right place it also means that a lot of these partial sorts were completed by just putting a single number in place. That is a lot of messages being sent. The handler also must be a run once handler that then puts itself back in the queue so that I don't update the remaining unsorted counterconcurrently. Hence a lot of time is relatively spent in the CCR dispatcher just to decide when everything is done rather than sorting. Did that make sense without a code listing?
Anyway... My gut feeling is that TPL is more well suited for parallel computations than CCR. The code looks a little cleaner and performance is a little better. However CCR is more well suited for event based implementations where TPL is not that big of an advantage. As usual it is a matter of choosing the right tool for the right problem.
Preemptive comment:I think I could have made my CCR implementation a little more efficient and I suspect the differences would have been smaller with more cores but I didn't test that. I'll let you do that!
The thing about ship parties made me think about a more common event at Microsoft; Morale events. What I've seen is that depending on who's the hosting manager (i.e. the manager that is everybody else's boss directly or indirectly) your morale event will be different. On one end of the spectrum we have the manager who organizes booze cruises. In the middle you find managers who like fine dining and tasty beverages. And the other end of the spectrum we have managers who know they need (or think they need) to provide alcohol but they don't drink themselves. The first two types of morale events are typically appreciated by a lot of people. It's easy to like anything that involves tasty beverages and good food. The last type however tends to annoy a lot of people since supply of alcoholic beverages is often terribly underestimated. Because in my opinion, there is nothing worse than running out of food and drinks at a party.
Premtive comment: No, I don't think it's a must to provide unlimited amount of food and drinks. I would gladly pay for a few beers but the fact that you run out of drinks is pretty bad I think. It's OK to run out of drinks and food, but not when the party is starting. It's OK towards the end of a party!
The best thing about relocating to Redmond is the opportunity to attend ship parties. Assuming that you work on a shipping product... I would like to add a few things to this list:
I've been working with MSTest (the unit test framework that comes with Visual Studio) lately and I learned the hard way that it is a bad idea to not use the TestInitialize attribute. This may sound obvious to you but I'm more used to work in xUnit.net and there I typically setup as much as I can with field initializers and the rest in the constructor. So while working in MSTest I did the same thing; I setup as much as I could in field initializers. I added a failing test, implemented it and suddenly I had multiple test errors. Not failures, but errors. The message was: Type Tin assembly A is not serializable. Interestingly I did not touch T and it's an exception so it's definitely serializable... Looking at what a search on the Internet gives you for this error indicates problems with finding the right assemblies. Well, the build environment I'm using is kind of tweaked, but i didn't change that when the tests started to report errors. By looking at the tests now with errors I manage to conclude that they all use a default constructor for an object that calls another constructor with an empty string. And that constructor does not accept an empty string. It should but it doesn't. So now I know how to fix it but why the bad error message I think. Well I don't know that, but i do know that when i moved the call to the constructor from the field initialization to a method with the TestInitialize attribute i got a test failure; Unexpected excpetion in initializer: T: Path must start with '/'. I guess I'll stop using field initializers when dealing with MSTest in the future...
Being new to the Robotics team also means I get a chance to do some beginner mistakes with CCR. Let's assume that you have a method that does a lot of work and you want to use CCR to do the work. Before using CCR your code would have looked like this:
1: private void ProcessWithoutCcr()
3: // Do Work
4: // Do a lot more work that takes a lot of time
If all the work you're doing is just taking time because it needs to use the CPU the method is actually OK. But if there are a lot of things working with files, network etc you want to use a series of yield returns to split up the execution. If you're not really interested in the result or the methods cannot fail you (as a beginner) wold maybe do something like this:
1: private void DoSomething()
3: // Do Work
6: private IEnumerator<ITask> DoSomethingElse()
8: // Do work that takes a lot of time
9: // and split up with yield returns
10: yield break;
13: public IEnumerator<ITask> Process()
15: yield return Arbiter.FromHandler(DoSomething);
16: yield return Arbiter.FromIteratorHandler(DoSomethingElse);
This will however not work and only DoSomething will be executed. The problem is how the execution IEnumerator<ITask> is implemented in CCR. To get it to work you can use a SuccessFailurePort either by passing it in or by returning it. Both approaches are shown in this example:
1: private SuccessFailurePort DoSomethingWithPort()
3: // Do Work
4: SuccessFailurePort port = new SuccessFailurePort();
6: return port;
9: private IEnumerator<ITask> DoSomethingElseWithPort(SuccessFailurePort port)
11: // Do work that takes a lot of time
12: // and split up with yield returns
14: yield break;
17: public IEnumerator<ITask> ProcessWithPort()
19: yield return DoSomethingWithPort().Choice(CcrServiceBase.EmptyHandler, CcrServiceBase.EmptyHandler);
20: SuccessFailurePort port = new SuccessFailurePort();
21: SpawnIterator(port, DoSomethingElseWithPort);
22: yield return port.Choice(CcrServiceBase.EmptyHandler, CcrServiceBase.EmptyHandler);
In this example no real handler is used for the port so it could be simplified with a more simple port type, but the SuccessFailurePort pattern is pretty nifty so why not use it...
Yesterday was about writing good test fixtures rather than using a tool that makes you get away with bad designs. The same thing goes for mocking frameworks. If you use a mocking framework you should avoid using it do fake things you wrote yourself. And you should use it rarely for anything else. That way you're more likely to end up with a good design in my opinion.
A few weeks ago a question was asked on an internal mailing list about how the unit test framework part of VS2010 works. Basically some client had decided to move away from that framework in favor of NUnit. Even though I think they did the right thing I think they did it for all the wrong reasons. The problem they encountered was that tests from multiple test classes were run at the same time and also the class initiators for one class ran before the class cleanup for another one was completed. If I understood the problem correctly the problem was not with their unit tests but with a number of bigger integration tests that were implemented using a unit test framework. Still that is not their problem I think. the problem is that they had tests that used some common resource (probably a database) and that it was used so that fixtures overlapped. Switching to a framework that happened to not run tests concurrently (like VS2010) nor in random order (like xUnit.net) did not fix their problem. It just suppressed the symptoms. It's like talking pain killers for a broken leg. It may feel good for the moment but it will not help you in the long run.
What they should have done was to fix their test fixtures. There are three things they could have done;