While I wish I could write a long article on how Spotify works technically, that is not what I want to tell you about today. Nor will I tell you how I would build Spotify if I had to, although that would make an interesting blog post. Today I want to tell you about a great article describing how Spotify has organized their teams, how they work and, best of all, the cool names they have for it all: Squads, Tribes, Chapters & Guilds!
In HTTP 1.1, connections are reused by default. This means that if you make two HTTP requests after each other, you can do it over the same TCP connection, which saves you the overhead of setting up a new one. This is even more important if you're using HTTPS, since the SSL handshake that sets up the secure communication is relatively expensive for the server. That is why you typically want to reuse connections when you can. If you're using an HTTP client library (such as HttpClient in .Net) the client library will open a few connections if needed (this is configurable and the default is two) and then reuse them as long as new requests keep arriving soon enough. But this can also be a problem if your server is a cluster, such as a few instances in an Azure deployment. The problem occurs when you have few (relatively speaking) clients generating a lot of requests on your servers.
This is easily illustrated with an example. Let's assume that you have two servers behind a VIP (i.e. for each new TCP connection one server is chosen using round robin selection) and three clients who each need one TCP connection but make 10 requests per second over that connection. Client one first opens a connection to server one, client two connects to server two and then client three connects to server one. Now we have an uneven load of 20 requests per second on server one and 10 requests per second on server two. This might not be too bad if we had 101 clients and two servers, given that each client is equal, but that is probably not the case. Consider for example the case where client two in the example above only generates one request per second and closes the connection between each request. Now server one alternates between 20 and 21 requests per second while server two sees zero or one request per second. If you have a mix of these short-lived and long-lived connections, and assuming that even the long-lived connections are closed once in a while, you will notice that some of your servers have much more load than others. This is probably not desirable, since random servers will have very high load at random times while a lot of servers will have less than average load. This problem is called connection affinity.
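To make the arithmetic concrete, here is a toy sketch (in Python, purely for illustration; the server names and request rates are made up to match the example above) of a round-robin VIP handing three long-lived client connections to two servers:

```python
from itertools import cycle

# Round-robin VIP: each *new* TCP connection goes to the next server in turn.
servers = ["server1", "server2"]
vip = cycle(servers)
load = {s: 0 for s in servers}

# Three clients, each opening one long-lived connection doing 10 req/s.
client_rps = [10, 10, 10]
for rps in client_rps:
    load[next(vip)] += rps

print(load)  # {'server1': 20, 'server2': 10}
```

Once the connections are up, all of client one's and client three's traffic is pinned to server one; the VIP only gets a say when a new connection is opened.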
You might think that a smarter VIP, choosing where to route new connections based on capacity, would help, and yes it will. A little. The problem is that once a TCP connection is established the VIP cannot do anything to reroute it. At least not your vanilla VIP. Maybe there is some super advanced HTTP-inspecting VIP out there, but you probably don't have one, so don't bother thinking about that too much.
What you want to do is to let a few requests be handled by the server on each TCP connection to get the best of both worlds: reuse a connection for a while to reduce connection overhead, but once in a while force the client to connect again so that your VIP can load balance. While this is definitely possible, keeping a list of all clients currently connected to your web service would waste memory and CPU cycles; instead you can let math help you. If you want to reuse each connection N times on average, you can randomly choose to close a connection after each request with probability 1/N. This works great for a mix of long-lived and short-lived connections, since the long-lived connections will be closed on average every N requests (trust math!) while short-lived connections with just a few requests are unlikely to be closed prematurely.
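Here is a minimal sketch of the idea (in Python, with a hypothetical should_close helper; a real server would translate this decision into a Connection: close response header). The simulation at the end checks that long-lived connections indeed serve about N requests on average before being closed:

```python
import random

def should_close(n, rng=random):
    """After serving a request, close the connection with probability 1/n,
    so each connection serves n requests on average before being closed."""
    return rng.random() < 1.0 / n

# Simulate long-lived connections: count requests served until close.
rng = random.Random(0)
n = 100
lifetimes = []
for _ in range(5000):
    served = 0
    while True:
        served += 1
        if should_close(n, rng):
            break
    lifetimes.append(served)

avg = sum(lifetimes) / len(lifetimes)
print(avg)  # geometrically distributed lifetimes with mean ~n
```

No per-client state is needed anywhere; each request makes an independent coin flip.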
You might be tempted to just have a global counter and close a connection every time your total counter hits a multiple of N. This does not achieve what you want. There is a famous problem called the coupon collector's problem that tells us that if you have N options that are equally probable, the number of picks you need to make in order to expect to have picked all N options is N ln(N). That means that if you have C connections and close one every N requests, it will take roughly C N ln(C) requests before you can expect to have closed all connections, so the average lifetime of each connection is going to be larger than N. Once you add a number of short-lived connections it gets even worse. Trusting randomness is much easier and more accurate!
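You can convince yourself of the coupon collector figure with a quick simulation (Python, purely illustrative): drawing uniformly among C connections until each one has been picked at least once takes about C ln(C) draws on average, which is exactly the harmonic-number expectation.

```python
import math
import random

def draws_to_collect_all(c, rng):
    """Draw uniformly among c coupons until every coupon has been seen once."""
    seen = set()
    draws = 0
    while len(seen) < c:
        seen.add(rng.randrange(c))
        draws += 1
    return draws

rng = random.Random(42)
c = 50
trials = 2000
avg = sum(draws_to_collect_all(c, rng) for _ in range(trials)) / trials

# The exact expectation is c * H_c, where H_c is the c-th harmonic number,
# which is approximately c * ln(c) for large c.
expected = c * sum(1 / k for k in range(1, c + 1))
print(avg, expected, c * math.log(c))
```

With the global-counter scheme, each of those draws costs you N requests, which is where the extra ln(C) factor in the average connection lifetime comes from.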
Last week I helped a colleague who was experiencing UnobservedTaskExceptions in his code. The problem was essentially that the code started several tasks and then, in a loop, checked each one to see if it was faulted or not. If a task was faulted, the method threw an exception. This meant that if two tasks in the collection faulted, the second one was never observed, causing an UnobservedTaskException that brought down the process. While this sounds simple, it turned out to be a hard nut to crack for a number of reasons.
First of all you need to know some things about UnobservedTaskExceptions: while they crash your process by default in .Net 4.0, they don't in .Net 4.5. The fun thing is that you get the 4.5 behavior in your 4.0 assemblies just by having 4.5 installed. There is a way to configure your application to use the old 4.0 behavior, and you can read about that, and why the default behavior changed, here. Forgetting this can frustrate you if your build environment does not have 4.5 but you have it on your own machine.
Second, you have to remember that even if you use async/await you can still end up writing code that has the same "problem". The "problem" is that you only bubble up the first error you see and not all errors. For many reasons I think this is what you actually want (WhenAllOrError anybody?), but if you really want to get all exceptions you can just use Task.WhenAll and you'll be good.
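The same trade-off shows up in other runtimes too. As an illustration (Python asyncio, not .Net; the may_fail coroutine is made up for the example), gather with return_exceptions=True plays a role similar to observing every task: every result and every exception is collected, so nothing is left unretrieved even when several tasks fault:

```python
import asyncio

async def may_fail(i):
    # For illustration: odd-numbered tasks fault, even ones succeed.
    if i % 2 == 1:
        raise ValueError(f"task {i} failed")
    return i

async def main():
    tasks = [asyncio.create_task(may_fail(i)) for i in range(4)]
    # return_exceptions=True observes every task, so no exception is
    # left unretrieved even when several tasks fault.
    return await asyncio.gather(*tasks, return_exceptions=True)

results = asyncio.run(main())
print(results)
```

The caller then decides whether to surface just the first error or all of them.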
I read this interesting article that illustrates the difference between processes optimized for flow efficiency versus resource efficiency. It is maybe not obvious from that article why flow optimization is cheaper than (or the same cost as) resource optimization, but if we assume customer satisfaction is a great asset, I think it is obvious which process is preferable from a customer satisfaction perspective...
Some researchers from MSR have published a paper on how software engineers understand code changes. To me there are no surprises, especially since I participated in the survey the paper is based on (I think I'm quoted too, actually). The bottom line is that developers spend a lot of time trying to understand previous code changes, and that good check-in descriptions help in the effort to understand old changes.
Last week Stephen Toub covered WithCancellation in a more thorough way than I did. You should read his article too!
I loved the original XCOM game (and Terror from the Deep was OK). I will definitely play this new game when it releases next week. They did something interesting for their recent trailer: an interactive trailer where you actually get to change how the flow goes.
This is a variant of WhenAllOrError that a colleague asked me about. His scenario was that he had a lot of tasks to complete, but since they all involved making HTTP requests to other servers he did not want to start them all at once, but rather start a few and then, as they completed, start a few more. That's how I came up with WhenAllOrErrorBatched. The idea is that instead of giving it a list of tasks, you provide a batchSize (the number of parallel tasks) and a function that returns new tasks, which will be called until it returns null.
public static async Task<T[]> WhenAllOrErrorBatched<T>(
    int batchSize, Func<Task<T>> nextTask)
{
    var result = new List<T>(batchSize);
    var pending = new List<Task<T>>(batchSize);
    bool pendingTasks = true;
    while (true)
    {
        // Keep the batch full until the factory runs out of tasks.
        while (pendingTasks && pending.Count < batchSize)
        {
            var task = nextTask();
            if (task == null)
            {
                pendingTasks = false;
            }
            else
            {
                pending.Add(task);
            }
        }

        // No tasks left to start and none pending: we're done.
        if (pending.Count == 0)
        {
            break;
        }

        // Wait for at least one pending task to complete.
        await Task.WhenAny(pending);

        // Harvest completed tasks; awaiting a faulted task rethrows
        // its exception, which gives the "OrError" behavior.
        for (int i = 0; i < pending.Count; i++)
        {
            if (pending[i].IsCompleted)
            {
                result.Add(await pending[i]);
                pending.RemoveAt(i--);
            }
        }
    }

    return result.ToArray();
}
Remember the Great Ball Contraption from earlier this year? There has been an update and it is awesome! My favourite part is the ball throwing one...
I spent the weekend catching up on my RSS feeds and read this interesting story on how to explain why two developers pair programming would only produce a hundred lines of code in a single day. Brilliant idea to use business plans to explain what development really is about.