Being Cellfish

Stuff I wished I've found in some blog (and sometimes did)

December, 2011

Change of Address
This blog has moved to
  • Being Cellfish

    TPL Dataflow and async/await vs CCR - summary


    As you have seen, there is much less to cover when doing "CCR tips & tricks for TPL data-flow", probably because the latter is about seven years younger and was designed with the latest and greatest in mind. With CCR you often need to do a lot of work yourself for certain scenarios, while TPL data-flow typically already has a construct that simplifies the task. As with any framework you need to know that the construct exists and find it, which is probably better for you in the long run, even though TPL data-flow can feel a little overwhelming at first, especially if you're used to CCR.

    However, after experimenting with TPL data-flow and especially async/await, I must say that rushing into TPL data-flow may not be what you want. A lot can be accomplished with just async/await.

    Writing unit tests
    Async APIs with async/await
    TPL data-flow for CCR developers more
    Scatter/gather performance


  • Being Cellfish

    TPL Dataflow and async/await vs CCR - part 4


    Today I wanted to show a number of simple examples of how to do things with TPL data-flow compared to CCR. Creating a CCR port and posting and then receiving on it asynchronously is one fundamental scenario in CCR. This is what it looks like with TPL data-flow:

        var port = new BufferBlock<int>();
        port.Post(42);
        int result = await port.ReceiveAsync();

    If you want to explicitly extract data from a port this is how you would do it:

        int item;
        if (port.TryReceive(out item))
        {
            ProcessItem(item);
        }

    Very similar to CCR, which would just use the Port.Test method on line X. Remember how in the test code we needed a synchronous way to receive data on a port using ManualResetEvents? It is much easier with TPL data-flow:

        int item = port.Receive();

    One larger difference is that in CCR concurrent processing of handlers on ports is the default, while in TPL data-flow the default is to process one item at a time. This is how you can change that:

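        // MaxDegreeOfParallelism lives on ExecutionDataflowBlockOptions,
        // which is the options type the ActionBlock constructor accepts.
        var port = new ActionBlock<int>(
            item => ProcessItem(item),
            new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = Environment.ProcessorCount });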

    It is also very easy to introduce interleaving as in CCR, that is, to limit concurrent processing across multiple ports. It can look something like this:

        // Both blocks share one exclusive scheduler, so their handlers never run at the same time.
        var options = new ExecutionDataflowBlockOptions() {
            TaskScheduler =
                new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler
        };
        var p1 = new ActionBlock<int>(item => ProcessItem(item), options);
        var p2 = new ActionBlock<int>(item => ProcessItemDifferent(item), options);

    These are just a few differences that should get you started. The bottom line, if you ask me, is that if you're used to CCR it is very easy to switch over to TPL data-flow.

  • Being Cellfish

    TPL Dataflow and async/await vs CCR - part 2


    Dealing with asynchronous APIs will also be much easier than with CCR. First of all, you can expect most (if not all) classes in the .NET Framework to get an additional method, with an Async suffix, that can be awaited. For example, the Stream object used to have a synchronous Read method plus BeginRead and EndRead for asynchronous processing. In .NET 4.5 the Stream object also has ReadAsync, which can be used to read asynchronously. If, however, you need to work with an API that has not been updated with an async method, you can either apply the same pattern as with CCR by using TPL data-flow or you can just spawn and await two tasks like this:

        var asyncResult = default(IAsyncResult);
        var result = default(int);
        var mre = new ManualResetEvent(false);
        await Task.Run(() =>
            {
                stream.BeginRead(buffer, offset, count,
                    r => { asyncResult = r; mre.Set(); }, null);
                mre.WaitOne();
            });
        await Task.Run(() => result = stream.EndRead(asyncResult));

    You could naturally wrap this in a helper method to deal with this generically exactly the same way as with CCR.

    UPDATE: First of all, I was too quick to post this so there was an error in the original code: the task that calls BeginRead must wait for the read to complete before it completes itself. Second (I guess that is the price for using the latest stuff), I failed to point out that there already is a helper for turning a Begin/End pair into a task: Task.Factory.FromAsync. Last (but not least), I admit the code above is kind of stupid, and yes, it would block a thread pool thread. I was stuck in a pattern commonly used in CCR. Stephen Toub's first comment below shows how this should really be done using a TaskCompletionSource object. So to really learn something useful, look at that comment and the use of TaskCompletionSource.
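
    To make the two alternatives mentioned in the update concrete, here is a minimal sketch of both. It is my own illustration and not the code from the comment referred to above; the class and method names (StreamAsyncHelpers, ReadWithFromAsync, ReadWithCompletionSource) are made up for the example, and a MemoryStream stands in for a real stream.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper class; the names are assumptions made for this sketch.
public static class StreamAsyncHelpers
{
    // Option 1: let Task.Factory.FromAsync pair up BeginRead and EndRead.
    public static Task<int> ReadWithFromAsync(
        Stream stream, byte[] buffer, int offset, int count)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, offset, count, null);
    }

    // Option 2: complete a TaskCompletionSource from the Begin/End callback.
    // No thread is blocked while the read is in flight.
    public static Task<int> ReadWithCompletionSource(
        Stream stream, byte[] buffer, int offset, int count)
    {
        var tcs = new TaskCompletionSource<int>();
        try
        {
            stream.BeginRead(buffer, offset, count, asyncResult =>
            {
                try { tcs.TrySetResult(stream.EndRead(asyncResult)); }
                catch (Exception e) { tcs.TrySetException(e); }
            }, null);
        }
        catch (Exception e)
        {
            // BeginRead itself can throw, e.g. on a closed stream.
            tcs.TrySetException(e);
        }
        return tcs.Task;
    }
}

public static class FromAsyncDemo
{
    public static async Task Main()
    {
        var buffer = new byte[5];
        using (var stream = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 }))
        {
            int first = await StreamAsyncHelpers.ReadWithFromAsync(stream, buffer, 0, 2);
            int second = await StreamAsyncHelpers.ReadWithCompletionSource(stream, buffer, 2, 3);
            Console.WriteLine("{0} + {1} bytes read", first, second);
        }
    }
}
```

    Both helpers return a task that completes when the read does, so the caller can simply await it instead of parking a thread on a ManualResetEvent.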

  • Being Cellfish

    Dependency injection and good design


    I helped prepare a meeting on dependency injection for my team. We had that meeting last week and it led to a number of interesting discussions. Before we go into that I have to explain the title (which was the same as the title of this blog post). If you look at the attributes people tend to claim for a good software design, they include correctness, adaptability, loose coupling, maintainability and so on. Interestingly enough, most (if not all) of the attributes people come up with are part of one "super attribute": testability. So by achieving testability we also achieve a number of other desired attributes. Given that we want to achieve testability at the unit level, the need for dependency injection is almost impossible to get away from. There are a number of patterns for dependency injection and I will have to come back to them in future posts because the list is getting long nowadays.

    However, related to this we had a lot of interesting discussions. The one that surprised me the most was the argument that a design with dependency injection requires more lines of code than the untestable original, and that more lines of code are harder to understand. Excuse me, but that is like saying that the variable name "r" is easier to understand than "distanceFromCenter". Sure, I agree that a very long method or class is hard to understand, but the addition of dependency injection does not change that. Functionality and dependencies are separate and the understanding of one should not affect the other. Now, related to this is a difference in preference among developers. To generalize, there are two types of developers when it comes to understanding code: the ones who execute the code in their head and the ones who like to read the code like a book. The former hate lots of small methods since it means jumping back and forth in the code, and the latter hate long methods since they need to "extract method" in their head (I'll probably revisit this in the future too).

    Another interesting observation was that most patterns for dependency injection change objects so that they expose their dependencies through either properties or constructors. This can be a problem for a software company that ships code libraries for others to use, since people may either use a constructor incorrectly or inject their own objects as dependencies, resulting in the object not working as expected. While some of you might think "well, if you gave it ***, naturally it is doing ***", there is still a problem with this. The customer might not know that they did something wrong and open a support case that takes a lot of time to investigate, only for you to find out that the customer did something wrong. Apart from the wasted time there may also be ill will from this. The good thing is that there is an easy solution: do not make your dependencies public. In .NET, for example, you can use PrivateObject (which uses reflection) to access private members. The pattern is still the same; you would just be using a trick to lower your support costs.

    The last interesting observation I made during the meeting was the reaction to the use of a static factory for dependency injection. While static factories minimize the number of added lines to a class a lot of people reacted to what happened to the test code. Several people commented on how setting up the factory and then calling some random method on some random object made it unclear if the dependencies were really being used. I found this particularly interesting since I personally am not a big fan of the static factory pattern but for other reasons.

    I guess the bottom line is that there is no single pattern for dependency injection that always works great. But you should have a few in your toolbox and use whatever is best in any given situation.
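
    The "do not make your dependencies public" idea can be sketched as follows. This is only an illustration under my own assumptions: the types (IClock, SystemClock, FixedClock, Greeter, GreeterTestHelper) are invented for the example, and plain reflection via Activator.CreateInstance is used in place of PrivateObject so that the sketch has no dependency on a test framework.

```csharp
using System;
using System.Reflection;

// All types below are hypothetical and exist only for this sketch.
public interface IClock { DateTime Now { get; } }

public sealed class SystemClock : IClock
{
    public DateTime Now { get { return DateTime.Now; } }
}

public sealed class FixedClock : IClock
{
    private readonly DateTime now;
    public FixedClock(DateTime now) { this.now = now; }
    public DateTime Now { get { return now; } }
}

public class Greeter
{
    private readonly IClock clock;

    // Public surface: customers cannot inject anything by mistake.
    public Greeter() : this(new SystemClock()) { }

    // Hidden injection point, intended for tests only.
    private Greeter(IClock clock) { this.clock = clock; }

    public string Greet()
    {
        return clock.Now.Hour < 12 ? "Good morning" : "Good afternoon";
    }
}

public static class GreeterTestHelper
{
    // Test-side trick: call the private constructor through reflection.
    public static Greeter CreateWithClock(IClock clock)
    {
        return (Greeter)Activator.CreateInstance(
            typeof(Greeter),
            BindingFlags.Instance | BindingFlags.NonPublic,
            null,
            new object[] { clock },
            null);
    }
}

public static class HiddenInjectionDemo
{
    public static void Main()
    {
        var morning = new FixedClock(new DateTime(2011, 12, 1, 9, 0, 0));
        Console.WriteLine(GreeterTestHelper.CreateWithClock(morning).Greet());
    }
}
```

    The class is still written with constructor injection; only the visibility of the injection point changes, which is the trade-off described above.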

  • Being Cellfish

    TPL Dataflow and async/await vs CCR - Introduction


    In the .NET Framework 4.5 developer preview there are two great new additions: the TPL data-flow classes and the async/await keywords. If you're familiar with CCR you'll notice that TPL data-flow looks and feels a lot like CCR. The new async/await keywords also make asynchronous programming very easy. So I sat down and revisited my old CCR tips & tricks to see what they would look like with these new tools. The first thing that struck me was all the things I did not need to do... First of all, async/await takes you a long way. Both CCR and TPL data-flow are great ways to set up data handlers and then just post data to them and get it processed as it arrives. That is a very typical scenario in the robot world, where the data is sensor data and the actions are reactions to it. If you use CCR in another domain, async/await is enough to add asynchronous processing and deal with scatter/gather patterns. The second thing that struck me is how much easier it is to debug (unexpected) exceptions with the new tools. Instead of posting exceptions (as in CCR) you just throw them, and when they are caught the call stack looks just as if the code were synchronous and single threaded, even though it is not. This significantly simplifies debugging and exception handling in my opinion.

    Over the next few days I'll show a few of my old CCR tips & tricks, but now using TPL data-flow and async/await.

  • Being Cellfish

    TPL Dataflow and async/await vs CCR - part 5


    I got a tip from a co-worker that the way you wait for a scatter/gather operation, as described in part 3, may have a huge performance impact. I made three versions of FibonacciAsync: one that awaits in the return statement, one that uses the Task.WaitAll method and one that just awaits each call as it is made. There were some interesting results when I introduced a one-second delay in the asynchronous Fibonacci. Here is the code I used:

        using System;
        using System.Threading.Tasks;
        using Microsoft.VisualStudio.TestTools.UnitTesting;

        namespace AsyncTests
        {
            [TestClass]
            public class SlowFibonacci
            {
                // Version 1: start both calls, then await them in the return statement.
                private async Task<int> FibonacciAsync1(int n)
                {
                    await Task.Delay(TimeSpan.FromSeconds(1));

                    if (n <= 1)
                        return n;

                    var n1 = FibonacciAsync1(n - 1);
                    var n2 = FibonacciAsync1(n - 2);

                    return await n1 + await n2;
                }

                // Version 2: start both calls, then block on Task.WaitAll.
                private async Task<int> FibonacciAsync2(int n)
                {
                    await Task.Delay(TimeSpan.FromSeconds(1));

                    if (n <= 1)
                        return n;

                    var n1 = FibonacciAsync2(n - 1);
                    var n2 = FibonacciAsync2(n - 2);
                    Task.WaitAll(n1, n2);
                    return n1.Result + n2.Result;
                }

                // Version 3: await each call as it is made.
                private async Task<int> FibonacciAsync3(int n)
                {
                    await Task.Delay(TimeSpan.FromSeconds(1));

                    if (n <= 1)
                        return n;

                    var n1 = await FibonacciAsync3(n - 1);
                    var n2 = await FibonacciAsync3(n - 2);

                    return n1 + n2;
                }

                [TestMethod]
                public void TestFibonacciAsync1()
                {
                    var n = FibonacciAsync1(6);
                    n.Wait();
                    Assert.AreEqual(8, n.Result);
                }

                [TestMethod]
                public void TestFibonacciAsync2()
                {
                    var n = FibonacciAsync2(6);
                    n.Wait();
                    Assert.AreEqual(8, n.Result);
                }

                [TestMethod]
                public void TestFibonacciAsync3()
                {
                    var n = FibonacciAsync3(6);
                    n.Wait();
                    Assert.AreEqual(8, n.Result);
                }
            }
        }

    And the results look like this:

    It turns out that only the first version completes in six seconds, as I expected. Even the similar Task.WaitAll call, which I expected to be equivalent, takes significantly longer to complete. Awaiting each asynchronous call as it is made is obviously a bad idea when a scatter/gather operation is more suitable.
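
    A fourth variant that was not part of the measurements above is Task.WhenAll, which returns a task to await instead of blocking the calling thread the way Task.WaitAll does. The sketch below is my own addition; the delay is a parameter so it can be run quickly, and I would expect it to behave like the first version, but that is an assumption and not a measured result.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical fourth variant; not one of the three versions measured in the post.
public static class SlowFibonacciWhenAll
{
    public static async Task<int> FibonacciAsync(int n, TimeSpan delay)
    {
        await Task.Delay(delay);

        if (n <= 1)
            return n;

        // Scatter: start both calls before waiting for either.
        var n1 = FibonacciAsync(n - 1, delay);
        var n2 = FibonacciAsync(n - 2, delay);

        // Gather: Task.WhenAll yields the results without blocking a thread.
        int[] results = await Task.WhenAll(n1, n2);
        return results[0] + results[1];
    }
}

public static class WhenAllDemo
{
    public static async Task Main()
    {
        var watch = Stopwatch.StartNew();
        int result = await SlowFibonacciWhenAll.FibonacciAsync(6, TimeSpan.FromMilliseconds(100));
        Console.WriteLine("Result {0} after {1} ms", result, watch.ElapsedMilliseconds);
    }
}
```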

  • Being Cellfish

    TPL Dataflow and async/await vs CCR - part 1


    Just as with CCR, when working with async/await you need good tools for writing tests and for executing asynchronous code synchronously. I wish it was this easy:

        [TestMethod]
        public async void WithAsync()
        {
            Assert.AreEqual(42, await DoSomething());
        }

    Unfortunately this does not work and I hope it changes before the next version of Visual Studio ships. But even if it did work it is not perfect. Since it's test code you probably want to have some timeout when you wait. Most test frameworks allow you to have a timeout on your test and the default is typically very long. If your framework does not support that or you want more granular control you can use a nifty little extension method like this:

        public static class AsyncExtensions
        {
            public static T WaitForResult<T>(this Task<T> task, TimeSpan timeout)
            {
                if (!task.Wait(timeout))
                {
                    throw new TimeoutException("Timeout getting result");
                }

                return task.Result;
            }
        }

    Using that extension method the test above would look like this:

        [TestMethod]
        public void WithTimeout()
        {
            Assert.AreEqual(42, DoSomething().WaitForResult(TimeSpan.FromSeconds(5)));
        }

    As you can see this beats the use of synchronous arbiters and causalities for tests which were needed to test CCR.
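
    If the calling code is itself asynchronous, the same timeout idea can be expressed without blocking. The following is a sketch of my own and not something from the original test code: WithTimeout is a hypothetical extension method built on Task.WhenAny and Task.Delay, and DoSomething is a stand-in for whatever asynchronous method is under test.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical non-blocking counterpart to WaitForResult.
public static class AsyncTimeoutExtensions
{
    public static async Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
    {
        // Whichever finishes first wins: the real task or the delay.
        Task finished = await Task.WhenAny(task, Task.Delay(timeout));
        if (finished != task)
        {
            throw new TimeoutException("Timeout getting result");
        }

        // The task is already complete; awaiting it unwraps the result or rethrows its exception.
        return await task;
    }
}

public static class TimeoutDemo
{
    // Stand-in for the asynchronous method under test.
    public static async Task<int> DoSomething(TimeSpan delay)
    {
        await Task.Delay(delay);
        return 42;
    }

    public static async Task Main()
    {
        int value = await DoSomething(TimeSpan.FromMilliseconds(50))
            .WithTimeout(TimeSpan.FromSeconds(5));
        Console.WriteLine(value);

        try
        {
            await DoSomething(TimeSpan.FromSeconds(5))
                .WithTimeout(TimeSpan.FromMilliseconds(50));
        }
        catch (TimeoutException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

    Note that the timed-out task is not cancelled by this helper; it simply keeps running in the background.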

  • Being Cellfish

    TPL Dataflow and async/await vs CCR - part 3


    While you could use TPL data-flow for scatter/gather patterns in much the same way as with CCR, async/await is once again enough. The only thing you need to do is create a number of tasks and then wait for them all, for example by using the Task.WaitAll method (or Task.WaitAny if you're only interested in the first result). If you have a short, constant list of tasks to wait for, I would just wait for them with the await keyword like this:

        public static async Task<int> FibonacciAsync(int n)
        {
            if (n <= 1)
            {
                return n;
            }
            var n1 = FibonacciAsync(n - 1);
            var n2 = FibonacciAsync(n - 2);
            return await n1 + await n2;
        }

    Remember a common pattern in CCR where you want to scatter/gather on a large number of tasks but wait for either all successes or the first failure? While this can also be achieved with TPL data-flow, async/await may be enough depending on how you report errors. If you use exceptions to report errors, your scatter/gather will "continue on first failure" if you throw an exception when an error occurs. Only when the error is returned as a value would TPL data-flow be a suitable solution, and it would be very CCRish in how it is done (i.e. posting back exceptions etc.).

    Preventive comment: Technically the code above will execute synchronously (since no threads are started nor real async methods are being called), but that is not important. I wanted to show a simple "scatter/gather" pattern by first calling a number of async functions and then awaiting the results as needed.
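
    The exception-based variant described above can be sketched like this. It is a minimal illustration under my own assumptions: SquareAsync is a made-up work item that fails for one particular input, and the gather loop awaits the tasks in order, so the exception surfaces when the failed task is reached.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical example; the names and the failing input are invented for illustration.
public static class ScatterGather
{
    public static async Task<int> SquareAsync(int n)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(10 * n));
        if (n == 3)
            throw new InvalidOperationException("Failed on " + n);
        return n * n;
    }

    public static async Task<string> SumOfSquaresAsync(IEnumerable<int> inputs)
    {
        // Scatter: start every task before awaiting any of them.
        List<Task<int>> tasks = inputs.Select(n => SquareAsync(n)).ToList();
        try
        {
            // Gather: the await rethrows the exception of a failed task.
            int sum = 0;
            foreach (var task in tasks)
                sum += await task;
            return "Sum: " + sum;
        }
        catch (InvalidOperationException e)
        {
            return "Error: " + e.Message;
        }
    }
}

public static class ScatterGatherDemo
{
    public static async Task Main()
    {
        Console.WriteLine(await ScatterGather.SumOfSquaresAsync(new[] { 1, 2 }));
        Console.WriteLine(await ScatterGather.SumOfSquaresAsync(new[] { 1, 2, 3, 4 }));
    }
}
```

    The tasks that were already started are not cancelled when the exception is caught; they run to completion in the background.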

  • Being Cellfish

    TPL Dataflow and async/await vs CCR - part 6


    The same co-worker as the other day pointed out an important difference between how CCR and TPL data-flow deal with exclusive schedulers, as described at the end of part 4. To illustrate, assume you have the following test code:

        [TestMethod]
        public void TestExclusiveExecution()
        {
            var options = new ExecutionDataflowBlockOptions()
                { TaskScheduler =
                    new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler };
            var p1 = new ActionBlock<int>(item => ProcessItem(1, item), options);
            var p2 = new ActionBlock<int>(item => ProcessItem(2, item), options);
            p1.Post(42);
            p2.Post(4711);
            Task.Delay(TimeSpan.FromSeconds(6)).Wait();
        }

    If you have what I would describe as the default implementation looking like this:

        private async Task ProcessItem(int id, int i)
        {
            Console.WriteLine("ProcessItem {0} - Start - {1}", id, i);
            await Task.Delay(TimeSpan.FromSeconds(1));
            Console.WriteLine("ProcessItem {0} - Working - {1}", id, i);
            await Task.Delay(TimeSpan.FromSeconds(1));
            Console.WriteLine("ProcessItem {0} - End - {1}", id, i);
        }

    Then the console output looks like this:

    Which is definitely not what you expect coming from the CCR world. As you can see only one task is executed at any given time but as one task is awaiting a delay the other executes. This is bad if you're using the exclusive scheduler options as kind of a lock which is kind of the CCR way. It is however good if you assume that only the code in your handlers needs to be exclusive but any asynchronous tasks they need to complete can be waited on while another handler executes. However if you want your handler to behave more like in CCR this is one way of doing it:

        private void ProcessItem(int id, int i)
        {
            Console.WriteLine("ProcessItem {0} - Start - {1}", id, i);
            Task.Delay(TimeSpan.FromSeconds(1)).Wait();
            Console.WriteLine("ProcessItem {0} - Working - {1}", id, i);
            Task.Delay(TimeSpan.FromSeconds(1)).Wait();
            Console.WriteLine("ProcessItem {0} - End - {1}", id, i);
        }

    In the result you can now see that each handler is completing before the other is executed.

  • Being Cellfish

    ForEach enumeration with index


    Yesterday I did a code review that used a pattern I've seen a couple of times in the past. The pattern is using a foreach statement to enumerate over some collection but doing things where an index variable is important. This is typically done when working with an enumerable and where the cost to convert to a list or array is considered too high since it leads to enumerating the same collection twice. It typically looks something like this:

        int i = 0;
        foreach (var item in enumerable)
        {
            someArray[i] = item.SomeValue;
            someOtherArray[i] = item.OtherValue;
            i++;
        }

    The obvious fix would be to create a nice little extension method to deal with this:

        public static class EnumerableExtensions
        {
            public static void ForEachWithIndex<T>(
                this IEnumerable<T> enumerable,
                Action<T, int> loopBody)
            {
                int i = 0;
                foreach (var item in enumerable)
                {
                    loopBody(item, i++);
                }
            }
        }

    Then it struck me that I should be able to do this with LINQ, and it turned out to be an interesting exercise. Note that we have to return some dummy value since we're using the Select method to transform the data. Also note the call to the Count method, which forces enumeration of the collection so that the selector delegate executes for each element. This solution makes me uncomfortable, however, since I don't feel 100% certain that the whole statement will not be optimized into nothing, given that the return value of the Count method is not used. Only if that value was actually used would I feel good about this solution:

        enumerable.Select(
            (item, i) =>
            {
                someArray[i] = item.SomeValue;
                someOtherArray[i] = item.OtherValue;
                return true;
            }).Count();

    However I realized that in most cases where I've seen this kind of pattern, the actual order of execution is rarely important. Hence parallel execution could be the solution and then I actually could do this as a LINQ expression. Here is an implementation of that as an extension method:

        public static void ParallellForEachWithIndex<T>(
            this IEnumerable<T> enumerable,
            Action<T, int> loopBody)
        {
            enumerable
                .Select((item, i) => new { Item = item, Index = i })
                .AsParallel()
                .ForAll(data => loopBody(data.Item, data.Index));
        }

    Of all these options I guess the last one is pretty neat for parallel execution, but I would stick with the non-LINQ extension method for sequential work mainly because it is easier to understand than LINQ for new developers.
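
    For completeness, here is a small self-contained sketch of how the two extension methods could be used. The input data and the class names IndexedEnumerableExtensions and ForEachWithIndexDemo are assumptions made for the example, and the parallel method is spelled ParallelForEachWithIndex here. Because the index is attached by Select before AsParallel is called, each item keeps its original position even though the loop bodies may run in any order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedEnumerableExtensions
{
    // Sequential version: plain foreach with a counter.
    public static void ForEachWithIndex<T>(
        this IEnumerable<T> enumerable, Action<T, int> loopBody)
    {
        int i = 0;
        foreach (var item in enumerable)
        {
            loopBody(item, i++);
        }
    }

    // Parallel version: indices are assigned sequentially, bodies run in parallel.
    public static void ParallelForEachWithIndex<T>(
        this IEnumerable<T> enumerable, Action<T, int> loopBody)
    {
        enumerable
            .Select((item, i) => new { Item = item, Index = i })
            .AsParallel()
            .ForAll(data => loopBody(data.Item, data.Index));
    }
}

public static class ForEachWithIndexDemo
{
    // Hypothetical helper: fills an array with the length of each word.
    public static int[] Lengths(IEnumerable<string> words, int count, bool parallel)
    {
        var lengths = new int[count];
        Action<string, int> body = (word, i) => lengths[i] = word.Length;
        if (parallel)
            words.ParallelForEachWithIndex(body);
        else
            words.ForEachWithIndex(body);
        return lengths;
    }

    public static void Main()
    {
        // A lazily evaluated sequence, enumerated only once by each call.
        IEnumerable<string> words =
            new[] { "alpha", "beta", "gamma" }.Where(w => w.Length > 0);

        Console.WriteLine(string.Join(",", Lengths(words, 3, false)));
        Console.WriteLine(string.Join(",", Lengths(words, 3, true)));
    }
}
```

    Each loop body writes to its own array slot, which is what makes the parallel version safe in this sketch; a body that touches shared state would need its own synchronization.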
