Being Cellfish

Stuff I wished I've found in some blog (and sometimes did)

  • Being Cellfish

    Asynchronous enumerations - Introduction


A couple of weeks ago I had a discussion with a co-worker about the proper way to asynchronously iterate over some data in Azure tables. Exploring the different options was very interesting and helped us understand the pros and cons of each asynchronous strategy. So over the next few weeks I'll go over each option we looked at in more detail. But first a little background so you understand the basic problem.

When you retrieve rows from an Azure table you can obviously get all the rows first and then do whatever you need to do with the result. However, if there are a lot of rows, that means a (relatively) long wait for the data, and it will use a lot of memory. If you do not need all the data at once, or if the first thing you do is some kind of filtering that leaves only a few records in the final collection you want to work on, then asynchronous retrieval of records is a good thing. If you know a little about Azure you know that when you retrieve records from Azure tables you may or may not get a continuation token, something most people never deal with manually. But in this case you might want to, for example by using the BeginExecuteSegmented method. That method gives you asynchronous access to zero or more rows at a time.

Already you can see that there are several different scenarios to take into account. Do you need all the records together for your processing? Can you process each record individually? Do you need to scan a lot of records, filtering out only a few? Depending on what you need, I believe your choice of "asynchronous enumeration" will differ. But rest assured; I'll help you pick the right one for you! The options I will cover in the next few weeks are:

    • Get the whole enumeration asynchronously, a.k.a. Task<IEnumerable<T>>
    • Get each item asynchronously, a.k.a. IEnumerable<Task<T>>
    • Use reactive extensions, a.k.a. IObservable<T>
    • Get segments asynchronously, a.k.a. IEnumerable<Task<IEnumerable<T>>>
    • Create your own custom solution, a.k.a. MyEnumerationAsync
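The shapes of these options can be sketched as method declarations on a single hypothetical interface (IRecordSource and Record are stand-ins invented for this sketch, not part of any Azure API):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Record stands in for whatever row type you retrieve from the table.
class Record { }

// A hypothetical source exposing each asynchronous enumeration shape.
interface IRecordSource
{
    // Option 1: one task that completes when the whole result set is available.
    Task<IEnumerable<Record>> GetAllAsync();

    // Option 2: a lazy sequence where each element is awaited individually.
    IEnumerable<Task<Record>> GetEach();

    // Option 3: push-based, using Reactive Extensions.
    IObservable<Record> GetObservable();

    // Option 4: one task per segment, matching Azure table continuation tokens.
    IEnumerable<Task<IEnumerable<Record>>> GetSegments();
}
```

Seeing the signatures side by side makes the trade-offs easier to discuss: the outermost type decides what the caller can start doing before everything has arrived.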


  • Being Cellfish

    "Mocks: the code smell"


So I stole the title from this talk.

I have seen Arlo argue for his tests with simulators and I've always felt I shared his view on mocks. Or at least I share what I think is his view: that they should be avoided and only used under special circumstances. Kind of like nuclear weapons... But that is not important. My point is that before this talk I never understood exactly what Arlo was trying to say. What I learned from this talk made sense. Let me try to boil it down for you.

Designing an API where dependencies are given as interfaces is just bundling a bunch of delegates under a common name (IMyInterface). But when you test something you rarely use all the methods of the interface in every test, yet you still need to create an object that implements the whole interface. If you instead use delegates for your dependencies, then each API can easily be tested by providing exactly what that API needs, down to individual methods. I'm sure this makes writing unit tests way easier. Actually I know that...
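A minimal sketch of the idea (GreetingService and the lookup delegate are invented names for illustration): the consumer takes only the one delegate it actually uses, so a test can supply a plain lambda instead of implementing an interface.

```csharp
using System;

// Instead of depending on an IUserRepository interface with many methods,
// the class takes the single delegate it needs.
class GreetingService
{
    private readonly Func<string, string> lookupName; // the only dependency used

    public GreetingService(Func<string, string> lookupName)
    {
        this.lookupName = lookupName;
    }

    public string Greet(string userId)
    {
        return "Hello " + this.lookupName(userId) + "!";
    }
}
```

In a test you just write `new GreetingService(id => "World")`; there is no mock object to configure and no unused interface members to stub out.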

However, there is another thing that I think is not mentioned in the talk. If you provide all your dependencies as individual delegates, you will sooner feel that an object has way too many dependencies, and you will break it up. Think about it. An object or method that takes a single interface is not that bad, not even if the interface has ten methods on it. But faking that sucker is going to be a pain unless you know for sure which of the ten methods you do not need to provide. If the object or method instead takes ten delegates, it is obvious that it should probably be broken up into smaller pieces; a design that is most likely preferable. Hence this pattern kind of encourages the same thing as object calisthenics rule #6: keep entities small.

  • Being Cellfish

    Don't make it hard to trust your code


I recently learned that an API I had been using for a project was lying to me. Maybe not intentionally, but still. I think a fundamental rule in software development is that you must be able to trust that the methods you call at least try to do what they say they do. For example, if a method is called "SavePerson" I should be able to assume that it will save a person and nothing else, except maybe some logging. If the method is called "SavePersonAsync" it should be safe to assume that it is an asynchronous operation and will not block my execution. And if the method takes a callback, I think it should be equally safe to assume that it is an asynchronous operation that will not block my execution.

Well, in my case the API I was using had a method that took a callback function as an argument. But it turned out that the method would connect synchronously and then perform the operation I requested asynchronously. So what do you think happened when the component couldn't connect? You guessed right! It blocked my execution on a thread where I assumed I did not do anything blocking, causing big performance problems!

It doesn't matter if you document this. The most important documentation you have is your method names and the patterns you use. In my opinion, if something takes a callback then it is non-blocking!
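One hedged sketch of a defense, assuming you cannot fix the offending API itself: wrap it so the potentially blocking connect happens off the caller's thread, restoring the "callback means non-blocking" contract. The connect/execute delegates here are hypothetical stand-ins for the real component's calls.

```csharp
using System;
using System.Threading.Tasks;

// Wraps an API whose "asynchronous" method secretly connects synchronously.
class NonBlockingWrapper
{
    private readonly Action connect;                  // may block (the lie)
    private readonly Action<Action<string>> execute;  // the truly asynchronous part

    public NonBlockingWrapper(Action connect, Action<Action<string>> execute)
    {
        this.connect = connect;
        this.execute = execute;
    }

    public void Begin(Action<string> callback)
    {
        // Move the potentially blocking connect onto the thread pool so the
        // caller's thread returns immediately, as a callback API should.
        Task.Run(() =>
        {
            this.connect();
            this.execute(callback);
        });
    }
}
```

The wrapper does not make the connect faster; it just keeps the blocking where you expect blocking to be allowed.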

  • Being Cellfish

    Which collection interface do I use?


ReSharper has a warning, which I thought came from FxCop, that is so important I wish it were an FxCop warning. It warns you if you try to enumerate an IEnumerable twice. This is important when you take a collection as an argument and do not know how that collection was created. For example, if the collection is backed by a one-shot source, such as an iterator draining an open data reader, the second enumeration may never return any elements. Another more common and possibly worse case is when the collection is created from a database query or similar, so that every enumeration talks to the database again. Something that can be devastating for the performance of your application.
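The re-execution problem is easy to demonstrate: every enumeration of an iterator method runs its body again, so whatever expensive work it hides (a counter here, a database round trip in real life) happens once per enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DoubleEnumeration
{
    public static int Executions;

    // Imagine a database round trip where the counter is incremented.
    public static IEnumerable<int> Numbers()
    {
        Executions++;
        yield return 1;
        yield return 2;
    }
}
```

Calling `Numbers()` once and enumerating the result twice (for example with two `Count()` calls) leaves `Executions` at 2, which is exactly what the ReSharper warning is trying to save you from.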

So when you need to take a collection as an argument in your interface, the guideline is actually very easy. Since it is an argument, you know how you will use it. If you just need to enumerate once, take an IEnumerable, since it gives your callers maximum flexibility. If you need to enumerate twice, ICollection should be your choice. That covers all your generic needs. Occasionally you actually want one of the specific collection classes (Dictionary, HashSet, Queue etc.), in which case you obviously should use it, but if you can use IEnumerable or ICollection, in my opinion you should.

So what do you do when you return a collection? Here I tend to prefer ICollection over anything else. Returning an IEnumerable when the collection already is a List or similar typically just causes the callers to convert the IEnumerable to a list so that they can safely enumerate it multiple times. And you need to be extra careful when creating public APIs that will spread and be used a lot, since in that case you don't want to change your API (though naturally you could add a new method, so that you have one that returns ICollection by wrapping the IEnumerable one, or vice versa).

  • Being Cellfish

    How would I test a WebAPI controller


Kind of related to my previous post, this article on how to test ASP.Net WebAPI controllers made me think. As you can see from the article it is fairly easy to get your controller under test, but it does take some work to get everything set up properly. And I have never tested my WebAPI controllers like that.

First of all, my experience is that since WebAPI provides an MVC model it is very tempting to write your application logic as controllers. I mean, that is what MVC is all about. However, almost all MVC and MVVM frameworks I've been exposed to end up with some dependency that is very specific to the target application (e.g. HttpResponseMessage in WebAPI). I like my application logic to be 100% isolated from presentation, which means that a WebAPI controller is not the place for my application logic. In my opinion, the only thing the WebAPI controller should do is translate between HTTP and my application logic.
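As a sketch of that isolation (Person and PersonStore are hypothetical names invented here): the application logic knows nothing about HTTP, so it can be unit tested directly and reused anywhere, while a controller would only translate requests into these calls and results back into responses.

```csharp
using System;
using System.Collections.Generic;

// Presentation-free application logic: no HttpResponseMessage, no ApiController.
class Person
{
    public int Id;
    public string Name;
}

class PersonStore
{
    private readonly Dictionary<int, Person> people = new Dictionary<int, Person>();

    public void Save(Person person)
    {
        this.people[person.Id] = person;
    }

    public Person Find(int id)
    {
        Person person;
        return this.people.TryGetValue(id, out person) ? person : null;
    }
}
```

A WebAPI controller wrapping this would be a few lines per action, for instance mapping a null result from Find to a 404, which is thin enough to trust without its own unit tests.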

This creates a situation where I can test my application logic easily and, if necessary, reuse it in different types of applications (WebAPI vs console apps vs WinForms apps). The WebAPI layer then becomes so thin that I can feel confident it works even without any unit tests. That is how I would do it... However, if I have some extra time to spend on writing unit tests, adding a test or two that verifies the conversion between HTTP and my application logic is not so bad either, and the article mentioned above is a good help. But that would be icing on the cake.

  • Being Cellfish

    If it is hard to test, you did something wrong


I've often been asked questions like "how would you test this?" or been told that there cannot be unit tests for some code because it is impossible to test. Well, my opinion is that if something is hard to test, it is your own fault. You designed it, you implemented it, and hence it is your fault it is hard to test.

The situation in which people have the hardest time accepting that it is their own fault is when dealing with external dependencies. They say things like "I'm using this other module that does not provide a nice interface". And then they typically revert to some mocking framework to clean up their mess. Most mocking frameworks today are powerful enough, through reflection or even binary instrumentation, that they can magically make untestable code testable.

Rather than using powerful tools as my savior, I use abstractions. Whenever I depend on something that is not easy to test and that I do not have the power to change, I hide it behind an abstraction: an interface, or just a class with a few virtual methods I can override in my tests. Is this abstraction just another level of indirection? I do not believe so. Isolating dependencies makes it possible for me to implement something the way I want it to be and then deal with the behavior of the dependency separately. Here is an example from real life:

I needed to implement a feature that depended on two components that really didn't lend themselves to easy faking. So I created two new interfaces representing the functionality I needed for my feature. As I started to integrate the first component, it became clear that the interface I needed was not provided by the dependency in an easy way; I needed to put some logic in there to make it work the way my feature expected. So I created a new interface representing the dependency's functionality and made that a dependency of a class that added the logic needed to transform the dependency's results into what my feature needed. I did not expect this, but it was easy given my design, and the fact that I did not know exactly how the dependency worked in the beginning did not stop me from being productive. My second dependency turned out to behave the way I expected, and the interface I invented could easily be used to wrap it.

The key is that the wrapper for each external dependency should be thin. Preferably one-liners, so there is no need to test them. That does not mean the method signatures need to be the same as your dependency's. I often make my wrapper signatures simpler than what the dependency actually needs.
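A classic instance of this pattern, sketched with hypothetical names (IClock, SystemClock, FakeClock, Cache): the hard-to-test dependency is the system clock, the wrapper is a one-liner, and the feature depends only on the abstraction.

```csharp
using System;

// The abstraction: exactly what the feature needs, nothing more.
interface IClock
{
    DateTime UtcNow();
}

// Thin production wrapper; a one-liner, so it needs no tests of its own.
class SystemClock : IClock
{
    public DateTime UtcNow() { return DateTime.UtcNow; }
}

// Test double: time moves only when the test says so.
class FakeClock : IClock
{
    public DateTime Now = new DateTime(2013, 1, 1);
    public DateTime UtcNow() { return this.Now; }
}

// The feature under test depends on IClock, never on DateTime directly.
class Cache<T>
{
    private readonly IClock clock;
    private readonly TimeSpan maxAge;
    private DateTime storedAt;

    public Cache(IClock clock, TimeSpan maxAge)
    {
        this.clock = clock;
        this.maxAge = maxAge;
    }

    public void Store(T value) { this.storedAt = this.clock.UtcNow(); }

    public bool IsFresh
    {
        get { return this.clock.UtcNow() - this.storedAt < this.maxAge; }
    }
}
```

Note that IClock exposes only UtcNow even though DateTime offers far more; the wrapper signature is deliberately simpler than the dependency.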

    And remember; if it is hard to test - You did something wrong!

  • Being Cellfish

    Collection initializers not doing what you expect


Let's assume that you have a class with a collection property and that you want the default for that collection to actually have a value. That class might look like this:

     1: class Foo
     2: {
     3:     public Foo()
     4:     {
     5:         this.Numbers = new List<int> { 4711 };
     6:     }
     7:     public ICollection<int> Numbers { get; set; }
     8: }

Now somewhere else in your code you want to override the collection value and set some other value, so you do this:

 9: var foo = new Foo { Numbers = { 42 } };

So what is the value of the Numbers property now? It turns out it holds both the initial value and the new one (4711 and 42) rather than just the one value you wanted, because a nested collection initializer calls Add on the existing collection instead of assigning a new one. The way to do it if you only want one value is this:

 9: var foo = new Foo { Numbers = new List<int> { 42 } };

One way to catch this mistake is to not initialize the Numbers property with a list but rather with an array, like this:

 5:         this.Numbers = new[] { 4711 };

An array implements ICollection<int> but does not support Add, so the nested initializer on line 9 now throws a NotSupportedException at runtime instead of silently merging the values. But that might not always be possible, so you might be better off not exposing your collections like this at all: only provide an IEnumerable property for enumeration and do all manipulation through separate methods rather than properties.
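The merge behavior is easy to verify with a runnable version of the class above (same Foo, with the usings it needs):

```csharp
using System;
using System.Collections.Generic;

// Same class as above: the constructor seeds the collection with a default.
class Foo
{
    public Foo()
    {
        this.Numbers = new List<int> { 4711 };
    }

    public ICollection<int> Numbers { get; set; }
}
```

Constructing `new Foo { Numbers = { 42 } }` yields a two-element collection containing both 4711 and 42, while `new Foo { Numbers = new List<int> { 42 } }` replaces the list and yields just 42.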

  • Being Cellfish

    Task-based Asynchronous Pattern - kick starter


Regardless of whether you are new to TAP (Task-based Asynchronous Pattern, a.k.a. async/await) or have been doing it for a while, this presentation from an MVP summit in February 2013 serves both as a good introduction explaining how it works and as a source of deeper knowledge, highlighting a few common problems. It's a short presentation, so it is definitely worth downloading and reading!

  • Being Cellfish

    How to know when the garbage collector is not helping you


A while back I did an experiment where it turned out that allocating objects was better than pooling them. Since then I have encountered a few cases where allocating actually turned out to be a bad thing. I've never seen this being a problem in a client application, but in servers, allocating a lot of objects can be a problem even if the objects are very short-lived. What happened to me was that so many objects were created that the garbage collector wanted to run several times a second to try to clean things up. Each time, a few of these short-lived objects would escape from generation zero to generation one, making the following garbage collections more expensive. This time I was lucky, because the object in question was a byte array buffer that could be both reduced in size and reused with some simple logic.
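The "simple logic" for reusing buffers can be sketched as a minimal pool (BufferPool is a hypothetical name; this is one of many possible designs, not the code from the actual incident):

```csharp
using System;
using System.Collections.Concurrent;

// Rent byte arrays from a pool instead of allocating a fresh one per
// operation, so short-lived buffers stop flooding generation zero.
class BufferPool
{
    private readonly ConcurrentBag<byte[]> buffers = new ConcurrentBag<byte[]>();
    private readonly int size;

    public BufferPool(int size)
    {
        this.size = size;
    }

    public byte[] Rent()
    {
        byte[] buffer;
        return this.buffers.TryTake(out buffer) ? buffer : new byte[this.size];
    }

    public void Return(byte[] buffer)
    {
        this.buffers.Add(buffer);
    }
}
```

As the original pooling experiment showed, this is only a win when profiling proves the allocations are the problem; a pool adds its own complexity and lifetime bugs.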

What might be harder is when you add async/await and tasks into the mix. Async/await is very good at making your asynchronous code easy to understand, but at the same time it can create a lot of objects if you just do the naive implementation of what you want. However, I think the naive approach is still the preferred one, since it reduces the risk of creating something that does not do what you want under all circumstances. But it is always good to know what is happening, and that is why you should read this article explaining how to use the memory profiler, which uses tasks as its example.
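One common task-allocation fix, once the profiler proves it matters, is caching completed tasks for frequently returned values instead of allocating a new Task on every call (CachedTasks is a hypothetical helper sketched for illustration):

```csharp
using System;
using System.Threading.Tasks;

// When a task-returning method usually completes synchronously with one of a
// few known values, hand out cached completed tasks instead of new ones.
static class CachedTasks
{
    private static readonly Task<bool> True = Task.FromResult(true);
    private static readonly Task<bool> False = Task.FromResult(false);

    public static Task<bool> FromResult(bool value)
    {
        return value ? True : False; // no allocation on the hot path
    }
}
```

This keeps the readable async/await shape at the call sites while removing one Task allocation per call on the synchronous fast path.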

  • Being Cellfish

    Using T4 to eliminate maintenance


I like to abstract diagnostics (logging and performance counters) into a separate interface or abstract class. But it becomes tedious to manually keep the fake diagnostics (used to test that the proper diagnostics calls are made), dummy diagnostics (for when I just need something to pass around), console loggers (used for command line applications) and production logging in sync with the interface. And it is boring to do manually, since the first three essentially have the same implementation for all methods.
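As a concrete sketch of the boilerplate being described (IDiagnostics and its methods are hypothetical names), here is the kind of interface, fake, and dummy that all have to stay in sync by hand:

```csharp
using System;
using System.Collections.Generic;

// The diagnostics abstraction: every new event means touching every implementation.
interface IDiagnostics
{
    void RequestStarted(string name);
    void RequestFailed(string name, Exception error);
}

// Fake: records calls so tests can assert the proper diagnostics were made.
class FakeDiagnostics : IDiagnostics
{
    public readonly List<string> Calls = new List<string>();
    public void RequestStarted(string name) { this.Calls.Add("RequestStarted:" + name); }
    public void RequestFailed(string name, Exception error) { this.Calls.Add("RequestFailed:" + name); }
}

// Dummy: does nothing, for when you just need something to pass around.
class DummyDiagnostics : IDiagnostics
{
    public void RequestStarted(string name) { }
    public void RequestFailed(string name, Exception error) { }
}
```

Every method follows the same mechanical pattern, which is exactly why generating these files is attractive.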

So what I've started to do is use T4 to generate those three files as well as the interface. Production logging is still updated manually, because I think it is important to consider exactly what events and performance counters should be triggered by each event. And I want to keep my diagnostics definition simple. So this is what I do:

• I created an include file in a common place (solution item) that defines some common classes to generate method declarations and method calls.
• I created an include file in a common place (solution item) that defines my interface.
• I created one template (where needed, typically in different projects) to generate the interface, fake, dummy and console logger.

I find it easy to work with the T4 template files (even though it feels a little like being back to classical PHP/ASP style programming), and it does remove some boring tasks. The error messages you get when you have errors in your template have been pretty good for me, so it has been easy to correct problems.

The only downside is that when you change the include file you need to transform the templates manually in Visual Studio, unless you install an SDK that defines a build target you can use. This is because the T4 transformation happens when the template file is saved, not when one of its dependencies changes. I did however find this clever way of getting around it, which is essentially running the tool manually as a build step. I prefer that over forcing everyone, including the build servers, to install additional SDKs. I only hope this will just work in a future version of VS.
