Being Cellfish

Stuff I wished I've found in some blog (and sometimes did)

June, 2013

  • Being Cellfish

    Asynchronous enumerations - Introduction


    A couple of weeks ago I had a discussion with a co-worker about the proper way to asynchronously iterate over data in Azure tables. Exploring the different options was very interesting and helped us understand the pros and cons of each asynchronous strategy. So over the next few weeks I'll go over each option we looked at in more detail. But first a little background so you understand the basic problem.

    When you retrieve rows from an Azure table you can obviously get all the rows first and then do whatever you need to do with the result. However, if that is a lot of rows it means a (relatively) long wait for the data, which will also use a lot of memory. If you do not need all that data at once, or if the first thing you do is filter the data so that only a few records are left in the final collection you want to work on, then asynchronous retrieval of records would be a good thing. If you know a little about Azure, you know that when you retrieve records from Azure tables you may or may not get a continuation token, something most people never deal with manually. But in this case you might want to, for example by using the BeginExecuteSegmented method. That method gives you asynchronous access to zero or more rows at a time.

    Already you can see that there are several different scenarios to take into account. Do you need all the records together for your processing? Can you process each record individually? Do you need to scan a lot of records while keeping only a few? Depending on what you need, I believe your choice of "asynchronous enumeration" will differ. But rest assured: I'll help you pick the right one for you! The options I will cover in the next few weeks are:

    • Get the whole enumeration asynchronously, a.k.a. Task<IEnumerable<T>>
    • Get each item asynchronously, a.k.a. IEnumerable<Task<T>>
    • Use reactive extensions, a.k.a. IObservable<T>
    • Get segments asynchronously, a.k.a. IEnumerable<Task<IEnumerable<T>>>
    • Create your own custom solution, a.k.a. MyEnumerationAsync
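
    To make the shapes in the list above concrete, here is a minimal sketch of three of them as method signatures. The class name, method names, and the in-memory integer "rows" are all hypothetical stand-ins; a real implementation would wait on table storage rather than return completed tasks. IObservable<T> is left out because it needs an observer implementation (or the Reactive Extensions library) to demonstrate, and the custom option is whatever you design yourself.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical in-memory stand-ins; real code would await table storage instead.
public static class AsyncEnumerationShapes
{
    // Task<IEnumerable<T>>: one task that completes when every row is available.
    public static Task<IEnumerable<int>> GetAllAsync()
    {
        return Task.FromResult<IEnumerable<int>>(new[] { 1, 2, 3 });
    }

    // IEnumerable<Task<T>>: a sequence of tasks, one per row.
    public static IEnumerable<Task<int>> GetEachAsync()
    {
        foreach (int row in new[] { 1, 2, 3 })
        {
            yield return Task.FromResult(row);
        }
    }

    // IEnumerable<Task<IEnumerable<T>>>: a sequence of tasks, one per segment of rows.
    public static IEnumerable<Task<IEnumerable<int>>> GetSegmentsAsync()
    {
        yield return Task.FromResult<IEnumerable<int>>(new[] { 1, 2 });
        yield return Task.FromResult<IEnumerable<int>>(new[] { 3 });
    }
}
```

    The difference shows up at the call site: with the first shape you wait once and then enumerate, while with the other two you wait inside the loop, once per row or once per segment.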


  • Being Cellfish

    "Mocks: the code smell"


    So I stole the title from this talk:

    I have seen Arlo argue for his tests with simulators and I've always felt I shared his view on mocks. Or at least I share what I think is his view; that they should be avoided and only used under special circumstances. Kind of like nuclear weapons... But that is not important. My point was that before this talk I never understood exactly what Arlo was trying to say. What I learned from this talk made sense. Let me try to boil it down for you.

    Designing an API where dependencies are given as interfaces is just bundling a bunch of delegates with a common name (IMyInterface). But when you test something you rarely use all the methods in your interface in every test. You also need to create an object that implements that interface. If you instead just use delegates for your dependencies then each API can easily be tested by only providing exactly what that API needs. Down to individual methods. I'm sure this makes writing unit tests way easier. Actually I know that...
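
    As a rough illustration of the idea (my own sketch, not code from the talk), compare a method that takes an interface with one that takes only the delegate it needs. IPersonStore, Greeter, and their members are made-up names for this example.

```csharp
using System;

// Hypothetical interface with more members than any single caller needs.
public interface IPersonStore
{
    string LoadName(int id);
    void SaveName(int id, string name);
    void Delete(int id);
}

public static class Greeter
{
    // Interface version: a test must supply an object implementing all three methods.
    public static string GreetUsingInterface(IPersonStore store, int id)
    {
        return "Hello, " + store.LoadName(id) + "!";
    }

    // Delegate version: the method asks only for the one operation it uses.
    public static string Greet(Func<int, string> loadName, int id)
    {
        return "Hello, " + loadName(id) + "!";
    }
}
```

    A test for the delegate version needs no fake class at all: Greeter.Greet(id => "Ada", 42) returns "Hello, Ada!".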

    However, there is another thing that I think is not mentioned in the talk. If you provide all your dependencies as individual delegates, you will sooner feel that an object has way too many dependencies, and you will break it up. Think about it. An object or method that takes a single interface does not look that bad, not even if the interface has ten methods on it. But faking that sucker is going to be a pain unless you know for sure which of the ten methods you do not need to provide. If the object or method instead takes ten delegates, it is obvious that it should probably be broken up into smaller pieces, which is most likely the better design. Hence this pattern kind of encourages the same thing as object calisthenics rule #6: keep entities small.

  • Being Cellfish

    Don't make it hard to trust your code


    I recently learned that an API I had been using for a project was lying to me. Maybe not intentionally but anyway. I think that a fundamental rule in software development is that you must trust that the methods you call at least try to do what they say they do. For example if a method is called "SavePerson" I should assume that it will save a person and nothing else except maybe some logging. If the method is called "SavePersonAsync" it should be safe to assume that it will be an asynchronous operation and that it will not block my execution. And if the method takes a callback method I think it should be safe to assume that it is an asynchronous operation and that it will not block my execution.

    Well, in my case the API I was using had a method that took a callback function as an argument. But it turned out that the method would connect synchronously and then perform the operation I requested asynchronously. So what do you think happened when the component couldn't connect? You guessed right! It blocked my execution on a thread where I assumed nothing would block, causing big performance problems!

    It doesn't matter if you document this. The most important documentation you have is your method name and the patterns you use. In my opinion if something takes a callback then it is non-blocking!
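
    Here is a small sketch of the difference, under the assumption that connecting is a slow synchronous call. This is not the API I was using; CallbackApi, Connect, and both Send methods are invented for the example, and Thread.Sleep stands in for the connection attempt.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackApi
{
    // Hypothetical stand-in for a slow, synchronous connection attempt.
    private static void Connect()
    {
        Thread.Sleep(200);
    }

    // Misleading: takes a callback, yet the caller is blocked while Connect() runs.
    public static void SendMisleading(string message, Action<string> callback)
    {
        Connect();
        Task.Run(() => callback("sent: " + message));
    }

    // What the signature promises: returns right away, connecting happens in the background.
    public static void Send(string message, Action<string> callback)
    {
        Task.Run(() =>
        {
            Connect();
            callback("sent: " + message);
        });
    }
}
```

    Both methods have the same signature, which is exactly the problem: nothing tells the caller that the first one holds the calling thread for as long as the connection attempt takes.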

  • Being Cellfish

    Which collection interface do I use?


    ReSharper has a warning that I thought came from FxCop, and it is so important I wish it was an FxCop warning. It warns you if you try to enumerate an IEnumerable twice. This is important when you take a collection as an argument and you do not know how that collection is created. For example, if the collection is created by a method that just yield returns a few times, the second enumeration runs that method body all over again, repeating any work or side effects it has, and a sequence that consumes a one-shot source (a stream or a queue, say) may not return any elements the second time. Another more common and possibly worse case is when the collection is created from a database query or similar, so that a second enumeration will talk to the database a second time, something that can be devastating for performance in your application.
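
    A small sketch of why the second enumeration matters: in C#, a sequence produced by an iterator method is evaluated again each time it is enumerated. The names below are hypothetical, and the counter stands in for a round trip to a database.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DoubleEnumeration
{
    // Counts how often the "expensive" source is hit.
    public static int Queries;

    // Hypothetical stand-in for a database query; the body runs once per enumeration.
    public static IEnumerable<int> LoadIds()
    {
        Queries++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static int CountQueries()
    {
        Queries = 0;
        IEnumerable<int> ids = LoadIds(); // nothing has run yet
        int count = ids.Count();          // first enumeration
        int sum = ids.Sum();              // second enumeration, the source runs again
        return Queries;                   // 2
    }
}
```

    Calling ToList() once on the sequence and working with the list afterwards keeps the count at one.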

    So when you need to take a collection as an argument in your interface, the guideline is actually very easy. Since it is an argument, you know how you will use it. If you just need to enumerate once, then you should take an IEnumerable as your argument since it gives your callers maximum flexibility. If you need to enumerate twice, then ICollection should be your choice. That covers all your generic needs. Occasionally you actually want one of the specific collection classes (Dictionary, HashSet, Queue etc.), in which case you obviously should use those, but if you can use IEnumerable or ICollection you should, in my opinion.
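
    The guideline for arguments could look something like this in practice. Stats and its methods are hypothetical names chosen for the example.

```csharp
using System.Collections.Generic;

public static class Stats
{
    // One pass over the data: IEnumerable<T> gives callers maximum flexibility.
    public static int Sum(IEnumerable<int> values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }

    // Needs the count as well as a pass over the items: ICollection<T> tells
    // callers to hand over data that is already in memory.
    public static double Average(ICollection<int> values)
    {
        if (values.Count == 0)
        {
            return 0.0;
        }
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return (double)total / values.Count;
    }
}
```

    Arrays and List<int> both implement ICollection<int>, so Stats.Average(new[] { 1, 2, 3 }) works as is and returns 2, while a caller holding a lazy query has to decide explicitly to materialize it first.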

    So what do you do when you return a collection? Here I tend to prefer ICollection over anything else. Returning an IEnumerable when the collection already is a List or similar typically just causes the callers to convert the IEnumerable to a list so that they can safely enumerate it multiple times. And you need to be extra careful when creating public APIs that will be spread and used a lot, since in that case you don't want to change your API (naturally you could add a new method, so that you have one that returns ICollection by wrapping the IEnumerable one, or vice versa).
