So what did we end up doing after the discussion two months ago? Well, we ended up using option #1, mostly because it simplified things to have all items available during processing. Technically that was not needed, but it kept the code simple.
My general advice would be to use option #1 or #2, and to favor option #2 whenever possible, since #1 can easily be implemented by wrapping #2. If your project is already heavy on Rx, then and only then would I consider option #3.
Here are the links to previous parts for your convenience:
For background, read the introduction.
If you have no need for filtering using LINQ (or Rx) and you do not want to expose an IEnumerable, you can create your own "IEnumeratorAsync" that asynchronously fetches the next item. Again, this is something I would advise against. First of all, you lose the option to use constructs like foreach since you do not use the standard interfaces. Second, you lose the option of using LINQ even for simple things.
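To make this option concrete, here is a minimal sketch of what such a hand-rolled interface could look like. The name IEnumeratorAsync and the in-memory implementation are mine, purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical hand-rolled asynchronous enumerator. Note what you give up:
// it is not IEnumerable/IEnumerator, so neither foreach nor LINQ applies.
public interface IEnumeratorAsync<T>
{
    T Current { get; }
    Task<bool> MoveNextAsync();
}

// Trivial in-memory implementation, only to show the consumption pattern;
// a real one would await the next item from storage in MoveNextAsync.
public class ListEnumeratorAsync<T> : IEnumeratorAsync<T>
{
    private readonly IReadOnlyList<T> _items;
    private int _index = -1;

    public ListEnumeratorAsync(IReadOnlyList<T> items) { _items = items; }

    public T Current => _items[_index];

    public Task<bool> MoveNextAsync()
    {
        _index++;
        return Task.FromResult(_index < _items.Count);
    }
}

public static class Demo
{
    // Consuming the enumerator requires a manual while loop instead of foreach.
    public static async Task<int> SumAsync(IEnumeratorAsync<int> source)
    {
        var sum = 0;
        while (await source.MoveNextAsync())
            sum += source.Current;
        return sum;
    }

    public static async Task Main()
    {
        var sum = await SumAsync(new ListEnumeratorAsync<int>(new[] { 1, 2, 3 }));
        Console.WriteLine(sum);
    }
}
```

Notice how even summing three numbers needs a manual loop; that friction is exactly the drawback described above.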
This is really just a variant of option #2, exploiting the fact that you use Azure tables and know that you will get several items at a time. In my opinion you shouldn't even consider this option, since it relies on a feature of the underlying storage layer, and compared to IEnumerable<Task<T>> the only benefit you get is that you create fewer Tasks. Compared to the time it takes to actually retrieve the data items, the overhead of creating a task for each item is negligible.
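For reference, a sketch of the shape this option takes. All names here are made up; think of each task as one segmented round-trip to the table:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SegmentedDemo
{
    // Each task completes with one whole segment of rows, mirroring how the
    // storage layer hands back several items per continuation-token round-trip.
    public static IEnumerable<Task<IEnumerable<int>>> GetRowsInSegments()
    {
        yield return Task.FromResult<IEnumerable<int>>(new[] { 1, 2, 3 }); // segment 1
        yield return Task.FromResult<IEnumerable<int>>(new[] { 4, 5 });    // segment 2
    }

    // Consumers must flatten the segments themselves; with a plain
    // IEnumerable<Task<T>> (option #2) no such extra step is needed.
    public static async Task<List<int>> FlattenAsync(
        IEnumerable<Task<IEnumerable<int>>> segments)
    {
        var all = new List<int>();
        foreach (var segment in segments)
            all.AddRange(await segment);
        return all;
    }

    public static async Task Main()
    {
        var rows = await FlattenAsync(GetRowsInSegments());
        Console.WriteLine(rows.Count);
    }
}
```

The extra flattening step is the storage-layer detail leaking into every consumer.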
In part 2 I briefly mentioned that if you have simple filtering needs then you might implement your own LINQ-like methods. Well, if you have more advanced filtering needs, there is already something out there to help you: Reactive Extensions (Rx). The only thing you have to do is expose your data result as an IObservable<T>. So are there no disadvantages to this approach? Well, I think there are several.
As you can see, while Rx is a great fit for asynchronous enumerations (because that is what they really are), it uses clever constructs rather than language support to achieve this. Things that are clever often end up complex rather than simple, and complex stuff typically breaks. But if your project already uses Rx a lot, then it's a no-brainer; use Rx!
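As it happens, the IObservable<T>/IObserver<T> interfaces themselves live in the base class library, so the shape of this option can be sketched without the Rx package. A real Rx solution would not hand-roll the source like this; Rx's factory methods and operators would do the work, and the rows would be pushed as they arrive from storage:

```csharp
using System;
using System.Collections.Generic;

// Minimal push-based source, for illustration only. In a real Rx solution
// the library would create the observable and supply the query operators.
public class RowSource : IObservable<int>
{
    private readonly IEnumerable<int> _rows;
    public RowSource(IEnumerable<int> rows) { _rows = rows; }

    public IDisposable Subscribe(IObserver<int> observer)
    {
        foreach (var row in _rows)   // a real source would push asynchronously
            observer.OnNext(row);
        observer.OnCompleted();      // signals that the enumeration is done
        return new NoopDisposable();
    }

    private sealed class NoopDisposable : IDisposable
    {
        public void Dispose() { }
    }
}

// A subscriber that just collects what it is given.
public class CollectingObserver : IObserver<int>
{
    public List<int> Received { get; } = new List<int>();
    public bool Completed { get; private set; }

    public void OnNext(int value) => Received.Add(value);
    public void OnCompleted() => Completed = true;
    public void OnError(Exception error) { }
}

public static class RxDemo
{
    public static void Main()
    {
        var observer = new CollectingObserver();
        new RowSource(new[] { 1, 2, 3 }).Subscribe(observer);
        Console.WriteLine(observer.Received.Count);
    }
}
```

Even this toy version shows the inversion: the data source pushes values at you, instead of you pulling them with foreach.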
This is essentially the opposite of option #1. With this approach you optimize for processing one item at a time, or even processing all items in parallel! So this is a very good option if you do not need all your data at once and you have a large number of rows to work on. This option is also very flexible if you have a few scenarios where you actually need all the data, since you can always use Task.WhenAll to wait for all data to be available.
The only big drawback with this approach is filtering using LINQ. Since it is an enumeration of tasks, there is no way for you to asynchronously wait for an item in order to, for example, filter on its value using LINQ. Hence this is not a good option if you need to do a lot of filtering on your items. If you have just a few simple filters, it might be worth implementing your own variants like WhereAsync and SelectAsync. All of that together might be worth it, depending on your data and scenarios.
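Here is a sketch of what such a hand-rolled filter could look like; WhereAsync is my name for it, not a framework method, and Task.FromResult stands in for real table I/O:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncFilters
{
    // Hypothetical WhereAsync: awaits each item so its value can be tested,
    // which plain LINQ's Where cannot do over an IEnumerable<Task<T>>.
    public static async Task<List<T>> WhereAsync<T>(
        this IEnumerable<Task<T>> source, Func<T, bool> predicate)
    {
        var result = new List<T>();
        foreach (var task in source)
        {
            var item = await task;   // wait for this item to be available
            if (predicate(item))
                result.Add(item);
        }
        return result;
    }
}

public static class FilterDemo
{
    public static async Task Main()
    {
        // Need everything at once after all? Task.WhenAll bridges back to option #1.
        int[] all = await Task.WhenAll(
            new[] { 1, 2, 3, 4 }.Select(i => Task.FromResult(i)));

        var even = await new[] { 1, 2, 3, 4 }
            .Select(i => Task.FromResult(i))
            .WhereAsync(i => i % 2 == 0);

        Console.WriteLine(all.Length + " " + even.Count);
    }
}
```

One filter like this is cheap to write; reimplementing half of LINQ is not, which is why heavy filtering pushes you toward option #1 or #3.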
If you need all data before you start processing it, or if you expect just a few records each time, then this is probably your simplest way to get an "asynchronous enumeration". Technically the enumeration is not asynchronous, since you retrieve all data asynchronously first and then process it. You can also easily use LINQ on the enumeration for processing, which is very nice.
The only real drawback is that you're not flexible if you one day retrieve a lot of data (or no longer need all the data at once for processing). If you have those needs from day one, then this is not the option for you, I think.
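The whole option fits in a few lines; GetAllRowsAsync here stands in for whatever call actually fetches your rows:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FetchAllDemo
{
    // One task, one complete collection: Task<IEnumerable<T>>.
    public static async Task<IEnumerable<int>> GetAllRowsAsync()
    {
        await Task.Delay(1);             // stands in for the actual table I/O
        return new[] { 1, 2, 3, 4, 5 };
    }

    public static async Task<List<int>> GetBigRowsAsync()
    {
        // Await once, then ordinary LINQ works on the result.
        var rows = await GetAllRowsAsync();
        return rows.Where(r => r > 2).ToList();
    }

    public static async Task Main()
    {
        var big = await GetBigRowsAsync();
        Console.WriteLine(big.Count);
    }
}
```

A single await up front buys you the full LINQ toolbox afterwards; the price is holding everything in memory.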
A couple of weeks ago I had a discussion with a co-worker about the proper way to asynchronously iterate over some data in Azure tables. Exploring different options was very interesting and helped us understand the pros and cons of each asynchronous strategy. So over the next few weeks I'll go over each option we looked at in more detail. But first a little background, so you understand the basic problem.
When you retrieve rows from an Azure table you can obviously get all rows first and then do whatever you need to do on the result. However, if there are a lot of rows, that means a (relatively) long wait for the data, and a lot of memory use. If you do not need all that data at once, or if the first thing you do is some kind of filtering that leaves only a few records in the final collection you want to work on, then an asynchronous retrieval of records would be a good thing. If you know a little about Azure, you know that when you retrieve records from Azure tables you may or may not get a continuation token, something most people never deal with manually. But in this case you might want to, for example by using the BeginExecuteSegmented method. That method gives you asynchronous access to zero or more rows at a time.
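The continuation-token dance has the general shape below. FetchSegmentAsync and Segment are stand-ins I made up; the real storage SDK calls and types differ, but the loop is the same:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Segment
{
    public IReadOnlyList<int> Rows { get; set; }
    public object ContinuationToken { get; set; }   // null means no more segments
}

public static class TokenDemo
{
    private static readonly int[][] Segments = { new[] { 1, 2 }, new[] { 3 } };

    // Stand-in for the real segmented query; returns one batch per call.
    public static Task<Segment> FetchSegmentAsync(object token)
    {
        var index = token == null ? 0 : (int)token;
        return Task.FromResult(new Segment
        {
            Rows = Segments[index],
            ContinuationToken = index + 1 < Segments.Length ? (object)(index + 1) : null
        });
    }

    // The loop every segmented client ends up writing: keep fetching
    // until the service stops handing back a continuation token.
    public static async Task<List<int>> FetchAllAsync()
    {
        var all = new List<int>();
        object token = null;
        do
        {
            var segment = await FetchSegmentAsync(token);
            all.AddRange(segment.Rows);
            token = segment.ContinuationToken;
        } while (token != null);
        return all;
    }

    public static async Task Main()
    {
        Console.WriteLine((await FetchAllAsync()).Count);
    }
}
```

Every option discussed in this series is, at bottom, a different way of packaging this loop for consumers.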
Already you can see that there are several different scenarios to take into account. Do you need all the records together for your processing? Can you process each record individually? Do you need to scan a lot of records and keep only a few after filtering? Depending on what you need, I believe your choice of "asynchronous enumeration" will differ. But rest assured; I'll help you pick the right one for you! The options I will cover in the next few weeks are:
So I stole the title from this talk:
I have seen Arlo argue for his tests with simulators and I've always felt I shared his view on mocks. Or at least I share what I think is his view; that they should be avoided and only used under special circumstances. Kind of like nuclear weapons... But that is not important. My point was that before this talk I never understood exactly what Arlo was trying to say. What I learned from this talk made sense. Let me try to boil it down for you.
Designing an API where dependencies are given as interfaces is just bundling a bunch of delegates with a common name (IMyInterface). But when you test something you rarely use all the methods in your interface in every test. You also need to create an object that implements that interface. If you instead just use delegates for your dependencies then each API can easily be tested by only providing exactly what that API needs. Down to individual methods. I'm sure this makes writing unit tests way easier. Actually I know that...
However, there is another thing that I think is not mentioned in the talk. If you provide all your dependencies as individual delegates, you will sooner feel that an object has way too many dependencies, and you will break it up. Think about it. An object or method that takes a single interface is not that bad, not even if the interface has ten methods on it. But faking that sucker is going to be a pain unless you know for sure which of the ten methods you do not need to provide. If the object or method instead takes ten delegates, it is obvious that it should maybe be broken up into smaller pieces; a design that is most likely preferable. Hence this pattern encourages the same thing as object calisthenics rule #6: keep entities small.
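A small made-up example of the difference; IPersonStore and Greet are my illustrations, not from the talk:

```csharp
using System;

public class Person
{
    public string Name { get; set; }
}

// Interface-based dependency: a test must provide an object implementing
// all of this, even when the code under test uses a single method.
public interface IPersonStore
{
    Person Load(int id);
    void Save(Person person);
    void Delete(int id);
    // ...imagine seven more methods...
}

public static class Greeter
{
    // Delegate-based dependency: the signature states exactly what is
    // needed, and a test just passes a lambda.
    public static string Greet(int id, Func<int, Person> loadPerson)
        => "Hello " + loadPerson(id).Name;

    public static void Main()
        => Console.WriteLine(Greet(1, _ => new Person { Name = "Ada" }));
}
```

The lambda in Main is the entire "mock". And if Greet ever grew to take ten delegates, that signature alone would be shouting at you to split it up.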
I recently learned that an API I had been using for a project was lying to me. Maybe not intentionally, but still. I think a fundamental rule in software development is that you must trust that the methods you call at least try to do what they say they do. For example, if a method is called "SavePerson", I should assume that it will save a person and nothing else, except maybe some logging. If the method is called "SavePersonAsync", it should be safe to assume that it will be an asynchronous operation and that it will not block my execution. And if the method takes a callback method, I think it should be safe to assume that it is an asynchronous operation and that it will not block my execution.
Well, in my case the API I was using had a method that took a callback function as an argument. But it turned out that the method would connect synchronously and then perform the operation I requested asynchronously. So what do you think happened when the component couldn't connect? You guessed right! It blocked my execution on a thread where I assumed I was not doing anything blocking, causing big performance problems!
It doesn't matter if you document this. The most important documentation you have is your method name and the patterns you use. In my opinion if something takes a callback then it is non-blocking!
ReSharper has a warning, which I thought came from FxCop, that is so important I wish it were an FxCop warning. It warns you if you enumerate an IEnumerable twice. This is important when you take a collection as an argument and you do not know how that collection was created. For example, if the collection is created by a method that just yield returns a few times, a second enumeration will execute that whole method again. Another more common and possibly worse case is when the collection is created from a database query or similar, so that a second enumeration talks to the database a second time; something that can be devastating for the performance of your application.
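A tiny example of why the warning matters; the counter here stands in for an expensive database round-trip:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwiceDemo
{
    public static int Calls;   // counts how many times the "query" runs

    // A yield-based sequence: the body re-executes on every enumeration.
    public static IEnumerable<int> GetNumbers()
    {
        Calls++;               // pretend this line hits the database
        yield return 1;
        yield return 2;
    }

    // Takes an IEnumerable and (carelessly) enumerates it twice.
    public static int SumTwice(IEnumerable<int> numbers)
        => numbers.Sum() + numbers.Sum();

    public static void Main()
    {
        Calls = 0;
        var total = SumTwice(GetNumbers());
        Console.WriteLine(total + " " + Calls);   // the query ran twice
    }
}
```

SumTwice never sees where its argument came from, so it has no way of knowing that each Sum costs a full round-trip; that is exactly the situation the warning exists for.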
So when you need to take a collection as an argument in your interface, the guideline is actually very easy. Since it is an argument, you know how you will use it. If you just need to enumerate once, then you should take an IEnumerable as your argument, since it gives your callers maximum flexibility. If you need to enumerate twice, then ICollection should be your choice. That covers all your generic needs. Occasionally you actually want one of the specific collection classes (Dictionary, HashSet, Queue, etc.), in which case you obviously should use it, but if you can use IEnumerable or ICollection, you should, in my opinion.
So what do you do when you return a collection? Here I tend to prefer ICollection over anything else. Returning an IEnumerable when the collection already is a List or similar typically just causes callers to convert the IEnumerable to a list so that they can safely enumerate it multiple times. And you need to be extra careful when creating public APIs that will spread and be used a lot, since in that case you don't want to have to change your API (though you could always add a new method, so that you have one that returns ICollection by wrapping the IEnumerable one, or vice versa).
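In code, both guidelines amount to choosing where the materialization happens; GetActiveIds is a made-up example:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReturnDemo
{
    // Argument: IEnumerable<T>, because we enumerate it exactly once.
    // Return: ICollection<T>, materialized here once, so callers can
    // safely enumerate the result as many times as they like.
    public static ICollection<int> GetActiveIds(IEnumerable<int> candidates)
        => candidates.Where(id => id > 0).ToList();

    public static void Main()
    {
        var ids = GetActiveIds(new[] { -1, 0, 1, 2 });
        System.Console.WriteLine(ids.Count);
    }
}
```

The ToList call is the whole point: pay for the copy once, inside the method, instead of making every caller guess whether they need to.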