Several years ago I told a story about a project that proved that estimation was a waste of time once the work was split up into small enough chunks. Not necessarily chunks of equal size, but chunks of manageable size. But sometimes you need to estimate because somebody wants to know the cost of some project or large feature.
Read more here.
1973 days ago I moved away from my private blog and started writing here. I have now decided that it is time to reclaim control over the platform I'm using. That is why I will be posting here (blog.cellfish.se) from now on. I'll continue to cross-post a summary here for some time for your convenience.
A long time ago I mentioned that triggers are rarely a good option in your database. I recently read this article (requires free registration) that shows a solution to prevent developers from updating a full table (or deleting it all) by mistake because they failed to add a WHERE clause to their query. Is this a good reason to add triggers to your database? I think not! There are so many things wrong here in my opinion...
First of all, the script proposed in the example has the same trigger logic duplicated three times (once each for the update, delete, and update/delete triggers) when that could easily have been avoided, since the script is just building a string to execute anyway. So from the start the proposed script could easily contain bugs depending on the type of trigger added.
Second, this only protects against the simplest mistakes where you forget your WHERE clause entirely. What if a query has a WHERE clause like "WHERE ID > 1" (assuming there is at least one record not satisfying that condition)? You are probably still miserable.
I would still say that the solution is to use stored procedures and a proper security setup so that users can only execute stored procedures and not modify data directly. That way the stored procedure can be reviewed, and even checked automatically for things like always having a WHERE clause. The stored procedure can also be covered by unit tests. All that should be enough, and better than adding triggers that give you a false sense of protection. And remember your backups in case you fail to catch something like this earlier. But that goes without saying whenever you work with a database, I think.
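To illustrate the idea, here is a minimal sketch of what the application side of that setup could look like, assuming a hypothetical dbo.DeactivateCustomer procedure; the account the application uses would be granted EXECUTE on the procedure and denied direct UPDATE/DELETE on the tables:

```csharp
using System.Data;
using System.Data.SqlClient;

// Sketch only: dbo.DeactivateCustomer is a hypothetical, reviewed stored
// procedure. The application never issues ad-hoc UPDATE/DELETE statements,
// so a forgotten WHERE clause cannot slip through at runtime.
static void DeactivateCustomer(string connectionString, int customerId)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.DeactivateCustomer", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@CustomerId", customerId);
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```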
Any kind of backup other than a straight file copy has always been magic to me. Mostly because in the early years of my career it was so rare that the backups actually worked when needed, because nobody ever tested the backups to see if they were actually working... This post (free registration needed to read) goes over some backup-related SQL Server myths. Way too many screenshots for my taste, but interesting.
Last weekend I helped my father-in-law (who is hosting his company web site on Azure Websites) troubleshoot why he was no longer getting any data in the web server logs. Logs that he uses to analyze what content is popular. Well, it turned out that you should read the FAQ... Once your logs reach 35MB, they are no longer updated until you clean up the directory. I guess it kind of makes sense, because I did think it was weird that you could have any amount of logs for your web site for free. But there is no free lunch, I guess... I just wish the user interface where you turn this thing on would mention the limit in the little help popup, because there are probably a lot of people out there who don't read the FAQ and just assume the data will be there...
Also, the logs didn't just magically start appearing again, due to some bug mentioned here. I had to temporarily change the size of the VM and then change it back in order to get the logs going again.
So what did we end up doing after the discussion two months ago? Well, we ended up using option #1, mostly because having all items available during processing simplified things. Technically it was not needed, but it kept things simple.
My generic advice would be to use option #1 or #2, and to favor option #2 whenever possible since #1 can easily be implemented by wrapping #2, as sketched below. If your project is already heavy on using Rx, then and only then would I consider option #3.
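Here is a minimal sketch of that wrapping, assuming (as in the earlier parts of this series) that option #1 means Task&lt;IEnumerable&lt;T&gt;&gt; and option #2 means IEnumerable&lt;Task&lt;T&gt;&gt;:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch only: turning option #2 (IEnumerable<Task<T>>) into option #1
// (Task<IEnumerable<T>>) is essentially a one-liner thanks to Task.WhenAll,
// which awaits every per-item task and materializes the results as an array.
static async Task<IEnumerable<T>> AsOption1<T>(IEnumerable<Task<T>> itemTasks)
{
    return await Task.WhenAll(itemTasks);
}
```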
Here are the links to previous parts for your convenience:
For background, read the introduction.
If you have no need for filtering using LINQ (or Rx) and you do not want to expose an IEnumerable, you can create your own "IEnumeratorAsync" that asynchronously fetches the next item. Again, this is something I would advise against. First of all, you lose the option to use constructs like foreach since you do not use the standard interfaces. Second, you lose the option of using LINQ even for simple things.
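A minimal sketch of what such a hand-rolled interface could look like (the name and exact shape are just an assumption for illustration):

```csharp
using System.Threading.Tasks;

// Sketch only: a hand-rolled asynchronous enumerator. Because this is not
// IEnumerator<T>, neither foreach nor LINQ can be used with it; consumers
// are stuck with a manual loop.
public interface IEnumeratorAsync<T>
{
    T Current { get; }
    Task<bool> MoveNextAsync(); // true while there are more items to read
}

// Consuming it looks like this (inside an async method):
//     while (await items.MoveNextAsync())
//     {
//         Process(items.Current);
//     }
```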
This is really just a variant of option #2 that uses the fact that you use Azure tables and know that you will get several items at a time. In my opinion you shouldn't even consider this option since it relies on a feature of the underlying storage layer, and compared to IEnumerable&lt;Task&lt;T&gt;&gt; you do not get any benefit other than creating fewer Tasks. Compared to the time it will take you to actually retrieve the data items, the overhead of creating a task for each item is negligible.
In part 2 I briefly mentioned that if you have simple filtering needs then you might implement your own LINQ-like methods. Well, if you have more advanced filtering needs there is already something out there to help you: Reactive Extensions (Rx). The only thing you have to do is expose your data result as an IObservable&lt;T&gt;, as sketched below. So are there no disadvantages with this approach? Well, I think there are several.
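A minimal sketch of exposing asynchronously fetched items as an IObservable&lt;T&gt;, assuming a hypothetical LoadItemAsync data source (and the System.Reactive package):

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

static class ObservableItems
{
    // Sketch only: Observable.Create with an async delegate pushes each
    // item to subscribers as soon as it has been fetched. Consumers then
    // get the full Rx operator set (Where, Select, Buffer, ...) for free.
    public static IObservable<string> GetItems(int count)
    {
        return Observable.Create<string>(async (observer, cancellationToken) =>
        {
            for (var i = 0; i < count; i++)
            {
                cancellationToken.ThrowIfCancellationRequested();
                observer.OnNext(await LoadItemAsync(i));
            }
            observer.OnCompleted();
        });
    }

    // Hypothetical stand-in for the real asynchronous data retrieval.
    private static Task<string> LoadItemAsync(int i) => Task.FromResult("item " + i);
}
```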
As you can see, while Rx is a great fit for asynchronous enumerations (because that is what this really is), it uses clever constructs rather than language support to achieve it. Things that are clever often end up complex rather than simple, and complex stuff typically breaks. But if your project already uses Rx a lot then it's a no-brainer; use Rx!
This is essentially the opposite of option #1. With this approach you are optimized for processing one item at a time, or even processing all items in parallel! So this is a very good option if you do not need all your data at once and you have a large number of rows to work on. This option is also very flexible if you have a few scenarios where you actually need all the data, since you can always do a Task.WhenAll to wait for all data to be available.
The only big drawback with this approach is filtering using LINQ. Since it is an enumeration of tasks there is no way for you to asynchronously wait for an item in order to, for example, filter on its value using LINQ. Hence this is not a good option if you need to do a lot of filtering on your items. If you have just a few simple filters it might be worth implementing your own filters like WhereAsync and SelectAsync, as sketched below. All that together might be worth it given your data and scenarios.
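A minimal sketch of what such helpers could look like (the names WhereAsync and SelectAsync and their exact shapes are just an assumption for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class TaskEnumerableExtensions
{
    // Sketch only: SelectAsync can keep the IEnumerable<Task<T>> shape by
    // wrapping each task in a projection task.
    public static IEnumerable<Task<TResult>> SelectAsync<T, TResult>(
        this IEnumerable<Task<T>> source, Func<T, TResult> selector)
    {
        foreach (var task in source)
            yield return ProjectAsync(task, selector);
    }

    private static async Task<TResult> ProjectAsync<T, TResult>(
        Task<T> task, Func<T, TResult> selector)
    {
        return selector(await task);
    }

    // Sketch only: WhereAsync cannot keep the shape, because the predicate
    // needs the item's value; it has to await the items and collect matches.
    public static async Task<IEnumerable<T>> WhereAsync<T>(
        this IEnumerable<Task<T>> source, Func<T, bool> predicate)
    {
        var result = new List<T>();
        foreach (var task in source)
        {
            var item = await task;
            if (predicate(item))
                result.Add(item);
        }
        return result;
    }
}
```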