Being Cellfish

Stuff I wished I'd found in some blog (and sometimes did)

Change of Address
This blog has moved to
  • Being Cellfish

    Analyzing logs from Azure web sites


    I recently played around with Azure web sites and wanted to analyze the IIS logs generated by Azure, but none of the tools I tried could parse the files I downloaded. It turned out that the header line of the file looked like this:

    # date time s-sitename cs-method ...

    That is apparently not valid W3C extended log format. Once I changed the line to this:

    #Fields: date time s-sitename cs-method ...

    With that change all the tools I tried were happy with the log files.
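    A minimal sketch of automating that fix, assuming the first line of the downloaded file is the broken header (the file name here is hypothetical):

```csharp
using System;
using System.IO;

class FixLogHeader
{
    static void Main()
    {
        var path = "site.log"; // hypothetical file name
        var lines = File.ReadAllLines(path);
        if (lines.Length > 0 && lines[0].StartsWith("# "))
        {
            // Turn the bare "# date time ..." header into the
            // W3C "#Fields: date time ..." directive.
            lines[0] = "#Fields: " + lines[0].Substring(2);
            File.WriteAllLines(path, lines);
        }
    }
}
```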

  • Being Cellfish

    Factory pattern improved


    My impression of most major west coast cities like Seattle, San Francisco, Los Angeles etc, is that people in general are very healthy. And Redmond, where Microsoft has its HQ, is even the bicycle capital of the Northwest (I guess anything can be the capital of anything if you just constrain geography in a convenient way). People who like to exercise also typically like to be environmentally friendly. And in the spirit of making applications more environmentally friendly, there are a few new patterns you need to learn. I have been using these new patterns together with a co-worker for exactly one year now. The first pattern is a replacement for the polluting factory pattern: the farmers market pattern!

    The farmers market pattern is similar to the factory pattern in that it is used to create other objects, but without all the bad properties of a factory, such as being far away from the object and not caring about the object's community. Factories also tend to keep the benefit they create for themselves. The farmers market pattern gives you a local object that can create your objects in a local and hence more environmentally friendly way. Here is what it looks like in its most simple form:

     public class Foo
     {
         private Foo()
         {
         }

         public static class FarmersMarket
         {
             public static Foo Create()
             {
                 return new Foo();
             }
         }
     }

    Having the farmers market class local like this will save CPU cycles during compilation, making sure your application is produced with a minimal carbon dioxide footprint. Tomorrow I'll show you the hybrid pattern.
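    For completeness, a usage sketch (repeating the class so the snippet stands alone); since Foo's constructor is private, the nested farmers market is the only way to create an instance:

```csharp
using System;

public class Foo
{
    private Foo()
    {
    }

    public static class FarmersMarket
    {
        public static Foo Create()
        {
            return new Foo();
        }
    }
}

class Program
{
    static void Main()
    {
        // new Foo() would not compile here; only the local market can create one.
        Foo foo = Foo.FarmersMarket.Create();
        Console.WriteLine(foo != null); // prints "True"
    }
}
```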


  • Being Cellfish

    Using HTML as your web service format


    In the past I've seen examples of clever use of style sheets to turn XML responses into browsable results that are human readable and, more importantly, human navigable. But this was the first time I heard about using HTML straight up to create the content of a web service API. It is interesting that there is so much focus on defending HTML as shorter than JSON, when I think size should not really be a factor here. Sure, mobile clients want smaller payloads, but with faster and faster networks and clients with more memory, the availability of parsers is far more important. And if we can assume that all clients have both JSON and XML parsers, that is not a deciding factor either. I think the most important factor is what the client wants. The linked article mentions that using compressed payloads is common, and as such it is used as an argument for why HTML is not really larger than JSON. But whether or not compression is used is something the client requests in its request headers. Which content types a client can handle is also an HTTP header, so I think it is a no-brainer for a web service to support multiple content types if needed by its clients. Supporting both XML and JSON should be trivial these days, and implementing an HTML serializer should not be much harder.
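    Content negotiation via the Accept header can be sketched like this; the supported type list and the JSON fallback are assumptions for illustration, not anything from the linked article:

```csharp
using System;

class ContentNegotiation
{
    // Return the first content type from the client's Accept header
    // that the service supports; fall back to JSON. Real negotiation
    // would also honor q-values; this is a minimal sketch.
    public static string Negotiate(string acceptHeader)
    {
        string[] supported = { "text/html", "application/json", "application/xml" };
        foreach (var part in acceptHeader.Split(','))
        {
            var type = part.Split(';')[0].Trim();
            if (Array.IndexOf(supported, type) >= 0)
                return type;
        }
        return "application/json";
    }

    static void Main()
    {
        Console.WriteLine(Negotiate("text/html,application/xml;q=0.9")); // prints "text/html"
    }
}
```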

    I also think the format the author used for HTML was a little bit surprising. My first thought was to convert the example JSON used into something like this:


    That even turned out to be slightly smaller than what was suggested in the article. Not that size matters...

  • Being Cellfish

    Analysing code coverage


    I was recently asked to look at a project that had around 60% code coverage and to give my recommendations on what areas to focus on to increase the code coverage. There were a lot of unit tests; actually, there was around 10% more unit test code than production code, so I was a little surprised that the coverage was as low as 60%. Obviously something must be wrong with the tests. A closer look showed that the code that implemented the logic of the application had very high coverage in general and that only the code dealing with external dependencies had low coverage. Also, there wasn't that much code, so the boilerplate code just to initialize the application was around 15% of the code. The code dealing with external dependencies and that had low coverage was slightly less than 10%. So all in all, 60 of the possible 75% was covered. Relatively speaking that is 80% - a number I've often heard managers be very happy with (but remember, code coverage itself is a shady property to measure).

    So what was missing? Well, there were a few cases where a few small classes were completely missing unit tests, so they would be easy to fix. There were also a bunch of classes that had unit tests exclusively for the happy path of execution, and those can be fixed easily too. But I still think it would be possible to bring coverage up above 90%. How is that possible, you might think? Well, the boilerplate code had some room for improvement, but more importantly, the code dealing with external dependencies had a lot. They were much more than simple pass-through objects needing no testing, and with a simple refactoring, like extracting the external call as a protected virtual method of the class, you could easily add unit tests for a huge portion of that code too, I think.
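    A minimal sketch of that extract-and-override refactoring; the class and the external call here are hypothetical, not from the project in question:

```csharp
using System;

public class MessageSender
{
    public string Send(string message)
    {
        if (string.IsNullOrEmpty(message))
            throw new ArgumentException("message");
        // The logic around the call is now testable...
        return CallExternalService(message).ToUpper();
    }

    // ...because the external dependency is isolated in a method
    // a test subclass can override.
    protected virtual string CallExternalService(string message)
    {
        // Imagine a real network call here.
        return message;
    }
}

// Test double used by unit tests instead of the real class.
public class FakeMessageSender : MessageSender
{
    protected override string CallExternalService(string message)
    {
        return "fake " + message;
    }
}
```

    A test can now call new FakeMessageSender().Send("hi") and verify the surrounding logic without any network access.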

    So my conclusion? The original coverage was not bad. It was covering the important pieces and functional tests (not used when I analysed code coverage) covered a big portion of the code not covered by unit tests. So the situation was not bad. But when you use a code coverage report the right way and look at what and why it is not covered you can always find areas of improvement.

  • Being Cellfish

    Certificates are hard


    Almost a year to the day, Azure had another certificate-related outage. Last year's was more interesting, I think, and this year it was something different. My initial guess (remember, I don't work for Azure, nor do I have any knowledge about the details other than what has been communicated to the public) was that a few years ago, when they were first creating Azure, they generated some certificates to be used with SSL. My guess was that the certificates generated were valid for a long time. Probably like 5 years or something. Since the certificates were created long before Azure was available to customers, nobody thought about adding monitoring to detect when the certificates expired. Turns out I was wrong and that there was good alerting in place, but that the process had other flaws.

    So how can you prevent this from happening in your project? I would suggest you do the following:

    • During development use certificates with very short lifetime. Just one or two months. This way you get used to your certificates expiring and the monitoring you add to warn you when a certificate is about to expire will be tested on a regular basis.
    • When you do a risk analysis of your system remember to be detailed and consider the implications of your certificates expiring.
    • Learn from others! When things like this happen to the big companies, think about whether it could happen to you.
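    One way such monitoring could look is sketched below; the 30-day threshold and the certificate store location are assumptions to adapt to your setup:

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

class CertificateMonitor
{
    // Return the subjects of certificates that expire within the
    // given number of days, so an alert can be raised for them.
    public static string[] FindExpiringSubjects(
        X509Certificate2Collection certificates, int days, DateTime now)
    {
        var expiring = new System.Collections.Generic.List<string>();
        foreach (X509Certificate2 cert in certificates)
        {
            if (cert.NotAfter < now.AddDays(days))
                expiring.Add(cert.Subject);
        }
        return expiring.ToArray();
    }

    static void Main()
    {
        // Check the local machine's personal certificate store.
        var store = new X509Store(StoreName.My, StoreLocation.LocalMachine);
        store.Open(OpenFlags.ReadOnly);
        foreach (var subject in FindExpiringSubjects(
            store.Certificates, 30, DateTime.Now))
        {
            Console.WriteLine("Expiring soon: " + subject);
        }
        store.Close();
    }
}
```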

    The funniest thing about this outage is an article in Australia claiming it was caused by hackers... I guess I shouldn't be surprised that newspapers don't check their facts...

  • Being Cellfish

    An offshoring experiment


    Last week I read this article on an experiment a team did to compare off-shoring to co-location of the team. It pretty well summarizes my experiences. I've only ever seen two acceptable off-shoring projects.

    The first one was not really off-shoring. It was when I first joined Microsoft and our team was located in Sweden, and one by one the team members moved over to the US, so for a while half the team was in the US and half in Sweden. We definitely saw some of the bad things of a distributed team, and we definitely saw some of the good things of having different time zones, since severe bugs could be worked on around the clock. But I think the main reason this worked was that we were a co-located team that got distributed, so we were already working well together and knew each other, which made syncing across the pond easier. Still an overhead, but not as bad as I imagine hiring random people across the world would be.

    The second good implementation I saw was at a partner company my old (pre-Microsoft) company had. What they did was fly in parts of their off-shore team for a few months each year and have them work co-located with the rest of the team. This way the teams got to know each other better, and the team culture in Sweden could spread to the off-shore team. The second thing they did was to give the off-shore team separate things to work on as much as possible, making daily synchronization between the different time zones minimal.

  • Being Cellfish

    The green February revolution


    I read this great article "the February Revolution" listing four things that tend to happen when great results are achieved. The thing that interested me the most was how similar at least three of the items were to how a successful military unit operates.

    • People know why they are doing their work. This makes me think of how a good order is communicated in the military. Everybody in the unit should know why they need to do what they do. I've seen both projects and military operations fall apart because the team did not understand why they were doing whatever they were doing.
    • Organizations focus on delivering outcomes and impacts rather than features. Again, successful military operations focus on the goal rather than the how. Back to giving orders again: it is about telling everybody what the goal is rather than giving step-by-step instructions. I've seen my squad leaders completely misunderstand an order I gave in the heat of the moment, but given the initial goal I gave everybody, the end result tended to turn out the way I wanted.
    • Teams decide what to do next based on immediate and direct feedback from the use of their work. This is the one where I didn't immediately see a similarity with the military. The explanation in the article makes a lot of sense, but it is not the first thing the one-liner makes me think of. I think the spirit is to empower the team, something (deja vu anybody?) that successful military organizations do too.
    • Everyone cares. One of the most boring parts of being in the military is that after an exercise you need to do a lot of maintenance on guns, vehicles, tents etc. In some units I've been in, people just do as they are told, and often several individuals take a looooong time to complete their first task so that they don't have to do anything else. This way it takes forever to get done. In other units everybody just picks up whatever needs to be done, asks if they can help etc. Probably no surprise, but the latter kind of unit typically finishes its work long before the former.

    Though listed last, "everyone cares" is the most important point of them all. If everyone cares, that is probably enough to achieve great things. But it is also the most difficult one to achieve, since it relies on each individual actually caring... In my experience, if the leaders (formal and informal) of a group care and show that they care, the rest of the group typically starts to care pretty quickly too. This takes the form of both taking action and rewarding the team for caring.

  • Being Cellfish

    Implementing a good GetHashCode


    If you've ever implemented GetHashCode you probably did it the way suggested on MSDN, which is using XOR. And if you use R# you might have seen that it generates a different GetHashCode using prime numbers. So what should you do? I think there are three properties you want to aim for when it comes to implementing GetHashCode, in order of importance:

    1. GetHashCode must be consistent. This should be a no-brainer. It simply means that two objects that are equal should generate the same hash code every time.
    2. GetHashCode should provide randomly distributed output. Random here just means that it appears to be random, not that it is truly random, since that would break rule number one. Also, the output of GetHashCode is only an integer. That is only 32 bits. So if your object has a lot more possible states there will be hash collisions, and you don't want them to be likely. For example, if you have an object that consists of two different integers that you just XOR together, then the objects (a=1, b=2) and (a=2, b=1) will return the same hash. You most definitely don't want this.
    3. GetHashCode should be fast. Since GetHashCode is used in a lot of collection classes you don't want this method to be slow.

    Given all this, it turns out that the approach R# uses is pretty good at achieving all these goals. But the best possible approach is to pick an algorithm that is suitable for your data. A few examples:

    • If your object has at most int.MaxValue possible values then a simple bit shift is probably a good way. Example: our object has two short values: GetHashCode() { return (a << 16) | (b & 0xFFFF); } (the mask keeps a negative b from sign-extending over the high bits).
    • If your object consists of other objects that generate good distribution hash codes, then XOR or just multiplying them together is likely good enough for you. Please note that GetHashCode for a lot of basic types does not generate good distribution values (especially bool and integers).
    • In all other cases and especially when in doubt it is preferable to go with an implementation like the one R# generates for you.
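    As a sketch of that last option, here is a prime-number based GetHashCode similar in spirit to what R# generates, for a hypothetical class with two fields:

```csharp
public class Point
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    public override int GetHashCode()
    {
        unchecked // overflow is fine and expected here
        {
            // Multiplying by a prime before adding each field makes
            // (1, 2) and (2, 1) hash differently, unlike plain XOR.
            int hash = 17;
            hash = hash * 31 + x;
            hash = hash * 31 + y;
            return hash;
        }
    }
}
```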

    If you want to read more about a number of different hashing options there is a pretty good list here with lots of good information.

  • Being Cellfish

    Cost of meetings


    This was recently sent to me, and anybody who has worked at Microsoft knows that a lot of teams (if not all) have a lot of meetings and do a lot of communication over email. The essence of the linked article is that people whose work mainly consists of meetings need to understand that people who contribute mostly outside meetings will be less effective if their day is chopped up by a bunch of meetings. I think it is hard to argue that this is wrong since, yes, meetings do take time from other things, and with fewer meetings more things tend to get done. And if you have a few meetings in a day it makes sense to try to have them back to back, I think.

    However, I believe meetings are the least disruptive thing during my work day, since meetings are scheduled. Every time I walk into somebody's office to discuss something without booking a meeting, I interrupt whatever they are doing, and vice versa. But that does not mean I want to batch them up. Most of the time a short interruption is not that bad, I think, and if it means somebody else on the team can quickly be unblocked without writing long emails and/or booking meetings, I think the team as a whole will be more productive. And then there are email notifications, instant messages, phone calls etc. that cause interruptions. So I think it is more important to make sure the team can handle interruptions and defer them until they can be handled. My favorite method for that is the Pomodoro technique!

    I am a believer in the Pomodoro technique since I do not think you can be productive and in the zone for hours on end. I do think the brain needs some time to process thoughts in the background to quickly find solutions to problems. And I've felt the feeling of accomplishment when you know exactly how much you've gotten done in a day. The manager's vs maker's schedule article is good and worth reading, but IMO it's just the tip of the iceberg and does not address the main problem. But it makes an important point that should not be forgotten.

  • Being Cellfish

    Implementing IDisposable


    IDisposable is probably one of the most abused interfaces in .Net. Apart from all the cases where you actually have an unmanaged resource you need to release, I've seen it being used a lot of times (including by myself) just to guarantee some code is executed immediately when a variable goes out of scope. I've used this in test utilities to make sure certain validation happens before the test execution ends, but I've also seen this used in non-test code. The people who like to use IDisposable probably (like myself) have some experience from C++ and using smart pointers or similar there. I wish there was a similar construct in .Net where I could force execution of a certain method on an object immediately when it goes out of scope if (and only if) I explicitly want that. Kind of an IDisposable that does not generate FxCop warnings when not disposed properly...

    Anyway, the best guide I've seen so far on how to actually implement IDisposable can be found here. Follow those guidelines and do what I do: stop abusing IDisposable constructs just because you think they make the code look neat.
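    For reference, the canonical dispose pattern such guidelines build on looks roughly like this; the class name and the resources it would hold are hypothetical:

```csharp
using System;

public class ResourceHolder : IDisposable
{
    private bool disposed;

    public bool IsDisposed { get { return disposed; } }

    public void Dispose()
    {
        Dispose(true);
        // No need for the finalizer once we've been disposed explicitly.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
            return; // safe to call Dispose multiple times
        if (disposing)
        {
            // Release managed resources here (dispose owned IDisposables).
        }
        // Release unmanaged resources here.
        disposed = true;
    }

    // Finalizer as a safety net if Dispose is never called.
    ~ResourceHolder()
    {
        Dispose(false);
    }
}
```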
