Random Disconnected Diatribes of a p&p Documentation Engineer
There are some seemingly simple phrases that trip so easily off the tongue, but end up leaving you tongue-tied. Or, if not physically entangled, then tied in knots both architecturally and programmatically. Our intrepid little band of developers and writers just encountered an interesting example of one of these disarticulating phrases: namely "reliable messaging".
It all seemed so easy at the start. We have an application that needs to send a message to a partner organization instructing that partner to fulfill a specific task, and then - at some time in the future - receive a message back from the partner indicating that the task has been successfully completed. What could be difficult about that? Surely applications do that every day...
So we create some code that sends a message. As our application is running in Windows Azure, we decide to use Service Bus Brokered Messaging, which provides a robust and reliable mechanism that survives application failures and role restarts. The application also listens on a separate Service Bus queue for an acknowledgement message that the partner sends back to confirm that it received the message. If we don't get an acknowledgement within a specified time, we send the message to the partner again. What could go wrong?
Well, the message is going across the Internet, which everyone tells us is not a fully reliable transport mechanism, so our application is likely to suffer transient loss of connectivity. So we include a retry mechanism for sending the message, namely the Enterprise Library Transient Fault Handling Application Block (also known as "Topaz"). We tell it to make four attempts and then, if it still fails, give up and raise an exception. Easy.
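For anyone who wants to see the shape of that "four attempts, then give up" rule, here is a minimal sketch in Python. It is not Topaz and makes no claim about how Topaz works internally; the names send_with_retry and TransientError, the fixed one-second delay, and the injectable sleep function are all assumptions made up for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (e.g. a dropped connection)."""


def send_with_retry(send, message, attempts=4, delay_seconds=1.0, sleep=time.sleep):
    """Call send(message) up to `attempts` times, then let the last error escape.

    `send` is any callable supplied by the caller; it is assumed to raise
    TransientError when the failure might go away on a later attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send(message)
        except TransientError:
            if attempt == attempts:
                # Out of attempts: give up and raise, as described above.
                raise
            # Fixed pause between attempts; a real policy might back off instead.
            sleep(delay_seconds)
```

Any other kind of exception escapes immediately, which is deliberate: retrying only makes sense for faults that might clear on their own.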
Ah, but what if the message is sent OK, yet an acknowledgement never comes back? So we build a custom mechanism that maintains the status of the overall process in a local database table in SQL Azure. But reading and updating the database might fail, so we need to use Topaz to retry that operation as well. If we don't get a reply, we can restart the whole sending-and-waiting-for-an-acknowledgement process; though we also need to keep track of how many times we restarted it in case there's a blocking fault such as a duff certificate and Service Bus authentication fails every time, or if the partner has gone away for good.
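The restart-and-count idea might look something like the following rough Python sketch. An in-memory object stands in for the row in the SQL Azure status table, and the names ProcessState and run_until_acknowledged, plus the limit of three restarts, are hypothetical choices for illustration rather than anything taken from our actual code.

```python
from dataclasses import dataclass


@dataclass
class ProcessState:
    """Stand-in for one row of the status table (assumed shape)."""
    status: str = "pending"
    restarts: int = 0


def run_until_acknowledged(send, wait_for_ack, state, max_restarts=3):
    """Send, wait for an acknowledgement, and restart a bounded number of times.

    `send` and `wait_for_ack` are caller-supplied callables; `wait_for_ack`
    is assumed to return True if an acknowledgement arrived before the
    timeout and False otherwise.
    """
    while True:
        send()
        if wait_for_ack():
            state.status = "acknowledged"
            return state
        if state.restarts >= max_restarts:
            # Probably a blocking fault (bad certificate, partner gone away),
            # so stop rather than loop forever.
            state.status = "failed"
            return state
        state.restarts += 1
```

In a real system each change to the state would be a database write, which is exactly the operation that needs its own retry wrapper.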
Of course, it may be that the partner received the message but failed when attempting to send back the acknowledgement message, so we'll implement the same retry mechanism there. But how will the partner know that their acknowledgement message was delivered and processed by the application? Does it matter? Well, it might if the partner later sends a message to say that the requested task is complete. The application will think it never managed to send the original message because it never got a reply, but here's a message to say the task is complete.
Perhaps the partner should expect the application to send an acknowledgement message to say it received the original acknowledgement sent by the partner, and wait for this before actually starting the instructed task? It could end up like two lovers trying to finish a phone conversation: "You hang up first", "No, you hang up first"... And, of course, because now we have dozens of duplicate messages going both ways, we also need to include code in the partner that prevents it from carrying out the same task again when a duplicate instruction message arrives.
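The duplicate check on the partner side can be sketched in a few lines of Python, assuming every instruction message carries a unique identifier that is reused on every resend. The Partner class, the message_id parameter, and the in-memory set are all hypothetical; a real partner would need to keep the list of seen identifiers somewhere that survives a restart.

```python
class Partner:
    """Carries out each instruction once, however many copies arrive."""

    def __init__(self, perform_task):
        self._perform_task = perform_task  # caller-supplied callable
        self._seen = set()                 # identifiers already processed

    def handle(self, message_id, payload):
        """Process an instruction and return an acknowledgement for it."""
        if message_id not in self._seen:
            # Record the identifier only after the task succeeds, so a
            # failed attempt can be repeated when the message is resent.
            self._perform_task(payload)
            self._seen.add(message_id)
        # Always acknowledge, even for duplicates: the sender may be
        # resending precisely because an earlier acknowledgement was lost.
        return ("ack", message_id)
```

Acknowledging duplicates as well as originals is what lets the sender stop resending without the partner doing the work twice.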
And then, after we've satisfactorily completed the original send/acknowledge cycle, the partner carries out the requested task and, when complete, sends the confirmation of completion message to the application. Using, of course, a suitable retry mechanism to ensure the message gets sent. And a restart mechanism in case the application fails to send back an acknowledgement of the confirmation message within a specified time. And maybe an acknowledgement of the acknowledgement so the partner can confirm that the application received its acknowledgement...?
What we end up with is a cornucopia of database tables, Service Bus queues, and multiple layers of code that resembles an onion - retry and restart functionality wrapping more retry and restart functionality wrapping more retry and restart functionality. All to ensure that our message-based communication actually is "reliable messaging".
Wouldn't it be easier just to fix the Internet...?
My wife will tell you that I'm really not very good at getting the point of things. I mean, when it comes to making typically vital choices such as whether I want brown sauce or ketchup on my sausage sandwiches, I can't see the point of long-winded pondering and tortuous decision making. Just put brown on one half and ketchup on the other. In fact if there was a competition for getting the point, and she made me enter, I probably wouldn't even get the point.
The week after I published this post I was amazed to see the same question appear on the quiz show Million Pound Drop: "According to a recent survey, which do men prefer on their sausage sandwiches, brown sauce or tomato sauce?" The answer was brown sauce...
Yet there are so many other things out there in the real world (which don't involve sausages) that it's hard to see the point of. I watched a main evening news broadcast and noticed that three of the reports included footage from "on the spot" reporters. One stood in the rain outside the House of Commons telling us about this week's faux-pas by some Government minister, but he didn't speak to anyone, or even walk purposefully into the building while talking, or actually move at all. There were plenty of people milling around in the background, but nobody I recognized. Why bother? Why not have him stand in front of a photo in the nice warm studio, or even let the rather scary news-anchor lady just read it out?
And then there was one about a huge car pile-up on the motorway, which they think might have been caused by smoke from a golf club fireworks display. There was the intrepid roving reporter standing on the side of a country lane explaining the intricacies of the event. OK, so there was an empty police car parked behind him, but nobody else in sight. And you couldn't see the cars involved, or the motorway itself, or the golf club buildings, or any fireworks, or even any smoke. He might as well have been standing outside our house (maybe he was) for all the point of doing a "live from the scene" report.
I guess all this is done just to try and keep people's attention for the massive fifteen-minute duration of the program. It's almost like they don't expect the people who tune in to actually be interested in the news, so they have to make it exciting with lots of different scenes and people. And, of course, they have to tell you what's in the program at the start, and then keep telling you "what's coming up" between each item. Wow, I really do want to hear about the lady whose cat had to be rescued from a tree, and I need you to keep telling me that you haven't forgotten about it.
Can you imagine trying to create technical documentation based on pointlessness like you see every day in TV news broadcasts? I'd have to recruit dozens of writers who could travel the country writing paragraphs in appropriate locations. Send Fred, together with a huge support team of laptop preparation operators, maintenance engineers, Microsoft Word technical support staff, desk light electricians, and office furniture assembly operatives to sit in our server room and write the part about minimizing server peak load.
Meanwhile Christina would be dispatched, along with half a dozen security staff and experts in the use of pizza- and cola-proof protective clothing, to sit with the development team when writing the paragraph that describes how developers can use Visual Studio to add WIF authentication features to their applications.
And, of course, not forgetting Ravi, who would begin the long journey to the local telephone exchange accompanied by around 50 specially trained health and safety experts, telecommunications jargon translators, public relations staff, company policy compliance advisors, facilitation collaborators (and, hopefully, his laptop) in order to provide the vital paragraph about ADSL networking reliability.
Then, when we come to assemble it into the final book format for release, we'll have to remember to include an "upcoming chapters list" every fifth page, and an index after every first-level heading, so people don't get fed up halfway through - or start to panic that we might have missed out the bit they were really looking forward to.
Just imagine how exciting this kind of technical documentation will be to read...
It started with Windows XP Media Center Edition, continued through Windows Vista Home Premium Edition, and now extends into Windows 7 Ultimate Edition. Are we sadomasochists, or is the pain of keeping our glorious multi-media, big screen experience worth it? Do I really need another computer in the house, with the accompanying palaver of monthly patches, backing up, and general tweaking? So far, the answer has just about been yes, though sometimes it's a very close call.
It would be easy to blame Microsoft for being over-ambitious with Media Center, but in fact it's not really a software problem. In Windows 7, as well as being very pretty, Media Center seems amazingly reliable most of the time. It generally manages to act as a high-resolution TV, record all the stuff we want, play all our music collection, display all of our growingly huge collection of photos, play videos and DVDs, and even has a neat screensaver that shows random photos when not in use.
The big issue, as ever, seems to be getting a working combination of TV signal and accurate guide data. We did the switch to satellite (DVB-S) during the digital TV changeover when the terrestrial signal was unavailable (as described in previous blog posts). But we can't get some of the ordinary free-to-air channels that we prefer through satellite, and matching the channels that we do want (and can get) to the reams of downloaded guide data seems extraordinarily difficult and unintuitive.
So when they finally got the terrestrial signal changes done and we found that we can get an "ordinary TV" (DVB-T) signal again, we switched back. But the guide data for this seems to confuse Media Center altogether. The guide no longer seems to mark programs as repeats, so "New Only" series recordings pick up old editions of programs as well as the current ones. And the GUID identifiers for programs that are repeated seem to be screwed up, so it records the same program four times when it's shown on the "+1" channels, and then again the next day.
Then last week, when there was a special edition of a soap that my wife watches where somebody died (a "must-watch" episode she told me), it didn't get recorded because the guide said it was a repeat and that the original air date was in 2005. And I got more grief later in the week because one episode in a four-part drama wasn't marked as part of the series, so didn't get recorded either. Then I noticed that the guide now contains only a week of listings for the BBC channels instead of the usual two weeks, and so when they advertise a new program as "coming soon" you can't find it in the guide to record it. Perhaps the factor most likely to destroy the concept of Media Center as a home entertainment hub is the lack of accurate guide data?
Mind you, what I also can't figure is why, after a couple of days, it seems to decide that it can't do Live TV any more. Maybe there's something hanging on to the tuners in the background, because the same program will record although you can't watch it live. And, finally and annoyingly, none of the satellite or terrestrial tuner cards that fit our Media Center box can receive HD signals. But neither Sky (satellite) nor Virgin (cable) can provide the same comprehensive multi-media experience as Media Center, so I guess it's time for another hardware upgrade.
Or should I just wait for Windows 8...?
I've decided that, next time I write a book, I'm going to put everything on page one and make some obvious errors as well. I'm not convinced that it will actually do much to make the book any better, but it will save the reviewers a lot of headaches. And probably make it easier for the publisher too.
To understand why, let's explore the typical sequence of events for a work that's complete to first draft and ready for review. There are several things that you can almost guarantee will happen:
Finally, when all the writing, reviews, and edits are done, you sign it off and it disappears into the bowels of the production department. In theory you can wave goodbye to it and get on with the next one. Or maybe not. I was recently party to a series of meetings related to creating a detailed specification of the publishing process. The list of tasks that have to be accomplished was nearly as long as the book itself, and several seem to have ended up being allocated to me.
I suppose the saving grace is that I'm pretty hardened to all this after the years I've spent involved at both the input and output ends of the process. I even try to behave myself when I'm being a reviewer rather than a writer; and I tend to be fairly relaxed about seeing my finely crafted text come back with a review comment every third word and a plethora of Track Changes from the copy editor. I wonder how they managed all this in the days before word processors!
Still, now I've seen what the production people have to do, and experienced the joys of reviewing other people's work even more regularly, I've decided that I've actually got the easy job...