Random Disconnected Diatribes of a p&p Documentation Engineer
Reading in the newspaper this week about the technological advances in political campaigning set me wondering whether there is an ethics/success trade-off in most areas of work, as well as in life generally.
I don't mean cheating in order to win; it's more about how you balance what you do with what you think people want you to do. The article I was reading focused on the area of national politics. Technologies that we in the IT world are familiar with are increasingly being used to determine the "mood of the people" and to target susceptible voters. In the U.S. they already use Big Data techniques to profile the population and to analyze sectors for specific actions. The same is happening here in Britain.
What I can't help wondering is whether this spells the end of true political conviction. If, as a party, you firmly believe that policy A is absolutely necessary for the country, and will provide the best future for the people, what happens when your data analysis reveals that it's not likely to be as popular as policy B? Do you try to adapt policy A to match the results from the data and sound like policy B, abandon it altogether in favour of policy C that is even more popular, or carry on regardless and hope that people will finally realize policy A is the best way to go?
Some of the greatest politicians of the past worked from a basis of pure conviction, and many achieved changes for the better. Some pushed on regardless and failed. Does the ability to get accurate feedback on the perceived desires of the population, or of specific and increasingly narrowly defined sectors, reduce the conviction that has always been at the heart of real politicians? Perhaps now, instead of relying on the experts that govern us to make a real difference to our lives, we just get the policies we deserve because we all just want what's best for each of us today - and politicians can discover what that is.
There's an ongoing discussion that the same is true of many large companies and organizations. They call it "short-termism" because public companies have to focus on what will look good in the next quarter's results in order to keep shareholders happy, rather than being able to take the long view and maximize success through long term changes. Even though governments generally get a longer term, such as five years, the same applies because it's pretty much impossible to make real changes in politics in such a short space of time.
Of course, there are some organizations where you don't need to worry about public opinion. In private companies you can, in theory, do all the long term planning you need because you have no shareholders to please. You just need to be able to stay in business as you plan and change for the future. In extreme cases, such as here in the European Union, you don't even need to worry what the public thinks. The central masters of the project can just do whatever they feel is right for the Union, and nobody gets to influence the decisions. Maybe the EU, and other non-democratic regions of the world, are the only place where the politics of conviction still apply.
So how does all this relate to our world of technology? As I read the article, it struck me as a similar situation to the one we have in creating guidance and documentation for our products and services. Traditionally, the process of creating documentation for a software product revolved around explaining the features of the product. In many cases, this simply meant explaining what each of the menu options does, and how you use that feature.
I've recently installed a 4-channel DVR to monitor four bird nest boxes, and the instructions for the DVR follow just this pattern. There are over 100 pages that patiently explain every option in the multiple menus for setting up and using it, yet nowhere does it answer some obvious questions such as "do I need to enable alarms to make motion detection work?", "why is the hard disk light flashing when it's not recording anything?", and "why are there four video inputs but only two audio inputs?" And those are just the first three of the unanswered questions.
Over the years, we've learned to write documentation that is more focused on the customer's point of view instead. We start with scenarios for using the product, and develop these into procedures for achieving the most common tasks. Along the way we use examples and background information to try to help users understand the product. But, in many cases, the scenarios themselves come from our best guesses at what the user needs to know, and how they will use the product. It's still very much built from our opinions and a conviction that we know what the customer needs to know, rather than being based on what they tell us they actually want to know.
However, more recently, even this has started to change. The current thinking is that we should answer the questions users are asking now, rather than telling them what we think they need to know. It's become a data gathering exercise, and we use the data to maximize the impact we have by targeting effort at the most popular requirements. In most IT sectors and organizations, fast and flexible responsiveness is replacing principles and conviction.
Is it a good thing? I have to say that I'm not entirely persuaded so far. Perhaps, with the rate of change in modern service-based software and marketplace-delivered apps, this is the only way to go forward. Yet I can't help wondering if it just introduces randomness, which can dilute the structured approach to guidance that helps users get the most from the product.
Maybe if I could get a manual for my new DVR that answers my questions, I would be more convinced...
So I've temporarily escaped from Azure to lend a hand with, as Monty Python would say, something completely different. And it's a bit like coming home again, because I'm back with the Enterprise Library team. Some of them even remembered me from last time (though I'm not sure that's a huge advantage).
Enterprise Library has changed almost beyond recognition while I've been away. Several of the application blocks have gone, and there are some new ones. One even appeared and then disappeared again during my absence (the Autoscaling block). And the blocks are generally drifting apart so that they can be used stand-alone more easily, especially in environments such as Azure.
It's interesting that, when I first started work with the EntLib team, we were building the Composite Application Block (CAB) - parts of which sort of morphed into the Unity Dependency Injection mechanism. And the other separate application blocks were slowly becoming more tightly integrated into a cohesive whole. Through versions 3, 4, and 5 they became a one-stop solution for a range of cross-cutting concerns. But now one or two of the blocks are starting to reach adolescence, and break free to seek their fortune in the big wide world outside.
One of these fledglings is the block I'm working on now. The Semantic Logging Application Block is an interesting combination of bits that makes it easier to work with structured events. It allows you to capture events from classes based on the .NET 4.5 and above EventSource class, play about with the events, and store them in a range of different logging destinations. As well as text files and databases, there's an event sink that writes events to Azure table storage (so I still haven't quite managed to escape from the cloud).
The latest version of the block itself is available from NuGet, and we should have the comprehensive documentation available any time now. It started out as a quick update of the existing docs to match the new features in the block, but has expanded into a refactoring of the content into a more logical form, and to provide a better user experience. Something I seem to spend my life doing - I keep hoping that the next version of Word will have an "Auto-Refactor" button on the ribbon.
More than anything, though, is the useful experience it's providing in learning more about structured (or semantic) logging. I played with Event Tracing for Windows (ETW) a few times in the past when trying to consolidate event logs from my own servers, and gave up when the level of complexity surpassed my capabilities (it didn't take long). But EventSource seems easy to work with, and I've promised myself that every kludgy utility and tool I write in future will expose proper modern events with a structured and typed payload.
This means that I can use the clever and easy to configure Out-of-Process Host listener that comes with the Semantic Logging Application Block to write them all to a central database where I can play with them. And the neat thing is that, by doing this, I can record the details of the event but just have a nice useful error message for the user that reflects modern application practice. Such as "Warning! Your hovercraft is full of eels...", or maybe just "Oh dear, it's broken again..."
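To give a flavour of what that looks like, here's a minimal sketch of the kind of event source I have in mind. The class name, event IDs, and payload fields are my own invention for illustration; only the `EventSource` API itself (in `System.Diagnostics.Tracing`, .NET 4.5 and above) is the real thing that the Semantic Logging Application Block listens to:

```csharp
using System.Diagnostics.Tracing;

// A hypothetical event source for one of my kludgy utilities. Each event
// method writes a typed, structured payload rather than a pre-formatted
// string, so listeners can decide how to render, filter, or store it.
[EventSource(Name = "MyCompany-KludgyUtility")]
public sealed class KludgyUtilityEventSource : EventSource
{
    public static readonly KludgyUtilityEventSource Log =
        new KludgyUtilityEventSource();

    // The Message template is what a user-facing listener might show;
    // the raw payload (the location string) still travels with the event.
    [Event(1, Level = EventLevel.Warning,
           Message = "Warning! Your hovercraft is full of eels ({0})")]
    public void HovercraftFullOfEels(string location)
    {
        if (IsEnabled()) WriteEvent(1, location);
    }

    [Event(2, Level = EventLevel.Error,
           Message = "Oh dear, it's broken again...")]
    public void BrokenAgain(string component, int errorCode)
    {
        if (IsEnabled()) WriteEvent(2, component, errorCode);
    }
}
```

The application just calls `KludgyUtilityEventSource.Log.BrokenAgain("tuner", 42);` and an out-of-process listener can capture the component name and error code as separate typed fields, while the user sees only the friendly message.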
There probably aren't many people who can remember when TVs had just six buttons and a volume knob. You simply tuned each of the buttons to one of the five available channels (which were helpfully numbered 1 to 5), hopefully in the correct order so you knew which channel you were watching, and tuned the sixth button to the output from your Betamax videocassette recorder.
As long as the aerial tied to your chimney wasn't blown down by the wind, or struck by lightning, that was it. You were partaking in the peak of technical media broadcasting advancement. Years, if not decades, could pass and you never had to change anything. It all just worked.
And then we went digital. Now I can get 594 channels on terrestrial FreeView and satellite-delivered FreeSat. Even more if I choose to pay for a Sky or Virgin Media TV package. Yet all I seem to have gained is more hassle. And, looking back at our viewing habits over the past few weeks, pretty much all of the programs we watch are on the original five channels!
Of course, the list of channels includes many duplicates, with the current fascination with "+1" channels where it's the same schedule but an hour later (which is fun when you watch a live program like "News At Ten" that's on at 11 o'clock). Channel 5 even has a "+24" channel now, so you can watch yesterday's programs today. A breakthrough in entertainment provision, which may even be useful for the 1% of the population that doesn't have a video recorder. How long will it be before we get "+168" channels so you can watch last week's episode that you missed?
What's really annoying, however, is that I've chosen to fully partake in the modern technological "now" by using Media Center. Our new Mamba box (see Snakin' All Over) is amazing in that it happily tunes all the available FreeView and FreeSat channels and, if what it says it did last night is actually true, it can record three channels at the same time while you are watching a recorded program. I was convinced it wasn't supposed to do more than two.
However, it also seems to have issues with starting recordings, and with losing channels or suddenly gaining extra copies of existing channels. For some reason this week we had three BBC1 channels in the guide, but ITV1 was blank. Another wasted half an hour fiddling with the channel list put that right, but why does it keep happening? I can only assume that the channel and schedule lists Media Center downloads every day contain something that fires off a channel update process. And helpfully sets all the new ones (or ones where the name changed slightly) to "selected" so that they appear in the guide's channel list. I suppose if it didn't pre-select them, you wouldn't know they had changed.
Talking with the ever-helpful Glen at QuitePC.com, who supplied the machine, was also very illuminating. Media Center is clever in that it combines the multiple digital signals for the same channel into one (you can see them in the Edit Sources list when you edit a channel) and he suggested editing the list to make sure the first ones were those with the best signal so that Media Center would not need to scan through them all when changing channels to start a recording.
Glen also suggested using the website King Of Sat to check or modify the frequencies when channels move.
This makes sense because Media Center does seem to take a few seconds to change channels. Probably it times out quite quickly when it doesn't detect a signal, pops up the warning box on screen, and then tries the other tuner on the same card. Which works, maybe because the card is now responding, and the program gets recorded. But when I checked yesterday for a channel where this happens, there is only one source in the Edit Sources list and it's showing "100%" signal strength.
And a channel that had worked fine all last week just came up as "No signal" yesterday. Yet looking in the Edit Sources list, the single source was shown as "100%". Today it's working again. Is this what we expected from the promise of a brave new digital future in broadcasting? I'm already limited to using Internet Radio because the DAB and FM signals are so poor here. How long will it be before I can get TV only over the Internet?
Mind you, Media Center itself can be really annoying sometimes. Yes it's a great system that generally works very well overall, and has some very clever features. But, during the "lost channel" episode this week, I tried to modify a manual recording by changing the channel number to a different one. It was set to use channel 913 (satellite ITV1) but I wanted to change it to use channel 1 (terrestrial ITV1). Yet all I got every time was the error message "You must choose a valid channel number." As channel 1 is in the guide and works fine, I can't see why it's invalid. Maybe because it uses a different tuner card, and the system checks only the channel list for the current tuner card?
It does seem that software in general doesn't always get completely tested in a real working environment. For example, I use Word all the time and - for an incredibly complex piece of software - it does what I expect and works fine. Yet, when I come to save a document for the first time onto the network server, I'm faced with an unresponsive Save dialog for up to 20 seconds. It seems that it's looking for existing Word docs so it can show me a list, which would be OK if the document was on the local machine or there were only a few folders and docs to scan. But there are many hundreds on the network server, so it takes ages.
Perhaps, because I use software like this all day, I just expect too much. Maybe there is no such thing as perfect software...
I don't know if General Custer ever made a last stand against the Apache, but I feel like I have. My Apache is, of course, the Hadoop one. Or, to be technically accurate, Microsoft Azure HDInsight. And, going on experience so far, this is unlikely to actually be the last time I do it.
After six months of last year, and about the same this year, it seems like I've got stuck in some Big Data-related cluster of my own. We produced a guide for planning and implementing HDInsight solutions last year, but it's so far out of date now that we might as well have been writing about custard rather than clusters. However, we have finally managed to hit the streets with the updated version of the guide before HDInsight changes too much more (yes, I do suffer odd bouts of optimism).
What's become clear, however, is how much HDInsight is different from the typical Hadoop deployment. Yes, it's Hadoop inside (the Hortonworks version), but that's like saying battleships and HTML are the same because they both have anchors. Or cats and dogs are the same because they both have noses (you can probably see that I'm struggling for a metaphor here).
HDInsight stores all its data in Azure blob storage, which seems odd at first because the whole philosophy of Hadoop is distributed and replicated data storage. But when you come to examine the use cases and possibilities, all kinds of interesting opportunities appear. For example, you can kill off a cluster and leave the data in blob storage, then create a new cluster over the same data. If you specify a SQL Database instance to hold the metadata (the Hive and HCatalog definitions and other stuff) when you create the cluster, it remains after the cluster is deleted and you can create a new cluster that uses the same metadata. Perhaps they should have called it Phoenix instead.
We demonstrate just this scenario in our guide as a way to create an on-demand data warehouse that you can fire up when you need it, and shut down when you don't, to save running costs. And the nice thing is that you can still upload new data, or download the existing data, by accessing the Azure blob store directly. Of course, if you want to get the data out as Hive tables using ODBC you'll need to have the cluster running, but if you only need it once a month to run reports you can kill off the cluster in between.
But, more than that, you can use multiple storage accounts and containers to hold the data, and create a cluster over any combination of these. So you can have multiple versions of your data, and just fire up a cluster over the bits you want to process. Or have separate staging and production accounts for the data. Or create multiple containers and drip-feed data arriving as a stream into them, then create a cluster over some or all of them only when you need to process the data. Maybe use this technique to isolate different parts of the data from each other, or to separate the data into categories so that different users can access and query only the appropriate parts.
You can even fire up a cluster over somebody else's storage account as long as you have the storage name and key, so you could offer a Big Data analysis service to your customers. They create a storage account, prepare and upload their source data, and - when they are ready - you process it and put the results back in their storage account. Maybe I just invented a new market sector! If you exploit it and make a fortune, feel free to send me a few million dollars...
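For anyone who wants to try the multiple-storage-account trick, the rough shape of it in the Azure PowerShell module of the time is sketched below. The account names, container, cluster name, and sizes are all invented, and the cmdlets may well have changed by the time you read this, so treat it as an outline rather than a recipe:

```powershell
# Sketch only: creates a cluster whose default container is in one storage
# account, with a second account attached so its blobs are also visible to
# jobs. All names and keys here are placeholders.
$key1 = (Get-AzureStorageKey "mainstore").Primary
$key2 = (Get-AzureStorageKey "stagingstore").Primary

New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 |
    Set-AzureHDInsightDefaultStorage `
        -StorageAccountName "mainstore.blob.core.windows.net" `
        -StorageAccountKey $key1 `
        -StorageContainerName "data" |
    Add-AzureHDInsightStorage `
        -StorageAccountName "stagingstore.blob.core.windows.net" `
        -StorageAccountKey $key2 |
    New-AzureHDInsightCluster -Name "ondemand-dw" `
        -Location "North Europe" -Credential (Get-Credential)
```

Delete the cluster when the month's reports are done and the blobs (and, if you specified one, the SQL Database metastore) stay put, ready for the next incarnation.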
Read the guide at http://msdn.microsoft.com/en-us/library/dn749874.aspx