Random Disconnected Diatribes of a p&p Documentation Engineer
Some friends have just adopted a rather cute ginger cat and decided to name it Juno, perhaps after the Queen of the Roman Gods. Though it regularly leads to the interesting conversation: "What's your cat's name?" - "Juno" - "No I don't, that's why I'm asking"...
Meanwhile, here at p&p we're just starting on a project named after one of the new religions of information technology: Big Data. It seems like a confusing name for a technology if you ask me (though you probably didn't). Does it just consist of numbers higher than a billion, or words like "floccinaucinihilipilification" and "pseudopseudohypoparathyroidism"?
Or maybe what they really mean is Voluminous Data, where there's a lot of it. Too much, in fact, for an ordinary database to be able to handle and query in a respectable time. Though most of the examples I've seen so far revolve around analyzing web server log files. It's hard to see why you'd want to invent a whole new technology just for that.
Of course, what's at the root of all this excitement is the map/reduce pattern for querying large volumes of distributed data, though the technology now encompasses everything from highly distributed file systems (HDFS) to connectors for Excel and other products to allow analysis of the data. And, of course, the furry elephant named Hadoop that sits in the middle remembering everything.
Thankfully Microsoft has adopted a new name for its collection of technologies previously encompassed by Big Data. Now it's HDInsight, where I assume the "HD" means "highly distributed". There's a preview in Windows Azure and a local server-based version you can play with.
What's interesting is that when I first started playing with real computers (an IBM 360) all data was text files with fixed-width columns that the code had to open and iterate through, parsing out the values. The company where I worked used to have four distinctly separate divisions, each with its own data formats, but these had since been melded into one company-wide sales division. To be able to assemble sales data we had a custom program written in RPG II that opened a couple of dozen files, read through them extracting data, and assembled the summaries we needed - we'd built something vaguely resembling the map/reduce pattern. Though we could only run it at night because it prevented most other things from working by locking all the files and soaking up all of the processing resources.
Thankfully relational databases and Structured Query Language (SQL) put paid to all that palaver. Now we had a proper system that could store vast amounts of data and run fast queries to extract exactly what we needed. In fact we could even do it from a PC. And yet here we are, with our highly distributed data and file systems, going back to the world of reading multiple files and aggregating the results by writing bits of custom code to generate map and reduce algorithms.
But I guess when you appreciate the reasons behind it, and start to grasp the concepts of the vast amounts of data involved, our new (old) approach starts to make sense. By taking the processing to the data, rather than moving the data around, you get distributed parallel processing across multiple nodes, and faster responses. And when you discover just how vast some of the data is, you realize that our modern relational and SQL-based approach just doesn't cut it.
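The pattern itself is simple enough to sketch in a few lines of plain Python (no Hadoop involved - the word-count example and the function names here are my own illustration, not anything from a real framework). Each "node" maps over its own chunk of the data emitting key/value pairs, the pairs are grouped by key, and the reduce step aggregates each group:

```python
from collections import defaultdict

# Map: each "node" processes only its own chunk, emitting (key, value) pairs.
def map_phase(chunk):
    return [(word, 1) for word in chunk.split()]

# Shuffle: group the emitted values by key across all the nodes' outputs.
def shuffle(mapped_outputs):
    groups = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups

# Reduce: aggregate the values for each key into the final result.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

# Two "nodes", each holding part of a (tiny) log file.
chunks = ["error warning error", "warning info error"]
result = reduce_phase(shuffle([map_phase(c) for c in chunks]))
print(result)  # {'error': 3, 'warning': 2, 'info': 1}
```

The point of the real thing, of course, is that the map step runs in parallel on the node where each chunk already lives, so only the small intermediate results cross the network.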
Though there are some interesting questions that nobody I've spoken to so far has answered satisfactorily. What happens when you need more than just a simple aggregate result? It seems likely that the map function needs to produce a result set that is considerably smaller than the data it's working on, and if there is little correspondence between the data in each node the reduce function won't be able to do much reducing.
Maybe I just don't get it yet. And maybe that's why being just a "database programmer" is no longer good enough. Now, it seems, you need to be a "data scientist". You not only need to know about Database Theory, but the Agile Manifesto and Spiral Dynamics as well, according to DataScientists.net. You're going to spend the rest of your life organizing, packaging, and delivering data rather than writing programs that simply run SQL queries.
But it does seem that data scientists get paid a lot more, so maybe this Big Data thing really is a good idea after all...
So it's New Year resolution time again, and it's pretty clear after many previous unsuccessful iterations that the usual crop consisting of more exercise, better diet, and giving up smoking are a waste of time. Therefore, after several months of playing host to an assortment of builders and tradesmen, this year's resolution is more DIY.
What's annoying is that most tradesmen seem to be in a mad rush to get to the next job, and so don't have time for those little finishing touches (which, since my wife says I'm a perfectionist, are so important to me). Some days it really did feel like I might as well have done the job myself. For example, over the last several weeks I've been:
Meanwhile the people who delivered the rubbish skip and promised to come back for it the next morning left it here for a week so my front lawn now has a square hole that, after all the rain we've had, resembles a small swimming pool.
Realistically, though, many of the jobs they tackled were beyond my level of competence or patience. I can see that my attempts to plaster a ceiling or tile a floor would probably be a disaster, and completely rewiring a kitchen is likely to require some level of theoretical knowledge of the regulations that I don't have.
But, hopefully, it will be another fifteen years before we need to do anything else to the house. And I'll be retired long before then, so I'll have plenty of spare time. However, my wife says we're never going to go through this again - we're going to move house instead. Though I'm not sure that would be any less stressful...
So, at last, we're done. After fighting with multiple new versions of the Windows Azure SDK, updated features in the management portal, changes to the functionality of services, and the regular changes to the names of various parts of Windows Azure, we've shipped the third editions of two of our Windows Azure guides and the associated Hands-on Labs.
The first, "Moving Applications to the Cloud", is aimed at those whose Field is Brown. It focuses on getting your existing on-premises applications running in Windows Azure using Virtual Machines, Windows Azure Web Sites, Cloud Services, and many other features of Windows Azure.
Through a series of migration stages, our fictional company named Adatum moves their aExpense application to Windows Azure. The first step is to use Windows Azure Virtual Machines, including a VM running SQL Server and another running Active Directory. This approach minimizes the need to change the application code; it simply runs exactly as it did when on-premises.
Next, Adatum experiments with Windows Azure Web Sites before refactoring the application to run as a Cloud Service. Along the way Adatum switches over to using federated authentication with Windows Azure Access Control, and using Windows Azure SQL Database instead of a hosted SQL Server.
Adatum then adds background processing with a separate Cloud Services worker role, before moving the data to Windows Azure table storage. Along the way Adatum calculates running costs, adds features to make the application scalable, and takes advantage of other Windows Azure features such as Caching.
OK, so most companies won’t go through the multiple migration steps that Adatum carried out, but the aim of the guide is to demonstrate as many of the available options as possible. As well as the planning, design, decision making, and development processes the guide also discusses application lifecycle management issues such as testing, monitoring, and maintenance.
The second guide, "Developing Multi-tenant Applications for the Cloud", is aimed at Greenfield scenarios. While much of the content is devoted to understanding the concepts of multi-tenant application design and development, the fundamentals are equally applicable to all kinds of applications designed from scratch to run in the cloud.
This guide is centered round a fictional independent software vendor (ISV) named Tailspin, and its design and implementation of the multi-tenant Surveys application. It discusses hosting options, application partitioning, and data storage options for multi-tenancy; but also contains a wealth of information about designing Windows Azure applications so as to maximize availability, scalability, elasticity, and performance. In addition, the guide explores different security and authentication options, how you can implement features directly related to ISVs, and techniques for managing the application.
One of the fundamental features for maximizing performance is to appreciate and manage the throughput limitations imposed by Windows Azure services and the Internet itself, and this guide will help you to understand how you can work round these limitations. For example, it explores how Tailspin uses multiple queues, the delayed write pattern, storage partitioning, and optimistic concurrency in order to maximize performance.
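Of the techniques in that list, optimistic concurrency is the easiest to sketch outside of Windows Azure (the `Record` class and version field below are my own illustrative stand-ins, not Tailspin's code - in Windows Azure table storage an ETag plays the role of the version number, and the store performs the check atomically). A writer remembers the version it read, and an update succeeds only if nobody else has changed the record in the meantime:

```python
# A toy in-memory record; a real store uses an ETag in the same role
# as this version number, and checks it atomically on each update.
class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0

def try_update(record, expected_version, new_value):
    # Succeed only if nobody else updated the record since we read it.
    if record.version != expected_version:
        return False  # conflict: the caller must re-read and retry
    record.value = new_value
    record.version += 1
    return True

survey = Record("draft")
v = survey.version                            # read, remembering the version
assert try_update(survey, v, "published")     # first writer wins
assert not try_update(survey, v, "withdrawn") # a stale version is rejected
print(survey.value, survey.version)  # published 1
```

The win over pessimistic locking is that nothing is ever held locked while a slow client thinks, which matters a great deal when you're chasing throughput limits; the cost is that the losing writer has to re-read and retry.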
Both of the guides have had a full makeover from the previous editions, and the addition of considerable new content, so take a look and let us know what you think. Or just tell us what color your field is...
Does lateral thinking mean you need to look outside your own head instead of just accepting the most obvious solution? If so, I might as well plead guilty in terms of managing the backup power supply for my servers.
Like a great many people I depend on APC UPSs to handle mains power fluctuations and interruptions for my servers. Since Windows NT, through Server 2000, 2003, and now 2008 R2 I've blithely installed the default power management utilities provided by APC. Everything was hunky dory until I went virtual and set up the servers using Hyper-V. That's when the problems started.
Mind you, I can't say I really noticed the problems at the start. OK, so the latest versions of the APC software don't seem to install on a machine configured to host Hyper-V VMs, but the earlier version did and I continued to use that. The only thing I noticed were occasional messages that the server had lost connection with the UPS, but then immediately restored it again.
I had the software set up to shut down the server gracefully, well before the battery ran out in the UPS, and reboot it when back to 60% charge. In the past on Server 2000 etc. this has worked fine. So I reckoned that, because the Hyper-V system manages graceful shutdown of the hosted VMs, it would also work fine on 2008 and 2008 R2, and initial tests proved this to be the case.
However, during a recent power outage (I was rewiring a ring main socket) I came back to find the UPS fully charged but the server stopped. Pressing the power button initiated a reboot sequence but the machine just shut down again. It looked very much like the recent episode when a motherboard failed in another machine, and here I was on a busy Saturday morning pondering another visit from the Dell man. Thankfully I took out a full onsite warranty this time!
However, after shutting down the UPS, then restarting it and the server, everything came back to life again. The event log showed a graceful shutdown and reboot, and all the VMs that are set up to auto-start were running fine. Interestingly, the one that doesn't auto-start showed up in Hyper-V Manager as "Suspended" rather than "Off". It didn't take long to figure that Hyper-V had suspended the VMs rather than closing down and then restarting them.
But why had the server not restarted automatically? In fact, as the power was off for only a few minutes, why did it shut down in the first place? The answer, as evident in the Event Log, was a "Communication Lost" event from the APC software; followed by "Runtime Limit Reached" and then "Shutting Down the System". If the software can't see the UPS it assumes it's broken and shuts down the server automatically, even though there was at least an hour left in the batteries.
According to the APC site, the free software doesn't support Hyper-V because it can't guarantee to safely shut down each VM individually. As many people regularly attempt to point out on the APC forum, surely it doesn't need to. Server 2008 and 2008 R2 can quite happily respond to a shutdown message and safely manage the VMs it hosts. The suggestion from aggravated forum posters is that it's just a cynical way for APC to sell the network version of the management software.
Oh well, I don't mind paying a bit for the real thing, but it seems that to make it work I also have to buy and install a special network management card, and install a ton more drivers and stuff. Do I really want to do that? So I look at the Open Source alternative, apcupsd, but it looks complicated enough to need more than what remains of the afternoon to sort out. I'll need a day to read and understand the manual.
But that's when the "outside your head" thing struck me. A quick Bing located a post by Ben Armstrong (the Virtual PC Guy) that says that the built-in power management stuff in Server 2008 R2 can manage your server and UPS automatically. In fact, as I discovered when installing the APC software, battery management is part of the O/S and all you install from APC is the service that manages the UPS and interacts with Windows. Without the APC software, Server 2008's default settings will monitor battery power and can initiate a server hibernate and shutdown when it's low, though you probably want to tweak the Low and Critical level settings in the Advanced Power Management dialog to something less optimistic than 10% and 5%.
Then I read somewhere else that installing the Hyper-V service changes the server's behavior by disabling hibernate mode, because hibernating a server that hosts VMs is not recommended. When I checked the advanced power configuration settings in my box the Critical Battery Action was still set to "Hibernate", but opening the drop-down list showed that the only options available now are "Do Nothing" and "Shut Down". Obviously installing Hyper-V does not change the current settings. I selected "Shut Down" and set the Critical Battery Level to 50% to make sure that the O/S has plenty of time to shut down all the VMs. I also set the Low Battery Level to 75% and the Low Battery Notification to "On" so that I can see when (and if) the server detects a power failure.
Since uninstalling the APC software and allowing Windows to manage its power requirements directly I've had no Event Log warnings and the power icon in the system tray seems to work, as a quick shutoff of the mains feed to the UPS demonstrated. Of course, where the APC software and the Open Source apcupsd service have an advantage is that they can restart the server when power is restored. And without the APC software I can't monitor the UPS, or configure the EEPROM settings inside it (although apcupsd provides a utility that can do this). So before I uninstalled the APC software I set up the UPS to do a shutdown only (not turn off) and allow 15 minutes for the server to shut down when the low battery warning occurs.
I also configured the UPS to turn on the power again when the charge reaches 60% after a power failure, and the server BIOS is configured to auto-start when power is restored. Therefore it should, in theory, all work by powering up the server again automatically. The real test was a few days later when the electrician arrived to rewire the kitchen as part of our ongoing modernization plan. Unfortunately, while it kind of worked, there are some issues.
The server did shut down, and restart again. But examining the Event Logs after the restart revealed that, despite the Power settings in Windows Server being set to notify when the battery charge drops to 75%, there was no matching Event Log message. Maybe the warning just pops up in the notification area of the screen. But the Event Log messages did indicate that the server correctly shut down, and restarted with no unexpected errors.
Things were different with the VMs, however. I had configured a combination of different settings in Hyper-V Manager to experiment with the behavior. One VM was set to "Turn Off" and restart if previously running, one was set to "Save" and restart if previously running, and one was set to "Shut Down" and restart if previously running. The fourth was set to "Save" and always restart. Hyper-V Manager revealed that they had all started automatically, so that's OK. The "Turn Off", "Save", and "Shut Down" actions when the host server shuts down all work as expected and allow for automatic restart if previously running.
The problem was that the Event Logs in all of the VMs indicated that they had all shut down unexpectedly. There was the System log message saying just this, and the Critical system error message to confirm it on every one. While the host server had shut down correctly, it seems that the VMs had not.
When you shut down the server manually this doesn't happen, so it must be that the shutdown initiated by the battery power management system does something different from the "Shut Down" command on the Start menu. I wondered if it was just that the UPS had switched off the power to the host server before it had a chance to shut down, turn off, or save the VMs, but the fact that the host server had shut down properly without error seems to indicate this isn't the case.
From the times recorded in the host server and VM Event Logs and by my NAS (which also logs power failure events), it seems that the shutdown occurred only 30 minutes after the power failure, whereas the UPS reckons it has more than 90 minutes of battery life. So it does look like the shutdown occurred when Windows power management detected only 50% battery life remaining.
Does the UPS send some signal to the Windows power management system that initiates a shutdown? Or perhaps Windows power management sends a signal to the UPS to hibernate until the power is restored? Or maybe it's just that there's some setting hidden away somewhere that I haven't found yet...
I love those scenes in nature documentaries where they deploy a remote camera, and the local wildlife takes an interest in it so you get wonderful close-up shots of inquisitive animals. A while ago my wife persuaded me that we should get one to use in the small copse of trees next to our house.
There’s certainly plenty of evidence of night-time activity because the local wildlife population has succeeded in producing several clearly defined pathways through the trees. No doubt, in part, it's down to the selection of foxes that come to visit every evening, helping to dispose of leftovers from the kitchen and the food that some days our two finicky cats decline to even sniff.
However, we’ve also heard interesting night-time rumblings and cries from what we assumed were badgers, plus occasional visits from a bad-tempered squirrel that chases the cats and steals the birds’ sunflower hearts from the feeder. There’s even been several reports locally of a large black cat-like creature that may have escaped from captivity, though this is probably an urban myth that you hear in every area of the country. But you never know!
So, after a few months, did we catch any views of the passing wildlife? Here’s a selection of the results:
OK, so it’s not as dramatic as those people who put pictures on their blog of bears scavenging from their dustbin, or roe deer eating their geraniums, but it’s nice to know that we do get a regular procession of wildlife passing through. Even if most of it is ours and all the neighbors’ cats.
The camera is a ScoutGuard 550, which captures images at 5 Megapixels and can also do video. The only downside is that, at night, it takes a couple of seconds to switch on the infrared LEDs and take a picture when it detects motion, so you do get a lot of pictures of tails...
It's been four months since I moved all my websites to the Windows Azure Web Sites platform, so how's it working out? Very well so far is pretty much all I can say, because there's been nothing in terms of operational activity to report. A welcome change after all the fuss and effort of running the same sites on my own web server. And, so far, the bill has been zero. What you might call excellent value for money!
I monitor the sites using my own Server Monitor utility (available here), and they consistently show a minimum of 99.9% availability - even 100% for a lot of the time. Very occasionally the hourly automated FTP upload for the local weather site fails, but that's perhaps only once every couple of weeks.
Access and response times do, however, vary. As I've remarked before, the initial startup time when the sites have been idle for a while, and hence are no longer loaded on the shared web server, can be a bit more than I'd wish for. It's not uncommon to see a five-second or longer delay on the first hit for the most complex site. However, subsequent requests return startlingly quickly from the North Europe datacenter where the sites are hosted, and people I've spoken to who use the sites have remarked how fast they seem to be.
I haven't needed to do much in the way of updates to the sites, and what bits I have done have been easy using Web Matrix. A recent neat addition to Windows Azure Web Sites is the Web Matrix icon on the lower ribbon of a website in the Management Portal that installs the latest version of Web Matrix, and launches it with a connection to the hosted site files.
The sites use Windows Azure SQL Database, and I've backed up the databases using the Import/Export tool within the Management Portal. The first time I used this it was really fiddly and annoying, with poor documentation and an unintuitive interface that resulted in several attempts with different subsets of values for the storage account and other parameters. However, the latest update to this tool makes it really easy. Just click Export in the lower ribbon and it automatically selects the storage account and other information required. You can often just select a blob container and enter your database password. Best of all, it can even create a new blob container if you haven't already done this.
And it's reassuring to see that the Azure team has as much trouble keeping up with the changes to the features and the portal UI as we do here in p&p. At the time of writing their online docs for using the Import/Export feature still described the previous version...
I discovered this week why builders always have a tube of Superglue in their pockets, and how daft our method of heating houses here in the UK is. It's all connected with the melee of activity that's just starting to take over our lives at chez Derbyshire as we finally get round to all of the modernization tasks that we've been putting off for the last few years.
I assume that builders don't generally glue many things together when building a house - or at least not using Superglue. More likely it's that "No More Nails" adhesive that sticks anything to anything, or just big nails and screws. However, the source of my alternative adhesive information was - rather surprisingly - a nurse at the Accident and Emergency department of our local hospital.
While doing the annual autumn tidying of our back garden I managed to tumble over and poke a large hole in my hand on a nearby fence post. As I'm typically accident prone, this wasn't an unexpected event, but this time the usual remedy of antiseptic and a big plaster dressing didn’t stop the bleeding so I reluctantly decided on a trip to A&E.
Being a Sunday I expected to be waiting for hours. However, within ten minutes I was explaining to the nurse what I'd done, and trying to look brave. Last time I did something similar, a great many years ago and involving a very sharp Stanley knife, I ended up with several stitches in my hand. However, this time she simply sprayed it with some magic aerosol and then lathered it with Superglue. Not what I expected.
But, as she patiently explained, they use this method now for almost all lacerations and surgery. It's quicker, easier, keeps the wound clean and dry, heals more quickly, and leaves less of a scar than stitches. She told me that builders and other tradesman often use the same technique themselves. Obviously I'll need to buy a couple of tubes for future use.
Meanwhile, back at the hive of building activity and just as the decorator has started painting the stairs, I discover that the central heating isn't working. For the third time in twelve years the motorized valve that controls the water flow to the radiators has broken. Another expensive tradesman visit to fix that, including all the palaver of draining the system, refilling it, and then patiently bleeding it to clear the airlocks.
Of course, two of the radiators are in the wrong place for the new kitchen and bathroom, so they need to be moved. Two days later I've got a different plumber here draining the system again, poking pipes into hollow walls, setting off the smoke alarms while soldering them together, randomly swearing, and then refilling and bleeding the system again.
But what on earth are we doing with all this pumping hot water around through big ugly lumps of metal screwed to the walls anyway? Isn’t it time we adopted the U.S. approach of blowing hot air to where it's needed from a furnace in the basement? When you see the mass of wires, pipes, valves, and other stuff needed for our traditional central heating systems you have to wonder.
Mind you, on top of all the expense, the worst thing is the lump on my arm the size of a tennis ball where the nurse decided I needed a Tetanus shot...
It seems that, contrary to expectations, the few remaining record stores still selling old-fashioned vinyl LPs and singles are flourishing here in England. At first you might think this is only because old codgers like me just have an old record player, and we spend our days looking for rare copies of original Rolling Stones and Black Sabbath albums. However, according to a recent item in the newspaper, young people are now discovering the joys of thirty-three-and-a-third as well.
Quite what young people are actually looking for wasn't made clear in the report. I doubt that much modern music is released on LP these days, but there are plenty of kids that are into classic rock, and even the older blues music of my teenage years. And there's always the argument that music sounds better in its original analogue form, with a more flowing and a warmer tone than the digital equivalent. Plus, as most people will have discovered, all those promises we were given years ago that CDs are almost indestructible have proved to be somewhat less than wholly fulfilled. I've got plenty that skip or where the music breaks up.
However, the sheer ease of use and availability of digitally stored music, especially in a house full of computers like ours, means that we're never likely to go back to vinyl. In fact, I did make a start digitizing my collection of old LPs at one time. I bought a good quality pre-amp for my old record deck, and installed a selection of software for capturing digital streams. Then some more to break it into separate tracks, level the volume a little, remove the pops and scratches, and de-hiss it.
Of course, every LP took hours to digitize. You forget how easy ripping a CD at 24x speed is; you can't do that with an LP. Instead it's an hour of recording, another hour of splitting the tracks and cleaning it up to get rid of noise, and then twenty minutes typing in the track names and details. Compare that to ripping a CD in three minutes, and having all the details filled in automatically by a remote music search provider.
But after what seemed like a week digitizing some of my rarer albums and singles, I discovered that I can buy most of the less rare ones on CD at silly low prices. I probably paid about two hundred pounds in the equivalent of today's money to buy the Rolling Stones album "12 x 5" when it first came out, but I can buy a brand new remastered copy on CD from Amazon today for five pounds. And King Crimson's "In The Court Of The Crimson King" for eight pounds. Including post and packing.
Even better, I can play the CDs in my car as well as ripping them to my server for listening in the house or on an MP3 player (hopefully this is legal - if not, I didn't do it. Honest). And, with my deteriorated sense of hearing, I doubt that I could tell the difference between vinyl and digital anyway. In the end I just replaced most of the LPs with new CDs, even getting the bonus of extra tracks on some. Meanwhile the old LPs are carefully stored away in the attic just in case, someday, they regain a value comparable with their original price.
So maybe digital is better overall for your music collection. But what about for broadcast? I've written endlessly about the problems here in the UK with the change from analogue to digital TV (DVB). We never had a problem getting the old five analogue channels, but dragging enough digits out of the sky to assemble a TV picture, even with a huge multi-element high-gain contraption ten feet above the chimney top for the birds to roost on, seems impossible some days. They call the effect of the picture breaking up into large lumps "blocking". I call it "rubbish".
And now they want to turn off the analogue FM signal in 2015 and force us all to use digital radios (DAB). Some chance. I have an eight foot antenna in the attic but all I can get is three digital channels and a dozen more hissy ones that might be playing music – but who could tell? It might as well be an endless tape of the sea and whales singing for all of the sense it makes.
And can you really see this working in a car? It would be back to the days of CB radio where we all had huge whippy aerials stuck on the car roof, and it would probably only pick up a signal when you were at the top of a hill. Thankfully they can't do it until 50% of the population is already on DAB, and that's not going to happen any time soon.
Mind you, I did hear about a guy standing at the side of the road on a sharp corner who suffered severe injuries when a vehicle with a huge whippy aerial went round the corner at high speed. The emergency medic said it was the worst case of van aerial disease he'd ever seen...