Random Disconnected Diatribes of a p&p Documentation Engineer
It seems like everything is "HD" these days. Your laptop has a "high density" plastic case, your car has a "heavy duty" battery, and your TV has a "high definition" picture. So I guess it's only to be expected that your data analysis tools will be "highly distributed". And at last we're "happily done" with our guide to HDInsight.
Yep, after several months fighting with Microsoft's Big Data solution we've shipped the first version of our guide to Windows Azure HDInsight, based on the current preview release. It's been one of the most troublesome guides in terms of figuring out the structure and the content boundaries; and interfacing with the exciting world of open source technologies (as I ruminated about just a few weeks ago). But we got it all together in the end.
Our guide contains the obligatory "What is Big Data" section, and describes how HDInsight integrates with the rest of the Microsoft data platform. Then there's a chapter about loading data, one about performing queries and transformations, and one about consuming the data. From there on in, the guide discusses automating the whole data analysis process, plus the usual management and monitoring topics.
What's likely to be of most interest to developers, however, are the scenario chapters and associated code examples that show how you can use HDInsight in four distinct ways: as an experimental platform for investigating interesting data, as an extract/transform/load (ETL) mechanism for data validation and cleansing, as a data warehouse that you can turn on and off on demand, and as a data source for your existing enterprise business intelligence (BI) systems.
Here's our tube map for the guide:
Unlike most other books and guides, we've concentrated on integration of HDInsight with your existing business processes, and combining it with data analysis and visualization tools such as Power View and GeoFlow, as part of an end-to-end solution. Yes, it's Microsoft-centric - but, hey, that's who pays my wages...
So I considered naming this week's post "Climb Up On My Roof Sunny Boy", but I'd probably get sued by Al Jolson's estate. Though, as we're now officially inverted here at chez Derbyshire, it did seem appropriate. What I wasn't expecting was the domestic strife that is a direct result. They don't tell you about that in the glossy brochures...
The inversion we've encountered is, of course, a direct result of the team of local guys who came and nailed solar panels all over our roof and installed a box the size of my huge Dell workstation in the garage to invert the volts from the panels into real electricity. I did ask the guy if it meant they come out upside down, but he didn't laugh. I suppose he's heard that too many times before. But why don't they just call it a "converter"?
At first it seemed like a great idea. The little red light on the generation meter was flashing like mad and I could envision the pound notes rolling in. Until the RCD earth leakage thingy tripped out. Regularly. Turns out that, along with the seven 1000W UPSs and who knows what else in the way of technological magic there is plugged into the house circuit, the additional leakage from the inverter was too much for the mandatory RCD.
However, after they came back and moved the inverter and my server cabinet to separate consumer units (it really should have been done like that from the start), we're up and running again. It's just a shame that the sun hasn't come out since. If we have a miserable summer this year, you can blame me.
And then I discovered the next problem with going photo-voltaically green. I suggested to my wife that she should use the tumble drier only when it was really hot and sunny so it used the free electricity, but on days like that she wants to put the wet laundry outside on the washing line. When it's raining, and there's no free electricity, she uses the tumble drier. And you can imagine the response when I suggested she stay in and do the ironing on nice warm days, then go shopping when it's snowing.
Still, it's nice to know that for part of the day all my computers are costing me nothing to run; and it even diverts any spare volts we generate to the immersion heater to reduce the amount of gas we use to provide hot water. With the current rate of climate change supposedly due to global warming, I should easily recoup my investment by 2050.
But having a new technology device available to play with is, of course, too much for a computer geek to ignore. Having discovered that the inverter exposes its generation log data over Bluetooth, it was inevitable that I'd need to find a way to add the data to my weather website. And I can justify all this effort by the fact that I don't need to spend a couple of hundred pounds buying a rather boring-looking remote monitor device to see what's going on.
So out came Visual Studio and, after a couple of days, I have an automated system for generating data and charts; and I can expose the data over my internal network and publish the charts on the web. If you're interested, you can see them here. And if, by some remote chance you have an SMA Sunny Boy inverter (or you're prepared to modify the code to suit your inverter) you can download the utility and the .NET source code here.
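For the curious, the arithmetic behind turning logged power readings into an energy figure for a chart can be sketched in a few lines of Python. This is not the utility mentioned above, and it does not talk to an inverter; the sample readings and the (timestamp, watts) format are invented purely for illustration, and a real generation log will almost certainly look different.

```python
from datetime import datetime

# Hypothetical log format: (timestamp, instantaneous power in watts).
# These values are made up; they are not real inverter output.
readings = [
    (datetime(2013, 6, 1, 12, 0), 1000.0),
    (datetime(2013, 6, 1, 12, 30), 2000.0),
    (datetime(2013, 6, 1, 13, 0), 1000.0),
]

def energy_kwh(samples):
    """Integrate power over time (trapezoidal rule) and return kWh."""
    total_wh = 0.0
    # Pair each reading with the next one and add the area between them.
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        total_wh += (p0 + p1) / 2.0 * hours
    return total_wh / 1000.0

print(f"{energy_kwh(readings):.2f} kWh")  # prints "1.50 kWh"
```

The trapezoidal rule is just one reasonable choice here; if the log already records cumulative energy rather than instantaneous power, a simple subtraction would do instead.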
What I need to discover now is if, on a typical English summer day, we're actually generating enough electricity to run the inverter and the computer that monitors it...
At a meeting of our remote workers group the other day I noticed how competitive people are about how far they are away from their Microsoft office. It's almost like you get a prize for being the remotest (hopefully measured by location and not character trait). What set me thinking was how the different people measured their remote distance.
For example, early contributors to the discussion expressed the distance in miles. Starting at about fifty the figure steadily increased until one person decided that their remote distance was "three hours". Though they did then qualify it with "180 miles" and mention that this was driving time in a car. I suppose if you don't own a car, three hours could be 40 miles on a bus (based on the journey times and regular stops in my neck of the woods), or 12 miles walking. So hours doesn't seem to be much of a useful distance measurement scale.
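To see why hours make such a slippery unit, here is a small Python sketch that works out the average speeds implied by the three figures above and converts a journey time back into miles. The mode names and the helper function are only illustrative.

```python
# Distances from the paragraph above: each mode covers this many miles in three hours.
MILES_IN_THREE_HOURS = {"car": 180, "bus": 40, "walking": 12}

def miles_for(hours, mode):
    """Distance covered in `hours` at the average speed implied for `mode`."""
    speed_mph = MILES_IN_THREE_HOURS[mode] / 3
    return hours * speed_mph

for mode in MILES_IN_THREE_HOURS:
    print(f"3 hours by {mode}: {miles_for(3, mode):.0f} miles")
```

The same "three hours" comes out as 180, 40, or 12 miles depending on nothing more than how you travel.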
In fact it reminded me of a conversation I had some while ago (in my pre-Softie days) with a nice lady from Digital River in Minneapolis. At the time they were selling my software, and the nice lady had just been appointed as my marketing representative. She suggested I should call and see them when I was nearby and I happily agreed, even though I had no idea at the time where in the US the state of Minnesota is. But I often passed through Chicago airport in those days, and when I mentioned this she helpfully suggested that it was only a couple of hours from their office. "Great," I said, "Next time I'm over I'll hire a car and call in." To which she replied "Err, no, that's two hours by 'plane..."
So now I have another measure of distance - how far away somewhere is when flying there. Here in Ye Olde England a distance of two hours by air covers something like nine countries. Being a remote worker where there are a couple of whole countries between you and the office surely wins some kind of prize.
But getting back to the discussion at our meeting, I decided not to upset the rest of the group by mentioning that I'm 4,791 miles away from my office as the crow flies. Or, by plane via Amsterdam (my usual route) 5,935 miles away. And the trip, end to end, takes about 23 hours. But if I said I was 23 hours away, somebody would just suggest that I went the other way round the world because then it would take only an hour.
Or, if I was walking, it would only be four miles...
It's easy to imagine that the computer is a recent invention. A search of the web reveals a host of machines claiming to be the first electronic computer, and all are mid-20th century. However, what's harder to determine is the first appearance of a truly programmable computer. After watching a fascinating TV documentary this week, it seems that amongst the first was a model of a small boy writing on paper with a quill pen. And it was built more than two hundred years ago.
The automaton named The Writer was built by a family watchmaker business in Switzerland in 1774, and is just one of a series of clockwork-powered machines created around the world during that period. The Writer sits at a desk, dips a quill pen into a dish of ink, and then slowly and delicately writes beautiful cursive text onto a pad of paper on the desk. You can see a full description and pictures on the IW Magazine website.
OK, so clockwork automatons and toys had already been around for ages at that time. What struck me about The Writer is that the mechanism uses a large wheel containing details of the letters that will be written. But the segments of the wheel are removable and replaceable, so they can be changed at any time to write completely different text. It's effectively a stored program computer that converts a set of instructions into some recognizable output.
Yes, you can argue that it's a fairly simple transformation from program to output; there is no intermediate processing as such. Each interchangeable segment of the control wheel simply defines the set of movements of the automaton's hand. It's not a general purpose computer either - you can't tell it to play chess, or calculate the trajectory of a cannonball. But it's a fascinating stage in the development of adaptable stored program machines.
Mind you, the documentary also showed an automaton called The Turk, built in 1770, that supposedly could play chess - and could even beat the most skillful players of the time. This completely astounded those who saw it in action, and certainly would have been an incredible achievement if it hadn't turned out to be powered by a real human chess player hidden inside.
The documentary was produced by the BBC here in the UK, written and presented by the enigmatic Simon Schaffer, and you can even see a clip of The Writer on the program's website page. Yet, in a remarkable contrast to the capabilities of our ancestors, news this week revealed that the BBC has removed the clock showing the current time from all of its website pages.
Why? Because somebody complained that all it did was show the time on the origin server that generated the page, which might not be accurate for the location of the viewer (and presumably, if they left the page open for a few minutes, would be the wrong time anyway). The comment from the official BBC spokesperson was that it would take 100 developer days to change it so that the clock showed the correct time, and this could not be justified.
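The underlying problem is easy enough to sketch. The Python below is only an illustration of the difference between stamping a page with the server's clock and converting one moment in time to the viewer's own offset; it says nothing about how the BBC's pages are actually built, and the function name and the fixed offsets are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

def clock_text(instant_utc, utc_offset_hours):
    """Format a UTC instant as HH:MM in a zone with the given fixed offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return instant_utc.astimezone(zone).strftime("%H:%M")

# One moment in time, as an origin server might record it in UTC.
page_generated = datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc)

print(clock_text(page_generated, 0))   # a server clock running on UTC: 12:00
print(clock_text(page_generated, -5))  # a viewer five hours behind UTC: 07:00
```

Fixed offsets ignore daylight saving changes, and a real site would more likely let the browser do the conversion, which also solves the problem of a page left open for a few minutes.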
It comes as a bit of a shock when you phone your bank with a routine enquiry, only to be told that they're no longer a bank. When I mentioned it to a colleague she asked what they've turned into. Perhaps now they are a greengrocer? Or maybe a pet grooming salon? Have they spent my meagre savings on hair dryers for dogs? Or cabbages and carrots?
Fortunately, the change is only due to the consolidation and reorganization of the several banks that are now owned by the government (or, to be more accurate, by us taxpayers) and my bank is continuing to operate but not accepting any new business. However, it is not accepting any changes to existing accounts either, and their advice was (amazingly) "I suggest you move to another bank!" So I have done.
Of course, the problem is that our lives are now so complicated by direct payments, standing orders, bank credits, and other bank-related puffery that you'd assume switching to another bank would be a complete nightmare. Even though they say they have all these clever systems to move everything over automatically. But, hey ho, what else can I do?
So I chose a bank that's UK-based, part of a major banking group, and has top ratings for customer service. And that doesn't charge an arm and a leg every month for "account maintenance," or for "valuable additional services" that I don't need. And when I phoned their customer service department to ask a few questions it was answered immediately by a real person instead of the expected "press 1 to be annoyed and listen to inane music" message.
Even setting up an account was easy and quick online, and for a while it looked like a wonderful financial honeymoon was on the cards! Until it came to setting up some additional services, where the system seems incapable of selecting the appropriate one of the two joint account holders. So I log off and apply for the additional service as a new customer, but now it says I can't do that either because I'm already a customer.
No problem, just call the nice man on the free-phone number and ask him what to do. That's when you discover that (a) it's not a free-phone number once you become a customer, and (b) on the number you have to use now they do have a "press 1 to be annoyed and listen to inane music" message. So I'll submit my question over their secure messaging system instead. Except that it won't let me tell them which account I want to talk about because "the application is pending".
Maybe I just expect too much from technology...
So my Windows 8 adventure has been terminated after only a brief foray into the delights of the new O/S. It's annual review time here at Microsoft, which means I need to connect to the corporate network. But my Windows 8 machine can't do that because the TPM module is faulty, and I need to have BitLocker enabled before they'll allow me to talk to the big iron in Seattle. So the old hard disk with Vista has seen the light of day again (or, to be more accurate, the dark of inside my laptop) and I'm back in 2006.
I never really noticed how little I need to use the corporate network now that most of the applications I use in my daily work are in the cloud. I use our corporate ADFS for federated authentication, allowing me to access all our working docs stored in the TFS service in the cloud, and to connect to the various third party sites that manage our internal processes. And because my daily work is centered on Windows Azure, all of the working sets we use are available without needing even a sniff of the internal systems on the corporate network.
Even my online storage and email are cloud-powered now, and I'm being urged to make more use of cloud-based systems such as Office 365 in my daily work. It's quite amazing to see how the cloud is creeping, almost unnoticed, into everything connected with our IT world. It's a real vindication of what we've been writing here in p&p about claims-based authentication, moving applications to the cloud, and building enterprise solutions in Windows Azure.
Of course, I still have Windows 8 on my RT tablet, so I'm not completely divorced from 2013. OK, so it can't connect to my email server or the corporate network, but it means I can continue to figure out how to do stuff with the new O/S. Though sometimes I still look like an amateur. I used the rather good camera to take some photos this week to email to a colleague. However, having poked about to find where they end up being stored, and then got one showing full screen, every attempt to share it through Hotmail gave an error that my email wasn't set up correctly.
And then I couldn't figure out how to go back. There's no back button until you poke the screen. Then it took several minutes of wildly experimental prodding and sliding to get my email inbox and the photo showing side by side, and then there seemed no way to drag the photo into an email. Maybe I need to read the instructions. In the end I dropped into the desktop and did it the old fashioned way. I love the style of Windows 8 and the way that you can do lots of things by poking and sliding, but it really doesn't seem intuitive sometimes - or maybe it's just too clever.
Of course, the more I use it the easier all this will be. Except that I discovered a major problem now that summer is almost here. It's pretty much unusable in the conservatory unless I huddle under an overcoat like somebody selling bootleg watches. The reflectivity of the screen means that all I can see is my ugly mug and the sky (complete with clouds, so it looks a bit like the Windows XP desktop). Though I guess this is an issue with all touch-screen devices. I have to go indoors to be able to see the screen on my phone.
What's worrying is that my new company laptop, when it finally arrives, will have a touch screen. Perhaps I'll never see the garden and conservatory again. I'll need to lock myself away in a dark room, or work covered with a sheet like some character from a third rate horror movie. My wife can just lift one edge and slide my meals underneath, and tell visitors that her husband is in the conservatory under a sheet, stored away like some item of old furniture (though maybe that's not so far from the truth).
Can I buy a non-reflective cover for a laptop touch-screen?
It's safe to assume that nobody could accuse me of being an eco-warrior. I buy cars that have more engine than I need, and computers with more power than is required to run any software I might ever use. And I quite happily squander electricity on a waterfall in the pond and lights in the trees, just to make the garden look nice. The trouble is that the electricity company seems to think that I should pay an increasingly exorbitant price for it.
We've all seen those TV programs where people build eco-friendly houses that generate their own electricity, collect and re-use rainwater, and suck heat out of the ground instead of paying for gas. Meanwhile the government is gradually covering the entire countryside with huge windmills and solar farms in an attempt to meet some green target, yet all this free electricity just seems to cost more every month. The electricity company just sent me an estimate of my next year's bill, and they reckon it will be over 1,500 pounds. It's time I figured out how to either use less, or pay less, or even get some for free.
Working on the "no free lunch" principle, you'd guess that the only people who would benefit from the current fad around solar panels on the roof would be the installers and panel manufacturers. However, talking to some neighbors who have taken the plunge, they have seen a considerable reduction in electricity bills and get a payment for feed-in four times a year as well. So it's probably time I took the plunge and turned our south-facing roof into a miniature power station. At least it should generate enough to run my servers for a few months in the summer...
But what really makes you wonder is that, in a country that will supposedly be unable to keep the lights on beyond 2016, we are planning to throw huge amounts of money we don't have at a project that will eat electricity and blight thousands of people's lives. Nobody can produce a realistic rationalization for it, yet it will probably cost the best part of a hundred billion pounds by the time it's done.
At a time when the world is moving towards driverless cars, all-encompassing digital communication, increasing home working and localization, and the need to lead a green lifestyle, we're going to build a high-speed railway to connect the North and the Midlands with London. Yet we can't afford to build a high-speed broadband fixed and mobile system that would cost a tiny fraction of that amount.
OK, so I'm lucky because we happen to have cable here, but half a mile away our local post office is struggling to use modern retail technology over a 1MB ADSL line. My ADSL provider just sent me a beautiful color leaflet explaining how I will soon be able to watch loads of new sports channels over their Internet connection. At the bottom in very small writing it says that all I need is a 2MB ADSL line. Yes please, when can I have one? What's that? You're planning to have fibre-to-the-cabinet installed sometime in 2015? Super.
In the meantime I suppose I can just go to London on the high-speed train and watch it live instead...
I suppose it's a bit like that film The Matrix - you realize that you live in an ethereal and closed world only when you actually get to step outside of it. Or, like some people who have never been to another country, your view of the rest of the world is shaped just by what you see on TV. I guess I've been like that with open source stuff, and particularly Java; looking out incredulously from my little village of Microsoft technologies and products at the wide world beyond.
I started my computing days with what we euphemistically called a "home computer" (basically a games console with a keyboard), and progressed via a series of Amstrad computers to real PCs. At the time I was doing statistical and reporting work for a large manufacturing company, and had played with several MS-DOS-based databases until I finally found nirvana with Windows 3.1 and Microsoft Access 1.0.
OK, so I'd learned a lot of computing theory in the meantime (such as a mixture of languages and programming theory), and I'd even written and sold technical, commercial, and business software. But most of it, especially after drifting deeper into the Windows way, was aimed at Microsoft operating systems and integration with Microsoft products. Gradually I'd been drawn in and captured by the Redmond magic.
It's only since I began work on our current HDInsight project that I've had to navigate deeper into the dark and scary jungles of open source; slashing away at the undergrowth of bewildering terminology with my virtual machete (Bing); wading knee-deep through murky and meandering streams of sometimes conflicting advice and guidance; and peering in amazement at the vast array of previously undiscovered wonders of nature such as ants, pigs, hives, zebra, and even a strange yellow elephant.
Sitting inside my nice comfy and well-defined Microsoft technology world, it's unnerving to realize that until recently I never knew that all of this even existed. OK, so I've had dealings with Java, though mainly only under duress, and I've read about and even learnt a bit of other languages and frameworks such as Python and Ruby. In fact my first serious attempts at creating Windows DLLs were with Pascal (mainly because I could never get my head round C++). So Java code itself isn't really an issue.
No, where it all gets complicated is that almost all of the docs I read about Hadoop and the associated technologies are written by experts for experts. It seems like you need to know all about a whole range of topics and technologies before you can start learning about them. It's a bit like letting someone watch a medical drama series on TV and then giving them a scalpel, an operating theatre, and some (currently) live patients to practice on.
For example, I read endless articles about testing and debugging. It seems I should start by mocking out my objects (makes sense) and use a test runner to execute them within a single node local installation of Hadoop and with the Java virtual machine in advanced debug mode (I think). So I read about PowerMock, but it says I should use it with Mockito, but that says it's an extension of EasyMock. And I probably want to do it all from within Eclipse.
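Stripped of the tooling, the basic idea of "mocking out my objects" is simple enough to sketch. The example below uses Python's standard unittest.mock module rather than the Java tools named above, purely to keep it self-contained. The word_count_map function and the context object with its write method are hypothetical stand-ins for a mapper and its output collector, not the Hadoop API.

```python
from unittest.mock import Mock, call

def word_count_map(line, context):
    """Hypothetical mapper: emit (word, 1) for each word via context.write."""
    for word in line.split():
        context.write(word.lower(), 1)

# The mock stands in for the real output collector, so no cluster is needed.
context = Mock()
word_count_map("Big data big", context)

# Check that the mapper emitted exactly the expected pairs, in order.
context.write.assert_has_calls([call("big", 1), call("data", 1), call("big", 1)])
print(context.write.call_count)  # prints 3
```

The mapper never finds out that it is writing to a fake, which is the whole point: the logic gets tested on its own, and the cluster only comes into the picture later.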
And to set up a local cluster I need to install Hadoop directly, although I have the HDInsight single node development environment already installed on my laptop. Can I use that instead? And if I want to run the Java VM in debug mode I probably want to use another program called Ant to configure it. I'm sure I'll work all this out in time, though at the moment I'm wondering if it's going to be easier just to phone a friend (or ask the audience).
What's clear, however, is just how adaptable, configurable, and interoperable all this stuff is. I suppose I'm used to a strict product hierarchy and road map, nice dialog boxes and configuration web pages, and reams of beginner documentation for the Microsoft technologies that make up the confines of my own little world. Now that I've strayed outside of the Microsoft Matrix, and discovered a whole wide world out there, I can't find a map. I'll probably spend the rest of my writing days blundering through the undergrowth, peering hopefully at every new edifice of civilization I can find, until - hopefully - it all starts to fall into place. Or until some kind soul (aka Program Manager) taps me on the shoulder and mutters the equivalent of "Dr. Livingstone I presume."
Yes, the adventure is fun at the moment, but I'm quite looking forward to going home again...