Random Disconnected Diatribes of a p&p Documentation Engineer
So it's true. Senility has obviously settled in and my addled brain can no longer maintain even the simplest items of information such as a two-fingered keyboard combination. It seems that in future I'll be wandering aimlessly around my server room dribbling helplessly onto the network switches, muttering profanities in response to the strange symbols appearing on the monitors, and talking into the mouse.
What's brought me to this late stage of realization? Could it be because my habitual dabs at AltGr (the right-hand Alt key) and Delete no longer bring up the login page in my Hyper-V hosted machines? For some weeks I've been confounded by the fact my ailing brain seemed to remember that this always worked before, but now it doesn't. Even poking around in Hyper-V Manager and the properties of the VMs didn't reveal anything useful.
In fact, things got so bad that I actually had to look up the Hyper-V key combinations on TechNet after I got fed up restoring down the VM's window and clicking the Ctrl-Alt-Delete icon in the top menu bar. It seems that what you need is Ctrl-Alt-End, but how could I have forgotten that when most days I'm administering the servers?
However, after some Binging it turns out that I might have a few more months before I finally turn into a doddering and disoriented wreck. According to the Virtual PC and Virtual Server help pages, the equivalent of Ctrl-Alt-Delete in a virtual machine is HOSTKEY and Delete. Of course, it took ages more to find out that the default HOSTKEY is AltGr (you can change it), which was obviously maintained in Hyper-V. Probably so that the world's systems admins wouldn't all decide to retire in the same week.
As far as I can tell, some recent update must have removed this backwards compatibility - though I can't find any mention of it on the web. Maybe it's just me...? Did I break something...?
You'd think that, after all the years I've been writing guidance for Microsoft technologies and tools, I'd have at least grasped how to organize the structure of a guide ready to start pouring content into it. But, just as we're getting into our stride on the Windows Azure HDInsight project here at p&p, it turns out that Big Data = big problem.
Let me explain. When I worked on the Enterprise Library projects, it was easy to find the basic structure for the guides. The main subdivision is around the individual application blocks, and for each one it seems obvious that all you need to do is break it down into the typical scenarios, the solutions for each one, the practical implementation details, and a guide to good practices.
In the more recent guide for migrating existing applications to Windows Azure (see Moving Applications to the Cloud) we identified the typical stages for moving each part of the application to the cloud (virtual machines, hosted services, cloud database, federated authentication, Windows Azure storage, etc.) and built an example for each stage. So the obvious subdivision for the guide was these migration stages. Again, for each one, we document the typical scenarios, the solutions for each one, the practical implementation details, and a guide to good practices.
In the cases of our other Windows Azure cloud guides (Developing Multi-tenant Applications and Building Hybrid Applications) we designed and built a full reference implementation (RI) sample that showcases the technologies and services we want to cover. So it made sense to subdivide the guides around the separate elements of the technologies we are demonstrating - the user interface, the data model, the security mechanism, the communication patterns, deployment and administration, etc.
But none of these approaches seems to work for Big Data and HDInsight. At first I thought I'd just lost the knack of seeing an obvious structure appear as I investigated the technology. I couldn't figure out why there seemed to be no instantly recognizable subdivisions on which to build the chapter and content structure. And, of course, I wasn't alone in struggling to see where to go. The developers on the team were suddenly faced with a situation where they couldn't provide the usual type of samples or RI (or, to use the awful marketing terminology, "an F5 experience").
The guidance structure problem, once we finally recognized it, arises because Big Data is one of those topics that - unlike designing and building an application - doesn't have an underlying linear form. Yes, there is a lifecycle - though I hesitate to use the term "ALM" because what most Big Data users do, and what we want to document, is not actually building an application. It's more about getting the most from a humungous mass of tools, frameworks, scenarios, use cases, practices, and techniques. Not to mention politics, and maybe even superstition.
So do we subdivide the guide based on the ethereal lifecycle stages? After collecting feedback from experts and advisors it looks as though nobody can actually agree what these stages are, or what order you would do them in even if you did know what they are. The only thing they seem to agree on is that there really isn't anything concrete you can put into a "boxes-and-arrows" Visio diagram.
What about subdividing the guide on the individual parts of the overall technology? Perhaps a chapter on Hive, one on custom Map/Reduce component theory and design, one on configuring the cluster and measuring performance, and one on visualizing the results. But then we could easily end up with an implementation guide and documentation of the features, rather than a guide that helps you to understand the technology and make the right choices for your own scenario.
Another approach might be to subdivide the guide across the actual use cases for Big Data solutions. We spent quite some time trying to identify all of these and then categorize them into groups, but by the time we'd got past fifteen (and more were still appearing) it seemed like the wrong approach as well. Perhaps what's really big about Big Data is the amount of paper you need to keep scrawling a variety of topic trees and ever-changing content lists.
What becomes increasingly clear is that you need to keep coming back to thinking about what the readers actually want to know, and how best you can present this as a series of topics that flow naturally and build on each other. In most previous guides we could take some obvious subdivision of content and use it to define the separate chapters, then define a series of flowing topics within each chapter. But with the whole dollop of stuff that is Big Data, the "establishing a topic flow" thing needs to be done at the top level rather than at individual chapter level. Once we figured that out, all the other sections fell naturally into place in the appropriate chapters.
So where did we actually end up after all this mental gyration? At the moment we're going with a progression of topics based on "What is it and what does it do?", "Where and why would I use it?", "What decisions must I make first?", "OK, so basically how do I do this?", and "So now I know how to use it, how does it fit in with my business?" Then we'll have four or five chapters that walk through implementing different scenarios for Big Data such as simple querying and reporting, sentiment analysis, trend prediction, and handling streaming data. Plus some Hands-on Labs and maybe a couple of appendices describing the tools and the Map/Reduce patterns.
Though that's only today's plan...
So here's a question: why aren't our European masters hounding a certain well-known company to stop them installing unwanted software on our computers? Every time a hole in the Flash plugin is fixed they insist on fiddling with people's computers in a way that, if not actually illegal, seems to cause some users no end of hassle. If Microsoft included an update in every Patch Tuesday that changed the user's default web browser to Internet Explorer, I'm sure there would be a huge outcry.
I mean, here in the People's Republic of Europe our faceless and unaccountable despots insist that I put my company's registration number in every email message I send, apply for a license before I can save somebody's email address in a database, and I even have to ask visitors to my website if they mind me sending them a cookie. Yet they do nothing about a company that tricks people into installing browser toolbars, and even whole web browsers.
Yes, it's a rant, and mainly because - yet again - I've had calls from friends and colleagues who have discovered that their computer has "gone funny". One even thought it was a virus, and is now too frightened to use the computer at all. And one call was from a relative whose computer I "fixed" just last month by resetting Internet Explorer as the default browser after the previous Flash player update.
I know you can argue that there's a checkbox you can un-tick if you don't want your system interfered with, but most inexperienced users won't dare do that in case they "break the computer" - as an industry we regularly impress on users that they should not fiddle with settings unless they know what they are doing.
And, yes, you could argue that the option is clearly shown with a description of what it does. But why is it set by default? If I want a new web browser, then surely I should have to say yes - rather than forgetting (or being too frightened of breaking something) to say no. If your local supermarket required you to tell them every time you didn't want some extra items automatically added to your shopping basket, you'd soon be writing to the local newspaper to complain. So at least try and persuade me to tick "yes" by telling me how wonderful the new browser is, rather than hoping I won't notice you decided "yes" was the default.
But I suppose that, if you want to win the browser wars, maybe one way is to pay some other company to surreptitiously install it on everyone's computer as part of a routine update...
Sometimes I think I'm the only person who takes Wi-Fi security seriously. Unlike all of my neighbors, I run my Wi-Fi access point with a hidden SSID so that nobody casually browsing the available networks will be tempted to try and connect to it. I also run it on half power, which is plenty sufficient to reach all round the house and garden without exposing it all along the street.
Of course, I also have it set to use WPA2-PSK, and it has a long and complex non-dictionary password. On top of that I enabled MAC authentication so that only known devices can connect. Yes, I know that most of these features can be cracked by determined attackers but all the good books say that defence in depth is the best approach, and the more layers of protection I have enabled the less the risk.
Should I actually worry about anybody connecting to my internal network through Wi-Fi? There are several other computers and devices on the internal network, although they are all secured with user names and passwords different from the wireless router credentials, and all sensitive folders and shares are locked down to the network admin account. But I really don't fancy having somebody I don't know wandering around my network.
Plus, anyone who did connect could get out onto the Internet through my proxy server, absorbing my bandwidth and exposing me to the risk of action if they do anything illegal over my connection. And I have to pay for my bandwidth, so why should I let other people soak it up browsing Facebook, playing games, and viewing doubtful content?
So it seems like my security approach is sensible. Unfortunately, Google doesn't agree. I recently bought my wife a Google Nexus 7 tablet so that she can soak up my expensive bandwidth browsing Facebook, playing games, and viewing pictures of cats. All the reviews I read said it's really easy to set up - you just choose your locale and your network connection, enter your Google account details, and (as we say over here, though I don't know why) "Bob's your uncle."
Yeah, you reckon? At step two you have to choose an existing wireless network and connect to it, or select "Add a network" if you use a hidden SSID. That's fine, but if I don't enter the MAC address of the device into the wireless router's configuration I can't connect. At this point the screen just says "Not in range" and you can't do anything about it.
Usually, when setting up any other computer, I skip the network setup and then go into the device information page to find the MAC address (that's what I had to do with our HTC Android phones). But Android on a tablet is obviously paranoid about not being able to talk to its Do No Evil home because there's no option to set up a network later. I guess they think that nobody would ever dream of using a tablet (where you can read books, watch videos, and listen to music) if there's no Internet connection.
And just to make matters worse, when you set up a new connection and don't get it exactly correct (such as the wrong letter case in the SSID, or an incorrect password) you can't edit it. The only options are "Connect" and "Forget It" - you have to remove the connection and then start all over again. And the dialog quite happily closes without saving the settings or warning you they'll be lost if your finger wavers a little on the onscreen keyboard.
So the only remedy to finish the setup seemed to be to go into the router's configuration and turn off MAC authentication while the tablet connected. Then, after setup is complete, find the MAC address in the tablet's system information pages, add it to the list in the router, and then turn MAC authentication back on. Assuming, of course, that turning off MAC authentication didn't lose the list of existing permitted addresses (I suggest you take a screenshot or copy them into Notepad first).
Thankfully, on the third attempt, when I finally got everything right in the tablet's connection dialog, my wireless router's configuration page (after I turned MAC authentication off) detected that an unknown device was trying to connect and displayed its MAC address for me to add to the permitted clients list. After that I could turn MAC authentication back on and it worked. So completing the tablet's three-page setup wizard only took the best part of an hour. Including swearing time.
It was only then that I discovered why I had so much trouble with the connection settings dialog - the tablet was suffering from the "phantom keystrokes" issue several other people have encountered (search the web for "nexus 7 phantom typing" for more details). So the next day it was back to the store to swap it for another one. From a different batch. And go through the whole MAC authentication thing again because the MAC address is different.
And now I just need to figure out how to get it to talk to my wife's Exchange Server email account - which is exposed as a service over HTTP by our remote email hosting provider. And convert all the music she indoors wants putting onto it from WMV to MP3 format. Perhaps I'll need to take a holiday and stock up on new swear words before then...
Nobody could accuse me of being posh, and compared to most of the developers I work with here at Microsoft I'm probably not the brightest button in the box. But I did study mathematics in the past, including matrix theory. I just never got to pronounce it right.
It all came rushing back to me as I was watching a presentation about using singular value decomposition (SVD) to identify textual semantic spaces in a Big Data solution. I guess with a title like that I should have known better, but it did sound interesting. And would probably be really useful if I could understand any of it. Mind you, it was the end of a particularly stressful day and I was trying to get some other jobs finished at the same time it was playing on the second screen. Maybe an early start straight after a couple of cups of strong coffee will help next time.
But what struck me was that I remember everyone at school and college pronounced "matrix" with an "a" sound like in "apple", not an "ay" sound like in "say". Perhaps that's where being a rough English Northerner rather than a posh Home Counties softie comes into play. We say "grass" with the same "a as in apple" sound rather than "grahhss".
And even when the movie "The Matrix" came out and everyone called it "The May-tricks", I never thought about the different pronunciation. I suppose because, in my day job, you don't come across many matrices. Though I will accept that if you pronounce the plural as "may-tress-ease" rather than "mat-ress-ease" you're less likely to get confused when visiting a bed shop.
Though it did remind me of the story about an Englishman and an American who were trying to set up a meeting at a mutually convenient time:
Englishman: "You don't pronounce it like that! The proper way is 'shed-yool'"
American: "Really? Is that what they taught you at sshool..."