Random Disconnected Diatribes of a p&p Documentation Engineer
One of the wondrous features of Windows from Vista onwards is Network Location Awareness (NLA). It means that, while the whole family can view your photo collection at home, when you pick up your laptop and wander out for coffee everyone else in Starbucks can't see the holiday pictures of you in embarrassing shorts. Or read the rather sad list of unadventurous things you planned to do before you were thirty (and didn't).
The idea is simple enough. NLA automatically adjusts the behavior of potentially risky settings and options based on the type of network to which the computer is attached. Typically this means turning on network discovery and file and printer sharing for home or private networks, turning them off for public networks, and doing something in between for domain-joined computers. It works by examining a range of settings for each network interface and making a decision on what type of network it actually is connected to. All very clever.
In Vista, you can usually change the setting yourself by clicking the Customize link in the Network and Sharing Center page. However, in domain-joined Windows Server 2008 R2 machines there is no Customize link, as I discovered last week. I'd just finished installing the latest round of updates and everything seemed to be running fine until I noticed that the icon in the taskbar was showing connection to "an unidentified Public network". With a nice picture of a park bench just in case I didn't grasp what "Public" means.
Note to the Microsoft Server team: how about including a tree in the picture so it's more obvious that the computer is now outside? You never know what some people might consider as appropriate lounge furniture.
Yet everything else seemed to be OK. So I did what all amateur administrators do: wandered off to TechNet and asked the question; only to find that lots of people seem to be confounded by the wrong type of network setting in Windows 7 and Windows Server 2008. There's loads of stuff about editing Registry settings and things, but I decided I'd just start out by disabling and re-enabling the network connection. Obviously it would sort itself out automatically. It did - the icon changed to the Domain setting with a pretty picture of an office building and correctly identified the name of the domain network.
What I wasn't prepared for was the welter of Event Log errors. All of a sudden the machine (a domain controller) couldn't find any other machines, couldn't apply group policy updates, couldn't access Active Directory, and couldn't find any DNS servers (even though I have three). After perusing a few more articles, I checked the interface settings (they were all correct) and then restarted the DNS service. And was rewarded with a dozen new DNS errors saying that the interfaces are unavailable.
Now, every five minutes, Group Policy dumps another error in the Event Log. So I open a command window and run gpupdate. It reports success, and confirms it with Event Log messages, but still Group Policy errors appear every five minutes. Together with one saying a certificate cannot be renewed and several saying that DNS can't find its own domain. It's after midnight and I feel like I've been hit by a train. And, of course, I can't just switch NLA back to Public mode again...
So I run netdiag and it reports no errors. Neither does dcdiag. And using the Active Directory console to force replication seems to work fine. Even ntdsutil can't find anything wrong with the roles or topology. So, in the end, I cave in and reboot the machine. Yep, it just came back up without reporting any errors. And it's still running clean and properly (with the correct NLA type) the next day. Why didn't I just do that first...? And how come something as simple as changing the NLA type can have such a stupendous effect on the system?
Perhaps instead of "Keep Calm and Carry On" it should be "Keep Calm and Reboot". Maybe I can have T-shirts made. Though I did like the sign in a tiny teashop in Whitby we called in at during a recent trip to the seaside: "Keep Calm and Eat Cake"...
What's the best way to document an API? It's a question that came up when we were documenting the Enterprise Library 5.0 project a while ago, and has resurfaced recently with another project I unexpectedly found myself attached to. It's also one of those annoying questions that typically offer three dozen wildly varying answers; none of which really appears to provide the optimum result. Yet good documentation of APIs is vital for developers to get the best from the code.
While I'm not actually a developer as such, I do write quite a lot of code. Most of it is examples for others to use and reuse, though sometimes I take my life in my hands and actually write stuff that I run in a production environment. And, inevitably, most of the samples I write are concerned with the newest, undocumented, and often beta technologies. So all I have to work with is Visual Studio Object Browser, IntelliSense, and (if I'm feeling particularly inquisitive) .NET Reflector.
Of course, tools such as Sandcastle and others can generate the HTML docs from the source code automatically, and these will (hopefully) contain meaningful summaries and parameter descriptions written by the original class developer within the source code. So all I need do is provide a brief explanation of any particular intricacies when using the class or class member, and add a short sample of code that shows how that class member works. Surely I can turn out all the required content in a few hours...?
But it's generally not that simple once you start to think about what developers might expect to find when they hit F1 in Visual Studio, or Bing for a class or member reference page. For example:
In an ideal world I would write one or more examples for each member of each class in the API. But should I write samples that use several members of the class that I can reuse in more than one class member page? This sounds like a time-saver, but generally results in a sample that is over-complicated and may even make it harder to understand, or hide some members in the midst of a big block of code.
More to the point, do I actually have the resources available to write specific samples for every member of every class in an API that, when you include member overloads, might have many hundreds of individual pages? Years ago when I was documenting the API for Active Server Pages 1.0 (in the now almost forgotten pre-.NET era), it was easy enough to document the very few members of the five classes that made up ASP 1.0. But even a reasonably small framework such as Enterprise Library 5.0 has more than 1000 pages in the API reference section.
The path we took with Enterprise Library was to avoid writing samples in the API pages, and instead document the key scenarios (both the typical ones and some less common ones) in the main product documentation. This allows us to explain the scenario and show code and relevant details for the classes and class members that accomplish the required tasks. In fact, even getting this far only came about after some reconsideration of our documentation process (see Making a Use Case for Scenarios).
So, if I was documenting the file access classes in System.IO I could spend several months writing different and very short samples for each member of the File, FileInfo, Directory, DirectoryInfo, TextReader, FileStream, Path, and many more classes. Or I could try and write a few meaningful examples that use all the methods of a class and include them in several member pages, though it's hard to see how this would be either meaningful or easy to use as a learning aid. And it's certain to result in unrealistic examples that are very unlikely to be "copy and paste ready".
Instead, perhaps the way forward is to make more use of scenarios? For example, I could decide what the ten main things are that people will do with the File class; and then write specific targeted examples for each one. These examples would, of necessity, make use of several of the members of the class and so I would put them in the main page for the class instead of in the individual class member pages. And each one of these scenario solutions could be a complete example that is "copy and paste ready", or a detailed explanation and smaller code examples if that better suits the scenario. Each class member page would then have a link "scenarios and code examples" that points to the parent class page.
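To make the linking idea a little more concrete, here is a minimal sketch (in Python, purely for illustration) of how a scenario list held on the class page could be inverted to produce the "scenarios and code examples" links for each member page. The scenario titles and the FILE_SCENARIOS name are hypothetical; only the File member names are real.

```python
from collections import defaultdict

# Hypothetical scenario catalogue for the File class page:
# scenario title -> the class members its example uses.
FILE_SCENARIOS = {
    "Read a text file into a string": ["File.ReadAllText"],
    "Copy a file and keep a backup": ["File.Exists", "File.Copy"],
    "Append a line to a log file": ["File.Exists", "File.AppendAllText"],
}


def member_links(scenarios):
    """Invert the catalogue so each member page can list the scenarios that use it."""
    links = defaultdict(list)
    for title, members in scenarios.items():
        for member in members:
            links[member].append(title)
    # Sort the titles so every member page lists them in a stable order.
    return {member: sorted(titles) for member, titles in links.items()}
```

Calling member_links(FILE_SCENARIOS) gives the File.Exists page two scenario links and each of the other member pages one, without anyone having to write a separate sample for each member.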
The problem is that people tell me developers expect to see code in the class member page, and just go somewhere else if there isn't any. What they don't tell me is how often developers look at the code and then go somewhere else because the one simple code example (or the much repeated over-complex example) doesn't satisfy their particular scenario.
For example, if you want to find out how to get the size of a disk file where do you start looking? In the list of members of the File class, or the FileInfo class? Or search for a File.Length property? Or a File.GetLength method? If the File class had a scenario "Find the properties of a disk file" you would probably figure that it would be a good place to look. The example would show that you need to create a FileInfo instance; and that you can then query the Length property of that instance.
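As an illustration of how short such a scenario example can be, here is a rough equivalent in Python rather than .NET (a swapped-in language, so the names differ from the FileInfo and Length members mentioned above). The file_size_in_bytes function name is my own invention for the sketch.

```python
from pathlib import Path


def file_size_in_bytes(path):
    """Return the size of a disk file, read from the file's metadata."""
    # Path.stat() plays roughly the role that creating a FileInfo does in .NET,
    # and st_size corresponds to the Length property.
    return Path(path).stat().st_size
```

The point is the same in either language: the size lives on an "information about this file" object, which is not the first place most people think to look.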
Or, when using the SmtpClient class to send an email, one of the scenarios would be "Provide credentials for routing email through an SMTP server". That way the majority of examples would just use the default credentials, simplifying them and reducing complexity for the most typical scenarios. If the developer needs to create and set credentials, the specific scenario would show how to create different kinds of NetworkCredential instances that implement ICredentialsByHost for use with the SmtpClient class, but wouldn't need to include all the gunk for adding attachments and other non-relevant procedures.
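A sketch of what that dedicated credentials scenario might look like, again using Python's standard library as a stand-in for SmtpClient and NetworkCredential. The function names, and the assumption that the server accepts STARTTLS followed by a username and password login, are mine and not taken from any real documentation page.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain text message; no attachments or other extras."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_credentials(msg, host, port, username, password):
    """Send msg through an SMTP server that requires a login."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()                 # assumes the server supports STARTTLS
        server.login(username, password)  # the part this scenario is about
        server.send_message(msg)
```

Everything unrelated to credentials stays in build_message, so the login step is the only new thing the reader has to take in.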
I know it would be impossible to always have the exact scenario and code example that would satisfy the needs of every developer each time they use the API reference pages, but it does seem like the scenarios approach could cover the most common tasks and requirements. It could also be easily extended over time if other scenarios become obvious, or in response to specific demands. OK, so it would mean a couple of extra mouse clicks to find the code, but that code should more closely resemble the code you need to use, and be easier to adapt and include in your project.
Why not tell me what you think? How do you use an API reference, and - more important - what do you actually want to see in it?
In the days when I used to visit my Uncle Gerald, who was a keen gardener, he would often present me around this time of year with a large bundle of rhubarb and the instruction to "give these to your Mother and wish her a moving Easter". I suspect that the comment was somehow related to the laxative properties of rhubarb. We haven't had rhubarb in our house lately, but I still managed to have a moving Easter. I was moving all my VMs from a dead server to the backup one.
Yep. Woke up on Good Friday morning with the sun shining and plans for a nice relaxing day in the garden only to find the main server for my network sulking glumly in the corner of the server cabinet with no twinkly lights on the front and no whooshing of stale air from the back. Poke the "on" button and it runs for five seconds then dies again. Start to panic. Keep trying, no luck. Open the box and peer hopefully around inside. Nothing missing, no smoke or burnt bits, nothing looking like it was amiss.
Wiggle some wires and try again (the total extent of my hardware fault diagnosis capabilities). Disconnect the new hard drive I fitted a couple of weeks ago. Look in the BIOS logs, but they're empty. The most I could get it to do on one occasion was run as far as the desktop before it just died again. So, in desperation, phone a local Dell-approved engineer who offers to come and fix it the same day. But after three hours of testing, swapping components, general poking about with a multi-meter, and much huffing and mumbling, he comes to the sad conclusion that the motherboard is faulty. And a new one is going to cost around 500 pounds in real money. Plus shipping and fitting.
The server is only two and a half years old (see Hyper-Ventilation, Act I), and I buy Dell stuff because it usually outlasts the lifespan of the software I run and ends up being donated to a needy acquaintance (with the hard drives removed, of course). But I suppose the sometimes extreme temperatures reached in the server cabinet can't have helped, especially as we've had a couple of very warm years and last week was a scorcher here. Though it has made me feel less like I trust the backup server I bought at the same time.
Ah, but surely there's no problem when a server fails? Just fire up the exported VM image backups on the other machine and I'm up and running again. Except that, unfortunately, I've been less than strict about setting things up generally on the network. Thing is, I was planning for a disaster such as a disk failure, which is surely more likely than a motherboard failure. With a disk failure it's just a matter of replacing the disk then restoring from a backup or importing the exported VMs. But a completely dead box raises lots of different issues. I know I should have nothing running within the Hyper-V host O/S, but somehow I ended up with one server having the backup domain controller running on the host O/S and the other (the main one) with the host O/S running WSUS, the SMTP server, Windows Media Services, the scheduled backup scripts, the website checker, and probably several other things I haven't discovered yet.
Therefore, while the main hosted server VMs (the FSMO domain controller, web server, ISA server, and local browser) fired up OK on the backup server, all the other stuff that makes the network work was gone. And then it got worse. The backup of the FSMO domain controller was a week old, and so it kept complaining that it didn't think the FSMO role was valid. And none of the recommended fixes using the GUI tools or ntdsutil worked. So I ended up junking the FSMO domain controller, forcing seizure of the roles on the backup domain controller, and then using ntdsutil to clean up the AD metabase. Afterwards, I discovered this document about virtualizing a domain controller which says "Do not use the Hyper-V Export feature to export a virtual machine that is running a domain controller" and explains why.
I certainly recommend you read the domain controller document. There's a ton of useful information in there, even though much is aimed at enterprise-level usage. However, when you get to the part about disabling write caching and using the virtual SCSI disk controller, look at this document that says you must use the virtual IDE controller for your start-up disk in a VM. But, coming back to the issue of backing up/exporting a VM'd domain controller, it looks like the correct answer is to run a regular automated backup within the DC's VM to a secure networked location instead. I've set it up for both the virtual and physical DCs to run direct to a local share and then get copied to the NAS drive, which will hopefully give me a fighting chance of getting my domain back next time. After you set up a scheduled backup in Windows Server Backup manager you can open Task Scheduler, find the task in the Microsoft | Windows | Backup folder, and change the schedule if you want something different from one or more times a day. And make sure any virtual DC VMs are set to always start up when the host server starts so that the FSMO DC can confirm it actually is the valid owner of the roles.
It does seem like a workable last resort disaster recovery strategy if a DC does fail is to force its removal from the domain and rebuild it from scratch. As long as you have one DC still working, even if it's not the FSMO, you should still be able to get (most of) your domain back by using it to seize the FSMO roles that were held by the dead DC and then cleaning it up afterwards. However, I wouldn't recommend this as a back-up strategy.
So after spending most of the holiday weekend with my head in the server cabinet, I managed to get back to some level of normality. I'm still trying to resolve some of the issues, and still trying to figure the ideal solution for virtualized and physical domain controllers. There's tons of variable advice on the web, and all of it seems to point to running multiple physical servers to overcome the problem of a virtualized DC not being available when a host server starts. Nobody is suggesting running Hyper-V on the domain controller host. However, my backup server that is valiantly and temporarily supporting the still working remnants of my network has both Domain Services (it's the FSMO domain controller) and Hyper-V roles enabled (it's hosting all the Hyper-V VMs).
Even though no-one seems to recommend this, they do grudgingly agree that it works and it does seem to be one way to cope with redundancy and start-up issues on a very small and lightly loaded network like mine, and when I get a new server organized it will also be a DC. Meanwhile I've created a "server operations" VM that contains all the other stuff that I lost - WSUS, SMTP server, Media Services, scheduled backup scripts, web site monitoring, etc. That way all I actually need on the base hosting server is Active Directory (so it is a DC) and the Hyper-V role with the correct network configuration. Oh, and the correct UPS configuration. And probably more esoteric setup stuff I'll only find out about when I get there.
Mind you, after I complained to my Dell sales guy about the failed server he's done me an extremely good deal on a five year pro support warranty with full onsite maintenance for the new box. So next time it fails I can just phone them and tell them to come and fix it. And until it arrives and is working so that I again have some physical server redundancy, I can only ruminate as to whether the fear of waking up to a dead network is as good a laxative as rhubarb...
My wife has been asking me why I haven't written about the recent Royal Wedding. Mainly it's because, surprisingly, I didn't receive an invitation; and so was unable to apply my usual highly perceptive and amazingly incisive documentation engineering capabilities to the occasion without first-hand, on-site experience. So I decided to write about the Royal Mail instead.
It seems that an outside broadcast presenter at one of our local radio stations phoned Royal Mail to ask where the post boxes are located in his town so that he could post letters to his listeners as he travelled around the locality. They told him that the information was "not available to the public", so - just to see what would happen - he applied officially for the details under the Freedom of Information Act.
The letter he got back stated that "releasing information on the locations of post boxes would clearly be likely to prejudice the commercial interests of Royal Mail", and that such information "would undermine their commercial value, significantly reducing Royal Mail's ability to exploit the information commercially". They even said that there was "significant public interest" in keeping the information private. OK, so I'm only an insignificant member of the public, but I've never shown any interest at all in keeping the whereabouts of our local post box a secret...
Obviously nobody at Royal Mail uses a road atlas, phone directory, sat-nav, or mapping website or they would have discovered that all of these show the locations of post office branches of Royal Mail. Surely these, each measuring several hundred square yards and often located in prime city centre locations, are more "commercially valuable" than the two square feet of pavement (sidewalk) taken up by a post box? Should I write and tell them about this alarming leak of commercially valuable information?
Of course, it could be that they are right about keeping valuable commercial locations secret. Just in case I've emailed the press office of a couple of national supermarket chains and hi-fi retailers, all of whom have a "Find your nearest branch" page on their websites. I haven't had any replies yet, but I confidently expect this dangerous feature to disappear from their sites very soon. I mean, just think of the commercial value of the ten acre town-centre site our local Tesco store inhabits. And they even have the naivety to display a huge sign on the roof!
And the same could just as easily apply to us here at Microsoft. I'm sure that the domain name alone is worth a few bob (dollars), and the huge number of sites and pages that hang off it must be of not inconsiderable commercial value. I need to warn our IT people that they should immediately remove us from all the DNS servers around the world, and disguise the sites so that people don't encounter them by accident and reveal their location. Just think how that could undermine their commercial value!
Mind you, as our roving local radio reporter pointed out, several people probably already know where the post boxes in his town are. Let's face it, a five foot high bright red box that, in many cases, has been there since Victorian times is hard to disguise. And if you were that interested, you'd only need to follow a post van on its rounds to find them. They even help you by painting the words "Royal Mail" in big letters on the side of the vans.
And I've just realized why I didn't get my invitation to the Royal Wedding! Obviously nobody would tell Kate where to find a post box to send it...
"Welcome back! You join us as Alex is trying to decide whether to act out his Star Wars fantasy with an R2 detour (D2-er, get it? Maybe not). With several hundreds of newly acquired gigs in the servers, will he risk upgrading from the so-last-decade Windows Server 2008 to the shiny new R2 edition? Especially now SP1 is out there."
In fact, now that I have plenty of room for new Hyper-V VMs it seemed like it was worth a try. As long as ADPrep doesn't screw up my Active Directory I can export the existing domain controller and other server VMs and then upgrade imported-alongside copies. If it all goes fruit-dimensional I can just dump the new VMs and fire up the old ones again. And if it does all work out OK I'll be less worried about upgrading the physical machine installations of 2008 that host the VMs.
So early on a Saturday morning I start the process. I've always dreaded running ADPrep since the time I tried to upgrade a box that started life on NT4 as an Exchange Server, was upgraded to Windows 2000 Server, and then upgraded again to Windows 2003 Server. The NT to 2000 upgrade required two days playing with ADSIEdit afterwards, and the 2000 to 2003 upgrade destroyed the domain altogether. However, this time the ADPrep 2008 to R2 upgrade ran fine on both forest and domain, so it was all looking peachy.
Have you ever wondered why things that go well are compared to peaches, while things that go wrong are pear-shaped? Especially as my wife can confirm that I will have absolutely nothing to do with hairy fruit (but that's another story).
And now I can expand the size of the VMs' disks in Hyper-V Manager and then extend the volumes using the Storage Management console within each VM's O/S to get the requisite 15 GB of free space. Then bung in the DVD, cross my fingers and toes, mutter a short prayer to the god of operating system upgrades, and hit Install. Except that it says I have to stop or uninstall Active Directory Federation Services (ADFS) first.
So I go and read about upgrading ADFS. This doc on MSDN for upgrading and uninstalling ADFS goes through all the things you need to do with IIS configuration, PowerShell scripts, and editing the Registry to properly remove the standalone v 2.0 installation. But another says that the R2 upgrade will just remove it anyway. There is an ADFS Role in 2008 R2, but note that this is ADFS 1.1 not 2.0. And I never managed to make this role work anyway; probably because I didn't do all the uninstall stuff first. If you want to run ADFS 2.0 I suggest you follow the full uninstall and clean-up instructions before you upgrade to R2. Then, after you upgrade the O/S, just download and install the ADFS 2.0 setup file for 2008 R2 (make sure you select RTW\W2K8R2\amd64\AdfsSetup.exe on the download page) instead of enabling the built-in Federation Service Role.
Next, install the 72 updates for R2 (thank heavens for WSUS) and then install SP1. And then some more updates. But, finally, my primary (FSMO) domain controller was running again. And most of the 100 or so errors and warnings in Event Viewer had stopped re-occurring. Except for a couple of rather worrying ones. In particular: "The DHCP service has detected that it is running on a DC and has no credentials configured..." and "Volume Shadow Copy Service error: Unexpected error calling routine RegOpenKeyExW(-147483646,SYSTEM\CurrentControlSet\Services\VSS\Diag,[account name]). hr = 0x80070005, Access is denied".
Solving the VSS error is supposed to be easy - you can tell which account failed to access the Registry key from the message. Except that there is no account name in my error message. In this (not unknown) case, the trick with this VSS error, so they say, is to locate another error that occurred at the same time - which is usually the cause of the VSS error. In my case it seemed like it was the DHCP error, and this page on the Microsoft Support site explains how to fix it. I've never had this error before in Server 2008, but the fix they suggest seems to have cured the DHCP error.
Deleting a DHCP entry in DHCP Manager and then viewing DNS Manager shows it removes that machine from the DNS as expected, and ipconfig /renew on that machine creates a new DHCP entry that replicates to DNS. And no errors in Event Viewer, which hopefully indicates that it's working as it should. However, this hasn't so far cured the VSS error, and now there are no other errors occurring at the same time. But after some searching I found this page that explains why it's happening and says that you can ignore it.
Next I can upgrade the backup domain controller, and for some reason I don't get the same DHCP error even though it also runs DHCP (with a separate address range in case the primary server is down). Very strange... unless it was initially an AD replication issue when only one DC was running. Who knows? Though I do get the same VSS error here, confirming that it wasn't actually the DHCP problem causing it last time.
Anyway, at last I can tackle the more nerve-wracking upgrade of the base O/Ss of the machines that host the VMs. This time setup stops with a warning that I have to stop the Hyper-V service. However, this blog post from the Hyper-V team says I can just ignore this message and they are correct - it worked. The VMs fired up again afterwards OK, though the Server 2003 one did require an update to the Hyper-V Integration Services; which means you have to stop it again and add the DVD drive to it in Hyper-V Manager because you forgot to do that first...
One remaining cause of concern is the error on the primary DC that "Name resolution for the name [FQDN of its own domain] timed out after none of the configured DNS servers responded". NSLookup finds it OK, Active Directory isn't complaining, and everything seems to be working at the moment so it's on the "pending" list. A web search reveals hundreds of reports of this error, and an equally vast range of suggestions for fixing it - including buying a new router and changing all the underlying transport settings for the TCP protocol. Think I'll give that a miss for the time being.
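For anyone who wants a quick repeatable check while an error like this sits on the "pending" list, here is a small Python sketch that asks the operating system's resolver for a name and reports what comes back. The resolve function is just an illustrative helper. Note that it goes through the system resolver (which can use the local cache and hosts file), whereas NSLookup queries a DNS server directly, so the two can legitimately disagree.

```python
import socket


def resolve(name):
    """Return the sorted set of addresses the system resolver reports for name."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        # Resolution failed (no such name, or the lookup did not get an answer).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Run against the domain's own FQDN from the affected DC, an empty list would suggest the problem is real rather than just noise in the Event Log.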
Of course, a few more upgrade annoyances arose over the next couple of days. On the file server that is also the music server the upgrade to R2 removes the Windows Media Service role. After the upgrade you have to download the Windows Server 2008 R2 Streaming Media Service role from Microsoft and install it, then enable the role in Server Management and configure the streaming endpoints again. And, of course, it's been so long since you did this last time you can't remember what the settings were. Don't depend on the help file to be much use either.
And as with other upgrades and service packs, the R2 upgrade silently re-enables all of the network connections in the Hyper-V host machine's base O/S, so that the connections to the outside world are enabled for the machine that is typically on the internal network (see this post for details). You need to go back into the base O/S's Network Connections dialog and disable those you don't want. However, in R2 you can un-tick the Allow management operating system to share this network adapter option in Virtual Network Manager to remove these duplicated connections from the base O/S so that updates and patches applied in the future do not re-enable them.
But of much more concern were the effects of the upgrade on my web server box. After it was all complete, patched, SP'd, and running again I decided to have a quick peep at the IIS and firewall settings. Without warning the update had enabled the FTP Service (which I don't run) and set it to auto-start, then added a heap of Allow rules to the Public profile to allow FTP in and out. Plus several more to allow DCOM in for remote management. As usual, after any update, remember to check your configuration for unexpected changes. If you don't need the FTP service, remove it as a Feature in Server Manager, which prevents it from automatically enabling the firewall rules.
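One low-tech way to make that check less dependent on memory is to record the settings you care about before the upgrade and compare them afterwards. The sketch below assumes you have already captured two snapshots as simple name-to-value dictionaries (how you collect them is up to you); the snapshot contents are made-up illustrations of the kind of change described in this post, not real output from my server.

```python
def changed_settings(before, after):
    """Return {name: (old, new)} for settings that were added, removed, or altered."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            # None on either side means the setting was absent in that snapshot.
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots of service start modes, for illustration only.
before_upgrade = {"W3SVC": "Automatic", "SMTPSVC": "Automatic"}
after_upgrade = {"W3SVC": "Automatic", "SMTPSVC": "Manual", "FTPSVC": "Automatic"}
```

With these two snapshots, changed_settings(before_upgrade, after_upgrade) flags the newly appeared FTP service and the altered SMTP start mode, and stays quiet about the service that did not change.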
And a day or so later I discovered that the R2 upgrade also set the SMTP service to Manual start as well, so the websites and WSUS could no longer send email. The service started OK and so I set it to Automatic start and thought no more about it until WSUS began reporting three or four times a day that it was unable to send email. Yet testing it in the WSUS Email Options dialog reveals that it can send email. So I added the configuration settings in the IIS7 Manager for SMTP (even though I never had to do this before), and it made no difference. Every day I get an email from WSUS with all the newly downloaded updates listed, and three Event Log messages saying it can't send email. Perhaps next week it will start sending me emails to tell me it can't send email...
Finally, by late Monday evening, everything was up and running again. OK, there are still a couple of Event Log errors and warnings to track down and fix, but mostly it all seems to be working. And, I guess, the whole process was a lot less painful than I'd expected. O/S upgrades have certainly improved over the years, and I have to say that the server guys really did an excellent job with this one. It was certainly worth it just to be able to run the latest roles, and - at least so far - I even have proper working mouse pointers in all the VMs!
What I did notice is how, for a short period post upgrade, life seems a lot more exciting. Well, at least the server-related segments of my day do. Each reboot is accompanied by that wonderful sense of anticipation: Have I broken it? Will it restart? Will I get some exciting new errors and warnings?
It's as though the new O/S is a bit delicate and you need to handle it gently for a while. Like when you've just glued the handle back on your wife's favorite mug you broke when doing the dishes, and you're not sure if it will all just fall to pieces again. Until you're really convinced it's settled down you don't want to click too quickly, or wave the mouse about too much. Or open too many applications at one go in case it gets annoyed, or just can't cope until it's finally unpacked its suitcase and settled in.
Or maybe I really do need to get a life...