Random Disconnected Diatribes of a p&p Documentation Engineer
After the problems with network location ignorance the other week, and being an inquisitive type, I decided to dig a little deeper and see if I could identify why my server was unsure about the type of network it was connected to. For some while I've had occasional issues with web browsing where page requests immediately throw up an error that a URL cannot be found, but refreshing the page in the browser works fine. And, of course, the odd Event Log message that "Name resolution for the name [some domain name] timed out after none of the configured DNS servers responded."
I keep coming back to the conclusion that there is a DNS error somewhere in my network setup, but I've never been able to trace it. However, after some experimenting with nslookup I discovered that querying for a domain outside my network without adding a period (.) to the end of the domain name resulted in a response giving that domain name with my own domain name appended to it, and it always resolved to the address 22.214.171.124. This seems wrong. For example, querying for "microsoft.com" returns "microsoft.com.[my-own-domain].com" with the IP address 126.96.36.199, whereas it should return something like 188.8.131.52. But appending a period to the query ("microsoft.com.") gives the correct result.
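The trailing-dot behaviour comes down to how a resolver builds its candidate list for unqualified names. Here's a minimal Python sketch of the idea - simplified, since the real Windows resolver has extra rules such as connection-specific suffixes and devolution:

```python
def candidate_names(query, primary_suffix):
    """Return the lookup candidates a resolver might try for a name,
    assuming the 'append primary DNS suffix' option is on. A simplified
    sketch only; the real resolver has more rules than this."""
    if query.endswith("."):
        # A trailing dot marks the name as fully qualified: no suffix is added.
        return [query.rstrip(".")]
    # Unqualified name: the suffixed form may also be tried, which is how
    # "microsoft.com" can end up queried as "microsoft.com.my-own-domain.com".
    return [query, f"{query}.{primary_suffix}"]

print(candidate_names("microsoft.com", "my-own-domain.com"))
# → ['microsoft.com', 'microsoft.com.my-own-domain.com']
print(candidate_names("microsoft.com.", "my-own-domain.com"))
# → ['microsoft.com']
```

The trailing dot suppresses the suffix search entirely, which is why the dotted query behaved correctly.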
So I query the weird IP address and it resolves to "advancedsearch.virginmedia.com". Which, if you query it as a DNS server ("nslookup microsoft.com advancedsearch.virginmedia.com") just times out. It isn't a DNS server. I use NTL Business Cable as one of my ISPs, and they are a branch of Virgin Media, so I can see where the IP address is coming from. I also have two valid Virgin Media DNS servers I can use, so I repeat the lookups specifying one of these to try and discover where the strange behavior is coming from.
It turns out that the Virgin Media DNS servers have a neat trick: if they receive a request for a domain they can't find, they automatically return the IP address of the Virgin Advanced Web Search page. As the browser does a DNS lookup by appending the machine's domain name if it can't find the one requested (I assume this happens because the default network connection setting in the DNS tab of the Network Connections dialog is to append the primary suffixes to the domain name for resolution of unqualified names), the Virgin Media DNS server responds with the requested domain name plus the machine FQDN and the IP address of the advanced web search page.
My internal DNS servers have forwarders set up for resolution of external domain names, and I had added the two Virgin Media DNS servers to the list along with the DNS servers of my other ISP (British Telecom). Repeating the tests against the BT DNS servers shows that they don't do any fancy tricks. Looking up a non-existent domain simply returns a "Non-existent domain" message. So I removed the Virgin Media DNS servers from the list of forwarders in my internal DNS and it stopped the weird "advanced search" behavior happening. And, so far, it also seems to have stopped the problems of failed lookups and "Name resolution timed out after none of the configured DNS servers responded" errors in the server's event logs.
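If you want to check whether a forwarder plays this trick, one approach is to probe it with a name that is vanishingly unlikely to exist: an honest server returns NXDOMAIN, while a hijacking one returns the address of its search page. A hedged Python sketch - the `resolve` callable and the IP address below are stand-ins for illustration, not real servers:

```python
import uuid

def hijacks_nxdomain(resolve, parent="com"):
    """Probe a resolver with a name that almost certainly does not exist.
    `resolve` is any callable mapping a name to an IP string or None
    (e.g. a wrapper that runs a lookup against one specific server).
    An honest server yields None (NXDOMAIN); a hijacking one yields
    the address of its search page."""
    probe = f"nx-{uuid.uuid4().hex}.{parent}"
    return resolve(probe) is not None

# Stand-in resolvers for illustration (the IP below is a documentation address):
honest = lambda name: None
hijacker = lambda name: "203.0.113.7"   # hypothetical "advanced search" page
print(hijacks_nxdomain(honest))    # → False
print(hijacks_nxdomain(hijacker))  # → True
```

Running a probe like this against each forwarder in turn would have fingered the Virgin Media servers straight away.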
Removing the Virgin Media DNS servers from the list of forwarders also looks like it has stopped the occurrence of excessive non-TCP requests being sent out onto the net from my domain controllers. My ISA Server occasionally reported that they were opening more than the maximum configured number of non-TCP connections, and these turned out to be DNS lookups. But, of course, it could all just be a wild coincidence.
But I can't help wondering why, for a business connection where you'd expect people to run their own DCs and DNS servers, they decided it's a good idea to return the address of a web search page from a DNS lookup query. Perhaps they get paid a bonus for each click-through...
Footnote: If you are looking for alternative DNS servers to use, you might like to try the Google ones (see http://code.google.com/speed/public-dns/). I'm using these at the moment, with no problems detected so far.
They've been advertising the book "In the Land of Invented Languages" by Arika Okrent on The Register web site for a while, and I finally caved in and bought a copy. And I have to say it's quite an amazing book. It really makes you think about how languages have evolved, and how we use language today. It even contains a list of the 500 most well-known invented languages; and a whole chapter that explores the origins and syntax of Klingon.
Even the chapter titles tempt you to explore the contents. There's a whole chapter devoted to the symbolic language representation for human excrement (though the word they use in the title is a little more graphic), and another called "A Calculus of Thought" that describes mathematical approaches to and analysis of language. Though the chapter title I liked best is "A Nudist, a Gay Ornithologist, a Railroad Enthusiast, and a Punk Cannabis Smoker Walk Into a Bar...". Meanwhile the chapter on Klingon explains that "Hab SoSlI' Quch" is a useful term for insulting someone ("Your Mother has a smooth forehead").
The book ranges widely over topics such as how languages work, and the many different ways that people have tried over the years to categorize languages into a set of syntactic representation trees that separate the actual syntax from the underlying meaning. A bit like we use CSS to separate the UI representation from the underlying data in web pages. It raises an interesting point that, if every language could be categorized into a tree like this, translation from one to another should be really easy.
For example, the problem we have with words such as "like" that could mean two completely different things ("similar to" or "have affection for") would go away because the symbolic representation and the location within the syntax tree would be different for each meaning. Except that you'd have to figure out how to convert the original text into the symbolic tree representation first, so it's probably no advantage...
But it struck me that the book makes little mention of the myriad invented languages that we use in the IT world every day. Surely Visual Basic, C#, Smalltalk, LISP, and even HTML and CSS are invented languages? OK, so we tend to use them to talk to a machine rather than to each other (though I've met a few people who could well be the exception that proves the rule), but they are still languages as such. And the best part is that they already have a defined symbolic tree that includes every word in the language, because that's how code compilers work.
However, it seems that our computer-related invented languages are actually resolutely single-language in terms of globalization. A web search for print("Hello World") returns 12,600,000 matches, whereas imprimer("Bonjour tout le monde") finds nothing even remotely related. It looks as though, at least in the IT world, we are actually forcing everybody to learn US English - even if it's only computer language keywords.
Does this mean that computer programming for people whose first language is not English is harder because they need to learn what words like "print", "add", "credential", "begin", "file", and more actually mean in their language to be able to choose the correct keywords? Or does learning a language such as Visual Basic or C# make it easier to learn English as a spoken language? Are there enough words in these computer languages to make yourself understood if those are the only English words you know? I guess it would be a very limited conversation.
So maybe we should consider expanding the range of reserved words in our popular computing languages to encompass more everyday situations. Working on the assumption that, in a few years' time everyone will need to be computer literate just to survive, eventually there would be no need for language translation. We could just converse using well-understood computer languages.
Finally let Me End this POST DateTime.Now && JOIN Me Next(Week) To continue...
One of the wondrous features of Windows from Vista onwards is Network Location Awareness (NLA). It means that, while the whole family can view your photo collection at home, when you pick up your laptop and wander out for coffee everyone else in Starbucks can't see the holiday pictures of you in embarrassing shorts. Or read the rather sad list of unadventurous things you planned to do before you were thirty (and didn't).
The idea is simple enough. NLA automatically adjusts the behavior of potentially risky settings and options based on the type of network to which the computer is attached. Typically this means turning on network discovery and file and printer sharing for home or private networks, turning them off for public networks, and doing something in between for domain-joined computers. It works by examining a range of settings for each network interface and making a decision on what type of network it actually is connected to. All very clever.
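As a rough illustration of the idea only - the real NLA decision draws on gateway details, domain reachability, and Group Policy, and the exact defaults are more nuanced than this - the category-to-settings mapping amounts to something like:

```python
# Illustrative approximation of the profile defaults NLA applies;
# the actual values are policy-controlled and not this simple.
PROFILE_DEFAULTS = {
    "Public":  {"network_discovery": False, "file_and_printer_sharing": False},
    "Private": {"network_discovery": True,  "file_and_printer_sharing": True},
    "Domain":  {"network_discovery": True,  "file_and_printer_sharing": False},
}

def settings_for(category):
    # Anything unidentified falls back to the most restrictive profile,
    # which is exactly the "unidentified Public network" behaviour.
    return PROFILE_DEFAULTS.get(category, PROFILE_DEFAULTS["Public"])

print(settings_for("Unidentified"))
# → {'network_discovery': False, 'file_and_printer_sharing': False}
```

The fallback line is the important one: when NLA can't identify the network, you get the park bench.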
In Vista, you can usually change the setting yourself by clicking the Customize link in the Network and Sharing Center page. However, in domain-joined Windows Server 2008 R2 machines there is no Customize link, as I discovered last week. I'd just finished installing the latest round of updates and everything seemed to be running fine until I noticed that the icon in the taskbar was showing connection to "an unidentified Public network". With a nice picture of a park bench just in case I didn't grasp what "Public" means.
Note to the Microsoft Server team: how about including a tree in the picture so it's more obvious that the computer is now outside? You never know what some people might consider as appropriate lounge furniture.
Yet everything else seemed to be OK. So I did what all amateur administrators do: wandered off to TechNet and asked the question, only to find that lots of people seem to be confounded by the wrong type of network setting in Windows 7 and Windows Server 2008. There's loads of stuff about editing Registry settings and things, but I decided I'd just start out by disabling and re-enabling the network connection. Obviously it would sort itself out automatically. It did - the icon changed to the Domain setting with a pretty picture of an office building and correctly identified the name of the domain network.
What I wasn't prepared for was the welter of Event Log errors. All of a sudden the machine (a domain controller) couldn't find any other machines, couldn't apply group policy updates, couldn't access Active Directory, and couldn't find any DNS servers (even though I have three). After perusing a few more articles, I checked the interface settings (they were all correct) and then restarted the DNS service. And was rewarded with a dozen new DNS errors saying that the interfaces are unavailable.
Now, every five minutes, Group Policy dumps another error in the Event Log. So I open a command window and run gpupdate. It reports success, and confirms it with Event Log messages, but still Group Policy errors appear every five minutes. Together with one saying a certificate cannot be renewed and several saying that DNS can't find its own domain. It's after midnight and I feel like I've been hit by a train. And, of course, I can't just switch NLA back to Public mode again...
So I run netdiag and it reports no errors. Neither does dcdiag. And using the Active Directory console to force replication seems to work fine. Even ntdsutil can't find anything wrong with the roles or topology. So, in the end, I cave in and reboot the machine. Yep, it just came back up without reporting any errors. And it's still running clean and properly (with the correct NLA type) the next day. Why didn't I just do that first...? And how come something as simple as changing the NLA type can have such a stupendous effect on the system?
Perhaps instead of "Keep Calm and Carry On" it should be "Keep Calm and Reboot". Maybe I can have T-shirts made. Though I did like the sign in a tiny teashop in Whitby we called in at during a recent trip to the seaside: "Keep Calm and Eat Cake"...
What's the best way to document an API? It's a question that came up when we were documenting the Enterprise Library 5.0 project a while ago, and has resurfaced recently with another project I unexpectedly found myself attached to. It's also one of those annoying questions that typically offer three dozen wildly varying answers; none of which really appears to provide the optimum result. Yet good documentation of APIs is vital for developers to get the best from the code.
While I'm not actually a developer as such, I do write quite a lot of code. Most of it is examples for others to use and reuse, though sometimes I take my life in my hands and actually write stuff that I run in a production environment. And, inevitably, most of the samples I write are concerned with the newest, undocumented, and often beta technologies. So all I have to work with is Visual Studio Object Browser, IntelliSense, and (if I'm feeling particularly inquisitive) .NET Reflector.
Of course, tools such as Sandcastle and others can generate the HTML docs from the source code automatically, and these will (hopefully) contain meaningful summaries and parameter descriptions written by the original class developer within the source code. So all I need do is provide a brief explanation of any particular intricacies when using the class or class member, and add a short sample of code that shows how that class member works. Surely I can turn out all the required content in a few hours...?
But it's generally not that simple once you start to think about what developers might expect to find when they hit F1 in Visual Studio, or Bing for a class or member reference page. For example:
In an ideal world I would write one or more examples for each member of each class in the API. But should I write samples that use several members of the class that I can reuse in more than one class member page? This sounds like a time-saver, but generally results in a sample that is over-complicated and may even make it harder to understand, or hide some members in the midst of a big block of code.
More to the point, do I actually have the resources available to write specific samples for every member of every class in an API that, when you include member overloads, might have many hundreds of individual pages? Years ago when I was documenting the API for Active Server Pages 1.0 (in the now almost forgotten pre-.NET era), it was easy enough to document the very few members of the five classes that made up ASP 1.0. But even a reasonably small framework such as Enterprise Library 5.0 has more than 1000 pages in the API reference section.
The path we took with Enterprise Library was to avoid writing samples in the API pages, and instead document the key scenarios (both the typical ones and some less common ones) in the main product documentation. This allows us to explain the scenario and show code and relevant details for the classes and class members that accomplish the required tasks. In fact, even getting this far only came about after some reconsideration of our documentation process (see Making a Use Case for Scenarios).
So, if I was documenting the file access classes in System.IO I could spend several months writing different and very short samples for each member of the File, FileInfo, Directory, DirectoryInfo, TextReader, FileStream, Path, and many more classes. Or I could try and write a few meaningful examples that use all the methods of a class and include them in several member pages, though it's hard to see how this would be either meaningful or easy to use as a learning aid. And it's certain to result in unrealistic examples that are very unlikely to be "copy and paste ready".
Instead, perhaps the way forward is to make more use of scenarios? For example, I could decide what the ten main things are that people will do with the File class; and then write specific targeted examples for each one. These examples would, of necessity, make use of several of the members of the class and so I would put them in the main page for the class instead of in the individual class member pages. And each one of these scenario solutions could be a complete example that is "copy and paste ready", or a detailed explanation and smaller code examples if that better suits the scenario. Each class member page would then have a link "scenarios and code examples" that points to the parent class page.
The problem is that people tell me developers expect to see code in the class member page, and just go somewhere else if there isn't any. What they don't tell me is how often developers look at the code and then go somewhere else because the one simple code example (or the much repeated over-complex example) doesn't satisfy their particular scenario.
For example, if you want to find out how to get the size of a disk file, where do you start looking? In the list of members of the File class, or the FileInfo class? Or search for a File.Length property? Or a File.GetLength method? If the File class had a scenario "Find the properties of a disk file" you would probably figure that it would be a good place to look. The example would show that you need to create a FileInfo instance, and that you can then query the Length property of that instance.
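Just to show how small a targeted scenario example can be, here is the same "find the properties of a disk file" scenario sketched in Python rather than .NET (where the answer is the FileInfo instance and its Length property):

```python
from pathlib import Path

def file_properties(path):
    """The 'find the properties of a disk file' scenario: a single
    stat() call answers the size question (FileInfo.Length in .NET)."""
    info = Path(path).stat()
    return {"size_bytes": info.st_size, "modified": info.st_mtime}

# Usage (hypothetical path):
# props = file_properties("C:/logs/app.log")
# print(props["size_bytes"])
```

A handful of examples at this size, each answering one concrete question, is far easier to copy and adapt than a monolithic sample that exercises every member at once.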
Or, when using the SmtpClient class to send an email, one of the scenarios would be "Provide credentials for routing email through an SMTP server". That way the majority of examples would just use the default credentials, simplifying them and reducing complexity for the most typical scenarios. If the developer needs to create and set credentials, the specific scenario would show how to create different kinds of NetworkCredential instances that implement ICredentialsByHost for use with the SmtpClient class, but wouldn't need to include all the gunk for adding attachments and other non-relevant procedures.
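Sketching that scenario in Python's smtplib rather than SmtpClient (the host names and addresses here are placeholders), the credentials case is just two guarded lines added to the default one - which is exactly why it deserves its own scenario page instead of cluttering every example:

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, recipient, subject
    msg.set_content(body)
    return msg

def send(msg, host, user=None, password=None):
    """Default case: no credentials. The authenticated-relay scenario
    is only the two guarded lines below."""
    with smtplib.SMTP(host) as server:
        if user is not None:
            server.starttls()          # most authenticated relays expect TLS
            server.login(user, password)
        server.send_message(msg)

# send(build_message("me@example.com", "you@example.com", "Hi", "Hello"),
#      "smtp.example.com", user="me", password="secret")   # placeholder details
```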
I know it would be impossible to always have the exact scenario and code example that would satisfy the needs of every developer each time they use the API reference pages, but it does seem like the scenarios approach could cover the most common tasks and requirements. It could also be easily extended over time if other scenarios become obvious, or in response to specific demands. OK, so it would mean a couple of extra mouse clicks to find the code, but that code should more closely resemble the code you need to use, and be easier to adapt and include in your project.
Why not tell me what you think? How do you use an API reference, and - more important - what do you actually want to see in it?
In the days when I used to visit my Uncle Gerald, who was a keen gardener, he would often present me around this time of year with a large bundle of rhubarb and the instruction to "give these to your Mother and wish her a moving Easter". I suspect that the comment was somehow related to the laxative properties of rhubarb. We haven't had rhubarb in our house lately, but I still managed to have a moving Easter. I was moving all my VMs from a dead server to the backup one.
Yep. Woke up on Good Friday morning with the sun shining and plans for a nice relaxing day in the garden only to find the main server for my network sulking glumly in the corner of the server cabinet with no twinkly lights on the front and no whooshing of stale air from the back. Poke the "on" button and it runs for five seconds then dies again. Start to panic. Keep trying, no luck. Open the box and peer hopefully around inside. Nothing missing, no smoke or burnt bits, nothing looking like it was amiss.
Wiggle some wires and try again (the total extent of my hardware fault diagnosis capabilities). Disconnect the new hard drive I fitted a couple of weeks ago. Look in the BIOS logs, but they're empty. The most I could get it to do on one occasion was run as far as the desktop before it just died again. So, in desperation, phone a local Dell-approved engineer who offers to come and fix it the same day. But after three hours of testing, swapping components, general poking about with a multi-meter, and much huffing and mumbling, he comes to the sad conclusion that the motherboard is faulty. And a new one is going to cost around 500 pounds in real money. Plus shipping and fitting.
The server is only two and a half years old (see Hyper-Ventilation, Act I), and I buy Dell stuff because it usually outlasts the lifespan of the software I run and ends up being donated to a needy acquaintance (with the hard drives removed, of course). But I suppose the sometimes extreme temperatures reached in the server cabinet can't have helped, especially as we've had a couple of very warm years and last week was a scorcher here. Though it has left me rather less inclined to trust the backup server I bought at the same time.
Ah, but surely there's no problem when a server fails? Just fire up the exported VM image backups on the other machine and I'm up and running again. Except that, unfortunately, I've been less than strict about setting things up generally on the network. Thing is, I was planning for a disaster such as a disk failure, which is surely more likely than a motherboard failure. With a disk failure it's just a matter of replacing the disk then restoring from a backup or importing the exported VMs. But a completely dead box raises lots of different issues. I know I should have nothing running within the Hyper-V host O/S, but somehow I ended up with one server having the backup domain controller running on the host O/S and the other (the main one) with the host O/S running WSUS, the SMTP server, Windows Media Services, the scheduled backup scripts, the website checker, and probably several other things I haven't discovered yet.
Therefore, while the main server's hosted VMs (the FSMO domain controller, web server, ISA server, and local browser) fired up OK on the backup server, all the other stuff that makes the network work was gone. And then it got worse. The backup of the FSMO domain controller was a week old, and so it kept complaining that it didn't think the FSMO role was valid. And none of the recommended fixes using the GUI tools or ntdsutil worked. So I ended up junking the FSMO domain controller, forcing seizure of the roles on the backup domain controller, and then using ntdsutil to clean up the AD metadata. Afterwards, I discovered this document about virtualizing a domain controller which says "Do not use the Hyper-V Export feature to export a virtual machine that is running a domain controller" and explains why.
I certainly recommend you read the domain controller document. There's a ton of useful information in there, even though much is aimed at enterprise-level usage. However, when you get to the part about disabling write caching and using the virtual SCSI disk controller, look at this document that says you must use the virtual IDE controller for your start-up disk in a VM. But, coming back to the issue of backing up/exporting a VM'd domain controller, it looks like the correct answer is to run a regular automated backup within the DC's VM to a secure networked location instead. I've set it up for both the virtual and physical DCs to run direct to a local share and then get copied to the NAS drive, which will hopefully give me a fighting chance of getting my domain back next time. After you set up a scheduled backup in Windows Server Backup manager you can open Task Scheduler, find the task in the Microsoft | Windows | Backup folder, and change the schedule if you want something different from one or more times a day. And make sure any virtual DC VMs are set to always start up when the host server starts so that the FSMO DC can confirm it actually is the valid owner of the roles.
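The second hop of that backup routine (local share first, then the NAS) is simple enough to script. A rough Python sketch, with hypothetical paths standing in for the real shares - real ones would be UNC paths like //nas/backups:

```python
import shutil
from pathlib import Path

def copy_latest_backup(local_share, nas_root):
    """Mirror the newest backup folder from the local share to the NAS.
    A sketch only: paths are placeholders, and a real script would want
    logging and error handling around the copy."""
    local_share, nas_root = Path(local_share), Path(nas_root)
    # Sort the backup folders oldest-to-newest by modification time.
    backups = sorted(local_share.iterdir(), key=lambda p: p.stat().st_mtime)
    if not backups:
        return None
    latest = backups[-1]
    dest = nas_root / latest.name
    shutil.copytree(latest, dest, dirs_exist_ok=True)
    return dest
```

Hooked up to Task Scheduler after the Windows Server Backup job, something like this gives you the off-box copy that a dead motherboard makes you wish you had.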
It does seem that a workable last-resort disaster recovery strategy, if a DC does fail, is to force its removal from the domain and rebuild it from scratch. As long as you have one DC still working, even if it's not the FSMO, you should still be able to get (most of) your domain back by using it to seize the FSMO roles that were held by the dead DC and then cleaning up afterwards. However, I wouldn't recommend this as a back-up strategy.
So after spending most of the holiday weekend with my head in the server cabinet, I managed to get back to some level of normality. I'm still trying to resolve some of the issues, and still trying to figure the ideal solution for virtualized and physical domain controllers. There's tons of variable advice on the web, and all of it seems to point to running multiple physical servers to overcome the problem of a virtualized DC not being available when a host server starts. Nobody is suggesting running Hyper-V on the domain controller host. However, my backup server that is valiantly and temporarily supporting the still working remnants of my network has both Domain Services (it's the FSMO domain controller) and Hyper-V roles enabled (it's hosting all the Hyper-V VMs).
Even though no-one seems to recommend this, they do grudgingly agree that it works and it does seem to be one way to cope with redundancy and start-up issues on a very small and lightly loaded network like mine, and when I get a new server organized it will also be a DC. Meanwhile I've created a "server operations" VM that contains all the other stuff that I lost - WSUS, SMTP server, Media Services, scheduled backup scripts, web site monitoring, etc. That way all I actually need on the base hosting server is Active Directory (so it is a DC) and the Hyper-V role with the correct network configuration. Oh, and the correct UPS configuration. And probably more esoteric setup stuff I'll only find out about when I get there.
Mind you, after I complained to my Dell sales guy about the failed server he's done me an extremely good deal on a five year pro support warranty with full onsite maintenance for the new box. So next time it fails I can just phone them and tell them to come and fix it. And until it arrives and is working so that I again have some physical server redundancy, I can only ruminate as to whether the fear of waking up to a dead network is as good a laxative as rhubarb...
My wife has been asking me why I haven't written about the recent Royal Wedding. Mainly it's because, surprisingly, I didn't receive an invitation; and so was unable to apply my usual highly perceptive and amazingly incisive documentation engineering capabilities to the occasion without first-hand, on-site experience. So I decided to write about the Royal Mail instead.
It seems that an outside broadcast presenter at one of our local radio stations phoned Royal Mail to ask where the post boxes are located in his town so that he could post letters to his listeners as he travelled around the locality. They told him that the information was "not available to the public", so - just to see what would happen - he applied officially for the details under the Freedom of Information Act.
The letter he got back stated that "releasing information on the locations of post boxes would clearly be likely to prejudice the commercial interests of Royal Mail", and that such information "would undermine their commercial value, significantly reducing Royal Mail's ability to exploit the information commercially". They even said that there was "significant public interest" in keeping the information private. OK, so I'm only an insignificant member of the public, but I've never shown any interest at all in keeping the whereabouts of our local post box a secret...
Obviously nobody at Royal Mail uses a road atlas, phone directory, sat-nav, or mapping website or they would have discovered that all of these show the locations of post office branches of Royal Mail. Surely these, each measuring several hundred square yards and often located in prime city centre locations, are more "commercially valuable" than the two square feet of pavement (sidewalk) taken up by a post box? Should I write and tell them about this alarming leak of commercially valuable information?
Of course, it could be that they are right about keeping valuable commercial locations secret. Just in case I've emailed the press office of a couple of national supermarket chains and hi-fi retailers, all of whom have a "Find your nearest branch" page on their websites. I haven't had any replies yet, but I confidently expect this dangerous feature to disappear from their sites very soon. I mean, just think of the commercial value of the ten acre town-centre site our local Tesco store inhabits. And they even have the naivety to display a huge sign on the roof!
And the same could just as easily apply to us here at Microsoft. I'm sure that the domain name alone is worth a few bob (dollars), and the huge number of sites and pages that hang off it must be of not inconsiderable commercial value. I need to warn our IT people that they should immediately remove us from all the DNS servers around the world, and disguise the sites so that people don't encounter them by accident and reveal their location. Just think how that could undermine their commercial value!
Mind you, as our roving local radio reporter pointed out, several people probably already know where the post boxes in his town are. Let's face it, a five foot high bright red box that, in many cases has been there since Victorian times, is hard to disguise. And if you were that interested, you'd only need to follow a post van on its rounds to find them. They even help you by painting the words "Royal Mail" in big letters on the side of the vans.
And I've just realized why I didn't get my invitation to the Royal Wedding! Obviously nobody would tell Kate where to find a post box to send it...
"Welcome back! You join us as Alex is trying to decide whether to act out his Star Wars fantasy with an R2 detour (D2-er, get it? Maybe not). With several hundreds of newly acquired gigs in the servers, will he risk upgrading from the so-last-decade Windows Server 2008 to the shiny new R2 edition? Especially now SP1 is out there."
In fact, now that I have plenty of room for new Hyper-V VMs it seemed like it was worth a try. As long as ADPrep doesn't screw up my Active Directory I can export the existing domain controller and other server VMs, import copies alongside the originals, and then upgrade the copies. If it all goes fruit-dimensional I can just dump the new VMs and fire up the old ones again. And if it does all work out OK I'll be less worried about upgrading the physical machine installations of 2008 that host the VMs.
So early on a Saturday morning I start the process. I've always dreaded running ADPrep since the time I tried to upgrade a box that started life on NT4 as an Exchange Server, was upgraded to Windows 2000 Server, and then upgraded again to Windows 2003 Server. The NT to 2000 upgrade required two days playing with ADSIEdit afterwards, and the 2000 to 2003 upgrade destroyed the domain altogether. However, this time the ADPrep 2008 to R2 upgrade ran fine on both forest and domain, so it was all looking peachy.
Have you ever wondered why things that go well are compared to peaches, while things that go wrong are pear-shaped? Especially as my wife can confirm that I will have absolutely nothing to do with hairy fruit (but that's another story).
And now I can expand the size of the VMs' disks in Hyper-V Manager and then extend the volumes using the Storage Management console within each VM's O/S to get the requisite 15 GB of free space. Then bung in the DVD, cross my fingers and toes, mutter a short prayer to the god of operating system upgrades, and hit Install. Except that it says I have to stop or uninstall Active Directory Federation Services (ADFS) first.
So I go and read about upgrading ADFS. This doc on MSDN for upgrading and uninstalling ADFS goes through all the things you need to do with IIS configuration, PowerShell scripts, and editing the Registry to properly remove the standalone v 2.0 installation. But another says that the R2 upgrade will just remove it anyway. There is an ADFS Role in 2008 R2, but note that this is ADFS 1.1 not 2.0. And I never managed to make this role work anyway; probably because I didn't do all the uninstall stuff first. If you want to run ADFS 2.0 I suggest you follow the full uninstall and clean-up instructions before you upgrade to R2. Then, after you upgrade the O/S, just download and install the ADFS 2.0 setup file for 2008 R2 (make sure you select RTW\W2K8R2\amd64\AdfsSetup.exe on the download page) instead of enabling the built-in Federation Service Role.
Next, install the 72 updates for R2 (thank heavens for WSUS) and then install SP1. And then some more updates. But, finally, my primary (FSMO) domain controller was running again. And most of the 100 or so errors and warnings in Event Viewer had stopped recurring. Except for a couple of rather worrying ones. In particular: "The DHCP service has detected that it is running on a DC and has no credentials configured..." and "Volume Shadow Copy Service error: Unexpected error calling routine RegOpenKeyExW(-2147483646,SYSTEM\CurrentControlSet\Services\VSS\Diag,[account name]). hr = 0x80070005, Access is denied".
Solving the VSS error is supposed to be easy - you can tell which account failed to access the Registry key from the message. Except that there is no account name in my error message. In this (not unknown) case, the trick with this VSS error, so they say, is to locate another error that occurred at the same time - which is usually the cause of the VSS error. In my case it seemed like it was the DHCP error, and this page on the Microsoft Support site explains how to fix it. I've never had this error before in Server 2008, but the fix they suggest seems to have cured the DHCP error.
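For reference, the fix in that support article boils down to giving the DHCP service its own low-privilege account to use for DNS registrations, so it no longer runs registrations under the DC's own credentials. A rough sketch (the account name and password here are just placeholders for an account you create for this purpose):

```shell
REM Configure dedicated credentials for the DHCP service to use when
REM registering client records in DNS (run on the DC that hosts DHCP).
REM "dhcpdns" is a placeholder account created just for this job.
netsh dhcp server set dnscredentials dhcpdns mydomain.com P@ssw0rd

REM Verify that the credentials were stored
netsh dhcp server show dnscredentials

REM Restart the DHCP service so it picks up the new credentials
net stop dhcpserver
net start dhcpserver
```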
Deleting a DHCP entry in DHCP Manager and then viewing DNS Manager shows that the machine is removed from DNS as expected, and ipconfig /renew on that machine creates a new DHCP entry that replicates to DNS. And no errors in Event Viewer, which hopefully indicates that it's working as it should. However, this hasn't so far cured the VSS error, and now there are no other errors occurring at the same time. But after some searching I found this page that explains why it's happening and says that you can ignore it.
Next I can upgrade the backup domain controller, and for some reason I don't get the same DHCP error even though it also runs DHCP (with a separate address range in case the primary server is down). Very strange... unless it was initially an AD replication issue when only one DC was running. Who knows? Though I do get the same VSS error here, confirming that it wasn't actually the DHCP problem causing it last time.
Anyway, at last I can tackle the more nerve-wracking upgrade of the base O/Ss of the machines that host the VMs. This time setup stops with a warning that I have to stop the Hyper-V service. However, this blog post from the Hyper-V team says I can just ignore this message and they are correct - it worked. The VMs fired up again afterwards OK, though the Server 2003 one did require an update to the Hyper-V Integration Services; which means you have to stop it again and add the DVD drive to it in Hyper-V Manager because you forgot to do that first...
One remaining cause of concern is the error on the primary DC that "Name resolution for the name [FQDN of its own domain] timed out after none of the configured DNS servers responded". NSLookup finds it OK, Active Directory isn't complaining, and everything seems to be working at the moment so it's on the "pending" list. A web search reveals hundreds of reports of this error, and an equally vast range of suggestions for fixing it - including buying a new router and changing all the underlying transport settings for the TCP protocol. Think I'll give that a miss for the time being.
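As with the earlier Virgin Media discovery, one quick way to check whether DNS suffix appending is muddying the water is to compare a plain lookup with a fully qualified one. The trailing period tells the resolver the name is already fully qualified, so it won't append the machine's primary DNS suffix:

```shell
REM Unqualified query - the resolver may append the machine's primary
REM DNS suffix before asking the server, which can produce bogus answers
nslookup microsoft.com

REM Fully qualified query - the trailing period suppresses suffix
REM appending, so you get the real answer (or a genuine failure)
nslookup microsoft.com.

REM Query against a specific known-good DNS server (address is just an
REM example) to rule out the locally configured servers altogether
nslookup microsoft.com. 8.8.8.8
```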
Of course, a few more upgrade annoyances arose over the next couple of days. On the file server that is also the music server the upgrade to R2 removes the Windows Media Services role. After the upgrade you have to download the Windows Server 2008 R2 Streaming Media Services role from Microsoft and install it, then enable the role in Server Manager and configure the streaming endpoints again. And, of course, it's been so long since you did this last time you can't remember what the settings were. Don't depend on the help file to be much use either.
And as with other upgrades and service packs, the R2 upgrade silently re-enables all of the network connections in the Hyper-V host machine's base O/S, so that the connections to the outside world are enabled for the machine that is typically on the internal network (see this post for details). You need to go back into the base O/S's Network Connections dialog and disable those you don't want. However, in R2 you can un-tick the Allow management operating system to share this network adapter option in Virtual Network Manager to remove these duplicated connections from the base O/S so that updates and patches applied in the future do not re-enable them.
But of much more concern were the effects of the upgrade on my web server box. After it was all complete, patched, SP'd, and running again I decided to have a quick peep at the IIS and firewall settings. Without warning the update had enabled the FTP Service (which I don't run) and set it to auto-start, then added a heap of Allow rules to the Public profile to allow FTP in and out. Plus several more to allow DCOM in for remote management. As usual, after any update, remember to check your configuration for unexpected changes. If you don't need the FTP service, remove it as a Feature in Server Manager, which prevents it from automatically enabling the firewall rules.
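If you want to check for the same surprises after an upgrade, a couple of commands cover most of it. This assumes the IIS 7.5 FTP service name (ftpsvc); yours may differ if you run a different FTP server:

```shell
REM Check whether the FTP service is installed and running
sc query ftpsvc

REM Stop it and prevent it from auto-starting again - note that the
REM space after "start=" is required by sc's odd syntax
sc stop ftpsvc
sc config ftpsvc start= disabled

REM List firewall rules mentioning FTP to spot any unexpected
REM Allow rules added to the Public profile
netsh advfirewall firewall show rule name=all | findstr /i "ftp"
```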
And a day or so later I discovered that the R2 upgrade also set the SMTP service to Manual start as well, so the websites and WSUS could no longer send email. The service started OK and so I set it to Automatic start and thought no more about it until WSUS began reporting three or four times a day that it was unable to send email. Yet testing it in the WSUS Email Options dialog reveals that it can send email. So I added the configuration settings in the IIS7 Manager for SMTP (even though I never had to do this before), and it made no difference. Every day I get an email from WSUS with all the newly downloaded updates listed, and three Event Log messages saying it can't send email. Perhaps next week it will start sending me emails to tell me it can't send email...
Finally, by late Monday evening, everything was up and running again. OK, there are still a couple of Event Log errors and warnings to track down and fix, but mostly it all seems to be working. And, I guess, the whole process was a lot less painful than I'd expected. O/S upgrades have certainly improved over the years, and I have to say that the server guys really did an excellent job with this one. It was certainly worth it just to be able to run the latest roles, and - at least so far - I even have proper working mouse pointers in all the VMs!
What I did notice is how, for a short period post upgrade, life seems a lot more exciting. Well, at least the server-related segments of my day do. Each reboot is accompanied by that wonderful sense of anticipation: Have I broken it? Will it restart? Will I get some exciting new errors and warnings?
It's as though the new O/S is a bit delicate and you need to handle it gently for a while. Like when you've just glued the handle back on your wife's favorite mug you broke when doing the dishes, and you're not sure if it will all just fall to pieces again. Until you're really convinced it's settled down you don't want to click too quickly, or wave the mouse about too much. Or open too many applications at one go in case it gets annoyed, or just can't cope until it's finally unpacked its suitcase and settled in.
Or maybe I really do need to get a life...
Oh dear. Here in this desolate and forgotten outpost of the p&p empire it's pretend-to-be-a-sysadmin time all over again. Daily event viewer errors about the servers running out of disk space and shadow copies failing (mainly because I had to disable them due to lack of disk space) are gradually driving me crazy. Will I finally have to abandon my prized collection of Shaun The Sheep videos, or risk my life by deleting my wife's beloved archive of Motown music? And, worse still, can I face losing all those TV recordings of wonderful classic rock and punk concerts? Or maybe (warning: bad pun approaching) I just need to find some extra GIGs to store the gigs.
Yep, I finally decided it was time to bite the bullet and add some extra storage to the two main servers that run my network and, effectively, my life. Surprisingly, my two rather diminutive Dell T100 servers each had an empty drive bay and a spare SATA port available, though I'll admit I had to phone a friend and email him some photos of the innards to confirm this. And he managed to guide me into selecting a suitable model of drive and cable that had a reasonable chance of working. The drives even fitted into the spare bays with some cheap brackets I had the forethought to order. Of course, it was absolutely no surprise when Windows blithely took no notice of them after a reboot. I never really expect my computer upgrades to actually work. But at least the extra heat from them will help to stop the servers freezing up during next winter's ice-age.
However, after poking around in the BIOS and discovering that I needed to enable the SATA port, everything suddenly sprang into life. For less than fifty of our increasingly worthless English pounds each server now has 320 brand new gigs available - doubling the previous disk space. Amazing. And after some reshuffling of data, and managing to persuade WSUS to still work on a different drive, I was up and running again.
Mind you, setting the appropriate security permissions and creating the shares for drives and folders was an interesting experience. One tip: if you want to know how many user-configured shares there are on a drive, open the Shadow Copies tab of the Properties dialog for that drive. It doesn't tell you where they are, but typing net share into a command window gets you a list that includes the path - though it includes all the system shares as well. And if you intend to change the drive letter, do it before you create the shares. If not, they disappear from Windows Explorer, but continue to live as hidden shares pointing to the old drive letter. You have to create new shares with the same name and the required path, and accept the warning message about overwriting the existing ones.
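The whole dance looks something like this from the command line (the share and path names are just examples from my setup):

```shell
REM List all shares, including the hidden system ones (C$, ADMIN$, etc.)
REM - the output shows the path each share actually points at
net share

REM Remove the orphaned share that still points at the old drive letter
net share Music /delete

REM Recreate it with the same name at the new path, with read access
REM for everyone (adjust the grant to suit)
net share Music=E:\Music /grant:Everyone,READ
```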
And now I can move a couple of the Hyper-V VMs to a different drive as well, instead of having all four on one physical drive. Maybe then it won't take 20 minutes for each one to start up after the monthly patch update and reboot cycle. So, being paranoid, I check the security permissions on the existing VM drive and the new one before I start and discover that the drive root folder needs to have special permissions for the "Virtual Machines" account. So here's a challenge - try and add this account to the list in the Security tab of the Properties dialog for a drive. You'll find, as I did, that there is no such account. Not even the one named NT VIRTUAL MACHINES mentioned in a couple of blog posts. But as the MS blogs and TechNet pages say that you can just export a VM, move it, and then import it, there should be no problem. Maybe.
Of course, they also say you can use the same name for more than one VM as long as you don't reuse the existing VM ID (un-tick this option in the Import dialog). Or you can use the same ID if you don't intend to keep the original VM. Obviously I can't run both at the same time anyway as they have the same DNS name and SIDs. So should I export the VM to the new drive, remove it from Hyper-V Manager, and then import it with the same ID? Or import it alongside the original one in Hyper-V Manager but allow it to create a new ID and then delete the old one when I find out if it works?
As the VM in question is my main domain controller and schema master, I'm fairly keen not to destroy it. In the end I crossed all my fingers and toes and let it create a new ID. And, despite my fears, it just worked. The newly imported VM fired up and ran fine, even though there are two in Hyper-V Manager with the same name (to identify which is which, you can open the Settings dialog and check the path of the hard disk used by each VM). And the export/import process adds the required GUID-named account permission to the virtual disk file automatically (though not to the drive itself, but it seems to work fine without).
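If a moved VHD ever does refuse to start with an access-denied error, the permission that the import process adds can be granted by hand with icacls. The account name is literally "NT VIRTUAL MACHINE\" followed by the VM's GUID; the path and GUID below are placeholders for your own:

```shell
REM Show the current ACL on the virtual disk - after a successful
REM import you should see an NT VIRTUAL MACHINE\<GUID> entry listed
icacls E:\VMs\dc1.vhd

REM Grant the VM's GUID-named account full access by hand if it is
REM missing (replace <VM-GUID> with the GUID from the VM's config)
icacls E:\VMs\dc1.vhd /grant "NT VIRTUAL MACHINE\<VM-GUID>":F
```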
What's worrying is how I never really expect things like this to just work. Maybe it's something to do with the aggravation suffered over the years fighting with NT4 and Windows 2000 Server, and the associated Active Directory and Exchange Server hassles I encountered then. I really must be paranoid about this stuff because I even insist on installing my Windows Updates each month manually rather than just letting the boxes get on with it themselves. So it was nice to see that Hyper-V continues to live up to its promise, and I'm feeling even more secure that my backup strategy of regularly exporting the machines and keeping multiple copies scattered round the network will work when something does blow up.
So anyway, having gained all the new gigs I need, should I finally risk my sanity altogether and upgrade the servers and Hyper-V VMs from Windows Server 2008 to Windows Server 2008 R2? I abandoned that idea last time because I didn't have the required 15 or so gigs of spare disk space for each one. But it seemed like as good a time as any to have another go at testing my reasonably non-existent sysadmin capabilities. Maybe I would even get properly working mouse pointers in the VMs with R2 installed.
So as they say in all the best TV shows (and some of the very dire ones), "Tune in next week to see if Alex managed to destroy his network by upgrading it to Windows Server 2008 R2..."