Random Disconnected Diatribes of a p&p Documentation Engineer
I watched a program on TV the other night about how your body clock works. It seems that when you are young, your body clock is "offset late" so you are useless in the mornings and tend to be a bit of a night owl. I guess this is useful so you can go to those all-night parties and clubs. When you get old your body clock is "offset early", so you have to go to bed at 6:30 PM and get up in time to watch breakfast TV and those weird quiz shows that nobody has heard of. I suppose this means that there are only a couple of weeks around the age of 35 when your life is actually aligned with the world around you. That's going to be my excuse in future, anyway.
And it seems that all this is the result of strict scientific investigation, and not just some university student making stuff up for his final exam dissertation. It's supposed to explain why extricating a teenager from their bed before lunchtime is about as easy as folding custard (or herding cats). In fact, there is a school in the North East of England where they are experimenting with delaying the start of lessons that require anything more than desultory half-awakeness until after 11:00 AM. Maybe this is a way to reduce traffic congestion - send kids to school for 10:30 in the morning so we can all get to work without being buried by a flock of 4x4s on the school run, and keep them there until we've had a chance to sit down after work and read the paper in peace.
They also say that "body clock research" (which surely has to be a made-up science) can predict the best time of day to have a heart attack or stroke, provide the reason why you feel tired after a beer at lunchtime, and tell you when to have sex. Now, I'm no expert, but I reckon I could figure that the best time to have a heart attack or stroke is never, the reason you feel tired is because that's what beer does, and - well - I'll refrain from comment on the remaining point.
Strange thing is that, in my advancing years, I should now be well into the "offset early" camp. According to a rough calculation on the back of a Notepad document, I should be drifting off to sleep at seventeen minutes past nine all this week. And be wide awake and furiously typing guidance and documentation by around ten to seven in the morning. I'd have to say that this doesn't bear comparison with reality. If I go to bed much before midnight I can't drop off to sleep, and I don't remember when I last saw any time prior to 8:00 AM on the bedside alarm clock. I've even tried following my wife's sage advice that "...it's about time you had an early night", but it seems to make little difference. Me and a zombie exhibit remarkably similar traits (and appearance, according to my wife) any time before about 9:00 AM and the second cup of coffee.
I put it down to the fact that I live on GMT and work on PCT (Pacific Coast Time). So being a night owl is useful because I'm generally still around in the evenings trying to catch up on work while my colleagues are yawning and scratching their way into the office. As long as it's before lunch time their time, I'm generally around to answer panic emails, ignore desperate pleas for completion of the latest important document, and attend conference meeting calls where all I can hear is distant mumblings and trans-Atlantic crosstalk on the line. On one occasion last week, I think I was in three meetings at the same time. I remember agreeing to a new wholesale price for bulk crayfish shipments, and an updated schedule for delivery of some pork bellies to Nebraska. I think we agreed on the appropriate terminology for describing presentation layer components as well, but I can’t be sure about that part.
Maybe your body clock influences your choice of employment. Or maybe it’s the other way round - your choice of career actually changes your body clock schedule. I mean, you'd have to assume that postmen (sorry, postal delivery workers) are offset early, and that night-club bouncers are offset late. So what about us in the IT world? I've noticed that the p&p office is not exactly bursting with activity at 8:00 AM, or even 9:00 AM, most days. Yet there are still plenty of people hunched over keyboards late into the evenings. Do you actually know any "offset-early" IT people?
I suspect that there is a crisis at our local council offices at the moment. They've obviously run out of things to waste taxpayers' money on, so they decided to publish a ten-page full-color pamphlet containing really useful information about our local community. On page three, it says that - in case we hadn't noticed - work is underway on the open-cast coal mine just across the fields from where I live. Really? I would never have guessed that the brand new railway, dozens of huge trucks, and a hole half a mile wide and a hundred feet deep were connected with that.
Of course, there is some less-blindingly-obvious information in there as well. Like the fact that the local post office has had to close because the postmaster is in prison (we live in an exciting area); and news that in the village next to us they are going to concrete over the field where all the kids play, then spend thousands of pounds making it into a kids' play area. But what struck me most was the incredible number of misspellings and serious grammar errors in the ten pages that - in total - hold no more than about 30 complete sentences. Does nobody read this stuff before they send it out? Or is it some covert scheme to try and make everybody think our local district is run by idiots? As if we needed convincing...
Still, at least they put a nondescript photo of some unidentifiable area of countryside covered in snow on the cover to cheer us all up. Obviously they were ensuring they didn't fall into the "wrong city" trap like the council that runs the second largest city here in England did a while ago. Maybe you saw it in the papers - it even made it into the USA Today newspaper (which they deliver to all hotel rooms in the US - whether you want it or not). Somebody probably asked a junior editor in the "community communication" department to search the Web for a picture of Birmingham. When the thousands of leaflets were distributed across the city, several people remarked that they never realized there were so many skyscrapers in Birmingham. Of course, there aren't. They'd put a photo of Birmingham, Alabama on the cover (see http://news.bbc.co.uk/1/hi/england/west_midlands/7560392.stm and http://blog.al.com/spotnews/2008/08/birmingham_england_officials_c.html).
Anyway, coming back to the topic of this post (spelling and grammar in case you've forgotten), maybe it's the fact that I work with words and documents that means I tend to spot mistakes, and that they annoy me so much. But the best ones are often amusing in a silly kind of way. For example, I got very nervous reading about how you create Office Business Applications (OBAs) when I found a note in the documentation about how they are really useful "...when you have a rage of documents to handle". Now I have to keep checking if there are any angry spreadsheets on my computer, and I wonder if my virus checker will detect furious Word documents. Maybe there is an irateness rating for emails that my spam filter can use? On a scale of one to ten, move anything over 4.5 into the Junk Mail folder.
I also came across an article by somebody who writes data access code the same way as I do - just gather together some keywords that sound like they might be appropriate, add a few randomly named variables, and mix it all up until it does something useful. At least that's what I assumed they meant when they said that "...the best approach is to use a stired procedure." But I reckon the best of all was the article that described how "Exception management and logging are often not sufficient in enterprise applications, and you should consider complimenting them with notifications". I tried this - but after half an hour, I ran out of accolades and flattering remarks without seeming to achieve any positive effect on the application.
I'm starting to worry that I can’t cope with the frantic releases of operating system versions. I just got settled with a couple of Vista machines and, more recently, two Server 2008 boxes, and now I'm being pushed to "dogfood" Windows 7. I wonder if I should install it on the machine I use for all my important work, or on the laptop I depend on when travelling. I know I tend to be somewhat conservative in terms of upgrading to the latest cool software, but neither of these options seems like a really good idea with a beta operating system.
I guess it's OK if you work onsite - you can just throw the machine at the local systems admin guy if it toasts itself, and pick up another from the stores in the meantime. Mind you, I've seen a few of my colleagues using it and they seem happy enough that it does what it says on the Start button. Maybe I should install it in a VPC, or on a machine I don't use for anything important. But what good would that be, unless I also install all the tools and software I use every day, configure everything to work like it needs to, and then put up with using a machine that I had sidelined because it was too old or slow to be practicable? So I'm probably not much help as a dogfooder.
Hmmm... I wanted to write "dogfeeder" there, but that sounds like someone who works in a kennels. Perhaps it should be "...not much help with dogfooding" (but not "dogfeeding"). Is there a conjugation (or declension) for the verb "dogfood"? Something like "I dogfood, you dogfood, he dogfeeds, we dogfed". Probably it’s the same as the one for the verb "impact". And people accuse me of making up words...
Anyway, I'm not sure yet I've even got the knack of the "version 6" stuff. One of my Vista laptops decided to display two account icons on the startup screen when I changed my account password last time, and I haven't managed - despite a great deal of poking and swearing - to get rid of either of them. When it boots, it immediately displays the message "Incorrect password", but then logs in without prompting for a password when I click either of the icons. Despite endless fiddling with account management dialogs and saved password configurations, I can’t resolve it. Maybe I should install Windows 7 on this machine just out of spite.
Though I did solve (partially) a Windows Server 2008 issue this week. Since I moved over to Server 2008, I've had endless problems with batch files and scheduled tasks. I have a series of batch files that use XCOPY to duplicate and mirror data around my network and onto various backup stores. Everything worked fine with Windows 2000 and Windows 2003 Server, but there seem to be some weird things with Server 2008.
For example, my XCOPY batch file uses the /m switch to copy only files with the archive attribute set, and turn off this attribute after copying. But when run from Task Scheduler (either as a timed event or started manually) it copied the files but did not clear the archive bit. The result was that every night when it ran, it copied all the files again. I tried the /d switch, but it still copied files even when they already existed on the target drive. I guessed it was because I was using the NET USE command in the batch file to get access to the NAS drive (see my previous blog post "Herding Buffalo" for an explanation). So I created a new separate batch file that just contained the command to clear the archive attributes on the source files:
attrib -a d:\myfiles\backup\*.* /s
When I ran this by double-clicking on it in Explorer, it worked fine. So I created a new scheduled task to execute this batch file a couple of hours after the XCOPY batch file had finished. But, even though I set up the scheduled task to run under a domain admin account, it simply reported "Access denied" for every file. So the fact that the XCOPY command does not work properly is obviously nothing to do with NET USE, or the fact I am accessing a Linux-based NAS drive as the target for the copy. It simply doesn't have permission to update the archive bit on the source files.
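To see why a backup that can't clear the archive bit ends up copying everything every night, here is a minimal sketch of the /m logic in Python. It only simulates the attribute bits in a dictionary rather than touching real files, and the function name `backup_pass` and the `can_clear_archive` flag are made up for illustration; the only real-world detail is the value 0x20 that Windows uses for the archive attribute.

```python
ARCHIVE = 0x20  # Windows archive attribute bit (same value as stat.FILE_ATTRIBUTE_ARCHIVE)


def backup_pass(files, can_clear_archive=True):
    """Simulate one 'copy if archive bit set, then clear it' run.

    files maps a file name to its attribute bits and is updated in place.
    Returns the names that were copied on this pass.
    """
    copied = []
    for name, attrs in files.items():
        if attrs & ARCHIVE:                    # only files flagged as changed
            copied.append(name)
            if can_clear_archive:              # what /m is meant to do afterwards
                files[name] = attrs & ~ARCHIVE
    return copied


# Normal behaviour: the second night has nothing left to copy.
files = {"report.doc": ARCHIVE, "old.txt": 0}
print(backup_pass(files))
print(backup_pass(files))

# "Access denied" behaviour: the bit never clears, so the file is copied again.
locked = {"report.doc": ARCHIVE, "old.txt": 0}
print(backup_pass(locked, can_clear_archive=False))
print(backup_pass(locked, can_clear_archive=False))
```

The second pair of runs is roughly what the scheduled task was doing: the copy succeeds, the flag stays set, and the same files qualify again the next night.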
Now, I kind of suspected that a domain admin account would be a good choice for doing domain administration stuff. Obviously that's not the case in Windows 2008. So I did what every amateur part-time administrator does in these circumstances - wandered across to TechNet and asked them the question of life, the universe, and why on earth I don't have permission to update files on my own computer. After some exploration, and a fortunate choice of keywords (and swear words) I tracked it down to the User Account Control (UAC) feature.
I know I'm a bit vague at the best of times, but I never knew that Windows 2008 had UAC built in. I know that it's been the bane of many people's lives on Vista, but it's not something you'd expect to find lurking in a box where the guy on the keyboard is most likely to be the administrator for most of the time. And it seems that there is a new Admin Approval Mode (AAM) configuration mode as well. If you go off and read the TechNet page "User Account Control" (it's OK to pretend that you understand most of it), you will probably grasp as I did that administrator accounts in Server 2008 get two security tokens. One is a low-trust token used for most activities to help protect against malicious code having full access to the machine. The other token is a full-trust one that is used, according to TechNet, "...when the user attempts to perform an administrative task." It doesn't say when or how the system knows which token to use, or how it knows if the request is a malicious one... but I'll accept that all this just works like it should.
So maybe this is why I don't have permission to update my own files? Does Task Scheduler use the low-trust token for the administrator account I specify to run the task? Or does all this apply to just local administrator accounts and not to domain admin accounts? In fact, AAM is disabled for the built-in Administrator account in Server 2008, but not for other accounts. So Task Scheduler is probably starting all the backup scripts in low trust mode even though it's using a domain admin account.
How do I get round this? I did start reading about the Group Policy settings to manage it, and the Registry entries involved, but then noticed the checkbox marked "Run with highest privileges" in the General tab of each scheduled task. I figured it couldn't do that much harm (and I have a recent backup), so I tried it. And, as you probably guessed, it solves the problem. It forces Task Scheduler to start the task using the full-trust token instead. My backup script now works like it should.
Except now I just noticed that the mouse pointer on my Hyper-V VPC has gone back to that stupid "thin up arrow" cursor again (see "Cursory Distractions"). Oh well, I guess optimism and computers never did mix that well.
First off, I need to apologize to all those people who have been reduced to reading my previous "Hyper-Ventilating" posts hoping to find some crumb of comfort to alleviate their crippling medical condition. It seems from the analysis of Web search requests for those posts that more people are ill than are using Hyper-V. I suppose that's reasonable, and will perhaps teach me to stop using misleading (and often incomprehensible) titles for my posts. A bit like this one, I guess.
Ah, but no, this week's title is actually highly accurate! Because, despite endlessly installing and re-installing the Hyper-V integration components in my Windows XP and Windows Server 2003 virtual machines, Hyper-V seems totally incapable of displaying the appropriate cursor or mouse pointer. It always starts off OK, leading me to believe that it was just some intermittent aberration last time, and from now on I'll be able to see the pointer wherever it gaily wanders across the multiple windows and icons of the vast 800 x 600 pixel screen estate it has to play in.
And then it changes to a thin black "up arrow", or an hourglass, or even the tiny little square that's supposed to indicate that Hyper-V is letting you see the virtual machine even though you haven't clicked on it yet. But I'll say one thing - it's certainly making me learn where the "hotspot" is on each of the multitude of pointer and cursor shapes. No doubt that will be useful in some future life after the Government of the People's Republic of Europe bans all pointer shapes except "East-West-Arrow" or "No Drop". Maybe Microsoft should be planning their court appeal now, along with the one where the EU decide that MS has to take keyboard support out of the operating system in order to "level the playing field for competitors" (OK, so I made that one up).
Anyway, I know I'm not alone on this one. There are literally several people complaining on the Web about the same problem. Of course, I was quietly confident that they would fix it in this month's Patch Tuesday selection, but it seems not. Although, so far, the pointer appears to be firmly jammed in "standard arrow pointer" mode now, which isn't so bad. I can probably risk changing the desktop background back from sky blue into something that doesn't require me to wear sunglasses, and still be able to find the pointer on the screen. Before you ask, yes I did try changing the mouse pointer scheme in the virtual machine to one that's easier to see, but it's obvious after doing that (with its total lack of effect) that it is the Hyper-V runtime that's actually generating and managing the mouse pointer. So perhaps I should be grateful I actually get a pointer at all. I suppose I could try changing the mouse scheme in the base O/S instead...
And, while I'm ranting about Hyper-V, can we have some absolute guidance on how to do time synchronization please? After the aforementioned medical condition, this seems to be the second most popular search that finds my posts. Despite the fix I put in place some weeks ago, I was still getting a raft of errors in the Event Logs. Some helpful feedback from Virtual PC Guy and his colleagues suggested that there is a delay in the time synchronization from the Hyper-V base O/S, and that can cause synchronization failures against external time providers or domain controllers.
The problem I found is that, even if I remove all time provider details from domain member virtual machines, they insist on searching for a time provider and inevitably end up snuggling up to the domain controller (as you'd expect). I did wonder about disabling the Windows Time service on virtual machines and just allowing the base O/S to set the time - I reckon I can live with being half a second adrift from the rest of the world. But I have no idea if that would break anything else, and nobody seems able to tell me. And other than disabling the service, I can't see how else you can prevent it from searching for time providers. The other alternative, and the one I've now settled with, is to set up the domain controller(s) as reliable time servers and turn off the Hyper-V "VM IC Time Synchronization Provider" to prevent time synchronization from the base O/S.
Start by opening the Hyper-V Manager and select the virtual machine - it doesn't matter if it's running or not. Open the Settings dialog and click on the Integration Services section near the bottom. Then simply uncheck the setting for Time Synchronization. Now go to Configure the Windows Time Service and follow the instructions. To find an NTP server to use, check out the NTP Pool Project. A bathing costume is not required.
Before we start, I want to make it clear that - although I often use US spelling in stuff I write - I refuse to accept that "tire" is a way of spelling the round black things that you put on a car. I'm English, and tired (sorry) of seeing that weird spelling, so from here on in we'll be using the proper spelling: "tyre". And, annoyingly, Word has just red-wigglyed that now I've typed it. I guess that's an indication of how I have to produce most of my verbiage with Word set to US English. And this post is not even about spelling or languages. What is it about? I suppose it's kind of another grumble about technology in general. And about measuring stuff. So, if you are already in a bad mood, this might be a good place to stop reading and go off and do some yoga or listen to a Coldplay album.
It all started when my wife came home with a gleaming new set of bathroom scales (though, as there was only one of them, maybe it should be "a bathroom scale"). Like most gadgets and appliances these days, it proudly proclaims that it has a "bright and easy-to-read" digital display. And, more than that, it can tell you all about your health - things like your body mass index, water retention rate, and overall wellbeing. And probably your shoe size as well. "Amazing!" I thought, "I wonder how it knows all that".
The answer is, of course, that when you first put the batteries in, it asks you loads of questions such as your height, age, body shape (you get five choices for this one), exercise regime (four choices), and sex (you only get two choices for this one). This would be OK, but we unfortunately fell at the first hurdle. Despite wiggling the switch underneath to tell it to display in stones and pounds, it insisted my wife enter her height in centimetres. We are both of that lost generation who learned imperial measures at school, and now find ourselves cast adrift into a whole new world of metric things. It's like going in to work one day to find the management has ruled that everyone and everything will now be spoken/read/written in a foreign language. One you don't understand.
I know that a foot is around 300 millimetres, or 30 centimetres, or 0.3 metres (see how complicated it is already), and a metre is a bit over a yard. So I can do mental calculations such as converting "it's about 300 metres on the left" into roughly 325 yards or nearly 1000 feet. Of course, everyone in England blames the French for us losing our proper system of measures (it must be their fault because the letters in "meter" are the wrong way round). Just because they think it's easier than remembering that there are 12 inches in a foot, 3 feet in a yard, 5.5 yards in a perch, 4 perches in a chain, 10 chains in a furlong, and 8 furlongs in a mile. I mean, what's complicated about that?
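For anyone who doesn't carry the imperial ladder around in their head, here is a small sketch that multiplies it out. The factors are the ones listed above; the table layout and names such as `LADDER` and `inches_per_unit` are just my own illustration, and the only extra fact assumed is that an inch is exactly 2.54 centimetres.

```python
from fractions import Fraction

# Each entry: (unit, how many of the previous unit it contains), starting from inches.
LADDER = [
    ("foot", 12),
    ("yard", 3),
    ("perch", Fraction(11, 2)),   # 5.5 yards
    ("chain", 4),
    ("furlong", 10),
    ("mile", 8),
]


def inches_per_unit():
    """Walk up the ladder, returning {unit: length in inches}."""
    inches = Fraction(1)
    table = {}
    for unit, factor in LADDER:
        inches *= factor
        table[unit] = inches
    return table


table = inches_per_unit()
for unit, inches in table.items():
    print(f"1 {unit} = {inches} inches")

# Check the "300 metres on the left" estimate.
INCHES_PER_METRE = Fraction(10000, 254)   # 1 inch = 2.54 cm exactly
yards = 300 * INCHES_PER_METRE / table["yard"]
feet = 300 * INCHES_PER_METRE / table["foot"]
print(f"300 metres is about {float(yards):.0f} yards or {float(feet):.0f} feet")
```

A mile comes out at 63,360 inches, and 300 metres at about 328 yards, so "roughly 325" is near enough for finding a turning on the left.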
So I used a calculator on the Web to do the conversion, and we finally got to the bit where it said "setting saved". Except that, when you stand on it, all it does for 10 seconds is display a fascinating series of flashing light trails across the "bright and easy-to-read" digital display, followed by "Error". We're nearly an hour in and still no sign of being able to find out our weights. Not even in some funny unit like kilograms. I know how to convert those into proper weights because I remember the mnemonic "One and three-quarter pounds of jam weighs about a kilogram". Except that, as someone pointed out a few weeks ago, it's not a very good memory aid because the important bit ("one and three-quarter") doesn't actually rhyme with anything. So it could just as easily be "one and a quarter", or "two and a half", or "one hundred and seventy three".
Anyway, having dispatched my wife to the store where she bought the scale to exchange it for a new one, we started the process again. At the end, when it came to the "stand on it and see what you weigh" moment, we got a different result this time. It said "Err-374". Thing is, we weren't interested in it telling us our BMI, liquidity ratio, or some meaningless number that relates to our wellbeing. Certainly, by now, my wellbeing was not what it was two hours ago. Why can't it just tell you your weight? Maybe even, and here's a shocking thought, by using a big calibrated spring with a pointer on the end that goes round a dial covered in numbers. After all, inside there's probably just a big spring connected to a sensor that sends signals to the chip that converts them into numbers (or not in our case). So it's not like it's any more accurate or reliable.
This worrying advance in digital displays is not confined to bathroom scales either. I splashed out a noticeable volume of cash some while ago on a digital tyre pressure gauge because all the old-fashioned ones I have give slightly different readings, and I thought it would be a good idea to get a proper accurate and reliable one to replace them. No chance. Not only does it decide at random what units to display the result in, but consecutive readings seem to vary by 20%. And none are near to the average of my old-fashioned manual ones. The one that seems to be most accurate is the "looks like a pen and the inside pops up" kind like my Dad used to use when I was a kid. In fact, it's probably the same one.
And here we come up against another unit protest. Tyre pressures have always been measured in pounds per square inch, which seems innately sensible because the common range goes from about 30 to near 60 so it's easy to read and adjust the pressure. Get within a couple of p.s.i. and it's fine. But my new car handbook has all the tyre pressures in BARs. Not even in a semi-believable foreign measure like millimetres per second or kilograms per hectare. Can you imagine the people who invented this in some design meeting?
Marketing guy: "We need a new way to measure the pressure in car tyres."
Engineer: "OK, how about we use a scale that starts at one and nearly goes up to three?"
Accountant: "Sounds great - that will save on printing costs and we'll need less numbers on the dial."
Manager: "That's a wrap, do it!"
OK, so I know that BAR is connected with atmospheric pressure, but now instead of "36 p.s.i. all round" I have to figure out where 2.48 is on the tiny dial of one of my other old-fashioned gauges that only has a marking every half a BAR. I suppose it's something to do with the European Union - most everything else is their fault. Just think how easy it would be if our global society could actually get to grips with using some standard measures and units for things. Starting, please, with time zones. I'm time shifted by 8 hours from my work colleagues, though for a couple of weeks of the year it's actually 7 hours or 9 hours because we can't even agree when daylight saving time starts and ends. Wouldn't it be easy if we all just switched to a single World time, like the Swatch Internet time that's been around for ages?
Except that we'd need to make sure it didn't change our four o'clock teatime here in England, when we all stop for cucumber sandwiches and a pot of Earl Grey.
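For the tyre pressure grumble above, the conversion itself is only one multiplication. This is a rough sketch rather than anything from the car handbook: it assumes the usual figures of about 6894.757 pascals to the p.s.i. and exactly 100,000 pascals to the bar, and the function names are my own.

```python
PA_PER_PSI = 6894.757     # pascals in one pound-force per square inch (rounded)
PA_PER_BAR = 100_000.0    # pascals in one bar, exact by definition


def psi_to_bar(psi):
    """Convert pounds per square inch to bar."""
    return psi * PA_PER_PSI / PA_PER_BAR


def bar_to_psi(bar):
    """Convert bar to pounds per square inch."""
    return bar * PA_PER_BAR / PA_PER_PSI


for psi in (30, 36, 60):
    print(f"{psi} p.s.i. = {psi_to_bar(psi):.2f} bar")
```

On those figures, 36 p.s.i. works out at roughly 2.48 bar, which is not much help on a dial that only has a mark every half a bar.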
I read in the newspaper this week that scientists have discovered why men are better at reading maps, while women are more able to find things like car keys. It seems that it all goes back to pre-history behavioral patterns and responsibilities. Men had to travel long distances hunting, and so had to be able to navigate. Women foraged locally for food, so needed a keen eye for detail. Now I don't want to appear sexist, but I have to say that, at least in our house, reality tends to match that assertion. Mind you, one guy wrote in to the paper to say that his wife was really good at map reading - as long as they were heading north.
Even more interesting, I've come to the conclusion that, for some indeterminable reason probably long lost in the mists of time, women who like cats tend to marry men who work with computers. It's kind of hard to justify this on a "back at the dawn of time" basis, or in terms of a theory based on studies of Stone Age cave paintings. Mainly, I suspect, because there was a noticeable scarcity of computers in those days. And it's probably a long shot to try and find some comparison between map reading abilities and an innate ability to absorb computer language syntax and structured architectural design patterns. Other, of course, than using a sat-nav.
Not being one to jump to wild conclusions, I have obtained solid statistical evidence of the marital preference assertion. My wife is a cat-lover, and does a lot of work raising money for our local cat sanctuary. The lady who runs the sanctuary is married to a guy who writes medical software for hospitals. One of the ladies who help to run the money-raising jumble (rummage) sales is married to a guy who works for a well-known manufacturer of routers and switches. And her friend is married to a guy who runs a business doing computerized accounting systems.
And what's really weird about the cat rule is that, when I first met my wife, I didn’t actually work with computers. I was a salesman. OK, so I was selling windows at the time, but they were ones you put into buses, trains, and office blocks - not the one I spend my days fighting with now. So obviously the theory extends to future employment prospects as well.
Perhaps this could be adapted into some kind of suitability test for prospective employees. Instead of all those long and complicated interviews, psychological profiles, and adaptability tests, you just need to ask the geek the other side of the desk how many cats they've got. It would work on a sliding scale: one moggy, suitable for general development tasks. Two Seal-point Persians, obviously a prime candidate for program manager. A British Blue and a Cornish Rex, would do well in technical support and systems administration.
Of course, what would be really useful would be to establish if there is some reason for this amazing compatibility situation. While avoiding any puerile reference to mice, what subtle traits do cats and computer programmers share that attracts a woman to both? Is it the inscrutable independence, the haughty view of the world as being there only to satisfy their whims, or the fact that - despite giving every outward appearance of being asleep for 23 hours a day - they are actually alert to everything going on around them? Or maybe it's just that both have an inbuilt sense of independence. I suspect that the term used to describe some hopeless task as being "...as easy as herding cats" might work just as well as "...as easy as herding computers". I mean, how often does your computer do exactly what you expect...?
Meanwhile, I've also discovered that women who like dogs tend to marry plumbers or policemen. OK, so I only know one plumber and one policeman, but we're getting 100% compliance to the rule here as they both have very large dogs.
Every now and then I get to write actual code rather than just documentation. Usually there's either a crowd watching in amazement that I can actually find Visual Studio, never mind knowing some of the magic keywords that make it all work when you press the green arrow button. Or else everyone is cowering behind their desk in case my computer can't cope with the culture shock and explodes. Isn't it wonderful when everyone has so much faith in your capabilities - after all, I've read the .NET Architecture Guide (endlessly, as I've been working on it for the last year) so I ought to know a bit about this stuff.
Unfortunately, as I've rambled about in previous posts (such as "How p&p Makes Cheese Sandwiches"), my programming tasks tend to involve building kludges and "temporary fix" tools to solve problems that are either too esoteric to be of any use to people outside our small documentation team, or are a stop-gap until the proper tool gets upgraded next time. OK, so I used to be a consultant and I wrote a few Web apps for customers, but I'd hesitate to publish "best practice compliance" figures for those. Especially as they were also usually kludges required to get software the company had paid big money for to work how they wanted it to.
So, anyway, last week I decided to put together a rough tool to help us find broken and incorrect links in a documentation set that builds to create a CHM or HxS file. The issue is that, although the authoring tools we use can create links between topics and within topics, you can't easily (or at all in some cases) check if these links are valid. They may point to a topic that you removed, or the target topic may have changed (so has a different auto-generated topic filename), or they may just point to the wrong topic. We do use a link checker utility to find broken links, but it can't find links that go to the wrong topic or links that point to an anchor (bookmark) in the same page that does not exist. The only way we can verify these is through manual testing ("click and read").
In theory, the process for automatically checking the links that the link checker cannot verify sounds relatively simple. Every topic page is an HTML file generated by the documentation tools from the source Word documents. The text of a link that points to a separate topic should have the same text as the title of that topic. And a topic containing an in-page link should contain an anchor that matches the bit after the "#" in the link href. So it's just a matter of applying some processing to each HTML file to verify these rules. I could do it by reading the pages one by one using MSXML, or just open them as text files and read the source that way.
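For anyone who wants to see those two rules written down, here is a minimal sketch in Python rather than the .NET code the tool actually uses. Everything in it is an assumption made for illustration: the function name find_link_problems, the regular expressions, and the idea of passing the pages in as a dictionary of file name to HTML text. Real generated HTML may well need something more forgiving than these patterns.

```python
import re

# Simplified, hypothetical patterns; they assume double-quoted attributes.
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>',
                     re.IGNORECASE | re.DOTALL)
ANCHOR_RE = re.compile(r'<a\s[^>]*\b(?:name|id)="([^"]*)"', re.IGNORECASE)
TITLE_RE = re.compile(r'<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def find_link_problems(page_name, pages):
    """Return a list of problem descriptions for one topic page.

    pages maps each topic file name to its HTML source text.
    """
    html = pages[page_name]
    anchors = set(ANCHOR_RE.findall(html))
    problems = []
    for href, text in LINK_RE.findall(html):
        if href.startswith(("http:", "https:", "mailto:")):
            continue  # external links are left to the ordinary link checker
        target, _, fragment = href.partition("#")
        if not target:
            # In-page link: the bit after "#" must match an anchor in this page.
            if fragment not in anchors:
                problems.append(f"{page_name}: missing anchor #{fragment}")
            continue
        if target not in pages:
            problems.append(f"{page_name}: missing topic {target}")
            continue
        # Inter-topic link: the link text should equal the target's title.
        match = TITLE_RE.search(pages[target])
        title = match.group(1) if match else ""
        if text != title:
            problems.append(f"{page_name}: link text '{text}' != title '{title}'")
    return problems
```

Returning a list of problems, instead of writing straight to a log file, also makes a routine like this rather easier to check against a handful of deliberately broken pages.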
I chose the second approach for no better reason than that it seemed easier and quicker to build, and because I already had most of the code from another tool that did similar stuff to update various sections of the source files (such as inserting feedback links and index entries). All it involves is some judicious string handling, searching, and text comparisons. Of course, it got more complex as time went on because I found I needed to allow for optional settings such as allowing the case of titles to differ, and ignoring leading and trailing spaces in topic titles. But it was fairly easy to stir in a mixed selection of semi-appropriate keywords and variables, and bring the whole lot slowly to the boil.
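The optional settings boil down to tidying up both strings before comparing them. A rough Python sketch of the idea follows; the function name titles_match and the two flag names are made up for this example and are not the names used in the real tool.

```python
def titles_match(link_text, title, ignore_case=False, trim_spaces=False):
    """Compare a link's text with a topic title under optional relaxations."""
    if trim_spaces:
        # Ignore leading and trailing spaces on both sides.
        link_text, title = link_text.strip(), title.strip()
    if ignore_case:
        # casefold() is a slightly more thorough lower() for comparisons.
        link_text, title = link_text.casefold(), title.casefold()
    return link_text == title
```

With both flags off the comparison is exact, so "Topic B" and " topic b " only match once both options are switched on.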
Did I use agile development methods, as I know I should (especially after working with the p&p dev teams for so long)? Well, if you can call throwing it together and fixing broken bits afterwards "agile", then yes. But, in reality, no. I should have started by writing tests, but the tool simply reads the files and dumps the results out as a log file so how do I write tests for that? Probably I should have divided the task into a series of separate routines for each minor action, and written tests for each one, but that seemed overkill for such a simple project. I did all the testing as I went along by adding errors into a set of existing source docs and then checking that each one was detected as I added features to the code.
After it all seemed to be working, confirmed by fairly comprehensive testing by my advisory panel and beta test team (Hi, Nelly), I decided I should do the agile "refactor" thing. Except there was just one chunk of code that got used more than once (twice in fact), and it was only three lines. Perhaps I could write some generalized code routine with half a dozen parameters for some of the functionality, and call it from the three or four places in the code that seemed appropriate. Would that be more efficient than repeating half a dozen lines of code in four places? It would be easier to maintain, but probably take time to code, fix, and retest everything again.
Next, I did the "make it more efficient" thing. One thing the code needs to do for each inter-topic link it finds in every page is extract the title from the linked page (if it exists) to see if it matches the text in the link. First time round, I had a routine that opened the file, located the <title> element, and returned the contents. But this meant I was opening every topic multiple times; for example, every page contains a link to the contents topic so it gets opened and read for every page in the doc set.
Obviously this is hugely inefficient, even if .NET caches the file contents. So, I simply added code to save each title it found in a Dictionary using the file name as the key, and then look in the Dictionary before reading the disk file. That way the code only opens and reads each topic page once. Makes sense, but when I ran the code again it made almost no difference. I did some comparative tests, and the reduction in time taken to process the doc set was averaging somewhere around 5%. And as the tool, running on a fairly average desktop machine, can process a doc set containing 465 HTML topic files in less than 5 seconds, how much time and effort should I put into fine tuning the code? Especially as you only run the tool a few times at the end of a project after the docs are complete and you are ready to build into a CHM or HxS...
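The caching itself is only a few lines. Here is a hedged Python equivalent of the idea, using a plain dict where the tool uses a .NET Dictionary; the names get_title and _title_cache, the title pattern, and the UTF-8 encoding are all assumptions for the sake of the sketch.

```python
import re
from pathlib import Path

TITLE_RE = re.compile(r'<title>(.*?)</title>', re.IGNORECASE | re.DOTALL)

# Hypothetical module-level cache: file name -> title (None if not found).
_title_cache = {}


def get_title(path, cache=_title_cache):
    """Return the <title> text of a topic file, reading the disk at most once."""
    key = str(path)
    if key not in cache:
        try:
            text = Path(path).read_text(encoding="utf-8", errors="replace")
        except OSError:
            cache[key] = None  # remember missing files too
        else:
            match = TITLE_RE.search(text)
            cache[key] = match.group(1) if match else None
    return cache[key]
```

Missing files are cached as well, so a broken link that appears on every page only costs one failed disk read.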
Don't get me wrong, I'm not in any way suggesting you should ignore the principles of coding best practice, agile development and pre-written tests, and proper pre-release validation. But, sometimes, just getting stuff done by relying on the power of modern machines and software environments does seem appropriate. Though I guess I'm not likely to get any jobs building "proper" enterprise applications now I've 'fessed up...
I'm a great believer in the future of "cloud" computing. It seems to be the way forward for both large and small organizations to maximize return on investment and reduce the complexity of managing their own hardware. Not that I'm one to talk about simplifying technology requirements after the past three weeks of virtual notworkingness with new servers, Windows 2008, and Hyper-V (though, to be fair, it eventually evolved into mainly-workingness-with-odd-broken-bits). One thing it has exposed me to, however, is some of the problems that seem to be gathering in the cloud.
Like an increasing number of people, my shopping regularly involves a Web browser and debit card, rather than traffic queues and the search for a car park space. I'd like to think I get a better deal on prices as well, though that's not my primary motivation. And I tend to use companies that I know and respect, rather than chancing my luck with some fly-by-night I've never heard of. Though, thankfully, when I do have to go outside my usual comfort zone, obviously taking appropriate precautions, it's mostly been a problem-free experience (try buying inline Ethernet surge protectors from your "regular supplier" to see what I mean).
I'm one of the many millions of Web-based shoppers who recognize Amazon as a reliable and trustworthy supplier, and I regularly use our local (.co.uk) site to order a variety of stuff I need (or just want). Yes, music, films, books, the usual things; plus, increasingly, electronics and computer-related stuff such as cables and switch boxes. OK, so when I ordered two APC UPSs a while ago they "lost them in transit" and ended up refunding the money, but it was all pretty efficient and painless.
So what suddenly changed? Or has it been a gradual process that only recently rubbed sufficiently to be annoying? The problem seems to be that they are now a "store front" as well as a supplier. When you find something you need, especially stuff other than books, films, and music, it invariably comes from an "associate" that you've never heard of. In some ways, I applaud that. They're providing a great opportunity for small companies who would never be able to build an effective Web presence otherwise. But, in other ways, I wonder if it is damaging their core business. It has certainly changed my behavior.
Let me elaborate. The parcel delivered against one recent order contained, not the electric radiant heater ordered and confirmed, but 1000 empty DVD library cases. Instead of talking to Amazon (if you can actually find an email address or posting facility) I have to directly contact a supplier I've never heard of. In another case, a faulty mobile phone that was my wife's Christmas present had to go away somewhere (I've no idea where) to "be examined". OK, so both purchases were sorted out after a couple of weeks, but I'd have preferred to deal with somebody I know and trust (such as Amazon) rather than having to look at "ratings" and decide if I want to trust some other firm. The redeeming feature is that, I guess, you can go back to Amazon if it all goes pear-shaped.
But the final straw was trying to buy a couple of USB cables, a USB extension cable, and two UPS power extension cords to finish off the network upgrades that have generally blighted my festive season. I found what I wanted easily enough, and was impressed by the prices. But, reaching the checkout, I discovered that the five cables were coming from three different suppliers - each one charging postage and packing. And one of them was trying to charge 18 pounds for post and packing on standard 3-5 days delivery - on an order consisting of two USB cables costing around 3 pounds each! In total, for goods to the value of 17 pounds, I was expected to pay more than 26 pounds post and packing...
Instead, I went back to the main site and searched for products by specifying the name of one of the suppliers (not the 18 pounds delivery one) figuring I'd order all the stuff from one supplier. But examining the items in the list revealed that they were still all from different suppliers. Maybe each supplier puts their competitors' names in the search field to improve the number of hits? After about 40 minutes, I gave up and ordered the whole lot from one of my other regular suppliers (Dabs.com) who have never failed me yet - though I'm touching a large chunk of wood as I write this.
So what's gone wrong with the "cloud" approach in this particular scenario? Thing is, if I want to deal with people I've never heard of who work out of a back bedroom, I can use eBay. Have Amazon damaged their brand by allowing suppliers to hide within their product lists, and by not providing enough interaction in terms of getting support or actually submitting a comment? Or are they bravely promoting the concepts of the cloud and providing opportunities to small suppliers who would otherwise struggle to reach market?
I eventually ended up on some "Your Account" feedback page where I complained about the post and packing cost thing, but I have no idea where my feedback went, or if I was wasting my time. And, strangely enough, the next day I was talking to a friend who I know is an active Web shopper and told them about my experiences. And their response? "Oh yes, I know what you mean, that's why I only ever use Amazon for books and CDs these days..." Maybe this is an issue that new partakers of cloud-based services need to actively address. I can appreciate that part of the ideal of Web trading is to get rid of the need to handle emails and phone calls, but I reckon most people still value the capability to buy from someone they know, and actually talk to somebody when the need arises, or at least get some prompt response and a solution - without having to jump through hoops just to submit a query.