Random Disconnected Diatribes of a p&p Documentation Engineer
I don't know if General Custer ever made a last stand against the Apache, but I feel like I have. My Apache is, of course, the Hadoop one. Or, to be technically accurate, Microsoft Azure HDInsight. And, going on experience so far, this is unlikely to actually be the last time I do it.
After six months of last year, and about the same this year, it seems I've got stuck in some Big Data-related cluster of my own. We produced a guide to planning and implementing HDInsight solutions last year, but it's so far out of date now that we might as well have been writing about custard rather than clusters. However, we have finally managed to hit the streets with the updated version of the guide before HDInsight changes too much more (yes, I do suffer odd bouts of optimism).
What's become clear, however, is how much HDInsight is different from the typical Hadoop deployment. Yes, it's Hadoop inside (the Hortonworks version), but that's like saying battleships and HTML are the same because they both have anchors. Or cats and dogs are the same because they both have noses (you can probably see that I'm struggling for a metaphor here).
HDInsight stores all its data in Azure blob storage, which seems odd at first because the whole philosophy of Hadoop is distributed and replicated data storage. But when you come to examine the use cases and possibilities, all kinds of interesting opportunities appear. For example, you can kill off a cluster and leave the data in blob storage, then create a new cluster over the same data. If you specify a SQL Database instance to hold the metadata (the Hive and HCatalog definitions and other stuff) when you create the cluster, it remains after the cluster is deleted and you can create a new cluster that uses the same metadata. Perhaps they should have called it Phoenix instead.
We demonstrate just this scenario in our guide as a way to create an on-demand data warehouse that you can fire up when you need it, and shut down when you don't, to save running costs. And the nice thing is that you can still upload new data, or download the existing data, by accessing the Azure blob store directly. Of course, if you want to get the data out as Hive tables using ODBC you'll need to have the cluster running, but if you only need it once a month to run reports you can kill off the cluster in between.
But, more than that, you can use multiple storage accounts and containers to hold the data, and create a cluster over any combination of these. So you can have multiple versions of your data, and just fire up a cluster over the bits you want to process. Or have separate staging and production accounts for the data. Or create multiple containers and drip-feed data arriving as a stream into them, then create a cluster over some or all of them only when you need to process the data. Maybe use this technique to isolate different parts of the data from each other, or to separate the data into categories so that different users can access and query only the appropriate parts.
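The reason this works is that the data's address is independent of any cluster. As a rough sketch (the storage account and container names here are made up for illustration), this is the kind of blob-storage URI a cluster can be pointed at, in any combination:

```python
def wasb_uri(account, container, path=""):
    """Build a wasb:// URI of the form HDInsight uses to address data
    held in an Azure blob storage container. The account and container
    names passed in are placeholders, not real resources."""
    base = "wasb://{0}@{1}.blob.core.windows.net".format(container, account)
    return base + "/" + path.lstrip("/") if path else base

# Because clusters are transient but storage isn't, the same containers
# can be referenced again by the next cluster you create over them:
staging_sources = [wasb_uri("mystore", c, "incoming")
                   for c in ("january", "february")]
```

The point of the sketch is only that deleting the cluster changes nothing here: these URIs still identify the same blobs, ready for the next cluster (or a direct upload or download) to use.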
You can even fire up a cluster over somebody else's storage account as long as you have the storage name and key, so you could offer a Big Data analysis service to your customers. They create a storage account, prepare and upload their source data, and - when they are ready - you process it and put the results back in their storage account. Maybe I just invented a new market sector! If you exploit it and make a fortune, feel free to send me a few million dollars...
Read the guide at http://msdn.microsoft.com/en-us/library/dn749874.aspx
Probably there aren't many people who can remember when TVs had just six buttons and a volume knob. You simply tuned each of the buttons to one of the five available channels (which were helpfully numbered 1 to 5), hopefully in the correct order so you knew which channel you were watching, and tuned the sixth button to the output from your Betamax videocassette recorder.
As long as the aerial tied to your chimney wasn't blown down by the wind, or struck by lightning, that was it. You were partaking in the peak of technical media broadcasting advancement. Years, if not decades, could pass and you never had to change anything. It all just worked.
And then we went digital. Now I can get 594 channels on terrestrial FreeView and satellite-delivered FreeSat. Even more if I choose to pay for a Sky or Virgin Media TV package. Yet all I seem to have gained is more hassle. And, looking back at our viewing habits over the previous few weeks, pretty much all of the programs we watch are on the original five channels!
Of course, the list of channels includes many duplicates, with the current fascination for "+1" channels where it's the same schedule but an hour later (which is fun when you watch a live program like "News At Ten" that's on at 11 o'clock). Channel 5 even has a "+24" channel now, so you can watch yesterday's programs today. A breakthrough in entertainment provision, which may even be useful for the 1% of the population that doesn't have a video recorder. How long will it be before we get "+168" channels so you can watch last week's episode that you missed?
What's really annoying, however, is that I've chosen to fully partake in the modern technological "now" by using Media Center. Our new Mamba box (see Snakin' All Over) is amazing in that it happily tunes all the available FreeView and FreeSat channels and, if what it says it did last night is actually true, it can record three channels at the same time while you are watching a recorded program. I was convinced that it's not supposed to do more than two.
However, it also seems to have issues with starting recordings, and with losing channels or suddenly gaining extra copies of existing channels. For some reason this week we had three BBC1 channels in the guide, but ITV1 was blank. Another wasted half an hour fiddling with the channel list put that right, but why does it keep happening? I can only assume that the channel and schedule lists Media Center downloads every day contain something that fires off a channel update process. And helpfully sets all the new ones (or ones where the name changed slightly) to "selected" so that they appear in the guide's channel list. I suppose if it didn't pre-select them, you wouldn't know they had changed.
Talking with the ever-helpful Glen at QuitePC.com, who supplied the machine, was also very illuminating. Media Center is clever in that it combines the multiple digital signals for the same channel into one (you can see them in the Edit Sources list when you edit a channel) and he suggested editing the list to make sure the first ones were those with the best signal so that Media Center would not need to scan through them all when changing channels to start a recording.
Glen also suggested using the website King Of Sat to check or modify the frequencies when channels move.
This makes sense because Media Center does seem to take a few seconds to change channels. Probably it times out quite quickly when it doesn't detect a signal, pops up the warning box on screen, and then tries the other tuner on the same card. Which works, maybe because the card is now responding, and the program gets recorded. But when I checked yesterday for a channel where this happens, there is only one source in the Edit Sources list and it's showing "100%" signal strength.
And a channel that had worked fine all last week just came up as "No signal" yesterday. Yet looking in the Edit Sources list, the single source was shown as "100%". Today it's working again. Is this what we expected from the promise of a brave new digital future in broadcasting? I'm already limited to using Internet Radio because the DAB and FM signals are so poor here. How long will it be before I can get TV only over the Internet?
Mind you, Media Center itself can be really annoying sometimes. Yes it's a great system that generally works very well overall, and has some very clever features. But, during the "lost channel" episode this week, I tried to modify a manual recording by changing the channel number to a different one. It was set to use channel 913 (satellite ITV1) but I wanted to change it to use channel 1 (terrestrial ITV1). Yet all I got every time was the error message "You must choose a valid channel number." As channel 1 is in the guide and works fine, I can't see why it's invalid. Maybe because it uses a different tuner card, and the system checks only the channel list for the current tuner card?
It does seem that software in general doesn't always get completely tested in a real working environment. For example, I use Word all the time and - for an incredibly complex piece of software - it does what I expect and works fine. Yet, when I come to save a document the first time onto the network server, I'm faced with an unresponsive Save dialog for up to 20 seconds. It seems that it's looking for existing Word docs so it can show me a list, which would be OK if it was on the local machine or there were only a few folders and docs to scan. But there are many hundreds on the network server, so it takes ages.
Perhaps, because I use software like this all day, I just expect too much. Maybe there is no such thing as perfect software...
Reading in the newspaper this week about the technological advances in political campaigning set my mind wondering about whether there is an ethics/success trade-off in most areas of work, as well as in life generally.
I don't mean cheating in order to win; it's more about how you balance what you do, with what you think people want you to do. The article I was reading focused on the area of national politics. Technologies that we in the IT world are familiar with are increasingly being used to determine the "mood of the people" and to target susceptible voters. In the U.S. they already use Big Data techniques to profile the population and to analyze sectors for specific actions. The same is happening here in Britain.
What I can't help wondering is whether this spells the end to true political conviction. If, as a party, you firmly believe that policy A is an absolutely necessary requirement for the country, and will provide the best future for the people, what happens when your data analysis reveals that it's not likely to be as popular as policy B? Do you try to adapt policy A to match the results from the data and sound like policy B, abandon it altogether in favour of policy C that is even more popular, or carry on regardless and hope that people will finally realize policy A is the best way to go?
Some of the greatest politicians of the past worked from a basis of pure conviction, and many achieved changes for the better. Some pushed on regardless and failed. Does the ability to get accurate feedback on the perceived desires of the population, or of specific and increasingly narrowly defined sectors, reduce the conviction that has always been at the heart of real politicians? Perhaps now, instead of relying on the experts that govern us to make a real difference to our lives, we just get the policies we deserve because we all just want what's best for each of us today - and politicians can discover what that is.
There's an ongoing discussion that the same is true of many large companies and organizations. They call it "short-termism" because public companies have to focus on what will look good in the next quarter's results in order to keep shareholders happy, rather than being able to take the long view and maximize success through long term changes. Even though governments generally get a longer term, such as five years, the same applies because it's pretty much impossible to make real changes in politics in such a short space of time.
Of course, there are some organizations where you don't need to worry about public opinion. In private companies you can, in theory, do all the long term planning you need because you have no shareholders to please. You just need to be able to stay in business as you plan and change for the future. In extreme cases, such as here in the European Union, you don't even need to worry what the public thinks. The central masters of the project can just do whatever they feel is right for the Union, and nobody gets to influence the decisions. Maybe the EU, and other non-democratic regions of the world, are the only place where the politics of conviction still apply.
So how does all this relate to our world of technology? As I read the article it seemed as though it was a similar situation to the one we have in creating guidance and documentation for our products and services. Traditionally, the process of creating documentation for a software product revolved around explaining the features of the product. In many cases, this simply meant explaining what each of the menu options does, and how you use that feature.
I've recently installed a 4-channel DVR to monitor four bird nest boxes, and the instructions for the DVR follow just this pattern. There are over 100 pages that patiently explain every option in the multiple menus for setting up and using it, yet nowhere does it answer some obvious questions such as "do I need to enable alarms to make motion detection work?", "why is the hard disk light flashing when it's not recording anything?", and "why are there four video inputs but only two audio inputs?" And that's just the first three of the unanswered questions.
Over the years, we've learned to write documentation that is more focused on the customer's point of view instead. We start with scenarios for using the product, and develop these into procedures for achieving the most common tasks. Along the way we use examples and background information to try to help users understand the product. But, in many cases, the scenarios themselves come from our best guesses at what the user needs to know, and how they will use the product. It's still very much built from our opinions and a conviction that we know what the customer needs to know, rather than being based on what they tell us they actually want to know.
However, more recently, even this has started to change. The current thinking is that we should answer the questions users are asking now, rather than telling them what we think they need to know. It's become a data gathering exercise, and we use the data to maximize the impact we have by targeting effort at the most popular requirements. In most IT sectors and organizations, fast and flexible responsiveness is replacing principles and conviction.
Is it a good thing? I have to say that I'm not entirely persuaded so far. Perhaps, with the rate of change in modern service-based software and marketplace-delivered apps, this is the only way to go forward. Yet I can't help wondering if it just introduces randomness, which can dilute the structured approach to guidance that helps users get the most from the product.
Maybe if I could get a manual for my new DVR that answers my questions, I would be more convinced...
So there's another New Year on the horizon and it's time to make some resolutions that will hopefully last for at least a few weeks into January. But at the moment I can think of only one: find a new Internet provider.
As previously documented in these pages, I really do try hard to deal with my business cable broadband provider. But they seem to try even harder to make it difficult. I guess the only saving grace is that, on average so far, I've only had to actually contact them once every four years.
The trials and tribulations of it taking four months to get my account set up initially are long forgotten (except as an anecdote for long winter evenings when geeks gather around a hot router discussing technology). And even the seven weeks waiting for an upgrade that simply involved changing the modem to a different model (where I did most of the configuration myself) are gradually fading into distant memory.
Of course, I joked at the time that it would probably take another four months to get the invoicing right, though after intervention from the local office manager it seemed for a while that I was being unduly pessimistic. After only a month, I had a correct invoice for the upgraded package. Amazing.
What I didn't realize was that I was still being billed for the old package as well. It was only when I checked the welter of paperwork dropping through the letterbox in more detail that I discovered two invoices with the same number. That's when I found that an "upgrade" is really a "brand new customer".
Yep, despite the difficulties in actually getting a line installed at all, or a modem replaced, I am now the proud owner of two different accounts - and I get the privilege of paying for both. I'm confidently expecting to be told there is a charge to have the old account closed, and a waiting list of five weeks to do so. Perhaps they'll send an engineer round again to check if I have two cables coming into the house.
It makes me laugh when I hear people say they will never deal with our ex-monopoly British Telecom ISP because they are "a pain in the neck" and "useless". BT are my secondary supplier and I cannot fault their service, be it technical or paperwork-related. The only problem is that their promised roll-out of high-speed fibre seems to have stalled before it got as far as me. I'd switch over to them tomorrow if I could get more than 1.5 Mbps.
Though, based on experience, I'll probably have half a dozen Virgin Cable connections by the time BT find a bit of fibre long enough to reach the cabinet on our street. It seems it's rather like Hotel California. You can cancel, but you can never leave…
UPDATE: According to the BT website today, the availability of its "Infinity" high speed upgrade that was due last September, morphed into October, drifted quietly into November, and was finally promised to be definitely here in December, is now advertised as "between January and March". Yet they still keep phoning me to ask why I haven't yet signed up for their broadband TV package.
After spending part of the seasonal holiday break reorganizing my network and removing ISA Server, this week's task was reviewing the result to see if it fixed the problems, or if it just introduced more. And assessing what impact it has on the security and resilience of the network as a whole.
I always liked the fact that ISA Server sat between my internal domain network and the different subnet that hosted the router and modems. It felt like a warm blanket that would protect the internal servers and clients from anything nasty that crept in through the modems, and prevent anything untoward from escaping out onto the 'Net.
The new configuration should, however, do much the same. OK, so the load-balancing router is now on the internal subnet, but its firewall contains all the outbound rules that were in ISA Server so nothing untoward should be leaking out through some nefarious open port. And all incoming requests are blocked. Beyond the router are two different subnets connecting it to the ADSL and cable modems, and both of those have their firewalls set to block all incoming packets. So I effectively have a perimeter network (we're not allowed to call it a DMZ any more) as well.
But there's no doubt that ISA Server does a lot more clever stuff than my router firewall. For example, it would occasionally tell me that a specific client had more than the safe number of concurrent connections open when I went on a mad spree of opening lots of new tabs in IE.
ISA Server also contained a custom deny rule for a set of domains that were identified as being doubtful or dangerous, using lists I downloaded from a malware domains service that I subscribe to. I can't easily replicate this in the router's firewall, so another solution was required. Which meant investigating some blocking solution that could be applied to the entire network.
Here in Britain, our deeply untechnical Government has responded to media-generated panic around the evils of the Internet by mandating that all ISPs introduce filtering for all subscribers. What would be really useful would be a system that blocked both illegal and malicious sites and content. Something like this could go a long way towards reducing the impact of viruses and Trojan propagation, and make the Web safer for everyone. But, of course, that doesn't get votes.
Instead, we have a half-baked scheme that is supposed to block "inappropriate content" to "protect children and vulnerable adults". That's a great idea, though some experts consider it to be totally unworkable. But it's better than nothing, I guess, even if nobody seems to know exactly what will be blocked. I asked my ISPs for more details of (a) how it worked – is it a safe DNS mechanism or URL filtering, or both; and (b) if it will block known phishing sites and sites containing malware.
The answer to both questions was, as you'd probably expect, "no comment". They either don't know, can't tell me (or they'd have to kill me), or won't reveal details in order to maintain the integrity of the mechanism. I suspect that they know it won't really be effective, especially against malware, and they're just doing it because not doing so would look bad.
So the next stage was to investigate the "safe DNS services" that are available on the 'Net. Some companies that focus on identifying malicious sites offer DNS lookup services that automatically redirect requests for dangerous sites to a default "blocked" URL by returning a replacement IP address. The idea is that you simply point your own DNS to their DNS servers and you get a layer of protection against client computers accessing dangerous sites.
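Mechanically, the "redirect" is just a different answer to the same DNS question. A minimal sketch of how a client could tell it has been filtered (the block-page addresses here are invented placeholders from the TEST-NET documentation range; a real service publishes its own):

```python
# Hypothetical block-page addresses a filtering DNS service might return
# for a dangerous site. These are placeholders, not any real service's IPs.
BLOCK_PAGE_IPS = {"203.0.113.10", "203.0.113.11"}

def was_filtered(answer_ips):
    """Return True when a DNS answer points at the service's block page
    rather than at the real site's address."""
    return any(ip in BLOCK_PAGE_IPS for ip in answer_ips)
```

So a lookup that comes back with an ordinary address passes straight through, while one that comes back with the block-page address means the service has stepped in, which is exactly the behaviour the browser experiences as landing on a "blocked" page.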
Previously I've used the DNS servers exposed by my ISPs, or public ones such as those exposed by Google and OpenNIC, which don't seem to do any of this clever stuff. But of the several safe DNS services I explored, some were less than ideal. At one of them the secondary DNS server was offline or had failed. At another, every DNS lookup took five seconds. In the end the two candidates I identified were Norton ConnectSafe and OpenDNS. Both require sign-up, but as far as I can tell are free. In fact, you can see the DNS server addresses even without signing up.
Playing with nslookup against these DNS servers revealed that they seem fast and efficient. OpenDNS says it blocks malware and phishing sites, whereas Norton ConnectSafe has separate DNS server pairs for different levels of filtering. However, ConnectSafe seems to be in some transitional state between v1 and v2 at the moment, with conflicting messages when you try to test your setup. And neither it nor the OpenDNS test page showed that filtering was enabled, though the OpenDNS site contains some example URLs you can use to test that their DNS filtering is working.
The other issue I found with ConnectSafe is that the DNS Forwarders tab in Windows Server DNS Manager can't resolve their name servers (though they seem to work OK afterwards), whereas the OpenDNS servers can be resolved. Not that this should make any difference to the way DNS lookups work, but it was annoying enough to make me choose OpenDNS. Though I guess I could include both sets as Forwarders. It's likely that both of them keep their malware lists more up to date than I ever did.
So now I've removed all but the OpenDNS ones from my DNS Forwarders list for the time being while I see how well it works. Of course, what's actually going on is something equivalent to DNS poisoning, where the browser shows the URL you expect but you end up on a different site. But (hopefully) their redirection is done in a good way. I did read reports on the Web of these services hijacking Google searches and displaying annoying popups, but I'm not convinced that a reputable service would do that. Though I will be doubly vigilant for strange behaviour now.
Though I guess, at some point, you just have to trust somebody...
I've been watching the BBC program Stargazing on TV this week, and I have to say that they did a much better job this year than last. As well as stunning live views of the Northern Lights over three nights, and interviews with some ex-astronauts as well as a lady from the Cassini imaging team, viewers discovered a previously unknown galaxy. I guess that's what you call an interactive experience.
I tend to be something of an armchair stargazer. I watch all the astronomy documentaries, and never miss "The Sky at Night" - thankfully the BBC changed their mind about dropping it after Patrick Moore passed away. We do have a telescope, but it seems to rarely see the light of day (or, more accurately, the dark of night). It's rather like the guitar sitting forlornly in the corner of the study. Both are waiting for me to retire so I have endless hours of free time.
Of course, astronomy is like most physical sciences. Some things you can easily accept, such as the description and facts about our own solar system. Though looking at photos of the surface of another planet is a little un-nerving, and it requires a stretch of the imagination to accept that you're not looking at a film set in Hollywood or a piece of the Mojave desert. And the fact that they say they can tell what the weather is like on some distant Earth-like planet in a far-off galaxy seems to be stretching the bounds of possibility.
They also had to mention the old "where's all the missing stuff" question again. Not only do we not know where 95% of the mass of every galaxy (including our own) is, but we have no idea what the dark matter that they use to describe this missing stuff actually is. Though there was an interesting discussion with the Gaia team, who reckon they can map it. We still won't know what it is, but at least we'll know where the largest lumps are.
One exciting segment of the final program was where viewers who were taking part in an exercise to find lensing galaxies, which can help to locate far more distant galaxies, came up with a really interesting hit. So much so that they immediately retargeted the Jodrell Bank and several other telescopes around the world at it. We await the results with bated breath, including the name - which is currently open to suggestions.
But it's when they start talking about how you are seeing distant galaxies as they were several million, or even several billion, years ago that it gets a bit uncomfortable. Especially how the limit to our discoveries is stuff that is 14 billion light years away or closer, because the light coming from anything further away hasn't had time to reach us yet. Even though it all started in the same place at the Big Bang. And galaxies that are near the limit are actually 40 billion light years away now because they kept going since they emitted the light that we are looking at now. So will we still be able to see them next year?
Also interesting was the discussion of what happens when two galaxies collide. It seems that the Andromeda galaxy, our nearest neighbor, is heading towards us at a fairly brisk pace right now. Due to the vast distances between the stars in each galaxy, there's only a small chance of two stars (or the planets that orbit them) colliding, but they say it will produce some exciting opportunities for astronomical observation as it passes through. And there's a theory that the shape and content of our own Milky Way galaxy is actually the result of a previous encounter with Andromeda anyway.
For me, however, one of the presenters managed to top all of these facts and theories almost by accident. When asked what the oldest visible thing in the Universe is, he simply pointed to himself and said that all the hydrogen atoms that make up parts of all of us (and everything around us) were made within two minutes of the Big Bang. So pretty much all of them are 14 billion years old.
Of course, the other things that make up us, the larger and more complicated atoms, are a bit younger. Many of these types of atoms are still being manufactured in distant supernovae, but this stuff inside us has no doubt been around for a very long time. As any good astronomer will tell you, Joni Mitchell was right when she sang "We are stardust"...
So after I was castigated by Cisco's customer support people for buying from Amazon, who they class as a "grey importer", I decided to do the right thing this time. And look where it got me.
I decided to upgrade my load-balancing router, and chose one that is relatively inexpensive yet has more power and features than the existing one. Yes, it's a Cisco product - the RV320. It looks like it's just the thing I need to provide high performance firewalling, port forwarding, and load balancing between two ISPs.
On Amazon there are several comments from customers who bought one, indicating that the one supplied had a European (rather than UK) power supply. Probably because those suppliers are not UK-based. That does sound like it's a grey import, and - like last time - means I wouldn't get any technical support. However, there are UK-based suppliers on Amazon as well, so I could have ordered it from one of these, and easily returned it if it wasn't the UK version.
But, instead, I used the Cisco UK site to locate an approved Cisco retailer located in the UK, and placed an order with them. Their website said they had 53 in stock and the price was much the same as those on Amazon, though I had to pay extra for delivery of the real thing. Still, the small extra charge would be worth it to get technical support, and just to know that I had an approved product.
And two weeks later, with two promised delivery days passed, I'm still waiting. The first excuse was that their suppliers had not updated the stock figures over the holiday period. So in actual fact they didn't have any in stock, despite what the website said. Probably the 53 referred to what the UK Cisco main distributor had in its warehouse. And, of course, they sold all 53 over the holiday period. Maybe, like me, all network administrators choose the Christmas holiday to upgrade their network.
A query after the second non-delivery simply prompted a "we'll investigate" response. So much for trying to spread my online acquisition pattern wider than just Amazon. I could have ordered one from Amazon.co.uk at the same price and had it installed and working a week ago. Or even paid for next-day delivery from Amazon, sent it back for replacement twice, and still be using it now. In a world that is increasingly driven by online purchasing and fast fulfilment, an arrangement of the words "act", "together", "your", and "get" seems particularly applicable if they want to remain competitive.
But I suppose I should have remembered that you can't believe everything you read on the Internet...
It seems like a question that has an obvious answer: How should you show code listings in guidance documents? I'm not talking about the C#/VB/other language debate, or whether you orient it in landscape mode to avoid breaking long lines. No, I'm talking about the really important topics such as what color the text is, how big the tabs are, and where you put the accompanying description.
We're told that developers increasingly demand code rather than explanation when they search for help, though I'm guessing they also need some description of what the code does, the places it works best, and how to modify it for their own unique requirements. However, copy and paste does seem to be a staple programming technique for many, and is certainly valid as long as you understand what's going on and can verify its suitability. I've actually seen extracts of code I wrote as samples for ASP 2.0 when it first appeared (complete with my code comments) in applications that I was asked to review.
But here in p&p a lot of what we create is architectural guidance and design documentation designed to help you think about the problem, be aware of the issues and considerations, and apply best practice principles. As well as suggesting how you can implement efficient and secure code to achieve your design aims with minimal waste of time and cost. "Proven practices for predictable results", as it says on the p&p tin.
But even design guidance needs to demonstrate how stuff works, so we generally have some developers in the team who create code samples or entire reference implementation (RI) applications. These are, of course, incredibly clever people who don't take kindly to us lowly documentation engineers telling them how to set up their environment, or that the code comments really should be sentences that make sense and have as few spelling mistakes as possible.
In addition, Visual Studio has a really amazing built-in capability that we've so far failed to replicate in printed books: it can scroll sideways. These esteemed developers often prefer tabs that are four or more spaces wide to make the code easy to read on screen (the Visual Studio default is four). By the time you are inside a couple of if statements, a try/catch, and a lambda expression, you're off the page in a book. Two spaces is plenty in a printed document (where we have to replace the tabs with spaces anyway), but I've never yet persuaded a developer to change the settings.
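To put some numbers on that, here's a quick back-of-the-envelope sketch. The 70-character printed line width is my own assumption for illustration, not a real p&p page setting:

```python
# Rough arithmetic: how much of a printed line does indentation consume?
# PRINTED_LINE_WIDTH is an assumed page width, purely for illustration.
PRINTED_LINE_WIDTH = 70

def columns_left(nesting_depth, tab_width):
    """Characters remaining for actual code after the indentation."""
    return PRINTED_LINE_WIDTH - nesting_depth * tab_width

# A method inside a class, a couple of ifs, a try, and a lambda:
# call it a nesting depth of about six.
depth = 6
print(columns_left(depth, tab_width=4))  # 46 characters left with 4-space tabs
print(columns_left(depth, tab_width=2))  # 58 characters left with 2-space tabs
```

Twelve extra characters per line may not sound like much, but it's often the difference between a statement fitting the page and an ugly forced line break.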
And now Microsoft mandates that we use the same colors for the text in listings as it appears in Visual Studio (I guess to make it look more realistic, or at least more familiar). The old trick of copying the code into Notepad, editing it, and then pasting it into the document no longer works. But copying it directly from the Visual Studio editor into a document is painful because it insists on setting the font, style, margins, and other stuff when I just want to copy the colors. Yet if I do Paste | Paste Special | Unformatted Text in Word, I lose the colors.
And then, when I finally get the code into the document, I need to describe how it works. Do I dump the entire code for a class into a single code section and describe it at the start, or at the end? If the code is a hundred lines or more (not unusual), the reader will find it cumbersome to relate parts of what is likely to be a long descriptive section to the actual code listing. I can break the class down into separate methods and describe each one separately, but often these listings are so long that I have the same problem.
And, of course, explaining how the methods relate to each other often means including an abridged version of some parts of the class or one of its methods, showing how it calls other methods of the class. But do I list these methods first and reference back to them, or explain the flow of execution first with the abridged listing and then show the methods that get called?
Typically I end up splitting the code into chunks of 30 lines or less (about half a printed page) and insert text to introduce the code before the chunk and text to describe how it works after the chunk. Something like:
The GoDoIt method shown in the code listing above calls the DoThisBit method to carry out the first operation in the workflow. The DoThisBit method takes a parameter named thisAction that specifies the Task instance to execute, as shown in the following code listing.
[CODE LISTING HERE]
The DoThisBit method first checks that the task is valid and then creates an instance of a ThisBitFactory, which it uses to obtain an instance of the BitHelper class... and so on.
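If it helps to picture it, the hypothetical methods that example prose describes might be sketched like this. Every name here (GoDoIt, DoThisBit, thisAction, ThisBitFactory, BitHelper) comes from the made-up description above; the bodies are pure invention to match it:

```python
# A toy sketch of the imaginary classes named in the example prose above.
class BitHelper:
    def run(self, action):
        return f"ran {action}"

class ThisBitFactory:
    def create(self):
        return BitHelper()

class Workflow:
    def do_this_bit(self, this_action):
        # First check that the task is valid...
        if this_action is None:
            raise ValueError("thisAction must specify a task to execute")
        # ...then create a ThisBitFactory and use it to obtain a BitHelper,
        # just as the descriptive text walks the reader through.
        helper = ThisBitFactory().create()
        return helper.run(this_action)

    def go_do_it(self):
        # GoDoIt calls DoThisBit to carry out the first operation in the workflow.
        return self.do_this_bit("first-operation")
```

Even a trivial flow like this runs to twenty-odd lines, which is exactly why the intro-chunk-description sandwich earns its keep.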
After going backwards and forwards swapping code and text, breaking it up into ever smaller chunks, and trying to figure out what the code actually does, it's just a matter of editing the code comments so that they make sense, breaking the lines in the correct places because they are inevitably longer than the page width, and then persuading the developer to update the code project files to match (or doing that myself just to annoy them).
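The mechanical part of that chore is at least scriptable. Here's a minimal sketch (the two-space tab replacement and the 70-character page width are my assumptions for illustration) that swaps the tabs and flags which lines will still need breaking by hand:

```python
# Prepare a code listing for print: replace tabs with two spaces and
# report any lines that still overflow an assumed 70-character page width.
TAB_REPLACEMENT = "  "   # two spaces is plenty in a printed document
PAGE_WIDTH = 70          # assumed printed line width, not a real p&p setting

def prepare_listing(source):
    """Return (converted_lines, numbers_of_lines_still_too_wide)."""
    converted, too_long = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        line = line.replace("\t", TAB_REPLACEMENT)
        converted.append(line)
        if len(line) > PAGE_WIDTH:
            too_long.append(number)  # these still need manual line breaks
    return converted, too_long

sample = ("\tif (ok)\n"
          "\t\tvar result = items.Where(i => i.IsValid)"
          ".Select(i => i.Name).ToList();")
lines, overflow = prepare_listing(sample)
print(overflow)  # line 2 is still too wide even after the tab swap
```

It won't choose sensible break points for you, of course; deciding where a long LINQ chain should wrap is still a human job.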
Sometimes I think that putting code listings into a document takes longer than actually writing the code, though I've never yet managed to convince our developers of that. But I've been doing it for nigh on twenty years now, so I probably should be starting to get the hang of it...