Random Disconnected Diatribes of a p&p Documentation Engineer
Did you know that almost everyone in Sweden has more than the average number of legs? According to Professor Hans Rosling of Sweden's Karolinska Institute, this must be the case because a few people have one or no legs, while nobody has more than two. He uses this fact to illustrate just how silly it is trying to apply the law of averages to many common scenarios.
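For anyone who wants to check the leg arithmetic, here is a minimal sketch. The population figures are invented purely for illustration (they are not real Swedish data); the point is only that a handful of low values drags the mean just below two, leaving nearly everyone above it.

```python
from statistics import mean, median

# Hypothetical population of 10,000 people: these counts are made up
# for illustration and are not taken from any real survey.
legs = [2] * 9_990 + [1] * 8 + [0] * 2

average = mean(legs)
above_average = sum(1 for n in legs if n > average)

print(f"mean legs per person:   {average:.4f}")
print(f"median legs per person: {median(legs)}")
print(f"share above the mean:   {above_average / len(legs):.1%}")
```

The median, unlike the mean, stays at two, which is why it is usually the better summary for lopsided data like this.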
I've always been fascinated by mathematics. Even though my capabilities with calculus and matrix theory have now percolated out of my aging brain in response to the continual onslaught of new computing technologies and programming languages I encounter in my day job, I just can't resist watching all those TV documentaries about mathematics and (in the case of the latest one) statistics.
Professor Rosling has a wonderful knack of making statistics sound really exciting. He also tosses in some nice anecdotes, such as this quote he heard from an attendee at a lecture: "If unemployment is up by six percent, debt is increasing by a quarter every year, and one in five companies are having difficulty paying suppliers, why are we wasting so much money on compiling statistics?"
It struck me as interesting because I've recently been trying to decide whether to kill off one of my seriously underperforming local community oriented websites, which seems to get more hits from malicious attackers than from real visitors interested in the content. Another example, I suppose, of trying to reduce my attack surface.
It should be really easy to make a decision based on the patterns of traffic to the site. According to an analysis of the last three months, the site gets around 50 unique visitors per day, each viewing two or three pages. However, the site was offline for over a week during the infrastructure failures I suffered in May, so the average is badly skewed both by that lack of availability and the loss of subsequent hits from visitors who gave up trying to access the site and never came back again.
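A rough sketch of how much an outage alone can drag the figure down. The numbers here are assumptions chosen to land near the 50-a-day average quoted above (a 90-day window, 8 days offline, 55 visitors on every working day); the real logs will differ, and this says nothing about visitors who gave up and never returned.

```python
def daily_average(counts, skip_outage=False):
    """Mean visitors per day; optionally ignore days with zero visitors."""
    days = [c for c in counts if c > 0] if skip_outage else list(counts)
    return sum(days) / len(days)

# Assumed figures, not taken from the real site logs:
# 82 normal days at 55 visitors each, plus 8 days offline.
visitors = [55] * 82 + [0] * 8

print("including outage days:", round(daily_average(visitors), 1))
print("excluding outage days:", round(daily_average(visitors, skip_outage=True), 1))
```

Treating every zero-visitor day as an outage is itself a simplification; a quiet site could have genuine zero days that belong in the average.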
The analysis also suggests that, on average, every third visitor accessed the "recover password" page, and half of the visitors accessed the "register for an account" page. And one in seven visitors was actually a search engine. So, assuming that search engines will spider the complete site, it means that the average number of page hits from real people per visit must be about 0.3. Maybe they only wanted to see the rather nice page header graphic and menu bar? Though the stats also say that, on average, only one visitor in five actually downloaded the page header image - a good indication that even though the site is very well indexed by search engines, there are hardly any real people visiting.
Perhaps the only way to make a sensible decision is to see what the income from the site is compared to the running costs. In my case the running costs are pretty much zero (it runs on a virtualized web server that hosts all my other sites). Though, as there are no fees or advertising revenue either, the income is also zero. So calculating the average ROI per month is not going to be much help.
So, even though the site is supposedly really useful for local people, it's clear that the average of 50 visits per day and two or three page views per visit is nowhere near the real truth. While statistical averages suggest it's doing OK (in terms of its target market), the underlying facts reveal how wrong such simple average numbers really are. It's time, I think, to pull the plug...
I'm probably not the only person who suddenly wondered if my computer had gone funny a few weeks ago when the Windows Live sign-in page looked very different from the one I'm used to. It was only after tracking down the Windows Live team blog and reading the article about it that I realized what was going on. And it seems from the comments on that post that a lot of people are upset by the change.
According to the relevant blog post, the reason for the change is that the implementation of "tiles" that remember your email address was "causing confusion" amongst users (they obviously refrained from using the familiar excuse that perhaps people are just becoming more stupid). It seems that the workings of the blatantly obvious "Remember my password" checkbox were unclear. Now you have only two options: enter your email address and password in full every time, or click "Keep me signed in" so you never see the sign-in page again.
However, if you visit a non-affiliated site that accepts a Windows Live ID, you get redirected to the Windows Live sign-in page and you see the old "tile" format - though without the "Remember my password" checkbox and with a note to let you know that the site wants to use your Live ID for authentication. Obviously this will confuse the stupid people even more.
As an aside on the "stupid" topic, there's a delightful tale doing the rounds that concerns a well-known and supposedly non-intellectual football player here in England. When asked by his team manager about the cylindrical metallic object in his kitbag, the player replied "It's one of those new unbreakable vacuum flasks. My wife gave it to me as a birthday present. It's amazing - it keeps hot things hot and cold things cold". "Really?", replied the manager, "What have you got in it?" to which the footballer replied "Two cups of coffee and a choc-ice"...
But getting back to the topic in hand, now there are two issues. If I choose not to stay signed in (which is typically the case if you are as paranoid as I am), I have to enter my email address in full every time I visit Hotmail or a site that uses Live ID for authentication. And because I tend to spell "hotmail.com" as "homtail.com" most times, it takes three attempts to log in even though I got the password right. I can't help wondering why they didn't put a "Remember only my email address" checkbox on there as well, even if they need to add a stupid-people explanation popup such as "tick this box if you want us to automatically fill in your email address so that you won't spell 'hotmail' wrong every time".
Or even make it the same as in Outlook Web Access where you can specify if this is a public or a private computer to have it remember your email address in the login page. This is, of course, the second issue. After a few abortive attempts and the general annoyance at having to type their email address every time, how many users will just click "Keep me signed in"? And then let their kids, neighbours, friends, and everyone else in the Internet cafe access this email account?
Hopefully you noticed the new "Get a single use code to sign in with" link on the sign-in page that's designed for when you aren't on your own machine...
Mind you, this discovery and subsequent aggravation did prompt me into something I've been thinking for a while I should do. It's time that I sorted out my logons and passwords, and tried to figure out where I have accounts with the multitude of websites I've visited over the years. And either close unwanted accounts or make sure I'm using different passwords for each of them.
Most sites now offer a "Close my account" option, and I made use of this where I could. Though many of the sites I've used over the years are no longer there (I just hope they haven't sold my login details to anyone else). I even managed to slim the number of different accounts I have with Hotmail, Amazon, PayPal, and others down to a single account with each. Other sites where I couldn't locate a "Close my account" option I found I could simply change all of the personal details into meaningless strings of random characters, and then set a ridiculously long and complex password.
But there are some sites that seem to go out of their way to make life difficult. One prime example must be the Sony UK web store. I tried contacting the online assistant using their chat facility, but was firmly told that it was not possible to close my account and that they could not remove my personal details from the site. So I tried to update them to random stuff, and discovered that it's not actually possible to change them at all, even with fully valid entries in every textbox. Even if you open the details page and then just click "Save changes" it tells you that you must "Choose an appropriate form for your address".
Oh well, I thought, I'll just change the password to something really long and complex instead. But the first attempt failed because I included an invalid character. And all subsequent attempts simply displayed an error message saying that you can only update your password once a day. Though I must have achieved something because today I can't log in at all, and the retrieve password feature doesn't even recognize my email address. I suppose that means I can never buy another of those glorious laptops they advertise...
Those of us who read the documentation for software before installing it (though we are, it seems, members of a pitifully small minority) know that the most illuminating part is the "Known Issues" section. It's here that, hopefully, you discover all the problems you are likely to face - and can make an educated decision as to whether to continue. So I wonder if it's time that blogs were forced to include a Readme document that points to known issues with the content.
OK, so this obviously makes sense for blog posts that contain technical content and programming tips, but I reckon we should also adopt it for more general ravings, diatribes, news, comments, and definitely for those describing the writer's latest vacation to some conference or exotic location. It could even contain a list of prerequisites. Something like: "Before attempting to read this post, ensure that you are not already in a bad mood and that you are not currently in possession of a container of hot liquid or a sharp instrument". Or "This post is not designed for use unless you are totally bored, semi-comatose, or there is nothing even remotely interesting on television".
Maybe another approach would be to mandate a rating system for blog posts. Instead of violence, bad language, nudity, and discrimination we could have ratings for "Likely to annoy you", "May cause drowsiness", "Will break your computer", "Contains technical stuff that you won't be able to make work", and "Code samples only run on Windows 95". You would be able to adjust your RSS reader to only show posts that meet your specific required combination of such factors, or see a warning so you are properly prepared for the effects of the content before you start.
Of course, this would involve additional work for bloggers as they would have to categorize their posts before submitting them, but even this may be open to some automation. For example, I find that the weather affects my posts quite dramatically. When I wrote last week's rather downbeat post about security and privacy on the web it was cold, windy, and raining. This week I'm sitting in the conservatory enjoying a lovely warm summer day, with the fan spinning lazily above me and the cat asleep on the chair next to me. So the post has a much more positive outlook and upbeat content.
Surely it wouldn't be hard for my post editor to look up the local weather for my location and set the corresponding rating for "Upbeatness". And there are other automation opportunities as well: for example, it could scan my machine for part-installed and broken software installations and set the "Likely to break your machine" rating. Or measure the force with which I'm hitting the keys to set the "Contains aggressive and bombastic rants, possibly including bad language" rating. Maybe also analyze the content for photos that contain holiday scenes and set the corresponding "May cause drowsiness" rating.
Of course, there are other approaches that are easier to implement in the short term while we wait for W3C to approve this proposal and for developers to create the appropriate blog editors. Most blogs have the ability to tag individual posts, so authors could have relevant categories. I find this technique useful for categorizing all those posts that tend to prompt comments such as "what the heck is he on about now" or "what is this guy smoking" by tagging them in the Weird category. Which, surprisingly, seems to get the largest number of hits...
Did you know that the internet is fundamentally broken? An excerpt from a recent interview by Emma Barnett with Carl Sjogreen, product manager at Facebook, includes this amazing comment: "The fact that I can go to a site and it doesn't know who I am or what I like shows how fundamentally broken the internet still is". Huh? Is this a new definition of the phrase "fundamentally broken"? Or is it just me exhibiting my usual level of dinosaurian paranoia?
Maybe it's just a coincidence that last week's rant was all about tracking ads, hidden background requests, and related nefarious stuff - but I really don't want every website I visit to know all about me. I realize that, even with occasional trips to the Tools|Options dialog to delete cookies, I leave some trail across the internet as I browse and click to achieve my daily work tasks. And as a regular online shopper I know that I'm leaving an increasingly broad and well-worn path for spam emails to follow; though thankfully Message Labs blocks the vast majority from reaching my inbox. But I'm increasingly noticing new events that indicate just how little control we may really have over our personal information.
For example, I have my usual web browsing machine set up to warn about switching to and from secure (HTTPS) pages, and it seems to pop up warnings now for even the most unexpected sites where the page itself isn't HTTPS. However, tucked away in one corner is a selection of social media site buttons such as Twitter and Facebook, and simply loading the page causes the browser to try and log me into these sites using an authentication cookie stored on my browser (presumably so I can instantly "Like", or apply some other adjectival verb, to the content). It's not going to happen because I don't have a Twitter or Facebook account, but it's an indication of what's going on behind the scenes. These people would know from the referrer string exactly which sites I'm visiting.
Of course, the reason is so that sites can (quoting from the interview again) make their pages "look different to a 60-year old man and an 18-year old girl based on their interests". OK, I am a much better match to the first of these categories than the second, so I probably don't want to read about how wonderful the latest boy bands are, and which color eye-shadow goes best with this year's fashion choice in summer hats. But surely this is illegal under age discrimination laws. And I'm not sure I want Amazon to automatically show me a choice selection of stair lifts and Zimmer frames every time I visit their site either.
Likewise, I recently downloaded a sample application from a very reputable source to learn more about HTML 5, and discovered that to use it I have to set up an account with OpenID to be able to log in. Why? I already have enough accounts and passwords just to do the things I need to, without scattering my personal details across more and more sites. And, when you read about the number of sites that "lose" your information to hackers, do I really want to increase the chance of my personal details arriving in the public domain? Especially as I'm fast running out of available memorable passwords, so I'm bound to have used it somewhere else. Good security practice suggests minimizing your attack surface, not expanding it.
Thankfully, modern browsers are making it easier to protect yourself online; though I'm still having daily disagreements with both IE9 and an increasing number of websites. I've recently encountered sites where you have to allow the browser to download and run an ActiveX control or Java applet just to be able to click the "Buy Now" button, or a Flash animation to be able to enter your delivery information. With most of the browser extensions and add-ins disabled, four out of five page loads pop up the message "An add-on for this site failed to load" (which, in IE9, unhelpfully manages to obscure the horizontal scroll bar as well).
And while we're in IE9 territory, have you figured out how to use the combined search and address box yet? Or managed to get a separate search box to show without filling half the window with a selection of other useless buttons? And how are you finding the download information bar? It seems like a really good idea, helping to protect against downloading and running malicious code. OK, so it took me a while to figure out why files wouldn't download (I was waiting for the pop-up download dialog), but the integration with Security Essentials that scans the file is nice – even though you'd assume that Security Essentials would scan it anyway as it was written to the disk.
But then I tried to download the latest drivers for the network card in my server from the Dell website and got the red-bordered warning that "This file is not commonly downloaded and could damage your computer". I'd entered my computer's Asset Tag number, checked the network card make and model number against the file that Dell offered, and so assumed it was the correct one. And as IE9 hadn't yet downloaded it, it couldn't have checked if it was laced with viruses and trojans. What am I supposed to do in that situation...? I downloaded the file anyway and scanned it with a couple of different anti-virus tools (it passed). But what will most non-paranoid users do after they've seen this warning a few times?
Maybe it's time we looked seriously at the real risks to privacy, safety, and protection from malicious attack and came up with some realistic statistics. I'm guessing that the occurrence of successful attacks as a percentage of internet use is very small, but the daily revelations of sites being hacked, personal details being stolen, and viruses spreading like wildfire can't help but damage the way we regard the internet. It already gets enough bad press, yet we're constantly finding new ways to incorporate it into our daily life; to the extent that it's already nearly impossible to survive without it.
There's plenty of good advice about protecting yourself available out there, but perhaps it's time somebody started publishing information in a form that will help us to understand the real risks - before an increasing number of people come to the conclusion that the internet is just too dangerous to use. What's the percentage chance of suffering a successful attack if you use the default security settings and keep your system up to date? How many people are affected by viruses as a proportion of the population? Do certain well-known add-ons increase the risk by a specific amount?
Or will I have to actually go out of the house in future to do my shopping...
They say that the first step in achieving a cure is to actually admit you are an addict. So here we go: "My name is Alex and I'm an inveterate Ctrl-clicker". There, I feel better already. Perhaps I'm already half-way to kicking the habit, though I have to report that I'm having trouble finding an appropriate support group. I was looking forward to sitting round in a circle in some draughty church hall and confessing that I often have up to ten browser tabs open on the same website!
Perhaps I'm just extraordinarily impatient, but surely the whole reason that browsers have tabbed windows is to allow you to multi-view pages? When I'm looking for something on a site, be it information on some programming technique on MSDN, a gift for my wife's birthday on Amazon, or just reading the news on The Register, my left thumb automatically strays to the Ctrl key as I click multiple links in the page. Then I can quickly skip through the now-loaded pages to see if they contain anything I'm looking for, without losing the initial starting point. Surely this is how everyone uses the web?
A while ago I noticed my ISA Server would occasionally log an event that "may indicate the computer at [some IP address] is infected with a virus or is attacking the host", and that it would no longer be allowed to create TCP connections. After the initial panic attack I realized that this event exactly coincided with a spasm of mad Ctrl-clicking, but I never investigated further until I worked on some sample code for our "Claims Based Identity & Access Control Guide". This required installing the Fiddler HTTP proxy utility to examine the packets sent over the network that implement the authentication exchanges.
Out of interest I cleared the log and hit the Amazon.co.uk home page to look for a replacement hard disk for one of my aging machines while Fiddler was running. I suppose the fact that the browser made 67 requests to the server isn't unexpected - there's a lot of content on the home page. However, searching for suitable hard disks and then Ctrl-clicking on the seven that looked interesting had - within a couple of seconds - generated a total of 639 requests. And 254 of these were not to Amazon, but were related to various tracking and advertisements. In total my browser downloaded 9,031,111 bytes! No wonder ISA Server gets suspicious...
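For the curious, a tally like this could be sketched from a list of captured URLs along the following lines. Everything here is an assumption for illustration: the function name is hypothetical, the sample URLs are invented (the non-Amazon hostnames are placeholders), and this is not how Fiddler itself reports its statistics.

```python
from urllib.parse import urlsplit

def split_by_party(urls, first_party_domain):
    """Return (first_party, third_party) URL lists based on each hostname."""
    first, third = [], []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Count the domain itself and any of its subdomains as first party.
        if host == first_party_domain or host.endswith("." + first_party_domain):
            first.append(url)
        else:
            third.append(url)
    return first, third

# Invented sample capture; the real session contained 639 requests.
captured = [
    "http://www.amazon.co.uk/",
    "http://www.amazon.co.uk/s?k=hard+disk",
    "http://ads.example.net/pixel.gif",
    "http://tracker.example.org/collect?id=1",
]
first, third = split_by_party(captured, "amazon.co.uk")
print(len(first), "first-party,", len(third), "third-party")

# Share of the quoted session that went somewhere other than Amazon.
print(f"{254 / 639:.1%} of requests were not to Amazon")
```

A plain hostname test like this would wrongly count a site's own images or scripts as third party whenever they are served from a separately named content domain, so treat the split as approximate.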
However, if you read about the latest plans Google have to speed up searching in their Chrome browser, you'll probably need to prepare yourself for even heavier network loads. The idea is to have the browser automatically download the entire target page for the link they figure you are most likely to click into a hidden tab, so that it can spring instantly into view when you do what they expect. It sounds like a great plan until you consider the ramifications.
For example, they are already suggesting web page developers use their implementation of the W3C's Page Visibility API to minimize the activities occurring in the page when it's not visible. And presumably the hidden page will change each time you type a letter in the search box and a different one becomes the "most likely choice". Perhaps it will store them all, so typing a really long search string in the text box is not going to be a good idea unless you have plenty of available memory.
Even more worrying, of course, is if the "most likely choice" happens to be a page you really don't want to download. Maybe it contains some of that type of content euphemistically labelled "NSFW" (not safe for work); so a simple typing error when searching for the definition of the C# keyword "public" could get you fired when the administrator examines the proxy server logs. Or initiate a visit from the FBI when you just wanted to find out why your server bombed again.
And what happens if the "most likely choice" page happens to be from one of those free antivirus scan sites that you studiously avoid when they pop up in the search results? Are writers of drive-by viruses likely to abide by the Page Visibility recommendations, and not execute their malicious code when the page is running in a hidden tab? But I suppose I'm just being a bit naive and old-fashioned in thinking that I'd like my browser to only access resources and sites that I want to load. As Fiddler reveals, that stopped happening a long while ago.