Holy cow, I wrote a book!
I have kept every single piece of spam and virus email since mid-1997.
Occasionally, it comes in handy, for example, to add
a naïve Bayesian spam filter to my custom-written email filter.
And occasionally I use it to build a chart of spam and virus email.
The following chart plots every single piece of spam and virus email
that arrived at my work email address since April 1997.
Blue dots are spam and red dots are email viruses.
The horizontal axis is time, and the vertical axis is size of mail
(on a logarithmic scale).
Darker dots represent more messages.
(Messages larger than 1MB have been treated as if they were 1MB.)
Note that this chart is not scientific. Only mail which makes it past
the corporate spam and virus filters shows up on the chart.
Why does so much spam and virus mail get through the filters?
Because corporate mail filters cannot take the risk of accidentally
classifying valid business email as spam. Consequently, the filters
have to make sure to remove something only if they have extremely high
confidence that the message is unwanted.
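For the curious, the flavor of that kind of filtering can be sketched in a few lines of Python. This is a toy, not my actual filter; the token probabilities and the 0.99 threshold are made-up numbers for illustration.

```python
def spam_probability(tokens, spamminess):
    # Combine per-token spam probabilities, naive-Bayes style.
    p_spam = p_ham = 1.0
    for t in tokens:
        p = spamminess.get(t, 0.4)  # unseen tokens treated as mildly hammy
        p_spam *= p
        p_ham *= (1.0 - p)
    return p_spam / (p_spam + p_ham)

# A corporate filter junks a message only when the score clears a
# very high bar; everything else gets delivered, spam included.
THRESHOLD = 0.99
spamminess = {"viagra": 0.99, "meeting": 0.01, "free": 0.85}

print(spam_probability(["viagra", "free"], spamminess))   # well above 0.99: junked
print(spam_probability(["meeting", "free"], spamminess))  # far below: delivered
```

The "free" token alone isn't enough to clear the bar, which is exactly why so much borderline spam sails through.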
Okay, enough dawdling. Let's see the chart.
Overall statistics and extrema:
Subject: About your account...
Content-Type: text/plain; charset=ISO-8859-1
Things you can see on the chart:
As a comparison, here's the same chart based on email received
at one of my inactive personal email addresses.
This particular email address has been inactive since 1995;
all the mail it gets is therefore from harvesting done prior to 1995.
(That's why you don't see any red dots: None of my friends have this address
in their address book since it is inactive.)
The graph doesn't go back as far because
I didn't start saving spam from this address until late 2000.
Received: from dhcp065-025-005-032.neo.rr.com ([220.127.116.11]) by ...
Sat, 24 Jul 2004 12:30:35 -0700
I cannot explain the mysterious "quiet period" at the beginning
of 2004. Perhaps my ISP instituted a filter for a while?
Perhaps I didn't log on often enough to pick up my spam and it
expired on the server? I don't know.
One theory is that the lull was due to uncertainty created by the
CAN-SPAM Act, which took effect on January 1, 2004.
I don't buy this theory since there was no significant corresponding
lull at my other email account, and follow-up reports indicate
that CAN-SPAM was widely disregarded.
Even in its heyday, compliance was only 3%.
Curiously, spam size for this particular account has been
trending down since 2002.
In the previous chart, you could see a clear upward trend since 1997.
My theory is that since this second dataset is more focused on current
trends, it missed out on the growth trend in the late 1990's
and instead is seeing the shift in spam from text to <IMG> tags.
Each time you move a PS/2-style mouse, the mouse sends three
bytes to the computer. For the sake of illustration, let's say
the three bytes are x, y, and buttons.
The operating system sees this byte stream and groups them into threes:
x y b
x y b
x y b
x y b
x y b
But suppose one of the y bytes gets lost in transmission. The grouping slips:
x b x
y b x
y b x
The operating system is now out of sync with the mouse and starts
misinterpreting all the data.
It receives a "y b x" from the mouse and treats the y byte
as the x-delta, the b byte as the y-delta, and
the x byte as the button state.
Result: A mouse that goes crazy.
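You can simulate the desync in a few lines of Python. The byte values and the lost-byte scenario are hypothetical; the point is how a fixed-size grouping goes permanently wrong after a single dropped byte.

```python
def parse_packets(stream, size=3):
    # Group the raw byte stream into fixed-size (x, y, buttons) packets,
    # the way the operating system does.
    return [tuple(stream[i:i + size]) for i in range(0, len(stream) - size + 1, size)]

# Hypothetical traffic: three identical motions (x=1, y=2, buttons=0)...
good = [1, 2, 0, 1, 2, 0, 1, 2, 0]
# ...and the same traffic with the second packet's y byte lost in transit.
bad = [1, 2, 0, 1, 0, 1, 2, 0]

print(parse_packets(good))  # [(1, 2, 0), (1, 2, 0), (1, 2, 0)]
print(parse_packets(bad))   # [(1, 2, 0), (1, 0, 1)] -- fields now in the wrong slots
```

After the dropped byte, the button byte is read as a y-delta and an x byte as the button state, and it never recovers on its own.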
Oh wait, then there are mice with wheels.
When the operating system starts up, it tries to figure out whether
the mouse has a wheel and convinces it to go into wheel
mode. (You can influence this negotiation from Device Manager.)
If both sides agree on wheeliness, then the mouse sends four
bytes for each mouse motion, which therefore must be interpreted
something like this:
x y b w
x y b w
x y b w
x y b w
Now things get really interesting when you
introduce laptops into the mix.
Many laptop computers have a PS/2 mouse port into which you
can plug a mouse on the fly. When this happens, the
built-in pointing device is turned off and the PS/2 mouse
is used instead. This happens entirely within the
laptop's firmware. The operating system has
no idea that this switcheroo has happened.
Suppose that when you turned on your laptop, there was
a wheel mouse connected to the PS/2 port. In this case, when the
operating system tries to negotiate with the mouse, it sees
a wheel and puts the mouse into "wheel mode", expecting
(and fortunately receiving) four-byte packets.
Now unplug your wheel mouse so that you revert to the
touchpad, and let's say your touchpad doesn't have a wheel.
The touchpad therefore spits out three-byte mouse packets
when you use it. Uh-oh, now things are really messed up.
The touchpad is sending out three-byte packets, but the
operating system thinks it's talking to that wheel mouse that
was plugged in originally and continues to expect four-byte packets.
You can imagine the mass mayhem that ensues.
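Here's a sketch of that mismatch, again with made-up byte values: the same stream parsed as three-byte packets (what the touchpad meant) and as four-byte packets (what a wheel-mode operating system sees).

```python
def parse_packets(stream, size):
    # Group the raw byte stream into fixed-size packets.
    return [tuple(stream[i:i + size]) for i in range(0, len(stream) - size + 1, size)]

# The touchpad emits 3-byte packets: (x-delta, y-delta, buttons).
touchpad = [1, 2, 0, 3, 4, 0, 5, 6, 0]

print(parse_packets(touchpad, 3))  # [(1, 2, 0), (3, 4, 0), (5, 6, 0)] -- intended
print(parse_packets(touchpad, 4))  # [(1, 2, 0, 3), (4, 0, 5, 6)] -- wheel-mode view
```

In the four-byte reading, a button byte lands in the y-delta slot and a y-delta lands in the wheel slot: clicks become motion, motion becomes scrolling.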
Moral of the story: If you're going to hot-plug a mouse
into your laptop's PS/2 port, you have a few choices.
Probably the easiest way out is to avoid the PS/2 mouse
entirely and just use a USB mouse.
This completely sidesteps the laptop's PS/2 switcheroo.
The x86 architecture
does things that almost no other modern architecture does,
but due to its overwhelming popularity, people think that
the x86 way is the normal way and that everybody else is weird.
Let's get one thing straight:
The x86 architecture is the weirdo.
The x86 has a small number (8) of general-purpose registers; the other modern
processors have far more.
(PPC, MIPS, and Alpha each have 32; ia64 has 128.)
The x86 uses the stack to pass function parameters;
the others use registers.
The x86 forgives access to unaligned data, silently fixing up the misalignment.
The others raise a misalignment exception, which can optionally
be emulated by the supervisor at an amazingly huge performance penalty.
The x86 has variable-sized instructions.
The others use fixed-sized instructions.
(PPC, MIPS, and Alpha each have fixed-sized 32-bit instructions;
ia64 has fixed-sized 41-bit instructions. Yes, 41-bit instructions.)
The x86 has a strict memory model, where external memory access
matches the order in which memory accesses are issued by the code.
The others have weak memory models, requiring explicit memory
barriers to ensure that issues to the bus are made (and completed)
in a specific order.
The x86 supports atomic load-modify-store operations.
None of the others do.
The x86 passes function return addresses on the stack.
The others use a link register.
Bear this in mind when you write what you think is portable code.
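As a small illustration of the alignment point, Python's struct module can stand in for a C compiler's layout rules: the "native" format inserts padding so the integer is aligned, while the "packed" format does not. (The native size is platform-dependent; 8 is typical for a char followed by a 4-byte int.)

```python
import struct

# Packed layout: a char followed by a 4-byte int, no padding between them.
packed = struct.calcsize("=ci")
print(packed)  # 5

# Native layout: the ABI pads after the char so the int lands on an
# aligned boundary. Code that assumes the packed layout -- or that
# unaligned reads are free -- is making an x86-flavored assumption.
native = struct.calcsize("@ci")
print(native)  # typically 8 -- three padding bytes inserted
```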
Like many things, the culture you grow up with is the one that
feels "normal" to you, even if, in the grand scheme of things,
it is one of the more bizarre ones out there.
A commenter asked why the original window order is not always preserved
when you undo a Show Desktop.
The answer is "Because the alternative is worse."
Guaranteeing that the window order is restored can leave Explorer
hung waiting on an unresponsive window.
When you undo a Show Desktop,
Explorer goes through and asks each window that it had minimized
to restore itself. If each window is quick to respond, then the
windows are restored and the order is preserved.
However, if there is a window that is slow to respond (or
even hung), then it
loses its chance and Explorer moves on to the next window in the list.
That way, a hung window doesn't cause Explorer to hang, too.
But it does mean that the windows restore out of order.
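The policy can be sketched like this. The timeout value and the simple synchronous loop are my inventions for illustration; the real Explorer logic differs in detail, but the trade-off is the same: skip the slow window and keep going.

```python
HUNG = 10_000  # a window this slow to respond is effectively hung (illustrative)

def restore_windows(minimized, timeout_ms=500):
    # Ask each window, in original order, to restore itself. A window that
    # doesn't respond in time loses its chance; Explorer moves on, and the
    # laggard restores whenever it finally gets around to responding.
    restored, skipped = [], []
    for name, response_ms in minimized:
        if response_ms <= timeout_ms:
            restored.append(name)
        else:
            skipped.append(name)
    return restored + skipped

windows = [("Mail", 10), ("HungApp", HUNG), ("Editor", 20)]
print(restore_windows(windows))  # ['Mail', 'Editor', 'HungApp'] -- order lost
```

Explorer never blocks on HungApp, but HungApp ends up out of order. That's the trade.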
For some reason,
some people go to enormous lengths to locate the Internet Explorer
binary so they can launch it with some options.
The way to do this is not to do it.
If you just pass "IEXPLORE.EXE" to
the ShellExecute function,
it will go find Internet Explorer and run it.
ShellExecute(NULL, "open", "iexplore.exe",
             NULL, NULL, SW_SHOWNORMAL);
The ShellExecute function gets its hands dirty so you don't have to.
(Note: If you just want to launch the
URL generically, you should use
ShellExecute(NULL, "open", "http://www.microsoft.com",
NULL, NULL, SW_SHOWNORMAL);
so that the web page opens in the user's preferred web browser.
Forcing Internet Explorer should be avoided under normal circumstances;
we are forcing it here because the action is presumably being taken in
response to an explicit request to open the web page specifically
in Internet Explorer.)
If you want to get your hands dirty, you can of course do it yourself by
reading the specification from the other side, this time
the specification on how to register your program's name
and path ("Registering Application Path Information").
The document describes how a program should enter its properties
into the registry so that the shell can launch it. To read it
backwards, then, interpret this as a list of properties you (the launcher)
need to read from the registry.
In this case, the way
to run Internet Explorer (or any other program)
the same way ShellExecute does
is to look in
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\App Paths\IEXPLORE.EXE (substituting
the name of the program if it's not Internet Explorer you're after).
The default value is the full path to the program, and
the "Path" value specifies a custom path that you should prepend
to the environment before launching the target program.
When you do this,
don't forget to call
the ExpandEnvironmentStrings function
if the registry value's type is REG_EXPAND_SZ.
(Lots of people forget about REG_EXPAND_SZ.)
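Here's a sketch of the whole lookup, with the registry replaced by a dictionary of illustrative values and a stand-in for ExpandEnvironmentStrings. (A real program would read HKEY_LOCAL_MACHINE with the registry API and call the actual ExpandEnvironmentStrings function.)

```python
import re

# Simulated App Paths entries -- illustrative values, not read from a real machine.
APP_PATHS = {
    "IEXPLORE.EXE": {
        "": r"%ProgramFiles%\Internet Explorer\IEXPLORE.EXE",  # default value (REG_EXPAND_SZ)
        "Path": r"%ProgramFiles%\Internet Explorer",           # prepend to PATH before launch
    },
}

def expand_environment_strings(value, env):
    # Stand-in for ExpandEnvironmentStrings: replace %NAME% with its value.
    return re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1), m.group(0)), value)

def resolve_app_path(exe, env):
    # Read the launcher's side of the contract: default value = full path.
    entry = APP_PATHS.get(exe)
    if entry is None:
        return None
    return expand_environment_strings(entry[""], env)

env = {"ProgramFiles": r"C:\Program Files"}
print(resolve_app_path("IEXPLORE.EXE", env))
# C:\Program Files\Internet Explorer\IEXPLORE.EXE
```

Note that the expansion step is exactly the part people forget when the value's type is REG_EXPAND_SZ.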
Of course, my opinion is that it's much easier just to let
ShellExecute do the work for you.
Even though Windows NT uses UTC internally,
the BIOS clock stays on local time.
Why is that?
There are a few reasons.
One is a chain of backwards compatibility.
In the early days, people often dual-booted between
Windows NT and MS-DOS/Windows 3.1.
MS-DOS and Windows 3.1 operate on local time,
so Windows NT followed suit so that you wouldn't
have to keep changing your clock each time you changed operating systems.
As people upgraded from Windows NT to
Windows 2000 to Windows XP, this choice
of time zone had to be preserved so that people
could dual-boot between their previous operating
system and the new operating system.
Another reason for keeping the BIOS clock on local time
is to avoid confusing people who set their time via the BIOS.
If you hit the magic key during the power-on self-test,
the BIOS will go into its configuration mode, and one of
the things you can configure here is the time.
Imagine how confusing it would be if you set the time to 3pm,
and then when you started Windows, the clock read 11am.
"Stupid computer. Why did it even ask me to change the time
if it's going to screw it up and make me change it a second time?"
And if you explain to them, "No, you see, that time was UTC,
not local time," the response is likely to be
"What kind of totally propeller-headed nonsense is that?
You're telling me that when the computer asks me what time it is,
I have to tell it what time it is in London?
(Except during the summer in the northern hemisphere,
when I have to tell it what time it is in Reykjavik?)
Why do I have to remember my time zone and manually subtract
four hours? Or is it five during the summer? Or maybe I have to
add. Why do I even have to think about this?
Stupid Microsoft. My watch says three o'clock. I type three o'clock.
End of story."
(What's more, some BIOSes have alarm clocks built in,
where you can program them to have the computer turn itself on at a particular
time. Do you want to have to convert all those times to UTC
each time you want to set a wake-up call?)
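For the record, here's the arithmetic our hypothetical user is being asked to do, for someone in the US Eastern time zone in January (UTC-5; the date and times are made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# "My watch says three o'clock": 3pm US Eastern standard time.
eastern = timezone(timedelta(hours=-5))
wall_clock = datetime(2004, 1, 15, 15, 0, tzinfo=eastern)

# What a UTC-based BIOS would need to be told instead.
utc = wall_clock.astimezone(timezone.utc)
print(wall_clock.strftime("%H:%M"), "->", utc.strftime("%H:%M"))  # 15:00 -> 20:00
```

Three o'clock on the watch, eight o'clock in the BIOS. And the offset changes when daylight saving kicks in. No wonder people would rather just type three o'clock.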
I didn't debug it personally, but I know the people who did.
During Windows XP development, a bug arrived on
a computer game that crashed only after you got to one of the higher levels.
After many saved and restored games, the problem was finally identified.
The program does its video work in an offscreen buffer and transfers
it to the screen when it's done. When it draws text with a shadow,
it first draws the text in black, offset down one and right one pixel,
then draws it again in the foreground color.
So far so good.
Except that it didn't check whether moving down and right one pixel
was going to go beyond the end of the screen buffer.
That's why it took until one of the higher levels before the bug
manifested itself. Not until then did you accomplish a mission
whose name contained a lowercase letter with a descender!
Shifting the descender down one pixel caused the bottom row of
pixels in the character to extend past the video buffer and
start corrupting memory.
Once the problem was identified, fixing it was comparatively easy.
The application compatibility team
has a bag of tricks, and one of them adds padding to every heap
allocation so that when a program overruns a heap buffer, all
that gets corrupted is the padding.
Enable that fix for the bad program
(specifying the amount of padding necessary,
in this case, one row's worth of pixels), and run through the
game again. No crash this time.
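The shape of the bug and the fix are easy to simulate. The buffer dimensions here are made up, and Python's IndexError stands in for "corrupting whatever lives after the heap block," but the idea is the same: one extra row of slop absorbs the shadow's overrun.

```python
WIDTH, HEIGHT = 8, 4  # illustrative off-screen buffer dimensions

def draw_pixel(buf, x, y, color):
    # Writing past the end of the bytearray raises IndexError -- our
    # stand-in for silent heap corruption.
    buf[y * WIDTH + x] = color

# Shadow text is drawn offset down-and-right one pixel, so a pixel on the
# bottom row of a glyph writes one row past an exact-sized buffer...
exact = bytearray(WIDTH * HEIGHT)
try:
    draw_pixel(exact, 3, HEIGHT, 1)  # shadow of a bottom-row descender pixel
    crashed = False
except IndexError:
    crashed = True
print(crashed)  # True

# ...but with one row's worth of padding (the compatibility fix),
# the stray write lands harmlessly in the slop.
padded = bytearray(WIDTH * HEIGHT + WIDTH)
draw_pixel(padded, 3, HEIGHT, 1)  # no exception
```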
What made this interesting to me was that you had to play the
game for hours before the bug finally surfaced.
It depends which version of Windows you're asking about.
For Windows 95, Windows 98, and Windows Me,
the answer is simple: Not at all.
These are not multiprocessor operating systems.
For Windows NT and Windows 2000, the answer is
"It doesn't even know."
These operating systems are not hyperthreading-aware
because they were written before hyperthreading was invented.
If you enable hyperthreading, then each of your CPUs looks
like two separate CPUs to these operating systems.
(And they will get charged as two separate CPUs for licensing purposes.)
Since the scheduler doesn't realize the connection between
the virtual CPUs, it can end up doing a worse job than
if you had never enabled hyperthreading to begin with.
Consider a dual-hyperthreaded-processor machine.
There are two physical processors A and B, each with
two virtual hyperthreaded processors, call them A1, A2,
B1, and B2.
Suppose you have two CPU-intensive tasks.
As far as the Windows NT
and Windows 2000 schedulers are concerned, all four
processors are equivalent, so the scheduler figures it doesn't matter
which two it uses. And if you're unlucky, it'll pick
A1 and A2, forcing one physical processor to shoulder two
heavy loads (each of which will probably run at something
between half-speed and three-quarter speed),
leaving physical processor B idle; the scheduler is
completely unaware that it could have done a better job
by putting one task on A1 and the other on B1.
Windows XP and Windows Server 2003 are hyperthreading-aware.
When faced with the above scenario, those schedulers will know
that it is better to put one task on one of the A's and the other
on one of the B's.
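The difference between the two schedulers can be sketched like this. This is a deliberately simplified model for illustration, not the actual Windows scheduler.

```python
CPUS = ["A1", "A2", "B1", "B2"]  # two physical packages, two siblings each

def naive_pick(n):
    # Pre-XP view: all four CPUs look alike, so just take the first n --
    # which piles both tasks onto physical processor A.
    return CPUS[:n]

def ht_aware_pick(n):
    # HT-aware view: spread work across physical packages before
    # doubling up on a package's siblings.
    packages = {}
    for cpu in CPUS:
        packages.setdefault(cpu[0], []).append(cpu)
    picked = []
    while len(picked) < n:
        for siblings in packages.values():
            if siblings and len(picked) < n:
                picked.append(siblings.pop(0))
    return picked

print(naive_pick(2))     # ['A1', 'A2'] -- both tasks fight over processor A
print(ht_aware_pick(2))  # ['A1', 'B1'] -- one task per physical processor
```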
Note that even with a hyperthreading-aware scheduler,
you can concoct pathological scenarios where hyperthreading ends
up a net loss. (For example, if you have four tasks, two of which
rely heavily on L2 cache and two of which don't, you'd be better
off putting each of the L2-intensive tasks on separate processors,
since the L2 cache is shared by the two virtual processors.
Putting them both on the same processor would result in a lot of L2-cache
misses as the two tasks fight over L2 cache slots.)
When you go to the expensive end of the scale (the Datacenter Servers,
the Enterprise Servers), things get tricky again.
I refer still-interested parties to the
Windows Support for Hyper-Threading Technology white paper.
Update 06/2007: The white paper
appears to have moved.
On x86 machines, Windows chooses a page size of 4K because that was the
only page size supported by that architecture at the time the operating
system was designed. (4MB pages were added to the CPU later,
in the Pentium as I recall, but clearly that is too large for everyday use.)
For the ia64, Windows chose a page size of 8K. Why 8K?
It's a balance between two competing objectives.
Large page sizes allow more efficient I/O since you are reading
twice as much data at one go. However large page sizes also
increase the likelihood that the extra I/O you perform is wasted
because of poor locality.
Experiments were run on the ia64 with various page sizes
(even with 64K pages, which were seriously considered at one point),
and 8K provided the best balance.
Note that changing the page size creates all sorts of problems
for compatibility. There are large numbers of programs out there that
blindly assume that the page size is 4K.
Boy are they in for a surprise.
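The moral for programs is to ask the system for the page size instead of hard-coding 4096. Here's a sketch in Python, where mmap.PAGESIZE plays the role that GetSystemInfo's page-size field plays for a Win32 program:

```python
import mmap

# Query the page size at runtime instead of assuming 4K.
PAGE = mmap.PAGESIZE

def round_up_to_page(nbytes):
    # Round an allocation size up to a whole number of pages,
    # using the queried size rather than a 4096 literal.
    return -(-nbytes // PAGE) * PAGE

print(PAGE, round_up_to_page(1), round_up_to_page(PAGE + 1))
```

Code written this way keeps working when the page size is 8K, or 64K, or whatever the next architecture prefers.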
Einstein discovered that simultaneity is relative.
This is also true of computing.
People will ask, "Is it okay to do X on one thread and Y on
another thread simultaneously?" For example, is it okay to use
a handle on one thread while another thread closes it? Is it okay
to unregister the same wait event on two threads simultaneously?
You can answer this question knowing nothing about the internal
behavior of those operations. All you need to know are some physics
and the answers to much simpler questions about what is
valid sequential code.
Let's do a thought experiment with simultaneity.
Since simultaneity is relative, any code that does X and Y
simultaneously can be observed to have performed X before Y
or Y before X, depending on your frame of reference.
That's how the universe works.
So if it were okay to do them simultaneously, then it must
also be okay to do them one after the other, since they
do occur one after the other if you walk
past the computer in the correct direction.
Is it okay to use a handle after closing it?
Is it okay to unregister a wait event twice?
The answer to both questions is "No," and therefore
it isn't okay to do them simultaneously either.
If you don't like using physics to solve this problem, you can
also do it from a purely technical perspective.
Invoking a function is not an atomic operation. You prepare
the parameters, you call the entry point, the function does some
work, it returns. Even if you somehow manage to get both threads
to reach the function entry point simultaneously (even though as
we know from physics there is no such thing as true simultaneity),
there's always the possibility that one thread will get pre-empted
immediately after the "call" instruction has transferred control
to the first instruction of the target function,
while the other thread continues to completion.
After the second thread runs to completion, the pre-empted thread
gets scheduled and begins execution of the function body.
Under this situation, you effectively
called the two functions one after the
other, despite all your efforts to call them simultaneously.
Since you can't prevent this scenario from occurring,
you have to code with the possibility that it might actually happen.
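You can watch this serialization happen with two threads. However hard they try to run "simultaneously," the record always shows one call completing before the other, in one order or the other:

```python
import threading

calls = []
lock = threading.Lock()

def call(name):
    # Each call completes at some definite point, so the two calls
    # always end up in some sequential order.
    with lock:
        calls.append(name)

t1 = threading.Thread(target=call, args=("X",))
t2 = threading.Thread(target=call, args=("Y",))
t1.start(); t2.start()
t1.join(); t2.join()

print(calls)  # either ['X', 'Y'] or ['Y', 'X'] -- never truly simultaneous
assert calls in (["X", "Y"], ["Y", "X"])
```

If either sequential order would be a bug, then the "simultaneous" version is a bug too.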
Hopefully this second explanation will satisfy the people who don't believe
in the power of physics.
Personally, I prefer using physics.