March, 2004

  • The Old New Thing

    A privacy policy that doesn't actively offend me


    I've ranted before about privacy policies and how they don't actually protect your privacy. (All they're required to do is disclose the policy; there is no requirement that the policy must be any good.)

    Today I read MetLife's privacy policy and found to my surprise that it does not actively offend me. It's written in plain English, it's well-organized, and it actually explains and limits the scope of each exception.

    I have noticed how the word "terrorism" has turned into a "magic word" of late. Anything distasteful you want to do, just say you're doing it to combat "terrorism" and people will give you the green light.

    The only logical conclusion is that he was cloned


    Something is wrong with the world when fark finds something "real" news organizations miss. (When I first learned about fark, I confused it with FARC, a different organization entirely. That's right, a terrorist organization has its own official web site. Gotta love the Internet.)

    Anyway, fark has pointed out that the man Pakistani forces claim today to have surrounded along the border with Afghanistan, Ayman al-Zawahiri, was reported by The Grauniad two years ago to have already been captured. They never printed a correction (as far as I can tell), so I guess he cloned himself.

    Why does the Resource Compiler complain about strings longer than 255 characters?

    As we learned in a previous entry, string resources group strings into bundles of 16, each Unicode string in the bundle prefixed by a 16-bit length. Why does the Resource Compiler complain about strings longer than 255 characters?

    This is another leftover from 16-bit Windows.

    Back in the Win16 days, string resources were also grouped into bundles of 16, but the strings were in ANSI, not Unicode, and the prefix was only an 8-bit value.

    And 255 is the largest length you can encode in an 8-bit value.

    If your 32-bit DLL contains strings longer than 255 characters, then 16-bit programs would be unable to read those strings.
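    To make the width limit concrete, here is a minimal Python sketch that packs one bundle of 16 length-prefixed strings in each style. It assumes a simplified layout (a little-endian length prefix immediately followed by the characters, no terminators, unused slots stored as zero-length strings) and uses Latin-1 as a stand-in for an ANSI code page. The function names are hypothetical; this is an illustration, not the Resource Compiler's actual code.

```python
import struct

BUNDLE_SIZE = 16  # string resources are grouped into bundles of 16


def _pad(strings):
    """Fill unused slots in the bundle with empty strings."""
    if len(strings) > BUNDLE_SIZE:
        raise ValueError("a bundle holds at most 16 strings")
    return list(strings) + [""] * (BUNDLE_SIZE - len(strings))


def pack_bundle_win32(strings):
    """Win32 style: 16-bit length (in UTF-16 code units), then UTF-16LE text."""
    out = bytearray()
    for s in _pad(strings):
        data = s.encode("utf-16-le")
        units = len(data) // 2
        if units > 0xFFFF:
            raise ValueError("string too long for a 16-bit length prefix")
        out += struct.pack("<H", units)
        out += data
    return bytes(out)


def pack_bundle_win16(strings):
    """Win16 style: 8-bit length, then 8-bit (ANSI) text."""
    out = bytearray()
    for s in _pad(strings):
        data = s.encode("latin-1")  # stand-in for the ANSI code page
        if len(data) > 0xFF:
            raise ValueError("string too long for an 8-bit length prefix")
        out += struct.pack("<B", len(data))
        out += data
    return bytes(out)


long_string = "x" * 300
print(len(pack_bundle_win32([long_string])))  # 2 + 600 + 15 * 2 = 632
try:
    pack_bundle_win16([long_string])
except ValueError as e:
    print("Win16:", e)
```

    A 300-character string fits comfortably under the 16-bit prefix but overflows the 8-bit one, which is exactly the situation the old warning was flagging.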

    This is largely irrelevant nowadays, but the warning remained in the Resource Compiler for quite some time.

    It appears to be gone now. Good riddance.

    Catholic baseball fans want to eat meat on opening day


    As it happens, Opening Day of the baseball season coincides with Good Friday, a day of "fasting and abstinence" according to Catholic tradition. (Then again, after Vatican II, the definition of "fasting and abstinence" weakened significantly. All most people remember anymore is "no meat".)

    Catholics in Boston have applied to the archdiocese for a special dispensation so they can have a hot dog at the game. The Church said "Nice try".

    But at least you can still order a beer.

    The car with no user-serviceable parts inside


    For the first time, a team of women was challenged to develop a car, and the car they came up with requires an oil change only every 50,000 kilometers and doesn't even have a hood you can open, so you can't poke around the engine.

    To me, a car has no user-serviceable parts inside. The only times I have opened the hood were when somebody else said, "Hey, let me take a look at the engine of your car." (I have a Toyota Prius.) On my previous car, the only time I opened the hood was to check the oil.

    Sometimes the open-source folks ask, "Would you buy a car whose hood can't be opened?" It looks like a lot of people (including me) would respond, "Yes."

    Why is the line terminator CR+LF?

    This protocol dates back to the days of teletypewriters. CR stands for "carriage return": the CR control character returned the print head ("carriage") to column zero without advancing the paper. LF stands for "linefeed": the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so the next line prints on fresh paper), you needed both CR and LF.

    If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you'll see that they all specify CR+LF as the line termination sequence. So the real question is not "Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?" but rather "Why did other people choose to differ from these standards documents and use some other line terminator?"

    Unix adopted plain LF as the line termination sequence. If you look at the stty options, you'll see that the onlcr option specifies whether an LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where each line begins where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.
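    The stairstep effect falls straight out of the teletype model. The Python sketch below is a toy simulation, not a model of any real terminal driver: CR moves a simulated print head to column zero, LF advances the simulated paper without moving the head, and the hypothetical onlcr() helper mimics the stty option by rewriting each LF as CR+LF before output. Overprinting is simplified to replacing the character.

```python
def teletype(stream):
    """Simulate a teletype: CR returns the print head to column zero,
    LF advances the paper one line without moving the print head."""
    lines = [[]]  # one list of characters per line of paper
    col = 0
    for ch in stream:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            lines.append([])  # fresh line of paper; the head stays where it was
        else:
            line = lines[-1]
            if len(line) < col:
                line.extend(" " * (col - len(line)))  # blank paper up to the head
            if col < len(line):
                line[col] = ch  # overprint (simplified: replace the character)
            else:
                line.append(ch)
            col += 1
    return ["".join(line) for line in lines]


def onlcr(stream):
    """Mimic the stty onlcr option: turn each LF into CR+LF on output."""
    return stream.replace("\n", "\r\n")


text = "one\ntwo\nthree\n"
print(teletype(text))         # ['one', '   two', '      three', '']
print(teletype(onlcr(text)))  # ['one', 'two', 'three', '']
```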

    The unix ancestry of the C language carried this convention into the C language standard, which requires only "\n" (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.
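    A minimal sketch of the translation that burden implies for a runtime library on a CR+LF system: on read, each CR+LF pair becomes a single LF ("\n"); on write, each LF becomes CR+LF. The function names are hypothetical, the text is assumed to be in a single-byte encoding (Latin-1 here), and a real C runtime handles additional edge cases.

```python
def text_mode_read(raw: bytes) -> str:
    """Convert raw file bytes into logical lines: each CR+LF becomes a lone LF.
    A CR that is not followed by LF is left alone."""
    return raw.decode("latin-1").replace("\r\n", "\n")


def text_mode_write(text: str) -> bytes:
    """Convert logical lines back to on-disk form: each LF becomes CR+LF.
    Text that already contains CR+LF would come out as CR+CR+LF."""
    return text.replace("\n", "\r\n").encode("latin-1")


on_disk = text_mode_write("hello\nworld\n")
print(on_disk)                        # b'hello\r\nworld\r\n'
print(repr(text_mode_read(on_disk)))  # 'hello\nworld\n'
```

    The program only ever sees "\n", which is why the same C source works whether the file on disk stores CR+LF or plain LF.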

    The C language also introduced the term "newline" to express the concept of "generic line terminator". I'm told that the ASCII committee changed the name of character 0x0A to "newline" around 1996, so the confusion level has been raised even higher.

    Here's another discussion of the subject, from a unix perspective.

    On a server, paging = death


    Chris Brumme's latest treatise contained the sentence "Servers must not page". That's because on a server, paging = death.

    I had occasion to meet somebody from another division who told me this little story: They had a server that went into thrashing death every 10 hours, like clockwork, and had to be rebooted. To mask the problem, the server was converted to a cluster, so what really happened was that the machines in the cluster took turns being rebooted. The clients never noticed anything, but the server administrators were really frustrated. ("Hey Clancy, looks like number 2 needs to be rebooted. She's sucking mud.")

    The reason for the server's death? Paging.

    There was a four-bytes-per-request memory leak in one of the programs running on the server. Eventually, all the leakage filled available RAM and the server was forced to page. Paging means slower response, but of course the requests for service kept coming in at the normal rate. So the longer you take to turn a request around, the more requests pile up, and then it takes even longer to turn around the new requests, so even more pile up, and so on. The problem snowballed until the machine just plain keeled over.
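    The feedback loop shows up even in a toy simulation. The Python sketch below uses made-up numbers in arbitrary units: requests arrive at a fixed rate, each request served leaks four bytes, and once the leaked total reaches a hypothetical RAM budget, paging cuts throughput by a hypothetical factor of ten. From that point on, requests arrive faster than they can be served, so the backlog grows without bound. (The sketch ignores the memory the backlog itself consumes, which would only make things worse.)

```python
def simulate(ticks, ram_bytes, arrivals=100, capacity=120,
             leak_per_request=4, paging_slowdown=10):
    """Return the request backlog after each tick.

    All parameters are hypothetical toy values, not measurements."""
    leaked = 0
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += arrivals  # requests keep coming in at the normal rate
        # Until the leak fills RAM the server keeps up; after that,
        # paging slows every request down.
        if leaked < ram_bytes:
            throughput = capacity
        else:
            throughput = capacity // paging_slowdown
        served = min(backlog, throughput)
        backlog -= served
        leaked += served * leak_per_request  # four bytes lost per request
        history.append(backlog)
    return history


print(simulate(ticks=14, ram_bytes=4000))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 88, 176, 264, 352]
```

    Setting leak_per_request=0 (plugging the leak) keeps the server out of paging entirely, no matter how many ticks you run.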

    After much searching, the leak was identified and plugged. Now the servers chug along without a hitch.

    (And since the reason for the cluster was to cover for the constant crashes, I suspect they reduced the size of the cluster and saved a lot of money.)

    More on the AMD64 calling convention


    Josh Williams picks up the 64-bit ball with an even deeper discussion of the AMD64 (aka x64) calling convention and things that go wrong when you misdeclare your function prototypes.

    Why do text files end in Ctrl+Z?


    Actually, text files don't need to end in Ctrl+Z, but the convention persists in certain circles. (Though, fortunately, those circles are awfully small nowadays.)

    This story requires us to go back to CP/M, the operating system that MS-DOS envisioned itself as a successor to. (Since the 8086 envisioned itself as the successor to the 8080, it was natural that the operating system for the 8086 would view itself as the successor to the primary operating system on the 8080.)

    In CP/M, files were stored in "sectors" of 128 bytes each. If your file was 64 bytes long, it was stored in a full sector. The kicker was that the operating system tracked the size of the file as the number of sectors. So if your file was not an exact multiple of 128 bytes in size, you needed some way to specify where the "real" end-of-file was.

    That's where Ctrl+Z came in.

    By convention, the unused bytes at the end of the last sector were padded with Ctrl+Z characters. According to this convention, a program that read from a file should stop when it read a Ctrl+Z, since that meant it was now reading the padding.
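    Here is a minimal Python sketch of the convention, assuming 128-byte sectors and the Ctrl+Z byte value 0x1A (ASCII SUB). The writer pads the last sector with Ctrl+Z; the reader stops at the first Ctrl+Z. The function names are hypothetical.

```python
SECTOR_SIZE = 128  # CP/M stored files in 128-byte sectors
CTRL_Z = b"\x1a"   # Ctrl+Z (ASCII SUB)


def write_padded(data: bytes) -> bytes:
    """Pad the last sector with Ctrl+Z so the file fills whole sectors."""
    remainder = len(data) % SECTOR_SIZE
    if remainder:
        data += CTRL_Z * (SECTOR_SIZE - remainder)
    return data


def read_text(stored: bytes) -> bytes:
    """Stop at the first Ctrl+Z, since everything after it is padding."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]


stored = write_padded(b"A" * 64)
print(len(stored) // SECTOR_SIZE)      # 1 (a whole sector for 64 bytes of data)
print(read_text(stored) == b"A" * 64)  # True
```

    The catch is that a file whose real contents include a Ctrl+Z byte gets cut short when read this way, so the convention only works for text.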

    To retain compatibility with CP/M, MS-DOS carried forward the Ctrl+Z convention. That way, when you transferred your files from your old CP/M machine to your new PC, they wouldn't have garbage at the end.

    Ctrl+Z hasn't been needed for years; MS-DOS records file sizes in bytes rather than sectors. But the convention lingers in the "COPY" command, for example.

    Still more creative uses for CAPTCHA


    I want to say up front that I think CAPTCHA is a stupid name. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart."

    Why do people feel the urge to create some strained cutesy acronym for their little invention?

    Anyway, it has already been noted how spammers are getting around these tests by harvesting a practically-free resource on the Internet: the desire to see pornography.

    Someone designed a software robot that would fill out a registration form and, when confronted with an image processing test, would post it on a free porn site. Visitors to the porn site would be asked to complete the test before they could view more pornography, and the software robot would use their answer to complete the e-mail registration.

    Ah, remember the days when you had to whisper the word "pornography"?

    Anyway, it looks like the virus-writers have also taken the two-edged sword and pointed it in the other direction. (Ah, another one of Raymond's tortured mixed metaphors.)

    As you may be aware, the latest trend in virus-detection-avoidance is to attach an encrypted ZIP file, since virus checkers don't know how to decrypt it. To get the sucker to activate the payload, you put the password in the message body.

    Well, virus checkers figured this out rather quickly and began scanning the message body for the password.

    Now the virus-writers have upped the ante. The Bagle-N virus attaches an encrypted ZIP file and provides the password as an image, using the same trick as the anti-robot people.

    Fortunately, the image generator they use is pretty easy to do OCR on, since they don't make any attempt to fuzz the images.

    I predict the next step will be that the virus-writers send two messages to each victim. The first contains the payload, and the second contains the password. That way the virus-scanning software is completely helpless since the password to decrypt the ZIP file isn't even in the message being scanned!

    Once again, this just goes to show that social engineering can beat pretty much any technological security mechanism.

    (I think virus scanners are now starting to block any password-protected ZIP. But that won't stop the viruses for long. They'll just have a link to a ZIP file or something.)
