• The Old New Thing

    Why do text files end in Ctrl+Z?


    Actually, text files don't need to end in Ctrl+Z, but the convention persists in certain circles. (Though, fortunately, those circles are awfully small nowadays.)

    This story requires us to go back to CP/M, the operating system that MS-DOS envisioned itself as a successor to. (Since the 8086 envisioned itself as the successor to the 8080, it was natural that the operating system for the 8086 would view itself as the successor to the primary operating system on the 8080.)

    In CP/M, files were stored in "sectors" of 128 bytes each. If your file was 64 bytes long, it was stored in a full sector. The kicker was that the operating system tracked the size of the file as the number of sectors. So if your file was not an exact multiple of 128 bytes in size, you needed some way to specify where the "real" end-of-file was.

    That's where Ctrl+Z came in.

    By convention, the unused bytes at the end of the last sector were padded with Ctrl+Z characters. According to this convention, if you had a program that read from a file, it should stop when it reads a Ctrl+Z, since that meant that it was now reading the padding.
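
    The padding convention can be sketched in a few lines of portable C++. This is only an illustration, not CP/M or MS-DOS code: the names pad_to_sector and read_text are made up for the example, and the "file" is just an in-memory string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// CP/M-style sector size and the Ctrl+Z character (0x1A).
constexpr std::size_t kSectorSize = 128;
constexpr char kCtrlZ = '\x1A';

// Hypothetical helper: pad the data out to a whole number of sectors
// with Ctrl+Z, the way the text says a CP/M text file was stored.
std::string pad_to_sector(std::string data) {
    const std::size_t remainder = data.size() % kSectorSize;
    if (remainder != 0) {
        data.append(kSectorSize - remainder, kCtrlZ);
    }
    assert(data.size() % kSectorSize == 0);
    return data;
}

// Hypothetical helper: recover the "real" text by stopping at the
// first Ctrl+Z. If there is none, the whole buffer is text.
std::string read_text(const std::string& stored) {
    const std::size_t end = stored.find(kCtrlZ);
    return end == std::string::npos ? stored : stored.substr(0, end);
}
```

    Note that this only works for text: a binary file that happens to contain a 0x1A byte would be cut short by read_text.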

    To retain compatibility with CP/M, MS-DOS carried forward the Ctrl+Z convention. That way, when you transferred your files from your old CP/M machine to your new PC, they wouldn't have garbage at the end.

    Ctrl+Z hasn't been needed for years; MS-DOS records file sizes in bytes rather than sectors. But the convention lingers in the "COPY" command, for example.
  • The Old New Thing

    Still more creative uses for CAPTCHA


    I want to say up front that I think CAPTCHA is a stupid name. CAPTCHA stands for "Computer-Aided Process for Testing..." something something.

    Why do people feel the urge to create some strained cutesy acronym for their little invention?

    Anyway, it has already been noted how spammers are getting around these tests by harvesting a practically-free resource on the Internet: the desire to see pornography.

    Someone designed a software robot that would fill out a registration form and, when confronted with an image processing test, would post it on a free porn site. Visitors to the porn site would be asked to complete the test before they could view more pornography, and the software robot would use their answer to complete the e-mail registration.

    Ah, remember the days when you had to whisper the word "pornography"?

    Anyway, it looks like the virus-writers have also taken the two-edged sword and pointed it in the other direction. (Ah, another one of Raymond's tortured mixed metaphors.)

    As you may be aware, the latest trend in virus-detection-avoidance is to attach an encrypted ZIP file, since virus-checkers don't know how to decrypt them. To get the sucker to activate the payload, you put the password in the message body.

    Well, virus checkers figured this out rather quickly and scanned the message body to see if there's a password in the text.

    Now the virus-writers have upped the ante. The Bagle-N virus attaches an encrypted ZIP file and provides the password as an image, using the same trick as the anti-robot people.

    Fortunately, the image generator they use is pretty easy to do OCR on, since they don't make any attempt to fuzz the images.

    I predict the next step will be that the virus-writers send two messages to each victim. The first contains the payload, and the second contains the password. That way the virus-scanning software is completely helpless since the password to decrypt the ZIP file isn't even in the message being scanned!

    Once again, just goes to show that social engineering can beat out pretty much any technological security mechanism.

    (I think virus scanners are now starting to block any password-protected ZIP. But that won't stop the viruses for long. They'll just have a link to a ZIP file or something.)

  • The Old New Thing

    How do I convert a SID between binary and string forms?


    Of course, if you want to do this programmatically, you would use ConvertSidToStringSid and ConvertStringSidToSid, but often you're studying a memory dump or otherwise need to do the conversion manually.

    If you have a SID like S-a-b-c-d-e-f-g-...

    Then the bytes are

    a       (revision, one byte)
    N       (number of dashes minus two)
    bbbbbb  (six bytes of "b" treated as a 48-bit number in big-endian format)
    cccc    (four bytes of "c" treated as a 32-bit number in little-endian format)
    dddd    (four bytes of "d" treated as a 32-bit number in little-endian format)
    eeee    (four bytes of "e" treated as a 32-bit number in little-endian format)
    ffff    (four bytes of "f" treated as a 32-bit number in little-endian format)

    So for example, if your SID is S-1-5-21-2127521184-1604012920-1887927527-72713, then your raw hex SID is

    010500000000000515000000A065CF7E784B9B5FE77C8770091C0100

    This breaks down as follows:

    01            (revision = 1)
    05            (seven dashes, seven minus two = 5)
    000000000005  (5 = 0x000000000005, big-endian)
    15000000      (21 = 0x00000015, little-endian)
    A065CF7E      (2127521184 = 0x7ECF65A0, little-endian)
    784B9B5F      (1604012920 = 0x5F9B4B78, little-endian)
    E77C8770      (1887927527 = 0x70877CE7, little-endian)
    091C0100      (72713 = 0x00011C09, little-endian)
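
    If you'd rather not do the byte-swapping in your head, the manual conversion can be sketched in portable C++. This is a rough illustration, not a replacement for ConvertSidToStringSid: the function name sid_to_string is made up, it assumes the layout of one revision byte, one count byte, a six-byte big-endian authority and then 32-bit little-endian values, and it always prints the authority in decimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: turn a binary SID into its "S-..." string form.
// Assumed layout: revision (1 byte), sub-authority count (1 byte),
// identifier authority (6 bytes, big-endian), then one 32-bit
// little-endian value per sub-authority.
// Returns an empty string if the buffer is too short.
std::string sid_to_string(const std::vector<std::uint8_t>& sid) {
    if (sid.size() < 8) return "";
    const std::size_t count = sid[1];
    if (sid.size() < 8 + 4 * count) return "";

    // Six-byte authority, most significant byte first.
    std::uint64_t authority = 0;
    for (std::size_t i = 2; i < 8; ++i) {
        authority = (authority << 8) | sid[i];
    }

    std::string out =
        "S-" + std::to_string(sid[0]) + "-" + std::to_string(authority);

    // Each sub-authority: four bytes, least significant byte first.
    for (std::size_t n = 0; n < count; ++n) {
        const std::size_t at = 8 + 4 * n;
        assert(at + 4 <= sid.size());
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            value |= static_cast<std::uint32_t>(sid[at + i]) << (8 * i);
        }
        out += "-" + std::to_string(value);
    }
    return out;
}
```

    Feeding it a revision byte of 01 followed by the fields broken down above (05, 000000000005, 15000000, and so on) gives back S-1-5-21-2127521184-1604012920-1887927527-72713.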

    Yeah, that's great, Raymond, but what do all those numbers mean?

    S-1-           version number (SID_REVISION)
    -5-            SECURITY_NT_AUTHORITY
    -21            SECURITY_NT_NON_UNIQUE
    -...-...-...-  these identify the machine that issued the SID
    72713          unique user id on the machine

    Each machine generates a unique ID that it uses to stamp all the SIDs it creates (-...-...-...-). The last number is a "relative id (RID)" that represents a user created by that machine. There are a bunch of predefined RIDs; you can see them in the header file ntseapi.h, which is also where I got these names from. The system reserves RIDs up to 999, so the first non-builtin account gets assigned ID number 1000. The number 72713 therefore means that this particular SID is the 71714th SID created by the issuer (72713 - 999 = 71714). (The machine that issued this SID is clearly a domain controller, responsible for creating the accounts of tens of thousands of users.)

    (Actually, I lied above when I said that this is the 71714th SID created by the issuer. Large servers can delegate SID creation to helpers, in which case SID issuance is no longer strictly consecutive.)

    Security isn't my area of expertise, so it's entirely possible (perhaps even likely) that I got something wrong up above. But it's mostly correct, I think.

  • The Old New Thing

    Senators are really good at stock-picking

    A Georgia State University study shows that U.S. senators have an uncanny knack for picking stocks that outpace the overall market. Professor Alan Ziobrowski's analysis of senators' financial disclosure data found that over a period of six years, the lawmakers outperformed the market by 12 percent.

    Professor Ziobrowski seems convinced that this is evidence of unethical behavior.

  • The Old New Thing

    What is the default security descriptor?


    All these functions have an optional LPSECURITY_ATTRIBUTES parameter, for which everybody just passes NULL, thereby obtaining the default security descriptor. But what is the default security descriptor?

    Of course, the place to start is MSDN, in the section titled Security Descriptors for New Objects.

    It says that the default DACL comes from inheritable ACEs (if the object belongs to a hierarchy, like the filesystem or the registry); otherwise, the default DACL comes from the primary or impersonation token of the creator.

    But what is the default primary token?

    Gosh, I don't know either. So let's write a program to find out.

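    #include <windows.h>
    #include <sddl.h> // ConvertSecurityDescriptorToStringSecurityDescriptor

    int WINAPI
    WinMain(HINSTANCE hinst, HINSTANCE hinstPrev,
            LPSTR lpCmdLine, int nShowCmd)
    {
     HANDLE Token;
     if (OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &Token)) {
      // The first call is expected to fail; it only reports the buffer size.
      DWORD RequiredSize = 0;
      GetTokenInformation(Token, TokenDefaultDacl, NULL, 0, &RequiredSize);
      TOKEN_DEFAULT_DACL* DefaultDacl =
          reinterpret_cast<TOKEN_DEFAULT_DACL*>(LocalAlloc(LPTR, RequiredSize));
      if (DefaultDacl) {
       SECURITY_DESCRIPTOR Sd;
       LPTSTR StringSd;
       // Wrap the token's default DACL in a descriptor, then stringify it.
       if (GetTokenInformation(Token, TokenDefaultDacl, DefaultDacl,
                               RequiredSize, &RequiredSize) &&
           InitializeSecurityDescriptor(&Sd, SECURITY_DESCRIPTOR_REVISION) &&
           SetSecurityDescriptorDacl(&Sd, TRUE,
               DefaultDacl->DefaultDacl, FALSE) &&
           ConvertSecurityDescriptorToStringSecurityDescriptor(&Sd,
               SDDL_REVISION_1, DACL_SECURITY_INFORMATION, &StringSd, NULL)) {
        MessageBox(NULL, StringSd, TEXT("Result"), MB_OK);
        LocalFree(StringSd);
       }
       LocalFree(DefaultDacl);
      }
      CloseHandle(Token);
     }
     return 0;
    }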

    Okay, I admit it, the whole purpose of this entry is just so I can call the function ConvertSecurityDescriptorToStringSecurityDescriptor, quite possibly the longest function name in the Win32 API. And just for fun, I used the NT variable naming convention instead of Hungarian.

    If you run this program you'll get something like this:

    D:(A;;GA;;;S-...)(A;;GA;;;SY)

    Pull out our handy reference to the Security Descriptor String Format to decode this.

    • "D:" - This introduces the DACL.
    • "(A;;GA;;;S-...)" - "Allow" "Generic All" access to "S-...", which happens to be me. Every user by default has full access to their own process.
    • "(A;;GA;;;SY)" - "Allow" "Generic All" access to "Local System".
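
    The decoding above is mostly a matter of splitting on parentheses and semicolons, which can be sketched in portable C++. This is a minimal illustration with made-up helper names (extract_aces and split_ace); it assumes simple ACEs with no nested parentheses, so conditional ACEs are out of scope.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: pull every parenthesized ACE out of a DACL
// string such as "D:(...)(...)".
std::vector<std::string> extract_aces(const std::string& sddl) {
    std::vector<std::string> aces;
    std::string::size_type open = sddl.find('(');
    while (open != std::string::npos) {
        const std::string::size_type close = sddl.find(')', open);
        if (close == std::string::npos) break;
        aces.push_back(sddl.substr(open, close - open + 1));
        open = sddl.find('(', close);
    }
    return aces;
}

// Hypothetical helper: split one ACE such as "(A;;GA;;;SY)" into its
// semicolon-separated fields. Empty fields are preserved, so the
// simple form yields six entries: type, flags, rights, object GUID,
// inherited object GUID, and account SID.
std::vector<std::string> split_ace(const std::string& ace) {
    assert(ace.size() >= 2 && ace.front() == '(' && ace.back() == ')');
    const std::string body = ace.substr(1, ace.size() - 2);

    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type semi = body.find(';', start);
        if (semi == std::string::npos) {
            fields.push_back(body.substr(start));
            break;
        }
        fields.push_back(body.substr(start, semi - start));
        start = semi + 1;
    }
    return fields;
}
```

    For "(A;;GA;;;SY)" the fields come out as "A", "", "GA", "", "", "SY": allow, no flags, generic all, no GUIDs, Local System.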

    Next time, I'll teach you how to decode that S-... thing.

  • The Old New Thing

    What happens to those "To Any Soldier" care packages

    Commentator and novelist Christian Bauman recalls the excitement of receiving mail from anonymous well-wishers back home during his deployment with the U.S. Army in Somalia in the early 1990s.

    This was a fascinating listen.

    The coup, of course, was getting a letter with a snapshot or two inside. I don't know why, but the further west the return address, the more likely the envelope had a picture. And the more north, the more likely the picture was, shall we say, "revealing". Triangulate this equation, and you discover that the girls in the northwest get a real charge out of showing the troops exactly what it is they are fighting for.

    Make sure to stick to the end for the punch line. (Every good story has a punch line.)

  • The Old New Thing

    Tonya Harding laces up again


    The skater you love to hate is back.

    Tonya Harding will lace up for a single game tomorrow with the Indianapolis Ice, which coincidentally happens to be "Guaranteed Fight Night". (If there's no fight, you get a free ticket to another game.)

    Ah, minor-league hockey...

    Personally, I don't think it's right when somebody benefits from having done something wrong.

  • The Old New Thing

    Why are dialog boxes initially created hidden?


    You may not have noticed it until you looked closely, but dialog boxes are actually created hidden initially, even if you specify WS_VISIBLE in the template. The reason for this is historical.

    Rewind to the old days (we're talking Windows 1.0): graphics cards are slow and CPUs are slow and memory is slow. You can pick a menu option that displays a dialog and wait a second or two for the dialog to get loaded off the floppy disk. (Hard drives are for the rich kids.) And then you have to wait for the dialog box to paint.

    To save valuable seconds, dialog boxes are created initially hidden and all typeahead is processed while the dialog stays hidden. Only after the typeahead is finished is the dialog box finally shown. And if you typed far enough ahead and hit Enter, you might even have been able to finish the entire dialog box without it ever being shown! Now that's efficiency.

    Of course, nowadays, programs are stored on hard drives and you can't (normally) out-type a hard drive, so this optimization is largely wasted, but the behavior remains for compatibility reasons.

    Actually this behavior still serves a useful purpose: If the dialog were initially created visible, then the user would be able to see all the controls being created into it, and watch as WM_INITDIALOG ran (changing default values, hiding and showing controls, moving controls around...) This is both ugly and distracting. ("How come the box comes up checked, then suddenly unchecks itself before I can click on it?")

  • The Old New Thing

    Why do operations on "byte" result in "int"?

    (The following discussion applies equally to C/C++/C#, so I'll use C#, since I talk about it so rarely.)

    People complain that the following code elicits an error:

    byte b = 32;
    byte c = ~b;
    // error CS0029: Cannot implicitly convert type 'int' to 'byte'

    "The result of an operation on 'byte' should be another 'byte', not an 'int'," they claim.

    Be careful what you ask for. You might not like it.

    Suppose we lived in a fantasy world where operations on 'byte' resulted in 'byte'.

    byte b = 32;
    byte c = 240;
    int i = b + c; // what is i?

    In this fantasy world, the value of i would be 16! Why? Because the two operands to the + operator are both bytes, so the sum "b+c" is computed as a byte, which results in 16 due to integer overflow. (And, as I noted earlier, integer overflow is the new security attack vector.)


    Similarly,

    int j = -b;

    would result in j having the value 224 and not -32, for the same reason.
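
    Since the discussion applies to C++ as well, here is a small C++ sketch that puts the real rules next to the fantasy world. The function names are made up for illustration, and the fantasy behavior is simulated by casting the result back down to a byte.

```cpp
#include <cassert>
#include <cstdint>

// What C++ actually does: operands narrower than int are promoted
// to int before the arithmetic happens.
int promoted_sum(std::uint8_t b, std::uint8_t c) { return b + c; }
int promoted_negate(std::uint8_t b) { return -b; }

// The fantasy world, simulated: squeeze the result back into a byte
// before anybody gets to look at it.
int byte_sum(std::uint8_t b, std::uint8_t c) {
    return static_cast<std::uint8_t>(b + c);
}
int byte_negate(std::uint8_t b) {
    return static_cast<std::uint8_t>(-b);
}

// Hypothetical usage: the same inputs as the examples in the text.
void check_examples() {
    assert(promoted_sum(32, 240) == 272);  // real rules
    assert(byte_sum(32, 240) == 16);       // fantasy world: 272 mod 256
    assert(promoted_negate(32) == -32);    // real rules
    assert(byte_negate(32) == 224);        // fantasy world: 256 - 32
}
```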

    Is that really what you want?

    Consider the following more subtle scenario:

    struct Results {
     public byte Wins;
     public byte Games;
    }

    bool WinningAverage(Results captain, Results cocaptain)
    {
     return (captain.Wins + cocaptain.Wins) >=
            (captain.Games + cocaptain.Games) / 2;
    }

    In our imaginary world, this code would return incorrect results once the total number of games played exceeded 255. To fix it, you would have to insert annoying int casts.

     return ((int)captain.Wins + cocaptain.Wins) >=
            ((int)captain.Games + cocaptain.Games) / 2;

    So no matter how you slice it, you're going to have to insert annoying casts. You may as well have the language err on the side of safety (forcing you to insert the casts where you know that overflow is not an issue) rather than err on the side of silence (where you may not notice the missing casts until your Payroll department asks you why their books don't add up at the end of the month).
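
    To see the WinningAverage problem with actual numbers, here is a C++ sketch of the same function under both sets of rules. It is an illustration only: the lowercase names are made up, and the imaginary world is simulated by truncating each sum to a byte.

```cpp
#include <cassert>
#include <cstdint>

struct Results {
    std::uint8_t wins;
    std::uint8_t games;
};

// Real rules: the bytes are promoted to int, so the sums cannot wrap.
bool winning_average(Results captain, Results cocaptain) {
    return (captain.wins + cocaptain.wins) >=
           (captain.games + cocaptain.games) / 2;
}

// Imaginary world, simulated: each sum is truncated to a byte first.
bool winning_average_fantasy(Results captain, Results cocaptain) {
    const std::uint8_t wins =
        static_cast<std::uint8_t>(captain.wins + cocaptain.wins);
    const std::uint8_t games =
        static_cast<std::uint8_t>(captain.games + cocaptain.games);
    return wins >= games / 2;
}

// Hypothetical usage: 25 wins out of 300 games is a losing record,
// but 300 wraps around to 44 in the imaginary world.
void check_examples() {
    const Results captain{10, 200};
    const Results cocaptain{15, 100};
    assert(!winning_average(captain, cocaptain));
    assert(winning_average_fantasy(captain, cocaptain));
}
```
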
  • The Old New Thing

    Char.IsDigit() matches more than just "0" through "9"


    Warning: .NET content ahead!

    Yesterday, Brad Abrams noted that Char.IsLetter() matches more than just "A" through "Z".

    What people might not realize is that Char.IsDigit() matches more than just "0" through "9".

    Valid digits are members of the following category in UnicodeCategory: DecimalDigitNumber.

    But what exactly is a DecimalDigitNumber?

    Indicates that the character is a decimal digit; that is, in the range 0 through 9. Signified by the Unicode designation "Nd" (number, decimal digit). The value is 8.

    At this point you have to go to the Unicode Standard Committee to see exactly what qualifies as "Nd", and then you get lost in a twisty maze of specifications and documents, all different.

    So let's run an experiment.

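    class Program {
      public static void Main(string[] args) {
        System.Console.WriteLine(System.Text.RegularExpressions.Regex.IsMatch(
            "\x0661\x0662\x0663", // "١٢٣"
            @"^\d+$")); // reconstructed call; only the string literal survived
      }
    }

    The characters in the string are Arabic digits, but they are still digits, as evidenced by the program output:

    True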

    Uh-oh. Do you have this bug in your parameter validation? (More examples..) If you use a pattern like @"^\d$" to validate that you receive only digits, and then later use System.Int32.Parse() to parse it, then I can hand you some Arabic digits and sit back and watch the fireworks. The Arabic digits will pass your validation expression, but when you get around to using it, boom, you throw a System.FormatException and die.
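
    One way to stay out of trouble is to validate against the exact set of characters your parser accepts. Here is the idea sketched in C++ rather than .NET, purely as an illustration: the function name is_ascii_digits is made up, and the input is assumed to be a UTF-8 encoded std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: accept only the ASCII characters '0' through
// '9'. Every other byte, including the UTF-8 bytes that encode
// non-ASCII digits, is rejected. An empty string is rejected too.
bool is_ascii_digits(const std::string& text) {
    if (text.empty()) return false;
    for (const char ch : text) {
        if (ch < '0' || ch > '9') return false;
    }
    return true;
}

// Hypothetical usage: "123" passes, the same number written with the
// digits U+0661 U+0662 U+0663 (as UTF-8 bytes) does not.
void check_examples() {
    assert(is_ascii_digits("123"));
    assert(!is_ascii_digits("\xD9\xA1\xD9\xA2\xD9\xA3"));
}
```

    The design choice is simply to make the validator and the parser agree on what a digit is, instead of letting a Unicode-aware check wave through input that the parser will choke on.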
