• The Old New Thing

    What was the starting point for the Panther Win32 kernel?


    When I presented a list of cat-related code names from Windows 95, commenter dave wanted to know whether the Panther kernel was derived from the 32-bit DOS kernel or the Windows/386 kernel.

    Neither.

    Here's the table again, with some more columns of information:

    Component          Code Name   Based on                     Fate
    16-bit DOS kernel  Jaguar      MS-DOS 5                     Morphed into Windows 95 boot loader / compatibility layer
    32-bit DOS kernel  Cougar      Win386 kernel                Morphed into VMM32
    Win32 kernel       Panther     Windows NT kernel            Cancelled
    User interface     Stimpy      Windows 3.1 user interface   Became the Windows 95 user interface

    The original idea for the Jaguar and Cougar projects was to offer a 16-bit MS-DOS environment that could be "kicked up a notch" to a 32-bit protected-mode MS-DOS environment, with virtual memory and multiple virtual machines. They used the MS-DOS 5 and Win386 kernels as starting points. (Why wasn't Jaguar based on MS-DOS 6.0? For the same reason NASA didn't use the Space Shuttle to rescue the Apollo 13 astronauts.) This project as originally envisioned was cancelled, but the work was not lost. The projects took on new life as the Windows 95 boot loader / compatibility layer and as the Windows 95 virtual machine manager, respectively.

    The idea for the Panther project was to start with the existing Windows NT kernel and strip it down to run in 4MB of RAM. This project did not pan out, and it was cancelled outright. It was replaced with a Win32 kernel written from scratch with the 4MB limit in mind.

    The Stimpy project survived intact and became the Windows 95 user interface.

    I doubt the code name was the reason, but it's interesting that the ferocious cats did not carry out their original missions, but the dim-witted cat did.


    How to find the IP address of a hacker, according to CSI: Cyber


    The episode of the television documentary CSI: Cyber which aired on CBS last Wednesday demonstrated an elite trick for obtaining a hacker's IP address: Extract it from the email header.

    Here's a screen shot from time code 14:35 that demonstrates the technique.

    <meta id="viewport" content="" name="viewport"></m
    <link href="y/images/favicon.ico" rel="shortcut ic
    <link href="y/styles.css?s=1382384360" type="text/
    <link href="y/mail.css?s=1382384360" type="text/cs
    <hidden: ip: 951.27.9.840 > < echo;off;>
    <!--if lte IE 8><link rel="stylesheet" type="text/
    <!--if lte IE 7><link rel="stylesheet" type="text/
    <link href="plugins/jqueryui/themes/larry/jquery-u
    <link href="plugins/jqueryui/themes/larry/ui.js?s=

    This technique is so awesome I had to share it.


    Why are there both TMP and TEMP environment variables, and which one is right?


    If you snoop around your environment variables, you may notice that there are two variables that purport to specify the location of temporary files. There is one called TMP and another called TEMP. Why two? And if they disagree, then who's right?

    Rewind to 1973. The operating system common on microcomputers was CP/M. The CP/M operating system had no environment variables. That sounds like a strange place to start a discussion of environment variables, but it's actually important. Since it had no environment variables, there was consequently neither a TMP nor a TEMP environment variable. If you wanted to configure a program to specify where to put its temporary files, you needed to do some sort of program-specific configuration, like patching a byte in the executable to indicate the drive letter where temporary files should be stored.

    (My recollection is that most CP/M programs were configured via patching. At least that's how I configured them. I remember my WordStar manual coming with details about which bytes to patch to do what. There were also a few dozen bytes of patch space set aside for you to write your own subroutines, in case you needed to add custom support for your printer. I did this to add an "Is printer ready to accept another character?" function, which allowed for smoother background printing.)

    Move forward to 1981. The 8086 processor and the MS-DOS operating system arrived on the scene. The designs of both the 8086 processor and the MS-DOS operating system were strongly inspired by CP/M, so much so that it was a primary design goal that it be possible to take your CP/M program written for the 8080 processor and machine-translate it into an MS-DOS program written for the 8086 processor. Mind you, the translator assumed that you didn't play any sneaky tricks like self-modifying code, jumping into the middle of an instruction, or using code as data, but if you played it straight, the translator would convert your program.

    (The goal of allowing machine-translation of code written for the 8080 processor into code written for the 8086 processor helps to explain some of the quirks of the 8086 instruction set. For example, the H and L registers on the 8080 map to the BH and BL registers on the 8086, and on the 8080, the only register that you could use to access a computed address was HL. This is why of the four basic registers AX, BX, CX, and DX on the 8086, the only one that you can use to access memory is BX.)

    One of the things that MS-DOS added beyond compatibility with CP/M was environment variables. Since no existing CP/M programs used environment variables, none of the first batch of programs for MS-DOS used them either, since the first programs for MS-DOS were all ported from CP/M. Sure, you could set a TEMP or TMP environment variable, but nobody would pay attention to it.

    Over time, programs were written with MS-DOS as their primary target, and they started to realize that they could use environment variables as a way to store configuration data. In the ensuing chaos of the marketplace, two environment variables emerged as the front-runners for specifying where temporary files should go: TEMP and TMP.

    MS-DOS 2.0 introduced the ability to pipe the output of one program as the input of another. Since MS-DOS was a single-tasking operating system, this was simulated by redirecting the first program's output to a temporary file and running it to completion, then running the second program with its input redirected from that temporary file. Now all of a sudden, MS-DOS needed a location to create temporary files! For whatever reason, the authors of MS-DOS chose to use the TEMP variable to control where these temporary files were created.
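    The mechanics of that simulation can be sketched in ordinary C. This is an illustration of the technique, not actual COMMAND.COM code; the file name and message are made up.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative sketch (not actual COMMAND.COM code) of how MS-DOS 2.0
       simulated "producer | consumer" on a single-tasking system: run the
       producer to completion with its output redirected to a temporary
       file, then run the consumer with input redirected from that file. */
    static void run_producer(const char *tmpfile)
    {
        FILE *out = fopen(tmpfile, "w");    /* "producer > tmpfile" */
        fputs("hello from the producer\n", out);
        fclose(out);
    }

    static void run_consumer(const char *tmpfile, char *line, size_t n)
    {
        FILE *in = fopen(tmpfile, "r");     /* "consumer < tmpfile" */
        fgets(line, (int)n, in);
        fclose(in);
    }

    int main(void)
    {
        /* COMMAND.COM put the intermediate file in the TEMP directory. */
        const char *dir = getenv("TEMP");
        char tmpfile[512];
        snprintf(tmpfile, sizeof tmpfile, "%s/pipe0.tmp", dir ? dir : ".");

        char line[128];
        run_producer(tmpfile);
        run_consumer(tmpfile, line, sizeof line);
        printf("consumer read: %s", line);
        remove(tmpfile);    /* delete the intermediate file afterward */
        return 0;
    }
    ```

    The producer must run to completion before the consumer starts, which is why an interrupted pipeline could leave stray temporary files behind.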

    Mind you, the fact that COMMAND.COM chose to go with TEMP didn't stop other programs from using either TEMP or TMP, depending on the mood of their original author. Many programs tried to appease both sides of the conflict by checking for both, and which one was checked first was likewise up to the author. For example, the old DISKCOPY and EDIT programs would look for TEMP before looking for TMP.

    Windows went through a similar exercise, but for whatever reason, the original authors of the Get­Temp­File­Name function chose to look for TMP before looking for TEMP.
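    The two conventions can be sketched like this. This is a portable approximation of the lookup order, not actual Windows source, and the final "." fallback is an assumption for illustration.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Sketch of the two lookup conventions: GetTempFileName-style code
       consults TMP before TEMP, while programs like DISKCOPY and EDIT
       did the reverse.  (Portable approximation; the "." fallback is
       illustrative, not what any particular program actually did.) */
    static const char *first_nonempty(const char *a, const char *b)
    {
        const char *v = getenv(a);
        if (v != NULL && *v != '\0') return v;
        v = getenv(b);
        if (v != NULL && *v != '\0') return v;
        return ".";
    }

    static const char *gettempfilename_style(void)
    {
        return first_nonempty("TMP", "TEMP");   /* TMP wins if set */
    }

    static const char *diskcopy_style(void)
    {
        return first_nonempty("TEMP", "TMP");   /* TEMP wins if set */
    }

    int main(void)
    {
        setenv("TMP",  "/from-tmp",  1);
        setenv("TEMP", "/from-temp", 1);
        printf("GetTempFileName-style: %s\n", gettempfilename_style());
        printf("DISKCOPY-style:        %s\n", diskcopy_style());
        return 0;
    }
    ```

    With both variables set to different directories, the two styles pick different locations, which is exactly the disagreement the article describes.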

    The result of all this is that the directory used for temporary files by any particular program is at the discretion of that program. In practice, Windows programs are likely to use the Get­Temp­File­Name function to create their temporary files, in which case they will prefer TMP.

    When you go to the Environment Variables configuration dialog, you'll still see both variables there, TMP and TEMP, still duking it out for your attention. It's like Adidas versus Puma, geek version.


    Why did the original code for FIND.COM use lop as a label instead of loop?


    A few years ago, I left you with an exercise: Given the code

            mov     dx,st_length            ;length of the string arg.
            dec     dx                      ;adjust for later use
            mov     di, line_buffer
    lop:
            inc     dx
            mov     si,offset st_buffer     ;pointer to beg. of string argument
    
    comp_next_char:
            lodsb
            cmp     al,byte ptr [di]
            jnz     no_match
    
            dec     dx
            jz      a_matchk                ; no chars left: a match!
            call    next_char               ; updates di
            jc      no_match                ; end of line reached
            jmp     comp_next_char          ; loop if chars left in arg.
    

    why is the loop label called lop instead of loop?

    The answer is that calling it loop would create ambiguity with the 8086 instruction loop.

    Now, you might say (if your name is Worf) that there is no ambiguity. "Every line consists of up to four things (all optional): a label, an instruction/pseudo-instruction, operands, and comments. The label is optionally followed by a colon. If there is no label, then the line must start with whitespace."

    If those were the rules, then there would indeed be no ambiguity.

    But those aren't the rules. Leading whitespace is not mandatory. If you are so inclined, you can choose to begin your instructions all in column zero.

    mov dx,st_length
    dec dx
    mov di, line_buffer
    lop:
    inc dx
    mov si,offset st_buffer
    comp_next_char:
    lodsb
    cmp al,byte ptr [di]
    jnz no_match
    dec dx
    jz a_matchk
    call next_char
    jc no_match
    jmp comp_next_char
    

    It's not recommended, but it's legal. (I have been known to do this when hard-coding breakpoints for debugging purposes. That way, a search for /^int 3/ will find all of my breakpoints.)

    Since you can put the opcode in column zero, a line like this would be ambiguous:

    loop ret
    

    This could be parsed as "Label this line loop and execute a ret instruction." Or it could be parsed as "This is an unlabeled line, consisting of a loop instruction that jumps to the label ret."

    Label   Opcode   Operand
    loop    ret
    – or –
            loop     ret
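    One way out of the tie, sketched below as a toy classifier (not a real assembler), is to reserve opcode names so that the first token on a line can always be classified unambiguously:

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Toy illustration (not a real assembler) of the ambiguity: with
       optional leading whitespace and optional colons, a line like
       "loop ret" parses two ways unless opcode names are reserved. */
    static const char *opcodes[] = { "mov", "loop", "ret", "jmp", "call" };

    static int is_opcode(const char *tok)
    {
        for (size_t i = 0; i < sizeof(opcodes) / sizeof(opcodes[0]); i++)
            if (strcmp(tok, opcodes[i]) == 0)
                return 1;
        return 0;
    }

    /* Tie-breaking rule: a first token that names an opcode is always
       the instruction, never a label.  Hence FIND.COM's "lop". */
    static const char *classify_first_token(const char *tok)
    {
        return is_opcode(tok) ? "instruction" : "label";
    }

    int main(void)
    {
        printf("loop -> %s\n", classify_first_token("loop"));
        printf("lop  -> %s\n", classify_first_token("lop"));
        return 0;
    }
    ```

    Under that rule, `loop` can never be a label, so the author of FIND.COM had to misspell it.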

    Disallowing instruction names as labels or macros or equates is the simplest way out of this predicament. Besides, you probably shouldn't be doing it anyway. Imagine the havoc if you did

    or equ and
    

    When I set the "force X off" policy to Disabled, why doesn't it force X on?


    A customer was using one of the many "force X off" policies, but instead of using it to force X off, they were trying to use it to force X on by setting the policy to Disabled. For example, there is a "Hide and disable all items on the desktop" policy. The customer was setting this policy to Disabled, expecting it to force all icons visible on the desktop, removing the option on the desktop View menu to hide the icons.

    As we discussed some time ago, group policy is for modifying default behavior, and interpreting policies requires you to have a degree in philosophy.

    In particular, a policy which forces X off has three states:

    • Enabled: X is forced off.
    • Disabled: X is not forced off.
    • Not configured: No opinion. Let another group policy object decide.

    Disabling a policy means "Return to default behavior", and the default behavior in many cases is that the user can decide whether they want X or not by selecting the appropriate option. In philosophical terms, "Not forced off" is not the same as "Forced on."
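    In code form, the three states might be modeled like this (illustrative names, not actual policy API identifiers):

    ```c
    #include <stdio.h>

    /* Model of the three-state "force X off" policy described above. */
    enum policy_state { NOT_CONFIGURED, ENABLED, DISABLED };

    /* Effective result, given the policy and the user's own preference. */
    static int x_is_on(enum policy_state force_x_off, int user_wants_x)
    {
        if (force_x_off == ENABLED)
            return 0;               /* X is forced off */
        /* DISABLED and NOT_CONFIGURED both fall back to the default
           behavior: the user decides.  "Not forced off" != "forced on". */
        return user_wants_x;
    }

    int main(void)
    {
        printf("%d\n", x_is_on(ENABLED, 1));    /* 0: forced off */
        printf("%d\n", x_is_on(DISABLED, 0));   /* 0: not forced on! */
        printf("%d\n", x_is_on(DISABLED, 1));   /* 1: user decides */
        return 0;
    }
    ```

    The middle case is the customer's surprise: Disabled with a user who has hidden their icons still leaves the icons hidden.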

    If you want to force X on, then you have to look for a policy that says "Force X on." (And if there isn't one, then forcing X on is not something currently supported by group policy.)


    In the Seattle area, people treat you nicer once they learn your last name is Gates


    One of the team members during the days of Windows 95 happens to have the last name Gates, no relation.

    She says that it's surprising how nice salespeople are to you once they learn what your last name is.

    Go figure.


    After I encrypt memory with CryptProtectMemory, can I move it around?


    A customer had a question about the Crypt­Protect­Memory function. After using it to encrypt a memory block, can the memory block be moved to another location and decrypted there? Or does the memory block have to be decrypted at the same location where it was encrypted?

    The answer is that the memory does not need to be decrypted at the same memory address at which it was encrypted. The address of the memory block is not used as part of the encryption key. You can copy or move the memory around, and as long as you don't tamper with the bytes, and you perform the decryption within the scope you specified, then it will decrypt.

    That the buffer can be moved around in memory is obvious if the scope was specified as CRYPT­PROTECT­MEMORY_CROSS_PROCESS or CRYPT­PROTECT­MEMORY_SAME_LOGON, because those scopes encompass more than one process, so the memory will naturally have a different address in each process. The non-obvious part is that it also holds true for CRYPT­PROTECT­MEMORY_SAME_PROCESS.

    You can also decrypt the buffer multiple times. This is handy if you need to use the decrypted contents more than once, or if you want to hand out the encrypted contents to multiple clients, and leave each client to delay decrypting the data until immediately before they need it. (And then either re-encrypting or simply destroying the data after it is no longer needed in plaintext form.)

    Today's Little Program demonstrates the ability to move encrypted data and to decrypt it more than once.

    #include <windows.h>
    #include <wincrypt.h>
    #include <stdio.h> // horrors! mixing C and C++!

    union MessageBuffer
    {
        DWORD secret;
        char buffer[CRYPTPROTECTMEMORY_BLOCK_SIZE];
    };
    static_assert(sizeof(DWORD) <= CRYPTPROTECTMEMORY_BLOCK_SIZE,
                  "Need a bigger buffer");

    int __cdecl main(int, char **)
    {
        MessageBuffer message;

        // Generate a secret message into the buffer.
        message.secret = GetTickCount();
        printf("Shhh... the secret message is %u\n", message.secret);

        // Now encrypt the buffer.
        CryptProtectMemory(message.buffer, sizeof(message.buffer),
                           CRYPTPROTECTMEMORY_SAME_PROCESS);
        printf("You can't see it now: %u\n", message.secret);

        // Copy the buffer to a new location in memory.
        MessageBuffer copiedMessage;
        CopyMemory(copiedMessage.buffer, message.buffer,
                   sizeof(copiedMessage.buffer));

        // Decrypt the copy (at a different address).
        CryptUnprotectMemory(copiedMessage.buffer,
                             sizeof(copiedMessage.buffer),
                             CRYPTPROTECTMEMORY_SAME_PROCESS);
        printf("Was the secret message %u?\n", copiedMessage.secret);
        SecureZeroMemory(copiedMessage.buffer, sizeof(copiedMessage.buffer));

        // Do it again!
        CopyMemory(copiedMessage.buffer, message.buffer,
                   sizeof(copiedMessage.buffer));

        // Just to show that the original buffer is not needed,
        // let's destroy it.
        SecureZeroMemory(message.buffer, sizeof(message.buffer));

        // Decrypt the copy a second time.
        CryptUnprotectMemory(copiedMessage.buffer,
                             sizeof(copiedMessage.buffer),
                             CRYPTPROTECTMEMORY_SAME_PROCESS);
        printf("Was the secret message %u?\n", copiedMessage.secret);
        SecureZeroMemory(copiedMessage.buffer, sizeof(copiedMessage.buffer));
        return 0;
    }

    Bonus chatter: The enumeration values for the encryption scope are rather confusingly named and numbered. I would have called them

    Old name                               Old value   New name                               New value
    CRYPT­PROTECT­MEMORY_SAME_PROCESS        0           CRYPT­PROTECT­MEMORY_SAME_PROCESS        0
    CRYPT­PROTECT­MEMORY_SAME_LOGON          2           CRYPT­PROTECT­MEMORY_SAME_LOGON          1
    CRYPT­PROTECT­MEMORY_CROSS_PROCESS       1           CRYPT­PROTECT­MEMORY_SAME_MACHINE        2

    I would have changed the name of the last flag to CRYPT­PROTECT­MEMORY_SAME_MACHINE for two reasons. First, the old name CRYPT­PROTECT­MEMORY_CROSS_PROCESS implies that the memory must travel to another process; i.e., that if you encrypt with cross-process, then it must be decrypted by another process. Second, the flag name creates confusion when placed next to CRYPT­PROTECT­MEMORY_SAME_LOGON, because CRYPT­PROTECT­MEMORY_SAME_LOGON is also a cross-process scenario.

    And I would have renumbered the values so that the entries are in a logical order: higher values have larger scope than lower values.

    Exercise: Propose a theory as to why the old names and values are the way they are.


    On the ways of creating a GUID that looks pretty


    A customer had what at first appeared to be a question born of curiosity.

    Is it possible that a GUID can be generated with all ASCII characters for bytes? In other words, is it possible that a GUID can be generated, and then if you interpret each of the 16 bytes as an ASCII character, the result is a printable string? Let's say for the sake of argument that the printable ASCII characters are U+0020 through U+007E.

    Now, one might start studying the various GUID specifications to see whether such a GUID is legal. For example, types 1 and 2 are based on a timestamp and MAC address. An all-ASCII MAC address is legal. The locally-administered bit has value 0x02, and once you set that bit, all the other bits can be freely assigned by the network administrator. But then you might notice the variant field, and the requirement that all new GUIDs must set the top bit, so that takes you out of the printable ASCII region, so bzzzzt, no all-ASCII GUID for you.
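    That argument can be checked mechanically. This is a sketch: the bytes are treated in raw dump order, and the sample string is made up for illustration.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Can all 16 bytes of a GUID land in printable ASCII (0x20..0x7E)?
       The variant field forces the top bit of one byte on, so a
       conforming generator can never produce such a GUID. */
    static int all_printable_ascii(const unsigned char g[16])
    {
        for (int i = 0; i < 16; i++)
            if (g[i] < 0x20 || g[i] > 0x7E)
                return 0;
        return 1;
    }

    int main(void)
    {
        unsigned char g[16];
        memcpy(g, "ALL-ASCII-GUID!!", 16);      /* hand-picked, not generated */
        printf("%d\n", all_printable_ascii(g)); /* 1 */

        g[8] |= 0x80;   /* set the variant bit a conforming GUID must have */
        printf("%d\n", all_printable_ascii(g)); /* 0 */
        return 0;
    }
    ```

    Setting the mandatory top bit pushes that byte to 0x80 or above, which is outside the printable range.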

    But we've fallen into the trap of answering the question instead of solving the problem.

    What is the problem that you're trying to solve, where you are wondering about all-ASCII GUIDs?

    We want to create some sentinel values in our database, and we figured we could use some all-ASCII GUIDs for convenience.

    If you want a sentinel value that is guaranteed to be unique, why not create a GUID?

    C:\> uuidgen
    GUID_SpecialSentinel = {that GUID}
    

    Now you are guaranteed that the value is unique and will never collide with any other valid GUID.

    We could do that, but we figured it'd be handy if those sentinel values spelled out something so they'd be easier to spot in a dump file. If we know that all-ASCII GUIDs are not valid, then we can use all-ASCII GUIDs for our sentinel values.

    Now, while uuidgen does produce valid GUIDs, it's also the case that those valid GUIDs aren't particularly easy to remember, nor do they exactly trip off the tongue. After all, the space of values that are easy to pronounce and remember is much, much smaller than 2¹²⁸. It's probably more on the order of 2²⁰, which is not enough bits to ensure global uniqueness. Heck, it's not even enough bits to describe all the pixels on your screen!

    So w00t! Since all-ASCII GUIDs are not generatable under the current specification for GUIDs, I can go ahead and name my GUID {6d796152-6e6f-4364-6865-6e526f636b73} which shows up in a memory dump as

    52 61 79 6d 6f 6e 64 43-68 65 6e 52 6f 63 6b 73  RaymondChenRocks
    

    I am so awesome!

    But even if you convince yourself that no current GUID generation algorithm could create a GUID that collides with your special easy-to-remember and quick-to-pronounce sentinel GUIDs, there is no guarantee that you will make a particularly unique choice of sentinel value.

    This is also known as "What if two people did this?"

    There are many people named Raymond Chen in the world. Heck, there are many people named Raymond Chen at Microsoft. (We get each other's mail sometimes.) What if somebody else named Raymond Chen gets this same clever idea and creates their own sentinel value called RaymondChenRocks? Everything works great until my database starts interoperating with the other Raymond Chen's database, and now we have a GUID collision.

    Now, the most common way to create a duplicate GUID is to duplicate it. But here, we created a duplicate GUID because the thing we created was not generated via a duplicate-avoidance algorithm. If the algorithm wasn't designed to avoid duplicates, then it's not too surprising that there may be duplicates. I just pulled this GUID out of my butt. (Mind you, my butt rocks.)

    Okay, so let's go back to the original problem so we can solve it.

    The most straightforward solution is simply to create a standard GUID each time you need a new sentinel value. "Oh, I need a GUID to represent an item which has been discontinued. Let me run uuidgen and hey look, there's a new GUID. I will call it GUID_Discontinued." This solves the uniqueness problem, and it is very simple to explain and prove correct. This is what people end up doing the vast majority of the time, and it's what I recommend.

    Okay, you want to have the property that these special GUIDs can be easily spotted in crash dumps. One way to do this is to extract the MAC address from a network card, then destroy the card. You can now use the 60 bits of the timestamp fields to encode your ASCII message.

    A related problem is that you want to generate a GUID based on some other identifying information, with the properties that

    • Two items with the same identifying information should have the same GUID.
    • Two items with different identifying information should have different GUIDs.
    • None of these GUIDs should collide with GUIDs generated by any other means.

    For that, you can use a name-based GUID generation algorithm.
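    The shape of such an algorithm can be sketched as follows. This toy version substitutes FNV-1a for the MD5/SHA-1 hashing that RFC 4122 versions 3 and 5 actually specify, so it is not a conforming implementation; it only illustrates the three properties listed above. The namespace and name strings are made up.

    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* FNV-1a over a string, starting from the given state. */
    static uint64_t fnv1a64(const char *s, uint64_t h)
    {
        while (*s) {
            h ^= (unsigned char)*s++;
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Toy name-based GUID: hash namespace+name deterministically into
       16 bytes, then stamp in version/variant marker bits the way the
       real RFC 4122 algorithms do.  NOT conforming; illustration only. */
    static void toy_name_based_guid(const char *ns, const char *name,
                                    unsigned char out[16])
    {
        uint64_t h1 = fnv1a64(name, fnv1a64(ns, 14695981039346656037ULL));
        uint64_t h2 = fnv1a64(name, fnv1a64(ns, h1 ^ 0x9E3779B97F4A7C15ULL));
        memcpy(out, &h1, 8);
        memcpy(out + 8, &h2, 8);
        out[6] = (out[6] & 0x0F) | 0x50;  /* version nibble (5 = name-based) */
        out[8] = (out[8] & 0x3F) | 0x80;  /* mandatory variant bits */
    }

    int main(void)
    {
        unsigned char a[16], b[16], c[16];
        toy_name_based_guid("example.com", "discontinued", a);
        toy_name_based_guid("example.com", "discontinued", b);
        toy_name_based_guid("example.com", "backordered", c);
        printf("same name, same GUID: %d\n", memcmp(a, b, 16) == 0);
        printf("diff name, diff GUID: %d\n", memcmp(a, c, 16) != 0);
        return 0;
    }
    ```

    Same inputs always give the same GUID, different inputs give different ones, and the stamped variant bits keep the result out of the space of GUIDs generated by other means.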


    If more than one object causes a WaitForMultipleObjects to return, how do I find out about the other ones?


    There is often confusion about the case where you call Wait­For­Multiple­Objects (or its variants), passing bWaitAll = FALSE, and more than one object satisfies the wait.

    When bWaitAll is FALSE, this function checks the handles in the array in order starting with index 0, until one of the objects is signaled. If multiple objects become signaled, the function returns the index of the first handle in the array whose object was signaled.

    The function modifies the state of some types of synchronization objects. Modification occurs only for the object or objects whose signaled state caused the function to return.

    The worry here is that you pass bWaitAll = FALSE, and two objects satisfy the wait, and they are both (say) auto-reset events.

    The first paragraph says that the return value tells you about the event with the lowest index. The second paragraph says that the function modifies the state of the "object or objects" whose signaled state caused the function to return.

    The "or objects" part is what scares people. If two objects caused the function to return, and I am told only about one of them, how do I learn about the other one?

    The answer is, "Don't worry; it can't happen."

    If you pass bWaitAll = FALSE, then only one object can cause the function to return. If two objects become signaled, then the function declares that the lowest-index one is the one that caused the function to return; the higher-index ones are considered not to have caused the function to return.

    In the case of the two auto-reset events: If both events are set, one of them will be chosen as the one that satisfies the wait (the lower-index one), and it will be reset. The higher-index one remains unchanged.
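    The selection rule can be modeled in a few lines. This is an illustration of the documented behavior, not actual kernel code.

    ```c
    #include <stdio.h>

    /* Model of the bWaitAll = FALSE rule: scan from index 0 and let
       only the first signaled object satisfy the wait.  Only that
       auto-reset event is reset; later ones are untouched. */
    static int wait_any(int signaled[], int n)
    {
        for (int i = 0; i < n; i++) {
            if (signaled[i]) {
                signaled[i] = 0;    /* auto-reset the satisfying object */
                return i;           /* lowest index wins */
            }
        }
        return -1;                  /* nothing signaled: timeout analogue */
    }

    int main(void)
    {
        int events[3] = { 0, 1, 1 };    /* events 1 and 2 are both set */
        int which = wait_any(events, 3);
        printf("satisfied by index %d\n", which);       /* 1 */
        printf("event 2 still set: %d\n", events[2]);   /* 1: untouched */
        return 0;
    }
    ```

    A second wait would then be satisfied by event 2, so no signal is ever lost; at most one object is consumed per call.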

    The confusion stems from the phrase "object or objects", causing people to worry about the case where bWaitAll = FALSE and there are multiple objects which cause the function to return.

    The missing detail is that when you pass bWaitAll = FALSE, then at most one object can cause the function to return. ("At most", because if the operation times out, then no object caused the function to return.)

    The presence of the phrase "or objects" is to cover the case where you pass bWaitAll = TRUE.


    Why is the System.DateCreated property off by a few seconds?


    If you ask for the System.Date­Created property of a shell item, the timestamp that comes back is up to two seconds different from the file's actual timestamp. (Similarly for System.Date­Modified and System.Date­Accessed.) Why is that?

    This is an artifact of a decision taken in 1993.

    In general, shell namespace providers cache information in the ID list at the time the ID list is created so that querying basic properties from an item can be done without accessing the underlying medium.

    In 1993, saving 4KB of memory had a measurable impact on system performance. Therefore, bytes were scrimped and saved, and one place where four whole bytes were squeezed out was in the encoding of file timestamps in ID lists. Instead of using the 8-byte FILE­TIME structure, the shell used the 4-byte DOS date-time format. Since the shell created thousands of ID lists, a four-byte savings multiplied over thousands of items comes out to several kilobytes of data.

    But one of the limitations of the DOS date-time format is that it records time in two-second increments, so any timestamp recorded in DOS date-time format can be up to two seconds away from its actual value. (The value is always truncated rather than rounded in order to avoid problems with timestamps from the future.) Since Windows 95 used FAT as its native file system, and FAT uses the DOS date-time format, this truncation never created any problems in practice, since all the file timestamps were already pre-truncated to 2-second intervals.
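    The granularity is easy to demonstrate. The bit layout below follows the standard DOS date-time encoding; the sample date is arbitrary.

    ```c
    #include <stdio.h>
    #include <stdint.h>

    /* The 4-byte DOS date-time format: 7 bits of year (since 1980),
       4 of month, 5 of day; 5 bits of hour, 6 of minute, and only
       5 bits for seconds, stored in 2-second units, so odd seconds
       are truncated (never rounded). */
    static uint32_t dos_datetime(int year, int month, int day,
                                 int hour, int minute, int second)
    {
        uint16_t date = (uint16_t)(((year - 1980) << 9) | (month << 5) | day);
        uint16_t time = (uint16_t)((hour << 11) | (minute << 5) | (second / 2));
        return ((uint32_t)date << 16) | time;
    }

    static int dos_seconds(uint32_t dt)
    {
        return (int)(dt & 0x1F) * 2;    /* low 5 bits hold seconds / 2 */
    }

    int main(void)
    {
        /* A file stamped at :37 seconds reads back as :36 -- up to two
           seconds earlier than the timestamp that went in. */
        uint32_t dt = dos_datetime(2015, 3, 27, 13, 5, 37);
        printf("stored seconds: %d\n", dos_seconds(dt));    /* 36 */
        return 0;
    }
    ```

    Packing the date into one 16-bit word and the time into another is exactly where the four-byte savings over FILE­TIME comes from.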

    Of course, Windows NT uses NTFS as the native file system, and NTFS records file times to 100-nanosecond precision. (Though the accuracy is significantly less.) But too late. The ID list format had already been decided, and since ID lists can be saved to a file and transported to another computer (e.g., in the form of a shortcut file), the binary format cannot change. Hooray for compatibility.

    Bonus chatter: In theory, the ID list format could be extended in a backward-compatible way, so that every ID list contained two timestamps, a compatible version (2-second precision) and a new version (100-nanosecond precision). So far, there has not been significant demand for more accurate timestamps inside of ID lists.
