• The Old New Thing

    How do I delete bytes from the beginning of a file?


    It's easy to append bytes to the end of a file: Just open it for writing, seek to the end, and start writing. It's also easy to delete bytes from the end of a file: Seek to the point where you want the file to be truncated and call SetEndOfFile. But how do you delete bytes from the beginning of a file?

    You can't, but you sort of can, even though you can't.

    The underlying abstract model for storage of file contents is in the form of a chunk of bytes, each indexed by the file offset. The reason appending bytes and truncating bytes is so easy is that doing so doesn't alter the file offsets of any other bytes in the file. If a file has ten bytes and you append one more, the offsets of the first ten bytes stay the same. On the other hand, deleting bytes from the front or middle of a file means that all the bytes that came after the deleted bytes need to "slide down" to close up the space. And there is no "slide down" file system function.

    One reason for the absence of a "slide down" function is that disk storage is typically not byte-granular. Storage on disk is done in units known as sectors, a typical sector size being 512 bytes. And the storage for a file is allocated in units of sectors, which we'll call storage chunks for lack of a better term. For example, a 5000-byte file occupies ten sectors of storage. The first 512 bytes go in sector 0, the next 512 bytes go in sector 1, and so on, until the last 392 bytes go into sector 9, with the last 120 bytes of sector 9 lying unused. (There are exceptions to this general principle, but they are not important to the discussion, so there's no point bringing them up.)

    To append ten bytes to this file, the file system can just store them after the last byte of the existing contents, leaving 110 bytes of unused space instead of 120. Similarly, to truncate those ten bytes back off, the logical file size can be set back to 5000, and the extra ten bytes are "forgotten."

    In theory, a file system could support truncating an integral number of storage chunks off the front of the file by updating its internal bookkeeping about file contents without having to move data physically around the disk. But in practice, no popular file system implements this, because, as it turns out, the demand for the feature isn't high enough to warrant the extra complexity. (Remember: Minus 100 points.)

    But what's this "you sort of can" tease? Answer: Sparse files.

    You can use an NTFS sparse file to decommit the storage for the data at the start of the file, effectively "deleting" it. What you've really done is set the bytes to logical zeroes, and if there are any whole storage chunks in that range, they can be decommitted and don't occupy any physical space on the drive. (If somebody tries to read from decommitted storage chunks, they just get zeroes.)

    For example, consider a 1MB file on a disk that uses 64KB storage chunks. If you decide to decommit the first 96KB of the file, the first storage chunk of the file will be returned to the drive's free space, and the first 32KB of the second storage chunk will be set to zero. You've effectively "deleted" the first 96KB of data off the front of the file, but the file offsets haven't changed. The byte at offset 98,304 is still at offset 98,304 and did not move to offset zero.

    Now, a minor addition to the file system would get you that magic "deletion from the front of the file": Associated with each file would be a 64-bit value representing the logical byte zero of the file. For example, after you decommitted the first 96KB of the file above, the logical byte zero would be 98,304, and all file offset calculations on the file would be biased by 98,304 to convert from logical offsets to physical offsets. For example, when you asked to see byte 10, you would actually get byte 98,314.

    So why not just do this? The minus 100 points rule applies. There are a lot of details that need to be worked out.

    For example, suppose somebody has opened the file and seeked to file position 102,400. Next, you attempt to delete 98,304 bytes from the front of the file. What happens to that other file pointer? One possibility is that the file pointer offset stays at 102,400, and now it points to the byte that used to be at offset 200,704. This can result in quite a bit of confusion, especially if that file handle was being written to: The program writing to the handle issued two consecutive write operations, and the results ended up 96KB apart! You can imagine the exciting data corruption scenarios that would result from this.

    Okay, well, another possibility is that the file pointer offset moves by the number of bytes you deleted from the front of the file, so the file handle that was at 102,400 now shifts to file position 4096. That preserves the consecutive read and consecutive write patterns, but it completely messes up another popular pattern:

    long oldPos = ftell(fp);         // remember where we are (ftell returns long)
    fseek(fp, newPos, SEEK_SET);     // go somewhere else
    ... do stuff ...
    fseek(fp, oldPos, SEEK_SET);     // restore original position

    If bytes are deleted from the front of the file during the do stuff portion of the code, the attempt to restore the original position will restore the wrong original position since it didn't take the deletion into account.

    And this discussion still completely ignores the issue of file locking. If a region of the file has been locked, what happens when you delete bytes from the front of the file?

    If you really like this "simulate deleting from the front of the file by decommitting bytes from the front and applying an offset to future file operations" technique, you can do it yourself. Just keep track of the magic offset and apply it to all your file operations. And I suspect the fact that you can simulate the operation yourself is a major reason why the feature doesn't exist: Time and effort is better spent adding features that applications couldn't simulate on their own.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    Microspeak: Take-away


    At Microsoft, the take-away is the essential message of a presentation or the conclusion that you are expected to draw from a situation. It is something you are expected to remember when the whole thing is over, a piece of information you take away with you as you leave the room.

    XYZ demo take away (title of a document)

    The preferred intensifier is key, and you probably see it attached to the phrase take-away more often than not. This example comes from a presentation on the results of a user study:

    Results: XYZ Tough to Use
    • ...
    • Key take-away:
      • Migration to XYZ will be difficult
      • Need to show value of using the power of DEF

    In fact, every single slide in this presentation had a bullet point at the bottom called Key take-away. (And, as you may have noticed, the heading is the singular take-away even though multiple take-aways were listed.)

    Another use of the term take-away follows in the same spirit as the "essential message" usage, but the idea of "taking away" is taken literally: A take-away is a small information card that sales and marketing people give to potential customers. Think of it as the business card for a service rather than for a person.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    What were Get/SetMessageExtraInfo ever used for?


    KJK::Hyperion asks, "Could you shed some light on Get/SetMessageExtraInfo? It's almost like nobody on earth used them, ever, and I can't get some sample code."

    Yup, that's about right. Nobody on earth (to within experimental error) ever used them.

    These functions were introduced on July 20, 1990 (I'm looking at the change history right now) at the request of what was then called the Hand-Writing Windows group, which shipped the first version of Windows for Pen Computing in 1992. The idea was that each input event from the custom pen hardware would have this extra information associated with it, and the software that converted pen input into strokes (and ultimately into gestures or characters via handwriting recognition) would use this extra information to guide the conversion process.

    Seeing as Pen Windows died a hasty death, I think it's fairly accurate to say that nobody on earth will admit to having used these functions.

    For those of you fortunate enough never to have been exposed to Pen Windows, here are some random tidbits of information.

    First, applications needed to be modified to support pen input. In particular, edit controls did not accept text input from the pen. You had to replace them with one of the following:

    • Handwriting edit control (hedit). This accepted free form handwriting input.
    • Boxed edit control (bedit). This accepted handwriting input, but you had to write one letter per box. This constraint resulted in much better character recognition.

    Both of these controls were significantly larger than the standard edit control. They needed to be, in order to give enough room for the user to write. This in turn meant that you had to edit all your dialog templates and custom window layout to take into account the larger pen-aware controls.

    And just changing your controls wasn't enough. You also had to write extra code to call various character recognition functions to get the user's pen input converted and recognized.

    Here's an artist's conception of what the boxed edit control looked like:

    D o g

    That weird triangle-shaped thingie was, I believe, called the dinky. What did it do? Beats me.

    There are still vestiges of the old Pen Windows product in the GetSystemMetrics function: Check out SM_PENWINDOWS.

    (Note that the old Pen Windows product is unrelated to the current Tablet PC product, even though they both do handwriting recognition.)

    Bonus chatter: The Windows touch team saw their opportunity and commandeered the extra info (perhaps resurrecting the ghost of Pen Windows), using it to specify the source of the input.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    Watching the game of Telephone play out in five seconds


    Some time ago, I spent the Thanksgiving holiday with my nieces (ages 3 and 5 at the time), and I overheard this conversation between them.

    "Thanksgiving is over."

    Christmas is coming!

    "It's Christmas time!"

    Today is Christmas!

  • The Old New Thing

    The easy way out is to just answer the question: What is the current Explorer window looking at?


    A customer had the following question:

    We have an application which copies and pastes files. Our problem is that we want to paste the files into the folder which corresponds to the currently active Windows Explorer window. Right now, we're using SendKeys.SendWait("^(v)"), but we find this method unsatisfactory because we want to replace Explorer's default file copy engine with our own custom one. Can you provide assistance?

    (And commenter wtroost had no clue why somebody would copy a file by sending window messages to Explorer. Well, here you have it.)

    The easy way out is to answer the question: You can enumerate the active Explorer windows and ask each one what folder it is viewing. There's even a script interface for it.

    The hard way out is to understand the customer's problem and see if there's a better solution. The question as phrased suggests that the customer hasn't thought through the entire problem. What if the current window is not an Explorer window, or if it's a window on a virtual folder instead of a file system folder (for example, an FTP site)? Simulating keyboard input (in this case, fake-pressing Ctrl+V) is rarely a good solution to a problem; after all, what if the hotkey for Paste changes based on the user's preferred language? Or what if the Explorer window happens to be in a state where Ctrl+V doesn't paste files into the current folder? (For example, focus might be on the Address Bar.) And the fact that they put contents onto the clipboard means that they are overwriting the previous contents of the clipboard.

    I asked for a little more information about what their application is trying to do.

    This is a file transfer application for computers which are not directly connected to each other, but which are both connected to a common third computer. From the first computer, you run the file transfer application, select some files from the transfer application's interface, and hit Copy. This transfers the files to the common third computer. Then from the second computer, you run the file transfer application and hit Paste, and the program retrieves the files from the common third computer and places them in the folder that you are currently viewing in Windows Explorer.

    Oh, the whole "get the path to the folder that Windows Explorer is viewing" is just a strange way of telling the program where to copy the files. In other words, they were using Windows Explorer as a very expensive cross-process replacement for the SHBrowseForFolder function.

    The recommendation therefore came in two parts:

    1. Instead of hijacking Explorer as a directory-picker, just call SHBrowseForFolder. You can pass the BIF_RETURNONLYFSDIRS flag, and SHBrowseForFolder will automatically filter out anything that is not a file system folder, thereby saving you the trouble of filtering them out yourself.
    2. If you really want to hijack Explorer as a directory-picker, then add a context menu command to Directory or Directory\Background called Paste from Transfer Shelf (or whatever your application calls that intermediate computer).
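    The standard way to add such a verb is a registry entry under Directory\Background. A minimal sketch, where the verb name, display text, and program path are placeholders for whatever your application actually uses (%V is the placeholder Explorer fills in with the folder whose background was clicked):

```reg
Windows Registry Editor Version 5.00

; Adds "Paste from Transfer Shelf" to the context menu shown when the
; user right-clicks the background of a folder window.
[HKEY_CLASSES_ROOT\Directory\Background\shell\PasteFromTransferShelf]
@="Paste from Transfer Shelf"

[HKEY_CLASSES_ROOT\Directory\Background\shell\PasteFromTransferShelf\command]
@="\"C:\\Program Files\\TransferShelf\\transfer.exe\" /paste \"%V\""
```

    This hands your program the destination folder directly, with no clipboard and no fake keystrokes involved.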
  • The Old New Thing

    What if two programs did this? Practical exam


    A customer who doesn't read this blog had the following question:

    The requirement for our application is that it must be on top of all other windows, even in the presence of other windows with the topmost style. In our WM_KILLFOCUS and WM_PAINT messages, we set ourselves to topmost, and then call SetWindowPos to bring ourselves to the front, and then we SetForegroundWindow ourselves just to make sure.

    The result is that our application and another application end up fighting back and forth because both applications are applying similar logic to remain on top. There are other applications that defeat our logic, and they manage to cover our application. (These other applications are third party applications we have no control over.)

    We are in the process of writing a DLL to hook all messages for all windows and [even crazier things that aren't worth elaborating on], but at the moment this causes the other hooked applications to crash. We are also considering setting a timer and calling SetWindowPos when the timer fires, but that seems inefficient.

    The customer didn't even have to run through the thought experiment What if two programs did this?; they got to run it in real life! And the result was as predicted: The two programs fight with each other to be the one that is on top of the other. Nobody wins; everybody loses, especially the user.

  • The Old New Thing

    You can filter the Common File dialog with wildcards


    A customer reported an apparent inconsistency in the shell common file dialogs:

    The question mark appears to be treated differently from other invalid file name characters. I tried to save a file in Paint under the name ?.jpg but instead of telling me that I had an invalid character in the file name (as it does with other characters like / < and |) or navigating to a new folder (like \ or *), it appears to do nothing.

    Actually, it did do something. You just couldn't tell because the result of that something was the same as doing nothing.

    If you type a wildcard like ? or * into a common file dialog, the dialog interprets this as a request to filter the list of files to those which match the wildcard you specify. In this particular example, typing ?.jpg says "Show me all the single-character files with the .jpg extension." From the description in the original report, I gather that the customer's tests took place in an empty directory (so the filter had no effect).

    The customer responded with muted surprise.

    Hmmmmm.....interesting. I hadn't realized you can type in your own filter in dialog boxes. How long has this feature existed?

    The ability to filter the common file dialog has been around since the dialogs were introduced in Windows 3.1.

  • The Old New Thing

    But who's going to set up their own email server?


    Many many years ago, back in the days when Microsoft's email address had exclamation points, an internal tool was developed to permit Microsoft employees to view and update their Benefits information from the comfort of their very own offices. Welcome to the paperless office!

    One of my friends noticed an odd sentence in the instructions for using the tool: "Before running the program, make sure you are logged onto your email server."

    "That's strange," my friend thought. "Why does it matter that you're logged onto your email server? This tool doesn't use email."

    Since my friend happened at the time to be a tester for Microsoft's email product, he tried a little experiment. He created a brand new email server on one of his test machines and created an account on it called billg. He then signed onto that email server and then ran the tool.

    Welcome, Bill Gates. Here are your current Benefits selections...

    "Uh-oh," my friend thought. "This is a pretty bad security hole." The tool apparently performed authentication by asking your email server, "Hey, who are you logged in as?" The answer that came back was assumed to be an accurate representation of the user who is running the tool. The back-end server itself was not secured at all; it relied on the client application to do the security checks.

    My friend sent email to the vice president of Human Resources informing him of this problem. "You need to shut down this tool immediately. I have found a security hole that allows anybody to see anybody else's Benefits information."

    The response from the vice president of Human Resources was calm and reassuring. "My developers tell me that the tool is secure. Just enjoy the convenience of updating your Benefits information electronically."

    Frustrated by this, my friend decided to create another account on his test email server, namely one corresponding to the vice president of Human Resources. He then sent the vice president another email message.

    "Please reconsider your previous decision. Your base salary is $xxx and your wife's name is Yyyy. Would you like me to remind you one week before your son's tenth birthday? It's coming up next month."

    A reply was quickly received. "We're looking into this."

    Shortly thereafter, the tool was taken offline "for maintenance."

    Bonus reading: JenK shares her experience with the same incident.

  • The Old New Thing

    Consequences of using variables declared __declspec(thread)


    As a prerequisite, I am going to assume that you understand how TLS works, and in particular how __declspec(thread) variables work. There's a quite thorough treatise on the subject by Ken Johnson (better known as Skywing), who comments quite frequently on this site. The series starts here and continues for a total of 8 installments, ending here. That last page also has a table of contents so you can skip over the parts you already know to get to the parts you don't know.

    Now that you've read Ken's articles...

    No, wait I know you didn't read them and you're just skimming past it in the hopes that you will be able to fake your way through the rest of this article without having read the prerequisites. Well, okay, but don't be surprised when I get frustrated if you ask a question that is answered in the prerequisites.

    Anyway, as you learned from Part 5 of Ken's series, the __declspec(thread) model, as originally envisioned, assumed that all DLLs which use the feature would be present at process startup, so that all the _tls_index values can be computed and the total sizes of each module's TLS data can be calculated before any threads get created. (Well, okay, the initial thread already got created, but that's okay; we'll set up that thread's TLS before we execute any application code.)

    If you loaded a __declspec(thread)-dependent module dynamically, bad things happened. For one, TLS data was not set up for any pre-existing threads, since those threads were initialized before your module got loaded. Windows doesn't have a time machine where it can go back in time to when those threads were initialized and pre-reserve space for the TLS variables your new module needed. Nope, your module is just out of luck with respect to those pre-existing threads, and if it tries to use __declspec(thread) variables, it'll find that its TLS slot never got initialized, and there's no data there to access.

    Unfortunately, there's an even worse problem, which Ken quite ably elaborates on in Part 6: The _tls_index variable inside the module arrived after the train left the station. All those TLS indices were assigned at process initialization. When it loads dynamically, the _tls_index variable just sits there, and nobody bothers to initialize it, leaving it at its default value of zero. (Too bad the compiler didn't initialize it to TLS_OUT_OF_INDEXES.) As a result, the module thinks that its TLS variables are at slot zero in the TLS array, leading to what Ken characterizes as "one of the absolute worst possible kinds of problems to debug": Two modules both think they are the rightful owners of the same data, each with a different concept of what that data is supposed to be. It'd be like if there was a bug in HeapAllocate where it returned the same pointer to two separate callers. Each caller would use the memory, cheerfully believing that the values the code writes to the memory will be there when it comes back.

    What truly frightens me is that there's at least one person who considers this horrific data corruption bug a feature. webcyote calls this bug "sharing all variables between the EXE and the DLL" and complained that fixing the bug breaks programs that "depend on the old behavior". That's like saying "We found that if we use this exact pattern of memory allocations, we can trick HeapAllocate into allocating the same memory twice, so we will have our EXE allocate some memory, then perform the magic sequence of allocations, and then load the DLL, and then the DLL will call HeapAllocate to allocate some memory, and it will get the same pointer back, and now the EXE and DLL can share memory."


    Mind you, this crazy "EXE and DLL sharing thread variables" trick is extremely fragile. You have to intentionally delay loading the DLL until after process startup. (If you load it as part of an explicit dependency, then you don't trigger the bug and the DLL gets its own set of variables as intended.) And then you have to make sure that the EXE and DLL declare exactly the same variables in exactly the same order and link the OBJ files in exactly the right sequence, so that all the offsets match. Oh, and you have to make sure your DLL is loaded only into the EXE with which it is in cahoots. If you load it into any other EXE, it will start corrupting that EXE's thread variables. (Or, if the EXE doesn't use thread variables, it'll corrupt some other random DLL's thread variables.)

    If the feature had been intended to be used in this insane way, they would have been called "shared variables" instead of "thread variables". No wait, they would have been called "thread variables that sometimes end up shared under conditions outside your DLL's control."

    I wonder if Webcyote also drives a manual transmission and just slams the gear stick into position without using the clutch. Yes, you can do it if you are really careful and get everything to align just right, but if you mess up, your transmission explodes and spews parts all over the road.

    Don't abuse a bug in the loader. If you want shared variables, then create shared variables. Don't create per-thread variables and then intentionally trigger a bug that causes them to overlay each other by mistake. That's such a crazy idea that it probably never occurred to anyone that somebody would actually build a system that relies on it!

    Exercise: A customer ran into a problem with the "inadvertently sharing variables between the EXE and the DLL" bug. Here is the message from the customer liaison:

    My customer has a DLL that uses static thread local storage (__declspec(thread)), and he wants to use this DLL from his C# program. Unfortunately, he is running into the limitation when running on Windows XP that DLLs which use static thread local storage crash when they try to access their thread variables. The customer cannot modify the DLL. What do you recommend?

    Update: Commenter shf gives the most complete answer.

  • The Old New Thing

    What's the difference between the Windows directory and the System directory?


    (Windows was released on November 20, 1985, twenty-five years ago tomorrow. Happy birthday!)

    You have GetWindowsDirectory and you have GetSystemDirectory. Why do we need both? They're both read-only directories. They are both searched by LoadLibrary. They seem to be redundant. (There are other directories like GetSystemWindowsDirectory which are not relevant to the discussion.)

    Back in the old days, the distinction was important. The Windows directory was read-write, and it's where user configuration settings were kept. See, for example, GetProfileInt, which reads from WIN.INI in the Windows directory, and GetPrivateProfileInt, which defaults to the Windows directory if you don't give a full path to the configuration file. This was in the days before user profiles; the Windows directory acted as your de facto user profile directory.

    Meanwhile, the bulk of the Windows files go into the System directory. Windows was designed so that it never wrote to the System directory (at least not during normal operation, outside of application install/uninstall or other system maintenance).

    This separation of duties (Windows directory: writeable, users store their stuff there; System directory: read-only) permitted a number of configurations.


    Fully local

    Each computer had a Windows directory and a System directory on a local drive (either a floppy disk, or if you were rich, a hard drive), and the System directory was a subdirectory of the Windows directory. This was how most people ran Windows. Even though the System directory was physically read-write on the local drive, Windows itself never wrote to it.


    System directory in ROM

    Each computer had a Windows directory on a local drive, but the System directory was on a ROM-drive. As you might guess, a ROM-drive is like a RAM-drive, except it's in ROM. In the parlance of kids today, "Think of it as a permanently write-protected SSD." That's right, Windows was using SSDs over 25 years ago. ("You kids think you invented everything.") Once you burned the System directory into a ROM-drive, you didn't have to waste valuable floppy disk or hard drive space for all those files that never changed anyway.


    System directory on the network

    Each computer came with just a Windows directory, but it also had network drivers (wow, fancy, a computer that could communicate with other computers), and the AUTOEXEC.BAT file mapped a drive letter to a network share maintained by your company's IT department. That network share might be set up like this:

    M:\SYSTEM      System directory files
    M:\WINWORD     Word for Windows installed here
    M:\123         Lotus 1-2-3 installed here

    All directories on that network share were read-only. Everybody in the company connected to the same share, so every computer in the company was using the same physical files for their System directory as well as their applications. If the IT department wanted to upgrade or install software, they could just kick everybody off the server (or, if they were nice, wait until everybody logged off), mount the M: volume read-write, upgrade or install the software, and then set M: back to read-only. When everybody logged back on, bingo, the new software was installed and ready to use.

    Fully network-based

    The computer boots off a ROM, a floppy disk, the local hard drive, or off the network. A drive letter is mapped to a network server, which contains both the Windows directory (a different one for each user) and the System directory. Windows then runs completely over the network. User files are stored in the Windows directory (on the server); system files are retrieved from the System directory (also on the server). This was commonly known as a diskless workstation because local drives are not used once Windows has booted. Even paging took place over the network.

    Given all the possible arrangements, you can see that there was no requirement that the System directory be a subdirectory of the Windows directory, nor was there a requirement that either the Windows or the System directory be on your boot drive.

    I suspect many (most?) of these configurations are no longer supported by Windows, but at least that's the history behind the separation of the Windows and System directories.
