• The Old New Thing

    Why does my asynchronous I/O complete synchronously?

    • 36 Comments

    A customer was creating a large file and found that, even though the file was opened with FILE_FLAG_OVERLAPPED and the Write­File call was being made with an OVERLAPPED structure, the I/O was nevertheless completing synchronously.

    Knowledge Base article 156932 covers some cases in which asynchronous I/O will be converted to synchronous I/O. And in this case, it was scenario number three in that document.

    The reason the customer's asynchronous writes were completing synchronously is that all of the writes were to the end of the file. It so happens that in the current implementation of NTFS, writes which extend the length of the file always complete synchronously. (More specifically, writes which extend the valid data length are forced synchronous.)

    We saw last time that merely calling Set­End­Of­File to pre-extend the file to the final size doesn't help, because that updates the file size but not the valid data length. To avoid synchronous behavior, you need to make sure your writes do not extend the valid data length. The suggestions provided in yesterday's article apply here as well.
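
    For illustration, here's a minimal sketch (the file name and buffer size are made up) showing how to tell the two outcomes apart: when an overlapped WriteFile completes synchronously it returns TRUE immediately, whereas a genuinely asynchronous write fails with ERROR_IO_PENDING. Writing at the end of a freshly-created file extends the valid data length, so on NTFS you should expect the synchronous path.

    #include <windows.h>
    #include <stdio.h>
    
    int main(void)
    {
     HANDLE hFile = CreateFile(TEXT("test.dat"), GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL);
     if (hFile == INVALID_HANDLE_VALUE) return 1;
    
     BYTE buffer[4096] = { 0 };
     OVERLAPPED o = { 0 };
     o.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    
     // This write extends the valid data length, so NTFS forces it synchronous.
     if (WriteFile(hFile, buffer, sizeof(buffer), NULL, &o)) {
      printf("Write completed synchronously\n");
     } else if (GetLastError() == ERROR_IO_PENDING) {
      DWORD cb;
      GetOverlappedResult(hFile, &o, &cb, TRUE); // wait for completion
      printf("Write completed asynchronously\n");
     }
    
     CloseHandle(o.hEvent);
     CloseHandle(hFile);
     return 0;
    }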

  • The Old New Thing

    Why does my single-byte write take forever?

    • 15 Comments

    A customer found that a single-byte write was taking several seconds, even though the write was to a file on the local hard drive that was fully spun-up. Here's the pseudocode:

    // Create a new file - returns quickly
    hFile = CreateFile(..., CREATE_NEW, ...);
    
    // make the file 1GB
    SetFilePointer(hFile, 1024*1024*1024, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);
    
    // Write 1 byte into the middle of the file
    SetFilePointer(hFile, 512*1024*1024, NULL, FILE_BEGIN);
    BYTE b = 42;
    DWORD nBytesWritten;
    // this write call takes several seconds!
    WriteFile(hFile, &b, 1, &nBytesWritten, NULL);
    

    The customer experimented with using asynchronous I/O, but it didn't help. The write still took a long time. Even using FILE_FLAG_NO_BUFFERING (and writing full sectors, naturally) didn't help.

    The reason is that on NTFS, extending a file reserves disk space but does not zero out the data. Instead, NTFS keeps track of the "last byte written", technically known as the valid data length, and only zeroes out up to that point. The data past the valid data length are logically zero but are not physically zero on disk. When you write to a point past the current valid data length, all the bytes between the valid data length and the start of your write need to be zeroed out before the new valid data length can be set to the end of your write operation. (You can manipulate the valid data length directly with the Set­File­Valid­Data function, but be very careful since it comes with serious security implications.)

    Two solutions were proposed to the customer.

    Option 1 is to force the file to be zeroed out immediately after setting the end of file by writing a zero byte to the end. This front-loads the cost so that it doesn't get imposed on subsequent writes at seemingly random points.

    Option 2 is to make the file sparse. Mark the file as sparse with the FSCTL_SET_SPARSE control code, and immediately after setting the end of file, use the FSCTL_SET_ZERO_DATA control code to make the entire file sparse. This logically fills the file with zeroes without committing physical disk space. Anywhere you actually write gets converted from "sparse" to "real". This does open the possibility that a later write into the middle of the file will encounter a disk-full error, so it's not a "just do this and you won't have to worry about anything" solution, and depending on how randomly you convert the file from "sparse" to "real", the file may end up more fragmented than it would have been if you had "kept it real" the whole time.
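
    For the curious, here's a rough sketch of Option 2, assuming hFile was opened for writing on an NTFS volume and that the file size has already been set with SetEndOfFile. The control codes are real; the wrapper function is just for illustration and requires <winioctl.h>.

    BOOL MakeFileSparseAndZeroed(HANDLE hFile, LONGLONG cbFile)
    {
     DWORD cbReturned;
    
     // Mark the file as sparse.
     if (!DeviceIoControl(hFile, FSCTL_SET_SPARSE,
                          NULL, 0, NULL, 0, &cbReturned, NULL)) {
      return FALSE;
     }
    
     // Declare the entire file to be a zeroed region. The bytes read back
     // as zero, but no physical disk space is committed for them.
     FILE_ZERO_DATA_INFORMATION zdi;
     zdi.FileOffset.QuadPart = 0;
     zdi.BeyondFinalZero.QuadPart = cbFile;
     return DeviceIoControl(hFile, FSCTL_SET_ZERO_DATA,
                            &zdi, sizeof(zdi), NULL, 0, &cbReturned, NULL);
    }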

  • The Old New Thing

    Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?

    • 37 Comments

    If you look at the disassembly of functions inside Windows DLLs, you'll find that they begin with the seemingly pointless instruction MOV EDI, EDI. This instruction copies a register to itself and updates no flags; it is completely meaningless. So why is it there?

    It's a hot-patch point.

    The MOV EDI, EDI instruction is a two-byte NOP, which is just enough space to patch in a jump instruction so that the function can be updated on the fly. The intention is that the MOV EDI, EDI instruction will be replaced with a two-byte JMP $-5 instruction to redirect control to five bytes of patch space that comes immediately before the start of the function. Five bytes is enough for a full jump instruction, which can send control to the replacement function installed somewhere else in the address space.

    Although the five bytes of patch space before the start of the function consist of five one-byte NOP instructions, the function entry point uses a single two-byte NOP.
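
    If you want to see the hot-patch point for yourself, here's a minimal sketch (32-bit only; 64-bit binaries use a different padding scheme, and any particular export may or may not be hot-patchable) that checks whether an exported function begins with the bytes 8B FF, the encoding of MOV EDI, EDI.

    #include <windows.h>
    #include <stdio.h>
    
    int main(void)
    {
     FARPROC pfn = GetProcAddress(GetModuleHandle(TEXT("user32.dll")),
                                  "GetMessageW");
     if (pfn == NULL) return 1;
    
     // 8B FF is the encoding of MOV EDI, EDI.
     const unsigned char *code = (const unsigned char *)pfn;
     if (code[0] == 0x8B && code[1] == 0xFF) {
      printf("GetMessageW begins with MOV EDI, EDI (hot-patchable)\n");
     } else {
      printf("GetMessageW begins with %02X %02X\n", code[0], code[1]);
     }
     return 0;
    }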

    Why not use Detours to hot-patch the function? Then you don't need any patch space at all.

    The problem with Detouring a function during live execution is that you can never be sure that at the moment you are patching in the Detour, another thread isn't in the middle of executing an instruction that overlaps the first five bytes of the function. (And you have to alter the code generation so that no instruction starting at offsets 1 through 4 of the function is ever the target of a jump.) You could work around this by suspending all the threads while you're patching, but that still won't stop somebody from doing a CreateRemoteThread after you thought you had successfully suspended all the threads.

    Why not just use two NOP instructions at the entry point?

    Well, because a NOP instruction consumes one clock cycle and one pipe, so two of them would consume two clock cycles and two pipes. (The instructions will likely be paired, one in each pipe, so the combined execution will take one clock cycle.) On the other hand, the MOV EDI, EDI instruction consumes one clock cycle and one pipe. (In practice, the instruction will occupy one pipe, leaving the other available to execute another instruction in parallel. You might say that the instruction executes in half a cycle.) However you calculate it, the MOV EDI, EDI instruction executes in half the time of two NOP instructions.

    On the other hand, the five NOPs inserted before the start of the function are never executed, so it doesn't matter what you use to pad them. It could've been five garbage bytes for all anybody cares.

    But much more important than cycle-counting is that the use of a two-byte NOP avoids the Detours problem: If the code had used two single-byte NOP instructions, then there is the risk that you will install your patch just as a thread has finished executing the first single-byte NOP and is about to begin executing the second single-byte NOP, resulting in the thread treating the second half of your JMP $-5 as the start of a new instruction.

    There's a lot of patching machinery going on that most people don't even realize. Maybe at some point, I'll get around to writing about how the operating system manages patches for software that isn't installed yet, so that when you do install the software, the patch is already there, thereby closing the vulnerability window between installing the software and downloading the patches.

  • The Old New Thing

    Random notes from //build/ 2011

    • 11 Comments

    Here are some random notes from //build/ 2011, information of no consequence whatsoever.

    • A game we played while walking to and from the convention center was spot the geek. "Hey, there's a guy walking down the street. He's wearing a collared shirt and khakis, with a black bag over his shoulder, staring into his phone. I call geek."
    • One of the stores on Harbor Boulevard has the direct-and-to-the-point name Brand Name Mart, or as it was known at night (due to burnt-out lights) Bra d N    Mart.
    • In the room where the prototype devices were being handed out to attendees, the boxes were stacked in groups. Each group consisted of 512 devices. Why 512? Because the boxes were stacked 8 across, 8 high, and 8 wide. Somebody was being way too cute.
    • Nearly all the machines were handed out in the first 55 minutes of availability. During that time, they were distributed at a rate of one machine per second. Kudos to the event staff for managing the enormous onslaught! Also, kudos to my colleagues who flew down a week early for the thankless task of preparing 5,000 computers to be handed out!
    • In the way, way back of the Expo room were a bunch of makeshift private meeting rooms for the various vendors. As you can see from the picture, it was a depressing hallway of what looked like sensory deprivation chambers or interrogation rooms from 1984. All that was missing was the screaming. Upon seeing the photo, one of my friends remarked, "Mental institutions look more cheerful than this," and she should know: She's a professional nurse.
    • The setup at our demo booth consisted of a table with a touch monitor, with the image duplicated onto a wall-mounted display for better visibility. More than once, somebody would walk up to the wall-mounted display and try touching it. The game I played was to surreptitiously manipulate the touch monitor to match what the person was doing on the wall-mounted display, and see how long it took before they figured out that somebody was messing with them. (It didn't take long.)
    • Two of my colleagues played an even more elaborate trick. One of them stood about ten feet from the wall-mounted display and waved his arms as if he were using a Kinect. The other matched his colleague's actions on the touch monitor. So if you see a media report about seeing a Kinect-enabled Windows 8 machine at the //build/ conference, you'll know that they were pranked.
    • John Sheehan stopped by our booth, and around his neck were so many access passes he could've played solitaire. Security was tight, as you might expect, and any time he needed to go backstage, the security guard would ask to see his pass. "I'd just hold up all of them, saying 'Go ahead, pick one. Whatever pass you're looking for, it's in here somewhere.'"
    • One of my colleagues stopped by our booth, and I made some remark about the backstage passes around her neck. She replied, "You so don't want a backstage pass. Because if you have one, it means that you will be working on three hours' sleep for days on end."
    • Instead of "Hello, world," I think Aleš should have acknowledged that the programming landscape has changed, and the standard first application to write for a new platform is now a fart app. Wouldn't that have been an awesome app to have written on stage at the keynote?
    • You may have noticed that everybody was wearing a white or green T-shirt under their //build/ conference uniform. When we arrived, each staff member was issued two uniform shirts, plus four undershirts. And for people who didn't understand what that meant, there were instructions to wear a different undershirt each day. (The engineer would optimize the solution to two uniform shirts and only two undershirts, with instructions to wear undershirts the first two days and skip them on the last two days.)
    • Ever since PDC sessions started being put online, attending sessions has tended to take a back seat to business networking as a primary goal for coming to the conference, since you can always catch up on sessions later. As a result, the Expo floor tended to remain busy even when breakout sessions were taking place. Also, the last day of the conference tended to be a bit dead, with a lot of people leaving early, and the remaining people just taking it easy. But this year was different: People actually went to the breakout sessions! And despite being held on the final day of the conference, Matt Merry's session was not only well-attended, it overflowed.

  • The Old New Thing

    Microspeak: The bug farm

    • 9 Comments

    In its most general sense, the term bug farm refers to something that is a rich source of bugs.

    It is typically applied to code which is nearly unmaintainable. Code can arrive in this state through a variety of means.

    • Poor initial design.
    • An initial design that has been pushed far beyond its original specification (resulting in features built on top of other features in weird ways).
    • Overwhelming compatibility constraints such that the tiniest perturbation is highly likely to cause some application somewhere to stop working.
    • Responsibility for the code residing in people whom we shall euphemistically describe as "failing to meet your personal standards of code quality."

    The term is most often used as a cautionary term, calling attention to areas where there is high risk that code you're about to write is going to result in a bug farm.

    Aren't we setting ourselves up for a bug farm?
    This could easily lead to a bug farm from the different lifetimes of these various state objects.

    The term is quite popular at Microsoft (pre-emptive snarky comment: because Microsoft software is all one giant bug farm). Here are some citations just from blogs.msdn.com:

    Layout runs under disable processing. The reason we did that is because, well, reentrant layout is a bug farm.
    A lot of testers suddenly realized that case sensitivity is a veritable bug farm on a project that thinks it is ready to go, but has not yet tried it.
    That type of implicit vs. explicit inference also turned out to be a bug farm.
    Did you forget to handle an entire set of test cases? Is the feature's implementation overly complex and going to be a bug farm?
  • The Old New Thing

    The clipboard viewer linked list is no longer the responsibility of applications to maintain, unless they want to

    • 20 Comments

    Commenter Nice Clipboard Manager (with drop->clipboard) wonders why Windows still uses a linked list to inform programs about clipboard modifications. If any clipboard viewer fails to maintain the chain, then some windows won't get informed of the change, and if a clipboard viewer creates a loop in the chain, an infinite loop results.

    Well, sure, that's what happens if you use the old clipboard viewer chain. So don't use it. The old clipboard viewer chain remains for backward compatibility, but it's hardly the best way to monitor the clipboard. (This is another example of people asking for a feature that already exists.)

    Instead of using the clipboard viewer chain, just add yourself as a clipboard format listener via AddClipboardFormatListener. Once you've done that, the system will post you a WM_CLIPBOARDUPDATE message when the contents of the clipboard have changed, and you can respond accordingly. When you're done, call RemoveClipboardFormatListener.

    By using the clipboard format listener model, you let Windows worry about keeping track of all the people who are monitoring the clipboard, as Clipboarder Gadget suggested. (Mind you, Windows doesn't go so far as making each clipboard viewer think that it's the only viewer in the chain, because there may be applications which break the chain on purpose. Changing the chain behavior will break compatibility with those applications.)

    Let's turn our scratch program into a clipboard format listener.

    void
    SniffClipboardContents(HWND hwnd)
    {
     SetWindowText(hwnd, IsClipboardFormatAvailable(CF_TEXT)
                 ? TEXT("Has text") : TEXT("No text"));
    }
    
    BOOL
    OnCreate(HWND hwnd, LPCREATESTRUCT lpcs)
    {
     SniffClipboardContents(hwnd); // set initial title
     return AddClipboardFormatListener(hwnd);
    }
    
    void
    OnDestroy(HWND hwnd)
    {
     RemoveClipboardFormatListener(hwnd);
     PostQuitMessage(0);
    }
    
    ... add to window procedure ...
    
     case WM_CLIPBOARDUPDATE: SniffClipboardContents(hwnd); break;
    

    And that's it. Much, much simpler than writing a clipboard viewer, and much more robust since you aren't dependent on other applications not screwing up.

    There's another alternative to registering a clipboard listener and that's using the clipboard sequence number. The window manager increments the clipboard sequence number each time the contents of the clipboard change. You can compare the sequence number from two points in time to determine whether the contents of the clipboard have changed while you weren't looking.

    Now you have a choice. Do you use the notification method (clipboard format listener) or the polling method (clipboard sequence number)? The notification method is recommended if you want to do something as soon as the clipboard contents change. On the other hand, the polling method is more suitable if you perform calculations based on the clipboard contents and cache the results, and then later you want to verify that your cached results are still valid.

    For example, suppose you have a program with a Paste function, and pasting from the clipboard involves creating a complex data structure based on the clipboard contents. The user clicks Paste, you create your complex data structure, and insert it into the document. Your research discovers that a common operation is pasting the same contents several times. To optimize this, you want to cache the complex data structure so that if the user clicks Paste five times in a row, you only have to build the complex data structure the first time and you can just re-use it the other four times.

    void DocumentWindow::OnPaste()
    {
     if (m_CachedClipboardData == NULL ||
         GetClipboardSequenceNumber() != m_SequenceNumberInCache) {
      delete m_CachedClipboardData;
      m_SequenceNumberInCache = GetClipboardSequenceNumber();
      m_CachedClipboardData = CreateComplexDataFromClipboard();
     }
     if (m_CachedClipboardData) Paste(m_CachedClipboardData);
    }
    

    When the OnPaste method is called, we see if we have clipboard data cached from last time. If not, then clearly we need to create our complex data structure from the clipboard. If we do have clipboard data in our cache, we see if the clipboard sequence number has changed. If so, then the cached data is no longer valid and we have to throw it away and create it from scratch. But if we have cached data and the sequence number hasn't changed, then the cache is still valid and we can avoid calling CreateComplexDataFromClipboard.

    The old clipboard viewer is like DDE: please feel free to stop using it.

  • The Old New Thing

    Why can't I PostMessage the WM_COPYDATA message, but I can SendMessageTimeout it with a tiny timeout?

    • 3 Comments

    After receiving the explanation of what happens to a sent message when Send­Message­Timeout reaches its timeout, a customer found that the explanation raised another question: If the window manager waits until the receiving thread finishes processing the message, then why can't you post a WM_COPY­DATA message? "After all, Send­Message­Timeout with a very short timeout isn't all that different from Post­Message."

    Actually, Send­Message­Timeout with a very short timeout is completely different from Post­Message.

    Let's set aside one crucial difference: messages posted by Post­Message cannot be recalled, whereas the Send­Message­Timeout function will cancel the message entirely if the receiving thread does not process messages quickly enough.

    Recall that messages posted to a queue via Post­Message are retrieved by the Get­Message function and placed in a MSG structure. Once that's done, the window manager disavows any knowledge of the message. It did its job: It placed the message in the message queue and produced it when the thread requested the next message in the queue. What the program does with the message is completely up in the air. There's no metaphysical requirement that the message be dispatched to its intended recipient. (In fact, you already know of a common case where messages are "stolen" from their intended recipients: Dialog boxes.)

    In principle, the message pump could do anything it wants to the message. Dispatch it immediately, steal the message, throw the message away, eat the message and post a different message, even save the message in its pocket for a rainy day.

    By contrast, there's nothing you can do to redirect inbound non-queued messages. They always go directly to the window procedure.

    The important difference from the standpoint of messages like WM_COPY­DATA is that with sent messages, the window manager knows when message processing is complete: When the window procedure returns. At that time, it can free the temporary buffers used to marshal the message from the sender to the recipient. If the message were posted, the window manager would never be sure.

    Suppose the message is placed in a MSG structure as the result of a call to GetMessage. Now the window manager knows that the receiving thread has the potential for taking action on the message and the buffers need to be valid. But how would it know when the buffers can be freed? "Well you can wait until the exact same parameters get passed in a MSG structure to the Dispatch­Message function." But what if the message loop discards the message? Or what if it decides to dispatch it twice? Or what if it decides to smuggle it inside another message?

    Posted messages have no guarantee of delivery nor do they provide any information as to when the message has been definitely processed, or even if it has been processed at all. If the window manager let you post a WM_COPY­DATA message, it would have to use its psychic powers to know when the memory can be freed.
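
    To make the contrast concrete, here's a minimal sketch (the window handles and payload are hypothetical) of sending WM_COPYDATA the supported way. Because SendMessage does not return until the recipient's window procedure returns, the window manager knows exactly how long its marshaled copy must live, and the sender knows how long its own buffer must stay valid.

    void SendGreeting(HWND hwndTarget, HWND hwndSender)
    {
     char payload[] = "hello";  // lives on the sender's stack
     COPYDATASTRUCT cds;
     cds.dwData = 1;            // app-defined identifier
     cds.cbData = sizeof(payload);
     cds.lpData = payload;
    
     // Safe: by the time SendMessage returns, the recipient is done with the
     // marshaled copy, and our stack buffer has not yet gone out of scope.
     SendMessage(hwndTarget, WM_COPYDATA, (WPARAM)hwndSender, (LPARAM)&cds);
    
     // PostMessage(hwndTarget, WM_COPYDATA, ...) is rejected: there would be
     // no well-defined point at which the buffers could be released.
    }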

  • The Old New Thing

    Some preliminary notes from //build/ 2011

    • 29 Comments

    Hey everybody, I'm down at the //build/ conference. (The extra slash is to keep the d from falling over.) I'm not speaking this year, but you can find me in the Apps area of the Expo room today until 3:30pm (except lunchtime), and Friday morning before lunch. I'll also be at Ask the Experts tonight.

    There are so many great sessions to choose from. The one I would attend if I weren't working that time slot would be Bring apps to life with Metro style animations in HTML5. Instead, I'll probably go to Building high performance Metro style apps using HTML5. Fortunately, the sessions are being recorded, so I can catch up later.

    (At PDC 2008, I learned of a class of conference attendee known as the overflow vulture. These people decide which sessions to attend by looking for the ones that are close to filling up, on the theory that "500 people can't be wrong." These people often fail to take into account the room size. A talk in a 200-person room which fills up is not necessarily more popular than a talk in a 500-person room which doesn't.)

    Here are my observations so far:

    • At the airport, I heard a page for "Katy Perry". Normally, my reaction would be, "Oh, that poor woman has the same name as the singer." But since I'm in Los Angeles, I have to give consideration to the possibility that it really is the singer.
    • On the ride from the airport to the hotel, I observed part of a police car chase, or at least two police cars rushing through traffic with lights on. Welcome to Los Angeles.
    • I decided to walk from my hotel to the convention center rather than taking the shuttle bus. Along the way, I spotted a bus coming down the street. The driver parked the bus in the right-hand lane (a lane which is normally used for driving), got off, and walked into the Carl's Jr. I took a peek inside, and he was at the counter ordering breakfast. I guess he figured the bus wouldn't fit in the drive-through. Welcome to Los Angeles.
    • I thought it would have been funny if Michael Angiulo had said, "And we're making these devices available to attendees for just $500. [beat] Just kidding. You're each getting one for free." Or pulled an Oprah. "Everybody, look under your chair! Ha-ha, made you look!"
    • You spend a good amount of time listening to the music that plays before the keynote begins. Imagine having that as your job. "I write music for conferences. My music is peppy, but not too much; hopeful, but with a little bit of attitude. And not so good you want to dance to it. And I have to write a dozen different versions, each one exactly fifteen seconds longer than the previous one. Oh, and it needs to segue into a higher-energy version when the speaker arrives on stage."
    • The City National Grove at Anaheim is not a city, not national, and not a grove. I do concede, however, that it is in Anaheim.
    • If you look closely at the //build/ logo, you'll also notice that the second slash has partially decapitated the b. I tried reproducing the effect here, but my CSS-fu isn't powerful enough.
    • Bonus: The hotel I'm staying at is hosting a conference on hotel conference security. I wonder who provides security for that conference.
  • The Old New Thing

    What happens to a sent message when SendMessageTimeout reaches its timeout?

    • 13 Comments

    The Send­Message­Timeout function tries to send a message, but gives up if the timeout elapses. What exactly happens when the timeout elapses?

    It depends.

    The first case is if the receiving thread never received the message at all. (I.e., if during the period the sender is waiting, the receiving thread never called GetMessage, PeekMessage, or a similar message-retrieval function which dispatches inbound sent messages.) In that case, if the timeout is reached, then the entire operation is canceled; the window manager cleans up everything and makes it look as if the call to SendMessageTimeout never took place. The message is removed from the list of the thread's non-queued messages, and when it finally gets around to calling GetMessage (or whatever), the message will not be delivered.

    The second case is if the receiving thread received the message, and the message was delivered to the destination window procedure, but the receiving thread is just slow to process the message and either return from its window procedure or call Reply­Message. In that case, if the timeout is reached, then the sender is released from waiting, but the message is allowed to proceed to completion.

    Since people seem to like tables, here's a timeline showing the two cases.

    Sending thread                    Receiving thread (Case 1)    Receiving thread (Case 2)
    --------------------------------  ---------------------------  -----------------------------
    SendMessageTimeout(WM_X) called   ... not responding ...       ... not responding ...
                                      ... not responding ...       ... not responding ...
                                      ... not responding ...       GetMessage() called
                                      ... not responding ...       WndProc(WM_X) called
                                      ... not responding ...       WndProc(WM_X) still executing
    timeout elapses                   ... not responding ...       WndProc(WM_X) still executing
    SendMessageTimeout(WM_X) returns  ... not responding ...       WndProc(WM_X) still executing
                                      ... not responding ...       WndProc(WM_X) returns
                                      GetMessage() called
                                      (message WM_X not received)

    Notice that in case 2, the window manager has little choice but to let the window procedure continue with the message. After all, time travel has yet to be perfected, so the window manager can't go back in time and tell the younger version of itself, "No, don't give him the message; he won't finish processing it in time!" (Possibly with a slow-motion "Nooooooooooooo" for dramatic effect.)

    If you are in case 2 and the message WM_X is a system-defined message that is subject to marshaling, then the data is not unmarshaled until the window procedure returns. It would be bad to free the memory out from under a window procedure. On the other hand, if the message is a custom message, then you are still on the hook for keeping the values valid until the window procedure is done.

    But wait, how do I know when the window procedure is done? The Send­Message­Timeout function doesn't tell me! Yup, that's right. If you need to do cleanup after message processing is complete, you should use the Send­Message­Callback function, which calls you back when the receiving thread completes message processing. When the callback fires, that's when you do your cleanup.
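
    Here's a quick sketch of that pattern, using a hypothetical app-defined message and assuming the target window belongs to another thread in the same process (pointers in custom messages are not marshaled across processes). The buffer is freed only when the callback reports that the recipient's window procedure has returned.

    #define WM_MYDATA (WM_APP + 1)  // hypothetical custom message
    
    VOID CALLBACK OnMessageProcessed(HWND hwnd, UINT uMsg,
                                     ULONG_PTR dwData, LRESULT lResult)
    {
     // The recipient has returned from its window procedure;
     // it is now safe to release the buffer we smuggled in via dwData.
     HeapFree(GetProcessHeap(), 0, (void *)dwData);
    }
    
    void SendDataAsync(HWND hwndTarget, const BYTE *data, SIZE_T cb)
    {
     BYTE *copy = (BYTE *)HeapAlloc(GetProcessHeap(), 0, cb);
     if (copy == NULL) return;
     CopyMemory(copy, data, cb);
    
     if (!SendMessageCallback(hwndTarget, WM_MYDATA, (WPARAM)cb, (LPARAM)copy,
                              OnMessageProcessed, (ULONG_PTR)copy)) {
      HeapFree(GetProcessHeap(), 0, copy);  // send failed; clean up immediately
     }
    }

    Note that when the target window belongs to a different thread, the callback runs on the sending thread the next time it calls GetMessage, PeekMessage, or WaitMessage, so the sending thread needs to keep pumping messages.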

  • The Old New Thing

    A common control for associating extensions is well overdue

    • 23 Comments

    Mark complained that a common control for associating extensions is *well* overdue.

    This is a recurring theme I see in the comments: People complaining that Windows lacks some critical feature that it in fact already has. (In this case, Windows had the feature for over two years at the time the question was asked. Maybe the SDK needs a ribbon? j/k)

    Windows Vista added the Default Programs UI as a control panel program, and it also has a programmable interface. You can use IApplication­Association­Registration to query and set default associations, and you can use IApplication­Association­Registration­UI to invoke the control panel itself on a set of associations associated with your program.
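
    As a concrete illustration, here's a minimal sketch (Windows Vista or later) that asks which application is the effective default handler for the .txt extension via IApplicationAssociationRegistration::QueryCurrentDefault.

    #include <windows.h>
    #include <shobjidl.h>
    #include <stdio.h>
    
    int wmain(void)
    {
     if (FAILED(CoInitialize(NULL))) return 1;
    
     IApplicationAssociationRegistration *pReg = NULL;
     HRESULT hr = CoCreateInstance(CLSID_ApplicationAssociationRegistration,
                                   NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pReg));
     if (SUCCEEDED(hr)) {
      PWSTR pszProgId = NULL;
      hr = pReg->QueryCurrentDefault(L".txt", AT_FILEEXTENSION,
                                     AL_EFFECTIVE, &pszProgId);
      if (SUCCEEDED(hr)) {
       wprintf(L"Current default for .txt: %s\n", pszProgId);
       CoTaskMemFree(pszProgId);
      }
      pReg->Release();
     }
    
     CoUninitialize();
     return 0;
    }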
