• The Old New Thing

    Why does the Directory.GetFiles method sometimes ignore *.html files when I ask for *.htm?

    • 68 Comments

    The documentation for the Directory.Get­Files method says

    When using the asterisk wildcard character in a search­Pattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A search­Pattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the search­Pattern. A search­Pattern with a file extension of one, two, or more than three characters returns only files having extensions of exactly that length that match the file extension specified in the search­Pattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, while a search pattern of "file*.txt" returns both files.

    A customer reported that one of their programs stopped working, and they traced the problem to the fact that a search for *.htm on some machines was no longer returning files like awesome.html, contrary to the documentation. What's going on?

    What's going on is that the documentation is trying too hard to explain an observed behavior. (My guess is that some other customer reported the behavior, and the documentation team incorporated the customer's observations into the documentation without really thinking it through.)

    The real issue is that the Get­Files method matches against both short file names and long file names. If a long file name has an extension that is longer than three characters, the extension is truncated to form the short file name. And it is that short file name that gets matched by *.htm or *.txt.

    Even as originally written, in the presence of short file names, the documentation is wrong, because it would imply that a search for reallylong*.txt could match reallylong_filename.txtother. But try it: It doesn't. That's because the short name is probably REALLY~1.TXT, and that doesn't match reallylong*.txt.

    What happened is that short file name generation was disabled on the drive at the time the files were created, so there was no short file name available, so there was consequently no SHORTN~1.HTM file to match against.
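
    To see how matching against both names produces the observed behavior, here is a minimal sketch in portable C++. It is not how Get­Files is implemented: wildMatch is a deliberately simplified matcher, the names wildMatch and enumerationWouldReturn are made up for illustration, and the short names are passed in by hand because the real ones depend on what the file system generated.

```cpp
#include <cctype>
#include <string>

// Simplified case-insensitive wildcard match: '*' matches any run of
// characters (including none), '?' matches exactly one character.
// Illustration only; the real file system matcher has more rules.
bool wildMatch(const std::string& pattern, const std::string& name,
               std::size_t p = 0, std::size_t n = 0)
{
    if (p == pattern.size()) return n == name.size();
    if (pattern[p] == '*') {
        // Let '*' absorb zero or more characters of the name.
        for (std::size_t skip = n; skip <= name.size(); ++skip) {
            if (wildMatch(pattern, name, p + 1, skip)) return true;
        }
        return false;
    }
    if (n == name.size()) return false;
    bool same = pattern[p] == '?' ||
        std::tolower(static_cast<unsigned char>(pattern[p])) ==
        std::tolower(static_cast<unsigned char>(name[n]));
    return same && wildMatch(pattern, name, p + 1, n + 1);
}

// Hypothetical model of the enumeration: a file is reported if the
// pattern matches its long name or its short (8.3) name, if it has one.
bool enumerationWouldReturn(const std::string& pattern,
                            const std::string& longName,
                            const std::string& shortName) // empty if none
{
    return wildMatch(pattern, longName) ||
           (!shortName.empty() && wildMatch(pattern, shortName));
}
```

    In this model, enumerationWouldReturn("*.htm", "awesome.html", "AWESOM~1.HTM") is true, but it becomes false once the short name is empty, which is the situation on a drive where short file name generation was disabled.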

    The documentation should really say something more like this:

    Because this method checks against file names with both the 8.3 file name format (if available) and the long file name format, a search pattern like "*.txt" may return unexpected results. For example, the file longfilename.txtother may be returned if the short file name for the file is LONGFI~1.TXT.

    Update: It looks like the documentation has added my alternate remarks, but they kept the original misleading remarks as well, so now it's double-confusing. And to make things even more confusing, the original misleading remark has been made even more misleading in the part where it talks about question marks overriding the three-character rule. This is another failed attempt to explain observed behavior. If you search for "file?.txt", it will not match "file1.txtother". But the reason is not that the question mark overrides the three-character rule. The reason is that the short name for "file1.txtother" is "FILE1~1.TXT", and the question mark matches only one character.

  • The Old New Thing

    Operations jargon: Internet egress

    • 11 Comments

    As I've noted before, the operations team has their own jargon which superficially resembles English. Some time ago, they sent out a message with the subject A New Internet Egress Path Is Coming.

    Translation: We're changing the way computers access the Internet.

    Bonus jargon: traffic on the edge. This does not refer to traffic that is on the verge of a nervous breakdown. It merely refers to traffic that crosses the boundary between intranet and Internet.

  • The Old New Thing

    On live performances of Star Trek

    • 1 Comments

    Spock's Brain is generally considered to be the worst episode of Star Trek. That may be why in 2009 Mike Carano decided to perform it as a theatrical production. Here is the opening scene, and here's Carano talking about the show's genesis. In the second video, skip ahead to 2:40 to see more clips from the show, or go to 4:35 for the fight scene.

    Whereas Carano played the show for laughs, the folks at Atomic Arts in Portland (yes, that Portland) played it straight for their Trek in the Park series, but they still get laughs because Star Trek.

    2009 Amok Time
    2010 Space Seed
    2011 Mirror, Mirror
    2012 A Journey to Babel
    2013 The Trouble with Tribbles

    And yes, when the Enterprise is hit, everybody jerks to the left or right, even the unconscious bodies in sickbay.

    Their five-year mission complete, the Atomic Arts folks are unloading their larger set pieces, so if you always wanted a pair of sickbay beds that can pump Vulcan blood, well now you know where to go.

    But just because Trek in the Park is over doesn't mean you should give up on Star Trek live in the park yet. Seattle arts group Hello Earth Productions continues to stage Star Trek episodes in public parks under the name Outdoor Trek. Hello Earth aims for a more creative interpretation rather than trying to do a perfect impersonation of the original.

    2010 The Naked Time
    2011 This Side of Paradise
    2012 (hiatus)
    2013 Devil in the Dark (bonus Horta content)
    2014 Mirror, Mirror

    They are currently holding auditions for Mirror, Mirror.

  • The Old New Thing

    How do I disable zone markers for downloaded files, so that Explorer stops being a nag about running downloaded files and just trusts me to do the right thing?

    • 43 Comments

    My Little Program about manipulating the zone identifier for downloaded files appears to have struck a nerve with commenter Tess, who launched into some sort of diatribe about how Microsoft should stop being a busybody and warning users about opening files that they downloaded.

    You are welcome to disable the feature if it offends you so.

    In the Group Policy editor, go to User Configuration, Administrative Templates, Windows Components, Attachment Manager, and enable Do not preserve zone information in file attachments.

    For bonus points, you can set a bunch of other policies to make your computer even more dangerous. Here's a list of them. For example, if your goal is to create the most insecure deployment of Internet Explorer, you can set Inclusion list for moderate risk file types and Inclusion list for low risk file types both to *.*, and then on top of that, set Launching applications and unsafe files to Enabled (not secure) so that Internet Explorer never warns you about running anything.

    Welcome to 1995. Enjoy your stay.

  • The Old New Thing

    Why Johnny can't read music

    • 9 Comments

    In the book He Bear, She Bear, the musical instrument identified as a tuba is clearly a sousaphone.

    (For those who are wondering what the title has to do with the topic of musical instrument identification: It's a reference to the classic book Why Johnny Can't Read.)

  • The Old New Thing

    Programmatically uploading a file to an FTP site

    • 24 Comments

    Today's Little Program uploads a file to an FTP site in binary mode with the assistance of the Wininet library. This program has sat in my bag of tools for years.

    #define STRICT
    #define UNICODE
    #include <windows.h>
    #include <wininet.h> // link with wininet.lib
    #include <shellapi.h>
    
    int __cdecl wmain(int argc, PWSTR argv[])
    {
     int exitCode = 1; // assume failure until the upload succeeds
     if (argc == 6) {
      // Open a direct (no proxy) Internet session.
      HINTERNET hintRoot = InternetOpen(TEXT("ftpput/1.0"),
                INTERNET_OPEN_TYPE_DIRECT,
                NULL, NULL, 0);
      if (hintRoot) {
       // Log on to the FTP site in passive mode.
       HINTERNET hintFtp = InternetConnect(hintRoot,
                argv[1],
                INTERNET_DEFAULT_FTP_PORT,
                argv[2],
                argv[3],
                INTERNET_SERVICE_FTP,
                INTERNET_FLAG_PASSIVE,
                NULL);
       if (hintFtp) {
        // Upload argv[4] as argv[5] in binary mode.
        if (FtpPutFile(hintFtp, argv[4], argv[5],
             FTP_TRANSFER_TYPE_BINARY,
             NULL)) {
         exitCode = 0;
        }
    
        InternetCloseHandle(hintFtp);
       }
    
       InternetCloseHandle(hintRoot);
      }
     }
    
     return exitCode;
    }
    

    The program accepts five command line arguments:

    1. site (no "ftp://" in front)
    2. userid
    3. password
    4. path for the file to upload
    5. location to place the uploaded file

    For example, I might say ftpput ftp.contoso.com admin seinfeld newversion.zip subdir/newversion.zip

  • The Old New Thing

    Converting from a UTC-based SYSTEMTIME directly to a local-time-based SYSTEMTIME

    • 18 Comments

    Last year, I presented this commutative diagram

    A 2-by-2 grid of boxes. The top row is labeled FILE­TIME; the bottom row is labeled SYSTEM­TIME. The first column is labeled UTC; the second column is labeled Local. The upper left box is labeled Get­System­Time­As­File­Time. There is an outgoing arrow to the right labeled File­Time­To­Local­File­Time leading to the box in the second column labeled None. There is an outgoing arrow downward labeled File­Time­To­System­Time leading to the box in the second row, first column, labeled Get­System­Time. From the box in the upper right corner labeled None, there is an outgoing arrow downward labeled File­Time­To­System­Time leading to the box in the second row, second column, labeled Get­Local­Time.
                          UTC                                        Local

    FILETIME    GetSystemTimeAsFileTime --FileTimeToLocalFileTime--> (None)
                           |                                           |
                 FileTimeToSystemTime                        FileTimeToSystemTime
                           |                                           |
                           v                                           v
    SYSTEMTIME       GetSystemTime                               GetLocalTime

    I claimed that there was no function to complete the commutative diagram by connecting the bottom two boxes.

    I was wrong, but I'm going to try to get off on a technicality.

    You can connect the two boxes by calling System­Time­To­Tz­Specific­Local­Time with NULL as the time zone parameter, which means "Use the current time zone."

    The same diagram as above, but there is a new arrow connecting Get­System­Time to Get­Local­Time labeled System­Time­To­Tz­Specific­Local­Time.
                          UTC                                        Local

    FILETIME    GetSystemTimeAsFileTime --FileTimeToLocalFileTime--> (None)
                           |                                           |
                 FileTimeToSystemTime                        FileTimeToSystemTime
                           |                                           |
                           v                                           v
    SYSTEMTIME       GetSystemTime ----------------------------> GetLocalTime
                                  SystemTimeToTzSpecificLocalTime

    This works here because the time being converted always refers to the current time.

    Here comes the technicality.

    This technique doesn't work in general because System­Time­To­Tz­Specific­Local­Time uses the time zone in effect at the time being converted, whereas the File­Time­To­Local­File­Time function uses the time zone in effect right now. Furthermore, it doesn't take into account changes in daylight saving time rules that may have historically been different from the current set of rules. (Though this is easily repaired by switching to System­Time­To­Tz­Specific­Local­Time­Ex.) The trick works here because the time we are converting is right now.

    In other words, the more general diagram does not commute. Instead, it looks more like this:

    Same as before, but this time the boxes are unlabeled, and the bottom right box is split in two. The inbound arrow from the left goes to one box and the inbound arrow from the top goes to another box. The two halves of the split boxes are marked as not equal.
                          UTC                                              Local

    FILETIME              [ ] -----------FileTimeToLocalFileTime----------> [ ]
                           |                                                 |
                 FileTimeToSystemTime                              FileTimeToSystemTime
                           |                                                 |
                           v                                                 v
    SYSTEMTIME            [ ] --SystemTimeToTzSpecificLocalTimeEx--> [ ] != [ ]

    This is why the documentation for File­Time­To­Local­File­Time tells you that if you want to get from the upper left corner to the upper right corner while accounting for daylight saving time relative to the time being converted, then you need to take the long way around.

    So what we have is not so much a commutative diagram as something like a covering space: If you start at any box and travel around the diagram, you won't necessarily end up where you started. Let's start at the upper left corner for the sake of example.

    Back to the four-box diagram, with empty boxes. The arrows follow a clockwise path. From the upper left, we go to the upper right via File­Time­To­Local­File­Time, then to the bottom right via File­Time­To­System­Time, then to the bottom left via Tz­Specific­Local­Time­To­System­Time, then back to the upper left via System­Time­To­File­Time.

                          UTC                                     Local

    FILETIME              [ ] ------FileTimeToLocalFileTime------> [ ]
                           ^                                        |
                 SystemTimeToFileTime                     FileTimeToSystemTime
                           |                                        |
                           |                                        v
    SYSTEMTIME            [ ] <--TzSpecificLocalTimeToSystemTime-- [ ]

    When you return to the upper left box, you might end up somewhere else, probably an hour ahead of or behind where you started. Each time you take a trip around the diagram, you drift another hour further away. Well, until you hit another daylight saving time changeover point.
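
    The drift can be illustrated with a toy model. This is only a sketch under made-up assumptions: times are whole hours, the time zone rule is invented, and every function name is hypothetical. The two conversions merely play the roles of File­Time­To­Local­File­Time (offset in effect right now) and Tz­Specific­Local­Time­To­System­Time (offset in effect at the time being converted). The FILE­TIME/SYSTEM­TIME format conversions don't change the time, so the model leaves them out.

```cpp
// Toy model: times are whole hours since the start of a made-up year.
// In this invented time zone, daylight time (UTC-7) is in effect for
// UTC hours [2000, 7000) and standard time (UTC-8) otherwise.
constexpr long kStandardOffset = -8;
constexpr long kDaylightOffset = -7;

bool isDaylightAtUtc(long utcHour)
{
    return utcHour >= 2000 && utcHour < 7000;
}

long offsetAtUtc(long utcHour)
{
    return isDaylightAtUtc(utcHour) ? kDaylightOffset : kStandardOffset;
}

// Plays the role of FileTimeToLocalFileTime: applies the offset that is
// in effect right now, regardless of the time being converted.
long utcToLocalUsingCurrentOffset(long utcHour, long nowUtcHour)
{
    return utcHour + offsetAtUtc(nowUtcHour);
}

// Plays the role of TzSpecificLocalTimeToSystemTime: applies the offset
// that was in effect at the time being converted. (The ambiguous hours
// right at a changeover are ignored in this sketch.)
long localToUtcUsingOffsetAtThatTime(long localHour)
{
    long asStandard = localHour - kStandardOffset;
    if (!isDaylightAtUtc(asStandard)) return asStandard;
    return localHour - kDaylightOffset;
}

// One trip around the diagram, starting and ending in the UTC column.
long tripAroundDiagram(long utcHour, long nowUtcHour)
{
    long local = utcToLocalUsingCurrentOffset(utcHour, nowUtcHour);
    return localToUtcUsingOffsetAtThatTime(local);
}
```

    If "now" is hour 3000 (daylight time) and the time being converted is hour 100 (standard time), one trip yields 101 and a second trip yields 102. If both fall under the same rule, the trip returns to where it started.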

  • The Old New Thing

    We're currently using FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH, but we would like our WriteFile to go even faster

    • 29 Comments

    A customer said that their program's I/O pattern is to open a file and then every so often write about 100KB of data into the file. They are currently using the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags to open a file, and they wanted to know what else they could do to make their writes go even faster.

    Um, for one thing, you stop passing those two flags!

    Those two flags in combination basically mean "Give me the slowest possible I/O performance!" because they force all I/O to go through to the physical media right away.

    Removing the FILE_FLAG_WRITE_THROUGH flag will be a big help. This allows the hardware disk cache to do its normal job of completing the I/O immediately and performing the physical I/O lazily (perhaps in an optimized order based on subsequent writes). A 100KB write is a small enough write that your I/O time on rotational media will be dominated by the seek time. It'll take five to ten milliseconds to move the head into position and only one millisecond to write out the data. You're wasting 80% or more of your time just preparing for the write.
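
    The arithmetic behind that 80% figure is short enough to write down. The function name below is made up for illustration, and the inputs are just the rough numbers quoted above, not measurements.

```cpp
// Fraction of the total I/O time spent moving the head into position
// rather than transferring data.
double seekOverheadFraction(double seekMs, double transferMs)
{
    return seekMs / (seekMs + transferMs);
}
```

    A 5 ms seek followed by a 1 ms write gives 5/6, or about 83%. A 10 ms seek gives 10/11, or about 91%.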

    Much better would be to issue the I/O without the FILE_FLAG_WRITE_THROUGH flag so that the entire 100KB I/O request goes into the hard drive on-board cache. (It will fit quite easily, since the on-board cache for today's hard drives will be 8 megabytes or larger.) Your Write­File will complete immediately, and the commit to physical storage will occur while your program is busy doing computation.

    If the writes truly are sporadic (as the customer claims), the I/O buffer will be flushed out by the time the next round of application I/O begins.

    Removing the FILE_FLAG_NO_BUFFERING flag will also help, because that allows the operating system disk cache to get involved. If the application reads back from the file, the read can be satisfied from the disk cache, avoiding the physical I/O entirely.

    As a side note, the FILE_FLAG_WRITE_THROUGH flag is largely ineffective nowadays, because SATA drivers ignore the flush request. The file system doesn't know that the driver is lying to it, so it will still do all the work on the assumption that the write-through request worked, even though we know that the extra work is ultimately pointless.

    For example, NTFS will issue metadata writes with a flush to ensure that the data on the physical media is consistent. But if the driver is ignoring flush requests, all this extra work accomplishes nothing aside from wasting I/O bandwidth. Even worse, NTFS thinks that the data on the drive is physically consistent, but it isn't. The result is that a poorly-timed power outage (or device removal) can result in metadata corruption that takes a chkdsk to repair.

    Now, it may be that the customer's program is using the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags for a specific purpose unrelated to performance, so you can't just go walking in and ripping them out without understanding why they were there. But if they added the flags thinking that it would make the program run faster, then they were operating under a false assumption.

  • The Old New Thing

    Why do I have to add 1 to the color index when I set it as the hbrBackground of a window class?

    • 23 Comments

    Our scratch program sets the background color to COLOR_WINDOW by setting the class background brush as follows:

        wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 1);
    

    What's with the +1?

    Okay, first of all, let's backtrack a bit.

    The real first question is, "What's the deal with taking an integer (COLOR_WINDOW) and casting it to an HBRUSH and expecting anything sane to happen?"

    The window manager wants to provide multiple ways of setting the class background brush.

    1. The application can request that no automatic background drawing should occur at all.
    2. The application can request custom background drawing and provide that custom drawing by handling the WM_ERASE­BKGND message.
    3. The application can request that the background be a specific brush provided by the application.
    4. The application can request that the background be a specific system color.

    The first three cases are easy: If you don't want automatic background drawing, then pass the hollow brush. If you want custom background drawing, then pass NULL as the brush. And if you want background drawing with a specific brush, then pass that brush. It's the last case that is weird.

    Now, if Register­Class were being invented today, we would satisfy the last requirement by saying, "If you want the background to be a system color, then use a system color brush like this:

        wc.hbrBackground = GetSysColorBrush(COLOR_WINDOW);
    
    System color brushes match the corresponding system color, so this sets your background to whatever the current system window color is."

    But just as NASA couldn't use the Space Shuttle to rescue the Apollo 13 astronauts, the Register­Class function couldn't use Get­Sys­Color­Brush for class brushes: At the time Register­Class was designed, system color brushes had not yet been invented. In fact, they wouldn't be invented for over a decade.

    Therefore, Register­Class had to find some way of smuggling an integer inside a pointer, and the traditional way of doing this is to say that certain numerically-small pointer values are actually integers in disguise. We've seen this with the HINSTANCE returned by Shell­Execute, with the MAKE­INT­ATOM macro, with the MAKE­INT­RESOURCE/IS_INT­RESOURCE macro pair, and with the second parameter to the Get­Proc­Address function. (There are plenty of other examples.)

    The naïve solution would therefore be to say, "Well, if you want a system color to be used as the brush color, then just cast the COLOR_XXX value to an HBRUSH, and the Register­Class function will recognize it as a smuggled integer and treat it as a color code rather than an actual brush."

    And then you run into a problem: The numeric value of COLOR_SCROLL­BAR is zero. Casting this to an HBRUSH would result in a NULL pointer, but a NULL brush already means something else: Don't draw any background at all.

    To avoid this conflict, the Register­Class function requires you to add 1 to the system color number so that none of its smuggled integers will be mistaken for NULL.
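
    Here is a rough sketch of the smuggling trick in portable C++, so that it doesn't depend on the Windows headers. ToyBrush, ToyBrushHandle, classify, and the cutoff value are all invented for illustration; this shows the general technique, not the window manager's actual code. The two color values are the ones the Windows headers use.

```cpp
#include <cstdint>

struct ToyBrush { int unused; };      // stand-in for a real brush object
using ToyBrushHandle = ToyBrush*;     // stand-in for HBRUSH

constexpr int kColorScrollBar = 0;    // COLOR_SCROLLBAR
constexpr int kColorWindow = 5;       // COLOR_WINDOW
constexpr std::uintptr_t kMaxSmuggledValue = 0xFFFF; // hypothetical cutoff

enum class BackgroundKind { NoAutomaticDrawing, SystemColor, RealBrush };

// What the application does: (HBRUSH)(COLOR_XXX + 1).
ToyBrushHandle handleFromSystemColor(int colorIndex)
{
    return reinterpret_cast<ToyBrushHandle>(
        static_cast<std::uintptr_t>(colorIndex + 1));
}

// What a window manager could do when it looks at the class brush.
BackgroundKind classify(ToyBrushHandle handle, int* colorIndex)
{
    auto value = reinterpret_cast<std::uintptr_t>(handle);
    if (value == 0) return BackgroundKind::NoAutomaticDrawing;
    if (value <= kMaxSmuggledValue) {
        *colorIndex = static_cast<int>(value) - 1; // undo the +1
        return BackgroundKind::SystemColor;
    }
    return BackgroundKind::RealBrush;
}
```

    Without the +1, handleFromSystemColor(kColorScrollBar) would be indistinguishable from a null handle. With it, the value is 1, and classify recovers color index 0.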

  • The Old New Thing

    What order does the DIR command arrange files if no sort order is specified?

    • 28 Comments

    If you don't specify a sort order, then the DIR command lists the files in the order that the files are returned by the Find­First­File function.

    Um, okay, but that just pushes the question to the next level: What order does Find­First­File return files?

    The order in which Find­First­File returns files is unspecified. It is left to the file system driver to return the files in whatever order it finds most convenient.

    Now we're digging into implementation details.

    For example, the classic FAT file system simply returns the names in the order they appear on disk, and when a file is created, it is merely assigned the first available slot in the directory. Slots become available when files are deleted, and if no slots are available, then a new slot is created at the end.
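
    The slot behavior described above can be mimicked with a small toy model. ToyFatDirectory is a made-up class for illustration. It ignores everything about real FAT directories except the rule that a new file takes the first free slot and that enumeration walks the slots in order.

```cpp
#include <string>
#include <vector>

class ToyFatDirectory {
public:
    // A new file takes the first free slot, or a new slot at the end.
    void create(const std::string& name)
    {
        for (auto& slot : slots_) {
            if (slot.empty()) { slot = name; return; }
        }
        slots_.push_back(name);
    }

    // Deleting a file frees its slot for reuse.
    void remove(const std::string& name)
    {
        for (auto& slot : slots_) {
            if (slot == name) { slot.clear(); return; }
        }
    }

    // Enumeration returns names in slot order, skipping free slots.
    std::vector<std::string> enumerate() const
    {
        std::vector<std::string> names;
        for (const auto& slot : slots_) {
            if (!slot.empty()) names.push_back(slot);
        }
        return names;
    }

private:
    std::vector<std::string> slots_; // an empty string marks a free slot
};
```

    Create a.mp3, b.mp3, and c.mp3, delete a.mp3, then create d.mp3, and the enumeration comes back as d.mp3, b.mp3, c.mp3: the newest file shows up first because it reused the freed slot.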

    Modern FAT (is that an oxymoron?) with long file names is more complicated because it needs to find a sequence of contiguous entries large enough to hold the name of the file.

    There used to be (maybe there still are) some low-level disk management utilities that would go in and manually reorder your directory entries.

    The NTFS file system internally maintains directory entries in a B-tree structure, which means that the most convenient way of enumerating the directory contents is in B-tree order, which if you cover one eye and promise not to focus too closely looks approximately alphabetical for US-English. (It's not very alphabetical for most other languages, and it falls apart once you add characters with diacritics or anything outside of the Latin alphabet, and that includes spaces and digits!)

    The ISO 9660 file system (used by CD-ROMs) requires that directory entries be lexicographically sorted by ASCII code point. Pretty much everybody has abandoned the base ISO 9660 file system and uses one of its many extensions, such as Joliet or UDF, so you have that additional wrinkle to deal with.

    If you are talking to a network file system, then the file system on the other end of the network cable could be anything at all, so who knows what its rules are (if it even has rules).

    When people ask this question, it's usually in the context of a media-playing device which plays media from a CD-ROM or USB thumb drive in the raw native file order. But they don't ask this question right out; they ask some side question that they think will solve their problem, but they don't come out and say what their problem is.

    So let's solve the problem in context: If the storage medium is a CD-ROM or an NTFS-formatted USB thumb drive, then the files will be enumerated in sort-of-alphabetical order, so you can give your files names like 000 First track.mp3, 001 Next track.mp3, and so on.

    If the storage medium is a FAT-formatted USB thumb drive, then the files will be enumerated in a complex order based on the order in which files are created and deleted and the lengths of their names. But the easy way out is simply to remove all the files from a directory, then move the files into the directory in the order you want them enumerated. That way, the first available slot is the one at the end of the directory, so the file entry gets appended.

    Of course, none of this behavior is contractual. NTFS would be completely within its rights to, for example, return entries in reverse alphabetical order on odd-numbered days. Therefore, you shouldn't write a program that relies on any particular order of enumeration. (Or even that the order of enumeration is consistent between two runs!)
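
    If your program cares about order, the safe approach is to collect the names and sort them yourself. Here is one possible sketch. The function names are made up, the names are assumed to have been gathered already (for example, from a Find­First­File loop), and the ASCII-only case folding is a simplification that a real program would probably replace with a proper locale-aware comparison.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercases ASCII letters only; other characters are left as-is.
std::string asciiLower(std::string text)
{
    for (char& ch : text) {
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    return text;
}

// Takes names in whatever order the enumeration produced them and
// returns them in an order the program chooses for itself.
std::vector<std::string> inChosenOrder(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end(),
        [](const std::string& a, const std::string& b) {
            std::string la = asciiLower(a), lb = asciiLower(b);
            // Fall back to an exact comparison so ties sort deterministically.
            return la != lb ? la < lb : a < b;
        });
    return names;
}
```

    The result is the same no matter which file system produced the names or what order it felt like returning them in.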
