The rules were simple but led to complicated results.
MS-DOS file names were eleven characters long, with an implicit dot
between characters eight and nine.
Theoretically, spaces were permitted anywhere, but in practice
they could appear only at the end of the file name or
immediately before the implicit dot.
Wildcard matching was actually very simple.
The program passed an eleven-character pattern; each
position in the pattern was either a file name
character (which had to match exactly) or
a question mark (which matched anything).
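Here is a minimal Python sketch of that position-by-position comparison. It assumes both the pattern and the file name are already in the raw eleven-character form, and the name `fcb_match` is made up for illustration; case handling is not modeled.

```python
def fcb_match(pattern: str, name: str) -> bool:
    """Compare an 11-character pattern against an 11-character raw name.

    Hypothetical helper: each pattern position is either '?' (matches any
    character, including a padding space) or a character that must match
    exactly.
    """
    if len(pattern) != 11 or len(name) != 11:
        raise ValueError("pattern and name must both be 11 characters")
    return all(p == "?" or p == n for p, n in zip(pattern, name))


if __name__ == "__main__":
    print(fcb_match("ABCD????TXT", "ABCD    TXT"))  # True
    print(fcb_match("ABCDE???TXT", "ABCD    TXT"))  # False
```

Note that a question mark happily matches a padding space, which is why ABCD????TXT matches a file whose name portion is only four characters long.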
Consider the file "ABCD····TXT",
where I've used · to represent a space.
This file name would more traditionally be written as ABCD.TXT,
but I've written it out in its raw 11-character format
to make the matching more obvious.
Let's look at some patterns and whether they would match.
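To experiment with this example, here is a small sketch that pads a traditional 8.3 name into the raw form (base name to eight characters, extension to three) and runs a few raw patterns against it. The helper names `to_raw_83` and `fcb_match` are hypothetical, and the patterns are just illustrations, not a reconstruction of any particular table.

```python
def to_raw_83(name: str) -> str:
    """Pad an 8.3 name like 'ABCD.TXT' into the raw form 'ABCD    TXT'."""
    base, _, ext = name.partition(".")
    if len(base) > 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"not an 8.3 name: {name!r}")
    return base.ljust(8) + ext.ljust(3)


def fcb_match(pattern: str, name: str) -> bool:
    """'?' matches any single character; anything else must match exactly."""
    return len(pattern) == len(name) == 11 and all(
        p == "?" or p == n for p, n in zip(pattern, name)
    )


if __name__ == "__main__":
    raw = to_raw_83("ABCD.TXT")
    print(raw.replace(" ", "·"))  # ABCD····TXT
    for pattern in ["ABCD    TXT", "????????TXT", "ABCD    ???",
                    "ABC?????TXT", "ABCDE???TXT"]:
        # Show spaces as · to match the notation used in the text.
        print(pattern.replace(" ", "·"), fcb_match(pattern, raw))
```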
The tricky part is converting the traditional notation with dots
and asterisks into the eleven-character pattern.
The algorithm used by MS-DOS was the same one used by CP/M,
since MS-DOS worked hard at being backwards compatible with CP/M.
(You may find some people who call this the FCB matching algorithm,
because file names were passed to and from the operating system
in a structure called a File Control Block.)
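Here is a Python sketch of the parse. It is reconstructed from the behavior described in this section (start with eleven spaces; a dot clears the last three positions and moves the cursor to position 9; an asterisk fills the remaining positions with question marks and moves the cursor to position 12, one past the end; any other character is stored at the cursor if there is room), so treat it as an approximation rather than the actual CP/M or MS-DOS code. The name `parse_fcb_pattern` is hypothetical, and since Python counts from zero, the text's positions 1 through 12 become 0 through 11.

```python
def parse_fcb_pattern(text: str) -> str:
    """Convert dot/asterisk notation into an 11-character FCB-style pattern.

    Reconstructed sketch; positions are 0-based, so cursor 8 is the
    text's position 9 and cursor 11 is its position 12 (past the end).
    """
    pattern = [" "] * 11  # start with eleven spaces
    cursor = 0
    for ch in text:
        if ch == ".":
            # A dot blanks the extension and moves the cursor to its start.
            pattern[8:11] = [" "] * 3
            cursor = 8
        elif ch == "*":
            # An asterisk fills everything from the cursor onward with '?'
            # and parks the cursor one past the end.
            pattern[cursor:11] = ["?"] * (11 - cursor)
            cursor = 11
        elif cursor < 11:
            # Any other character is copied if there is still room.
            pattern[cursor] = ch
            cursor += 1
    return "".join(pattern)


if __name__ == "__main__":
    print(parse_fcb_pattern("ABCD.TXT").replace(" ", "·"))  # ABCD····TXT
    print(parse_fcb_pattern("*.TXT"))                      # ????????TXT
```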
Let's parse a few patterns using this algorithm, since the results
can be surprising.
In the diagrams, I'll underline the cursor position.
First, let's look at the traditional "ABCD.TXT".
The final result is what we expected: ABCD····TXT.
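To follow the cursor step by step, here is a tracing version of the parse. It is a hypothetical sketch that assumes the rules implied later in this section (a dot clears the last three positions and moves the cursor to position 9; an asterisk fills the rest with question marks and moves the cursor past the end). It records the pattern and cursor after each input character; the printout marks the cursor with a caret on the line below instead of an underline, and shows spaces as ·.

```python
from __future__ import annotations


def trace_fcb_parse(text: str) -> list[tuple[str, str, int]]:
    """Return (input char, pattern, cursor) after each step of the parse.

    Hypothetical sketch; the cursor is 0-based, and 11 means past the end.
    """
    pattern = [" "] * 11
    cursor = 0
    steps = []
    for ch in text:
        if ch == ".":
            pattern[8:11] = [" "] * 3
            cursor = 8
        elif ch == "*":
            pattern[cursor:11] = ["?"] * (11 - cursor)
            cursor = 11
        elif cursor < 11:
            pattern[cursor] = ch
            cursor += 1
        steps.append((ch, "".join(pattern), cursor))
    return steps


if __name__ == "__main__":
    for ch, pattern, cursor in trace_fcb_parse("ABCD.TXT"):
        print(f"after {ch!r}:")
        print("  " + pattern.replace(" ", "·"))
        print("  " + " " * cursor + "^")  # caret marks the cursor
```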
Let's look at a weird case: the pattern is ABCDEFGHIJK.
Sure, this was extremely boring to watch, but look at the result:
what you got was equivalent to ABCDEFGH.IJK.
The dot is optional if it comes after exactly eight characters!
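A quick check of that observation, using a hypothetical sketch of the parse reconstructed from the rules described in this section (not taken from MS-DOS itself): eleven characters with no dot and the dotted 8.3 spelling produce the same pattern, whereas a dot after only seven characters leaves a padding space in the name portion.

```python
def parse_fcb_pattern(text: str) -> str:
    """Hypothetical sketch: dot/asterisk notation -> 11-character pattern."""
    pattern = [" "] * 11
    cursor = 0
    for ch in text:
        if ch == ".":
            # Blank the extension and jump to its start (position 9).
            pattern[8:11] = [" "] * 3
            cursor = 8
        elif ch == "*":
            # Fill the rest with '?' and park the cursor past the end.
            pattern[cursor:11] = ["?"] * (11 - cursor)
            cursor = 11
        elif cursor < 11:
            # Ordinary character: copy it if there is room.
            pattern[cursor] = ch
            cursor += 1
    return "".join(pattern)


if __name__ == "__main__":
    print(parse_fcb_pattern("ABCDEFGHIJK") == parse_fcb_pattern("ABCDEFGH.IJK"))  # True
    print(parse_fcb_pattern("ABCDEFG.HIJ").replace(" ", "·"))  # ABCDEFG·HIJ
```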
Next, let's look at the troublesome A*B.TXT.
Notice that the result, A???????TXT, is
the same as you would have gotten
from the pattern A*.TXT.
Any characters other than a dot that come after an asterisk have
no effect, since the asterisk moves the cursor to position 12
(one past the end of the pattern),
at which point nothing changes the parse state except a dot,
which clears the last three positions and moves the cursor back to position 9.
I won't work it out here, but if you stare at it for a while,
you'll also discover that
*.* is the same as * by itself.
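Both asterisk observations can be checked mechanically. The sketch below is hypothetical and reconstructed from the rules described in this section rather than from the real CP/M or MS-DOS source; it shows A*B.TXT and A*.TXT collapsing to the same pattern, and *.* and * doing likewise.

```python
def parse_fcb_pattern(text: str) -> str:
    """Hypothetical sketch: dot/asterisk notation -> 11-character pattern."""
    pattern = [" "] * 11
    cursor = 0
    for ch in text:
        if ch == ".":
            # Blank the extension and jump to its start (position 9).
            pattern[8:11] = [" "] * 3
            cursor = 8
        elif ch == "*":
            # Fill the rest with '?' and park the cursor past the end,
            # so later non-dot characters are ignored.
            pattern[cursor:11] = ["?"] * (11 - cursor)
            cursor = 11
        elif cursor < 11:
            pattern[cursor] = ch
            cursor += 1
    return "".join(pattern)


if __name__ == "__main__":
    print(parse_fcb_pattern("A*B.TXT"))  # A???????TXT
    print(parse_fcb_pattern("A*.TXT"))   # A???????TXT
    print(parse_fcb_pattern("*.*") == parse_fcb_pattern("*"))  # True
```

The *.* case works out because the first asterisk fills all eleven positions, the dot blanks positions 9 through 11, and the second asterisk fills them right back in.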
In addition to the rules above, the MS-DOS command prompt had
some quirks in its parsing.
If you typed
DIR .TXT, the command prompt acted as if you
had typed DIR *.TXT;
it silently inserted an asterisk if the first character of the
pattern was a dot.
This behavior was probably accidental rather than intentional,
but it was an accident that some people came to rely upon.
When we fixed the bug in Windows 95, more than one person
complained that their DIR .TXT command wasn't working anymore.
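Here is one way to model that quirk on top of a reconstructed parse. Both functions are hypothetical sketches, and `dos_prompt_pattern` is not the real command prompt logic. Without the inserted asterisk, .TXT parses to eight spaces followed by TXT, which would match only a file with a blank name portion.

```python
def parse_fcb_pattern(text: str) -> str:
    """Hypothetical sketch: dot/asterisk notation -> 11-character pattern."""
    pattern = [" "] * 11
    cursor = 0
    for ch in text:
        if ch == ".":
            pattern[8:11] = [" "] * 3
            cursor = 8
        elif ch == "*":
            pattern[cursor:11] = ["?"] * (11 - cursor)
            cursor = 11
        elif cursor < 11:
            pattern[cursor] = ch
            cursor += 1
    return "".join(pattern)


def dos_prompt_pattern(arg: str) -> str:
    """Hypothetical model of the quirk: a leading dot gets '*' prepended."""
    if arg.startswith("."):
        arg = "*" + arg
    return parse_fcb_pattern(arg)


if __name__ == "__main__":
    print(repr(parse_fcb_pattern(".TXT")))  # '        TXT'
    print(dos_prompt_pattern(".TXT"))       # ????????TXT
```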
The FCB matching algorithm was abandoned during the transition to Win32
since it didn't work with long file names.
Long file names can contain multiple dots,
file names can of course be longer than eleven characters,
and there can be more than eight characters before the dot.
But some quirks of the FCB matching algorithm persist into Win32
because they have become idiom.
For example, a pattern that ends in a dot doesn't actually match files
that end in a dot;
it matches files with no extension.
And a question mark can match zero characters if it comes
immediately before a dot.
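Those two rules can be modeled in a few lines. The sketch below is hypothetical and is not the actual Win32 matching code: it translates a pattern into a Python regular expression, understands only *, ?, and the two quirks just described, and compares case-sensitively. It treats a ? as optional only when the very next pattern character is a dot, which is the literal reading of the rule; runs of question marks are not specially modeled.

```python
import re


def win32ish_match(pattern: str, name: str) -> bool:
    """Hypothetical sketch of the two quirks; not the real Win32 algorithm."""
    trailing_dot = pattern.endswith(".")
    parts = []
    for i, ch in enumerate(pattern):
        if trailing_dot and i == len(pattern) - 1:
            break  # the trailing dot means "no extension", not a literal dot
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            # A '?' immediately before a dot may match zero characters.
            next_is_dot = i + 1 < len(pattern) and pattern[i + 1] == "."
            parts.append(".?" if next_is_dot else ".")
        else:
            parts.append(re.escape(ch))
    if trailing_dot and "." in name:
        return False  # pattern ending in a dot: only names with no extension
    return re.fullmatch("".join(parts), name) is not None


if __name__ == "__main__":
    print(win32ish_match("*.", "Makefile"))        # True
    print(win32ish_match("*.", "notes.txt"))       # False
    print(win32ish_match("ABC?.TXT", "ABC.TXT"))   # True
    print(win32ish_match("ABC?.TXT", "ABCD.TXT"))  # True
```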
There may be other weird Win32 pattern matching quirks,
but those are the two that come to mind right away,
and they both exist to maintain batch file
compatibility with the old 8.3 file pattern matching algorithm.