Holy cow, I wrote a book!
Henke37 wonders why the Marlett font was introduced. Why use a font for drawing symbols on window buttons?
Using a font was a convenient way to have scalable graphics.
It's not like Windows could've used VML or SVG since they hadn't been invented yet. EMFs would have been overkill as well. Fonts were very convenient because the technology to render scalable fonts already existed and was well-established. It's always good to build on something that has been proven, and TrueType scalable font technology proved itself very nicely in Windows 3.1. TrueType has the added benefit of supporting hinting, allowing tweaks to the glyph outlines to be made for particular pixel sizes. (A feature not available in most vector drawing languages, but also a feature very important when rendering at small font sizes.)
A customer reported that when they called GetFileAttributes on a ZIP file, the FILE_ATTRIBUTE_COMPRESSED attribute was not returned. But ZIP files are compressed. Why isn't the FILE_ATTRIBUTE_COMPRESSED attribute being set?
Because FILE_ATTRIBUTE_COMPRESSED tells you whether the file was compressed by the file system. It is not a flag which describes the semantics of the bytes stored in the file. After all, the file system doesn't know that this particular collection of bytes is a ZIP file and contains data that was compressed externally. Who knows, maybe it's just some uncompressed file that just happens to look superficially like a ZIP file (but isn't)?
If a text file consists of the string "ADTUR ADKUH", is this a compressed file? Maybe it's somebody's product key, in which case it isn't compressed. Or maybe it is short for "Await instructions before taking further action. Acknowledge receipt of this telegram by wire." That's an example of a commercial code, used to save telegram transmission costs by compressing frequently-used business phrases into five-letter pseudo-words.
The file system doesn't try to figure out whether a particular sequence of bytes it has been asked to store was externally compressed. It just stores the bytes on disk, perhaps after performing its own internal compression, and if that internal compression was performed (even if it didn't actually result in any compression), the FILE_ATTRIBUTE_COMPRESSED attribute is set.
Similarly, the FILE_ATTRIBUTE_ENCRYPTED attribute is set if the file contents were encrypted by the file system. If encryption took place externally, then the attribute is not set because the file system doesn't know that the byte sequence it was asked to store represented encrypted data.
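Here's a minimal sketch of how a program might interpret those two bits. The helper names are made up for illustration, and the stand-in definitions exist only so the sketch compiles off Windows (the values are the ones in the Windows SDK headers). Note that neither helper ever looks at the file name or the file contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in definitions so the sketch compiles off Windows. */
typedef uint32_t DWORD;
#define FILE_ATTRIBUTE_COMPRESSED 0x00000800u
#define FILE_ATTRIBUTE_ENCRYPTED  0x00004000u
#define INVALID_FILE_ATTRIBUTES   ((DWORD)0xFFFFFFFFu)
#endif

/* Hypothetical helper: did the file system itself compress this file? */
static bool IsCompressedByFileSystem(DWORD attrs)
{
    /* A failed query has every bit set, so rule it out first. */
    if (attrs == INVALID_FILE_ATTRIBUTES) return false;
    return (attrs & FILE_ATTRIBUTE_COMPRESSED) != 0;
}

/* Hypothetical helper: did the file system itself encrypt this file? */
static bool IsEncryptedByFileSystem(DWORD attrs)
{
    if (attrs == INVALID_FILE_ATTRIBUTES) return false;
    return (attrs & FILE_ATTRIBUTE_ENCRYPTED) != 0;
}
```

On Windows you would pass in the result of GetFileAttributes. A ZIP file sitting in an ordinary uncompressed folder comes back without the bit, no matter how well-compressed its contents are.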
(Note that many special-purpose file formats, such as DOCX, JAR, JPG, and PNG, are internally compressed, even though they are not advertised as such.)
The shell team often gets questions like these from customers:
Attached please find a sample program which continuously writes data to a file. If you open the folder containing the file in Explorer, you can see that the file size is reported as zero. Even manually refreshing the Explorer window does not update the file size. Even the dir command shows the file size as zero. On the other hand, calling GetFileSize reports the correct file size. If I close the file handle, then Explorer and the dir command both report the correct file size. We can observe this behavior on Windows Server 2008 R2, but on Windows Server 2003, the file sizes are updated in both Explorer and dir. Can anybody explain what is happening?
We have observed that Windows gives the wrong file size for files being written. We have a log file that our service writes to, and we like to monitor the size of the file by watching it in Explorer, but the file size always reports as zero. Even the dir command reports the file size as zero. Only when we stop the service does the log file size get reported correctly. How can we get the file size reported properly?
We have a program that generates a large number of files in the current directory. When we view the directory in Explorer, we can watch the files as they are generated, but the file size of the last file is always reported as zero. Why is that?
Note that this is not even a shell issue. It's a file system issue, as evidenced by the fact that a dir command exhibits the same behavior.
Back in the days of FAT, all the file metadata was stored in the directory entry.
The designers of NTFS had to decide where to store their metadata. If they chose to do things the UNIX way, the directory entry would just be a name and a reference to the file metadata (known in UNIX-land as an inode). The problem with this approach is that every directory listing would require seeking all over the disk to collect the metadata to report for each file. This would have made NTFS slower than FAT at listing the contents of a directory, a rather embarrassing situation.
Okay, so some nonzero amount of metadata needs to go into the directory entry. But NTFS supports hard links, which complicates matters since a file with multiple hard links has multiple directory entries. If the directory entries disagree, who's to say which one is right? One way out would be to try very hard to keep all the directory entries in sync and to make the chkdsk program arbitrarily choose one of the directory entries as the "correct" one in case a conflict is discovered. But this also means that if a file has a thousand hard links, then changing the file size would entail updating a thousand directory entries.
That's where the NTFS folks decided to draw the line.
In NTFS, file system metadata is a property not of the directory entry but rather of the file, with some of the metadata replicated into the directory entry as a tweak to improve directory enumeration performance. Functions like FindFirstFile report the directory entry, and by putting into the directory entry the metadata that FAT users were accustomed to getting "for free", the NTFS designers could avoid being slower than FAT for directory listings. The directory-enumeration functions report the last-updated metadata, which may not correspond to the actual metadata if the directory entry is stale.
The next question is where and how often this metadata replication is done; in other words, how stale is this data allowed to get? To avoid having to update a potentially unbounded number of directory entries each time a file's metadata changed, the NTFS folks decided that the replication would be performed only from the file into the directory entry that was used to open the file. This means that if a file has a thousand hard links, a change to the file size would be reflected in the directory entry that was used to open the file, but the other 999 directory entries would contain stale data.
As for how often, the answer is a little more complicated. Starting in Windows Vista (and its corresponding Windows Server version which I don't know but I'm sure you can look up, and by "you" I mean "Yuhong Bao"), the NTFS file system performs this courtesy replication when the last handle to a file object is closed. Earlier versions of NTFS replicated the data while the file was open whenever the cache was flushed, which meant that it happened every so often according to an unpredictable schedule. The result of this change is that the directory entry now gets updated less frequently, and therefore the last-updated file size is more out-of-date than it already was.
Note that even with the old behavior, the file size was still out of date (albeit not as out of date as it is now), so any correctly-written program already had to accept the possibility that the actual file size differs from the size reported by FindFirstFile. The change to suppress the "bonus courtesy updates" was made for performance reasons. Obviously, updating the directory entries results in additional I/O (and forces a disk head seek), so it's an expensive operation for relatively little benefit.
If you really need the actual file size right now, you can do what the first customer did and call GetFileSize. That function operates on the actual file and not on the directory entry, so it gets the real information and not the shadow copy. Mind you, if the file is being continuously written-to, then the value you get is already wrong the moment you receive it.
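If it helps to see the moving parts, here's a toy model of the arrangement. This is not how NTFS is implemented; every name in it is hypothetical, it models only the Windows Vista-and-later rule, and it assumes each file object has exactly one handle. The point is just that the file owns the real size, and a directory entry carries a courtesy copy that gets refreshed only when a file object opened through that entry is closed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only. All names are hypothetical. */
typedef struct {
    uint64_t actual_size;           /* authoritative, owned by the file */
} ModelFile;

typedef struct {
    ModelFile *file;
    uint64_t   listed_size;         /* courtesy copy seen by directory listings */
} ModelDirEntry;

typedef struct {
    ModelDirEntry *opened_via;      /* the entry that was used to open the file */
} ModelFileObject;

static ModelFileObject model_open(ModelDirEntry *entry)
{
    ModelFileObject fo = { entry };
    return fo;
}

static void model_write(ModelFileObject *fo, uint64_t bytes)
{
    fo->opened_via->file->actual_size += bytes;   /* entry is not touched */
}

/* Plays the role of GetFileSize: asks the file, not the directory. */
static uint64_t model_get_file_size(const ModelFileObject *fo)
{
    return fo->opened_via->file->actual_size;
}

/* Plays the role of a directory listing: reports the possibly-stale copy. */
static uint64_t model_listed_size(const ModelDirEntry *entry)
{
    return entry->listed_size;
}

/* Closing the file object replicates the size into the one entry
   that was used to open it. Other hard links stay stale. */
static void model_close(ModelFileObject *fo)
{
    assert(fo->opened_via != NULL);
    fo->opened_via->listed_size = fo->opened_via->file->actual_size;
    fo->opened_via = NULL;
}
```

In this model, writing 100 bytes through an open file object leaves the listed size at zero while the size reported from the file object is 100, which is what the customers observed.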
Why doesn't Explorer do the GetFileSize thing when it enumerates the contents of a directory so it always reports the accurate file size? Well, for one thing, it would be kind of presumptuous of Explorer to second-guess the file system. "Oh, gosh, maybe the file system is lying to me. Let me go and verify this information via a slower alternate mechanism." Now you've created this environment of distrust. Why stop there? Why not also verify file contents? "Okay, I read the first byte of the file and it returned 0x42, but I'm not so sure the file system isn't trying to trick me, so after reading that byte, I will open the volume in raw mode, traverse the file system data structures, and find the first byte of the file myself, and if it isn't 0x42, then somebody's gonna have some explaining to do!" If the file system wants to lie to us, then let the file system lie to us.
All this verification takes an operation that could be done in 2 + N/500 I/O operations and slows it down to 2 + N/500 + 3N operations. And you've reintroduced all the disk seeking that all the work was intended to avoid! (And if this is being done over the network, you can definitely feel a 1500× slowdown.) Congratulations, you made NTFS slower than FAT. I hope you're satisfied now.
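Where does 1500× come from? A quick sketch of the arithmetic, using the operation counts above. The function names are invented for this illustration, and N is treated as a real number to keep the division simple.

```c
#include <assert.h>

/* I/O operations to list N files from the directory entries alone. */
static double listing_ios(double n)
{
    assert(n >= 0.0);
    return 2.0 + n / 500.0;
}

/* The same listing, plus three extra operations per file to verify each one. */
static double verified_listing_ios(double n)
{
    return listing_ios(n) + 3.0 * n;
}

/* How many times slower the verified listing is. */
static double slowdown(double n)
{
    return verified_listing_ios(n) / listing_ios(n);
}
```

For 500 files the costs are 3 versus 1503 operations, about 500 times worse. As N grows, the fixed cost of 2 fades away and the ratio approaches 3N ÷ (N/500) = 1500.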
If you were paying close attention, you'd have noticed that I wrote that the information is propagated into the directory when the last handle to the file object is closed. If you call CreateFile twice on the same file, that creates two file objects which refer to the same underlying file. You can therefore trigger the update of the directory entry from another program by simply opening the file and then closing it.
    void UpdateFileDirectoryEntry(__in PCWSTR pszFileName)
    {
        HANDLE h = CreateFileW(
            pszFileName,
            0,                  // don't require any access at all
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
            NULL,               // lpSecurityAttributes
            OPEN_EXISTING,
            0,                  // dwFlagsAndAttributes
            NULL);              // hTemplateFile
        if (h != INVALID_HANDLE_VALUE) {
            CloseHandle(h);
        }
    }
You can even trigger the update from the program itself. You might call a function like this every so often from the program generating the output file:
    void UpdateFileDirectoryEntry(__in HANDLE hFile)
    {
        HANDLE h = ReOpenFile(
            hFile,
            0,                  // don't require any access at all
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
            0);                 // dwFlags
        if (h != INVALID_HANDLE_VALUE) {
            CloseHandle(h);
        }
    }
If you want to update all file directory entries (rather than a specific one), you can build the loop yourself:
    // functions ProcessOneName and EnumerateAllNames
    // incorporated by reference.
    void UpdateAllFileDirectoryEntries(__in PCWSTR pszFileName)
    {
        EnumerateAllNames(pszFileName, UpdateFileDirectoryEntry);
    }
Armed with this information, you can now give a fuller explanation of why ReadDirectoryChangesW does not report changes to a file until the handle is closed. (And why it's not a bug in ReadDirectoryChangesW.)
Bonus chatter: Mind you, the file system could expose a flag to a FindFirstFile-like function that means "Accuracy is more important than performance; return data that is as up-to-date as possible." The NTFS folks tell me that implementing such a flag wouldn't be all that hard. The real question is whether anybody would bother to use it. (If not, then it's a bunch of work for no benefit.)
Bonus puzzle: A customer observed that whether the file size in the directory entry was being updated while the file was being written depended on what directory the file was created in. Come up with a possible explanation for this observation.
Bonus reading:
I hadn't even noticed this until somebody pointed it out: When you hover your mouse over a button in the Windows 7 taskbar which corresponds to a running application, the taskbar button lights up in a color that matches the colors in the icon itself. (And even more subtly, the lighting effect is centered on the mouse.)
This feature even has a name: Color hot-track. (Gentlemen, start your photocopiers.)
Some people ask how it's done. It's really nothing special. The code just looks for the predominant color in the icon. (And, since visual designers are sticklers for this sort of thing, black, white, and shades of gray are not considered "colors" for the purpose of this calculation.)
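I don't know the exact algorithm the taskbar uses, but here's one way you might sketch "find the predominant color, ignoring grays" yourself. Everything about it is an assumption made for illustration: the 64 coarse buckets, the threshold of 32 for deciding that a pixel is gray, and all of the names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Rgb;

/* Assumption: a pixel is "not a color" when its channels are nearly equal.
   That covers black, white, and the grays in between. */
static int is_grayish(Rgb p)
{
    int max = p.r, min = p.r;
    if (p.g > max) max = p.g;
    if (p.b > max) max = p.b;
    if (p.g < min) min = p.g;
    if (p.b < min) min = p.b;
    return (max - min) < 32;
}

/* Sorts pixels into 64 coarse buckets (top two bits of each channel) and
   reports the average color of the fullest bucket.
   Returns 1 on success, or 0 if every pixel was grayish. */
static int predominant_color(const Rgb *pixels, size_t count, Rgb *result)
{
    size_t hits[64] = {0};
    uint64_t sum_r[64] = {0}, sum_g[64] = {0}, sum_b[64] = {0};
    size_t best = 0;

    assert(result != NULL);
    for (size_t i = 0; i < count; i++) {
        Rgb p = pixels[i];
        if (is_grayish(p)) continue;
        size_t bucket = ((size_t)(p.r >> 6) << 4) |
                        ((size_t)(p.g >> 6) << 2) |
                         (size_t)(p.b >> 6);
        hits[bucket]++;
        sum_r[bucket] += p.r;
        sum_g[bucket] += p.g;
        sum_b[bucket] += p.b;
        if (hits[bucket] > hits[best]) best = bucket;
    }
    if (hits[best] == 0) return 0;
    result->r = (uint8_t)(sum_r[best] / hits[best]);
    result->g = (uint8_t)(sum_g[best] / hits[best]);
    result->b = (uint8_t)(sum_b[best] / hits[best]);
    return 1;
}
```

An icon that is mostly white with a splash of red would come out red under this sketch, since the white pixels never get a vote.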
A customer noticed that when the user hovered over their application name in the Start menu, the infotip that pops up includes their product name:
... but no other program on the Start menu included the product name in the description:
The customer compared their shortcut with the other ones but couldn't find anything that was telling Explorer, "Include the program name in the pop-up infotip, please."
Because the reason for the name being included in the infotip had nothing to do with the properties stored in the shortcut. The reason the name was included in the infotip is that the name was being truncated in the main display.
When an infotip is about to be displayed for a listview item, the listview sends a LVN_GETINFOTIP notification with a NMLVGETINFOTIP structure. If the LVGIT_UNFOLDED flag is not set, then the infotip is being displayed for a truncated item, and the pszText is pre-filled with the full name. The program should then append its information to the existing text so that the full name is the first line of the infotip. On the other hand, if the LVGIT_UNFOLDED flag is set, then the item text is fully-visible and you should just copy your desired description text into the pszText buffer.
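The buffer handling can be sketched in portable C. This is not a real notification handler: the function and its parameters are hypothetical stand-ins for the pszText and cchTextMax members and for the "is LVGIT_UNFOLDED set" test, and using a newline as the line separator is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src to dst without writing past a buffer of cch characters. */
static void append_bounded(char *dst, size_t cch, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 < cch) {
        strncat(dst, src, cch - used - 1);
    }
}

/* text/cch stand in for pszText/cchTextMax.
   item_fully_visible stands in for "LVGIT_UNFOLDED is set". */
static void fill_infotip(char *text, size_t cch, int item_fully_visible,
                         const char *description)
{
    assert(text != NULL && cch > 0 && description != NULL);
    if (item_fully_visible) {
        text[0] = '\0';                     /* just copy the description */
    } else if (text[0] != '\0') {
        append_bounded(text, cch, "\n");    /* keep the pre-filled full name
                                               as the first line */
    }
    append_bounded(text, cch, description);
}
```

A real handler would do the same thing with the members of the NMLVGETINFOTIP structure it receives.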
The customer was happy to get this information. Their designer wanted only the description to appear in the infotip, and now they know that they need to shorten the program name to make the name disappear from the infotip.
Bonus chatter: Microsoft® WinFX™ Software Development Kit for Microsoft® Pre-Release Windows Operating System Code-Named "Longhorn", Beta 1 Web Setup.
Consider the following sequence of operations, assuming that F: is a USB thumb drive with plenty of disk space.
    C:\Users\Bob\Downloads> copy readme.txt F:\
            1 file(s) copied.
    C:\Users\Bob\Downloads> copy Update.iso F:\
    The parameter is incorrect.
Why is the second file copy failing?
The hint is the file extension: *.iso, which suggests that this is a CD or DVD image, and DVD images have the feature that they tend to be really big.
Like more than 4GB big.
USB thumb drives tend to be formatted with the FAT32 file system rather than with NTFS. And FAT32 has a maximum file size of 4GB minus one byte.
The user confirmed that the Update.iso file was larger than 4GB and that the USB thumb drive was formatted as FAT32.
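If you're writing your own copy tool and want to give a better error message than the one above, the check is a one-liner. This is only a sketch: the names are made up, and figuring out that the destination is FAT32 in the first place is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4GB minus one byte: the largest file size FAT32 can record. */
#define FAT32_MAX_FILE_SIZE UINT64_C(0xFFFFFFFF)

/* Hypothetical helper: can a file of this size live on a FAT32 volume? */
static bool fits_on_fat32(uint64_t file_size_in_bytes)
{
    return file_size_in_bytes <= FAT32_MAX_FILE_SIZE;
}
```

A file of exactly 4,294,967,295 bytes squeaks in. One more byte and it doesn't.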
Mind you, the error message doesn't help at all in identifying that this is what's going on. I don't know where it's coming from, but my guess is that somewhere inside the copy command, it tries to create the destination file and set its file size. Since the file size is out of range for FAT32, the call fails with the error ERROR_INVALID_PARAMETER, and that's what ends up bubbling out to the user.
But at least now you know what the confusing error message is trying to tell you.
We saw last time that the unattend file lets you change some Windows configuration settings that cannot be changed after Setup is complete. But one of the things you can't change is the location of the Program Files directory. Many people wish they could relocate their Program Files directory to another drive in order to relieve disk space pressure on the system partition. Why won't Windows let them do this?
Now that NTFS is mandatory for the system volume (it took only 13 years to get there!), Windows itself can start taking advantage of NTFS features.
Windows Setup takes advantage of hard links. A large percentage of the files installed by Windows are hard-linked to copies in the C:\Windows\WinSxS directory for reasons I do not understand, but the phrase "component store" may be part of it. (This is why asking Explorer for the size of the C:\Windows directory gives a misleading view of the actual amount of disk space occupied by Windows, because Explorer uses a naive algorithm which counts each hard link as a separate file.) Oh, and in Windows 7, the two copies of Notepad are now hard links to each other.
Ah, but one of the limitations of hard links is that they cannot span volumes. Some of the hard links out of the WinSxS directory point into places like C:\Program Files\Windows NT\Accessories\wordpad.exe, and this in turn requires that the Program Files directory be on the same volume as your Windows directory.
Sorry for the inconvenience.
Some Windows settings can only be established as part of the installation process. This is done with a so-called unattend file. (Remember, no matter where you put an advanced setting, somebody will tell you that you are an idiot.) In earlier versions of Windows, the unattend file took the form of an INI file, but Windows Vista hopped aboard the XML bandwagon, and the unattend file format changed to XML. The nice thing about using XML is that you can publish a schema so people can validate their unattend file without having to perform a test install (only to discover twenty minutes later that a typo resulted in an entire section of the unattend file being ignored, say).
If you spend a lot of time setting up computers, you can use an unattend file to answer all the Setup questions (like "enter your product key") so all you have to do is type "setup /unattend:myconfiguration.xml" and go out to lunch. When you come back, your machine will be installed and ready.
Here are two of the most popular unattend settings which must be set during installation. (There are a bunch of popular unattend settings for things that can also be changed post-install; for those other settings, the unattend file is not your only chance.)
Wait, the C:\Program Files directory isn't on the list of directories that can be relocated. There's a reason for that, which we'll look at next time.
On UNIX, you can use wc -l to count the number of lines in stdin. Windows doesn't come with wc, but there's a sneaky way to count the number of lines anyway:
some-command-that-generates-output | find /c /v ""
It is a special quirk of the find command that the null string is treated as never matching. The /v flag reverses the sense of the test, so now it matches everything. And the /c flag returns the count.
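To see why the three pieces add up to a line counter, here's a small model of the logic in C. It is not the source code of find. The function names are invented, and the line-splitting rules (lines end at a newline, and a final line without a newline still counts) are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Models the quirk: the null string never matches any line. */
static bool line_matches(const char *line, size_t len, const char *pattern)
{
    size_t plen = strlen(pattern);
    if (plen == 0) return false;
    for (size_t i = 0; i + plen <= len; i++) {
        if (memcmp(line + i, pattern, plen) == 0) return true;
    }
    return false;
}

/* Models /c: count the lines that pass the test.
   invert models /v: reverse the sense of the test. */
static size_t count_lines(const char *text, const char *pattern, bool invert)
{
    size_t count = 0;
    assert(text != NULL && pattern != NULL);
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        if (line_matches(text, len, pattern) != invert) count++;
        text += len + (eol ? 1 : 0);
    }
    return count;
}
```

With a null pattern and invert set, no line matches, every line passes the reversed test, and the count is the number of lines.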
It's pretty convoluted, but it does work.
(Remember, I provide the occasional tip on batch file programming as a public service to those forced to endure it, not as an endorsement of batch file programming.)
Now come da history: Why does the find command say that a null string matches nothing? Mathematically, the null string is a substring of every string, so it should be that if you search for the null string, it matches everything. The reason dates back to the original MS-DOS version of find.exe, which according to the comments appears to have been written in 1982. And back then, pretty much all of MS-DOS was written in assembly language. (If you look at your old MS-DOS floppies, you'll find that find.exe is under 7KB in size.) Here is the relevant code, though I've done some editing to get rid of distractions like DBCS support.
            mov     dx,st_length            ;length of the string arg.
            dec     dx                      ;adjust for later use
            mov     di, line_buffer
    lop:
            inc     dx
            mov     si,offset st_buffer     ;pointer to beg. of string argument
    comp_next_char:
            lodsb
            cmp     al,byte ptr [di]
            jnz     no_match
            dec     dx
            jz      a_matchk                ; no chars left: a match!
            call    next_char               ; updates di
            jc      no_match                ; end of line reached
            jmp     comp_next_char          ; loop if chars left in arg.
If you're rusty on your 8086 assembly language, here's how it goes in pseudocode:
    int dx = st_length - 1;
    char *di = line_buffer;
    lop:
     dx++;
     char *si = st_buffer;
    comp_next_char:
     char al = *si++;
     if (al != *di) goto no_match;
     if (--dx == 0) goto a_matchk;
     if (!next_char(&di)) goto no_match;
     goto comp_next_char;
In sort-of-C, the code looks like this:
    int l = st_length - 1;
    char *line = line_buffer;
    l++;
    char *string = st_buffer;
    while (*string++ == *line && --l && next_char(&line)) {}
The weird - 1 followed by l++ is an artifact of code that I deleted, which needed the decremented value. If you prefer, you can look at the code this way:
    int l = st_length;
    char *line = line_buffer;
    char *string = st_buffer;
    while (*string++ == *line && --l && next_char(&line)) {}
Notice that if the string length is zero, there is an integer underflow, and we end up reading off the end of the buffers. The comparison loop does stop, because we eventually hit bytes that don't match. (No virtual memory here, so there is no page fault when you run off the end of a buffer; you just keep going and reading from other parts of your data segment.)
In other words, due to an integer underflow bug, a string of length zero was treated as if it were a string of length 65536, which doesn't match anywhere in the file.
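You can watch the wraparound happen with a 16-bit counter in C. This sketch models only the counter from the loop above, not the character comparisons or the buffers, and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* How many characters would have to compare equal, one after another,
   before the 16-bit counter reaches zero and a match is declared? */
static uint32_t chars_needed_for_match(uint16_t st_length)
{
    uint16_t dx = st_length;    /* same width as the DX register */
    uint32_t compared = 0;
    do {
        compared++;             /* one character compared equal */
        dx--;                   /* 0 wraps around to 65535 */
    } while (dx != 0);
    return compared;
}
```

A five-character search string needs five matching characters, as you would expect. A zero-length search string needs 65536 of them.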
This bug couldn't be fixed, because by the time you got around to trying, there were already people who discovered this behavior and wrote batch files that relied on it. The bug became a feature.
The integer underflow was fixed, but the code is careful to treat null strings as never matching, in order to preserve existing behavior.
Exercise: Why is the loop label called lop instead of loop?
A customer wanted a way to determine which users were using specific files on their server. They fired up the Shared Folders MMC snap-in and went to the Open Files list. They found that the results were inconsistent. Some file types like .exe and .pdf did show up in the list when they were open, but other file types like .txt did not. The customer asked for an explanation of the inconsistency and for a list of which file types work and which ones don't.
The customer is confusing two senses of the term open file. From the file system point of view, an open file is one that has an outstanding handle reference. This is different from the user interface concept of "There is an open window on my screen showing the contents of the file."
The Open Files list shows files which are open in the file system sense, not in the user interface sense.
Whether a file shows up in the Open Files list depends on the application that is used to open the file (in the user interface sense). Text files are typically opened by Notepad, and Notepad reads the entire contents of the file into memory and closes the file handle. Therefore, the file is open (in the file system sense) only when it is in the process of being loaded or saved.
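The load-and-close pattern looks roughly like this in portable C. This is an illustration of the pattern, not Notepad's code. The function name and the 4096-byte starting buffer are arbitrary choices for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads the whole file into a malloc'd buffer and closes the file
   before returning. Returns NULL on failure. The caller frees the buffer. */
static char *load_entire_file(const char *path, size_t *size_out)
{
    size_t cap = 4096, len = 0;
    char *buf;
    FILE *fp;

    assert(path != NULL && size_out != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    buf = malloc(cap);
    if (buf == NULL) { fclose(fp); return NULL; }

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap) break;               /* short read: end of file or error */
        cap *= 2;
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) { free(buf); fclose(fp); return NULL; }
        buf = bigger;
    }

    if (ferror(fp)) { free(buf); fclose(fp); return NULL; }

    fclose(fp);     /* from here on, the file is no longer open
                       in the file system sense */
    *size_out = len;
    return buf;
}
```

The file is open only for the duration of the function call. The user can stare at the contents on the screen for hours afterward, and the Open Files list will have nothing to show for it.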
There is no comprehensive list of which types of files fall into which category because the behavior is not a function of the file type but rather a function of the application being used to view the file. (If you open a .txt file in Word, I believe it will keep the file system handle open until you close the document window.)
The customer seemed satisfied with the explanation. They ran some experiments and observed that "Hey, check it out, if I load a really big text file into Notepad, I can see it show up in the Open Files list momentarily." They never did come back with any follow-up questions, so I don't know how they went about solving the original problem. (Maybe they used a SACL to audit who was opening the files.)