In response to an article on hierarchical storage management, Karellen suggests that the problem could be ameliorated by having the hierarchical storage manager keep the first 4KB of the file online, thereby allowing programs that sniff the start of the file for metadata to continue operating without triggering a recall. "The way that file read operations tend to work (fread, read, and ReadFile), if an application opens a file and requests a large read, just returning the first 4KB is a valid response."
A premature short read may technically be a valid response, but it won't be the correct response.
When your program reads from a file, do you retry partial reads? Be honest.
Suppose you want to read a 32-bit value from a file. You probably write this.
uint32_t value;
DWORD bytesRead;
if (ReadFile(file, &value, sizeof(value), &bytesRead, nullptr) &&
    bytesRead == sizeof(value)) {
  // Got the value - use it...
}
You probably don't write this:
uint32_t value;
BYTE *nextRead = reinterpret_cast<BYTE*>(&value);
DWORD bytesRemaining = sizeof(value);
while (bytesRemaining) {
  DWORD bytesRead;
  // Resume at nextRead so that earlier partial data is not overwritten.
  if (!ReadFile(file, nextRead, bytesRemaining, &bytesRead, nullptr)) return false;
  if (bytesRead == 0) break; // avoid infinite loop
  bytesRemaining -= bytesRead;
  nextRead += bytesRead;
}
if (bytesRemaining == 0) {
  // Got the value - use it...
}
Most programs assume that a short read from a disk file indicates that the end of the file has been reached, or some other error has occurred. Consider, for example, this file parser:
struct CONTOSOFILEHEADER {
  uint32_t magic;
  uint32_t version;
};

bool IsContosoFile(HANDLE file)
{
  CONTOSOFILEHEADER header;
  DWORD bytesRead;
  if (!ReadFile(file, &header, sizeof(header), &bytesRead, nullptr)) {
    // Couldn't read the file - assume not a Contoso file.
    return false;
  }
  if (bytesRead != sizeof(header)) {
    // File doesn't hold a header - not a Contoso file.
    return false;
  }
  if (header.magic != CONTOSO_MAGIC) {
    // Does not start with magic number - not a Contoso file.
    return false;
  }
  if (header.version != CONTOSO_VERSION_1 &&
      header.version != CONTOSO_VERSION_2) {
    // Unsupported version - not a Contoso file.
    return false;
  }
  // Passed basic tests.
  return true;
}
The problem is even worse if you use fread, because fread does not provide information on how to resume a partial read. It reports only the total number of items read in full; you get no information about how much progress was made in the items that were read only in part.
// Read 10 32-bit integers.
uint32_t flags[10];
auto itemsRead = fread(flags, sizeof(uint32_t), 10, fp);
if (itemsRead < 10) {
  if (!feof(fp) && !ferror(fp)) {
    // At this point, we have a short read.
    // We are now screwed.
  }
}
Since nobody is actually prepared for a short read to occur on disk files anywhere other than the end of the file, you shouldn't introduce a new failure mode that nobody can handle.
Because they won't handle it.
And recall that the original question was in the context of displaying a file in a folder. Even if you know that Hierarchical Storage Management is not involved, you still have to deal with the cost of opening the file at all. If the folder is on a remote server where each I/O operation has 500ms of latency, then enumerating the contents of a directory with 1000 files will take over eight minutes. I suspect the user will have lost patience by then.
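The arithmetic behind that estimate can be sketched as follows. The function name is hypothetical, and the model assumes exactly one round trip per file with no parallelism or caching.

```cpp
#include <chrono>

// Assumption for this sketch: one I/O round trip per file, performed serially.
constexpr std::chrono::milliseconds TotalEnumerationTime(
    int fileCount, std::chrono::milliseconds latencyPerFile)
{
    return latencyPerFile * fileCount;
}

// 1000 files at 500ms each is 500 seconds, which is 8 minutes 20 seconds.
static_assert(TotalEnumerationTime(1000, std::chrono::milliseconds(500)) ==
                  std::chrono::seconds(500),
              "1000 files at 500ms each should take 500 seconds");
```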
I noted it in the interview with the Defrag Tools show, but I'll make a proper Microspeak for it. Today's term is North star.
This term rose quickly to prominence in October 2015. My research suggests that it had been simmering below the surface for about a year. For example, here's an isolated citation from May 2015:
The best you can do is paint a compelling picture of an improved world (your north star), and plan the long journey to it.
This citation is interesting because it seems to give a definition for "north star": It means "a compelling picture of an improved world".
The term has become wildly popular of late at Microsoft. I guess a major executive used the term recently, so now it's suddenly the cool thing to say.
We had a team meeting a little while ago. One of the agenda items was "Longer term North star topics", which was itself rather intriguing. During the meeting, I noted¹ the following uses of the term:
There may be changes along the way, but your north star of the feature is intact.
We have to decide where we want to go as a north star.
I raised my hand. "What do you mean by north star? Because if you follow the north star, you end up at the north pole, and not where you actually want to go."
The speaker seemed a bit frustrated by this question. "Who is this idiot who doesn't know what a north star is? Certainly this person hasn't been in all the meetings I've been in, where people are saying 'north star' all over the place."
The speaker noted that I might want to look it up in the dictionary, because it would have told me that the north star is the goal you have beyond your immediate goal. It's a guiding principle that keeps you on the right path for your journey. (Curiously, this definition doesn't appear anywhere in any online dictionary I could find. It also doesn't match the citation at the top of this article.)
So there you go. An explicit definition, as provided by somebody who used the term. I embarrassed myself in front of my whole team for you.
Bonus chatter: Later that same day, a top executive sent mail to the entire company. It too used the term "north star":
With Microsoft's mission as our north star—to empower every person and every organization on the planet to achieve more—we have a...
¹ Yes, when I attend meetings, one of the things I pay particular attention to is new jargon, so I can add it to my collection of citations. If you see me pull out my phone and jot something down, it's either because I'm writing down a question to ask later, or I'm preserving something you said so I can add it to my Microspeak citations.
A customer was developing a custom namespace extension and they found that when displayed in My Computer, it showed up in the Other category.
They wanted it to appear in the Network Locations category because it represented a network device.
Explorer categorizes the items based on the SHDESCRIPTIONID.dwDescriptionId. We saw this structure some time ago when we tried to detect the Recycle Bin. By default shell namespace extensions are given SHDID_COMPUTER_OTHER as their description ID and the clsid is the class ID of the shell extension itself.
To customize the description ID, go to the shell namespace registration and add the following:
HKEY_CLASSES_ROOT\
  CLSID\
    {clsid}\
      DescriptionID=REG_DWORD:9
The magic number 9 is SHDID_COMPUTER_NETDRIVE. You can use any of the values supported by the SHDESCRIPTIONID structure. For example, if your shell extension wraps a file system directory, you may want to use SHDID_FS_DIRECTORY so that it gets categorized under Folders.
We saw some time ago that the timestamp of a file increases by up to 2 seconds when you copy it to a USB thumb drive. The underlying reason is that USB thumb drives tend to be formatted with the FAT file system, and the FAT file system records timestamps in local time to only two-second resolution.
The same logic applies to ZIP archives. The ZIP archive format records file times in MS-DOS format, so it too is subject to the two-second resolution limitation.
And the reason the time increases to the nearest two-second interval rather than rounding is so that files do not go backward in time. This is useful when you freshen a ZIP archive: If the file time went backward, then the freshen operation would always report that there were files that needed to be updated.
From the point of view of time stamps, the ZIP archive acts like a tiny FAT-formatted USB thumb drive.
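The round-up rule and the two-second field can be sketched like this. The helper names are hypothetical and do not come from any ZIP library; the bit layout is the standard 16-bit MS-DOS time format, in which the low five bits hold the seconds divided by two.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

// Round a running count of seconds up to the next even value, so that the
// stored timestamp is never earlier than the original. Doing this on the
// running count, before splitting into hour/minute/second, takes care of
// the carry from 59 seconds into the next minute.
constexpr std::uint64_t RoundUpToTwoSeconds(std::uint64_t seconds)
{
    return (seconds + 1) & ~std::uint64_t{1};
}

// Pack a time of day into the 16-bit MS-DOS layout:
// bits 15-11 hour, bits 10-5 minute, bits 4-0 second divided by two.
// The caller is expected to pass an already-rounded, even second value.
constexpr std::uint16_t PackDosTime(unsigned hour, unsigned minute, unsigned second)
{
    return static_cast<std::uint16_t>((hour << 11) | (minute << 5) | (second / 2));
}

// Recover the seconds field; only even values can come back out.
constexpr unsigned DosTimeSeconds(std::uint16_t dosTime)
{
    return (dosTime & 0x1Fu) * 2;
}
```

Because odd seconds cannot be represented at all, something has to give, and rounding up is the choice that keeps a freshen operation from seeing the archived copy as older than the file it came from.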
Bonus chatter: If you want to copy files whose timestamps are newer, but take into account MS-DOS timestamp rounding, you can use the robocopy command with the /FFT command line option.
There are two categories of "Access denied" errors. One occurs when you attempt to create the handle, and the other occurs when you attempt to use the handle.
HANDLE hEvent = OpenEvent(SYNCHRONIZE, FALSE, TEXT("MyEvent"));
If this call fails with Access denied, then it means that you don't have access to the object to the level you requested. In the above example, it means that you don't have SYNCHRONIZE access to the event.
A common reason for getting an Access denied when trying to create a handle is that you asked for too much access. For example, you might write
HKEY hkey;
LONG lError = RegOpenKeyEx(hkeyRoot, subkeyName, 0, KEY_ALL_ACCESS, &hkey);
if (lError == ERROR_SUCCESS) {
  DWORD dwType;
  DWORD dwData;
  DWORD cbData = sizeof(dwData);
  lError = RegQueryValueEx(hkey, TEXT("ValueName"), nullptr, &dwType,
                           reinterpret_cast<LPBYTE>(&dwData), &cbData);
  if (lError == ERROR_SUCCESS && dwType == REG_DWORD &&
      cbData == sizeof(dwData)) {
    .. do something with dwData ..
  }
  RegCloseKey(hkey);
}
The call to RegOpenKeyEx fails with Access denied. The proximate reason is that you don't have KEY_ALL_ACCESS permission on the registry key, which makes sense because KEY_ALL_ACCESS asks for permission to do everything imaginable to the registry key, including crazy things like "Change the permissions of the key to deny access to the rightful owner."
But why are you asking for full access to the key if all you're going to do is read from it?
HKEY hkey;
LONG lError = RegOpenKeyEx(hkeyRoot, subkeyName, 0, KEY_READ, &hkey);
if (lError == ERROR_SUCCESS) {
  DWORD dwType;
  DWORD dwData;
  DWORD cbData = sizeof(dwData);
  lError = RegQueryValueEx(hkey, TEXT("ValueName"), nullptr, &dwType,
                           reinterpret_cast<LPBYTE>(&dwData), &cbData);
  if (lError == ERROR_SUCCESS && dwType == REG_DWORD &&
      cbData == sizeof(dwData)) {
    .. do something with dwData ..
  }
  RegCloseKey(hkey);
}
If you want to go for bonus points, ask for KEY_QUERY_VALUE instead of KEY_READ, since all you are going to do with the key is read a value.
When requesting access to an object, it's best to ask for the minimum access required to get the job done.
This is like the old principle of mathematics: After you've proved something, try to weaken the hypothesis as much as possible and strengthen the conclusions as much as possible. In other words, once you've solved a problem, figure out the absolute minimum requirements for your solution to work, and figure out the largest amount of information your solution produces.
On the other hand, if you get an Access denied error when trying to use a handle, then the problem is that you didn't open the handle with enough access.
HKEY hkey;
LONG lError = RegOpenKeyEx(hkeyRoot, subkeyName, 0, KEY_READ, &hkey);
if (lError == ERROR_SUCCESS) {
  DWORD dwData = 1;
  // Fails with Access denied: the key was opened only for reading.
  lError = RegSetValueEx(hkey, TEXT("ValueName"), 0, REG_DWORD,
                         (const BYTE*)&dwData, sizeof(dwData));
  if (lError == ERROR_SUCCESS) {
    .. the value was written ..
  }
  RegCloseKey(hkey);
}
Here, the RegOpenKeyEx succeeds, but the RegSetValueEx fails. That's because the registry key was opened for KEY_READ access, but the RegSetValueEx operation requires KEY_SET_VALUE access. To fix this, you need to open the key with the access you actually want:
HKEY hkey;
LONG lError = RegOpenKeyEx(hkeyRoot, subkeyName, 0, KEY_SET_VALUE, &hkey);
if (lError == ERROR_SUCCESS) {
  DWORD dwData = 1;
  lError = RegSetValueEx(hkey, TEXT("ValueName"), 0, REG_DWORD,
                         (const BYTE*)&dwData, sizeof(dwData));
  if (lError == ERROR_SUCCESS) {
    .. the value was written ..
  }
  RegCloseKey(hkey);
}
When requesting access to an object, it's best to ask for the minimum access required to get the job done, but no less.
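The two categories can be modeled with a few lines of bit arithmetic. This is a toy model for illustration only: ToyHandle, ToyOpen, and ToyUse are hypothetical names, not Windows APIs, and the real access check involves security descriptors and tokens rather than a single mask. The two constants do carry the numeric values of the real KEY_QUERY_VALUE and KEY_SET_VALUE rights.

```cpp
#include <cstdint>

// Toy model for illustration only; none of these are Windows APIs.
using AccessMask = std::uint32_t;

constexpr AccessMask KeyQueryValue = 0x0001; // same value as KEY_QUERY_VALUE
constexpr AccessMask KeySetValue   = 0x0002; // same value as KEY_SET_VALUE

struct ToyHandle {
    bool valid;         // false means the open itself was denied
    AccessMask granted; // the access that was requested at open time
};

// Category 1: the open fails if you ask for more than the object allows you.
constexpr ToyHandle ToyOpen(AccessMask allowedByObject, AccessMask desired)
{
    return (desired & ~allowedByObject) == 0 ? ToyHandle{ true, desired }
                                             : ToyHandle{ false, 0 };
}

// Category 2: the use fails if the handle was opened with less access
// than the operation requires, even if the object would have allowed it.
constexpr bool ToyUse(ToyHandle handle, AccessMask required)
{
    return handle.valid && (handle.granted & required) == required;
}
```

In this model, asking for both rights on an object that allows only KeyQueryValue fails at ToyOpen, while opening with KeyQueryValue and then attempting an operation that needs KeySetValue fails at ToyUse.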
Armed with this information, you can solve this problem:
In the main thread, we create an event like this:
TheEvent = CreateEvent(NULL, TRUE, FALSE, name);
A worker thread opens the event like this:
EventHandle = OpenEvent(SYNCHRONIZE, FALSE, name);
The OpenEvent succeeds, but when we try to use the handle, we get Access denied:
SetEvent(EventHandle);
On the other hand, if the worker thread uses the CreateEvent function to get the handle, then the SetEvent succeeds.
What are we doing wrong?
A customer asked:
We're using the PropertySheet function to create a wizard. We've been able to remove almost all of the wizard elements, but we can't figure out how to get rid of the footer area. Is there a way to get rid of that?
The customer included a picture of what they were talking about, with the offending area highlighted:
[Picture: a wizard page containing placeholder text, with the footer area highlighted.]
Um, that footer area is where the Back/Next buttons go. If you remove the navigation buttons, then what you have isn't really a wizard any more.
What you've got there is a dialog box.
So use a dialog box.
You've ordered a cheeseburger and are trying to remove every last bit of cheese. Don't do that. Just order a hamburger if that's what you want.
At the announcement of Windows 10, Joe Belfiore remarked,
We want all these Windows 7 users to have the sentiment that yesterday they were driving a first-generation Prius... and now with Windows 10 it's like a Tesla.
Well, at least it's not a Ferrari.
Inside the Redmond Reality Distortion Field, everybody loves to compare their project to a Ferrari. My guess is that the people making these comparisons are young male engineers who love fast cars and dream someday of owning a Ferrari, or they are older male executives who love cars and already own Ferraris.
I fall into neither category.
When I hear somebody compare their project to a Ferrari, I think, "So your target audience is wealthy jerks? Your project is dangerously fast, over-engineered, absurdly expensive, and is always in the repair shop?"
I would rather a product be a Honda Civic or Toyota Corolla: Affordable, gets the job done with no fuss, high reliability, low TCO, no exotic parts or special driver training required. Especially if your product is an infrastructure component. Those things should be as boring as all get-out. The last thing anybody wants is exciting electrical wiring.
"Hi, yeah, um, nobody can print because somebody took a corner too hard and dinged the tire and suspension, and the printer is up on the lift right now, and the shop says it'll be around $5,000 to fix it, plus $500 for a new tire. But trust me, once it's fixed, we can go from 0ppm to 60ppm in three seconds. Sure we use twice as much toner, but man that printer is sweet!"
One of my colleagues fell somewhere in between the two categories above: He was a young male engineer who owned a Ferrari. It was a used Ferrari, but still. Ferrari. The problem was that the Ferrari was always in the repair shop. His solution: Buy a second Ferrari.
This is not a solution available to most people.
Of course, this is also not a problem most people have.
Bonus chatter: The day after I wrote this, I was in a meeting where a team was showing off their project. The presenter said, "We created a Ferrari, basically."
I tend to be a bit sloppy with my desktop folder. How bad is it to have a ton of files on your desktop?
One consequence of having a ton of files on your desktop is that you're slowing down logon, because Explorer has to load up all the icons for your desktop when it starts up. Mind you, you are staring at the spinning dots while all this is going on, so you don't know that part of the time you're spending sitting and twiddling your thumbs is caused by all your desktop icons, but that's one of the things that's going on.
Another consequence of having all those files on your desktop is that they all need to get scanned to see if any of them are shortcuts with a hotkey, and to gather information about what programs they refer to so it can be used to provide the icon for a grouped icon on the taskbar.
It's not the end of the world to have a lot of files on your desktop, but you did ask.
Okay, you didn't ask, but I answered it anyway.
One horrible gotcha of the CoGetInterfaceAndReleaseStream function is that it releases the stream. This is a holdover from the old days before smart pointers. The function released the stream to save you from having to call Release yourself. But nowadays, everybody is using smart pointers, so you never had to type Release to begin with. The problem is that you can fall into a double-Release situation without realizing it.
// All of this code is wrong

void GetTheInterface(REFIID iid, void** ppv)
{
  Microsoft::WRL::ComPtr<IStream> stream;
  GetTheStream(&stream);
  CoGetInterfaceAndReleaseStream(stream.Get(), iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  ATL::CComPtr<IStream> stream;
  GetTheStream(&stream);
  CoGetInterfaceAndReleaseStream(stream, iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  _com_ptr_t<IStream> stream;
  GetTheStream(&stream);
  CoGetInterfaceAndReleaseStream(stream, iid, ppv);
}

struct Releaser
{
  void operator()(IUnknown* p) { if (p) p->Release(); }
};

void GetTheInterface(REFIID iid, void** ppv)
{
  IStream* rawStream;
  GetTheStream(&rawStream);
  std::unique_ptr<IStream, Releaser> stream(rawStream);
  CoGetInterfaceAndReleaseStream(stream.get(), iid, ppv);
}
All of the code fragments above look completely natural, and they all have a bug because the smart pointer object stream is going to call Release at destruction, which will double-release the pointer because CoGetInterfaceAndReleaseStream already released it.
This type of bug is really hard to track down.
One way to fix this is to call the function and tell the smart pointer class that you are transferring ownership of the stream to the function.
void GetTheInterface(REFIID iid, void** ppv)
{
  Microsoft::WRL::ComPtr<IStream> stream;
  GetTheStream(&stream);
  CoGetInterfaceAndReleaseStream(stream.Detach(), iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  ATL::CComPtr<IStream> stream;
  GetTheStream(&stream);
  CoGetInterfaceAndReleaseStream(stream.Detach(), iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  _com_ptr_t<IStream> stream;
  GetTheStream(&stream);
  CoGetInterfaceAndReleaseStream(stream.Detach(), iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  IStream* rawStream;
  GetTheStream(&rawStream);
  std::unique_ptr<IStream, Releaser> stream(rawStream);
  CoGetInterfaceAndReleaseStream(stream.release(), iid, ppv);
}
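If you want to see the double release without involving COM at all, a toy model is enough. In this sketch, FakeStream, FakeReleaser, and ConsumeAndRelease are hypothetical stand-ins: ConsumeAndRelease mimics only the "callee releases the stream" part of the CoGetInterfaceAndReleaseStream contract, and FakeStream merely counts calls instead of destroying itself.

```cpp
#include <memory>

// Toy stand-ins for illustration; these are not COM types or APIs.
struct FakeStream {
    int refCount = 1;     // the caller starts out owning one reference
    int releaseCalls = 0; // how many times Release was called
    void Release() { ++releaseCalls; --refCount; } // a real object would delete itself at zero
};

struct FakeReleaser {
    void operator()(FakeStream* p) const { if (p) p->Release(); }
};

// Mimics the ownership contract: the callee releases the stream it is given.
void ConsumeAndRelease(FakeStream* stream) { stream->Release(); }

// Wrong: the callee releases, and then the smart pointer releases again.
void DoubleRelease(FakeStream* raw)
{
    std::unique_ptr<FakeStream, FakeReleaser> stream(raw);
    ConsumeAndRelease(stream.get());
} // destructor calls Release a second time

// Right: ownership is handed to the callee, so the smart pointer is empty
// by the time its destructor runs.
void SingleRelease(FakeStream* raw)
{
    std::unique_ptr<FakeStream, FakeReleaser> stream(raw);
    ConsumeAndRelease(stream.release());
}
```

With a real COM object, the second Release in the wrong version lands on an object that may already have been destroyed, which is why the failure tends to show up far from the code that caused it.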
Another way to fix this is to simply stop using CoGetInterfaceAndReleaseStream with smart pointers, because the function was designed for dumb pointers. For smart pointers, use CoUnmarshalInterface.
void GetTheInterface(REFIID iid, void** ppv)
{
  Microsoft::WRL::ComPtr<IStream> stream;
  GetTheStream(&stream);
  CoUnmarshalInterface(stream.Get(), iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  ATL::CComPtr<IStream> stream;
  GetTheStream(&stream);
  CoUnmarshalInterface(stream, iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  _com_ptr_t<IStream> stream;
  GetTheStream(&stream);
  CoUnmarshalInterface(stream, iid, ppv);
}

void GetTheInterface(REFIID iid, void** ppv)
{
  IStream* rawStream;
  GetTheStream(&rawStream);
  std::unique_ptr<IStream, Releaser> stream(rawStream);
  CoUnmarshalInterface(stream.get(), iid, ppv);
}
Last time, we looked at the rules for CoMarshalInterThreadInterfaceInStream and CoGetInterfaceAndReleaseStream, the functions you use for sharing an object with another thread in the simple case where there is only one other thread you want to share with, and you need to share it only once. Let's continue with the Q&A.
What if I want to unmarshal more than once?
In this case, you use the more general CoMarshalInterface. You can pass the MSHLFLAGS_TABLESTRONG flag to indicate that you want to be able to unmarshal many times. In that case, you need to tell COM when you are finished unmarshaling so it knows when to clean up, because it cannot assume that you are finished after the first unmarshal. The pattern goes like this:
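On the originating thread, create a stream and call CoMarshalInterface with the MSHLFLAGS_TABLESTRONG flag.
On the receiving side, call CoUnmarshalInterface on the stream as many times as you need.¹
When nobody needs to unmarshal any more, call CoReleaseMarshalData on the stream so that COM can clean up.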
What is the relationship between CoMarshalInterThreadInterfaceInStream and CoMarshalInterface?
The CoMarshalInterThreadInterfaceInStream function is a helper function that does the following:
It creates a stream with CreateStreamOnHGlobal.
It calls CoMarshalInterface on that stream with MSHCTX_INPROC and MSHLFLAGS_NORMAL.
Similarly, CoGetInterfaceAndReleaseStream is a helper function that does the following:
It calls CoUnmarshalInterface on the stream.
It calls IStream::Release on the stream.
Since a one-shot marshal to another thread within the same process is by far the most common case, the helper functions exist to let you get the job done with just one function call on each side.
What if I want to marshal only once, but to another process?
Again, you need to use the more general CoMarshalInterface function. This time, you pass the MSHCTX_LOCAL flag if you intend to marshal to another process on the same computer, or the MSHCTX_DIFFERENTMACHINE flag if you intend to marshal to another computer. For the marshal flags, use MSHLFLAGS_NORMAL to indicate that you want a one-shot marshal. The recipient can unmarshal with CoGetInterfaceAndReleaseStream as before.
What if I want to marshal to another process and unmarshal more than once?
This is just combining the two axes. On the marshaling side, you do the same as a one-shot cross-process marshal, except you pass the MSHLFLAGS_TABLESTRONG flag to indicate that you want to be able to unmarshal many times. You then send copies of that stream to all your intended recipients, and each of them calls CoGetInterfaceAndReleaseStream, just like before.
Can you marshal a proxy? Does it get all Inception-like?
Go ahead and marshal a proxy. COM detects that you're marshaling a proxy and does the Right Thing. For example, if you marshal a proxy back to the originating thread, then when you unmarshal, you get a direct pointer again!
¹ If the thread wants to unmarshal from the stream more than once, it could call CoUnmarshalInterface and not release the stream immediately. Then each time it wants to unmarshal from the stream, it calls CoUnmarshalInterface again, releasing the stream only when it has decided that it will not do any more unmarshaling. This seems silly because once you unmarshal the first time, you can just AddRef the pointer if you want to make another copy. I guess this is for the case where the thread wants to pass the stream off to yet another thread? Definitely a fringe case.