The technique of taking symbols along for the ride is quite handy if that's what you want, but sometimes you don't actually want it. For example, a symbol taken along for the ride may create conflicts or create unwanted dependencies.
Here's an example: Suppose you have a library called stuff.lib where you put functions that are used by various modules in different projects. One of the files in your library might look like this:
// filedatestuff.cpp

BOOL GetFileCreationTimeW(
    LPCWSTR pszFile,
    FILETIME *pft)
{
    WIN32_FILE_ATTRIBUTE_DATA wfad;
    BOOL fSuccess = GetFileAttributesExW(pszFile,
                                         GetFileExInfoStandard,
                                         &wfad);
    if (fSuccess) {
        *pft = wfad.ftCreationTime;
    } else {
        pft->dwLowDateTime = 0;
        pft->dwHighDateTime = 0;
    }
    return fSuccess;
}

BOOL GetFileCreationTimeAsStringW(
    LPCWSTR pszFile,
    LPWSTR pszBuf,
    UINT cchBuf)
{
    FILETIME ft;
    BOOL fSuccess = GetFileCreationTimeW(pszFile, &ft);
    if (fSuccess) {
        fSuccess = SHFormatDateTimeW(&ft, NULL, pszBuf, cchBuf) > 0;
    }
    return fSuccess;
}
Things are working out great, people like the helper functions in your library, and then you get a bug report:
When my program calls the GetFileCreationTimeW function, I get a linker error: unresolved external: __imp__SHFormatDateTimeW. If I remove my call to GetFileCreationTimeW, then my program builds fine.
You scratch your head. "The program is calling GetFileCreationTimeW, but that function doesn't call SHFormatDateTimeW, so why are we getting an unresolved external error? And why hasn't anybody else run into this problem before?"
First question first. Why are we getting an unresolved external error for a nonexistent external dependency?
Because the GetFileCreationTimeAsStringW function got taken along for the ride. When the customer's program called GetFileCreationTimeW, that pulled in the filedatestuff.obj file, and that OBJ file contains both GetFileCreationTimeW and GetFileCreationTimeAsStringW. Since they are in the same OBJ file, pulling in one function pulls in all of them.
The fix is to split the filedatestuff.cpp file into two files, one for each function. That way, when you pull in one function, nobody else comes along for the ride.
Now to the second half of the question: Why did nobody run into this problem before?
The GetFileCreationTimeW function has a dependency on GetFileAttributesExW, which is a function in KERNEL32.DLL. On the other hand, the GetFileCreationTimeAsStringW function has a dependency on SHFormatDateTimeW, which is a function in SHLWAPI.DLL. If somebody lists KERNEL32.LIB as a dependent library in their project, but they don't include SHLWAPI.LIB on that list, then they will encounter this problem because the linker will pull in the reference to SHFormatDateTimeW and have no way of resolving it.
Nobody ran into this before because SHLWAPI.LIB has lots of cute little functions in it, so most people include it in their project. Only if somebody is being frugal and leaving SHLWAPI.LIB out of their project will they run into this problem.
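To make the effect of the split concrete, here is a small model in portable C++. It is only a sketch of the bookkeeping, not real linker code: Layout, ExternalsPulledIn, and MakeLayout are names invented for this illustration, the OBJ names passed to MakeLayout are whatever you choose, and the model deliberately ignores calls between functions inside the library.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical description of a library: which OBJ each function lives
// in, and which symbols outside the library each function references.
struct Layout {
    std::map<std::string, std::string> objOf;                // function -> OBJ
    std::map<std::string, std::set<std::string>> externsOf;  // function -> externals
};

// Externals the linker must resolve when a program calls `func`.
// Every function that shares an OBJ with `func` contributes its
// externals too, because the whole OBJ comes along for the ride.
std::set<std::string> ExternalsPulledIn(const Layout& lib,
                                        const std::string& func) {
    std::set<std::string> result;
    const std::string& obj = lib.objOf.at(func);
    for (const auto& entry : lib.objOf) {
        if (entry.second != obj) continue;  // lives in a different OBJ
        const auto& externs = lib.externsOf.at(entry.first);
        result.insert(externs.begin(), externs.end());
    }
    return result;
}

// Build the two-function library from the article, placing each
// function in the OBJ named by the caller.
Layout MakeLayout(const std::string& objForTime,
                  const std::string& objForString) {
    Layout lib;
    lib.objOf["GetFileCreationTimeW"] = objForTime;
    lib.objOf["GetFileCreationTimeAsStringW"] = objForString;
    lib.externsOf["GetFileCreationTimeW"].insert("GetFileAttributesExW");
    lib.externsOf["GetFileCreationTimeAsStringW"].insert("SHFormatDateTimeW");
    return lib;
}
```

If both functions are placed in filedatestuff.obj, asking for GetFileCreationTimeW reports both GetFileAttributesExW and SHFormatDateTimeW. If they are placed in two different OBJs, only GetFileAttributesExW is reported.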
Bonus chatter: The suggestion to split the file into two will work, but if you are really clever, you can still do some consolidation. Instead of splitting up files by functional group (for example, "all FILETIME functions"), you need to split them up based on their dependencies ("functions that are dependent solely on SHLWAPI.LIB"). Of course, this type of organization may make the code harder to follow ("Why did you put GetFileCreationTimeAsStringW and HashString in the same file?"), so you have to balance this against maintainability and readability. For example, somebody who is not aware of the classical model for linking may add a function to the file that has a dependency on SHELL32.DLL, and now your careful separation has fallen apart.
Bohemian Rhapsody was not part of my world growing up, so I view the continuing cultural fascination with the piece with detached confusion.
The hallmark of cultural preoccupation is the fact that the Wikipedia entry deconstructs the piece moment by moment, clocking in at over 2000 words, far in excess of the Wikipedia recommendation of a 60-word summary for a 6-minute piece (10 words per minute). And longer than the entire Wikipedia page for Ruth Bader Ginsburg.
If you study the classical model for linking, you'll see that OBJ files provided directly to the linker have a special property: They are added to the module even if nobody requests a symbol from them.
OBJs bundled into a library are pulled into the module only if they are needed to resolve a needed symbol request. If nobody needs a symbol in the OBJ, then the OBJ doesn't get added to the module. On the other hand, OBJs handed directly to the linker get added to the module whether anybody wants them or not.
Last time, we learned about the along for the ride technique which lets you pull components into a module even if they were not explicitly requested by an OBJ. Today's problem is sort of the reverse of this: If you move an OBJ from the explicit OBJ list to a library, then somebody has to remember to take it for a ride.
Some time ago, Larry Osterman described how some components use sections to have one component automatically register itself with another component when the OBJ is pulled into the module. But in order for that to work, you have to make sure the OBJ gets pulled into the module in the first place. (That's what Larry's CallForceLoad function is for: By putting it in an explicit OBJ, that function forces the OBJ from the LIB to be pulled in. And then, since nobody ever calls CallForceLoad, a later linker pass discards it as an unused function.)
Another consequence of the algorithm by which the linker pulls OBJs from libraries to form a module is that if a needed symbol can be satisfied without consulting a library, then the OBJ in the library will not be used. This lets you override a symbol in a library by explicitly placing it in an OBJ. You can also override a symbol in a library by putting it in another library that gets searched ahead of the one you want to override. But you can't override a symbol in an explicit OBJ, because those are part of the initial conditions.
Exercise:
Discuss this user's analysis of a linker issue.
I have three files:
// awesome1.cpp
int index;

// awesome2.cpp
extern int index;
void setawesomeindex(int i)
{
    index = i;
}

// main.cpp
int index = 0;
int main(int, char**)
{
    setawesomeindex(3);
    return index;
}
When I link the object files together, I get an error complaining that index is multiply defined, as expected. On the other hand, if I put awesome1.cpp and awesome2.cpp into a library, then the program links fine, but the two copies of the index variable were merged by the linker! When I set the awesome index to 3, it also changes my main program's variable index which has the same name. Why is the linker merging my variables, and how can I keep them separate?
When I share my awesome.lib with others, I don't want to have to give them a list of all my global variables and say, "Don't create a global variable with any of these names, because they will conflict with my library." (And that would also prevent me from adding any new global variables to my library.)
Exercise: Clarify the following remark by making it more precise and calling out the cases where it is false. "Multiple definitions for a symbol are allowed if they appear in LIBs."
Exercise (harder): The printf function is in a bit of a pickle regarding whether it should support the floating point formats. If it includes them unconditionally, then its use of the floating point data types causes the floating point emulation library to be linked into the module, even if the module didn't otherwise use floating point! Use what you've learned so far this week to provide one way that the printf function could determine whether it should include floating point format support based on whether the module uses floating point.
Last time, we learned the basics of the classical model for linking. Today, we'll look at the historical background for that model, and how the model is exploited by libraries.
In the classical model, compilers and assemblers consume source code and spit out an OBJ file. They do as much as they can, but eventually they get stuck because they don't have the entire module at their disposal. To record the work remaining to be done, the OBJ file contains various sections: a data section, a code section (historically and confusingly called text), an uninitialized data section, and so on. The linker resolves symbols, and then for each OBJ file that got pulled into the module, it combines all the code sections into one giant code section, all the data sections into one giant data section, and so on.
One thing you may have noticed is that the unit of consumption is the OBJ file. If an OBJ file is added to the module, the whole thing gets added, even if you needed only a tiny part of the OBJ file. Historically, the reason for this rule is that the compilers and assemblers did not include information in the OBJ file to indicate how to separate all the little pieces. It's like if somebody said, "Can you get me a portable mp3 player?" and the only thing available in the library was a smartphone. Sure, it plays mp3 files, but there's a lot of other electronic junk in there that you didn't ask for, but it came along for the ride. And you don't know how to disassemble the smartphone and extract just the mp3-player part.
This behavior is actually exploited as a feature, because it allows for tricks like this:
/* magicnumber.h */
extern int magicNumber;

/* magicnumber.c */
int magicNumber;

class InitMagicNumber
{
public:
    InitMagicNumber() { magicNumber = ...; }
} g_InitMagicNumber;
I'm not going to go into the magic of how the compiler knows to construct the g_InitMagicNumber object at module entry; I'll let you read up on that.
The point is that if anybody in the module refers to the magicNumber variable, then that causes magicnumber.obj to be pulled into the module, which brings in not just the magicNumber variable, but also the g_InitMagicNumber object, which initializes the magic number when the process starts.
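Here is a single-file sketch of the initialization half of the trick, with the elided computation filled in by a placeholder. ComputeMagicNumber and the value 42 are assumptions made purely so the example compiles. The sketch shows only that the global object's constructor runs before main; the pull-in-by-reference behavior depends on the linker and cannot be demonstrated in one file.

```cpp
// magicnumber.cpp (single-file sketch)
int magicNumber;

// Hypothetical stand-in for whatever the real initializer computes.
static int ComputeMagicNumber() { return 42; }

class InitMagicNumber {
public:
    // Runs during static initialization of this translation unit.
    InitMagicNumber() { magicNumber = ComputeMagicNumber(); }
};

// The object exists only for its constructor's side effect. Nobody
// refers to it by name; it rides along with magicNumber's OBJ.
static InitMagicNumber g_InitMagicNumber;
```

Any code in the same file that reads magicNumber from main onward sees the initialized value, even though nothing ever calls the constructor explicitly.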
One place the C runtime library took advantage of this was in deciding whether or not to include floating point support.
As you may recall, the 8086 processor did not have native floating point support. You had to buy the 8087 coprocessor for that. It was therefore customary for programs of that era to include a floating point emulation library if they did any floating point arithmetic. The library would redirect floating point operations from the coprocessor to the emulator.
The floating point emulation library was pretty hefty, and it would have been a waste to include it for programs that didn't use floating point (which was most of them), so the compiler used a trick to allow it to pull in the floating point library only if the program used floating point: If you used floating point, then the compiler added a needed symbol to your OBJ file: __fltused.
That magical __fltused symbol was marked as provided by... the floating point emulation library!
The linker found the symbol in an OBJ in the floating point emulation library, and that served as the loose thread that caused the rest of the floating point emulation library to be pulled into your module.
Next time, we'll look at the interaction between OBJ files and LIB files.
Bonus reading: Larry Osterman gives another example of this trick.
The classical model for linking goes like this:
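Each OBJ file contains two lists of symbols: provided symbols, which are the symbols the OBJ file contains definitions for, and needed symbols, which are the symbols the OBJ file would like definitions for.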
(The official terms for these are exported and imported, but I will use provided and needed to avoid confusion with the concepts of exported and imported functions in DLLs, and because provided and needed more clearly captures what the two lists are for.)
Naturally, there is other bookkeeping information in there. For example, for provided symbols, not only is the name given, but also additional information on locating the definition. Similarly, for needed symbols, in addition to the name, there is also information about what should be done once its definition has been located.
Collectively, provided and needed symbols are known as symbols with external linkage, or just externals for short. (Of course, by giving them the name symbols with external linkage, you would expect there to be things known as symbols with internal linkage, and you'd be right.)
For example, consider this file:
// inventory.c
extern int InStock(int id);

int GetNextInStock()
{
    static int Current = 0;
    while (!InStock(++Current)) { }
    return Current;
}
This very simple OBJ file has one provided symbol, GetNextInStock: That is the object defined in this file that can be used by other files. It also has one needed symbol, InStock: That is the object required by this file in order to work, but which the file itself did not provide a definition for. It's hoping that somebody else will define it. There's also a symbol with internal linkage: Current, but that's not important to the discussion, so I will ignore it from now on.
OBJ files can hang around on their own, or they can be bundled together into a LIB file.
When you ask the linker to generate a module, you hand it a list of OBJ files and a list of LIB files. The linker's goal is to resolve all of the needed symbols by matching them up to a provided symbol. Eventually, everything needed will be provided, and you have yourself a module.
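To do this, the linker keeps track of which symbols in the module are resolved and which are unresolved. A resolved symbol is one for which a provided symbol has been added to the module; under the classical model, a symbol can be resolved only once. An unresolved symbol is one for which a needed symbol has been added to the module, but for which no matching provided symbol has been found yet.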
Whenever the linker adds an OBJ file to the module, it goes through the list of provided and needed symbols and updates the list of symbols in the module. The algorithm for updating this list of symbols is obvious if you've been paying attention, because it is a simple matter of preserving the invariants described above.
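For each provided symbol in an OBJ file added to a module: If the symbol is already on the resolved list, raise an error complaining that the symbol is multiply defined. If it is on the unresolved list, move it to the resolved list. Otherwise, add it to the resolved list.
For each needed symbol in an OBJ file added to a module: If the symbol is already on the resolved list or the unresolved list, leave it where it is. Otherwise, add it to the unresolved list.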
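The algorithm the linker uses to resolve symbols goes like this: Start by adding all the explicitly provided OBJ files to the module. Then, as long as there is an unresolved symbol, look through the LIBs for the first OBJ that provides it. If one is found, add that OBJ to the module. If none is found, raise an error complaining of an unresolved external.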
That's all there is to linking and unresolved externals. At least, that's all there is to the classical model.
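The classical model is small enough to simulate. What follows is a sketch in portable C++ of the bookkeeping as this series describes it (explicit OBJs form the initial conditions, library OBJs are pulled in only to satisfy a needed symbol, and a second definition is an error). It is not how any real linker is implemented, and Obj, Lib, Module, AddObj, FindProvider, and Link are names invented for the illustration.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of an OBJ file: a name plus its two symbol lists.
struct Obj {
    std::string name;
    std::vector<std::string> provided;
    std::vector<std::string> needed;
};

using Lib = std::vector<Obj>;  // a LIB is just a bundle of OBJs

struct Module {
    std::set<std::string> resolved;
    std::set<std::string> unresolved;
    std::vector<std::string> objs;  // names of OBJs added, in order
};

// Add one OBJ to the module, keeping the two symbol lists consistent.
void AddObj(Module& m, const Obj& obj) {
    m.objs.push_back(obj.name);
    for (const auto& sym : obj.provided) {
        if (m.resolved.count(sym)) {
            throw std::runtime_error("multiply defined: " + sym);
        }
        m.unresolved.erase(sym);  // no-op if nobody was waiting for it
        m.resolved.insert(sym);
    }
    for (const auto& sym : obj.needed) {
        if (!m.resolved.count(sym)) {
            m.unresolved.insert(sym);  // no-op if already waiting
        }
    }
}

// Find the first OBJ, searching the LIBs in order, that provides sym.
const Obj* FindProvider(const std::vector<Lib>& libs,
                        const std::string& sym) {
    for (const auto& lib : libs)
        for (const auto& obj : lib)
            for (const auto& p : obj.provided)
                if (p == sym) return &obj;
    return nullptr;
}

Module Link(const std::vector<Obj>& objs, const std::vector<Lib>& libs) {
    Module m;
    for (const auto& obj : objs) AddObj(m, obj);  // initial conditions
    while (!m.unresolved.empty()) {
        const std::string sym = *m.unresolved.begin();  // copy: AddObj erases it
        const Obj* provider = FindProvider(libs, sym);
        if (!provider) {
            throw std::runtime_error("unresolved external: " + sym);
        }
        AddObj(m, *provider);
    }
    return m;
}
```

Feeding it a main.obj that needs GetNextInStock, plus a library containing inventory.obj, an OBJ that provides InStock, and an unrelated OBJ, pulls in the first two library OBJs and leaves the unrelated one out.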
Next time, we'll start looking at the consequences of the rules for classical linking.
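Sidebar: Modern linkers introduce lots of non-classical behavior. For example, the rule that a second definition of an already-resolved symbol is a multiply-defined error has been relaxed: if both the original definition and the new one are marked __declspec(selectany), the linker quietly keeps one of them; otherwise, it raises the error as before.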
Another example of non-classical behavior is dead code removal. If you pass the /OPT:REF linker flag, then after all externals have been resolved, the linker goes through and starts discarding functions and data that are never referenced, taking advantage of another non-classical feature (packaged functions) to know where each function begins and ends.
But I'm going to stick with the classical model, because you need to understand classical linking before you can study non-classical behavior. Sort of how in physics, you need to learn your classical mechanics before you study relativity.
Occasionally, a customer will ask, "What is Rundll32.exe and when should I use it instead of just writing a standalone exe?"
The guidance is very simple: Don't use rundll32. Just write your standalone exe.
Rundll32 is a leftover from Windows 95, and it has been deprecated since at least Windows Vista because it violates a lot of modern engineering guidelines. If you run something via Rundll32, then you lose the ability to tailor the execution environment to the thing you're running. Instead, the environment is set up for whatever Rundll32 requests.
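For example, Rundll32 makes its own choices for settings such as TSAWARE, LARGEADDRESSAWARE, and HeapEnableTerminationOnCorruption, and your code has to live with them.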
You get the idea.
Note also that Rundll32 assumes that the entry point you provide corresponds to a task which pumps messages, since it creates a window on your behalf and passes it as the first parameter. A common mistake is writing a Rundll32 entry point for a long-running task that does not pump messages. The result is an unresponsive window that clogs up broadcasts.
Digging deeper, one customer explained that they asked for guidance making this choice because they wanted to create a scheduled task that runs code inside a DLL, and they wanted to decide whether to create a Rundll32 entry point in their DLL, or whether they should just create a custom executable whose sole job is loading the DLL and calling the custom code.
By phrasing it as an either/or question, they missed the third (correct) option: Create your scheduled task with an IComHandlerAction that specifies a CLSID your DLL implements.
More than once, a customer has noticed that running the exact same program under the debugger rather than standalone causes it to change behavior. And not just in the "oh, the timing of various operations changed to hit different race conditions" sense, but in much more fundamental ways like "my program runs really slow" or "my program crashes in a totally different location" or (even more frustrating) "my bug goes away".
What's going on? I'm not even switching between the retail and debug versions of my program, so I'm not a victim of changing program semantics in the debug build.
When a program is running under the debugger, some parts of the system behave differently. One example is that the CloseHandle function raises an exception (I believe it's STATUS_INVALID_HANDLE but don't quote me) if you ask it to close a handle that isn't open. But the one that catches most people is that when run under the debugger, an alternate heap is used. This alternate heap has a different memory layout, and it does extra work when allocating and freeing memory, such as filling newly allocated memory with a known sentinel value, to help catch common heap errors.
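To see why a sentinel fill can change a program's behavior, consider this sketch of a debug-style allocator in portable C++. It is not the Windows debug heap: DebugAlloc and LooksUninitialized are hypothetical helpers, and the fill byte 0xCD is an arbitrary choice for the illustration, not a claim about the pattern Windows uses.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Arbitrary fill byte chosen for this sketch.
constexpr unsigned char kAllocFill = 0xCD;

// Allocate a block and fill it with the sentinel, so that code reading
// memory it never initialized sees a recognizable pattern instead of
// whatever happened to be left in the block.
void* DebugAlloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p) std::memset(p, kAllocFill, size);
    return p;
}

// Report whether every byte of the block still holds the sentinel.
bool LooksUninitialized(const void* p, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] != kAllocFill) return false;
    }
    return true;
}
```

A program that accidentally relies on fresh memory being zero, or on stale contents from an earlier allocation, will see different values under an allocator like this one, which is one way a bug can appear or disappear depending on which heap is in use.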
But this change in behavior can make your debugging harder or impossible.
So much for people's suggestions to switch to a stricter implementation of the Windows API when a debugger is attached.
On Windows XP and higher, you can disable the debug heap even when debugging. If you are using a dbgeng-based debugger like ntsd or WinDbg, you can pass the -hd command line switch. If you are using Visual Studio, you can set the _NO_DEBUG_HEAP environment variable to 1.
If you are debugging on a version of Windows prior to Windows XP, you can start the process without a debugger, then connect a debugger to the live process. The decision to use the debug heap is made at process startup, so connecting the debugger afterwards ensures that the retail heap is chosen.
Miscellaneous notes, largely unorganized.
MOV EDI, EDI
/hotpatch
Some time ago, I noted that in order to format a USB drive as NTFS, you have to promise to go through the removal dialog.
But wait, NTFS is a journaling file system. The whole point of a journaling file system is that it is robust to these sorts of catastrophic failures. So how can surprise removal of an NTFS-formatted USB drive result in corruption?
Well, no, it doesn't result in corruption, at least from NTFS's point of view. The file system data structures remain intact (or at least can be repaired from the change journal) regardless of when you yank the drive out of the computer. So from the file system's point of view, the answer is "Go ahead, yank the drive any time you want!"
This is a case of looking at the world through filesystem-colored glasses.
Sure, the file system data structures are intact, but what about the user's data? The file system's autopilot system was careful to land the plane, but yanking the drive killed the passengers.
Consider this from the user's point of view: The user copies a large file to the USB thumb drive. Chug chug chug. Eventually, the file copy dialog reports 100% success. As soon as that happens, the user yanks the USB thumb drive out of the computer.
The user goes home and plugs in the USB thumb drive, and finds that the file is corrupted.
"Wait, you told me the file was copied!"
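Here's what happened:
1. The file copy dialog creates the destination file and sets its size to the final size.
2. The file copy dialog writes the data to the file and closes the handle.
3. The file system puts the data in the disk cache and reports success.
4. The file copy dialog says, "All done!"
5. The user yanks the USB thumb drive out of the computer.
6. Later, the disk cache tries to flush the data to the drive and discovers that the drive is gone. The dirty data sitting in the cache never makes it to the drive.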
Now you insert the USB drive into another computer. Since NTFS is a journaling file system, it can auto-repair the internal data structures that are used to keep track of files, so the drive itself remains logically consistent. The file is correctly set to the final size, and its directory entry is properly linked in. But the data you wrote to the file? It never made it. The journal didn't have a copy of the data you wrote in step 2. It only got as far as the metadata updates from step 1.
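The gap between what the journal protects and what the cache is still holding can be modeled in a few lines. This is a toy sketch in portable C++, not a description of how NTFS or the Windows disk cache is implemented. Drive, CachedVolume, and their members are names invented for the illustration, and the model assumes that metadata updates reach the device right away while file data waits in a write-behind cache.

```cpp
#include <cstddef>
#include <string>

// What is physically on the removable drive.
struct Drive {
    std::size_t fileSize = 0;  // metadata (stands in for journaled state)
    std::string data;          // file contents that actually reached the drive
};

// A volume with write-behind caching enabled.
class CachedVolume {
public:
    explicit CachedVolume(Drive& drive) : drive_(drive) {}

    // Metadata update: recorded on the device immediately in this model.
    void SetFileSize(std::size_t size) { drive_.fileSize = size; }

    // Data write: lands in the cache and reports success right away.
    void Write(const std::string& bytes) { pending_ += bytes; }

    // What a safe removal does: push the cached data to the device.
    void Flush() {
        drive_.data += pending_;
        pending_.clear();
    }

private:
    Drive& drive_;
    std::string pending_;  // dirty data not yet on the device
};
```

If the CachedVolume goes away without Flush being called, which is the model's version of yanking the drive, the Drive reports the final file size but holds none of the data. Calling Flush first is what makes the contents match the size.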
That's why the default for USB thumb drives is to optimize for Quick Removal. Because people expect to be able to yank USB thumb drives out of the computer as soon as the computer says that it's done.
If you want to format a USB thumb drive as NTFS, you have to specify that you are Optimizing for Performance and that you promise to warn the file system before yanking the drive, so that it can flush out all the data sitting in the disk cache.
Even though NTFS is robust and can recover from the surprise removal, that robustness does not extend to the internal consistency of the data you lost. From NTFS's point of view, that's just a passenger.
Update: It seems that people missed the first sentence of this article. Write-behind caching is disabled by default on removable drives. You get into this mess only if you override the default. And on the dialog box that lets you override the default, there is a warning message that says that when you enable write-behind caching, you must use the Safely Remove Hardware icon instead of just yanking the drive. In other words, this problem occurs because you explicitly changed a setting from the safe setting to the dangerous one, and you ignored the warning that came with the dangerous setting, and now you're complaining that the setting is dangerous.