Holy cow, I wrote a book!
Now we will use fibers to fight back. Before you decide to use fibers in your programs, make sure to read the dire warnings at the end of this article. My goal here is to show one use of fibers, not to say that fibers are the answer to all your problems. Fibers can create more problems than they solve. We'll come back to all the dire warnings later.
As with most clever ideas, it has a simple kernel: Run the caller and the enumerator on separate fibers, so that each one gets its own stack.
```cpp
#include <windows.h>
#include <shlwapi.h>
#include <stdio.h>
#include <strsafe.h>

enum FEFOUND {
    FEF_FILE,     // found a file
    FEF_DIR,      // found a directory
    FEF_LEAVEDIR, // leaving a directory
    FEF_DONE,     // finished
};

enum FERESULT {
    FER_CONTINUE, // continue enumerating
                  // (if directory: recurse into it)
    FER_SKIP,     // skip directory (do not recurse)
};

class __declspec(novtable) FiberEnumerator {
public:
    FiberEnumerator();
    ~FiberEnumerator();

    FEFOUND Next();
    void SetResult(FERESULT fer) { m_fer = fer; }
    void Skip() { SetResult(FER_SKIP); }

    virtual LPCTSTR GetCurDir() = 0;
    virtual LPCTSTR GetCurPath() = 0;
    virtual const WIN32_FIND_DATA* GetCurFindData() = 0;

protected:
    virtual void FiberProc() = 0;

    static void DECLSPEC_NORETURN WINAPI
        s_FiberProc(void* pvContext);

    FERESULT Produce(FEFOUND fef);

protected:
    void* m_hfibCaller;
    void* m_hfibEnumerator;
    FEFOUND m_fef;
    FERESULT m_fer;
};

FiberEnumerator::FiberEnumerator()
    : m_fer(FER_CONTINUE)
{
    m_hfibEnumerator = CreateFiber(0, s_FiberProc, this);
}

FiberEnumerator::~FiberEnumerator()
{
    DeleteFiber(m_hfibEnumerator);
}

// WINAPI must match the declaration (and LPFIBER_START_ROUTINE).
void DECLSPEC_NORETURN WINAPI FiberEnumerator::
    s_FiberProc(void *pvContext)
{
    FiberEnumerator* self =
        reinterpret_cast<FiberEnumerator*>(pvContext);
    self->FiberProc();

    // Theoretically, we need only produce Done once,
    // but keep looping in case a consumer gets
    // confused and asks for the Next() item even
    // though we're Done.
    for (;;) self->Produce(FEF_DONE);
}
```
This helper class does the basic bookkeeping of fiber-based enumeration. At construction, it creates the fiber that will produce the enumeration; the fiber that consumes the enumeration is recorded each time Next() is called. At destruction, it cleans up the producer fiber. The derived class is expected to implement the FiberProc method and call Produce() every so often.
The real magic happens in the (somewhat anticlimactic) Produce() and Next() methods:
```cpp
FERESULT FiberEnumerator::Produce(FEFOUND fef)
{
    m_fef = fef;          // for Next() to retrieve
    m_fer = FER_CONTINUE; // default
    SwitchToFiber(m_hfibCaller);
    return m_fer;
}

FEFOUND FiberEnumerator::Next()
{
    m_hfibCaller = GetCurrentFiber();
    SwitchToFiber(m_hfibEnumerator);
    return m_fef;
}
```
To Produce() something, we remember the production code, pre-set the enumeration result to its default of FER_CONTINUE, and switch to the consumer fiber. When the consumer fiber comes back with an answer, we return it from Produce().
To get the next item, we remember the identity of the calling fiber, then switch to the enumerator fiber. This runs the enumerator until it decides to Produce() something, at which point we take the production code and return it.
That's all there is to it. The m_fef and m_fer members are for passing the parameters and results back and forth across the fiber boundary.
Okay, with that groundwork out of the way, writing the producer itself is rather anticlimactic.
Since we want to make things easy for the consumer, we use the interface the consumer would have designed, with some assistance from the helper class.
```cpp
class DirectoryTreeEnumerator : public FiberEnumerator {
public:
    DirectoryTreeEnumerator(LPCTSTR pszDir);
    ~DirectoryTreeEnumerator();

    LPCTSTR GetCurDir() { return m_pseCur->m_szDir; }
    LPCTSTR GetCurPath() { return m_szPath; }
    const WIN32_FIND_DATA* GetCurFindData()
        { return &m_pseCur->m_wfd; }

private:
    void FiberProc();
    void Enum();

    struct StackEntry {
        StackEntry* m_pseNext;
        HANDLE m_hfind;
        WIN32_FIND_DATA m_wfd;
        TCHAR m_szDir[MAX_PATH];
    };
    bool Push(StackEntry* pse);
    void Pop();

private:
    StackEntry *m_pseCur;
    TCHAR m_szPath[MAX_PATH];
};

DirectoryTreeEnumerator::
    DirectoryTreeEnumerator(LPCTSTR pszDir)
    : m_pseCur(NULL)
{
    StringCchCopy(m_szPath, MAX_PATH, pszDir);
}

DirectoryTreeEnumerator::~DirectoryTreeEnumerator()
{
    // If the consumer abandoned the enumeration early, the stack
    // entries still live on the enumerator fiber's stack, which
    // stays valid until the base class destructor deletes the fiber.
    while (m_pseCur) {
        Pop();
    }
}

bool DirectoryTreeEnumerator::
    Push(StackEntry* pse)
{
    pse->m_hfind = INVALID_HANDLE_VALUE; // so Pop() is safe if we fail early
    pse->m_pseNext = m_pseCur;
    m_pseCur = pse;
    return
        SUCCEEDED(StringCchCopy(pse->m_szDir, MAX_PATH, m_szPath)) &&
        PathCombine(m_szPath, pse->m_szDir, TEXT("*.*")) &&
        (pse->m_hfind = FindFirstFile(m_szPath, &pse->m_wfd)) !=
            INVALID_HANDLE_VALUE;
}

void DirectoryTreeEnumerator::Pop()
{
    StackEntry* pse = m_pseCur;
    if (pse->m_hfind != INVALID_HANDLE_VALUE) {
        FindClose(pse->m_hfind);
    }
    m_pseCur = pse->m_pseNext;
}

void DirectoryTreeEnumerator::FiberProc()
{
    Enum();
}

void DirectoryTreeEnumerator::Enum()
{
    StackEntry se;
    if (Push(&se)) {
        do {
            if (lstrcmp(se.m_wfd.cFileName, TEXT(".")) != 0 &&
                lstrcmp(se.m_wfd.cFileName, TEXT("..")) != 0 &&
                PathCombine(m_szPath, se.m_szDir, se.m_wfd.cFileName)) {
                FEFOUND fef = (se.m_wfd.dwFileAttributes &
                               FILE_ATTRIBUTE_DIRECTORY) ?
                              FEF_DIR : FEF_FILE;
                if (Produce(fef) == FER_CONTINUE && fef == FEF_DIR) {
                    Enum(); // recurse into the subdirectory we just produced
                }
            }
        } while (FindNextFile(se.m_hfind, &se.m_wfd));
    }
    Produce(FEF_LEAVEDIR);
    Pop();
}
```
As you can see, this class is a mix of the two previous classes. Like the consumer-based class, information about the item being enumerated is obtained by calling methods on the enumerator object. But like the callback-based version, the loop that generates the objects themselves is a very simple recursive function, with a call to Produce in place of a callback.
In fact, it's even simpler than the callback-based version, since we don't have to worry about the FER_STOP code. If the consumer wants to stop enumeration, the consumer simply stops calling Next().
Most of the complexity in the class is just bookkeeping to permit abandoning the enumeration prematurely.
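The fiber handoff above is Windows-specific. For readers who want to experiment elsewhere, here is a minimal portable sketch of the same shape, assuming a POSIX system whose C library still provides `<ucontext.h>` (glibc does): `swapcontext` stands in for SwitchToFiber, and a hypothetical in-memory `Node` tree stands in for the file system. `ContextEnumerator`, `Node`, and the `FOUND_*` codes are illustrative names, not part of the code above, and skipping is left out to keep it short.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory tree standing in for the file system.
struct Node {
    std::string name;
    bool isDir;
    std::vector<Node> children;
};

enum Found { FOUND_FILE, FOUND_DIR, FOUND_LEAVEDIR, FOUND_DONE };

class ContextEnumerator {
public:
    explicit ContextEnumerator(const Node& root)
        : m_root(root), m_stack(256 * 1024) {
        getcontext(&m_producer);
        m_producer.uc_stack.ss_sp = m_stack.data();
        m_producer.uc_stack.ss_size = m_stack.size();
        m_producer.uc_link = nullptr; // the producer never returns
        makecontext(&m_producer, &ContextEnumerator::Entry, 0);
    }
    ContextEnumerator(const ContextEnumerator&) = delete;
    ContextEnumerator& operator=(const ContextEnumerator&) = delete;

    // Consumer side, like Next(): run the producer until it produces something.
    Found Next() {
        s_starting = this; // only read by Entry() on the very first switch
        swapcontext(&m_consumer, &m_producer);
        return m_found;
    }
    const std::string& Current() const { return m_current; }

private:
    // makecontext cannot portably pass a pointer, so "this" goes via a static.
    static void Entry() {
        ContextEnumerator* self = s_starting;
        self->Walk(self->m_root);
        for (;;) self->Produce(FOUND_DONE, self->m_root.name);
    }
    // Producer side, like Produce(): publish the item and switch back.
    void Produce(Found found, const std::string& name) {
        m_found = found;
        m_current = name;
        swapcontext(&m_producer, &m_consumer);
    }
    // Plain recursion: the walk's state lives on the producer's own stack.
    void Walk(const Node& dir) {
        for (const Node& child : dir.children) {
            Produce(child.isDir ? FOUND_DIR : FOUND_FILE, child.name);
            if (child.isDir) Walk(child);
        }
        Produce(FOUND_LEAVEDIR, dir.name);
    }

    static ContextEnumerator* s_starting;
    const Node& m_root;
    std::vector<char> m_stack; // the producer's private stack
    ucontext_t m_producer;
    ucontext_t m_consumer;
    Found m_found = FOUND_DONE;
    std::string m_current;
};
ContextEnumerator* ContextEnumerator::s_starting = nullptr;
```

As with the fiber version, if the consumer stops calling Next() early, the producer's suspended frames are simply abandoned and nothing on that stack is unwound, which is why DirectoryTreeEnumerator's destructor pops its stack entries (closing the find handles) by hand.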
Okay, let's take this fiber out for a spin. You can use the same TestWalk function as last time, but for added generality, change the first parameter from DirectoryTreeEnumerator* to FiberEnumerator*. (The significance of this will become apparent next time.)
A little tweak needs to be made to the main function, though.
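```cpp
int __cdecl main(int argc, char **argv)
{
    if (!ConvertThreadToFiber(NULL)) return 1; // nothing to switch back to
    {
        DirectoryTreeEnumerator e(TEXT("."));
        TestWalk(&e);
    } // destroy the enumerator (and its fiber) before converting back
    ConvertFiberToThread();
    return 0;
}
```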
Since the enumerator is going to switch between fibers, we'd better convert the thread to a fiber so it'll have something to switch back to!
Here's a schematic of what happens when you run this fiber-based enumerator:
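[Schematic, main fiber and enumerator fiber: main calls ConvertThreadToFiber and constructs the DirectoryTreeEnumerator, whose constructor calls CreateFiber; each Next() call does SwitchToFiber() to the enumerator fiber, which runs FindFirstFile and Produce(FILE), then FindNextFile and Produce(DIR), and so on, each Produce switching back to the main fiber; the enumerator finishes with Produce(DONE); the destructor calls DeleteFiber, and main calls ConvertFiberToThread.]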
Observe that from each fiber's point of view, the other fiber is just a subroutine!
Coding subtlety: Why do we capture the caller's fiber each time the Next() method is called? Why not capture it when the FiberEnumerator is constructed?
Next time, we'll see how this fiber-based enumerator easily admits higher-order operations such as filtering and composition.
Dire warnings about fibers
Fibers are like dynamite. Mishandle them and your process explodes.
And since each fiber has its own stack, it also has its own exception chain. This means that if a fiber throws an exception, only that fiber can catch it. (Same as threads.) That's a strong argument against using an STL std::stack object to maintain our state: STL is based on an exception-throwing model, but you can't catch exceptions raised by another fiber. (You also can't throw exceptions past a COM boundary, which severely limits how much you can use STL in a COM object.)
One of the big problems with fibers is that everybody has to be in cahoots. You need to decide on one person who will call the ConvertThreadToFiber function since fiber/thread conversion is not reference-counted. If two people call ConvertThreadToFiber on the same thread, the first will convert it, and so will the second! This results in two fibers for the same thread, and things can only get worse from there.
You might think, "Well, wouldn't the GetCurrentFiber function return NULL if the thread hasn't been converted to a fiber?" Try it: It returns garbage. (It's amazing how many people ask questions without taking even the slightest steps towards figuring out the answer themselves. Try writing a test program.)
But even if GetCurrentFiber told you whether or not the thread had been converted to a fiber, that still wouldn't help. Suppose two people want to do fibrous activity on the thread. The first converts, the second notices that the thread is already a fiber (somehow) and skips the conversion. Now the first operation completes and calls the ConvertFiberToThread function. Oh great, now the second operation is stranded doing fibrous activity without a fiber!
Therefore, you can use fibers safely only if you control the thread and can get all your code to agree on who controls the fiber/thread conversion.
An important consequence of the "in cahoots" rule is that you have to make sure all the code you use on a fiber is "fiber-safe", a level of safety even beyond thread safety. The C runtime library keeps information in per-thread state: there's errno, there's all sorts of bonus bookkeeping when you create a thread, and there are various functions that maintain state in per-thread data (such as strerror, _fcvt, and strtok).
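Here's a minimal illustration of why per-thread state is not per-fiber state, using strtok, which remembers its position in hidden state (per-thread in the Microsoft C runtime). No real fibers are involved: the sketch interleaves two tokenizations by hand on one thread, which is exactly what two fibers on the same thread would do when they switch. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Simulates two "fibers" sharing one thread: each tokenizes its own buffer
// with strtok, but strtok keeps only one hidden position for the thread.
std::string InterleavedStrtok()
{
    char first[] = "alpha beta";
    char second[] = "one two";

    std::string a = std::strtok(first, " ");   // "fiber 1" starts: "alpha"
    std::string b = std::strtok(second, " ");  // "fiber 2" starts: "one"
    const char* c = std::strtok(nullptr, " "); // "fiber 1" resumes...
    return a + "," + b + "," + (c ? c : "(null)"); // ...and gets "two", not "beta"
}
```

The fix in single-threaded code is to use a variant that takes its position as an explicit parameter; the general lesson is that any hidden per-thread state is shared by every fiber on that thread.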
In particular, C++ exception handling is managed by the runtime, and the runtime tracks this data in per-thread state (rather than per-fiber state). Therefore, if you throw a C++ exception from a fiber, strange things happen.
(Note: Things may have changed in the C runtime lately; I'm operating from information that's a few years old.)
Even if you carefully avoid the C runtime library, you still have to worry about any other libraries you use that use per-thread data. None of them will work with fibers. If you see a call to the TlsAlloc function, then there's a good chance that the library is not fiber-safe. (The fiber-safe version is the FlsAlloc function.)
Windows are another category of things that are not fiber-safe: they have thread affinity, not fiber affinity.
```cpp
#include <windows.h>
#include <shlwapi.h>
#include <stdio.h>
#include <strsafe.h>

enum FEFOUND {
    FEF_FILE,     // found a file
    FEF_DIR,      // found a directory
    FEF_LEAVEDIR, // leaving a directory
    FEF_DONE,     // finished
};

enum FERESULT {
    FER_CONTINUE, // continue enumerating
                  // (if directory: recurse into it)
    FER_SKIP,     // skip directory (do not recurse)
};

class DirectoryTreeEnumerator {
public:
    DirectoryTreeEnumerator(LPCTSTR pszDir);
    FEFOUND Next();
    void SetResult(FERESULT fer);
    void Skip() { SetResult(FER_SKIP); }
    LPCTSTR GetCurDir();
    LPCTSTR GetCurPath();
    const WIN32_FIND_DATA* GetCurFindData();
private:
    // ... implementation ...
};
```
Under this design, the enumerator spits out files, and the caller tells the enumerator when to move on to the next one, optionally indicating that an enumerated directory should be skipped rather than recursed into.
Notice that there is no FER_STOP result code. If the consumer wants to stop enumerating, it will merely stop calling Next().
With this design, our test function that computes the inclusive and exclusive sizes of each directory is quite simple:
```cpp
#include <tchar.h> // _tprintf, so %s matches TCHAR strings

ULONGLONG FileSize(const WIN32_FIND_DATA *pwfd)
{
    return ((ULONGLONG)pwfd->nFileSizeHigh << 32) + pwfd->nFileSizeLow;
}

ULONGLONG TestWalk(DirectoryTreeEnumerator* penum)
{
    ULONGLONG ullSizeSelf = 0;
    ULONGLONG ullSizeAll = 0;
    for (;;) {
        FEFOUND fef = penum->Next();
        switch (fef) {
        case FEF_FILE:
            ullSizeSelf += FileSize(penum->GetCurFindData());
            break;

        case FEF_DIR:
            ullSizeAll += TestWalk(penum);
            break;

        case FEF_LEAVEDIR:
            ullSizeAll += ullSizeSelf;
            _tprintf(TEXT("Size of %s is %I64u (%I64u)\n"),
                     penum->GetCurDir(), ullSizeSelf, ullSizeAll);
            return ullSizeAll;

        case FEF_DONE:
            return ullSizeAll;
        }
    }
    /* notreached */
}

int __cdecl main(int argc, char **argv)
{
    DirectoryTreeEnumerator e(TEXT("."));
    TestWalk(&e);
    return 0;
}
```
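FileSize widens nFileSizeHigh to 64 bits before shifting; shifting a 32-bit value by 32 bits is undefined behavior in C and C++. Here is a worked check of that arithmetic, using a hypothetical stand-in struct with the same two fields so that it compiles without `<windows.h>`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two size fields of WIN32_FIND_DATA.
struct FindDataSizes {
    std::uint32_t nFileSizeHigh;
    std::uint32_t nFileSizeLow;
};

// Same arithmetic as FileSize(): widen first, then shift.
std::uint64_t FileSize(const FindDataSizes* pwfd)
{
    return (static_cast<std::uint64_t>(pwfd->nFileSizeHigh) << 32)
           + pwfd->nFileSizeLow;
}
```

A file of 4 GiB plus 5 bytes has nFileSizeHigh = 1 and nFileSizeLow = 5, which combine to 4,294,967,301.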
Of course, this design puts all the work on the enumerator. Instead of letting the producer walk the tree and call the callback as it finds things, the caller calls Next() repeatedly, and each time, the enumerator has to find the next file and return it. Since the enumerator returns, it can't store its state in the call stack; instead it has to mimic the call stack manually with a stack data structure.
```cpp
class DirectoryTreeEnumerator {
public:
    DirectoryTreeEnumerator(LPCTSTR pszDir);
    ~DirectoryTreeEnumerator();
    FEFOUND Next();
    void SetResult(FERESULT fer)
        { m_es = fer == FER_SKIP ? ES_SKIP : ES_NORMAL; }
    void Skip() { SetResult(FER_SKIP); }
    LPCTSTR GetCurDir() { return m_pseCur->m_szDir; }
    LPCTSTR GetCurPath() { return m_szPath; }
    const WIN32_FIND_DATA* GetCurFindData()
        { return &m_pseCur->m_wfd; }

private:
    struct StackEntry {
        StackEntry *m_pseNext;
        HANDLE m_hfind;
        WIN32_FIND_DATA m_wfd;
        TCHAR m_szDir[MAX_PATH];
    };
    StackEntry* Push(LPCTSTR pszDir);
    void StopDir();
    bool Stopped();
    void Pop();

    enum EnumState {
        ES_NORMAL,
        ES_SKIP,
        ES_FIRST,
    };

    StackEntry *m_pseCur;
    EnumState m_es;
    TCHAR m_szPath[MAX_PATH];
};

DirectoryTreeEnumerator::StackEntry*
DirectoryTreeEnumerator::Push(LPCTSTR pszDir)
{
    StackEntry* pse = new StackEntry();
    if (pse &&
        SUCCEEDED(StringCchCopy(pse->m_szDir, MAX_PATH, pszDir)) &&
        PathCombine(m_szPath, pse->m_szDir, TEXT("*.*")) &&
        (pse->m_hfind = FindFirstFile(m_szPath, &pse->m_wfd)) !=
            INVALID_HANDLE_VALUE) {
        pse->m_pseNext = m_pseCur;
        m_es = ES_FIRST;
        m_pseCur = pse;
    } else {
        delete pse;
        pse = NULL;
    }
    return pse;
}

void DirectoryTreeEnumerator::StopDir()
{
    StackEntry* pse = m_pseCur;
    if (pse->m_hfind != INVALID_HANDLE_VALUE) {
        FindClose(pse->m_hfind);
        pse->m_hfind = INVALID_HANDLE_VALUE;
    }
}

bool DirectoryTreeEnumerator::Stopped()
{
    return m_pseCur->m_hfind == INVALID_HANDLE_VALUE;
}

void DirectoryTreeEnumerator::Pop()
{
    StackEntry* pse = m_pseCur;
    m_pseCur = pse->m_pseNext;
    delete pse;
}

DirectoryTreeEnumerator::~DirectoryTreeEnumerator()
{
    while (m_pseCur) {
        StopDir();
        Pop();
    }
}

DirectoryTreeEnumerator::
    DirectoryTreeEnumerator(LPCTSTR pszDir)
    : m_pseCur(NULL), m_es(ES_NORMAL)
{
    Push(pszDir);
}

FEFOUND DirectoryTreeEnumerator::Next()
{
    for (;;) {
        /* Anything to enumerate? */
        if (!m_pseCur) return FEF_DONE;

        /* If just left a directory, pop */
        if (Stopped()) {
            Pop();
            m_es = ES_NORMAL;
            /* Popped the root: nothing left to look at */
            if (!m_pseCur) return FEF_DONE;
        }

        /* If accepted a directory, recurse */
        else if (m_es == ES_NORMAL &&
                 (m_pseCur->m_wfd.dwFileAttributes &
                  FILE_ATTRIBUTE_DIRECTORY)) {
            Push(m_szPath);
        }

        /* Any more files in this directory? */
        if (m_es != ES_FIRST &&
            !FindNextFile(m_pseCur->m_hfind, &m_pseCur->m_wfd)) {
            StopDir();
            return FEF_LEAVEDIR;
        }

        /* Don't return (or later recurse into) . or .. */
        if (lstrcmp(m_pseCur->m_wfd.cFileName, TEXT(".")) == 0 ||
            lstrcmp(m_pseCur->m_wfd.cFileName, TEXT("..")) == 0 ||
            !PathCombine(m_szPath, m_pseCur->m_szDir,
                         m_pseCur->m_wfd.cFileName)) {
            /* ES_SKIP, not ES_NORMAL, or the next pass would Push() it */
            m_es = ES_SKIP;
            continue;
        }

        /* Return this found item */
        m_es = ES_NORMAL; /* default state */
        if (m_pseCur->m_wfd.dwFileAttributes &
            FILE_ATTRIBUTE_DIRECTORY) {
            return FEF_DIR;
        } else {
            return FEF_FILE;
        }
    }
    /* notreached */
}
```
Yuck-o-rama. The simple recursive function has turned into this horrible mess of state management.
Wouldn't it be great if we could have it both ways? The caller would see a simple enumerator that spits out files (or directories). But the enumerator sees a callback that it can throw files into.
We'll build that next time.
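The essential move in that state machine is replacing recursion with an explicit stack. Stripped of the Win32 details, the idea looks like the following minimal sketch over a hypothetical in-memory tree; `StackEnumerator`, `Node`, and the `FOUND_*` codes are illustrative names. Each frame records which child to report next, playing the role of the find handle that FindNextFile advances, and an accepted directory is pushed only on the following Next() call, so the consumer has a chance to Skip() it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory tree standing in for the file system.
struct Node {
    std::string name;
    bool isDir;
    std::vector<Node> children;
};

enum Found { FOUND_FILE, FOUND_DIR, FOUND_LEAVEDIR, FOUND_DONE };

class StackEnumerator {
public:
    explicit StackEnumerator(const Node& root) { m_stack.push_back({&root, 0}); }

    // Call after FOUND_DIR to avoid recursing into it (like Skip()).
    void Skip() { m_skip = true; }

    Found Next() {
        // If the previous item was an accepted directory, "recurse" by pushing it.
        if (m_pending && !m_skip) m_stack.push_back({m_pending, 0});
        m_pending = nullptr;
        m_skip = false;

        if (m_stack.empty()) return FOUND_DONE;
        Frame& top = m_stack.back();
        if (top.next == top.dir->children.size()) {
            // Directory exhausted: report it, then pop (our "return").
            m_current = top.dir->name;
            m_stack.pop_back();
            return FOUND_LEAVEDIR;
        }
        const Node& child = top.dir->children[top.next++];
        m_current = child.name;
        if (child.isDir) {
            m_pending = &child; // decide on the next call whether to descend
            return FOUND_DIR;
        }
        return FOUND_FILE;
    }
    const std::string& Current() const { return m_current; }

private:
    struct Frame {
        const Node* dir;
        std::size_t next; // index of the next child to report
    };
    std::vector<Frame> m_stack;
    const Node* m_pending = nullptr;
    bool m_skip = false;
    std::string m_current;
};
```

Even in this toy form, the control flow that was a plain loop plus a recursive call has become flags and frames that must be kept consistent across calls.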
The COM model for enumeration (enumeration objects) is biased towards making life easy for the consumer and hard for the producer. The enumeration object (producer) needs to be structured as a state machine, which can be quite onerous for complicated enumerators, for example, tree walking or composite enumeration.
On the other hand, the callback model (used by most Win32 enumeration functions) is biased towards making life easy for the enumerator and hard for the consumer. This time, it is the consumer that needs to be structured as a state machine, which is more work if the consumer is doing something complicated with each callback. (And even if not, you have to create a context structure to pass state from the caller, through the enumerator, to the callback.)
For example, suppose we want to write a routine that walks a directory structure, allowing the caller to specify what to do at each decision point. Let's design this first using the callback paradigm:
```cpp
#include <windows.h>
#include <shlwapi.h>
#include <stdio.h>

enum FERESULT {
    FER_CONTINUE, // continue enumerating
                  // (if directory: recurse into it)
    FER_SKIP,     // skip this file/directory
    FER_STOP,     // stop enumerating
};

enum FEOPERATION {
    FEO_FILE,     // found a file
    FEO_DIR,      // found a directory
    FEO_LEAVEDIR, // leaving a directory
};

typedef FERESULT (CALLBACK *FILEENUMCALLBACK)
    (FEOPERATION feo,
     LPCTSTR pszDir, LPCTSTR pszPath,
     const WIN32_FIND_DATA* pwfd,
     void *pvContext);

FERESULT EnumDirectoryTree(
    LPCTSTR pszDir,
    FILEENUMCALLBACK pfnCB,
    void* pvContext);
```
The design here is that the caller calls EnumDirectoryTree and provides a callback function that is informed of each file found and can decide how the enumeration should proceed.
Designing this as a callback makes life much simpler for the implementation of EnumDirectoryTree.
```cpp
FERESULT EnumDirectoryTree(
    LPCTSTR pszDir,
    FILEENUMCALLBACK pfnCB,
    void *pvContext)
{
    FERESULT fer = FER_CONTINUE;
    TCHAR szPath[MAX_PATH];
    if (PathCombine(szPath, pszDir, TEXT("*.*"))) {
        WIN32_FIND_DATA wfd;
        HANDLE hfind = FindFirstFile(szPath, &wfd);
        if (hfind != INVALID_HANDLE_VALUE) {
            do {
                if (lstrcmp(wfd.cFileName, TEXT(".")) != 0 &&
                    lstrcmp(wfd.cFileName, TEXT("..")) != 0 &&
                    PathCombine(szPath, pszDir, wfd.cFileName)) {
                    FEOPERATION feo = (wfd.dwFileAttributes &
                                       FILE_ATTRIBUTE_DIRECTORY) ?
                                      FEO_DIR : FEO_FILE;
                    fer = pfnCB(feo, pszDir, szPath, &wfd, pvContext);
                    if (fer == FER_CONTINUE) {
                        if (feo == FEO_DIR) {
                            fer = EnumDirectoryTree(szPath, pfnCB, pvContext);
                            if (fer == FER_CONTINUE) {
                                fer = pfnCB(FEO_LEAVEDIR, pszDir, szPath,
                                            &wfd, pvContext);
                            }
                        }
                    } else if (fer == FER_SKIP) {
                        fer = FER_CONTINUE;
                    }
                }
                // Stop looping as soon as anybody says FER_STOP.
            } while (fer == FER_CONTINUE && FindNextFile(hfind, &wfd));
            FindClose(hfind);
        }
    }
    return fer;
}
```
Note: I made no attempt to make this function at all efficient since that's not my point here. It's highly wasteful of stack space (which can cause problems when walking deep directory trees). This function also doesn't handle paths longer than MAX_PATH; fixing this is beyond the scope of this series. Nor do I worry about reparse points, which can induce infinite loops if you're not careful.
Well, that wasn't so hard to write. But that's because we made life hard for the consumer. The consumer needs to maintain state across each callback. For example, suppose you wanted to build a list of directories and their sizes (both including and excluding subdirectories).
```cpp
#include <tchar.h> // _tprintf, so %s matches TCHAR strings

class EnumState {
public:
    EnumState() : m_pdirCur(new Directory(NULL)) { }
    ~EnumState() { Dispose(); }
    FERESULT Callback(FEOPERATION feo,
                      LPCTSTR pszDir, LPCTSTR pszPath,
                      const WIN32_FIND_DATA* pwfd);
    void FinishDir(LPCTSTR pszDir);

private:
    struct Directory {
        Directory(Directory* pdirParent)
            : m_pdirParent(pdirParent)
            , m_ullSizeSelf(0)
            , m_ullSizeAll(0) { }
        Directory* m_pdirParent;
        ULONGLONG m_ullSizeSelf;
        ULONGLONG m_ullSizeAll;
    };
    Directory* Push();
    void Pop();
    void Dispose();

    Directory* m_pdirCur;
};

EnumState::Directory* EnumState::Push()
{
    Directory* pdir = new Directory(m_pdirCur);
    if (pdir) {
        m_pdirCur = pdir;
    }
    return pdir;
}

void EnumState::Pop()
{
    Directory* pdir = m_pdirCur->m_pdirParent;
    delete m_pdirCur;
    m_pdirCur = pdir;
}

void EnumState::Dispose()
{
    while (m_pdirCur) {
        Pop();
    }
}

void EnumState::FinishDir(LPCTSTR pszDir)
{
    m_pdirCur->m_ullSizeAll += m_pdirCur->m_ullSizeSelf;
    _tprintf(TEXT("Size of %s is %I64u (%I64u)\n"),
             pszDir, m_pdirCur->m_ullSizeSelf, m_pdirCur->m_ullSizeAll);
}

ULONGLONG FileSize(const WIN32_FIND_DATA *pwfd)
{
    return ((ULONGLONG)pwfd->nFileSizeHigh << 32) + pwfd->nFileSizeLow;
}

FERESULT EnumState::Callback(FEOPERATION feo,
                             LPCTSTR pszDir, LPCTSTR pszPath,
                             const WIN32_FIND_DATA* pwfd)
{
    if (!m_pdirCur) return FER_STOP;

    switch (feo) {
    case FEO_FILE:
        m_pdirCur->m_ullSizeSelf += FileSize(pwfd);
        return FER_CONTINUE;

    case FEO_DIR:
        if (Push()) {
            return FER_CONTINUE;
        } else {
            return FER_SKIP;
        }

    case FEO_LEAVEDIR:
        FinishDir(pszPath);

        /* Propagate size into parent */
        m_pdirCur->m_pdirParent->m_ullSizeAll += m_pdirCur->m_ullSizeAll;
        Pop();
        return FER_CONTINUE;

    default:
        return FER_CONTINUE;
    }
    /* notreached */
}

FERESULT CALLBACK EnumState_Callback(
    FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd,
    void* pvContext)
{
    EnumState* pstate = reinterpret_cast<EnumState*>(pvContext);
    return pstate->Callback(feo, pszDir, pszPath, pwfd);
}

int __cdecl main(int argc, char **argv)
{
    EnumState state;
    if (EnumDirectoryTree(TEXT("."), EnumState_Callback, &state) ==
        FER_CONTINUE) {
        state.FinishDir(TEXT("."));
    }
    return 0;
}
```
Boy, that sure was an awful lot of typing, and what's worse, the whole structure of the program has been obscured by the explicit state management. It sure is hard to tell at a glance what this chunk of code is trying to do. Instead, you have to stare at the EnumState class and reverse-engineer what's going on.
(Yes, I could have simplified this code a little by using a built-in stack class, but as I have already noted in the context of smart pointers, I try to present these articles in "pure" C++ so people won't get into arguments about which class library is best.)
Tomorrow, we'll look at how the world would be if the function EnumDirectoryTree were spec'd out by the caller rather than the enumerator!
One might think that computing the size of a directory would be a simple matter of adding up the sizes of all the files in it.
Oh if it were only that simple.
There are many things that make computing the size of a directory difficult, some of which even throw into doubt the very existence of the concept "size of a directory".
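Should you tunnel through reparse points (such as directory junctions and mount points) when adding up sizes? If you're computing the size in order to show the user how much disk space they will gain by deleting the directory, then you do or don't, depending on how you're going to delete the reparse point.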
If you're computing the size in preparation for copying, then you probably do. Or maybe you don't: should the copy merely copy the reparse point instead of tunneling through it? What do you do if the user doesn't have permission to create reparse points? Or if the destination doesn't support reparse points? Or if the user is creating a copy because they are making a backup?
A file with several hard links appears under several names, possibly in several directories, so a naive walk counts it once per link. Dividing the "weight" of the file among its hard links avoids double-counting (or higher), so that when all the hard links are found, the file's total size is correctly accounted for. And it represents the concept that all the hard links to a file "share the cost" of the resources the file consumes. But what if you don't find all the hard links? Is it correct that the file was undercounted? [Minor typo fixed, 12pm]
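The "share the cost" accounting is simple arithmetic. Here's a minimal sketch, assuming the walk already knows each file's size and its total hard-link count (on Windows, GetFileInformationByHandle reports the latter as nNumberOfLinks); `LinkEntry` and `SharedSize` are hypothetical names. The second check shows the undercount when only some of the links are found.

```cpp
#include <cassert>
#include <vector>

// One directory entry seen during a walk: the file's full size and how many
// hard links the file has in total (whether or not the walk finds them all).
struct LinkEntry {
    double fileSize;
    unsigned linkCount;
};

// Each link found contributes fileSize / linkCount, so the links "share the
// cost" and a file is never counted more than once in total.
double SharedSize(const std::vector<LinkEntry>& found)
{
    double total = 0;
    for (const LinkEntry& e : found) {
        total += e.fileSize / e.linkCount;
    }
    return total;
}
```

For a 900-byte file with three hard links, finding all three links totals 900 bytes; finding only two of them totals 600.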
If you're copying a file and you discover that it has multiple hard links, what do you do? Do you break the links in the copy? Do you attempt to reconstruct them? What if the destination doesn't support hard links?
When adding up the size of the files in a directory, do you add up the logical size or the physical size? If you're computing the size in preparation for copying, then you probably want the logical size, but if you're computing to see how much disk space would be freed up by deleting it, then you probably want physical size.
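A rough sketch of the difference, assuming the simplest allocation model: an uncompressed, non-sparse file occupies whole clusters, so its physical size is its logical size rounded up to the cluster size. This ignores compression, sparse files, and very small files whose data NTFS can store inside the file record itself; on Windows, GetCompressedFileSize reports the compressed or sparse size for such files. `AllocatedSize` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Physical size under a simple model: whole clusters, no compression,
// no sparse ranges, no data stored inside the file record.
std::uint64_t AllocatedSize(std::uint64_t logicalSize, std::uint64_t clusterSize)
{
    return (logicalSize + clusterSize - 1) / clusterSize * clusterSize;
}
```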
But if you're computing for copying and the copy destination supports compression, do you want to use the physical size after all? Now you're assuming that the source and destination compression algorithms are comparable.
There is no single answer to all of the above questions. You have to consider each one, apply it to your situation, and decide which way you want to go.
(And copying a directory tree is even scarier. What do you do with the ACLs? Do you copy them too? Do you preserve the creation date? It all depends on why you're copying the tree.)
It is possible to create an infinitely recursive directory tree. This throws many recursive directory-traversal functions into disarray. Here's how you do it. (Note: Requires NTFS.)
Create a directory in the root of your C: drive, call it C:\C, for lack of a more creative name. Right-click My Computer and select Manage, then click on the Disk Management snap-in.
From the Disk Management snap-in, right-click the C drive and select "Change Drive Letter and Paths...".
From the "Change Drive Letter and Paths for C:" dialog, click "Add", then where it says "Mount in the following empty NTFS folder", enter "C:\C". Click OK.
Congratulations, you just created an infinitely recursive directory.
```
C:\> dir
 Volume in drive has no label
 Volume Serial Number is A035-E01D

 Directory of C:\

08/19/2001  08:43 PM                 0 AUTOEXEC.BAT
12/23/2004  09:43 PM    <JUNCTION>     C
05/05/2001  04:09 PM                 0 CONFIG.SYS
12/16/2001  04:34 PM    <DIR>          Documents and Settings
08/10/2004  12:00 AM    <DIR>          Program Files
08/28/2004  01:08 PM    <DIR>          WINDOWS
               2 File(s)              0 bytes
               4 Dir(s)   2,602,899,968 bytes free

C:\> dir C:\C
 Volume in drive has no label
 Volume Serial Number is A035-E01D

 Directory of C:\C

08/19/2001  08:43 PM                 0 AUTOEXEC.BAT
12/23/2004  09:43 PM    <JUNCTION>     C
05/05/2001  04:09 PM                 0 CONFIG.SYS
12/16/2001  04:34 PM    <DIR>          Documents and Settings
08/10/2004  12:00 AM    <DIR>          Program Files
08/28/2004  01:08 PM    <DIR>          WINDOWS
               2 File(s)              0 bytes
               4 Dir(s)   2,602,899,968 bytes free

C:\> dir C:\C\C\C\C\C\C
 Volume in drive has no label
 Volume Serial Number is A035-E01D

 Directory of C:\C\C\C\C\C\C

08/19/2001  08:43 PM                 0 AUTOEXEC.BAT
12/23/2004  09:43 PM    <JUNCTION>     C
05/05/2001  04:09 PM                 0 CONFIG.SYS
12/16/2001  04:34 PM    <DIR>          Documents and Settings
08/10/2004  12:00 AM    <DIR>          Program Files
08/28/2004  01:08 PM    <DIR>          WINDOWS
               2 File(s)              0 bytes
               4 Dir(s)   2,602,899,968 bytes free
```
Go ahead and add as many "\C"s as you like. You'll just get your own C drive back again.
Okay, now that you've had your fun, go back to the "Change Drive Letter and Paths for C:" dialog and Remove the "C:\C" entry. Do this before you create some real havoc.
Now imagine what happens if you had tried a recursive treecopy from that mysterious C:\C directory. Or if you ran a program that did some sort of recursive operation starting from C:\C, like, say, trying to add up the sizes of all the files in it.
If you're writing such a program, you need to be aware of reparse points (that thing that shows up as <JUNCTION> in the directory listing). You can identify them because their file attributes include the FILE_ATTRIBUTE_REPARSE_POINT flag. Of course, what you do when you find one of these is up to you. I'm just warning you that these strange things exist and if you aren't careful, your program can go into an infinite loop.
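Here is a minimal sketch of the check, assuming a hypothetical in-memory model of the C:\C setup above: each entry carries Win32-style attribute bits (FILE_ATTRIBUTE_DIRECTORY is 0x10 and FILE_ATTRIBUTE_REPARSE_POINT is 0x400 in the Windows headers), and the junction entry points back at the volume root. The walk counts the reparse point but does not tunnel through it; that is only one of the possible policies discussed earlier. `Entry` and `CountEntries` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Values as defined in the Windows SDK headers.
const std::uint32_t kAttrDirectory = 0x10;     // FILE_ATTRIBUTE_DIRECTORY
const std::uint32_t kAttrReparsePoint = 0x400; // FILE_ATTRIBUTE_REPARSE_POINT

// Hypothetical in-memory directory entry. For a junction, "target" is the
// directory it leads to, which may be an ancestor (hence the cycle).
struct Entry {
    std::uint32_t attributes;
    std::vector<Entry*> children; // for ordinary directories
    Entry* target = nullptr;      // for junctions and mount points
};

// Counts entries reachable from dir without tunneling through reparse points.
// Following "target" instead would recurse forever on the junction example.
int CountEntries(const Entry& dir)
{
    int count = 0;
    for (const Entry* child : dir.children) {
        ++count;
        if ((child->attributes & kAttrReparsePoint) != 0) {
            continue; // report it, but do not walk into it
        }
        if ((child->attributes & kAttrDirectory) != 0) {
            count += CountEntries(*child);
        }
    }
    return count;
}
```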
Alton Brown, geek cooking hero and Bon Appetit Magazine Cooking Teacher of the Year 2004 will be spending January 2005 promoting his latest book, Food × Mixing + Heat = Baking (I'm Just Here for More Food), sequel to his award-winning debut cookbook Food + Heat = Cooking (I'm Just Here for the Food). Check the schedule to see when/whether he'll be in your area.
When you set environment variables with the System control panel, the TEMP and TMP variables are silently converted to their short file name equivalents (if possible). Why is that?
For compatibility, of course.
It is very common for batch files to assume that the paths referred to by the %TEMP% and %TMP% environment variables do not contain any embedded spaces. (Other programs may also make this assumption, but batch files are the most common place where you run into this problem.)
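The failure mode is easy to reproduce with anything that splits a command line on spaces. The sketch below is a simplified model, not cmd.exe's actual parsing rules: it substitutes a TEMP value into an unquoted argument and counts the resulting words. The directory names are illustrative; DOCUME~1 and LOCALS~1 are typical short names for "Documents and Settings" and "Local Settings", but the actual short names on a given machine can differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified model of an unquoted batch-file argument: substitute the
// variable, then split the command line on whitespace.
int CountWords(const std::string& commandLine)
{
    std::istringstream in(commandLine);
    std::string word;
    int count = 0;
    while (in >> word) ++count;
    return count;
}

// As if a batch file said:  copy log.txt %TEMP%\log.txt
std::string CopyToTemp(const std::string& temp)
{
    return "copy log.txt " + temp + "\\log.txt";
}
```

With the short-name form, the destination stays a single word (three words in all); with the long-name form, it falls apart into four pieces (six words in all).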
I say "if possible" because you can disable short name generation, in which case there is no short name equivalent, and the path remains in its original long form.
If you are crazy enough to disable short name generation and point your TEMP/TMP variables at a directory whose name contains spaces and doesn't have a short name, then you get to see what sorts of things stop working properly. Don't say I didn't warn you.
Small electronics nowadays come in those impossible-to-open plastic packages. A few weeks ago I tried to open one and managed not to slice my hand with the knife I was using. (Those who know me know that knives and I don't get along well.) Unfortunately, I failed to pay close attention to the sharp edges of the cut plastic and ended up cutting three of my fingers.
The next day, I called the manufacturer's product support number and politely asked, "How do I open the package?"
The support person recommended using a pair of very heavy scissors. (I tried scissors, but mine weren't heavy enough and couldn't cut through the thick plastic.) Cut across the top, then down the sides, being careful to avoid the sharp shards you're creating. (You might want to wear gloves.)
If you bought someone a small electronics thingie, consider keeping a pair of heavy scissors on hand. That's my tip for the season.
The CreateTimerQueueTimer function allows you to create one-shot timers by passing the WT_EXECUTEONLYONCE flag. The documentation says that you need to call the DeleteTimerQueueTimer function when you no longer need the timer.
Why do you need to clean up one-shot timers?
To answer this, I would like to introduce you to one of my favorite rhetorical questions when trying to puzzle out API design: "What would the world be like if this were true?"
Imagine what the world would be like if you didn't need to clean up one-shot timers.
Well, for one thing, it means that the behavior of the function would be confusing. The caller of the CreateTimerQueueTimer function would have to keep track of whether the timer was one-shot or not, to know whether or not the handle needed to be deleted.
But far, far worse is that if one-shot timers were self-deleting, it would be impossible to use them correctly.
Suppose you have an object that creates a one-shot timer, and you want to clean it up in your destructor if it hasn't fired yet. If one-shot timers were self-deleting, then it would be impossible to write this object.
```cpp
class Sample {
    HANDLE m_hTimer;
    Sample() : m_hTimer(NULL) { CreateTimerQueueTimer(&m_hTimer, ...); }
    ~Sample() { ... what to write here? ... }
};
```
You might say, "Well, I'll have my callback null out the m_hTimer variable. That way, the destructor will know that the timer has fired."
Except that's a race condition.
```cpp
// Signature matches WAITORTIMERCALLBACK.
void CALLBACK Sample::Callback(void *context, BOOLEAN timerFired)
{
    /// RACE WINDOW HERE
    ((Sample*)context)->m_hTimer = NULL;
    ...
}
```
If the callback is pre-empted during the race window and the object is destructed, and one-shot timers were self-deleting, then the object would attempt to use an invalid handle.
This race window is unclosable, since the race happens even before you get a chance to execute a single line of code.
So be glad that you have to delete handles to one-shot timers.
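The rule is easy to see in a portable analog. The sketch below is not the Win32 timer queue: it builds a hypothetical one-shot timer on std::thread whose object stays valid after the callback fires, so the owner can always clean it up. Its destructor plays the role of DeleteTimerQueueTimer called with INVALID_HANDLE_VALUE as the completion event, which cancels a pending timer and waits for a running callback to finish. `OneShotTimer` and this `Sample` are illustrative.

```cpp
#include <cassert>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical one-shot timer. The object remains valid after the callback
// has fired; only its owner destroys it.
class OneShotTimer {
public:
    OneShotTimer(std::chrono::milliseconds delay, std::function<void()> callback)
        : m_thread([this, delay, callback] {
              std::unique_lock<std::mutex> lock(m_mutex);
              // True if the owner cancelled before the delay elapsed.
              bool cancelled = m_cv.wait_for(lock, delay, [this] { return m_cancelled; });
              lock.unlock();
              if (!cancelled) callback(); // fires at most once
          }) {}

    // Cancel if still pending, and wait for a running callback to finish.
    ~OneShotTimer() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_cancelled = true;
        }
        m_cv.notify_all();
        m_thread.join();
    }

    OneShotTimer(const OneShotTimer&) = delete;
    OneShotTimer& operator=(const OneShotTimer&) = delete;

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_cancelled = false;
    std::thread m_thread; // declared last: starts after the members it uses
};

// The Sample class without the dilemma: the destructor does not need to
// know whether the timer has fired.
class Sample {
public:
    Sample(std::chrono::milliseconds delay, std::atomic<int>& fired)
        : m_timer(delay, [&fired] { ++fired; }) {}
private:
    OneShotTimer m_timer;
};
```

Because the owner always performs the cleanup, there is no window in which the callback and the destructor disagree about who owns the handle.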
Still more ways of saying the same thing. Why so many?
Because each was invented by different people at different times to solve different problems.
BOOL is the oldest one. Its definition is simply
```cpp
typedef int BOOL;
```
The C programming language uses "int" as its boolean type, and Windows 1.0 was written back when C was the cool language for systems programming.
Next came BOOLEAN.
```cpp
typedef BYTE BOOLEAN;
```
This type was introduced by the OS/2 NT team when they decided to write a new operating system from scratch. It lingers in Win32 in the places where the original NT design peeks through, like the security subsystem and interacting with drivers.
Off to the side came VARIANT_BOOL.
```cpp
typedef short VARIANT_BOOL;
#define VARIANT_TRUE ((VARIANT_BOOL)-1)
#define VARIANT_FALSE ((VARIANT_BOOL)0)
```
This was developed by the Visual Basic folks. Basic uses -1 to represent "true" and 0 to represent "false", and VARIANT_BOOL was designed to preserve this behavior.
Common bug: When manipulating VARIANTs of type VT_BOOL, if you want to set a boolean value to "true", you must use VARIANT_TRUE. Many people mistakenly use TRUE or true, which are not the same thing as VARIANT_TRUE. You can cause problems with scripting languages if you get them confused. (For symmetry, you should also use VARIANT_FALSE instead of FALSE or false. All three have the same numerical value, however. Consequently, a mistake when manipulating "false" values is not fatal.)
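Here's why the mix-up bites, as a minimal sketch with local copies of the definitions above (plus TRUE and FALSE, which the Windows headers define as 1 and 0), so it compiles without `<windows.h>`. `IsVariantTrue` models a hypothetical consumer that compares against VARIANT_TRUE, and `BasicNot` models Basic's bitwise Not, which is why "true" is all bits set: Not of -1 is 0, but Not of 1 is -2, which is still nonzero.

```cpp
#include <cassert>

// Local copies of the definitions above, so this compiles without <windows.h>.
typedef short VARIANT_BOOL;
#define VARIANT_TRUE ((VARIANT_BOOL)-1)
#define VARIANT_FALSE ((VARIANT_BOOL)0)
#define TRUE 1  // as in the Windows headers
#define FALSE 0

// Hypothetical consumer that checks for exactly VARIANT_TRUE.
bool IsVariantTrue(VARIANT_BOOL v) { return v == VARIANT_TRUE; }

// Model of Basic's Not operator, which flips every bit.
VARIANT_BOOL BasicNot(VARIANT_BOOL v) { return (VARIANT_BOOL)~v; }
```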
Newest on the scene is bool, which is a C++ data type that has the value true or false. You won't see this used much (if at all) in Win32 because Win32 tries to remain C-compatible.
(Note that C-compatible isn't the same as C-friendly. Although you can do COM from C, it isn't fun.)