Welcome to MSDN Blogs Sign in | Join | Help

Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

The COM model for enumeration (enumeration objects) is biased towards making life easy for the consumer and hard for the producer. The enumeration object (producer) needs to be structured as a state machine, which can be quite onerous for complicated enumerators, for example, tree walking or composite enumeration.

On the other hand, the callback model for producer (used by most Win32 functions) is biased towards making life easy for the enumerator and hard for the consumer. This time, it is the consumer that needs to be structured as a state machine, which is more work if the consumer is doing something complicated with each callback. (And even if not, you have to create a context structure to pass state from the caller, through the enumerator, to the callback.)

For example, suppose we want to write a routine that walks a directory structure, allowing the caller to specify what to do at each decision point. Let's design this first using the callback paradigm:

#include <windows.h>
#include <shlwapi.h>
#include <stdio.h>

enum FERESULT {
 FER_CONTINUE,      // continue enumerating
                    // (if directory: recurse into it)
 FER_SKIP,          // skip this file/directory
 FER_STOP,          // stop enumerating
};

enum FEOPERATION {
 FEO_FILE,          // found a file
 FEO_DIR,           // found a directory
 FEO_LEAVEDIR,      // leaving a directory
};

typedef FERESULT (CALLBACK *FILEENUMCALLBACK)
    (FEOPERATION feo,
     LPCTSTR pszDir, LPCTSTR pszPath,
     const WIN32_FIND_DATA* pwfd,
     void *pvContext);

FERESULT EnumDirectoryTree(LPCTSTR pszDir,
    FILEENUMCALLBACK pfnCB, void* pvContext);

The design here is that the caller calls EnumDirectoryTree and provides a callback function that is informed of each file found and can decide how the enumeration should proceed.

Designing this as a callback makes life much simpler for the implementation of EnumDirectoryTree.

FERESULT EnumDirectoryTree(
    LPCTSTR pszDir,
    FILEENUMCALLBACK pfnCB, void *pvContext)
{
 FERESULT fer = FER_CONTINUE;
 TCHAR szPath[MAX_PATH];
 if (PathCombine(szPath, pszDir, TEXT("*.*"))) {
  WIN32_FIND_DATA wfd;
  HANDLE hfind = FindFirstFile(szPath, &wfd);
  if (hfind != INVALID_HANDLE_VALUE) {
   do {
    if (lstrcmp(wfd.cFileName, TEXT(".")) != 0 &&
        lstrcmp(wfd.cFileName, TEXT("..")) != 0 &&
        PathCombine(szPath, pszDir, wfd.cFileName)) {
     FEOPERATION feo = (wfd.dwFileAttributes &
                     FILE_ATTRIBUTE_DIRECTORY) ?
                     FEO_DIR : FEO_FILE;
     fer = pfnCB(feo, pszDir, szPath, &wfd, pvContext);
     if (fer == FER_CONTINUE) {
      if (feo == FEO_DIR) {
       fer = EnumDirectoryTree(szPath, pfnCB, pvContext);
       if (fer == FER_CONTINUE) {
        fer = pfnCB(FEO_LEAVEDIR, pszDir, szPath,
                    &wfd, pvContext);
       }
      }
     } else if (fer == FER_SKIP) {
      fer = FER_CONTINUE;
     }
    }
   } while (FindNextFile(hfind, &wfd));
   FindClose(hfind);
  }
 }
 return fer;
}

Note: I made no attempt to make this function at all efficient since that's not my point here. It's highly wasteful of stack space (which can cause problems when walking deep directory trees). This function also doesn't like paths deeper than MAX_PATH; fixing this is beyond the scope of this series. Nor do I worry about reparse points, which can induce infinite loops if you're not careful.

Well, that wasn't so hard to write. But that's because we made life hard for the consumer. The consumer needs to maintain state across each callback. For example, suppose you wanted to build a list of directories and their sizes (both including and excluding subdirectories).

class EnumState {
public:
 EnumState()
   : m_pdirCur(new Directory(NULL)) { }
 ~EnumState() { Dispose(); }
 FERESULT Callback(FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd);
 void FinishDir(LPCTSTR pszDir);

private:

 struct Directory {
  Directory(Directory* pdirParent)
   : m_pdirParent(pdirParent)
   , m_ullSizeSelf(0)
   , m_ullSizeAll(0) { }
  Directory* m_pdirParent;
  ULONGLONG m_ullSizeSelf;
  ULONGLONG m_ullSizeAll;
 };
 Directory* Push();
 void Pop();
 void Dispose();

 Directory* m_pdirCur;
};

EnumState::Directory* EnumState::Push()
{
 Directory* pdir = new Directory(m_pdirCur);
 if (pdir) {
  m_pdirCur = pdir;
 }
 return pdir;
}

void EnumState::Pop()
{
 Directory* pdir = m_pdirCur->m_pdirParent;
 delete m_pdirCur;
 m_pdirCur = pdir;
}

void EnumState::Dispose()
{
 while (m_pdirCur) {
  Pop();
 }
}

void EnumState::FinishDir(LPCTSTR pszDir)
{
  m_pdirCur->m_ullSizeAll +=
    m_pdirCur->m_ullSizeSelf;
  printf("Size of %s is %I64d (%I64d)\n",
   pszDir, m_pdirCur->m_ullSizeSelf,
   m_pdirCur->m_ullSizeAll);
}

ULONGLONG FileSize(const WIN32_FIND_DATA *pwfd)
{
  return 
    ((ULONGLONG)pwfd->nFileSizeHigh << 32) +
    pwfd->nFileSizeLow;
}

FERESULT EnumState::Callback(FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd)
{
 if (!m_pdirCur) return FER_STOP;

 switch (feo) {
 case FEO_FILE:
  m_pdirCur->m_ullSizeSelf += FileSize(pwfd);
  return FER_CONTINUE;

 case FEO_DIR:
  if (Push()) {
   return FER_CONTINUE;
  } else {
   return FER_SKIP;
  }

 case FEO_LEAVEDIR:
  FinishDir(pszPath);

 /* Propagate size into parent */
  m_pdirCur->m_pdirParent->m_ullSizeAll +=
    m_pdirCur->m_ullSizeAll;
  Pop();
  return FER_CONTINUE;

 default:
  return FER_CONTINUE;
 }
 /* notreached */
}

FERESULT CALLBACK EnumState_Callback(
    FEOPERATION feo,
    LPCTSTR pszDir, LPCTSTR pszPath,
    const WIN32_FIND_DATA* pwfd,
    void* pvContext)
{
 EnumState* pstate =
    reinterpret_cast<EnumState*>(pvContext);
 return pstate->Callback(feo, pszDir,
            pszPath, pwfd);
}

int __cdecl main(int argc, char **argv)
{
 EnumState state;
 if (EnumDirectoryTree(TEXT("."),
        EnumState_Callback,
        &state) == FER_CONTINUE) {
  state.FinishDir(TEXT("."));
 }
 return 0;
}

Boy that sure was an awful lot of typing, and what's worse, the whole structure of the program has been obscured by the explicit state management. It sure is hard to tell at a glance what this chunk of code is trying to do. Instead, you have to stare at the EnumState class and reverse-engineer what's going on.

(Yes, I could have simplified this code a little by using a built-in stack class, but as I have already noted in the context of smart pointers, I try to present these articles in "pure" C++ so people won't get into arguments about which class library is best.)

Tomorrow, we'll look at how the world would be if the function EnumDirectoryTree were spec'd out by the caller rather than the enumerator!

Published Wednesday, December 29, 2004 7:00 AM by oldnewthing
Filed under:

Comments

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 8:05 AM by Drazen Dotlic
>> (Yes, I could have simplified this code a little by using a built-in stack class, but as I have already noted in the context of smart pointers, I try to present these articles in "pure" C++ so people won't get into arguments about which class library is best.)

stack is a part of C++ Standard Library, it's not any particular vendor's one, thus it is "safe" to use in this context.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 9:52 AM by Raymond Chen
It may be the part of the standard library, but it's still a library. Some people may like std::string, others might prefer CString, still others may prefer their own string library. (If everybody agreed on std::string then why are there so many string libraries?)

# Standard Library

Wednesday, December 29, 2004 10:36 AM by John McCormick
Perhaps if more people treated the standard library as a standard, more people would accept it as such. In any case, I don't think too many people would lampoon you for using it.

I always find this sort of mix of c-like code and c++ code a bit jarring, but then I'm not really a windows programmer. Looking forward to tomorrow's installment!

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 10:38 AM by Raymond Chen
True, but if I chose a specific library (even the standard one), I would lose people who weren't familiar with (or actively disliked) that library. But everybody who programs Windows knows what TCHAR is.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 2:47 PM by Drazen Dotlic
>>If everybody agreed on std::string then why are there so many string libraries?

So many? Besides MFC, I am not familiar with any other in widespread use.
The only reason for these "other" libraries, including (parts of) MFC has been poor implementation (or buggy) of both C++ compiler and/or Standard Library. Things are a lot better now with VS.NET 2003 and will be even better with 2005. It's time to start treating Standard Library as a part of C++ language, not something we might not use because we don't like it or are not familiar with.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 2:57 PM by Eric TF Bat
Maybe it's because I've been reading a lot of LISP code lately, but I looked at this article and my first thought was "Why is Raymond posting randomly-indented assembler code on his blog?"

C++: Just Say AAAARRGHH OH GOD STOP AAAAARRRGGGHHH HELP PLEASE PLEASE PLEASE UUURKKK!

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 3:04 PM by Noob
What does fibers have to do with this?

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 3:06 PM by Larry Osterman
Drazen,
There are lots of reasons not to use std::string, and CString/CAtlString often provide an alternative to it.

Just because STL's a part of the C++ standard library doesn't mean that it is a worthy solution for everyone - the code size bloat you get from using STL is enough to preclude its use in many scenarios (like most of the Windows core). Also the fact that STL's localization support is mediochre (it has to be because it's platform neutral, and localization basically requires a platform specific infrastructure (how you specify your localizable resources depends on the platform)) precludes its use in many places.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 3:08 PM by Rick C
Noob, this is a multi-part lecture. I'm sure Rand will get to that. Part 1 is setting a stage.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 4:09 PM by MYG
Noob,

Because state is easily held on the CPU stack. A fiber is a bit like a thread in that it has its own stack but the context switching is driven by the application rather than the scheduler.

So the idea here is to have a tasking module without the tasking overhead (or part of it, stack space is overhead): two "fibers", one a consumer, one a producer.

Think of fibers as setjmp()/longjmp() with separate stacks.

IIRC, there is a "fiber like" construct in later versions of VMS. I wonder if they are a Cutler decree or just a common good idea.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 4:15 PM by Michael Kaplan
Speaking as the owner of a technology (namely, linguistic collation) and a colleague to the owner of a technology (namely encoding / codepages) and a former owner of a technology (namely, locale based formatting) which is superior in almost every way to the correponding CRT analogue, I can tell you that there are indeed times that the CRT is not always the best solution. And there do exist many times that it is not the answer....

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 4:33 PM by Kent Boogaart
This is OT and for that I apologise . . .

Does anyone know where I can find information about the next standard version of C++? I remember hearing about this somewhere but have no idea what it is codenamed etcetera.

Nice article Raymond (as per usual).

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Wednesday, December 29, 2004 10:00 PM by Tim Smith
std::auto_ptr is a perfect example of something that just because it is the standard doesn't mean it is good.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Thursday, December 30, 2004 1:38 AM by Andreas Häber
Kent:
You can find some links about the next C++ standard here: http://www.gotw.ca/iso/

And also check out Herb Sutter's blog at http://www.pluralsight.com/blogs/hsutter/default.aspx

Happy new year everyone :)

# Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 9:58 AM by The Old New Thing
This time, we'll watch the enumerator lose.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Thursday, December 30, 2004 9:38 AM by Drazen Dotlic
>>Drazen,
>>There are lots of reasons not to use std::string, and CString/CAtlString often provide an alternative to it.

Larry,

you are right. But let's put things into context here - this is supposed to be an example of a certain technique. State management is only a necessary "evil" here, it is not a meat of the post. I am simply arguing that standard stack could have been used to simplify the code sample without jeopardizing the message of the post - even better, shorter code might have improved the readibility and help push the message through.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Thursday, December 30, 2004 9:49 AM by Raymond Chen
Perhaps true, but when I see std:: stuff I tend to get more confused rather than less.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Thursday, December 30, 2004 10:18 AM by Michael Grier [MSFT]
The std:: stuff assumes you are comfortable with exceptions. I personally believe that there are unsolvable quality problems associated with use of exceptions. Thus, even though the library seems very useful, I stay as far away from the STL libraries as possible.

C and C++ (pre-STL) were systems programming languages which provided you with a syntax for using the underlying machine. The libraries provided were really just tokens to portability.

With the introduction of STL, C++ is moving/has moved out of the systems programming space and is trying to seem like an application programming language like BASIC. I personally don't see the point. If I wanted to use BASIC, with all its plusses and minuses, I'd use BASIC.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Thursday, December 30, 2004 10:54 AM by asdf
"an application programming language like BASIC"

I see Microsoft doesn't do any drug testing...

Raymond: You need to cast the ULONGLONGs to unsigned __int64 when you pass it in to printf and you need the format to be %I64u because %I64d is the signed version.

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Monday, January 03, 2005 10:09 AM by Cooney
> I see Microsoft doesn't do any drug testing...

Perhaps they do, just not the way you were thinking ;)

# re: Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

Tuesday, January 04, 2005 10:58 AM by Sig9
is part of the software industry. if you do not learn to learn (ya ya ya) you will be out of a job as fast as you can say "hello world!"

# Why does Win32 even have Fibers?

Wednesday, January 05, 2005 6:58 PM by Larry Osterman's WebLog

# Why does Win32 even have Fibers?

Monday, January 10, 2005 2:03 PM by Larry Osterman's WebLog

# CSci 101 Part I

Wednesday, January 19, 2005 12:11 AM by public Blog
CSci 101 Part I

# Partially Debuggable CoRoutines

Thursday, January 20, 2005 6:32 PM by Wizzy's World
A while back there was an article in MSDN magazine about wrapping up the unmaged fibers API to implement...

# Coroutines

Thursday, January 11, 2007 10:19 PM by Joe White's Blog

# Coroutines

Thursday, January 11, 2007 10:21 PM by Joe White's Blog

# Labr??tens Web Log &raquo; Blog Archive &raquo; Mimic the C# yield instruction in VC++

New Comments to this post are disabled
 
Page view tracker