Welcome to MSDN Blogs Sign in | Join | Help

Using fibers to simplify enumerators, part 2: When life is easier for the caller

Last time, we looked at how a directory tree enumerator function would have been written if the person writing the enumerator (the producer) got to write the spec. Now let's look at what it would look like if the person consuming the enumerator wrote the spec:
#include <windows.h>
#include <shlwapi.h>
#include <stdio.h>
#include <strsafe.h>

enum FEFOUND {
 FEF_FILE,          // found a file
 FEF_DIR,           // found a directory
 FEF_LEAVEDIR,      // leaving a directory
 FEF_DONE,          // finished
};

enum FERESULT {
 FER_CONTINUE,      // continue enumerating
                    // (if directory: recurse into it)
 FER_SKIP,          // skip directory (do not recurse)
};

class DirectoryTreeEnumerator {
public:
  DirectoryTreeEnumerator(LPCTSTR pszDir);

  FEFOUND Next();
  void SetResult(FERESULT fer);
  void Skip() { SetResult(FER_SKIP); }

  LPCTSTR GetCurDir();
  LPCTSTR GetCurPath();
  const WIN32_FIND_DATA* GetCurFindData();
private:
    ... implementation ...
};

Under this design, the enumerator spits out files, and the caller tells the enumerator when to move on to the next one, optionally indicating that an enumerated directory should be skipped rather than recursed into.

Notice that there is no FER_STOP result code. If the consumer wants to stop enumerating, it will merely stop calling Next().

With this design, our test function that computes the inclusive and exclusive sizes of each directory is quite simple:

ULONGLONG FileSize(const WIN32_FIND_DATA *pwfd)
{
  return 
    ((ULONGLONG)pwfd->nFileSizeHigh << 32) +
    pwfd->nFileSizeLow;
}

ULONGLONG TestWalk(DirectoryTreeEnumerator* penum)
{
 ULONGLONG ullSizeSelf = 0;
 ULONGLONG ullSizeAll = 0;
 for (;;) {
  FEFOUND fef = penum->Next();
  switch (fef) {
  case FEF_FILE:
   ullSizeSelf += FileSize(penum->GetCurFindData());
   break;

  case FEF_DIR:
   ullSizeAll += TestWalk(penum);
   break;

  case FEF_LEAVEDIR:
   ullSizeAll += ullSizeSelf;
   printf("Size of %s is %I64d (%I64d)\n",
    penum->GetCurDir(), ullSizeSelf, ullSizeAll);
   return ullSizeAll;

  case FEF_DONE:
   return ullSizeAll;
  }
 }
 /* notreached */
}

int __cdecl main(int argc, char **argv)
{
 DirectoryTreeEnumerator e(TEXT("."));
 TestWalk(&e);
 return 0;
}

Of course, this design puts all the work on the enumerator. Instead of letting the producer walking the tree and calling the callback as it finds things, the caller calls Next() repeatedly, and each time, the enumerator has to find the next file and return it. Since the enumerator returns, it can't store its state in the call stack; instead it has to mimic the call stack manually with a stack data structure.

class DirectoryTreeEnumerator {
public:
 DirectoryTreeEnumerator(LPCTSTR pszDir);
 ~DirectoryTreeEnumerator();

 FEFOUND Next();
 void SetResult(FERESULT fer)
  { m_es = fer == FER_SKIP ? ES_SKIP : ES_NORMAL; }
 void Skip() { SetResult(FER_SKIP); }

 LPCTSTR GetCurDir()
    { return m_pseCur->m_szDir; }
 LPCTSTR GetCurPath()
    { return m_szPath; }
 const WIN32_FIND_DATA* GetCurFindData()
    { return &m_pseCur->m_wfd; }

private:
 struct StackEntry {
  StackEntry *m_pseNext;
  HANDLE m_hfind;
  WIN32_FIND_DATA m_wfd;
  TCHAR m_szDir[MAX_PATH];
 };

 StackEntry* Push(LPCTSTR pszDir);
 void StopDir();
 bool Stopped();
 void Pop();

 enum EnumState {
  ES_NORMAL,
  ES_SKIP,
  ES_FIRST,
 };

 StackEntry *m_pseCur;
 EnumState m_es;
 TCHAR m_szPath[MAX_PATH];
};

DirectoryTreeEnumerator::StackEntry*
DirectoryTreeEnumerator::Push(
    LPCTSTR pszDir)
{
 StackEntry* pse = new StackEntry();
 if (pse &&
     SUCCEEDED(StringCchCopy(pse->m_szDir,
                 MAX_PATH, pszDir)) &&
     PathCombine(m_szPath, pse->m_szDir,
                  TEXT("*.*")) &&
     (pse->m_hfind = FindFirstFile(m_szPath,
       &pse->m_wfd)) != INVALID_HANDLE_VALUE) {
  pse->m_pseNext = m_pseCur;
  m_es = ES_FIRST;
  m_pseCur = pse;
 } else {
  delete pse;
  pse = NULL;
 }
 return pse;
}

void DirectoryTreeEnumerator::StopDir()
{
 StackEntry* pse = m_pseCur;
 if (pse->m_hfind != INVALID_HANDLE_VALUE) {
  FindClose(pse->m_hfind);
  pse->m_hfind = INVALID_HANDLE_VALUE;
 }
}

bool DirectoryTreeEnumerator::Stopped()
{
 return m_pseCur->m_hfind == INVALID_HANDLE_VALUE;
}

void DirectoryTreeEnumerator::Pop()
{
 StackEntry* pse = m_pseCur;
 m_pseCur = pse->m_pseNext;
 delete pse;
}

DirectoryTreeEnumerator::~DirectoryTreeEnumerator()
{
 while (m_pseCur) {
  StopDir();
  Pop();
 }
}

DirectoryTreeEnumerator::
    DirectoryTreeEnumerator(LPCTSTR pszDir)
 : m_pseCur(NULL)
{
 Push(pszDir);
}

FEFOUND DirectoryTreeEnumerator::Next()
{
 for (;;) {
  /* Anything to enumerate? */
  if (!m_pseCur) return FEF_DONE;

  /* If just left a directory, pop */
  if (Stopped()) {
   Pop();
   m_es = ES_NORMAL;
  }

  /* If accepted a directory, recurse */
  else if (m_es == ES_NORMAL &&
      (m_pseCur->m_wfd.dwFileAttributes &
                      FILE_ATTRIBUTE_DIRECTORY)) {
   Push(m_szPath);
  }

  /* Any more files in this directory? */
  if (m_es != ES_FIRST &&
       !FindNextFile(m_pseCur->m_hfind,
             &m_pseCur->m_wfd)) {
   StopDir();
   return FEF_LEAVEDIR;
  }

  /* Don't recurse into . or .. */
  if (lstrcmp(m_pseCur->m_wfd.cFileName,
                   TEXT(".")) == 0 ||
      lstrcmp(m_pseCur->m_wfd.cFileName,
                   TEXT("..")) == 0 ||
      !PathCombine(m_szPath, m_pseCur->m_szDir,
                   m_pseCur->m_wfd.cFileName)) {
   m_es = ES_NORMAL;
   continue;
  }

  /* Return this found item */
  m_es = ES_NORMAL; /* default state */
  if (m_pseCur->m_wfd.dwFileAttributes &
                      FILE_ATTRIBUTE_DIRECTORY) {
   return FEF_DIR;
  } else {
   return FEF_FILE;
  }
 }
 /* notreached */
}

Yuck-o-rama. The simple recursive function has turned into this horrible mess of state management.

Wouldn't it be great if we could have it both ways? The caller would see a simple enumerator that spits out files (or directories). But the enumerator sees a callback that it can throw files into.

We'll build that next time.

Published Thursday, December 30, 2004 6:58 AM by oldnewthing
Filed under:

Comments

# Are Fibers only on Win32?

Thursday, December 30, 2004 7:24 AM by Nate
I've always been baffled by the purpose of fibers, but it seems pretty clear where Raymond is going and I like what I see.

Do other platforms (UNIX, Mac) have constructs comparable to fibers? I suppose that you could build fibers on top of threads using a couple of semaphores but that seems pretty roundabout.

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 8:56 AM by Igor Tandetnik
"it seems pretty clear where Raymond is going"

Probably the same place Matthew Wilson has gone in C++ Users Journal's February 2004 article "Callback Enumeration APIs & the Input Iterator Concept"

http://www.cuj.com/documents/s=8188/cuj0402wilson/0402wilson.htm

(paid subscription required).

In this article, Matthew uses fibers to wrap EnumWindows API into STL-like iterator.


"Do other platforms (UNIX, Mac) have constructs comparable to fibers?"

POSIX has (deprecated) functions getcontext, setcontext, swapcontext and makecontext. These are effectively equivalent to fibers.

Then there's GNU Pth library that supports non-preemtive cooperatively scheduled "threads" (which are closer to Windows fibers than to Windows or POSIX threads)

http://www.gnu.org/software/pth/

Disclaimer: I have not personally used Windows fibers, nor getcontext et al, nor Pth

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 9:37 AM by Ben Cooke
I'm getting a hankering for co-routines.

(I'm also starting to get the idea that Fibers are basically just co-routines, but I'll wait until tomorrow to see if I'm right!)

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 10:19 AM by Memet
Amen Raymond. I've been waiting to see a use of Fibers all my (programming) life.
Enough with the previews already, Get on with the show! =)

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 10:30 AM by Ovidiu
Besides SQL Server, is there any other product on the market that can actually use fibers for something useful?

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 10:33 AM by David Phillips
UNIX has ucontext, but it is not implemented everywhere:

http://www.opengroup.org/onlinepubs/007908799/xsh/ucontext.h.html

This paper describes a portable method to do the above:

http://xmailserver.org/rse-pmt.pdf

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 11:13 AM by Joe White
Indy can use fibers. Indy is an open-source TCP/IP library (with classes for just about every Internet protocol you've ever heard of, both clients and servers), written in Delphi, usable from both Win32 (in Delphi) and .NET (from any .NET language). Last I knew, though, the fiber support was only in the Win32 version. I'm pretty sure Indy-with-fibers is used in some high-end products, because the fiber support took them a lot of engineering. I couldn't offhand tell you which high-end apps it's used in, but you could check out their Web site (http://www.indyproject.org/).

I heard Kudzu (Chad Hower, one of the main coders on Indy) talk about this stuff at last year's BorCon, so I know a decent bit about it. If you compile Indy with fiber support, then it uses manually-scheduled fibers together with some sort of low-level OS calls (might be I/O completion ports, but don't quote me on that) to build very high-performance TCP servers. (Transparent, too -- you write the same code either way, then just compile Indy with fibers or not.) I don't remember all the details, but if you've got hundreds or thousands of simultaneous connections, it's supposed to be a substantial performance boost. (On the other hand, if you've only got a few connections, it actually slows things down compared to the non-fiber implementation, so you have to plan accordingly. Why can't optimization ever be easy? (grin))

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Thursday, December 30, 2004 12:17 PM by David Phillips
I'm looking forward to seeing tomorrows blog. While we are waiting, here is a way to do this sort of thing in C cleanly using the preprocessor. It still requires a state machine, but the state machine is generated for you automatically:

http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

# Using fibers to simplify enumerators, part 3: Having it both ways

Friday, December 31, 2004 10:00 AM by The Old New Thing
Fibers let both sides think they're in control.

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Monday, January 03, 2005 10:44 AM by Cooney
Joe:
(On the other hand, if you've only got a few connections, it actually slows things down compared to the non-fiber implementation, so you have to plan accordingly. Why can't optimization ever be easy? (grin))

Is it really a problem? With 10 connections, the relative overhead may be more, but does it impact performance?

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Tuesday, January 04, 2005 4:47 PM by Brian
David: An interesting article. A downside not addressed there is that by using static storage instead of a second stack, you pick up reentrancy issues. But if that's not a concern, it's a clever hack.

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Tuesday, January 11, 2005 5:25 AM by Richard
> I'm getting a hankering for co-routines.

You have seen this... using fibers to implement co-routines?

http://msdn.microsoft.com/msdnmag/issues/03/09/CoroutinesinNET/default.aspx

# re: Using fibers to simplify enumerators, part 2: When life is easier for the caller

Tuesday, January 11, 2005 7:30 AM by Raymond Chen
Noting that the article uses undocumented CLR functions and even admits that it can deadlock with the garbage collector. Makes you wonder why anybody would follow the advice in that article in a real program. The last sentence of the article should scare everybody.

# Partially Debuggable CoRoutines

Thursday, January 20, 2005 6:32 PM by Wizzy's World
A while back there was an article in MSDN magazine about wrapping up the unmaged fibers API to implement...

# Coroutines

Thursday, January 11, 2007 10:19 PM by Joe White's Blog

# Coroutines

Thursday, January 11, 2007 10:21 PM by Joe White's Blog
New Comments to this post are disabled
 
Page view tracker