Inspired by a few recent posts I’ve run across (http://msdn.microsoft.com/en-us/magazine/cc546608.aspx and http://lambda-the-ultimate.org/node/3131). I decided to write a simple threading library  so, I can run the Erlang ring benchmark in C#.

Ring Benchmark (C# Version)

Here is the code of the ring benchmark itself, it uses my own ThreadIterators library which I’ll get around to posting one day once I clean it up a bit.

 

   1:  using System.Collections.Generic;
   2:  using ThreadIterators;
   3:   
   4:  static public class RingBenchmark
   5:  {
   6:      static IEnumerable<Event> 
   7:         ForwardMessage(Mailbox<int> inbox,
   8:                        Mailbox<int> outbox) {
   9:          for (; ; ) 
  10:          {
  11:              var evt = inbox.Recv();
  12:              yield return evt;
  13:              outbox.Send(evt.Value);
  14:          }
  15:      }
  16:      static Mailbox<int> 
  17:         SpawnForwaders(int n, Mailbox<int> inbox)
  18:      {
  19:          var prev = inbox;
  20:          var next = inbox;
  21:          for (int i = 0; i < n; i++)
  22:          {
  23:              next = new Mailbox<int>();
  24:              var thIter = ForwardMessage(prev, next);
  25:              ThreadIterator.Spawn(thIter);
  26:              prev = next;
  27:          }
  28:          return next;
  29:      }
  30:      static IEnumerable<Event>
  31:          PumpRing(int m, Mailbox<int> outbox)
  32:      {
  33:          for (int i = 0; i < m; i++)
  34:          {
  35:              outbox.Send(i);
  36:          }
  37:          outbox.Send(-1);
  38:          yield return null;
  39:      }
  40:      static IEnumerable<Event>
  41:          DrainRing(Mailbox<int> inbox)
  42:      {
  43:          for (; ; )
  44:          {
  45:              var evt = inbox.Recv();
  46:              yield return evt;
  47:              if (evt.Value < 0)
  48:              {
  49:                  yield break;
  50:              }
  51:          }
  52:      }
  53:      public static IEnumerable<Event> 
  54:          RingBenchMark(int n, int m)
  55:      {
  56:          var inbox = new Mailbox<int>();
  57:          var outbox = SpawnForwaders(n, inbox);
  58:          var thIter1 = PumpRing(m, inbox)
  59:          var thIter2 = DrainRing(outbox)
  60:          ThreadIterator.Spawn(thIter1);
  61:          var th = ThreadIterator.Spawn(thIter2);
  62:          yield return th.ExitedEvent;
  63:      }
  64:   
  65:  }

 

The Mailbox<T> type is from my library that’s a communication channel with unbounded buffering. This makes Send a non-blocking operation. Potentially, blocking operations like Recv return events which are yielded to inner loop of the threading libraries scheduler. The representation of a thread in my library is simply and enumeration of events, which is just a potentially infinite list of blocking operations, when ever a thread may block, it needs to yield an event which can be used to wake the thread up when the blocking operation completes.

Benchmark Results

The scheduler itself is a thread safe library, with a single method to pull off  any ready thread from the queue and process it. I call this execution method in a single threaded fashion, but since it is thread safe I can spawn two managed threads to call this scheduling method concurrently. So I can get coroutines or preemptive concurrency. I happen to have an Intel Core 2 Duo running at 2.0 Ghz with 2 GB running 32-bit Vista. So I have data for both cases. I’m comparing it to the Erlang ring benchmark from the link above using the lasted Erlang distribution for Windows.

Here is a table with the raw data, where I’ve fixed M to 1000 and exponentially increased N.

N, M=1000 Erlang (ms) C# 2 Threads (ms) C# Single Threaded
64 31 33 42
128 31 66 84
256 94 174 170
512 156 258 336
1024 358 513 670
2048 624 1038 1354
4096 1404 2086 2724
8192 2808 4171 5587
16384 5351 8307 11288
32768 10437 17657 24160
65536 20810 33920 63662
131072 41042 83268 136578
262144 80133 174877 446013

Here’s a graph for that compares the ratio of runtimes.

image

 

First you’ll notice that Erlang is roughly a factor of 2 faster than my version using 2 threads and about 5 times faster for my single threaded version. The version using 2 threads is for the most part faster then the single thread version. The 2 threaded version varies between 3% slower and 100% faster than the single threaded one. I suspect I could do a few things to make my scheduler a bit smarter about utilizing available CPU cores, mainly by taking fewer locks on the work queue.

The other thing to keep in mind is that these results are for up to 262,144 lightweight threads. There would be no way to have this many C#  managed threads, with a default stack size of 1MB.  (Perhaps, I’ll post a ring benchmark using normal C# threads later.) What’s not shown on this graph is memory footprint. Surprisingly the CLRs footprint was smaller than Erlang, but it’s hard to compare because of GC issues. C# code scaled up pretty well in comparison to Erlang, which to me is not to surprising, assuming you have the right representation of thread control state.

Conclusions

C# iterators are a quite powerful construct that lets you implement your own lightweight threading abstraction. My library is pure C# and doesn’t use any native code to do this. The only other threading approaches I know about that work at this level, require call-with-current-continuation or monads.  The library is efficient enough that C# can scale as well as Erlang which has a highly specialized and optimized runtime for this scenario. I think there is some room for improvement in my library, so I would not be surprised if I could make the factor of two go away to 10 – 20%. I suspect the GC maybe the final bottleneck though. I think the big take way here is that you don’t need a specialized  language and highly customized runtime to get a highly scalable thread abstraction that is also relatively easy to program.