In my last post, I included the unverified claim, "I have heard (but I have not personally verified) that GDI font rendering is approximately 10x faster than GDI+ font rendering (1M glyphs / second vs. 100K glyphs / second). Of course, there is a reason why GDI+ takes more work - text is truly device independent using GDI+. Yet another tradeoff. If you are interested in the details, check out GDI+ Text, Resolution Independence, and Rendering Methods. Or - Why does my text look different in GDI+ and in GDI?." Well, as you can probably guess by the title of this entry, making that unverified claim kept gnawing away at me, so when I finally had some time to cut some code, I decided to investigate that myself. What really is the performance overhead of these features?

I didn't really have the time to put together a scientific study, but I was OK with that. What I really wanted was a gut check, and if I was correct within an order of magnitude, then this was good enough for me. Rendering performance is, of course, going to change based on font, character, how the characters are put together, and how we leverage any internal optimizations of the framework. To keep things simple, I decided to stick with my favorite font, Segoe UI, using ClearType. Rather than trying to select a single character, or limiting myself to letters of the alphabet, I iterated through ASCII codes from 33 to 126, all of which are printable characters. Rather than constructing arbitrary strings, I just rendered one character at a time. I wanted to render a lot of characters, but I didn't want any potential confounds from overlapping characters, so I spaced them out evenly across the entire screen and maximized my window. I wanted to take a reasonable number of samples, so I handled a mouse click to fire up a timer and invalidated the window 100 times for each rendering type. Finally, I wanted to create this in unmanaged code, for a couple of reasons. First, it eliminates another potential confound. Second, every time I create software in unmanaged code, it reminds me of just how much I prefer using managed code!
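
As a rough illustration of that layout, here is a minimal, platform-independent sketch of the grid. It is not the code used for the measurements. The names `GlyphCell`, `layoutGlyphGrid`, `cellWidth`, and `cellHeight` are hypothetical; in the real program the cell size comes from the font metrics, and each cell is drawn immediately rather than collected in a vector.

```cpp
#include <cassert>
#include <vector>

// One glyph placement: the character and the top-left corner of its cell.
// (Hypothetical type, for illustration only.)
struct GlyphCell {
    char ch;
    int left;
    int top;
};

// Fill a clientWidth x clientHeight area with non-overlapping cells, cycling
// through the printable ASCII codes 33..126. The strict '<' comparisons
// mirror the loops in the post's WM_PAINT handler, so a cell that would end
// exactly on the right or bottom edge is skipped.
std::vector<GlyphCell> layoutGlyphGrid(int clientWidth, int clientHeight,
                                       int cellWidth, int cellHeight) {
    assert(cellWidth > 0 && cellHeight > 0);
    std::vector<GlyphCell> cells;
    char ch = 33;
    for (int bottom = cellHeight; bottom < clientHeight; bottom += cellHeight) {
        for (int right = cellWidth; right < clientWidth; right += cellWidth) {
            cells.push_back({ch, right - cellWidth, bottom - cellHeight});
            // Wrap back to '!' (33) after '~' (126).
            ch = (ch < 126) ? static_cast<char>(ch + 1) : static_cast<char>(33);
        }
    }
    return cells;
}
```

The size of the returned vector is the number of glyphs drawn per repaint, so it depends on both the window size and the font metrics.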

Figure: Simple application for measuring GDI and GDI+ font rendering performance

What did I discover? On my Toshiba M200 Tablet PC, using a GeForce FX 5200Go with 32 MB of graphics memory, the absolute values of my numbers were lower than the figures I had quoted. However, the relative values did roughly align with the claim that I published earlier: GDI+ font rendering was about six times slower than GDI rendering, which is within the order of magnitude I was checking for, proving once again that executing code is more expensive than not executing code. Specifically, my GDI code path was rendering approximately 99,000 glyphs per second, while my GDI+ code path was rendering approximately 16,000 glyphs per second.
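
To put those two rounded figures side by side, the arithmetic can be sketched as below. The constant and function names are hypothetical and exist only for this illustration. Taking the numbers at face value, the ratio comes out to roughly 6.2, and the average cost per glyph to about 10 microseconds for GDI versus 62.5 microseconds for GDI+.

```cpp
// Rounded throughput figures reported above, in glyphs per second.
constexpr double kGdiGlyphsPerSecond = 99000.0;
constexpr double kGdiplusGlyphsPerSecond = 16000.0;

// How many times faster the first code path is than the second.
constexpr double speedRatio(double fasterRate, double slowerRate) {
    return fasterRate / slowerRate;
}

// Average cost of one glyph, in microseconds, at a given throughput.
constexpr double microsecondsPerGlyph(double glyphsPerSecond) {
    return 1.0e6 / glyphsPerSecond;
}
```

These are averages over single-character draw calls, so they include per-call overhead and probably overstate the cost per glyph when rendering longer strings.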

Of course, I have rendered my share of text using GDI+, and it was always fast enough that I never thought twice about sacrificing its important features in search of something faster. Still, it's always good to know what the options and tradeoffs are, and this one is worth thinking long and hard about. There is a big difference between optimizing away waste and optimizing away features. If I have a computer that is much more powerful than what was available when GDI was designed, I could theoretically just run the same applications much, much faster. Alternatively, I could run updated versions of those applications that provide me with tangible (whether or not they are particularly sexy) benefits. Resolution independence is a good thing. Enhanced readability is a good thing. Only when I find an audience that reads more than 16,000 glyphs per second (some serious speed readers) am I going to start thinking about the trade-offs. With the implementation I have here, I get noticeably more flicker with the GDI+ code path than with the GDI code path, but double buffering is a solution I would consider before losing the ability to satisfy a user two years from now, on a 500 dpi screen, who thinks my font rendering is less readable than that of competing applications.

For those who are interested, this is the code I used for my little experiment. It's quick and dirty, but it was good enough for my limited purposes.

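#include <windows.h>
#include <strsafe.h>
#include <gdiplus.h>

// Link against the GDI+ import library (MSVC-specific pragma).
#pragma comment(lib, "gdiplus.lib")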

#define ID_TIMER 1

using namespace Gdiplus;

LRESULT CALLBACK WndProc (HWND, UINT, WPARAM, LPARAM);

int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance, 
                    LPSTR lpCmdLine, int nShowCmd) {

  HWND hwnd;
  MSG msg;
  WNDCLASS wndclass;
  GdiplusStartupInput gdiplusStartupInput;
  ULONG_PTR gdiplusToken;
  static TCHAR szAppName[] = TEXT("GDI vs. GDI+ Performance");

  GdiplusStartup(&gdiplusToken, &gdiplusStartupInput, NULL);

  wndclass.style = CS_HREDRAW | CS_VREDRAW;
  wndclass.lpfnWndProc = WndProc;
  wndclass.cbClsExtra = 0;
  wndclass.cbWndExtra = 0;
  wndclass.hInstance = hInstance;
  wndclass.hIcon = LoadIcon(NULL, IDI_APPLICATION);
  wndclass.hCursor = LoadCursor(NULL, IDC_ARROW);
  wndclass.hbrBackground = (HBRUSH) GetStockObject(WHITE_BRUSH);
  wndclass.lpszMenuName = NULL;
  wndclass.lpszClassName = szAppName;

  RegisterClass(&wndclass);

  hwnd = CreateWindow(szAppName, szAppName, 
    WS_OVERLAPPEDWINDOW | WS_MAXIMIZE, CW_USEDEFAULT, 
    CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, NULL,
    NULL, hInstance, NULL);
  ShowWindow(hwnd, nShowCmd);
  UpdateWindow(hwnd);

  while (GetMessage(&msg, NULL, 0, 0)) {
    TranslateMessage(&msg);
    DispatchMessage(&msg);
  }

  GdiplusShutdown(gdiplusToken);

  return (int) msg.wParam;  // WPARAM is wider than int on 64-bit builds
}

LRESULT CALLBACK WndProc(HWND hwnd, UINT message, 
                         WPARAM wParam, LPARAM lParam) {

  static LOGFONT lf;
  static double gdiDuration = 0;
  static double gdiplusDuration = 0;
  static __int64 gdiGlyphCount = 0;
  static __int64 gdiplusGlyphCount = 0;
  static int timerCounter;
  static bool useGdi = true;

  switch (message) {

    case WM_CREATE:
      lf.lfWidth = 0;
      lf.lfItalic = FALSE;
      lf.lfQuality = 6; //CLEARTYPE_NATURAL_QUALITY
      // LF_FACESIZE is the declared size of lfFaceName, in characters.
      StringCchCopy(lf.lfFaceName, LF_FACESIZE, TEXT("Segoe UI"));
      return 0;

    case WM_LBUTTONDOWN:
      timerCounter = 0;
      gdiDuration = 0;
      gdiplusDuration = 0;
      gdiGlyphCount = 0;
      gdiplusGlyphCount = 0;
      SetTimer(hwnd, ID_TIMER, 100, NULL);
      return 0;

    case WM_TIMER:
      if (timerCounter++ < 100) {
        useGdi = true;
        InvalidateRect(hwnd, NULL, TRUE);
      } else if (timerCounter <= 200) {  // already incremented: 101..200, i.e. 100 GDI+ passes
        useGdi = false;
        InvalidateRect(hwnd, NULL, TRUE);
      } else {
        KillTimer(hwnd, ID_TIMER);
        __int64 frequency;
        QueryPerformanceFrequency((LARGE_INTEGER*)&frequency);
        // Convert accumulated counter ticks to seconds, then to glyphs/sec.
        double gdiRate = (double)gdiGlyphCount /
          (gdiDuration / (double)frequency);
        double gdiplusRate = (double)gdiplusGlyphCount /
          (gdiplusDuration / (double)frequency);
        wchar_t resultString[100];
        // Wide literals and MessageBoxW match the wchar_t buffer.
        swprintf_s(resultString, 100,
          L"GDI:\t%0.5f glyphs/sec\nGDI+:\t%0.5f glyphs/sec",
          gdiRate, gdiplusRate);
        MessageBoxW(hwnd, resultString, L"Glyphs per second", 0);
      }
      return 0;

    case WM_PAINT:

      {

        PAINTSTRUCT ps;
        HDC hdc = BeginPaint(hwnd, &ps);
        RECT clientRect;
        GetClientRect(hwnd, &clientRect);

        lf.lfHeight = -MulDiv(12, GetDeviceCaps(hdc, LOGPIXELSY), 72);
        SelectObject(hdc, CreateFontIndirect(&lf));
        TEXTMETRIC tm;
        GetTextMetrics(hdc, &tm);
        RECT charRect;
        charRect.top = 0;
        charRect.bottom = tm.tmHeight;
        char displayChar = 33;

        Graphics g(hdc);
        g.SetTextRenderingHint(TextRenderingHintClearTypeGridFit);
        SolidBrush brush(Color(255, 0, 0, 0));
        FontFamily fontFamily(L"Segoe UI");
        Font font(&fontFamily, 12, FontStyleRegular, UnitPoint);

        __int64 start;
        __int64 end;
        QueryPerformanceCounter((LARGE_INTEGER*)&start);

        while (charRect.bottom < clientRect.bottom) {
          charRect.left = 0;
          charRect.right = tm.tmMaxCharWidth;
          while (charRect.right < clientRect.right) {
            // PointF takes REAL (float) coordinates; convert from LONG explicitly.
            PointF pointF((REAL)charRect.left, (REAL)charRect.top);
            WCHAR charString[2];
            charString[0] = (WCHAR)displayChar;
            charString[1] = '\0';
            if (useGdi) {
              DrawText(hdc, charString, -1, &charRect, DT_SINGLELINE);
              gdiGlyphCount++;
            } else {
              g.DrawString(charString, -1, &font, pointF, &brush);
              gdiplusGlyphCount++;
            }
            displayChar = displayChar < 126 ? displayChar + 1 : 33;
            charRect.left += tm.tmMaxCharWidth;
            charRect.right += tm.tmMaxCharWidth;
          }
          charRect.top += tm.tmHeight;
          charRect.bottom += tm.tmHeight;
        }
        QueryPerformanceCounter((LARGE_INTEGER*)&end);
        if (useGdi) {
          gdiDuration += (double)end - (double)start;
        } else {
          gdiplusDuration += (double)end - (double)start;
        }

        DeleteObject(SelectObject(hdc, GetStockObject(SYSTEM_FONT)));
        EndPaint(hwnd, &ps);
        return 0;

      }

    case WM_DESTROY:
      PostQuitMessage(0);
      return 0;
  }
  return DefWindowProc(hwnd, message, wParam, lParam);
}

You may notice one thing about how I am timing: while I have moved a lot of the code outside of the timed region, there is still quite a bit inside it. This is another tradeoff that I had to make in measurement. If I timed only the rendering calls themselves, each measured interval would be so short that the overhead and resolution of the timer would account for a larger percentage of error. Timing the whole loop, as the code does now, reduces that error, but it also folds non-rendering code (the loop bookkeeping) into the measurement. Neither option seemed perfect, but the latter seemed the superior approach to me. I did take the time to measure it both ways, however. Timing each individual rendering call produced slightly lower performance numbers: approximately 89,000 glyphs/second with GDI and approximately 10,000 glyphs/second with GDI+. However, the relationship between the two was very similar, so my confidence was not shaken.
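
The two timing strategies can be sketched in portable C++ as follows. This is an illustration under stated assumptions, not the measurement code: `std::chrono::steady_clock` stands in for `QueryPerformanceCounter`, `draw` stands in for a single `DrawText` or `DrawString` call, and the names `glyphsPerSecond`, `timeWholeLoop`, and `timeEachCall` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Same arithmetic as the WM_TIMER handler: glyphs / (ticks / ticksPerSecond).
double glyphsPerSecond(std::int64_t glyphCount, std::int64_t elapsedTicks,
                       std::int64_t ticksPerSecond) {
    assert(elapsedTicks > 0 && ticksPerSecond > 0);
    double seconds = static_cast<double>(elapsedTicks) /
                     static_cast<double>(ticksPerSecond);
    return static_cast<double>(glyphCount) / seconds;
}

// Coarse timing: two clock reads around the whole loop. Loop bookkeeping is
// included in the measurement, but the clock is read only twice in total.
template <typename DrawFn>
Clock::duration timeWholeLoop(DrawFn draw, int calls) {
    auto start = Clock::now();
    for (int i = 0; i < calls; ++i) draw();
    return Clock::now() - start;
}

// Fine timing: two clock reads around every call. Loop bookkeeping is
// excluded, but clock overhead and resolution error are paid once per call.
template <typename DrawFn>
Clock::duration timeEachCall(DrawFn draw, int calls) {
    Clock::duration total{};
    for (int i = 0; i < calls; ++i) {
        auto start = Clock::now();
        draw();
        total += Clock::now() - start;
    }
    return total;
}
```

With a very cheap `draw`, the per-call variant tends to report a longer total because the clock reads themselves are being timed, which would be consistent with the slightly lower glyphs/second figures reported for that approach.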

Of course, if you are interested at all in this line of thought, you may also be interested in rendering performance using Windows Presentation Foundation ("Avalon"). However, that is someplace I am not willing to go at this point. First of all, it would be completely unfair to start benchmarking beta code. I certainly wouldn't want to benchmark beta code on a beta operating system running on beta WDDM drivers! Furthermore, as DirectX 10 hardware acceleration begins to arrive in consumer products, this will change dramatically. I remember, back during the Windows XP (Whistler) beta, the remarkable difference we experienced in some of the new 2D effects (such as the "fade to grey" effect when you are logging out) once video cards and drivers began to support 2D hardware acceleration...