December, 2008

  • Eric Gunnerson's Compendium

    Benchmarking, C++, and C# Micro-optimizations


    Two posts (1 2) on C# loop optimization got me thinking recently.

    Thinking about what I did when I first joined Microsoft.

    Way back in the spring of 1995 or so (yes, we did have computers back then, but the Internet of the time really *was* just a series of tubes), I was on the C++ compiler test team, and had just picked up the responsibility for running benchmark tests on various C++ compilers. I would run compilation speed and execution speed tests in controlled environments, so that we could always know where we were.

    We used a series of “standard” benchmarks – such as Spec – and a few of our own.

    Because execution speed was one of the few ways (other than boxes with lots of checkmarks) that you could differentiate your compiler from the other guy’s, all the compiler companies invested  resources at being faster at the benchmarks.

    The starting point was to look at the benchmark source, the resultant IL, and the final machine code, and see if you could see any opportunity for improvement. Were you missing any optimization opportunities?

    Sometimes, that wasn’t enough, so some compiler writers (*not* the ones I worked with) sometimes got creative.

    You could, for example, identify the presence of a specific expression tree that just “happened to show up” in the hot part of of a benchmark, and bypass your usual code generation with a bit of hand-tuned assembly that did things a lot faster.

    Or, with a little more work, you could identify the entire benchmark, and substitute another bit of hand-tuned assembly.

    Or, perhaps that hand-tuned assembly doesn’t really do *all* the work it needed to, but took a few shortcuts but still managed to return the correct answer.

    For some interesting accounts, please text “compiler benchmark cheating” to your preferred search engine.

    As part of that work, I got involved a bit in the writing and evaluation of benchmarks, and I thought I’d share a few rules around writing and interpreting micro-benchmarks. I’ll speak a bit about the two posts – which are about looping optimizations in C# – along the way. Just be sure to listen closely, as I will be speaking softly (though not in the Rooseveltian sense…)

    Rule 0: Don’t

    There has always been a widespread assumption that the speed of individual language constructs matter. It doesn’t.

    Okay, it does, but only in limited cases, and frankly people devote more time to it than it deserves.

    The more productive thing is to follow the agile guideline and write the simplest thing that works. And note that “works” is a bit of a weasely word here – if you write scientific computing software, you may have foreknowledge about what operations need to be fast and can safely choose something more complicated, but for most development that is assuredly not true.

    Rule 1: Do something useful

    Consider the following:

    void DoLoop()
        for (int x = 0; x < XMAX; x++)
            for (int y = 0; y < YMAX; y++)


    void TimeLoop()
        // start timer
        for (int count = 0; count < 1000; count++)
        // stop timer

    if XMAX is 1000, YMAX is 1000, and the total execution time is 0.01 seconds, what is the time spent per iteration?

    Answer: Unknown.

    The average C++ optimizer is smarter that this. That nested loop has no effect on the result of the program, so the compiler is free to optimize it out (the .NET JIT may not have time to do this).

    So, you modify the loop to be something like:

    void DoLoop()
        int sum;

        for (int x = 0; x < XMAX; x++)
            for (int y = 0; y < YMAX; y++)
                sum += y;

    The loop now has some work done inside of it, so the loop can’t be eliminated.

    Rule 2: No, really. Do something useful

    However, the numbers won’t change. The call to DoLoop() has no side effects, so the entire call can be safely eliminated.

    To make sure your loop is really a loop, there needs to be a side effect. The best bet is to have a value returned from the method and write it out to the console. This has the added benefit of giving you a way of checking whether things are working correct.

    Rule 3: Benchmark != Real world

    There are lurking effects that invalidate your results. Your benchmark is likely tiny and places very different memory demands on the system than your real program does.

    Rule 4: Profile, don’t benchmark


    C# loop optimization

    If you are writing code that needs the utmost in speed, there is an improvement to be had using for rather than foreach. There is also improvement to be had using arrays rather than lists, and unsafe code and pointers rather than array indexing.

    Whether this is worthwhile in a specific case depends exactly on what the code is doing. I don’t see a lot of point in spending time measuring loops when you could spend time measuring the actual code.

  • Eric Gunnerson's Compendium

    Holiday light project 2008 in pictures


    A trip through the new project in pictures:

    I was late in getting started on the project due to putting finished touches on the office, but I ended up with this wonderful new workbench. Quite the step up from the old door on top of shelf units that I've used for the last 35 years or so (really).

    Left to right, we have the Tektronix dual-channel 20MHz oscilloscope (thank you eBay), a bench power supply, a perfboard with sockets on it in front of my venerable blue toolbox (also 35+ years old), a outlet strip with a power supply plugged into it, a perfboard, a STK500 microcontroller programmer, a weller soldering staioin, and a fluke voltmeter.

    This the project in its first working version. On the far left partially clipped is the power supply. The upper-left part of the prototype board (the white part) has the zero crossing circuit, and the upper-right has a solid-state relay. A brown wire takes the zero-crossing signal to the microcontroller on the development board, and a brown wire takes the signal back to the relay. The Atmel AVR microcontroller that I use comes in a lot of different sizes, so the development board has any different sockets to support the. On the far-right is a white serial line which leads to my laptop - the AVR is programmed over a serial link.

    Back to the zero-crossing circuit. To switch AC power, you use a semiconductor device known as a triac. The triac has a weird characteristic - once you turn it on, it stays on until the voltage goes back to zero. That happens 120 times per second, so to implement diming you need to control when you turn on the power for for each half-cycle.

    Here's a picture that should make it easier to understand.

    The wavy part is the AC sine wave, and the nice square pulse is the zero-crossing signal, which goes high whenever the AC voltage is low enough. The microcontroller gets interrupted when the zero-crossing signal goes high, and then waits a little time until just after the real zero crossing happens.

    If it turned the output right at that point, it would stay on for the whole half-cycle, which would means the light was on full bright. If it never turned it on, it would mean the light was (does anybody know the answer? class?) off. If it turned it off halfway in between, the light would be at half brightness. To implement the 32 levels of brightness means dividing the half-cycle into 32 divisions of each area, corresponding to areas of equal power.

    (To be better, I should take into account the power/luminance curve of the incandescent bulb that I'm using and use that to figure out what the delays are. Perhaps the next version).

    To do this for multiple channels, you end up with code that does the following:

    1. Check all channels to see which ones should be turned on at the current time.
    2. Figure out when the next time is to check.
    3. Set an interrupt to that time.

    That gives us a set of lights stuck at a given dim level. To animate, you need to change that dim level over time. That is handled at two levels.

    The interrupt-level code handles what I call a dim transition. At least, that's what I call it now, since I didn't have a name for it before. We have a vector of current dim levels, one for each channel, and a vector of increments that are added to the current dim vector during each cycle.

    So, if we want to slowly dim up channel 1 while keeping all the others constant, we would set dimIncrement[0] to 1 and set the count to 31. 31 cycles later, channel 1 would be at full brightness.

    If we want to do a cross-fade, we set two values in the increment vector.

    That all happens underneath the covers - the main program loop doesn't know about it. The main program loop figures out what will happen next after the current dim transition, and then blocks.

    My early controllers were all table-based, with the tables pre-computed. This was because I was writing in assembler. The current system could also use that approach, but with only 2K of program memory, the procedural approach is more compact, though it is considerably harder to debug. I have a C# program I use to create and test the animations, but I need to rewrite it to use DirectX because I need a 120Hz frame rate to match what I the dimming hardware does.

    To get back to the zero-crossing circuit, I first built this circuit using a wall wart with a switching power supply. Such power supplies are small and efficient, but put a lot of noise into the transformer side. I wasted a lot of time on this, and ultimately switched back to a conventional wall wart (from an old Sony discman I used to have) with a linear power supply. Problem solved.

    Back to pictures:

    Here's the completed controller board.

    In the center is the AVR 2313V microcontroller. The U-shape is the solid-state relays that switch the AC for the lights. These are nice little Panasonic AQH2223 relays, which switch 600mA (about 75 watts) (though you can get the in versions that do 150 watts), are tiny, and, most importantly, they're less than $2 each.

    Note that these do not have built-in zero-crossing circuits built in. Most solid-state relays do, but you can't use those to do dimming.

    The top has the one transistor used to generate the zero-crossing circuit, a 7805 voltage regulator to provide +5V to the circuit, and a few passive components.

    Careful viewers will notice that the upper-right socket is empty. That's because it's only a 15-channel controller, but I used 16-pin sockets.  The blue wire that holds the AC socket wires on is wire-wrap wire that I had lying around - these are hot-glued down later on. The two black wires provide the rectified voltage (about 15V IIRC) from the wall-wart.

    The controller board is in a small plastic box designed to hold 4x6 photos, and then that's inside of a larger box. This lets me keep all of the electrical connections inside of the box. It's not really required for safety, but if you have a lot of exposed plugs and some water on them, you can get enough leakage from them to trip the GFI on your circuit. So having them all inside is better.

    The box will be enclosed in a white kitchen garbage bag for weather protection when it's outside. That seems low-tech, but has worked will in all of my controllers over the years.


    Projects like this often come down to cabling. Each light needs a wire that goes from the controller out to the light. I did a random layout of lights on the tree, and put them on 5 different ropes so they could easily be pulled up the tree on a pulled.

    Here are the 15 lights and all the extension cords required to hook them up. In this case, I used 24 15' extension cords because it was cheaper and easier than building the cables up from scratch.

    That's all for now.

  • Eric Gunnerson's Compendium

    Holiday Lights 2008


    Today, I took a break in the snow and finished the installation of the new light display. It's functional, except for one light that isn't working.  I've been extra busy this year, so while the main displays are up, there aren't as many additional lights as I would like to have.


    Our recent snowstorm has changed the look quite a bit - normally you only get a little light from the streetlight on the left, but now there's a ton.

    On the left, there are 8 strings of multipcolored LEDs in a circle around the light pole. To the right in front of the truck are some other lights. Hiding behind the truck is the first animated display, the "tree of lights". The big tree (about 40' tall) has red leds around the trunk, and features to animated displays. At the top is the second animated display, the "ring of fire", arrayed on the tree is the new display. To the right you can see the original animated display, santa in the sleigh and on the roof. Finally, outlining the house is a red/green/blue/white string, the last animated display.

    Tree of Lights

    16 channel sequenced controller, about 1500 lights total. From base of tree to top is about 14'.

    The controller is 68HC11 based.














    Ring of Fire

    Ring of Fire is 16 high-output red LEDs driven by a custom 16 channel controller, supporting 16 dim levels per LED.

    The controller is Atmel AVR based.

    I wrote a fair bit about it last year.








    The display that started it all. It animates as follows:

    1. Blue lights in foreground strobe towards santa.
    2. Reindeer, sleigh, and santa appear.
    3. Santa leaves sleight and hops up on the roof edge.
    4. Santa goes up to the peak near the chimney.
    5. Santa disappears, and then his hat disappears soon after.

    Then the whole things reverses itself.

    The display itself is painted plywood, with about 800 lights in total. After 12 years the lights had gotten a bit dim, so this year we replaced all of them. The santa at the top of the roof is usually a it more distinct, but he has a big snow beard this year.

    The controller is based on the Motorola 68HC11, running 8 channels.

    House Lights

    The house lights are 4 individual strands in red, green, blue, and white, with a 4-channel controller that dims between the colors. So, the house changes between colors.

    The controller is based on the Motorola 68HC11, with 4 channels, this time dimmable.

    Tree Lights

    The tree lights are the new display for this year.

    These are jumbo lights lit up with C7 (7 watt) bulbs inside of of a colored plastic housing. They really don't show up that well in the picture because of all the light coming off the snow, but even so, I think I will likely need to upgrade the bulbs in them to something brighter (say, in the 25 watt range). And I think I will go with clear bulbs - having both colored bulbs and colored lenses works well for yellow and orange but the blues and greens are really dark.

    The controller can support up to about 100 watts per channel, though I'm not sure my power budget can support it.

    The controller is Atmel AVR based (my new platform of choice), and the code is written in C. There are 15 channels, and each of them has 32 dimming levels. 

    You can find a lot more boring info here.

  • Eric Gunnerson's Compendium

    Retro gaming at its best...


    Back when I was in high school, in the early 1980s, was when I was first introduced to computer games.  What we called "arcade games" at the time.

    There were three common systems for this.

    First of all, there was the TRS-80. You can read about all the exciting detail in the link, but the system could display text at 64 x 18, and graphics at a resolution of 128 x 48 if you used the special 2x3 grid characters. Each pixel was either black or an interesting blue/white (the phosphor used in the monitor was not pure white).

    In addition to not having any storage built in, it also had no sound output whatsoever. However, it was well-renowned for putting out tremendous amounts of RF interference, and somebody discovered that if you put an AM radio next to it, you could, through use of timing loops, generate interference that was somewhat related to what was going on in the game.

    But it was cheap. Not cheap enough for me to afford one, but cheap.

    The second system was the Apple II, sporting 280x192 high-resolution color graphics. Well, 'kindof-color' graphics - any pixel could be white or black, but only odd ones could be green or orange, and only even ones could be violet or blue.

    The sound system was a great improvement over the TRS-80, with a speaker that you could toggle to off or on. Not off or on with a tone, just off or on - any tone had to be done in software synthesis.

    Finally, the third system was the Atari 800 and 400. It was far more sophisticated - not only did it have the 6502 as the Apple II did, it had a separate graphics processor (named Antic) that worked its way through a display list, a television interface processor that implemented player-missile graphics (aka "sprites") and collision detection in hardware, and a third custom chip that handles the keyboard, 4-channel sound output, and controllers (we called them "paddles" and "joysticks" back then).

    It was light-years beyond the Apple in sophistication, which only shows you the importance of cheapness over elegance of design and implementation.

    Oh, and you could plug *cartridges* into it, so you didn't have to wait for your game to load from the cassette (or later, floppy disk) before you played it.

    My brother-in-law bought an Atari 400 (the younger sibling of the 800), and of course he had a copy of Star Raiders, arguably one of the first good first-person shooters.  He also had a copy of Miner 2049er, a 2-D walker/jumper that's a little bit like donkey kong and a bit like pac man.

    It was very addictive, and put 10 levels into a small cartridge.

    It was followed in 1984 by "Bounty Bob Strikes Back", featuring 30 levels.

    I played both a fair bit until we finally broke down and sold our Atari 800XL in the early 1990s.

    And now, Big Five software has released both games in emulator form, so you can run them on your PC.

    Marvel to the advanced graphics, and wonderful sound. Note that the gameplay and addiction is still there.

    I have to run it at 640x480 size or the keys aren't sensed correctly. Play Miner first, as the controls are slightly different, and you'll get confused.

    Highly recommended.

Page 1 of 1 (4 items)