Funny, It Worked Last Time

... and other odd mutterings of a performance junkie

October, 2004

  • Funny, It Worked Last Time

    Encodings in Strings are Evil Things (Part 1)

    What is a string? About six months ago at the Game Developers Conference in San Jose, I sat in on a talk about performance tuning in Xbox games. The presenter had a slide that read: "Programmers love strings. Love hurts." This was shown while he described a game which was using a string identifier for every object in the game world and hashing on them, and was incurring a huge performance hit from thousands of strcmp()s each frame. I nodded -- but my mind was thinking......
  • Funny, It Worked Last Time

    Encodings in Strings are Evil Things (Part 2)

    At the end of the last post, we reduced the abstract concept of "string" down to an "ordered sequence of Unicode code points." (We did so by choosing to actively ignore glyph information, but we'll be coming back to it later.) Unicode code points are simply numbers; of course, numbers have to be reduced to binary to be stored in a computer. Someone who is reading a string needs to use the exact same encoding scheme. And not all encoding schemes are equal......
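
    To make "reduced to binary" concrete, here is a minimal sketch of two common encoding schemes applied to the same code point. The function names encode_utf8 and encode_utf16 are made up for this illustration and are not taken from the posts. The byte layouts follow the standard UTF-8 and UTF-16 definitions.

```cpp
#include <cstdint>
#include <string>

// Illustrative helpers (names are assumptions, not from the series).

// True for values that are not Unicode scalar values:
// beyond U+10FFFF, or inside the surrogate range U+D800..U+DFFF.
bool is_invalid_scalar(std::uint32_t cp) {
    return cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for invalid input.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (is_invalid_scalar(cp)) {
        return out;
    }
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (1 or 2 sixteen-bit code units).
// Returns an empty string for invalid input.
std::u16string encode_utf16(std::uint32_t cp) {
    std::u16string out;
    if (is_invalid_scalar(cp)) {
        return out;
    }
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;  // 20 bits remain, split across a surrogate pair
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}
```

    U+00C0 (À) comes out as the two bytes C3 80 under UTF-8 but as the single 16-bit unit 00C0 under UTF-16, which is why a reader who assumes the wrong scheme gets the wrong code points back.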
  • Funny, It Worked Last Time

    Encodings in Strings are Evil Things (Part 4)

    In our last episode, we established that we wouldn't be able to make a true std::string replacement and still handle variable-width encodings. So, we started with the beginning lines of an rmstring class. However, this doesn't mean we are going to dispense with std::string entirely! And, as it turns out, compatibility with it is both easier and harder than actually making a std::string, depending on what you're implementing and where......
  • Funny, It Worked Last Time

    Encodings in Strings are Evil Things (Part 3)

    Yesterday, we took the definition of string as an ordered sequence of Unicode code points, and explored various schemes for encoding and decoding code point indices on a binary computer. At the end, we had a new definition for string -- a stream of bits, and some type of information identifying the encoding scheme used to interpret the bits as a stream of Unicode code points. Today, since I'm a coder, we'll be starting a C++ implementation of a string library based on this definition....
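
    As a rough picture of that definition, the sketch below pairs a byte buffer with a tag naming its encoding. It is not the class developed in the series: EncodedString, Encoding, and every member name here are hypothetical, chosen only to show the "bits plus encoding identifier" idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical tag naming the scheme used to interpret the stored bytes.
enum class Encoding { Utf8, Utf16LE, Utf32LE };

// Hypothetical illustration type: raw storage plus the information
// needed to turn that storage back into code points.
class EncodedString {
public:
    EncodedString(Encoding enc, std::vector<std::uint8_t> bytes)
        : enc_(enc), bytes_(std::move(bytes)) {}

    Encoding encoding() const { return enc_; }
    std::size_t byte_size() const { return bytes_.size(); }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

    // Equal only when both the tag and the bytes match. The same text
    // stored under two different encodings compares unequal here; a
    // real library would have to decode to code points to compare.
    friend bool operator==(const EncodedString& a, const EncodedString& b) {
        return a.enc_ == b.enc_ && a.bytes_ == b.bytes_;
    }

private:
    Encoding enc_;
    std::vector<std::uint8_t> bytes_;
};
```

    Note that byte_size() is deliberately not called length(): with a variable-width encoding, the number of bytes and the number of code points are different questions.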
  • Funny, It Worked Last Time

    Encodings in Strings are Evil Things (Part 5)

    However, regardless of whether pre-composed characters are favored or not, there are some character sequences which do not have pre-composed equivalents and must be represented using combining characters. Of course, our problem here is that most programmers don't think about accents as being distinct elements to iterate through! When you hit the right arrow in Microsoft Word to skip over an À, you don't go first to an A and then to the A's accent -- you move past the whole "character." (Unico...
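
    The À example can be made concrete with code points: the pre-composed form is the single code point U+00C0, while the decomposed form is U+0041 (A) followed by U+0300 (combining grave accent). The sketch below counts cursor stops by skipping marks in the Combining Diacritical Marks block (U+0300 to U+036F). This is a deliberate simplification with made-up function names; full grapheme segmentation involves many more ranges and rules than this one block.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) is treated as combining. Real text needs more.
bool is_combining_diacritic(std::uint32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Count the positions a cursor would stop at: every code point that is
// not a combining mark starts a new "character". A combining mark with
// no preceding base is not counted by this simple version.
std::size_t count_cursor_stops(const std::vector<std::uint32_t>& code_points) {
    std::size_t stops = 0;
    for (std::uint32_t cp : code_points) {
        if (!is_combining_diacritic(cp)) {
            ++stops;
        }
    }
    return stops;
}
```

    Both {0x00C0} and {0x0041, 0x0300} yield one cursor stop, even though the second sequence holds two code points.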