
January 2005 - Posts

Going from a legacy format to Unicode is fairly simple; in addition to combining characters, Unicode also provides an array of compatibility characters. Compatibility characters are canonically equivalent to a sequence of one or more other Unicode characters; they usually exist so that a single codepoint corresponds to a character in some older standard. For example, ISO 8859-2 assigns 0xA5 to a capital letter L with a caron (Ľ). The "simple" equival Read More...
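As a rough illustration of the excerpt above, the following sketch maps a legacy byte to a single precomposed codepoint and then to its equivalent base-plus-combining sequence. The helper names `latin2_to_precomposed` and `decompose` are hypothetical, and the tables cover only two ISO 8859-2 bytes (0xA5 and 0xB5); a real converter would use the full mapping table and the Unicode decomposition data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map one ISO 8859-2 byte to a single Unicode
 * codepoint. Only two non-ASCII entries are filled in for illustration. */
static uint32_t latin2_to_precomposed(unsigned char b)
{
    if (b < 0x80)
        return b;                 /* ASCII range maps to itself */
    switch (b) {
    case 0xA5: return 0x013D;     /* LATIN CAPITAL LETTER L WITH CARON */
    case 0xB5: return 0x013E;     /* LATIN SMALL LETTER L WITH CARON */
    default:   return 0xFFFD;     /* not covered by this sketch */
    }
}

/* Hypothetical helper: write the equivalent sequence for cp into out
 * (room for two codepoints) and return how many were written. */
static size_t decompose(uint32_t cp, uint32_t out[2])
{
    switch (cp) {
    case 0x013D:                  /* L + COMBINING CARON */
        out[0] = 0x004C; out[1] = 0x030C;
        return 2;
    case 0x013E:                  /* l + COMBINING CARON */
        out[0] = 0x006C; out[1] = 0x030C;
        return 2;
    default:                      /* no decomposition known to this sketch */
        out[0] = cp;
        return 1;
    }
}
```

Calling `latin2_to_precomposed(0xA5)` yields U+013D, and passing that to `decompose` yields the two-codepoint sequence U+004C U+030C. The single-codepoint form is what makes the one-byte-to-one-codepoint conversion straightforward.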
8 Comments
Imagine that you've allocated a byte array to recv() data into from a TCP socket. If you know that the content is UCS-4, the natural urge is to cast the array to an unsigned long * and iterate over it... except that you can't. Or, at least, you shouldn't. If that byte array isn't suitably aligned for 32-bit accesses, the code will either run slowly (on x86 and AMD64) or crash (on IA-64, unless SetErrorMode() is called to force OS alignment fixups, in which case it will run extremely slowly). Of cour Read More...
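One common way to avoid the misaligned cast is to copy or assemble each 32-bit unit from the bytes. The sketch below is not taken from the original post; the function names are hypothetical, and the big-endian variant assumes the sender transmits UCS-4 in network byte order, which the excerpt does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the i-th 32-bit unit from buf in host byte order. memcpy makes
 * no alignment assumption about the source, so buf may start at any
 * address. */
static uint32_t load_u32_host(const unsigned char *buf, size_t i)
{
    uint32_t v;
    memcpy(&v, buf + i * 4, sizeof v);
    return v;
}

/* Read the i-th 32-bit unit from buf, assuming big-endian (network)
 * byte order on the wire. Only single-byte accesses are performed. */
static uint32_t load_u32_be(const unsigned char *buf, size_t i)
{
    const unsigned char *p = buf + i * 4;
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For a buffer filled by recv(), a loop such as `for (i = 0; i < len / 4; i++) cp = load_u32_be(buf, i);` visits each UCS-4 unit without ever forming an unsigned long * to the byte array. Compilers typically turn the fixed-size memcpy into a plain load on targets that tolerate unaligned access.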
1 Comment