Funny, It Worked Last Time
... and other odd mutterings of a performance junkie
Encodings in Strings are Evil Things (Part 8)
Posted over 8 years ago by ryanmy (8 comments)
Going from a legacy format to Unicode is fairly simple; in addition to combining characters, Unicode also provides an array of compatibility characters. Compatibility characters are canonically equivalent to a sequence of one or more other Unicode characters; they usually exist so that a single codepoint corresponds to a character in some older standard. For example, ISO 8859-2 defines 0xA5 as a capital letter L with a caron accent (Ľ). The "simple" equival...
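To make the single-codepoint-versus-sequence idea concrete, here is a minimal sketch. It assumes the ISO 8859-2 code chart, where Ľ sits at byte 0xA5, and the Unicode values U+013D (Ľ), U+004C (L) and U+030C (combining caron). The function names `latin2_to_unicode` and `decompose` are hypothetical, and the "tables" are single hand-written entries for illustration, not a real conversion or normalization library.

```cpp
#include <string>

// Hypothetical one-entry mapping from ISO 8859-2 to Unicode.
// Only ASCII and the single byte 0xA5 are handled; everything else
// becomes U+FFFD (REPLACEMENT CHARACTER) in this sketch.
char32_t latin2_to_unicode(unsigned char b) {
    if (b < 0x80) return static_cast<char32_t>(b);    // ASCII maps 1:1
    if (b == 0xA5) return static_cast<char32_t>(0x013D); // L with caron
    return static_cast<char32_t>(0xFFFD);              // not covered here
}

// Hypothetical one-entry decomposition: the precomposed U+013D is
// equivalent to the base letter U+004C followed by U+030C.
// Any other codepoint is returned unchanged.
std::u32string decompose(char32_t c) {
    if (c == static_cast<char32_t>(0x013D)) {
        std::u32string seq;
        seq += static_cast<char32_t>(0x004C);  // LATIN CAPITAL LETTER L
        seq += static_cast<char32_t>(0x030C);  // COMBINING CARON
        return seq;
    }
    return std::u32string(1, c);
}
```

Calling `decompose(latin2_to_unicode(0xA5))` yields the two-codepoint sequence, while `latin2_to_unicode(0xA5)` alone yields the single precomposed codepoint; both represent the same letter.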
Encodings in Strings are Evil Things (Part 7)
Posted over 8 years ago by ryanmy (1 comment)
Imagine that you've allocated a byte array to recv() data from a TCP socket. If we know that the content is UCS-4, the natural urge is to cast the array to an unsigned long * and iterate over it... except that you can't. Or, at least, you shouldn't. If that byte array isn't suitably aligned for 32-bit accesses, the code will either run slowly (on x86 and AMD64) or crash (on IA-64, unless SetErrorMode() is called to force OS alignment fixups, in which case it will run extremely slowly). Of cour...
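One common way to sidestep the alignment problem is to copy each 4-byte unit into a properly aligned local variable with `memcpy` instead of casting the buffer pointer. The sketch below is an assumption about how that might look, not necessarily the approach the full post takes: the helper name `read_ucs4_native` is hypothetical, and it assumes the incoming data is already in the machine's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: read UCS-4 code units from a byte buffer that may
// not be 4-byte aligned. Assumes native byte order (an assumption of this
// sketch). Trailing bytes that do not form a full 4-byte unit are ignored.
std::vector<std::uint32_t> read_ucs4_native(const unsigned char* buf,
                                            std::size_t len) {
    std::vector<std::uint32_t> out;
    out.reserve(len / 4);
    for (std::size_t i = 0; i + 4 <= len; i += 4) {
        std::uint32_t cp;
        // memcpy has no alignment requirement on its source, so this is
        // safe even when buf + i is not a multiple of 4.
        std::memcpy(&cp, buf + i, sizeof cp);
        out.push_back(cp);
    }
    return out;
}
```

Compilers typically turn the fixed-size `memcpy` into a plain load where the hardware allows it, so the cost of doing it the safe way is usually small.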