I'm not a Klingon
Shawn Steele's thoughts about Windows and .Net Framework globalization APIs
Posts
Converting text file code pages
Posted 3 months ago by Shawn Steele - MSFT
4 Comments
I've said "use Unicode" a lot, but sometimes there are programs that aren't doing what you'd expect, and outputting stuff in a different code page. Additionally, you might sometimes encounter a text file that was created using the system code page of...
What is Title Case?
Posted over 4 years ago by Shawn Steele - MSFT
4 Comments
Disclaimer: I'm not an English teacher (that's my mom), so I'm sure my description of title casing in English probably has exceptions/variations. Title casing has an interesting history in computer programming. Programmers like to use CamelCase to...
Writing "fields" of data to an encoded file.
Posted over 4 years ago by Shawn Steele - MSFT
2 Comments
The moral here is "Use Unicode," so you can skip the details below if you want :) A common problem when storing string data in various fields is how to encode it. Obviously you can store the Unicode as Unicode, which is a good choice for an XML file...
Why can't we strip the diacritics?
Posted over 6 years ago by Shawn Steele - MSFT
5 Comments
We have some "best-fit" behavior which we generally consider to be "bad". Any loss of data is generally a bad thing, so we recommend storing data in Unicode (so you don't lose anything). Assuming you can't use Unicode, why is it so bad to just make everything...
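The excerpt is cut off, but going by the post's title, the question is about stripping diacritics. As a rough sketch of why that loses data (not a recommendation), the example below removes combining marks after decomposition, so two different strings become indistinguishable. `StripDiacriticsDemo` and `StripDiacritics` are hypothetical names chosen for this illustration; the normalization and `CharUnicodeInfo` calls are standard .Net APIs.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class StripDiacriticsDemo
{
    // Decomposes the text, drops combining marks, then recomposes what is left.
    public static string StripDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                result.Append(c);
            }
        }
        return result.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        string accented = "r\u00E9sum\u00E9"; // the noun, with two acute accents
        string plain = "resume";              // the verb
        // After stripping, the two different words can no longer be told apart.
        Console.WriteLine(StripDiacritics(accented) == plain);
    }
}
```

Once the marks are gone there is no way to recover the original text, which is the kind of data loss the excerpt warns about.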
Encoder/Decoder Encoding fallbacks fail after 2GB of data has been converted
Posted over 6 years ago by Shawn Steele - MSFT
0 Comments
We have an unfortunate bug in .Net v2.0+ that causes encoding or decoding of more than 2GB of data to fail. That's a lot of data, but it still shouldn't do that. This is a problem with our built-in fallbacks. Ironically, if you encounter bad bytes...
How do I get HKSCS 2004 characters from Big-5 in .Net?
Posted over 6 years ago by Shawn Steele - MSFT
3 Comments
Well, that's pretty tricky. We provide the Microsoft Character Code Conversion Routines For HKSCS-2004 functions, but those are intended for use with unmanaged code. The fundamental problem is that these "HKSCS" characters were in use prior to the...
Please avoid UTF-7
Posted over 6 years ago by Shawn Steele - MSFT
1 Comment
UTF-7 inherently has some of the security issues that concern people about encodings. For example, by shifting in & out of the base64 mode one can create multiple representations of the same string, enabling spoofing and other problems. UTF-7 is primarily...
Some Reasons to Make Your Application Unicode
Posted over 6 years ago by Shawn Steele - MSFT
3 Comments
[Updated Mar 30 2007: Mike pointed out errors which I've corrected] Many applications are "still" ANSI and can't handle Unicode. We (Microsoft) have even released non-Unicode applications reasonably recently, even though we should know better. In particular...
A History of Code Pages or What Made Code Page XXXX (or many other computer things) The Way It Is?
Posted over 6 years ago by Shawn Steele - MSFT
1 Comment
Disclaimer: This is mostly my conjecture, so I could be completely wrong about some of this, but it seems plausible to me. I’m aiming for the general concepts here, not to start a discussion about the specific details of the history of code pages. ...
Expected names of Microsoft Windows "ANSI" Code Pages (Encodings)
Posted over 7 years ago by Shawn Steele - MSFT
0 Comments
I was asked about our use of the Windows "ANSI" code page names, as used in things like MIME types, HTTP Content-Type tags, etc. Each "code page" has a name that most accurately round trips back to the same code page, which I've listed as the "preferred...
Example of overriding your own Encoding.
Posted over 7 years ago by Shawn Steele - MSFT
7 Comments
Previously I wrote about the Best Way to Make Your Own Encoding, but didn't include an example, so today I'm including an example of a replacement Encoding. I also included an EncoderFallback example, which replaces unknown characters with numerical...
Best Way to Make Your Own Encoding
Posted over 7 years ago by Shawn Steele - MSFT
5 Comments
Martin recently asked what the best way is to roll his own encoding in .Net 2.0, in particular whether he can override Encoding/Encoder/Decoder or should write his own StreamWriter. #1 is, of course, to use Unicode :), but apparently Martin doesn't have...
Encoding.GetEncodings() has a couple "duplicate" names
Posted over 7 years ago by Shawn Steele - MSFT
4 Comments
The Microsoft .Net v2.0 Encoding.GetEncodings() method returns a complete list of supported encodings, uniquely distinguished by code page. Note that in general I consider the code page number to be a poor way to exchange code page information since its...
What's with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()? Part 2
Posted over 7 years ago by Shawn Steele - MSFT
1 Comment
A little over a year ago I wrote What's with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()? to address the question "Why does GetMaxCharCount(1) for my favorite Encoding return 2 instead of 1?" (Short answer is that the Decoder/Encoder could...
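As a rough illustration of the question above, the sketch below compares the worst-case figures with the exact counts for one string. The names `MaxCountDemo` and `MaxCountsCoverExact` are made up for this example; the `Encoding` calls are the standard System.Text ones. It only aims to show that the "Max" methods are upper bounds for sizing buffers, not predictions of the exact length.

```csharp
using System;
using System.Text;

public static class MaxCountDemo
{
    // Returns true when the worst-case estimates cover the exact sizes for 'text'.
    public static bool MaxCountsCoverExact(Encoding encoding, string text)
    {
        int exactBytes = encoding.GetByteCount(text);
        int maxBytes = encoding.GetMaxByteCount(text.Length);

        byte[] bytes = encoding.GetBytes(text);
        int exactChars = encoding.GetCharCount(bytes);
        int maxChars = encoding.GetMaxCharCount(bytes.Length);

        Console.WriteLine("bytes: exact {0}, max {1}", exactBytes, maxBytes);
        Console.WriteLine("chars: exact {0}, max {1}", exactChars, maxChars);
        return maxBytes >= exactBytes && maxChars >= exactChars;
    }

    public static void Main()
    {
        Console.WriteLine(MaxCountsCoverExact(Encoding.UTF8, "A"));
    }
}
```

Use the "Max" values to allocate a buffer that is guaranteed to be big enough, and the exact counts (or the return value of the conversion call) to know how much of it was actually used.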
Change to Unicode Encoding for Unicode 5.0 conformance
Posted over 7 years ago by Shawn Steele - MSFT
3 Comments
The behavior for UTF8Encoding, UnicodeEncoding and UTF32Encoding has changed in Windows Vista to conform better to the Unicode 5.0 requirements for Unicode Encodings. [23 July 2007: This behavior has now also been applied to .Net 2.0 with the MS07-040 update...
Best Fit in WideCharToMultiByte and System.Text.Encoding Should be Avoided
Posted over 7 years ago by Shawn Steele - MSFT
7 Comments
Windows and the .Net Framework have the concept of "best-fit" behavior for code pages and encodings. Best fit can be interesting, but often it's not a good idea. In WideCharToMultiByte() this behavior is controlled by a WC_NO_BEST_FIT_CHARS flag. In .Net...
What's my Encoding Called?
Posted over 8 years ago by Shawn Steele - MSFT
1 Comment
There is a bit of confusion about the System.Text.Encoding names, primarily "Which name do I use for my Encoding?" The Encoding class has 3 name properties: BodyName, WebName and HeaderName, and the EncodingInfo objects returned by Encoding.GetEncodings...
Code Page 21027 "Extended/Ext Alpha Lowercase"
Posted over 8 years ago by Shawn Steele - MSFT
1 Comment
I was playing with code pages and ran into an interesting case: Code Page 21027 - Ext Alpha Lowercase. This code page has some interesting behavior. It looks like a Japanese EBCDIC code page, however it's kind of "missing" mappings for some characters...
Encoding/Decoding/Crypting and buffer lengths
Posted over 8 years ago by Shawn Steele - MSFT
0 Comments
This code snippet has a somewhat common bug. I've seen this bug in all sorts of buffer manipulation code, not just cryptography, so I thought I'd share this. CryptoStream cs = new CryptoStream(myStream, myTransform, CryptoStreamMode.Read); byte[] fromEncrypt...
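The excerpt stops before the bug is named. One common buffer-length mistake in code of this shape is ignoring the count returned by `Stream.Read` and then treating the whole buffer as valid data; assuming that is the kind of bug meant, here is a minimal sketch of it. A `MemoryStream` stands in for the `CryptoStream`, and `BufferLengthDemo` and its two methods are hypothetical names for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferLengthDemo
{
    // Buggy: decodes the whole buffer, including bytes Read() never filled.
    public static string DecodeIgnoringCount(Stream stream, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        stream.Read(buffer, 0, buffer.Length);
        return Encoding.UTF8.GetString(buffer);
    }

    // Safer: decodes only the bytes that Read() reported as filled.
    public static string DecodeUsingCount(Stream stream, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int read = stream.Read(buffer, 0, buffer.Length);
        return Encoding.UTF8.GetString(buffer, 0, read);
    }

    public static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("hello");
        // The buggy version returns "hello" followed by eleven U+0000 characters.
        Console.WriteLine(DecodeIgnoringCount(new MemoryStream(payload), 16).Length);
        Console.WriteLine(DecodeUsingCount(new MemoryStream(payload), 16).Length);
    }
}
```

With the 5-byte payload and a 16-byte buffer this prints 16 and then 5. A single `Read` call may also return fewer bytes than are available, so real code would normally loop until `Read` returns 0.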
UTF8 Security and Whidbey Changes
Posted over 8 years ago by Shawn Steele - MSFT
4 Comments
Unicode is always in the process of evolving, and some changes have been made to UTF8 in the last few versions. The UTF-8 algorithm is fairly simple, but there are a few clarifications that are important for security reasons. Primarily there is...
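The excerpt is cut off before it lists the clarifications. One well-known security-relevant UTF-8 rule is that non-shortest-form ("overlong") byte sequences are invalid; assuming that is among the cases meant, this sketch shows a strict decoder rejecting one. `Utf8StrictDemo` and `IsValidUtf8` are hypothetical names; the `UTF8Encoding` constructor and `DecoderFallbackException` are standard System.Text members.

```csharp
using System;
using System.Text;

public static class Utf8StrictDemo
{
    // throwOnInvalidBytes: true makes the decoder throw instead of substituting U+FFFD.
    private static readonly UTF8Encoding Strict =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns false when the bytes are not well-formed UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            Strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] shortest = { 0x2F };        // "/" in its only valid form
        byte[] overlong = { 0xC0, 0xAF };  // non-shortest encoding of "/"
        Console.WriteLine(IsValidUtf8(shortest));
        Console.WriteLine(IsValidUtf8(overlong));
    }
}
```

Rejecting the overlong form matters because a filter that scans the raw bytes for "/" would miss it, while a lenient decoder would still have produced the character.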
Don't Use Encoding.Default
Posted over 8 years ago by Shawn Steele - MSFT
3 Comments
So you want to save some data and don't know which Encoding to use. My biggest suggestion is please do NOT use Encoding.Default. Huh? That can't be right. You heard me right, please don't use Encoding.Default. Encoding.Default sounds like the right...
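The excerpt ends before the alternative is spelled out, but the recurring advice across these posts is "use Unicode". As a minimal sketch under that assumption, the example below names the encoding explicitly on both the write and the read instead of relying on Encoding.Default. `ExplicitEncodingDemo` and `RoundTrip` are hypothetical names for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ExplicitEncodingDemo
{
    // Writes 'text' to a temporary file and reads it back using the same named encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, encoding);
            return File.ReadAllText(path, encoding);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Latin and Japanese characters in one string ("café" plus "日本語").
        string text = "caf\u00E9 \u65E5\u672C\u8A9E";
        Console.WriteLine(RoundTrip(text, Encoding.UTF8) == text);
    }
}
```

Because the encoding is stated in the code, the file reads back the same way on any machine, regardless of that machine's system code page.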
What's the difference between an Encoding, Code Page, Character Set and Unicode?
Posted over 8 years ago by Shawn Steele - MSFT
1 Comment
Encoding, Code Page and Character Set are often used interchangeably, even when that isn't strictly correct. There are some distinctions though: Characters are usually thought of as the smallest element of writing that has a meaning. It could be a punctuation...
What's with Encoding.GetMaxByteCount() and Encoding.GetMaxCharCount()?
Posted over 8 years ago by Shawn Steele - MSFT
1 Comment
The behavior of Encoding.GetMaxByteCount() changed somewhat between .Net version 1.0/1.1 and version 2.0 (Whidbey). The reason for this change is partially because GetMaxByteCount() didn't always return the worst-case byte count, and also because the...