Unicode and Code Pages/Encodings

Where to Look Up Information About Microsoft Code Pages?

First of all, remember to Use Unicode when practical :) Sometimes older applications don't allow Unicode, although those applications usually don't allow Microsoft code pages either (they are usually ASCII or Latin-1, which are different). But when you do have a

Unicode use on the web

Google posted a blog entry about Unicode use on the web: http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html They also announced that they now support Unicode 5.1, which is probably a good thing, but I found the graph most interesting: http://bp1.blogger.com/_Ap14FtNN91w/SBzrtHJfLnI/AAAAAAAAA5U/TV7_g2_sWq0/s1600-h/Unicode2.gif

Server 2008 U+FFFD behavior for unknown or illegal UTF-8 sequences.

In my post Change to Unicode Encoding for Unicode 5.0 conformance I mentioned that the behavior of illegal characters has changed for Unicode 5 conformance in Windows Vista / .Net 2.0+. Those changes have also been inherited by Server 2008. Also check
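As a rough illustration of the U+FFFD idea, here is a minimal sketch using Python's built-in UTF-8 codec. It is not the Windows or .NET implementation, and the sample byte strings and the helper names `decode_lenient` and `decode_strict` are made up for this example. How many U+FFFD characters a given bad sequence produces can differ between implementations.

```python
# Sample inputs are illustrative only; they do not come from the post.
SAMPLES = {
    "stray 0xFF byte": b"a\xffb",
    "overlong encoding of '/'": b"\xc0\xaf",
    "truncated 3-byte sequence": b"\xe2\x82",
}


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for anything illegal."""
    return data.decode("utf-8", errors="replace")


def decode_strict(data: bytes):
    """Decode UTF-8, returning None instead of raising on illegal input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for label, raw in SAMPLES.items():
        code_points = [f"U+{ord(ch):04X}" for ch in decode_lenient(raw)]
        print(f"{label}: {code_points} (strict: {decode_strict(raw)!r})")
```

The point of the replacement character is that bad bytes stay visible in the output instead of silently disappearing or being reinterpreted as something legal.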

Code pages and security issues

One of the reasons I always suggest "Use Unicode!" is that there are security problems converting between code pages. In short if data

Michael has a blog about converting apps from ANSI to Unicode

Lots of apps are now Unicode, but some need to make the shift from ANSI (like Japanese Shift-JIS) to Unicode. Michael has a series of blog posts about a project conversion: http://blogs.msdn.com/michkap/archive/2007/01/05/1413001.aspx I recently had a

Are we going to update or maintain the best fit &/or code page mappings?

People wonder if we're going to update our best fit code page mappings, or even our code page mappings. The answer is no. Changing character mappings causes difficulties for applications and our experience has been that doing so breaks as much as it "fixes".

UTF-16, UTF-8 & UTF-32 update to conform with Unicode 5.0's security concerns.

My post Change to Unicode Encoding for Unicode 5.0 conformance now applies to .Net 2.0 with MS07-040 applied. Updates include a list of known issues; please see the list of known issues for MS07-040 described in KB 931212 for more information. KB 940521

I see my favorite ANSI function has the behavior I want.

Occasionally I am asked about the A version of a W function. For example, GetLocaleInfoA does something that appears more convenient to some users than GetLocaleInfoW. The implied thought is that maybe they should just use the A version. For the most part our A

Why can't we strip the diacritics?

We have some "best-fit" behavior which we generally consider to be "bad". Any loss of data is generally a bad thing, so we recommend storing data in Unicode (so you don't lose anything). Assuming you can't use Unicode, why is it so bad to just make everything
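To make the data-loss point concrete, here is a small sketch of one generic way to strip diacritics, using Python's `unicodedata` module. This is not the Windows best-fit mapping; the function name `strip_diacritics` and the sample words are assumptions chosen for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop combining marks (lossy by design)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Different words can collapse into the same string after stripping.
    for word in ("résumé", "resume", "año", "Ø"):
        print(f"{word!r} -> {strip_diacritics(word)!r}")
```

Once stripped, "résumé" and "resume" are indistinguishable, and Spanish "año" becomes a different word entirely, so the original text cannot be recovered. Letters such as "Ø" have no decomposition and pass through unchanged, which shows that this kind of stripping is not even consistent across alphabets.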

Encoder/Decoder Encoding fallbacks fail after 2GB of data has been converted

We have an unfortunate bug in .Net v2.0+ that causes encoding or decoding of more than 2GB of data to fail. That's a lot of data, but it still shouldn't do that. This is a problem with our built-in fallbacks. Ironically, if you encounter bad bytes then

MLang & MSXML6 don't like UTF-7

In some cases MLang (on which MSXML6 depends) can add extra ? characters to decoded UTF-7 data, which can cause UTF-7 encoded XML to fail to parse. UTF-7 isn't a great encoding anyway, so this is just another reason to Please Avoid UTF-7. In particular there

How do I get HKSCS 2004 characters from Big-5 in .Net?

Well, that's pretty tricky. We provide the Microsoft Character Code Conversion Routines For HKSCS-2004 functions, but those are intended for use with unmanaged code. The fundamental problem is that these "HKSCS" characters were in use prior to the assignment

How do I get my ANSI based application to run correctly?

A common question is "how do I get my ANSI based code page application to run on a system that has a different code page?" The most obvious solution is to use Unicode :) Then you won't have the code page messiness that leads to this kind of problem. For
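The "code page messiness" can be sketched in a few lines of Python: the same bytes mean different characters depending on which code page the reading system assumes. The helper name `reinterpret` and the choice of code pages 1252 and 1251 are assumptions for illustration only.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page, then decode the bytes with another."""
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    original = "café"
    # Written on a Western European (1252) system, read on a Cyrillic (1251) one.
    print(reinterpret(original, "cp1252", "cp1251"))
    # Same code page on both ends: the text survives.
    print(reinterpret(original, "cp1252", "cp1252"))
    # Unicode (here UTF-8) on both ends does not depend on the system code page.
    print(reinterpret(original, "utf-8", "utf-8"))
```

The bytes never change in the first case; only the interpretation does, which is why the data looks corrupted on the second system.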

Please avoid UTF-7

UTF-7 inherently has some of the security issues that concern people about encodings. For example, by shifting in & out of the base64 mode one can create multiple representations of the same string, enabling spoofing and other problems. UTF-7 is primarily
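The "multiple representations" problem can be shown with Python's built-in `utf-7` codec. This is only a sketch: the byte strings below are illustrative and `decode_utf7` is a made-up helper name.

```python
# Two different byte sequences, one written directly and one using base64 shifts.
PLAIN = b"<script>"
SHIFTED = b"+ADw-script+AD4-"


def decode_utf7(data: bytes) -> str:
    """Decode UTF-7 bytes with Python's standard codec."""
    return data.decode("utf-7")


if __name__ == "__main__":
    print(decode_utf7(PLAIN) == decode_utf7(SHIFTED))
    # A byte-level filter looking for '<' finds nothing in the shifted form.
    print(b"<" in SHIFTED)
```

A check that scans the raw bytes for dangerous characters passes the shifted form, yet after decoding both inputs are the same string. That is the spoofing risk in miniature.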

Some Reasons to Make Your Application Unicode

[Updated Mar 30 2007: Mike pointed out errors which I've corrected] Many applications are "still" ANSI and can't handle Unicode. We (Microsoft) have even released non-Unicode applications reasonably recently, even though we should know better. In particular