Browse by Tags

All Tags » Unicode and Code Pages/Encodings

Alternate encoding names recognized by .Net / IE

If you run the sample from http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx you get a list of the names Microsoft .Net uses for each Encoding/Code Page. (WebName is more consistent with what's used in charset.)
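The same idea of several labels mapping to one encoding can be sketched outside .Net. The snippet below is only an illustration that assumes Python's codecs registry; its alias table is not the one .Net or IE use, so the names it prints will not match the GetEncodings output.

```python
import codecs

def canonical_name(label):
    """Return the name Python's codec registry uses for a charset label."""
    return codecs.lookup(label).name

# Different labels that show up in charset= can resolve to the same codec.
for label in ("latin1", "ISO-8859-1", "windows-1252", "cp1252", "utf8", "UTF-8"):
    print(label, "->", canonical_name(label))
```

An unknown label raises LookupError here, which is a reasonable point to fall back to a default or reject the input.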

Unicode, IDN (IDNA), EAI (IMA) and Homograph Security

I wrote about IDN & Security before http://blogs.msdn.com/shawnste/archive/2005/03/03/384692.aspx but thought I'd share some of my more updated views about security of URLs/IDN/Unicode/Email addresses. People haven't really bothered much with DNS

Writing "fields" of data to an encoded file.

The moral here is "Use Unicode," so you can skip the details below if you want :) A common problem when storing string data in various fields is how to encode it. Obviously you can store the Unicode as Unicode, which is a good choice for an XML file or

Don't use MB_COMPOSITE, MB_PRECOMPOSED or WC_COMPOSITECHECK

This pretty much demonstrates another reason to Use Unicode, but if you do need to use some non-Unicode encoding until you can convert to Unicode, please don't use these flags. MultiByteToWideChar() and WideCharToMultiByte() provide some interesting sounding

FrontPage uses windows-1252, shouldn't it be iso-8859-1?

I received this question: I use Frontpage for my webpage design and FP automatically inserts the meta tag "<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">". Should I have reference to ISO-8859-1 ? I'm not a front page
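For background on why the label matters: windows-1252 and ISO-8859-1 agree for bytes 0xA0-0xFF, but differ for 0x80-0x9F, where windows-1252 has printable characters such as curly quotes and ISO-8859-1 has C1 control codes. A minimal sketch, assuming Python's cp1252 and latin-1 codecs as stand-ins for the two charsets:

```python
# Bytes 0x93 and 0x94 are curly quotes in windows-1252.
data = b"\x93Hi\x94 caf\xe9"

as_1252 = data.decode("cp1252")     # curly quotes come out as U+201C / U+201D
as_latin1 = data.decode("latin-1")  # same bytes become C1 control codes U+0093 / U+0094

print(ascii(as_1252))
print(ascii(as_latin1))

# The accented letter (byte 0xE9) decodes identically in both.
print(as_1252[-1] == as_latin1[-1])
```

So a page that really contains windows-1252 bytes but is labeled ISO-8859-1 only looks wrong where it uses those 0x80-0x9F characters.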

Where to Look Up Information About Microsoft Code Pages?

First of all, remember to Use Unicode when practical :) Sometimes older applications don't allow Unicode, although they usually then don't allow Microsoft code pages as well (usually being ASCII or Latin-1, which are different). But when you do have a

Unicode use on the web

Google posted a blog about Unicode use on the web: http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html They also announced that they now support Unicode 5.1, which is probably a good thing, but I found the graph most interesting: http://bp1.blogger.com/_Ap14FtNN91w/SBzrtHJfLnI/AAAAAAAAA5U/TV7_g2_sWq0/s1600-h/Unicode2.gif

Server 2008 U+FFFD behavior for unknown or illegal UTF-8 sequences.

In my post Change to Unicode Encoding for Unicode 5.0 conformance I mentioned that the behavior of illegal characters has changed for Unicode 5 conformance in Windows Vista / .Net 2.0+. Those changes have also been inherited by Server 2008. Also check
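To make the U+FFFD idea concrete, here is a small sketch using Python's UTF-8 decoder as a stand-in. It is not the Windows or .Net implementation, and the exact number of U+FFFD characters produced for a given bad sequence can differ between implementations. The input is the classic overlong encoding of "/" (0xC0 0xAF), which is illegal UTF-8.

```python
REPLACEMENT = "\ufffd"

# 0xC0 0xAF is an overlong (illegal) encoding of "/".
bad = b"\xc0\xaf"

# A conforming decoder must not turn this into "/"; with errors="replace"
# the illegal bytes become U+FFFD instead.
decoded = bad.decode("utf-8", errors="replace")
print(ascii(decoded))

# In strict mode the same input is rejected outright.
try:
    bad.decode("utf-8")
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
print(strict_rejected)
```

The point for security is that the illegal bytes never decode to the character they were trying to smuggle in.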

Code pages and security issues

One of the reasons I always suggest "Use Unicode!" is that there are security problems converting between code pages. In short if data

Michael has a blog about converting apps from ANSI to Unicode

Lots of apps are now Unicode, but some need to make the shift from ANSI (like Japanese Shift-JIS) to Unicode. Michael has a series of blog posts about a project conversion. http://blogs.msdn.com/michkap/archive/2007/01/05/1413001.aspx I recently had a

Are we going to update or maintain the best fit &/or code page mappings?

People wonder if we're going to update our best fit code page mappings, or even our code page mappings. The answer is no. Changing character mappings causes difficulties for applications and our experience has been that doing so breaks as much as it "fixes".

UTF-16, UTF-8 & UTF-32 update to conform with Unicode 5.0's security concerns.

My post Change to Unicode Encoding for Unicode 5.0 conformance now applies to .Net 2.0 with MS07-040 applied. The update comes with a list of known issues; please see the known issues for MS07-040 described in KB 931212 for more information. KB 940521

I see my favorite Ansi function has the behavior I want.

Occasionally I am asked about the A version of a W function. For example, GetLocaleInfoA does something that appears more convenient to some user than GetLocaleInfoW. The implied thought is that maybe they should just use the A version. For the most part our A

Why can't we strip the diacritics?

We have some "best-fit" behavior which we generally consider to be "bad". Any loss of data is generally a bad thing, so we recommend storing data in Unicode (so you don't lose anything). Assuming you can't use Unicode, why is it so bad to just make everything
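One way to see the data loss is that stripping diacritics can collapse two different words into one. The sketch below is an assumption-laden illustration, not how the Windows best-fit tables work: it uses Python's unicodedata module and a hypothetical helper, strip_diacritics, that decomposes the text and drops combining marks.

```python
import unicodedata

def strip_diacritics(text):
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

beautiful = "sch\u00f6n"  # German "schön"
already = "schon"         # German "schon", a different word

# After stripping, the two words are indistinguishable.
print(strip_diacritics(beautiful) == strip_diacritics(already))
```

Once the strings are stored this way there is no way to get the original distinction back.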

Encoder/Decoder Encoding fallbacks fail after 2GB of data has been converted

We have an unfortunate bug in .Net v2.0+ that causes encoding or decoding of more than 2GB of data to fail. That's a lot of data, but it still shouldn't do that. This is a problem with our built in fallbacks. Ironically, if you encounter bad bytes then