What Everyone Should Know About Character Encoding

Thank goodness Joel wrote this article -- that means I can cross it off my list of potential future blog entries!  Thanks, Joel!


http://www.joelonsoftware.com/articles/Unicode.html

 

Fortunately the script engines are entirely Unicode inside.  Making sure that the script source code passed in to the engine is valid UTF-16 is the responsibility of the host, and as Joel mentions, IE certainly jumps through some hoops to try to deduce the encoding.  WSH also has heuristics that try to determine whether the file is UTF-8 or UTF-16, but nothing nearly as complex as IE's.


I should mention that in JScript you can use the \uXXXX escape syntax (four hex digits) to put Unicode code points into string literals.  In VBScript it is a little trickier -- you need to use the ChrW function.
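
Here is a minimal sketch of the escape syntax, assuming an ECMAScript-compatible engine such as JScript. The variable names are made up for illustration.

```javascript
// \uXXXX takes exactly four hex digits and denotes one UTF-16 code unit.
var eAcute = "\u00E9";      // U+00E9, LATIN SMALL LETTER E WITH ACUTE
var word = "caf\u00E9";     // four code units: c, a, f, U+00E9

// String.fromCharCode builds the same string from a numeric code unit,
// which is roughly the role ChrW plays in VBScript.
var fromNumber = String.fromCharCode(0x00E9);
```

In VBScript the rough equivalent would be something like "caf" & ChrW(&H00E9), since VBScript string literals have no escape syntax of this kind.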

 

  • What about characters with code points above 0xFFFF?
  • Since JScript is UTF-16 internally, you can use a surrogate pair of code units to represent a code point above U+FFFF.
  • For info's sake: does this apply to JScript .NET as well?
  • Yes, JScript .NET is also entirely UTF-16 internally, and is fully backwards compatible with JScript Classic (modulo a few edge cases).
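
The surrogate pair answer above can be sketched as follows. The helper name toSurrogatePair is hypothetical, not something built into JScript, and the arithmetic is the standard UTF-16 encoding of a code point in the range U+10000 to U+10FFFF.

```javascript
// Hypothetical helper: encode a code point above U+FFFF as two UTF-16 code units.
function toSurrogatePair(codePoint) {
    var offset = codePoint - 0x10000;          // 20 bits remain
    var high = 0xD800 + (offset >> 10);        // top 10 bits -> high surrogate
    var low = 0xDC00 + (offset & 0x3FF);       // bottom 10 bits -> low surrogate
    return String.fromCharCode(high, low);
}

// U+1D11E MUSICAL SYMBOL G CLEF, computed and written out by hand.
var clef = toSurrogatePair(0x1D11E);
var sameClef = "\uD834\uDD1E";
```

Note that the resulting string has a length of 2, because the engine counts UTF-16 code units rather than code points.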