Fun with PowerShell - character manipulation...


I was investigating a localization test failure today and ran into the following error message:

 

'actual error is 爠捡湩敤氠捥整牵꿿渠攧楸瑳畯渠攧瑳瀠獡甠獳敩⹲਍, expected '

 

Since the test was failing in the French locale, getting what appeared to be a Chinese error message didn’t make a lot of sense. A coworker confirmed that it was garbage. My next guess was that it was actually an ANSI string that had been misread as Unicode (UTF-16). In other words, every two characters in the source string became one character in the output. But how to test this? Fortunately, we have PowerShell! I pasted the string into the console window, where it looked like this:

 

$t = '??????????????????????????????????'

 

And then used casts to take the string apart:

 

&{$ofs=''; [string][char[]] ([int[]] [char[]] $t |% { $_ -band 0xff ;  [int] ($_ / 256 ) })}

 

Running this exercise in stunt-casting translated $t back into:

 

La racine de lecteur rÿF:\ÿ° n'existe pas ou n'est pas un dossier.

 

Now that’s more like the error message one would expect in the French locale :-)

 

Anyone want to take a stab at explaining how this works?
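
For readers who want to check their answer, here is a minimal sketch that unrolls the one-liner into explicit steps. The function name ConvertFrom-MisdecodedUtf16 is made up for illustration, and the -shr operator assumes PowerShell 3.0 or later. Each UTF-16 code unit is split into its low byte and its high byte, in that order, and each byte is turned back into one character.

```powershell
# Hypothetical helper (not part of the original post): undo "two ANSI bytes
# read as one UTF-16LE character" by splitting every code unit into two bytes.
function ConvertFrom-MisdecodedUtf16 {
    param([string] $Text)

    $chars = foreach ($c in $Text.ToCharArray()) {
        $code = [int] $c              # UTF-16 code unit, 0..65535
        [char] ($code -band 0xFF)     # low byte  = first original character
        [char] ($code -shr 8)         # high byte = second original character
    }
    -join $chars                      # glue the characters back into a string
}

# U+614C packs 'L' (0x4C) and 'a' (0x61); U+7220 packs ' ' (0x20) and 'r' (0x72).
ConvertFrom-MisdecodedUtf16 ([string] [char] 0x614C + [char] 0x7220)   # La r
```

Note that -shr 8 truncates, whereas [int] ($_ / 256) in the one-liner rounds to the nearest integer. The two agree whenever the low byte is below 0x80, which covers plain ASCII text, but for a code unit such as U+AFFF the division yields 176 (°) rather than 175 (¯), which would account for a ° showing up in the decoded text.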

 

-bruce

 

Bruce Payette [MSFT]
Windows PowerShell Tech Lead

Visit the Windows PowerShell ScriptCenter at:  http://www.microsoft.com/technet/scriptcenter/hubs/msh.mspx

  • I see this kind of stuff too often.
    Look at the characters being displayed.
    The first character: 慌 is U+614C (0x4C 0x61 in UTF-16LE). A usual first guess is that the intended character sequence was 0x61 'a' and 0x4C 'L'. Windows usually works in little endian, so the reverse order is more likely: "La". Work your way down the rest of the characters.

    So, I guess that your string is encoded in ANSI and being decoded as UTF-16LE. Not a very good idea.
  • Leverage the BCL to do the heavy lifting instead:

    [text.encoding]::ascii.getstring([text.encoding]::unicode.getbytes($t))
  • wouldn't it have been easier to paste it into a text editor and open it using a different encoding?
  • I'd appreciate some sort of Visual Studio-like dev environment for PowerShell, as more complicated scripts easily run into syntax overload.  This mainly happens because you are streaming objects + calling object methods + using native PowerShell language features all on the same line of code.

    This should be avoided for the same reasons that we stopped having each method of a C++ class return a reference to itself, which allowed overly complicated code like

    int x = 100, y = 100, radius = 20;
    Circle c(x, y, radius);

    c.Fill( Color.Random() ).Scale(1.4).Outline( Line.Dotted, Line.Width2 ).Transpose(-10, +20);

    The PowerShell code below is about as readable.

    &{$ofs=''; [string][char[]] ([int[]] [char[]] $t |% { $_ -band 0xff ;  [int] ($_ / 256 ) })}
  • Scott, if you want such an IDE (but more interactive than an IDE), check out my PowerShell Analyzer
    http://karlprosser.edify.us/coder/
  • We have a server application that writes Unicode using UTF-8 encoding to stdout. Does PowerShell support UTF-8 encoding? Will it make sense of my UTF-8 and display the right thing? The regular command shell seemed not to support UTF-8. It is particularly a problem when running the Japanese version because the old command shell on a J system appears to only support shiftjis encoding. What about fonts, is there a unicode encoded font on J systems for PowerShell?
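
A follow-up on the BCL suggestion in the comments above: [text.encoding]::ascii replaces every byte above 0x7F with '?', so accented French characters would be lost. A sketch under the assumption that the original bytes came from a single-byte code page: ISO-8859-1 maps byte N to code point N, which is the same mapping the cast-based one-liner applies. The sample string is made up for illustration; substitute the real code page (for example 1252) if it is known.

```powershell
# Assumption: ISO-8859-1 stands in for the unknown ANSI code page.
$latin1 = [Text.Encoding]::GetEncoding('iso-8859-1')

# Made-up sample: U+614C packs 'L','a'; U+E920 packs ' ' and 0xE9 (e-acute).
$t     = [string] [char] 0x614C + [char] 0xE920
$bytes = [Text.Encoding]::Unicode.GetBytes($t)   # UTF-16LE bytes: 4C 61 20 E9

$fromLatin1 = $latin1.GetString($bytes)                  # 'La ' followed by e-acute
$fromAscii  = [Text.Encoding]::ASCII.GetString($bytes)   # 'La ?' (0xE9 is lost)

$fromLatin1
$fromAscii
```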
