$OutputEncoding to the rescue


You might have noticed that “findstr” does not work properly with non-English text in PowerShell.

For example:

Let’s create a text file with some Chinese characters in it.

PS C:\> ${c:\test.txt}="中文"

Now try to use findstr to find one of the Chinese characters. It does not find anything.

PS C:\> Get-Content test.txt | findstr /c:

The same command works in Cmd.exe.

PS C:\> cmd /c "findstr /c: test.txt"

中文

 

What went wrong? When we pipe output from PowerShell cmdlets into native applications, PowerShell encodes that output using the $OutputEncoding variable, which is set to ASCII by default. ASCII cannot represent the Chinese characters, so findstr never receives them. We can fix the aforementioned scenario by setting $OutputEncoding to [Console]::OutputEncoding.
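To see what the ASCII default does to the text, it can help to encode the string by hand. This is only an illustrative sketch: the variable names are made up for the example, and UTF-8 stands in for "an encoding that can represent the characters" (on the system shown in this post, the console encoding plays that role).

```powershell
# Build the sample text from code points (U+4E2D, U+6587) so the example
# does not depend on how this script file itself is saved.
$text = "$([char]0x4E2D)$([char]0x6587)"

# US-ASCII cannot represent these characters; .NET substitutes '?' (byte 63) for each one.
$ascii = [System.Text.Encoding]::ASCII
$asciiBytes = $ascii.GetBytes($text)
$asciiRoundTrip = $ascii.GetString($asciiBytes)

# An encoding that covers the characters round-trips them intact.
# UTF-8 is used here purely as a stand-in for such an encoding.
$utf8 = [System.Text.Encoding]::UTF8
$utf8RoundTrip = $utf8.GetString($utf8.GetBytes($text))

"ASCII bytes:              $($asciiBytes -join ' ')"
"ASCII round trip:         $asciiRoundTrip"
"UTF-8 round trip matches: $($utf8RoundTrip -eq $text)"
```

Both characters collapse to the same question-mark byte under ASCII, which is why findstr has nothing left to match.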

PS C:\> $OutputEncoding

 

 

IsSingleByte      : True

BodyName          : us-ascii

EncodingName      : US-ASCII

HeaderName        : us-ascii

WebName           : us-ascii

WindowsCodePage   : 1252

IsBrowserDisplay  : False

IsBrowserSave     : False

IsMailNewsDisplay : True

IsMailNewsSave    : True

EncoderFallback   : System.Text.EncoderReplacementFallback

DecoderFallback   : System.Text.DecoderReplacementFallback

IsReadOnly        : True

CodePage          : 20127

 

 

 

PS C:\> $OutputEncoding = [Console]::OutputEncoding

PS C:\> $OutputEncoding

 

 

BodyName          : gb2312

EncodingName      : 简体中文(GB2312)

HeaderName        : gb2312

WebName           : gb2312

WindowsCodePage   : 936

IsBrowserDisplay  : True

IsBrowserSave     : True

IsMailNewsDisplay : True

IsMailNewsSave    : True

IsSingleByte      : False

EncoderFallback   : System.Text.InternalEncoderBestFitFallback

DecoderFallback   : System.Text.InternalDecoderBestFitFallback

IsReadOnly        : True

CodePage          : 936

 

 

 

PS C:\> Get-Content test.txt | findstr /c:

中文

 

Voila! Now findstr works!
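Assigning $OutputEncoding at the prompt changes it for the rest of the session. If that is not wanted, one possible approach is to change it only around the native command. The helper below is a minimal sketch, not an official pattern: the name Invoke-WithConsoleEncoding is hypothetical, and it relies on PowerShell's dynamic scoping so that the script block sees the function-local value.

```powershell
# Hypothetical helper: run a script block with $OutputEncoding temporarily
# set to the console's output encoding.
function Invoke-WithConsoleEncoding {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$ScriptBlock
    )

    # Assigning here creates a function-local $OutputEncoding. The caller's
    # value is untouched, and the local one disappears when the function returns.
    $OutputEncoding = [Console]::OutputEncoding

    # The script block runs in a child scope and resolves $OutputEncoding
    # to the function-local value set above.
    & $ScriptBlock
}

# Usage sketch ($pattern is a hypothetical variable holding the text to search for):
# Invoke-WithConsoleEncoding { Get-Content test.txt | findstr /c:$pattern }
```

The trade-off is that every pipeline needing the console encoding has to be wrapped, whereas the session-wide assignment above is a one-time change.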

 

Wei Wu [MSFT]

 

 

 

POSTSCRIPT: The reason we convert to ASCII when piping to existing executables is that most commands today do not process Unicode correctly. Some do, most don't.

 

Jeffrey Snover

  • In this day and age, why would you default to ASCII? I can understand the history of CMD and the difficulty in changing it now, but PowerShell is a clean break. It would have been an excellent opportunity to fully enable Unicode (by default).

  • Why are we still mucking about with codepages in 2006? Why can’t we just use Unicode in Powershell/CMD?

  • Hah!

    It seems that you, Wei Wu, are Chinese.

    So am I.

  • Why isn't it UTF8 or the current codepage by default?

    If there's a reason for it to be ASCII, I presume there must be a problem with other encodings.

    What are those problems?

  • I'd really like to hear a response to these questions. The biggest flaw in CMD is its dependency on legacy encodings, and that heritage seems to still be alive in PowerShell. Unless there is rational reason, this seems like a major flaw.

  • There also seems to be a Codepage issue when using edit.exe from powershell.

    All keys are remapped to what seems like a random section of a keymap (possibly Greek; I thought I saw a capital phi in there).

    How would you go about changing the codepage before you start edit.exe?

  • I have a set of multi-language websites to dynamically generate off a common MS Access database.

    Access can handle unicode text and I have fields of English and Korean, Japanese, Chinese equivalents for things.

    My OSes (dual-booted XP Pro and Vista Home Premium) have the languages set up via the language bar; the primary is English (United States).

    I can't get PowerShell to use the Unicode encoding.

    e.g. (サウスホキアンガ) looks like this: (????????)

    PsSH >  [Console]::OutputEncoding

    IsSingleByte      : True

    BodyName          : IBM437

    EncodingName      : OEM United States

    HeaderName        : IBM437

    WebName           : IBM437

    WindowsCodePage   : 1252

    IsBrowserDisplay  : False

    IsBrowserSave     : False

    IsMailNewsDisplay : False

    IsMailNewsSave    : False

    EncoderFallback   : System.Text.InternalEncoderBestFitFallback

    DecoderFallback   : System.Text.InternalDecoderBestFitFallback

    IsReadOnly        : True

    CodePage          : 437

    Is there another source of object values other than [Console]::OutputEncoding ??

    RickW

  • This just worked for me for getting UTF8 Unicode:

    $OutputEncoding = New-Object -typename System.Text.UTF8Encoding

  • Hi, how do you get to display Chinese fonts, or for that matter any Unicode typefaces, in your shell?

    Thanks.

  • Simply change the console codepage to whatever you need, and then set the PowerShell default output encoding to the console output encoding.

    -----SAMPLE-----------------

    chcp 1250

    $OutputEncoding =[Console]::OutputEncoding

  • Even PowerShell at version 3.0 does not process input script encoding properly.

    With this command

    powershell.exe -Version 3.0 -NonInteractive -ExecutionPolicy ByPass -Command - <.\script.ps1

    If the script is big-endian UTF, you will receive an error.
