
$OutputEncoding to the rescue

You might have noticed that “findstr” does not work properly with non-English text in PowerShell.

For example:

Let’s create a text file with some Chinese characters in it.

PS C:\> ${c:\test.txt}="中文"

Now try to use findstr to find one of the Chinese characters. It does not find anything.

PS C:\> Get-Content test.txt | findstr /c:

The same command works in Cmd.exe.

PS C:\> cmd /c "findstr /c: test.txt"

中文

 

What went wrong? When PowerShell pipes cmdlet output into a native application, it encodes that output using the encoding stored in the $OutputEncoding variable, which is set to ASCII by default. ASCII cannot represent Chinese characters, so findstr never receives the text it is asked to match. We can fix the scenario above by setting $OutputEncoding to [Console]::OutputEncoding.
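To make the effect of the ASCII default concrete, here is a minimal sketch that encodes the sample text by hand with the same US-ASCII encoding. It only illustrates the encoding step; it is not the code PowerShell itself runs when piping. The variable names ($text, $ascii, $bytes, $roundTrip) are chosen for this example, and the text is built from code points so that the snippet does not depend on how a script file is saved.

```powershell
# Build the sample string from the post (U+4E2D, U+6587) without
# relying on the encoding of the script file itself.
$text  = "$([char]0x4E2D)$([char]0x6587)"
$ascii = [System.Text.Encoding]::ASCII

# US-ASCII has no representation for these characters, so the encoder
# substitutes a question mark (byte 0x3F) for each of them.
$bytes     = $ascii.GetBytes($text)
$roundTrip = $ascii.GetString($bytes)

$bytes | ForEach-Object { '0x{0:X2}' -f $_ }   # prints 0x3F twice
$roundTrip                                     # prints ??
```

With the original characters replaced by question marks, a search for one of those characters in the piped text has nothing to match.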

PS C:\> $OutputEncoding

 

 

IsSingleByte      : True

BodyName          : us-ascii

EncodingName      : US-ASCII

HeaderName        : us-ascii

WebName           : us-ascii

WindowsCodePage   : 1252

IsBrowserDisplay  : False

IsBrowserSave     : False

IsMailNewsDisplay : True

IsMailNewsSave    : True

EncoderFallback   : System.Text.EncoderReplacementFallback

DecoderFallback   : System.Text.DecoderReplacementFallback

IsReadOnly        : True

CodePage          : 20127

 

 

 

PS C:\> $OutputEncoding = [Console]::OutputEncoding

PS C:\> $OutputEncoding

 

 

BodyName          : gb2312

EncodingName      : 简体中文(GB2312)

HeaderName        : gb2312

WebName           : gb2312

WindowsCodePage   : 936

IsBrowserDisplay  : True

IsBrowserSave     : True

IsMailNewsDisplay : True

IsMailNewsSave    : True

IsSingleByte      : False

EncoderFallback   : System.Text.InternalEncoderBestFitFallback

DecoderFallback   : System.Text.InternalDecoderBestFitFallback

IsReadOnly        : True

CodePage          : 936

 

 

 

PS C:\> Get-Content test.txt | findstr /c:

中文

 

Voila! Now findstr works!
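If you would rather not change $OutputEncoding for the whole session, one option is to switch it only around the pipeline that needs it and then put the old value back. The following is a sketch under stated assumptions, not part of PowerShell: the function name Invoke-WithConsoleEncoding and its Script parameter are made up for this example, and it assumes $OutputEncoding has not been overridden in a scope nearer than the global one.

```powershell
# Hypothetical helper: run a script block with $OutputEncoding set to the
# console's output encoding, then restore the previous value.
function Invoke-WithConsoleEncoding {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock]$Script
    )

    $previous = $global:OutputEncoding
    try {
        $global:OutputEncoding = [Console]::OutputEncoding
        & $Script
    }
    finally {
        # Runs even if the script block throws, so the session default
        # is not left changed by accident.
        $global:OutputEncoding = $previous
    }
}

# Example call; $needle is a placeholder for the character to search for:
# Invoke-WithConsoleEncoding { Get-Content test.txt | findstr "/c:$needle" }
```

The try/finally keeps the change from leaking into later commands, which matters if other native tools in the same session expect the original setting.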

 

Wei Wu [MSFT]

 

 

 

POSTSCRIPT: The reason we convert to ASCII when piping to existing executables is that most commands today do not process Unicode correctly. Some do, most don't.

 

Jeffrey Snover

Published Monday, December 11, 2006 11:26 PM by PowerShellTeam

Comments

# re: $OutputEncoding to the rescue

In this day and age, why would you default to ASCII? I can understand the history of CMD and the difficulty of changing it now, but PowerShell is a clean break. It would have been an excellent opportunity to fully enable Unicode (by default).

Monday, December 11, 2006 10:23 PM by soregasi

# re: $OutputEncoding to the rescue

Why are we still mucking about with codepages in 2006? Why can’t we just use Unicode in PowerShell/CMD?

Monday, December 11, 2006 11:07 PM by Jeffrey L. Whitledge

# re: $OutputEncoding to the rescue

Hah!

It seems that you, Wei Wu, are Chinese.

So am I.

Monday, December 11, 2006 11:20 PM by hayate

# re: $OutputEncoding to the rescue

Why isn't it UTF8 or the current codepage by default?

If there's a reason for it to be ASCII, I presume there must be a problem with other encodings.

What are those problems?

Tuesday, December 12, 2006 12:24 AM by Rei Miyasaka

# re: $OutputEncoding to the rescue

I'd really like to hear a response to these questions. The biggest flaw in CMD is its dependency on legacy encodings, and that heritage seems to still be alive in PowerShell. Unless there is rational reason, this seems like a major flaw.

Sunday, December 24, 2006 7:19 PM by oidon

# re: $OutputEncoding to the rescue

There also seems to be a codepage issue when using edit.exe from PowerShell.

All keys are remapped to what seems like a random section of a keymap (possibly Greek; I thought I saw a capital phi in there).

How would you go about changing the codepage before you start edit.exe?

Thursday, December 28, 2006 1:32 PM by Remi

# re: $OutputEncoding to the rescue

I have a set of multi-language websites to dynamically generate off a common MS Access database.

Access can handle Unicode text, and I have fields with English, Korean, Japanese, and Chinese equivalents for things.

My OS (dual-booted XP Pro and Vista Home Premium) has the languages set up via the language bar; the primary is English (United States).

I can't get PowerShell to use the Unicode encoding.

e.g. (サウスホキアンガ) looks like this (????????)

PsSH >  [Console]::OutputEncoding

IsSingleByte      : True

BodyName          : IBM437

EncodingName      : OEM United States

HeaderName        : IBM437

WebName           : IBM437

WindowsCodePage   : 1252

IsBrowserDisplay  : False

IsBrowserSave     : False

IsMailNewsDisplay : False

IsMailNewsSave    : False

EncoderFallback   : System.Text.InternalEncoderBestFitFallback

DecoderFallback   : System.Text.InternalDecoderBestFitFallback

IsReadOnly        : True

CodePage          : 437

Is there another source of object values other than [Console]::OutputEncoding ??

RickW

Tuesday, May 22, 2007 6:18 PM by Rick

# re: $OutputEncoding to the rescue

This just worked for me for getting UTF8 Unicode:

$OutputEncoding = New-Object -typename System.Text.UTF8Encoding

Friday, May 25, 2007 10:20 AM by Jacques Beaurain

# re: $OutputEncoding to the rescue

Hi, how do you get to display Chinese fonts, or for that matter any Unicode typefaces, in your shell?

Thanks.

Thursday, October 04, 2007 8:47 PM by King Kong
