<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx</link><description>A question I received in email: In the FRA and ESN OSes, when I type some word on the command prompt with an acute-accented e like génération and redirect it to a file (eg: “echo génération &amp;gt; abcd.txt”) then the file contains a comma instead of the</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571095</link><pubDate>Fri, 07 Apr 2006 23:53:08 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571095</guid><dc:creator>Maurits</dc:creator><description>&amp;gt; I am also able to copy-paste to that file with the characters intact&lt;br&gt;&lt;br&gt;So the copy/paste is switching code pages automatically? &amp;nbsp;How does that work?</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571116</link><pubDate>Sat, 08 Apr 2006 00:21:49 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571116</guid><dc:creator>Michael S. 
Kaplan</dc:creator><description>Hi Maurits --&lt;br&gt;&lt;br&gt;Well, usually each application that is not smart enough to use Unicode (such as the console) is smart enough to properly pivot from the code page it is using TO Unicode (either converting and putting CF_UNICODETEXT on the clipboard or just putting up the code page and letting the clipboard map and convert through synthetic clipboard formats)....</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571177</link><pubDate>Sat, 08 Apr 2006 01:17:48 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571177</guid><dc:creator>Maurits</dc:creator><description>I see... copying from the console gets you to Unicode (through WM_COPY, presumably) but output redirection is a naked string of bytes.&lt;br&gt;&lt;br&gt;And for some reason (?) the console is using a different code page than Notepad.&lt;br&gt;&lt;br&gt;So &amp;quot;type abcd.txt&amp;quot; shows the accents, and &amp;quot;notepad abcd.txt&amp;quot; shows the commas. (Verified)</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571189</link><pubDate>Sat, 08 Apr 2006 01:27:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571189</guid><dc:creator>Maurits</dc:creator><description>Ah, a fix!
&lt;br&gt;&lt;br&gt;&amp;quot;The command processor has an option (/U) to generate all piped and redirected output in Unicode rather than the OEM code page.&amp;quot;&lt;br&gt;&lt;a rel="nofollow" target="_new" href="http://blogs.msdn.com/oldnewthing/archive/2005/03/08/389527.aspx"&gt;http://blogs.msdn.com/oldnewthing/archive/2005/03/08/389527.aspx&lt;/a&gt;</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571338</link><pubDate>Sat, 08 Apr 2006 07:43:17 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571338</guid><dc:creator>Gabe</dc:creator><description>I would just run &amp;quot;chcp 1252&amp;quot; so that the console code page was the same as the system code page.</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571408</link><pubDate>Sat, 08 Apr 2006 10:12:43 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571408</guid><dc:creator>Srivatsn</dc:creator><description>Why is it that when i run chcp 1252 and paste &amp;#233; (U+00E9) from character map, it displays Θ &amp;nbsp;which is E9 in cp 437? What is the conversion that happens here and on what basis?</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571701</link><pubDate>Sun, 09 Apr 2006 02:09:21 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571701</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Hi Srivatsn,&lt;br&gt;&lt;br&gt;Well, chcp affects the output code page -- but what you enter in the console is input, not output. 
So the OEMCP is used....</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571822</link><pubDate>Sun, 09 Apr 2006 12:16:12 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571822</guid><dc:creator>Gabe</dc:creator><description>So if I type:&lt;br&gt;&lt;br&gt;chcp 1252&lt;br&gt;echo g&amp;#233;n&amp;#233;ration &amp;gt; abcd.txt&lt;br&gt;notepad abcd.txt&lt;br&gt;&lt;br&gt;Notepad will show an eacute.&lt;br&gt;&lt;br&gt;Now, by default the console uses the Terminal font, which has a theta at code point 0xE9. However, Lucida Console is a Unicode font and things show up as I expect them to.&lt;br&gt;&lt;br&gt;I would just recommend that the original email user set his console font to Lucida Console and use chcp 1252, and he should get what he expects.</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#571857</link><pubDate>Sun, 09 Apr 2006 16:50:23 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:571857</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Or try the /U option and have some of those other scenarios work, too.... 
:-)</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#572207</link><pubDate>Mon, 10 Apr 2006 07:52:35 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:572207</guid><dc:creator>Gabe</dc:creator><description>When I write batch files that require &amp;quot;international&amp;quot; characters, I put &amp;quot;chcp 1252&amp;quot; at the beginning of them because I can't guarantee that they'll be run by a Unicode cmd.exe.</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#572238</link><pubDate>Mon, 10 Apr 2006 08:49:49 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:572238</guid><dc:creator>Michael S. Kaplan</dc:creator><description>Um, &amp;quot;international&amp;quot; is at a minimum worthy of a &amp;quot;chcp 65001&amp;quot;, isn't it?&lt;br&gt;&lt;br&gt;I mean, with 1252 being such a far cry from &amp;quot;international&amp;quot; ? :-)</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#572953</link><pubDate>Tue, 11 Apr 2006 04:46:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:572953</guid><dc:creator>Dean Harding</dc:creator><description>I think that's why he put &amp;quot;international&amp;quot; in quotes... at least it's &amp;quot;more&amp;quot; international than US-ASCII.&lt;br&gt;&lt;br&gt;Anyway, for serious international stuff, I'd say switch to Monad or something (if possible anyway)... 
it's much more consistent, being .NET and all Unicode internally.</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#573719</link><pubDate>Wed, 12 Apr 2006 01:11:59 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:573719</guid><dc:creator>Maurits</dc:creator><description>There doesn't seem to be a code page that will allow &amp;quot;type&amp;quot; to print a UTF-16 text file. &amp;nbsp;chcp 1200 and chcp 1201 both return &amp;quot;Invalid code page.&amp;quot; &amp;nbsp;This could be worked-around with some kind of utf16le_to_utf8.exe, which would read UTF-16LE and spit it out as UTF8:&lt;br&gt;&lt;br&gt;rem make a utf16le file&lt;br&gt;cmd /c /u echo g&amp;#233;n&amp;#233;ration &amp;gt; utf16le.txt&lt;br&gt;&lt;br&gt;rem switch console to the UTF8 code page&lt;br&gt;chcp 65001&lt;br&gt;&lt;br&gt;rem type the file back to the console with the utf16le_to_utf8 shim&lt;br&gt;type utf16le.txt | utf16le_to_utf8</description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#573724</link><pubDate>Wed, 12 Apr 2006 01:15:43 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:573724</guid><dc:creator>Maurits</dc:creator><description>Er, switch /c and /u in that cmd call:&lt;br&gt;cmd /u /c echo g&amp;#233;n&amp;#233;ration &amp;gt; utf16le.txt </description></item><item><title>re: File redirection corruption?</title><link>http://blogs.msdn.com/michkap/archive/2006/04/07/570980.aspx#573790</link><pubDate>Wed, 12 Apr 2006 03:07:07 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:573790</guid><dc:creator>Maurits</dc:creator><description>Yup, that works.
&lt;br&gt;&lt;br&gt;C:\&amp;gt;chcp&lt;br&gt;Active code page: 437&lt;br&gt;&lt;br&gt;C:\&amp;gt;cmd /u /c echo g&amp;#233;n&amp;#233;ration &amp;gt; utf16le.txt&lt;br&gt;&lt;br&gt;(Opening utf16le.txt in Notepad and a hex editor confirms the UTF16-LE-ness of the file.)&lt;br&gt;&lt;br&gt;C:\&amp;gt;type utf16le.txt&lt;br&gt; &amp;nbsp;Θ n Θ r a t i o n&lt;br&gt;&lt;br&gt;C:\&amp;gt;chcp 65001&lt;br&gt;Active code page: 65001&lt;br&gt;&lt;br&gt;C:\&amp;gt;type utf16le.txt&lt;br&gt;???n?r?a?t?i?o?n? ?&lt;br&gt;?&lt;br&gt;&lt;br&gt;(type'ing a UTF16-LE-encoded file in a UTF8 code page doesn't work...)&lt;br&gt;&lt;br&gt;C:\&amp;gt;type utf16le.txt | perl utf16le_to_utf8.pl&lt;br&gt;g&amp;#233;n&amp;#233;ration&lt;br&gt;&lt;br&gt;(... but piping it through a converter does.)&lt;br&gt;&lt;br&gt;For the sake of completeness, here's the code for the converter:&lt;br&gt;&lt;br&gt;C:\&amp;gt;type utf16le_to_utf8.pl&lt;br&gt;use strict;&lt;br&gt;use Encode;&lt;br&gt;&lt;br&gt;# slurp whole files to avoid spurious line break issues with 0d 00 0a 00 etc.&lt;br&gt;undef $/;&lt;br&gt;&lt;br&gt;# read text&lt;br&gt;my $text = &amp;lt;&amp;gt;;&lt;br&gt;&lt;br&gt;# convert text&lt;br&gt;Encode::from_to($text, 'UTF-16LE', 'UTF-8');&lt;br&gt;&lt;br&gt;# output converted text&lt;br&gt;print $text;&lt;br&gt;&lt;br&gt;(It's probably reasonably trivial to write a simple .exe to convert from UTF-16LE on wcin to UTF-8 on cout... that would obviate the need for Perl.)</description></item></channel></rss>