<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Jie Li's GeekWorld : Encoding Convert</title><link>http://blogs.msdn.com/opal/archive/tags/Encoding+Convert/default.aspx</link><description>Tags: Encoding Convert</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Build Custom Federated Search Connector in Microsoft Search Server (and SharePoint) - Solve Problems and Extend Your Ideas</title><link>http://blogs.msdn.com/opal/archive/2008/02/29/build-custom-federated-search-connector-in-microsoft-search-server-and-sharepoint-solve-problems-and-extend-your-ideas.aspx</link><pubDate>Fri, 29 Feb 2008 14:17:23 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7946098</guid><dc:creator>Jie Li</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/opal/comments/7946098.aspx</comments><wfw:commentRss>http://blogs.msdn.com/opal/commentrss.aspx?PostID=7946098</wfw:commentRss><description>&lt;p&gt;I assume the reader of this article understands what federated search is. So we already know that in order to use the Federated Search web part in Search Server, you need to give it an RSS feed, which is also what people mean by &amp;quot;OpenSearch&amp;quot;. &lt;/p&gt;  &lt;p&gt;But not every application you want to search returns this kind of RSS/Atom feed. Google, Baidu and many other web sites, for example, do not. 
So how can you federate search results from this kind of web sites?&lt;/p&gt;  &lt;p&gt;&lt;a title="http://msdn2.microsoft.com/en-us/library/bb931083.aspx" href="http://msdn2.microsoft.com/en-us/library/bb931083.aspx"&gt;http://msdn2.microsoft.com/en-us/library/bb931083.aspx&lt;/a&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;h5&gt;&lt;em&gt;Scenario 2: Connecting to an External Search Site That Returns Results in HTML Format&lt;/em&gt;&lt;/h5&gt;    &lt;p&gt;&lt;em&gt;&lt;b&gt;Scenario background:&lt;/b&gt; The site is configured to use Anonymous access.&lt;/em&gt;&lt;/p&gt;    &lt;p&gt;&lt;em&gt;&lt;b&gt;Possible solution:&lt;/b&gt; Use a Web application outside of the context of a SharePoint site, which contains a lightweight ASPX page that does the following:&lt;/em&gt;&lt;/p&gt;    &lt;ol&gt;     &lt;li&gt;       &lt;p&gt;&lt;em&gt;Submits a search request to the site by using the search terms passed in the initial request URL.&lt;/em&gt;&lt;/p&gt;     &lt;/li&gt;      &lt;li&gt;       &lt;p&gt;&lt;em&gt;Converts the results in the HTML response received from the external search site to RSS format.&lt;/em&gt;&lt;/p&gt;     &lt;/li&gt;      &lt;li&gt;       &lt;p&gt;&lt;em&gt;Returns the RSS XML in the response to the search server.&lt;/em&gt;&lt;/p&gt;     &lt;/li&gt;   &lt;/ol&gt;    &lt;p&gt;&lt;em&gt;In this scenario, the federated connector&amp;#8217;s Web application could be located on a remote server; however, a simpler solution is to create the Web application within the _layouts folder for the SharePoint site. 
For more information about creating this type of Web application, see &lt;/em&gt;&lt;a href="http://msdn2.microsoft.com/en-us/library/ms433526.aspx"&gt;&lt;em&gt;How to: Modify Configuration Settings for an Application to Coexist with Windows SharePoint Services&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;    &lt;p&gt;&lt;em&gt;In a variation for this federated connector solution, you can add support for multiple external search sites by modifying the ASPX page to include details for more than one site within a case statement. The query template specified for these locations could then include a custom parameter that specifies which site in the case statement receives the federated query. Another variation is to combine the results for multiple external search providers, incorporating logic to order the results based on relevance.&lt;/em&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Well, some people have already done a nice job of this, for example Andrew Woodward:&lt;/p&gt;  &lt;p&gt;&lt;a title="http://www.21apps.com/2008/01/search-server-2008-federated-sites-that.html" href="http://www.21apps.com/2008/01/search-server-2008-federated-sites-that.html"&gt;http://www.21apps.com/2008/01/search-server-2008-federated-sites-that.html&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;I would like to go a little further. Here I take Baidu as an example. Baidu is the biggest Internet search engine in China. (Google China? God knows where they are. Baidu introduced many interesting applications that Chinese users love to use. But Google China is only famous for stealing the input method dictionary of another major Internet company, SOHU, and using it to make its own Pinyin input method. After this was exposed to the public, they made a not so honest &amp;quot;apology&amp;quot; and said that two interns did it. Perfect. Later this became a popular phrase in China: if anyone does evil things and is discovered by the public, he says it was the fault of an intern or a temporary employee. 
Well, what a shame for this &amp;quot;don't be evil&amp;quot; company. A little off topic.) &lt;/p&gt;  &lt;p&gt;Baidu.com does not return any RSS feed. What's more, it serves its result pages in the GB2312 encoding. So if you run a regex directly over the downloaded page without decoding it as GB2312 first, you get squares that do not make sense.&lt;/p&gt;  &lt;p&gt;There is also a limitation in the ASP.NET Request.QueryString collection: it decodes the value for you, and it does not process GB2312 correctly. So the Page_Load method must be changed to the following code:&lt;/p&gt;  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;protected&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Page_Load(&lt;span class="kwrd"&gt;object&lt;/span&gt; sender, EventArgs e)
    {
        &lt;span class="kwrd"&gt;if&lt;/span&gt; (Request.QueryString[&lt;span class="str"&gt;&amp;quot;q&amp;quot;&lt;/span&gt;]!= &lt;span class="kwrd"&gt;null&lt;/span&gt;)
        {
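            &lt;span class="rem"&gt;//Keep the raw, still percent-encoded query string instead of the decoded QueryString value.&lt;/span&gt;
            query = Request.Url.Query;
            &lt;span class="rem"&gt;//Strip the leading &amp;quot;?q=&amp;quot;. This assumes q is the first and only parameter.&lt;/span&gt;
            query = query.Remove(0,3);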
        }
    }&lt;/pre&gt;
&lt;style type="text/css"&gt;


.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }&lt;/style&gt;

&lt;p&gt;This way the query string is kept in its raw form, so you can process it with Decode and Encode yourself. If you use QueryString, you get the stupid behavior that it runs Decode with the wrong character set, and the result is a disaster. Stupid, stupid, stupid. I want to slap the guy who wrote this method. Doesn't he know there is more than English in this world?&lt;/p&gt;
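
&lt;p&gt;The Remove(0,3) call above only works when q is the first parameter in the URL. As a rough sketch, not part of the original download, the raw value could also be looked up by name. The class and method names here are made up for illustration.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;

public static class RawQuery
{
    //Hypothetical helper: returns the still percent-encoded value of one parameter,
    //or null if the parameter is not present.
    //rawQuery is expected in the form returned by Request.Url.Query, with or without the leading '?'.
    public static string GetRawValue(string rawQuery, string name)
    {
        if (string.IsNullOrEmpty(rawQuery)) return null;

        string prefix = name + &amp;quot;=&amp;quot;;
        foreach (string pair in rawQuery.TrimStart('?').Split('&amp;amp;'))
        {
            //No decoding happens here, so the caller chooses the charset later.
            if (pair.StartsWith(prefix, StringComparison.Ordinal))
                return pair.Substring(prefix.Length);
        }
        return null;
    }
}&lt;/pre&gt;

&lt;p&gt;In Page_Load this would be used as query = RawQuery.GetRawValue(Request.Url.Query, &amp;quot;q&amp;quot;); followed by a null check.&lt;/p&gt;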

&lt;p&gt;For example, my nickname, opal, is 猫眼石 in Chinese. If you query for it from IE, it is encoded as UTF-8. But Baidu can only consume GB2312.&lt;/p&gt;

&lt;p&gt;In UTF-8, 猫眼石 is %E7%8C%AB%E7%9C%BC%E7%9F%B3. &lt;/p&gt;

&lt;p&gt;In GB2312, 猫眼石 is %C3%A8%D1%DB%CA%AF.&lt;/p&gt;

&lt;p&gt;The two are quite different. If you search for %E7%8C%AB%E7%9C%BC%E7%9F%B3 and it is treated as a GB2312 string, the nine bytes are read two at a time and become 4.5 Chinese characters, none of which make sense.&lt;/p&gt;
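
&lt;p&gt;To see the difference for yourself, a small console program along these lines should do. This is only a sketch: the class name is made up, it assumes a reference to System.Web for HttpUtility, and it assumes the gb2312 code page is available on the machine. ToUpperInvariant is there only so the hex digits are printed in the same case as the strings quoted above.&lt;/p&gt;

&lt;pre class="csharpcode"&gt;using System;
using System.Text;
using System.Web;

class EncodingDemo
{
    static void Main()
    {
        string word = &amp;quot;猫眼石&amp;quot;;
        Encoding gb2312 = Encoding.GetEncoding(&amp;quot;gb2312&amp;quot;);

        //Percent-encode the same three characters under each encoding.
        Console.WriteLine(HttpUtility.UrlEncode(word, Encoding.UTF8).ToUpperInvariant());
        Console.WriteLine(HttpUtility.UrlEncode(word, gb2312).ToUpperInvariant());

        //The UTF-8 form is 9 bytes (3 per character).
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(word);
        Console.WriteLine(utf8Bytes.Length);

        //Misreading those bytes as GB2312 pairs them up two by two and gives garbage.
        Console.WriteLine(gb2312.GetString(utf8Bytes));
    }
}&lt;/pre&gt;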

&lt;p&gt;Okay, complain less, do more. Next we need to decode the query string.&lt;/p&gt;

&lt;pre class="csharpcode"&gt; &lt;span class="kwrd"&gt;private&lt;/span&gt; &lt;span class="kwrd"&gt;string&lt;/span&gt; getRssItemXml(&lt;span class="kwrd"&gt;string&lt;/span&gt; query)
    {
        &lt;span class="rem"&gt;//First decode it as UTF-8, because when IE accesses a UTF-8 based website it passes the query encoded that way.&lt;/span&gt;
        &lt;span class="rem"&gt;//Of course, you could modify web.config to make this application use GB2312, but that doesn't make sense.&lt;/span&gt;
        query = HttpUtility.UrlDecode(query, Encoding.UTF8);
        &lt;span class="rem"&gt;//Then re-encode it as GB2312, because Baidu can only consume that.&lt;/span&gt;
        query = HttpUtility.UrlEncode(query, Encoding.GetEncoding(&lt;span class="str"&gt;&amp;quot;gb2312&amp;quot;&lt;/span&gt;));
        &lt;span class="kwrd"&gt;string&lt;/span&gt; url = &lt;span class="kwrd"&gt;string&lt;/span&gt;.Format(&lt;span class="str"&gt;&amp;quot;http://www.baidu.com/s?wd={0}&amp;quot;&lt;/span&gt;, query);

        WebClient client = &lt;span class="kwrd"&gt;new&lt;/span&gt; WebClient();
        &lt;span class="kwrd"&gt;byte&lt;/span&gt;[] byteData = client.DownloadData(url);
        &lt;span class="rem"&gt;//The returned results are also in GB2312, so decode the bytes with that encoding.&lt;/span&gt;
        &lt;span class="kwrd"&gt;string&lt;/span&gt; strData = Encoding.GetEncoding(&lt;span class="str"&gt;&amp;quot;gb2312&amp;quot;&lt;/span&gt;).GetString(byteData);
        Regex searchPattern = &lt;span class="kwrd"&gt;new&lt;/span&gt; Regex(&lt;span class="str"&gt;&amp;quot;\\)\&amp;quot; href=\&amp;quot;(?&amp;lt;link&amp;gt;.*?)\&amp;quot; target=\&amp;quot;_blank\&amp;quot;&amp;gt;&amp;lt;font size=\&amp;quot;3\&amp;quot;&amp;gt;(?&amp;lt;title&amp;gt;.*?)&amp;lt;/font&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;br&amp;gt;(?&amp;lt;desc&amp;gt;.*?)&amp;lt;br&amp;gt;&amp;quot;&lt;/span&gt;);
        StringBuilder sb = &lt;span class="kwrd"&gt;new&lt;/span&gt; StringBuilder();

        &lt;span class="kwrd"&gt;foreach&lt;/span&gt; (Match m &lt;span class="kwrd"&gt;in&lt;/span&gt; searchPattern.Matches(strData))
        {
            sb.AppendFormat(&lt;span class="str"&gt;&amp;quot;&amp;lt;item&amp;gt;&amp;lt;title&amp;gt;&amp;lt;![CDATA[{0}]]&amp;gt;&amp;lt;/title&amp;gt;&amp;lt;link&amp;gt;&amp;lt;![CDATA[{1}]]&amp;gt;&amp;lt;/link&amp;gt;&amp;lt;description&amp;gt;&amp;lt;![CDATA[{2}]]&amp;gt;&amp;lt;/description&amp;gt;&amp;lt;/item&amp;gt;&amp;quot;&lt;/span&gt;,m.Groups[&lt;span class="str"&gt;&amp;quot;title&amp;quot;&lt;/span&gt;].Value,m.Groups[&lt;span class="str"&gt;&amp;quot;link&amp;quot;&lt;/span&gt;].Value, m.Groups[&lt;span class="str"&gt;&amp;quot;desc&amp;quot;&lt;/span&gt;].Value);
        }

        &lt;span class="kwrd"&gt;return&lt;/span&gt; sb.ToString();
    }&lt;/pre&gt;
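
&lt;p&gt;The method above returns only the &amp;lt;item&amp;gt; elements. The web part needs a complete RSS document, so the page still has to wrap them in a channel and send the result as XML. A minimal sketch of that step, assuming these members are added to the same code-behind class (with using System.Text) and that the .aspx file contains no markup of its own; getRssXml, writeRss and the channel texts are made-up names, not taken from the download:&lt;/p&gt;

&lt;pre class="csharpcode"&gt;    //Hypothetical helper: wraps the items from getRssItemXml in a minimal RSS 2.0 channel.
    private string getRssXml(string rawQuery)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append(&amp;quot;&amp;lt;?xml version=\&amp;quot;1.0\&amp;quot; encoding=\&amp;quot;utf-8\&amp;quot;?&amp;gt;&amp;quot;);
        sb.Append(&amp;quot;&amp;lt;rss version=\&amp;quot;2.0\&amp;quot;&amp;gt;&amp;lt;channel&amp;gt;&amp;quot;);
        sb.Append(&amp;quot;&amp;lt;title&amp;gt;Baidu results&amp;lt;/title&amp;gt;&amp;quot;);
        sb.Append(&amp;quot;&amp;lt;link&amp;gt;http://www.baidu.com/&amp;lt;/link&amp;gt;&amp;quot;);
        sb.Append(&amp;quot;&amp;lt;description&amp;gt;Results converted from HTML&amp;lt;/description&amp;gt;&amp;quot;);
        sb.Append(getRssItemXml(rawQuery));
        sb.Append(&amp;quot;&amp;lt;/channel&amp;gt;&amp;lt;/rss&amp;gt;&amp;quot;);
        return sb.ToString();
    }

    //Hypothetical helper: call it at the end of Page_Load, after query has been set.
    private void writeRss()
    {
        Response.Clear();
        Response.ContentType = &amp;quot;text/xml&amp;quot;;
        //The response is sent as UTF-8, matching the XML declaration above.
        Response.ContentEncoding = Encoding.UTF8;
        Response.Write(getRssXml(query));
    }&lt;/pre&gt;

&lt;p&gt;Note that the CDATA sections in getRssItemXml break if a captured title or description ever contains the sequence ]]&amp;gt;, so a more careful version would escape the values or build the document with an XML writer.&lt;/p&gt;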


&lt;p&gt;Then put this aspx file on a web site and point your federated search web part at it, for example &lt;a href="http://www.abc.com/Baidu.aspx?q={searchTerms}"&gt;http://www.abc.com/Baidu.aspx?q={searchTerms}&lt;/a&gt;, and you get Baidu federated search in Microsoft Search Server 2008.&lt;/p&gt;

&lt;p&gt;I put part of my work here:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://cid-8007edf5c56fc334.skydrive.live.com/self.aspx/Microsoft%20Search%20Server/CaptureWeb.rar"&gt;http://cid-8007edf5c56fc334.skydrive.live.com/self.aspx/Microsoft%20Search%20Server/CaptureWeb.rar&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It contains:&lt;/p&gt;

&lt;p&gt;Baidu Federated Search Web Service
  &lt;br /&gt;Baidu News Federated Search Web Service

  &lt;br /&gt;iCiba (English-Chinese Dictionary) Federated Search Web Service

  &lt;br /&gt;Dictionary.com Federated Search Web Service&lt;/p&gt;

&lt;p&gt;Yes! You can put dictionaries on your federated search page, so anybody who searches for a word gets its meaning immediately! You can also set up triggers so that this happens only for numbers, or only for characters, etc.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/BuildCustomFederatedSearchConnectorinMic_13B66/snap048_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="246" alt="snap048" src="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/BuildCustomFederatedSearchConnectorinMic_13B66/snap048_thumb.jpg" width="644" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=7946098" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/opal/archive/tags/Enterprise+Search/default.aspx">Enterprise Search</category><category domain="http://blogs.msdn.com/opal/archive/tags/OpenSearch/default.aspx">OpenSearch</category><category domain="http://blogs.msdn.com/opal/archive/tags/Microsoft+Search+Server+2008/default.aspx">Microsoft Search Server 2008</category><category domain="http://blogs.msdn.com/opal/archive/tags/User+Experience/default.aspx">User Experience</category><category domain="http://blogs.msdn.com/opal/archive/tags/MOSS/default.aspx">MOSS</category><category domain="http://blogs.msdn.com/opal/archive/tags/Encoding+Convert/default.aspx">Encoding Convert</category><category domain="http://blogs.msdn.com/opal/archive/tags/SharePoint/default.aspx">SharePoint</category><category domain="http://blogs.msdn.com/opal/archive/tags/Federated+Search/default.aspx">Federated Search</category></item><item><title>A Command Line Encoding Convertor in .Net</title><link>http://blogs.msdn.com/opal/archive/2008/01/07/a-command-line-encoding-convertor-in-net.aspx</link><pubDate>Mon, 07 Jan 2008 08:39:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7013103</guid><dc:creator>Jie Li</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/opal/comments/7013103.aspx</comments><wfw:commentRss>http://blogs.msdn.com/opal/commentrss.aspx?PostID=7013103</wfw:commentRss><description>&lt;P&gt;*If you are looking 
for something to transcode audio, please search for LAME or BeSweet. You can also take a look at one of my old projects: &lt;A href="http://paradiso.cn/converter/any2wav.htm"&gt;http://paradiso.cn/converter/any2wav.htm&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;*If you are looking for command-line video encoding, please try mencoder, ffmpeg, etc. You can find help on the Doom9 forum.&lt;/P&gt;
&lt;P&gt;The tool I talk about here is only for TEXT encoding problems. &lt;/P&gt;
&lt;P&gt;Well, this is a pretty simple and stupid tool. It contains no more than 10 lines of useful C# code, and its performance is not very good. But sometimes, when you have to deal with stupid problems, you need a tool like this. I like GNU's iconv, but there's no good port for Windows.&lt;/P&gt;
&lt;P&gt;So I had to write one for my own use.&lt;/P&gt;
&lt;P&gt;&lt;A title=http://cid-8007edf5c56fc334.skydrive.live.com/self.aspx/Public/ec.rar href="http://cid-8007edf5c56fc334.skydrive.live.com/self.aspx/Public/ec.rar" mce_href="http://cid-8007edf5c56fc334.skydrive.live.com/self.aspx/Public/ec.rar"&gt;http://cid-8007edf5c56fc334.skydrive.live.com/self.aspx/Public/ec.rar&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Usage: ec inputfile outputfile [input Encoding] [output Encoding]&lt;/P&gt;
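&lt;P&gt;The following is not the actual source of ec, only a rough sketch of how a converter with this command line could look in C#. The class name is made up, and unlike the usage line above this sketch requires both encoding arguments. The names are passed straight to Encoding.GetEncoding, so they must be names that .NET recognizes, such as gb2312 or utf-8.&lt;/P&gt;
&lt;PRE&gt;using System;
using System.IO;
using System.Text;

class EncodingConverterSketch
{
    static int Main(string[] args)
    {
        if (args.Length != 4)
        {
            Console.Error.WriteLine(&amp;quot;Usage: ec inputfile outputfile inputEncoding outputEncoding&amp;quot;);
            return 1;
        }

        Encoding source = Encoding.GetEncoding(args[2]);
        Encoding target = Encoding.GetEncoding(args[3]);

        //The whole file is read into memory before anything is written,
        //so the input and output paths may be the same file.
        string text = File.ReadAllText(args[0], source);
        File.WriteAllText(args[1], text, target);
        return 0;
    }
}&lt;/PRE&gt;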
&lt;P&gt;There is no wildcard support, but you can work around that with a simple trick in the command shell.&lt;/P&gt;
&lt;P&gt;For example, to convert all XML files in every subdirectory from GB2312 to UTF-8, type the following at the command prompt:&lt;/P&gt;
&lt;P&gt;&lt;FONT color=#000080&gt;for /R %i in (*.xml) do (ec "%i" "%i" GB2312 UTF8)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Inside a batch file, write %%i instead of %i. Then, job done.&lt;/P&gt;
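&lt;P&gt;If you would rather not depend on the shell loop, the same walk over the subdirectories can be sketched in C#. Again this is only an illustration with made-up names, hard-coded to the GB2312 to UTF-8 case; the optional argument is the root folder and defaults to the current directory.&lt;/P&gt;
&lt;PRE&gt;using System;
using System.IO;
using System.Text;

class RecursiveConvertSketch
{
    static void Main(string[] args)
    {
        string root = args.Length == 0 ? &amp;quot;.&amp;quot; : args[0];
        Encoding source = Encoding.GetEncoding(&amp;quot;gb2312&amp;quot;);
        Encoding target = Encoding.UTF8;

        //Visit every .xml file under root, including all subdirectories.
        foreach (string path in Directory.GetFiles(root, &amp;quot;*.xml&amp;quot;, SearchOption.AllDirectories))
        {
            //Read fully with the old encoding, then overwrite in place with the new one.
            string text = File.ReadAllText(path, source);
            File.WriteAllText(path, text, target);
            Console.WriteLine(&amp;quot;Converted &amp;quot; + path);
        }
    }
}&lt;/PRE&gt;
&lt;P&gt;Run it only once per folder: a second run would read the already converted UTF-8 files as GB2312 and corrupt them.&lt;/P&gt;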
&lt;P&gt;Another way is to use powershell.&lt;/P&gt;
&lt;P&gt;PS C:\temp&amp;gt; $a = type gb2312.txt&lt;BR&gt;PS C:\temp&amp;gt; out-file -filepath utf8.txt -inputobject $a -encoding utf8&lt;/P&gt;
&lt;P&gt;It's also very easy, but sometimes you cannot control the whole process...&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=7013103" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/opal/archive/tags/Encoding+Convert/default.aspx">Encoding Convert</category></item></channel></rss>