April, 2008

  • Jie Li's GeekWorld

    “Evil” way to federate search results through a password protected proxy


    In real-world environments, companies sometimes put a password-protected proxy between their employees and the Internet. Most of the time, it uses basic authentication. In this kind of environment, the federated search web part of Microsoft Search Server 2008 will not work out of the box, because we only support proxies that are not password protected.

    But is there any way to work around it?

    Yes, otherwise why would I be talking about it?

    By “evil”, I’m not referring to the definition of the word from a “not evil” company. My “evil” always means some kind of trick or hack, and you will love them because they really solve problems. BTW, in MMO RPGs I’m always a chaotic neutral character, but I like evil ones. That’s my best description.

    The idea is to build a data tunnel through the password-protected proxy, so we can map an external website to a local port and federate the search results from there. Several applications can do the job; here we will use HTTPort as an example.

    Here are the steps!

    1. Get a copy of HTTPort from www.htthost.com; the newest version is HTTPort 3.SNFM. Install it.

    2. In the proxy configuration window, fill in your proxy server name and port, check “Proxy requires authentication”, and then enter the username and password you use to access this proxy.


    3. Find the domain name of the RSS feed website you want to federate. In this example, we are using Live Search China. The domain name is “cnweb.search.live.com”.


    4. Click the Port mapping tab of HTTPort and add a new port tunnel. Fill in a local port, for example 991, then fill in the remote host name and port.


    5. Switch back to the Proxy tab and press the Start button in the lower right corner.


    6. Check that it works in your browser: replace the domain name in your RSS feed URL with the local address and port you mapped in step 4 (port 991 in this example). If everything goes well, and you are lucky enough, the RSS feed will be there and you can make Search Server federate it through this new local URL!


    7. A note: not every service can be federated like this. If the target website performs additional security checks, as Yahoo search does, the RSS feed cannot be fetched through such a tunnel. In that case you have to consider other ways, or spend some time improving this evil hack :).
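
    The URL swap in step 6 can be sketched in a few lines of Python. This is only an illustration, not part of HTTPort or Search Server: the function name `to_tunnel_url`, the `localhost` host name, and the feed path in the usage note are all assumptions; only the domain name and port 991 come from the steps above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_tunnel_url(feed_url, remote_host, local_port, local_host="localhost"):
    """Rewrite feed_url so it points at the local end of the port tunnel.

    Only the host:port part changes; scheme, path and query string are kept.
    """
    parts = urlsplit(feed_url)
    # Refuse URLs for a different site than the one the tunnel is mapped to.
    if parts.hostname != remote_host.lower():
        raise ValueError("feed URL is not on the tunnelled host: %r" % feed_url)
    return urlunsplit(parts._replace(netloc="%s:%d" % (local_host, local_port)))
```

    For example, a hypothetical feed URL such as `http://cnweb.search.live.com/results.aspx?q=sharepoint&format=rss` would become `http://localhost:991/results.aspx?q=sharepoint&format=rss`, which is the kind of URL you would then try in the browser and give to the federated location.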

    HTTPort is free software written by Dmitry Dvoinikov.

  • Jie Li's GeekWorld

    What may happen when I crawl MILLIONS of files in MOSS/MSS? Part II - Why I need X64 instead of X86?


    In the last post of this series we talked about crawl time and CPU usage. This time we will talk about process memory usage and the x86/x64 issue.

    Many customers have asked me about x86 versus x64. Most of them assume x64 would be a benefit, but they don't know what kind of benefit it really brings.

    So let me tell you why it's needed.

    As we know, SharePoint/Search Server has three layers. Each layer can only use a single architecture.

    Layer 1: Web Front End (WFE) Server

    The WFE hosts the web site and processes user events. This layer can and should be x64, unless you have some pretty old web part that still needs x86. At this layer, most of the memory consumption comes from IIS (w3wp.exe). On x86, IIS can only use about 1.1-1.2 GB of memory. If you hit this barrier, the process may just hang. This happens when a very large number of requests continues for a long time (several hours or days, depending on how many users are accessing the site at the same time). You can have multiple WFEs in one server farm.

    It's quite important to have IIS recycle automatically, and sometimes you even need to recycle it manually.

    Layer 2: Query Server

    The Query Server hosts the query engine. It continually receives index propagation from the Index Server. When a user makes a query, it is sent to the query engine, which checks SQL Server for document properties and checks the index for content chunks. So disk performance is important for the Query Server when the query load is high. Other workloads, like query-time security trimming by a custom security trimmer, are also handled by Query Servers.

    You can have multiple Query Servers per farm, and they should be x64 if you have the hardware.

    Layer 3: Index Server

    The Index Server is the spider. Because you can currently have only one Index Server in SharePoint 2007/Search Server 2008, you need to take great care of it. DO NOT put ANY other applications on it, and DO NOT share the box with SQL. If you do, you will quickly be hit by bad performance.

    Yes, the Index Server should be x64. The problem is that sometimes we cannot make it x64. For example, we may have a 32-bit IFilter that has no x64 implementation but is very important to the customer's business, or we may only have a 32-bit protocol handler, like the Lotus Notes PH. In these situations, you can only have a 32-bit Indexer.

    With a 32-bit Indexer you may suffer from the same memory limitation described for IIS above, especially with Lotus Notes. When you continuously crawl too many Notes docs into SharePoint, the index engine may hang for several hours because of the memory limit. If you come across this problem, I suggest writing a simple application that monitors the crawl and automatically restarts the search service after each DB is finished. This effectively recycles the memory used by the engine.
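
    Such a monitor could look roughly like the Python sketch below. Treat it as a starting point under stated assumptions: the service name `osearch` is the one used with `net stop` in the FTP post, while `crawl_finished` is a hypothetical callback you would have to write yourself (for example by checking the crawl status), and it should report True once per finished DB, otherwise the loop restarts again immediately.

```python
import subprocess
import time

SERVICE = "osearch"  # assumed service name, as used with "net stop osearch"


def restart_search_service(run=subprocess.check_call):
    """Stop and then start the search service with the Windows `net` command."""
    for action in ("stop", "start"):
        run(["net", action, SERVICE])


def restart_after_each_crawl(crawl_finished, restart=restart_search_service,
                             poll_seconds=60, sleep=time.sleep,
                             max_restarts=None):
    """Poll crawl_finished(); restart the service each time it returns True.

    crawl_finished is a hypothetical callback supplied by the caller.
    Returns the number of restarts once max_restarts is reached
    (with max_restarts=None the loop runs forever).
    """
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        if crawl_finished():
            restart()
            restarts += 1
        else:
            sleep(poll_seconds)
    return restarts
```

    The `run`, `restart` and `sleep` parameters exist only so the loop can be tried out with stand-ins before pointing it at a real Indexer.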

    In the future, nearly everything can be made x64. So stay tuned.

    Database: SQL Server

    Whether you are querying or indexing, the backend database is always a main contributor to overall performance. So put it on its own server, make it faster, and even cluster it. x64 is the best choice for the DB.

  • Jie Li's GeekWorld

    Index FTP content with SharePoint 2007/Search Server 2008


    Question: Can SharePoint 2007/Search Server 2008 index and search FTP sites?

    Answer: Out of the box (OOTB), you cannot. But there is an FTP protocol handler in the SharePoint 2001 ResKit, and it can be used on SharePoint 2007 or Search Server 2008.

    Important Notice: This approach is NOT supported by Microsoft. It is for test purposes only, and should not be used in an important production environment.

    Some restrictions apply: x86 only, and anonymous FTP only.

    Here are the steps.

    1. Get a copy of ftpph.dll. It is in the SharePoint 2001 ResKit.


    2. Copy it to c:\Program Files\Microsoft Office Servers\12.0\Bin


    3. Register the DLL: open a command prompt, navigate to c:\Program Files\Microsoft Office Servers\12.0\Bin, and run "regsvr32 ftpph.dll".


    4. Open Regedit and navigate to HKLM\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\Protocol Handlers


    5. Add a new string value named "ftp" and set its data to "FtpPH.SearchProtocol.1".


    6. In the command prompt, restart the search service with "net stop osearch" and "net start osearch".

    7. Download SharePoint Search Admin from http://www.codeplex.com/SearchAdmin , and then add a new custom content source for your FTP site.


    8. Start a full crawl, and you will see the results in the crawl log. Job done!
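
    If you need to repeat steps 3 to 6 on several test boxes, the commands can be collected in a small Python sketch like the one below. It is only a convenience wrapper around what the steps already say: the function names are made up for this example, and using `reg add` in place of the manual Regedit edit is my assumption of an equivalent. As with the steps themselves, this is unsupported and for test environments only.

```python
import ntpath
import subprocess

BIN_DIR = r"c:\Program Files\Microsoft Office Servers\12.0\Bin"
PH_KEY = (r"HKLM\SOFTWARE\Microsoft\Office Server\12.0"
          r"\Search\Setup\Protocol Handlers")


def build_setup_commands(bin_dir=BIN_DIR):
    """Return the command lines for steps 3-6 as argument lists."""
    dll = ntpath.join(bin_dir, "ftpph.dll")  # Windows-style path join
    return [
        ["regsvr32", dll],                                  # step 3
        ["reg", "add", PH_KEY, "/v", "ftp", "/t", "REG_SZ",
         "/d", "FtpPH.SearchProtocol.1", "/f"],             # steps 4-5
        ["net", "stop", "osearch"],                         # step 6
        ["net", "start", "osearch"],
    ]


def run_setup(run=subprocess.check_call):
    """Run each command in order; stops at the first failure."""
    for cmd in build_setup_commands():
        run(cmd)
```

    Run it from an elevated prompt on the Indexer after copying ftpph.dll into the Bin folder (step 2); steps 7 and 8 still have to be done by hand.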

