In this post I will cover how to get a custom word-breaker working on SQL Server 2008 64-bit instances. Many of you out-there might already be familiar with an old sample that shipped with Windows Platform SDK called LRSAMPLE. I am going to use this as a sample reference and must warn you this post is not a how-to on going about coding a custom word-breaker.

Let’s start off with the assumption that you have a requirement for a custom word-breaker, since the kind of word-breaking that you require is not possible with the default Word-breaker DLL that ships with SQL Server 2008.

To give you an idea of what I mean by “custom requirements”, have a look at this scenario.

Sample Rows
Brick 4 Sided, Other (See Remarks)
Stucco Unspecified, Other (See Remarks)
Aluminum/Vinyl, Brick/Frame, Concrete Siding
Brick 4 Sided, Brick/Frame, Concrete Siding


Let’s say you want to differentiate between “Brick” and “Brick/Frame” in such a way that  a search for the keyword “Brick” return results for only entries which contain Brick alone and not ones that contain Brick/Frame. This of course is not possible using LANGWRBK.dll (SQL 2005) or NaturalLanguage6.dll (SQL 2008).


Okay, now you followed MSDN and compiled LRSAMPLE, got the DLL, registered it following the instructions in http://msdn.microsoft.com/en-us/library/ms345188.aspx , but when you actually tried to use the new word-breaker in a CONTAINS query, you are faced with this error


Msg 30053, Level 16, State 102, Line 1
Word breaking timed out for the full-text query string. This can happen if the wordbreaker took a long time to process the full-text query string, or if a large number of queries are running on the server. Try running the query again under a lighter load.


The LRSAMPLE is a C++ COM based DLL (I did warn you it was an old sampleJ) and as part of its registration it puts in the CLSID under HKLM\System\currentcontrolset\control\contentindex\language 

Even after following the instructions to register it to the letter, it does not work! Even a sample query to test the word-breaker using sys.dm_fts_parser gives the same error.

SELECT * FROM sys.dm_fts_parser (' "Hello" ', 19465, 0, 0)
-- 19465 is the language LCID is registered it as


Digging a little further in the SQLFT logs to see the reason for the word-breaking failure, I saw this error.


2010-10-20 14:19:32.59 spid23s     Error '0x8007007f' occurred during full-text index population for table or indexed view '[GAMLS_NEW].[dbo].[AllClassesCommonData]' (table or indexed view ID '1182067397', database ID '10'), full-text key value '578004'. Attempt will be made to reindex it.

0x8007007f  ---> The specified procedure could not be found.


Okay, this clears points to the DLL having some problems. I tried to register this DLL using regsvr32 and got these errors.
regsvr32 "C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\lrsample.dll"


DLLEntryPoint could not be found.
Specified procedure could not be found


One thing I should point out is that the sample is based on VC6 and by default builds a DEBUG version of the DLL. So if your SQL Server is on the same machine as your Visual Studio, it might actually work. The moment you move this to a machine that does not have VS you will get these errors.


Here are the changes required to be done for importing this sample to a VS project and getting it to work on x64 architecture :-
(Note: This was done on VS 2010 on Windows 7)

 

1.       Create an empty VS2010 project and pull in all the files in LRSAMPLE folder.

2.       Change the configuration to Release and platform to x64

 


3.      
The following code change has to be made in lrsample.cxx to specify the path for DLL registration. (The registration code is hardcoded).

 

SLangRegistry const English_Sample_LangRes =
{
L"English_Sample", MAKELANGID( LANG_ENGLISH, SUBLANG_ENGLISH_SAMPLE ),
{ L"{d225281a-7ca9-4a46-ae7d-c63a9d4815d4}",
L"English_Sample Word Breaker",
//L"lrsample.dll",
L"C:\\Program Files\\Microsoft SQL Server\\MSSQL10.SQLSERVER2008\\MSSQL\\Binn\\lrsample.dll", --> Change required here to specify location where DLL will be place
L"both" },
{ L"{0a275611-aa4d-4b39-8290-4baf77703f55}",
L"English_Sample Stemmer",
//L"lrsample.dll",
L"C:\\Program Files\\Microsoft SQL Server\\MSSQL10.SQLSERVER2008\\MSSQL\\Binn\\lrsample.dll", --> Change required here to specify location where DLL will be place
L"both" }
};


You need to specify the path of your SQL Server Instance Binn directory.


4.       Make the following changes in the VS Project Properties.

   a.    General:  Set the configuration type to Dynamic Library (.dll)


   b.    Linker\Input: Set module definition file to .\lrsample.def


 
c.       Linker\System: Set the subsystem to WINDOWS




5.       Build the Solution and copy the DLL from the Release folder to your SQL Binn directory.

6.       Register the DLL from here using regsvr32 from command prompt.

Regsvr32 “C:\Program Files\Microsoft SQL Server\MSSQL10.SQLSERVER2008\MSSQL\Binn\lrsample.dll”


After we did the above steps, the regular procedure of using custom word-breaker from http://msdn.microsoft.com/en-us/library/ms345188.aspx was enough to get this working on SQL Server 2008 X64.

EXEC sp_fulltext_service @action='load_os_resources', @value=1;
exec sp_fulltext_service 'verify_signature' , 0;
exec sp_fulltext_service 'update_languages';
exec sp_fulltext_service 'restart_all_fdhosts'

 

To confirm if the word-breaker was loaded you can check this DMV select * from sys.fulltext_languages


To test the word-breaker we used this query and viola, no more errors!

 SELECT * FROM sys.dm_fts_parser (' "Breaking Word" ', 19465, 0, 0)


I must point out one important thing here if you are going to use this DLL on other machines. You will need to have the Visual Studio Redistributable package installed on the target machine where you plan to deploy this. You will need the corresponding redist package for the VS version you used to build the DLL.

If you want to use this on multiple instances on the same machine you still need to have it copied over to each instance Binn folder and do the registration steps as per above article to create the registry keys.

 

Some References

Word Breaker and Stemmer Sample
http://msdn.microsoft.com/en-us/library/ms693176(v=VS.85).aspx

How to: Load Licensed Third-Party Word Breakers
http://msdn.microsoft.com/en-us/library/ms345188.aspx

IStemmer'ed the tide (or, Language-specific processing #2)
http://blogs.msdn.com/b/michkap/archive/2005/03/13/394822.aspx

You toucha my letters, IWordBreaker you face (or, Language-specific processing, #3)
http://blogs.msdn.com/b/michkap/archive/0001/01/01/395199.aspx