Data cleaning is often a big challenge when working with textual data. The Fuzzy Lookup Add-In for Excel is a new tool from Microsoft Research and BI Labs that helps with the problem of identifying and matching textually similar string data in Excel. It is robust to spelling mistakes, synonyms, missing or added words and a number of other data quality problems frequently encountered in the real world. It has support for most languages and works well across a wide variety of data domains. Common uses include cleaning up lists of names, addresses, products or other entity descriptions which contain fuzzy duplicates. It can also be used to fuzzy join two different tables together. For instance, you might clean and augment a table of dirty city, state data with a zip code by matching it against a clean reference table of city, state and zip codes. Give it a whirl and let us know how it works for your data!
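To make the city, state and zip code example concrete, here is a minimal sketch of a fuzzy join. It does not use the add-in or its matching logic; it uses Python's standard difflib module as a stand-in, and the reference data, the function names and the 0.8 cutoff are all made up for illustration.

```python
import difflib

# Hypothetical clean reference table: (city, state) -> zip code.
REFERENCE = {
    ("Seattle", "WA"): "98101",
    ("Redmond", "WA"): "98052",
    ("Portland", "OR"): "97201",
}

def normalize(city, state):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(f"{city} {state}".lower().split())

def fuzzy_zip(city, state, cutoff=0.8):
    """Return the zip code of the closest reference row, or None below the cutoff."""
    keys = {normalize(c, s): (c, s) for (c, s) in REFERENCE}
    hits = difflib.get_close_matches(
        normalize(city, state), list(keys), n=1, cutoff=cutoff
    )
    return REFERENCE[keys[hits[0]]] if hits else None

# Dirty input rows with spelling mistakes; the last one has no good match.
dirty_rows = [("Seatle", "WA"), ("Redmnd", "WA"), ("Springfield", "IL")]
augmented = [(city, state, fuzzy_zip(city, state)) for city, state in dirty_rows]

for row in augmented:
    print(row)
```

Rows that score below the cutoff come back with no zip code instead of a wrong one, which is usually the safer default when augmenting data.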
How do I get this add-in to load? I'm running Excel 2010 and I can see the add-in under Inactive Application Add-ins, but when I try to load it I get "Not Loaded. A runtime error occurred during the loading of the COM Add-In."
The Portfolio example works as advertised.
I tried it on a table with more than 200K rows and got this error:
Exception from HRESULT: 0x800A03EC
Having issues with the transformations not working. Is there a good walk-through to show how to use these advanced configuration features properly?
Does this add-in work with Excel 2007 on Windows XP (Spanish)?
Is there a way to access this add-in through Excel VBA?
Job well done. It also works perfectly well with Excel 2007.
However, as Jim Baldwin said, it would be ideal to access it through Excel VBA ...
Do you have any plans for this?
I am performing a fuzzy match on 2 tables, one with 12000 rows, the other with 18000. I match 4 columns in each table (street, zip, city and name of a prospect).
My problem is that my 12GB server is too slow at performing this. After running for 6 hours, Excel freezes.
Any idea on how to handle that?
Thanks a lot,
I had the same problem matching two address files. It was caused by an unrecognised character in the data, which was the = sign. Clean the data and the problem goes away.
Can you point me to a white paper that describes the logic used by this Fuzzy Lookup Add-In?
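As a rough sketch of that clean-up step, assuming the address data has been exported from Excel into a list of strings, the = signs could be stripped before matching. The function name and the sample rows below are hypothetical.

```python
def strip_equals(value):
    """Replace '=' with a space, then collapse repeated whitespace."""
    return " ".join(value.replace("=", " ").split())

# Hypothetical address cells, two of which contain a stray '=' sign.
rows = ["=12 Main St", "45 Oak = Ave", "7 Elm Rd"]
cleaned = [strip_equals(r) for r in rows]

print(cleaned)
```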
I am new to the Fuzzy Lookup functionality. I tested it on small tables and it seems to work fine. However, when I tried applying it to my work, where I need to compare a text string (a paragraph of more than 1000 words) with a list of other such big paragraphs (say 1000 of them), the tool seems to hang. Does the add-in have such a limitation?
I got it working using the setup file instead of the msi file. Please note I only have Excel 2007 (not 2010). The installation downloaded some extra files from Microsoft to make it work, and work it did.
For those who had trouble getting the GO button to do anything, make sure your current cell is selected where you want the results to go. I used the format suggested by the helpful pdf file that comes with it and put the current cell a few rows below one of my tables used in the matching.
I've only used it for a quick project and the results were almost perfect... good luck!
I installed the Fuzzy Lookup Add-in, and it works great.
But I am trying to solve a bit different problem than described in sample file Portfolio.xlsx.
I would like to fuzzy compare two texts in each row:
I have two different tables, let's say table A and table B. But I don't want to compare the first row of A to all rows of B, just the first row of A to the first row of B, and then do the same for all other rows. I can't find a way to use this add-in in this way…
I found some examples of how to do that with a custom macro, but they do not seem as robust as the Fuzzy Lookup Add-in.
What I would like to know is, if there is any possibility to do what I need using Fuzzy Lookup Add-in? Can it be customized this way via Configure option? Any suggestions?
Thank you for your answer.
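For a row-by-row comparison like the one described above, a minimal sketch outside the add-in could look like this. It assumes both columns have been exported as equal-length lists, and it uses the similarity ratio from Python's difflib, which is not the add-in's scoring method. The sample values and the function name are hypothetical.

```python
from difflib import SequenceMatcher

def row_similarity(left, right):
    """Similarity between two strings in [0, 1], ignoring case."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()

# Hypothetical columns: row i of table A is compared only to row i of table B.
table_a = ["Microsoft Corp", "Apple Inc", "Coca Cola"]
table_b = ["Microsoft Corporation", "Apple Inc", "Pepsi"]

scores = [round(row_similarity(a, b), 2) for a, b in zip(table_a, table_b)]

for a, b, score in zip(table_a, table_b, scores):
    print(f"{a!r} vs {b!r}: {score}")
```

Each pair gets exactly one score, so the cost grows with the number of rows rather than with the product of the two table sizes.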
No results are inserted when I press the "Go" button. Do you have any idea what could be wrong?
Office Standard 14.0.6112.5000 64-bit German
Windows 7 Professional, 64-bit German
The problem I reported earlier has been resolved after uninstalling the add-in and installing it again.
If you face the same problem, uninstall the add-in and then install it again by running setup.exe. Do not run the .msi file.
First of all, I must thank the smart people at MS who produced this add-in. It's a really great add-in and can save hours of data cleansing. I have two issues to report:
1) After running a few comparisons, it stops responding: when I press 'Go', nothing happens. Based on the recommendation here, I uninstalled it and then reinstalled it using the setup file. It worked for some time, but then it stopped working again. Is this due to a bug in the code?
I noticed that just before it stopped working, it started giving me peculiar results, i.e. two items that are not very similar were returned with a very high similarity rating.
I can certainly try uninstalling and installing again, and it might work, but I was wondering if there is a way to avoid this hassle altogether.
2) Secondly, when matching rows in the left and right tables, I would like the tool to use each row in the right table only once. That is, once it has been matched (to the best possible) with a row on the left, it shouldn't be reused and appear again in front of another left row. E.g.:
AA A A
Combined table after Fuzzy Lookup...
AAA A A (or may be blank)
I would really appreciate if any of the fine gentlemen here could answer my queries.
MS Office Professional Plus 2010
MS Excel 2010 (14.0.6129.5000) 32-bit
MS Windows 7 Professional - 64 bits
Dell i7 1.73GHz, 8GB
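The second request above (each right-table row used at most once) can be sketched outside the add-in as a greedy one-to-one assignment. This is only an illustration under stated assumptions: the similarity score comes from Python's difflib rather than from the add-in, the 0.5 threshold is arbitrary, the function names and sample data are hypothetical, and a greedy pass is not guaranteed to find the globally best assignment.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity between two strings in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def one_to_one_matches(left, right, threshold=0.5):
    """Pair each left row with at most one right row, best scores first."""
    # Score every (left, right) pair, then visit pairs from best to worst.
    pairs = sorted(
        ((similarity(l, r), i, j)
         for i, l in enumerate(left)
         for j, r in enumerate(right)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    assigned = {}        # left index -> right index
    used_right = set()   # right rows that are already taken
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are even weaker
        if i in assigned or j in used_right:
            continue
        assigned[i] = j
        used_right.add(j)
    # Left rows without a partner are reported with None (a blank match).
    return [
        (left[i], right[assigned[i]] if i in assigned else None)
        for i in range(len(left))
    ]

# Hypothetical data: both left rows resemble the single right row.
left_rows = ["Acme Corp", "Acme Corporation"]
right_rows = ["Acme Corporation"]
result = one_to_one_matches(left_rows, right_rows)
print(result)
```

Here the exact match claims the right row first, so the weaker candidate is left blank instead of reusing it.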