My boss forwarded me this blog which is a programming puzzle challenge: given a phone number, try to create a mnemonic, like 800-776-4726 is 800-PROGRAM
He knows that I’ve been writing word games on computers for years: I decoded a spelling dictionary from a commercial software product about 20 years ago, and have been using that as a source. Using Borland C about 20 years ago, my scrabble program can’t be beat<g>
So I wrote a solution to the challenge in about 100 lines of Visual FoxPro.
Today, my dictionary is a DLL COM server of size 630784, which includes 2 separate dictionaries: one with 171201 words, and the other with 53869. That’s around the same size as some pictures that my 4 Megapixel digital camera takes. So, a picture is worth 225000 words<g>
As a corollary, you might wonder how 225000 words can fit into 679k: that’s an average of 3 bytes per English word not including the dictionary logic itself! But that’s another subject.
To run the solution, you’ll need my dictionary
There are many optimizations that could be made, but they would add more lines and obscure the algorithm itself.
In any case, here’s the non-optimized solution.
There are 2 fundamental algorithms here. The first (EnumNum, 20 lines) enumerates all the possible permutations of NUMDIGS digits. It loops through the digits, allowing each one to range through its possible values ("2" goes through "2,d,e,f")
loops through the digits, allowing each one to range through its possible values ("2" goes through "2,d,e,f")
The 2nd algorithm (DoSeparators, 13 lines) just inserts a "-" in every possible position in a NUMDIGS length string. There are NUMDIGS-1 possible places to insert a "-" (each of the gaps between each letter). The gap can be either of 2 values: a "-" or a "". Thus there are 2^(NUMDIGS-1) locations to put a "-". So the DoSeparators routine just loops from 0 TO 2^(NUMDIGS-1)-1 and uses the bits of the loop index represented in binary to insert the "-".
Click to download Dictionary.dll
#define NUMDIGS 10
PUBLIC ox as dictionary.dict
ox=CREATEOBJECT('dictionary.dict') && Instantiate the dictionary COM object
ox.DictNum=2 && 1 = 171000 word dictionary, 2= 53869 word
&&This solution uses 2 parallel strings: the number to permute and a pattern
cPattern=REPLICATE("0",NUMDIGS) && like "0000000000"
&& cPattern governs the pattern of letters to use for each digit's place
&& a "0" for each digits place to indicate which letter of the digit it is (0,1,2,3). 0 indicates use the raw digit
&& At the end, it'll be "1" (for 0,1), "3" (for 2,3,4,5,6,8), or "4" for "7,9", like "3334433143"
&& Given a phone #, the # of patterns is 3^NUMDIGS if all digits are "234568"
cNumber="642-394-6369" && number of patterns is 4^8*5^2 = 1638400
cNumber=STRTRAN(cNumber,"-","") && remove dashes
CREATE TABLE phrases (phrase c(30)) && a table into which we can put partial results
USE phrases SHARED && open it shared, so another instance can view/query it while we're executing
BROWSE LAST NOWAIT NAME oBrowse && show the table
nCnt=0 && count of results
EnumNum(cNumber,cPattern,1) && Call enumerator starting at 1st digit
?"Completed Phrase search ",dtstart,dtEnd,dtEnd-dtStart
PROCEDURE EnumNum(cNumber as String,cPattern as String, nDigNum as Integer) && cNumber is raw string of digits with no separators
cDigit=SUBSTR(cNumber,nDigNum,1) && the digit we're working on
cPat=SUBSTR(cPattern,nDigNum,1) && where are we on this digit?
cStartLet=SUBSTR(" adgjmptw",ASC(cDigit)-47,1) && Starting letter for this digit: 0 1 2abc, 3def, 4ghi, 5jkl, 6mno, 7pqrs, 8tuv, 9wxyz
cLastpat =SUBSTR("0033333434",ASC(cDigit)-47,1) && # permutations -1 for this digit: 0 0 4, 4, 4, 4, 4, 5, 4, 5
DO WHILE .t.
IF nDigNum < NUMDIGS && we haven't enumerated all digits yet
EnumNum(cNumber,cPattern,nDigNum+1) && recur with next digit
DoSeparators(cNumber) && Gone through all digits. Now we've got a pattern, like "nicewinfox".
IF cPat=cLastPat && this digit has reached the end (for '7', we've done "7","p","q","r","s")
cPat = CHR(ASC(cPat)+1) && increment the pattern
cPattern=LEFT(cPattern,nDigNum-1) + cPat + SUBSTR(cPattern, nDigNum+1) && insert the new letter into the pattern
cNewlet = CHR(ASC(cStartLet)+ASC(cPat)-49) && from 'a' to 'b' or from 'n' to 'o'
cNumber=LEFT(cNumber,nDigNum-1) + cNewLet + SUBSTR(cNumber, nDigNum+1) && insert the new letter into the number
&& DoSeparators: given NUMDIGS letters like "nicewinfox", insert "-" into all positions to create phrases, then test them
&& For NUMDIGS, (NUMDIGS-1) locations to insert a separator. It's either there or not, so 2^(NUMDIGS-1) permutations
PROCEDURE DoSeparators(cNumber as String)
FOR i = 0 TO 2^(NUMDIGS-1)-1 && loop on # possible separator positions. For 10 digits, it's 512
cPhrase=SUBSTR(cNumber,1,1) && first char can't be a separator
FOR j = 1 TO NUMDIGS-1 && for each of the positions, construct a multi-word phrase like "nice-win-fox"
IF BITTEST(i,j-1) && add a separator (BITTEST(1024,10) is true because the 10th bit is 1)
cPhrase=cPhrase+SUBSTR(cNumber,j+1,1) && add a letter
ENDFOR && for each digit
TestPhrase(cPhrase+"-") && add a trailing "-"
ENDFOR && try next separator position
PROCEDURE TestPhrase(cPhrase as String) && given a phrase like "645-fox-test" see if all alpha sequences are in dictionary
IF nDash = 0 && we parsed all words
cWord=LEFT(cStr,nDash-1) && the first word before the dash
cStr=SUBSTR(cStr,nDash+1) && the rest of the string
IF !TestWord(cWord) && if it's not a word
INSERT INTO phrases VALUES (LEFT(cPhrase,LEN(cPhrase)-1)) && Bingo! remove trailing '-'
PROCEDURE TestWord(cWord as String) as Boolean
IF ISDIGIT(cWord) && if it starts with a digit
FOR i = 2 TO LEN(cWord)
IF !ISDIGIT(SUBSTR(cWord,i,1)) && if they're not all digits
FOR i = 2 TO LEN(cWord) && starts with a char, lets ensure the rest are chars
Here are the first 20 results:
6423946369 6-423946369 64-23946369 6-4-23946369 642-3946369 6-42-3946369 64-2-3946369 6-4-2-3946369 6423-946369 6-423-946369 64-23-946369 6-4-23-946369 642-3-946369 6-42-3-946369 64-2-3-946369 6-4-2-3-946369 64239-46369 6-4239-46369 64-239-46369 6-4-239-46369
You can see that the dashes are being inserted into the number in a binary fashion: 0000, 0001, 0010, 0011, etc.
One optimization: when inserting the separators: each partial word must be either all digits or all letters. Also, don't insert a separator between 2 digits.
Another optimization: if we constrain the words to be at least a certain length, we won't get silly results like a-i-i-a-i-a-i-i-i-a
A major optimization: after we have examined all phrases that start with "642-", we can use the same stored results for all 3 letter combinations of 642 (like "mda") that are stored in the cursor. Permutations of the last 7 digits have already been calculated and put in the cursor.
We could also limit all digit words to 1 or 2 in length
The contents of the dictionary are also important. If it is a true spell checker dictionary, then it contains things like b, c, d, e, f.
These are not spelling errors, but they're not words either. Many spelling dictionaries also contain abbreviations, like ny, md, etc.
Because the results are in a cursor, it's easy to do a query:
SELECT * from phrases WHERE AT("fox",phrase)>0nice-win-fox