As I said before in this blog, I have an optimized version of a program to solve a programming challenge
To recap, the challenge was to enumerate the possible phrases that can be obtained from a phone number, like 625-329-4741 is MakeAWish1
I understood the original challenge to be to solve the problem, without regard to efficiency. So, in a couple hours, I came up with a solution that wasn’t very efficient that I posted last week. Subsequently, I polished the optimized version below.
You can try this algorithm out by visiting this site and entering any phone number or string of letters. (Sorry, this site is internal to Microsoft only: I don’t want to bog down my tired old server, described in this blog)
It takes up to 15 seconds to calculate phrases based on a phone keypad and look them up in a 50,000 word dictionary.
For the original phone number of “642-394-6369” there were 20101 results calculated in about 10 seconds.
&& There are 3 parts to this algorithm.
&& First (GetDigSeqs) all the possible unique digit subsequences are enumerated into DigSeq.
&& For 123, this would be (123), (23), (3). For N digits, it would be maximum N sequences.
&& Part 2: (EnumNum) for each DigSeq, enumerate all possible letter words of length from 1 to len(DigSeq),
&& look them up in the dictionary, and place unique ones in SubWords
&& Part 3: (FormPhrase) For each digseq and, subword, assemble an N length phrase
&& EnumNum This part uses 2 parallel strings: the number to permute and a pattern
&& cPattern governs the pattern of letters to use for each digit's place
&& a "0" for each digits place to indicate which letter of the digit it is (0,1,2,3). 0 indicates use the raw digit
&& At the end, it'll be "1" (for 0,1), "3" (for 2,3,4,5,6,8), or "4" for "7,9", like "3334433143"
cNumber=INPUTBOX("Phone #?","Phone #","642-394-6369") && number of patterns is 4^8*5^2 = 1638400
GetPhrases(STRTRAN(cNumber,"-","")) && remove dashes
#define MAXDIGITLEN 2 && max seq length of digits in final result
PROCEDURE GetPhrases(cNumber as String) && Given a digit sequence("6423946369") get all phrases
PUBLIC NUMDIGS,ox as dictionary.dict
NUMDIGS = LEN(cNumber)
ox=CREATEOBJECT('dictionary.dict') && Instantiate the dictionary COM object
ox.DictNum=2 && 1 = 171000 word dictionary, 2= 53869 word
CREATE CURSOR DigSeqs (DigSeq char(NUMDIGS+1)) && all Digit subsequences
INDEX ON DigSeq TAG DigSeq
CREATE CURSOR phrases (phrase char(30)) && a table into which we can put all resulting phrases
CREATE CURSOR SubWords (SubDigits char(NUMDIGS+1),SubWord char(NUMDIGS+1)) && All subwords found in dictionary
INDEX ON SubDigits TAG SubDigits
INDEX ON SubWord TAG SubWord
SET ORDER TO 1
PROCEDURE GetDigSeqs(cNumber as String) && cNumber is all digis. Find the subsequences.
IF !SEEK(cNumber+' ',"DigSeqs") && if the number is not in the table
fShorterFound= SEEK(LEFT(cNumber,LEN(cNumber)-1)+' ') && Chop off 1 digit at the end and see if that's found
INSERT INTO DigSeqs VALUES (cNumber) && add the number into the table
IF !fShorterFound && if the shorter was not found,
EnumNum(cNumber,REPLICATE("0",LEN(cNumber)),1) && enumerate all words, look in dictionary
IF LEN(cNumber) >1 && If we need to recur
GetDigSeqs(SUBSTR(cNumber,2)) && Recur with the string without the first digit
PROCEDURE EnumNum(cNumber as String,cPattern as String, nDigNum as Integer) && cNumber is raw string of digits with no separators
cDigit=SUBSTR(cNumber,nDigNum,1) && the digit we're working on
cPat=SUBSTR(cPattern,nDigNum,1) && where are we on this digit?
cStartLet=SUBSTR(" adgjmptw",ASC(cDigit)-47,1) && Starting letter for this digit: 0 1 2abc, 3def, 4ghi, 5jkl, 6mno, 7pqrs, 8tuv, 9wxyz
cLastpat =SUBSTR("0033333434",ASC(cDigit)-47,1) && # permutations -1 for this digit: 0 0 4, 4, 4, 4, 4, 5, 4, 5
DO WHILE .t.
IF !SEEK(LEFT(cNumber,nDigNum)+' ',"SubWords",2) and ox.isword(LEFT(cNumber,nDigNum))
INSERT INTO SubWords (SubDigits,SubWord) VALUES (LEFT(DigSeqs.DigSeq,nDigNum),LEFT(cNumber,nDigNum))
IF nDigNum < LEN(cNumber) && we haven't enumerated all digits yet
EnumNum(cNumber,cPattern,nDigNum+1) && recur with next digit
IF cPat=cLastPat && this digit has reached the end (for '7', we've done "7","p","q","r","s")
cPat = CHR(ASC(cPat)+1) && increment the pattern
cPattern=LEFT(cPattern,nDigNum-1) + cPat + SUBSTR(cPattern, nDigNum+1) && insert the new letter into the pattern
cNewlet = CHR(ASC(cStartLet)+ASC(cPat)-49) && from 'a' to 'b' or from 'n' to 'o'
cNumber=LEFT(cNumber,nDigNum-1) + cNewLet + SUBSTR(cNumber, nDigNum+1) && insert the new letter into the number
PROCEDURE FormPhrase(cNumber as String,cPartResult as String) && cNumber is digit only sequence
nLen = LEN(cNumber)
IF nLen = 0
INSERT INTO phrases VALUES (SUBSTR(cPartResult,2)) && if we've used all the digits, then we have a phrase. Remove the initial '-'
FOR i = 1 TO nLen
cPartDigit=LEFT(cNumber,i) && the first i digits of the number
IF i <=MAXDIGITLEN AND LEN(cPartResult)>0 AND ISALPHA(RIGHT(cPartResult,1)) && don't insert 2 digit sequences adjacent
FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit) && insert the digits as a part result
IF LEFT(cPartDigit,1)$"01" AND i <= MAXDIGITLEN && if it starts with '0' or '1', it's ok
FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit)
SEEK cPartDigit+' ' && now find any subword associated with this digit sequence
SCAN WHILE SubDigits=cPartDigit+' ' && for each subwords found (364 can be "dog" or "fog")
nRec=RECNO() && save/restore the current record
FormPhrase(SUBSTR(cNumber,i+1),cPartResult+'-'+ TRIM(SubWords.SubWord)) && recur with those subwords as partial results
End of program