As I said before in this blog, I have an optimized version of a program to solve a programming challenge

To recap, the challenge was to enumerate the possible phrases that can be obtained from a phone number, like 625-329-4741 is MakeAWish1

I understood the original challenge to be to solve the problem, without regard to efficiency. So, in a couple hours, I came up with a solution that wasn’t very efficient that I posted last week. Subsequently, I polished the optimized version below.

You can try this algorithm out by visiting this site and entering any phone number or string of letters. (Sorry, this site is internal to Microsoft only: I don’t want to bog down my tired old server, described in this blog)

It takes up to 15 seconds to calculate phrases based on a phone keypad  and look them up in a 50,000 word dictionary.

For the original phone number of “642-394-6369” there were 20101 results calculated in about 10 seconds.

&& There are 3 parts to this algorithm.

&& First (GetDigSeqs) all the possible unique digit subsequences are enumerated into DigSeq.

&& For 123, this would be (123), (23), (3). For N digits, it would be maximum N sequences.

&& Part 2: (EnumNum) for each DigSeq, enumerate all possible letter words of length from 1 to len(DigSeq),

&& look them up in the dictionary, and place unique ones in SubWords

&& Part 3: (FormPhrase) For each digseq and, subword, assemble an N length phrase

&& EnumNum This part uses 2 parallel strings: the number to permute and a pattern

&& cPattern governs the pattern of letters to use for each digit's place

&& a "0" for each digits place to indicate which letter of the digit it is (0,1,2,3). 0 indicates use the raw digit

&& At the end, it'll be "1" (for 0,1), "3" (for 2,3,4,5,6,8), or "4" for "7,9", like "3334433143"

CLEAR

cNumber=INPUTBOX("Phone #?","Phone #","642-394-6369")          && number of patterns is 4^8*5^2 = 1638400

dtStart=SECONDS()

GetPhrases(STRTRAN(cNumber,"-",""))          && remove dashes

dtEnd=SECONDS()

?"#secs= ",dtEnd-dtStart,"num=",RECCOUNT("phrases")

#define MAXDIGITLEN         2          && max seq length of digits in final result

PROCEDURE GetPhrases(cNumber as String)            && Given a digit sequence("6423946369") get all phrases

PUBLIC NUMDIGS,ox as dictionary.dict

NUMDIGS = LEN(cNumber)

ox=CREATEOBJECT('dictionary.dict')            && Instantiate the dictionary COM object

ox.DictNum=2 && 1 = 171000 word dictionary, 2= 53869 word

CREATE CURSOR DigSeqs (DigSeq char(NUMDIGS+1))    && all Digit subsequences

INDEX ON DigSeq TAG DigSeq

CREATE CURSOR phrases (phrase char(30))                   && a table into which we can put all resulting phrases

CREATE CURSOR SubWords (SubDigits char(NUMDIGS+1),SubWord char(NUMDIGS+1))        && All subwords found in dictionary

INDEX ON SubDigits TAG SubDigits

INDEX ON SubWord TAG SubWord

SET ORDER TO 1

SELECT SubWords

GetDigSeqs(cNumber)

FormPhrase(cNumber,"")

PROCEDURE GetDigSeqs(cNumber as String)           && cNumber is all digis. Find the subsequences.

LOCAL i,fShorterFound

IF !SEEK(cNumber+' ',"DigSeqs")                   && if the number is not in the table

fShorterFound= SEEK(LEFT(cNumber,LEN(cNumber)-1)+' ')           && Chop off 1 digit at the end and see if that's found

INSERT INTO DigSeqs VALUES (cNumber)                                                 && add the number into the table

IF !fShorterFound                                                                    && if the shorter was not found,

EnumNum(cNumber,REPLICATE("0",LEN(cNumber)),1)       && enumerate all words, look in dictionary

ENDIF

ENDIF

IF LEN(cNumber) >1              && If we need to recur

GetDigSeqs(SUBSTR(cNumber,2))     && Recur with the string without the first digit

ENDIF

RETURN

PROCEDURE EnumNum(cNumber as String,cPattern as String, nDigNum as Integer) && cNumber is raw string of digits with no separators

LOCAL i,cDigit,cPat,cStartLet,cLastPat,cNewLet

cDigit=SUBSTR(cNumber,nDigNum,1)           && the digit we're working on

cPat=SUBSTR(cPattern,nDigNum,1)               && where are we on this digit?

cStartLet=SUBSTR("  adgjmptw",ASC(cDigit)-47,1)   &&   Starting letter for this digit: 0  1  2abc, 3def, 4ghi, 5jkl, 6mno, 7pqrs, 8tuv, 9wxyz

cLastpat =SUBSTR("0033333434",ASC(cDigit)-47,1)            && # permutations -1 for this digit: 0  0    4,    4,    4,    4,    4,    5,     4,    5

DO WHILE .t.

IF !"0"\$LEFT(cPattern,nDigNum)

IF !SEEK(LEFT(cNumber,nDigNum)+' ',"SubWords",2) and ox.isword(LEFT(cNumber,nDigNum))

INSERT INTO SubWords (SubDigits,SubWord) VALUES (LEFT(DigSeqs.DigSeq,nDigNum),LEFT(cNumber,nDigNum))

ENDIF

IF nDigNum < LEN(cNumber) && we haven't enumerated all digits yet

EnumNum(cNumber,cPattern,nDigNum+1)      && recur with next digit

ENDIF

ENDIF

IF cPat=cLastPat && this digit has reached the end (for '7', we've done "7","p","q","r","s")

EXIT

ENDIF

cPat = CHR(ASC(cPat)+1)                 && increment the pattern

cPattern=LEFT(cPattern,nDigNum-1) + cPat + SUBSTR(cPattern, nDigNum+1)         && insert the new letter into the pattern

cNewlet = CHR(ASC(cStartLet)+ASC(cPat)-49)        && from 'a' to 'b' or from 'n' to 'o'

cNumber=LEFT(cNumber,nDigNum-1) + cNewLet + SUBSTR(cNumber, nDigNum+1)        && insert the new letter into the number

ENDDO

RETURN

PROCEDURE FormPhrase(cNumber as String,cPartResult as String)   && cNumber is digit only sequence

LOCAL i,nLen,nRec,cPartDigit

nLen = LEN(cNumber)

IF nLen = 0

INSERT INTO phrases VALUES (SUBSTR(cPartResult,2))   && if we've used all the digits, then  we have a phrase. Remove the initial '-'

ELSE

FOR i = 1 TO nLen

cPartDigit=LEFT(cNumber,i)   && the first i digits of the number

IF i <=MAXDIGITLEN AND LEN(cPartResult)>0 AND ISALPHA(RIGHT(cPartResult,1))       && don't insert 2 digit sequences adjacent

FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit)       && insert the digits as a part result

ENDIF

IF LEFT(cPartDigit,1)\$"01" AND i <= MAXDIGITLEN && if it starts with '0' or '1', it's ok

FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit)

ELSE

SEEK cPartDigit+' '      && now find any subword associated with this digit sequence

SCAN WHILE SubDigits=cPartDigit+' '           && for each subwords found (364 can be "dog" or "fog")

nRec=RECNO()                                  && save/restore the current record

FormPhrase(SUBSTR(cNumber,i+1),cPartResult+'-'+ TRIM(SubWords.SubWord))  && recur with those subwords as partial results

GO nRec

ENDSCAN

ENDIF

ENDFOR

ENDIF

RETURN

End of program