the1's WebLog

  • More on Getting the Number of Array Elements

    In a previous post, I discussed a safer way to get the number of elements in a C++ array.  I mentioned that the countof() macro doesn’t work with local types (i.e. types defined inside a function definition).

     

    I just realized that the macro also fails to work with anonymous types under certain circumstances.  To be exact, If you call countof() on two arrays that have different anonymous types AND same size, then the second call will fail to compile.

    e.g.

     

    struct

    {

        int x;

    } g_a[100];

     

    struct

    {

        int x;

    } g_b[100];

     

     

        int g_size;

       

        g_size = countof( g_a );  // First call - OK

        g_size = countof( g_b );  // Second call - compiler ERROR!

     

    The reason is that in this case, the second instantiation of the template helper function _ArraySizeHelper clashes with the first instantiation.

     

    Note that if you change the size of g_b to something other than 100, it compiles fine.

     

    So the caveat should be “countof() does not work with local types, and under certain circumstances does not work with anonymous types.”

     

    I cannot believe it’s so difficult to get such a simple job (getting the number of array elements) done in C++.

     

    What a broken language it is.

  • Programming in XSLT!

    Not so long ago, I posted a programming challenge, and received solutions in many different languages.  Just when I thought people had forgot about this, I got this one from Anton Triest.  It was written in a programming language called ... (surprise) XSLT!

    Ask not what you can do in XSLT; ask what you cannot do in XSLT. :-)

    (For those of you concerned that you need to buy an XSLT compiler to try it out, take heart -- all you need is your web browser)

     

     

  • Generalize Smart Pointers in C++

    After programming in C++ for a while, you will inevitably be introduced to the concept of “smart pointers” (or you will discover them on your own).  These are pointers that know to automatically release the objects they point to when they (the pointers) are destructed.  They have proved quite handy in avoiding memory leaks.

     

    The most well known implementation of smart pointers probably is auto_ptr in STL and CAutoPtr in ATL, which will delete the object upon destruction.

     

    auto_ptr and CAutoPtr are fine, but they have limitations:

     

    1. Sometimes the pointer is not allocated via the new operator (e.g. malloc()).  In these cases, you don’t want to use auto_ptr or CAutoPtr as the pointer will by wrongfully deleted (as opposed to be free()-ed in the case of malloc()).
    2. They are good for wrapping pointers only.  In general, you would like to allocate some resource and have the compiler release it for you automatically.

     

    Why not generalize smart pointers to smart resource managers?  It’s actually easy to do so.  Here’s my take on this:

     

    ///////////////////////////////////////////////////////////

    // Class CAutoResource

    //

    // An object of this template class holds a resource, and will dispose of

    // the resource when it is destructed.  This class is a generalization of

    // auto_ptr in STL and CAutoPtr in ATL.  The user must supply the function

    // doing the disposition as a template parameter.

    //

    // Examples where this class can be useful include file/window/registry

    // handles and pointers.

     

    template< typename T, void (*Dispose)( T ), const T NullResource = 0>

    class CAutoResource

    {

    public:

        typedef T ResourceType;

          CAutoResource( T Resource = NullResource )

                : m_Resource( Resource )

          {

          }

     

        ~CAutoResource()

        {

            if ( NullResource != m_Resource )

            {

                Dispose( m_Resource );

            }

        }

     

        operator T() const

        {

            return m_Resource;

        }

     

        T Get() const

        {

            return m_Resource;

        }

     

        void Attach( T NewResource )

        {

            Dispose( m_Resource );

            m_Resource = NewResource;

        }

     

          // The assert on operator & usually indicates a bug.

        T * operator & ()

        {

            _ASSERTE( NullResource == m_Resource );

            return & m_Resource;

        }

     

    private:

        T   m_Resource;

     

        CAutoResource( const CAutoResource& );

        CAutoResource& operator = ( const CAutoResource& );

    };

     

    To use the CAutoResource template class, you need to supply:

     

    1. The type of the resource;
    2. A function that disposes of the resource; and
    3. (Optional) The value that represents the “null” resource.  If you don’t specify it, 0 is used.

     

    Upon destruction, the resource in a CAutoResource object will be released by calling the resource disposition function.

     

    I hope it becomes obvious to you that auto_ptr and CAutoPtr are basically special cases of CAutoResource where the disposition function is the delete operator.  They do have more member functions, but those are not hard to add.

     

    Let’s see some examples of using CAutoResource.

     

    The first one is to wrap the generic Windows HANDLE, which is used by the Win32 API to represent a lot of different things (windows, fonts, brushes, memory, and etc).  A typedef instantiates CAutoResource for this scenario:

     

    #include <windows.h>

     

    // Generic Windows handle

    void DisposeHandle( HANDLE handle )

    {

        if ( NULL != handle )

        {

            CloseHandle( handle );

        }

    }

     

    typedef CAutoResource< HANDLE, DisposeHandle >  CAutoHandle;

     

    Our second example is handle to a registry key:

     

    // Handle to registry key

    void DisposeHKey( HKEY hKey )

    {

        if ( NULL != hKey )

        {

            RegCloseKey( hKey );

        }

    }

     

    typedef CAutoResource< HKEY, DisposeHKey >  CAutoHKey;

     

    Sometimes, two resources have the same C++ type but different semantics and should be released in different manners.  For example, an HINTERNET should be released by either WinHttpCloseHandle() or InternetCloseHandle(), depending on whether it was allocated using the WinHttp or the WinINet API.  This is no problem for us, as the disposition function is a type parameter to CAutoResource and you can specify different functions for the same C++ type:

     

    #include <winhttp.h>

     

    // Internet handle opened via the WinHttp API

    void DisposeWinHttpHandle( HINTERNET handle )

    {

        if ( NULL != handle )

        {

            WinHttpCloseHandle( handle );

        }

    }

     

    typedef CAutoResource< HINTERNET, DisposeWinHttpHandle > CAutoHWinHttp;

     

    #include <wininet.h>

     

    // Internet handle opened via the WinINet API

    void DisposeHInternet( HINTERNET handle )

    {

        if ( NULL != handle )

        {

            InternetCloseHandle( handle );

        }

    }

     

    typedef CAutoResource< HINTERNET, DisposeHInternet >  CAutoHInternet;

     

    Let’s give one example where the resource we want to manage is a pointer:

     

    // Memory allocated via LocalAlloc() or LocalReAlloc()

    void DisposeLocalMem( LPVOID lpMem )

    {

        if ( NULL != lpMem )

        {

            LocalFree( lpMem );

        }

    }

     

    typedef CAutoResource< LPVOID, DisposeLocalMem > CAutoLocalMem;

     

    So far so good, but as soon as you try to dereference a pointer masqueraded as a CAutoResource, you’ll find we haven’t defined the dereference operator (*) and the arrow operator (->) yet.  This is because they don’t make sense in the context of general resources.

     

    What we can do is to define these operators in a sub-class of CAutoResource.  I would’ve called it CAutoPtr, but that name is taken, so instead we have CAutoPtrEx:

     

    ///////////////////////////////////////////////////////////

    // Class CAutoPtrEx

    //

    // An object of this template class holds a pointer, and will dispose of

    // the object pointed to when it is destructed.  This class is a

    // generalization of auto_ptr in STL and CAutoPtr in ATL.

    // It is preferred to auto_ptr and CAutoPtr when 'delete' is not the right

    // way to dispose of the object.  The user must supply the function doing

    // the disposition as a template parameter.

     

    template< typename T, void (*Dispose)( T * ) >

    class CAutoPtrEx : public CAutoResource< T *, Dispose, NULL >

    {

    public:

        CAutoPtrEx( T * ptr = NULL )

            : CAutoResource< T *, Dispose >( ptr )

        {}

     

        T& operator * () const

        {

            return *Get();

        }

     

        T * operator -> () const

        {

            return Get();

        }

    private:

        CAutoPtrEx( const CAutoPtrEx& );

        CAutoPtrEx& operator = ( const CAutoPtrEx& );

    };

     

    I’ll finish this article with an example using CAutoPtrEx:

     

    // Memory allocated via CoTaskMemAlloc() or CoTaskMemRealloc()

    template <typename T>

    class CComMem

    {

    public:

        static void Dispose( T * pMem )

        {

            if ( pMem )

            {

                ::CoTaskMemFree( pMem );

            }

        }

     

        typedef CAutoPtrEx< T, CComMem::Dispose > AutoPtr;

    };

     

    (This posting is provided "AS IS" with no warranties, and confers no rights.)

  • Top reasons for failing to download Visual Studio Express beta 1

    As you might know, now you can download Visual Studio express SKU's from here for free.

    Since the release last Monday, we have seen that some users failed to download this product.  The top reasons include:

    1. The user cancelled the download.

    Please be reminded that you are downloading a large product, for example, the documentation component (MSDN express) itself is over 150 MB.  And, given the volume of download we are receiving, sometimes the progress will be very slow and it may look like hanging.  Give it more time and you have a good chance of eventually finish the download.

    2. The BITS service is disabled on the machine.

    The VS express web setup relies on the BITS (Background Intelligent Transfer Service) service on the user's machine to download the setup files.  BITS is enabled by default when you install Windows, but some user may choose to disable it for “performance gain”.  This is not necessary as BITS is taking very little resource.  You need to enable BITS before trying to install VS express over the web.

    3. The user is not logged on.

    Some users are trying to run VS express web setup under other's identity by using the “runas” command.  This is not supported by BITS.  You have to be logged on interactively.  Another unsupported scenario is running the setup in a Terminal Server session into a Windows 2000 server machine.  (Terminal Server into Windows XP and Windows 2003 server is fine.)

  • Improve Longhorn code quality

    Starting the coming Monday (July 12, 2004), I will be working for the Windows devision instead of Visual Studio.  My new team is focusing on improving the quality of the Longhorn code base.  I might write more about it once I get an idea on what it is really like there.

    I would love to hear your suggestions and comments on best software development practice, and what you think we can do to improve the quality of a software project that involves millions of lines of code.  Do you have a story to share?  That would be great.

  • Welcome, Aaron!

    I'm glad to see Aaron Stebner started blogging about setup (installer).

    Before Aaron switched team, he was a QA lead in the VS / .NET setup team, which I'm working for.  He was a great resource then.  Looks like his posts will be a new valuable source of information for my work.  I expect them to be useful to anyone interested in setup technologies as well.

    In particular, he recommended this article on MSI component rules written by Rob Mensching.  It's 11pm and I need to rest now, but the first thing on Monday I'm gonna read it and save it in OneNote for future reference.

     

     

  • How Would You Get the Count of an Array in C++?

    The question is simple: given a C++ array (e.g. x as in int x[10]), how would you get the number of elements in it?

     

    An obvious solution is the following macro (definition 1):

     

    #define countof( array ) ( sizeof( array )/sizeof( array[0] ) )

     

    I cannot say this isn’t correct, because it does give the right answer when you give it an array.  However, the same expression gives you something bogus when you supply something that is not an array.  For example, if you have

     

              int * p;

     

    then countof( p ) always give you 1 on a machine where an int pointer and an int have the same size (e.g. on a Win32 platform).

     

    This macro also wrongfully accepts any object of a class that has a member function operator[].  For example, suppose you write

     

    class IntArray

    {

    private:

        int * p;

        size_t size;

     

    public:

        int & operator [] ( size_t i );

    } x;

     

    then sizeof( x ) will be the size of the x object, not the size of the buffer pointed to by x.p.  Therefore you won’t get a correct answer by countof( x ).

     

    So we conclude that definition 1 is not good because the compiler does not prevent you from misusing it.  It fails to enforce that only an array can be passed in.

     

    What is a better option?

     

    Well, if we want the compiler to ensure that the parameter to countof is always an array, we have to find a context where only an array is allowed.  The same context should reject any non-array expression.

     

    Some beginners may try this (definition 2):

     

    template <typename T, size_t N>

    size_t countof( T array[N] )

    {

        return N;

    }

     

    They figure, this template function will accept an array of N elements and return N.

     

    Unfortunately, this doesn’t compile because C++ treats an array parameter the same as a pointer parameter, i.e. the above definition is equivalent to:

     

    template <typename T, size_t N>

    size_t countof( T * array )

    {

        return N;

    }

     

    It now becomes obvious that the function body has no way of knowing what N is.

     

    However, if a function expects an array reference, then the compiler does make sure that the size of the actual parameter matches the declaration.  This means we can make definition 2 work with a minor modification (definition 3):

     

    template <typename T, size_t N>

    size_t countof( T (&array)[N] )

    {

        return N;

    }

     

    This countof works very well and you cannot fool it by giving it a pointer.  However, it is a function, not a macro.  This means you cannot use it where a compile time constant is expected.  In particular, you cannot write something like:

     

    int x[10];

    int y[ 2*countof(x) ]; // twice as big as x

     

    Can we do anything about it?

     

    Someone (I don’t know who it is – I just saw it in a piece of code from an unknown author) came up with a clever idea: moving N from the body of the function to the return type (e.g. make the function return an array of N elements), then we can get the value of N without actually calling the function.

     

    To be precise, we have to make the function return an array reference, as C++ does not allow you to return an array directly.

     

    The implementation of this is:

     

    template <typename T, size_t N>

    char ( &_ArraySizeHelper( T (&array)[N] ))[N];

     

    #define countof( array ) (sizeof( _ArraySizeHelper( array ) ))

     

    Admittedly, the syntax looks awful.  Indeed, some explanation is necessary.

     

    First, the top-level stuff

     

    char ( &_ArraySizeHelper( ... ))[N];

     

    says “_ArraySizeHelper is a function that returns a reference (note the &) to a char array of N elements”.

     

    Next, the function parameter is

     

    T (&array)[N]

     

    which is a reference to a T array of N elements.

     

    Finally, countof is defined as the size of the result of the function _ArraySizeHelper.  Note we don’t even need to define _ArraySizeHelper(), -- a declaration is enough.

     

    With this new definition,

     

    int x[10];

    int y[ 2*countof(x) ]; // twice as big as x

     

    becomes valid, just as we desire.

     

    Am I happy now?  Well, I think this definition is definitely better than the others we have visited, but it is still not quite what I want.  For one thing, it doesn’t work with types defined inside a function.  That’s because the template function _ArraySizeHelper expects a type that is accessible in the global scope.

     

    I don’t have a better solution.  If you know one, please let me know.

     

    BTW, how does one solve the same problem for C#?  It’s trivial there.  You just say “x.length()” (I forgot the actual syntax, but you should get the idea).

     

    One thing to point out is that, in C# the count of an array is not in its type.  You decide how many elements are there when you create the array at run time.  Therefore, don’t even try to derive this information at compile time.

  • A JScript entry to the Phaser chanllenge

    You probably have got tired of me, and my phone number phraser chanllenge, but you should check this one out.

     

    Today I got a JavaScript (!) program that solves the phraser problem from Nicholas Allen.  This one is surprisingly short, and fairly clear too.  I have to study JScript sometimes.

     

    Note that JScript is not statically typed, which means many type errors will not be detected until run time.  This is a major drawback to me.  I strongly prefer to work in a type safe language that is statically typed.  Actually, many times when you get rid of all type errors in a Haskell program, it just works – a good type system allows you to express so much!

     

    Nicholas used a trie from letters to words.  I would use a trie from digits to sets of words.  Since more than one letters map to one digit, this change will result in a much smaller trie (there are much fewer keys) and a much faster algorithm (we look up a whole set of words, instead of a single word, at one time).

     

    Here’s Nicholas’ message:

     

    “I was reading over the solutions today and found the variety interesting.  The Haskell method was very well done and really showed off the language well.  Some of the great features of functional languages like lambda and apply are starting to appear in C-like languages so maybe one day everyone will be able to solve problems like this so pithily :)

     

    Since I had a JavaScript solution written previously I thought I'd share it.  It's amazing how many languages JS borrowed from in addition to Java.  My solution uses a trie which I think was done fairly simply.  It's a bit shorter than even the Haskell.

     

    Note: to actually run this you'll need a JS interpreter that can handle files, such as Rhino.”

     

    var number = "6423946369";

     

    var letters = [,,{A:1,B:2,C:3},{D:1,E:2,F:3},{G:1,H:2,I:3},{J:1,K:2,L:3},

                   {M:1,N:2,O:3},{P:1,Q:2,R:3,S:4},{T:1,U:2,V:3},{W:1,X:2,Y:3,Z:4}];

    var trie = [];

    var words = readFile ("dict").toUpperCase ().split ("\n");

     

    function add (node, word) {

       if (!node [word [0]]) node [word [0]] = [];

       if (word == "") node.isWord = true;

       else add (node [word [0]], word.substr (1));

    }

     

    function find (node, word, number) {

       if (number == "") {

          if (node == trie || node.isWord) print (word);

          return;

       }

       for (letter in letters [number [0]])

          if (node [letter]) find (node [letter], word + letter, number.substr (1));

       if (node.isWord) find (trie, word, number);

       if (node == trie) find (trie, word + number [0], number.substr (1));

    }

     

    for (word in words) if (words [word] != "") add (trie, words [word]);

    find (trie, "", number);

     

  • One more entry to the Phaser challenge

    Since I keep receiving entries to my programming challenge, I’ve compiled a list.  New entries will be added to that list as they are received.

     

    Last Saturday (4/3/04) I received another C# solution from Frans Bouma.  In his own words:

     

    “Hi,

     

    This week I started my phraser solution in C#, and today I finished it. As you had to make it as clean as possible I did, and it might look somewhat verbose through the comments (no not // add 1 comments ;)).

     

    The core phraser routine is just 34 lines with half of them { and } or comments.

     

    I paste the complete code below. As a wordlist, you should simply use a textfile and on each line 1 word.

     

    When you compile the code, simply run it like:

    phraser.exe 6423946369 words.txt 2

     

    where 2 is the maximum amount of digits. You can also specify -1 which means no limit.

     

    The algo is performing ok and can be made even faster with a more clever hashtable setup, as I've explained in the comments :)

     

    Cheers, and thanks for this fun compo. :)”

     

    using System;

    using System.Collections;

    using System.IO;

     

    namespace Phraser

    {

        /// <summary>

        /// Phrase creator class, which creates all phrases possible of a phone number, using a word list.

        /// The algorithm used: read all words in the wordlist and encode them into digit sequences (thus the

        /// digit sequence required to form the word using the phone keyboard). Then match the digit sequences

        /// with the phone number digits to find words in the phone number.

        /// This is more efficient than the other algorithm: calculate all permutations of possible character

        /// sequences which can be formed with the phone number.

        /// </summary>

        public class PhraseCreator

        {

            #region Class Member Declarations

            private   Hashtable         _word2DigitSequence;

            private string              _phoneNumber, _wordListFilename;

            private int                       _maxAmountOfDigitsInPhrase;

            #endregion

     

            #region Constructors

            /// <summary>

            /// Constructor.

            /// </summary>

            /// <param name="phoneNumber">The phone number to transform into phrases</param>

            /// <param name="wordlistFilename">The filename with the english words to use in phrases.</param>

            /// <param name="maxAmountOfDigitsInPhrase">The maximum amount of digits to be placed in the phrase.

            /// -1 means unlimited.</param>

            public PhraseCreator(string phoneNumber, string wordListFilename, int maxAmountOfDigitsInPhrase)

            {

                _phoneNumber = phoneNumber;

                _wordListFilename = wordListFilename;

                _word2DigitSequence = new Hashtable();

                _maxAmountOfDigitsInPhrase = maxAmountOfDigitsInPhrase;

            }

            #endregion

     

            #region Public methods

            /// <summary>

            /// Initializes the class for the phrasing process.

            /// Loads wordlist and encodes all words read into digit sequences. Wordlists have to have 1 word per

            /// line. Skips all words which are longer than the specified phonenumber.

            /// </summary>

            public void Initialize()

            {

                StreamReader reader = null;

                string wordFromFile;

     

                try

                {

                    reader = new StreamReader(_wordListFilename);

     

                    // loop through the lines in the file. Each word is on a single line. Strip whitespace

                    while((wordFromFile = reader.ReadLine()) != null)

                    {

                        wordFromFile = wordFromFile.Trim();

                        if((wordFromFile.Length==0) || (wordFromFile.Length>_phoneNumber.Length))

                        {

                            // not valid.

                            continue;

                        }

     

                        // line read, each line contains 1 word. Create digit sequence for word read and store it in

                        // the hashtable.

                        wordFromFile = wordFromFile.ToLower();

     

                        // transform it to a digit sequence so we can use it for easy matching later.

                        string wordAsDigits = DigitizeWord(wordFromFile);

     

                        // store both in hashtable

                        if(!_word2DigitSequence.ContainsKey(wordFromFile))

                        {

                            _word2DigitSequence.Add(wordFromFile, wordAsDigits);

                        }

                    }

                }

                finally

                {

                    reader.Close();

                }

            }

     

     

            /// <summary>

            /// Starts the phrasing process by calling the routine which dives into the recursion.

            /// </summary>

            public void StartPhrasingProcess()

            {

                DoCreatePhrases(0, "", 0);

            }

     

            #endregion

     

     

            #region Private methods

            /// <summary>

            /// Creates all the possible phrases for the phone number passed in in the constructor.

            /// It uses the read words in the file which filename was passed in in the constructor.

            /// </summary>

            /// <param name="startIndex">the current index to start scanning in the phonenumber</param>

            /// <param name="phraseInProgress">the phrase which is currently being build</param>

            /// <param name="currentAmountOfDigits">the current amount of digits in the phraseInProgress</param>

            /// <remarks>

            /// It uses a simple algorithm. It walks all words read and uses their digit sequence representations to

            /// find them back in the phone number passed in starting with the first digit, and so on. If the current

            /// digit in the phone number is not usable in any word, it is kept as a digit in the phrase. After the

            /// complete wordlist has been processed for the current index, the digit is kept and the current index is

            /// increased with 1. Until the end of the phone number is reached and recursion stops and back tracks. The

            /// amount of digits is limited by the amount passed to the constructor. A '-' is placed between words.

            /// <br><br>

            /// It will write all found phrases to the console. Although the algorithm is very fast, it can be sped up

            /// even more by constructing a hashtable with per digit (0-9) a list of words which digit sequence representations

            /// start with that digit. This would greatly reduce  the amount of words to loop through for each digit position.

            /// </remarks>

            private void DoCreatePhrases(int startIndex, string phraseInProgress, int currentAmountOfDigits)

            {

                if(startIndex >= _phoneNumber.Length)

                {

                    // we have scanned the complete phonenumber through recursion. We can exit now by printing the phrase

                    // we've found to the console.

                    Console.WriteLine("Phrase found: {0}", phraseInProgress);

                    return;

                }

     

                string newPhraseInProgress = String.Empty;

                foreach(DictionaryEntry wordDigitSequencePair in _word2DigitSequence)

                {

                    string wordAsDigitSequence = (string)wordDigitSequencePair.Value;

     

                    // check if word matches the digits starting on startindex in phonenumber. If so, we have found a word.

                    if((startIndex + wordAsDigitSequence.Length) > _phoneNumber.Length)

                    {

                        // word can never match as it is longer than the remaining digits.

                        continue;

                    }

                    if(_phoneNumber.Substring(startIndex, wordAsDigitSequence.Length) == wordAsDigitSequence)

                    {

                        // Match found. Dive into recursion with this new phraseInProgress

                        DoCreatePhrases(startIndex + wordAsDigitSequence.Length, MakeNewCurrentPhrase(phraseInProgress,

                            wordDigitSequencePair.Key.ToString()), currentAmountOfDigits);

                    }

                }

     

                if((_maxAmountOfDigitsInPhrase == -1) || (_maxAmountOfDigitsInPhrase > currentAmountOfDigits))

                {

                    // simply use the digit as a 'word' in the phrase and continue with the next digit. Then dive into recursion

                    // with this new phraseInProgress

                    DoCreatePhrases(startIndex+1, MakeNewCurrentPhrase(phraseInProgress, _phoneNumber[startIndex].ToString()),

                        currentAmountOfDigits+1);

                }

            }

     

     

            /// <summary>

            /// Creates a new phrase based on the two inputs. If the current phrase is empty, no '-' is emitted.

            /// </summary>

            /// <param name="currentPhrase">the current phrase we're working on</param>

            /// <param name="wordToAdd">the new word to add to the current phrase</param>

            /// <returns>new current phrase</returns>

            private string MakeNewCurrentPhrase(string currentPhrase, string wordToAdd)

            {

                if(currentPhrase.Length > 0)

                {

                    return currentPhrase + "-" + wordToAdd;

                }

                else

                {

                    return wordToAdd;

                }

            }

     

     

            /// <summary>

            /// Transforms the passed in word into a sequence of digits, the same sequence which is required to press

            /// on a phone keyboard to form the word.

            /// </summary>

            /// <param name="word">The word to transform into digits</param>

            /// <returns>The word as a string of digits</returns>

            /// <example>"nice" will be transformed into: "6423"</example>

            private string DigitizeWord(string word)

            {

                // get word as characters

                char[] wordAsCharacters = word.ToCharArray();

                char[] wordAsDigitSequence = new char[wordAsCharacters.Length];

     

                // digitize the word using CharacterToDigit

                for (int i = 0; i < wordAsCharacters.Length; i++)

                {

                    wordAsDigitSequence[i] = CharacterToDigit(wordAsCharacters[i]);

                }

     

                // create a new string from the digits which are stored as characters.

                return new string(wordAsDigitSequence);

            }

     

     

            /// <summary>

            /// Transforms the passed in character, which can be a member of the range a-z, to a digit, which can be

            /// 2-9 (as only the keys 2-9 have characters on them on a phone keyboard).

            /// </summary>

            /// <param name="character">The character to transform into a digit</param>

            /// <returns>The digit representation of the passed in character.</returns>

            /// <example>'n' will be transformed into '6'</example>

            private char CharacterToDigit(char character)

            {

                // set variable to return to a default: 2.

                char digitToReturn = '2';

     

                // do the transformation.

                switch(character)

                {

                    case 'a':

                    case 'b':

                    case 'c':

                        digitToReturn = '2';

                        break;

                    case 'd':

                    case 'e':

                    case 'f':

                        digitToReturn = '3';

                        break;

                    case 'g':

                    case 'h':

                    case 'i':

                        digitToReturn = '4';

                        break;

                    case 'j':

                    case 'k':

                    case 'l':

                        digitToReturn = '5';

                        break;

                    case 'm':

                    case 'n':

                    case 'o':

                        digitToReturn = '6';

                        break;

                    case 'p':

                    case 'q':

                    case 'r':

                    case 's':

                        digitToReturn = '7';

                        break;

                    case 't':

                    case 'u':

                    case 'v':

                        digitToReturn = '8';

                        break;

                    case 'w':

                    case 'x':

                    case 'y':

                    case 'z':

                        digitToReturn = '9';

                        break;

                }

     

                return digitToReturn;

            }

            #endregion

     

        }

     

     

        /// <summary>

        /// Startup class for the application.

        /// </summary>

        public class Startup

        {

            /// <summary>

            /// The main entry point for the application.

            /// Expects 3 parameters:

            /// parameter 1: phone number to phrase.

            /// parameter 2: filename with English words.

            /// parameter 3: max amount of digits to place in the phrase. can be -1, which means 0 maximum set.

            /// </summary>

            [STAThread]

            static void Main(string[] args)

            {

                // check input

                if(args.Length < 3)

                {

                    // wrong amount of arguments

                    Console.WriteLine("Usage: Phraser Phonenumber Filename MaxAmountOfDigits" + Environment.NewLine +

                        "where 'Filename' is the file with words to use" + Environment.NewLine +

                        "where 'MaxAmountOfDigits is the maximum amount of digits to end up in the phrase. Can be -1 (no limit)");

                    return;

                }

     

                try

                {

                    // 3 arguments specified. Create the phrases. Use the PhraseCreator class to do that.

                    PhraseCreator thePhraser = new PhraseCreator(args[0], args[1], Convert.ToInt32(args[2]));

                    thePhraser.Initialize();

                    // perform the phrasing

                    thePhraser.StartPhrasingProcess();

                }

                catch(Exception ex)

                {

                    // display the exception in the console.

                    Console.WriteLine("Exception caught: {0}", ex.Message);

                    Console.WriteLine("Source: {0}", ex.Source);

                    Console.WriteLine("Stacktrace: {0}", ex.StackTrace);

                }

            }

        }

    }

     

  • My own solution to my programming challenge

    I posted this programming puzzle on Monday this week, and have received a solution in Fox Pro from Calvin Hsia, one in C# from Justin Rodgers, and one in C++ from Michael Scholz.  So we have 3 players using 3 different languages.  Cool!

     

    I promised to post my own solution, and here you go.  I even threw in some analysis for free!

     

    Of the 4 solutions, I like mine the most. :-)  OK, so I may be a Narcissist, but please read what I have to say first – you can decide afterwards.

     

    1.     Overview of the data structure and algorithm

     

    Like coding, problem-solving is often about abstraction: once you abstract away all irrelevant details, the solution becomes much more obvious.  This is how I tried to attack this problem: by first finding the essence of the problem.  I made the problem more formal by re-phrasing (excuse the pun) it as:

     

    Given a sequence of n digits, find all the ways to represent it using a sequence of words.

     

    Note that the original problem allows digits in the result, while the above simplification allows only words.  Rest assured that there is no loss of generality though: we can simply add the 10 digits to our vocabulary list and count them as words.

     

    Next, I observed that once we fix the first word w in the result sequence, we only need to solve the same problem on a smaller scale:

     

    Given a sequence of n-k digits, where k is the length of w, find all the ways to represent it using a sequence of words.

     

    When we have a solution to the smaller problem, we can just append it to w to get a solution to the original problem.  In fact, all solutions to the original problem can be obtained this way, by enumerating all possible w’s.

     

    This divide-and-conquer strategy naturally lends itself to a recursive definition:

     

    Let P(s) be the set of all valid ways to represent the digit sequence s with a sequence of words, and P1(s) be the set of all valid ways to represent s with a single word.  We have (the base case):

     

    P( empty sequence ) = { empty sequence }

     

    and when s is not empty (the inductive case):

     

              P( s ) = { w++p | s1 is non-empty, s1++s2 == s, w is in P1( s1 ) and p is in P( s2 ) }

     

    (Here I’m using “++” for concatenation.)

     

    Now, scroll up to the beginning of this section and read again.  Do this until you are convinced this is the right definition.  Or until you find an error (in this case, drop me a note).  If neither of these happens, I’m sorry I have confused you – let me know and I will happily refund you and try to explain it better.

     

    Now that you’ve digested P( s ), we only need to find a way to calculate P1( s ).  Basically, this means given a digit sequence s, we want to find all the words that can single-handed represent s.  This suggests that we need a map from digit sequences to sets of words.  And that’s what I used to represent the word list.  I call it the WordMap in my program.

     

    With this WordMap, we don’t need to enumerate all possible source strings that correspond to a phone #, as Calvin and Scott’s solutions do.  Instead, we only generate valid source strings.  This is much more efficient.

     

    Calvin’s program treats different ways for dividing a digit sequence as different phrases (so “1-2-3-4” and “123-4” are considered different phrases).  I believe this is undesirable and hence don’t generate those otherwise identical entries.  This also improves the efficiency a lot without sacrificing clarity.

     

    Michael took a similar approach and also expressed his algorithm using recursion.  However, his has one more level of loop surrounding the recursive call, and is more complex.  I believe his program’s efficiency and mine are comparable.

     

    We now have a complete algorithm.  Bingo!

     

    2.     Choice of the programming language

     

    Next I need to implement the algorithm.  Unlike other contestants, who use Fox Pro, C#, and C++, my favorite programming language is a functional language called Haskell.  (“What the heck is that?”  Don’t worry.  I’ll walk you through my code and hopefully you’ll start to appreciate Haskell afterwards.)

     

    I chose Haskell for its ability to abstract – it allows me to express ideas at a high level, as if I’m explaining them to an intelligent person, as opposed to explaining to a machine that seems more interested in low-level details.  As a result, Haskell programs are often much shorter (an order of magnitude for large programs) than those written in C++, and therefore easier to maintain.

     

    My program contains 53 non-blank lines, shorter than Calvin’s 100-line Fox Pro, Justin’s 140-line C#, and Michael’s 120-line C++.  If you don’t count I/O and typedefs, my core logic is just 22 lines.  Now try to beat that. :-)

     

    In fact, I could’ve made my code even shorter (and clearer to a functional programmer), but then an unprepared audience will have a harder time understanding it.  Therefore what I have now is a compromise.  Still, I hope it can convince you of Haskell’s abstraction power!

     

    Of course, fewer lines don’t always lead to clarity (APL and Perl are known for capable of producing short but cryptic programs).  However, I do believe my version contains less low-level details and is thus easier to understand – given that you are willing to learn a little about Haskell.

     

    3.     Source code

     

    The complete Haskell source code is here:

     

    import Data.Set

    import Char

    import List

    import IO

     

    ------------------------------------------------------------

    -- Some friendly type synonyms

     

    type Digit   = Int

    type Word    = String

    type Phrase  = [Word]

    type WordMap = [Digit] -> Set Word

    type Phraser = WordMap -> [Digit] -> [Phrase]

     

    ------------------------------------------------------------

    -- Hash utilities

     

    -- The characters on each key in a telephone keypad

    keypad = [ "0",     "1",     "2abc",  "3def",  "4ghi"

             , "5jkl",  "6mno",  "7pqrs", "8tuv",  "9wxyz" ]

     

    -- Converts a character to its corresponding digit in the keypad.

    charHash :: Char -> Digit

    charHash ch = case findIndex (elem (toLower ch)) keypad of

                    Just i  -> i

                    Nothing -> -1

     

    -- Converts a String to its corresponding digit sequence.

    strHash :: String -> [Digit]

    strHash str = map charHash str

     

    ------------------------------------------------------------

    -- Constructing a WordMap

     

    -- Reads a list of Words from the word list file.

    readWords :: IO [Word]

    readWords = do h <- openFile "words.txt" ReadMode

                   contents <- hGetContents h

                   return (words contents)

     

    -- Inserts a Word into a WordMap.  Returns the updated WordMap.

    insertWord :: Word -> WordMap -> WordMap

    insertWord word wordMap ds =

      let oldWords = wordMap ds in

        if ds == strHash word

             then addToSet oldWords (map toLower word)

             else oldWords

     

    -- Converts a list of Words to a WordMap.

    wordsToMap :: [Word] -> WordMap

    wordsToMap words = foldr insertWord emptyWordMap (digitWords ++ words)

      where emptyWordMap :: WordMap

            emptyWordMap ds = emptySet

     

            digitWords :: [Word]

            digitWords = [show i | i <- [0..9]]

     

    -- Reads a WordMap from the word list file.

    readWordMap :: IO WordMap

    readWordMap = do words <- readWords

                     return (wordsToMap words)

     

    ------------------------------------------------------------

    -- Pretty printer

     

    -- Capitalizes the first letter in a word

    cap1 :: Word -> Word

    cap1 "" = ""

    cap1 (x:xs) = toUpper x : map toLower xs

     

    -- Prints a Phrase

    showPhrase :: Phrase -> String

    showPhrase p = concat (map cap1 p)

     

    -- Prints a list of Phrases

    showPhrases :: [Phrase] -> String

    showPhrases ps = concat (intersperse "\t" (map showPhrase ps))

     

    ------------------------------------------------------------

    -- Core algorithm

     

    -- Finds all possible ways to phrase a digit sequence

    allPhrases :: Phraser

    allPhrases wordMap [] = [[]]

    allPhrases wordMap ds = [ w : ph | i  <- [1..length ds]

                                     , w  <- setToList (wordMap (take i ds))

                                     , ph <- allPhrases wordMap (drop i ds) ]

     

    -- Runs a Phraser with a phone number

    run :: Phraser -> String -> IO ()

    run phraser num = do wordMap <- readWordMap

                         let ps = phraser wordMap (strHash num)

                         putStrLn ("There are " ++ show (length ps) ++ " solutions:")

                         putStrLn (showPhrases ps)

     

    As can be seen from the in-code comments, the code is divided into the following sections:

     

    • Type definitions
    • Utilities for calculating the corresponding digits (I took the liberty of calling it the Hash, abusing the term) of a text string
    • Input: constructing a WordMap from the word list file
    • Output: a pretty printer for phrases
    • The core algorithm: calculating and printing the result

     

    Since I anticipate most of the readers are not familiar with Haskell, I’ll give an almost line-by-line explanation on the code.

     

    Let’s start with the type definitions.

     

    3.1             Type definitions

     

    type Digit   = Int

    type Word    = String

    type Phrase  = [Word]

    type WordMap = [Digit] -> Set Word

    type Phraser = WordMap -> [Digit] -> [Phrase]

     

    By giving friendly and informative synonyms to types, we can make the code more self-documenting – the reader can tell the purpose of a variable by just looking at its type.

     

    The first couple of them are simple: we represent a Digit by an Int, and a Word is just a String.

     

    A Phrase is a sequence of Words.  In Haskell, [] is the list type constructor, and hence [Word] means “a list of Words”.  It’s like list< Word > in C++.

     

    The only interesting operation we want to do with the vocabulary list (other than constructing it) is to look up which words match a given digit sequence.  Hence we define WordMap as a function from [Digit] to a set of Words.  You may have figured out that “->” means function, and Set Word is like set< Word > in C++.

     

    Finally, a Phraser is something capable of generating all possible phrases for a given phone number, using a WordMap as reference.  Hence it’s defined as a function from a WordMap and a digit sequence (the phone number) to a list of Phrases.  Note that in Haskell two arrows are used for a function type that has two parameters.

     

    3.2             Hash utilities

     

    keypad defines which letters fall on which keys.  It is a list of Strings:

     

    -- The characters on each key in a telephone keypad

    keypad = [ "0",     "1",     "2abc",  "3def",  "4ghi"

             , "5jkl",  "6mno",  "7pqrs", "8tuv",  "9wxyz" ]

     

    charHash finds the Hash (i.e. corresponding digit) of a character.  It does this by finding the index of the element of keypad that contains this character.  If no index is found, -1 is returned, meaning that this character is not in the keypad.

     

    -- Converts a character to its corresponding digit in the keypad.

    charHash :: Char -> Digit

    charHash ch = case findIndex (elem (toLower ch)) keypad of

                    Just i  -> i

                    Nothing -> -1

     

    :: is used to declare the type of an identifier.  Is this case, we are saying that charHash is a function from Char to Digit.  Type declaration is usually not necessary in Haskell, as the compiler can infer the type for you most of the time.  However, it is good documentation to provide your types.

     

    Note that in Haskell function applications do not need the parentheses.  So instead of charHash( ch ), you just say charHash ch.

     

    strHash finds the Hash of a String (i.e. the list of digits corresponding to that String) by calculating the Hash of each character in the String and putting the result in a list.  Note in Haskell we don’t need an iterator or a loop to do that: map takes a function and applies it to every element of a list (What happens to the original list?  Nothing.  It is untouched and map will return a new list to hold the result).  It’s kind of like foreach in C#, but more abstract.

     

    -- Converts a String to its corresponding digit sequence.

    strHash :: String -> [Digit]

    strHash str = map charHash str

     

    3.3             Input

     

    readWords opens the word list file, reads its contents, and splits them into a list of Words (consecutive non-blank characters).  The IO in the type just means readWords needs to do some I/O action.  This is Haskell’s way of reminding you that this function might have side effects.

     

    readWords :: IO [Word]

    readWords = do h <- openFile "words.txt" ReadMode

                   contents <- hGetContents h

                   return (words contents)

     

    We construct the WordMap by inserting Words into it, one at a time.  Hence the following function:

     

    -- Inserts a Word into a WordMap.  Returns the updated WordMap.

    insertWord :: Word -> WordMap -> WordMap

    insertWord word wordMap ds =

      let oldWords = wordMap ds in

        if ds == strHash word

             then addToSet oldWords (map toLower word)

             else oldWords

     

    Remember that WordMap is a function from [Digit] to Set Word?  insertWord is actually accepting a function as parameter and returns another.  This is called higher-order functions.  As C# version 2 makes delegates and anonymous methods easier to use, you can expect to see C# programmers starting to use higher-order functions (at least good C# programmers :-) ).

     

    Then we can construct the entire WordMap by adding a whole list of Words into it:

     

    -- Converts a list of Words to a WordMap.

    wordsToMap :: [Word] -> WordMap

    wordsToMap words = foldr insertWord emptyWordMap (digitWords ++ words)

      where emptyWordMap :: WordMap

            emptyWordMap ds = emptySet

     

            digitWords :: [Word]

            digitWords = [show i | i <- [0..9]]

     

    Again there is no loop here: foldr does that for you and hides the details.  This is what I was talking about: (unnecessary) low-level details don’t belong in a program!

     

    The where clause was used to introduce local definitions.

     

    3.4             Output

     

    I wrote a mini pretty printer for Phrases.  Instead of inserting “-” between two words, I decide to just capitalize the first letter of each word.  This way, all entries are printed with the same width, and are much easier on the eyes (to me at least).

     

    -- Capitalizes the first letter in a word

    cap1 :: Word -> Word

    cap1 "" = ""

    cap1 (x:xs) = toUpper x : map toLower xs

     

    cap1 is defined by pattern matching: If the input is empty, then the output is also empty.  Otherwise suppose the first character in the input is x, and the rest of the input is a String xs, we just turn x into upper case and everything in xs into lower case.  x:y means a list whose head is x and the remainder is y.

     

    Given a Phrase (a list of Words) p, we show it by properly capitalizing each Word in p (that’s what map cap1 p does) and concatenating the results:

     

    -- Prints a Phrase

    showPhrase :: Phrase -> String

    showPhrase p = concat (map cap1 p)

     

    Given a list of Phrases, we show it by separating the elements with TAB:

     

    -- Prints a list of Phrases

    showPhrases :: [Phrase] -> String

    showPhrases ps = concat (intersperse "\t" (map showPhrase ps))

     

    3.5             Core algorithm

     

    Now comes the exciting stuff.  allPhrases is the function for calculating all possible ways to phrase a digit sequence (remember that Phraser is a synonym of (WordMap -> [Digit] -> [Phrase]).

     

    -- Finds all possible ways to phrase a digit sequence

    allPhrases :: Phraser

    allPhrases wordMap [] = [[]]

    allPhrases wordMap ds = [ w : ph | i  <- [1..length ds]

                                     , w  <- setToList (wordMap (take i ds))

                                     , ph <- allPhrases wordMap (drop i ds) ]

     

    Compare this piece of code with our informal definition of P( s ) shown earlier:

     

    P( empty sequence ) = { empty sequence }

    P( s ) = { w++p | s1 is non-empty, s1++s2 == s, w is in P1( s1 ) and p is in P( s2 ) }

     

    Do we see the correspondence?  It’s pretty clear to me.  This is what I love about Haskell most: being able to produce high-level code close to your algorithm description!

     

    The last thing is to hook up I/O with the core algorithm:

     

    -- Runs a Phraser with a phone number

    run :: Phraser -> String -> IO ()

    run phraser num = do wordMap <- readWordMap

                         let ps = phraser wordMap (strHash num)

                         putStrLn ("There are " ++ show (length ps) ++ " solutions:")

                         putStrLn (showPhrases ps)

     

    4.     Sample runs

     

    To phrase a phone # foo, just execute run allPhrases “foo.  For example,

     

    run allPhrases "6423946369"

     

    gives you:

     

    There are 160 solutions:

    6423946369      6423946Do9      6423946Fox      642394Of69      64239I6369

    64239I6Do9      64239I6Fox      64239IOf69      64239Go369      64239GoDo9

    64239GoFox      64239In369      64239InDo9      64239InFox      6423Who369

    6423WhoDo9      6423WhoFox      6423Wind69      6423Wine69      6423Window

    64A3946369      64A3946Do9      64A3946Fox      64A394Of69      64A39I6369

    64A39I6Do9      64A39I6Fox      64A39IOf69      64A39Go369      64A39GoDo9

    64A39GoFox      64A39In369      64A39InDo9      64A39InFox      64A3Who369

    64A3WhoDo9      64A3WhoFox      64A3Wind69      64A3Wine69      64A3Window

    64Be946369      64Be946Do9      64Be946Fox      64Be94Of69      64Be9I6369

    64Be9I6Do9      64Be9I6Fox      64Be9IOf69      64Be9Go369      64Be9GoDo9

    64Be9GoFox      64Be9In369      64Be9InDo9      64Be9InFox      64BeWho369

    64BeWhoDo9      64BeWhoFox      64BeWind69      64BeWine69      64BeWindow

    6I23946369      6I23946Do9      6I23946Fox      6I2394Of69      6I239I6369

    6I239I6Do9      6I239I6Fox      6I239IOf69      6I239Go369      6I239GoDo9

    6I239GoFox      6I239In369      6I239InDo9      6I239InFox      6I23Who369

    6I23WhoDo9      6I23WhoFox      6I23Wind69      6I23Wine69      6I23Window

    6IA3946369      6IA3946Do9      6IA3946Fox      6IA394Of69      6IA39I6369

    6IA39I6Do9      6IA39I6Fox      6IA39IOf69      6IA39Go369      6IA39GoDo9

    6IA39GoFox      6IA39In369      6IA39InDo9      6IA39InFox      6IA3Who369

    6IA3WhoDo9      6IA3WhoFox      6IA3Wind69      6IA3Wine69      6IA3Window

    6IBe946369      6IBe946Do9      6IBe946Fox      6IBe94Of69      6IBe9I6369

    6IBe9I6Do9      6IBe9I6Fox      6IBe9IOf69      6IBe9Go369      6IBe9GoDo9

    6IBe9GoFox      6IBe9In369      6IBe9InDo9      6IBe9InFox      6IBeWho369

    6IBeWhoDo9      6IBeWhoFox      6IBeWind69      6IBeWine69      6IBeWindow

    6Ice946369      6Ice946Do9      6Ice946Fox      6Ice94Of69      6Ice9I6369

    6Ice9I6Do9      6Ice9I6Fox      6Ice9IOf69      6Ice9Go369      6Ice9GoDo9

    6Ice9GoFox      6Ice9In369      6Ice9InDo9      6Ice9InFox      6IceWho369

    6IceWhoDo9      6IceWhoFox      6IceWind69      6IceWine69      6IceWindow

    Nice946369      Nice946Do9      Nice946Fox      Nice94Of69      Nice9I6369

    Nice9I6Do9      Nice9I6Fox      Nice9IOf69      Nice9Go369      Nice9GoDo9

    Nice9GoFox      Nice9In369      Nice9InDo9      Nice9InFox      NiceWho369

    NiceWhoDo9      NiceWhoFox      NiceWind69      NiceWine69      NiceWindow

     

    This is all cool, except that the list contains so many entries with many numbers in it.  No problem, given an arbitrary Phraser, we can easily restrict it by filtering the result it produces.  Behold the power of higher-order functions (in C#, these are functions that take delegates as parameters or return delegates):

     

    -- Restricts a Phraser by a predicate

    restrict :: (Phrase -> Bool) -> Phraser -> Phraser

    restrict pred phraser = \wordMap ds ->

        filter pred (phraser wordMap ds)

     

    Now suppose we are only interested in phrases that contain no more than 2 digits.  We just define a predicate that tells us if a phrase satisfies this test:

     

    atMost2Digits :: Phrase -> Bool

    atMost2Digits ph = length (filter isDigit (showPhrase ph)) <= 2

     

    Then we can use this predicate to restrict the original Phraser.  For example,

     

    run (restrict atMost2Digits allPhrases) "6423946369"

     

    gives us:

     

    There are 24 solutions:

    64BeWhoFox      64BeWindow      6IA3WhoFox      6IA3Window      6IBe9GoFox

    6IBe9InFox      6IBeWhoDo9      6IBeWhoFox      6IBeWindow      6Ice9GoFox

    6Ice9InFox      6IceWhoDo9      6IceWhoFox      6IceWindow      Nice9I6Fox

    Nice9GoDo9      Nice9GoFox      Nice9InDo9      Nice9InFox      NiceWhoDo9

    NiceWhoFox      NiceWind69      NiceWine69      NiceWindow

     

    Isn’t this modular?

     

    5.     Ways to optimize

     

    How would I go about optimizing my program?  First let’s see where the time is spent.  In this algorithm, most of the time the program is either updating the word map or looking up from it.  In this simple implementation, inserting to the word map is an O(1) operation, which is good.  However, looking up takes O(n) time, where n is the number of keys in the map.  This seriously impedes the performance.

     

    I have two ideas for reducing the complexity, both not difficult to implement.  The first is to change the implementation of WordMap from a function to a Hash table, that will make both inserting and looking-up nearly O(1) (assuming a good Hash function).  This alone should dramatically speed things up.

     

    My second idea goes further and uses a trie instead of a Hash.  This would yield guaranteed O(1) access time.  It will also cut down the space footprint because keys share storage in a trie.  (To find out the details, read more about the trie data structure here.)

     

    Both optimizations will make the code look more complex, but not very much.

     

    6.     What has Microsoft got to do with functional programming?

     

    Well, there are some eminent functional programmers at Microsoft, including Eric Meijer and Wolfram Schulte, who designed the much anticipated Xen programming language together.  More to the point, Xen (rumors are that it’s called X omega now) incorporates a lot of cool ideas from functional programming to C# in order to support seamless programming of XML, SQL, and objects.

     

    The next version of C# (the one in Whidbey) was also influenced by functional programming.  One example is the addition of anonymous methods.

     

    And, the most widely used Haskell compiler is called GHC, which is freely available and supported (among others) by researchers at Microsoft Research, Cambridge.

  • Another entry to the phraser programming contest

    I received another entry to my programming contest in email:

     

    From: "Michael Scholz"

    Sent: Thursday, April 01, 2004 5:34 PM

     

    I started working on your programming contest, but I quickly tired of optimizing for how-clear-the-code-is. But then, when you posted some clear code with the slow algorithm, I figured I'd finish what I'd started; to at least provide an algorithmic alternative. Fixing up the performance by pre-allocating buffers, etc, would arguably be less clear. So I'll just submit a first draft (i.e. sans comments). And final draft, actually, I should go back to my regularly scheduled coding... :)

     

    -Michael Scholz

    THQ, Inc.

     

    p.s. I'm kind of learn as I go on STL, so probably the blog-o-sphere would have much feedback about STL correctness. But at least I learned something new! (equal_range())

     

    I’m including the source code at the end of the message.  Basically, this one is a 120-line C++ program.  Unlike the previous entries, it uses recursion rather than pure iteration.  I actually like this better.  What do you think?

     

    This solution is close to mine, which I shall disclose fairly soon (I’m writing up the docs).  FYI, my program is in yet another language. :-)

     

    #include <assert.h>

    #include <fstream>

    #include <map>

    #include <set>

     

    // why did I have to do this? Couldn't a simple std::string have compiled correctly?

    class stlstring : public std::string

    {    

    public:

          stlstring() : std::string()

          {}

          stlstring(const std::string& arg) : std::string(arg)

          {}

          std::string& operator=(const std::string& _Right)

          {

                return (this->std::string::operator=(_Right));

          }

          bool operator<(const stlstring& _Right) const

          {

                return (this->compare(_Right) < 0);

          }

    };

    typedef std::multimap<stlstring,stlstring> tWordList;

    typedef std::pair<stlstring,stlstring> tWordValue;

    typedef std::set<stlstring> tStringSet;

    char GetKeypadMapping(char characterToMap)

    {

          if (characterToMap >= 'a' && characterToMap <= 'c')

                return '2';

          else if (characterToMap >= 'd' && characterToMap <= 'f')

                return '3';

          else if (characterToMap >= 'g' && characterToMap <= 'i')

                return '4';

          else if (characterToMap >= 'j' && characterToMap <= 'l')

                return '5';

          else if (characterToMap >= 'm' && characterToMap <= 'o')

                return '6';

          else if (characterToMap >= 'p' && characterToMap <= 's')

                return '7';

          else if (characterToMap >= 't' && characterToMap <= 'v')

                return '8';

          else if (characterToMap >= 'w' && characterToMap <= 'z')

                return '9';

         

          assert(0); // input wasn't valid, Although I'm putting a filter on the wordlist to guarantee it.

        return '0';

    }

     

    void InitWordList(tWordList& rWordList, const char* pszFileName)

    {

          const int MAX_STRING_LENGTH =48;

          char tempbuffer[MAX_STRING_LENGTH];

          std::ifstream iffile( pszFileName );

          tWordValue tempval;

          if( iffile )

          {

                while ( iffile.good() ) // EOF or failure stops the reading

                {

                      iffile.getline(tempbuffer,MAX_STRING_LENGTH);

                      if (*tempbuffer)

                      {

                            tempval.first = tempbuffer;

                            tempval.second = tempbuffer;

                    for (unsigned int mapIndex = 0;mapIndex<tempval.second.length();mapIndex++)

                            {

                                  tempval.first[mapIndex] = GetKeypadMapping(tolower(tempval.second[mapIndex]));

                            }

                            rWordList.insert(tempval);

                      }

                }

          }

          else {

                // cout << "ERROR: Cannot open file 'pszFileName'." << endl;

          }

    }

    void FindMatchesRecursive(tStringSet& destSet,const tWordList& rWordList, const stlstring& input,const stlstring& prefix)

    {

          stlstring saveString = prefix;

          saveString.append(input);

          destSet.insert(saveString);

     

          size_t inputStringLength = input.length();

          for (unsigned int i=0;inputStringLength && i<=(inputStringLength);i++)

          {

                for (unsigned int j=0;j<=i;j++)

                {

                      stlstring prefixString = prefix;

                      prefixString.append(input.substr(0,j));

                      stlstring testString = input.substr(j,i);

                      stlstring remainderString = "";

                      if (j+i < inputStringLength)

                            remainderString = input.substr(j+i,inputStringLength-j-i);

     

                      std::pair<tWordList::const_iterator,tWordList::const_iterator> rangeMatch=rWordList.equal_range(testString);

                      for (tWordList::const_iterator it = rangeMatch.first;it!=rangeMatch.second;++it)

                      {

                            saveString = prefixString;

                            saveString.append(it->second);

                            FindMatchesRecursive(destSet,rWordList,remainderString,saveString);

                      }

                }

          }

    }

     

    void main(void)

    {

          tWordList wordlistMap;

          tStringSet answer;

          InitWordList(wordlistMap,"wordlist.txt");

          stlstring inputString = "6423946369";

          answer.insert(inputString);

          stlstring nullString = "";

          FindMatchesRecursive(answer,wordlistMap,inputString,nullString);

     

          for(tStringSet::iterator it = answer.begin();it!=answer.end();it++)

          {

                char buff[20];

                sprintf(buff, "%s\n", it->c_str());

                OutputDebugString(buff);

          }

    }

     

  • Entries to my programming challenge

    Calvin submitted the first entry to my programming challenge.  And (surprise!), it’s in Visual Fox Pro.  I didn’t know Fox Pro can do that!  It’s always nice to see unconventional ways to solve a problem.

     

    Calvin’s post includes the source code, comments, and some anecdotes.  I suggest you check it out.

     

    After studying the code and comments, my conclusion is that his algorithm is a brute-force one.  In summary, it:

     

    1. generates all possible ways to represent the phone number using a sequence of English letters (not necessarily words) and decimal digits, (e.g. “23hj6p8x” is a way to represent “23456789”)
    2. for each sequence from step 1, generates all possible ways to divide it into sections (by inserting dashes into the sequence, e.g. “23h-j-6p8-x”), and
    3. filters out the entries from step 2 that contains illegitimate words (thus “23h-j-6p8-x” is out since “23h” is not a valid word).

     

    This idea is a sound one, i.e. it does give all the result we want.  However, I wish it could be more efficient, although efficiency is not the main concern for this competition.  (Calvin mentioned he also has an optimized version but didn’t post that one.)

     

    Now let’s come to the clarity.  The code is about 100 lines of Fox Pro.  I managed to understand it with the comments, even though I never programmed in Fox Pro before (You should be able to translate this code into any other procedural language like C++ or VB with little trouble).  The EnumNum()procedure is still a little too big to my taste, and could benefit from refactoring.

     

    You cannot talk about the clarity of the code without talking about the clarity of the algorithm and data structure.  I saw many programmers trying hard to improve their code, while what needs to be improved is the algorithm.  The leap from optimization-in-the-small to optimization-in-the-large can often make surprising difference.

     

    For this contest, I believe there are algorithms that are both simpler and more efficient.  Let’s work harder!

     

     

    After writing the above, I got another entry by Justin Rogers.  This one is about 140 lines of C#, and has basically no comments.  In Justin’s own words:

     

    “I made some changes to the algorithm by cutting out single character responses.  I don't bother placing the dashes in since I don't think they are uber important, but I could put that feature back in if necessary.

     

    I'm using the cheapo dictionary for now, and I've done 0 optimizations. The only true optimization is in the switch table.

     

    One thing to note. You can really optimize the algorithm in the case that a 1 or 2 somehow bisects the numbers. When you do this, you only have to process two substrings, reinserting the splitting character where necessary. This can result in a huge performance gain. However, a good deal of the numbers you would process don't contain a 1 or a 0, and so the perf gain is lost.

     

    And my algorithm isn't properly convergent either, so I'm not yet completed.”

     

    Justin, I don’t know what you mean by “isn’t properly convergent”.  Literally I think that means the program does not terminate (oh, BTW, in this case you don’t really have an algorithm.  An algorithm by definition must terminate.).  I’m sure you can fix that though.

     

    I also spotted some suspicious lines that could a bug.  Nice try.  But please make sure you code runs when submitting.  Some sample results can help convince us (and yourself) of that.

     

    The competition is getting more interesting.  Who’ll be the next?

     

     

    <4/1/04, 4pm: edited to add Justin’s entry>

  • An interesting read about interviewing at Microsoft

    JobsBlog posted this interesting article on the internals of a Microsoft interview.
  • Programming Challenge: Phraser

    Want to prove that you are the best programmer money can buy?  (OK, I know you are not for sale, but your boss may need a friendly reminder that it's time for your next big raise.)  Here's your chance:

     

    On a telephone keypad, the number keys 2 -- 9 also carry letters on them (see the picture below).  Your task is to write a program that finds all possible ways to phrase a phone number.  By "phrase" I mean using a mixture of numbers and English words to spell out the number.  For example, the number 642-394-6369 can be phrased as nice-window, nice-wind-ox, nice-win-fox, nice-9-in-fox, etc.

     

     

    Your input includes a list of English words, as well as the phone number as a sequence of digits.  For testing your program, you can use a basic English word list I found here.

     

    The merit of an entry will be judged mainly by how clear the code is, not necessarily the efficiency.  You can choose your favorite programming language, as long as it gets the job done (If it's not a widely known language, you probably want to make sure the code is self-documenting, so other people can understand why your code is so cool).

     

    I will post my own solution in several days.  (In case you are wondering, it is not in C++.)

     

    Enjoy!

  • Comments on an Answer to a Clearest Code Challenge

    Jaybaz_MS posted this Clearest Code Challenge, and later provided his own answer.  I wrote some comments (and more) on it, which you might want to check out.

More Posts Next page »

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker