
Performance Quiz #6 -- Chinese/English Dictionary reader

Raymond Chen is running a series of articles about building a Chinese/English dictionary and optimizing its startup time.

Truth be told, I got a look at his article quite some time ago, as he was kind enough to ask me for comments well in advance.  At the time I couldn't resist writing a managed version of the same program to see how it would do.  I encourage you to watch as Raymond works through the various steps of optimizing his program and see how it comes along.

This managed code is a line-for-line conversion of his initial program, done in the dumbest possible way, with no attempt whatsoever to optimize anything.

And then, the question of the hour:  How does Raymond's program fare vs. the equivalent managed code below?

Feel free to comment on the code, the problem, or just the unfairness of it all, but please don't accuse me of concluding too much from the result of just this one benchmark :) :)

using System;
using System.IO;
using System.Text;
using System.Collections;

namespace NS
{
    class Test
    {
        [System.Runtime.InteropServices.DllImport("Kernel32.dll")]
        private static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

        [System.Runtime.InteropServices.DllImport("Kernel32.dll")]
        private static extern bool QueryPerformanceFrequency(out long lpFrequency);

        static void Main(string[] args)
        {
            long startTime, endTime, freq;

            QueryPerformanceFrequency(out freq);
            QueryPerformanceCounter(out startTime);

            Dictionary dict = new Dictionary();        

            QueryPerformanceCounter(out endTime);

            Console.WriteLine("Length: {0}", dict.Length());
            Console.WriteLine("frequency: {0:n0}", freq);
            Console.WriteLine("time: {0:n5}s", (endTime - startTime)/(double)freq);
        }

        class DictionaryEntry
        {
            private string trad;
            private string pinyin;
            private string english;

            static public DictionaryEntry Parse(string line)
            {
                DictionaryEntry de = new DictionaryEntry();
               
                int start = 0;
                int end = line.IndexOf(' ', start);
               
                if (end == -1) return null;
                de.trad = line.Substring(start, end - start);
               
                start = line.IndexOf('[', end);
                if (start == -1) return null;
               
                end = line.IndexOf(']', ++start);
               
                if (end == -1) return null;
               
                de.pinyin = line.Substring(start, end - start);

                start = line.IndexOf('/', end);
               
                if (start == -1) return null;
                start++;
               
                end = line.LastIndexOf('/');
                if (end == -1) return null;
                if (end <= start) return null;
               
                de.english = line.Substring(start, end-start);

                return de;
            }
        };

        class Dictionary
        {
            ArrayList dict;
           
            public Dictionary()
            {
                StreamReader src = new StreamReader(
                    "cedict.b5",
                    System.Text.Encoding.GetEncoding(950));
                string s;
                DictionaryEntry de;
                dict = new ArrayList();

                while ((s = src.ReadLine()) != null)
                {
                    if (s.Length > 0 && s[0] != '#') {
                        if (null != (de = DictionaryEntry.Parse(s))) {
                            dict.Add(de);
                        }
                    }
                }
            }

            public int Length() { return dict.Count; }      
        };        
    }
}
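
To make the parsing step concrete, here is a minimal sketch that copies the logic of DictionaryEntry.Parse above into a stand-alone class with public fields, so the result can be inspected. The names Entry and Demo and the sample line are illustrative assumptions. They are not part of the benchmark, and the line is not quoted from cedict.b5; it merely follows the "trad [pinyin] /english/" shape the parser expects.

```csharp
using System;

// Hypothetical stand-alone copy of the parsing logic, for illustration only.
class Entry
{
    public string Trad;
    public string Pinyin;
    public string English;

    public static Entry Parse(string line)
    {
        Entry e = new Entry();

        // Traditional characters: everything up to the first space.
        int start = 0;
        int end = line.IndexOf(' ', start);
        if (end == -1) return null;
        e.Trad = line.Substring(start, end - start);

        // Pinyin: the text between '[' and ']'.
        start = line.IndexOf('[', end);
        if (start == -1) return null;
        end = line.IndexOf(']', ++start);
        if (end == -1) return null;
        e.Pinyin = line.Substring(start, end - start);

        // English: everything between the first '/' after the pinyin
        // and the last '/' on the line.
        start = line.IndexOf('/', end);
        if (start == -1) return null;
        start++;
        end = line.LastIndexOf('/');
        if (end <= start) return null;
        e.English = line.Substring(start, end - start);

        return e;
    }
}

class Demo
{
    static void Main()
    {
        // Sample line invented for this sketch.
        Entry e = Entry.Parse("中國 [Zhong1 guo2] /China/Middle Kingdom/");

        // e.Trad    == "中國"
        // e.Pinyin  == "Zhong1 guo2"
        // e.English == "China/Middle Kingdom"
        Console.WriteLine("{0} | {1} | {2}", e.Trad, e.Pinyin, e.English);

        // A line without the bracketed pinyin is rejected.
        Console.WriteLine(Entry.Parse("no-brackets here") == null);
    }
}
```

Note that everything between the first and last '/' lands in a single string, so multiple definitions stay joined by slashes; the benchmark never splits them.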

Published Tuesday, May 10, 2005 4:22 PM by ricom

Comments

# A better question -- what is the performance, Everett vs. Whidbey?

Rico Mariani decided to try a managed version of the dictionary I talked about earlier today. According to Rico...
Tuesday, May 10, 2005 6:58 PM by Michael Kaplan

# No Chinese Dictionaries

I want to go on the record and note that I will not be developing a Chinese/English Dictionary, in unmanaged...
Tuesday, May 10, 2005 10:37 PM by mgrier's WebLog

# Loading the dictionary, part 2: Character conversion

Converting the file as we read it is taking a lot of time.
Wednesday, May 11, 2005 8:56 AM by The Old New Thing

# Performance Quiz #6 -- Looking at the second cut

Stefang jumped into the fray with his analysis in the comments from my last posting. Thank you...
Thursday, May 12, 2005 4:43 PM by Rico Mariani's Performance Tidbits

# Optimizacao de codigo Win32 e Managed

Friday, May 13, 2005 7:05 AM by Programando .NET

# Managed versus unmanaged perf wars

Saturday, May 14, 2005 12:40 AM by `(joe (@ (version "2.0")) ,(mk-blog))

# Optimizing managed C# vs. native C++ code

Raymond Chen (aka "fixed more Windows bugs than you've had hot dinners") and Rico Mariani (aka "Mr .NET...
Friday, May 20, 2005 7:43 PM by Jonathan Hardwick

# .NET Efficiency

So I was reading through one of my favorite MSDN blogs (http://blogs.msdn.com/oldnewthing/)

And he...
Tuesday, May 31, 2005 9:26 AM by Alan's Corner

# Just because I don't write about .NET doesn't mean that I don't like it

I'm just not an expert.
Monday, July 31, 2006 10:00 AM by The Old New Thing

# Högerkonspiration » Genier i arbete: C# vs C++

Wednesday, August 02, 2006 7:11 AM by Högerkonspiration » Genier i arbete: C# vs C++

# links for 2006-11-30

• Closures and Continuations / c# .net continuations Continuations in their full glory capture more

Thursday, November 30, 2006 5:35 AM by Impersonation Failure

# mikelehen: Rico and Raymond

Thursday, December 28, 2006 8:14 PM by mikelehen: Rico and Raymond

# Rico and Raymond by mikelehen () | LjSEEK.COM

Thursday, December 28, 2006 8:14 PM by Rico and Raymond by mikelehen () | LjSEEK.COM

# Console.ReadLine() :: C++ is ugly after you've been doing C# for a while

# Performance Quiz #6 -- Looking at the third cut

The fun continues as today we look at Raymond's third improvement. Raymond starts using some pretty

Tuesday, January 23, 2007 9:30 PM by Rico Mariani's Performance Tidbits

# Performance Quiz #6 -- Looking at the second cut

Stefang jumped into the fray with his analysis in the comments from my last posting. Thank you Stefang.

Tuesday, January 23, 2007 9:31 PM by Rico Mariani's Performance Tidbits