The Random Facts of Coding (or so it seems)

Fernando Vicaria

  • Encapsulation 101

    After seeing this sort of issue come up in many projects in the form of hard-to-spot bugs, I decided to quickly write a small entry as a sort of OOP commandment.

    Thou shalt not return references to internal fields!

    If you took the trouble to hide an internal field by making it private or read-only, don't let it all go to waste by handing it to the first caller of your class's methods.

    A simple example of what I am talking about is as follows:

     

     

    using System;
    using System.Collections.Generic;
    using System.Text;

    public class MyObject
    {
        string m_Name;

        public string Name
        {
            get { return m_Name; }
            set { m_Name = value; }
        }
    }

    public class MyObjectHolder
    {
        private MyObject m;

        public MyObjectHolder()
        {
            m = new MyObject();
            m.Name = "MyObject";
        }

        public MyObject GetMyObject()
        {
            return m;
        }

        public string GetName()
        {
            return m.Name;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            MyObjectHolder h = new MyObjectHolder();
            Console.WriteLine(h.GetName());

            // The private member now is not private anymore...
            MyObject m = h.GetMyObject();
            // Any changes in m will affect the original internal field of h.
            m.Name = "SomeOtherName";

            Console.WriteLine(h.GetName());
        }
    }

     

     

    There are at least three options for this simple case. The first is to turn the internal field into a value type: a struct will do the job of holding the data here, and because value types are copied when they are returned, callers only ever get a copy.

    The second is to return a copy: create a temporary local variable inside GetMyObject and assign it the same values for every field. The easiest way to do that is to make your object implement ICloneable (with a deep copy implementation).

    The third option is to return an interface to the object with limited capabilities.

    Here is a possible implementation:

     

     

    using System;

    public interface IMyObject
    {
        string GetName();
    }

    public class MyObject : IMyObject
    {
        string m_Name;

        public string Name
        {
            get { return m_Name; }
            set { m_Name = value; }
        }

        public string GetName()
        {
            return m_Name;
        }
    }

    public class MyObjectHolder
    {
        private MyObject m;

        public MyObjectHolder()
        {
            m = new MyObject();
            m.Name = "MyObject";
        }

        public IMyObject GetMyObject()
        {
            return m;
        }

        public string GetName()
        {
            return m.Name;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            MyObjectHolder h = new MyObjectHolder();
            Console.WriteLine(h.GetName());

            // The caller now only gets a limited view of the private member...
            IMyObject m = h.GetMyObject();
            // ...so it can no longer change the original internal field:
            //m.Name = "SomeOtherName"; <-- error: IMyObject does not contain Name.

            Console.WriteLine(h.GetName());
        }
    }

     

     

    Although String is also a reference type, it is immutable, so it’s OK to return a direct reference to an internal string field. Any operation that appears to modify it creates a new string instead.

    A caveat of the interface solution is that the client of your class can still guess the real type of the interface reference returned and force a cast. But then again if a developer goes through such an effort he deserves what he gets.
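    The first two options can be sketched in the same style. This is a minimal illustration rather than part of the original example: CloneableObject, CloneHolder, NameValue and StructHolder are hypothetical names, and because the only field is an immutable string, MemberwiseClone already gives a deep copy here (a class with mutable reference fields would have to clone those too).

```csharp
using System;

// Option two (hypothetical names): hand out a copy instead of the field.
public class CloneableObject : ICloneable
{
    public string Name { get; set; }

    // A member-wise copy is enough here because the only field is an
    // immutable string; mutable reference fields would need cloning too.
    public object Clone()
    {
        return MemberwiseClone();
    }
}

public class CloneHolder
{
    private CloneableObject m = new CloneableObject { Name = "MyObject" };

    // Callers receive a copy, so they cannot reach the private field.
    public CloneableObject GetMyObject()
    {
        return (CloneableObject)m.Clone();
    }

    public string GetName()
    {
        return m.Name;
    }
}

// Option one (hypothetical names): a value type is copied on return.
public struct NameValue
{
    public string Name;
}

public class StructHolder
{
    private NameValue m = new NameValue { Name = "MyObject" };

    public NameValue GetMyObject()
    {
        return m; // returns a copy of the struct, not the field itself
    }

    public string GetName()
    {
        return m.Name;
    }
}
```

    Either way the caller ends up holding its own copy, so changing Name on what GetMyObject returned leaves the holder's field untouched.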

    That’s it. I hope that helps you to avoid a couple of bugs here and there.

  • Timing out a request to start a process...

    This is another post to quickly cover a question that I saw recently on one of our user groups...

    How do we create a process, "wait" for it to begin running for a finite amount of time, and time out the request if the process is not up and running by then? I put the word wait in quotes because what the user really meant was that the request is made on a separate thread while the application's main thread carries on with its business. Of course there are at least half a dozen ways to do that, but I decided to use the System.Threading.Timer class for this example.

    The code below illustrates this example:

    using System;
    using System.Diagnostics;
    using System.Threading;

    /// <summary>
    /// Time-out process if it's not started or responding
    /// </summary>
    public class TimeOutProc
    {
        const int TIME_OUT = 2000; // in milliseconds

        static public int Main()
        {
            // Comment out one of the next two lines...
            // A Process that was never started has no pid, so the
            // callback will report "Process not ready...".
            //Process p = new Process();

            Process p = Process.Start("notepad");

            // Use Timeout.Infinite for the period to guarantee that
            // the timer will fire only once.
            Timer timer = new Timer(
                new TimerCallback(TimeOut), p, TIME_OUT, Timeout.Infinite);

            // This is just a trick so you can see the timer firing,
            // in a real-life app you should not need this.
            Console.WriteLine("Press Enter to finish...");
            Console.WriteLine();
            Console.ReadLine();

            // Get rid of timer when you're done.
            timer.Dispose();
            return 100;
        }

        // This is the callback method passed to the timer.
        static void TimeOut(object state)
        {
            Process p = (Process)state;
            try
            {
                // Check if process has started and has a pid.
                // If not, reading Id throws an InvalidOperationException.
                if (p.Id >= 0)
                {
                    Console.WriteLine("Process has started...");

                    // Having a pid does not mean the process is healthy,
                    // so also check whether it is responding.
                    if (!p.Responding)
                    {
                        Console.WriteLine("Process not responding...");
                        p.Kill();
                    }
                    else
                    {
                        // Just for fun...
                        Console.WriteLine("Process is working OK. Kill it now!");
                        Thread.Sleep(2000);
                        p.Kill();
                    }
                }
            }
            catch (InvalidOperationException)
            {
                Console.WriteLine("Process not ready...");
            }
        }
    }

    As you can see, I have added a call to Console.ReadLine() so that we can see the callback function firing. Another point to note is that we only want the timer to fire once; to accomplish that we pass Timeout.Infinite as the period parameter. We can test whether the process has started by checking whether it has a pid; if it does not, reading Process.Id throws an InvalidOperationException, so we have to cover for this case.

    Even if the process has a pid, that does not mean it is in a healthy state; it could be not responding, for example. In that case we can simply kill the process if that is not the behavior we expect after the timeout.
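    The two checks made in the callback can also be pulled out into a small helper that is easier to test on its own. This is only a sketch: ProcessChecks and ProcessState are hypothetical names, and Responding is a Windows notion (it asks whether the process's main window is processing messages), so its result elsewhere should not be relied on.

```csharp
using System;
using System.Diagnostics;

// Hypothetical result type for the checks done in the TimeOut callback.
public enum ProcessState { NotStarted, NotResponding, Running }

public static class ProcessChecks
{
    public static ProcessState Check(Process p)
    {
        try
        {
            // Reading Id throws InvalidOperationException when no process
            // is associated with this Process object (it was never started).
            _ = p.Id;
        }
        catch (InvalidOperationException)
        {
            return ProcessState.NotStarted;
        }

        // A pid alone says nothing about health, so ask whether the
        // process's user interface is responding.
        return p.Responding ? ProcessState.Running : ProcessState.NotResponding;
    }
}
```

    With this helper, the body of TimeOut reduces to acting on the value returned by ProcessChecks.Check((Process)state).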

  • How long will it wait?

    This post touches once again on the Process class, this time on its WaitForInputIdle method.

    This method has two overloads: one that takes no parameters and another that takes an int as its single parameter. The confusion starts right here, with what these two overloads do differently. The first one waits indefinitely for the process to enter an idle state, blocking the calling thread until the process in question has become idle; the second one waits at most the given number of milliseconds.

    Inside the Process class, both overloads map to a single call to the Windows API function WaitForInputIdle, exported by User32.dll (User32.lib is its import library). Its signature is as follows:

    DWORD WaitForInputIdle(
      HANDLE hProcess,
      DWORD dwMilliseconds
    );

    where hProcess is a handle to the process (not to its main window) and dwMilliseconds is the time-out interval, in milliseconds, we want to wait for.

    When calling the parameterless version of the managed method, what we are really doing is passing Int32.MaxValue to the other one. The value of this constant is 2,147,483,647, that is, hexadecimal 0x7FFFFFFF milliseconds (roughly 24.8 days), which for all practical purposes means wait indefinitely. Back in the unmanaged world there is a more explicit way to tell the API to wait forever: we simply use INFINITE, which the Windows headers define as 0xFFFFFFFF (or -1 if you prefer).

    Another special value that we can use when calling the WaitForInputIdle(int) method of the Process class is 0 (zero). This makes the method return immediately: true if the process is already idle, false otherwise.

    To summarize, when calling Process.WaitForInputIdle(int) we have:

    • -1 = wait indefinitely (any other negative number is reinterpreted as an unsigned DWORD, i.e. a very long but finite wait)
    •  0 = return immediately
    • 1 to Int32.MaxValue = wait for the specified amount of time (in milliseconds)

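    The mapping in this list comes down to how a managed int is reinterpreted as the unsigned DWORD the native API expects. The sketch below makes that arithmetic explicit and shows a typical call; InputIdleDemo and its members are hypothetical names, INFINITE mirrors the value from the Windows headers, and WaitForInputIdle only works for processes with a graphical interface (for others it throws an InvalidOperationException).

```csharp
using System;
using System.Diagnostics;

public static class InputIdleDemo
{
    // INFINITE as defined by the Windows headers.
    public const uint INFINITE = 0xFFFFFFFF;

    // How an int passed to Process.WaitForInputIdle(int) looks once it
    // reaches the native dwMilliseconds parameter (an unsigned DWORD).
    public static uint AsDword(int milliseconds)
    {
        return unchecked((uint)milliseconds);
    }

    // Windows-only usage sketch: start a GUI application and wait at most
    // 'milliseconds' for it to become idle (ready for user input).
    public static bool StartAndWaitForIdle(string fileName, int milliseconds)
    {
        // Disposing the Process object only releases its handle;
        // it does not terminate the started application.
        using (Process p = Process.Start(fileName))
        {
            if (p == null)
                return false; // no new process was started

            // true if the process became idle in time, false on time-out.
            return p.WaitForInputIdle(milliseconds);
        }
    }
}
```

    For example, AsDword(-1) equals INFINITE, AsDword(-2) is 0xFFFFFFFE (about 49.7 days), and Int32.MaxValue milliseconds is about 24.8 days, which is why the parameterless overload behaves like an infinite wait in practice.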
  • Pseudo-processes...

    The .NET Framework Process class lets you access various aspects (or properties) of a system process. Among these properties are things like the process id (or pid), the process name and the modules (.dll or .exe) it loads.

    This blog entry, my first as a member of the BCL Team, will briefly talk about some special processes and how they are viewed by the Process class.

    How many times have you tried to delete or overwrite a file only to find out that you couldn't because some other process had it loaded? What you usually get is a message such as "The process cannot access the file because it is being used by another process" or similar. A quick way to identify who is using a particular exe or dll is to enumerate the modules loaded by each running process on your machine. The code below shows how to do just that.

    using System;
    using System.Diagnostics;

    /// <summary>
    /// Simple tool to find out which processes have loaded a particular module.
    /// </summary>
    public class LMod
    {
        // "System Idle Process" pid
        static int IdleProcessID = 0;

        // "System" pid
        static int SystemProcessID
        {
            get
            {
                // Is older than XP...
                if (Environment.OSVersion.Version.Major < 5 ||
                    (Environment.OSVersion.Version.Major == 5 &&
                         Environment.OSVersion.Version.Minor == 0))
                    return 8;
                else
                    return 4;
            }
        }

        public static int Main(string[] args)
        {
            int total = 0;
            string m_ModuleName = "";

            if (args.Length == 1)
                m_ModuleName = args[0];
            else
            {
                // wrong number of parameters...
                Console.WriteLine("Usage: LMod module_name");
                return 1;
            }

            // Get all running processes on the machine...
            Process[] m_arrSysProcesses = Process.GetProcesses();
            for (int i = 0; i < m_arrSysProcesses.Length; i++)
            {
                try
                {
                    ProcessModuleCollection modules = m_arrSysProcesses[i].Modules;
                    int nCount = modules.Count;

                    if (nCount > 0)
                    {
                        for (int j = 0; j < nCount; j++)
                        {
                            // Is it the module we are looking for?
                            // (Module file names are case-insensitive on Windows.)
                            if (string.Equals(modules[j].ModuleName, m_ModuleName,
                                    StringComparison.OrdinalIgnoreCase))
                            {
                                Console.WriteLine("================================");
                                Console.WriteLine("Process Name: "
                                     + m_arrSysProcesses[i].ProcessName);
                                Console.WriteLine("Process ID : "
                                     + m_arrSysProcesses[i].Id);
                                Console.WriteLine("Priority : "
                                     + m_arrSysProcesses[i].BasePriority);
                                Console.WriteLine("Memory Usage: "
                                     + (m_arrSysProcesses[i].WorkingSet64 / 1024) + " Kb");
                                Console.WriteLine();

                                total++;
                                break;
                            }
                        }
                    }
                }
                catch (Exception e)
                {
                    // System Idle Process (Idle): pseudo-process that represents
                    // all the processor time not used by other processes.
                    // System (System): represents the processor time
                    // used by the kernel itself.
                    if (m_arrSysProcesses[i].Id != SystemProcessID
                         && m_arrSysProcesses[i].Id != IdleProcessID)
                    {
                        Console.WriteLine("Error: Process "
                             + m_arrSysProcesses[i].ProcessName
                             + " (" + m_arrSysProcesses[i].Id + ") failed!");
                        Console.WriteLine(e);
                        return 2;
                    }
                }
            }

            Console.WriteLine();
            Console.WriteLine("There are " + total
                 + " processes using module " + m_ModuleName);

            return 100;
        }
    }

    It's up to you what to do when you get the list of processes back.

    Here is sample output from this program on my dev machine when passed, for example, the mscoree.dll module:

    C:\Temp\Process>Lmod.exe mscoree.dll

    ================================
    Process Name: iexplore
    Process ID : 2496
    Priority : 8
    Memory Usage: 46036 Kb
    ================================
    Process Name: explorer
    Process ID : 2628
    Priority : 8
    Memory Usage: 26408 Kb
    ================================
    Process Name: LMod
    Process ID : 3428
    Priority : 8
    Memory Usage: 12140 Kb
    ================================
    Process Name: aspnet_wp
    Process ID : 1456
    Priority : 8
    Memory Usage: 58792 Kb

    There are 4 processes using module mscoree.dll

    The main point of this exercise is that we had to identify two special processes while iterating through the processes currently running on the machine: the System Idle Process and the System process. They are not real processes in the true sense of the word, only pseudo-processes.

    • System Idle Process - This "process" is really a counter, displayed in the Windows Task Manager (taskmgr), that measures how much idle time the CPU has at any particular moment. It shows what percentage of CPU resources is 'idle' and available for use.
    • System - This "process", as with the system idle process, accounts for the time used by the kernel itself.

    A simple and quick way to identify these processes is by their pid. As you can see in the code above, the Idle process always has a pid of 0. The pid of the System process, on the other hand, varies with the OS you are using: on a machine running an OS older than Windows XP the value is 8, while on XP and newer it is 4.
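    The pid test used in LMod can be factored into a small helper that takes the OS version as a parameter, which makes the pre-XP/XP rule easy to check without running on each Windows version. PseudoProcess is a hypothetical name, and the pid values are the ones stated above rather than anything queried from the system.

```csharp
using System;

public static class PseudoProcess
{
    // "System Idle Process" always has pid 0.
    public const int IdleProcessId = 0;

    // "System" has pid 8 before Windows XP (NT 5.0 and older) and 4 from
    // XP (NT 5.1) onwards, following the rule used in LMod.
    public static int SystemProcessId(Version osVersion)
    {
        bool olderThanXp = osVersion.Major < 5 ||
                           (osVersion.Major == 5 && osVersion.Minor == 0);
        return olderThanXp ? 8 : 4;
    }

    // True if the pid belongs to one of the two pseudo-processes.
    public static bool IsPseudo(int pid, Version osVersion)
    {
        return pid == IdleProcessId || pid == SystemProcessId(osVersion);
    }
}
```

    In LMod, calling PseudoProcess.IsPseudo(p.Id, Environment.OSVersion.Version) before touching p.Modules would let the loop skip these processes up front instead of relying on the exception.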

    Prior to Whidbey (.NET Framework 2.0) Beta 2, Process.Modules would return an empty ModuleInfo array for these processes. Now, if you query either of these two special processes for its loaded modules, you will get a System.ComponentModel.Win32Exception. We concluded that this behavior is more meaningful to users of the class.

    Another point worth mentioning is that in the next version of Windows, or should I say Vista, the System Idle Process will no longer be shown in the Task Manager.

  • What is a Random Number?

    Just so we can set the tone of what I have planned for this blog I decided to start with a very quick definition and then let you think while I prepare the next one...

     

    A random number cannot be predicted in advance. Thus, we can only define what a random number is not, not what it is. Random numbers can be produced by physical processes, such as rolling dice, flipping a coin or counting the intervals between radioactive decay events. By itself, software can't generate truly random numbers; instead, it creates what are called pseudo-random numbers, starting from a single random seed.

     

    To generate a random sequence, we start with a certain seed and then iteratively apply some mathematical transformation to it (see Listing 1 below), progressively extracting a random sequence. A sequence is considered random when no prediction can be made about it and no simple description of it can be found. But because we generate the sequence with a definite transformation, a description of it always exists.

     

     

    ulong Seed = a;

    ulong _Random()
    {
        Seed = Seed * b + c;
        return Seed >> d;
    }

     

    Listing 1: A simple implementation for the LCRNG algorithm.

     

    The values of a, b, c and d are fixed constants.

    The formula is simple and deterministic and will always result in the same sequence of numbers as long as you don’t modify the values of the constants.

     

    The shift in the second line of the function body is optional; it only narrows the result to a fixed size by discarding a specified number of low-order bits. For instance, shifting a 64-bit unsigned result right by 56 bits forces the values returned to always be between 0 and 255 (or you could use other types and shift amounts for different sizes).

     

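    This type of algorithm is known as a Linear Congruential Random Number Generator, or LCRNG for short. It uses three or more fixed constants that must satisfy certain mathematical properties. For example, for a generator working modulo 2^64 (as unsigned 64-bit arithmetic does) to cycle through all 2^64 values, the increment c must be odd and the multiplier b minus 1 must be divisible by 4.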
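    A runnable version of Listing 1 needs concrete constants. The sketch below assumes the multiplier and increment from Knuth's MMIX generator (6364136223846793005 and 1442695040888963407), which meet those full-period conditions modulo 2^64, and keeps the top 8 bits as described above. Lcg and NextByte are hypothetical names, and this generator is for illustration only; it is not suitable for security purposes.

```csharp
using System;

// Listing 1 with concrete (assumed) constants:
// state = state * A + C (mod 2^64), output = state >> Shift.
public class Lcg
{
    // Multiplier and increment from Knuth's MMIX generator; A - 1 is
    // divisible by 4 and C is odd, so the period is the full 2^64.
    private const ulong A = 6364136223846793005UL;
    private const ulong C = 1442695040888963407UL;

    // Keep only the top 8 bits, so every value is in 0..255.
    private const int Shift = 56;

    private ulong state;

    public Lcg(ulong seed)
    {
        state = seed;
    }

    public int NextByte()
    {
        // unchecked(...) makes the wrap-around modulo 2^64 explicit, even
        // if the project is compiled with overflow checking turned on.
        state = unchecked(state * A + C);
        return (int)(state >> Shift);
    }
}
```

    Two instances created with the same seed return exactly the same sequence, which is the determinism described above.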

     

    Bad Seeds

     

    In general, a bad seed is one that can be guessed after a relatively small number of tries or in a short interval of time. If a hacker (or an adversary, to use the cryptography term) can predict the value of the constants used in your random algorithm, for instance the values of a, b, c and d from Listing 1, the seed can be computed and the whole scheme is compromised.

     

    A typical example of this is some early implementations of SSL key generation in web browsers, which derived the keys used to encrypt messages from a sequence of pseudo-random numbers.

    Some of those seed generators used the time of day as the source of the values fed into their algorithm.

    The basic algorithm can usually be reverse-engineered with a disassembler, and the time of day can often be known to better than a second, which leaves only one million possible choices for the microsecond part (and often far fewer, because the clocks on most computer systems do not have true microsecond resolution). See Table 1 for the resolution of some Windows API functions.

     

    Function                   Unit             Resolution
    Now, Time, Timer           seconds          1 second
    GetTickCount               milliseconds     10 ms
    timeGetTime                milliseconds     10 ms
    QueryPerformanceCounter    counter ticks    < 1 ms

    Table 1: Timer resolution of some Windows time functions.

     

    The values shown in Table 1 do not take into account the overhead of calling those functions. For example, it takes 5 to 10 µs (microseconds) to call QueryPerformanceCounter, because it has to execute several port I/O instructions to read the computer’s clock counter.

     

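    Note: If you decide to use QueryPerformanceCounter, remember that it doesn't run at the same rate on all machines; call QueryPerformanceFrequency to find out how many counts per second it advances on the current machine.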
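    In .NET, System.Diagnostics.Stopwatch wraps the high-resolution counter (QueryPerformanceCounter on Windows) when Stopwatch.IsHighResolution is true, and Stopwatch.Frequency reports its counts per second, so converting raw counts to time always goes through that frequency. The sketch below, using the hypothetical class name TimerResolution, does that conversion and roughly estimates the per-call overhead on the current machine; the value it returns depends on the hardware and need not match the figure quoted above.

```csharp
using System;
using System.Diagnostics;

public static class TimerResolution
{
    // Converts a number of counter ticks to microseconds using the
    // counter's frequency in ticks per second.
    public static double CountsToMicroseconds(long counts, long frequency)
    {
        return counts * 1000000.0 / frequency;
    }

    // Rough average cost of one Stopwatch.GetTimestamp() call, in microseconds.
    public static double MeasureCallOverheadMicroseconds(int iterations)
    {
        long start = Stopwatch.GetTimestamp();
        for (int i = 0; i < iterations; i++)
        {
            Stopwatch.GetTimestamp(); // the call being timed
        }
        long end = Stopwatch.GetTimestamp();
        return CountsToMicroseconds(end - start, Stopwatch.Frequency) / iterations;
    }
}
```

    Checking Stopwatch.IsHighResolution on a given machine tells you whether the high-resolution counter is actually in use.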

     

    Another good example of a flawed seed generator is one that uses the process ID (PID) of an application. You should never consider PIDs secret; they usually appear in message packets from the system.

     

    You can see that generating a good random number is very hard. As Donald Knuth put it, “Random numbers should not be generated with a method chosen at random.” A good random process is not just a complicated or obscure procedure. Another famous rule, Kerckhoffs's principle, says that “compromise of the system should not inconvenience the correspondents”: the system should remain secure even if all the details about its inner workings, down to the actual source code, are known to all.

     

    Entropy = Randomness

     

    In general, the requirements for a good sequence of random numbers are a bit more relaxed in mathematical or scientific applications, since we only need to fool our simulation or modeling programs into behaving as if true random numbers had been used. That approach is not possible (or at least not wise) when writing security applications: to fool a strong-willed hacker or the NSA you need to do a lot better than simply shuffling your numbers.

     

    Pseudo-random numbers, as the name implies, are not really random; they just look random. As we saw earlier, they are pseudo (or knowable) because they come from a mathematical function.

     

    To increase the randomness of the numbers we generate, we should collect as much entropy as possible before seeding the generator. Anything that is determined by external factors can and should be used as input, such as the time between keystrokes, the timing of disk interrupts, the number of network packets that have arrived, and the like. On a multi-user system you can also use the number of page faults, the number of disk reads and writes, the time spent waiting for the next time slice, and other information that depends on the overall activity in a way that is hard to predict or precisely control.

     

    Typically, a cryptographically sound “message digest”, such as MD5 or SHA (the Secure Hash Algorithm), is computed over an entropy pool and used as your next random number or seed.
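    As a sketch of that idea in C#: collect a few hard-to-predict readings into a pool and hash it. SHA-256 is used here instead of MD5, which is no longer considered cryptographically sound. EntropyPool is a hypothetical name, and the sources it samples (timer readings and GC memory usage) are illustrative assumptions that carry only a handful of bits each; for real key material on .NET, System.Security.Cryptography.RandomNumberGenerator is the practical choice.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

public static class EntropyPool
{
    // Hashes the whole pool down to a fixed-size digest (32 bytes).
    public static byte[] Digest(byte[] pool)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(pool);
        }
    }

    // Appends 'samples' rounds of illustrative readings to the pool:
    // 8 + 4 + 8 = 20 bytes per round.
    public static byte[] CollectSamples(int samples)
    {
        using (MemoryStream pool = new MemoryStream())
        using (BinaryWriter w = new BinaryWriter(pool))
        {
            for (int i = 0; i < samples; i++)
            {
                w.Write(Stopwatch.GetTimestamp());    // high-resolution timer
                w.Write(Environment.TickCount);       // coarse millisecond timer
                w.Write(GC.GetTotalMemory(false));    // current managed heap size
            }
            w.Flush();
            return pool.ToArray();
        }
    }

    // Uses the first 8 bytes of the digest as a 64-bit seed.
    public static ulong NextSeed()
    {
        byte[] digest = Digest(CollectSamples(64));
        return BitConverter.ToUInt64(digest, 0);
    }
}
```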

     

    It’s a good idea to have a tenth to a fifth as much entropy as the length of the key. For example, if you are generating a 1024-bit key, it’s a good idea to have between 100 and 200 bits of entropy. More is of course always better but, as you would expect, also slower: sources of entropy usually carry a significant overhead.

     

    Note: By one bit of entropy I mean the equivalent of a truly random event with two, and only two, equally likely outcomes (such as a coin flip).
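    Using that definition, a source with N equally likely outcomes provides log2 N bits of entropy. Applying it to the time-of-day seeds discussed earlier shows how little they are worth next to the 100 to 200 bits suggested above (assuming the second is already known and every sub-second value is equally likely):

```latex
\begin{aligned}
H &= \log_2 N \ \text{bits} \\
\text{microsecond part: } H &= \log_2 10^6 = 6\log_2 10 \approx 19.93 \ \text{bits} \\
\text{10 ms clock (Table 1): } H &= \log_2 100 = 2\log_2 10 \approx 6.64 \ \text{bits}
\end{aligned}
```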

     

    There are many ways to get true random numbers. Some methods include making hardware devices that generate noise, observing cosmic ray flux, and observing light emissions from trapped mercury atoms. They're great in theory and in some sorts of practice, and there are some very high-quality random generator chips out there but most of the time we have to work with existing equipment.

     

    Here is a list of ways to get entropic numbers:

     

    • User input: A few examples are time between keystrokes, mouse movements etc. Users are among the most entropic (meaning unpredictable) things there are. The obvious drawbacks are that it takes time to get entropic responses from the user, and if you have no user (in the case of a server), this doesn't help. Nonetheless, if you do have one, you should use user inputs, as these are the hardest for the adversary to acquire or spoof.
    • Hardware: If you have no user, you have to use your hardware for a stream of entropy. This is annoying, but not impossible. Some methods you might use are: clock, hard disk, video etc. Most of these devices will give you a source of nearly true random data if you know how/where to get them.
    • Network: There are all sorts of unpredictable things on a network. The downside of these sources is that they are as accessible to other people as they are to you.


    Microsoft Cryptography API

     

    Starting with Pentium III processors, Intel added support for a Security Driver that gives software applications access to the Firmware Hub's hardware Random Number Generator. This RNG is based on sampling the thermal noise in resistors and uses SHA-1 as a mixing function for its output. It also runs some FIPS 140-1 self-tests.

     

    A number of Microsoft operating systems are currently supported by Intel, and you can take advantage of this feature through Microsoft's Cryptography API (CryptoAPI).

     

    To start using this RNG in your programs, first download the driver from Intel and then follow the directions in the CryptoAPI documentation in the Platform SDK.

     

    Some people consider this the ultimate solution. Others are afraid because all the entropy comes from one source (albeit a good one).

     

    Note: The SDK contains some good examples of how to use the CryptoAPI.
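    From managed code, the platform generator is reachable through System.Security.Cryptography: on the .NET Framework, RandomNumberGenerator.Create() returns an RNGCryptoServiceProvider, which calls CryptoAPI's CryptGenRandom. The sketch below (CryptoRandom is a hypothetical name) fills a buffer with random bytes; whether the Intel hardware RNG feeds those bytes depends on the installed driver and provider, which this code does not check.

```csharp
using System;
using System.Security.Cryptography;

public static class CryptoRandom
{
    // Returns 'count' cryptographically strong random bytes from the
    // platform generator.
    public static byte[] GetBytes(int count)
    {
        byte[] buffer = new byte[count];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(buffer);
        }
        return buffer;
    }
}
```

    For example, CryptoRandom.GetBytes(1024 / 8) returns the 128 random bytes of a 1024-bit key.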

     

    I will stop now and leave the testing of randomness for the next post, where we will look into what makes a sequence of numbers a good or a bad approximation of a "random" sequence.

     



© 2008 Microsoft Corporation. All rights reserved.