• The Old New Thing

    Customers not getting the widgets they paid for if they click too fast -or- In C#, the += operator is not merely not guaranteed to be atomic, it is guaranteed not to be atomic

    • 40 Comments
    In the C# language, operation/assignment such as += are explicitly not atomic. But you already knew this, at least for properties.

    Recall that properties are syntactic sugar for method calls. A property declaration

    string Name { get { ... } set { ... } }
    

    is internally converted to the equivalent of

    string get_Name() { ... }
    void set_Name(string value) { ... }
    

    Accessing a property is similarly transformed.

    // o.Name = "fred";
    o.put_Name("fred");
    
    // x = o.Name;
    x = o.get_Name();
    

    Note that the only operations you can provide for properties are get and set. There is no way of customizing any other operations, like +=. Therefore, if you write

    o.Name += ", Jr.";
    

    the compiler has no choice but to convert it to

    o.put_Name(o.get_Name() + ", Jr.");
    

    If all you have is a hammer, everything needs to be converted to a nail.

    Since the read and write are explicitly decoupled, there is naturally a race condition here. The underlying property may change value in between the time you read the old value and the time you write the new value.

    But there are extra subtleties here. Let's dig in.

    The rule for operators like += are spelled out in Section 14.3.2: Compound Assignment:

    [T]he operation is evaluated as x = x op y, except that x is evaluated only once.

    (There is some discussion of what "evaluated only once" means, but that's not important here.)

    The subtleties lurking in that one sentence are revealed when you see how that sentence interacts with other rules in the language.

    Now, you might say, "Sure, it's not atomic, but my program is single-threaded, so this should never affect me."

    Actually, you can get bitten by this even in single-threaded programs. Let's try it:

    class Program
    {
     static int x = 0;
    
     static int f()
     {
      x = x + 10;
      return 1;
     }
    
     public static void Main()
     {
      x += f();
      System.Console.WriteLine(x);
     }
    }
    

    What does this program print?

    You might naïvely think that it prints 11 because x is incremented by 1 by Main and incremented by 10 in f.

    But it actually prints 1.

    What happened here?

    Recall that C# uses strict left-to-right evaluation order. Therefore, the order of operations in the evaluation of x += f() is

    1. Rewrite as x = x + f().
    2. Evaluate both sides of the = operator, left to right.
      1. Left hand side of assignment: Find the variable x.
      2. Right hand side of assignment:
        1. Evaluate both sides of the + operator, left to right.
          1. Evaluate x.
          2. Evaluate f().
        2. Add together the results of steps 2b(i)1 and 2b(i)2.
    3. Take the result of step 2b(ii) and assign it to the variable x found in step 2a.

    The thing to notice is that a lot of things can happen between step 2b(i)1 (evaluating the old value of x), and step 3 (assigning the final result to x). Specifically, we shoved a whole function call in there: f().

    In our case, the function f() also modifies x. That modification takes place after we already captured the value of x in step 2b(i)1. When we get around to adding the values in step 2b(ii), we don't realize that the values are out of date.

    Let's step through this evaluation in our example.

    1. Rewrite as x = x + f().
    2. Evaluate both sides of the = operator, left to right.
      1. Left hand side of assignment: Find the variable x.
      2. Right hand side of assignment:
        1. Evaluate both sides of the + operator, left to right.
          1. Evaluate x. The result is 0.
          2. Evaluate f(). The result is 1. It also happens that x is modified as a side-effect.
        2. Add together the results of steps 2b(i)1 and 2b(i)2. In this case, 0 + 1 = 1.
    3. Take the result of step 2b and assign it to the variable x found in step 2a. In this case, assign 1 to x.

    The modification to x that took place in f was clobbered by the assignment operation that completed the += sequence. And this behavior is not just in some weird "undefined behavior" corner of the language specification. The language specification explicitly requires this behavior.

    Now, you might say, "Okay, I see your point, but this is clearly an unrealistic example, because nobody would write code this weird."

    Maybe you don't intentionally write code this weird, but you can do it accidentally. And this is particularly true if you are using the new await keyword, because an await means, "Hey, like, put my function on hold and do other stuff for a while. When the thing I'm awaiting is ready, then resume execution of my function." And that "do other stuff for a while" might change x.

    Suppose that you have a button in your application called Buy More. When the user clicks it, they can buy more widgets. Let's assume that the Buy­More­Async function return the number of items bought. (If the user cancels the purchase it returns zero.)

    // Suppose the user starts with 100 widgets.
    
    async void BuyMoreButton_OnClick()
    {
     TotalWidgets += await BuyMoreAsync();
    
     Inventory.Text = string.Format("You have {0} widgets.",
                                    TotalWidgets);
    }
    
    async Task<int> BuyMoreAsync()
    {
     int quantity = QuickPurchase.IsChecked ? 1
                                            : await GetQuantityAsync();
     if (quantity != 0) {
      if (await DebitAccountAsync(quantity * PricePerWidget)) {
       return quantity;
      }
     }
     return 0; // user bought no items
    }
    

    You receive a bug report that you track back to the fact that Total­Widgets does not match the number of widgets purchased. It affects only people who checked the quick purchase box, and only people purchasing from overseas.

    Here's what is going on.

    The user clicks the Buy More button, and they have Quick Purchase enabled. The Buy­More­Async function tries to debit the account for the price of one widget.

    While waiting for the server to process the transaction, the user gets impatient and clicks Buy More a second time. This triggers a second task to debit the account for the price of one widget.

    Okay, so you now have two tasks running, each processing one of the clicks. In theory, the worst case is that the user accidentally buys two widgets, but in practice...

    The first Debit­Account­Async task completes, and Buy­More­Async returns 1, which is then added to the value of Total­Widgets at the time the button was clicked, as we discussed above. At the time the button was clicked the first time, the number of widgets was 100, so the total number of widgets is now 101.

    The second Debit­Account­Async task completes, and Buy­More­Async returns 1, which is then added to the value of Total­Widgets at the time the button was clicked, as we discussed above. When the button was clicked the second time, the number of widgets was still 100. We set the total widget count to 100 + 1 = 101.

    Result: The user paid for two widgets but got only one.

    The fix for this is to explicitly move waiting for the purchase to complete outside of the compound assignment.

     int quantity = await BuyMoreAsync();
     TotalWidgets += quantity;
    

    Now, the await is outside the compound assignment so that the value of Total­Widgets is not captured prematurely. When the purchase completes, we update Total­Widgets without interruption from any async operations.

    (You probably also should fix the program so it disables the Buy More button while a transaction is in progress, to avoid the impatient user ends up making an accidental double purchase problem. The above fix merely gets rid of the user pays for two items and gets only one problem.)

    Like closing around the loop control variable, this is the sort of subtle change that should be well-commented so that somebody doesn't "fix" it in a well-intentioned but misguided attempt to remove unnecessary variables. The purpose of the variable is not to break an expression into two but rather to force a particular order of evaluation: You want to to finish the purchase operation before starting to update the widget count.

  • The Old New Thing

    Keep your eye on the code page: C# edition (the mysterious third code page)

    • 12 Comments

    A customer was having trouble manipulating the console from a C# program:

    We found that C# can read only ASCII data from the console. If we try to read non-ASCII data, we get garbage.

    using System;
    using System.Text;
    using System.Runtime.InteropServices;
    
    class Program
    {
      [StructLayout(LayoutKind.Sequential)]
      struct COORD
      {
        public short X;
        public short Y;
      }
    
      [DllImport("kernel32.dll", SetLastError=true)]
      static extern IntPtr GetStdHandle(int nStdHandle);
    
      const int STD_OUTPUT_HANDLE = -11;
    
      [DllImport("kernel32.dll", SetLastError=true)]
      static extern bool ReadConsoleOutputCharacter(
        IntPtr hConsoleOutput,
        [Out] StringBuilder lpCharacter,
        uint nLength,
        COORD dwReadCoord,
        out uint lpNumberOfCharsRead);
    
      public static void Main()
      {
        // Write a string to a fixed position
        System.Console.Clear();
        System.Console.WriteLine("\u00C5ngstr\u00f6m");
    
        // Read it back
        COORD coord  = new COORD { X = 0, Y = 0 };
        StringBuilder sb = new StringBuilder(8);
        uint nRead = 0;
        ReadConsoleOutputCharacter(GetStdHandle(STD_OUTPUT_HANDLE),
                                   sb, (uint)sb.Capacity, coord, out nRead);
        // Trim off any unused excess.
        sb.Remove((int)nRead, sb.Length - (int)nRead);
    
        // Show what we read
        System.Console.WriteLine(sb);
      }
    }
    

    Observe that this program is unable to read the Å and ö characters. They come back as garbage.

    Although there are three code pages that have special treatment in Windows, the CLR gives access to only two of them via Dll­Import.

    • CharSet.Ansi = CP_ACP
    • CharSet.Unicode = Unicode (which in Windows means UTF16-LE unless otherwise indicated).

    Unfortunately, the console traditionally uses the OEM code page.

    Since the Dll­Import did not specify a character set, the CLR defaults (unfortunately) to Char­Set.Ansi. Result: The Read­Console­Output­Character function stores its results in CP_OEM, the CLR treats the buffer as if it were CP_ACP, and the result is confusion.

    The narrow-minded fix is to try to fix the mojibake by taking the misconverted Unicode string, converting it to bytes via the ANSI code page, then converting the bytes to Unicode via the OEM code page.

    The better fix is simply to avoid the 8-bit code page issues entirely and say you want to use Unicode.

      [DllImport("kernel32.dll", SetLastError=true, CharSet=CharSet.Unicode)]
      static extern bool ReadConsoleOutputCharacter(...);
    
  • The Old New Thing

    Keep your eye on the code page: C# edition (warning about DllImport)

    • 19 Comments

    Often, we receive problem reports from customers who failed to keep their eye on the code page.

    Does the SH­Get­File­Info function support files with non-ASCII characters in their names? We find that the function either fails outright or returns question marks when asked to provide information for files with non-ASCII characters in their name.

    using System;
    using System.Runtime.InteropServices;
    
    class Program
    {
     static void Main(string[] args)
     {
      string fileName = "BgṍRồ.txt";
      Console.WriteLine("File exists? {0}", System.IO.File.Exists(fileName));
      // assumes extensions are hidden
    
      string expected = "BgṍRồ";
      Test(fileName, SHGFI_DISPLAYNAME, expected);
      Test(fileName, SHGFI_DISPLAYNAME | SHGFI_USEFILEATTRIBUTES, expected);
     }
    
     static void Test(string fileName, uint flags, string expected)
     {
      var actual = GetNameViaSHGFI(fileName, flags);
      Console.WriteLine("{0} == {1} ? {2}", actual, expected, actual == expected);
     }
    
     static string GetNameViaSHGFI(string fileName, uint flags)
     {
      SHFILEINFO sfi = new SHFILEINFO();
      if (SHGetFileInfo(fileName, 0, ref sfi, Marshal.SizeOf(sfi),
                        flags) != IntPtr.Zero) {
       return sfi.szDisplayName;
      } else {
       return null;
      }
     }
    
     [StructLayout(LayoutKind.Sequential)]
     struct SHFILEINFO {
      public IntPtr hIcon;
      public int iIcon;
      public uint dwAttributes;
      [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
      public string szDisplayName;
      [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 80)]
      public string szTypeName;
     }
    
     const uint SHGFI_USEFILEATTRIBUTES = 0x10;
     const uint SHGFI_DISPLAYNAME = 0x0200;
    
     [DllImport("shell32.dll")]
     static extern IntPtr SHGetFileInfo(
        string path, uint fileAttributes, ref SHFILEINFO info, int cbSize,
        uint flags);
    }
    // Output:
    // File exists? True
    //  == Bg?R? ? False
    // Bg?R? == Bg?R? ? False
    

    If we ask for the display name, the function fails even though the file does exist. If we also pass the SHGFI_USE­FILE­ATTRIBUTES flag to force the system to act as if the file existed, then it returns the file name but with question marks where the non-ASCII characters should be.

    The SH­Get­File­Info function supports non-ASCII characters just fine, provided you call the version that supports non-ASCII characters!

    The customer here fell into the trap of not keeping their eye on the code page. It goes back to an unfortunate choice of defaults in the System.Runtime.Interop­Services namespace: At the time the CLR was originally being developed, Windows operating systems derived from Windows 95 were still in common use, so the CLR folks decided to default to Char­Set.Ansi. This made sense back in the day, since it meant that your program ran the same on Windows 98 as it did in Windows NT. In the passage of time, the Windows 95 series of operating systems became obsolete, so the need to be compatible with it gradually disappeared. But too late. The rules were already set, and the default of Char­Set.Ansi could not be changed.

    The solution is to specify Char­Set.Unicode explicitly in the Struct­Layout and Dll­Import attributes.

    FxCop catches this error, flagging it as Specify­Marshaling­For­PInvoke­String­Arguments. The error explanation talks about the security risks of unmapped characters, which is all well and good, but it is looking too much at the specific issue and not so much at the big picture. As a result, people may ignore the issue because it is flagged as a complicated security issue, and they will think, "Eh, this is just my unit test, I'm not concerned about security here." However, the big picture is

    This is almost certainly an oversight on your part. You didn't really mean to disable Unicode support here.

    Change the lines

     [StructLayout(LayoutKind.Sequential)]
     [DllImport("shell32.dll")]
    

    to

     [StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
     [DllImport("shell32.dll", CharSet=CharSet.Unicode)]
    

    and re-run the program. This time, it prints

    File exists? True
    Bg?R? == Bg?R? ? True
    Bg?R? == Bg?R? ? True
    

    Note that you have to do the string comparison in the program because the console itself has a troubled history with Unicode. At this point, I will simply cue a Michael Kaplan rant and link to an article explaining how to ask nicely.

  • The Old New Thing

    Revival of the Daleks: Act One, Scene One

    • 2 Comments

    In 2009, a group of volunteers on a routine cleanup of a pond in Hampshire, England discovered a Dalek.

    (Later in the episode, the story may introduce a scientist who is thawing out a 30,000-year-old-virus.)

  • The Old New Thing

    If you want to set a thread's apartment model via Thread.CurrentThread.ApartmentState, you need to act quickly

    • 18 Comments

    Welcome to CLR Week 2014. Don't worry, it'll be over in a few days.

    A customer wanted to know why their Folder­Browser­Dialog was displaying the infamous Current thread must be set to single thread apartment (STA) mode before OLE calls can be made error.

    private void btnBrowseFolder_Click(object sender, System.EventArgs e)
    {
      Thread.CurrentThread.ApartmentState = ApartmentState.STA;
      FolderBrowserDialog fbd = new FolderBrowserDialog {
        RootFolder = System.Environment.SpecialFolder.MyComputer,
        ShowNewFolderButton = true,
        Description = "Select the awesome folder..."
      };
      DialogResult dr = fbd.ShowDialog();
      ...
    }
    

    "Even though we set the Apartment­State to STA, the apartment state is still MTA. Curiously, if we put the above code in a standalone test program, it works fine."

    The problem is that the customer is changing the apartment state too late.

    On the first call to unmanaged code, the runtime calls Co­Initialize­Ex to initialize the COM apartment as either an MTA or an STA apartment. You can control the type of apartment created by setting the System.Threading.ApartmentState property on the thread to MTA, STA, or Unknown.

    Notice that the value you specify in Current­Thread.Apartment­State is consulted at the point the runtime initializes the COM apartment (which occurs on the first call to unmanaged code). If you change it after the COM apartment has been initialized, you're revising the blueprints of a house after it has been built.

    The standard way to avoid this problem is to attach the [STAThread] attribute to your Main function, or if you need to set the apartment model of a thread you created yourself, call the Thread.Set­Apartment­State method before the thread starts.

  • The Old New Thing

    The case of the orphaned LpdrLoaderLock critical section

    • 11 Comments

    A customer found that under heavy load, their application would occasionally hang. Debugging the hang shows that everybody was waiting for the Lpdr­Loader­Lock critical section. The catch was that the critical section was reported as locked, but the owner thread did not exist.

    0:000> !critsec ntdll!LdrpLoaderLock
    
    CritSec ntdll!LdrpLoaderLock+0 at 7758c0a0
    WaiterWoken        No
    LockCount          4
    RecursionCount     2
    OwningThread       150c
    EntryCount         0
    ContentionCount    4
    *** Locked
    
    0:000> ~~[150c]k
                  ^ Illegal thread error in '~~[150c]k'
    

    How can a critical section be owned by thread that no longer exists?

    There are two ways this can happen. One is that there is a bug in the code that manages the critical section such that there is some code path that takes the critical section but forgets to release it. This is unlikely to be the case for the loader lock (since a lot of really smart people are in charge of the loader lock), but it's a theoretical possibility. We'll keep that one in our pocket.

    Another possibility is that the code to exit the critical section never got a chance to run. For example, somebody may have thrown an exception across the stack frames which manage the critical section, or somebody may have tried to exit the thread from inside the critical section, or (gasp) somebody may have called Terminate­Thread to nuke the thread from orbit.

    I suggested that the Terminate­Thread theory was a good one to start with, because even if it wasn't the case, the investigation should go quickly because the light is better. You're not so much interested in finding it as you are in ruling it out quickly.¹

    The customer replied, "We had one call to Terminate­Thread in our application, and we removed it, but the problem still occurs. Is it worth also checking the source code of the DLLs our application links to?"

    Okay, the fact that they found one at all means that there's a good chance others are lurking.

    Before we could say, "Yes, please continue your search," the customer followed up. "We found a call to Terminate­Thread in a component provided by another company. The code created a worker thread, and decided to clean up the worker thread by terminating it. We commented out the call just as a test, and it seems to fix the problem. So at least now we know where the problem is and we can try to fix it properly."

    ¹ In the medical profession, there's the term ROMI, which stands for rule out myocardial infarction. It says that if a patient comes to you with anything that could possibly remotely be the symptom of a heart attack, like numbness in the arm or chest pain, you should perform a test to make sure. Even though the test is probably going to come back negative, you have to check just to be safe. Here, we're ruling out Terminate­Thread, which I guess could go by the dorky acronym ROTT.

  • The Old New Thing

    The latest technologies in plaintext editing: NotepadConf

    • 13 Comments

    On November 13, 2014 November 14, 2014, Saint Paul, Minnesota will be the home to NotepadConf, which bills itself as "the premier technology conference for Notepad.exe users and text enthusiasts."

    I'm still not sure whom Microsoft will send to the conference, but maybe that person could give a talk on how you can use Notepad to take down the entire Internet.

    Update: The conference has been rescheduled to Friday the 14th.

  • The Old New Thing

    Since clean-up functions can't fail, you have to soldier on

    • 20 Comments

    Clean-up functions can't fail, so what do you do if you encounter a failure in your clean-up function?

    You have to keep cleaning up.

    Some people like to follow this pattern for error checking:

    HRESULT Function()
    {
      hr = SomeFunction();
      if (FAILED(hr))) goto Exit;
    
      hr = AnotherFunction();
      if (FAILED(hr))) goto Exit;
    
      ... and so on until ...
    
      hr = S_OK;
    
    Exit:
        return hr;
    }
    

    And some like to put it inside a cute flow control macro like

    #define CHECK_HRESULT(hr) if (FAILED(hr)) goto Exit;
    

    or even

    #define CHECK_HRESULT(f) if (FAILED(hr = (f))) goto Exit;
    

    Whatever floats your boat.

    But you have to be careful if using this pattern in a clean-up function, because you might end up not actually cleaning up. For example:

    HRESULT Widget::Close()
    {
        HRESULT hr;
        CHECK_HRESULT(DisconnectDoodad(m_hDoodad));
        m_hDoodad = nullptr;
        for (int i = 0; i < ARRRAYSIZE(GadgetArray); i++) {
            CHECK_HRESULT(DestroyGadget(m_rghGadget[i]));
            m_rghGadget[i] = nullptr;
        }
    
        hr = S_OK;
    
    Exit:
        return hr;
    }
    

    What if there is an error disconnecting the doodad? (Maybe you got RPC_E_SERVER_DIED because the doodad lives on a remote server which crashed.) The cleanup code treats this as an error and skips destroying the gadget. But what can the caller do about this? Nothing, that's what. Eventually you get a bug that says, "On an unreliable network, we leak gadgets like crazy."

    Or worse, what if you're doing this in your destructor. You have nowhere to report the error. The caller simply expects that when the object is destroyed, all its resources are released.

    So release as much as you can. If something goes wrong with one of them, keep going, because there's still other stuff to clean up.

    Related: Never throw an exception from a destructor.

    Bonus chatter: Yes, I know that you can avoid this problem by wrapping the Doodad and Gadget handles inside a class which disconnects/destroys on destruction. That's not my point.

  • The Old New Thing

    Why does Explorer say "File too large" for my custom file system, when the problem has nothing to do with the file being too large (heck it's not even a file)

    • 34 Comments

    When Explorer copies files around, it doesn't really know what the maximum file size supported by any file system happens to be. (That information is not reported by Get­Volume­Information.) So it guesses.

    If the file system name is "FAT" or "FAT32", then Explorer assumes that the maximum file size is 4GB − 1.

    Also, if a file operation fails with the error ERROR_INVALID_PARAMETER, and Explorer can't figure out why the parameter is invalid, it assumes that the reason is that the file has exceeded the maximum allowed file size.

    Why does Explorer map "invalid parameter" to "file size too large"? Because some file systems use ERROR_INVALID_PARAMETER to report that a file is too large instead of the somewhat more obvious ERROR_FILE_TOO_LARGE.

    Therefore, if you're implementing a file system, and you're getting these spurious "File too large" errors, one thing to check is whether you are reporting "invalid parameter" for a case where all the parameters are actually valid, but something else prevents you from doing what you want. (Maybe "access denied" would be a better error code.)

  • The Old New Thing

    Microspeak: 1 – 1 is not zero

    • 28 Comments

    In his reddit AMA, Joe Belfiore wrote

    i have regular 1-1 meetings with my counterparts in Office, Skype, Xbox.

    The little bit of jargon there is 1-1 meeting. This is an abbreviation for one-on-one meeting, a common business practice wherein two people, typically a manager and a direct report, have a face-to-face meeting with no one else present. In the case Joe used, the meeting is not between a manager and a direct report but between two peers.

    The term is also abbreviated 1:1, which like 1 − 1 also looks like a bit of mathematical nonsense. But it's not zero or one. It's just an abbreviation for business jargon.

    Of course, Microspeak isn't happy with the abbreviation as-is. Microspeak takes it further and nounifies the abbreviation. "I'll bring that up in my 1–1 with Bob tomorrow" means "I'll bring that up in my one-on-one meeting with Bob tomorrow."

    Note that the abbreviation OOO is not used to mean one-on-one. That abbreviation is used to mean Out of the office, although it is more commonly abbreviated OOF at Microsoft.

Page 4 of 431 (4,309 items) «23456»