March, 2014

  • The Old New Thing

    A puzzle involving dynamic programming, or maybe it doesn't


    Here's a programming challenge:

    Evaluate the following recurrence relation efficiently for a given array [x0, …, xn−1] and integer k.

    f ([x0, x1],  k) =  (x0 + x1) / 2  for all k, n = 2. 
    f ([x0, …, xn−1],  k) =  (x0 + xn−1) / 2 +  f ([x0, …, xn−2], k + 1) / 2 +  f ([x1, …, xn−1], k + 1) / 2  for even k, n ≥ 3. 
    f ([x0, …, xn−1],  k) =    f ([x0, …, xn−2], k + 1) / 2 +  f ([x1, …, xn−1], k + 1) / 2  for odd k, n ≥ 3. 

    Hint: Use dynamic programming.

    In words:

    • If the array has only two elements, then the result is the average of the two elements.
    • If the array has more than two elements, then the result is the sum of the following:
      • Half the value of the function evaluated on the array with the first element deleted and the second parameter incremented by one.
      • Half the value of the function evaluated on the array with the last element deleted and the second parameter incremented by one.
      • If the second parameter is even, then also add the average of the first and last elements of the original array.

    The traditional approach with dynamic programming is to cache intermediate results in anticipation that the values will be needed again later. The naïve solution, therefore, would have a cache keyed by the vector and k.

    My habit when trying to solve these sorts of recurrence relations is to solve the first few low-valued cases by hand. That gives me a better insight into the problem and may reveal some patterns or tricks.

    f ([x0, x1, x2], keven)  = (x0 + x2) / 2 + f ([x0, x1], keven + 1) / 2 + f ([x1, x2], keven + 1) / 2
      = (x0 + x2) / 2 + f ([x0, x1], kodd) / 2 + f ([x1, x2], kodd) / 2
      = (x0 + x2) / 2 + (x0 + x1) / 4 + (x1 + x2) / 4
      = ¾x0 + ½x1 + ¾x2

    Even just doing this one computation, we learned a lot. (Each of the following observations can be proved by induction if you are new to this sort of thing, or is evident by inspection if you're handy with math.)

    First observation: The function is linear in the array elements. In other words,

    f (x + y, k) = f (x, k) + f (y, k),
    f (ax, k) = a f (x, k).

    To save space and improve readability, I'm using vector notation, where adding two vectors adds the elements componentwise, and multiplying a vector by a constant multiplies each component. The long-form version of the first of the above equations would be

    f ([x0 + y0, …, xn−1 + yn−1], k) = f ([x0, …, xn−1], k) + f ([y0, …, yn−1], k)

    Since the result is a linear combination of the vector elements, we can work just with the coefficients and save ourselves some typing. ("Move to the dual space.") For notational convenience, we will write

    ⟨a0, …, an−1⟩ = a0 x0 + ⋯ + an−1 xn−1

    Second observation: The specific value of k is not important. All that matters is whether it is even or odd, and each time we recurse to the next level, the parity flips. So the second parameter is really just a boolean. That greatly reduces the amount of stuff we need to cache, as well as increasing the likelihood of a cache hit. (The naïve version would not have realized that f (x, k) can steal the cached result from f (x, k + 2).)

    Our previous hand computation can now be written as

    f ([x0, x1, x2], even)  = ⟨½, 0, ½⟩ + f ([x0, x1], odd) / 2 + f ([x1, x2], odd) / 2
      = ⟨½, 0, ½⟩ + ⟨½, ½, 0⟩ / 2 + ⟨0, ½, ½⟩ / 2
      = ⟨½, 0, ½⟩ + ⟨¼, ¼, 0⟩ + ⟨0, ¼, ¼⟩
      = ⟨¾, ½, ¾⟩

    Now that we are working with coefficients, we don't need to deal with x any more! All that matters is the length of the vector. This makes our recurrences much simpler:

    f (2, k) = ⟨½, ½⟩  for all k.
    f (n, even) = ⟨½, 0, …, 0, ½⟩ + ⟨f (n−1, odd) / 2, 0⟩ + ⟨0, f (n−1, odd) / 2⟩  for n ≥ 3.
    f (n, odd) = ⟨f (n−1, even) / 2, 0⟩ + ⟨0, f (n−1, even) / 2⟩  for n ≥ 3.

    Now we can carry out a few more hand computations.

    f (3, odd)  = ⟨f (2, even) / 2, 0⟩ + ⟨0, f (2, even) / 2⟩
      = ⟨¼, ¼, 0⟩ + ⟨0, ¼, ¼⟩
      = ⟨¼, ½, ¼⟩
    f (4, even)  = ⟨½, 0, 0, ½⟩ + ⟨f (3, odd) / 2, 0⟩ + ⟨0, f (3, odd) / 2⟩
      = ⟨½, 0, 0, ½⟩ + ⟨⅛, ¼, ⅛, 0⟩ + ⟨0, ⅛, ¼, ⅛⟩
      = ⟨⅝, ⅜, ⅜, ⅝⟩
    f (4, odd)  = ⟨f (3, even) / 2, 0⟩ + ⟨0, f (3, even) / 2⟩
      = ⟨⅜, ¼, ⅜, 0⟩ + ⟨0, ⅜, ¼, ⅜⟩
      = ⟨⅜, ⅝, ⅝, ⅜⟩

    The interesting thing to observe here is that the recursion does not branch. When we reduce the size of the vector by one element, the recursive calls are basically identical. We have to shift the coefficients differently in order to build up the results, but the recursive call itself is unchanged. This means that we need to perform only n−2 recursive steps to compute f (n, k).

    Okay, now that we've studied the problem a bit, we can write the code. I'll write three versions of the function. The first computes according to the recurrence relation as originally written. We use this to verify our calculations.

    function f1(x, k) {
     if (x.length == 2) {
      return (x[0] + x[1]) / 2;
     }
     var term = 0;
     if (k % 2 == 0) {
      term = (x[0] + x[x.length - 1]) / 2;
     }
     return term +
            f1(x.slice(0,-1), k + 1) / 2 +
            f1(x.slice(1), k + 1) / 2;
    }
    Immediate window:
    f1([1,2,3], 0)
    = 4.0

    Okay, that matches our hand calculations, ¾·1 + ½·2 + ¾·3 = 4.

    Now let's do the straight recursive version.

    function dotproduct(a, x) {
     var total = 0;
     for (var i = 0; i < a.length; i++) total += a[i] * x[i];
     return total;
    }
    function f2a(n, k) {
     if (n == 2) return [1/2, 1/2];
     var c = new Array(n);
     for (var i = 0; i < n; i++) c[i] = 0;
     if (k % 2 == 0) {
      c[0] = 1/2;
      c[n-1] = 1/2;
     }
     var inner = f2a(n-1, k+1);
     for (var i = 0; i < n - 1; i++) {
      c[i] += inner[i] / 2;
      c[i+1] += inner[i] / 2;
     }
     return c;
    }
    function f2(x, k) {
     return dotproduct(f2a(x.length, k), x);
    }
    Immediate window:
    f2([1,2,3], 0)
    = 4.0

    Notice that there is no dynamic programming involved. The hint in the problem statement was a red herring!

    Finally, we can eliminate the recursion by iterating n up from 2.

    function f3(x, k) {
     var c = new Array(x.length);
     for (var i = 0; i < x.length; i++) c[i] = 0;
     c[0] = 1/2;
     c[1] = 1/2;
     for (var n = 3; n <= x.length; n++) {
      var carry = 0;
      for (var i = 0; i < n; i++) {
       var nextcarry = c[i];
       c[i] = (carry + c[i]) / 2;
       carry = nextcarry;
      }
      if ((k + n + x.length) % 2 == 0) {
       c[0] += 1/2;
       c[n-1] += 1/2;
      }
     }
     return dotproduct(c, x);
    }

    I pulled a sneaky trick here in the place we test whether we are in the even or odd case. Officially, the test should be

      if ((k + (x.length - n)) % 2 == 0) {

    but since we are interested only in whether the result is even or odd, we can just add the components together, because subtraction and addition have the same effect on even-ness and odd-ness. (If I really felt like micro-optimizing, I could fold x.length into k.)
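    If you want to convince yourself of the parity trick, a quick brute-force check (this snippet is purely illustrative) confirms that the two tests always agree:

```javascript
// Check that (k + (len - n)) and (k + len + n) always have the same
// parity, since -n and +n are congruent mod 2.
var mismatches = 0;
for (var k = 0; k <= 5; k++) {
 for (var len = 3; len <= 8; len++) {
  for (var n = 3; n <= len; n++) {
   var official = (k + (len - n)) % 2 == 0;
   var shortcut = (k + len + n) % 2 == 0;
   if (official !== shortcut) mismatches++;
  }
 }
}
console.log(mismatches); // 0
```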

    Okay, now that we have our code, let's interpret the original problem.

    The expression ⟨f (n, k) / 2, 0⟩ + ⟨0, f (n, k) / 2⟩ takes the vector f (n, k) and averages it against a shifted copy of itself. (The word convolution could be invoked here.) If you think of the coefficients as describing the distribution of a chemical, the expression calculates the effect of diffusion after one time step according to the simple model "At each time step, each molecule has a 50% chance of moving to the left and a 50% chance of moving to the right." (Since the length of the vector varies with n, I'll visualize the vector drawn with center-alignment.)

    The base case ⟨½, ½⟩ describes the initial conditions of the diffusion, where half of the chemicals are on the left and half are on the right. This is one time step after one unit of the chemical was placed in the center. Let's get rid of the extra term in the recurrence and focus just on the diffusion aspect:

    d(2) = ⟨½, ½⟩ 
    d(n) = ⟨d(n−1) / 2, 0⟩ + ⟨0, d(n−1) / 2⟩  for n ≥ 3.

    If you compute these values, you'll see that the results are awfully familiar. I've pulled out the common denominator to avoid the ugly fractions.

    1 1 entire row divided by 2
    1 2 1 entire row divided by 4
    1 3 3 1 entire row divided by 8
    1 4 6 4 1 entire row divided by 16
    1 5 10 10 5 1   entire row divided by 32

    Yup, it's Pascal's Triangle.

    Notice that the sum across the row equals the amount we are dividing by, so that the sum of the row is always 1. (This is easy to see from the recurrence relation, since the base case establishes the property that the sum of the coordinates is 1, and the recurrence preserves it.)

    This means that we can calculate the coefficients of d(n) for any value of n directly, without having to calculate any of the coefficients for smaller values of n. The coefficients are just row n of Pascal's triangle, which we know how to compute in O(n) time.
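    In code, the O(n) row computation might look like this (pascalRow and d are illustrative names; it uses the identity C(m, i+1) = C(m, i) · (m − i) / (i + 1), with rows indexed so that pascalRow(1) is "1 1"):

```javascript
// Compute row m of Pascal's triangle, C(m, 0) ... C(m, m), in O(m) time.
function pascalRow(m) {
 var row = [1];
 for (var i = 0; i < m; i++) {
  row.push(row[i] * (m - i) / (i + 1)); // C(m, i+1) = C(m, i) * (m-i)/(i+1)
 }
 return row;
}

// d(n) has n entries: row n-1 of the triangle divided by 2^(n-1),
// e.g. d(2) = [1,1]/2 and d(3) = [1,2,1]/4, matching f(3, odd) = <1/4, 1/2, 1/4>.
function d(n) {
 return pascalRow(n - 1).map(function (c) {
  return c / Math.pow(2, n - 1);
 });
}

console.log(d(3)); // the coefficients 1/4, 1/2, 1/4 from the hand computation
```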

    Now we can also interpret the extra term we removed at the even steps. That term of the recurrence simulates adding a unit of chemical to the solution at every other time step, half a unit at the left edge and half at the right edge. And we can calculate these directly in the same way that we calculated the diffusion coefficients, since they basically are the diffusion coefficients, just with a time and location adjustment.

    I pulled a fast one here. Maybe you didn't pick up on it: I'm assuming that the two parts of the recurrence unrelated to the diffusion behavior (the base condition and the extra term at the even steps) are independent and can simply be added together. You can show this by noting that given the generalized recurrence

    fG(2, k) = G(2, k)
    fG(n, k) = G(n, k) + ⟨fG(n−1, k + 1) / 2, 0⟩ + ⟨0, fG(n−1, k + 1) / 2⟩ for n ≥ 3.

    then fG+H = fG + fH. (As before, induct on the recurrence relations.) Therefore, we can solve each of the pieces separately and just add the results together.
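    If the induction doesn't convince you, here is a quick numeric spot-check of the additivity claim, using two arbitrary made-up extra-term functions G and H (all names in this snippet are illustrative):

```javascript
// <v/2, 0> + <0, v/2>: average a vector against a shifted copy of itself.
function shiftAdd(v) {
 var c = [];
 for (var i = 0; i <= v.length; i++) c.push(0);
 for (var i = 0; i < v.length; i++) {
  c[i] += v[i] / 2;
  c[i + 1] += v[i] / 2;
 }
 return c;
}

// The generalized recurrence: fG(2, k) = G(2, k);
// fG(n, k) = G(n, k) + <fG(n-1, k+1)/2, 0> + <0, fG(n-1, k+1)/2> for n >= 3.
function fGen(G, n, k) {
 if (n == 2) return G(2, k);
 var c = shiftAdd(fGen(G, n - 1, k + 1));
 var g = G(n, k);
 for (var i = 0; i < n; i++) c[i] += g[i];
 return c;
}

// Two arbitrary extra-term functions, each returning a vector of length n.
function G(n, k) { var v = []; for (var i = 0; i < n; i++) v.push(n + k + i); return v; }
function H(n, k) { var v = []; for (var i = 0; i < n; i++) v.push((k + 1) * (i + 2)); return v; }
function GplusH(n, k) {
 var g = G(n, k), h = H(n, k);
 return g.map(function (x, i) { return x + h[i]; });
}

var whole = fGen(GplusH, 5, 0);
var partG = fGen(G, 5, 0), partH = fGen(H, 5, 0);
var maxErr = 0;
for (var i = 0; i < whole.length; i++) {
 maxErr = Math.max(maxErr, Math.abs(whole[i] - (partG[i] + partH[i])));
}
console.log(maxErr); // 0: the pieces add up exactly (all values are dyadic rationals)
```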

    If I had the time and inclination, I could work out the solution for

    C(n, i) + Σk even, 2 < k ≤ n C(k, i) / 2^k

    or something similar to that. Like I said, I ran out of time and inclination. But if I could come up with an efficient way to compute that value for all values of i for fixed n (and I believe there is, I'm just too lazy to work it out), then I would have an O(n) solution to the original recurrence.

    (Okay, so the "If I had the time" bit is a cop-out, but I sort of lost interest in the problem.)

  • The Old New Thing

    The dangers of buffering up posted messages and then reposting them later


    A customer wanted to check that their idea for solving a re-entrancy problem doesn't have any hidden gotchas.

    We have a window which processes incoming work. One of our work items enters a modal loop, and if new work gets posted to the window while the modal loop is running, our work manager gets re-entered, and Bad Things happen.

    Our proposed solution is to alter the modal loop so that it buffers up all messages destined for the worker window. (Messages for any other window are dispatched normally.) When the modal loop completes, we re-post all the messages from the buffer, thereby allowing the worker window to resume processing.

    The danger here is that reposting messages can result in messages being processed out of order. Depending on how your worker window is designed, this might or might not be a problem. For example, suppose that during the modal operation, somebody posts the WWM_FOO­STARTED message to the worker window. You buffer it up. When your modal operation is complete, you are about to post the message back into the queue, but another thread races against you and posts the WWM_FOO­COMPLETED message before you can post your buffered messages back into the queue. Result: The worker window receives the WWM_FOO­COMPLETED message before it receives the WWM_FOO­STARTED message. This will probably lead to confusion.

    The place to solve this problem is in the window itself. That gets rid of the race condition.

    LRESULT CALLBACK WorkerWindow::WndProc(
        HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     if (uMsg is a work message) {
      if (m_cBusy) {
       // Now is a bad time to process the work message.
       // Queue it up for later.
       m_queue.Append(uMsg, wParam, lParam);
      } else {
       m_cBusy++; // prevent re-entrancy
       do {
        ProcessWorkMessage(uMsg, wParam, lParam);
       } while (m_queue.RemoveFirst(&uMsg, &wParam, &lParam));
       m_cBusy--; // re-entrancy no longer a problem
      }
      return 0;
     }
     ...  // handle the other messages
    }

    By queueing up the work inside the window itself, you ensure that the messages are processed in the same order they were received.

    This technique can be extended to, say, have the worker window do some degree of work throttling. For example, you might keep track of how long you've been processing work, and if it's been a long time, then stop to pump messages for a while in case any system messages came in, and somebody is waiting for your answer.

  • The Old New Thing

    Functions that return GDI regions rarely actually return regions


    For reasons I don't quite understand, the only functions in GDI and USER which create GDI regions are the functions with Create in their name, like Create­Rect­Rgn or Create­Polygon­Rgn. All the other functions which return a region require you to pass an existing region to use as the output.

    I can see this being useful for Combine­Rgn, because you can set your output region to be equal to one of the input regions in order to update a region in place.

    // hrgnTotal |= hrgnSegment
    CombineRgn(hrgnTotal, hrgnTotal, hrgnSegment, RGN_OR);
    // hrgnTotal &= hrgnClip
    CombineRgn(hrgnTotal, hrgnTotal, hrgnClip, RGN_AND);

    But for all of the Get functions, having to create an output region is usually just an annoyance.

    // Create a dummy region - contents not important
    HRGN hrgnClip = CreateRectRgn(0, 0, 0, 0);
    // Ask for the clipping region to be copied to the dummy region
    int status = GetClipRgn(hdc, hrgnClip);

    I guess it lets you reuse a single dummy region over and over again, but in practice, you're just going to destroy the region when you're done to free up the GDI region memory.

    Whatever the historical reason for this, we're stuck with it. It may be an ugly pattern, but at least it's a pattern.

    The things I do for you: I asked a colleague who worked on Windows 3.0 if he knew the reason for this design pattern. He didn't know but suggested that I ask another person who retired from Microsoft many, many years ago. Fortunately, I happened to have his email address even though we aren't really that close. And the second person also didn't know. "This behavior was already in place when I joined the Windows 1.03 project. Maybe you can ask Rao." Unfortunately, I don't have Rao's email address, so the trail ran cold. But I burned a favor for you guys.

  • The Old New Thing

    Can CoCreateGuid ever return GUID_NULL?


    A customer asked whether the Co­Create­Guid function can ever return GUID_NULL. Their code uses GUID_NULL for special purposes, and it would be bad if that was ever returned as the GUID for an object. "Can we assume that Co­Create­Guid never returns GUID_NULL? Or should we test the return value against GUID_NULL, and if it is equal, then call Co­Create­Guid and try again?"

    Some people started running Co­Create­Guid a bunch of times and observing that it was spitting out type 4 GUIDs, which will always have a 4 in the version field. Then other people started wondering whether the use of Algorithm 4 was contractual (it isn't). Then still other people went back to read the RFCs which cover UUIDs to see whether those documents provided any guidance.

    And then I had to step in and stop the madness.

    It is very easy to show that any UUID generator which generates GUID_NULL has failed to meet the requirement that the generated UUID be unique in space and time: If it's equal to GUID_NULL, then it isn't unique!

    The uniqueness requirement is that the generated GUID be different from any other valid GUID. And if it generated GUID_NULL, then it wouldn't be different from GUID_NULL! (And GUID_NULL is a valid GUID, specifically identified in RFC4122 section 4.1.7.)

    If you're so worried about Co­Create­Guid generating a duplicate GUID_NULL, why aren't you worried about Co­Create­Guid generating a duplicate IID_IUnknown or GUID_DEV­CLASS_1394 or any of the other GUIDs that have already been generated in the past?

    In other words, no valid implementation of Co­Create­Guid can generate GUID_NULL because the specification for the function says that it is not allowed to generate any GUID that has been seen before.

    One of my colleagues cheekily remarked, "And even if it did generate GUID_NULL for some reason, uniqueness would require that it do so only once! (So you should try to force this bug to occur in test, and then you can be confident that it will never occur in production.)"

  • The Old New Thing

    Geeky t-shirt alert: Windows 7's so-called God Mode


    Back in 2010, the so-called God Mode was the hit discovery of the Internet, at least until the next cute cat video made the rounds. If you had stopped by the Microsoft Visitor Center during that meme's brief moment in the sun, and wandered into the gift shop, you could have picked up a t-shirt that said {ED7BA470-8E54-465E-825C-99712043E01C} on the front.

    Of course, if you actually wore that shirt, you would also get stuffed into a locker and have your lunch money stolen from you.

  • The Old New Thing

    Enumerating set partitions with Bell numbers and Stirling numbers of the second kind


    Just for fun, today's Little Program is going to generate set partitions. (Why? Because back in 2005, somebody asked about it on an informal mailing list, suggesting it would be an interesting puzzle, and now I finally got around to solving it.)

    The person who asked the question said,

    Below we show the partitions of [4]. The periods separate the individual sets so that, for example, 1.23.4 is the partition {{1},{2,3},{4}}.

    1 blocks: 1234
    2 blocks: 123.4,  124.3,  134.2,  1.234,  12.34,  13.24,  14.23
    3 blocks: 1.2.34,  1.24.3,  1.23.4,  14.2.3,  13.2.4,  12.3.4
    4 blocks: 1.2.3.4

    I replied with a hint, saying, "This page explains what you need to do, once you reinterpret the Stirling recurrence as an enumeration." Only now, writing up this post, did I realize that I linked to the same page they were quoting from.

    The key is in the sentence, "They satisfy the following recurrence relation, which forms the basis of recursive algorithms for generating them." Let's reinterpret the Stirling recurrence combinatorially. No wait, we don't need to. Wikipedia did it for us. Go read that first. If you didn't follow Wikipedia's explanation, here's another angle:

    Suppose you have a set of n elements, and you want to generate all the partitions into k blocks. Well, let's look at element n. What happens when you delete it from the partition?

    One possibility is that n was in a block all by itself. After you remove it, you are left with a partition of n − 1 elements into k − 1 blocks. Therefore, to generate the partitions where n is in a block all by itself, you generate all the partitions of n − 1 elements into k − 1 blocks, and then tack on a single block consisting of just element n.

    On the other hand, if element n was not in a block by itself, then removing it leaves a partition of n − 1 elements into k blocks. (Removing n did not decrease the number of blocks because there are still other numbers keeping that block alive.) Now, element n could have belonged to any of the k blocks, so you have k possible places where you could reinsert element n.

    Therefore, our algorithm goes like this:

    • Handle base cases.
    • Otherwise,
      • Recursively call S(n − 1, k) and add element n separately into each of the k blocks.
      • Recursively call S(n − 1, k − 1) and add a single block consisting of just n.

    Now that we spelled out what we're going to do, actually doing it is a bit anticlimactic.

    function Stirling(n, k, f) {
     if (n == 0 && k == 0) { f([]); return; }
     if (n == 0 || k == 0) { return; }
     Stirling(n-1, k-1, function(s) {
      f(s.concat([[n]])); // append [n] to the array
     });
     Stirling(n-1, k, function(s) {
      for (var i = 0; i < k; i++) {
       f(s.map(function(t, j) { // append n to the i'th subarray
        return t.concat(i == j ? n : []);
       }));
      }
     });
    }

    You can test out this function by logging the results to the console.

    function logToConsole(s) {
     console.log(JSON.stringify(s));
    }
    Stirling(4, 2, logToConsole);

    Let's walk through the function. The first test catches the vacuous base case where you say, "Please show me all the ways of breaking up nothing into zero blocks." The answer is, "There is exactly one way of doing this, which is to break it into zero blocks." Math is cute that way.

    The second test catches the other base cases where you say, "Please show me all the ways of breaking up n elements into zero blocks" (can't be done if n > 0, because you will always have elements left over), or you say, "Please show me all the ways of breaking up 0 elements into k blocks" (which also can't be done if k > 0 because there are no elements to put in the first block).

    After disposing of the base cases, we get to the meat of the recurrence. First, we ask for all the ways of breaking n - 1 elements into k - 1 blocks, and for each of them, we add a single block consisting of just n.

    Next, we ask for all the ways of breaking n - 1 elements into k blocks, and for each of them, we go into a loop adding n to each block. The goofy map creates a deep copy of the array and adds n to the ith block.

    Here's a walkthrough of the goofy map:

    • The concat method creates a new array consisting of the starting array with the concat parameters added at the end. If a parameter is an array, then its elements are added; otherwise the parameter itself is added. For example, [1].concat(2, [3, 4]) returns [1, 2, 3, 4]. The concat method creates a new array, and a common idiom is s.concat() to make a shallow copy of an array.
    • The map method calls the provided callback once for each element of the array s. The return values from all the callbacks are collected to form a new array, which is returned. For example, [1,2,3].map(function (v) { return v * 2; }) returns the new array [2, 4, 6].
    • The map callback is called with the subarray as the first parameter and the index as the second parameter. (There is also a third parameter, which we don't use.)
    • Therefore, if all we wanted to do was create a deep copy of s, we could write s.map(function (t, j) { return t.concat(); }).
    • But we don't want a perfect deep copy. We want to change the ith element. Therefore, we check the index, and if it is equal to i, then we append n. Otherwise, we append [], which appends nothing.
    • After building the array (with modifications), we pass it to the callback function f.
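    You can verify these idioms directly (an illustrative snippet, not part of the algorithm):

```javascript
var s = [[1], [2, 3]];
var i = 1, n = 4;

// concat flattens array arguments one level:
console.log(JSON.stringify([1].concat(2, [3, 4]))); // [1,2,3,4]

// s.concat() is a shallow copy: the outer array is new, the blocks are shared.
var shallow = s.concat();
console.log(shallow !== s && shallow[0] === s[0]); // true

// map + concat copies every block, appending n only to the i'th block.
var edited = s.map(function (t, j) {
 return t.concat(i == j ? n : []);
});
console.log(JSON.stringify(edited)); // [[1],[2,3,4]]
console.log(edited[0] === s[0]);     // false: even the unedited block was copied
```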

    This pattern is common enough that maybe we should pull it into a helper function.

    function copyArrayOfArrays(array, indexToEdit, editor) {
     return array.map(function(e, i) {
      return i === indexToEdit ? editor(e) : e.concat();
     });
    }
    function Stirling(n, k, f) {
     if (n == 0 && k == 0) { f([]); return; }
     if (n == 0 || k == 0) { return; }
     Stirling(n-1, k-1, function(s) {
      f(s.concat([[n]])); // append [n] to the array
     });
     Stirling(n-1, k, function(s) {
      for (var i = 0; i < k; i++) {
       f(copyArrayOfArrays(s, i, function(e) { return e.concat(n); }));
      }
     });
    }

    The copy­Array­Of­Arrays function abstracts the goofy map: You give it an array of arrays, an index to edit, and a function that does the editing. It copies the array of arrays and calls your editor on the element you want to edit.

    To reduce the number of memory allocations, the recursion could also be done destructively. You're then counting on the callback not modifying the result, since you're going to use it again.

    function Stirling(n, k, f) {
     if (n == 0 && k == 0) { f([]); return; }
     if (n == 0 || k == 0) { return; }
     Stirling(n-1, k, function(s) {
      for (var i = 0; i < k; i++) {
       s[i].push(n); // append n to the i'th subarray
       f(s);
       s[i].pop(); // undo the append
      }
     });
     Stirling(n-1, k-1, function(s) {
      s.push([n]); // append the block [n]
      f(s);
      s.pop(); // undo the append
     });
    }

    The original question was about enumerating all partitions (Bell numbers), and that's easy to put together from the Stirling numbers.

    function Bell(n, f) {
     for (var i = 1; i <= n; i++) {
      Stirling(n, i, f);
     }
    }

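    As a sanity check, the number of partitions enumerated for n = 4 should be the Bell number B(4) = S(4,1) + S(4,2) + S(4,3) + S(4,4) = 1 + 7 + 6 + 1 = 15, matching the listing quoted at the top of this post. (The Stirling function is repeated here so the snippet stands alone.)

```javascript
function Stirling(n, k, f) {
 if (n == 0 && k == 0) { f([]); return; }
 if (n == 0 || k == 0) { return; }
 Stirling(n - 1, k - 1, function (s) {
  f(s.concat([[n]])); // append [n] as its own block
 });
 Stirling(n - 1, k, function (s) {
  for (var i = 0; i < k; i++) {
   f(s.map(function (t, j) { return t.concat(i == j ? n : []); }));
  }
 });
}
function Bell(n, f) {
 for (var i = 1; i <= n; i++) {
  Stirling(n, i, f);
 }
}

var count = 0;
Bell(4, function () { count++; });
console.log(count); // 15
```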
  • The Old New Thing

    Different senses of scale create different travel expectations


    A friend of mine had a business meeting near London, and he decided to extend it to a tour of Scotland and England once the meetings were over. (This is the same friend who took me on the emergency vacation many years ago.) His plan was to rent a car early one morning and drive from the meeting location all the way up to Aberdeen at one go, then slowly work his way back south, enjoying the sights along the way.

    He sanity-checked his plan against his colleagues from Great Britain. "I looked it up on multiple online mapping sites, and they all say that the trip from London to Aberdeen is doable in a day. I take motorway X, then Y, then Z. Does this make sense to you?"

    Every single one of his colleagues said, "Oh, no. You can't do it in a day. You should budget two days travel time."

    My friend was curious. Is the motorway really congested?

    "Not particularly."

    Is the road unusually difficult to navigate, or is the road in poor condition? Something that would prevent me from traveling at the posted speed limit?

    "No, the roads are just fine, and driving is straightforward."

    He asked several other questions trying to find out what it was about the trip that required it to take two days. Is there something funny at the England/Scotland border that takes a long time? Do I have to cross a mountain or something?

    "It just can't be done in one day."

    My friend concluded that it was simply in the mindset of the locals that driving that far in one day is Just Not Done. There is nothing physically preventing it, but it is considered to be highly unusual.

    As I recall, he ultimately executed his plan without incident. I wonder if the other drivers on the road looked at him funny.

    Bonus story: Another friend of mine was staying in Reading, and he decided to take a weekend excursion to Wales. He pulled out the map, calculated how long it would take him, and noted that the map indicated that there were mountains that he needed to cross to reach his destination.

    He set out with what he thought was plenty of time to spare, but it started getting late, and he still needed to cross the mountains, and he was concerned that the people in Wales would start worrying when he didn't show up.

    And then he reached his destination.

    He drove over the mountains without even realizing it.

  • The Old New Thing

    Why can't I __declspec(dllexport) a function from a static library?


    A customer was having trouble exporting a function with the __decl­spec(dll­export) declaration specified, but found that if the function was in a static library, no function was exported. Why is that?

    Let's go back to the classical model for linking. Code is pulled from a LIB file only if the linker encounters a reference to it. If the linker encounters no reference to any symbols offered by an OBJ file in that LIB, then that OBJ file is not included in the final image. (Remember, we're talking about OBJ files inside LIBs; explicitly-provided OBJ files are always included in the image, under the classical model.)

    Now consider a LIB which contains an OBJ file that has a function marked __decl­spec(dll­export). Suppose that no symbol offered by that OBJ file is ever required by the final image. That means that the OBJ file is never added to the image. And that means that the linker does not see the __decl­spec(dll­export) qualifier on the function inside the OBJ file (since the OBJ file was never used), so the function doesn't get exported.

    Let's look at it another way: __decl­spec(dll­export) does not influence the linking process. All it does is add a little sticker to the function that says, "For export." When the linker adds functions to an image, it makes note of the sticker and adds it to the list of functions that it needs to export. But if the linker never sees the function, then it never sees the sticker.

    In order to export a function from a static library, you need to force a reference to the function from the image. One way is to add an OBJ to the image that contains a dummy function that calls the function you want to export. That dummy function will trigger the resolution of the symbol from the static library, at which point the linker will see the sticker.

    Another way is to use the /INCLUDE directive to create an artificial reference to the function from the command line, but this gets you into the fragile world of having to know the various name decoration schemes for different architectures.

    The best solution is to use an explicit DEF file, since that also gives you a chance to do other things like remove the decorations from the function (so that it can be Get­Proc­Addressed).
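    For illustration, a minimal module-definition file might look like this (MyDll and DoSomething are hypothetical names; you pass the file to the linker with the /DEF option):

```
; exports.def -- hypothetical module-definition file
LIBRARY MyDll
EXPORTS
    DoSomething
```

    Listing the function under EXPORTS forces the linker to resolve the symbol (pulling in the OBJ from the static library) and exports it under the undecorated name you wrote.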

    Exercise: "But sometimes the __decl­spec(dll­export) works from a static library, even though I did none of those special things." Explain.

  • The Old New Thing

    When visitors to the United States underestimate the size of the country


    A friend of mine who is from Lebanon (but now lives in Seattle) invited his grandmother to come visit for the summer. When she arrived, he asked her, "Grandma, is there anywhere in particular you would like to visit?"

    His grandmother replied, "I'd like to go to Washington, DC."

    "Okay, Grandma. Let me buy some plane tickets."

    "No, let's drive."

    "You want to drive all the way to Washington, DC? Here, let me show you on a map how far away it is."

    Grandma replied, "Let's do it."

    My friend said, "Okay, Grandma, we're going on a road trip!" He got the rest of the family on board with the plan, packed up the car, and set out early one morning for their cross-country trip.

    By the end of the day, they had made it as far as Idaho, where they stopped for the night. I assume that they made plenty of stops along the way because (1) part of the point of a road trip is to enjoy the things along the way, and (2) Grandma.

    Grandma asked, "Is this Washington, DC?"

    "No, Grandma. Washington, DC is still very far away. Here, let me show you on the map where we are."

    Grandma was unconvinced. "If you'd only stop and ask for directions, we would have been there by now." Grandma was certain that the only reason they were driving all day was that her grandson was lost and stupidly driving in circles, and if he only had driven in the right direction, they'd be there by now.

    Grandma's reference for distance was Lebanon, which is a relatively small country. You can drive from the northern tip of the country to the southern tip in a day. The United States is a bit bigger than that.

    A related story was when my parents in New Jersey hosted some friends from Japan. The first excursion they took was to New York City, a convenient train ride away. For their second trip, they said, "How about today we drive to Chicago?"

    Third story.

    Please share any funny stories about geographic blindness in your home country.

    (This was supposed to be posted on Geography Awareness Week but I messed up.)

  • The Old New Thing

    Going for the facts: Who is displaying this message box?


    A customer wanted to know whether Create­Process had a problem with Unicode.

    A problem with it? Quite the contrary. Create­Process loves Unicode! In fact, if you call the ANSI version, it converts the strings to Unicode and then finishes the work in Unicode.

    Okay, here's the customer's problem.

    We have a custom application written in managed code. When we launch the process from unmanaged code via Create­Process, we sometimes get a bogus error message:

    WARNING! The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.

    The filename is well under the 260-character limit and the directory name is well under the 248-character limit. We have isolated the problem to be related to whether we put Unicode characters in the command line arguments. If the command line arguments are all ASCII, then no message appears.

    In case it matters, here's our code to launch the custom application.

    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    if (CreateProcess(NULL, commandLine, 0, 0, FALSE,
                       NORMAL_PRIORITY_CLASS, 0, 0,
                       &si, &pi)) {
     ... // (in our case, the call succeeds)
    }

    What do we have to do to get Create­Process to accept non-ASCII characters on the command line without displaying an error message?

    (Note that Unicode is a superset of ASCII. All ASCII characters are also Unicode characters. The customer is making the common mistake of saying Unicode when they mean Unicode-not-ASCII.)

    Actually, that error message is not coming from Create­Process. It's coming from the custom application.

    We have the source code for our custom application and it does not display this message. The custom application actually receives the command line just fine (be it Unicode or not), but if there is Unicode in the command line, we get the message above.

    The message box may not be coming from your code, but it's still coming from your application. Why not hook up a debugger when the message box is up, then take a stack trace to see whose idea it was to display the message box?

    The customer connected a debugger and determined that the message was coming from a third-party library that their custom application uses. Now they know whom to talk to in order to solve the problem.
