Out parameters and LINQ do not mix

Out parameters and LINQ do not mix

Rate This
  • Comments 28

I am back from my annual vacation in beautiful southwestern Ontario; before I get into the subject of today's post, check out this shot I took with my Windows Phone camera from the plane on the trip home. We are at 37000 feet, just outside of Billings, Montana, a few minutes before sunset:

WP_000062

The whole thing was chock full of immense lightning arcs which unfortunately I did not capture in the image. This is certainly the largest isolated thunderstorm I've ever seen from the outside. Notice the characteristic anvil shape; as I've described before, we've got a huge heat engine here that is extracting the latent heat from the gaseous and liquid water, and then using that heat to power the updraft that sucks more warm water vapor upwards. Quite beautiful.

Well, enough chit-chat about the weather. Today: what's wrong with this code?

var seq = new List<string> { "1", "blah", "3" };
int tmp;
var nums =
  from item in seq
  let success = int.TryParse(item, out tmp)
  select success ? tmp : 0;

The intention is pretty clear: take a sequence of strings and produce a sequence of numbers by turning the elements that can be parsed as numbers into those numbers and the rest into zero.

The C# compiler will give you a definite assignment error if you try this, which seems strange. Why does it do that? Well, think about what code the compiler will translate the last statement into:

var nums =
  seq.Select(item=>new {item, success = int.TryParse(item, out tmp)})
     .Select(transparent => transparent.success ? tmp : 0);

We have two method calls and two lambdas. Clearly the first lambda assigns tmp and the second reads it, but we have no guarantee whatsoever that the first call to Select invokes the lambda! It could, for some bizarre reason of its own, never invoke the lambda. Since the compiler cannot prove that tmp is definitely assigned before it is read, this program is an error.

So is the solution then to assign tmp in the variable declaration? Certainly not! That makes the program compile, but it is a "worst practice" to mutate a variable like this. Remember, that one variable is going to be shared by every delegate invocation! In this simple LINQ-to-Objects scenario it is the case that the delegates are invoked in the sensible order, but even a small change makes this nice property go out the window:

int tmp = 0;
var nums =
  from item in seq
  let success = int.TryParse(item, out tmp)
  orderby item
  select success ? tmp : 0;
foreach(var num in nums) { Console.WriteLine(num); }

Now what happens?

We make an object that represents the query. The query object consists of three steps: do the "let" projection, do the sort, and do the final projection. Remember, the query is not executed until the first result from it is requested; the assignment to "nums" just produces an object that represents the query, not the results.

Then we execute the query by entering the body of the loop. Doing so initiates a whole chain of events, but clearly it must be the case that the entire "let" projection is executed from start to finish over the whole sequence in order to get the resulting pairs to be sorted by the "orderby" clause. Executing the "let" projection lambda three times causes "tmp" to be mutated three times. Only after the sort is completed is the final projection executed, and it uses the current value of tmp, not the value that tmp was back in the distant past!

So what is the right thing to do here? The solution is to write your own extension method version of TryParse the way it would have been written had there been nullable value types available in the first place:

static int? MyTryParse(this string item)
{
    int tmp;
    bool success = int.TryParse(item, out tmp);
    return success ? (int?)tmp : (int?)null;
}

And now we can say:

var nums =
  from item in seq
  select item.MyTryParse() ?? 0;

The mutation of the variable is now isolated to the activation of the method, rather than a side effect that is observed by the query. Try to always avoid side effects in queries.


Thanks to Bill Wagner for the question that inspired this blog entry.

  • Re read the comments, my bad, it would have a different signature, one parameter less :)

  • While it's unfortunate that the TryParse() methods in the BCL don't come in a flavor that allows nullable types to be returned, it would be nice if the language allowed developers to correct that omission via Static Extension Methods, which don't currently exist.

  • Just for the record. TryParse returning a nullable is inconsistent since null may very well be an expected result. The naming suggested by douglas would be consistent here, namely: ParseOrDefault.

  • Glad to see you back, Eric!

    I'm not sure if I'd like TryParse methods to be that way. It's a nice feature that they return bool, because you can use them in conditions, like this:

    string s1 = whatever1;

    string s2 = whatever2;

    string s3 = whatever3;

    //We need to parse n strings and only proceed if they are all valid

    int i1;

    int i2;

    int i3;

    if (int.TryParse(s1, out n1) && int.TryParse(s2, out n2) && int.TryParse(s3, out n3))

    {

        //proceed and use i1, i2, and i3...

    }

    else

    {

        //return code indicating error...

    }

    With your MyTryParse, you would have to write:

    int? i1;

    int? i2;

    int? i3;

    int.TryParse(s1, out n1);

    int.TryParse(s2, out n2);

    int.TryParse(s3, out n3))

    if (i1.HasValue && i2.HasValue && i3.HasValue)

    {

        //proceed and use i1.Value, i2.Value, and i3.Value...

    }

    else

    {

        //return code indicating error...

    }

    Which, aside from being less elegant, will potentially parse s2 and s3 unnecessarily; to avoid this, you would have to write:

    int? i1;

    int.TryParse(s1, out n1);

    if (i1.HasValue)

    {

        int? i2;

        int.TryParse(s2, out n2);

        if (i2.HasValue)

        {

             int? i3;

             int.TryParse(s3, out n3);

             if (i3.HasValue)

             {

                  //proceed and use i1.Value, i2.Value, and i3.Value...

             }

             else

             {

                  //return code indicating error...

             }

        }

        else

        {

             //return code indicating error...

        }

    }

    else

    {

        //return code indicating error...

    }

    Which is far more contrived.

  • I think nullable value types are not the best replacement for the `bool Try...(..., out value)` pattern.

    The most obvious reason is that it doesn't generalize to reference types. For example you couldn't use it for `Dictionary<K,V>.TryGetValue`. IMO the clean solution would be introducing a `MayBe<T>` struct in the BCL, and use it in these situations.

    The second reason is the lack of additional error information. A `MayBe<T>` type could store an exception object as alternative.

    Some languages, such as F#, offer such a type. But I think it would be beneficial to have it in mscorlib. Then add overloads to existing `Try...` methods in the framework.

  • Maybe<T> - love it. Reminds me of INTERCAL's "PLEASE" modifier.

  • Using a Nullable<int> as the return value of int.TryParse would be vay bad. null is valid value for int? and it would mean invalid instead of not existing.

    I would rather prefer this:

    (bool, int) TryPare(string)

    where (bool, int) would stand for Tuple<bool, int>.

    The usage would be something like this:

    (bool isValid, int value) = int.TryParse("some text");

    var nums =

     from item in seq

     let (isValid, value) int.TryParse(item)

     select isValid ? value : 0;

    And it's await/async friendly.

  • Excellent, Paulo!

    I hope C# 6.0 adds some language resources so that we don't have to deal anymore with tuples like this:

           static Tuple<bool, int> TryParse(string text)

           {

               int i;

               bool b = int.TryParse(text, out i);

               return Tuple.Create<bool, int>(b, i);

           }

    or this:

           var nums = from s in strings

                              let t = TryParse(s)

                              select t.Item1 ? t.Item2 : 0;

  • What about this:

    var nums =

     from item in seq

     let value = int.TryParse(item, out tmp) ? tmp : 0

     select value;

  • I am pretty sure I just got dumber reading this. Awww... shiny extensions and var types and linq... oooo....

  • Yes its also an excellent visualization of the tropopause.

  • Nice blog article. Why not doing it like this:

    var seq = new List<string> { "1", "blah", "3" };

    int tmp=0;

    var nums =

     from item in seq

     orderby item

     select int.TryParse(item, out tmp) ? tmp : 0;

    If you exexute this in LinqPad and append nums.Dump(); it outputs:

                  1 3 0

    as you would expect.

  • Wow that was great, I do really enjoyed reading and understanding this article. I've even read some of paragraphs few times to understand the exact meaning.

    Love logical examples like this, to be hones I never use non lambda queries and never used "let" statement before.

    From your example I've understood one interesting thing, that each projection, in query, is executed sequentially like a method, so everything that happens in previous projection, is not shared with the next one, it means that you cant share one variable between those statements, because you'll always have a last result of var in your next projection, where this var is used.

    Thanks for great article.

Page 2 of 2 (28 items) 12