Asynchrony in C# 5 Part Six: Whither async?

A number of people have asked me what motivates the design decision to require any method that contains an "await" expression to be prefixed with the contextual keyword "async".

Like any design decision there are pros and cons here that have to be evaluated in the context of many different competing and incompossible principles. There's not going to be a slam-dunk solution here that meets every criterion or delights everyone. We're always looking for an attainable compromise, not for unattainable perfection. This design decision is a good example of that.

One of our key principles is "avoid breaking changes whenever reasonably possible". Ideally it would be nice if every program that used to work in C# 1, 2, 3 and 4 worked in C# 5 as well. (*) As I mentioned a few episodes back, (**) when adding a prefix operator there are many possible points of ambiguity and we want to eliminate all of them. We considered many heuristics that could make good guesses about whether a given "await" was intended as an identifier rather than a keyword, and did not like any of them.

The heuristics for "var" and "dynamic" were much easier because "var" is only special in a local variable declaration and "dynamic" is only special in a context in which a type is legal. "await" as a keyword is legal almost everywhere inside a method body that an expression or type is legal, which greatly increases the number of points at which a reasonable heuristic has to be designed, implemented and tested. The heuristics discussed were subtle and complicated. For example, var x = y + await; clearly should treat await as an identifier, but should var x = await + y; do the same, or is that an await of the unary plus operator applied to y? var x = await t; should treat await as a keyword; should var x = await(t); do the same, or is that a call to a method called await?

Requiring "async" means that we can eliminate all backwards compatibility problems at once; any method that contains an await expression must be "new construction" code, not "old work" code, because "old work" code never had an async modifier.
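To make the compatibility argument concrete, here is a minimal sketch, assuming the design as it eventually shipped in C# 5 and later. The class and method names (AwaitAsIdentifier, OldWork, NewConstruction) are hypothetical and exist only for illustration. Without the async modifier, "await" remains an ordinary identifier; with it, the same word is the operator.

```csharp
using System;
using System.Threading.Tasks;

static class AwaitAsIdentifier
{
    // "Old work" code: no async modifier, so "await" is just a local variable name.
    public static int OldWork(int y)
    {
        int await = 10;
        var x = y + await;   // binary addition of two ints
        var z = await + y;   // still binary addition; no keyword is involved
        return x + z;
    }

    // "New construction" code: the async modifier makes "await" a keyword here.
    public static async Task<int> NewConstruction(Task<int> t)
    {
        var x = await t;     // await expression
        var y = await (t);   // also an await expression, not a call to a method named await
        return x + y;
    }

    static void Main()
    {
        Console.WriteLine(OldWork(1));
        Console.WriteLine(NewConstruction(Task.FromResult(2)).Result);
    }
}
```

Both methods compile side by side in the same class, which is the point: the modifier, not a heuristic, decides how the word is read.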

An alternative approach that still avoids breaking changes is to use a two-word keyword for the await expression. That's what we did with "yield return". We considered many two-word patterns; my favourite was "wait for". We rejected options of the form "yield with", "yield wait" and so on because we felt that it would be too easily confused with the subtly different continuation behaviour of iterator blocks. We have effectively trained people that "yield" logically means "proffer up a value", rather than "cede flow of control back to the caller", though of course it means both! We rejected options containing "return" and "continue" because they are too easily confused with those forms of control flow. Options containing "while" are also problematic; beginner programmers occasionally ask whether a "while" loop is exited the moment that the condition becomes false, or if it keeps going until the bottom of the loop. You can see how similar confusions could arise from use of "while" in asynchrony.

Of course "await" is problematic as well. Essentially the problem here is that there are two kinds of waiting. If you're in a waiting room at the hospital then you might wait by falling asleep until the doctor is available. Or, you might wait by reading a magazine, balancing a chequebook, calling your mother, doing a crossword puzzle, or whatever. The point of task-based asynchrony is to embrace the latter model of waiting: you want to keep getting stuff done on this thread while you're waiting for your task to complete, rather than sleeping, so you wait by remembering what you were doing, and then go do something else while you're waiting. I am hoping that the user education problem of clarifying which kind of waiting we're talking about is not insurmountable.
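The two kinds of waiting can be sketched roughly as follows. This is an illustration only, assuming the Task-based API as it shipped; the names TwoKindsOfWaiting, WaitBySleeping, WaitByDoingOtherThings and "doctor" are hypothetical stand-ins for the hospital analogy.

```csharp
using System;
using System.Threading.Tasks;

static class TwoKindsOfWaiting
{
    // Falling asleep in the waiting room: the calling thread is blocked
    // until the task has a result.
    public static int WaitBySleeping(Task<int> doctor)
    {
        return doctor.Result;
    }

    // Reading a magazine: at the await, if the task is not finished, the method
    // remembers where it was and returns control to its caller.
    public static async Task<int> WaitByDoingOtherThings(Task<int> doctor)
    {
        int diagnosis = await doctor;
        return diagnosis;
    }

    static void Main()
    {
        var doctor = new TaskCompletionSource<int>();

        // The call returns immediately even though the doctor is not available yet.
        Task<int> visit = WaitByDoingOtherThings(doctor.Task);
        Console.WriteLine(visit.IsCompleted);   // False: the visit is still pending

        doctor.SetResult(42);                   // the doctor becomes available
        Console.WriteLine(WaitBySleeping(visit));
    }
}
```

Calling WaitBySleeping(doctor.Task) before SetResult in this single-threaded sketch would block forever, which is exactly the "asleep in the waiting room" behaviour the await model is meant to avoid.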

Ultimately, whether it is "await" or not, the designers really wanted it to be a single-word feature. We anticipate that this feature will potentially be used numerous times in a single method. Many iterator blocks contain only one or two yield returns, but there could be dozens of awaits in code which orchestrates a complex asynchronous operation. Having a succinct operator is important.

Of course, you don't want it to be too succinct. F# uses "do!" and "let!" and so on for their asynchronous workflow operations. That! makes! the! code! look! exciting! but it is also a "secret code" that you have to know about to understand; it's not very discoverable. If you see "async" and "await" then at least you have some clue about what the keywords mean.

Another principle is "be consistent with other language features". We're being pulled in two directions here. On the one hand, you don't have to say "iterator" before a method which contains an iterator block. (If we had, then "yield return x;" could have been just "yield x;".) This seems inconsistent with iterator blocks. On the other hand... let's return to this point in a moment.

Another principle we consider is the "principle of least surprise". More specifically, that small changes should not have surprising nonlocal results. Consider the following:

void Frob<X>(Func<X> f) { ... }
...
Frob(()=> {
    if (whatever)
    {
        await something;
        return 123;
    }
    return 345;
  } );

It seems bizarre and confusing that commenting out the "await something;" changes the type inferred for X from Task<int> to int. We do not want to add return type annotations to lambdas. Therefore, we'll probably go with requiring "async" on lambdas that contain "await":

Frob(async ()=> {
    if (whatever)
    {
        await something;
        return 123;
    }
    return 345;
  } );

Now the type inferred for X is Task<int> even if the await is commented out.
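The inference behaviour can be checked with a small sketch, assuming the rule as it shipped in C# 5 and later. Frob here is a hypothetical variant that returns typeof(X) so the inferred type can be observed; "whatever" and "something" are placeholder members invented for the example.

```csharp
using System;
using System.Threading.Tasks;

static class LambdaInference
{
    static readonly bool whatever = true;                 // hypothetical condition
    static readonly Task something = Task.CompletedTask;  // hypothetical awaitable

    // Reports the type argument the compiler inferred for X.
    public static Type Frob<X>(Func<X> f) => typeof(X);

    public static Type WithAwait() => Frob(async () =>
    {
        if (whatever)
        {
            await something;
            return 123;
        }
        return 345;
    });

    // Same lambda with the await commented out. The compiler warns (CS1998)
    // but still infers Task<int>, because the async modifier is present.
    public static Type WithoutAwait() => Frob(async () =>
    {
        if (whatever)
        {
            // await something;
            return 123;
        }
        return 345;
    });

    // No async modifier and no await: X is inferred as plain int.
    public static Type PlainLambda() => Frob(() =>
    {
        if (whatever)
        {
            return 123;
        }
        return 345;
    });

    static void Main()
    {
        Console.WriteLine(WithAwait());
        Console.WriteLine(WithoutAwait());
        Console.WriteLine(PlainLambda());
    }
}
```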

That is strong pressure towards requiring "async" on lambdas. Since we want language features to be consistent, and it seems inconsistent to require "async" on anonymous functions but not on nominal methods, that is indirect pressure on requiring it on methods as well.

Another example of a small change causing a big difference:

Task<object> Foo()
{
    await blah;
    return null;
}

If "async" is not required then this method with the "await" produces a non-null task whose result is set to null. If we comment out the "await" for testing purposes, say, then it produces a null task -- completely different. If we require "async" then the method returns the same thing both ways.
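A rough sketch of the difference, assuming the shipped C# 5 semantics. The names NullTask, FooPlain, FooAsync, FooAsyncNoAwait and the field "blah" are hypothetical; with nullable reference types enabled the compiler may also warn about the null returns.

```csharp
using System;
using System.Threading.Tasks;

static class NullTask
{
    static readonly Task blah = Task.CompletedTask;   // hypothetical awaitable

    // No async modifier: "return null" hands back a null Task<object> reference.
    public static Task<object> FooPlain()
    {
        return null;
    }

    // With async: the caller gets a real task whose result is null.
    public static async Task<object> FooAsync()
    {
        await blah;
        return null;
    }

    // With async and the await commented out: still a real task whose result
    // is null (the compiler reports warning CS1998).
    public static async Task<object> FooAsyncNoAwait()
    {
        // await blah;
        return null;
    }

    static void Main()
    {
        Console.WriteLine(FooPlain() == null);               // True
        Console.WriteLine(FooAsync() == null);               // False
        Console.WriteLine(FooAsync().Result == null);        // True
        Console.WriteLine(FooAsyncNoAwait() == null);        // False
        Console.WriteLine(FooAsyncNoAwait().Result == null); // True
    }
}
```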

Another design principle is that the stuff that comes before the body of a declared entity such as a method is all stuff that is represented in the metadata of the entity. The name, return type, type parameters, formal parameters, attributes, accessibility, static/instance/virtual/override/abstract/sealed-ness, and so on, are all part of the metadata of the method. "async" and "partial" are not, which seems inconsistent. Put another way: "async" is solely about describing the implementation details of the method; it has no impact on how the method is used. The caller cares not a bit whether a given method is marked as "async" or not, so why put it right there in the code where the person writing the caller is likely to read it? These are points against "async".

On the other hand, another important design principle is that interesting code should call attention to itself. Code is read a lot more than it is written. Async methods have a very different control flow than regular methods; it makes sense to call that out at the top where the code maintainer reads it immediately. Iterator blocks tend to be short; I don't think I've ever written an iterator block that does not fit on a page. It's pretty easy to glance at an iterator block and see the yield. One imagines that async methods could be long and the 'await' could be buried somewhere not immediately obvious. It's nice that you can see at a glance from the header that this method acts like a coroutine.

Another design principle that is important is "the language should be amenable to rich tools". Suppose we require "async". What errors might a user make? A user might have a method with the async modifier which contains no awaits, believing that it will run on another thread. Or the user might write a method that does have awaits but forget to give the "async" modifier. In both cases we can write code analyzers that identify the problem and produce rich diagnostics that can teach the developer how to use the feature. A diagnostic could, for instance, remind you that an async method with no awaits does not run on another thread and give suggestions for how to achieve parallelism if that's really what you want. Or a diagnostic could tell you that an int-returning method containing an await should be refactored (automatically, perhaps!) into an async method that returns Task<int>. The diagnostic engine could also search for all the callers of this method and give advice on whether they in turn should be made async. If "async" is not required then we cannot easily detect or diagnose these sorts of problems.
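The first mistake described above can be observed with a small sketch, assuming the behaviour of the shipped compiler, which reports warning CS1998 for an async method that contains no awaits. The class and method names are hypothetical. Task.Run is shown only as one possible way to get the work onto another thread, not as the recommended answer for every case.

```csharp
using System;
using System.Threading.Tasks;

static class NoAwaitNoThread
{
    // An async method with no awaits runs synchronously on the caller's thread.
    public static async Task<int> ThreadIdInsideAsync()
    {
        return Environment.CurrentManagedThreadId;
    }

    // If another thread is what you want, you have to ask for it explicitly,
    // for example by queuing the work to the thread pool.
    public static Task<int> ThreadIdInsideTaskRun()
    {
        return Task.Run(() => Environment.CurrentManagedThreadId);
    }

    static void Main()
    {
        int caller = Environment.CurrentManagedThreadId;

        Task<int> sameThread = ThreadIdInsideAsync();
        Console.WriteLine(sameThread.IsCompleted);        // True: already finished on return
        Console.WriteLine(sameThread.Result == caller);   // True: it ran on the calling thread

        // Typically a thread-pool thread, so usually a different id from the caller's.
        Console.WriteLine(ThreadIdInsideTaskRun().Result);
    }
}
```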

That's a whole lot of pros and cons; after evaluating all of them, and lots of playing around with the prototype compiler to see how it felt, the C# designers settled on requiring "async" on a method that contains an "await". I think that's a reasonable choice.

Credits: Many thanks to my colleague Lucian for his insights and his excellent summary of the detailed design notes which were the basis of this episode.

Next time: I want to talk a bit about exceptions and then take a break from async/await for a while. A dozen posts on the same topic in just a few weeks is a lot.


(*) We have violated this principle on numerous occasions, both (1) by accident, and (2) deliberately, when the benefit was truly compelling and the rate of breakage was likely to be low. The famous example of the latter is F(G<A,B>(7)). In C# 1 that means that F has two arguments, both comparison operators. In C# 2 that means F has one argument and G is a generic method of arity two.

(**) When I wrote that article I knew that we would be adding "await" as a prefix operator. It was an easy article to write because we had recently gone through the process of noodling on the specification to find the possible points of ambiguity. Of course I could not use "await" as the example back in September because we did not want to telegraph the new C# 5 feature, so I picked "frob" as nicely meaningless.

  • @Timothy Fries

    I agree completely with all that you are saying.

    My gripe is with "await" which is a new term. What I'm trying to say is that "await" is not obvious in its meaning and Mort and Elvis might interpret quite the opposite to what it tries to convey. Is there any word better suited? Probably not.

  • I'd love to be able to await multiple tasks.  Just allow the await operator to await on Task[] or IEnumerable<Task>.  I think that the pattern will become fairly common and that the call to Task.AwaitAll will seem less than elegant in the long term.

    Task[] tasks = ... ;

    await tasks;

    vs

    Task[] tasks = ... ;

    Task.AwaitAll( tasks );

    First way just seems much nicer.

    Another idea: Maybe the IDE could color the await keyword differently so that async calls become really obvious.

  • Googling for 'C# async keyword' already points you to the right place (and it hasn't even been released yet). I would also assume that VS help will know about it, so that pressing F1 would work as well.

    Personally I don't see the issue; the first time I saw it I read it as 'wait for xyz to finish'. Admittedly I didn't immediately see that it would return then and there; however, I must confess the concept of 'yield return' didn't hit me right away either.

    As with any keyword, you're going to need to know what it does to use it. Try telling my GF what 'struct' means and how it's different from 'class'.  I wouldn't expect a novice to be writing async methods (although it does seem a lot easier now), and if they are trying to maintain an existing method, then they at least have an example to use right in front of them.

    PS. Ever tried using google for '??' ;)

  • what about

    async<int> Foo()
    {
        var a = async DoSomething();
        return 0;
    }

    or

    task int Foo()
    {
        var a = async DoSomething();
        return 0;
    }

  • To me it does not really matter which keywords are finally chosen, but I would prefer short single-word terms. The issue with finding alternatives for await is that in fact two things are happening there: an async operation is started, while the effective flow of control is returned to the caller. It would probably require a complete sentence instead of one or two words to express all that (not to mention the case when the async method can deliver a result immediately). In fact I'm saying that it requires a decent understanding of this feature, which cannot simply be deduced by looking at the language terms. Anyhow, await looks reasonable to me, because it expresses the local, logical flow of control.

    The only bad feeling I have is about return. This issue has already been mentioned by others too. The method body states that a Task<T> will be returned, but the return statement expects an expression of T. This looks somehow not logical. I would support the proposal of having "async return t;"

  • Eric, do you guys maintain a list of decisions that would have been made if it weren't for the cost of breaking backwards compatibility? I am sure at least you have some strong opinions about language features if you had the option of redoing them.

    It seems like a lot of knowledge/learnings could be reused for a hypothetical new programming language. The most immediate notion would be avoiding null in the type system. Thoughts?

    Cheers,

    Navid

  • This is great -- an inside look at the design decisions and reasoning really clarifies a lot. It would be nice to have clarification on why it's still incumbent to write "async Task<int> Foo()" versus "async int Foo()". Why can't the Task<> be implicit?

  • I'm confused by the penultimate paragraph before the "credits".  You say "suppose we require 'async'" and then go on to say "you can have a method that awaits with no 'async' modifier."  Is this right?  What would such a method do if it didn't have the async modifier?

    The way I worded the paragraph was confusing. I've rewritten it. Is that more clear? - Eric

  • @Aaron: "you can have a method that awaits with no 'async' modifier" was an example of what could go wrong when automatically generating code.

  • As for the "await" keyword, I would suggest that two existing keywords -- return and while -- give a better idea of what's going on:

    var document = return while FetchUrlAsync(url);

    var docIsValid = return while Task.Run(() => ParseDocumentAsync(document));

  • First, the good:

    I'm cool with the "await" keyword.  I like it. :)

    Now the bad:

    I definitely don't like making a method async by modifying the method signature (e.g. adding the "async" prefix).  Based on this post, it sounds like the two reasons for this are backwards compatibility and return type inference (e.g. long vs. Task<long>).

    The good again:

    I think I have possible solutions to both of these issues.

    First, in regard to the type inference problem.

    The root of the problem is that you're requiring async methods (with return values) to have a return type of Task<T>, and then inside the method, you're performing "implicit casting" of the return value from type T to type Task<T>.  So, if a method doesn't know it's async then it won't know to infer a return type of Task<T> and it won't know to "implicitly cast" a return value of T to Task<T>.

    What if, instead of making the method responsible for creating the Task<T>, the caller was responsible for creating it?  You could use the "async" keyword in front of a method invocation instead of on the method signature.  Then the compiler-generated Task would be created in the caller's scope rather than in the callee's scope. Here's an example of what I mean:

    Instead of:

    // When the method starts, a Task<long> is implicitly created.
    async Task<long> ArchiveDocumentsAsync() {
        Thread.Sleep(10000);
        return 0;
    }

    var documentCount = await ArchiveDocumentsAsync();

    You would have:

    // When "async" is reached, a Task<long> is implicitly created.
    long ArchiveDocuments() {
        return 0;
    }

    var task = async ArchiveDocuments();
    var documentCount = await task;

    // OR
    var documentCount = await async ArchiveDocuments();

    // OR (Note the "select async")
    Task<long[]> allResults = Task.WhenAll(from urls in groupsOfUrls select async ArchiveDocuments(urls));
    long[] results = await allResults;

    Provided this is possible, there are numerous advantages to making a method invocation async rather than making a method async:

    1) The method is more reusable in that it can be called both synchronously or asynchronously.

    2) Can get a Property value asynchronously.

    3) Can get a LINQ result asynchronously.

    4) ... etc. Can make anything that you can put into a Task asynchronous.

    5) No requirement of the method definition to use Task<T>.

    6) ... which means the following problematic code example you gave in your post is no longer a problem:

    void Frob<X>(Func<X> f) { ... }
    ...
    Frob(()=> {
        if (whatever)
        {
            await something;
            return 123;
        }
        return 345;
    } );

    7) No implicit casting to/from Task<T>, so no type inference ambiguities or oddness.

    8) ... which means the other code example you gave is no longer a problem:

    Task<object> Foo()
    {
        await blah;
        return null;
    }

    So that covers the first problem.  The second problem is with the backwards compatibility.  I propose the following:

    Do not support the "await" keyword unless it is enabled somehow (e.g. either a compile flag or a project property that allows you to enable "continuation support" or "coroutine support" or "C# 5.0 language features").  Backward compatibility would no longer be an issue because you can opt-in.  And converting existing projects would be as simple as doing a simple solution-wide replace to remove any existing usage of the keyword.  As proof that this concept works, looking at my C# projects, I see two existing examples where language feature selection is already allowed:

    1) If I select a project in solution explorer then hit F4 to get the properties window, I see a "Language Level" property which allows me to select the different C# versions (2.0-4.0).  I'm not sure if this is built in to Visual Studio or if Resharper adds this, but either way, it's been dogfooded as a great way to select the available language features.

    2) In the C# Project Properties / Build / Advanced, you can set "Language Version", which I think gets passed as a flag called "/langversion" to the compiler.  I'm not sure how relevant this one is, though, since it's probably referring to the IL version, not the C# version.

  • It seems that something like 'retask' would be an interesting option in place of 'await'. It mirrors the Task class name, and doesn't drag too much baggage with it, since it's not actually a word (or at least I can't find an official definition for it). But it sounds sort of like it's associated with scheduling tasks. It could also serve as a shorthand for 'return task'.

  • Eric, I'm curious about the decision to wrap things into a big state machine method instead of splitting it into multiple methods and using Task.ContinueWith.  Was it just easier to transform this way, or are there other benefits?

    I also noticed that the transform isn't very smart yet -- it seems to capture locals even if they're never used in the async method.  I first noticed this in an EventHandler where the sender/eventargs are never used but became members in the generated class anyway.

    It'd be cool to have a standard IAsyncEnumerable<T> where MoveNext() returns a Task.  Can make one ourselves but a standard one would encourage more people to make use of it.  Are there any plans for something like this?

    I've been writing async stuff in C, C++, and C# for years and this is the dream-come-true idea that we've all had but have never implemented because it required too much language support.  Well done, and thank you!

  • @Cory: Around the state machine business - splitting it into multiple methods might work for methods with simple control flow, but how would it cope with a loop? If we have

       foreach (var x in y)
       {
           // Some logic here
           await something;
           // Some more logic
       }

    ... it's hard to see how that could cleanly be split into separate methods. You basically need to be able to re-enter the code at any point - and a state machine is quite possibly the simplest way of modelling that.

  • @Joshua: Where would the actual asynchrony be introduced? If the method has to just return a value if it's not called with "async", what would it do at the await statement? I think if you tried to work out how your proposal would actually translate into code which *does* do things asynchronously, you'd have problems getting it all to fit.
