Query transformations are syntactic

Query transformations are syntactic

Rate This
  • Comments 16

As you probably know, there are two ways to write a LINQ query in C#. The way I personally prefer is to use the “query comprehension” syntax:

from customer in customerList
where customer.City == "London"
select customer.Name

Or you can, equivalently, use the “fluent method call” syntax:

customerList
.Where(customer=>customer.City == "London")
.Select(customer=>customer.Name)

These are guaranteed to be equivalent because the compiler simply transforms the former syntax into the latter syntax before it compiles it. An interesting aspect of this transformation is that it is (almost) entirely syntactic. (The "transparent identifiers" generated for certain queries require a small amount of semantic analysis to determine the corresponding anonymous type, but for the most part, all we do is just pull the raw hunks of code out of each clause and reorganize the program into the method call form.) Once it is in a form that the rest of the compiler can understand, then semantic analysis proceeds as usual.

This means that it is perfectly legal to do dumb things. For example, suppose we decide that by “where” we mean don’t mean “filter”, we actually mean “multiply”. And by “select” we actually mean “add”, not “project”. No problem!

static class ExtInt
{
    public static int Where(this int c, Func<int, int> f)
    {
        return f(0) * c;
    }
    public static int Select(this int c, Func<int, int> f)
    {
        return f(0) + c;
    }
}

int ten = 10;
int twenty = 20;
int thirty = 30; 
int result = from c in ten where twenty select thirty;
Console.WriteLine(result);

And sure enough, ten where/times twenty select/plus thirty is 230. This is a deeply strange way to write a mathematical expression, but legal.

The semantics of the Where method are supposed to be “takes a collection of T and a predicate mapping T to bool, and returns a filtered collection of those T items which match the predicate”. This method does not take a collection or a predicate, it does not return a collection, and it certainly does not have the semantics of filtering, but the compiler neither knows nor cares about any of those facts. All the compiler does is syntactically turn that line into

int result = ten.Where(c=>twenty).Select(c=>thirty);

and compile it; that line compiles just fine, so, no problem as far as the compiler is concerned.

The C# specification has a section which describes the pattern we expect a query provider to implement: that Where takes a predicate, and so on. But we perform no checks whatsoever that you successfully implemented a query provider that implements either the form or the semantics of our recommended pattern. If you do something crazy, like redefine “where” to take something other than a predicate and you get crazy results, then, well, what can I tell you? If it hurts when you do that then don’t do that!

 

  • Although it's certainly not advisable to redefine query operators to perform mathematical operations, there could be some useful applications... for instance, check out this post by Marc Gravell :

    http://marcgravell.blogspot.com/2009/11/selectmany-combining-idisposable-and.html

  • "This means that it is perfectly legal to do dumb things."

    Since when has it been *illegal* to do dumb things as a developer?

  • Bart de Smet also explores such things in "Who ever said LINQ predicates need to be Boolean-valued?" (http://bartdesmet.net/blogs/bart/archive/2008/09/14/who-ever-said-linq-predicates-need-to-be-boolean-valued.aspx)

  • Interesting timing--I just had to implement this functionality in a compiler for a side project.  Getting the transparent identifiers and scoping right was tricky, and one of the spots where the language spec could have been a bit more thorough.  Thankfully, Wes Dyer had a great blog post on this that helped get me on the right track.

    On the whole, though, you guys have done a great job with the language spec.  It's been immensely helpful in helping me get type inference and implicit lambdas implemented correctly.  I really appreciate the effort.

  • I wish this will be not a dark magic (something that is hard coded in the compiler), but just one usage of public extension point. I wish there will be a public way to define rules to map custom contextual keywords to some set of methods. And LINQ will be just one set of such rules. And I wish I were able to define my own mapping for my methods.

  • Andrey, try Googling (or Binging) for a .NET language called "Nemerle".  It implements macros in the way you're describing, and in fact that is how they implemented LINQ support.  The language supports a syntax similar to C# as well as a "shorthand" syntax similar to functional languages.  It's pretty slick, and while you may not be able to use it for your day job, you could use it for personal projects.

  • <blockquote>I wish this will be not a dark magic (something that is hard coded in the compiler), but just one usage of public extension point. I wish there will be a public way to define rules to map custom contextual keywords to some set of methods. And LINQ will be just one set of such rules. And I wish I were able to define my own mapping for my methods.</blockquote>

    There is such a thing: templates in C++. (ducks!)

  • > The way I personally prefer is to use the “query comprehension” syntax

    Eric, have you written anything (here or somewhere like stackoverflow) talking about why this is your preference? Personally I'm still struggling to internalise an intuition about which way is better when...

  • This ( http://blogs.msdn.com/mattwar/archive/2008/07/14/linq-building-an-iqueryable-provider-part-xi.aspx ) is the 11th part of the "LINQ: Building an IQueryable Provider" series by Matt Warren ( http://blogs.msdn.com/user/Profile.aspx?UserID=3134 ).

    I have only used the early stages of it, a year ago, to implement a very cut-down LINQ provider for the SOAP web services-based API of the CA's Unicenter Service Desk, but even that limited experience of mine is enough to show me that the expression trees LINQ is based upon can ultimately only be converted into the the "fluent method call" syntax, with the compiled lambda expression becoming the entry point of the whole thing. And on top of that, the "query comprehension" syntax is implemented via extension methods, to make it all look and feel more like the "traditional" SQL, to standardize the syntax... not so?

  • The very fact that the compiler doesn't try to enforce any particular is *so* versatile - in MiscUtil (with Jon Skeet) we hacked together "Push LINQ", which reverses the IEnumerable<T> direction, but LINQ query syntax still works (and indeed, it still follows the correct semantics). At one point I did mess around with (ab)using SelectMany to write an async wrapper (similar to how F# does async). It worked, but it **really** didn't follow the semantics, so I ditched it in the end (favoring 4.0 / Parallel)

  • The queries in the post are relatively conventional - after all, they're still calling extension methods. The compiler isn't limited to those... we can use properties or fields, and even call static query operators on types...

    http://msmvps.com/blogs/jon_skeet/archive/2008/02/29/odd-query-expressions.aspx

    Evil, evil stuff... but fun.

  • I had been led to believe that e.g. the Where method for some classes will translate an 'expression tree' to some query language specific to the object (e.g. sql for linq to sql, DataTable.Select(String) for linq to datasets). How does it do that if it has already been syntactically transformed by the compiler to a delegate?

  • @Random832: It's not syntactically transformed into a delegate. It's transformed into a lambda expression.

    That lambda expression *may* then be converted into a delegate - or it *may* be converted into an expression tree - it depends on what the method parameter type is. That's another way in which the transformation is incredibly neat :)

  • Denis,

    are you saying that, unless you have a really big time budget, converting LINQ expressions to anything else than compiled code is not feasable?

    If you use LINQ to connect so some SOAP API, the problem is usually that LINQ is so much more powerful than that API, so you would usually end up just taking a few things out of the primary "where" query that you put into the API, and perform the rest of the LINQ statement in memory. There's really not much more you can do, and that's not really LINQ's fault, is it?

    But on the other hend, if you're targeting some system with a more powerful query language, of course you'll notice that LINQ is not really helping a lot here, because it represents the transformed query. And that query has suffered really bad transformations in order to be executable in a static language like C#. E.g., transparent identifiers work around the fact that a delegate cannot take an arbitrary number of input arguments etc.

    If that's what you're concerned with, take a look at re-linq @ http://relinq.codeplex.com. It will give you a nice AST that resembles the LINQ query expressions (from...where...select etc) with transparent identifiers removed. Much easier to go from there!

    @Eric: "The "transparent identifiers" generated for certain queries require a small amount of semantic analysis to determine the corresponding anonymous type..."

    Oh well, that's one way to have it. Guess how much fun it is to trace those transparent identifiers back to the original identifiers they're hiding! Because that's really what you want if your LINQ provider targets anything other than C#...

  • @stefan.wenig: Check out the relinq project (http://www.re-motion.org/).  It includes a LINQ provider framework that essentially takes the transformed expression tree and applies a reverse transform to give you a tree that better reflects the "sql-style" query dialect.  I believe this relieves you of the nastiness of dealing with transformed transparent identifiers.  I haven't spent a lot of time with it, but my so far it looks pretty solid.

Page 1 of 2 (16 items) 12