Delightful, delicious, DLINQ

As you've probably already heard, at long last we've announced the new features that we're planning on delivering in C# 3.0. Our primary goal for 3.0 is to enable programmers to concisely represent complex query logic on arbitrary collections in C#. That's great, and as a C# user who spent the last three years playing around with data access in C#, I'm excited about that. But as Cyrus points out, what's more exciting to me is how we're going to get there. In order to build the kind of language integrated query system we want, we're going to have to add features to C# that I've been wanting for years: lambdas, type inference, anonymous types, and so on. These have applications far broader than integrated query.
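
To make that concrete, here's a rough sketch of the flavour we're going for -- implicitly typed locals, lambdas and anonymous types working together over an ordinary in-memory collection. (A sketch only: the little Customer class is made up, and the exact operator names, namespaces and syntax may well change before we ship.)

    using System;
    using System.Linq;

    class Customer { public string Name; public string City; public string Phone; }

    class Sketch
    {
        static void Main()
        {
            // An ordinary in-memory collection; the same shape of query is
            // meant to work against databases (DLINQ) and XML as well.
            var customers = new[]
            {
                new Customer { Name = "Alice", City = "Seattle", Phone = "555-0100" },
                new Customer { Name = "Bob",   City = "Redmond", Phone = "555-0101" },
            };

            var seattleCustomers = customers
                .Where(c => c.City == "Seattle")          // lambda expression
                .OrderBy(c => c.Name)
                .Select(c => new { c.Name, c.Phone });    // anonymous type

            foreach (var c in seattleCustomers)           // implicitly typed local
                Console.WriteLine("{0}: {1}", c.Name, c.Phone);
        }
    }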

The reason that we tell you guys about the secret goodies long before we're ready to ship is that we want your feedback as early as possible. It costs a considerable number of dollars to implement this stuff and we want to be very sure that we're building the features that really do make your lives as programmers easier. So please, please, write blogs, send me email, leave comments, attend chats, be vocal. If you're excited by this stuff, let us know. If you think that the goal is good but the proposed implementation is lousy, let us know. We want to hear it!

*****************

In unrelated news, I have always wondered how the magic sunglasses in the classic John Carpenter movie They Live worked.  Perhaps this partially explains it?

  • You want to know if the proposed implementation is lousy, right? Well, where are the outer join operators?
  • I find implicitly typed variables a little scary, especially in a language like C#. Sure, the type is inferred but in the source a developer is not forced to think about the type of expression or the enumerated type in 'foreach' statement. Anonymous types seem just as bad because you're now creating types on the fly instead of defining types up front.
  • I have talked about this general direction in language design in the past (although perhaps not with you). I remain dubious that the continual addition of language features (as opposed to language libraries) is beneficial to users in the long run. It has worried me since Anders first began talking about the idea of integrating many kinds of language into one big super language.

    The features, especially the ones proposed for 3.0+, are individually very complex. It is increasingly difficult for ISVs to write the parsers, compilers, analysis, and programming tools that they want. Microsoft can help by writing some of these components for developers, but the size of the language is cutting off the air supply of tool builders. In the long run this is bad for the language as well.

    A refactoring program might take a few days to write for C, slightly longer for C# 1.0. Generics make the job harder so quite a bit longer for C# 2.0. How hard will it be for C# 3.0, 4.0? Visual Studio can throw hundreds of people at the problem, but nobody else can afford to do so. Where are the creative, new tools going to come from if the startup costs for building a tool are high? Already, C# seems to lag behind other languages in terms of 3rd-party tool availability (see Java and refactoring in comparison, or unit testing).

    A large number of complex language features also causes language balkanization. See: Ada, Perl. How many people are going to master all of these new constructs? Can I pick up somebody else's well written C# program and figure out what it is doing? There are enormous communication problems when everyone is working with language subsets. It's a negative factor for success during long term maintenance.

    Finally, although the individual features are complex, their interaction is going to be much more complex. What really sparked me to write some of this is the experience with E4X adding XML to Javascript. It adds support for XML everywhere: write a simple program to get something done, and it's great. But precisely because the feature is everywhere, I'm pretty sure that some of these tricky interactions were not perceived during specification writing. The implementation becomes horrendous. A new feature threads all through the old and well-tested features. It's a great way to expose all the bugs and assumptions that didn't matter before.

    But even once the implementation of the new version is taken care of, it makes reasoning about the language very difficult. Can you do a good job designing a language if you can't predict well how people will use it? Language tradeoff decisions become less obvious and more bad choices will be made. I don't know a good term for this problem, perhaps I'll coin "unintended orthogonality" if someone hasn't done so already.
  • That was a bit long and intense. I have some pent-up language regret!
  • Re: outer join operators -- the DLinq guys are at PDC right now, so it'll be hard for me to ask them whether this is an oversight. I suspect that there will be all the operators you expect. Even if there are omissions, the operators are not a fixed set, but rather are extension methods behind the scenes. There is no reason whatsoever that you couldn't define your own outer join method on your collection object.
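
    For instance, here's a rough sketch (illustrative only -- made-up names, not compiled against any real bits) of the sort of left outer join you could roll yourself out of the grouping and flattening operators:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class MyQueryExtensions
    {
        // A user-defined left outer join: every outer element appears at least
        // once, paired with default(TInner) when no inner element matches.
        public static IEnumerable<TResult> LeftOuterJoin<TOuter, TInner, TKey, TResult>(
            this IEnumerable<TOuter> outer,
            IEnumerable<TInner> inner,
            Func<TOuter, TKey> outerKeySelector,
            Func<TInner, TKey> innerKeySelector,
            Func<TOuter, TInner, TResult> resultSelector)
        {
            return outer
                .GroupJoin(inner, outerKeySelector, innerKeySelector,
                           (o, matches) => new { o, matches })
                .SelectMany(x => x.matches.DefaultIfEmpty(),
                            (x, i) => resultSelector(x.o, i));
        }
    }

    Usage would then be something like customers.LeftOuterJoin(orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { c, o }) -- again, the names are made up for the sake of the sketch.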


  • Re: implicit typing is scary.

    Let me play devil's advocate here.

    This scares me:

    x = a() + b * c;

    All kinds of anonymous stuff is going on in there! Temporary variables with no names are being created, filled in with values and destroyed. And who knows what registers they're using. It would be much clearer if this were written in a language without anonymous local variables. Something like

    int temp1 = a();
    int temp2 = b * c;
    x = temp1 + temp2;

    That's much clearer!

    OK, seriously now, I see your point. But as computer languages have evolved, something we've seen over and over again is that fewer and fewer things need to be explicitly named. Often what matters is the structure or function of a thing, rather than what it is called.

    C came along and freed developers from having to name every temporary. Lisp came along and freed people from having to name their functions. ML frees people from having to name their structs, Java has anonymous classes, and so on. If you feel more comfortable naming stuff, then continue to do so -- but once you get used to not having to name everything, it is very freeing.
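
    To put that in C# terms, here's a little sketch of the same filter written three ways (illustrative only -- the Product class and the products list are made up, and the proposed 3.0 syntax is still subject to change):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Product { public string Name; public decimal Price; }

    class NamingDemo
    {
        // The fully named version: a separate, explicitly declared method.
        static bool IsCheap(Product p) { return p.Price < 10; }

        static void Main()
        {
            List<Product> products = new List<Product>
            {
                new Product { Name = "Pencil", Price = 1 },
                new Product { Name = "Laser",  Price = 500 },
            };

            // 1. Named method passed as the predicate.
            List<Product> cheap1 = products.FindAll(IsCheap);

            // 2. C# 2.0 anonymous method: no name, but still a fair bit of ceremony.
            List<Product> cheap2 = products.FindAll(
                delegate(Product p) { return p.Price < 10; });

            // 3. Proposed C# 3.0 lambda: no name, and the type of p is inferred.
            var cheap3 = products.Where(p => p.Price < 10);

            Console.WriteLine("{0} {1} {2}", cheap1.Count, cheap2.Count, cheap3.Count());
        }
    }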
  • Hey Nicholas,

    That's an excellent point, and one that deserves a more thought-out answer than I can give today. The short version is that I think that our efforts to make the standardization process very public and to get the spec out there early help tool vendors. It's not like VBScript, where there are loads of secret gotchas in the parser.

    Now, let me address your "hundreds of people" claim. There certainly aren't even a hundred devs, testers, PMs and managers in the entire C# product unit, and the C# team owns the debugging portions of VS as well. The language compiler/IDE dev team is about a dozen people.

    Also, I think you mean unintended nonorthogonality. I'm sure that I'll be discovering plenty of that as I actually implement the spec, and we'll figure it out. It'll be a hard problem, to be sure.



  • Ok, the estimate of hundreds is a bit high. I had, from idle curiosity, checked the list of open positions in the past, saw 20 or so slots, and assumed that there were around 200 people. However, that is still far, far larger than other companies in this space. I use a lot of software made by one person doing something cool in their spare time. I use a lot of software from companies with fewer than five programmers (and they're also acting as the testers, designers, and managers).

    I think there may be both unintended orthogonality and nonorthogonality. If two things don't work together but obviously should, that's unintended nonorthogonality and can possibly be fixed.

    If two things work together with such disastrous consequences that you wish they couldn't work together, that's unintended orthogonality and much harder to repair. It manifests as a specification that is seemingly simple but almost impossible to implement. I think the decade of C++ compiler correctness mishaps is a powerful lesson here. Users are to this day still afraid to use all the beneficial but rampantly misimplemented features.

    Both terms, when quoted, produce uninteresting googlewhacks so there isn't much of a consensus to base the names on.
  • I have always wanted a language with a built-in relational algebra. I have my own half-baked pages covering this type of thing, but they are over 5 years old now and it seems I will never have the time to complete the thought.

    But just adding relational algebra semantics to a language is not enough. The self-join is too often used in programming to ignore, so some sort of relational algebra extension is needed to cover this common case.

    On the other hand, maybe you should be adding a real (read: Lispy) macro system instead. It almost looks like it would be easier.

    At the very least there should be a default process (in the form of a "macro") for converting from relational algebra down to C# 2.0.

    Some other requests:

    1) Please treat null as just another value.
    2) SQL blows mostly because self-joins are not handled gracefully, but also because of its use of nulls.
    3) Do not force the programmer to explicitly state whether conditions are to be applied during the JOIN or after the fact in the WHERE clause (T-SQL does this). The ON clause adds needless complexity to a query when the WHERE clause should be sufficient. PL/SQL does it right.
    4) Look at WHY outer joins are used. I suspect they are not as important as others insist. I only use outer joins for a few patterns that could have their own semantics instead. Check out an old (Mac-based) database program called 4th Dimension; I heard they had an alternate solution to outer joins.
  • Well, I'm going to be in a minority of one and say I'm all up for it. I share the concern that C# might get too bloated with new features, but I've always been secretly hoping that the designers would add some of the more useful dynamic constructs and then leave well alone.

    Hey hey! Lambda functions, anonymous methods, anonymous types, closures, class helper methods... I'm a happy developer. I still think in ECMAScript as my mother tongue and sometimes I really miss the flexibility that dynamic features can give you.

    I like the LINQ concept, especially as it relates to object collections, but what are the performance implications for [DX]Linq?

    Will there be caching facilities? How much support will there be for sprocs? Will dynamic SQL queries be pre-compiled? What about ad-hoc queries against XML?

    I think I might well end up building a bog-standard data-access layer and use LINQ for querying the objects it returns and caches.

    That's cool enough in itself though, because I've been beavering away at some data access problems for a while, to which integrated query is a natural and obvious solution.
  • <P>Avast ye, scallywag!</P>
    <P>You're hardly a minority -- the vast majority of comments we've heard so far has been very positive. I am concerned that we not get into some kind of echo chamber where we miss important criticism because it is drowned out by plaudits. As for your perf question, that is the #2 question we've been asked. (When? is #1) You should read the papers we've already released describing how DLINQ will work to be performant. Basically the idea is that the query gets transformed into an expression tree which special-purpose code can then render into the appropriate SQL query. Obviously we want to leverage the power of the back-end database to do rich queries. There will be no big magic in this special expression tree transforming code -- if you want to write your own transformer that talks to your particular optimized stored procedures or your third party database provider or whatever, we will provide some kind of guidance on how to do so. The obvious next question of which providers and what specific data technologies will be supported "out of the box" I have no clear answer on, because we're still gathering feedback on what our customers' top scenarios are, how much work partnering companies are willing to do, etc. As for your other questions -- remember, I'm the language guy, not the data guy. My job here is to get the language features in place so that the data team can do their job. What specific data features they've got in mind, I don't know.</P>
    <P>Yaaargh.</P>
  • I spent a year or so doing JavaScript development, and I have to say that a lot of the new features in C# 3.0 are the exact things that made JavaScript development so quick, and the code so pleasantly concise yet readable.

    Internally to my classes, I can't see any point in defining lots of private classes for the sole purpose of allowing me to temporarily store and manipulate some semi-complicated data structure.
    And the huge amount of...
    System.X.Y myVar = new System.X.Y();
    ...code always struck me as redundant too, so I'm dead keen on var as well.
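
    With implicitly typed locals that presumably collapses to something like this (using a real framework type for the sake of a sketch that compiles):

    // The compiler infers the declared type from the initializer;
    // builder is still statically typed as System.Text.StringBuilder.
    var builder = new System.Text.StringBuilder();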

    Isn't it better to learn a couple more language concepts and suddenly find that you can see your algorithm without having to mentally filter out a ton of redundant syntax and boilerplate code?

    LINQ seems to be an effort to reduce buggy boilerplate.
    I see lots of code every day which is manually doing the LINQ stuff (at the end of the day that's what most business logic code is about), and it seems to be full of dubious "optimizations" and bugs.
    It will make me sooo happy to be able to replace 200-300 line chunks of buggy code with a small LINQ query.
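
    Just as a sketch of what I mean (hypothetical Order type and in-memory list; the real thing would be hitting a database):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Order { public string Customer; public int Year; public decimal Amount; }

    class BoilerplateDemo
    {
        static void Main()
        {
            List<Order> orders = new List<Order>
            {
                new Order { Customer = "Contoso",  Year = 2005, Amount = 100m },
                new Order { Customer = "Contoso",  Year = 2004, Amount = 250m },
                new Order { Customer = "Fabrikam", Year = 2005, Amount = 75m  },
            };

            // Today: hand-rolled filter-group-aggregate code, the kind of loop
            // that business logic is full of (and where the bugs live).
            Dictionary<string, decimal> totals = new Dictionary<string, decimal>();
            foreach (Order o in orders)
            {
                if (o.Year != 2005) continue;
                if (!totals.ContainsKey(o.Customer)) totals[o.Customer] = 0m;
                totals[o.Customer] += o.Amount;
            }

            // The same logic expressed as a single query.
            var totals2 = orders
                .Where(o => o.Year == 2005)
                .GroupBy(o => o.Customer)
                .Select(g => new { Customer = g.Key, Total = g.Sum(o => o.Amount) });

            foreach (var t in totals2)
                Console.WriteLine("{0}: {1}", t.Customer, t.Total);
        }
    }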

    Kyle, I think you're completely off base with your TSQL complaints too. Maybe you don't use databases very much?
    Null is a very important concept, and I think SQL gets the difference between null and a normal value just right (so does C# 2.0 RTM, finally).
    Once you accept that null is an important concept, it follows that inner and outer joins are necessary.

    I agree that if you don't do SQL very much then the JOIN syntax may be just an extra detail to remember.
    But to my mind, relationships between tables are VERY different from rules to filter out results. Putting both of these things into the WHERE clause just means I have to mentally separate the two out as I'm reading the WHERE clause. This is hard work!

    Every time I have to use an Oracle 8 database I say a little thank-you to Microsoft for putting the outer join escapes into ODBC and OLEDB.