Five-Dollar Words For Programmers, Part Three: Homoiconic

Jeff Atwood was kind enough to once more give me the shout-out in his blog the other day. Thanks Jeff!

This inspires me to continue my series on five-dollar words for programmers. Here’s one that I only learned relatively recently, when I helped write the code that translates a lambda expression into an expression tree which represents the content of the lambda: homoiconic.

A language is said to be homoiconic if the representation of the program can itself be seen as a data structure expressible in that language. With expression lambdas being convertible to expression trees (which can then be compiled into code at runtime), C# 3.0 is somewhat homoiconic. But it pales in comparison to, say, LISP, where pretty much everything you can do in the language you can also represent as structurally isomorphic data.
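
To make that concrete, here is a minimal sketch of what the C# 3.0 compiler does with the same lambda text depending on the target type of the assignment; one form is code, the other is data, and the data form can be turned back into code at runtime:

    using System;
    using System.Linq.Expressions;

    class HomoiconicSketch
    {
        static void Main()
        {
            Func<int, int> asCode = x => x * 2;              // compiled straight to IL
            Expression<Func<int, int>> asData = x => x * 2;  // the same text, reified as an expression tree

            Console.WriteLine(asData.Body);                  // prints "(x * 2)" -- the program as data
            Func<int, int> roundTripped = asData.Compile();  // ...and turned back into code at runtime
            Console.WriteLine(roundTripped(21));             // 42
        }
    }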

Something I personally would like to see more of in C# in the future is greater homoiconicity. We could extend expression trees to statement trees, declaration trees, program trees, and so on. This series of steps would enable increasingly powerful and interesting metaprogramming scenarios.

C# suffers from the lack of a metalanguage. We absolutely do not want to go down the road of the horrible, primitive metaprogramming exemplified by the C preprocessor. We already have a great language, C#, so why not make C# its own metalanguage? And once we do that, we get into some truly strange loops, where we could use those data structures in the compiler implementation itself!

This is a long way off and might never happen, but a guy can dream.

  • Exactly.

    What made Scheme so powerful (and probably Lisp too, but I've never delved into Lisp) was the fact that you could simply add a ' before any command, statement, or complete program, and it would become an expression tree, which you could later execute using (eval expression).

    So instead of

    (define (double x)
      (* x 2))

    you could write

    (define (double x)
      (eval `(* ,x 2)))

    and it would insert the value of x into the code - so if the value of x were the symbol y, this would become (* y 2), doubling the global y and not x.

    When you need context, you'd have

    (define-macro (add-vars x y)
      `(+ ,x ,y))

    or something like that - or the more complex yet still easy to control define-syntax.

    I don't remember it all, since it was a *long* time ago, but it enabled beautiful things. My favourite example of the power of Scheme was the amb macro, which was defined as a 20-line or so macro. It would accept any number of possible values and try them all until it found one that didn't fail, enabling logic programming - for example, what's the first prime between 20 and 30?

    (define (first-prime-between-20-and-30)
      (let ([x (amb 20 21 22 23 24 25 26 27 28 29 30)])
        (if (is-prime? x)
            x
            (amb-fail))))

    This is a beautiful use of the language. If you enabled macros and metaprogramming in C# 5, it would be an *extremely* powerful language.

    But only if you manage to keep it simple yet powerful.

  • The lack of a metalanguage in C# is a major failing; please read my blog rant from almost a year ago.

    http://johnmelville.spaces.live.com/blog/cns!79D76793F7B6D5AD!163.entry

    Interestingly, Visual Studio provides T4, which implements meta-programming as an IDE feature.  C# 3.0 is one of the many "dialects" of meta-languages supported by T4.  Unfortunately the IDE experience leaves something to be desired with regard to IntelliSense and the like.  It also requires a partial class to separate the metaprogrammed and non-metaprogrammed parts of a given class.  On the whole, though, it gives a very sweet taste of what a good meta-programming system could add to the (already excellent) C# language.

  • .NET already has CodeDOM, which does almost the same thing. I say 'almost' because it does not create a direct representation of a program, but rather a representation of the program's source code, in the language of your choosing. That source code can then be compiled in memory and loaded on the fly. I believe WF does a lot of the same (although I do not pretend to know for sure: it's just that back in 2006 I wrote tons of very similar code for a company whose engineers had created XML representations of the workflows they used to automate their hardware testing, and who brought me aboard to create a sort of 'visual studio' to make those workflows run; WF was in a very early beta back then).

    My intuition tells me that the expression trees are more 'immediate', although, once again, I do not pretend to understand in what way, exactly: you still need to call the Compile method of your LambdaExpression object, right?
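
    For comparison, here is roughly what the CodeDOM route looks like (a sketch only; the 'Generated' class and its 'Double' method are made-up placeholders, and error handling is omitted): you hand the provider source text as a string and get back an in-memory assembly, whereas an expression tree is already an object graph whose Compile method goes straight to a delegate.

    using System;
    using System.CodeDom.Compiler;
    using Microsoft.CSharp;

    class CodeDomSketch
    {
        static void Main()
        {
            // The "program" here is just a string of C# source.
            string source =
                "public static class Generated " +
                "{ public static int Double(int x) { return x * 2; } }";

            var provider = new CSharpCodeProvider();
            var options = new CompilerParameters { GenerateInMemory = true };
            CompilerResults results = provider.CompileAssemblyFromSource(options, source);

            // Fish the generated method back out via reflection and invoke it.
            var method = results.CompiledAssembly.GetType("Generated").GetMethod("Double");
            Console.WriteLine(method.Invoke(null, new object[] { 21 }));  // 42
        }
    }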

    But, if C# does eventually become its own 'meta-language', the Business Process Management dream may come true: just define the English (or any other 'human' language) keywords, like 'Incident', 'Change Request', 'Asset', etc., as data types, then make the verbs like 'deploy' or 'break-fix' invoke the related workflows (for example) and -- voila! -- you have RUNNABLE DOCUMENTS, in English or in your business' native language, which the Big Bosses write to define the 'standard operating procedures' for their business and which you then run for them! That should dwarf both WF and Azure, both by 'being cool' and by selling around the world... :-)

  • +1 Douglas Hofstadter reference.

  • > The lack of a metalanguage in C# is a major failing

    It's not, it's merely an indicator that it will take some more time for MP to become mainstream. Be patient. We've already got lambdas and type inference, and look at the poor Java guys who have neither :)

    > T4, which implements meta-programming

    T4 effectively is a macro system, which is a very weak form of metaprogramming. In truth, for anything serious, you want a proper language to work with (and it's best if it's the same as the target language, and not like template metaprogramming in C++).

    > .NET already has CodeDOM, which does almost the same thing. I say 'almost' because it does not create a direct representation of a program, but rather a representation of the program's source code, in the language of your choosing. That source code can then be compiled in memory and loaded on the fly

    Ah, but that is not the point of the exercise. Generating C# code on the fly is quite doable even without CodeDOM. The trick is rather to parse the code, and manipulate it.

    > But, if C# does eventually become its own 'meta-language', the Business Process Management dream may come true: just define the English (or any other 'human' language) keywords, like 'Incident', 'Change Request', 'Asset', etc., as data types, then make the verbs like 'deploy' or 'break-fix' invoke the related workflows (for example) and -- voila! -- you have RUNNABLE DOCUMENTS, in English or in your business' native language, which the Big Bosses write to define the 'standard operating procedures' for their business and which you then run for them! That should dwarf both WF and Azure, both by 'being cool' and by selling around the world

    I believe that's how the Oslo guys are trying to sell their product now. Me, I'm not convinced. So far such experiments haven't been successful. By the way, do you know that SQL was supposed to be a language that non-programmers could use "naturally" to get immediate results without deep exposure and learning?

  • Have you seen Nemerle?  The ability to extend the language's syntax with "macros" written in the Nemerle language itself is seriously powerful.  I'm tired of external code-generation tools, including T4 and CodeSmith.  Ideally, C# should follow in Nemerle's footsteps and allow one to define coding patterns within the language itself.  Simply adding an attribute to a method (among other ways) to modify the behavior of that method at compile time or to generate code is indeed powerful.

  • I seem to remember Anders saying something about statement trees and meta-programming in the E2E he did on the future of C# on Channel 9 recently. Meta-programming is definitely something I'd like to see, given the amount of time I spend writing code generators of one form or another, and writing generic and/or reflection code that borders on insanity.

  • Great! This is exactly the word I was grasping for when I was trying to articulate some things about application design. Automation and scripting are more feasible to support if the built-in application features use the same APIs. In other words, there is no "internal" and "external" API, there is just "the" API. (And maybe built-in features get access to a superset API, but the API is not disjoint.)

    The problem you run into is an application that was not designed for automation/scripting in v1, where this suddenly comes in as a high-priority feature for v2/v3. You can either provide a facade API and snake your calls through multiple thunking layers, or you can ... rewrite half the app to provide the homoiconic design you wish you'd had in the first place (both hindsight and psychic powers are 20/20, after all).

  • Count me in for another enthusiastic vote in favor of real metaprogramming.  This is easier to implement in the Lisp languages because there's no real syntax; you are pretty much just writing bare ASTs anyway.

    Finding the right balance in a language with a rich syntax like C# is, as far as I know, not a cleanly solved problem.   How best to capture the statement/declaration/program tree in the first place?  Can quoting be avoided by using an assignment context like expression trees currently do?  For example:

    class Foo {}

    class Bar {
      void Main() {
        Class foo = typeof(Foo);
      }
    }

    But if you go down that path, you lose the obvious ways to splice in 'unquoted' bits.  Lots of open questions.   I hope it becomes a priority to find good answers.

  • You can't have a homoiconicity post without somebody referring to Greenspun's Tenth Rule.

    "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp."

    ;)

  • Eric,

    I wonder whether you or any of the C# designers have ever taken a look at Boo. It's a statically typed programming language for .NET that enables exactly what you describe in this blog post, and it also enables what Anders Hejlsberg has shown about his vision of C# 5 in a few talks in the past.

    Boo can do this because it is itself implemented in C# (and is currently being rewritten in Boo), so it can simply provide the AST representation of the program in its internal expression tree syntax. This enables things like meta methods (method calls executed at compilation time instead of runtime), AST attributes (attributes with a Process method that gets invoked when the attributed code element is compiled), macros, etc. It also has an extensible compiler pipeline, into which you can insert custom compilation steps, transforming the program tree to your heart's desire. It is well suited for any kind of metaprogramming, and is heavily used for DSLs.

    Whenever somebody on the C# team talks about these features, it's like "this is so far into the future" mixed with "in ye olden times, mysterious languages like LISP could do that".  Yet, Boo can do all of this, now. On the .NET platform. Look at it: <http://boo.codehaus.org/>. And maybe learn from it? Or at least acknowledge that approaches like this already exist on the .NET platform.

    Regards,

    Fabian

  • Hi Eric,

    I have two comments:

    1) having a full AST in C# would be great. And I don't say this because it just sounds cool, but because it would enable better solutions for a number of problems we can currently only solve in very clumsy ways.

    However, I want to bring your attention to the problem of meta-programming in the static .NET type system in general. Even if we get AST-level access to an entire class, and are able to modify that AST (which is currently not possible, and copying ASTs is not always the better solution), we're still in a much worse position than any Lisp programmer would be. For two reasons: syntax and types. I don't need to tell you about syntax making meta-programming harder, and it's a problem that C# would share with many dynamic languages. The other problem might be worse:

    "In ye olden times", the .NET type system was simple. We had static types and a fallback of "object" for everything we could not express using that simple type system. This was in fact not much harder to deal with than purely dynamic languages. However, with the power of generics, we also invited a whole new level of complexity into .NET-based meta-programming. It suddenly became hard. For generic types we get so many specific combinations of generic arguments in types, members, base types and interfaces etc., and in so many differenct states (closed, open and everything in between) that in any interesting challenge in meta-programming, it becomes a nightmare event to figure out which tests to create.

    It is not an accident that we found quite a few deep bugs in the .NET framework when we created a library that brings real mixins to C#. And considering the overwhelming complexity of everything, I almost find it surprising that we didn't find more. That's really difficult stuff, and kudos to the C# and CLR teams for getting it right. However, it is now just as difficult to do meta-programming via Reflection. And I'm pretty sure that these problems are going to trickle down into C# 5 (?) meta-programming if they are not addressed carefully.

    I'm with Fabian: finding a way to enable meta-programming in a static .NET language is not rocket science; humankind has been there before. You might find an even better way, fine. But if you really want to make a difference here, you should focus on providing simple yet powerful ways to handle the complexities of the type system, with special attention to generics.

    (How could that look? Hell, if I knew, I could probably apply for Erik Meijer's job ;-))

    2) This is a pet topic of mine. I believe that IQueryable-based LINQ (i.e., anything but LINQ to Objects) is somewhat broken. Here's why:

    The processing time required by LINQ to SQL to transform the Queryable AST into SQL sometimes exceeds the actual time needed to run the query, including the roundtrip to the server. Other meaningfully complex LINQ providers are going to experience similar problems.

    This is not surprising for people who know the way C# translates LINQ query expressions into chained method calls. Making any sense of the results is not just hard to code right, but also a lot of work for the machine. Even simple query expressions are translated into something ugly, verbose, and structurally hard to handle just so that they result in compilable C# code. This is funny, because most of the time they would never have to be compiled, but rather translated into something that in turn closely resembles the original LINQ query. What a waste of time!
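
    To see what I mean, compare a trivial query expression with the method-call chain it is specified to become (a sketch with a stand-in Customer type; against an IQueryable source the same lambdas are captured as expression trees that the provider then has to pick apart again at run time):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class TranslationSketch
    {
        class Customer { public string Name { get; set; } public string City { get; set; } }

        static void Main()
        {
            IEnumerable<Customer> customers = new List<Customer>();

            // The query expression form...
            var names = from c in customers
                        where c.City == "London"
                        select c.Name;

            // ...is translated by the compiler into chained method calls:
            var names2 = customers
                .Where(c => c.City == "London")
                .Select(c => c.Name);
        }
    }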

    The result is a static, strongly typed compiled language that takes so many detours that the resulting performance is probably worse than in a dynamic language like Ruby that could provide a more straightforward approach.

    Now, not everything is rain clouds, dragons and Bush here; all of what you're doing makes sense. But you have to deal with the problems that arise somehow.

    Currently, the LINQ to SQL team's answer is explicitly compiled queries. This sucks for a number of reasons, the most obvious one being that it's hard, ugly and error-prone, and not at all Language Integrated in the way that LINQ queries are supposed to be.
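
    For readers who haven't seen them, explicitly compiled queries look roughly like this (a sketch; NorthwindDataContext and Customer stand in for designer-generated LINQ to SQL types, not anything defined in this post):

    using System;
    using System.Data.Linq;
    using System.Linq;

    static class CompiledQueries
    {
        // Compiled once up front, then reused -- note how far this is from
        // "language integrated". NorthwindDataContext and Customer are placeholders
        // for the usual designer-generated LINQ to SQL types.
        public static readonly Func<NorthwindDataContext, string, IQueryable<Customer>>
            CustomersInCity = CompiledQuery.Compile(
                (NorthwindDataContext db, string city) =>
                    db.Customers.Where(c => c.City == city));
    }

    // Usage:  var londoners = CompiledQueries.CustomersInCity(db, "London");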

    Is that it? Look at those beautiful LINQ queries you can create (but for production code, we recommend this ugly workaround)?

    I believe that there has to be _some_ way to implement implicit and automatic caching of generated queries. The only problem with this is that every time a from ... select statement gets executed, a new AST is created. This defeats any attempt at efficient caching.

    Here's my proposed solution: instead of digging captured variables deep into the ASTs, automatically generate lambdas and insert the actual variables at instantiation time. The inner lambda expression (tree) can remain unchanged, and can therefore be stored in an invisible static field somewhere.

    Pseudocode: instead of

    var x;
    execute AST (source.Where (s => s.Property > x));

    you could generate

    static ast = AST (x => source.Where (s => s.Property > x));
    execute AST (ast (x));

    Now we can cache the ast object using its reference as a dictionary key.

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=315491

    Together with your vision of ASTing entire statements, we could even capture whole LINQ statements that way, cache their transformation results, and just insert the parameters at run time. Pretty much what I have to do today using compiled queries, but transparently so.

    What do you think?

  • If you want to see an example of an object-oriented (albeit prototype-based) homoiconic language, then you might find Ioke interesting. It's not fit for production use (it's still a baby), but it currently serves as an excellent didactic vehicle for exploring and understanding these concepts in a fairly recognisable syntax for people with Java, C#, or Ruby backgrounds.

    Oh, and quite a bit of the core of the language is written in itself and has a whole load of tests which really makes things a lot more accessible and easy to understand.

    http://ioke.org

  • Why not implement a Common Lisp for .NET, like Clojure? Not a toy like IronScheme, but a real, supported implementation. You will never get away from ASTs, so why not just use Lisp?
