I am crazy about language processing!

It is an excellent way to think deeply about programming, programming languages, and metaprogramming, and it is the foundation of automated software engineering. Over the last 15 years or so I have written countless language processing components (parsers, pretty printers, transformations, etc.). Many of them originated in a teaching context. Others originated in the context of developing or exercising language processing infrastructures as part of my research work. Yet others were implied by my continued interest in formal (and executable) semantics as well as type systems and program analysis. Further, I am a fan of declarative languages and programming methods (think of Haskell, Prolog, Attribute Grammars, DCGs, etc.) – language processing is a home game for declarative programming, and I play it every now and then. Finally, I have also worked on software re- and reverse engineering projects (both in industry and academia), and, guess what, language processing is a recurring theme there.


Language processing and teaching

These days, the software engineering and programming language communities discuss language processing-related issues a lot. Just think of all the intensive efforts focused on MDE, DSLs and software language engineering. The observable level of importance of language processing makes me wonder about some educational aspects:

How can one actively contribute to a mature body of conceptually founded and practically applicable knowledge about languages and language processing? How can one efficiently disseminate such knowledge in university courses without necessarily requiring entirely new courses?

The new SLPS project (read as Software Language Processing Samples) may hold one answer to these questions. SLPS is set up to collect samples that explore different methods of language processing, different languages under study, different implementation languages and platforms, and different auxiliary implementation technologies (e.g., parser generators). SLPS is a SourceForge project; see the project site and the project web site. The project will hopefully grow into something useful that inspires programmers as much as university teachers. SLPS could become a community effort that collects programs of “a certain kind” – just as much as the sites “99 Bottles of Beer”, “Hello World”, or “OO Shape Examples”. This is a call to arms for users and contributors. I am absolutely motivated and prepared to push this project for some time, but I hope that others see the value and join in. I readily leverage the value of the project in teaching, and consider the teaching value indeed to be the sweet spot of the project. (There may be an idea for a textbook in the air, subject to more thinking.)


The Factorial Language (FL)

To begin with, I have uploaded several implementations of the Factorial Language (FL) – a tiny functional programming language with first-order functions over integers. (Lukas Renggli has readily added an FL implementation based on Squeak.) The uploaded language processors parse, pretty-print, evaluate or optimize programs like the following:

mult n m = if (n==0) then 0 else (m + (mult (n - 1) m))

fac n = if (n==0) then 1 else (mult n (fac (n - 1)))

In case you didn’t guess it yet, the language’s name refers to its groundbreaking ability to express the factorial function. I have been told that the Fibonacci and the Ackermann functions can be expressed, too. This may be entirely possible. In the above snippet, we define both a mult and a fac function because there is no built-in multiplication; mult is derived from built-in addition.
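To make the "evaluate" operation concrete, here is a minimal sketch of an FL evaluator in Python. It is not one of the SLPS implementations: the class names (Lit, Var, BinOp, IfThenElse, Apply, Function), the helper run, and the choice to represent the result of == as 1 or 0 are all assumptions made for illustration. The two functions from the snippet above are built directly as abstract syntax, so no parser is involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical abstract syntax for FL expressions (illustrative names only).
@dataclass
class Lit:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str  # one of "+", "-", "=="
    left: "Expr"
    right: "Expr"


@dataclass
class IfThenElse:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"


@dataclass
class Apply:
    func: str
    args: List["Expr"]


Expr = Union[Lit, Var, BinOp, IfThenElse, Apply]


@dataclass
class Function:
    params: List[str]
    body: Expr


def evaluate(expr: Expr, env: Dict[str, int], funcs: Dict[str, Function]) -> int:
    """Evaluate an FL expression under a variable environment."""
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, BinOp):
        left = evaluate(expr.left, env, funcs)
        right = evaluate(expr.right, env, funcs)
        if expr.op == "+":
            return left + right
        if expr.op == "-":
            return left - right
        if expr.op == "==":
            # Assumption: comparison results are encoded as 1 (true) or 0 (false).
            return 1 if left == right else 0
        raise ValueError(f"unknown operator: {expr.op}")
    if isinstance(expr, IfThenElse):
        # Only the selected branch is evaluated, so recursion terminates.
        chosen = expr.then if evaluate(expr.cond, env, funcs) != 0 else expr.orelse
        return evaluate(chosen, env, funcs)
    if isinstance(expr, Apply):
        function = funcs[expr.func]
        values = [evaluate(arg, env, funcs) for arg in expr.args]
        # First-order functions: the body sees only its own parameters.
        return evaluate(function.body, dict(zip(function.params, values)), funcs)
    raise TypeError(f"not an FL expression: {expr!r}")


def minus_one(name: str) -> Expr:
    return BinOp("-", Var(name), Lit(1))


# The mult and fac definitions from the FL snippet, as abstract syntax.
FUNCS: Dict[str, Function] = {
    "mult": Function(
        ["n", "m"],
        IfThenElse(
            BinOp("==", Var("n"), Lit(0)),
            Lit(0),
            BinOp("+", Var("m"), Apply("mult", [minus_one("n"), Var("m")])),
        ),
    ),
    "fac": Function(
        ["n"],
        IfThenElse(
            BinOp("==", Var("n"), Lit(0)),
            Lit(1),
            Apply("mult", [Var("n"), Apply("fac", [minus_one("n")])]),
        ),
    ),
}


def run(name: str, *args: int) -> int:
    """Apply a named FL function to integer arguments."""
    return evaluate(Apply(name, [Lit(a) for a in args]), {}, FUNCS)
```

With these definitions, run("fac", 5) evaluates the factorial of 5 through repeated addition, exactly as the FL source prescribes.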


ANTLR, AspectJ, C#, Haskell, Java, Prolog, Smalltalk, Tom, XML, XSD, …

Here are the existing FL implementations, as of this writing: (i) a Haskell-based one using parser combinators for parsing, and generic functional programming (Data.Generics) for implementing an optimization; (ii) a Prolog-based one using DCG style for parsing; (iii) a Java-based one using ANTLR to parse concrete syntax, and plain old classes with virtual methods otherwise; (iv) another Java-based one that additionally uses Tom for implementing several operations by rewriting; (v) yet another Java-based one that bypasses concrete syntax and uses XML and XSD instead, relying on JAXB for XML/object de-/serialization; the language processing operations are added as virtual methods by AspectJ’s inter-type declarations; (vi) a Squeak-based one; (vii) two C#-based ones. I really look forward to seeing additional FL implementations or entirely new groups of language processing samples.
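To give a flavor of what "implementing operations by rewriting" can look like, here is a small Python sketch of a bottom-up optimizer over FL-like expressions. The rewrite rules (dropping additions and subtractions of zero, folding constant operands) are assumptions chosen for illustration; the text above does not say which optimizations the SLPS implementations perform. The class and function names are hypothetical as well.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical abstract syntax, restricted to what the sketch needs.
@dataclass
class Lit:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Sub:
    left: object
    right: object


@dataclass
class Apply:
    func: str
    args: List[object]


def simplify(expr):
    """Apply one local rewrite rule at the root of expr, if any matches."""
    if isinstance(expr, Add):
        if expr.left == Lit(0):  # 0 + e  ->  e
            return expr.right
        if expr.right == Lit(0):  # e + 0  ->  e
            return expr.left
        if isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
            return Lit(expr.left.value + expr.right.value)  # constant folding
    if isinstance(expr, Sub):
        if expr.right == Lit(0):  # e - 0  ->  e
            return expr.left
        if isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
            return Lit(expr.left.value - expr.right.value)  # constant folding
    return expr


def optimize(expr):
    """Rewrite bottom-up: optimize the children first, then the node itself."""
    if isinstance(expr, (Add, Sub)):
        rebuilt = type(expr)(optimize(expr.left), optimize(expr.right))
        return simplify(rebuilt)
    if isinstance(expr, Apply):
        return Apply(expr.func, [optimize(arg) for arg in expr.args])
    return expr  # literals and variables are already in normal form
```

The traversal (optimize) is kept separate from the rules (simplify). That separation is what generic programming libraries and rewriting tools automate; here it is written out by hand for a handful of node types.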

We may be able to grow this sample collection to the point (i) that it captures and demonstrates most if not all principles and methods of language processing, language-oriented programming, compiler construction, X/O/R mapping, XML- and grammar-based programming, program and software transformation, and, last but not least, formal semantics; (ii) that it showcases, compares, and evaluates popular or interesting language processing technology – such as combinator libraries (for parsing, pretty printing, transformation), DSL infrastructures and frameworks, metamodeling technologies (e.g., based on EMF), and IDEs; and (iii) that it collects executable knowledge about languages – both the languages under study and the languages used for implementation.