Welcome to MSDN Blogs Sign in | Join | Help
Software Language Processing Samples (SLPS)

I am crazy about language processing!

It is an excellent way to think deeply about programming, programming languages, metaprogramming, and it is the foundation of automated software engineering. Over the last 15 years or so I have written countless language processing components (parsers, pretty printers, transformations, etc.). Many of them originated in a teaching context. Others originated in the context of developing or exercising language processing infrastructures – as part of my research work. Yet others were implied by my continued interest in formal (and executable) semantics as well as type systems and program analysis. Further, I am a fan of declarative languages and programming methods (think of Haskell, Prolog, Attribute Grammars, DCGs, etc.) – language processing is a home match for declarative programming, and I am playing it every now and then. Finally, I have also worked on software re- and reverse engineering projects (both in industry and academia), and guess what, language processing is a recurring theme there.

 

Language processing and teaching

These days, the software engineering and programming language communities discuss language processing-related issue a lot. Just think of all the intensive efforts focused at MDE, DSLs and software language engineering. The observable level of importance of language processing makes me wonder about some educational aspects:

How can one actively contribute to a matured body of conceptually founded and practically applicable knowledge about languages and language processing? How can one efficiently disperse such knowledge in university courses without though necessarily assuming entirely new courses?

The new SLPS project (read as Software Language Processing Samples) may hold one answer to these questions. SLPS is set up to collect samples that explore different methods of language processing, different languages under study, different implementation languages and platforms, and different auxiliary implementation technologies (e.g., parser generators). SLPS is a SourceForge project; see the project site and the project web site. The project will hopefully grow into something useful that inspires programmers as much as university teachers. SLPS could become a community effort that collects programs of “a certain kind” – just as much as the sites “99 Bottles of Beer”, “Hello World”, or “OO Shape Examples”. This is a call to arms for users and contributors. I am absolutely motivated and prepared to push this project for some time, but I hope that others see the value and join in. I readily leverage the value of the project in teaching, and consider the teaching value indeed to be the sweet spot of the project. (There may be an idea for a textbook in the air, subject to more thinking.)

 

The Factorial Language (FL)

To begin with, I have uploaded several implementations of the Factorial Language (FL) – a tiny functional programming language with first-order functions over integers. (Lukas Renggli has readily added an FL implementation based on Squeak.) The uploaded language processors parse, pretty-print, evaluate or optimize programs like the following:

mult n m = if (n==0) then 0 else (m + (mult (n - 1) m))

fac n = if (n==0) then 1 else (mult n (fac (n - 1)))

In case you didn’t guess it yet, the language’s name refers to its groundbreaking ability to express the factorial function. I have been told that the Fibonacci and the Ackermann functions can be expressed, too. This may be entirely possible. In the above snippet, we define both a mult and a fac function because there is no built-in multiplication; mult is derived from built-in addition.

 

ANTLR, AspectJ, C#, Haskell, Java, Prolog, Smalltalk, Tom, XML, XSD, …

Here are the existing FL implementations, as of writing: (i) a Haskell-based one using parser combinators for parsing, and generic functional programming (Data.Generics) for implementing an optimization; (ii) a Prolog-based one using DCG style for parsing; (iii) a Java-based one using ANTLR to parse concrete syntax, and plain old classes with virtual methods otherwise; (iv) another Java-based one using additionally TOM for implementing several operations by rewriting; (v) yet another Java-based one that bypasses concrete syntax, and uses XML and XSD instead, to which end it uses the JAXB approach for XML/object de-/serialization; the language processing operations are added as virtual methods by AspectJ’s intertype declaration; (vi) a Squeak-based one; (vii) two C#-based ones. I really look forward seeing additional FL implementations or entirely new groups of language processing samples.

We may be able to grow this sample collection up to a point … (i) that it captures and demonstrates most if not all principles and methods of language processing, language-oriented programming, compiler construction, X/O/R mapping, XML- and grammar-based programming, program and software transformation, and, last but not least, formal semantics; (ii) that it showcases, compares, and evaluates popular or interesting language processing technology – such as combinator libraries (for parsing, pretty printing, transformation), DLS infrastructures and frameworks, Metamodeling technologies (e.g., based on EMF), and IDEs; and (iii) that it collects executable knowledge about languages – both the languages under study and the languages used for implementation.

 

Regards,

Ralf

About the fundamental notion of software languages

The call for papers for the 1st International Conference on Software Language Engineering (SLE) is public now. I am happy that we did all the work to start this new conference. I am 100% confident in the SLE conference; it’s backed up by a strong and self-confident community that has something to say about generic language technology, programming languages (incl. parsing, semantics, type checking), domain-specific languages, program transformation, software re- and reverse engineering, and model-driven development; it’s also backed up by the ample relevance of this area for the progress of IT, and the insight that the fundamental notion of languages cannot sufficiently be discussed within the scope of “technical spaces” alone (read as grammars, XML, UML, graph transformation, etc.). Have a look at the call. SLE 2008 will be co-located with MODELS 2008, September 29-30 October, in Toulouse, France.

Disclaimer: A subjective perspective on SLE follows. Opinions expressed below are neither necessarily shared by my wife, nor by other SLE organizers and combatants, nor by my current/previous/future employers. Also, don’t take me too serious. Remember this is a blog, I am a Cobol+Haskell nerd, and I don’t use smilies, as a matter of style. As one of my SLE friends (namely, JMF) tends to say: “May the fun be with you.” Indeed, SLE 2008 isn’t just an ambitious conference. We also want to have fun studying languages, being language nerds, comparing artificial and natural languages, engineering denotational semantics, hooking into popular IDEs, hacking parsers and type systems, and bashing XSD. Did I mention, BTW, that we meet up in Toulouse? Toulouse is not Bordeaux (“city of wine”), it’s not Lyon (“city of gastronomy”), but it means quality French wine and food; if you are stationed elsewhere in this world, then get some rest for a few days, and submit an unrejectable paper, or attend anyhow. In fact, I am just reading in a message from JMF: “Toulouse, a `haut lieu de la gastronomie francaise’". Watch your weight.

 

The IT Tower of Babel


Just as much as mankind has suffered from the construction of the (Biblical) Tower of Babel and the subsequent confusion of languages (in fact, tongues), mankind (say, IT) is now suffering from the IT Tower of Babel – say the continuing emergence of always new (artificial) languages without much reliable ability to mediate between or to integrate or to evolve or to replace these languages; without even much shared understanding for the need for all these languages; overall: without much ability to deal with the increasing complexity of the pool of relevant languages in any given IT context.

Is the Tower of Babel a good metaphor? Admittedly, the (Biblical) Tower of Babel was simply an overambitious construction project that justly annoyed the chief architect (not of the building but of the universe) up to the point to punish the constructors with the confusion of languages; other punishments could have been chosen – such as blocking access to further construction material. In contrast, I use the term “IT Tower of Babel” to refer to the plethora of artificial languages as such. This plethora was not caused by Microsoft’s CSA or any other major chief architect, but it’s pretty much a self-punishment of the IT community as a whole. We somehow ended up with all these artificial languages. Technology providers (say for compilers and platforms) certainly had a part in this: variation of language (read as programming language, any sort of tool-dependent file format) is often used for differentiation. Generally, applied computer science (let’s say “IT computer science”) is not quite an engineering science; it is so much affected by individuals, industry, fashion, and fiction. Finally, the emergence of artificial languages is subject to entropy; it seems we are just getting more – no matter what.

Life was easy when there was only the Adamic language; IT life was easy when languages were in the hands of computer scientists (as opposed to language amateurs, not caring about semantics and program algebra); when there were just a few important programming languages: Cobol, C, Fortran, and not much more; when systems didn’t have to interoperate by means of C/S, SOA, WebServices; when there were just a few APIs per platform; when system owners were mostly happy with staying with their (evolving) platform for 1-3 decades. In fact, there were quite a few platforms (think of the mainframe age and the various Cobols; ask your grandpa), but that wasn’t too much of a fundamental problem; it merely provided job security for a small crowd of “interpreters” – specialists who could connect systems on different platforms in those relatively few cases when it was inevitable. (Read “few” in comparison to today’s standards.)

The confusion of natural languages happened instantly, and it was fatal to the further construction of the (Biblical) Tower of Babel; suddenly we had 70-72 dialects – a number that is amazingly small for IT standards. In contrast, the confusion of artificial languages went through some stages, and it’s ongoing.

 

Stage I: “Many programming languages”

The problem with the confusion of natural languages basically was that people lacked translational power for the 70-72 dialects. Lacking this power, the offending building had to be abandoned. There are probably a thousand programming languages in use, certainly if we count dialects and versions, even if we narrow down the number to business-critical languages – say languages used for business-critical software systems – as opposed to say languages for making rhymes about beer. Such a multitude of languages is a somewhat unsolved and serious problem in IT by itself (and not made easier by the continuing increase of languages); think of mass maintenance or language conversion. The problem can be economically fatal for a system owner; lack of transformational and conversional power makes it impossible to keep up with inevitable evolution pressure.

Now, you might say: “Lack of transformational and conversional power as in mass maintenance? This sounds like a Cobol problem to me. Cobol is for wimps; I am using Java or .NET, look at my beautiful refactoring support, lock at how I can patch this/that compiler to give me (low-level) access to the parse-tree of a program.”

Then try this: develop a converter from Java 1.1 to C# 3.0 applications; please send me an email, every now and then, how it goes. If you are among the top 10 of computer scientists worldwide, or an extraterrestrial hacker, you might get this done before Isabelle enters school, and you will sell it for more money than is needed to stop global warming locally in your county. That’s the point: shouldn’t we all work on making these things easy and repeatable and doable by folks with just one PhD? Let’s have 10 years of the new SLE conference, and see whether we can get closer to this goal!

Want a different challenge? Then try this one: go and patch Squeak or any other major Smalltalk dialect so that it gets a decent amount of static checking, in fact, up to the degree that you would get in an OO functional language with type inference, modulo the necessary compromises to account for Smalltalk’s breathtaking expressiveness, most likely on the grounds of soft typing. Providing static typing for Smalltalk has been attempted a number of times (see Strongtalk, for example), but it hasn’t been successful so far. Smalltalkers would absolutely appreciate additional checks, as long as they don’t get into the way, as long as they do not remove expressiveness. I think that this is an extremely laudable attitude, and we just must get better software-language engineers to solve this problem, ultimately. In more general terms, Erik Meijer uses the slogan “Static Typing Where Possible, Dynamic Typing When Needed”, and, of course, the assumption is that we extend our horizon as to what we call a type, as to the perhaps old-fashioned dichotomy of static types, dynamically checked assertions, and offline-proven program properties.

 

Stage II: “Complex platforms”

 

Even if you are a wizard of the C#, Java or VB syntax and semantics, you cannot write much else than “hello world” programs and the factorial function. In fact, “hello world” already requires some “library” knowledge.  More generally, nobody can get real work done without profound knowledge of a platform. I want to make the point here that a platform is actually an overweight cocktail of languages: API signatures, API usage protocols, make/build/configuration descriptions, ontologies for semantic annotation (“custom attributes”), …, you name it. Further, the use of a typical platform is tied to the use of different data models in the average application: POJO/POCO, not so plain objects, a bit of relational model, and a bit of XML. (Defining schemes according to different data models, mapping between specific schemes, and mapping generically between different data models – these activities all count as software language engineering, in case, you didn’t notice.) Moreover, as a “platform user”, you use different type systems: Java 1.1, Java 1.5, C# ?.?, VB.NET ?.?, SQL, XSD, … – type systems are languages, too. Finally, you also use systems of designs patterns – again, these are languages.

Compare this to good old Cobol times, when you had an all-inclusive language (“hello world” works with the DISPLAY statement; data access works with the CRUD statements), and some pretty lean (and optional) tools for forms, reports, and structured programming. The data model of hierarchical indexed and sequential files was deeply embedded into Cobol. Can’t we just fix Cobol? It already has OO. We know how to do aspects in Cobol; even though the extension got accidentally obsoleted some decades ago. Let’s re-enable aspects, add a few more things, and then convert back to Cobol, worldwide. I’ll try to get the Haskell community aboard.

If we do not convert back to Cobol, we need to deeply understand platforms as cocktails of languages. You might say, “I don’t care much that these are languages; I just have to handle this complexity, call it language, call it software development reality. Again, I don’t care.” That’s the point: nobody cares, and the various languages are disintegrated; some of them are obfuscated; many of them are compromised. Also, they deadlock each other’s evolution and clean-up. Indeed, backwards compatibility deadlocks everything. I have a dream: I imagine an (IT) world where the notion of backwards compatibility has stopped to frighten us; evolution is just so effective that we can always factor in ways that we see a system and a platform that represents the optimum in design conceivable by the time. (Perhaps, the Smalltalk folks can do it? Come and talk at SLE 2008…!)

Again, we should be able to effectively refactor platforms. First example: concepts that were previously in the build rules get suddenly deeply embedded into the language, or disappear from the programming surface because new kinds of compilers automate the concept. Second example: How many XML APIs are there for Java? My bet is 42. Do we need all of them? Shouldn’t we be able to retire half of them or more? Should a Java program in 5 years from now even care about the dichotomy “in-memory” vs. “push/pull”? Third example: if suddenly XSD is found to be too complicated, can we just replace it all over our apps by the use of say RELAX NG? (Some XML bashers may even want to stretch this scenario, and transform away all XML in their apps. Now, let’s not overdo it, Ok?) Fourth example: whenever a major language extension makes to our favorite programming language, we would like to retire design patterns that mimicked the same concepts previously, and we would like to do mining so that lower-level program structures are semi-automatically lifted to the full, extended language. We have seen research on instances of this idea such as in aspect mining or refactoring for generics. How can we make this idea more repeatable? How can we define the higher-level language concepts in such a way in the first place that their relationships to design patterns or lower-level program structures follow from the language definition, and do not require heartbreaking proofs, after the fact? Overall, shouldn’t we limit the entropy in IT? Otherwise, suddenly 50% percent of all human beings may end up working as programmers. Mark my words!

 

Stage III: “Complex interfaces”

 

Even if you are a wizard of the .NET or Java platform, and have a Master’s in say financial services or “foundations” of eCommerce, you will still not be able to do a “MyAmazon” or “MyEBay” because this sort of application must interface with a good deal of other systems in quite complicated ways. First, the sheer size and the sheer number of interfaces that must be contented by the advanced web application are challenging. (For instance, have a look at FpML – a sized (markup) language for financial products.) Second, the interfaces (or services) come with fundamentally different protocols and kinds of contracts (e.g., synchronous, asynchronous, reliable, occasionally connected, self-persisting, encrypted, session-based, etc., sigh!). Third, evolution is the straw to break the camel’s back. Software evolution, in the presence of complex platforms and complex interfaces is currently infeasible in theory, even though it occasionally works in practice – often by not admitting evolution. No matter whether it works or not, software evolution triggers interesting, additional artificial languages, e.g., transformation languages, and model/metamodels for reverse engineering.

I guess WebServices with WSDL and friends was a first attempt to solve this problem. Does it work? Not for me! Whenever I see a .wsdl file, I am overwhelmed; I look somewhere else. I guess SOA is the current hype way of addressing the problem. In both cases, the foremost emphasis is on service description (based on some sort of signatures and protocols, again languages). In both cases, some efforts address the discovery of services. In neither case, I am aware of a comprehensive approach to service evolution with all the encompassing problems of co-evolution and versioning. I really hope that the WebService/SOA community feels somewhat represented by and interested in SLE. I am sure that SLE will attract experts in programming languages, program transformation, language design, and programming environments; this sort of expertise should go well with the challenges in WebService/SOA. Let’s benefit from each other’s interests and strengths! Let’s meet at SLE 2008 in Toulouse.

 

Stage IV: “Driving models

 

Model-driven engineering/development is partially meant as a solution to some of the problems mentioned in the descriptions of earlier stages. We should ignore the fact that MDE/MDD was initially touted as “vacuous fluff” on the OhMyGosh site, aka MDA. In fact, the idea of generating implementational artifacts from higher-level specifications or “models” is laudable and pretty established in CS, so MDA didn’t quite start the salvation of mankind; here are some distant references, [1], [2], [3], [4], [5], but I am not always sure what “model-driven” means – so these related work pointers are given without any underlying principle of what’s in and out. Nevertheless, I mention them here for the benefit of some of the disoriented PhD students that I occasionally encounter at conferences.

Perhaps my stupid problem with model-driven foo/bar is the following: Yes, sure, if … [fun_level++] … I start from a language setup with only brain-dead abstraction mechanisms, don’t care about the fundamental notions of refinement, translation, composition, partial evaluation, drive my efforts by the ideas of boxes, arrows, and stereotypes, add an amazingly complex constraint mechanism (like I would add the wings of an air shuttle to say a Trabant), invent a dozen notations for state machines of different kinds, throw hundreds of people at standardizing this Godzilla, … [keep on rambling for some time] …, then, yes, I can fly deliver very expressionistic pieces of art (read as UML diagrams) meant as solutions for problems of even the smallest size, differing only in the degree of required masochistic orientation. [fun_level--] (I apologize for this imperative idiom, but it’s just too convenient.)

Anyway, if we just think of it as program refinement, translation, composition, partial evaluation – then it all makes sense. So we just call it model-driven engineering, and try not to get stuck in bad memories of early MDA. No matter what, model-driven engineering is a software-language engineering challenge per excellence. We get additional languages (e.g., the language that is refined to be able to express distribution behavior after refactoring or refinement; think of POPL-powered explanation of the GWT); we get additional operations on languages (other than the obvious modes of compilation and interpretation). This creates (interesting) consistency challenges of a magnitude. Languages that arise due to well-defined refinement or translation are hard to argue away; so here I take the position that we better cope with them, make it easy to endorse them, and fully support their life cycle.

 

Stage V: “Languages as underwear”


Toddlers develop their own natural language, and parents are motivated to catch up and understand them, sometimes imitate them. When friends come for dinner, parents even volunteer as interpreters, and everyone is happy and enjoys the individual language. It also makes sense because a full-blown dictionary and grammar is way too complex and inappropriate for the toddler’s ability and mind and focus.  Also, parents do parenting quite a bit for the entertainment value of toddlers, and those individual languages are part of the entertainment value, no? Also, the time frame for using these languages isn’t completely insane; the toddler is going to change diapers way more often than inventing a new baby language.

In the emerging age of domain-specific languages, or should I say, in the current, more intensive wave of the long-standing DSL movement, we are suddenly able to change our artificial languages like underwear; introduce a new one, every day; use one per component of the application! To me, it is not strikingly obvious that this works, unless we impose appropriate constraints on this ability. (Come to SLE 2008, and tell us, how it works!) Taken to an extreme, “languages as underwear” would correspond to an adult who invents a new (natural) language every day or so, while assuming his colleagues, friends, and partner to catch up and learn this language pretty quickly. Perhaps, the adult is actually teaming up with a few buddies to justify the creation of the new language. Still, imagine everyone being so creative continuously. Let me emphasize that the creation of a special-purpose dictionary, or, let’s say, an ontology (to impose more structure on a dictionary) is Ok. I wonder … what’s fundamentally wrong with say English or German given these languages’ expressiveness? Getting back to artificial languages, what’s wrong with say Haskell 98+, provided we are getting depending typing, clever macros, and a few more gadgets. Then we will be “DSL complete” (as opposed to Turing complete).

Here is a pretty non-controversial definition of the term “domain-specific language”: it’s a language with domain-specific support for notation, abstraction mechanisms, checks, and behaviors (e.g., efficient behaviors based on domain-specific transformations). This sounds good. Sure, we do want the capability to define domain-specific languages. However, the dominating discussion of DSLs lacks some related, very important aspects. Suppose we are close to the point that it is easy to define a DSL, don’t we also need the ability to evolve them, integrate them, retire them, objectively measure the impact of introducing them, or analyze their usage and provide feedback for their evolution? Here I note that not even the mere definition of DSLs is a done deal, unless you are fan of attribute grammars, dependent typing, and a few more weapons of math-based distraction. It is relatively established to provide domain-specific notations, e.g., visual frontends on top of an object model; it is still relatively difficult to provide domain-specific checks, behaviors, and to support DSL evolution, again, unless you are a cutting-edge language nerd – to be clear and serious, I don’t consider myself nearly to be such an expert.

 

Conclusion

 

When the confusion of natural languages happened, it was at least clear that everyone was still talking in some language, even though people would no longer understand each other – too bad. Again, there wasn’t much dispute as to what a language is. In CS, we have taken the confusion of languages to a new level. We are so overwhelmed by language-like entities so that we do not simply lack the ability to understand the various languages at hand; worse than that – we do not even realize when “someone is talking”; we do not even “see” the language when it’s there. Also, we are caught up by details of rendering – comparable to someone who has no ability to deal with even the lowest degree of a dialect. It may also be that we are too openly endorsing slang. By this I mean that the SLE-related areas of CS and IT could perhaps make better use of (and constructively challenge) the fundamental insights accumulated in parsing, formal semantics, programming languages, program transformation, testing, verification, literate programming, and general automated software engineering.

We will inevitably see more languages (programming languages, APIs, DSLs, metalevel/transformation languages, ontologies, …). Understanding the software life cycle of all these languages is a central challenge for the CS discipline. We must improve our abilities to retire languages when they are no longer needed, objectively determine when a domain-specific language is worth designing, provide higher abstraction levels in programming, validate and verify language definitions or implementations, modularize them, think beyond the single-language frame, and endorse language integration and composition, … [ please check out the call for papers for SLE 2008 ] ….

 

Ralf Lämmel

Wannabe Software Language Engineer

42 basic OO assignments

Recently, I started to re-wire some mildly used parts of my brains to accommodate capabilities for a 1st year CS course on OO programming and modeling. Long before I joined the faculty in Koblenz, the course had been designed to get students going in (OO) “programming in the small” and “basics of (OO) modeling” as well as “related foundations of CS” (such as bits of PL theory and complexity theory) and the implied treatment of the “imperative heritage” including the whole range of Hoare semantics and structured programming. So this course may end up being a mix of Java, UML and the classical first CS course. To tell you a secret, I have a hidden agenda of the kind that students are potentially prepared for the post-Java 1.5 / UML 2.1 era. To this end, I subliminally detour into FP ground and touch upon some of the issues with current OO programming.

I should also mention that the course at hand in the curriculum at hand neither should produce “little OO programmers”, nor “UML junkies”, nor “recursion theoreticians”. That is, by far not every student in this class will end up as a software developer, and by the time these folks graduate, most programming will be reduced to model-driven development anyhow, right? Seriously, most of my students focus on “information management” as opposed to vanilla CS. This raises the question of how to make the course valuable for such students. My current take is to implant a good number of higher-level and timeless concepts into these young brains so that these students can think of programming and modeling and talk to technical folks and understand designs and honor technical problems. As a complement, the lecture also offers demo parts and the labs are pretty much hands-on. Again, programming-related courses can be extremely boring (ditto for examinations), and hence I have decided that a more intellectual (dis)course may be more fun; I can only speak for the lecturer.

There are these ways to consume this blog post: (i) my students may check out their understanding of all previous lectures to date, and they should not panic if a few questions are over their heads; (ii) other lecturers (e.g., those whose slide decks I have adopted for my course) may use some of the questions, if they ever run out of questions themselves; (iii) everyone else may find some riddles, admittedly nothing of Brian’s caliber, but then again, also without any risk of headache or brain damage. As the course continues over the next few months, I may publish another bag of questions, assuming that the course will not be abandoned or get stuck. If anyone wants to reply with clever solutions, this is more than appreciated.

 

Yours,

Ralf Lämmel

 

1.      Encapsulation

 

a.       Why is it a good idea to hide the state representation for the implementation of an object type for complex numbers? Yes, hiding representations is always a good idea, but what’s so special about complex numbers to provide a particular incentive?

 

b.      Given a pair of setter and getter methods, what is a reasonable and general correctness criterion for these methods? To put it differently, how would you test setters and getters; how would incorrectness be revealed in a simple test?

 

c.       Consider a class for a counter with an integer for the count as state; the count is initialized to 0 upon construction, it can be read by a getter, and it can be incremented by a corresponding method. Why may it be desirable to exclude the setter for the count?

 

d.      Can you provide an example that fits the following scenario? (i) The initial state of an object is described by a parameterized constructor that lists all required components of the object. (ii) One required component is purposely read-only, i.e., there is a getter but no setter. (iii) All other components (required or optional) come with both a getter and a setter.

 

e.      Can you come up with (and defend) some scenario of object modeling where information hiding may be irrelevant, where no proper encapsulation is needed? Again, the extreme OO follower may refuse to consider this option, but let’s try, just for the sake of it.

 

f.        Consider the following legal Java class:

 

class C {

    private C c;

    public void setC(C c) { c.c = this.c; }

}

 

That is, the body of the setC method exercises access to “private” state of its argument. Here we note that the “private” modifier is the most restrictive access modifier in Java. Can you think of an even more restrictive mode that would illegalize the above program, but keep intact other permissions associated with “private”? Describe the language rules for this new “very-private” access modifier.

 

g.       Consider the following legal Java class:


public class C {

    private C() { }

}

 

This class shows that the default constructor can be effectively removed from the interface of a class. How can this capability be helpful in enforcing a useful, initial state of an object and in minimizing the number of setters to be exposed? Provide an example.

 

2.      Class inheritance

 

a.       Consider the following illegal attempt to privatize a method in Java:

public class C {

    private boolean equals(Object o) {

        return super.equals(o);

    }

}

 

Can you defend this language rule?

 

b.      A class with abstract methods is necessarily abstract. Can you provide a compelling scenario for a class that has only non-abstract methods, but is labeled abstract voluntarily by the programmer?

 

c.       Does it make sense for a concrete class to have an abstract subclass?

 

d.      Consider the following illegal Java classes:

 

public class C extends D { }

public class D extends C { }

 

Can you argue why cyclic class inheritance may be undesirable? Here is some elaboration of this question. It is a known fact that inheritance can be “designed away” by means of object composition. Suddenly, the following program is legal:

 

public class C { public D d; }

public class D { public C c; }

 

“How come” object composition works, but inheritance fails?

 

e.      Consider the following two Java classes:

 

public class C {

    protected int x;

}

 

public class D {

    private C c;

    public D() { c = new C(); }

    public int getX() {  return c.x; }

    public void setX(int x) {  c.x = x; }

}

 

Now assume that these classes are in different Java packages. In this case, the methods of class D cannot access the protected member x of class C. (Hence, the program turns out to be illegal.) Can you advice a solution to this access insufficiency? It is required that C and D remain in different packages, and class C is kept unchanged; ditto for the package of C. Hint: we may subclass C; a good name for the new class may be BreakEncapsulation.

 

f.        Consider the following illegal Java class:

public abstract final class C { };

Why is it reasonable to disallow this sort of class?

 

g.       One may think that the “super” construct for invoking methods of the superclass may not be strictly necessary, as far as expressiveness is concerned. That is, one may hope to “copy and paste” the superclass implementation to the derived class. Does this trick always work?

 

3.      Interfaces

 

a.       Java provides neither protected nor private interface members. Why is this reasonable?

 

b.      Suppose you want to replace certain uses of Java arrays by different data structures, possibly even by database access. As a first step, provide an interface that captures the essential operations on Java arrays including indexed access. Another operation is construction parameterized by the number of elements in the array. Can you add a constructor to the Java interface? How could a constructor possibly make sense in an interface?

 

c.       Describe a design scenario (in terms of concepts, attributes, behavior, generalizations, specializations) that clearly benefits from the use of an interface, when compared to class inheritance? What are the precise drawbacks that are encountered when trying to survive with class inheritance alone?

 

d.      Describe a design scenario that clearly benefits from the use of class inheritance as opposed to solely using interfaces for generalization? What are the precise drawbacks that are encountered when trying to avoid class inheritance?

e.      Argue why multiple interface inheritance is less bothersome than multiple class inheritance. Refer to the so-called diamond shape of inheritance where a given class inherits indirectly twice from the same base class.

 

f.        Consider the following legal Java program:

 

public interface I { }

public class C implements I { }

public class D { }

 

In particular, one may have empty interfaces, and one may have classes (empty or nonempty) that implement such interfaces. In your wildest dreams, can you think of any usage scenario for empty interfaces in actual programming or modeling?

 

g.       When you place a type bound on a type parameter (for generics in Java 1.5) then you use the “extends” keyword (such as “T extends Number”), no matter whether you mean to say that the actual type must be a class that “implements” the interface of the bound, or a class that indeed “extends” the class of the bound. Suppose you are part of the Java language team and someone complains to you that more clarity is desirable, i.e., programmers should be able to use both “implements” and “extends” for the sake of precision. Now suppose you don’t want to change Java. Can you come up with an argument such that the sole use of “extends” may be accepted by the petitioner?

 

4.      Modeling versus programming

 

a.       From an UML modeling perspective, what are Java enumeration types good for? That is, what sort of UML diagrams can benefit from enumeration types when implementing them in Java?

 

b.      Consider the concepts of residences and residents. We assume a bi-directional association such that a resident may have multiple residences and a residence may be occupied by multiple residents. The classes for residents and residences will need to provide methods for registration and deregistration. What can possibly go wrong in a Java implementation of this association? That is, what programming errors are you prepared to avoid?

 

c.       Suppose the abovementioned relationship is annotated with cardinality such that each resident has at least one residence. If we assume that this constraint is to be enforced stringently, then it will also affect the construction protocol for residents. How?

 

d.      What sorts of generalizations/specializations are directly supported by UML but require encoding effort by contemporary Java? Identify a sample scenario and sketch a possible encoding.

 

e.      Consider aggregation (or composition) in the sense of “part-of” relationships. Which form of cloning (shallow vs. deep) is appropriate for the compositor class (i.e., the class that represents the whole)? Defend your choice.

 

f.        What rigor is needed in a UML sequence diagram so that it can be turned rather mechanically into a “test case” in the sense of a method body that exercises the scenario?

 

g.       Can you sketch an object model (i.e., class diagram) of basic Nassi-Shneiderman diagrams? That is, devise classes and associations and relationships for the building blocks and the forms of composition in Nassi-Shneiderman diagrams. How is it apparent from the obtained object model that these diagrams are restricted, by definition, to structured programming without “spaghetti capability”? Argue in what sense an object model of flowcharts would fail to possess this desirable property.

 

5.      Predefined types

 

a.       Why is it reasonable that complex numbers are not defined as a primitive type in Java? Please go beyond the universal argument that complex numbers could be derived from existing primitive types, and the implementation could be wrapped up by a user-defined class.

 

b.      Numeric functionality must carefully observe the precision of numeric types as well the out-of-range behavior and the associativity for numeric operations. Provide one test case that shows that associativity of say “*” for doubles matters, i.e., the result of a given computation differs when the default associativity of multiplication is overridden by parentheses.

 

c.       Consider the following loop, which determines an approximation of the machine epsilon for Java’s floating-point arithmetic, i.e., (an approximation of) the difference between 1 and the smallest exactly representable number greater than one:

 

float machEps = 1.0f;

do {

machEps /= 2.0f;

}

while ((float)(1.0 + (machEps/2.0)) != 1.0);

System.out.println(machEps);

 

Can you identify a law that must hold for the involved division operation so that this loop terminates? (The fact that precision is finite does not imply termination just by itself.) How would you use the found (approximation of) the machine epsilon in a test that determines whether given floating-point numbers a, b, c, which are assumed to represent the length of the sides in a triangle, are related by the Pythagorean theorem, i.e., whether c2 = a2 + b2 (modulo permutation of a,b,c)?  (Be pragmatic! A rigorous approach to limiting numerical errors is beyond the scope of this question.)

 

d.      Built-in Java arrays provide a bounded collection type, i.e., the number of elements in the collection must be fixed at the time of constructing the collection object. Can you think of a way to provide an unbounded collection type (presumably a new class) whose implementation nevertheless leverages Java arrays for storing the elements of the collection? (In fact, collection classes like Vector and ArrayList exploit the idea that you need to recover.)

 

e.      Why is the length of a Java string defined by a method as opposed to a constant? Why is the length of a Java array defined by a constant as opposed to a method?

 

f.        The Java class Math defines a number of common operations for numeric types. For instance, there is an operation abs (for absolute value). Why are these operations defined as static methods (as opposed to regular methods)?

 

g.       Consider the following Java code; the test case completes fine:

 

Integer i1 = new Integer(42);

Integer i2 = new Integer(42);

String s1 = new String("42");

String s2 = new String("42");

assertTrue(i1!=i2);

assertTrue(s1!=s2);

s1 = s1.intern();

s2 = s2.intern();

assertTrue(s1==s2);


This code basically illustrates that each construction of an Integer object (by wrapping an int) leads unsurprisingly to a new object that is different from any object that possibly wraps the same int. In fact, the situation seems to be very similar for the case of String objects, except that the intern method allows us to map strings to a canonical representation. Here are two questions: (i) What would it take to provide an intern method for Integer objects? (ii) Why is it arguably less beneficial to try working with canonical Integer objects (when compared to the string case)?

 

6.       Implementation of algorithms

a.       One corollary of the Böhm/Jacopini Theorem is that for any possible flowchart, it is sufficient to use sequential composition of statements, if-then-else selection and pre-test iteration in order to compose a functionally equivalent flowchart. In fact, one while loop is sufficient (as opposed to any number of loops). Transform the following program accordingly:

 

int k = 1;

while (k < 10) {

int p = 1;

while (p < 10) {

System.out.println("k:" + k + " p:" + p);

p++;

}

k++;

}

 

b.      What property of “&&” is essential for the following program to work?

 

String[] x = {"0","1","2","3"};

int i;

for (i = 0; i < 4 && x[i] != null; i++) {}

if (i < 4) x[i] = "42";

 

c.       Suppose “&&” would not be defined by the Java language. Would you be able to define this operator as a (static) method? Please make an attempt and report on your findings.

 

d.      Transform the above program so that you do not rely on the property in question.

 

e.      How would you implement an infinite precision type for natural numbers including simple arithmetic operations as well as conversions to and from int? Provide two solutions: (i) a solution that is strikingly simple in terms of conceptual clarity and thereby easy to prove correct; (ii) a solution that makes an effort to achieve a small memory footprint for huge natural numbers. Both solutions should share an interface.

 

f.        Consider the following program for binary trees; nodes are labeled by strings; there is a left and a right subtree; the method member tests whether a given string occurs as a label in this. Thus:

 

public class BinTree

{

    public String label;

    public BinTree left;

    public BinTree right;

    public boolean member(String s) {

        return

            (label != null && label.equals(s))

         || (left != null && left.member(s))

         || (right != null && right.member(s));

    }

}

 

Why are the non-null tests essential for the design at hand? Propose a revision of the BinTree class so that these non-null tests can be avoided. Hints: (i) you may want to introduce an extra class; (ii) you may need to make some assumptions (and possibly enforce them) regarding the non-null properties of method and constructor arguments.

 

g.       The member method in the above BinTree class is recursively defined. How can you resort to iteration instead? It is too easy to see that you may want to use a stack object to keep track of nodes, as you are walking down the tree, so that you can return to these nodes later on. So let’s find a better question. Can you propose a modest revision of BinTree such that no object whatsoever needs to be created when the member method is executed? Several solutions come to mind; make sure to pick one that delivers a reentrant member method, i.e., a method that allows for concurrent executions of the method.

IEEE TSE Special Issue on Software Language Engineering

CALL FOR PAPERS
IEEE Transactions on Software Engineering
Special Issue on Software Language Engineering
May-June Issue of 2009


IEEE Transactions on Software Engineering seeks original manuscripts for a Special Issue on Software Language Engineering scheduled to appear in the May-June issue of 2009.

The term ‘software language’ comprises all sorts of artificial languages that are used in software-intensive systems and software engineering including programming languages, domain-specific languages, data models, ontologies, and modeling languages. Not always are these languages explicitly described (by grammars, schemas, models, or otherwise); sometimes they are ‘hidden’ in code, coding conventions, or design patterns. If Software Engineering is “the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software” [IEEE Std. Glossary, 1990], then Software Language Engineering (SLE) is the application of a systematic, disciplined, quantifiable approach to the development, use, and maintenance of languages in software engineering. SLE is particularly concerned with (i) the life cycle of software languages including design, implementation, documentation, testing, deployment, evolution, recovery, and retirement; (ii) the treatment of language descriptions as software artifacts, akin to programs, subject to tailored engineering principles and methods such as modularization, refactoring, refinement, composition, versioning, and analysis; (iii) programming support and engineering principles for transformations (or mappings, translations, conversions) between different software languages or different manifestations of the same software language such as those in X/O/R mapping; (iv) the management of coupling and cohesion in software systems caused by the invasive use of languages; (v) consistency management for the uses of languages in software systems; (vi) language integration for interrelated uses of software languages.

 

We solicit high-quality contributions with consolidated and thoroughly evaluated research results in the area of SLE that are worthy of archival publication in the Transactions on Software Engineering. These are the topics of interest:

-          The fundamental SLE concerns (i) – (vi) listed above.

-          Language engineering-related aspects of the following paradigms and application domains:

o    Domain-specific languages – infrastructure, design methods, execution platforms.

o    Programming platforms – API evolution, extensible type systems, IDE support.

o    Web Engineering – schema-based generators, ontology-based annotation.

o    Service Oriented Architectures – platforms, frameworks, service and policy languages.

o    Model-driven development – meta-models, model transformations, and round-tripping.

 

Submitted articles must not have been previously published or currently submitted for journal publication elsewhere. As an author, you are responsible for understanding and adhering to our submission guidelines. You can access them at http://www.computer.org/mc/tse/author.htm. Please thoroughly read these before submitting your manuscript.