Stuart Kent - Building developer tools at Microsoft - @sjhkent
I typed this entry a few days ago, but then managed to lose it through a set of circumstances I'm too embarrassed to tell you about. It's always better second time around in any case.
Anyway, reading this recent post from Simon Johnston prompted a few thoughts that I'd like to share. In summary, Simon likens UML to a craftsman's toolbox, that in the hand of a skilled craftsman can produce fine results. He then contrasts this with the domain specific language approach and software factories, suggesting that developers are all going to be turned into production-line workers - no more craftsman. The argument goes something like this: developers become specialized in a software factory to work on only one aspect of the product through a single, narrowly focussed domain specific language (DSL); they do their work in silos, without any awareness of what the others are doing; this may increase productivity of the individual developers, but lead to a less coherent solution.
Well, I presume this is a veiled reference to the recent book on Software Factories, written by Jack Greenfield and Keith Short, architects in my product group at Microsoft, and to which I contributed a couple of chapters with Steve Cook. The characterization of sofware factories suggested by Simon is at best an over-simplification of the vision presented in this book.
I trained as a mathematician. When constructing a proof in mathematics there are two approaches. Go back to the original definitions, the first principles, and work out your proof from there; or build on top of theorems already proven by others. The advantage of the first approach is that all you have to learn is the first principles and then you can set your hand to anything. The problem, is that it will take you a very long to time to prove all but the simplest theorems, and you'll continually be treading over ground you've trod many times before. The problem with the second approach is that you have to learn a lot more, including new notations (dare I say DSLs) and inevitably end up becoming a specialist in a particular branch of the subject; but in that area you'll be a lot more productive. And it is not unknown for different areas of mathematics to combine to prove some of the more sophisticated theorems.
With software factories we're saying that to become more productive we need to get more domain specific so that we can provide more focused tooling that cuts out the grunt work and let's us get on with the more challenging and exciting parts of the job. As with mathematics, the ability to invent domain specific notations, and, in our case, the automated tools to support them, is critical to this enterprise. And sophisticated factories (that is, most of them) will combine expertise from different domains, both horizontal and vertical, to get the job done, just as different branches of mathematics can combine to tackle tricky problems.
So our vision of software factories is closer to the desirable situation described by Simon towards the end of his article, where he talks about the need for a "coherent set of views into the problem". Each DSL looks at the problem, the software system being built or maintained, from a particular perspective. These perspectives need to be combined with the other views to give a complete picture. If developers specialize in one perspective or another, then so be it, but that doesn't mean that they can sit in silos and not communicate with the others in the team. There are always overlaps between views and work done by one will impact the work of another. But, having more specialized tooling should avoid a lot of error-prone grunt work, and will make the team as a whole far more productive as a result.
So what about UML in all this? To return to Simon's toolbox analogy (and slightly toungue-in-cheek) UML is like having a single hand drill in the toolbox, which we've got to try and use to drill all sizes of hole (for large holes you drill a number of small holes close together), and in all kinds of material; some materials you won't be able to drill into at all. DSLs, on the other hand, is like having a toolbox full of drill bits of all different sizes, each designed to drill into a particular material. And in a software factory, you support your DSLs with integrated tooling, which is like providing the electric hammer-drill: you'll be a lot more productive with these specialist tools, and even do things you couldn't manage before, like drill holes in concrete.
So I don't see UML as a central part of the software factory/DSL story. I see it first and foremost as a language for (sketching) the design of object-oriented programs - at least this is its history and its primary use to date. Later versions of UML, in particular the upcoming UML 2, have tried to extend its reach by adding to the bag of notations that it includes. At best, this bag is useful inspiration in the development of some DSLs, but I doubt very much that they'll get used exactly as specified in the standard - as far as conformance against the standard can be checked that is...
I read on Jim Steel's blog that back in August there was lots of discussion on OMG lists about what makes a compliant MDA tool. I followed a link from his blog to this entry, and there were three aspects that intrigued me.
I used the phrase 'premature standardization' in an earlier post today. I'm rather pleased with it, as it is a crisp expression of something that has vexed me for some time, namely the tendency of standards efforts in the software space to transform themselves into a research effort of the worst kind - one run by a committee. I have certainly observed this first hand, where what seemed to be happening was not standardization of technologies that existed and were proven, but instead paper designs for technology that might be useful in the future. Of course, then I was an academic researcher so was quite happy to contribute, with the hope that my ideas would have a better chance of seeing the light of day as part of an industry standard than being buried deep in a research paper. I also valued the exposure to the concentration of clever and experienced people from the industry sector. But now, as someone from that sector developing products and worrying everyday about whether those products are going to solve those difficult and real problems for our customers, I do wonder about the value of trying to standardize something which hasn't been tried and tested in the field, and, in some cases not even prototyped. To my mind, efforts should be made to standardize a technology when:
Even if all these tests come up positive, it is rarely necessary to standardize all aspects of the technology, just that part which is preventing the competing technologies to interoperate: a square plug really will not fit in a round hole, so my French electrical appliance can not be used in the UK, unless of course I use an adaptor...
If we apply the above tests to technologies for the development of DSLs, I'd say that we currently fail at least two of them. Which means that IMHO standardization of metamodelling and model transformation technologies is premature. We need a lot more innovation, a lot more tools, and, above all, many more customer testimonials that this stuff 'does what it says on the tin'.
Two posts in the same day. I guess I'm making up for the two month gap.
Anyway, Alan Wills and I are giving a tutorial at the UML conference in Lisbon in October. The tutorial is called "How to design and use Domain Specific Modeling Languages" and is on Tuesday 12th October in the afternoon. We promise you not much presentation, interesting exercises and lots of discussion and reflection.
Let's just recap where I've got to so far on the theme of modelling languages and tools. I started out with a reaction to an article by Grady Booch on DSL's, in particular why UML is not really the right tool for the job if this is the direction you want to go. I then talked about code generation, my thoughts prompted by an interesting article by Dan Haywood. Then, in the third entry, I talked about designing a visual language (strictly we should say pictorial or graphical language, as a textual language is also visual), focusing on the difference between designing one on paper and one to be used in a tool.
So what next? Well I'd like to return to the topic of DSLs, in particular try to pin down what is meant by the term 'domain specific language', why we need them, and how we can make it easier to build them. As I seem to be incapable of writing short entries, I've hived off the main content to a separate article.
A domain specific language is a language that's tuned to describing aspects of the chosen domain. Any language can be domain specific, provided you are able to identify the domain it is specific to and demonstrate that it is tuned to describe aspects of that domain. C# is a language specific to the (rather broad) domain of OO software. Its not a DSL for writing insurance systems, though. You could use it to write the software for an insurance system, but it's not exactly tuned to that domain.
So what is meant by the term 'domain'?. A common way to think about domains is to categorize them according to whether they are horizontal or vertical. Vertical domains include, for example: insurance systems, telephone billing systems, aircraft control systems, and so on. Horizontal domains include, for example, the bands in the classic waterfall method: requirements analysis, specification, design, implementation, deployment. New domains emerge by intersecting verticals and horizontals. So, for example, there is the domain of telephone billing systems implementation, which could have a matching DSL for programming telephone billing systems.
Domains can be broad or narrow, where broad ones can be further subdivided into narrow ones. So one can talk about the domain of real-time systems, with one sub-domain being aircraft control systems. Or the domain of web-based systems for conducting business over the internet, with a sub-domain being those particular to insurance versus another sub-domain of those dealing in electrical goods, say. And domains may overlap. For example, the domain of airport baggage control systems includes elements of real-time systems (the conveyer belts etc. that help deliver the luggage from the check-in desks to the aircraft) and database systems (to make a record of all the luggage checked in, its weight and who it belongs to, etc.).
So there are lots of domains. But is it necessary to have a language specific to each of them? Couldn't we just identify a small number of general purpose languages that cover the broad domains, and just use those for the sub-domains as well?
What we notice in this approach is that users demand general purpose languages that have extensibility mechanisms which allow the base language to be customized to narrower domains. There's always a desire to identify domain specific abstractions, because the right abstractions can help separate out the things that vary between systems in a domain and things that are common between them: you then only have to worry about the things that vary when defining systems in that domain.
Two extensibility mechanisms in common use today are:
These mechanisms take you so far, but do not exactly deliver customized languages that intuitively capture those domain specific abstractions - the problem is that the base language gets in the way. Using OO code frameworks is not exactly easy: it requires you to understand all or most of the mechanisms of the base language; then, although you get clues from the names of classes, methods and properties on where the extension points are, there is no substitute for good documentation, a raft of samples and understanding the framework architecture (patterns used and so on). Stereotypes and tagged values in UML are powerful in that you can decorate a model with virtually any data you like, but that data is generally unstructured and untyped, and often the intended meaning takes you a long way from the meaning of the language as described in the standard. Neither OO framework mechanisms or UML extensibility mechanisms, allow you to customize the concrete notation of the language, though some UML tools allow stereotypes to be identified with bitmaps that can be used to decorate the graphical notation.
Instead of defining extensibility mechanisms in the language, why not just open up the tools used to define languages in the first place, either to customize an existing language or create a new one?
Well, it could be argued that designing languages is hard, and tooling them (especially programming languages) even harder. And the tools used to support the language design process can only be used by experts. That probably is the case for programming languages, but I'm not sure it needs to be the case for (modelling) languages that might target other horizontal domains (e.g. design, requirements analysis, business modelling), where we are less interested in efficient, robust and secure execution of expressions in the language, and more interested in using them for communication, analysis and as input to transformations. Analysis may involve some execution, animation or simulation, but, as these models are not the deployed software, it doesn't have to be as efficient, robust or secure. Other forms of analysis include consistency checking with other models, possibly expressed in other DSLs, taking metrics from a model, and so on. Code generation is an obvious transformation that is performed on models, but equally one might translate models into other (non-code) models.
It could also be argued that having too many languages is a barrier to communication - too much to learn. I might be persuaded to agree with that statement, but only where the languages involved are targeted at the same domain and express the same concepts differently for no apparent reason (e.g. UML reduced the number of languages for sketching OO designs to one). Though it is worth pointing out that just having one language in a domain can lead to stagnation, and for domains where the languages and technologies are immature, inevitably there will be a plethora of different approaches until natural selection promotes the most viable ones - unless of course this process is interrupted by premature standardization :-). On the other hand, where a language is targeted on a broad domain, and then customized using its own extensibility mechanisms, the result carries a whole new layer of meaning (OO frameworks, stereotypes in UML), or even an entirely different meaning (some advanced uses of stereotypes). In the former case, there is a chance that someone who just understands the base language might be able to understand the extension without help; in the latter case, I'd argue that the use of the base language can actually hinder understanding, as it replaces the meaning of existing notation with something different.
Finally, whether we like it or not, people and organizations will continue to invent and use their own DSLs. Some of these may never ever get completed and will continue to evolve. Just look at the increasing use of XML to define DSLs to help automate the software development process - input to code generators, deployment scripts and so on. Yes, XML is a toolkit for defining DSLs; it's just that there are certain things missing: you can't define your own notation, certainly not a graphical one; the one you get is verbose; validation of well-formedness is weak.
Am I going to tell you what a toolkit for building domain specific modelling languages should look like? Soon I hope, but I've run out of time now. And I'm sure that some folks reading this will give feedback with pointers to their own kits.
One parting thought. In this entry, I have given the impression that you identify a domain and then define one or more languages to describe it. But perhaps it's the other way round: the language defines the domain…