Compiled, interpreted, whatever

Every now and then, people talk about "compiled" versus "interpreted" languages, and how they are different. Whilst there are obvious technical differences between the two, the degree to which the end-user / developer can tell the difference is mostly a function of the language design and implementation. This topic came up yet again on the JScript .NET newsgroup (note: the indentation / colour coding on the thread is not entirely correct...) which I now unfortunately frequent quite rarely, and rather than send a long reply there, I thought I'd send it here ;-)

Now, I'm not strictly speaking a compiler guy, and I never took compiler courses in college nor had any deep interest in all the theoretical stuff, so sue me if I get something wrong (on second thoughts, please don't!). Nevertheless, I am interested in language (human or otherwise) and was the PM for JScript / JScript .NET at Microsoft for several years, where I had the privilege of working with some very talented people on a compiler and runtime that blurred the distinction between the two concepts. Tyson would know much much much more about this stuff, since he actually worked on a team at Melbourne University that implemented a really weird language <g> called Mercury, which even runs on the CLR!

 

So what's the difference between compiled and interpreted languages? JScript .NET is 100% compiled and yet has an "error recovery" mode that can deal with most compile time errors (including horrible syntax errors) and still produce correct output, for some reasonably low values of "correct". Check the VsaDemo sample on GotDotNet and play with it for a bit. Paste in some VB code, check the "Always run" box, and see what you get out of it ;-). (Note: you may need to move the "txtWhatever.MultiLine = true" statements around in the source files and recompile if you are running on the "Everett" version of the CLR...)

 

Anyway, what most people tend to be talking about, I think, is essentially how strict the language is. Interpreted languages tend to be less strict, often out of necessity -- you can't detect that a program is well formed if you haven't even 'seen' the whole program yet -- and often by design -- trading performance and robustness for developer flexibility. Compiled languages tend to be more strict, mostly because people want them that way, but also because it is much easier to write a strict language compiler than it is to write a loose language compiler / runtime. Sometimes other things come into play, for example speed of compilation. When you are building large programs, the time it takes to compile them actually matters. You can write a much faster compiler if you fail early and fail often. For example, the C# compiler did (and may still do; not sure) very little error recovery. If you had something outside of a class -- anything -- it gave you a generic "only using statements and class declarations are allowed" error. If you missed a semi-colon or closing brace, it got very upset. It takes time and effort to take care of these things, and that costs CPU cycles.

Another thing people confuse is syntax errors ("compiler errors") and type errors. Generally speaking, a compiler error is something that is wrong with the grammar of the language -- caused by things such as missing out a brace, forgetting to close a comment block, or copying-and-pasting code from Office after it has "Smart"-erised your quotes -- that stops the compiler from even figuring out the rest of the program. If it doesn't know where one statement ends and the next begins, or if it sees a weird character that it has not been told how to handle, it gets very confused indeed. Muchasifitypethiswholesentenceinonewordandyoucanttellwhereonewordendsandthenextstartsexceptforthefactthatyouareaverycleverhumanbeingandcanactuallyreadthissentencewithlittleornotroubleatall. (Word says "Fragment (consider revising)" hehehe).
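
To make that concrete, here's a made-up snippet (not from anyone's real code) showing the kind of errors I mean:

  function greet(name {           // oops -- the closing parenthesis is missing,
    return "Hello, " + name       // so the parser loses track of where the
  }                               // function ends and everything below is suspect
  var msg = “Hi there”            // "smart" quotes pasted in from Word --
                                  // characters the scanner simply doesn't know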

Most errors though are actually type errors -- the developer is trying to use a value in a way that is not appropriate for values of a given type, for example trying to multiply a string by a boolean (egad!). Whilst some classes of compiler errors will break even interpreters -- if you feed complete gobbledegook (wow, that's in Word's dictionary!) into them, they'll get confused -- other classes of compiler errors may or may not break interpreters depending on how they work. For example, if I'm interpreting a program line-by-line, it doesn't really matter what the next line looks like until I get there, and depending on the instructions in the current line, I may never get to the next line. Conversely, if I hit an if statement and it evaluates to false, even an interpreter must start looking forward to find the else or end if statement, although it may be able to look only at (eg) new lines and therefore treat the contents of the lines themselves as opaque blobs, thereby bypassing any syntactic errors in the lines themselves.
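
As a sketch (this is a hypothetical line-by-line interpreter I've invented for illustration, not how any real JScript engine works -- see Eric's comment below for what actually happens), consider:

  var x = 10
  if (x > 5) {
    print("x is big")             // this branch runs happily
  } else {
    x.noSuch(Method(, , )         // gibberish, but an interpreter that only
  }                               // scans forward for the closing brace
                                  // never even looks inside this line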

Type errors, on the other hand, don't stop the compiler from looking at the rest of the program, but they do generally cause some kind of warning (or error) because, being a helpful bit of software, the compiler likes to tell you when you are about to shoot yourself in the foot. The compiler generally knows what operations are valid on any given variable, and if you try and perform an invalid operation it will let you know with some kind of "type mismatch" or "no such member" or "operation not allowed" error. In general, programmers want type errors to be caught as soon as possible (ie, at compile time, or even at "authoring time" if you have a cool enough editor) because this makes them more productive and cuts down entire classes of bugs that are tricky to find at runtime (just ask any script programmer!). But such checks also make it tedious for programmers writing simple programs, because they must think upfront and tell the computer exactly what they are going to use a value for ("name is a string", "count is a number") even though these things really should just be obvious, right? It also means you can't use the same variable to hold a string, a number, and a date at different times in the program; you must have three separate variables to satisfy the compiler.
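
In JScript .NET terms (hedging a little on my memory of the exact annotation syntax), the trade-off looks something like this:

  // With type annotations the compiler knows what you mean up front...
  var name : String = "Jane"
  var count : int = 42
  count = "forty-two"             // type mismatch -- JScript .NET will object,
                                  // at compile time in the best case

  // ...but without annotations one variable can happily hold anything,
  // and mistakes only show up at runtime (if at all).
  var thing = "Jane"
  thing = 42
  thing = new Date()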

Since JScript classic has no way for the developer to express the desired type of a variable, it is impossible for the compiler to determine at compile time whether the operations you want to perform on variables are valid. Furthermore, because JScript allows the dynamic modification of types via prototype chains, it wouldn't be possible to do this (in general) even if the user could give type information. For example:

  "Hello, world".getFullYear()

looks like it should cause a compiler error, since "Hello, world" is obviously a string, and getFullYear is a method on the Date object, not the String object, but in fact it will succeed if somewhere else in the program I have something like:

  function String.prototype.getFullYear()
  {
    return "But I'm not a Date object!"
  }

Couple this with the fact that JScript almost certainly (90% sure, but I didn't write the code) generalises operations on literals to be the same as operations on arbitrary expressions, and the fact that it will perform conversions between numbers, strings, etc. quite liberally at runtime, and the upshot is that you can't give decent errors at compile time with JScript.
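
For example, these all "work" in JScript thanks to the liberal runtime conversions:

  "2" * 3          // 6     -- the string is converted to a number
  "2" + 3          // "23"  -- the number is converted to a string
  1 + true         // 2     -- the boolean is converted to a number
  "Hello" * true   // NaN   -- even nonsense runs; it just gives NaN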

JScript .NET does much better at this, even in the absence of type information, due to its type inferencing capabilities. For example, it will give a compiler error for this code:

  function foo()
  {
    var s = "Hello"
    print(s.bar())
  }

because it knows that s must contain a string (it is a local variable that is only ever assigned to once), and it knows that strings do not have a bar method. (Unlike in the earlier getFullYear example, JScript .NET supports a "fast" mode that disallows such modifications of the object model, precisely so the compiler can reason more usefully about the program and give you such errors.)
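
Explicit annotations help in the cases inference can't reach -- a parameter, say, could be passed anything by a caller, so (again hedging on my memory of the syntax) you tell the compiler what you expect and it can give the same error:

  function foo(s : String)
  {
    print(s.bar())                // same complaint, no inference needed:
  }                               // String has no bar method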

As a broad rule of thumb, "interpreted" languages tend to be better for writing small programs, whilst "compiled" languages tend to be better for writing large programs. Developing with "interpreted" languages tends to be faster for small projects, or projects where you might be building and testing in small increments. For example, if I have a 100 line program but I'm only interested in running 20 lines of it, it shouldn't matter whether the remaining 80 lines all have "compiler errors" or not -- I should just be able to run those 20 lines. "Interpreted" languages let you do this, whilst "compiled" languages do not. But you'd never write Windows or Office in an interpreted language, not only for performance reasons, but also because it would be impossible to debug. (I use the words in "quotes" because, as I mentioned above, these are generalisations, and languages such as JScript .NET are fully compiled and yet still have properties such as the one I've just described).
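
A tiny made-up illustration: in classic JScript the whole file still has to parse (see Eric's comment below), but errors of the "no such function" variety in code you never run will never bother you:

  // The 20 lines I actually care about today:
  doTheUsefulWork()

  // The other 80 lines, full of problems, but never executed:
  function brokenStuff()
  {
    frobnicate(the, wrong, things)   // 'frobnicate' doesn't exist anywhere,
  }                                  // but the script engine only complains
                                     // if this function is actually called

  function doTheUsefulWork()
  {
    // ... the bit that works ...
  }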

Anyway, I've talked enough rubbish for one day. I need to go to sleep so I can get up for another day of filming tomorrow [looks at watch] errrr today. Yup, I like to crew on independent films in my spare time (and was even roped into some bit-part acting today - wow, that was unexpected!)

  • > Couple this with the fact that JScript almost certainly (90% sure, but I didn't write the code) generalises operations on literals to be the same as operations on arbitrary expressions

    Correct. In theory, the interpreter should take the literal, create a new in-memory object wrapper around the value, and then call the appropriate prototype method on the wrapper, and then destroy the wrapper. In practice, the engine is a little smarter than that -- we detect this situation at runtime and dispatch to the appropriate method without actually doing the work of creating an in-memory item. Eric
  • > Every now and then, people talk about "compiled" versus "interpreted" languages, and how they are different.

    This distinction is both arbitrary and vague, and indeed, JScript Classic blurs the line considerably. JScript Classic acts like a compiled language in the sense that before any JScript Classic program runs, we fully syntax check the code, generate a full parse tree, and generate a bytecode. We then run the bytecode through a bytecode interpreter. In that sense, JScript is every bit as "compiled" as Java. The difference is that JScript does not allow you to persist or examine our proprietary bytecode. Also, the bytecode is much higher-level than the JVM bytecode -- the JScript Classic bytecode language is little more than a linearization of the parse tree, whereas the JVM bytecode is clearly intended to operate on a low-level stack machine.

    JScript .NET is even harder to characterize as "compiled" or "interpreted". Like JScript Classic, the JScript .NET compiler produces bytecode, but this time it is Common Language Runtime Intermediate Language (IL) bytecode. Instead of interpreting the bytecode, the CLR JIT-compiles it into native machine language. But that's not all -- to implement features like "eval", JScript .NET also provides the ability to generate a parse tree and then run an interpreter _directly_ on the parse tree. In JScript .NET, calling "eval" does not generate a byte code which is then interpreted, it generates a parse tree which can interpret itself.

    "Compiled" and "interpreted" have ceased to be useful buckets for categorizing programming languages. There are some "interpreted" languages which are faster than some "compiled" languages, there are some "compiled" languages which are more flexible than some "interpreted" languages -- at some point you have to look at the actual relevant characteristics of each tool rather than trying to summarize them as "just another interpreted language" or "just another compiled language". Eric
  • Just to clear my innocence, I would like to ask everyone to provide an example of a compiled language and an interpreted language, and why you say so.
