Quibbling Over Semicolons

In a comment to my last entry Dan Shappir said:

In C/C++/Java/C# statements must end with a semicolon. JavaScript OTOH allows the use of a newline as a statement separator. Presumably this was done so that it would be easier to use for scripters. Personally, I hate the ambiguities it generates.

I'm with you, Dan. I find it irritating as well, but we're stuck with it now.

Let me clarify what is actually going on here. Treating a newline as a legal statement separator in JScript is not the best way to think about this language feature. A better model is that JScript requires some statements to end in a semicolon, but the parser automatically inserts missing semicolons in certain cases. The net result of both models is the same, but I find it easier to think of semicolons as required and the parser as automatically fixing some mistakes.

First off, what statements require semicolons? All the following require a semicolon:

  • an empty statement - a single semicolon is a legal statement
  • a var declaration
  • an expression evaluation statement
  • a do-while loop
  • a continue statement
  • a break statement
  • a return statement
  • a throw statement
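
As a rough illustration of the list above, here is a small sketch that uses each of those statement forms once, with every semicolon written out. The function name demo and the label outer are made up for this example. Note that the block, the if statements and the function declaration itself do not take a terminating semicolon.

```javascript
// Hypothetical example: each statement kind from the list appears once.
function demo(limit) {
  ;                                // empty statement
  var total = 0;                   // var declaration
  var i = 0;
  outer: do {
    i++;                           // expression statement
    if (i === 2) continue outer;   // continue statement (labelled)
    if (i > limit) break outer;    // break statement (labelled)
    total += i;
  } while (true);                  // do-while loop
  if (total < 0) throw new Error("unreachable"); // throw statement
  return total;                    // return statement
}

console.log(demo(4)); // prints 8 (1 + 3 + 4; the iteration with i === 2 is skipped)
```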

The automatic semicolon inserter scans the program looking for places where semicolons are required but missing. It inserts the semicolon automatically, provided that these three conditions are met:

  • The missing semicolon goes before a newline, a right-curly-brace, or the end of the program.
  • Adding a missing semicolon does not create an empty statement.
  • Adding a missing semicolon does not screw up the header of a "for" loop; that is, the inserted semicolon would not become one of the two semicolons inside the parentheses. (See the spec for the exact details.)
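
One way to poke at these three conditions is to ask the engine whether a snippet parses at all. The sketch below assumes an environment where the Function constructor is permitted (for example Node.js); the helper name parses and the sample snippets are my own, not part of any spec.

```javascript
// Hypothetical helper: returns true if the source parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiles the body without running it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

var results = {
  // Condition 1: the semicolon goes before a newline, a "}", or the end.
  beforeNewline: parses("var a = 1\nvar b = 2"),   // true
  beforeBrace:   parses("{ var a = 1 }"),          // true
  atEnd:         parses("var a = 1"),              // true
  sameLine:      parses("var a = 1 var b = 2"),    // false: no newline to hang it on

  // Condition 2: the inserted semicolon may not form an empty statement.
  emptyStatement: parses("if (true)\nelse f()"),   // false

  // Condition 3: no insertion inside the header of a for loop.
  forHeader: parses("for (var i = 0\ni < 3\ni++) {}") // false
};

console.log(results);
```

The snippets are only compiled, never executed, so the undefined f in the fifth case does not matter.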

This leads to a few bizarre situations, because a program like this one is now hard to parse:

a
++
b

Is that a++; b; or a; ++b; ?

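To disambiguate this, JScript restricts where newlines can go. You can't put a newline between an operand and a postfix ++ or --, or between return, throw, break or continue and the expression or label that follows it. (Recall that JScript supports labelled break and continue.) A newline in one of those places is treated as the end of the statement, so the program above means a; ++b;.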
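
A quick way to check how the three-line program is read is to run it and look at which variable changed. This is only a sketch; the initial values are arbitrary.

```javascript
var a = 1;
var b = 1;

// Three lines, no semicolons, exactly as in the text above.
a
++
b

// The newline before ++ ends the first statement, so ++ is a prefix
// operator applied to b.
console.log(a, b); // prints "1 2"
```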

For example, the automatic semicolon inserter turns this:

return
a++

into

return;
a++;

Which means that

if (b) return
M()

means

if (b) return;
M();

and not

if (b) return M();

Which is great, unless of course the latter is what you wanted.
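
Here is a small sketch of that trap in runnable form. The function f and the counter calls are made up for the illustration; M stands in for whatever you meant to return.

```javascript
var calls = 0;
function M() {
  calls++;
  return "from M";
}

function f(b) {
  // Parsed as: if (b) return; M();
  if (b) return
  M()
  return "fell through";
}

console.log(f(true));  // prints undefined; M is never called
console.log(f(false)); // prints fell through; M ran but its result was discarded
```

If the intent was to return the result of M(), keeping the call on the same line as return avoids the problem.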

Auto semi insertion can bite you in many ways. Consider for example:

Number.prototype.blah = function(){ /* whatever */ }
var d = 1, e = 2
var a = d * e
(d + e).blah()

Auto semicolon insertion turns that into

Number.prototype.blah = function(){ /* whatever */ };
var d = 1, e = 2;
var a = d * e // no semi!
(d + e).blah();

because e followed by (d + e) parses as a function call, so var a = d * e(d + e).blah(); is perfectly legal: nonsensical at runtime, but syntactically legal.
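
To see the runtime consequence, the following sketch wraps the two offending lines in a try block and records what gets thrown. The body of blah and the variable caught are placeholders of mine.

```javascript
Number.prototype.blah = function () { return "blah called"; };
var d = 1, e = 2;
var caught = null;

try {
  // No semicolon is inserted after e, so this is one statement:
  // var a = d * e(d + e).blah();
  var a = d * e
  (d + e).blah()
} catch (err) {
  caught = err; // e is a number, so calling it fails
}

console.log(caught instanceof TypeError); // prints true
```

Ending the var a line with an explicit semicolon makes the two lines separate statements again.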

Dan brings up another interesting case in the comments to this post; what about {foo:bar()} ? Is this a statement that consists of just an anonymous object expression with a single member foo equal to the value returned by bar(), so {foo:bar()};? Or is this a code block with a labeled call to bar(), so {foo:bar();}?

It is the latter; the specification already makes it illegal to have a statement which consists of a single expression that begins with a left curly, because obviously allowing that would make it difficult to disambiguate between a block of statements and an anonymous object. The automatic semicolon inserter never even considers the possibility that this might be an illegal program that is also missing semicolons.
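
One way to observe the difference is eval, which returns the value of the last statement it ran. In this sketch bar is a stand-in function; wrapping the text in parentheses is the usual way to force it to be read as an expression.

```javascript
function bar() { return 42; }

// Statement position: a block containing the labelled statement foo: bar();
var asStatement = eval("{ foo: bar() }");

// Expression position: the parentheses make it an object literal.
var asExpression = eval("({ foo: bar() })");

console.log(typeof asStatement);  // prints number
console.log(typeof asExpression); // prints object
console.log(asExpression.foo);    // prints 42
```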

My advice is to use semicolons rather than relying upon the crazy rules for the auto semi inserter.

  • Completely agree
    don't be lazy :)
  • What was the reasoning behind allowing devs to not use semis in the first place?
  • Do JScript and JavaScript (ECMAScript, perhaps) differ in this at all?
  • The ECMA spec mandates auto semi insertion, and to my knowledge, JScript and JavaScript both implement it consistently and according to the spec.
  • Here is another example of an ambiguity stemming from this "feature" (and also from the overloaded meaning of the curly brackets):

    { foo: bar() }

    Is this an anonymous object, with a single member foo equal to the value returned by bar()? Or maybe this is a code block with a labeled call to bar()? How should JScript (or ECMAScript) behave if it encounters a file with just this text?

    I would have much preferred mandatory semi-colons to such ambiguities. Indeed, I would have been in favor of a mandatory use of var to declare variables (even members). These are two examples where trying to make something simpler ends up making it more complex.

    Dan (a reader)
  • > How should ECMAScript behave if it encounters a file with just this text?

    I see what you're getting at, but actually this has little to do with auto semi insertion. Basically, you're asking whether the auto semi inserter should do this

    { foo : bar() }; // object literal

    or this

    { foo : bar(); } // labeled statement in a block

    ?

    But in ECMAScript the former isn't even grammatical, so the auto semi inserter will never produce it! The parser never even considers that it might be an object literal.

    This is not ambiguous because it is simply not grammatical to have a statement consisting of a single expression that begins with a left curly. See section 12.4 of ECMA-262, 3rd edition, for the details.
  • JavaScript is a dynamic language. And that's a problem. On the one hand it allows you virtually

  • Python seems to get along fine without them.

  • Read the ECMA spec. Only then can you make a comment that is sensible.

    however, i expect most to make stupid comments....so whale away!
