C++/CLI specifies several keywords as extensions to ISO C++. The way they are handled falls into five major categories, where only the first impacts the meaning of existing ISO C++ programs.

1. Outright reserved words

As of this writing (November 22, 2003, the day after we released the candidate base document), C++/CLI is down to only three reserved words:

  gcnew   generic   nullptr

An existing program that uses these words as identifiers and wants to use C++/CLI would have to rename the identifiers. I'll return to these three again at the end.
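
To make the cost concrete, here is a small sketch of the kind of legal ISO C++ code that would need renaming. The names count_labels and label are made up for illustration; only generic and gcnew come from the list above. I've left nullptr out on purpose, since later ISO C++ standards reserve it themselves.

```cpp
#include <string>

// Legal ISO C++: neither "generic" nor "gcnew" is reserved by the ISO standard.
// Under C++/CLI as described above, both identifiers would have to be renamed.
struct generic {                 // a type named "generic"
    std::string label;
};

// Hypothetical helper: returns 1 if the label is non-empty, 0 otherwise.
inline int count_labels(const generic& g) {
    int gcnew = 0;               // a local variable named "gcnew"
    if (!g.label.empty()) ++gcnew;
    return gcnew;
}
```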

All the other keywords, below, are contextual keywords that do not conflict with identifiers. Any legal ISO C++ program that already uses the names below as identifiers will continue to work as before; these keywords are not reserved words.

2. Spaced keywords

One implementation technique we are using is to specify some keywords that include embedded whitespace. These are safe: They can't possibly conflict with any user identifiers because no C++ program can create an identifier that contains whitespace characters. [I'll omit the obligatory reference to Bjarne's classic April Fool's joke article on the whitespace operator. :-) But what I'm saying here is true, not a joke.]

Currently these are:

  for each
  enum class/struct
  interface class/struct
  ref class/struct
  value class/struct

For example, "ref class" is a single token in the lexer, and programs that have a type or variable or namespace named ref are entirely unaffected. (Somewhat amazingly, even most macros named ref are unaffected and don't affect C++/CLI, unless coincidentally the next token in the macro's definition line happens to be class or struct; more on this near the end.)
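
Here is a minimal sketch of the idea, not a description of any real compiler's lexer. It assumes input that is already separated by whitespace, and the function names tokenize and is_spaced_prefix are invented for this example. The point is only that ref followed by class or struct is fused into one token, while ref anywhere else stays an ordinary identifier.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Words that form a spaced keyword when followed by "class" or "struct".
inline bool is_spaced_prefix(const std::string& w) {
    return w == "enum" || w == "interface" || w == "ref" || w == "value";
}

// Toy tokenizer: splits on whitespace, then fuses the spaced keywords
// listed above into single tokens. Purely illustrative.
inline std::vector<std::string> tokenize(const std::string& src) {
    std::istringstream in(src);
    std::vector<std::string> words;
    for (std::string w; in >> w; ) words.push_back(w);

    std::vector<std::string> tokens;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const bool has_next = i + 1 < words.size();
        if (has_next && words[i] == "for" && words[i + 1] == "each") {
            tokens.push_back("for each");
            ++i;  // consume "each" as part of the fused token
        } else if (has_next && is_spaced_prefix(words[i]) &&
                   (words[i + 1] == "class" || words[i + 1] == "struct")) {
            tokens.push_back(words[i] + " " + words[i + 1]);
            ++i;  // consume "class"/"struct" as part of the fused token
        } else {
            tokens.push_back(words[i]);  // ordinary token, e.g. an identifier named ref
        }
    }
    return tokens;
}
```

With this sketch, tokenizing "ref class C ;" yields three tokens, the first being "ref class", whereas "int ref = 0 ;" yields five tokens and ref is left alone.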

3. Contextual keywords that can never appear where an identifier could appear

Another technique we used was to define some keywords that can only appear in positions in the language grammar where today nothing may appear. These too are safe: They can't conflict with any user identifiers because no identifiers could appear where the keyword appears, and vice versa. Currently these are:

  abstract    finally    in    override    sealed    where

For example, abstract as a C++/CLI keyword can only appear in a class definition after the class name and before the base class list, where nothing can appear today:

  ref class X abstract : B1, B2 { // ok, can only be the keyword
    int abstract;                 // ok, just another identifier
  };

  class abstract { };             // ok, just another identifier
  namespace abstract { /*...*/ }  // ok, just another identifier

4. Contextual keywords that can appear where an identifier could appear

Some keywords can appear in a grammar position where an identifier could also appear, and this is the case that needs some extra attention. There are currently five keywords in this category:

  delegate    event    initonly    literal    property

In such grammar positions, when the compiler encounters a token that is spelled the same as one of these keywords, the compiler can't know whether the token means the keyword or whether it means an identifier until it first does some further lookahead to consider later tokens. For example, consider the following inside a class scope:

  property int x;  // ok, here property is the contextual keyword
  property x;      // ok, if property is the name of a type

Now imagine you're a compiler: What do you do when you hit the token property as the first token of the next class member declaration? There's not enough information to decide for sure whether it's an identifier or a keyword without looking further ahead, and C++/CLI has to specify the decision procedure -- the rules for deciding whether it's a keyword or an identifier. As long as the user doesn't make a mistake (i.e., as long as it's a legal program with or without C++/CLI) the answer is clear, because there's no ambiguity.

But now the "quality of diagnostics" issue rears its head, in this category of contextual keywords and this category only: What if the user makes a mistake? For example:

  property x;      // error, if no type "property" exists

Let's say that we set up a disambiguation rule with the following general structure (I'll get specific in just a moment):

  1. Assume one case and try to parse what comes next that way.
  2. If that fails, then assume the other case and try again.
  3. If that fails, then issue a diagnostic.

In the case of property x; when there's no type in scope named property, both #1 and #2 will fail and the question is: When we get to the diagnostic in step #3, what error message is the user likely to see? The answer is almost certainly a message that applies to the second "other" case. Why? Because the compiler already tried the first case, failed, backed up and tried the second "other" case -- and it's still in that latter mode with all that context when it finally realizes that didn't work either and now it has to issue the diagnostic. So by default, absent some (often prodigious) amount of extra work inside the compiler, the diagnostic that you'll get is the one that's easiest to give: the one for the case the compiler was most recently pursuing, which is the "other" case mentioned in #2.

So let's get specific. Let's say that the rule we picked was:

  1. Assume that it's an identifier and try to parse it that way
     (i.e., by default assume no use of the keyword extension).
  2. If that fails, then assume that it's the keyword and try again.
  3. If that fails, then issue a diagnostic.

Under that rule, what's the diagnostic the user gets on an illegal declaration of property x;? One that's in the context of #2 (keyword), something like "illegal property declaration," perhaps with a "the type 'x' was not defined" or a "you forgot to specify the type for property 'x'" in there somewhere.

On the other hand, let's say that the rule we picked was:

  1. Assume that it's the keyword and try to parse it that way.
  2. If that fails, then assume that it's an identifier and try again.
  3. If that fails, then issue a diagnostic.

Under this rule, the diagnostic that's easy to give is something like "the type 'property' was not defined."

Which is better?

This illustrates why it's very important to consider common mistakes and whether the diagnostic the user will get really applies to what he was probably trying to do. In this case, it's probably better to emit something like "no type named 'property' exists" than "you forgot to specify a type for your property named 'x'" -- the former is more likely to address what the user was trying to do, and it also happens to preserve the diagnostics for ISO C++ programs.

More broadly, of course, there are other rules you can use than the two "try one way then try the other" variants shown above. But I hope this helps to give the flavor for the 'quality of diagnostics' problem.
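
The two "try one way then try the other" orderings can be sketched in a few lines of ordinary C++. This is a toy model under heavy assumptions, not how any real compiler is structured: a declaration is just a list of tokens, the only keyword handled is property, and every name here (Attempt, parse_as_identifier, parse_as_keyword, diagnose, Order) is invented for illustration. What it does show is that the diagnostic you get for free is the one from the second attempt.

```cpp
#include <set>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

struct Attempt {
    bool ok;
    std::string diagnostic;  // meaningful only when !ok
};

// Identifier reading: "<type> <name> ;" where <type> must be a known type.
inline Attempt parse_as_identifier(const Tokens& t, const std::set<std::string>& types) {
    if (t.size() == 3 && t[2] == ";") {
        if (types.count(t[0]) != 0) return {true, ""};
        return {false, "the type '" + t[0] + "' was not defined"};
    }
    return {false, "expected '<type> <name> ;'"};
}

// Keyword reading: "property <type> <name> ;".
inline Attempt parse_as_keyword(const Tokens& t, const std::set<std::string>& types) {
    if (t.empty() || t[0] != "property")
        return {false, "not a property declaration"};
    if (t.size() == 3 && t[2] == ";")
        return {false, "illegal property declaration: no type specified for property '" + t[1] + "'"};
    if (t.size() == 4 && t[3] == ";" && types.count(t[1]) != 0)
        return {true, ""};
    return {false, "illegal property declaration"};
}

enum Order { IdentifierFirst, KeywordFirst };

// Returns an empty string on success, otherwise the diagnostic of the
// attempt made last, because that is the context still at hand.
inline std::string diagnose(const Tokens& t, const std::set<std::string>& types, Order order) {
    const Attempt first = (order == IdentifierFirst) ? parse_as_identifier(t, types)
                                                     : parse_as_keyword(t, types);
    if (first.ok) return "";
    const Attempt second = (order == IdentifierFirst) ? parse_as_keyword(t, types)
                                                      : parse_as_identifier(t, types);
    if (second.ok) return "";
    return second.diagnostic;
}
```

Feeding it the tokens of property x; with no type named property in scope, the keyword-first ordering reports the missing type 'property', and the identifier-first ordering reports an illegal property declaration. Legal input such as property int x; succeeds under either ordering.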

  • Aside: There's usually no ambiguity in the case of property (or the other keywords in this category); the only case I know of where you could write legal C++/CLI code where one of these five keywords could be legally interpreted both ways, both as the keyword and as an identifier, is when the type has a global qualification. Here's an example courtesy of Mark Hall:

       initonly  ::  T  t;

    Is this a declaration of an initonly member t of type ::T (i.e., initonly  ::T  t;), or a declaration of a member t of type initonly::T (i.e., initonly::T  t; which is legal ISO C++ if initonly is the name of a namespace or class)? Our current thinking is to adopt the rule "if it can be an identifier, it is," and so this case would mean the latter, either always (even if there's no such type) or perhaps only if there is such a type.
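
For reference, here is what the ISO C++ reading of that line looks like in a complete program. The enclosing struct Widget and the member value are made up for the example, and both a global T and an initonly::T are declared so that either interpretation would at least name an existing type. In plain ISO C++ the whitespace around :: is not significant, so the member has type initonly::T.

```cpp
#include <type_traits>

struct T { };                        // a global type, ::T

namespace initonly {
    struct T { int value; };         // a different type, initonly::T
}

struct Widget {                      // hypothetical enclosing class
    initonly  ::  T  t;              // ISO C++: member t of type initonly::T
};

static_assert(std::is_same<decltype(Widget::t), initonly::T>::value,
              "ISO C++ parses the member as initonly::T");
static_assert(!std::is_same<decltype(Widget::t), ::T>::value,
              "the member is not of the global type ::T");
```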

I feel compelled to add that the collaboration and input over the past year-plus from Bjarne Stroustrup and the folks at EDG (Steve Adamczyk, John Spicer, and Daveed Vandevoorde) have been wonderful and invaluable in this regard specifically. It has really helped to have input from other experienced compiler writers, including in Bjarne's case the creator of the first C++ compiler and in EDG's case the folks who have one of the world's strongest current C++ compilers. On several occasions their input has helped get rid of inadvertent assumptions about "what's implementable" and "what's diagnosable" based on just VC++'s own compiler implementation and its source base. What's easy for one compiler implementation is not necessarily so for another, and it's been extremely useful to draw on the experience of comparing notes from two current popular ones to make sure that features can be implemented readily on various compiler architectures and source bases (not just VC++'s) and with quality user diagnostics.

5. Not keywords, but in a namespace scope

Finally, there are a few "namespaced" keywords. These make the most sense for pseudo-library features (ones that look and feel like library types/functions but really are special names known to the compiler because the compiler does special things when handling them). They appear in the stdcli namespace and are:

  array    interior_ptr    pin_ptr    safe_cast

That's it.


Now, for a moment let's go back to case #1, reserved words. Right now we're down to three reserved words. What would it take to get down to zero? Consider the cases:

  • nullptr: This has been proposed in WG21/J16 for C++0x, and at the last meeting three weeks ago the evolution working group (EWG) was favorable to it but wanted a few changes. The proposal paper was written by me and Bjarne, and we will revise the paper for the next meeting to reflect the EWG direction. If C++0x does adopt the proposal and chooses to take the keyword nullptr then the list of C++/CLI reserved words goes down to two and C++/CLI would just directly follow the C++0x design for nullptr, including any changes C++0x makes to it.
  • gcnew: One obvious way to avoid taking this as a reserved word would be to put it into bucket #2 as a spaced keyword, "gc new".
  • generic: Similarly, a spaced keyword (possibly "generic template") would avoid taking this reserved word. Unfortunately, spelling it "<anything> template" is not only ugly, but seriously misleading because a generic really is not at all a template.

Is it worth it to push all the way down to zero reserved words in C++/CLI? There are pros and cons to doing so, but I've certainly always been sympathetic to the goal of zero reserved words; Brandon and others will surely tell you of my stubborn campaigning to kill off reserved words (I think I've killed off over a half dozen already since I took the reins of this effort in January, but I haven't kept an exact body count).

I think the right time to decide whether to push for zero reserved words is probably near the end of the C++/CLI standards process (summer-ish 2004). At that point, when all other changes and refinements have been made and everything else is in its final form, we will have a complete (and I hope still very short) list of places where C++/CLI could change the meaning of an existing C++ program, and that will be the best time to consider them as a package and to make a decision whether to eliminate some or all of them in a drive-it-to-zero cleanup push. I am looking forward to seeing what the other participants in all C++ standards arenas, and the broader community, think is the right thing to do as we get there.

Putting it all together, what's the impact on a legal ISO C++ program? Only:

  • The (zero to three) reserved words, which we may get down to zero.
  • Macros with the same name as a contextual keyword, which ought to be rare because macros with all-lowercase names, never mind names that are common words, are already considered bad form and liable to break way more code than just C++/CLI. (For example, if a macro named event existed it would already be breaking most attempts to use Standard C++ iostreams, because the iostreams library has an enum named event.)

Let me illustrate the macro cases with two main examples that affect the spaced keywords:

  // Example 1: this has a different meaning in ISO C++ and C++/CLI
  #define interface struct

In ISO C++, this means change every instance of interface to struct. In C++/CLI, because "interface struct" is a single token, the macro means instead to change every instance of "interface struct" to nothing.

Here's the simplest workaround:

  // Workaround 1: this has the same meaning in both
  #define interface interface__
  #define interface__ struct
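
As a sanity check on the ISO C++ side only, here is a small sketch showing that the two-step macro still ends up at struct after ordinary macro rescanning. The type Shape and the function triangle_sides are invented for the example, and the #undef lines at the end are my addition to keep the macros from leaking into later code.

```cpp
// Workaround 1 under ISO C++ preprocessing: interface -> interface__ -> struct.
#define interface interface__
#define interface__ struct

interface Shape {        // expands to: struct Shape {
    int sides;           // public by default, since Shape is a struct
};

// Hypothetical helper used only to show that Shape behaves like a plain struct.
inline int triangle_sides() {
    Shape s = { 3 };     // aggregate initialization of the single member
    return s.sides;
}

#undef interface
#undef interface__
```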

Here's another example of a macro that can change the meaning of a program in ISO C++ and C++/CLI:

  // Example 2: this has a different meaning in ISO C++ and C++/CLI
  #define ref const
  ref class C { } c;

In ISO C++, ref goes to const and the last line defines a class C and simultaneously declares a const object of that type named c. This is legal code, albeit uncommon. In C++/CLI, the macro has no effect on the class declaration because "ref class" is a single token (whereas the macro is looking for the token ref alone, not "ref class") and so the last line defines a ref class C and simultaneously declares a (non-const) object of that type named c.
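
The ISO C++ half of that claim can be checked mechanically. In this sketch I've made two changes that are not in the example above: C gets a user-provided default constructor, since a const object of class type declared without an initializer generally needs one, and the macro is undefined right after use. The helper c_is_const is made up for illustration. Note that the header is included before the macro is defined, which is exactly the kind of care a macro named ref demands.

```cpp
#include <type_traits>   // include headers before defining the macro

#define ref const
ref class C { public: C() {} } c;   // ISO C++: const class C { ... } c;
#undef ref

static_assert(std::is_const<decltype(c)>::value,
              "in ISO C++, the macro makes c a const object");

// Hypothetical helper exposing the same fact at run time.
inline bool c_is_const() {
    return std::is_const<decltype(c)>::value;
}
```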

Here's the simplest workaround:

  // Workaround 2: this has the same meaning in both
  #define REF const
  REF class C { } c;

But hey, macro names are supposed to be uppercase anyway. :-)

I hope these cases are somewhere between obscure and pathological. At any rate, macros with short and common names are generally unusual in the wild because they just break so much stuff. I would rate example 1 above as fairly obscure (although windows.h has exactly that line in it, alas) and example 2 as probably outright pathological (as I would rate all macros with short and common names).

Whew. That's all for tonight.