Practice thinking like a compiler tester

Practice thinking like a compiler tester

Rate This
  • Comments 13

I don’t know why but for some reason I find this little recursive algorithm I ran across in the compiler the other day to be completely hilarious.

For every class in your program we have to emit metadata. For various historical reasons, there are strict rules we must follow about the order in which metadata is emitted:

  • Every class must be emitted exactly once.  (We cannot omit emitting any class and we cannot emit the same class twice.)
  • A derived class must be emitted after its base class.
  • A nested inner class must be emitted after its outer class.

(To simplify this discussion I am omitting interfaces.  A class must also be emitted after all of its interfaces, but for the purpose of this discussion they are just like having multiple base classes.)

We have a recursive algorithm that goes something (but not quite!) like this:

foreach(Class c in TopLevelClasses)
    Emit(c);

void Emit(Class c)
{
    if (c == null || c.Emitted) return; // don’t emit twice!
    Emit(c.BaseClass);
    Emit(c.OuterClass);
    // OK, now we are ready to really emit c,
    // because its base and outer classes are really emitted.
    ReallyEmit(c); // This must only happen once per class, so mark us as emitted.
    c.Emitted = true;
    // Recurse on c’s inner classes, so that we don’t miss them.
    foreach(Class i in c.InnerClasses)
        Emit(i);
}

Does that look correct to you?

Note that at this point we have already guaranteed that base classes form a tree.  You cannot start going up the base class chain and end up back where you started.  So there is no infinite recursive descent along the base class.  Similarly, we already know that outer/inner classes make a well-formed tree.  We’re not going to go up the chain of outer classes and end up back where we started, so there is no infinite recursive descent there either.

We are also not concerned about generic classes, whether the classes in the generic type constraints have been emitted, etc.  Assume C# 1.0 classes.

Nevertheless, there is still a bug in this algorithm.  Can you find a set of legal C# classes which causes this algorithm to violate one of the preconditions I mentioned?

This one is pretty easy. I’ll post an example tomorrow, propose a fix to the algorithm, and pose a slightly harder version of the puzzle then.

  • There is no already emitted check like

    if (!c.Emitted)

    {

      ReallyEmit(c);

    }

  • Tim: First line in the method, unless traversing to the parents will somehow reveal an inheritance loop.

  • Assume the following:  

    1.  Outer is a top level class

    2.  Inner is a public class nested within Outer

    3.  DerivedFromInner is a top level class derived from Outer.Inner

    If DerivedFromInner is processed by the foreach loop BEFORE Outer, then Inner will be emitted twice due to the fact that c.Emitted is only set to true AFTER the call to Emit(c.OuterClass)

  • I do not know enough C# to write a set of classes to expose a bug, but I am thinking you might want to write the top of Emit like this:

       if (c == null ) return; // got no class

       Emit(c.BaseClass);

       Emit(c.OuterClass);

       if ( c.Emitted) return; // don’t emit twice!

  • Rich posted while I was typing...looks like he posited the bug case.

  • RichM:  How do you propose Derived be processed before Outer?

  • kfarmer: Outer and DerivedFromInner are both top level classes and would therefore be contained in the TopLevelClasses collection.  The problem statement does not indicate how items are ordered within TopLevelClasses.  Since this is undefined, I am assuming it would be possible (but not certain) for DerivedFromInner to be processed before Outer.

  • Nice job RichM, that is exactly the example I have on my whiteboard right now.

    kfarmer: the compiler can choose to enumerate the top level classes however it wishes to.  The algorithm it actually uses is something like:

    EmitFiles(files){ foreach(file f in sourcefiles) emitNamespace(f.rootnamespace); }

    EmitNamespace(n) { foreach(class c in n.toplevelclasses) Emit(c); foreach(Namespace nc in n.childnamespaces) EmitNamespace(n); }

    Top level classes in a namespace are enumerated in the order they appear in the source code.

  • So essentially what you have is not a tree, but a DAG (directed acyclic graph), where an edge from A to B indicates A is B's base or outer class.

    Then all you need to do is perform a topological sort using a depth-first search:

    Stack<Class> stack = new Stack<Class>();

    foreach(Class c in TopLevelClasses) Emit(c);

    foreach(Class c in stack) ReallyEmit(c);

    void Emit(Class c) {

       if (!c.Emitted) {

           foreach(Class i in c.InnerClasses) Emit(i);

           foreach(Class d in c.DerivedClasses) Emit(d);

           stack.Push(c);

           c.Emitted = true;

       }

    }

  • So if I take as given that your strict rules are consistent, then

    class Outer : Outer.Inner

    {

       public class Inner

       {

       }

    }

    is not permissible? (Nor if there were some chain of classes separating Outer and Outer.Inner in the inheritence tree)?

  • c:

    From VS2k5:

    Error 1 Circular base class dependency involving 'Scratch.Outer' and 'Scratch.Outer.Inner' C:\...\Projects\Scratch\Scratch\Program.cs 16 11 Scratch

    I always kind of wondered what error 1 was.  :-)

  • Reader RichM found the same solution to the puzzle I posed yesterday that I did: public class V : S.T

  • Yesterday I posed a slightly harder version of the puzzle I posted the day before . Reader Steve found

Page 1 of 1 (13 items)