No Backtracking, Part Two

No Backtracking, Part Two

Rate This
  • Comments 16

As i was saying last time, the nice thing about "no backtracking" is that it makes the language much easier to understand. Simple rules benefit both the compiler and the code reader; both are attempting to read the code to make sense of it. It is not always a good idea to take advantage of the compiler's ability to search a large space if that makes it harder for a human to understand the code.

Suppose you have something like this mess: (*)

namespace XYZ.DEF

    public class GHI {} 

namespace QRS.DEF.GHI 

    public class JKL { } 

... in another file ...

using QRS; 
namespace TUV  
{
    using XYZ;
    namespace ABC
    {
        namespace DEF
        {
            class GHI { }
            class MNO : DEF.GHI.JKL { }
        }
    }

And now we must work out the base type of MNO.

With no backtracking we say "The nearest container of MNO that has a member DEF is ABC, therefore DEF means ABC.DEF". Therefore GHI is ABC.DEF.GHI. Therefore JKL is ABC.DEF.GHI.JKL, which does not exist, therefore, give an error. The developer must fix the error by giving a type name that lets the compiler identify which DEF you meant.

If we had backtracking, what would we have to do? We’d get that error, and then we’d backtrack. Does XYZ contain a DEF? Yes. Does it contain a GHI? Yes. Does it contain a JKL? No. Backtrack again. Does QRS contain an DEF.GHI.JKL? Yes.

That works, but can we logically conclude from the fact that it works that it is the one the user meant?

Who the heck knows in this crazy situation? We got all kinds of good bindings in there that then went bad very late in the game. The idea that we stumbled upon the desired answer after going down so many blind alleys seems highly suspect. Maybe there is yet another choice in there that is the one the user meant. We cannot know that unless we try all of them, and again, that could involve a lot of searching.

The correct thing to do here is not to backtrack multiple times and try out all kinds of worse bindings for every stage of the lookup. The correct thing to do is to say "buddy, the best possible match for this lookup gives nonsensical results; give me something less ambiguous to work with here please."

An unfortunate fact about writing a language where the compiler by design complains loudly if the best match is something that doesn't work, is that developers frequently say "well, sure, in general I want the compiler to point out all my mistakes -- or, rather, all my coworker's mistakes. But for this specific case, I know what I am doing, so please, compiler, do what I mean, not what I say."

Trouble is, you can't have it both ways. You can't have both a compiler that both enforces rigid rules that make it highly likely that suspicious code will be aggressively identified as erroneous and allow crazy code via compiler heuristics that figure out "what I really meant" when you write something that the compiler quite rightly sees as ambiguous or wrong.

I could discuss many more places where we could do backtracking but do not. Method type inference, for example, always either makes progress or fails; it never backtracks in C# (**). But I think I will leave it at that. Except to say that there is one place where we do use backtracking, and that’s analysis of overload resolution in nested lambdas. I wrote about that here.

(*) When reading this remember that in C#, “namespace XYZ.DEF { }” is a short form for “namespace XYZ{ namespace DEF { } }”

(**) Unlike in, say, F#.

  • Lambdas don't need to be nested - you always backtrack when resolving overloads there. But if they're nested, it introduces those millions of cases you talked about.

  • One thing, where backtracking could be used is informing developer of possible solutions of error (something like IntelliSence do, but even more intellegent).

    Small example is from my past: several years ago I had to compile some C-program with some strange C-compiler (I even do not remember what platform it was). And one of very noticeable feature was guessing of possible solutions - something like 'unknown identifier "abc", may be you mean "abd"?'

    For example if we have unknown identifier, we can backtrack for several limited levels and try to guess: namespaces, lookup nearest defined identifiers and check if there is something similar.

    As build servers and development boxes have quite good CPU power, that will not take much time (and can be turned off by default), but can become as compromise between 'backtrack or not'.

  • Olostan, I'm with you. The next step for intellisense is something like Word's spell check which doesn't just highlight the errors, it suggests what you actually meant so you don't need to type it. Just right click and add that semi-colon, 'new' keyword, fix that namespace, add that cast, rename that identifier, stub the missing parameter in the calling code, etc. :)

  • OT, but Eric, is there any way you could enhance NullReferenceException to include more info in a release build? I'd love to just know at least the type that it was expecting. Would help when diagnosing production errors without debug/trace code.

  • My favorite compiler ever, the Metaware Pascal compiler, had a wonderful use of extensive back-tracking and other heureistics:  producing better error messages.    In this example, I'd definitely want the compiler to issue an error message, but the back-tracking could let the error message include "Did you mean QRS.DEF.GHI.JLK?  If so, refer to it as ::global.QRS.DEF.GHI.JLK to avoid this error in the future", on the assumption that in a significant subset of cases it actually is the right answer.  Simiarly, the Metaware compilers would look for similar names in "nearby" scopes when an "identifier not found" error occurred and in the vast majority of cases, would find the correct identifier in a nearby scope.  Of course, the compiler was also incredibly slow compared to contemporaries, like Turbo Pascal, but the error messages were soooo good it was almost pleasurable to have errors in the code!

  • @Olostan - sounds like the MetaWare "High-C" compiler!  I'm not sure who owns MetaWare now - it's one of the embedded developer tools companies, so I'm sure that their compilers are still out there in the wild somewhere.

  • Eric, I'm slightly disappointed you didn't even add

    namespace TUV.DEF.GHI {

    sealed class JKL { }

    }

    as a special case for the backtracker.

  • It could be a fun sport to do "compiler smashing": Find a small program that takes hours to compile or crashes ;-) Although I must say that crashes in csc.exe are incredibly rare.

  • It's all a matter of perspective:  C# performs extensive backtracking all over the place.  For example; overload resolution is equivalent to a form of (non-recursive) backtracking: when you see a method name, you could optimistically resolve the first match, attempt to compile further and upon failure rollback and try another match, typical backtracking.  In practice, the decision to resolve the method name is postponed until the arguments to the call have been analyzed.  Why do you choose to think about *that* form of backtracking as simply a set of possible matches from which the most specific wins (if unique) yet choose to think of other problems as backtracking even though they too could be implemented without backtracking?

    Backtracking is an implementation detail.  What you're talking about is essentially ambiguity that cannot be resolved without extra information.  Backtracking is a poor way of doing so; using lookahead or sets is clearly superior in all but the simplest of situations.

    For instance, the compiler could resolve your example DEF.GHI.JKL by maintaining and processing sets of matching symbols rather than single symbols, and only give an error if the final symbol is ambiguous; if memoized, that approach will scale reasonably even in complex cases, even without memoization it's not as bad as backtracking.  That feature may not be *desirable*, but the *implementation* no longer involves backtracking.  Also, as you say, the language's parser uses lookahead - which is also equivalent to backtracking, yet implemented without it.

    I don't understand how the principle of "No backtracking" can be extended from implementation to design.  Merely because you *can* use backtracking to resolve local ambiguity clearly cannot mean that you must avoid the ambiguity in the first place, since backtracking is such a general solution that it might apply to almost anything.

    So, from a design perspective, what does "No Backtracking" actually proscribe?

  • I guess what I'm trying to say is the following:

    You say that (in programs) ambiguity that cannot be trivially resolved is bad, so its better for the compiler to abort if faced with such ambiguity even if that ambiguity might be resolved by taking context into account.  I'm wondering: where do you draw the line - what kinds of ambiguity are then acceptable, and which are not?

  • @Eamon: I think that when ambiguity can have a solution that would be obvious to the developer, the solution is acceptable. Otherwise, it isn't.

  • @Eamon: I don't think overload resolution can be solved using backtracing (ignoring lambdas). Each expression has a type - that's how the language is defined. Once you've got the list of types, you need to choose the best overload. Not, and this is important, the first overload that matches. If you've got a method overloaded over (object) and (string), and you pass it "a string", the second overload would be the best match. How could a backtracking matcher know to try that second overload first? You'd need to order the methods by how good a match they might be. What if you've got a lot of parameters and plenty of overloads? How would you sort the methods for the backtracker to always try the best match first?

  • @configurator:  You're looking through the glasses of the current design.  If you look at methods the way you look at classes (from a namespace perspective), the C# design chooses to have odd things such as "method groups" and to not resolve a reference to a method to a single method, but to wait until the types and number of arguments are known - why permit this ambiguity.  To be clear: the overload example is merely an example, and it's *not* the core issue, which is that backtracking is an implementation detail, not an inherent feature of (most) problems.  On the matter of your specific issue concerning a backtracking implementation of overload resolution; indeed, the backtracker would need to try more specific solutions before less specific solutions; and doing that would not in all cases result in identical behaviour (at least, not without some fairly tricky fixes since specificity of match is only a partial order).  Anyhow, overload resolution as-is is a fine solution; but it involves ambiguity; apparently *this* ambiguity is acceptable - why?

    I'm not arguing that overload resolution is somehow problematic, I'm asking what makes backtracking somehow special.  Obviously, if you use backtracking to effectively "resolve" ambiguity at random, you're asking for trouble.  But but this blog post and the previous post used examples that could be processed without backtracking and still identify real ambiguity without resorting to backtracking.

    In other words, what does it mean to say you're designing the compiler with "no backtracking" if it's commonly possible to avoid backtracking simply by picking another implementation?

  • I really like this behavior (no backtracking). I only wish that Visual Studio / MSBuild behaved the same way. If a referenced assembly is not found, Visual Studio should FAIL THE BUILD. Instead, it will attempt to locate that DLL (usually finding an old copy in some bin folder). I hate that.

  • @Eamon: What Eric means by "No backtracking" is not the implementation detail (And he can correct me if I'm wrong here). What he means is that when you as a developer read the code DEF.GHI in the example, you know it means the most internal one; so the fact that it's followed by .JKL will not cause you to mentally backtrack and think "Wait, no, now DEF.GHI needs to be something else"; instead it will cause you to think "Hmm, DEF.GHI doesn't contain a JKL! What's going on here?" And that's the thought pattern that the compiler is trying to immitate.

Page 1 of 2 (16 items) 12