Not as easy as it looks

Not as easy as it looks

Rate This
  • Comments 41

My colleague Kevin works on (among many other things) the refactoring engine in the C# IDE. He and I were at the end of last year discussing the possible cases for a hypothetical "eliminate variable" refactoring. I thought that it might be of interest to you guys to get a glimpse of the sorts of issues that the IDE team faces on a daily basis.

First off, what is the "eliminate variable" refactoring? It is seemingly straightforward, but as we'll see, appearances can be deceiving. Suppose you have a local variable that is initialized in its declaration, never written to otherwise, and you have only one place where that variable is used. (A generalization of this refactoring is to replace every usage; for simplicity, let's consider only the single usage case.) For example:

void M()
{
    int x = 2 + R();
    Q();
    N(x);
}

You can eliminate the variable by refactoring the code to:

void M()
{
    Q();
    N(2 + R());
}

Now, obviously this changes the meaning of the code. If Q() throws an exception then now R() is never called, whereas before it was always called. If Q() observed a side effect of R() in the first version, it does not after the refactoring. And so on; there are numerous ways in which this code is different. We assume that the user is all right with these changes; that's why they're using the refactoring.

How, at a high level, would you implement this? It seems straightforward:

  • Find the text of the initializer
  • Find the single usage (which the IDE already knows how to do somehow)
  • Textually replace the single usage with the initializer text
  • Erase the declaration and initialization of the variable

That last step might be slightly tricky if you have something like

    int y, x = 2 + R();

because we have to make sure to erase the comma and the declaration and initialization of x, but not erase the "int y;". But that is not hard to deal with.

You also have to think a bit about comments. What should happen here?

    // x is blah blah blah
    int x = 2 /* blahblah */ + R(); // Blah!
    Q();
    N(x);

Should that become

    // x is blah blah blah
    /* blahblah */  // Blah!
    Q();
    N(2 + R());

Or should any of the three comments before, inside or after the declaration be inserted where x used to be used? Notice that if the one that comes after is inserted, it's going to have to be changed to an inline comment, or a newline is going to have to be inserted in the inside of the call. Also, a comment refers to a local variable that has been eliminated, which seems bogus.

Assume that we can solve these problems. Are we done? Let's apply our refactoring to a few test cases. In every case, we eliminate "x".

    const short s = 123;
    int x = s;
    N(x);

becomes

    const short s = 123;
    N(s);

Is that right? Not necessarily. What if N has two overloads, one for int and one for short? Before we called N(int); now we are calling N(short)! This seems qualitatively different than merely reordering a possible side effect; now our refactoring is changing overload resolution, which seems wrong. Really we should be generating

    const short s = 123;
    N((int)s);

OK, is that the right refactoring? Let's keep on trying test cases.

    int x = 123;
    N(ref x);

becomes... what? We can't say N(ref 123) or even N(ref (int)123).

In this case, the refactoring engine has to fail. There is no way to eliminate the variable when what you're passing is a reference to the variable itself.

How about this?

    int x = R() + S();
    N(Q() * x);

becomes

    N(Q() * R() + S());

Argh, we have now introduced a semantic change because of operator precedence. That should be

    N(Q() * (R() + S()));

How about this?

    Func<int, int> x = z=>z+1;
    object n = x;

becomes

    object n = z=>z+1;

Nope; you can't implicitly convert a lambda to anything other than a delegate or expression type. That's got to be

    object n = (Func<int, int>)(z=>z+1);

Whew. What have we learned so far? To eliminate T x = expr, there are scenarios where you can use:

  • nothing at all; give an error
  • expr
  • (expr)
  • (T)expr
  • (T)(expr)

An exercise for the reader: find scenarios in which none of the above are good enough either. Can you find a scenario in which you have to say ((T)expr)?  How about ((T)(expr))? The latter was the worst I could find easily; are there any more that I missed?

Understandably, we would not want to implement this refactoring in a hypothetical future version of Visual Studio if every time you did an elimination of a variable, the compiler conservatively inserted four parens plus a cast; what we really need are heuristics which can tell us when each of the above forms is necessary. That is a surprisingly tricky problem in language analysis!

And these are just the issues that Kevin and I came up with in a few minutes of whiteboard doodling; it is entirely possible that on deeper analysis, we'll find more interesting problems.

Next time you use the refactoring tools in Visual Studio, think about everything that is happening behind the scenes to make those tools work seamlessly. Like playing the piano, the unseen effort required to make it look easy is enormous.

  • curses, skeet shot.

    anything involving collection initialisers is going to add to the complexity:

    int[] foo = {0,1};

    Console.WriteLine(foo);

    Console.WriteLine({0,1}); // illegal

    Console.WriteLine(new int[] {0,1});

  • curses, skeet shot.

    anything involving collection initialisers is going to add to the complexity:

    int[] foo = {0,1};

    Console.WriteLine(foo);

    Console.WriteLine({0,1}); // illegal

    Console.WriteLine(new int[] {0,1});

  • sorry for the dupe:

    here's another example where you can't do it (ignoring that the operator itself is likely to be error prone wrt side effect)

    int x = 1;

    int a = ++x;

    Console.WriteLine(++a);

    int y = 1;

    Console.WriteLine(++(++y)); // illegal

    I'm sure someone can do worse with implicit and explicit casts but here's a really unpleasant case if you were only dealing with text rather than the abstract syntax tree:

    void Main()

    {

    short y = 0;

    Frober x = y;

    Console.WriteLine(x);

    Console.WriteLine((Frober)y); // wrong

    Console.WriteLine((Frober)((int)y)); // right - but needs the compiler telling it that this happened

    }

    // Define other methods and classes here

    class Frober

    {

    int X = 1;

    public Frober(int i) { this.X = i; }

    public static implicit operator Frober(int i) { return new Frober(i); }

    public static explicit operator Frober(short i) { return new Frober(i + 1); }

    public override string ToString() { return this.X.ToString(); }

    }

  • A somewhat coerced example, but replacing x here would fail since C as a dynamic can only be implicitly cast to int.  The variable cannot be eliminated.

       class C : System.Dynamic.DynamicObject

       {

           public override bool TryConvert(ConvertBinder binder, out object result)

           {

               if (!binder.Explicit && binder.ReturnType == typeof(int))

                   {

                       result = 3;

                       return true;

                   }

               return base.TryConvert(binder, out result);

           }

       }

       class Program

       {

           static void Main(string[] args)

           {

               Action<int> A = i => Console.WriteLine("A(int) called.");

               int x = (dynamic)(new C());

               A(x); // works

               // A ((dynamic)(new C())) and A((int)((dynamic)(new C()))) throw.

           }

       }

  • Eric,

    I question if this type of refactoring is something that you want to encourage.  Let's assume that your first two examples don't have any gotchas like side effects or exceptions thrown in Q.  I would still say that the functions are different from a readability/maintainability standpoint.  For example, I would write the first first function if 2 + R() has some meaning in and of itself.  I would write the 2nd function if 2+R() only means something to N() but doesn't have any meaning in another context.

    I'd be concerned that having Visual Studio do this automatically will encourage collapsing the first function into the second function all the time at the expense of readability.

  • Shuggy, I think the ++ case is pretty much the same as a ref case - you can think of ++ as an operator accepting a ref parameter.

    Eric, on my current project I could give you hundreds of examples, because the client insisted that we use the "tool" StyleCop, with most errors enabled. (the word tool is in quotes here because a tool is something that's supposed to help development and StyleCop does the exact opposite...)

    var route = new

    {

     controller = "A",

     method = "B",

     parameter = "C"

    };

    routes.MapRoute(

     "Name",

     "Path",

     route

    );

    In this example,.you'd get an error for the seemingly fine

    routes.MapRoute(

     "Name",

     "Path",

     new

     {

       controller = "A",

       method = "B",

       parameter = "C"

     }

    );

    SA1118: The parameter spans multiple lines. If the parameter is short, place the entire parameter on a single line. Otherwise, save the contents of the parameter in a temporary variable and pass the temporary variable as a parameter.

    We've learned to avoid having more than one or two parameters in an MVC route because of that... Which I find very sad.

  • Why do you need the extra parentheses around ((Func<int>)(() => 1)) in SSLaks's example? They seem redundant to me.

    Another example where you would need extra parentheses with functions is when they're applied:

    Func<int> x = () => 1;

    int y = x();

    would turn into

    int y = ((Func<int>)(() => 1))();

  • Another example for the extra parentheses: implicit cast and call member

    class Bd {

       public int Value;

       public static implicit operator Bd(int value) { return new Bd { Value = value }; }

    }

    Bd x = 1 + 2;

    Console.WriteLine("The value is " + x.Value);

    would be turned into

    Console.WriteLine("The value is " + ((Bd)(1 + 2)).Value);

  • Here's another interesting case.

    int x = 42;

    var properties = new { x };

    would need to be converted to

    var properties = new { x = 42 };

  • And last one for today: Expressions go more wrong than anything else.

    int x = 1;

    Expression<Func<int>> e1 = () => x;

    Expression<Func<int>> e2 = () => 1;

    These two are completely different.

  • I hope you compensate Kevin well. I'd have been fired by now for making too many tests fail.

  • I'd like to point out that Khoth's example is perfectly valid. The code is syntactically correct C# and I would expect refactorings to work on code that I haven't finished writing yet. Of course that could be a situation in which it couldn't perform this refactoring, but it's certainly a valid input to the refactoring engine.

    Would it give warnings when moving expressions across scopes?

       var x = MightThrow();

       try {

           Foo(x); // can we eliminate x here?

       } catch {

           // swallow all exceptions

       }

    Obviously that could change the semantics of the code more than one might expect. What about this:

       int LockX()

       {

           lock (X) { return Bar(); }

       }

       int LockY()

       {

           int bar = LockX();

           lock (Y) { return Foo(bar); } // can we eliminate bar here?

       }

    Now you've changed the logic from locking then unlocking X and Y in sequence to acquiring X while holding lock Y. If the proper protocol is to acquire X and then Y, you've created a potential deadlock.

       int limit = ExpensiveOperation();

       for (int i = 0; i < limit; i++) // can we eliminate limit here?

       { }

    Eliminating limit there would change the code from doing a single expensive operation to untold numbers of them. A variant of this is capturing variables:

       var obj = new Obj();

       D del = () => Foo(obj); // can we eliminate obj here?

    If we eliminate obj then instead of sharing it among all invocations of del, a new Obj instance will be created for each invocation. One might even consider disallowing a substitution like this:

       foreach (var x in L) {

           var V = x; // create new variable to hold loop iteration variable

           Foo(() => Bar(V)); // can we eliminate V here?

       }

    Whenever I write code like that I put in a comment saying why V is there so that nobody comes along and tries to refactor V away and change what variable is closed over.

    One interesting situation where you would need the parens is here:

       bool J = x<y, K = w>(z);

       F(J, K); // can we eliminate J and K here?

    You can safely replace either J or K without parens. However if you try to replace both J *and* K then you will get either an error or change the overload resolution for F without adding parens around one or both of them..

  • Designing a language explicitly for perfect refactorability (instead of using a syntax compatible with existing mindsets) may be quite interesting.

    How far could the code editing process be shifted from traditional text editing towards snippet selection + refactoring?

  • continuing configurator's puash to find yet more cases requiring ((TYPE)(X)):

    DayOfWeek d = default(int);

    Console.WriteLine(d.ToString());

    Console.WriteLine(((DayOfWeek)(default(int))).ToString());

    msdn.microsoft.com/.../2bxt6kc4.aspx

    and I believe anything with higher precedence than typecast which is Unary and postfix (if prefix as with ~ you can get away without the outer brackets)  which you can make applicable.

  • I also found an interesting case you probably didn't expect: the refactoring can leave an out-parameter unassigned where it previously wasn't. All it takes is that the out-parameter is set in the declaration of the variable to eliminate and a conditional returning branch before the usage of the variable to eliminate. For example:

    void Foo(out int b)

    {

       int x = b = R();//x is the variable to eliminate

       if (Q()) return;//b is assigned

       doSomething(x);

    }

    would after refactoring become

    void Foo(out int b)

    {

       if (Q()) return;//error: b is unassigned

       doSomething(b = R());

    }

Page 2 of 3 (41 items) 123