Not as easy as it looks, Part Two

Not as easy as it looks, Part Two

Rate This
  • Comments 21

Holy goodness, did you guys ever find a lot of additional ways in which an "eliminate variable" refactoring can go wrong. Just a few of your observations: (again, in every case, "x" is eliminated.)

Any situation in which x is being treated as a variable rather than a value will pose a problem. Some obvious ones, like if x is on the left side of an assignment, or the target of a ++, or used as a "ref" or "out" parameter are straightforward. But there are some situations in which it is a bit surprising where x is being used as a variable. Suppose S is a mutable struct with field F:

S x = new S();
x.F = 123;

Here we cannot simply say (new S()).F = 123 because new S() is not a variable. The mutable field requires a storage location to mutate but we don't have a storage location here.

A number of people pointed out that the C# rules about how simple names may be used potentially pose problems:

int y = 2;
void M()
{
  Action x = () => { Console.WriteLine(y); };  // refers to this.y
  {
    string y = "hello";
    x();
  }

The two inconsistent uses of the simple name y are legal here because the declaration spaces in which both are first used do not overlap. But the refactoring causes them to overlap; this needs to be refactored to

void M()
{
  {
    string y = "hello";
    ((Action)(()=>{Console.WriteLine(this.y);})))();
  }

It gets even worse if the inconsistent simple names cannot be fully qualified:

Action x = () => {int z = 123;};
{
  string z = "hello";
  x();
}

Now one of the local zeds must be renamed if the refactoring is to succeed.

You can also get up to mischief with anonymous types:

int x = 42;
var a = new { x };

This must be refactored to

var a = new { x = 42 };

Similarly,

int x = k.y( );
var a = new { x };

cannot be refactored to

var a = new { k.y() };

because that changes the name of the anonymous type property.

Moving an expression around can also muck with the definite assignment rules; this would have to be illegal to refactor:

void Foo(out int b)
{
   int x = b = R();
   if (Q()) return;
   doSomething(x);
}

because it becomes

void Foo(out int b)
{
   if (Q()) return;
   doSomething(b = R());
}

And now we have a method that returns without assigning the out parameter.

I already mentioned that moving an expression around can change the semantics of a program because the order of observed side effects can change. A number of horrid ways in which methods become dependent on side effects were noted, the nastiest of which was:

int x = LocksFoo();
lock (Bar) { return M(x); }

The refactoring changes the order in which Foo and Bar are locked, and therefore might cause a deadlock if other code depends on Foo always being locked before Bar.

Array initializers are a kind of expression that is only legal in a small number of situations; the refactoring has to take that into account:

int[] x = {0,1};
Console.WriteLine(x);

needs to be refactored into

Console.WriteLine(new int[] {0, 1});

And finally, moving an expression around can move it from a checked to an unchecked context, or vice versa:

int zero = 0;
int one = 1;
int x = int.MaxValue + one;
Console.WriteLine(checked(x + zero));

The naive refactoring turns a program which succeeds into one which fails at runtime:

Console.WriteLine(checked(int.MaxValue + one + zero));

To be correct, the refactoring would have to introduce an unchecked block around the first addition!

Thanks all, that was quite amusing.

  • So much pain for so little benefit... Probably not worth including this refactoring in the next version of VS !

  • @Thomas - I assume you're not a Resharper user?

    The funny thing about refactoring tools is how hard is it to see why they're useful until you have them.

  • I think you mean new { x = 42 }

  • Having so much edge cases increases the value as well.  If refactoring manually I have to notice all these issues and account for them myself every time.

  • I think there's one too many closing parenthesis in

    ((Action)(()=>{Console.WriteLine(this.y);})))();

    I thought at first this was a new case where we need to add more than ((Type)(Value)), but it isn't.

  • Not just amusing but it makes you wonder why you'd ever try to erase a variable.  Even with one use point you'd be asking for the above headache, but in the case where there's multiple use-sites you'd end up with a clear answer: the variable must remain because: the solution to introduce a variable free alternative would ... introduce a variable to ensure the side-effects maintain consistency where applicable.

    Example:

    int x = StateChanger();

    int y = x + 1, z = x - 2;

    becomes

    int y = (StateChanger() + 1), z = (StateChanger() - 2);

    The issue with that is obvious.

    In the case of:

    var x = new S();

    x.F = 123;

    You'd end up using:

    new S() { F = 123 };

    Which is the same, but ... object initializers introduce a background variable to do their work.  Ironic.

  • Instead of finding out what are "fail" cases for such refactoring there should be "whitelist" of allowed refactorings. That should be easy enough to do.

  • @Toni: if your tool simply says "scenario is not in whitelist, cannot refactor", then what do you do? Refactor manually, taking very good care of all of the above gotchas. You're bound to miss the more subtle ones. Taking care of complicated gotchas in a systematic fashion is just what a computer program is good for. Even a tool that says "scenario is of the form X, cannot refactor" because it only detects and doesn't know how to correct is more useful.

  • I find this feature really nice, but what I would really like to have is a class reordering feature, like the "Organize using --> Sort using"  but for a class, to reorder it's fields, properties, events, methods and constructors and at the end have the class organized with the same structure of the class diagram.

  • I guess I'm a bit late to the party. Did anybody mention closing over a loop variable?

    Func<int>[] fs = new Func<int>[10];

    for (int i=0; i!= 10; ++i)

    {

    int x = i;

    fs[i] = ()=>x;

    }

    Console.WriteLine(fs[7]());

    This prints "7", but if you apply the naive refactoring it prints "10".

  • One other variant of the "simple names" scenario dealing with static variables:

    class C

    {

     static int y = 2;

     void M()

     {

       Action x = () => { Console.WriteLine(y); };  // refers to C.y

       {

         string y = "hello";

         x();

       }

    Refactor x:

    class C

    {

     static int y = 2;

     void M()

     {

       {

         string y = "hello";

         ((Action)(()=>{Console.WriteLine(C.y);}))();

       }

  • DRBlaise: Static variables can still be disambiguated although not as easily - B.y here would disambiguate just fine.

  • Weeble: That's a good point! Nobody mentioned that...

  • Reordering: But that would destroy the natural order of a class... Take this example, to make things clear.

    Class RectangleCalculator has a public interface which accept a Rectangle, two Points or four integers

    public int Area(Rectangle r);

    public int Area(Point p1, Point p2);

    public int Area(int x1, int y1, int x2, int y2);

    public int Perimeter(Rectangle r);

    public int Perimeter(Point p1, Point p2);

    public int Perimeter(int x1, int y1, int x2, int y2);

    Each of these does the argument validation, and then they delegate to the private CalculateArea and CalculatePermiter.

    Wouldn't it make sense to order the class like this?

    public int Area(Rectangle r);

    public int Area(Point p1, Point p2);

    public int Area(int x1, int y1, int x2, int y2);

    private int CalculateArea(Whatever parameters);

    public int Perimeter(Rectangle r);

    public int Perimeter(Point p1, Point p2);

    public int Perimeter(int x1, int y1, int x2, int y2);

    private int CalculatePerimeter(Whatever parameters);

  • At the beginning the class is good organized and the class entropy is very low. Then a few changes are needed, which are added somewhere in the class structure, but still the class is good organized, but those few changes begin to raise the class entropy. After a couple of years of "few changes" the class entropy is so high that it would take too much time to reorganize it and the risk of a copy paste error is concrete. This means the class has already overcome the point of no return, the entropy now will only grow higher, it's too late, bye bye good organized class, welcome tasty spaghetti class, the can of worms is open and nobody will close it.

Page 1 of 2 (21 items) 12