I wanted to write a post today about a very interesting bug we just came across related to refactoring.  As I just blogged about verification, I thought that this would be a good follow-up.  First, a bit of background.  Referencing the previous post, we can see the following Q/A about refactoring:
Why is it important? Well, if we look at Martin Fowler's Refactoring Site, we'll find the following information:
Q: What is Refactoring?
A: Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.
Now, let's dissect that "answer".  What does "without changing its external behavior" mean?  Well, to me, it means that the semantics of the code (as defined by the language) have not been changed.  Why do I limit myself to just "as defined by the language"?  Because, frankly, any other definition would be too hard to implement :)  For example, if you perform an "extract method" on some code that sits in a tight loop, and the JIT doesn't inline the new method, then you'll have changed the external behavior with respect to performance.  However, there's no good way to detect this, and most people don't care.  Similarly, if your code makes decisions based on runtime type information (i.e. it uses reflection), then it's possible to change its external behavior, because suddenly this alteration of internal structure becomes visible.  However, customers have spoken and have stated that they're fine with these caveats, and so we accept them. 
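That reflection caveat is worth a concrete illustration.  Here's a minimal sketch (in Python rather than C#, and with entirely hypothetical names) of how a rename that is perfectly safe "as defined by the language" can still change observable behavior when code looks up names at runtime:

```python
# Hypothetical example: a rename that is safe by the language's static
# semantics can still change behavior when names are resolved at runtime.

class Account:
    def compute_interest(self):  # suppose a refactoring renames this method
        return 42

def call_by_name(obj, name):
    # The method is found by its *string* name; a rename tool that only
    # tracks static references cannot see this as a reference to rename.
    return getattr(obj, name)()

print(call_by_name(Account(), "compute_interest"))  # prints 42
# After renaming compute_interest -> calc_interest, this call would raise
# AttributeError, even though no static reference was missed.
```

The same trap exists with C# reflection (`Type.GetMethod("compute_interest")` and the like): the "internal structure" leaks out through the string.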

Now, here's where it gets a bit interesting.  Let's look at my original definition: "it means that the semantics of the code (as defined by the language) have not been changed."  Why is this interesting?  Well, consider the case of code with errors in it.  The language defines such code to be semantically meaningless.  So, at that point, almost any change can be made and still be called a refactoring, since the semantics won't change (i.e. meaningless code stays meaningless).  In fact, the only change a refactoring could not make would be to take this broken code and make it non-broken (since that would change its semantics from meaningless to meaningful).  Early on in VS2005 we considered making it so that you could only run refactorings if your code was compilable.  We felt that this went with the spirit of refactoring, and it also made the refactoring job orders of magnitude easier, since trying to make sense out of broken code is extraordinarily difficult (my full-time job, in fact!).  However, based on a huge amount of customer feedback, we decided that that was a bad idea.  Customers still wanted to be able to do things like extract a method even if their code was broken.  And they were willing to deal with the fact that the result might not be perfect if the code wasn't in good shape.  However, they wanted to be made aware that this might be the case, and so we decided to pop up a warning every time you did a refactoring on unbuildable code.  Specifically:


Seems pretty sane, right?  We give you the power of a lot of code manipulation tools at your fingertips, but we warn you that we might not always be able to do everything correctly.  And because we have a preview mechanism, you can go in and see what we're going to do, to decide if it meets your needs.

So what's the problem?  Well, based on quite a bit of customer feedback (as well as a nasty bug that I opened myself), we decided to invest a lot of time in improving the performance of refactorings.  We felt that they were far too slow, and that that would limit their usefulness.  At this point I feel that they're still slower than we'd like, but fast enough to be used as part of the regular development cycle.  For example, on the fairly meaty project that I work on, they take on the order of 2-3 seconds.  I'd like them to be instantaneous, but it's far better than the 2-3 *minutes* that they used to take!  (Note: it's a known problem that the more projects you have, the more degradation you will see; we'll work on that more in the future.)  The performance optimizations came in several parts.  First off, we realized that we were duplicating a whole lot of work, and that it was trivial to make sure we did that work only once.  However, we also added what we thought were some pretty snazzy heuristics to improve performance.  For example, if you're renaming a variable, then we will not compile any methods in your source code that don't contain the identifier you're renaming.  (The C# compiler does two passes: the type system outside of methods, and then the compilation of individual methods one by one.)  This latter optimization was a huge performance win, since it meant that, for the most part, 99% of the methods in your code did not need to be compiled in order to do a rename. 
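The heuristic can be sketched roughly like this (again in Python with an invented data model; the real engine works against the C# compiler's two-pass architecture, not a dictionary of strings).  The key property is that the textual pre-filter may have false positives, which only cost us a needless compile, but must never have false negatives:

```python
# Sketch of the rename heuristic: when renaming an identifier, skip
# compiling any method body that doesn't even mention it as a token.
import re

def mentions_identifier(body, identifier):
    # Cheap textual pre-filter: look for the identifier as a whole token.
    # A false positive just means we compile one method needlessly.
    return re.search(r"\b" + re.escape(identifier) + r"\b", body) is not None

def methods_to_compile(methods, identifier):
    # 'methods' maps a (hypothetical) method name to its body text.
    return [name for name, body in methods.items()
            if mentions_identifier(body, identifier)]

methods = {
    "Foo.Bar":  "int x = counter + 1; return x;",
    "Foo.Baz":  "return 0;",            # never mentions 'counter'
    "Foo.Quux": "this.counter = 0;",
}
print(methods_to_compile(methods, "counter"))  # ['Foo.Bar', 'Foo.Quux']
```

In this toy example, renaming `counter` means only two of the three method bodies ever need to be compiled, which is where the big win comes from.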

But, in our haste to make things fast, we missed a critical issue.  If we don't compile those methods, then we won't know if there are errors in them.  And if we don't know if there are errors in them, then we won't give you that dialog.  And now the user can get enormously confused: why do I get this dialog in some cases and not in others?  Technically, the dialog now means "The source in your project related to the refactoring you're performing does not currently build."  However, I think that that's enormously subtle, and might lead users to be nervous about the validity of our analysis and verification steps.  So much so that I'm wondering if we should undo this performance optimization.
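Continuing the same hypothetical sketch, the bug falls right out of the optimization: error detection only happens for the methods we actually compile, so whether the warning appears depends on whether the broken method happens to mention the renamed identifier.

```python
# Sketch of the bug: skipping compilation also skips error detection.
import re

def mentions_identifier(body, identifier):
    return re.search(r"\b" + re.escape(identifier) + r"\b", body) is not None

def compile_method(body):
    # Stand-in for real compilation: "!!!" marks a deliberately broken body.
    return "error" if "!!!" in body else "ok"

def rename_sees_errors(methods, identifier):
    errors_seen = False
    for body in methods.values():
        if mentions_identifier(body, identifier):   # the optimization
            if compile_method(body) == "error":
                errors_seen = True
    return errors_seen  # drives whether the warning dialog is shown

broken_elsewhere = {"A": "counter += 1;", "B": "broken !!! code"}
print(rename_sees_errors(broken_elsewhere, "counter"))  # False: no dialog,
# even though method B doesn't build -- exactly the confusing case above.
```

With the identifier absent from the broken method, the rename reports a clean build and shows no dialog; move `counter` into method B and the dialog reappears.  That inconsistency is what makes the dialog's new meaning so subtle.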

How do you feel about this?