The PDC has happened, which means two things. I can post some of my (slightly self-censored) reactions to the show, and I can talk about what we've disclosed about Whidbey and Longhorn more freely. In this particular case, I had promised to talk about the deep changes we're making in Whidbey to allow you to host the CLR in your process. As you'll see, I got sidetracked and ended up discussing Application Compatibility instead.
But first, my impressions of the PDC:
The first keynote, with Bill, Jim & Longhorn, was guaranteed to be good. It had all the coolness of Avalon, WinFS and Indigo, so of course it was impressive. In fact, throughout all the sessions I attended, I was surprised by the apparent polish and maturity of Longhorn. In my opinion, Avalon looked the most mature and settled. Indigo also looked surprisingly real. WinFS looked good in the keynote, where it was all about the justification for the technology. But in the drill-down sessions, I had the sense that it's not as far along as the others.
Hopefully all the attendees realize that Longhorn is still a long way off. It's hard to see from the demos, but a lot of fundamental design issues and huge missing pieces remain.
Incidentally, I still can't believe that we picked WinFX to describe the extended managed frameworks and WinFS to describe the new storage system. One of those names has got to go.
I was worried that the Whidbey keynote on Tuesday would appear mundane and old-fashioned by comparison. But to an audience of developers, Eric's keynote looked very good indeed. Visual Studio looked better than I've ever seen it. The device app was so easy to write that I feel I could build a FedEx-style package tracking application in a weekend. The high point of this keynote was ASP.NET. I hadn't been paying attention to what they've done recently, so I was blown away by the personalization system and by the user-customizable web pages. If I had seen a site like that, I would have assumed the author spent weeks getting it to work properly. It's hard to believe this can all be done with drag-and-drop.
In V1, ASP.NET hit a home run by focusing like a laser beam on the developer experience. Everyone put so much effort into building apps, questioning why each step was necessary, and refining the process. It's great to see that they continue to follow that same discipline. In the drill-down sessions, over and over again I saw that focus resulting in a near perfect experience for developers. There are some other teams, like Avalon, that seem to have a similar religion and are obtaining similar results. (Though Avalon desperately needs some tools support. Notepad is fine for authoring XAML in demos, but I wouldn't want to build a real application this way).
Compared to ASP.NET, some other teams at Microsoft are still living in the Stone Age. Those teams are still on a traditional cycle of building features, waiting for customers to build applications with those features, and then incorporating any feedback. Beta is way too late to find out that the programming model is clumsy. We shouldn't be shirking our design responsibilities like this.
Anyway, the 3rd keynote (from Rick Rashid & Microsoft Research) should have pulled it all together. I think the clear message should have been something like:
Whidbey is coming next and has great developer features. After that, Longhorn will arrive and will change everything. Fortunately, Microsoft Research is looking 10+ years out, so you can be sure we will increasingly drive the whole industry.
This should have been an easy story to tell. The fact is that MSR is a world class research institution. Browse the Projects, Topics or People categories at http://research.microsoft.com and you'll see many name brand researchers like Butler Lampson and Jim Gray. You will see tremendous breadth in the areas under research, from pure math and algorithms to speech, graphics and natural language. There are even some esoterica like nanotech and quantum computing. We should have used the number of published papers and other measurements to compare MSR with other research groups in the software industry, and with major research universities. And then we should have shown some whiz-bang demos of about 2 minutes each.
Unfortunately, I think instead we sent a message that "Interesting technology comes from Microsoft product groups, while MSR is largely irrelevant." Yet nothing could be further from the truth. Even if I restrict consideration to the CLR, MSR has had a big impact. Generics is one of the biggest features added to the CLR, C# or the base Frameworks in Whidbey. This feature was added to the CLR by MSR team members, who now know at least as much about our code base as we do. All the CLR's plans for significantly improved code quality and portable compilers depend on a joint venture between MSR and the compiler teams. To my knowledge, MSR has used the CLR to experiment with fun things like transparent distribution, reorganizing objects based on locality, techniques for avoiding security stack crawls, interesting approaches to concurrency, and more. SPOT (Smart Personal Objects Technology) is a wonderful example of what MSR has done with the CLR's basic IL and metadata design, eventually leading to a very cool product.
In my opinion, Microsoft Research strikes a great balance between long term speculative experimentation and medium term product-oriented improvements. I wish this had come across better at the PDC.
Trends
In the 6+ years I've been at Microsoft, we've had 4 PDCs. This is the first one I've actually attended, because I usually have overdue work items or too many bugs. (I've missed all 6 of our mandatory company meetings for the same reason). So I really don't have a basis for comparison.
I guess I had expected to be beaten up about all the security issues of the last year, like Slammer and Blaster. And I had expected developers to be interested in all aspects of security. Instead, the only times the topic came up in my discussions were when I raised it.
However, some of my co-workers did see a distinct change in the level of interest in security. For example, Sebastian Lange and Ivan Medvedev gave a talk on managed security to an audience of 700-800. They reported a real upswing in awareness and knowledge on the part of all PDC attendees.
But consider a talk I attended on Application Compatibility. At a time when most talks were overflowing into the hallways, this talk filled fewer than 50 seats of a 500 to 1000 seat meeting room. I know that AppCompat is critically important to IT. And it's a source of friction for the entire industry, since everyone is reluctant to upgrade for fear of breaking something. But for most developers this is all so boring compared to the cool visual effects we can achieve with a few lines of XAML.
Despite a trend to increased interest in security on the part of developers, I suspect that security remains more of an IT operations concern than it does a developer concern. And although the events of the last year or two have got more developers excited about security (including me!), I doubt that we will ever get developers excited about more mundane topics like versioning, admin or compatibility. This latter stuff is dead boring.
That doesn't mean that the industry is doomed. Instead, it means that modern applications must obtain strong versioning, compatibility and security guarantees by default rather than through deep developer involvement. Fortunately, this is entirely in keeping with our long term goals for managed code.
With the first release of the CLR, the guarantees for managed applications were quite limited. We guaranteed memory safety through an accurate garbage collector, type safety through verification, binding safety through strong names, and security through CAS. (However, I think we would all agree that our current support for CAS still involves far too much developer effort and not enough automated guarantees. Our security team has some great long-term ideas for addressing this.)
More importantly, we expressed programs through metadata and IL, so that we could expand the benefits of reasoning about these programs over time. And we provided metadata extensibility in the form of Custom Attributes and Custom Signature Modifiers, so that others could add to the capabilities of the managed environment without depending on the CLR team's schedule.
FxCop (http://www.gotdotnet.com/team/fxcop/) is an obvious example of how we can benefit from this ability to reason about programs. All teams developing managed code at Microsoft are religious about incorporating this tool into their build process. And since FxCop supports adding custom rules, we have added a large number of Microsoft-specific or product-specific checks.
Churn and Application Breakage
We also have some internal tools that allow us to compare different versions of assemblies so we can discover inadvertent breaking changes. Frankly, these tools are still maturing. Even in the Everett timeframe, they did a good job of catching blatant violations like the removal of a public method from a class or addition of a method to an interface. But they didn't catch changes in serialization format, or changes to representation after marshaling through PInvoke or COM Interop. As a result, we shipped some unintentional breaking changes in Everett, and until recently we were on a path to do so again in Whidbey.
As far as I know, these tools still don't track changes to CAS constructs, internal dependency graphs, thread-safety expectations, exception flow (including a static replacement for the checked exceptions feature), reliability contracts, or other aspects of execution. Some of these checks will probably be added over time, perhaps by adding additional metadata to assemblies to reveal the developer's intentions and to make automated validation more tractable. Other checks seem like research projects or are more appropriate for dynamic tools rather than static tools. It's very encouraging to see teams inside and outside of Microsoft working on this.
I expect that all developers will eventually have access to these or similar tools from Microsoft or 3rd parties, which can be incorporated into our build processes the way FxCop has been.
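To make the "blatant violations" above concrete, here is a minimal sketch of that kind of comparison. It is not how our internal tools work. The TypeInfo record and find_breaking_changes function are hypothetical names invented for illustration, and a real tool would read this information from assembly metadata rather than from hand-built dictionaries.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical, simplified description of one public type."""
    name: str
    is_interface: bool
    methods: frozenset = field(default_factory=frozenset)


def find_breaking_changes(old, new):
    """Compare two API surfaces (dicts of type name -> TypeInfo)."""
    problems = []
    for name, old_type in sorted(old.items()):
        new_type = new.get(name)
        if new_type is None:
            problems.append(f"type removed: {name}")
            continue
        # Removing a public method breaks every caller of that method.
        for method in sorted(old_type.methods - new_type.methods):
            problems.append(f"method removed: {name}.{method}")
        if old_type.is_interface:
            # Adding a member to an interface breaks every existing implementer.
            for method in sorted(new_type.methods - old_type.methods):
                problems.append(f"method added to interface: {name}.{method}")
    return problems


# Two made-up versions of a tiny library.
v1 = {
    "IShape": TypeInfo("IShape", True, frozenset({"Area"})),
    "Circle": TypeInfo("Circle", False, frozenset({"Area", "Radius"})),
}
v2 = {
    "IShape": TypeInfo("IShape", True, frozenset({"Area", "Perimeter"})),
    "Circle": TypeInfo("Circle", False, frozenset({"Area"})),
}

for problem in find_breaking_changes(v1, v2):
    print(problem)
```

Note what a diff like this cannot see: serialization formats, marshaled layouts, thread-safety expectations and the other implicit contracts listed above never show up in a comparison of method names.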
Sometimes applications break when their dependencies are upgraded to new versions. The classic example of this is Win95 applications which broke when the operating system was upgraded to WinXP. Sometimes this is because the new versions have made breaking changes to APIs. But sometimes it's because things are just "different". The classic case here is where a test case runs perfectly on a developer's machine, but fails intermittently in the test lab or out in the field. The difference in environment might be obvious, like a single processor box vs. an 8-way. Yet all too often it's something truly subtle, like a DLL relocating when it misses its preferred address, or the order of DllMain notifications on a DLL_THREAD_ATTACH. In those cases, the change in environment is not the culprit. Instead, the environmental change has finally revealed an underlying bug or fragility in the application that may have been lying dormant for years.
The managed environment eliminates a number of common fragilities, like the double-free of memory blocks or the use of a file handle or Event that has already been closed. But it certainly doesn't guarantee that a multi-threaded program which appears to run correctly on a single processor will also execute without race conditions on a 32-way NUMA box. The author of the program must use techniques like code reviews, proof tools and stress testing to ensure that his code is thread-safe.
The situation that worries me the most is when an application relies on accidents of current FX and CLR implementations. These dependencies can be exceedingly subtle.
Here are some examples of breakage that we have encountered, listed in the random order they occur to me:
I could fill a lot more pages with this sort of list. And our platform is still in its infancy. Anyway, one clear message from all this is that things will change and then applications will break.
But can we categorize these failures and make some sense of it all? For each failure, we need to decide whether the platform or the application is at fault. And then we need to identify some rules or mechanisms that can avoid these failures or mitigate them. I see four categories.
Category 1: The application explicitly screws itself
The easiest category to dispense with is the one where a developer intentionally and explicitly takes advantage of a behavior that s/he knows is guaranteed to change. A perfect example of this is #8 above. Anyone who navigates through private members to unmanaged internal data structures is setting himself up for problems in future versions. The responsibility (or irresponsibility in this case) lies with the application. In my opinion, the platform should have no obligations.
But consider #5 above. It's clearly in this same category, and yet opinions on our larger team were quite divided on whether we needed to fix the problem. I spoke to a number of people who definitely understood the incredible difficulty of keeping this application running on new versions of the CLR and EnterpriseServices. But they consistently argued that the operating system has traditionally held itself to this sort of compatibility bar, that this is one of the reasons for Windows' ubiquity, and that the managed platform must similarly step up.
Also, we have to be realistic here. If a customer issue like this involves one of our largest accounts, or has been escalated through a very senior executive (a surprising number seem to reach Steve Ballmer), then we're going to pull out all the stops on a fix or a temporary workaround.
In many cases, our side-by-side support is an adequate and simple solution. Customers can continue to run problematic applications on their old bits, even though a new version of these bits has also been installed. For instance, the config file for an application can specify an old version of the CLR. Or binding redirects could roll back a specific assembly. But this technique falls apart if the application is actually an add-in that is dynamically loaded into a process like Internet Explorer or SQL Server. It's unrealistic to lock back the entire managed stack inside Internet Explorer (possibly preventing newer applications that use generics or other Whidbey features from running there), just so older questionable applications can keep running.
It's possible that we could provide lock back at finer-grained scopes than the process scope in future versions of the CLR. Indeed, this is one of the areas being explored by our versioning team.
Anyway, if we were under sufficient pressure I could imagine us building a one-time QFE (patch) for an important customer in this category, to help them transition to a newer version and more maintainable programming techniques. But if you aren't a Fortune 100 company or Steve Ballmer's brother-in-law, I personally hope we would be allowed to ignore any of your applications that are in this category.
Category 2: The platform explicitly screws the application
I would put #6, #7 and #11 above in a separate category. Here, the platform team wants to make an intentional breaking change for some valid reason like performance or reliability. In fact, #10 above is a very special case of this category. In #10, we would like to break compatibility in Whidbey so that we can provide a stronger model that can avoid subsequent compatibility breakage. It's a paradoxical notion that we should break compatibility now so we can increase future compatibility, but the approach really is sensible.
Anyway, if the platform makes a conscious decision to break compatibility to achieve some greater goal, then the platform is responsible for mitigation. At a minimum, we should provide a way for broken applications to obtain the old behavior, at least for some transition period. We have a few choices in how to do this, and we're likely to pick one based on engineering feasibility, the impact of a breakage, the likelihood of a breakage, and schedule pressure:
Windows Shimming
Before we look at the next two categories of AppCompat failure, it's worth taking a very quick look at one of the techniques that the operating system has traditionally used to deal with these issues. Windows has an AppCompat team which has built something called a shimming engine.
Consider what happened when the company tried to move consumers from Win95/Win98/WinMe over to WinXP. They discovered a large number of programs which used the GetVersion or the preferred GetVersionEx APIs in such a way that the programs refused to run on NT-based systems.
In fact, WinXP did such a good job of achieving compatibility with Win9X systems that in many cases the only reason the application wouldn't run was the version check that the program made at start up. The fix was to change GetVersion or GetVersionEx to lie about the version number of the current operating system. Of course, this lie should only be told to programs that need the lie in order to work properly.
I've heard that this shim which lies about the operating system version is the most commonly applied shim we have. As I understand it, at process launch the shimming engine tries to match the current process against any entries in its database. This match could be based on the name, timestamp or size of the EXE, or of other files found relative to that EXE like a BMP for the splash screen in a subdirectory. The entry in the database lists any shims that should be applied to the process, like the one that lies about the version. The shimming engine typically bashes the IAT (import address table) of a DLL or EXE in the process, so that its imports are bound to the shim rather than to the normal export (e.g. Kernel32!GetVersionEx). In addition, the shimming engine has other tricks it performs less frequently, like wrapping COM objects up with intercepting proxies.
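As a rough model of the flow just described, here is a toy sketch. Everything in it is an assumption made for illustration: the ShimEntry record, the SHIM_DATABASE list, the launch function and the matching criteria are invented, the import table is just a dictionary, and none of this reflects the real format of the shimming database or the real engine.

```python
from dataclasses import dataclass


def real_get_version():
    """Stand-in for the normal export, e.g. Kernel32!GetVersionEx."""
    return (5, 1)  # WinXP reports version 5.1


def lying_get_version():
    """Shim: report a Win9X-era version to programs that insist on one."""
    return (4, 10)  # Win98 reports version 4.10


@dataclass(frozen=True)
class ShimEntry:
    """Hypothetical database entry: matching criteria plus shims to apply."""
    exe_name: str
    exe_size: int
    shims: tuple  # (import name, replacement function) pairs


# Made-up database with a single entry for a made-up program.
SHIM_DATABASE = [
    ShimEntry("oldgame.exe", 123456, (("GetVersionEx", lying_get_version),)),
]


def launch(exe_name, exe_size):
    """Build the import table for a process, applying any matching shims."""
    import_table = {"GetVersionEx": real_get_version}
    for entry in SHIM_DATABASE:
        if entry.exe_name == exe_name.lower() and entry.exe_size == exe_size:
            # "Bash the IAT": rebind the import to the shim.
            for import_name, replacement in entry.shims:
                import_table[import_name] = replacement
    return import_table


print(launch("OldGame.exe", 123456)["GetVersionEx"]())  # shimmed process
print(launch("newapp.exe", 42)["GetVersionEx"]())       # unshimmed process
```

The important property is that the lie is scoped: only a process that matches a database entry has its import rebound, and every other process still sees the real version.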
It's easy to see how this infrastructure can allow applications for Win95 to execute on WinXP. However, this approach has some drawbacks. First, it's rather labor-intensive. Someone has to debug the application, determine which shims will fix it, and then craft some suitable matching criteria that will identify this application in the shimming database. If an appropriate shim doesn't already exist, it must be built.
In the best case, the application has some commercial significance and Microsoft has done all the testing and shimming. But if the application is a line of business application that was created in a particular company's IT department, Microsoft will never get its hands on it. I've heard we're now allowing sophisticated IT departments to set up their own shimming databases for their own applications, but this only allows them to apply existing shims to their applications.
And from my skewed point of view the worst part of all this is that it really won't work for managed applications. For managed apps, binding is achieved through strong names, Fusion and the CLR loader. Binding is practically never achieved through DLL imports.
So it's instructive to look at some of the techniques the operating system has traditionally used. But those techniques don't necessarily apply directly to our new problems.
Anyway, back to our categories...
Category 3: The application accidentally screws itself
Category 4: The platform accidentally screws the application
Frankly, I'm having trouble distinguishing these two cases. They are clearly distinct categories, but it's a judgment call where to draw the line. The common theme here is that the platform has accidentally exposed some consistent behavior which is not actually a guaranteed contract. The application implicitly acquires a dependency on this consistent behavior, and is broken when the consistency is later lost.
In the nirvana of some future fully managed execution environment, the platform and tools would never expose consistent behavior unless it was part of a guarantee. Let's look at some examples and see how practical this is.
In example #1 above, reflection used to deliver members in a stable order. In Whidbey, that order changes. In hindsight, there's a simple solution here. V1 of the product could have contained a testing mode that randomized the returned order. This would have exposed the developer to our actual guarantees, rather than to a stronger accidental consistency. Within the CLR, we've used this sort of technique to force us down code paths that otherwise wouldn't be exercised. For example, developers on the CLR team all use NT-based (Unicode) systems and avoid Win9X (Ansi) systems. So our Win9X Ansi/Unicode wrappers wouldn't typically get tested by developers. To address this, our checked/debug CLR build originally considered the day of the week and used Ansi code paths every other day. But imagine chasing a bug at 11:55 PM. When the bug magically disappears on your next run at 1:03 AM the next morning, you are far too frazzled to think clearly about the reason. Today, we tend to use low order bits in the size of an image like mscorwks.dll or the assembly being tested, so our randomization is now more friendly to testing.
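Here is a minimal sketch of that kind of testing mode, written as a toy rather than as anything the CLR actually contains. The get_members function and its image_size parameter are hypothetical. The idea is only that the perturbation is seeded from something stable about the build (the low order bits of an image size) instead of from the clock, so a failure reproduces on the next run.

```python
import random


def get_members(members, image_size=None):
    """Return members, perturbing their order when a testing mode is on.

    image_size is a hypothetical stand-in for the size of the image under
    test. None means production mode, where the order is left untouched.
    """
    result = list(members)
    if image_size is not None:
        # Seed from the low order bits of the image size. A given build
        # always sees the same order, so failures stay reproducible, but a
        # rebuilt image is likely to see a different one.
        rng = random.Random(image_size & 0xFF)
        rng.shuffle(result)
    return result


declared = ["Alpha", "Beta", "Gamma", "Delta"]
print(get_members(declared))          # production: declaration order
print(get_members(declared, 305152))  # testing: perturbed but repeatable
```

Code that only relies on the real guarantee (the same set of members comes back) passes in both modes. Code that indexes into the result by position fails quickly in testing, which is exactly where we want it to fail.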
In example #2 above, you could imagine a similar perturbation on our AutoLayout algorithms when executing a debug version of an application, or when launched from inside a tool like Visual Studio.
For example #4, the CLR already has internal stress modes that force different and aggressive GC schedules. These can guarantee compaction to increase the likelihood of detecting stale references. They can perform extensive checks of the integrity of the heap, to ensure that the write barrier and other mechanisms are effective. And they can ensure that every instruction of JITted managed code that can synchronize with the GC will synchronize with the GC. I suspect that these modes would do a partial job of eradicating assumptions about lifetimes reported by the JIT. However, we will remain exposed to significantly different code generators (like Rotor's FJIT) or execution on significantly different architectures (like CPUs with dramatically more registers).
In contrast with the above difficulty, it's easy to imagine adding a new GC stress mode that perturbs the finalization queues, to uncover any hidden assumptions about finalization order. This would address example #3.
Customer Debug Probes, AppVerifier and other tools
It turns out that the CLR already has a partial mechanism for enabling perturbation during testing and removing it on deployed applications. This mechanism is the Customer Debug Probes feature that we shipped in V1.1. Adam Nathan's excellent blog site has a series of articles on CDPs, which are collected together at http://blogs.gotdotnet.com/anathan/CategoryView.aspx/Debugging. The original goal of CDPs was to counteract the "black box" nature of debugging certain failures of managed applications, like corruptions of the GC heap or crashes due to incorrect marshaling directives. These probes can automatically diagnose common application errors, like failing to keep a marshaled delegate rooted so it won't be collected. This approach is so much easier than wading through dynamically generated code without symbols, because we tell you exactly where your bugs are. But we're now realizing that we can also use CDPs to increase the future compatibility of managed applications if we can perturb current behavior that is likely to change in the future.
Unfortunately, example #6 from above reveals a major drawback with the technique of perturbation. When we built the original implementation of Object.GetHashCode, we simply never considered the difference between what we wanted to guarantee (hashing) and what we actually delivered (OIDs). In hindsight, it is obvious. But I'm not convinced that we aren't falling into similar traps in our new features. We might be a little smarter than we were five years ago, but only a little.
Example #10 worries me for similar reasons. I just don't think we were smart enough to predict that changing the binding configuration of an AppDomain after starting to execute code in that AppDomain would be so fragile. When a developer delivers a feature, s/he needs to consider security, thread-safety, programming model, key invariants of the code base like GC reporting, correctness, and so many other aspects. It would be amazing if a developer consistently nailed each of these aspects for every new feature. We're kidding ourselves if we think that evolution and unintentional implicit contracts will get adequate developer attention on every new feature.
Even if we had perfect foresight and sufficient resources to add perturbation for all operations, we would still have a major problem. We can't necessarily rely on 3rd party developers to test their applications with perturbation enabled. Consider the unmanaged AppVerifier experience.
The operating system has traditionally offered a dynamic testing tool called AppVerifier which can diagnose many common unmanaged application bugs. For example, thanks to uploads of Watson process dumps from the field, most unmanaged application crashes can now be attributed to incorrect usage of dynamically allocated memory. Yet AppVerifier can use techniques like placing each allocation in its own page or leaving pages unmapped after release, to deterministically catch overruns, double frees, and reads or writes of freed memory.
In other words, there is hard evidence that if every unmanaged application had just used the memory checking support of AppVerifier, then two out of every three application crashes would be eliminated. Clearly this didn't happen.
Of course, AppVerifier can diagnose far more than just memory problems. And it's very easy and convenient to use.
Since testing with AppVerifier is part of the Windows Logo compliance program, you would expect that it's used fairly rigorously by ISVs. And, given its utility, you would expect that most IT organizations would use this tool for their internal applications. Unfortunately, this isn't the case. Many applications submitted for the Windows Logo actually fail to launch under AppVerifier. In other words, they violate at least one of the rules before they finish initializing.
The Windows AppCompat team recognizes that proactive tools like AppVerifier are so much better than reactive mitigation like shimming broken applications out in the field. That's why they made the AppVerifier tool a major focus of their poorly attended Application Compatibility talk that I sat in on at the PDC. (Aha! I really was going somewhere with all this.)
There's got to be a reason why developers don't use such a valuable tool. In my opinion, the reason is that AppVerifier is not integrated into Visual Studio. If the Debug Properties in VS allowed you to enable AppVerifier and CDP checks, we would have much better uptake. And if an integrated project system and test system could monitor code coverage numbers, and suggest particular test runs with particular probes enabled, we would be approaching nirvana.
Winding Down
Looking at development within Microsoft, one trend is very clear: Automated tools and processes are a wonderful supplement for human developers. Whether we're talking about security, reliability, performance, application compatibility or any other measure of software quality, we're now seeing that static and dynamic analysis tools can give us guarantees that we will never obtain from human beings. Bill Gates touched on this during his PDC keynote, when he described our new tools for statically verifying device driver correctness, for some definition of correctness.
This trend was very clear to me during the weeks I spent on the DCOM / RPCSS security fire drill. I spent days looking at some clever marshaling code, eventually satisfying myself that it worked perfectly. Then someone else wrote an automated attacker and discovered real flaws in just a few hours. Other architects and senior developers scrutinized different sections of the code. Then some researchers from MSR who are focused on automatic program validation ran their latest tools over the same code and gave us step-by-step execution models that led up to crashes. Towards the end of the fire drill, a virtuous cycle was established. The code reviewers noticed new categories of vulnerabilities. Then the researchers tried to evolve their tools to detect those vulnerabilities. Aspects of this process were very raw, so the tools sometimes produced a great deal of noise in the form of false positives. But it's clear that we were getting real value from Day One and the future potential here is enormous.
One question that always comes up, when we talk about adding significant value to Visual Studio through additional tools, is whether Microsoft should give away these tools. It's a contentious issue, and I find myself going backwards and forwards on it. One school of thought says that we should give away tools to promote the platform and improve all the programs in the Windows ecology. In the case of tools that make our customers' applications more secure or more resilient to future changes in the platform, this is a compelling argument. Another school of thought says that Visual Studio is a profit center like any other part of the company, and it needs the freedom to charge what the market will bear.
Given that my job is building a platform, you might expect me to favor giving away Visual Studio. But I actually think the profit motive is a powerful mechanism for making our tools competitive. If Visual Studio doesn't have P&L responsibility, their offering will deteriorate over time. The best way to know whether they've done all they can to make the best tools possible, is to measure how much their customers are willing to pay. I want Borland to compete with Microsoft on building the best tools at the best price, and I want to be able to measure the results of that competition through revenue and market penetration.
In all this, I have avoided really talking about the issues of versioning. Of course, versioning and application compatibility are enormously intertwined. Applications break for many reasons, but the typical reason is that one component is now binding to a new version of another component. We have a whole team of architects, gathered from around the company, who have been meeting regularly for about a year to grapple with the problems of a complete managed versioning story. Unlike managed AppCompat, the intellectual investment in managed versioning has been enormous.
Anyway, Application Compatibility remains a relatively contentious subject over here. There's no question that it's a hugely important topic which will have a big impact on the longevity of our platform. But we are still trying to develop techniques for achieving compatibility that will be more successful than what Windows has done in the past, without limiting our ability to innovate on what is still a very young execution engine and set of frameworks. I have deliberately avoided talking about what some of those techniques might be, in part because our story remains incomplete.
Also, we won't realize how badly AppCompat will bite us until we can see a lot of deployed applications that are breaking as we upgrade the platform. At that point, it's easier to justify throwing more resources at the problem. But by then the genie is out of the bottle... the deployed applications will already depend on brittle accidents of implementation, so recovery will be painfully breaking. In a world where we are always under intense resource and schedule pressure, the needs of AppCompat must be balanced against performance, security, developer productivity, reliability, innovation and all the other "must haves".
You know, I really do want to talk about Hosting. It is a truly fascinating subject. I'm much more comfortable talking about non-preemptive fiber scheduling than I am talking about uninteresting topics like implicit contracts and compatibility trends.
But Hosting is going to have to wait at least a few more weeks.