The PDC has happened, which means two things. I
can post some of my (slightly self-censored) reactions to the show, and I can talk
about what we've disclosed about Whidbey and Longhorn more freely. In
this particular case, I had promised to talk about the deep changes we're making
in Whidbey to allow you to host the CLR in your process. As
you'll see, I got sidetracked and ended up discussing Application Compatibility instead.
But first, my impressions of the PDC:
The first keynote, with Bill, Jim
& Longhorn, was guaranteed to be good. It had all the coolness of Avalon,
WinFS and Indigo, so of course it was impressive. In fact, throughout all the
sessions I attended, I was surprised by the apparent polish
and maturity of Longhorn. In my opinion, Avalon looked the most mature
and settled. Indigo also looked surprisingly real. WinFS looked good in
the keynote, where it was all about the justification for the technology. But
in the drill-down sessions, I had the sense that it's not as far along as the others.
Hopefully all the attendees realize
that Longhorn is still a long way off. It's
hard to see from the demos, but a lot of fundamental design issues and huge missing
pieces remain.
Incidentally, I still can't believe
that we picked WinFX to describe the extended managed frameworks and WinFS to describe
the new storage system. One of those
names has got to go.
I was worried that the Whidbey keynote
on Tuesday would appear mundane and old-fashioned by comparison. But to an audience
of developers, Eric's keynote looked very good indeed. Visual Studio looked
better than I've ever seen it. The device app was so easy to write that I feel
I could build a FedEx-style package tracking application in a weekend.
The highlight of this keynote was ASP.NET. I hadn't been paying attention to what they've
done recently, so I was blown away by the personalization system and by the user-customizable
web pages. If I had seen a site like that, I would have assumed the author spent
weeks getting it to work properly. It's
hard to believe this can all be done with drag-and-drop.
In V1, ASP.NET hit a home run by focusing
like a laser beam on the developer experience. Everyone put so much effort into
building apps, questioning why each step was necessary, and refining the process.
It's great to see that they continue to follow that same discipline. In the
drill-down sessions, over and over again I saw that focus resulting in a near perfect
experience for developers. There are
some other teams, like Avalon, that seem to have a similar religion and are obtaining
similar results. (Though Avalon desperately
needs some tools support. Notepad is
fine for authoring XAML in demos, but I wouldn't want to build a real application
that way.)
Compared to ASP.NET, some other teams
at Microsoft are still living in the Stone Age. Those
teams are still on a traditional cycle of building features, waiting for customers
to build applications with those features, and then incorporating any feedback. Beta
is way too late to find out that the programming model is clumsy. We
shouldn't be shirking our design responsibilities like this.
Anyway, the 3rd keynote (from Rick
Rashid & Microsoft Research) should have pulled it all together. I think
the clear message should have been something like:
"Whidbey is coming next and has great developer features. After that, Longhorn will arrive
and will change everything. Fortunately, Microsoft Research is looking 10+ years
out, so you can be sure we will increasingly drive the whole industry."
This should have been an easy story
to tell. The fact is that MSR is a world class research institution. Browse
the Projects, Topics or People categories at http://research.microsoft.com and
you'll see many name brand researchers like Butler Lampson and Jim Gray. You
will see tremendous breadth in the areas under research, from pure math and algorithms
to speech, graphics and natural language. There
are even some esoterica like nanotech and quantum computing. We
should have used the number of published papers and other measurements to compare
MSR with other research groups in the software industry, and with major research universities. And
then we should have shown some whiz-bang demos of about 2 minutes each.
Unfortunately, I think instead we
sent a message that "interesting technology comes from Microsoft product groups,
while MSR is largely irrelevant." Yet
nothing could be further from the truth. Even
if I restrict consideration to the CLR, MSR has had a big impact. Generics
is one of the biggest features added to the CLR, C# or the base Frameworks in Whidbey. This
feature was added to the CLR by MSR team members, who now know at least as much about
our code base as we do. All the CLR's
plans for significantly improved code quality and portable compilers depend on a
joint venture between MSR and the compiler teams. To
my knowledge, MSR has used the CLR to experiment with fun things like transparent
distribution, reorganizing objects based on locality, techniques for avoiding security
stack crawls, interesting approaches to concurrency, and more. SPOT
(Smart Object Personal Technology) is a wonderful example of what MSR has done with
the CLR's basic IL and metadata design, eventually leading to a very cool product.
In my opinion, Microsoft Research
strikes a great balance between long term speculative experimentation and medium term
product-oriented improvements. I wish
this had come across better at the PDC.
In the 6+ years I've been at Microsoft,
we've had 4 PDCs. This is the first
one I've actually attended, because I usually have overdue work items or too many
bugs. (I've missed all 6 of our mandatory
company meetings for the same reason.) So
I really don't have a basis for comparison.
I guess I had expected to be beaten
up about all the security issues of the last year, like Slammer and Blaster.
And I had expected developers to be interested in all aspects of security. Instead,
the only times the topic came up in my discussions were when I raised it.
However, some of my co-workers did
see a distinct change in the level of interest in security. For
example, Sebastian Lange and Ivan Medvedev gave a talk on managed security to an audience
of 700-800. They reported a real upswing
in awareness and knowledge on the part of all PDC attendees.
But consider a talk I attended on
Application Compatibility. At a time
when most talks were overflowing into the hallways, this talk filled fewer than 50
seats of a 500 to 1000 seat meeting room. I
know that AppCompat is critically important to IT. And
it's a source of friction for the entire industry, since everyone is reluctant to
upgrade for fear of breaking something. But
for most developers this is all so boring compared to the cool visual effects we can
achieve with a few lines of XAML.
Despite a trend to increased interest
in security on the part of developers, I suspect that security remains more of an
IT operations concern than it does a developer concern. And
although the events of the last year or two have got more developers excited about
security (including me!), I doubt that we will ever get developers excited about more
mundane topics like versioning, admin or compatibility. This
latter stuff is dead boring.
That doesn't mean that the industry
is doomed. Instead, it means that modern
applications must obtain strong versioning, compatibility and security guarantees
by default rather than through deep developer involvement. Fortunately,
this is entirely in keeping with our long term goals for managed code.
With the first release of the CLR,
the guarantees for managed applications were quite limited. We
guaranteed memory safety through an accurate garbage collector, type safety through
verification, binding safety through strong names, and security through CAS. (However,
I think we would all agree that our current support for CAS still involves far too
much developer effort and not enough automated guarantees. Our
security team has some great long-term ideas for addressing this.)
More importantly, we expressed programs
through metadata and IL, so that we could expand the benefits of reasoning about these
programs over time. And we provided metadata
extensibility in the form of Custom Attributes and Custom Signature Modifiers, so
that others could add to the capabilities of the managed environment without depending
on the CLR team's schedule.
FxCop is an obvious example of how we can benefit from this ability to reason about programs. All
teams developing managed code at Microsoft are religious about incorporating this
tool into their build process. And since
FxCop supports adding custom rules, we have added a large number of Microsoft-specific
or product-specific checks.
Churn and Application Breakage
We also have some internal tools that
allow us to compare different versions of assemblies so we can discover inadvertent
breaking changes. Frankly, these tools
are still maturing. Even in the
timeframe, they did a good job of catching blatant violations like the removal of a public
method from a class or addition of a method to an interface. But
they didn't catch changes in serialization format, or changes to representation after
marshaling through PInvoke or COM Interop. As
a result, we shipped some unintentional breaking changes in a previous release,
and until recently we were on a path to do so again in Whidbey.
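To make the "blatant violations" concrete, here is a minimal sketch of that kind of check in Python. It is not how our internal tools work. The `TypeInfo` snapshot and the `find_breaking_changes` function are hypothetical names invented for illustration, and a real tool would read assembly metadata rather than hand-built dictionaries.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeInfo:
    """Hypothetical, simplified stand-in for a type's public surface."""
    name: str
    is_interface: bool
    public_methods: frozenset = field(default_factory=frozenset)


def find_breaking_changes(old_types, new_types):
    """Compare two API snapshots (dicts of type name -> TypeInfo).

    Reports only removed types, removed public methods, and methods
    added to interfaces. Nothing subtler is detected.
    """
    problems = []
    for name, old in sorted(old_types.items()):
        new = new_types.get(name)
        if new is None:
            problems.append(f"type removed: {name}")
            continue
        for method in sorted(old.public_methods - new.public_methods):
            problems.append(f"public method removed: {name}.{method}")
        if old.is_interface and new.is_interface:
            # Existing implementers will not have the new member.
            for method in sorted(new.public_methods - old.public_methods):
                problems.append(f"method added to interface: {name}.{method}")
    return problems


v1 = {
    "Widget": TypeInfo("Widget", False, frozenset({"Open", "Close", "Reset"})),
    "IShape": TypeInfo("IShape", True, frozenset({"Draw"})),
}
v2 = {
    "Widget": TypeInfo("Widget", False, frozenset({"Open", "Close", "Flush"})),
    "IShape": TypeInfo("IShape", True, frozenset({"Draw", "Resize"})),
}

for line in find_breaking_changes(v1, v2):
    print(line)
```

A purely structural comparison like this one says nothing about serialization formats or marshaled layouts, which is exactly the blind spot described above.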
As far as I know, these tools still
don't track changes to CAS constructs, internal dependency graphs, thread-safety
expectations, exception flow (including a static replacement for the checked exceptions
feature), reliability contracts, or other aspects of execution. Some
of these checks will probably be added over time, perhaps by adding additional metadata
to assemblies to reveal the developer's intentions and to make automated validation
more tractable. Other checks seem like
research projects or are more appropriate for dynamic tools rather than static tools. It's
very encouraging to see teams inside and outside of Microsoft working on this.
I expect that all developers will
eventually have access to these or similar tools from Microsoft or 3rd parties,
which can be incorporated into our build processes the way FxCop has been.
Sometimes applications break when
their dependencies are upgraded to new versions. The
classic example of this is Win95 applications which broke when the operating system
was upgraded to WinXP. Sometimes this
is because the new versions have made breaking changes to APIs. But
sometimes it's because things are just "different". The
classic case here is where a test case runs perfectly on a developer's machine, but
fails intermittently in the test lab or out in the field. The
difference in environment might be obvious, like a single processor box vs. an 8-way. Yet
all too often it's something truly subtle, like a DLL relocating when it misses its
preferred address, or the order of DllMain notifications on a DLL_THREAD_ATTACH. In
those cases, the change in environment is not the culprit. Instead,
the environmental change has finally revealed an underlying bug or fragility in the
application that may have been lying dormant for years.
The managed environment eliminates
a number of common fragilities, like the double-free of memory blocks or the use of
a file handle or Event that has already been closed. But
it certainly doesn't guarantee that a multi-threaded program which appears to run
correctly on a single processor will also execute without race conditions on a 32-way
NUMA box. The author of the program must
use techniques like code reviews, proof tools and stress testing to ensure that his
code is thread-safe.
The situation that worries me the most is when an application
relies on accidents of current FX and CLR implementations. These
dependencies can be exceedingly subtle.
Here are some examples of breakage that we have encountered,
listed in the random order they occur to me:
I could fill a lot more pages with this sort of list. And
our platform is still in its infancy. Anyway,
one clear message from all this is that things will change and then applications will break.
But can we categorize these failures and make some sense
of it all? For each failure, we need
to decide whether the platform or the application is at fault. And
then we need to identify some rules or mechanisms that can avoid these failures or
mitigate them. I see four categories.
1: The application explicitly screws itself
The easiest category to dispense with is the one where
a developer intentionally and explicitly takes advantage of a behavior that s/he knows
is guaranteed to change. A perfect example
of this is #8 above. Anyone who navigates
through private members to unmanaged internal data structures is setting himself up
for problems in future versions. The
responsibility (or irresponsibility in this case) lies with the application. In
my opinion, the platform should have no obligations.
But consider #5 above. It's
clearly in this same category, and yet opinions on our larger team were quite divided
on whether we needed to fix the problem. I
spoke to a number of people who definitely understood the incredible difficulty of
keeping this application running on new versions of the CLR and EnterpriseServices. But
they consistently argued that the operating system has traditionally held itself to
this sort of compatibility bar, that this is one of the reasons for Windows' ubiquity,
and that the managed platform must similarly step up.
Also, we have to be realistic here. If
a customer issue like this involves one of our largest accounts, or has been escalated
through a very senior executive (a surprising number seem to reach Steve Ballmer),
then we're going to pull out all the stops on a fix or a temporary workaround.
In many cases, our side-by-side support is an adequate
and simple solution. Customers can continue
to run problematic applications on their old bits, even though a new version of these
bits has also been installed. For instance,
the config file for an application can specify an old version of the CLR. Or
binding redirects could roll back a specific assembly. But
this technique falls apart if the application is actually an add-in that is dynamically
loaded into a process like Internet Explorer or SQL Server. It's
unrealistic to lock back the entire managed stack inside Internet Explorer (possibly
preventing newer applications that use generics or other Whidbey features from running
there), just so older questionable applications can keep running.
It's possible that we could provide lock back at finer-grained
scopes than the process scope in future versions of the CLR. Indeed,
this is one of the areas being explored by our versioning team.
Anyway, if we were under sufficient pressure I could
imagine us building a one-time QFE (patch) for an important customer in this category,
to help them transition to a newer version and more maintainable programming techniques. But
if you aren't a Fortune 100 company or Steve Ballmer's brother-in-law, I personally
hope we would be allowed to ignore any of your applications that are in this category.
2: The platform explicitly screws the application
I would put #6, #7 and #11 above in a separate category. Here,
the platform team wants to make an intentional breaking change for some valid reason
like performance or reliability. In fact,
#10 above is a very special case of this category. In
#10, we would like to break compatibility in Whidbey so that we can provide a stronger
model that can avoid subsequent compatibility breakage. It's
a paradoxical notion that we should break compatibility now so we can increase future
compatibility, but the approach really is sensible.
Anyway, if the platform makes a conscious decision to
break compatibility to achieve some greater goal, then the platform is responsible
for mitigation. At a minimum, we should
provide a way for broken applications to obtain the old behavior, at least for some
transition period. We have a few choices
in how to do this, and we're likely to pick one based on engineering feasibility,
the impact of a breakage, the likelihood of a breakage, and schedule pressure:
Before we look at the next two categories of AppCompat
failure, it's worth taking a very quick look at one of the techniques that the operating
system has traditionally used to deal with these issues. Windows
has an AppCompat team which has built something called a shimming engine.
Consider what happened when the company tried to move
consumers from Win95/Win98/WinMe over to WinXP. They
discovered a large number of programs which used the GetVersion or the preferred GetVersionEx
APIs in such a way that the programs refused to run on NT-based systems.
In fact, WinXP did such a good job of achieving compatibility
with Win9X systems that in many cases the only reason
the application wouldn't run was the version check that the program made at
startup. The fix was to change GetVersion
or GetVersionEx to lie about the version number of the current operating system. Of
course, this lie should only be told to programs that need the lie in order to work.
I've heard that this shim which lies about the operating
system version is the most commonly applied shim we have. As
I understand it, at process launch the shimming engine tries to match the current
process against any entries in its database. This
match could be based on the name, timestamp or size of the EXE, or of other files
found relative to that EXE like a BMP for the splash screen in a subdirectory. The
entry in the database lists any shims that should be applied to the process, like
the one that lies about the version. The
shimming engine typically bashes the IAT (import address table) of a DLL or EXE in
the process, so that its imports are bound to the shim rather than to the normal export
(e.g. Kernel32!GetVersionEx). In addition,
the shimming engine has other tricks it performs less frequently, like wrapping COM
objects up with intercepting proxies.
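The match-then-rebind flow described above can be sketched in a few lines of Python. This is only a toy model under loud assumptions: `ShimEntry`, `SHIMS`, `launch` and the file names are all hypothetical, the import table is modeled as a plain dictionary, and the version tuples are illustrative. The real engine patches the IAT of native images, and I'm describing it second-hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShimEntry:
    """Hypothetical database row: matching criteria plus shims to apply."""
    exe_name: str
    exe_size: int
    shims: tuple


def real_get_version():
    """Stand-in for the normal export; the value is illustrative only."""
    return (5, 1)


def lie_about_version():
    """Shim: report an older version to an app with a strict startup check."""
    return (4, 0)


# Each named shim supplies replacements for one or more imports.
SHIMS = {"VersionLie": {"GetVersion": lie_about_version}}


def launch(exe_name, exe_size, database):
    """Build the process's import table, rebinding imports for matched apps."""
    imports = {"GetVersion": real_get_version}  # the normal binding
    for entry in database:
        if (entry.exe_name, entry.exe_size) == (exe_name, exe_size):
            for shim_name in entry.shims:
                imports.update(SHIMS[shim_name])  # "bash" the table
    return imports


database = [ShimEntry("oldgame.exe", 123456, ("VersionLie",))]

shimmed = launch("oldgame.exe", 123456, database)
unshimmed = launch("newapp.exe", 999, database)
print(shimmed["GetVersion"](), unshimmed["GetVersion"]())
```

Only the process that matches a database entry is told the lie. Everything else keeps the normal binding.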
It's easy to see how this infrastructure can allow applications
for Win95 to execute on WinXP. However,
this approach has some drawbacks. First,
it's rather labor-intensive. Someone
has to debug the application, determine which shims will fix it, and then craft some
suitable matching criteria that will identify this application in the shimming database. If
an appropriate shim doesn't already exist, it must be built.
In the best case, the application has some commercial
significance and Microsoft has done all the testing and shimming. But
if the application is a line of business application that was created in a particular
company's IT department, Microsoft will never get its hands on it. I've
heard we're now allowing sophisticated IT departments to set up their own shimming
databases for their own applications, but this only allows them to apply existing
shims to their applications.
And from my skewed point of view the worst part of
all this is that it really won't work for managed applications. For
managed apps, binding is achieved through strong names, Fusion and the CLR loader. Binding
is practically never achieved through DLL imports.
So it's instructive to look at some of the techniques
the operating system has traditionally used. But
those techniques don't necessarily apply directly to our new problems.
Anyway, back to our categories...
3: The application accidentally screws itself
4: The platform accidentally screws the application
Frankly, I'm having trouble distinguishing these two
cases. They are clearly distinct categories,
but it's a judgment call where to draw the line. The
common theme here is that the platform has accidentally exposed some consistent behavior
which is not actually a guaranteed contract. The
application implicitly acquires a dependency on this consistent behavior, and is broken
when the consistency is later lost.
In the nirvana of some future fully managed execution
environment, the platform and tools would never expose consistent behavior unless
it was part of a guarantee. Let's look
at some examples and see how practical this is.
In example #1 above, reflection used to deliver members
in a stable order. In Whidbey, that order
changes. In hindsight, there's a simple
solution here. V1 of the product could
have contained a testing mode that randomized the returned order. This
would have exposed the developer to our actual guarantees, rather than to a stronger
accidental consistency. Within the CLR,
we've used this sort of technique to force us down code paths that otherwise wouldn't
be exercised. For example, developers
on the CLR team all use NT-based (Unicode) systems and avoid Win9X (Ansi) systems. So
our Win9X Ansi/Unicode wrappers wouldn't typically get tested by developers. To
address this, our checked/debug CLR build originally considered the day of the week
and used Ansi code paths every other day. But
imagine chasing a bug late one night. When the bug magically disappears on
your next run
the next morning, you are far too frazzled to think clearly about the reason. Today,
we tend to use low order bits in the size of an image like mscorwks.dll or the assembly
being tested, so our randomization is now more friendly to testing.
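Here is a minimal sketch of that idea in Python, assuming a hypothetical `get_members` API with an opt-in testing mode. Neither the function names nor the choice of the low eight bits come from the CLR. The point is only that seeding the shuffle from something stable, like an image size, keeps a given build reproducible while still denying callers an accidental ordering guarantee.

```python
import random


def get_members(members, perturb=False, image_size=0):
    """Return members; in perturb mode, return them in a shuffled order.

    The shuffle is seeded from the low-order bits of image_size, so the
    order is repeatable for a given build but can change between builds.
    """
    result = list(members)
    if perturb:
        random.Random(image_size & 0xFF).shuffle(result)
    return result


def first_member_fragile(members):
    """Application code that silently depends on enumeration order."""
    return members[0]


def find_member_robust(members, name):
    """Application code that relies only on membership, not on order."""
    return name if name in members else None


declared = ["Alpha", "Beta", "Gamma", "Delta"]
run_a = get_members(declared, perturb=True, image_size=0x4A3)
run_b = get_members(declared, perturb=True, image_size=0x4A3)

print(run_a == run_b)                       # same build, same order
print(sorted(run_a) == sorted(declared))    # same members, whatever the order
print(find_member_robust(run_a, "Gamma"))   # unaffected by the shuffle
```

Code like `first_member_fragile` may or may not fail under any one seed, which is why a test lab would want to run against several builds rather than trust a single pass.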
In example #2 above, you could imagine a similar perturbation
on our AutoLayout algorithms when executing a debug version of an application, or
when launched from inside a tool like Visual Studio.
For example #4, the CLR already has internal stress modes
that force different and aggressive GC schedules. These
can guarantee compaction to increase the likelihood of detecting stale references. They
can perform extensive checks of the integrity of the heap, to ensure that the write
barrier and other mechanisms are effective. And
they can ensure that every instruction of JITted managed code that can synchronize
with the GC will synchronize with the GC. I
suspect that these modes would do a partial job of eradicating assumptions about lifetimes
reported by the JIT. However, we will
remain exposed to significantly different code generators (like Rotor's FJIT) or
execution on significantly different architectures (like CPUs with dramatically more
In contrast with the above difficulty, it's easy to
imagine adding a new GC stress mode that perturbs the finalization queues, to uncover
any hidden assumptions about finalization order. This
would address example #3.
Customer Debug Probes, AppVerifier and other
It turns out that the CLR already has a partial mechanism
for enabling perturbation during testing and removing it on deployed applications. This
mechanism is the Customer Debug Probes feature that we shipped in V1.1. Adam
Nathan's excellent blog site has a series of articles on CDPs, which are collected
together at http://blogs.gotdotnet.com/anathan/CategoryView.aspx/Debugging. The
original goal of CDPs was to counteract the black box nature of debugging certain
failures of managed applications, like corruptions of the GC heap or crashes due to
incorrect marshaling directives. These
probes can automatically diagnose common application errors, like failing to keep
a marshaled delegate rooted so it won't be collected. This
approach is so much easier than wading through dynamically generated code without
symbols, because we tell you exactly where your bugs are. But
we're now realizing that we can also use CDPs to increase the future compatibility
of managed applications if we can perturb current behavior that is likely to change
in the future.
Unfortunately, example #6 from above reveals a major
drawback with the technique of perturbation. When
we built the original implementation of Object.GetHashCode, we simply never considered
the difference between what we wanted to guarantee (hashing) and what we actually
delivered (OIDs). In hindsight, it is
obvious. But I'm not convinced that
we aren't falling into similar traps in our new features. We
might be a little smarter than we were five years ago, but only a little.
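The gap between a hashing guarantee and an accidental OID guarantee is easy to show with a toy model. The Python sketch below is an assumption-laden illustration, not a description of how Object.GetHashCode is or was implemented: `PlatformV1`, `PlatformV2`, `fragile_index` and `robust_index` are hypothetical names. One platform happens to hand out unique codes, the other honors only the hashing contract, where collisions are legal.

```python
class PlatformV1:
    """Happens to hand out a unique, stable code per object (an OID)."""

    def __init__(self):
        self._codes = {}

    def hash_code(self, obj):
        return self._codes.setdefault(id(obj), len(self._codes) + 1)


class PlatformV2:
    """Honors only the hashing contract: collisions are allowed."""

    def hash_code(self, obj):
        return len(obj) % 4


def fragile_index(platform, items):
    """Application code that treats a hash code as an identity."""
    return {platform.hash_code(item): item for item in items}


def robust_index(platform, items):
    """Application code that tolerates collisions by bucketing."""
    buckets = {}
    for item in items:
        buckets.setdefault(platform.hash_code(item), []).append(item)
    return buckets


items = ["apple", "melon", "fig"]

print(len(fragile_index(PlatformV1(), items)))  # every item survives
print(len(fragile_index(PlatformV2(), items)))  # colliding items are lost
print(sum(len(b) for b in robust_index(PlatformV2(), items).values()))
```

The fragile version works for years on the first platform and silently drops data on the second, even though the second never violated anything it promised.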
Example #10 worries me for similar reasons. I
just don't think we were smart enough to predict that changing the binding configuration
of an AppDomain after starting to execute code in that AppDomain would be so fragile. When
a developer delivers a feature, s/he needs to consider security, thread-safety, programming
model, key invariants of the code base like GC reporting, correctness, and so many
other aspects. It would be amazing if
a developer consistently nailed each of these aspects for every new feature. We're
kidding ourselves if we think that evolution and unintentional implicit contracts
will get adequate developer attention on every new feature.
Even if we had perfect foresight and sufficient resources
to add perturbation for all operations, we would still have a major problem. We
can't necessarily rely on 3rd party developers to test their applications
with perturbation enabled. Consider the
unmanaged AppVerifier experience.
The operating system has traditionally offered a dynamic
testing tool called AppVerifier which can diagnose many common unmanaged application
bugs. For example, thanks to uploads
of Watson process dumps from the field, most unmanaged application crashes can now
be attributed to incorrect usage of dynamically allocated memory. Yet
AppVerifier can use techniques like placing each allocation in its own page or leaving
pages unmapped after release, to deterministically catch overruns, double frees, and
reads or writes of freed memory.
In other words, there is hard evidence that if every
unmanaged application had just used the memory checking support of AppVerifier, then
two out of every three application crashes would be eliminated. Clearly
this didn't happen.
Of course, AppVerifier can diagnose far more than just
memory problems. And it's very easy
and convenient to use.
Since testing with AppVerifier is part of the Windows
Logo compliance program, you would expect that it's used fairly rigorously by ISVs. And,
given its utility, you would expect that most IT organizations would use this tool
for their internal applications. Unfortunately,
this isn't the case. Many applications
submitted for the Windows Logo actually fail to launch under AppVerifier. In
other words, they violate at least one of the rules before they finish initializing.
The Windows AppCompat team recognizes that proactive
tools like AppVerifier are so much better than reactive mitigation like shimming broken
applications out in the field. That's
why they made the AppVerifier tool a major focus of their poorly attended Application
Compatibility talk that I sat in on at the PDC. (Aha! I
really was going somewhere with all this.)
There's got to be a reason why developers don't use
such a valuable tool. In my opinion,
the reason is that AppVerifier is not integrated into Visual Studio. If
the Debug Properties in VS allowed you to enable AppVerifier and CDP checks, we would
have much better uptake. And if an integrated
project system and test system could monitor code coverage numbers, and suggest particular
test runs with particular probes enabled, we would be approaching nirvana.
Looking at development within Microsoft, one trend is
very clear: Automated tools and processes
are a wonderful supplement for human developers. Whether
we're talking about security, reliability, performance, application compatibility
or any other measure of software quality, we're now seeing that static and dynamic
analysis tools can give us guarantees that we will never obtain from human beings. Bill
Gates touched on this during his PDC keynote, when he described our new tools for
statically verifying device driver correctness, for some definition of correctness.
This trend was very clear to me during the weeks I spent
on the DCOM / RPCSS security fire drill. I
spent days looking at some clever marshaling code, eventually satisfying myself that
it worked perfectly. Then someone else
wrote an automated attacker and discovered real flaws in just a few hours. Other
architects and senior developers scrutinized different sections of the code. Then
some researchers from MSR who are focused on automatic program validation ran their
latest tools over the same code and gave us step-by-step execution models that led
up to crashes. Towards the end of the
fire drill, a virtuous cycle was established. The
code reviewers noticed new categories of vulnerabilities. Then
the researchers tried to evolve their tools to detect those vulnerabilities. Aspects
of this process were very raw, so the tools sometimes produced a great deal of noise
in the form of false positives. But it's
clear that we were getting real value from Day One and the future potential here
One question that always comes up, when we talk about
adding significant value to Visual Studio through additional tools, is whether Microsoft
should give away these tools. It's a
contentious issue, and I find myself going backwards and forwards on it. One
school of thought says that we should give away tools to promote the platform and
improve all the programs in the Windows ecology. In
the case of tools that make our customers' applications more secure or more resilient
to future changes in the platform, this is a compelling argument. Another
school of thought says that Visual Studio is a profit center like any other part of
the company, and it needs the freedom to charge what the market will bear.
Given that my job is building a platform, you might expect
me to favor giving away Visual Studio. But
I actually think the profit motive is a powerful mechanism for making our tools competitive. If
Visual Studio doesn't have P&L responsibility, their offering will deteriorate
over time. The best way to know whether
they've done all they can to make the best tools possible is to measure how much
their customers are willing to pay. I
want Borland to compete with Microsoft on building the best tools at the best price,
and I want to be able to measure the results of that competition through revenue and
In all this, I have avoided really talking about the
issues of versioning. Of course, versioning
and application compatibility are enormously intertwined. Applications
break for many reasons, but the typical reason is that one component is now binding
to a new version of another component. We
have a whole team of architects, gathered from around the company, who have been meeting
regularly for about a year to grapple with the problems of a complete managed versioning
story. Unlike managed AppCompat, the
intellectual investment in managed versioning has been enormous.
Anyway, Application Compatibility remains a relatively
contentious subject over here. There's
no question that it's a hugely important topic which will have a big impact on
the longevity of our platform. But we
are still trying to develop techniques for achieving compatibility that will be more
successful than what Windows has done in the past, without limiting our ability to
innovate on what is still a very young execution engine and set of frameworks. I
have deliberately avoided talking about what some of those techniques might be, in
part because our story remains incomplete.
Also, we won't realize how badly AppCompat will bite
us until we can see a lot of deployed applications that are breaking as we upgrade
the platform. At that point, it's easier
to justify throwing more resources at the problem. But
by then the genie is out of the bottle... the deployed applications will already
depend on brittle accidents of implementation, so recovery will be painfully breaking. In
a world where we are always under intense resource and schedule pressure, the needs
of AppCompat must be balanced against performance, security, developer productivity,
reliability, innovation and all the other "must haves".
You know, I really do want to talk about Hosting. It
is a truly fascinating subject. I'm
much more comfortable talking about non-preemptive fiber scheduling than I am talking
about uninteresting topics like implicit contracts and compatibility trends.
But Hosting is going to have to wait at least a few more