Notes on comments.
Welcome to our blog dedicated to the engineering of Microsoft Windows 7
Hi Jon DeVaan here.
Steven wrote about how we organize the engineering team on Windows which is a very important element of how work is done. Another important part is how we organize the engineering project itself.
I’d like to start with a couple of quick notes. First is that Steven reads and writes about ten times faster than I do, so don’t be too surprised if you see about that distribution of words between the two of us here. (Be assured that between us I am the deep thinker :-). Or maybe I am just jealous.) Second is that we do want to keep sharing the “how we build Windows 7” topics since that gives us a shared context for when we dive into feature discussion as we get closer to the PDC and WinHEC. We want to discuss how we are engineering Windows 7, including the lessons learned from Longhorn/Vista. All of these realities go into our decision making on Windows 7.
OK, on to the tawdry bits.
Steven linked last time to the book Microsoft Secrets, which is an excellent analysis of what I like to call version two of the Microsoft Engineering System. (Version one involved index cards and “floppy net” and you really don’t want to hear about it.) Version two served Microsoft very well for far longer than anyone anticipated. But with what we learned from Windows XP and the truly different security environment that emerged at that time, and from Longhorn/Vista, it became clear that it was time for another generational transformation in how we approach engineering our products.
The lessons from XP revolve around the changed security landscape in our industry. You can learn about how we put our learning into action by looking at the Security Development Lifecycle, which is the set of engineering practices recommended by Microsoft to develop more secure software. We use these practices internally to engineer Windows.
The comments on this blog show that the quality of a complete system contains many different attributes, each of varying importance to different people, and that people have a wide range of opinions about Vista’s overall quality. I spend a lot of time on core reliability of the OS, and from studying the telemetry we collect from real users (only if they opt in to the Customer Experience Improvement Program) I know that Vista SP1 is just as reliable as XP overall and more reliable in some important ways. The telemetry guided us on what to address in SP1. I was glad to see one of those ways pointed out by people commenting that sleep and resume work better in Vista. I am also excited by the prospect of continuing our efforts (and we are) to use the telemetry to drive Vista to be the most reliable version of Windows ever. To the list of Vista’s qualities I would add that it cut security vulnerabilities by just under half compared to XP. This blog is about Windows 7, but you should know that we are working on Windows 7 with a deep understanding of the performance of Windows Vista in the real world.
In the most important ways, people who have emailed and commented have highlighted opportunities for us to improve the Windows engineering system. Performance, reliability, compatibility, and failing to deliver on new technology promises are popular themes in the comments. One of the best ways we can address these is by better day-to-day management of the engineering of the Windows 7 code base—or the daily build quality. We have taken many concrete steps to improve how we manage the project so that we do much better on this dimension.
I hope you are reading this and going, “Well, duh!” but my experience with software projects of all sizes and in many organizations tells me this is not as obvious or easily attainable as we wish.
Daily Build Quality
Daily quality matters a great deal in a software project because every day you make decisions based on your best understanding of how much work is left. When the average daily build has low quality, it is impossible to know how much work is left, and you make a lot of bad engineering decisions. As the number of contributing engineers increases (because we want to do more), the importance of daily quality rises rapidly because the integration burden increases according to the probability of any single programmer’s error. This problem is more than just not knowing what the number of bugs in the product is. If that were all the trouble caused then at least each developer would have their fate in their own hands. The much more insidious side-effect is when developers lack the confidence to integrate all of the daily changes into their personal work. When this happens there are many bugs, incompatibilities, and other issues that we can’t know because the code changes have never been brought together on any machine.
I’ve prepared a graph to illustrate the phenomenon using a simple formula predicting the build breaks caused by a 1 in 100 error rate on the part of individual programmers over a spectrum of group sizes (blue line). A one percent error rate is good. If one used a typical rate it would be a little worse than that. I’ve included two other lines showing the build break probability if we cut the average individual error rate to half (red line) and to a tenth (green line). You can see that mechanisms that improve the daily quality of each engineer impact the overall daily build quality by quite a large amount.
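The post does not spell out the formula behind the graph. One simple way to model it, assuming every programmer checks in once a day and their errors are independent, is that the build breaks if at least one check-in is bad: 1 - (1 - p)^n for n programmers with individual error rate p. The sketch below uses that assumption; the function names and the group sizes are illustrative and not taken from the Windows engineering system.

```python
def build_break_probability(group_size: int, error_rate: float) -> float:
    """Chance that at least one of `group_size` independent check-ins is bad."""
    if group_size < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("group_size must be >= 0 and error_rate within [0, 1]")
    return 1.0 - (1.0 - error_rate) ** group_size


# The three individual error rates discussed in the post.
ERROR_RATES = {
    "1 in 100 (blue)": 0.01,
    "half of that (red)": 0.005,
    "a tenth of that (green)": 0.001,
}


def break_table(group_sizes=(10, 40, 100, 500, 1000)):
    """Return one row per group size: (size, p_blue, p_red, p_green)."""
    rows = []
    for size in group_sizes:
        probabilities = [
            build_break_probability(size, rate) for rate in ERROR_RATES.values()
        ]
        rows.append((size, *probabilities))
    return rows


if __name__ == "__main__":
    print("size  " + "  ".join(ERROR_RATES))
    for size, *probabilities in break_table():
        print(f"{size:>4}  " + "  ".join(f"{p:.3f}" for p in probabilities))
```

Under this model a group of 40 programmers at a 1 in 100 error rate breaks the build on roughly one day in three, and the curve climbs toward certainty quickly as the group grows, which is why lowering the individual rate matters so much for a large team.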
For a team the size of Windows, it is quite a feat for the daily builds to be reliable.
Our improvement in Windows 7 leveraged a big improvement in the Vista engineering system: an investment in a common test automation infrastructure across all the feature teams of Windows. (You will see here that there is an inevitable link between the engineering processes themselves and the organization of the team, a link many people don’t recognize.) Using this infrastructure, we can verify the code changes supplied by every feature team before they are merged into the daily build. Inside the feature team, this infrastructure can be used to verify the code changes of all of the programmers every day. You can see in the chart how the average of 40 programmers per feature team balances the build break probability so that inside a feature team the build breaks relatively infrequently.
For Windows 7 we have largely succeeded at keeping the build at a high level of quality every day. While we have occasional breaks as we integrate the work of all the developers, the automation allows us to find and repair any issues and issue a high quality build virtually every day. I have been using Windows 7 for my daily life since the start of the project with relatively few difficulties. (I know many folks are anxious to join me in using Windows 7 builds every day—hang in there!)
For fun I’ve included a couple pictures from our build lab where builds and verification tests for servers and clients are running 24x7:
Whew! That seems like a wind sprint through a deep topic that I spend a lot of time on, but I hope you found it interesting. I hope this example gives you the idea that we have been very holistic in thinking through new ways of working and improvements to how we engineer Windows. The ultimate test of our thinking will be the quality of the product itself. What is your point of view on this important software engineering issue?
As someone who, for almost 30 years, has been using automatic processes to prevent the introduction of errors into team-developed software, I'd be interested in hearing a little more detail on how your current system works.
Thanks to everyone for the posts. Steven, the team, and I are enjoying the feedback!
Several people commented about the need to also focus on the third party software that is necessary to run Windows. You will hear more in the coming weeks about this topic. For now let me say we hear you.
A few people have commented asking the question, “where’s all the info on features?” These comments are addressed in Steven’s post of 9/6. We intended this blog to talk about how we engineer Windows so we would have common context in any feature discussion. We are listening to the suggestions even if we don’t have an explicit feature request/response process here.
To PAStheLoD: You ask, “How do you handle code merging? Is there some all-knowing source control team, or every feature team is trusted to only do sane things?”
We have an internal source code system that scales to really large projects like Windows, but we are increasingly using Visual Studio Team Foundation Server across the company for source code control. There will be a day in the future when Windows does too, but that day is not yet. Visual Studio itself uses TFS, though, as do hundreds of other projects. We do use TFS in Windows for feature tracking and scheduling. Your other questions are good ones and I’ll consider them for a future post.
To Mikefarinha1: You ask, “Are you saying that using your common test automation infrastructure you were able to get bugs/developer down to .1%? Or are you simply saying that you're able to get fewer bugs using the common test automation infrastructure and thus have an almost exponentially lower build break?”
Developers have a “natural” rate of error, and the engineering system attempts to put the processes around the engineer so that the errors actually checked into the source tree happen at a much lower rate. So while developers themselves have the same basic rate (which hopefully goes down with experience, which is a good topic all on its own) the engineering system makes it seem like their rate is lower (in the 0.001 range).
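One way to read this reply is as a filter sitting between the developer and the source tree. The sketch below is only an illustration of that idea: it assumes the "natural" rate is the 1 in 100 figure used in the graph and that the engineering system catches a fixed fraction of bad changes before check-in. Both function names and the `catch_fraction` parameter are hypothetical, not part of any Microsoft tool.

```python
def effective_error_rate(natural_rate: float, catch_fraction: float) -> float:
    """Error rate that reaches the source tree when a fraction of bad changes is caught first."""
    if not 0.0 <= catch_fraction <= 1.0:
        raise ValueError("catch_fraction must be within [0, 1]")
    return natural_rate * (1.0 - catch_fraction)


def required_catch_fraction(natural_rate: float, target_rate: float) -> float:
    """Fraction of bad changes that must be caught to reach `target_rate`."""
    if natural_rate <= 0.0 or not 0.0 <= target_rate <= natural_rate:
        raise ValueError("need 0 <= target_rate <= natural_rate and natural_rate > 0")
    return 1.0 - target_rate / natural_rate


if __name__ == "__main__":
    natural = 0.01   # assumed natural rate, taken from the graph's 1 in 100 example
    target = 0.001   # the effective rate mentioned in the reply
    needed = required_catch_fraction(natural, target)
    print(f"Checks would need to catch about {needed:.0%} of bad changes")
    print(f"Effective rate at that level: {effective_error_rate(natural, needed):.4f}")
```

Under these assumptions, going from a natural rate of 0.01 to an effective rate of 0.001 means the pre-check-in verification has to stop about nine out of ten bad changes.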
Lastly, sorry for the picture size. I’ll see what I can do about that. They shrank in the posting process for some reason. I am sure that is “by design” :-). Or not.
One thing that you don't mention here is whether there is an effort to reduce bloat. My impression is that each Windows release has an almost exponential growth in memory and disk space.
Does any team or part of the process look at optimising the code with a view to reducing its size and real world running speed on modest hardware?
Reading the comments posted here I think I am not alone in wishing that Windows 7 is a lot leaner and more efficient with hardware resources than Vista.
Overall I think this is the right direction. Windows gets better with each release (except maybe for XP <SP2 which was like ME in the 95-series).
In the beta-days of Vista I read several times that Vista will get fixes and improvements over Windows Update. I don't think this has happened: Yes, we got security updates and bugfixes. But no new features. You could release Windows early, then feature by feature. Microsoft Windows subscribers would get new Windows versions incrementally.
There are only three things I think should be improved:
- I believe development is not agile enough
- It's hard to give feedback as there are so many "idiot analysts, posters, mailers, commenters". Or big companies. There should be a process to elect MVU, most valuable users.
- Telemetry can give you wrong ideas. Not all crashes are recorded and most problems (*) aren't.
*) Some crashes and unresponsive software might be reported. Problems like graphic glitches, flickering, unstable WiFi connections, stuttering sound, non-working Bluetooth headphones, jumping pictures in Word, ... are not caught. If you want to know about those problems, let users report them in an easy way. But let only tech-savvy persons (MCPs, ...) report stuff because typically only those are able to write useful reports.
I'll reiterate what mikefarinha1 said: It would be good to see some commentary on the lessons learnt from the Vista development saga. No doubt a lot of analysis and navel-gazing was done as a result. Commentary on this process would help reassure us that we're not going to see Vista #2 again.
With your graphic apps code you need to provide more optimisation - or a choice. Take a look at the way Irfanview saves images. You have no optimisation (or size reduction) choice at all in Vista Snip. It saves high, so I'm guessing locked in to 95%? I have to copy to Irfanview to save.
And I see you are maybe putting a version of Movie Maker on Live with an option to upload to YouTube among others. I have to go through 3 saves and two programs to get a reasonable size video. It starts uncompressed at, say, 435 MB and eventually gets down to 5.23 MB, as an example. So what I am getting at is what looks to me like Microsoft programmers' complete lack of any size handling in your own graphics throughout Windows.
Most external programs have options, and sometimes it is necessary to reduce size to 40-75% as a trade-off on quality, given that most of the world has a slow broadband speed, if they are not still on dial-up.
Of course, Microsoft internally probably has a 60 Mbps pipe so you don't see any degradation in speed with large graphics. :-)
And lastly - can I have a full install copy of beta Windows 7 (date timed out) when available so I can build a small box to try it? I am an ordinary user, so valuable as a parallel to an expert. (I am your average retail shopper!)
Well, I believe in things being done right, and otherwise I don't do them; I like every job I do to be as close to perfect as possible. As for Microsoft's software, I hope it has as few errors as possible so that we users can say it is something good. As for Windows 7, I can say it is the best software I have installed on my Compaq.
How do you handle code merging? Is there some all-knowing source control team, or every feature team is trusted to only do sane things?
What software do you use for source control?
What about debugging? Profiling? I know this isn't the "diving into the dev. life of W7" blog, but what about using really new technologies (C#, .NET 4, virtualization)?
When I think engineering, I think about inventing new things. New structures, new algorithm, new methods. What are the things you had to invent due to the lack of them?
Also, as W7 is not just a piece of software but an OS, the kernel team must polish its APIs first, then the Windows Platform team has to embrace the changes, haven't they? How do you handle this seemingly never-ending process/flow? (Linux distributions do this by totally separating them, and publishing updates rather frequently.)
Backward compatibility? Any chance, that you'll have the courage (as in management approval) to cut all the nasty tentacles reaching from the past?
My fellow commentors... this is "Engineering Windows 7" not "Nifty New Features of Windows 7"
A huge portion of engineering something is planning. A huge portion of planning is understanding what pieces to add and not add, and understanding how all the pieces will fit together to make something coherent.
BTW they really need to add speel check to IE8 :P