About 2.5 years ago, I wrote a series of articles about how we threat model at Microsoft. About 18 months ago, I made a couple of updates to it, including a post about why we threat model at Microsoft, and a review of how the process has changed over the years.
It's threat modeling time again in my group (it seems to happen about once every 18 months or so, as you can see from my post history :)), and as the designated security nutcase in my group, I've been spending a LOT of time thinking about the threat modeling process as we're practicing it nowadays. It's been interesting looking at my old posts to see how my own opinions on threat modeling have changed, and how Microsoft's processes have changed (we've gotten a LOT better at the process).
One thing we realized very early on is that our early efforts at threat modeling were quite ad-hoc. We sat in a room and said "Hmm, what might the bad guys do to attack our product?" It turns out that this isn't actually a BAD way of going about threat modeling, and if that's all you do, you're way better off than if you'd done nothing.
Why doesn't it work? There are a couple of reasons:
So how do we go about threat modeling?
Well, as the fictional Maria Von Trapp said in her famous introductory lesson to solfege, "Let's start at the very beginning, A very good place to start"...
One of the key things we've learned during the process is that having a good diagram is key to a good threat model. If you don't have a good diagram, you probably don't have a good threat model.
So how do you go about drawing a good diagram?
The first step is to draw a whiteboard diagram of the flow of data in your component. Please note: it's the DATA flow you care about, NOT the code flow. Your threats come via data, NOT code. This is the single most common mistake that people make when they start threat modeling (it's not surprising, because as developers, we tend to think about code flow).
When drawing the whiteboard diagram, I use the following elements (you can choose different elements; the actual image doesn't matter, what matters is that you define a common set of elements for each type):
You build a data flow diagram by connecting the various elements by data flows, inserting boundaries where it makes sense between the elements.
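To make this concrete, here's a minimal sketch (in Python, purely illustrative - none of these element names or types come from any real Microsoft tool) of how the elements, data flows, and trust boundaries of a DFD might be modeled:

```python
# A minimal, illustrative model of a data flow diagram (DFD).
# The element kinds mirror the ones used in these posts: external
# entities, processes, and data stores, connected by data flows,
# with trust boundaries inserted between elements where appropriate.

from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str
    kind: str  # "external", "process", or "datastore"

@dataclass(frozen=True)
class DataFlow:
    source: Element
    sink: Element
    description: str
    crosses_boundary: bool = False  # a trust boundary sits between the ends

app = Element("Application", "external")
component = Element("MyComponent", "process")
store = Element("Config file", "datastore")

flows = [
    DataFlow(app, component, "input parameters", crosses_boundary=True),
    DataFlow(store, component, "configuration data", crosses_boundary=True),
]

# Flows that cross a trust boundary carry untrusted data - they're the
# first places to look for threats.
untrusted = [f for f in flows if f.crosses_boundary]
print(len(untrusted))
```

The `crosses_boundary` flag is the part that matters in practice: flows that cross a trust boundary carry data your component can't trust, and those flows are where threat enumeration starts.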
Now that we have a common language, we can use it to build up a threat model.
Tomorrow: Drawing the DFD.
AKA: How I spent last week :).
On Tuesday morning last week, I got an email from "firstname.lastname@example.org":
You've probably already seen this article, but just in case I'd love to hear your response. http://it.slashdot.org/article.pl?sid=07/08/21/1441240 Playing Music Slows Vista Network Performance?
In fact, I'd not seen this until it was pointed out to me. It seemed surprising, so I went to talk to our perf people, and I ran some experiments on my own.
They didn't know what was up, and I was unable to reproduce the failure on any of my systems, so I figured it was a false alarm (we get them regularly). It turns out that at the same time, the networking team had heard about the same problem and they WERE able to reproduce the problem. I also kept on digging and by lunchtime, I'd also generated a clean reproduction of the problem in my office.
At the same time, Adrian Kingsley-Hughes over at ZDNet Blogs picked up the issue and started writing about it.
By Friday, we'd pretty much figured out what was going on and why different groups were seeing different results. It turns out that the issue was highly dependent on your network topology and the amount of data you were pumping through your network adapter. The reason I hadn't been able to reproduce it is that I only have a 100mbit Ethernet adapter in my office - you can get the problem to reproduce on 100mbit networks, but you've really got to work at it to make it visible. Some of the people working on the problem sent a private email to Adrian Kingsley-Hughes on Friday evening reporting the results of our investigation, and Mark Russinovich (a Technical Fellow, and all-around insanely smart guy) wrote up a detailed post explaining what's going on in insane detail, which he posted this morning.
Essentially, the root of the problem is that for Vista, when you're playing multimedia content, the system throttles incoming network packets to prevent them from overwhelming the multimedia rendering path - the system will only process 10,000 network frames per second (this is a hideously simplistic explanation, see Mark's post for the details).
For 100mbit networks, this isn't a problem - it's pretty hard to get a 100mbit network to generate 10,000 frames in a second (you need to have a hefty CPU and send LOTS of tiny packets), but on a gigabit network, it's really easy to hit the limit.
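Some back-of-the-envelope arithmetic (assuming full-sized 1500-byte Ethernet frames; the numbers are illustrative) shows why the two link speeds behave so differently against the 10,000 frames/second throttle:

```python
# Back-of-the-envelope estimate of Ethernet frames per second at full
# line rate. 1500 bytes is the standard Ethernet MTU; the throttle
# value of 10,000 frames/second comes from the discussion above.

THROTTLE_FPS = 10_000

def frames_per_second(link_bits_per_sec, frame_bytes=1500):
    return link_bits_per_sec / (frame_bytes * 8)

fps_100mbit = frames_per_second(100_000_000)     # ~8,333 fps
fps_gigabit = frames_per_second(1_000_000_000)   # ~83,333 fps

# A saturated 100mbit link with full-sized frames stays under the
# throttle; a gigabit link blows past it by almost an order of magnitude.
print(fps_100mbit < THROTTLE_FPS)   # True
print(fps_gigabit > THROTTLE_FPS)   # True

# To exceed 10,000 fps on 100mbit, frames would have to be smaller than:
max_frame_bytes = 100_000_000 / 8 / THROTTLE_FPS
print(max_frame_bytes)  # 1250.0
```

That last number is why you "really have to work at it" on 100mbit: only a stream of small packets, sent by a hefty CPU, pushes the frame rate over the limit.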
One of the comments that came up on Adrian's blog was a comment from George Ou (another ZDNet blogger):
""The connection between media playback and networking is not immediately obvious. But as you know, the drivers involved in both activities run at extremely high priority. As a result, the network driver can cause media playback to degrade." I can't believe we have to put up with this in the era of dual core and quad core computers. Slap the network driver on one CPU core and put the audio playback on another core and problem solved. But even single core CPUs are so fast that this shouldn't ever be a problem even if audio playback gets priority over network-related CPU usage. It's not like network-related CPU consumption uses more than 50% CPU on a modern dual-core processor even when throughput hits 500 mbps. There’s just no excuse for this."
""The connection between media playback and networking is not immediately obvious. But as you know, the drivers involved in both activities run at extremely high priority. As a result, the network driver can cause media playback to degrade."
I can't believe we have to put up with this in the era of dual core and quad core computers. Slap the network driver on one CPU core and put the audio playback on another core and problem solved. But even single core CPUs are so fast that this shouldn't ever be a problem even if audio playback gets priority over network-related CPU usage. It's not like network-related CPU consumption uses more than 50% CPU on a modern dual-core processor even when throughput hits 500 mbps. There’s just no excuse for this."
At some level, George is right - machines these days are really fast and they can do a lot. But George is missing one of the critical differences between multimedia processing and other processing.
Multimedia playback is fundamentally different from most of the day-to-day operations that occur on your computer. The core of the problem is that multimedia playback is inherently isochronous. For instance, in Vista, the audio engine runs with a periodicity of 10 milliseconds. That means that every 10 milliseconds, it MUST wake up and process the next set of audio samples, or the user will hear a "pop" or “stutter” in their audio playback. It doesn’t matter how fast your processor is, or how many CPU cores it has, the engine MUST wake up every 10 milliseconds, or you get a “glitch”.
For almost everything else in the system, if the system locked up for even as long as 50 milliseconds, you'd never notice it. But for multimedia content (especially for audio content), you absolutely will notice the problem. The core reason behind it has to do with the physics of sound: whenever there's a discontinuity in the audio stream, a high frequency transient is generated. The human ear is quite sensitive to these high frequency transients (they sound like "clicks" or "pops").
Anything that stops the audio engine from getting to run every 10 milliseconds (like a flurry of high priority network interrupts) will be clearly perceptible. So it doesn’t matter how much horsepower your machine has, it’s about how many interrupts have to be processed.
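A quick calculation (using a typical 44.1kHz sample rate, purely for illustration) puts numbers on that 10 millisecond deadline:

```python
# How much audio the engine must produce every 10ms period, and how
# many periods a stall wipes out. 44.1kHz is the standard CD sample
# rate, used here purely for illustration.

SAMPLE_RATE = 44_100   # samples per second per channel
PERIOD_MS = 10         # audio engine periodicity, per the text above

samples_per_period = SAMPLE_RATE * PERIOD_MS // 1000
print(samples_per_period)   # 441 samples per channel, every period, no excuses

# If interrupt processing stalls the engine for 50ms, it misses
# multiple consecutive deadlines - each one an audible glitch.
stall_ms = 50
missed_periods = stall_ms // PERIOD_MS
print(missed_periods)       # 5
```

The deadline is absolute: no amount of CPU horsepower helps if the engine's thread simply doesn't get to run during one of those 10ms windows.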
We had a meeting the other day with the networking people where we demonstrated the magnitude of the problem - it was pretty dramatic, even on the top-of-the-line laptop. On a lower-end machine it's even more dramatic. On some machines, heavy networking can turn video rendering to a slideshow.
Any car buffs will immediately want to shoot me for this analogy, because I’m sure it’s highly inaccurate (I am NOT a car person), but I think it works: You could almost think of this as an engine with a slip in the timing belt – you’re fine when you’re running the engine at low revs, because the slip doesn’t affect things enough to notice. But when you run the engine at high RPM, the slip becomes catastrophic – the engine requires that the timing be totally accurate, but because it isn’t, valves don’t open when they have to and the engine melts down.
Anyway, that's a long winded discussion. The good news is that the right people are actively engaged on working to ensure that a fix is made available for the problem.
In my last post, I listed off some of the elements that make up a threat model. Now that we have a common vocabulary that can be used to describe the names and types of the elements, let's see what we can do with it.
For this series, I'm going to use an API that's near and dear to my heart: PlaySound.
The nice thing about PlaySound is that it's a relatively simple API, but it's complex enough that it can demonstrate many of the characteristics of threat modeling.
So how do we go about drawing the whiteboard diagram for PlaySound?
First off, you need to characterize the data flows associated with the API. I often find it helps to describe the design of your component to someone - for some reason, it helps me understand the data flow if I explain it to someone (I can't explain why, but I know it does).
For PlaySound, my description was something like:
"The PlaySound API takes as input a string which represents either a WAV filename or an alias. If the input is an alias, the PlaySound API retrieves data from the registry under HKCU to convert the alias into a filename. Once the filename is determined, the PlaySound API opens the WAV file specified and reads the two relevant pieces from the file: the WAVEFORMATEX that defines the type of data in the file and the actual audio data. It then hands that data to the audio rendering APIs."
Given that description, what elements are going to appear in the diagram? Well, obviously the PlaySound API itself. In addition, you're going to have the .WAV file, HKCU, and the audio playback APIs. You'll also need one other element that's not immediately obvious - the application that invokes PlaySound.
Here's what it looks like (I drew this in Visio, obviously you could use any tool to draw it (I know one group that literally draws their diagrams on the whiteboard then takes a picture of it with a cell phone camera and then pastes the picture into a Word document)):
All the elements I called out in my description above are present. As I said above, the application calling the API is modeled as an external element (because it is outside your control). Similarly, the audio playback APIs (and there may be more than one of them - it doesn't matter to this threat model) are modeled as external elements because you don't control them either.
The WAV file and the registry are separate data stores, and PlaySound is a process sitting in the middle of it. Remember that when you're drawing your diagram, a "process" isn't the same thing as a Win32 process; instead it's a piece of code that processes data. Once you've got your elements, you just need to add the data flows and trust boundaries and you're done with the diagram.
It's not obvious from the picture, but the dataflows between "WAV file" and "PlaySound" all flow from "WAV file" to "PlaySound" - that's because we don't ever write data into the file, we only read it. Similarly, since we don't write data into HKCU, there's no data flow into HKCU.
One key thing to notice is that this diagram is significantly simpler than the actual implementation of PlaySound. It doesn't include lots of the options that PlaySound supports (like SND_ASYNC, SND_MEMORY or SND_RESOURCE). This is quite intentional because those options don't change the results of the threat modeling (I'll spend a bit of time talking about this later).
You'll note that I've chosen to insert trust boundaries between the WAV file, the registry and the application, but I don't have a trust boundary between PlaySound and the audio playback APIs.
The reasons for this are:
That last point is important. When I'm doing a threat model for a component, I usually don't bother to put a trust boundary between my component and data flowing out of my component. That's because I trust my code to produce correct data. On the other hand, the downstream component often can't trust the upstream component (so the audio playback APIs can't trust the WAV file data that is provided by PlaySound).
Chris Pirillo had an interesting blog post the other day with the rather uninformative title of "Windows Vista Sound Problems". He has a reader who built a shutdown sound that is almost 2 minutes long, and that reader is upset that the system isn't playing his entire shutdown sound when he shuts his system down.
Chris speculates that it might be tied to the sound event process or to audio driver limitations, but the actual answer is much simpler, and is related to the way that the shell handles the shutdown sound.
One of the most significant pieces of feedback that we received about Windows XP was that people (especially people with laptops) were quite upset at the amount of time that it took for XP to shutdown. You could see dramatic proof of this by simply walking around the halls here at Microsoft - you'd see people going from their office to a meeting with their laptop lids cracked partly open. The big reason for this was that XP didn't reliably shut down the system - you'd close the lid of your laptop, stick it in your laptop case and head off to your meeting, when you got there you'd burn your hands because the laptop never shut down, even after 5 minutes with the lid closed.
For Vista, the power management folks decided that they were going to fix this problem - when you closed your laptop (or shut off your computer), they WERE going to shut down the machine. This makes a ton of sense - the act of closing the lid on the laptop is a clear indication that the customer's intent is to stop using their machine, so the system should turn itself off when this happens.
This decision had some consequences though. On Windows XP, an application was allowed to delay system shutdown indefinitely - this was a major cause of the overheated laptop problem; on Vista, the system IS going to shut down, even if your application isn't ready for it. So if your application takes a long time to exit (and Microsoft applications are absolutely NOT excluded from this list), then it's going to have the rug yanked out from under its feet.
Since the shutdown process is effectively synchronous, the shell (explorer.exe) attempts to limit the size of the WAV file that's played during system shutdown (it uses the file size as a first order approximation of the length of the sound). If the .WAV file that's registered for the shutdown sound is larger than 4M in size, it won't be played.
So if Chris's reader reworked his file to keep it under 4M in size (which probably can be done with a reduction in sample rate and channel count), then Explorer will happily play the sound.
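Since uncompressed WAV size is just duration times sample rate times channels times bytes per sample, it's easy to see how far over the limit a 2-minute CD-quality sound is, and what kind of reduction gets it under (the specific rates below are illustrative):

```python
# Predicting uncompressed WAV file size. The 4MB cutoff is the
# explorer shutdown-sound limit described above; the durations and
# sample rates are illustrative.

LIMIT = 4 * 1024 * 1024   # explorer's shutdown-sound size cutoff

def wav_bytes(seconds, sample_rate, channels, bits):
    return seconds * sample_rate * channels * bits // 8

# An almost-2-minute shutdown sound at CD quality blows past the limit:
cd_quality = wav_bytes(115, 44_100, 2, 16)
print(cd_quality > LIMIT)   # True (roughly 19MB)

# Dropping to 11.025kHz mono 16-bit brings the same sound under 4MB:
reduced = wav_bytes(115, 11_025, 1, 16)
print(reduced < LIMIT)      # True (roughly 2.4MB)
```

Note that halving the sample rate and dropping to mono alone isn't quite enough for a CD-quality source; you need roughly an 8x reduction to get from ~19MB under 4MB.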
However Chris's reader may still not be happy with the results. To understand why, you need to dig a bit deeper into the shutdown process.
The Windows shutdown process is (very roughly - this is a 100,000 foot approximation, the actual process is much more complicated):
Remember my comment above about shutting down the user's applications? Well, explorer is still one of the user's applications, and it's subject to the same termination rules as every other application. Some number of seconds into playing the shutdown sound, NTUSER will decide that the explorer is hung and will bring up the "This application is hung, do you want to kill it?" screen (the reason will be something like "Explorer / Playing Logoff sound").
What happens next depends on what the user answers (or has previously answered). If the user answers "yes" to the "Do you wish to terminate this application" prompt, then the system enters "forced shutdown" mode. If they answered "no", then the system will wait until all the applications have terminated.
If the system is in "forced shutdown" mode, then 30 seconds after the prompt, the system WILL kill the remaining applications, regardless of whether or not they've shut down. If Explorer is still playing the logoff sound at that time, it'll be yanked as well, and the logoff sound will be cut short.
There's a simple answer to the question of why applets are bad. As I mentioned in the first post in this series, "It's my machine dagnabbit". The simple answer is that applets consume resources that can be better used by the customer.
At an absolute minimum, each applet process consumes a process (no duh - that was a stupid statement, Larry). But you need to realize that each process on Windows consumes a significant amount of system resources - you can see this in Vista's taskmgr.
There are three columns that are interesting: Working Set, Commit Size and Memory. Commit Size is the amount of memory reserved for the process (and so can be insanely large), Working Set is the amount of physical memory that the process is currently consuming, and Memory is the amount of working set that's not being used by DLLs.
On my machine, to pick on two applets that I have running, you find:
That 700K is real, physical RAM that's being actively used by the process (otherwise it would have been swapped out). With multiple applets running, it adds up FAST. On today's big machines, this isn't a big deal, but on a machine with less memory, it can be crippling.
In my last post, I categorized applets into 4 categories (updaters, tray notification handlers, helper applications and services). In addition to the common issues mentioned above, each of these has its own special set of issues associated with it.
Updaters often run all the time, even though they're only actually doing work once a day (or once a month). That means that they consume resources all the time that they're active. Adding insult to injury, on my machine at home, I have an updater that is manifested to require elevation (which means I get the "your app requires elevation" popup whenever it tries to run).
Tray notification handlers also run all the time, and adding insult to injury, they clutter up the notification area. The more items in the notification area, the less useful it is. This is actually the primary justification for the "big 4" notification area items in Vista - people kept on finding that the 3rd party notification area icons crowded out functionality they wanted to access. In addition, notification handlers seem to love popping up toast on the desktop, which often interrupts the user. In addition, since tray handlers often run synchronously at startup, they delay system boot time.
Helper applications don't have any specific issues, from what I've seen. They just consume resources when they're running.
Services are both good and bad. Each Windows service has a start type which lets the system know what to do with the service on startup. There are 3 relevant start types for most services: AutoStart, DemandStart and Disabled. When a service is marked as AutoStart, it starts on every boot of the system, which degrades the system startup time. In addition, because services often run in highly privileged accounts, the author of the service needs to take a great deal of care to ensure that they don't introduce security holes into the system. Before Vista, high privileged services were notorious for popping up UI on the user's desktop, a practice so dangerous, it justified its own category of security threat ("shatter attacks"). In Vista, changes were made to eliminate classic shatter attacks for all subsequent versions of the OS, so fortunately this issue isn't as grave as it was in the past.
Tomorrow: So how do you mitigate the damage that applets can cause?
As I've mentioned, applets can be a plague on your system. The annoying thing is that it's possible to write applets that aren't so horrible. And most of the mitigations are really just common sense ideas - there's nothing spectacularly complicated in any of them.
As with the earlier posts in this series, some of the mitigations are common to all types of applets (and all applications in general), and others are specific to various types of applet.
Let's start with the basics...
For applets that run all the time:
Do you REALLY need to have an applet running all the time? The best applet is one that doesn't run at all. Is it possible to bundle your applet's functionality into an application that the user invokes? The Start menu highlights newly installed applications, so your new application will be visible there; worst case, there are other mechanisms for installing your application in locations that are visible to the user (the desktop is one of them, although that has its own set of issues).
Now that you've decided you need an applet that runs all the time, please reconsider. Seriously. I know I just asked you to think about it, but really. Steve Ballmer says that sometime in 2008, a billion people will be running some version of Windows, that's a LOT of people. If your product is successful, you're likely to be selling to a couple of million of them - do you believe that your applet provides enough value to every one of those customers that you need to have it running in their face? Really?
Once you've decided that you REALLY need to have an applet running, make sure that there's a way for the user to turn it off. There's nothing more annoying than realizing that the software that came with some random piece of hardware that I use maybe once or twice a month is running all the time on my machine.
If you've written an applet because you want to let the user know about some cool feature, why not use the RunOnce key to let the user know about the feature, letting them know how to discover the feature later on, then shut up and never bother them again?
For all applets (and all applications that are expected to run all the time):
Think about how to reduce the applet's impact on the user. Reduce the DLL load in your applet whenever possible - each DLL you load consumes a minimum of 4 private pages (16K on x86) and takes between 500K and 1M cycles to load. Anything you can do to reduce that helps. If you can get away with just loading kernel32.dll and the C runtime library, so much the better. Consider delay-loading DLLs that you use infrequently.
Reduce the stack size used for the threads in your applet - by default Windows threads get a 1M commitment and 10K of reserve (which really turns into 12K of reserve due to paging). That means that every thread is guaranteed to consume at least its stack commitment space in virtual memory (the good news is that it's virtual memory - normally that'll just be space reserved in the paging file, not real memory).
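Using the per-DLL and per-thread figures above, a rough (and purely illustrative) cost model for an always-running applet looks like this:

```python
# Rough cost model for an always-running applet, using the per-DLL
# and per-thread figures quoted in the text. The DLL and thread
# counts for the "modest applet" are made up for illustration.

PAGE = 4 * 1024                   # x86 page size
DLL_PRIVATE_PAGES = 4             # minimum private pages per loaded DLL
THREAD_COMMIT = 1 * 1024 * 1024   # default per-thread stack commitment

def applet_cost(num_dlls, num_threads):
    dll_bytes = num_dlls * DLL_PRIVATE_PAGES * PAGE
    stack_bytes = num_threads * THREAD_COMMIT
    return dll_bytes, stack_bytes

# A modest applet: 30 DLLs loaded, 4 worker threads.
dll_bytes, stack_bytes = applet_cost(30, 4)
print(dll_bytes // 1024)    # 480 (KB of private pages just for DLLs)
print(stack_bytes // 1024)  # 4096 (KB of commit charge for stacks)
```

Half a megabyte of private pages and 4MB of commit charge before the applet has done any actual work - and that's per applet, per logged-on user.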
Reduce the number of processes that your applet needs. There's rarely a good reason for you to require more than one process to do work. About the only one I can think of is if you split functionality to reduce the amount of code you have running at a high privilege level. As an example of this, in Windows Vista, the audio stack runs in two separate services - the AudioSrv service and the AudioEndpointBuilder service. This is because a very small part of the functionality in the audio engine has to do some operations that require LocalSystem access, but the rest of the audio stack can run just fine as LocalService. So the AudioEndpointBuilder service contains the high privilege code and the AudioSrv service contains the rest. If you feel you need to have a separate process to provide reliability (you run the code out-of-proc in case it crashes), Windows Vista provides a cool new feature called the "restart manager". The restart manager allows the OS to restart your application if it crashes, reducing the need to run code out-of-proc.
Don't forget that Windows is a multi-user system. Some of your customers won't want your applet, others will. So make sure that the settings to enable/disable the applet are instanced on a per-user basis. It's really annoying when you right click on a notification area icon and see that the "disable this" menu is disabled because you're running as a normal user (which is most users on Vista). Whenever I see this, I know that the author of the applet didn't consider the normal user case.
If you can target Vista only, consider reducing your thread and I/O priority. If your applet is performing processing that's not directly related to the user, use the new PROCESS_MODE_BACKGROUND_BEGIN option in the SetPriorityClass API to let the system know that your process should be treated as a low priority background process - that way your applet won't preempt the user's work. You can also use the new FileIoPriorityHintInfo information class with the SetFileInformationByHandle API to let the OS prioritize your I/Os on a handle below those of other user operations.
Next: Mitigations for updaters (no post tomorrow since I'm moving offices).
Since I spend so much time railing about applets, I also tend to look at applets to see what they do (after all, the first step in knowing how to defeat the enemy is to understand the enemy).
In general, applets seem to fall into several rough categories:
Let me take them in turn...
Updaters: I LIKE updaters. Updaters are awesome. IMHO, I trust applications that include updaters more than those that don't (because an updater implies a commitment to further development and bug fixes). However, way too many vendors build programs that run all the time and do absolutely nothing other than wait to check for updates every week (or every month). One other problem with updaters is that sometimes the authors of the updater use it to push unrelated software (at the moment, I'm particularly annoyed at the iTunes updater - if you install just Quicktime, the updater tries to get you to install Quicktime+iTunes, and there seems to be no way of shutting it up).
Notification Area Handlers: Every application seems to want to put its own icon in the notification area. To me, the functionality that is offered by many of these is of limited value. For example, my display driver includes an applet that allows the user to quickly switch between screen resolutions, but I almost never change my screen resolution - so why provide an easy shortcut for that functionality? I'm not sure why, but personally I believe it's because of branding (since you get to put an icon with your notification area handler, it makes it obvious to the user that you've installed the software). Some pieces of notification area functionality are quite useful (the "big 4" (Sound, Network, Battery, Clock) in Windows are good examples, as are things like RSSBandit's status indicator), but many of them make me wonder (which is why I suspect that branding is the real reason behind many of the notification area icons).
Helper applications: These are things like "FlashUtil9d.exe" (running on my machine right now) and other support processes. Users often don't see these (since they don't bring up UI), but they live there nonetheless. I have an HP 7400 printer at home, and the printer driver for that runs 2 separate processes for each user (one of which hangs during shutdown every time a user logs off).
Services: A special class of helper application, services have some significant advantages over helper applications (and some drawbacks). Services can be centrally managed, and expose a common startup/shutdown interface. They also can be automatically started at system boot, have strict dependencies, and can run in arbitrary user contexts (including elevated contexts). On the other hand, it's difficult (and in many ways effectively impossible) to have services run in the context of the currently logged on user. I'm a huge fan of services, but it's possible to totally overdo it. In Windows Vista, there were a slew of new services introduced, and more and more applications are creating services, since the currently logged in user is no longer an administrator. An example of a helper service is the WHSConnector service that comes with Windows Home Server (another of my current favorite products), and there are a bazillion others.
I'm sure that there are other categories of applets, but these 4 appear to be the biggies.
Tomorrow: So why are applets bad?
I've been wanting to write this one for a while, but continually got sidetracked, but there's no time like the present...
Many others (I'm too lazy to chase down references) have commented on the phenomenon known as "bloatware" (also known as "craplets" or "shovelware").
I'm not going to talk about them, too much has been written about them by others already.
Instead I want to talk about applets in general. These are the "little" helper processes that software seems to leave lying around after installation. These are a particular pet peeve of mine, I'm well known inside MS (or at least within the Windows division) as being rather fanatical about them, and fighting tooth and nail (sometimes successfully) to get them removed. I don't know how many times I've asked: "Why does your product (or feature) have all this crap running (where 'crap' is defined as 'stuff I don't want running on my machine')?"
Applets come in lots of sizes and shapes - they can be services waiting on an app to use them; they can be processes that handle systray icons; they can be helper applications. But they share one thing in common: they all consume resources, sometimes LOTS of resources. And I would rather that these applets NOT consume resources.
Nowadays, machines come with a fair amount of resources - my current dev machine is a dual-core 2.4GHz Core 2 Duo 6600 with 2G of RAM and a reasonable amount of disk space (750G on 3 drives), but Vista runs on machines that are far less capable (before it died, my laptop was a P2 with 512M of RAM and it ran Vista Ultimate just fine - no glass, but other than that it worked well). On such a machine, every single unnecessary process can be painful.
The Windows team has known that this has been an issue for years, and has built in a ton of features into the operating system to help alleviate the pain and suffering associated with applets (some of which have been there since NT 3.1), but the reality is that nobody takes advantage of this functionality, and that's a real shame.
In a potentially futile attempt at trying to inspire people to improve our customers' experiences, I'm going to dedicate this week to writing posts about applets and how developers can fix them.
Btw: I want to be totally clear here: Microsoft is just as guilty as others in this arena.
Tomorrow: Why do people write applets?
As a senior developer at Microsoft, you often find yourself participating on a number of v-teams. One of the v-teams I'm on is responsible for approving new services added to Windows. As I've mentioned before, I'm a nutcase about stuff running on my machines, and services are absolutely among the things I care about passionately. As a part of my work on that v-team, I wrote this little bit up a couple of years ago (it's been edited slightly to remove proprietary information):
I've been sitting watching the <new services v-team> process for a couple of months now, and I've seen a number of trends that concern me. Every single new feature (and it seems like there have been thousands of new features) seems to require its own service to perform operations. Don't get me wrong - it's wonderful that these functions are running as services and not as separate processes. But every single one of the new services that I see being requested is enabled on all SKUs of Windows. All of them, it seems. And they're all auto-start.

The <new services v-team> has done a terrific job of reducing the number of own-process services that are running. That's truly awesome, and it's great for our customers. But I don't think that they're going far enough. We need to take a harder line on our services. Because even if multiple services are hosted in a single process, each of them still burns at least one thread. And that thread consumes working set. And it affects startup time. And if your code has memory (or GDI/User object) leaks, it can render computers unusable.

The other thing to consider is that every running service in Windows increases the Windows attack surface. In Windows XP, we had 40ish services on a running system. We've got almost twice that on a default Vista install these days (assuming my test machine is a default Longhorn install).

Now I appreciate that everyone's feature is critical for their customers, but I'm wondering if they're all necessary for all customers. Do you REALLY believe that your code is going to be used by every single one of the nearly a billion users of Windows? Is your service going to make every single one of those billion people's lives better? If it isn't, then maybe every one of those billion people don't need to be running your code.
As I've said, I've been thinking about this for a while, and I think I've got a few things that should be considered when you're trying to figure out if your service really needs to be installed. First off, I know that your feature is the most important thing you're doing, but that's true for every single one of the developers working on the Windows product. We can't all be number one, so think very seriously about the relative importance of your feature. If your service is auto-start, is it REALLY necessary? Will every user of Windows achieve positive benefits from your service? If your service is tied to a piece of hardware, does your service need to be running if the hardware isn't present? Can you tie the service to the installer for your hardware? If your service is tied to a particular UI, and the user never invokes your UI, is your service doing the user any good? Can your UI start the service if it's not running? Does your service REALLY need to be enabled and auto-start (even auto-start-delay) on every SKU? Really? How is your feature/service discoverable? If your feature isn't easily discoverable, does the service that supports that feature really have to be run until the user discovers your feature and starts to use it? Now for some services this is clearly the case. But for a huge number of the services that we've been coming up with, it's equally clearly not. Even my own service, Windows Audio doesn't meet all of these criteria. I'd be more than willing to have the service be manual start unless there's an audio card present, and to change the installer for audio adapters to enable the service. Because on a machine without audio hardware, there's no point in the service running until the hardware arrives. 
There IS one important scenario where it's important to have the Windows Audio service running: that's Remote Desktop - when running a remote desktop, even if the server doesn't have audio hardware, we can still play audio using the TS client's audio hardware. But that's a relatively weak scenario. And I'd be willing to change it (or work to change the remote desktop service to ensure that the audio service is started when a client connects). Are you?
I've been sitting watching the <new services v-team> process for a couple of months now, and I've seen a number of trends that concern me.
Every single new feature (and it seems like there have been thousands of new features) seems to require its own service to perform operations. Don't get me wrong - it's wonderful that these functions are running as services and not as separate processes.
But every single one of the new services that I see being requested is enabled on all SKUs of Windows. All of them, it seems. And they're all auto-start.
The <new services v-team> has done a terrific job of reducing the number of own-process services that are running. That's truly awesome, and it's great for our customers.
But I don't think that they're going far enough. We need to take a harder line on our services, because even when multiple services are hosted in a single process, each one still burns at least one thread. And that thread consumes working set. And it affects startup time. And if your code has memory (or GDI/User object) leaks, it can render computers unusable. The other thing to consider is that every running service in Windows increases the Windows attack surface.
In Windows XP, we had 40ish services on a running system. We've got almost twice that on a default Vista install these days (assuming my test machine is a default Longhorn install).
Now I appreciate that everyone's feature is critical for their customers, but I'm wondering if they're all necessary for all customers. Do you REALLY believe that your code is going to be used by every single one of the nearly a billion users of Windows? Is your service going to make every single one of those billion people's lives better? If it isn't, then maybe not every one of those billion people needs to be running your code.
As I've said, I've been thinking about this for a while, and I think I've got a few things that should be considered when you're trying to figure out if your service really needs to be installed.
First off, I know that your feature is the most important thing you're doing, but that's true for every single one of the developers working on the Windows product. We can't all be number one, so think very seriously about the relative importance of your feature.
If your service is auto-start, is it REALLY necessary? Will every user of Windows achieve positive benefits from your service?
If your service is tied to a piece of hardware, does your service need to be running if the hardware isn't present? Can you tie the service to the installer for your hardware?
If your service is tied to a particular UI, and the user never invokes your UI, is your service doing the user any good? Can your UI start the service if it's not running?
Does your service REALLY need to be enabled and auto-start (even auto-start-delay) on every SKU? Really?
How is your feature/service discoverable? If your feature isn't easily discoverable, does the service that supports that feature really have to be run until the user discovers your feature and starts to use it?
Now for some services this is clearly the case. But for a huge number of the services that we've been coming up with, it's equally clearly not.
Even my own service, Windows Audio, doesn't meet all of these criteria. I'd be more than willing to make the service manual-start unless there's an audio card present, and to change the installer for audio adapters to enable the service, because on a machine without audio hardware, there's no point in the service running until the hardware arrives. There IS one important scenario where it's important to have the Windows Audio service running: Remote Desktop. When running a remote desktop session, even if the server doesn't have audio hardware, we can still play audio using the TS client's audio hardware.
But that's a relatively weak scenario. And I'd be willing to change it (or work to change the remote desktop service to ensure that the audio service is started when a client connects). Are you?
This is all a bit of a digression - it's not about mitigations, it's about the hard decisions you should make when thinking about adding services, but it's worth publishing anyway.
So how do you mitigate services? First off, combine like services into a single process. That way, instead of taking two processes, you only consume one process (see my earlier post where I listed the costs of a process).
Secondly, as I indicated above, consider making your service a manual start service that's triggered by some UI action. Unless there's a real need for your service to be running all the time, let the UI (or an API if your service surfaces an API) start your service.
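To make the "let the UI start the service" pattern concrete, here's a minimal sketch of the Win32 service-control calls a UI or API entry point could make. The service name "MyFeatureSvc" is hypothetical, and real code would want richer error reporting:

```cpp
#include <windows.h>

// Sketch: start a companion manual-start service on demand from UI/API code.
// "MyFeatureSvc" is a hypothetical service name, not a real Windows service.
bool StartCompanionService()
{
    SC_HANDLE scm = OpenSCManager(nullptr, nullptr, SC_MANAGER_CONNECT);
    if (!scm) return false;

    bool ok = false;
    SC_HANDLE svc = OpenService(scm, TEXT("MyFeatureSvc"), SERVICE_START);
    if (svc) {
        // StartService fails harmlessly if the service is already running.
        ok = StartService(svc, 0, nullptr) ||
             GetLastError() == ERROR_SERVICE_ALREADY_RUNNING;
        CloseServiceHandle(svc);
    }
    CloseServiceHandle(scm);
    return ok;
}
```

The nice property of this pattern is that the service consumes no resources at all until the first customer actually exercises the feature.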
Third, seriously consider making your service a delayed auto-start service - this is functionality new in Vista/Windows Server 2K8 that allows the service controller to delay starting your service so it doesn't interfere with boot time.
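For an existing auto-start service, the switch to delayed auto-start is a one-call configuration change. A minimal sketch (again, "MyFeatureSvc" is a hypothetical service name):

```cpp
#include <windows.h>

// Sketch (Vista/Server 2008+): flip an existing auto-start service to
// delayed auto-start. "MyFeatureSvc" is a hypothetical service name.
bool MakeDelayedAutoStart()
{
    SC_HANDLE scm = OpenSCManager(nullptr, nullptr, SC_MANAGER_CONNECT);
    if (!scm) return false;

    bool ok = false;
    SC_HANDLE svc = OpenService(scm, TEXT("MyFeatureSvc"), SERVICE_CHANGE_CONFIG);
    if (svc) {
        SERVICE_DELAYED_AUTO_START_INFO info = { TRUE };
        // Only honored if the service is already configured as SERVICE_AUTO_START.
        ok = !!ChangeServiceConfig2(svc, SERVICE_CONFIG_DELAYED_AUTO_START_INFO, &info);
        CloseServiceHandle(svc);
    }
    CloseServiceHandle(scm);
    return ok;
}
```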
In addition, seriously consider how much time you spend in your service's start routine. The less time, the better (especially if you're an auto-start service). The less work you do before reporting to the service controller that your service has started, the faster the system will boot.
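The fast-start idea can be sketched as a ServiceMain that reports RUNNING almost immediately and pushes the expensive work onto a worker thread. This is a simplified illustration, not a complete service skeleton; "MyFeatureSvc" is a hypothetical name and real code needs full status bookkeeping:

```cpp
#include <windows.h>

// Sketch: report RUNNING to the service controller as early as possible and
// defer expensive initialization to a worker thread.
static SERVICE_STATUS_HANDLE g_status;

static void ReportState(DWORD state)
{
    SERVICE_STATUS s = {};
    s.dwServiceType = SERVICE_WIN32_SHARE_PROCESS;  // shared-process service
    s.dwCurrentState = state;
    s.dwControlsAccepted = (state == SERVICE_RUNNING) ? SERVICE_ACCEPT_STOP : 0;
    SetServiceStatus(g_status, &s);
}

static void WINAPI CtrlHandler(DWORD control)
{
    if (control == SERVICE_CONTROL_STOP) ReportState(SERVICE_STOP_PENDING);
}

static DWORD WINAPI DeferredInit(LPVOID)
{
    // Heavy work (loading caches, opening devices, ...) happens here, AFTER
    // the SCM has been told we're running, so boot isn't held up.
    return 0;
}

void WINAPI ServiceMain(DWORD, LPTSTR *)
{
    g_status = RegisterServiceCtrlHandler(TEXT("MyFeatureSvc"), CtrlHandler);
    ReportState(SERVICE_START_PENDING);

    // Do only what's needed to accept controls, then declare RUNNING.
    ReportState(SERVICE_RUNNING);
    QueueUserWorkItem(DeferredInit, nullptr, WT_EXECUTELONGFUNCTION);
}
```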
Tomorrow: Applet best practices - collecting the thoughts of the previous several posts into a single post.
 Please note: While there ARE more services in Vista than in XP, this comment is mostly hyperbole.
This didn't happen; the powers that be decided that since every workstation-class machine with a Vista logo had to have an audio solution, it was OK to keep the audio service as an auto-start service (they also felt that audio was going to be used by every one of those users :)).
Yeah, I know - it's a Vista-only mitigation, but it's a good one.
First off (as always), reconsider your need for a notification area handler. Seriously consider if it's appropriate for your application to have a notification area handler. Do you really believe that it provides sufficient functionality to justify taking up limited notification area real-estate? Really?
From what I've seen, some notification area handlers are quite well thought out and provide easy access to useful information or commonly accessed functionality (the volume, clock, taskmgr, "safely remove hardware", and RSS Bandit handlers are ones that come to mind). Some have questionable value (the network and Outlook handlers come to mind), and some just don't seem to make sense at all (QuickTime, and handlers like the various display driver and printer control panel notification area handlers). After all, do you think that your customers really need to know how much ink remains in their printer all the time?
As I've mentioned in earlier articles, you should always have a mechanism for disabling your notification area handlers - it's just polite (and if you don't, your customers are going to find other ways of disabling your notification area handlers). Since you're building a mechanism to disable your notification area handler, why not surface it as a checkbox in your installer? That way your customer never gets annoyed in the first place. In addition (and I've also mentioned this before), make sure that your notification area handler is instanced on a per-user basis. That way you (a) don't require elevation to disable your notification area handler, and (b) you let the various users of the computer make a choice - some may choose to use your applet, others may not.
If you've decided that you MUST have a notification area handler, then why chew up an entire process to handle it? Windows offers a number of mechanisms to let you reduce the impact of your handler (they don't remove the impact, just reduce it). For instance, you can use the task scheduler facility (mentioned earlier in my post about updaters) to launch your notification handler upon user logon - that provides a centrally manageable interface to allow for task control. On Vista, you can define your task as firing a COM handler, in which case your applet gets launched in a hosting process - that means that instead of having a dedicated process, you live in a process that's shared with other jobs (including other notification area handlers).
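A sketch of what the logon-triggered registration looks like against the Vista Task Scheduler 2.0 COM interfaces. The task name and executable path ("Contoso Tray", "tray.exe") are hypothetical, and all error handling and interface releasing are elided for brevity:

```cpp
#include <windows.h>
#include <taskschd.h>
#include <comdef.h>
#pragma comment(lib, "taskschd.lib")

// Sketch (Vista Task Scheduler 2.0): register a task that launches a tray
// handler at user logon instead of auto-starting a resident process.
void RegisterLogonTask()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    ITaskService *service = nullptr;
    CoCreateInstance(CLSID_TaskScheduler, nullptr, CLSCTX_INPROC_SERVER,
                     IID_ITaskService, (void **)&service);
    service->Connect(_variant_t(), _variant_t(), _variant_t(), _variant_t());

    ITaskFolder *root = nullptr;
    service->GetFolder(_bstr_t(L"\\"), &root);

    ITaskDefinition *task = nullptr;
    service->NewTask(0, &task);

    // Trigger: run when the user logs on.
    ITriggerCollection *triggers = nullptr;
    ITrigger *trigger = nullptr;
    task->get_Triggers(&triggers);
    triggers->Create(TASK_TRIGGER_LOGON, &trigger);

    // Action: launch the tray handler. (TASK_ACTION_COM_HANDLER is the
    // variant that gets you hosted in a shared surrogate process instead.)
    IActionCollection *actions = nullptr;
    IAction *action = nullptr;
    IExecAction *exec = nullptr;
    task->get_Actions(&actions);
    actions->Create(TASK_ACTION_EXEC, &action);
    action->QueryInterface(IID_IExecAction, (void **)&exec);
    exec->put_Path(_bstr_t(L"C:\\Program Files\\Contoso\\tray.exe"));

    IRegisteredTask *registered = nullptr;
    root->RegisterTaskDefinition(
        _bstr_t(L"Contoso Tray"), task, TASK_CREATE_OR_UPDATE,
        _variant_t(), _variant_t(), TASK_LOGON_INTERACTIVE_TOKEN,
        _variant_t(L""), &registered);
    // (Release the interfaces and CoUninitialize in real code.)
}
```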
One final thing about notification area handlers: if you don't have anything to say, shut up :)! People get really annoyed by notification area icons; one way to reduce their ire is simply not to register an icon unless you have something to say. You can see the effect of this with Windows Defender - it only inserts its notification area icon if it needs to alert the user; otherwise it stays silent. There's a huge caveat with this, though: adopting this behavior can have unintended consequences. When the Defender team adopted this behavior (only showing the icon if there was a problem), they received a flurry of complaints from users who felt that if the icon wasn't present, Defender somehow wasn't working. As a result, the Defender team added an option (off by default) to show their icon in the notification area all the time. Personally, I think people feel this way because they've been conditioned by poorly written notification area handlers to believe they're not protected unless they see the little icon, even though the icon has nothing to do with their protection level.
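The "only speak when you have something to say" pattern boils down to adding the icon when an alert fires and deleting it when the alert clears, rather than at startup. A minimal sketch; the window handle, callback message, and icon are placeholders supplied by the hypothetical application:

```cpp
#include <windows.h>
#include <shellapi.h>

// Sketch: show the tray icon only while there's something to tell the user.
const UINT WM_APP_TRAY = WM_APP + 1;  // hypothetical callback message

void ShowAlertIcon(HWND hwnd, HICON icon, const TCHAR *tip)
{
    NOTIFYICONDATA nid = { sizeof(nid) };
    nid.hWnd = hwnd;
    nid.uID = 1;
    nid.uFlags = NIF_ICON | NIF_MESSAGE | NIF_TIP;
    nid.uCallbackMessage = WM_APP_TRAY;
    nid.hIcon = icon;
    lstrcpyn(nid.szTip, tip, ARRAYSIZE(nid.szTip));
    Shell_NotifyIcon(NIM_ADD, &nid);    // icon appears only now
}

void ClearAlertIcon(HWND hwnd)
{
    NOTIFYICONDATA nid = { sizeof(nid) };
    nid.hWnd = hwnd;
    nid.uID = 1;
    Shell_NotifyIcon(NIM_DELETE, &nid); // problem resolved: icon goes away
}
```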
Next: Mitigations for services.
 As always, please remember my definition of crap: "Crap is defined as stuff I don't want running on my machine" - you may very well disagree with my opinions about the relative usefulness of the various applets I've listed above.
So how do you make an updater less horrible?
First off, as I suggested for all applets, consider not having one at all. For instance, Collectorz.Com's applications each check for updates periodically when they are started. That way you bury your update functionality with the application, and it alleviates the need to worry about external updaters.
If your application is itself a plugin (think Flash, QuickTime, Java, or a driver of any kind), then you don't have a convenient application on which to hang your updater. In that case, whatever you do, don't burn a process whose sole purpose is to check for updates once a month. Instead, use the task scheduler functionality built into Windows to schedule your updater. The task scheduler is a remarkably flexible mechanism for scheduling periodic operations. Even using the Task Scheduler 1.0 interfaces (which are available on Windows platforms going back to Windows ME), you can generate triggers that will cause tasks to be run daily, weekly, monthly, monthly on a specific day of the week, at logon, when the machine is idle, etc. For Vista, the list of trigger types is enhanced to include triggers on system events, groups of triggers, etc.
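A sketch of scheduling a monthly update check through those Task Scheduler 1.0 interfaces. The task name and updater path ("Contoso Update Check", "updater.exe") are hypothetical, and error handling is elided:

```cpp
#include <windows.h>
#include <mstask.h>

// Sketch (Task Scheduler 1.0): run a hypothetical updater on the 1st of
// every month instead of keeping a resident updater process around.
void ScheduleMonthlyUpdateCheck()
{
    CoInitialize(nullptr);

    ITaskScheduler *scheduler = nullptr;
    CoCreateInstance(CLSID_CTaskScheduler, nullptr, CLSCTX_INPROC_SERVER,
                     IID_ITaskScheduler, (void **)&scheduler);

    ITask *task = nullptr;
    scheduler->NewWorkItem(L"Contoso Update Check", CLSID_CTask,
                           IID_ITask, (IUnknown **)&task);
    task->SetApplicationName(L"C:\\Program Files\\Contoso\\updater.exe");

    WORD index = 0;
    ITaskTrigger *trigger = nullptr;
    task->CreateTrigger(&index, &trigger);

    TASK_TRIGGER tt = {};
    tt.cbTriggerSize = sizeof(tt);
    tt.wBeginYear = 2008; tt.wBeginMonth = 1; tt.wBeginDay = 1;
    tt.TriggerType = TASK_TIME_TRIGGER_MONTHLYDATE;
    tt.Type.MonthlyDate.rgfDays = 1;        // the 1st of the month
    tt.Type.MonthlyDate.rgfMonths = 0xFFF;  // all twelve months
    trigger->SetTrigger(&tt);

    // Persist the task to disk so the scheduler service owns it from now on.
    IPersistFile *file = nullptr;
    task->QueryInterface(IID_IPersistFile, (void **)&file);
    file->Save(nullptr, TRUE);
    // (Release interfaces and CoUninitialize in real code.)
}
```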
One of the cool things you can do with scheduled tasks is to specify the context in which the job runs - jobs can be scheduled to run in the context of the user at the console, in the system context, the context of an interactively logged on user, to run only if a specific user is logged on, etc.
Using the task scheduler means that you can get your updater to run without consuming any long-term resources.
Once you've decided that you need to update the application, you've got to download the update. For that, you really have two options: the first is that you assume the user is going to want the update and pre-download it; the second is that you download it after informing the user about the update. In either case, Windows has a nifty feature called "BITS" which allows you to download data from the web without interfering with the user - essentially, the BITS service is aware of the traffic generated by the interactive user and throttles its transfers if it detects that the user's using the network. It also supports progressive downloading, so it can handle the network dropping out mid-transfer. Windows Update's downloader is built on top of BITS, but I'm not aware of any 3rd party apps that use it (which is a shame, because it really is cool). BITS is available on at least Windows XP and later, so it's not "yet another Vista-only feature".
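Queuing a download with BITS is only a few COM calls. A minimal sketch; the URL and local path are hypothetical placeholders, and error handling is elided:

```cpp
#include <windows.h>
#include <bits.h>

// Sketch: queue a background download with BITS so the transfer yields to
// the interactive user's traffic and survives network drops.
void QueueUpdateDownload()
{
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IBackgroundCopyManager *mgr = nullptr;
    CoCreateInstance(__uuidof(BackgroundCopyManager), nullptr,
                     CLSCTX_LOCAL_SERVER, __uuidof(IBackgroundCopyManager),
                     (void **)&mgr);

    GUID jobId;
    IBackgroundCopyJob *job = nullptr;
    mgr->CreateJob(L"Contoso update", BG_JOB_TYPE_DOWNLOAD, &jobId, &job);

    // BITS transfers this in the background, throttling itself whenever the
    // interactive user is actively using the network.
    job->AddFile(L"http://updates.example.com/contoso/update.msi",
                 L"C:\\ProgramData\\Contoso\\update.msi");
    job->Resume();  // jobs are created suspended; Resume() starts the transfer

    // Poll GetState() or register an IBackgroundCopyCallback to learn when
    // the transfer finishes, then call job->Complete() to commit the file.
}
```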
Also, whatever you do, don't ever require elevation for your updater - I cannot imagine any scenario that would require that your updater run elevated - it just annoys the user who complains about unnecessary elevation prompts.
Next: Mitigations for notification area handlers.
The first and most important thing that a person considering writing an applet needs to do is to stop and consider if they really do need to write that applet.
The answer may very well be "yes", but far more often, the real answer is "no".
Once you've decided that you have no choice but to write the applet, you can still make sure that your applet doesn't interfere with the experience of your customer by following some relatively straightforward "best practices" (there may be others beyond these, but this list functions as a good start).
Anyway, that's a start to the list. There may be more that I've missed, so I may update it as I (or you) come up with others.
Tomorrow: How do I personally feel about craplets?
I've actually had way more fun than I realized writing this series - I honestly didn't think I had this much content on this subject. As I'm finishing up the series, I want to talk about how I personally feel about applets.
As I've mentioned before, I often end up sending mail to various email lists inside Microsoft railing about all the crap that <pick my favorite victim of the day> has running on my box (I did start these posts by saying "It's my machine, dagnabbit").
I always add a caveat to that comment: "Please remember that my definition of 'crap' is 'Stuff I don't use'. If I use it, by definition it's not 'crap', and I recognize that my 'crap' isn't always your 'crap'". What I find fascinating is the number of people who respond "Wow, I never thought of it that way!"
There are a lot of people who are rather upset about applets; given the lack of quality and respect for the user I've seen in some of them, I'm not totally surprised at this.
It's extraordinarily easy, when designing a feature or component, to get caught up in the idea that you MUST make your feature discoverable, and thus to forget one of the basic tenets of system programming: "Do exactly what you need to do to get the job done, as efficiently as possible." Developers forget that it's NOT their machine to do with as they will; it's their customers' machine, and their customers might not be as enamored of their feature as they are.
At the end of the day, I tend to be relatively agnostic about applets. I recognize that they have to exist, I like some of them (I've already mentioned that I like RSS Bandit's (not really an applet, to be honest) use of the notification area; similarly, I like taskmgr's use), I tolerate some of them (the Flash helper applet), and I'm utterly infuriated by still others (there's a printer driver that came with a printer I own that crashes the spooler service once a day, even though the printer in question is powered off and is used maybe once a month).
What gives me hope for the future is that it's NOT impossible to write applets that have relatively minor impact on the user - if you follow some basic rules, it's possible to have applets that don't tank the system. But the critical thing is that you MUST change your mindset about your applet. Instead of trying to figure out how to get your functionality in your users' faces, try to figure out how to make your user want to use your functionality. Make sure that you provide significant value to the customer before you start consuming their resources.
Every single applet that runs on a machine acts as a tax on the customer's machine - it consumes memory and CPU resources that could otherwise be used by the user. Eventually the taxes add up and the customer's computer becomes totally useless.
I firmly believe that the people who write software actually want to improve their customers' experience (yeah, I know that I'm hopelessly naive). The people who write the programs that our customers use are NOT trying to actively harm the user; they just don't know how to do the right thing (or are faced with schedule pressures that don't allow them to do the right thing).
My personal feeling is that people will realize which vendors have produced bad applications and will eventually start avoiding those vendors because of the quality of their product. And when you hit them in their pocketbook, it works - the vendors will either improve the quality of their product or they'll go out of business.
A while ago, I'd mentioned that Daniel was cast as Orin Scrivello in SCT's summer season production of "Little Shop of Horrors".
Friday August 3rd is his opening night! He'll be performing at 7PM on August 3rd, 1PM on August 4th, 7PM on the 8th, and 7PM on the 10th.
This one's going to be worth seeing - I've seen a bit of his Orin, and it reminds me of what Alan Cumming did with the character of MC in the 1998 Roundabout production of Cabaret. It ain't Steve Martin up on stage there.
I can't wait :)