Welcome to our blog dedicated to the engineering of Microsoft Windows 7
One of the features that you’ve been pretty clear about (I’ve received over 100 emails on this topic!) is the desire to improve the disk defrag utility in Windows 7. We did. And from blogs we saw a few of you noticed, which is great. This is not as straightforward as it may appear. We know there’s a lot of history in defrag and how “back in the day” it was a very significant performance issue and also a big mystery to most people. So many folks came to know that if your machine is slow you had to go through the top-secret defrag process. In Windows Vista we decided to just put the process on autopilot with the intent that you’d never have to worry about it. In practice this turns out to be true, at least to the limits of automatically running a process (that is, if you turn your machine off every night, it will never run). We received a lot of feedback from knowledgeable folks wanting more information on defrag status, especially during execution, as well as more flexibility in terms of the overall management of the process. This post will detail the changes we made based on that feedback. In reading the mail and comments we received, we also thought it would be valuable to go into a little bit more detail about the process, the perceptions and reality of performance gains, as well as the specific improvements. This post is by Rajeev Nagar and Matt Garson, both of whom are Program Managers on our File System feature team. --Steven
In this blog, we focus on disk defragmentation in Windows 7. Before we discuss the changes introduced in Windows 7, let’s chat a bit about what fragmentation is, and its applicability.
Within the storage and memory hierarchy comprising the hardware pipeline between the hard disk and CPU, hard disks are relatively slower and have relatively higher latency. Read/write times from and to a hard disk are measured in milliseconds (typically, 2-5 ms) – which sounds quite fast until compared to a 2GHz CPU that can compute data in less than 10 nanoseconds (on average), once the data is in the L1 memory cache of the processor.
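To put those numbers side by side, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above (a 2GHz CPU, roughly 10 nanoseconds per operation on cached data, and 2-5 ms per disk access); the function names are ours and purely illustrative.

```python
CPU_HZ = 2_000_000_000   # 2GHz CPU, as quoted in the text
CPU_OP_NS = 10           # ~10 ns to work on data already in the L1 cache
NS_PER_MS = 1_000_000


def cycles_during_ms(disk_ms, cpu_hz=CPU_HZ):
    """CPU clock cycles that tick by while one disk access completes."""
    return disk_ms * cpu_hz // 1000


def disk_to_cpu_ratio(disk_ms, cpu_op_ns=CPU_OP_NS):
    """How many ~10 ns CPU operations fit into one disk access."""
    return disk_ms * NS_PER_MS // cpu_op_ns


for disk_ms in (2, 5):
    print(
        f"{disk_ms} ms disk access: "
        f"{cycles_during_ms(disk_ms):,} CPU cycles, "
        f"{disk_to_cpu_ratio(disk_ms):,} cached operations"
    )
```

In other words, a single 2 ms disk access costs about as much time as 200,000 operations on cached data, which is why the CPU spends so much of its time waiting.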
This performance gap has only been increasing over the past 2 decades – the figures below are noteworthy.
In short, the figures illustrate that while disk capacities are increasing, their ability to transfer data or write new data is not increasing at an equivalent rate – so disks contain more data that takes longer to read or write. Consequently, fast CPUs are relatively idle, waiting for data to do work on.
Significant research in Computer Science has focused on improving overall system I/O performance, which has led to two principles that the operating system tries to follow:
Both rules have a reasonably simple rationale:
File systems such as NTFS work quite hard to try and satisfy the above rules. As an example, consider the case when I listen to the song “Hotel California” by the Eagles (one of my all-time favorite bands). When I first save the 5MB file to my NTFS volume, the file system will try to find enough contiguous free space to be able to place the 5MB of data “together” on the disk. It does this because logically related data (e.g. contents of the same file or directory) is more likely to be read or written around the same time. For example, I would typically play the entire song “Hotel California” and not just a portion of it. During the 3 minutes that the song is playing, the computer would be fetching portions of this “related content” (i.e. sub-portions of the file) from the disk until the entire file is consumed. By making sure the data is placed together, the system can issue read requests in larger chunks (often pre-reading data in anticipation that it will soon be used) which, in turn, will minimize mechanical movement of hard disk drive components and also ensure fewer issued I/Os.
Given that the file system tries to place data contiguously, when does fragmentation occur? Modifications to stored data (e.g. adding, changing, or deleting content) cause changes in the on-disk data layout and can result in fragmentation. For example, file deletion naturally causes space de-allocation and resultant “holes” in the allocated space map – a condition we will refer to as “fragmentation of available free space”. Over time, contiguous free space becomes harder to find leading to fragmentation of newly stored content. Obviously, deletion is not the only cause of fragmentation – as mentioned above, other file operations such as modifying content in place or appending data to an existing file can eventually lead to the same condition.
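The way deletions lead to fragmented new files can be seen in a toy simulation. The sketch below is not how NTFS allocates space; it is a deliberately simplified model in which the disk is a short list of blocks, and the allocator (our own hypothetical `allocate` function) prefers a single contiguous run of free blocks and only spills across holes when no run is large enough.

```python
def free_runs(disk):
    """Return (start, length) for each maximal run of free blocks."""
    runs, start = [], None
    for i, owner in enumerate(disk):
        if owner is None and start is None:
            start = i
        elif owner is not None and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(disk) - start))
    return runs


def allocate(disk, name, size):
    """Place `size` blocks for file `name`, preferring one contiguous run."""
    runs = free_runs(disk)
    if sum(length for _, length in runs) < size:
        raise ValueError("not enough free space")
    # First choice: the first hole big enough to hold the file in one piece.
    for start, length in runs:
        if length >= size:
            disk[start:start + size] = [name] * size
            return
    # No hole is big enough, so the file is split across several holes.
    remaining = size
    for start, length in runs:
        take = min(length, remaining)
        disk[start:start + take] = [name] * take
        remaining -= take
        if remaining == 0:
            return


def delete(disk, name):
    """Free every block owned by file `name`, leaving holes behind."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragment_count(disk, name):
    """Number of separate contiguous pieces that make up file `name`."""
    count, prev = 0, None
    for owner in disk:
        if owner == name and prev != name:
            count += 1
        prev = owner
    return count


def render(disk):
    """Show the disk as text, one character per block ('.' means free)."""
    return "".join(owner or "." for owner in disk)


disk = [None] * 12
for name in "ABCD":
    allocate(disk, name, 3)      # four files, each stored contiguously
print(render(disk))

delete(disk, "A")
delete(disk, "C")                # deletions leave two 3-block holes
print(render(disk))

allocate(disk, "E", 5)           # 6 blocks are free, but no hole holds 5
print(render(disk), "-> file E is in", fragment_count(disk, "E"), "pieces")
```

There is enough free space in total for the new file, but because that space is itself fragmented, the new file ends up in two pieces.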
So how does defragmentation help? In essence, defragmentation helps by moving data around so that it is once again placed more optimally on the hard disk, providing the following benefits:
The following diagram will help illustrate what we’re discussing. The first illustration represents an ideal state of a disk – there are 3 files, A, B, and C, and all are stored in contiguous locations; there is no fragmentation. The second illustration represents a fragmented disk – a portion of data associated with File A is now located in a non-contiguous location (due to growth of the file). The third illustration shows what the data on the disk would look like once the disk was defragmented.
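The three states described above can also be written out as text. In this rough sketch each character stands for one block on the disk and '.' is free space. The `defragment` function is a hypothetical simplification that just repacks every file contiguously in order of first appearance; a real defragmenter moves data incrementally and is far more selective about what it moves.

```python
from itertools import groupby

FREE = "."


def fragments(disk, name):
    """Count the separate contiguous pieces of file `name` in the layout."""
    return sum(1 for owner, _ in groupby(disk) if owner == name)


def defragment(disk):
    """Return a layout with each file contiguous and free space at the end."""
    order = []
    for owner in disk:
        if owner != FREE and owner not in order:
            order.append(owner)
    packed = "".join(name * disk.count(name) for name in order)
    return packed + FREE * (len(disk) - len(packed))


ideal = "AAABBBCCC..."         # files A, B and C, all contiguous
fragmented = "AAABBBCCCAA."    # file A grew; the new blocks landed after C
defragmented = defragment(fragmented)

for label, layout in [
    ("ideal", ideal),
    ("fragmented", fragmented),
    ("defragmented", defragmented),
]:
    print(f"{label:>12}: {layout}  (file A in {fragments(layout, 'A')} piece(s))")
```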
Nearly all modern file systems support defragmentation – the differences generally are in the defragmentation mechanism, whether, as in Windows, it’s a separate, schedulable task or, whether the mechanism is more implicitly managed and internal to the file system. The design decisions simply reflect the particular design goals of the system and the necessary tradeoffs. Furthermore, it’s unlikely that a general-purpose file system could be designed such that fragmentation never occurred.
Over the years, defragmentation has been given a lot of emphasis because, historically, fragmentation was a problem that could have more significant impact. In the early days of personal computing, when disk capacities were measured in megabytes, disks got full faster and fragmentation occurred more often. Further, memory caches were significantly limited and system responsiveness was increasingly predicated on disk I/O performance. This got to a point that some users ran their defrag tool weekly or even more often! Today, very large disk drives are available cheaply and % disk utilization for the average consumer is likely to be lower, causing relatively less fragmentation. Further, computers can utilize more RAM cheaply (often, enough to be able to cache the data set actively in use). That, together with improvements in file system allocation strategies as well as caching and pre-fetching algorithms, further helps improve overall responsiveness. Therefore, while the performance gap between the CPU and disks continues to grow and fragmentation does occur, combined hardware and software advances in other areas allow Windows to mitigate fragmentation impact and deliver better responsiveness.
So, how would we evaluate fragmentation given today’s software and hardware? A first question might be: how often does fragmentation actually occur and to what extent? After all, 500GB of data with 1% fragmentation is significantly different than 500GB with 50% fragmentation. Secondly, what is the actual performance penalty of fragmentation, given today’s hardware and software? Quite a few of you likely remember various products introduced over the past two decades offering various performance enhancements (e.g. RAM defragmentation, disk compression, etc.), many of which have since become obsolete due to hardware and software advances.
The incidence and extent of fragmentation in average home computers varies quite a bit depending on available disk capacity, disk consumption, and usage patterns. In other words, there is no general answer. The actual performance impact of fragmentation is the more interesting question but even more complex to accurately quantify. A meaningful evaluation of the performance penalty of fragmentation would require the following:
Let’s walk through an example that helps illustrate the complexity in directly correlating extent of fragmentation with user-visible performance.
In Windows XP, any file that is split into more than one piece is considered fragmented. Not so in Windows Vista if the fragments are large enough – the defragmentation algorithm was changed (from Windows XP) to ignore pieces of a file that are larger than 64MB. As a result, defrag in XP and defrag in Vista will report different amounts of fragmentation on a volume. So, which one is correct? Well, before the question can be answered we must understand why defrag in Vista was changed. In Vista, we analyzed the impact of defragmentation and determined that the most significant performance gains from defrag are when pieces of files are combined into sufficiently large chunks such that the impact of disk-seek latency is not significant relative to the latency associated with sequentially reading the file. This means that there is a point after which combining fragmented pieces of files has no discernible benefit. In fact, there are actually negative consequences of doing so. For example, for defrag to combine fragments that are 64MB or larger requires significant amounts of disk I/O, which is against the principle of minimizing I/O that we discussed earlier (since it decreases total available disk bandwidth for user initiated I/O), and puts more pressure on the system to find large, contiguous blocks of free space. Here is a scenario where a certain amount of fragmentation of data is just fine – doing nothing to decrease this fragmentation turns out to be the right answer!
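A short sketch may help show why the two rules report different numbers and why large pieces are not worth combining. Several things here are assumptions made for illustration only: the file names and piece sizes are invented, the 5 ms seek time is simply the upper figure quoted earlier in this post, the 64 MB/s sequential throughput is a hypothetical round number, and `fragmented_vista` is just one reading of "ignore pieces of 64MB or larger", not the actual algorithm.

```python
MB = 1024 * 1024
LARGE_PIECE = 64 * MB   # threshold described in the text


def fragmented_xp(pieces):
    """XP-style rule: any file in more than one piece is fragmented."""
    return len(pieces) > 1


def fragmented_vista(pieces, threshold=LARGE_PIECE):
    """Assumed Vista-style rule: pieces at or above the threshold are ignored,
    so a multi-piece file only counts if it still has a small piece."""
    has_small_piece = any(size < threshold for size in pieces)
    return len(pieces) > 1 and has_small_piece


def seek_overhead(piece_bytes, seek_ms=5.0, throughput_mb_s=64.0):
    """Fraction of the time spent seeking when reading one piece.
    seek_ms and throughput_mb_s are illustrative assumptions."""
    read_ms = piece_bytes / (throughput_mb_s * MB) * 1000
    return seek_ms / (seek_ms + read_ms)


# Hypothetical files, each listed as the sizes of its on-disk pieces.
files = {
    "song.mp3": [3 * MB, 2 * MB],
    "video.avi": [400 * MB, 300 * MB],
    "notes.txt": [1 * MB],
}

xp_count = sum(fragmented_xp(pieces) for pieces in files.values())
vista_count = sum(fragmented_vista(pieces) for pieces in files.values())
print(f"fragmented files by the XP rule:    {xp_count}")
print(f"fragmented files by the Vista rule: {vista_count}")

for size, label in [(64 * 1024, "64KB"), (64 * MB, "64MB")]:
    print(f"seek overhead for a {label} piece: {seek_overhead(size):.1%}")
```

Under these assumptions the seek is most of the cost of reading a 64KB piece, but well under 1% of the cost of reading a 64MB piece, so joining two such large pieces would buy almost nothing while costing a lot of I/O.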
Note that a concept that is relatively simple to understand, such as the amount of fragmentation and its impact, is in reality much more complex, and its real impact requires comprehensive evaluation of the entire system to accurately address. The different design decisions across Windows XP and Vista reflect this evaluation of the typical hardware & software environment used by customers. Ultimately, when thinking about defragmentation, it is important to realize that there are many additional factors contributing towards system responsiveness that must be considered beyond a simple count of existing fragments.
The defragmentation engine and experience in Windows 7 have been revamped based on continuous and holistic analysis of impact on system responsiveness:
In Windows Vista, we had removed all of the UI that would provide detailed defragmentation status. We received feedback that you didn’t like this decision, so we listened, evaluated the various tradeoffs, and have built a new GUI for defrag! As a result, in Windows 7, you can monitor status more easily and intuitively. Further, defragmentation can be safely terminated any time during the process and on all volumes very simply (if required). The two screenshots below illustrate the ease-of-monitoring:
In Windows XP, defragmentation had to be a user-initiated (manual) activity i.e. it could not be scheduled. Windows Vista added the capability to schedule defragmentation – however, only one volume could be defragmented at any given time. Windows 7 removes this restriction – multiple volumes can now be defragmented in parallel with no more waiting for one volume to be defragmented before initiating the same operation on some other volume! The screen shot below shows how defragmentation can be concurrently scheduled on multiple volumes:
Among the other changes under the hood in Windows 7 are the following:
Best practices for using defragmentation in Windows 7 are simple – you do not need to do anything! Defragmentation is scheduled to automatically run periodically and in the background with minimal impact to foreground activity. This ensures that data on your hard disk drives is efficiently placed so the system can provide optimal responsiveness and I can continue to enjoy glitch free listening to the Eagles :-).
Rajeev and Matt
Great read about defrag there, very interesting to have a little elaboration on some of the technical details. I think what ari9910 says is a good idea though. Obviously you've put a lot of work in to ensure that defrag can be easily turned off even mid-process without any damage to the system. Given the work that has been done on that wouldn't it make sense to make more thorough use of it? I personally have the power options set to never spin down the drives, never go into sleep and never hibernate. It switches the monitors off after 15 minutes if the computer hasn't been in use. I noticed that the indexing process automatically kicked in when the monitors were switched off and that was great. An example of the system being intelligent and carrying out processes associated with improving my experience while I'm not using it. It would make sense to be able to set up a rule for running defrag in a similar way. To be able to set it to automatically run defrag when the computer switches the monitors off contingent on defrag not having been run for a certain length of time or perhaps contingent on disk fragmentation reaching a particular level (or some combination of the two) would be a really smart feature. If this option was present it would also make sense to have the system 'remember' that it had started a defrag process which ought to be finished in case the user 'interrupts' it. $0.02
The question is: if my company uses a server to store data - and therefore data is NOT stored on individual machines - is defragging even necessary on the individual machines? Or would it just be needed on the server? Or would the machines still need defragging but way less often?
I set virus scanning and defragmenting to occur every night at 2am. Usually I turn the computer off at night. Once a week I just leave it on at night and the maintenance happens automagically.
I am using Windows XP. I have a Disk defragmentation and run it once a week. There is white for space, green for unmovable files, blue for contiguous files and the red is for fragmented files. Now every time I ran the disk defragmentation to analyze it would say you do not have to defragment this file up until last night. So, I did the defragment but I have about 80% blue contiguous file and about 30% unmovable. How do I delete these files to free up more space on my computer?
After scheduling the defragmentation, will the disk defragmentation take place at the scheduled time without analysing the fragmented percentage? Will it take place no matter what the fragmented percentage is?
Two things -
1. Where in the registry are the settings that control the default automatic defrag? Specifically, that it is enabled and the day of the week and the time when it runs?
2. Any Eagles fan knows that "Hotel California" is 6 1/2 minutes long, not 3, unless you only listen to some Top 40 station that probably never plays the long version of Light My Fire either.
thank you for your text, and since it is helpful for me, the only way to re-pay you, is from my side of expertise: will you search YouTube with "Beethoven Pathetique Sonata - 2nd Movement". Beside one blonde, you can hear the genuine source of the Hotel California you quoted... The cats would buy Whiskas - but what are buying we?
Pavel Kozák, CZ, EU
It would sure be nice if the desktop had a status window which would show each and every program running in the background. Like, here's the programs loaded into RAM. These are running right now and these are not. And make it so no program can avoid being shown. So nothing can hide from the administrator or machine owner.
Why can't it just defrag in the background, using a minimal amount of process power, every time something is deleted? Why can't Windows use its fancy index to copy and move files into fragmented space? The guys at Apple figured this out a long time ago, why does Microsoft resist doing the same?
my problem is my program is not running at all!! i've tried to do everything but nothing!
All versions of Microsoft Windows include a tool for disk defragmentation. The Windows Disk Defragmenter tool is a limited version of the Diskeeper program from Diskeeper Corporation. Disk Defragmenter does not include all the features available in the full version of Diskeeper.
Microsoft Diskeeper partnership: Microsoft Partner Relationship: support.microsoft.com/.../en-us
Could you please tell me what 0% consolidated means during defrag
firstly, thank you for a well thought out explanation of defragmentation, on different operating systems.
I am a gamer, who also uses a games editor, video editor & generally programs which need a lot of system resources.
so I stop a lot of things running in the background, to achieve this.
the schedule time is no good to me, as I do not use the pc at set times. & turn it off, when not in use.
doing backups has the same problems.
so I turn them off & do manually, as I am about to leave the pc, for a while.
but I can only do this one task at a time.
suggestion: it would be good to add a utility to queue as many different programs, to run, in the order of our choice.
then lock the pc, go out for the day, & come home to all the maintenance tasks completed.
with a report of tasks completed.
when putting the tasks into the utility, a time to completion, would also be good.
so if I will be gone for 1hr 45 mins, I could pick a task or tasks to run in that timeframe.
one of the problems with backups, is files moving while it is happening.
so continuous backups become a problem, if I forgot the defrag is about to happen.
would be good to have a "save as", backup. (when copying large files)
as files windows are happy being on my pc, it stops backing them up, because some of the files, share the same name.
well, after coming back to my pc, hours later, to find it was stopped seconds after I left it, is not good.
I miss the graphical defrag of windows XP.
sorry for getting off topic, a bit. but all maintenance tasks should all be connected, regardless of vendor.
eg. my security software updates, scans, followed by defrag, etc. with the backup last.
so it does a backup of a very healthy pc.
Never say never.
SSD drives can usefully be defragged for empty space consolidation in order to: Create an image to transfer to a smaller drive/partition, truncating it - we want to make sure only empty space is truncated.
Also, why don't Microsoft listen to their customers, we WANT to see progress graphically, it is compelling, it is soothing, it is re-assuring.