Engineering Windows 7

Welcome to our blog dedicated to the engineering of Microsoft Windows 7

Disk Defragmentation – Background and Engineering the Windows 7 Improvements


One of the features that you’ve been pretty clear about (I’ve received over 100 emails on this topic!) is the desire to improve the disk defrag utility in Windows 7. We did. And from blogs we saw a few of you noticed, which is great. This is not as straightforward as it may appear. We know there’s a lot of history in defrag and how “back in the day” it was a very significant performance issue and also a big mystery to most people. So many folks came to know that if your machine is slow you had to go through the top-secret defrag process. In Windows Vista we decided to just put the process on autopilot with the intent that you’d never have to worry about it. In practice this turns out to be true, at least to the limits of automatically running a process (that is, if you turn your machine off every night then it will never run). We received a lot of feedback from knowledgeable folks wanting more information on defrag status, especially during execution, as well as more flexibility in terms of the overall management of the process. This post will detail the changes we made based on that feedback. In reading the mail and comments we received, we also thought it would be valuable to go into a little bit more detail about the process, the perceptions and reality of performance gains, as well as the specific improvements. This post is by Rajeev Nagar and Matt Garson, both Program Managers on our File System feature team. --Steven

In this blog, we focus on disk defragmentation in Windows 7. Before we discuss the changes introduced in Windows 7, let’s chat a bit about what fragmentation is, and its applicability.

Within the storage and memory hierarchy comprising the hardware pipeline between the hard disk and CPU, hard disks are relatively slower and have relatively higher latency. Read/write times from and to a hard disk are measured in milliseconds (typically, 2-5 ms) – which sounds quite fast until compared to a 2GHz CPU that can compute data in less than 10 nanoseconds (on average), once the data is in the L1 memory cache of the processor.
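To put that gap in perspective, here is a small back-of-the-envelope calculation. It uses only the figures quoted above (2-5 ms per disk access, 10 ns per CPU operation on cached data); the function name is just for illustration.

```python
# Figures taken from the paragraph above; they are rough, typical values.
DISK_ACCESS_SECONDS = (2e-3, 5e-3)   # typical disk read/write: 2-5 ms
CPU_OP_SECONDS = 10e-9               # upper bound quoted for work on L1-cached data


def cpu_ops_per_disk_access(disk_seconds: float, cpu_seconds: float = CPU_OP_SECONDS) -> int:
    """Number of cached CPU operations that fit into a single disk access."""
    return round(disk_seconds / cpu_seconds)


low, high = (cpu_ops_per_disk_access(t) for t in DISK_ACCESS_SECONDS)
print(f"One disk access costs roughly {low:,} to {high:,} cached CPU operations")
```

In other words, with these numbers the CPU could do hundreds of thousands of operations in the time it waits for one disk access.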

This performance gap has only been increasing over the past 2 decades – the figures below are noteworthy.

Graph of Historical Trends of CPU and IOPS Performance

Chart of Performance Improvements of Various Technologies

In short, the figures illustrate that while disk capacities are increasing, their ability to transfer data or write new data is not increasing at an equivalent rate – so disks contain more data that takes longer to read or write. Consequently, fast CPUs are relatively idle, waiting for data to do work on.

Significant research in Computer Science has focused on improving overall system I/O performance, which has led to two principles that the operating system tries to follow:

  1. Perform less I/O, i.e. try and minimize the number of times a disk read or write request is issued.
  2. When I/O is issued, transfer data in relatively large chunks, i.e. read or write in bulk.

Both rules have a reasonably simple rationale:

  1. Each time an I/O is issued by the CPU, multiple software and hardware components have to do work to satisfy the request. This contributes toward increased latency, i.e., the amount of time until the request is satisfied. This latency is often directly experienced by users when reading data and leads to increased user frustration if expectations are not met.
  2. Movement of mechanical parts contributes substantially to incurred latency. For hard disks, the “rotational time” (time taken for the disk platter to rotate in order to get the right portion of the disk positioned under the disk head) and the “seek time” (time taken by the head to move so that it is positioned to be able to read/write the targeted track) are the two major culprits. By reading or writing in large chunks, the incurred costs are amortized over the larger amount of data that is transferred – in other words, the “per unit” data transfer costs decrease.
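The amortization argument in the second point can be sketched with a simple model. The seek time, rotational delay, and transfer rate below are assumed round numbers chosen for illustration, not measurements of any particular drive, and the function name is hypothetical.

```python
def effective_throughput_mb_s(chunk_mb: float,
                              seek_ms: float = 9.0,         # assumed average seek time
                              rotation_ms: float = 4.0,     # assumed average rotational delay
                              transfer_mb_s: float = 100.0  # assumed sequential transfer rate
                              ) -> float:
    """Throughput of one request once the fixed positioning cost is included."""
    positioning_s = (seek_ms + rotation_ms) / 1000.0
    transfer_s = chunk_mb / transfer_mb_s
    return chunk_mb / (positioning_s + transfer_s)


# The larger the chunk, the more data shares the same fixed mechanical cost.
for chunk_mb in (0.004, 0.064, 1, 64):
    print(f"{chunk_mb:>6} MB per request: {effective_throughput_mb_s(chunk_mb):6.2f} MB/s")
```

Under these assumptions, small requests are dominated by head movement and rotation, while large requests approach the raw sequential transfer rate.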

File systems such as NTFS work quite hard to try and satisfy the above rules. As an example, consider the case when I listen to the song “Hotel California” by the Eagles (one of my all time favorite bands). When I first save the 5MB file to my NTFS volume, the file system will try and find enough contiguous free space to be able to place the 5MB of data “together” on the disk, since logically related data (e.g. contents of the same file or directory) is more likely to be read or written around the same time. For example, I would typically play the entire song “Hotel California” and not just a portion of it. During the 3 minutes that the song is playing, the computer would be fetching portions of this “related content” (i.e. sub-portions of the file) from the disk until the entire file is consumed. By making sure the data is placed together, the system can issue read requests in larger chunks (often pre-reading data in anticipation that it will soon be used) which, in turn, will minimize mechanical movement of hard disk drive components and also ensure fewer issued I/Os.

Given that the file system tries to place data contiguously, when does fragmentation occur? Modifications to stored data (e.g. adding, changing, or deleting content) cause changes in the on-disk data layout and can result in fragmentation. For example, file deletion naturally causes space de-allocation and resultant “holes” in the allocated space map – a condition we will refer to as “fragmentation of available free space”. Over time, contiguous free space becomes harder to find leading to fragmentation of newly stored content. Obviously, deletion is not the only cause of fragmentation – as mentioned above, other file operations such as modifying content in place or appending data to an existing file can eventually lead to the same condition.
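A toy model makes the “holes” easier to see. The sketch below treats the disk as a short list of blocks and uses a deliberately simple allocator (contiguous if possible, otherwise whatever free blocks it finds first). This is an assumption made for illustration and is not how NTFS allocates space.

```python
FREE = "."


def find_contiguous(disk, size):
    """Return the start index of the first free run of `size` blocks, or None."""
    run = 0
    for i, block in enumerate(disk):
        run = run + 1 if block == FREE else 0
        if run == size:
            return i - size + 1
    return None


def allocate(disk, name, size):
    """Place a file contiguously if possible, otherwise scatter it over free blocks."""
    start = find_contiguous(disk, size)
    if start is not None:
        targets = list(range(start, start + size))
    else:
        targets = [i for i, block in enumerate(disk) if block == FREE][:size]
        if len(targets) < size:
            raise ValueError("not enough free space")
    for i in targets:
        disk[i] = name


def delete(disk, name):
    """Free every block that belongs to the file."""
    for i, block in enumerate(disk):
        if block == name:
            disk[i] = FREE


def fragments(disk, name):
    """Count the separate contiguous pieces the file occupies."""
    count, previous = 0, None
    for block in disk:
        if block == name and previous != name:
            count += 1
        previous = block
    return count


disk = [FREE] * 12
for name in "ABC":
    allocate(disk, name, 4)      # three contiguous files fill the disk
delete(disk, "A")
delete(disk, "C")                # 8 blocks are free, but in two separate holes
allocate(disk, "D", 6)           # no hole is big enough, so D is split
print("".join(disk), "-> file D is in", fragments(disk, "D"), "pieces")
```

Even though there was more than enough free space for file D, no single hole was large enough, so the new file ends up fragmented.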

So how does defragmentation help? In essence, defragmentation helps by moving data around so that it is once again placed more optimally on the hard disk, providing the following benefits:

  1. Any logically related content that was fragmented can be placed adjacently
  2. Free space can be coalesced so that new content can be written to the disk efficiently

The following diagram will help illustrate what we’re discussing. The first illustration represents an ideal state of a disk – there are 3 files, A, B, and C, and all are stored in contiguous locations; there is no fragmentation. The second illustration represents a fragmented disk – a portion of data associated with File A is now located in a non-contiguous location (due to growth of the file). The third illustration shows what data on the disk would look like once the disk was defragmented.

Example of disk blocks being defragmented.
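The three states in the diagram can be mimicked with a few lines of code. This is only a conceptual sketch: it rebuilds the whole layout in one step, whereas a real defragmenter moves data incrementally on a live volume and does not necessarily pack everything. The function name and the block layout are made up for the example.

```python
FREE = "."


def defragment(disk):
    """Return a layout where each file is contiguous and free space is coalesced at the end."""
    order = []  # files in order of first appearance on the disk
    for block in disk:
        if block != FREE and block not in order:
            order.append(block)
    packed = [name for name in order for _ in range(disk.count(name))]
    return packed + [FREE] * (len(disk) - len(packed))


# File A has grown, and its extra blocks landed after file C.
fragmented = list("AAABBBCCC.AA..")
print("before:", "".join(fragmented))
print("after: ", "".join(defragment(fragmented)))
```

Both benefits listed earlier show up here: file A is back in one piece, and the free blocks form a single run.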

Nearly all modern file systems support defragmentation – the differences generally are in the defragmentation mechanism, whether, as in Windows, it’s a separate, schedulable task or, whether the mechanism is more implicitly managed and internal to the file system. The design decisions simply reflect the particular design goals of the system and the necessary tradeoffs. Furthermore, it’s unlikely that a general-purpose file system could be designed such that fragmentation never occurred.

Over the years, defragmentation has been given a lot of emphasis because, historically, fragmentation was a problem that could have more significant impact. In the early days of personal computing, when disk capacities were measured in megabytes, disks got full faster and fragmentation occurred more often. Further, memory caches were significantly limited and system responsiveness was increasingly predicated on disk I/O performance. This got to a point that some users ran their defrag tool weekly or even more often! Today, very large disk drives are available cheaply and the percentage of disk utilization for the average consumer is likely to be lower, causing relatively less fragmentation. Further, computers can utilize more RAM cheaply (often, enough to be able to cache the data set actively in use). That, together with improvements in file system allocation strategies as well as caching and pre-fetching algorithms, further helps improve overall responsiveness. Therefore, while the performance gap between the CPU and disks continues to grow and fragmentation does occur, combined hardware and software advances in other areas allow Windows to mitigate fragmentation impact and deliver better responsiveness.

So, how would we evaluate fragmentation given today’s software and hardware? A first question might be: how often does fragmentation actually occur and to what extent? After all, 500GB of data with 1% fragmentation is significantly different than 500GB with 50% fragmentation. Secondly, what is the actual performance penalty of fragmentation, given today’s hardware and software? Quite a few of you likely remember various products introduced over the past two decades offering various performance enhancements (e.g. RAM defragmentation, disk compression, etc.), many of which have since become obsolete due to hardware and software advances.

The incidence and extent of fragmentation in average home computers varies quite a bit depending on available disk capacity, disk consumption, and usage patterns. In other words, there is no general answer. The actual performance impact of fragmentation is the more interesting question but even more complex to accurately quantify. A meaningful evaluation of the performance penalty of fragmentation would require the following:

  • Availability of a system that has been “aged” to create fragmentation in a typical or representative manner. But, as noted above, there is no single, representative behavior. For example, the frequency and extent of fragmentation on a computer used primarily for web browsing will be different than a computer used as a file server.
  • Selection of meaningful disk-bound metrics e.g. boot and first-time application launch post boot.
  • Repeated measurements that can be statistically relevant

Let’s walk through an example that helps illustrate the complexity in directly correlating extent of fragmentation with user-visible performance.

In Windows XP, any file that is split into more than one piece is considered fragmented. Not so in Windows Vista if the fragments are large enough – the defragmentation algorithm was changed (from Windows XP) to ignore pieces of a file that are larger than 64MB. As a result, defrag in XP and defrag in Vista will report different amounts of fragmentation on a volume. So, which one is correct? Well, before the question can be answered we must understand why defrag in Vista was changed. In Vista, we analyzed the impact of defragmentation and determined that the most significant performance gains from defrag are when pieces of files are combined into sufficiently large chunks such that the impact of disk-seek latency is not significant relative to the latency associated with sequentially reading the file. This means that there is a point after which combining fragmented pieces of files has no discernible benefit. In fact, there are actually negative consequences of doing so. For example, for defrag to combine fragments that are 64MB or larger requires significant amounts of disk I/O, which is against the principle of minimizing I/O that we discussed earlier (since it decreases total available disk bandwidth for user initiated I/O), and puts more pressure on the system to find large, contiguous blocks of free space. Here is a scenario where a certain amount of fragmentation of data is just fine – doing nothing to decrease this fragmentation turns out to be the right answer!
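The reasoning behind the 64MB cutoff can be illustrated with a rough estimate of how much of the time spent reading one fragment goes to positioning the head rather than transferring data. The positioning time and transfer rate are assumed round numbers, not the figures used in the actual Vista analysis, and the function name is hypothetical.

```python
def seek_overhead_fraction(fragment_mb: float,
                           positioning_ms: float = 13.0,   # assumed seek + rotational delay
                           transfer_mb_s: float = 100.0    # assumed sequential transfer rate
                           ) -> float:
    """Fraction of the time to read one fragment that is spent positioning the head."""
    positioning_s = positioning_ms / 1000.0
    read_s = fragment_mb / transfer_mb_s
    return positioning_s / (positioning_s + read_s)


for size_mb in (0.064, 1, 8, 64):
    share = seek_overhead_fraction(size_mb)
    print(f"{size_mb:>6} MB fragment: {share:.1%} of the read time is positioning")
```

With these assumptions the positioning cost dominates for small fragments and shrinks to a couple of percent at 64MB, which is the sense in which joining such large pieces brings no discernible benefit.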

Note that a concept that is relatively simple to understand, such as the amount of fragmentation and its impact, is in reality much more complex, and its real impact requires comprehensive evaluation of the entire system to accurately address. The different design decisions across Windows XP and Vista reflect this evaluation of the typical hardware & software environment used by customers. Ultimately, when thinking about defragmentation, it is important to realize that there are many additional factors contributing towards system responsiveness that must be considered beyond a simple count of existing fragments.

The defragmentation engine and experience in Windows 7 have been revamped based on continuous and holistic analysis of impact on system responsiveness:

In Windows Vista, we had removed all of the UI that would provide detailed defragmentation status. We received feedback that you didn’t like this decision, so we listened, evaluated the various tradeoffs, and have built a new GUI for defrag! As a result, in Windows 7, you can monitor status more easily and intuitively. Further, defragmentation can be safely terminated any time during the process and on all volumes very simply (if required). The two screenshots below illustrate the ease-of-monitoring:

New Windows 7 Defrag User Interface

New Windows 7 Defrag User Interface

 

In Windows XP, defragmentation had to be a user-initiated (manual) activity i.e. it could not be scheduled. Windows Vista added the capability to schedule defragmentation – however, only one volume could be defragmented at any given time. Windows 7 removes this restriction – multiple volumes can now be defragmented in parallel with no more waiting for one volume to be defragmented before initiating the same operation on some other volume! The screen shot below shows how defragmentation can be concurrently scheduled on multiple volumes:

Windows 7 Defrag Schedule

Windows 7 Defrag Disk Selection

Among the other changes under the hood in Windows 7 are the following:

  • Defragmentation in Windows 7 is more comprehensive – many files that could not be re-located in Windows Vista or earlier versions can now be optimally re-placed. In particular, a lot of work was done to make various NTFS metadata files movable. This ability to relocate NTFS metadata files also benefits volume shrink, since it enables the system to pack all files and file system metadata more closely and free up space “at the end” which can be reclaimed if required.
  • If solid-state media is detected, Windows disables defragmentation on that disk. The physical nature of solid-state media is such that defragmentation is not needed and in fact, could decrease overall media lifetime in certain cases.
  • By default, defragmentation is disabled on Windows Server 2008 R2 (the Windows 7 server release). Given the variability of server workloads, defragmentation should be enabled and scheduled only by an administrator who understands those workloads.

Best practices for using defragmentation in Windows 7 are simple – you do not need to do anything! Defragmentation is scheduled to automatically run periodically and in the background with minimal impact to foreground activity. This ensures that data on your hard disk drives is efficiently placed so the system can provide optimal responsiveness and I can continue to enjoy glitch free listening to the Eagles :-).

Rajeev and Matt

Leave a Comment
  • I read somewhere that the defragmenter in Vista can't defragment files over 64MB and doesn't include these files in the list of top fragmented files when you run the analysis from the DOS prompt. Is this true and does this limitation still exist in the Windows 7 defragmenter?

  • Not everything gets defragged.  Some system files are immovable.  See:

    http://support.microsoft.com/kb/227350

    (Files Excluded by the Disk Defragmenter Tool)

    see also:

    http://support.microsoft.com/kb/961095

    http://support.microsoft.com/kb/174619

  • @obsidience:

    Your "zones" proposal does sound like a good idea, but I think it is probably bad both in theory and in practice.

    In theory, it falls foul of a few principles:

    1. Code optimization is an empirical science. You have not measured the cost of not using the zone system. Of course, this is overcome with a fair bit of work. But as far as we know, your proposal may not deliver much in the way of savings, but will create a number of complications.

    2. If you don't want a radical redesign of either the filesystem or the Windows API, you will need to have the defragmenter make assumptions about the semantic content of the files. This is all kinds of bad. First of all, it will probably be wrong. Even if you get it right, it will become wrong with changes to the API. This sort of thing was the number 1 cause of bugs in earlier Windows OSes and their applications. See Raymond Chen's 'The Old New Thing' blog for various rants. It used to be a little more acceptable, because without those optimizations Windows would have run so slowly as to be unusable, but today it is Really Wrong.

    3. Radically changing the API to optimize a single application is probably not a great idea. It passes on the (economic, not clock) cost to other developers.

    4. Radically changing the filesystem to optimize a single application is also probably not a great idea. You are liable to break third party tools, windows tools, etc. that will corrupt the filesystem if they are allowed to run.

    In practice:

    Much of the stuff that belongs in the different zones resides in a single file. You usually store resources like icons within your executable file.

    1. If you do decide to change the API, so that applications have to store their icons separately from their executable, and that sort of thing, many existing applications will not do this. You cannot just break all these applications, or you will get "Windows 8 is broken" from users, so you will need to keep the old 'deprecated' functions. There is absolutely no benefit to the developer of using the 'new and improved' interface, since from their perspective there is no improvement, so they will continue to use the deprecated functions. Even if it does start getting used, it will take 5+ years for the majority of applications to start using it. By that time, NTFS may be replaced, hard drives will have mostly become SSD, etc.

    2. If you change the filesystem, say to add predefined tags, so that the filesystem knows which zone a file belongs in, the same argument holds. Also, you are liable to make the filesystem unreadable by earlier versions of Windows, third party tools, etc. You will also need to ensure that the OS is backwards compatible with the old filesystem: Samba will take a while to incorporate this as best it can, but if you don't do this you will break any number of file servers. People will not say "Oh. Samba is broken. It doesn't follow the new NTFS filesystem properly." They will say "Windows 8 is fscked. Don't let it near a corporate network."

    3. If you avoid making these two changes, apart from the theoretical problem, you still need to face the problem of embedded resources. Should you split the file into two zones? This will slow down copying and writing the file. Or should you cache a copy of the resource in the correct zone. Caching is a useful tool, but apart from the space cost, it runs the danger of becoming dirty, eg. when your application crashes. Usually you can avoid this by cleaning the cache when your application starts. We are talking about an operating system, though. "When the application starts" means when the OS boots up. The cache would then actually serve to slow down the boot sequence. If you don't do this, you *will* get a dirty cache when you have a power failure, etc. But let's put that aside. The system will need to look in your home directory to see what's on the desktop and what's in your start menu. It will then need to see what resources the applications have, presumably by reading them. Then it will go to the cache to load the resource, rather than just getting it from the file.

    I cannot see any obvious benefits here. What you can do, and this is what many other OSes do by default, is put the system and applications in one partition and the data in another. You can do this yourself. Just mount a data partition to the home directory. Obviously it's not a total solution because you probably have a bunch of applications sitting in your desktop or in your "Documents" folder within your home directory, but it may go some way towards your "zone" system.

  • One thing I was expecting from XP and forward is that I could get a better disk map, extended, showing the files or group of files on each "cluster" or "square" at that zoom level. That tells me where each file is.

    Instead of that, and even when they say they've listened -"In Windows Vista, we had removed all of the UI that would provide detailed defragmentation status. We received feedback that you didn’t like this decision, so we listened, evaluated the various tradeoffs, and have built a new GUI for defrag!"

    Say, a new one, with no info! (even not XP disk map)

    I remember (where you took your defrag from) Norton, getting more and more intelligent and user configurable.

    Automatic and user-defined features it supported.

    Placing files according to filter conditions at the beginning/end/middle of the drive.

    This meant I could move the fixed size pagefile to the end, the boot files to the beginning (at that time there were no software updates), and the rest in the middle of the disk.

    A complete disk map

    A set of other feature I don't recall right now. But that was the Norton Utilities, before it was Symantec.

  • SSDs have internal "wear-balancing" software that will intend to use ALL the cells in the drive the same amount of times. That means, you modify an existing file, and it will save the new blocks into a different place of a chip or in a different chip intentionally. No matter what the file system thinks where the file is really located.

    So, even if you defrag them, or use a tool to image the drive and restore it (some of them restore the files in a contiguous fashion) it will fragment big time internally.

  • RE.

    "Nice summary of graphs and charts in the beginning, but I feel like big "feature" is missing here.

    Perhaps it is not directly related to disk fragmentation (though I thought I heard it was back in early Vista days), but what about "aligning" of software for faster loads.

    Great example is during system boot - the OS knows it will need a lot of drivers, registry, executable files (dlls, etc). Aligning all those for one or very few contiguous load would significantly improve startup.

    Granted this is something that can be "prepared" during the initial OS setup, but overtime as you add hardware, update kernel pieces (security, anyone?), what will re-align these pieces for fast load?

    Plus, what about Icons and Background graphics and other such "nonsense" - this is all part of "creating" the user desktop, and the faster it happens the better the experience!

    Tuesday, February 03, 2009 1:14 PM by adir1

    TRY :

    Start a command prompt: right-click it and select "Run as administrator"

    Type defrag c: /b

    Command prompt shows :

    C:\Windows\system32>defrag c: /b

    Microsoft Disk Defragmenter

    Copyright (c) 2007 Microsoft Corp.

    Invoking boot optimization on (C:)...

    Pre-Defragmentation Report:

           Volume Information:

                   Volume size                 = 1,81 TB

                   Free space                  = 1,58 TB

                   Total fragmented space      = 0%

                   Largest free space size     = 920,24 GB

           Note: File fragments larger than 64MB are not included in the fragmentation statistics.

    The operation completed successfully.

    Post Defragmentation Report:

           Volume Information:

                   Volume size                 = 1,81 TB

                   Free space                  = 1,58 TB

                   Total fragmented space      = 0%

                   Largest free space size     = 20,00 MB

           Note: File fragments larger than 64MB are not included in the fragmentation statistics.

    C:\Windows\system32>

    Wheeee !!!

  • How stupid are you people!?!? So, go figure, I just found out that my Vista Ultimate machine cannot Analyze a Drive to even see if it needs to be defragged and after some searching, I am lead here to find that this was a purposeful decision on the part of someone who must have at least a little intelligence.

    I DO NOT KEEP MY COMPUTER ON 24/7 MORONS! Most home users don't. I knew enough to check up on whether I needed to Defrag every month or so. Now, I have no choice and no way of telling the progress.  And when I turn my laptop on, on a Wednesday evening apparently and my Defrag is running... I don't even know it.  Maybe that's why my system sometimes lags, but it sure would have been nice to have been clued in on why!  And you wonder why people make the jump to Apple!?  You alienate your own users.

  • Its very useful information to everyone. am glad to wish your work . thanks for sharing. keep try to updating. its works a lot.

  • I wonder why Microsoft's engineers took so much time to get it and I am not sure if they completely get why other defrag programs are more effective.

    Speeding up disk access (i.e. reducing total seek latency by reducing disk seeks) depends not only on the fragmentation of individual files but also on the location of multiple files. Where these files are located on the disk (the beginning is usually faster than the end of the disk) and how far apart the files used by a process are from each other make all the difference. Some years ago I posted clearly here how it should work: http://www.mydefrag.com/forum/index.php?topic=117.0

    Jeroen, the author of MyDefrag (formerly JkDefrag) got it. Other people got it too (see the post).

    Once the previous problem is solved, reducing the defragmentation time is another beast. The best solution should aim to find out where all the files should be on the decompressed volume before the defragmentation starts (do it algorithmically in RAM) and the quickest way to get the files there.

    Strategies that are going to plan the disk zones so it will not become quickly fragmented again are other things to consider.

    Finally, incremental solutions vs optimize all at once are also viable solutions but they complicate the scenario. Personally I would prefer to have the computer run the defrag every other month if I am sure that at the end my computer would be really faster and do not get fragmented quickly more than before running the defrag.  

    I hope this help.

    Best,

      Chris

    When will we have a drefrag.

  • You can schedule a defrag in Windows XP:

    Scheduled Tasks --> Browse for a new task and create:--> C:\Windows\system32\defrag.exe c: -f

    Yay.

  • I understand from this post that the defrag doesn't just run on a Wednesday at 1am but also whenever the system is idle. This I have found to be true on my system - but I have an issue that the defrag runs non-stop 24/7 every day (unless I use the machine - in which case it stops). I find the noise from the computer irritating, which led me to look into this. Question - why would it be constantly running when it reports only 3% fragmentation? Could this be to do with the computer having a number of VERY LARGE files 20Gb+ (virtual machines of all sorts)? I am tempted to turn off defrag as I fear it will burn out my disks with all this activity. Any advice on this would be appreciated. (PS. please delete all angry people comments from this page - they get in the way of those who are professional and trying to be constructive). Good post.

  • So in W7 defrag

    -you can watch status.

    -move some more files

    -do defrag parallel on disks

    What about files used more often placed at fast part of disk?

  • I couldn't help but laugh at this ..

    One thing I was expecting from XP and forward is that I could get a better disk map, extended, showing the files or group of files on each "cluster" or "square" at that zoom level. That tells me where each file is.

    WHY? .. what possible use is this sort of information to the end user??

    I'm happy to say that I have far better things to do with my time than watch silly pointless graphs and worry about this sort of thing!

  • I can sit for hours and hours playing with defragging operations. totaly engrossing! too bad work has stopped on Ultimate Defrag. seems to have been one of the more innovative ones out there recently.

  • it featured such things as file order placement, and a rough graphical rep. of the disk surface, more or less. and it showed where exact files were exactly. And it let you put specific files at the front or back o'the disk for archiving or high performance. Like man, you put all your windows and apps up front, and all your old pst and data and zip files at the back, for either sewper fast axis or slow and lame archive performance.. T'was a good thing!
