Engineering Windows 7

Welcome to our blog dedicated to the engineering of Microsoft Windows 7

Windows Desktop Search

One of the points of feedback has been about disabling services and optionally installing components—we’ve talked about our goals in this area in previous posts.  A key driver around wanting this type of control (but not the only driver) is a perception around performance and resource consumption of various platform components.  A goal of Windows is to provide a reliable and consistent platform for developers—one where they can count on system services as being available, as well as a set of OS features that all customers have the potential to benefit from.  At the same time we must do so in a way that is efficient in system resource usage—efficient enough so the benefit outweighs the cost.  We recognize that some percentage of customers believe solving this equation can only be done manually—much like some believe that the best car performance can only come from manual transmission.  For this post we’re going to look into the desktop search functionality from the perspective of the work we’re doing as both a broadly available platform component and to provide the rich end-user functionality, and also look at the engineering tradeoffs involved and techniques we use to build a great solution for everyone.  Chris McConnell, a principal SDE on the Find and Organize team, contributed this post.  --Steven

Are you one of those folks who believes that search indexing is the cause of your drive light flashing like mad? Do you believe this is the reason you’re getting skooled when playing first person shooters with friends? If so, this blog post is for you! The Find and Organize team owns the ‘Windows Search’ service, which we simply refer to as the ‘indexer’. A refrain that we hear from some Vista power-users is they want to disable the indexer because they believe it is eating up precious system resources on their PC, offering little in return. Per our telemetry data, at most about 1.5% of Vista users disable the indexing service, and we believe that this perception is one motivator for doing so.

The goal of this blog post is to clarify the role of the indexer and highlight some of the work that has been done to make sure the indexer uses system resources responsibly. Let’s start by talking about the function of the indexing service – what is it for? Why should you leave it running?

Why Index?

Today’s PCs are filled with many rich types of files, such as documents, photos, music, videos, and so on. The number of files people have on their PC is growing at a rapid pace, making it harder and harder for them to find what they’re looking for, no matter how organized their files may (or may not) be. Increasingly, these files contain a good deal of structure, with metadata properties which describe their contents. A typical music file contains properties which describe the artist, album name, year of release, genre, duration of the song, and others which can be very useful when searching for music.

Although search indexing technologies date back to the early days of Windows, with Windows Vista Microsoft introduced a consumer operating system that brought this functionality to mainstream users more prominently. Prior to Vista, searching was pretty rudimentary – often a brute force crawl through the files on your machine, looking only at simple file properties such as file name, date modified, and size, or an application specific index of application specific data. Within Windows, a more comprehensive search option allowed you to also examine the contents of the files, but this wasn’t widely used. It was fairly basic functionality – it treated all files just the same, without tapping into the rich metadata properties available in the files.

In Windows Vista, the indexing service is on by default and includes expanded support in terms of the number of file formats and properties which are indexed. The indexer watches specific folders on your PC and catalogues their contents to facilitate fast searching of those files. When Windows indexes your music files, it also knows how to extract the music-specific properties which you’re most likely to search for. This enables support for more powerful searches and richer views over your files which weren’t possible before. But this indexing doesn’t come free, and this is where engineering gets interesting. There’s a non-zero cost (in terms of system resources) that has to be paid to enable this functionality, and there are trade-offs involved in when and how you pay that price. This is not unique to indexing: all features have this cost-benefit tradeoff.

Trade-Offs

Many search solutions follow(ed) the traditional “grep” model, which means every search reads all of the files you want to search. In this case, you paid with your time as you waited for the search to execute. The more files you searched, the longer you waited each time you searched. If you wanted to perform the same search again, you would “pay” again. And the value you were getting in return wasn’t very good since the search functionality wasn’t particularly powerful. With Windows Vista, the indexer tries to read all of your files before you search so that when you search, it’s generally quicker and more responsive. This requires the indexer to scan all of your files just once initially, and not each and every time you perform a search. If a file changes, the indexer receives a notification (a “push” event) so that it can read that file again. When the indexer reads a file, it extracts the pertinent information about the file to enable more powerful searches and views. The challenge is to do this quickly enough that the index is always up to date and ready for you to search, while not negatively impacting the performance of your system. This is always a balancing act requiring trade-offs, and there are a number of things the indexer does to maintain its standing as a good Windows citizen while working to make sure that the index is always up-to-date.
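
To make the trade-off concrete, here is a minimal Python sketch contrasting the two models. It is only an illustration, not how the Windows indexer is implemented: the `grep_search` function, the `TinyIndex` class and its `on_file_changed` method are hypothetical names, and files are modelled as an in-memory dictionary rather than a real file system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grep_search(files, word):
    """'Grep' model: read every file again on every search."""
    word = word.lower()
    return sorted(name for name, text in files.items() if word in tokenize(text))


class TinyIndex:
    """Index model: read each file once, then answer searches from the index."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> names of files containing it
        self.words_by_file = {}           # name -> words last indexed for that file

    def on_file_changed(self, name, text):
        """Hypothetical 'push' notification handler: re-read one changed file."""
        # Drop the stale entries for this file before adding the new ones.
        for old_word in self.words_by_file.get(name, set()):
            self.postings[old_word].discard(name)
        words = tokenize(text)
        self.words_by_file[name] = words
        for w in words:
            self.postings[w].add(name)

    def search(self, word):
        """Look the word up in the index; no file is read at search time."""
        return sorted(self.postings.get(word.lower(), set()))


# Made-up sample data for illustration only.
files = {
    "notes.txt": "Meeting notes about the indexer",
    "song.txt": "Lyrics by some artist",
}

index = TinyIndex()
for name, text in files.items():
    index.on_file_changed(name, text)  # the one-time initial scan

grep_hits = grep_search(files, "indexer")   # reads every file
index_hits = index.search("indexer")        # a single dictionary lookup
```

Both calls return the same file names; the difference is when the reading cost is paid. The grep model pays it on every search, while the index model pays it once up front and again only for files that change.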

A Model Citizen

A lot of work has gone into making the indexer be a model Windows citizen. We’ve written an extensive whitepaper on the issue, but it’s worth covering some of the highlights here. First and foremost, the indexer only monitors certain folders, which limits the amount of work it needs to do to just those files that you’re most likely to search. The indexer also “backs off” when you are actively using your PC. It indexes files more slowly, or stops entirely depending on the level of activity on the PC. When the indexer is reading files it uses low priority I/O and CPU and immediately releases the file if another application needs access.
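
The “back off” behavior can be pictured with a small Python sketch. This is an assumption-laden illustration rather than the indexer's actual policy: the function names `indexing_delay` and `run_indexer`, the activity samples, and the threshold values are all invented for this example.

```python
def indexing_delay(user_active, cpu_busy_fraction):
    """Return seconds to wait before indexing the next item, or None to pause.

    The thresholds are made up for illustration; they are not the values
    used by the Windows indexer.
    """
    if user_active and cpu_busy_fraction >= 0.8:
        return None   # machine is busy: stop indexing entirely
    if user_active:
        return 1.0    # user is working: index more slowly
    return 0.0        # machine is idle: index at full speed


def run_indexer(pending, samples):
    """Process one pending item per activity sample, when the policy allows it.

    `samples` is a list of (user_active, cpu_busy_fraction) tuples standing in
    for measurements a real service would take over time.
    """
    pending = list(pending)  # work on a copy so the caller's list is untouched
    indexed = []
    for user_active, cpu_busy in samples:
        if not pending:
            break
        delay = indexing_delay(user_active, cpu_busy)
        if delay is None:
            continue  # backed off: leave the item for a quieter moment
        # A real service would wait `delay` seconds here before reading the file.
        indexed.append(pending.pop(0))
    return indexed
```

For example, `run_indexer(["a", "b", "c"], [(True, 0.9), (False, 0.1), (True, 0.3)])` skips the first sample because the machine is busy, then indexes one item in each of the two quieter samples.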

It’s critical that we get all of these issues right for the indexer, because it’s not only important for the features that our team builds (like Windows Search), but it’s important to the Windows platform as a whole. There are a host of applications which require the ability to search file contents on the PC. Imagine if each one of those applications built their own version of the indexer! Even if all of these applications did a great job, there would be a lot of unnecessary and redundant activity happening on your PC. Every time you saved one of your documents there would be a flurry of activity as these different indexers rushed to read the new version. To combat that, the indexer is designed to do this work for any application which might choose to use it, and to provide an open platform and API with flexibility and extensibility for developers. The API is designed to be flexible enough to meet needs across the Windows ecosystem. Out of the box, the indexer has knowledge of about 200 common file types, cataloging nearly 400 different properties by default. And there is support for applications to add new file types and properties at any time. Applications can also add support for indexing of data types that aren’t file-based at all, like your e-mail. Just a few of the applications that are leveraging the indexer today are Microsoft Office Outlook and OneNote, Lotus Notes, Windows Live Photo Gallery, Internet Explorer 8, and Google Desktop Search. As with all extensible systems, developers often find creative uses for system services and their components. One example of this is the way the Tablet PC components leverage the index contents to improve handwriting accuracy.

Constantly Improving

We’re constantly working to improve the indexer’s performance and reliability. Version 3 shipped in Windows Vista.  Major improvements in this version included:

  • The indexer runs as a system service vs. as a per user process.  This minimizes impact on multi-user scenarios e.g. only one catalog per system results in reduction in catalog size and prevents re-indexing of the same content over and over.  Additional benefit is gained from the robust nature of services.
  • The indexer employs low priority I/O to minimize impact of indexing on responsiveness of PC.  Before Windows Vista, all I/O was treated equally.

We’ve already released Windows Search version 4 as an enhancement to either Windows XP or Vista which goes even further in terms of performance and stability improvements, such as:

  • Significant improvements across the board for queries which involve sorting, filtering or grouping. Example improvements on Vista include:
    1. Getting all results while sorting or grouping has been improved. Typical queries are up to 38% faster.
    2. CPU time has been reduced by 80%.
    3. Memory usage has been reduced by 20%.
  • Load on Exchange servers is reduced over 95% when Outlook is running in online mode.  With previous versions of Windows Search, large numbers of Outlook clients running in online mode could easily overwhelm the Exchange server.
  • Reliability improvements including:
    1. We made a number of fixes to address user-reported situations that previously caused indexing to stop working.
    2. We improved the indexer’s ability to both prevent and recover from index corruptions.  Now, when catalog corruption is detected it is always rebuilt automatically – previously this only happened in certain cases.
    3. We added new logging and events to help track down and fix reliability issues.

And we’ve done even more to improve performance and reliability for the indexer in Windows 7 which you’ll soon see at the PDC. If you still believe that the indexer is giving you trouble, we’ve got a few things for you to try:

  • Download and install Windows Search 4 (on Vista or XP).
  • Download and install the Indexer Gadget from the Windows Live Gadget Gallery (Vista only). This gadget was written by one of our team members, and gives you a quick way to view the number of items indexed. It also allows you to pause indexing, or to make it run full-speed (without backing off).
  • If you’re one of those people who like to get under the hood of the car and poke around the engine, you can use the Windows Task Manager and/or Resource Monitor to monitor the following processes: SearchIndexer, SearchFilterHost, SearchProtocolHost.

If you feel as though your system is slow, and you suspect the indexer is the culprit, watch the gadget as you work with your PC. Is the number of indexed items changing significantly when you’re experiencing problems? If you pause the indexer, does your system recover? We’re always looking to make our search experience better, so if you are still running into issues, we want to hear about them. Send your feedback to idx-help@microsoft.com.

Chris McConnell

Find and Organize

  • WDS is great. I use it since the 2.6 version on XP. Now I'm on Vista since a couple of weeks.

    I didn't understand why on XP WDS 4 couldn't let you set your shortcut to the wds search box as you were able to in the previous version. WDS 4 indexes a lot better but not letting users choose something like this is lame. I hated that (workaround: install 2.6, set shortcuts then install 4.0).

    One of the awesomeness of WDS was that "everything launcher" you could set up.

    I was crazy to say the least that Vista doesn't have this feature, I had to download a third party tool just to behave like WDS 2.6.

    So for Windows7, Desktop Search feature list:

    -WDS mappable on the shortcut I want.

    -WDS ui can be set everywhere (to the point of a gadget style 70% transparency window).

    -"everything launcher" back without third parties.

    -WDS capable to understand that when I use this or that app, do not index (should be very useful in a DAW world).

    Thanks!

  • I like Vista Search and I use it all the time. I never disabled Indexing nor do I plan to do so, but it made me notice that this latest entry is more like a Vista ad than a Windows 7 engineering blog entry. So let's get back to the plans for 7, shall we?

    As someone else mentioned, disk IO is a precious resource, especially on today's laptop hard drives. I am finding myself watching the blinking hdd LED on my laptop ever more often. Whether it's Superfetch, ReadyBoost or Indexing, they should be more conservative with IO. As soon as ssd goes mainstream, Windows can go and have fun with the drive, but until then some restraint would be beneficial. I had to disable ReadyBoost, for example, because for some reason, it deletes and rebuilds its cache on each and every reboot, causing too much disk activity on startup (2-3 minutes until the desktop is usable). Also, it seems to me that UAC prompts too make the disk go crazy for a few moments.

  • I forgot to mention in my latest comment one last suggestion: integrate Start++ like functionality into W7 Search, customizable with plugins. Default plugins could be made of Windows apps like WMP. Integrating this kind of Search experience with Speech Recognition would make W7 start playing Coldplay when I say "play Coldplay, dammit!".

  • Privacy fears might be one reason why people disable the indexing service. There are even some privacy tutorials that mention this. Of course you need to use something like Eraser to securely wipe files. But is the file entry removed from the index once you hit delete?

    Index, word has a bad reputation attached to it. Everyone knows (corrupted) index.dat files which don't get cleared.

    By the way Recycle Bin only offers Empty Recycle Bin option it would be nice to have built in Shred Recycle Bin option also.

  • 2nd that Actually very IMPORTANT.

    SECURE deletion.

  • The "Desktop Search" doesn't find things I expect it to find, like a text file with a random extension.

    The normal Vista search confuses me.  For example, I can never tell if it's searching for filenames or actual content.  When I cancel a search and then change the search, I can't tell if the change got through.  And you can't see what it's doing at the moment, the green bar just stands still on the right side of the toolbar.  There's no status indication like "searching c:\windows\winsxs\..." which could give you a hint at why it's taking so long, or confirm that it's actually looking where you want it to look.

    So (even though it looks like I'm the only one) I just wanted to say that Windows Search isn't yet very usable for me.

  • I am a translator and I have thousands upon thousands of Office documents on my hard drive; from time to time I need to search for specific words or expressions in documents that are many years old. Theoretically at least, I am the perfect candidate for Windows Desktop Search.

    However, I only use WDS to index programs in the Start menu. For all other searches I prefer Copernic Desktop Search, which has much better UI (for my needs, at least) and the same low-IO priority feature. It has also been available on the market ever since XP SP2, and after Vista SP1 it has even integrated very nicely in the shell. So for me, WDS is too little, too late.

  • Actually I was very disappointed when I first used Vista's search.

    Common knowledge from XP.

    We type a word and we expect to get files on the current directory only.

    No way. We get results from any attributes and from all subfolders.

    We insist. There must be a way. We learn we need to know the syntax, enable physical language, use WIN+F interface or advanced search.

    But the syntax is somewhere hidden in the clouds.

    Is there an easy way to search the current dir only?

    We enter advanced search, choose the dir we want, deselect the subfolders one by one. Why???

    Then we try another search and the results are in the clouds again.

    Search just crashes this way.

    We need to close it every time and open it again.

    Spare me.

    Can't we search a single directory the easy way? How about .\*.any, or right click on a folder to search just that folder either like the old good days?

    Search also crashed on me in the past in certain cases (trying to find *.any files in windir conflicted with another program - a theme I think lol).

    The bottom line?

    I love Windows Search 4 on Vista.

    It is just great, despite all the small glitches I mentioned.

    I regret I didn't insist to find the way it works much earlier.

    Now I use it a lot.

  • It sounds like, from what people are saying, that WDS has improved massively since XP. I found it so bad that I just instinctively avoid doing search in Windows apart from dumb searching via the command line (normally I'm looking for files by name).

    I will have to give search on Vista a try, to see how it's improved.

  • Well, Windows Desktop Search 4.0 is much better than its predecessor. I would like to suggest a Reverse Engineering Search! Take searching pictures: suppose we have a million pictures, say pictures of people, and we are given a picture of one person. Could there be a reverse engineering search that displays results ranked by how closely the available pictures match? In short, we upload a photo and the search engine looks for matches among the available pictures and displays the results.

  • "The indexer also “backs off” when you are actively using your PC. It indexes files more slowly, or stops entirely depending on the level of activity on the PC. When the indexer is reading files it uses low priority I/O and CPU and immediately releases the file if another application needs access."

    In Windows Media Player this is definitely not true.

    I have set WMP11 (Vista) to index my music folder.

    Whenever I move an album to this directory my hard drive starts thrashing and my CPU starts working, even if I'm doing other stuff at the same time.

    If I try to rename the directory just after I've copied it, I will fail because the directory is being "used" (hmm, indexer maybe?).

    Moving files to other directories goes instantly without thrashing.

  • What is happening to comment submission here? I have tried 5 times to submit a comment under both FF3 and IE7. Each time the comment "appears" to be received but doesn't appear. I'm also taken back to the http://blogs.msdn.com/e7/default.aspx page rather than seeing my comment or some text along the lines of "your comment will be reviewed by a moderator."

  • Hmmm - I think there might be a character-limit bug involved here for comment submission. Perhaps a "characters left" field? And please please make this comment box as wide as the comment display area!

  • Put it this way, if I was searching for a particular system file on my HDD and I mistyped the name of the file, then it's faster to query Google, let Google correct the name for me, and then download the file from an internet server, than to wait for Windows Search to find _what I wanted_ on my HDD.

  • Vista search features are still very underwhelming for me.

    1. Searching on start menu items is hampered in that it only matches leading substrings e.g. search for 'stitch' and 'Canon Photostitch' is not found.

    2. Various search elements do not allow for conflation of accented characters. e.g. if I search for music composed by "Handel" then it won't optionally extend to searching for "Haendel" (I typed the accented a, but this form field is not accepting it). This is an issue within WMP's search as well - you never know how precise the metadata it downloads is going to be. Actually, most of WMP's search and filtering is designed in isolation from everything else that MS does. For example, create a smart-playlist that searches for Artist=X or Composer=X, and it returns a double result-set for every item that satisfies both. That makes a mess of playlists with this doubling or tripling up of some items (Resolved BY DESIGN).
