Welcome to our blog dedicated to the engineering of Microsoft Windows 7
One of the points of feedback has been about disabling services and optionally installing components—we’ve talked about our goals in this area in previous posts. A key driver around wanting this type of control (but not the only driver) is a perception around performance and resource consumption of various platform components. A goal of Windows is to provide a reliable and consistent platform for developers—one where they can count on system services as being available, as well as a set of OS features that all customers have the potential to benefit from. At the same time we must do so in a way that is efficient in system resource usage—efficient enough so the benefit outweighs the cost. We recognize that some percentage of customers believe solving this equation can only be done manually—much like some believe that the best car performance can only come from manual transmission. For this post we’re going to look into the desktop search functionality from the perspective of the work we’re doing as both a broadly available platform component and to provide the rich end-user functionality, and also look at the engineering tradeoffs involved and techniques we use to build a great solution for everyone. Chris McConnell, a principal SDE on the Find and Organize team, contributed this post. --Steven
Are you one of those folks who believes that search indexing is the cause of your drive light flashing like mad? Do you believe this is the reason you’re getting skooled when playing first person shooters with friends? If so, this blog post is for you! The Find and Organize team owns the ‘Windows Search’ service, which we simply refer to as the ‘indexer’. A refrain that we hear from some Vista power-users is they want to disable the indexer because they believe it is eating up precious system resources on their PC, offering little in return. Per our telemetry data, at most about 1.5% of Vista users disable the indexing service, and we believe that this perception is one motivator for doing so.
The goal of this blog post is to clarify the role of the indexer and highlight some of the work that has been done to make sure the indexer uses system resources responsibly. Let’s start by talking about the function of the indexing service – what is it for? Why should you leave it running?
Today’s PCs are filled with many rich types of files, such as documents, photos, music, videos, and so on. The number of files people have on their PC is growing at a rapid pace, making it harder and harder for them to find what they’re looking for, no matter how organized their files may (or may not) be. Increasingly, these files contain a good deal of structure, with metadata properties which describe their contents. A typical music file contains properties which describe the artist, album name, year of release, genre, duration of the song, and others which can be very useful when searching for music.
Although search indexing technologies date back to the early days of Windows, with Windows Vista Microsoft introduced a consumer operating system that brought this functionality to mainstream users more prominently. Prior to Vista, searching was pretty rudimentary – often a brute force crawl through the files on your machine, looking only at simple file properties such as file name, date modified, and size, or an application-specific index of application-specific data. Within Windows, a more comprehensive search option allowed you to also examine the contents of the files, but this wasn’t widely used. It was fairly basic functionality – it treated all files just the same, without tapping into the rich metadata properties available in the files.
In Windows Vista, the indexing service is on by default and includes expanded support in terms of the number of file formats and properties which are indexed. The indexer watches specific folders on your PC and catalogues their contents to facilitate fast searching of those files. When Windows indexes your music files, it also knows how to extract the music-specific properties which you’re most likely to search for. This enables more powerful searches and richer views over your files that weren’t possible before. But this indexing doesn’t come free, and this is where engineering gets interesting. There’s a non-zero cost (in terms of system resources) that has to be paid to enable this functionality, and there are trade-offs involved in when and how you pay that price. This is not unique to indexing; all features have this cost-benefit tradeoff.
Many search solutions follow(ed) the traditional “grep” model, which means every search will read all of the files you wanted to search. In this case, you paid with your time as you waited for the search to execute. The more files you searched, the longer you waited each time you searched. If you wanted to perform the same search again, you would “pay” again. And the value you were getting in return wasn’t very good since the search functionality wasn’t particularly powerful. With Windows Vista, the indexer tries to read all of your files before you search so that when you search, it’s generally quicker and more responsive. This requires the indexer to scan all of your files just once initially, and not each and every time you perform a search. If a file were to change, the indexer would receive a notification (a “push” event) so that it could read that file again. When the indexer reads a file, it extracts the pertinent information about the file to enable more powerful searches and views. The challenge is to do this quickly enough that the index is always up to date and ready for you to search, while doing so in a way that doesn’t hurt the performance of your system. This is always a balancing act requiring trade-offs, and there are a number of things the indexer does to maintain its standing as a good Windows citizen while working to make sure that the index is always up to date.
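To make the contrast between the two models concrete, here is a minimal Python sketch. It is a toy illustration only and not how Windows Search is implemented: the names `grep_search` and `ToyIndex` are invented for this example, files are modelled as an in-memory dictionary, and the "notification" is simply a call to `update`.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def grep_search(files, term):
    """'grep' model: every search reads every file again."""
    term = term.lower()
    return sorted(name for name, text in files.items()
                  if term in tokenize(text))


class ToyIndex:
    """Index model: read each file once, then re-read only on change."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> names of files containing it
        self.words_by_file = {}           # file name -> words last indexed

    def remove(self, name):
        """Forget everything previously indexed for this file."""
        for word in self.words_by_file.pop(name, ()):
            self.postings[word].discard(name)

    def update(self, name, text):
        """Used for the initial scan and for each change notification."""
        self.remove(name)
        words = set(tokenize(text))
        self.words_by_file[name] = words
        for word in words:
            self.postings[word].add(name)

    def search(self, term):
        """Answer from the index without touching any file."""
        return sorted(self.postings.get(term.lower(), ()))
```

With `grep_search` the cost is paid on every query and grows with the number of files; with `ToyIndex` the cost is paid once per file up front and again only when `update` is called for a changed file.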
A lot of work has gone into making the indexer a model Windows citizen. We’ve written an extensive whitepaper on the issue, but it’s worth covering some of the highlights here. First and foremost, the indexer only monitors certain folders, which limits the amount of work it needs to do to just those files that you’re most likely to search. The indexer also “backs off” when you are actively using your PC. It indexes files more slowly, or stops entirely, depending on the level of activity on the PC. When the indexer is reading files it uses low-priority I/O and CPU and immediately releases the file if another application needs access.
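The "back off" behavior can be pictured as a throttle that maps user activity to an indexing rate. The sketch below is purely illustrative: the function name, the 0-to-1 activity scale, the 0.8 pause threshold and the batch size are all assumptions made for this example, not values taken from the indexer.

```python
def items_per_tick(user_activity, max_items=100, pause_threshold=0.8):
    """Hypothetical throttle for a background indexer.

    user_activity: 0.0 (idle machine) to 1.0 (machine fully busy).
    Returns how many items to index during the next time slice.
    """
    if not 0.0 <= user_activity <= 1.0:
        raise ValueError("user_activity must be between 0.0 and 1.0")
    if user_activity >= pause_threshold:
        return 0  # user is very active: stop indexing entirely
    # Otherwise slow down in proportion to how busy the machine is.
    return int(max_items * (1.0 - user_activity))
```

An idle machine gets the full batch size, a moderately busy one gets a reduced batch, and a heavily used one gets none until activity drops again.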
It’s critical that we get all of these issues right for the indexer, because it’s not only important for the features that our team builds (like Windows Search), but it’s important to the Windows platform as a whole. There are a host of applications which require the ability to search file contents on the PC. Imagine if each one of those applications built their own version of the indexer! Even if all of these applications did a great job, there would be a lot of unnecessary and redundant activity happening on your PC. Every time you saved one of your documents there would be a flurry of activity as these different indexers rushed to read the new version. To combat that, the indexer is designed to do this work for any application which might choose to use it, and to provide an open platform and API with flexibility and extensibility for developers. The API is designed to be flexible enough to meet needs across the Windows ecosystem. Out of the box, the indexer has knowledge of about 200 common file types, cataloging nearly 400 different properties by default. And there is support for applications to add new file types and properties at any time. Applications can also add support for indexing of data types that aren’t file-based at all, like your e-mail. Just a few of the applications that are leveraging the indexer today are Microsoft Office Outlook and OneNote, Lotus Notes, Windows Live Photo Gallery, Internet Explorer 8, and Google Desktop Search. As with all extensible systems, developers often find creative uses for the system’s components and services. One example of this is the way the Tablet PC components leverage the index contents to improve handwriting accuracy.
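The idea of one shared indexer that applications extend with new file types can be sketched as a small handler registry. This is a conceptual Python illustration only and does not reflect the actual Windows Search API: `register_handler`, `extract_properties`, the `.song` extension and its `key=value` format are all invented for the example.

```python
import os

# Hypothetical registry: file extension -> function that extracts properties.
_handlers = {}


def register_handler(extension, extract):
    """Let an application teach the shared indexer about a new file type."""
    _handlers[extension.lower()] = extract


def extract_properties(filename, content):
    """Return the properties the indexer would catalogue for one file."""
    props = {"name": filename}  # basic property available for every file
    extension = os.path.splitext(filename)[1].lower()
    handler = _handlers.get(extension)
    if handler is not None:
        props.update(handler(content))  # type-specific properties
    return props


def song_handler(content):
    """Handler for an invented '.song' format with one 'key=value' per line."""
    pairs = (line.split("=", 1) for line in content.splitlines() if "=" in line)
    return {key.strip(): value.strip() for key, value in pairs}


register_handler(".song", song_handler)
```

Because every application registers with the same indexer, a saved file is read once by one service rather than once per application.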
We’re constantly working to improve the indexer’s performance and reliability. Version 3 shipped in Windows Vista. Major improvements in this version included:
We’ve already released Windows Search version 4 as an enhancement to either Windows XP or Vista which goes even further in terms of performance and stability improvements, such as:
And we’ve done even more to improve performance and reliability for the indexer in Windows 7 which you’ll soon see at the PDC. If you still believe that the indexer is giving you trouble, we’ve got a few things for you to try:
If you feel as though your system is slow, and you suspect the indexer is the culprit, watch the gadget as you work with your PC. Is the number of indexed items changing significantly when you’re experiencing problems? If you pause the indexer, does your system recover? We’re always looking to make our search experience better, so if you are still running into issues, we want to hear about them. Send your feedback to email@example.com.
Find and Organize
3. I can't get it to just Search Everywhere by default. Most often when I can't find something, it's because an app has saved or downloaded the file into a hidden or system folder (the program folder or a browser or user cache).
4. Someone made a point about target folders moving about too quickly so you can't click on what you want. This bugged the hell out of me until I reordered "Recent Places" by Date instead of by Name. However because Vista can't hold onto any shell customization for more than half an hour, that quickly reverted.
5. Performance: quite often it is faster for me to open a CMD window, type dir /s *.txt and get the result thus than to wade through the mess of Search Window results which bring back full-text results when I just want file names.
6. It really peeves me that you don't attempt to suggest alternate search strings based on keywords in the index (to cover cases like "wrod" above). Considering that I + one dev created a query control that did this look up AND did red-squiggle spell-checking AND did it multi-lingually back in 1997. Oh and yes it shipped as part of Picture It! image search and Office Clip Art gallery search a full decade ago.
Put it this way, if I was searching for a particular system file on my HDD and I mistyped the name of the file, then it's faster to query Google, let Google correct the name for me, and then download the file from an internet server, than to wait for Windows Search to find _what I wanted_ on my HDD.
I would like to see 2 menu items when I right click a folder in Explorer:
"Add this folder to search index" - Quickly add new folders including subfolders
"Re-index this folder" - When I think that not all files were found by Windows Search
And maybe in right click menu on files: "Add this file type to search index"
Would make things a lot easier.
As far as how to make the UI for Search better, I think most commenters would be made happy if you just copy the search UI from the Mac. I bring up Finder, click the search box, and start typing a filename. The Mac finds it, because the Mac indexes the whole drive by default, I believe. Vista doesn't find it, because I haven't specifically added that folder to the indexer. Also, you get a heading above the results area giving you a choice to search "This Mac" or the current directory, and next to that you can choose to search "Contents" or "File Name". And those selections don't cause a re-search; they just filter the results. Brilliantly simple. If the Mac doesn't have this UI patented or something, just copy it.
Also, there needs to be a keyboard shortcut to get to the Search bar in Windows Explorer. You've got Alt+D to go to the address bar. Why not Alt+S to go to the Search bar?
Well, that's the thing - Mac OSX's indexing has been "real time" since OSX 10.4, and I expected Vista's to be the same - and was disappointed to find that it wasn't.
WDS 4.0 brought some great changes (unfortunately you borked the UI on XP), and I find results as fast as OSX 10.5 now, but I agree it would be nice if you never had to specify a folder to be indexed; it would just happen automatically. This comes at a price though - for example, stick a USB HD in a Mac and you can't search it until Spotlight is finished indexing it, which sucks. Indexing speed is very, very fast though - MS still has some work to do in this area by comparison, albeit MS seems more sensitive to processor usage than Apple does. The wide variety of industries and applications likely means that MS can't construct their software in the same way as their competitors, as they simply have far more on their plate to consider.
I do agree that the UI needs to be improved, and significantly so. For example, the results in OSX 10.5 display the icons/previews of files in different ways depending upon the content: you can have a small details listing of hits for the keyword you specified in email messages, but below that a large preview section will be shown for images, which makes perfect sense.
With Vista, it's all or nothing - you get a window of results, then you have to change the view style for all at the same time - I don't need 128x128 icons of email messages, I simply want a list - but if the indexer finds some images that correspond with my keyword, it makes sense that they're displayed as large icons so I can view the contents.
What's even worse though is when you start scrolling. Regardless of whether it's on an integrated graphics chipset or an 8800GT, eventually you'll have to take your hand off the scrollbar for the Vista WDS results display to "catch up", which makes quickly scrolling to see your hit an incredible pain.
I'll echo that being able to specify a filename or search for contents would be greatly welcomed. Vista sometimes gives bizarre results back, especially as I'm searching for the filename or email subject 10 times more often than I'm searching for content. It's very frustrating to have a file named, say, joeblowinvoice45.xlsx, do a search for "45" and not see that hit anywhere in the first two pages. Of course, search for joeblow, and it's there.
Also, I agree that overall the combination of I/O services is straining some systems today, but it was just NUTS for the time of Vista's release, when 1 gig 4200rpm laptops were still being sold in quantity, and were likely some of the most popular consumer machines at the time, and even more so as laptop sales continue to grow as a percentage. You had the indexer, superfetch, Windows Defender, system restore, previous versions etc. all touching the HD in basically any usage scenario - no wonder people developed a negative impression of Vista's performance. Tone down all that disk I/O!
I guess the file system is the source of the index issues, such as files not getting indexed. WinFS would do the job. And of course you would not need the whole indexing procedure: it would be done automatically during file creation, the way it should be.
Search should not be core functionality of the OS. Sorry, it's a little off topic, but modern OSes are going this way. NetBSD is the best example: it comes as a complete base OS, then you build it up to what you need it to do. My idea of a perfect Windows is to be able to purchase it without any optional pieces, then add what I need, as I need it, for whatever purpose the machine serves, from gaming platform to workstation to server. The Vista model tried to emulate this very poorly by confusing OS functionality with features. Server 2008 is much better on this but could use quite a bit of help, because you guys are still confusing functionality with features, even with the Server Core implementation.
If the new x64 windows doesn't index network shares I'm going to beat the entire search team with an ugly stick.
Where did it go? Why can't we have it back? And for the love of all that's good and holy put it in Windows 7.
Uh, WDS 4.0 indexes network shares.
gonzc900, a total rewrite?
What are you on? Why throw away years of well-tested code? Yes, by all means update it. It does annoy me when people refer to Windows as needing a total rewrite, when neither Mac OSX (based on a 30-year-old BSD operating system) nor Linux (based on an 18-year-old clone of a 30-year-old operating system!) has had one.
What we need is less legacy support, as that is what often contains the security bugs. Take the recent GDI+ flaw. I think Microsoft should make GDI applications look visually older than WPF applications, as this would be the biggest incentive for developers to upgrade. We also need more rich .NET controls for WPF!
The Indexing is impressive because searches return results instantly.
Allowing applications to tap the index through an API is a great idea that gives this service its full meaning.
The gadget to see the content and to turn it on and off is great. Before reading the post I already wanted to talk about that.
But why does it have to be a gadget? Turning it on and off to free resources when we need 100% of our computer is not a gadget IMO.
Looking at the content may be more "mundane", yet it may be helpful to know what type of info there is and what keywords to enter.
I wouldn't put too much emphasis on indexing multimedia metadata: I know that among the hundreds of pictures on my PC none of them contain relevant metadata.
None of my mp3 rips (music CD to mp3) contain such metadata, but they are perfectly sorted. Other media files may contain metadata, but I don't know what it is or whether there is any... etc. But OK, the idea is good.
The only source of imperfection (besides the performance hit) is the lack of configuration.
I cry for more configuration with the indexing service: like which folder/subfolder to scan or to ignore, and which file extension/name (with * wildcard) to include/exclude.
This is extremely important.
For example, I have thousands of files in some folders; I know perfectly well what they are, and in no case should the indexing service search there.
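The kind of include/exclude configuration requested here can be expressed with simple wildcard rules. The Python sketch below is only an illustration of the idea, not an existing indexer setting: `should_index` and its parameters are invented for the example, and paths are assumed to use forward slashes.

```python
from fnmatch import fnmatchcase


def should_index(path, include=("*",), exclude=()):
    """Return True if path matches an include pattern and no exclude pattern.

    Patterns use shell-style wildcards; '*' also matches '/' here, so
    'archive/*' covers everything below the archive folder.
    """
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False  # exclusions always win
    return any(fnmatchcase(path, pattern) for pattern in include)
```

For instance, `should_index("archive/old.txt", include=("*.txt",), exclude=("archive/*",))` rejects the file even though its extension is on the include list, because the folder is excluded.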
Off Topic: Chris McConnell wrote
"Applications can also add support for indexing of data types that aren’t file-based at all, like your e-mail"
Since Outlook Express became "Windows Mail", mails are file based.
-prompt to index after huge changes in the file system (basically when people copy their files to their new computer or when copying the content of a data CD onto the HDD)
-search single/current directory, or a selected directory easier
-search either filename or text content/metadata or both in a clearly separate manner
(wtroot wrote: "The normal Vista search confuses me. For example, I can never tell if it's searching for filenames or actual content.")
-make it easier to add or remove a directory from the index list
"Of course, if Vista had a proper explorer, you wouldn't need to search in the first place...."
I wish I could install Windows without Explorer. I immediately install Xplorer2 because if I use Vista's Explorer, after 2 minutes I start banging my head on the desk (already destroyed 3 keyboards doing so -lol). Steven, I hope there will be a huge blog topic discussing that...
"What *should* have happened is that a new file system should have been created where updating the index was part of the file create/modify API. That would mean you *always* had an up to date index, and the system would *never* need to be scanned."
Interesting comment, but I don't know if it's realistic/feasible, or whether it wouldn't end up slowing down the PC even more than indexing.
I tried a few versions of desktop search for XP, including the latest, version 4. As always, I ended up uninstalling it, for a few reasons:
First. Under XP, I/O is not prioritized, unlike in Vista, which causes your PC to lag like molasses. For some reason it never figures out that I'm actually doing something interactive, and it does not back off from indexing.
Second. I set it up to index my team's source tree but could not find how to make it skip the content of Subversion's internal folders (they are hidden and named .svn), so any results contained a ton of garbage.
Third. Even though the indexer tracks change notifications from the file system, when I'm doing a search it still uses outdated indexes. Why can't it see that 10 out of several hundred thousand files were changed and, instead of using the indexed entries for them, fall back to a regular 'grep' search? This is a more serious issue than it might appear at first. If a search returns an empty result, the conclusion the user draws is that there are no files matching the query, not that part of the index might be missing.
Fourth. If you invest so much into indexing/search algorithms, why is it hard to add regex?
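The fallback proposed in the third point, trusting the index for unchanged files and scanning only the files known to have changed, could look roughly like the Python sketch below. It illustrates the commenter's suggestion and is not a description of what Windows Search does; `hybrid_search`, the `dirty` set and the `read_file` callback are hypothetical names.

```python
def hybrid_search(term, index, dirty, read_file):
    """Combine a possibly stale index with a direct scan of changed files.

    index:     dict mapping a word to the set of paths that contained it
               when they were last indexed
    dirty:     set of paths changed (or created) since they were last indexed
    read_file: callable returning the current text of a path
    """
    term = term.lower()
    # Trust the index only for files that have not changed since indexing.
    hits = {path for path in index.get(term, set()) if path not in dirty}
    # 'grep' the small set of changed files directly.
    for path in dirty:
        if term in read_file(path).lower().split():
            hits.add(path)
    return sorted(hits)
```

The direct scan touches only the changed files, so the cost stays small when a handful of files out of several hundred thousand are out of date, and an empty result then really does mean nothing matched.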
I can see that Windows Search and the "Indexer" is something that Microsoft has a lot of faith/investment in at this point, so I won't waste words being overly critical. Only this: please realize that the level of acceptance of this feature hinges on the hard drive not looking like it's constantly being ground up by some rogue "indexer process" for a feature that I don't use very often. I really do believe that most people, including myself, are minimalists when it comes to their OS.
I'm pretty happy with the indexer in Vista. I've enabled it only on certain directories I really need the indexer in, let it create the index, and after it was done (it took some hours) it never bothered me with high HDD load again.
I'm not missing the old search dialog. I usually use the search to access one of the indexed files faster by typing it into the Start menu search field. I don't understand what some people mean by not being able to search in a specific folder - of course you can, just open the folder and enter the keyword into Explorer's search field.
I agree that there should be less legacy support in Windows 7. How long do you intend to drag old stuff along to the new versions of Windows? Especially when those legacy hardware components don't perform well under Vista. I'm crying inside every time I hear people complaining about Vista not running smoothly on their 1.2 GHz P4 w/ 512MB RAM. IMHO, there should be a period of time after which the hardware (that can't handle the OS anyway) should be marked as unsupported.