Engineering Windows 7

Welcome to our blog dedicated to the engineering of Microsoft Windows 7

Follow-up: Windows Desktop Search


The discussion and email about desktop search offered an opportunity for us to have a deeper architectural discussion about engineering Windows 7.  There were a number of comments suggesting alternate implementation methods so we thought we’d discuss another approach and the various pros and cons associated with it.  It offers a good example of the engineering balance we are striving for with Windows 7.  Chris McConnell wrote this follow-up.  --Steven (See you at the PDC in a week!)

Thanks for all the great feedback on our first blog post on Windows Desktop Search.  I’ve summarized a number of points that have been made and added some comments about the architectural choices we have made and why.

Integration with the File System

As some posters have pointed out, one possible implementation is to integrate indexing with the file system so that updating a file immediately updates the indices.  Windows Desktop Search takes a different approach.  There are two aspects of file system integration: knowing when a file changes, and actually updating the indices before a file is considered “closed” and available.  On an NTFS file system, the indexer is notified whenever a file changes.  The indexer never scans the NTFS file system except during the initial index.  It is on the second point, updating the indices immediately when a file is closed, that we made a different choice.  Updating immediately has the benefit that a file is not available until it is indexed, but it also comes with a number of potential disadvantages.  We chose to decouple indexing from file system operations because it allows for more flexibility while still being almost real-time.  Here are some of the benefits we see in the approach we took:

  1. Fewer resources are used.  Inverted indices are global.  An inverted index maps from a word found in a property to a list of every document that contains that word.  Indexing a single file requires updating an index for every single unique word found in the file.   A single document might then update a very large number of individual indices.  Making these changes and committing them with the same robustness found on individual files would be very expensive.  The design of the indexer allows scheduling and aggregating these changes so that much less work is done overall—that means less CPU and less disk I/O.  The system can be more robust because indexing doesn’t only happen when a file is closed—and it can even be retried if necessary.
  2. File system operations are prioritized over indexing.  Getting files robustly updated and available is necessary for applications to use them.  We don’t want to delay that availability by forcing the cost of indexing into file close operations.   Searching over files is important, but is less important than actually working with files.  We wouldn’t want applications to decide individually if the indexer should be turned on or off just because they were seeking the best performance with respect to the file system.
  3. There are lots of file types.  Microsoft supplies extractors (IFilter/IPropertyHandler) for many common file types as part of Windows.  There are many other file types as well so it is important to allow non-Microsoft developers to write their own extractors.  In Vista (and Windows 7), these extractors run in a locked down process that ensures that they are secure and do not affect the performance of the whole system.  If indexing had to happen before a file was available, then an extractor could impact (intentionally or not) all file system operations.  
  4. Some files are more valuable to index than others.  If indexing happened when a file is closed, then there would be no control over the order in which files are indexed.  Decoupling allows prioritizing indexing some files over others.  For example, searching for music is much more likely than searching for binary files.  If both music files and binary files have changed, then the indexer ensures it indexes the music files first.  Some files are not worth indexing at all for most people.  Several comments suggested that we should index the whole drive.  We can do that, and for those who would find it valuable it is easy to add folders to be indexed.  (You can also remove them, but that is much less common so that is controlled through the control panel “Indexing Options.”)  For most people indexing system files is just a cost: they would never search for them and would be confused if they showed up as the result of a search.
  5. Not everything is a file in a single file system.  Windows is all about supporting diversity.  There are many different file systems like FAT32 and CDFS and we would like to be able to search over those as well.  If we integrated with only NTFS, then we would still have to have a loosely coupled system for other file systems.  Many applications also have databases optimized for their own needs.  For example, Outlook has a database of email.  If only files were indexed, then the email in the database could not be indexed unless Outlook either compromised its experience by using files only, or complicated its implementation by duplicating everything in both the file system and the database.
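
The trade-offs in the list above can be illustrated with a minimal Python sketch. It is not how Windows Search is implemented; the class name `DeferredIndexer`, the `PRIORITY` table, and the whitespace tokenizer are all assumptions made for illustration. A change notification only queues work, and the inverted index is updated later in one batch, with the more valuable kinds of file handled first.

```python
import heapq
from collections import defaultdict

# Hypothetical priorities: a lower number is indexed first.
PRIORITY = {"music": 0, "document": 1, "binary": 9}


class DeferredIndexer:
    """Toy inverted index whose updates are decoupled from file writes."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of paths containing it
        self.words_by_path = {}        # path -> words currently indexed for it
        self.pending = {}              # path -> (kind, latest text)
        self.order = []                # paths in the order they were indexed

    def notify_changed(self, path, kind, text):
        # Cheap: only remember the latest contents. Repeated saves of the
        # same file collapse into a single pending entry.
        self.pending[path] = (kind, text)

    def flush(self):
        # Index all pending files in one batch, highest priority first.
        heap = [(PRIORITY.get(kind, 5), path)
                for path, (kind, _) in self.pending.items()]
        heapq.heapify(heap)
        while heap:
            _, path = heapq.heappop(heap)
            _, text = self.pending.pop(path)
            new_words = set(text.lower().split())
            old_words = self.words_by_path.get(path, set())
            for word in old_words - new_words:
                self.index[word].discard(path)
            for word in new_words - old_words:
                self.index[word].add(path)
            self.words_by_path[path] = new_words
            self.order.append(path)

    def search(self, word):
        return sorted(self.index.get(word.lower(), ()))


idx = DeferredIndexer()
idx.notify_changed("a.dll", "binary", "raw bytes")
idx.notify_changed("song.mp3", "music", "good food blues")
idx.notify_changed("song.mp3", "music", "good mood blues")  # saved again
idx.flush()
```

In this sketch the two saves of `song.mp3` cost one index update instead of two, and the music file is indexed before the binary even though the binary changed first. The price is that a search issued between `notify_changed` and `flush` sees slightly stale results, which is the "almost real-time" trade-off described above.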

Advanced Queries

A number of people expressed frustration with the lack of an advanced query UI.  Microsoft has many advanced query user-interfaces in many products, but these are generally focused on well-defined query languages (SQL) or on specific domains (like the Advanced Find in Outlook).  With Vista we wanted to address the query problem in a manner more familiar to people today—a single edit control.  Our implementation supports a rich query language within that edit control.  This is the same approach people are familiar with for web searching for both standard and advanced queries.

We had two observations that led to this approach:

  1. The most important part of a search is the search terms.  Usually a single term is enough (and as we know from web searching, the majority of searches are one or two words).  For refinement, the file system tools of thumbnails, sorting, and/or type-ahead can be used to narrow the search.
  2. It is reasonable to consider a design for an advanced query UI covering property based search, but it will generally be unwieldy for all but the bravest people.  As we mentioned, Windows Search covers over 300 properties by default so if you show every property then the UI is unusable.  If we only show the most commonly used properties then how do you handle all of the other properties?  Would properties be grouped by the common application or by attributes such as times, names, file attributes, etc.?  Some of you might value the Outlook Advanced Find… interface, but there you see some of the challenges, and that is within a specific domain where the grouping of related properties probably can be understood.

In designing Vista we incorporated the feedback that it is desirable to do precise queries.  The approach taken in Vista was to support a rich query language which allows all properties and a fairly natural syntax.  For example, typing “from:gerald sent:today” will find all email from “Gerald” sent today!  The big issue is that people do not know the query language.  In Windows 7, we have focused on helping people see how to use the query language in context.  For now, you can see the following for some information on Vista’s query syntax.  Much of this syntax and experience is similar to the web search that we all use today.
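
To make the “property:value” idea concrete, here is a small Python sketch that splits what is typed into the single edit control into free-text terms and property filters. It is a simplification for illustration only, not the actual Vista query grammar; the function name `parse_query`, the `TOKEN` pattern, and the shape of the return value are assumptions.

```python
import re

# A token is a run of non-space characters, where a "quoted phrase"
# counts as part of the token even if it contains spaces.
TOKEN = re.compile(r'(?:[^\s"]|"[^"]*")+')


def parse_query(query):
    """Split a query into (free-text terms, {property: [values]})."""
    terms, filters = [], {}
    for token in TOKEN.findall(query):
        name, sep, value = token.partition(":")
        if sep and name.isalpha() and value:
            # Looks like property:value, e.g. from:gerald
            filters.setdefault(name.lower(), []).append(value.strip('"'))
        else:
            # Plain word or quoted phrase
            terms.append(token.strip('"'))
    return terms, filters
```

With this sketch, `parse_query('from:gerald sent:today')` yields no free-text terms and two filters, while `parse_query('"foo net" kind:email')` yields the phrase `foo net` as one term plus a `kind` filter. Property names that contain spaces are not handled here.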

A number of comments were about substring matches in filenames, which we do not currently support.  This is part of the overall discussion about advanced queries.  In order to efficiently execute queries, the indexer builds indices that are based on individual words.  In Vista we introduced “searching as you type” to our search UI.  Under the hood this is implemented as prefix matches on the indexed words.  So when you type ‘foo’, we look for all terms that start with those letters, including ‘food’ and ‘football’.  Even more interesting: if you type ‘foo net’ we will match on items that have the words ‘food’ and ‘network’ in them.  (If what you really want is to match the phrase “foo net” then typing those words inside quotes will do that, which is another example of advanced query syntax.)  We have focused primarily on searching for terms found in any property, but there is no question that filenames are special.  In recognition of that we support suffix queries on filenames.  If you type ‘*food’ then we will return files that end in ‘food’ like “GoodFood”.  We do this by reversing the filename and then indexing it as a word.  For example, the reverse filename of “GoodFood” would be “DooFdooG”, which we index as a word.  The suffix query ‘*food’ is transformed into a prefix query ‘doof*’ over the reverse filename index.  Clever, no?  So we support prefix matches for all properties and suffix matches for filenames, but we do not support substring matches.
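
The prefix and reversed-filename technique described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the indexer's real data structures: the names `PrefixIndex`, `index_file`, and `search` are hypothetical, a sorted list with binary search stands in for the on-disk index, and matching is case-insensitive.

```python
from bisect import bisect_left


class PrefixIndex:
    """Sorted (key, item) pairs; prefix lookup via binary search."""

    def __init__(self):
        self.entries = []

    def add(self, key, item):
        self.entries.append((key.lower(), item))
        self.entries.sort()  # fine for a sketch; a real index would not re-sort

    def prefix(self, p):
        p = p.lower()
        i = bisect_left(self.entries, (p,))  # first entry with key >= p
        hits = []
        while i < len(self.entries) and self.entries[i][0].startswith(p):
            hits.append(self.entries[i][1])
            i += 1
        return hits


words = PrefixIndex()      # words from any property, including the filename
rev_names = PrefixIndex()  # each filename stored reversed


def index_file(filename, text):
    for word in text.split() + [filename]:
        words.add(word, filename)
    rev_names.add(filename[::-1], filename)


def search(query):
    results = None
    for term in query.split():
        if term.startswith("*"):
            # Suffix query: reverse it and run a prefix query on reversed names.
            hits = set(rev_names.prefix(term[1:][::-1]))
        else:
            hits = set(words.prefix(term))
        results = hits if results is None else results & hits
    return sorted(results or ())


index_file("GoodFood", "food network recipes")
index_file("Scores", "football results")
```

Here `search("foo")` matches both files, `search("foo net")` matches only `GoodFood`, and `search("*food")` finds `GoodFood` through the reversed-name index. A query such as `search("ood")` returns nothing, because neither index can answer a substring match without scanning every entry.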

Performance and Citizenship

A number of comments focused on improving performance and citizenship, and we definitely agree on this input.  We are always striving to make Windows do more with fewer resources.  For those who have turned off indexing altogether, we hope that our continued improvements will make you reconsider.  Even if you organize all of your files and don’t find search useful for files, perhaps you will find start menu search, email search or Internet Explorer 8 address bar search useful.  We have worked hard at improving performance and citizenship across Windows.  Some of this progress is visible in WS4 and soon in Windows 7.  We have improved along all of our dimensions including indexing cost, battery life, citizenship, query speed and scrolling speed.  We have some tremendous tools that help us track down performance problems.  If you want to help, please contact idx-help@microsoft.com and we will tell you how to collect performance traces we can analyze so that we can continue to make improvements.

Chris McConnell

Find and Organize

  • @Prixsel: The "Awesome bar" might be awesome for people who can't remember URLs. When I type in "win" I only want to see those sites whose URL begins with win, but not the "Engineering WINdows 7" blog or "Hamilton WINs grand prix". I also have hundreds of bookmarks which I don't want to see in the search results.

    The fact that I can't switch back to the old behaviour just shows immature project management, even if people just want the old behaviour because they are used to it.

  • @har0ld: But but but.. I want Bob and his friends (Rover and Clippy) back :-<

  • I have used Vista for two years. I thought I knew search, but I always find new ways to do my job.

    This time I found I can install indexing service compatibility mode and I can also change whether to search subfolders or not, files only and/or contents.

    I knew how to change those things but I never noticed what I could change.

    I suppose I know nothing, even though I thought I knew Vista.

  • you should *never* have to search the start menu just to find and launch a program. if it is in fact faster to do this than drill down through endless sub-menus, you have designed the start menu incorrectly. get away from vista - make things simpler

    as things stand, for me, the start menu has gotten way out of control. rather than adding yet another layer of pointless complexity please redesign it from the ground up so that everyone can find the program they want to launch as fast as possible. yes, even my 65 year old mother (whenever i ask her over the phone to launch a certain application it takes her about 2 minutes to find it as things stand)

  • I wanted to follow up on a couple of points by Ruslan and Andre:

    - We do various things in the search service to control power consumption on battery power. In our tests we have not found indexing to be a significant drain on battery power.

    - The data files containing the index are stored with restricted permissions so non-admin users do not have either read or write access. No index data is transported off the machine to other servers; and two different users on the same machine will not be able to see search results from each other's data, unless they already have read access to that data. So I don't think you need to have any security or privacy concerns from using the indexer.

    - Regarding different locations, the user does have control of what locations are indexed today through the Indexing Options control panel. And searches can be restricted to certain kinds of item {for example with the "kind:email" query syntax, or with the quick filter buttons at the top of the search pane in Vista}. So some of what you are suggesting is already possible, although I agree this is an area where more is possible.

    - Andre, lots of people are using Windows Search to index large amounts of source code, and in general I believe this works well. Indexing a couple of thousand files after you sync doesn't take very long. I can't guarantee this will work for your scenario but I wouldn't dismiss it out of hand. And regarding security, again only processes running as administrator have access to the index.

  • This is the second post where I've made comments and they never show up. I'm beginning to believe it's because I used IE8 beta 2, and there is some bug devouring my comments.

    It seems this post is full of cool stuff I already have in Vista/WS4. What do we have to look forward to in 7? What's your overall vision of the future?

    Here's my little wishlist, as usual:

    i. Please work on finding a way to do substring search. Not being able to type 'Note' and find 'OneNote', for example, is a bit discouraging.

    ii. Please make it easier/more obvious to create a new Search Folder with the Advanced Search UI. The only time I see a 'New Search' button is when I'm already searching.

    iii. This is more Organize than Find -- I would love a way to tag or add metadata to files that are not Office documents, or photos, etc. If you implement a way to add info to any filetype, I understand that it would not be portable or embedded in the file, but it would help oh so much. I would love to organize and group files by Categories and Tags, but it doesn't even work for txt or pdfs -- only for Microsoft files. Please help.

    iv. I think it would be awesome if the Start Menu UI could give us visual results like IE8 does, but for our images, documents, etc -- a little bit of Start++ functionality. I also think it would be awesome if searches didn't need to be fixed at the corner of my screen and could be near my mouse or more like Launchy... but that's probably for a different team.

  • Why not work on consistency in how the search works?

    Searching my start menu can frequently fail - instead of turning up items that have previously shown up (ie: start menu items that START in the word I am searching for), I get items that have nothing whatsoever to do with the search term.

    Honestly, I find the new search to be overreaching, and the end result is a highly unstable and unfathomable result with inconsistent behaviour.

    Best examples:

    1. Having to search for my saved searches.  You sure can't access them from the search page directly.  What was the point in saving them again?  Maybe I should save a search that shows where the searches are - oh wait...

    2. When you search in the start menu, it shows start menu items.  When you search in a folder, it shows folder items and subfolders.  

    Now try to search in control panel.  Drilling down has zero effect on scope.

    When you do get to something in the control panel that looks like a list (ie: the uninstall software list), it still searches the control panel applets, and not the list - which would be the more natural expectation from how search works everywhere else.

    It would be nice if search worked the same everywhere it was exposed and most importantly, it would be nice if it actually worked reliably, and finally it would be nice if features like saved searches were not so totally pointless.

    Oh, and if someone could provide a simple, well explained rationale to upgrade to search 4 - that would be nice.  Otherwise I would rather not risk the upgrade hazard and inevitable disruptive reboot (I remember when they told us Vista would need less reboots on updates.... they lied, but that is a different point entirely.)

  • Hmm, I am rather organized I think, hardly ever use the Search, don't put too much effort in it ;-)

  • I run Desktop Search 4 on my XP laptop.  It really improved my search speed within outlook.  But outside of Outlook, my experience is really terrible.  The search interface is poor, and the relevancy of the hits that I get is poor as well.  

    Examples:

    1) I have a file on my desktop called ECS 260 Template Metaprogramming.  I tried searching for this using the word template.  I got a large number of results, none of which were this file.

    2) Ok, so maybe there are lots of files containing the word "template".  How about I try the phrase "ECS 260"?  There are only a few files on my system that contain that phrase, and I happen to know which ones they are.  Windows Desktop Search 4 returned 8 or 9 files.  None are that file, or one of the other files on my desktop with that phrase in the title.  In fact, none are in any way related to my search.  (I did get a bookmark for the old alt.2600 newsgroup, which does contain 260 as a substring.)  But these results were not even close!

    3) Ok, how about I place "ecs 260" in quotes?  Maybe that will force it to do a string match for the whole phrase.  It finds...no files at all!

    4) Ok, how about I search for the word "report"?  There is at least one file on my desktop that contains that word in its title.  Does it find it?  Nope.  But it does give me 460 matches of various files that contain the word "report" in there somewhere.  Probably a hundred e-mails and many other files.  None actually are the files on my desktop that contain that word.

    5) Ok, how about I give it a really big hint?  I set the folder to Desktop, and tell it to search for the word "report".  It finds lots of files with the word "report" in the text somewhere in subfolders, but none of the files on my desktop, which have report in their name!

    6) Just to see, I try clicking on one of the files it DID return.  It tells me its location, then says that the file can't be found.  Same with a bunch of other files.  So in addition to returning wrong results, it returns outdated results too!

    So we have several problems here:

    1) Search results are not relevant to my search

    2) Search results are outdated

    3) Search results include any file that includes the search term(s) anywhere, and appears to be performing an "or" search rather than an "and" search, meaning that two terms make the results much more general.

    4) Search results do not include files that contain the search term in the filename

    5) The search interface is poor

    6) Search results are not ordered in any particularly meaningful way.

    This is much like using a search engine from the very earliest days of the web.  I think you can do better...

  • ak47wong, to search for files with a specific rating, use either asterisks to represent the number of stars, or <number> stars.

    Example:

    To find files with a rating of 3 stars:

    rating:***

    or

    rating:3 stars

    Greater than or equal to 3 stars:

    rating:>=***

    or

    rating:>=3 stars

    Greater than 3 stars:

    rating:>3 stars

    rating:>***

    etc.

    For unrated files, use:

    rating:unrated

    Regarding some of the comments about remembering the properties -- it isn't necessary to remember them. Right-click a file and click properties to see the properties that apply to files of that type. Also, any property can be added as a column in an Explorer window (right-click the column header and click More... for the full property listing). Once added, you can use the column's drop-down menu to sort, stack, filter, or group by those properties, or right-click in the whitespace of the window and use the context-menu to do the same. Further, you can use whatever the property is named in the property list or column header as the keyword for an advanced query. For, example, Date modified and Date created are two property names. You can use date modified:, date created:, or just created: or modified: and some value, for the same respective results. You can also use just date: and some value which will return both files modified or created within a given date range.

  • In your previous follow-ups, you guys generally acknowledged the common issues from the comments as areas of potential improvement.  This time, it seems to me like you dismissed all comments on the perceived faults of the advanced search UI.  I'm sure this was not intentional, but it is disheartening for those looking for improvements in Windows 7.

    Here is a review of some common complaints that I do not feel were adequately addressed:

    1) The query language is not a replacement for a good advanced UI.  Users shouldn't have to learn a new language to perform anything more than basic searches.  (However, I do appreciate the effort that went into the new system.)

    2) As you admit, file names *are* special.  We should be able to search for substrings, even if this requires a few changes to the WDS.  Substring matches should also be considered for all fields, although I understand if this would require too much overhead.

    3) Options like "search only in file names", "search only in content", and "do not search subfolders" should be available.  It should be easy to search for ranges of dates, sizes, etc.  These options apply to *all* file types and thus should be easily accessible in the advanced UI.

    4) Searching for non-indexed items is especially painful.  The process is slow and the progress is not adequately displayed for the user.  Furthermore, if I search files/folders that are not indexed, a non-indexed search should automatically be initiated.  It is incredibly frustrating to look for something I know is there only to read "no files found".  A slower search (with clear progress information) is always preferable to a failed search.

    Despite my complaints, I do appreciate your communication through this blog.

  • In addition to the common issues mentioned in my previous post, I also have some personal opinions on things that could be improved.

    A) I would love to see a spelling correction/suggestion feature like Google's.

    B) I should have an option to prevent searching in archives.

    C) The Vista advanced search UI requires too much clicking and doesn't support enough typing.  I would prefer something in the spirit of Windows 2000's advanced search UI where all the common options were immediately available and editable.

    -Searching date ranges, restricting search locations, etc. takes more clicks in Vista

    -I should be able to *type* paths into the "Location" box

    D) Please allow setting the default search options (e.g. always show advanced UI, include hidden/system files, do not search in archives, etc.).

    E) Restore the Search option in context menus and the Start menu.  I understand that you were sued, but there is no reason why you can't leave these options but make them launch the user's preferred search provider.

  • I would like to see a UI where I have the following options;

    Search for: [Query string]

    Search in: [Location]

    Include archives: [x]

    Include subdirectories: [x]

    Search in metadata: [Meta data]

    Search in content: [x]

     on closed panel

       Whole words only: [x]

       Case sensitive: [x]

       Find files NOT containing: [x]

       ASCII:[x]

       Unicode: [x]

       Hex:[x]

       RegEx:[x]

       [x] Date between [date][date]

       [x] File size [<=>] [kb]

    !   Find duplicates [x]

           same name, same size, same content

    =====================================

    In Explorer I would like to see a split screen mode where you can execute a one-key-press copy etc. (That's right, a Norton (Total) Commander style window). Hot keys for everything. You should minimize the number of times someone has to switch from keyboard to mouse and back.

    A file compare option is much needed, where you can compare the content of two files and see the difference in text or binary data.

    Also folder synchronization.

  • Great post, the link to the query syntax was informative!

    Is it possible to index files with an unknown or random extension with desktop search?  Right now it just seems to ignore those files, which makes Desktop Search feel unreliable.

    People say "search my harddisk" and instead it searches an unpredictable subset of your harddisk, excluding files like "readme.now".

  • I raise the concerns of being unable to search for all file names. In fact, my mind wonders how the same engineers that design and build windows7 (as those are, probably, savvy computer geeks) do not miss those functionalities.

    A very good email search seems to be the main thrust for this development but I reckon it is a blind walk against web based email apps. Unfortunately this seems to be more important than the old and reliable file search. Changing this (by breaking and not extending it) seems to be a big "no no" in usability.

    New search tools and functionalities should not break the most basic thing: find files.

    Please don't get me wrong. I look forward to the new search. Especially with very large hard drives and a large number of files, file searching needs to be revised. But this should be done by extending it, without breaking what the user expects. I would really like to see RegEx in searching (something like grep, with the possibility of refining searches), but I know this is impossible as it requires much more from the end user, who only wants one thing: put text into the search box and get what he expects.

    But I leave one suggestion: make something intelligent in the way search results are displayed (and not in a FIFO way).

    Finally, a question: how exactly does w7 differ from Vista? I'm afraid that the new features could be mistaken for a new Vista SP. A post with those differences would help users to understand better what w7 is.
