Engineering Windows 7

Welcome to our blog dedicated to the engineering of Microsoft Windows 7

Follow-up: Windows Desktop Search

The discussion and email about desktop search offered an opportunity for us to have a deeper architectural discussion about engineering Windows 7.  There were a number of comments suggesting alternate implementation methods so we thought we’d discuss another approach and the various pros and cons associated with it.  It offers a good example of the engineering balance we are striving for with Windows 7.  Chris McConnell wrote this follow-up.  --Steven (See you at the PDC in a week!)

Thanks for all the great feedback on our first blog post on Windows Desktop Search.  I’ve summarized a number of points that have been made and added some comments about the architectural choices we have made and why.

Integration with the File System

As some posters have pointed out, one possible implementation is to integrate indexing with the file system so that updating a file immediately updates the indices. Windows Desktop Search takes a different approach. There are two aspects of file system integration: knowing when a file changes, and actually updating the indices before a file is considered “closed” and available. On an NTFS file system, the indexer is notified whenever a file changes, so it never scans the NTFS file system except during the initial index. It is on the second point, updating the indices immediately when a file is closed, that we made a different choice. Updating immediately has the benefit that a file never becomes available without also being searchable, but it also comes with a number of potential disadvantages. We chose to decouple indexing from file system operations because it allows for more flexibility while still being almost real-time. Here are some of the benefits we see in the approach we took:
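
The decoupling described above can be sketched as a toy model in Python. This is an illustration under stated assumptions, not the Windows Search implementation: `ChangeQueue`, `save_file`, and the in-memory `fs` dictionary are hypothetical stand-ins for NTFS change notifications and the indexer's work queue. Saving a file completes immediately and only records that the path changed; repeated notifications for the same path collapse into one pending entry, and indexing happens later, when `drain` is called with a work budget.

```python
from collections import OrderedDict


class ChangeQueue:
    """Hypothetical stand-in for the indexer's queue of change notifications."""

    def __init__(self):
        # path -> latest change kind; OrderedDict keeps first-notified order
        self._pending = OrderedDict()

    def notify(self, path, kind="modified"):
        # Runs on the file-system path: O(1) bookkeeping, no indexing work.
        # A path that is already pending keeps its place; only its kind updates.
        self._pending[path] = kind

    def pending(self):
        return list(self._pending)

    def drain(self, index_fn, budget):
        """Index up to `budget` pending paths later, e.g. when the machine is idle."""
        done = []
        while self._pending and len(done) < budget:
            path, kind = self._pending.popitem(last=False)  # oldest first
            index_fn(path, kind)
            done.append(path)
        return done


def save_file(fs, queue, path, data):
    fs[path] = data      # the write completes; the file is available now
    queue.notify(path)   # indexing is only scheduled, not performed


fs, q, indexed = {}, ChangeQueue(), []
for draft in ["draft 1", "draft 2", "draft 3"]:
    save_file(fs, q, "report.docx", draft)   # three saves, one pending entry
save_file(fs, q, "song.mp3", b"ID3...")
print(q.pending())   # ['report.docx', 'song.mp3']
q.drain(lambda path, kind: indexed.append(path), budget=10)
print(indexed)       # ['report.docx', 'song.mp3']
```

Because the save path never calls the indexer in this model, a slow or failing index update cannot delay a file becoming available; the trade-off is a short window in which a freshly saved file is not yet searchable.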

  1. Fewer resources are used. Inverted indices are global: an inverted index maps from a word found in a property to a list of every document that contains that word. Indexing a single file therefore requires updating the entry for every unique word found in the file, so a single document might touch a very large number of individual index entries. Making these changes and committing them with the same robustness found on individual files would be very expensive. The design of the indexer allows scheduling and aggregating these changes so that much less work is done overall, which means less CPU and less disk I/O. The system can also be more robust, because indexing is not tied to the moment a file is closed and can even be retried if necessary.
  2. File system operations are prioritized over indexing.  Getting files robustly updated and available is necessary for applications to use them.  We don’t want to delay that availability by forcing the cost of indexing into file close operations.   Searching over files is important, but is less important than actually working with files.  We wouldn’t want applications to decide individually if the indexer should be turned on or off just because they were seeking the best performance with respect to the file system.
  3. There are lots of file types. Microsoft supplies extractors (IFilter/IPropertyHandler) for many common file types as part of Windows. There are many other file types as well, so it is important to allow non-Microsoft developers to write their own extractors. In Vista (and Windows 7), these extractors run in a locked-down process that keeps them secure and prevents them from affecting the performance of the whole system. If indexing had to happen before a file was available, then an extractor could impact (intentionally or not) all file system operations.
  4. Some files are more valuable to index than others. If indexing happened when a file is closed, then there would be no control over the order in which files are indexed. Decoupling allows prioritizing some files over others. For example, searching for music is much more likely than searching for binary files. If both music files and binary files have changed, then the indexer ensures it indexes the music files first. Some files are not worth indexing at all for most people. Several comments suggested that we should index the whole drive. We can do that, and for those who would find it valuable it is easy to add folders to be indexed. (You can also remove them, but that is much less common, so it is controlled through the “Indexing Options” control panel.) For most people indexing system files is just a cost: they would never search for them and would be confused if they showed up as the result of a search.
  5. Not everything is a file in a single file system. Windows is all about supporting diversity. There are many different file systems, like FAT32 and CDFS, and we would like to be able to search over those as well. If we integrated with only NTFS, then we would still need a loosely coupled system for other file systems. Many applications also have databases optimized for their own needs. For example, Outlook has a database of email. If only files were indexed, then the email in the database could not be indexed unless Outlook either compromised its experience by using files only, or complicated its implementation by duplicating everything in both the file system and the database.
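
Points 1 and 4 (aggregating index updates, and indexing the more valuable files first) can be illustrated together. The Python sketch below is a toy built on assumptions, not the real indexer: the `TYPE_PRIORITY` table, `BatchedIndexer`, and the batch size are hypothetical, and it ignores removing stale entries when an already-indexed file changes. Pending files wait in a priority queue ordered by file type; each batch extracts words from several files into an in-memory delta and then merges that delta into the inverted index once, so a word shared by many files in the batch is written once per batch rather than once per file.

```python
import heapq
import itertools
import os
import re
from collections import defaultdict

# Hypothetical policy: lower number = indexed sooner (music before binaries).
TYPE_PRIORITY = {".mp3": 0, ".wma": 0, ".docx": 1, ".txt": 1, ".exe": 9, ".dll": 9}
DEFAULT_PRIORITY = 5


def priority(path):
    return TYPE_PRIORITY.get(os.path.splitext(path)[1].lower(), DEFAULT_PRIORITY)


class BatchedIndexer:
    def __init__(self):
        self.index = defaultdict(set)      # committed inverted index: word -> paths
        self._delta = defaultdict(set)     # postings gathered in the current batch
        self._heap = []                    # entries: (priority, arrival, path, text)
        self._arrival = itertools.count()  # tie-breaker: FIFO within one priority

    def enqueue(self, path, text):
        heapq.heappush(self._heap, (priority(path), next(self._arrival), path, text))

    def run_batch(self, max_docs):
        """Extract words from up to max_docs files, then commit them in one merge."""
        order = []
        while self._heap and len(order) < max_docs:
            _, _, path, text = heapq.heappop(self._heap)
            order.append(path)
            for word in set(re.findall(r"\w+", text.lower())):
                self._delta[word].add(path)
        self._commit()
        return order

    def _commit(self):
        # One update per distinct word in the batch, not one per word per file.
        for word, paths in self._delta.items():
            self.index[word] |= paths
        self._delta.clear()


ix = BatchedIndexer()
ix.enqueue("C:/Windows/System32/helper.dll", "binary blob")
ix.enqueue("notes.txt", "football scores")
ix.enqueue("anthem.mp3", "football anthem")
print(ix.run_batch(max_docs=2))      # ['anthem.mp3', 'notes.txt']
print(sorted(ix.index["football"]))  # ['anthem.mp3', 'notes.txt']
```

The binary file stays queued until a later batch, which mirrors the idea that lower-value files wait while higher-value ones are indexed first.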

Advanced Queries

A number of people expressed frustration with the lack of an advanced query UI. Microsoft has many advanced query user interfaces in many products, but these are generally focused on well-defined query languages (SQL) or on specific domains (like the Advanced Find in Outlook). With Vista we wanted to address the query problem in a manner more familiar to people today: a single edit control. Our implementation supports a rich query language within that edit control. This is the same approach people are familiar with from web search, for both standard and advanced queries.

We had two observations that led to this approach:

  1. The search terms are the most important part of a search. Usually a single term is enough (and as we know from web searching, the majority of searches are one or two words). For refinement, the file system tools of thumbnails, sorting, and/or type-ahead can be used to narrow the search.
  2. It is reasonable to consider a design for an advanced query UI covering property-based search, but it will generally be unwieldy for all but the bravest people. As we mentioned, Windows Search covers over 300 properties by default, so if you show every property then the UI is unusable. If we only show the most commonly used properties, then how do you handle all of the other properties? Would properties be grouped by the application that commonly uses them, or by attributes such as times, names, file attributes, etc.? Some of you might value the Outlook Advanced Find… interface, but there you see some of the challenges, and that is within a specific domain where the grouping of related properties can probably be understood.

In designing Vista we incorporated the feedback that it is desirable to do precise queries. The approach taken in Vista was to support a rich query language which allows all properties and a fairly natural syntax. For example, typing “from:gerald sent:today” will find all email from “Gerald” sent today! The big issue is that people do not know about the query language. In Windows 7, we have focused on helping people see how to use the query language in context. For now, you can see the following for some information on Vista’s query syntax. Much of this syntax and experience is similar to the web search we all use today.
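
To make the property:value idea concrete, here is a small Python tokenizer for queries such as “from:gerald sent:today”. It only sketches the general shape and is not the grammar of Vista's query language: the `Query` type and `parse` function are hypothetical, relative values such as ‘today’ are kept as literal strings rather than resolved to dates, and operators such as OR and NOT are not handled. It splits the text into property restrictions and free-text terms, keeping a quoted phrase together as one term.

```python
import re
from dataclasses import dataclass, field

# Optional "property:" prefix, followed by a "quoted phrase" or a bare token.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


@dataclass
class Query:
    restrictions: list = field(default_factory=list)  # (property, value) pairs
    terms: list = field(default_factory=list)         # words or phrases, any property


def parse(text):
    query = Query()
    for prop, phrase, word in TOKEN.findall(text):
        value = phrase or word
        if prop:
            query.restrictions.append((prop.lower(), value))
        else:
            query.terms.append(value)
    return query


print(parse('from:gerald sent:today "quarterly budget"'))
# Query(restrictions=[('from', 'gerald'), ('sent', 'today')], terms=['quarterly budget'])
```

A single edit control can therefore carry both the plain words most people type and precise property restrictions, without a separate form for each of the hundreds of properties.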

A number of comments were about substring matches in filenames, which we do not currently support. This is part of the overall discussion about advanced queries. In order to execute queries efficiently, the indexer builds indices that are based on individual words. In Vista we introduced “searching as you type” to our search UI. Under the hood this is implemented as prefix matches on the indexed words. So when you type ‘foo’, we look for all terms that start with those letters, including ‘food’ and ‘football’. Even more interesting, if you type ‘foo net’ we will match on items that have the words ‘food’ and ‘network’ in them. (If what you really want is to match the phrase “foo net”, then typing those words inside quotes will do that; this is another example of advanced query syntax.) We have focused primarily on searching for terms found in any property, but there is no question that filenames are special. In recognition of that, we support suffix queries on filenames. If you type ‘*food’ then we will return files that end in ‘food’, like “GoodFood”. We do this by reversing the filename and then indexing it as a word. For example, the reversed filename of “GoodFood” is “dooFdooG”, which we index as a word. The suffix query ‘*food’ is transformed into a prefix query ‘doof*’ over the reversed filename index. Clever, no? So we support prefix matches for all properties and suffix matches for filenames, but we do not support substring matches.
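
The word-prefix and reversed-filename techniques described above can be modeled with a sorted vocabulary and binary search. This Python sketch is illustrative only: `WordIndex`, `search_as_you_type`, `build_reversed_name_index`, and `suffix_search` are hypothetical names, the word splitting is a simple regular expression rather than a real word breaker, and names are matched exactly as given (whether the real index includes the file extension is not stated here). Each space-separated part of the query is treated as a prefix and an item must match every part; a suffix query is rewritten as a prefix query over lowercased, reversed filenames.

```python
import bisect
import re
from collections import defaultdict


class WordIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # lowercased word -> item ids
        self.terms = []                   # sorted vocabulary; rebuilt by freeze()

    def add(self, item_id, text):
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(item_id)

    def freeze(self):
        self.terms = sorted(self.postings)

    def prefix(self, p):
        """Items containing any word that starts with p ('foo' -> food, football)."""
        p = p.lower()
        hits = set()
        i = bisect.bisect_left(self.terms, p)  # first term >= p
        while i < len(self.terms) and self.terms[i].startswith(p):
            hits |= self.postings[self.terms[i]]
            i += 1
        return hits


def search_as_you_type(index, query):
    """'foo net': items with a word starting 'foo' AND a word starting 'net'."""
    results = None
    for part in query.split():
        hits = index.prefix(part)
        results = hits if results is None else results & hits
    return results or set()


def build_reversed_name_index(filenames):
    # Each whole filename, lowercased and reversed, is indexed as one word.
    rev = WordIndex()
    for name in filenames:
        rev.postings[name.lower()[::-1]].add(name)
    rev.freeze()
    return rev


def suffix_search(rev_index, pattern):
    """'*food' becomes the prefix query 'doof' over the reversed names."""
    if not pattern.startswith("*"):
        raise ValueError("suffix patterns look like '*food'")
    return rev_index.prefix(pattern[1:][::-1])


idx = WordIndex()
idx.add("doc1", "Food network review")
idx.add("doc2", "Football scores")
idx.add("doc3", "Network setup")
idx.freeze()
print(sorted(idx.prefix("foo")))                   # ['doc1', 'doc2']
print(sorted(search_as_you_type(idx, "foo net")))  # ['doc1']
names = build_reversed_name_index(["GoodFood", "FoodCourt", "SeaFood"])
print(sorted(suffix_search(names, "*food")))       # ['GoodFood', 'SeaFood']
```

Every lookup in this structure starts from a fixed prefix of a word, so a fragment such as ‘ood’ in the middle of a word has no single starting point in the sorted list; that is one way to see why substring matching does not fall out of a word-based index.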

Performance and Citizenship

A number of comments focused on improving performance and citizenship, and we definitely agree with this input. We are always striving to make Windows do more with fewer resources. For those who have turned off indexing altogether, we hope that our continued improvements will make you reconsider. Even if you organize all of your files and don’t find search useful for files, perhaps you will find Start menu search, email search, or Internet Explorer 8 address bar search useful. We have worked hard at improving performance and citizenship across Windows. Some of this progress is visible in Windows Search 4 (WS4) and soon will be in Windows 7. We have improved along all of our dimensions, including indexing cost, battery life, citizenship, query speed, and scrolling speed. We have some tremendous tools that help us track down performance problems. If you want to help, please contact idx-help@microsoft.com and we will tell you how to collect performance traces we can analyze so that we can continue to make improvements.

Chris McConnell

Comments

  • I still consider Windows 2000's search to be the best. Simple, straightforward, and powerful. I never had any issue with it not finding a file I knew was there (I've come not to trust Vista's search results, which has made me less inclined to use search) and I could easily toggle the advanced options as often as I needed (regular searching by size would often free up several gigs of space on my HD and help me track down out of control log files, for example).

    I miss that experience.

  • I installed Windows Search 4 on XP after the previous post on search. I did not know about the file name search behavior (an error, in my opinion), and afterwards it took longer to get to the old, working search companion.

    I played with it and it worked quite nicely (query language logic, etc.), but since most of the time I am searching file names and the content of recently changed files, I uninstalled it. Now the working search is back.

    "Windows is all about supporting diversity".

    I hope the old search stays an option to support guys like me who don't mind waiting a few seconds for a basic search to find substrings in file names. 100% hit.

    Also supply enough APIs so that third parties can also implement alternative "unwieldy" search GUIs.

    Oh, and I had trouble finding where to configure it; I expected the settings to be available in the search interface. Later I noticed yet another new icon cluttering the notification area. It is definitely not important enough to claim space in the notification area, and there is no "do not show in notification area" option.

  • @ btriffles

    If you saw the Win 7 demo in the keynote this morning, you got a sneak peek at the new search UI. We'll cover it in more depth in future blog posts, but hopefully it addressed some of your questions.

    1) This point is right on - which is why we provide search filters in the UI that build the query language for you.

    2) We already do prefix matching on names, and have heard the feedback on non-prefix matching.

    3) Some of these options are already available in Vista as global options. Look for the 'Organize' menu -> 'Folder and Search Options' -> Search tab. Some of these options (date/size ranges) are exposed as search filters in Windows 7.

    4) We're working on fixes for some of these problems, stay tuned. There's not much we can do to improve the experience for files that *can't* be indexed, but we're always working to improve the experience of files that should be indexed, but aren't yet.

    Thanks

    - Scott

  • Windows x64 and Network Shares.

    If it doesn't work with Windows x64 and network shares I don't care how you index anything it's effectively useless to me.

  • Features missing or not demoed at PDC:

    - Virtual desktops. A must-have, ideally with different personalization options for each desktop (minimum 4). I did not see this in the demo. I will not buy Windows 7 without virtual desktops. I would rather have virtual desktops than fancy multi-touch, which is utterly useless on a laptop or desktop unless you want to disturb the water in the screen saver.

    It is not the first or the last time I am telling you this. Nicely integrated virtual desktops are one of the most important selling points for me. I don't use search a lot. I have no serious problem managing my windows. But I do need more desktop space. I need to organize my workspace in 3-4 different ways: 1 for IM instances, 1 for development, 1 for research, 1 for a personal desktop.

    Virtual desktops are an integral part of every competing OS. There is no reason why W7 should lack that feature.

    - Taskbar: if I can't see the title on the taskbar, how can I select 1 of 10 IE instances? Oh yes, I have to click on the icon and then click ONCE AGAIN to open an instance. OR I have to click and scan through all the icons. I hope you kept the ability to display text on the taskbar instead of useless icons.

    - Peer-to-peer networking, easy desktop sharing, desktop remoting.

    There may be an advanced query syntax, but where is it documented other than on TechNet? To use the syntax you need to be aware of it, and Vista's help is very poor in this regard; it's very poor in most respects, actually.

    For anyone who wants to be more than a very basic user, Vista's documentation is massively inferior to practically any Unix out there, not to mention OS/2 - despite having 12 years of additional development since the last OS/2 client was released.

    Scripting? VBScript? Not mentioned *anywhere* in the help other than one glossary item.

    Obviously the information exists (online), and is generally of a high quality, but shouldn't this sort of information be included in the operating system, or at least referred to?

  • @lyesmith

    You want virtual desktops? Where is the problem?

    There is one for Vista by Microsoft (Mark Russinovich):

    http://technet.microsoft.com/en-us/sysinternals/cc817881.aspx

    Please try it.

  • @ Scott

    Thanks for your response.  I watched a video of the keynote, but I didn't see much (due to little coverage and a blurry video).  I look forward to further posts on the topic.  In the meantime, here are a few quick responses:

    3)  I believe some of the options in the Search tab (file names only, include subfolders, etc.) should be local (or at least less hidden) options.  (I had forgotten about that tab.)

    4) Even if a file can't be indexed (in terms of content), I believe that searches based on basic file system parameters (name, date, size) should never fail. Despite WDS's power, I trust a 2000/XP search more than a WDS search.

    Basically, I think a combination of the ease of access/dependability of a 2000/XP search and the power of a WDS search would be ideal.

    Regardless of what happens, thanks for listening.

  • Team, please improve zoom in the IE8 multitouch session. It is not fluid (for now).

  • Although I am a late arrival (excuse my English):

    I saw the PDC and every video available on Channel 9, Channel 10, YouTube, Neowin, Long Zheng, etc.

    Personally, when I saw Mr. Steven present Windows 7, I had tears in my eyes. I am not a developer but a consumer enthusiast. Windows means a lot to me, as it does to many people who follow this blog.

    I want to say to all of you:

    Steven Sinofsky, Microsoft, all the developers and all the teams who participated in the execution of Windows 7 (and continue to work hard):

    You are an extraordinary TEAM! Many, many, many thanks for your work.

    Microsoft, keep going this way!

    Thanks

    Domenico

  • @ lyesmith

    Yes, you can enable text on the Windows 7 taskbar. See http://www.istartedsomething.com/20081031/tidbits-about-the-new-superbar-taskbar/

  • More and more games are inserting their program shortcuts in the Games Explorer in Vista while specifically NOT inserting any into the Start Menu.  Believe it or not, a lot of times I hit the Windows key and start typing to make my games happen -- and games that only put launch icons in the Games Explorer don't show up in the results.

    This is irritating.

  • "More and more games are inserting their program shortcuts in the Games Explorer in Vista while specifically NOT inserting any into the Start Menu.  Believe it or not, a lot of times I hit the Windows key and start typing to make my games happen -- and games that only put launch icons in the Games Explorer don't show up in the results.

    This is irritating."

    +1

  • I have seen some initial reviews of Windows 7 M3, and one thing I'm really not happy with is the removal of the sidebar.

    Yes, I understand that it's not terribly useful on a smaller screen, but those of us with large wide screens (and multiple ones at that) have to suffer because of the lowest common denominator.

    The whole purpose of gadgets (to me) is to have them available at a glance. Putting them on the desktop makes them far less useful because they're usually obscured by windows. Even with a hotkey to show them, I can't just glance at them anymore, so things like CPU meters, clocks, weather, etc. become things I have to physically do something to observe.

    I'd much prefer you make the sidebar optional, so that gadgets can still be placed on it if desired, or on the desktop. In fact, a sidebar that's not always on top doesn't take up any extra screen real estate anyway, and works exactly the same way, so I don't really understand why it's being removed.

    This is the #1 thing I hate about Dashboard on the Mac, that you need to hit a hotkey to show the gadgets.  This will be infuriating to many people.

    Please, just reconsider this.

  • Also, I saw mention of the Aero Snap feature. I love this idea, but please make it user configurable. I don't even care if it's just a registry tweak. I want to be able to configure the "zones" of my monitor that it snaps to. On a 30" monitor, I might want three vertical zones spaced evenly, or I might want quadrants or sextants.

    This is a feature that makes large-screen monitors much more useful, and, if configurable, extremely useful.
