Notes on comments.
Welcome to our blog dedicated to the engineering of Microsoft Windows 7
The discussion and email about desktop search offered an opportunity for us to have a deeper architectural discussion about engineering Windows 7. There were a number of comments suggesting alternate implementation methods so we thought we’d discuss another approach and the various pros and cons associated with it. It offers a good example of the engineering balance we are striving for with Windows 7. Chris McConnell wrote this follow-up. --Steven (See you at the PDC in a week!)
Thanks for all the great feedback on our first blog post on Windows Desktop Search. I’ve summarized a number of points that have been made and added some comments about the architectural choices we have made and why.
As some posters have pointed out, one possible implementation is to integrate indexing with the file system so that updating a file immediately updates the indices. Windows Desktop Search takes a different approach. There are two aspects of file system integration: knowing when a file changes, and actually updating the indices before a file is considered “closed” and available. On an NTFS file system, the indexer is notified whenever a file changes; it never scans the NTFS file system except during the initial index. It is on the second point, updating the indices immediately when a file is closed, that we made a different choice. Updating immediately has the benefit that a file is not available until it is indexed, but it also comes with a number of potential disadvantages. We chose to decouple indexing from file system operations because it allows for more flexibility while still being almost real-time. Here are some of the benefits we see in the approach we took:
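To make the trade-off concrete, here is a minimal Python sketch of a decoupled design, assuming a simple word-level inverted index. The class `DecoupledIndexer` and its methods are hypothetical illustrations, not Windows Search APIs. The file-system side only enqueues a change notification and returns, and a background worker updates the index shortly afterward, which is roughly what "almost real-time" means here.

```python
import queue
import threading


class DecoupledIndexer:
    """Hypothetical sketch: change notifications are queued and a background
    worker updates an in-memory inverted index, so saving a file never waits
    on indexing."""

    def __init__(self):
        self._changes = queue.Queue()
        self._index = {}              # lowercase word -> set of paths
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def notify_changed(self, path, text):
        # Called from the "file system" side: just enqueue and return.
        self._changes.put((path, text))

    def _run(self):
        while True:
            path, text = self._changes.get()
            words = set(text.lower().split())
            with self._lock:
                # Drop stale postings for this path, then add the new ones.
                for paths in self._index.values():
                    paths.discard(path)
                for word in words:
                    self._index.setdefault(word, set()).add(path)
            self._changes.task_done()

    def wait_until_idle(self):
        # Block until every queued change has been indexed.
        self._changes.join()

    def search(self, word):
        with self._lock:
            return sorted(self._index.get(word.lower(), set()))


idx = DecoupledIndexer()
idx.notify_changed("a.txt", "Quarterly budget draft")
idx.notify_changed("b.txt", "Budget notes")
idx.wait_until_idle()
print(idx.search("budget"))  # ['a.txt', 'b.txt']
```

The cost of this design is a short window in which a just-saved file is not yet searchable; the benefit is that saving a file never waits on indexing work.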
A number of people expressed frustration with the lack of an advanced query UI. Microsoft has many advanced query user interfaces in many products, but these are generally focused on well-defined query languages (SQL) or on specific domains (like Advanced Find in Outlook). With Vista we wanted to address the query problem in a manner more familiar to people today: a single edit control. Our implementation supports a rich query language within that edit control. This is the same approach people are familiar with from web search, for both standard and advanced queries.
We had two observations that led to this approach:
In designing Vista we incorporated the feedback that it is desirable to do precise queries. The approach taken in Vista was to support a rich query language that allows all properties and a fairly natural syntax. For example, typing “from:gerald sent:today” will find all email from “Gerald” sent today! The big issue is that people do not know about the query language. In Windows 7, we have focused on helping people see how to use the query language in context. For now, see the following for some information on Vista’s query syntax. Much of this syntax and experience is similar to the web search we all use today.
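A query such as “from:gerald sent:today” mixes property restrictions with ordinary words. The Python sketch below shows one way such a string might be tokenized; it is an assumption for illustration, not Vista's actual parser, and `parse_query` is a hypothetical name. Resolving values such as “today” into real dates is left out.

```python
import re

# A token is an optional "property:" prefix followed by either a quoted
# phrase or a run of non-space characters.
TOKEN = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_query(text):
    """Split a query into (property, value) restrictions and free-text terms."""
    restrictions = []  # e.g. ("from", "gerald")
    terms = []         # plain words or quoted phrases
    for match in TOKEN.finditer(text):
        prop, phrase, word = match.groups()
        value = phrase if phrase is not None else word
        if prop:
            restrictions.append((prop.lower(), value))
        else:
            terms.append(value)
    return restrictions, terms


print(parse_query('from:gerald sent:today'))
# ([('from', 'gerald'), ('sent', 'today')], [])
print(parse_query('budget "foo net"'))
# ([], ['budget', 'foo net'])
```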
A number of comments were about substring matches in filenames, which we do not currently support. This is part of the overall discussion about advanced queries. In order to execute queries efficiently, the indexer builds indices that are based on individual words. In Vista we introduced “searching as you type” to our search UI. Under the hood this is implemented as prefix matches on the indexed words. So when you type ‘foo’, we look for all terms that start with those letters, including ‘food’ and ‘football’. Even more interesting, if you type ‘foo net’ we will match on items that have the words ‘food’ and ‘network’ in them. (If what you really want is to match the phrase “foo net”, then typing those words inside quotes will do that; this is another example of advanced query syntax.) We have focused primarily on searching for terms found in any property, but there is no question that filenames are special. In recognition of that, we support suffix queries on filenames. If you type ‘*food’, we will return files whose names end in ‘food’, like “GoodFood”. We do this by reversing the filename and then indexing it as a word. For example, the reversed filename of “GoodFood” is “dooFdooG”, which we index as a word. The suffix query ‘*food’ is transformed into a prefix query ‘doof*’ over the reversed-filename index. Clever, no? So we support prefix matches for all properties and suffix matches for filenames, but we do not support substring matches.
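The word-based prefix search and the reversed-filename trick can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the indexer's real data structures: terms are lowercased, each item's words are supplied directly, extensions are ignored, and `WordIndex` is a hypothetical name. Keeping terms sorted means all words sharing a prefix form one contiguous run that `bisect` can locate.

```python
import bisect


class WordIndex:
    """Hypothetical sketch of word-prefix search plus filename-suffix search."""

    def __init__(self):
        self._words = {}          # lowercase word -> set of item ids
        self._sorted_words = []   # sorted, so a prefix is one contiguous run
        self._rev_names = {}      # reversed lowercase filename -> set of item ids
        self._sorted_rev = []

    @staticmethod
    def _add(postings, sorted_keys, key, item):
        if key not in postings:
            bisect.insort(sorted_keys, key)
            postings[key] = set()
        postings[key].add(item)

    @staticmethod
    def _prefix(postings, sorted_keys, prefix):
        # Jump to the first key >= prefix, then read while keys still match.
        hits = set()
        for key in sorted_keys[bisect.bisect_left(sorted_keys, prefix):]:
            if not key.startswith(prefix):
                break
            hits |= postings[key]
        return hits

    def add(self, item, filename, words):
        for w in words:
            self._add(self._words, self._sorted_words, w.lower(), item)
        # "GoodFood" is stored as "doofdoog" so suffixes become prefixes.
        self._add(self._rev_names, self._sorted_rev, filename.lower()[::-1], item)

    def query(self, text):
        """Every term must match (AND). 'foo' is a word prefix;
        '*food' is a filename suffix, run as prefix 'doof' on reversed names."""
        result = None
        for term in text.lower().split():
            if term.startswith("*"):
                hits = self._prefix(self._rev_names, self._sorted_rev, term[1:][::-1])
            else:
                hits = self._prefix(self._words, self._sorted_words, term)
            result = hits if result is None else result & hits
        return sorted(result or set())


idx = WordIndex()
idx.add(1, "GoodFood", ["food", "network", "recipes"])
idx.add(2, "Football", ["football", "scores"])
idx.add(3, "Netbook", ["network", "drivers"])
print(idx.query("foo"))      # [1, 2]
print(idx.query("foo net"))  # [1]
print(idx.query("*food"))    # [1]
```

A substring such as ‘ood’ is neither a prefix of any indexed word nor a prefix of any reversed filename, so it has no contiguous run to look up in either sorted list; that is the gap between this scheme and full substring matching.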
A number of comments focused on improving performance and citizenship, and we definitely agree with this input. We are always striving to make Windows do more with fewer resources. For those who have turned off indexing altogether, we hope that our continued improvements will make you reconsider. Even if you organize all of your files and don’t find search useful for files, perhaps you will find start menu search, email search or Internet Explorer 8 address bar search useful. We have worked hard at improving performance and citizenship across Windows. Some of this progress is visible in WS4 and soon will be in Windows 7. We have improved along all of our dimensions, including indexing cost, battery life, citizenship, query speed and scrolling speed. We have some tremendous tools that help us track down performance problems. If you want to help, please contact email@example.com and we will tell you how to collect performance traces we can analyze so that we can continue to make improvements.
Find and Organize
Whatever happened to WinFS and Cairo... All this stuff could be easily managed with a dumbed-down, multi-threaded SQL DB running in the background, because all the metadata could then be easily managed, customized, relocated, indexed and backed up. I'm thinking of writing a poem on it...
RE: Partial matching
At least support word segmentation!
I have "OpenPandora" on my start menu. When I want to start it, I hit the windows key and type "pan" and it doesn't find it. I have to type "OpenP" (As I have more than one "Open*" item on my menu). Please use a dictionary and analysis of camel/pascal casing, underscores, etc. to segment and index based on that. It would alleviate 95% of all complaints about partial matches.
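The segmentation this commenter asks for could look roughly like the Python sketch below. It illustrates the suggestion only; the post does not say Windows does this. It splits names on separators, camel/Pascal-case boundaries and digit runs, handles ASCII letters only, and skips the dictionary-based part. The names `segments` and `matches_segment` are hypothetical.

```python
import re

# A part is an all-caps run not followed by lowercase (e.g. "XML"),
# an optionally capitalized lowercase run (e.g. "Pandora"), or digits.
# Separators such as spaces, "_" and "-" are simply never matched.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def segments(name):
    """Return the whole lowercased name plus its camel-case/separator parts."""
    whole = name.lower()
    parts = [p.lower() for p in _PART.findall(name)]
    return [whole] + [p for p in parts if p != whole]


def matches_segment(query, name):
    """Prefix-match the query against the whole name or any segment."""
    q = query.lower()
    return any(seg.startswith(q) for seg in segments(name))


print(segments("OpenPandora"))                # ['openpandora', 'open', 'pandora']
print(matches_segment("pan", "OpenPandora"))  # True
```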
Side note: I really like that this blog exists and there have been some great posts, but many posts still really reek of self-justification instead of an open dialog.
I stopped using search in Windows when I no longer got any hits while searching for files containing specified text. At first I could not understand why I got no hits. I knew there were files that contained the text I was looking for. Then I found out that Microsoft had changed how search worked. It no longer searched all files, just certain types of files. I use search a lot for finding code to reuse, or to find where an error occurs by searching for the error message. So now I almost only use UltraEdit to search for files. I only use Windows search for finding files when I know parts of the file name.
Reading this post about how Windows search works now, I see that it is even more crippled. Not being able to search for only files with a specific file extension? Not being able to search for files with a specific word in the file name? I guess I have to keep using UltraEdit.
Your scenario isn't what Chris was talking about. Your file has spaces between the words, so each word will be indexed separately, and you will be able to find that song by searching for "looking for" or whatever substring of the file name you want. As long as you aren't searching for *part* of a *word* (like "ooking or").
That's exactly how WDS / Windows Search works and has always worked :)
The index is (is built on) a database, running on a very robust database engine.
You can ABSOLUTELY search for files with a specific extension. The extension is indexed separately from the filename itself. You can search for part of a file name if it is:
1) The beginning of the name, e.g. "she" will find "shell32.dll"
2) The end of the name: "*ll32" will also work.
3) Part of a word separated by a space or other breaking character. So if I have a file named "Foo Bar-Something.txt" I can find it by searching for "bar" or for "Something"
You can just search for .dll, *.dll, ext:dll, etc.
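The rules listed in this reply can be modeled with a short Python sketch. It is an approximation for illustration only: the function names are hypothetical, and the real indexer's word breaker is more sophisticated. The extension is kept as its own field, the rest of the name is broken into words at non-word characters, and `.dll`, `*.dll` and `ext:dll` are all treated as an extension match.

```python
import os
import re


def filename_terms(filename):
    """Split a filename into (extension, lowercase stem, lowercase words)."""
    stem, ext = os.path.splitext(filename)
    words = [w.lower() for w in re.split(r"\W+", stem) if w]
    return ext.lstrip(".").lower(), stem.lower(), words


def extension_of_query(query):
    """Return the extension for '.dll', '*.dll' or 'ext:dll' queries, else None."""
    q = query.lower()
    for prefix in ("ext:", "*.", "."):
        if q.startswith(prefix):
            return q[len(prefix):].lstrip(".")
    return None


def matches_filename(query, filename):
    ext, stem, words = filename_terms(filename)
    wanted_ext = extension_of_query(query)
    if wanted_ext is not None:
        return ext == wanted_ext                 # extension is its own field
    q = query.lower()
    if q.startswith("*"):
        return stem.endswith(q[1:])              # name suffix, e.g. "*ll32"
    return any(w.startswith(q) for w in words)  # prefix of any word


print(matches_filename("she", "shell32.dll"))            # True
print(matches_filename("bar", "Foo Bar-Something.txt"))  # True
print(matches_filename("*.dll", "notes.txt"))            # False
```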
I promise no "annoying dog". Jon and I both can claim some level of responsibility for Clippy and as we've said we like to think of our team as a "learning organization" :-)
For consumer enthusiasts (like me):
Update in real time!
Mr Steven we're ready :D
We can always open cmd at c:\users and type dir *vori* /s >vori.txt to get all files and folders under c:\users whose names contain "vori", like Favorites, written to the file vori.txt.
At least with Vista...
We might also create batch files with %1 to search any substring this way.
Is it so difficult for search to support the old dir functionality or is there another reason we don't get this support?
With Vista we can search substrings.
Are we going to have the same functionality with W7?
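For comparison, the `dir *vori* /s` approach amounts to a full directory walk with a substring test on every name. A rough Python equivalent is below; the function name `find_substring` is hypothetical. Note that it rereads the whole directory tree on every query, which is the cost that word-based indices are designed to avoid.

```python
import os


def find_substring(root, needle):
    """List files and folders under `root` whose names contain `needle`,
    case-insensitively, similar in spirit to `dir *needle* /s`.
    Every call walks the whole tree; nothing is indexed."""
    needle = needle.lower()
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if needle in name.lower():
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Example (Windows): find_substring(r"c:\users", "vori")
```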
The instant start search.
Most useful feature for time saving.
Least used by people.
For some reason everyone I show it to says it's amazing, but they never remember to use it; they're stuck in their old ways.
Who ever knew that query syntax existed? Here's your problem, Microsoft: you've got way too much hidden stuff like this that can't be discovered and that there's no help or tutorial on from within the product. Sticking this on the Microsoft Developer Network isn't of any use to end users, be they novices or power users. You've got the same problem with Office now as well... functionality there is so well hidden, it's completely lost.
"For those who have turned off indexing all together we hope that our continued improvements will make you reconsider. Even if you organize all of your files and don’t find search useful for files, perhaps you will find start menu search, email search or Internet Explorer 8 address bar search useful."
I've never searched files by their content except source files. And I currently have hundreds of thousands of source files in my local repository. Do you really think Desktop Search is the right tool to search within source files? When I sync to the (head) revision, thousands of files get updated. Is Desktop Search the right tool for that?
So I won't consider using Desktop Search at all. I also think that the indexing database is a security risk because malicious software can misuse the collected information and can hide behind the indexing service.
And regarding address bar search, I hate the "Awesome bar" of Firefox. I would almost have dropped FX over it if there hadn't been add-ons to get the old behaviour back. How can you change a core feature without a setting to get the old behaviour back? Horrible project management.
"Who ever knew that query syntax existed? Here's your problem Microsoft: you've got way too much hidden stuff like this that can't be discovered and that there's no help or tutorial on from within the product. -- burgesjl"
A clarion call to improve your help within Windows! Usually, all that is in any Help system is the information a beginner needs to get started with Windows (or that other program). Turning on the internet-enabled Extended Help doesn't help for most of the queries I've typed in.
It seems that searching with Google on the forums is the only way to go (and you still have to wade through all of the non-essential posts to *finally* find what you're looking for).
*Reply to Andre
How can you hate the "Awesome bar" of Firefox 3? It shows everything that contains the letters you typed (looked up from your history).
While this topic covers technical aspects of the search, there are other aspects of the search that concern me:
1) How does Desktop Search affect system power management (essential for laptops)? Is power consumption increased when indexing is turned on? It seems logical to assume so: storage media is accessed to read file contents, more processing power and more memory are needed to perform indexing, and storage space and further storage media access are required to store the results.
2) Security and privacy concerns. Can results of indexing contain data that was extracted from confidential documents, e-mails, etc.? Can indexing results get hijacked and transferred over the Internet to an unknown entity? Search engines would be happy to get their hands on the collected statistics.
3) Many programs already have built-in functionality to index (media) files and maintain media libraries. Wouldn't it be better to create similar libraries and only index the folders or partitions that the user wants indexed? Selecting a specific catalog (library) during a search would narrow down the search criteria, and searches could be performed much faster, giving more control to the user. I also strongly believe that searches should be more application-specific, e.g. File Search, Media Search, Document (Office) Search.
@burgesjl I agree with your comment. Users need hints on how to use the advanced query language, at least. Alternative screen(s) allowing more specific search criteria would be essential for a basic user. Not everyone is that advanced. It would also simplify the lives of support personnel.
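The catalog idea could be sketched as follows, assuming each catalog is simply a named set of root folders. `CatalogSearch` is a hypothetical class for illustration, not an existing Windows feature or API. Files outside every catalog are never indexed, and a search may optionally be narrowed to one catalog.

```python
from pathlib import PurePath


class CatalogSearch:
    """Hypothetical sketch: user-defined catalogs (libraries) of folders,
    where only catalog folders are indexed and searches can be scoped."""

    def __init__(self, catalogs):
        # e.g. {"Media": ["/data/music"], "Documents": ["/data/docs"]}
        self.catalogs = {name: [PurePath(r) for r in roots]
                         for name, roots in catalogs.items()}
        self.index = {}  # path string -> set of lowercase words

    def catalog_of(self, path):
        p = PurePath(path)
        for name, roots in self.catalogs.items():
            if any(root == p or root in p.parents for root in roots):
                return name
        return None

    def add(self, path, text):
        # Files outside every catalog are skipped entirely.
        if self.catalog_of(path) is not None:
            self.index[path] = set(text.lower().split())

    def search(self, word, catalog=None):
        word = word.lower()
        return sorted(p for p, words in self.index.items()
                      if word in words
                      and (catalog is None or self.catalog_of(p) == catalog))


cs = CatalogSearch({"Media": ["/data/music"], "Documents": ["/data/docs"]})
cs.add("/data/music/live.mp3", "jazz live")
cs.add("/data/docs/review.docx", "jazz festival review")
cs.add("/tmp/scratch.txt", "jazz")                # not in any catalog
print(cs.search("jazz"))                  # ['/data/docs/review.docx', '/data/music/live.mp3']
print(cs.search("jazz", catalog="Media")) # ['/data/music/live.mp3']
```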
Bottom line, WDS seems a gray area to me with questionable intentions.