Updated 6/10/08 2:20pm: clarified details of proposed solution
Here it is, Part 3 of the long path series, which started over a year ago. I apologize for leaving you hanging; the BCL team has been busy lighting up the web. Because of the delay, I’ll summarize the compatibility concerns as context for the proposed solution.
Recall from Part 1 that one way to bypass the MAX_PATH limit with Win32 File APIs is to prepend \\?\ to the file name. This allows you to create paths that are only subject to NTFS restrictions, and the length limit is 32K. However, \\?\ has another side-effect -- it bypasses all Win32 file name canonicalization.
BCL gets a lot of requests for long path support. Some specifically request that we allow the \\?\ prefix. This brings up the question: is \\?\ requested because it allows longer paths, or do users also want to create paths that don’t conform to Win32 naming conventions? Our investigation indicates that, while there are specialized areas where non-canonical file names are useful, the overwhelming majority of users just want longer paths.
Why is this distinction important? If you just want longer paths, you don’t necessarily want the side effect of turning off all Win32 file naming conventions. For example:
File names like this are problematic for other apps, and the problem is independent of path length: using \\?\ you can create a file name shorter than MAX_PATH that doesn’t adhere to Win32 naming conventions. I expect that the great majority of users will want the framework to enforce canonicalization, at least as the default behavior.
Note that the above statements are a commentary on unbridled use of \\?\. The problem could still be resolved as follows: behind the scenes we first canonicalize the path using GetFullPathName (since GetFullPathName isn’t subject to the MAX_PATH restriction) and then prepend \\?\. Perhaps non-canonical names could be allowed on an opt-in basis.
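As a rough illustration of the "canonicalize first, then prepend" idea, here is a minimal Python sketch. It is not the framework's implementation: the function name add_long_path_prefix is hypothetical, and ntpath.normpath is used only as a portable stand-in for GetFullPathName (it collapses "." and ".." and normalizes separators, but it does not apply every Win32 rule, such as stripping trailing dots). The sketch also assumes the caller passes an absolute path.

```python
import ntpath

LONG_PREFIX = "\\\\?\\"            # the four characters \\?\
UNC_LONG_PREFIX = "\\\\?\\UNC\\"   # long-path form for \\server\share paths


def add_long_path_prefix(path: str) -> str:
    """Canonicalize an absolute Windows path, then add the long-path prefix.

    Hypothetical helper for illustration only.
    """
    if path.startswith(LONG_PREFIX):
        return path  # caller already opted in; leave the path untouched

    # Stand-in for GetFullPathName: collapse "." / ".." and turn "/" into "\".
    full = ntpath.normpath(path)

    drive, rest = ntpath.splitdrive(full)
    if drive.startswith("\\\\"):
        # \\server\share\x  becomes  \\?\UNC\server\share\x
        return UNC_LONG_PREFIX + full[2:]
    if not drive or not rest.startswith("\\"):
        raise ValueError("expected an absolute path such as C:\\dir\\file")
    return LONG_PREFIX + full
```

Because the prefix is added only after canonicalization, a name containing "." or ".." segments is resolved before Win32 stops interpreting them.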
Either way, suppose .NET lets you create paths up to 32K in length. Now you have a new problem: you have a file that, most likely, no other app on your system can use, because that app would have to support the \\?\ syntax. Furthermore, many .NET APIs won’t even be able to work with this file: recall from Part 1 that this syntax only works with the Win32 file-related APIs, but not for general Win32 functions that accept paths (e.g. LoadLibrary).
This blog series has focused fairly heavily on nuances of the \\?\ prefix, simply because it’s commonly viewed as the workaround to the MAX_PATH limitation. Let’s switch focus to some reasonable goals in the absence of a unified solution (exposed, for example, by Win32 APIs).
Because of the compat concerns with Goal 2, we don’t want users to “accidentally” use this solution.
Fortunately, the Vista shell has provided a precedent of allowing longer path names in a compatible way. It’s called auto-path shrinking and it attempts to squeeze a file name into MAX_PATH by shrinking the long file names into the short name equivalents piecewise behind the scenes. Before describing that, note that the proposed solution is a hybrid approach:
1. Try to squeeze the file name into MAX_PATH characters using auto-path shrinking. This is only used for existing files, and for paths that don't have the \\?\ prefix (see below).
2. Allow use of the \\?\ prefix for creating as well as opening (in general, allow this for every operation corresponding to a Win32 file API that supports it). We will not attempt to add the \\?\ prefix behind the scenes; at most we'll provide a helper to do this, such as AddLongPathPrefix. In any case, the user must intentionally request this and not stumble into using \\?\ by accident. This part is TBD: we think it makes sense to expose an option controlling whether we enforce Win32 file name restrictions other than length, with enforcement as the default.
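The two steps above can be read as a small decision procedure. The Python sketch below is only one interpretation: the function name choose_strategy and the returned labels are hypothetical, and the "reject" branch (creating an over-long path without the \\?\ prefix) is an assumption based on shrinking being limited to existing files.

```python
MAX_PATH = 260              # Win32 limit; includes the terminating null
LONG_PREFIX = "\\\\?\\"     # the four characters \\?\


def choose_strategy(path: str, creating: bool) -> str:
    """Return which branch of the hybrid proposal would handle `path`.

    Labels are illustrative, not part of any real API.
    """
    if path.startswith(LONG_PREFIX):
        return "long-prefix"   # step 2: explicit opt-in, for create or open
    if len(path) < MAX_PATH:
        return "as-is"         # already fits; no special handling needed
    if creating:
        return "reject"        # assumed: shrinking only applies to existing files
    return "auto-shrink"       # step 1: try the short-name equivalents
```

The point of the ordering is that the prefix check comes first, so a caller who opts in explicitly never has a path silently rewritten.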
Let's describe auto-path shrinking a bit more. If you pass in a file name that exceeds MAX_PATH:
It will try to shrink it under the MAX_PATH limit by using the short name equivalents:
This solution may seem odd at first (beyond the ironic spin that we’re coming full circle to short file names). But it’s very compelling for adoption in the framework since paths of this form are acceptable to Win32 API (it’s a valid Win32 file name).
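To make the piecewise shrinking concrete, here is a minimal Python sketch under several assumptions. The function name shrink_path, the left-to-right order of substitution, and the injected lookup callback are all hypothetical; on Windows the lookup would have to ask the file system for the real short name of each existing component (it cannot be guessed), and it returns None here to model a volume where short name generation is turned off.

```python
import ntpath
from typing import Callable, Optional

MAX_PATH = 260  # Win32 limit; includes the terminating null

# (parent_dir, long_component) -> real short name, or None if there is none.
ShortNameLookup = Callable[[str, str], Optional[str]]


def shrink_path(path: str, lookup: ShortNameLookup) -> Optional[str]:
    """Replace components with short names until the path fits in MAX_PATH.

    Returns the shrunk path, or None if it still does not fit.
    """
    drive, rest = ntpath.splitdrive(path)
    parts = [p for p in rest.split("\\") if p]

    for i, part in enumerate(parts):
        candidate = drive + "\\" + "\\".join(parts)
        if len(candidate) < MAX_PATH:
            return candidate  # stop as soon as the path fits
        # Parent is built from the components shrunk so far.
        parent = drive + "\\" + "\\".join(parts[:i])
        short = lookup(parent, part)
        if short is not None and len(short) < len(part):
            parts[i] = short

    candidate = drive + "\\" + "\\".join(parts)
    return candidate if len(candidate) < MAX_PATH else None
```

Stopping at the first candidate that fits keeps as many long names as possible in the result, which is one reasonable reading of "piecewise"; the actual shell behavior may differ.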
Some important clarifications:
* This brings up two questions. One is that users can turn off short file name generation via a registry value. This is discussed below. Also, you’ll notice this solution is NT-focused, but Silverlight can run on Macs. We also intend to handle platform-specific path limits with long path efforts, instead of enforcing Windows MAX_PATH (as we do currently).
Allowing use of \\?\ will likely require a permission demand greater than FileIOPermission, perhaps even a demand for full trust. However, for many apps that need to work with long paths, this isn't a problem. We should investigate ways to relax this demand for partial trust scenarios like isolated storage.
Let’s look at some pros and cons of auto-shrinking:
We’re curious to hear your feedback about this approach.
Hmm, using short file names sounds more like a hack than a real solution.
For one thing, short file names exist in Windows just for the sake of compatibility. Using what is basically an obsolete feature of Windows to "fix" .NET, which is supposed to be "the future", sounds backwards to me.
Second, don't be so quick to assume that turning off short filename generation is uncommon. Generation of such names is known to slow things down, so don't be too surprised if people turn it off. Even if they don't know how to do it, some enterprising person may include this feature in an "optimization" tool and spread it around. This means that .NET developers won't be able to rely on this "fix" working on every computer.
Here's a case where people had to turn short name generation off because of performance issues: http://channel9.msdn.com/forums/TechOff/407930-SQL-Server-2008-CTP-FILESTREAM--NTFS-limitations/
If .NET framework is to fix this somehow I think it is better to simply allow \\?\ (even if this will need full trust to work). This will put .NET framework on par with Win32 and who knows, maybe one day Win32 will increase the limit and limit the need for \\?\.
Before you start throwing around the word "hack" you might re-read and notice that I also proposed we support \\?\. :) However, we would not prepend this automatically behind the scenes and it would require some (tbd) heavier demand than FileIOPermissions.
The auto-shrinking case is what we could attempt behind the scenes to soften the impact for the "slightly above" cases, where users don't necessarily want to get into the world of \\?\.
Put me down as a strong "no" for this approach.
The fact that Windows allows users to turn off short file names should be reason enough. I might be wrong, but short file names were basically a kludge added for backwards-compatibility in Windows 95. It's even further moved into the "compatibility" realm when dealing with Windows NT based operating systems.
One thing I've been very happy about over the years is that short filenames are effectively "dead" - nothing uses them by choice any longer.
Adding it to .NET to support a fringe case (path lengths > MAX_PATH) will effectively make it a "core case" - something that all applications will start using. Effectively, you will be reintroducing the "hell" (no offense) of short paths to the world once again.
I've seen existing bugs in .NET (can't remember which classes) where it "faked" short paths by truncating then concatenating ~1 to the end - regardless of what the underlying filesystem will do. You have no guarantee before you touch the NTFS filesystem that it *really* will turn into ~1. In the Win95 days, files with the same beginning would become ~2, ~3, etc. This is the deal breaker - you cannot guarantee compatibility as long as this is true. If I have a directory called "Program Files" and another called "Program Data", your algorithm will specify the wrong one! You will not know if the directory is PROGRA~1 or PROGRA~2 before you make the API call unless you make other I/O requests to the NTFS / storage system.
As stated in your post - the BCL is effectively limited by what the Win32 API has done (inconsistent \\?\ support). Why is this any different than what Win32 API developers face? They have the same issues - some support long paths, some do not.
I'd rather see a "modern" solution that follows the recommended practice set out by the Operating System (Kernel) guys. That practice is to use \\?\, not introduce kludges that are only there for backwards compatibility with DOS.
"Shrinking" paths will be a *LOT* scarier (conceptually) for a developer than having to know which paths are supported in which context.
It still doesn't fully work around the MAX_PATH issue either - the developers who are asking for a workaround for MAX_PATH are envisioning \\?\, which is not the same as using short filenames to have "a bit more room". They want the full meal deal, not a faked version.
And if you're not satisfying the core audience who are requesting this feature, why would you introduce a "DOS" kludge into .NET? It just doesn't make sense.
Hmm, these comments are strange given that I propose explicit use of \\?\ should be allowed. Maybe people are skimming and not noticing? In any case, to avoid having to repeatedly say "read the blog" I'll make some updates to ensure that's clear.
As an example to my previous post, try the following on Windows Vista:
* Open Command Prompt
* Change directory to C:\
* Type "dir /a /x"
Notice the short filenames displayed? On my machine, "ProgramData" is "PROGRA~2".
With the proposed algorithm, the file:
"C:\ProgramData\<really long path>\MyFile.txt"
Would be translated into:
"C:\PROGRA~1\<shrunk path components>\MyFile.txt"
However, you now have non-deterministic behavior. The file will actually be saved as:
"C:\Program Files\<really long path>\MyFile.txt", which is not what the developer intended.
That said, perhaps additional API calls will be used to perform the actual translation and "lookup" pre-existing files.
However, wouldn't the proposed behavior still be non-deterministic for files being created?
The solution using short names looks messy, and IMHO it might look safer in the short term, but it has a lot of potential for creating compatibility nightmares in the long term. It makes path canonicalization even more complicated; it would have a performance cost by requiring disk access or an in-memory cache for short name lookups; whether paths are too long would depend on the length of usually hidden short names; and short names will often end up in displayed paths, giving a poorer user experience.
I agree with Mike's description: it's a hack. Granted, it's an attractive one, which would work in the short term, but you will end up paying the "long paths tax" later, and it will be more expensive.
My advice would be to start supporting long paths using \\?\ and optional canonicalization in .Net, and push for legacy applications to be upgraded (starting with Windows components and APIs). It will take somewhat longer, but you will end up with less clutter in your code base, and fewer nightmares with unintuitive behaviors in applications. Moreover, it would not be that bad: long paths are still uncommon, not a critical feature, and once the basic tools (especially Windows Explorer) support them, there will be a clear reason for third parties to start supporting them.
Do not forget, too, that not supporting long paths often means static buffers, potential overflows, truncations or error cases, and thus an area where security vulnerabilities could creep in.
Whether you allow \\?\ in parallel with path shrinking is mostly irrelevant. You would still be introducing more complex path handling rules, relying on short names, and adding a lot of potential complexity to .Net's behaviour. The drawbacks are still there.
Moreover, if you give a simple, partial solution and a solid, harder to use one, most programmers will choose the first. The option to explicitly use \\?\ would be mostly ignored. On the other hand, if you focus on using \\?\ behind the scenes, you could try to create an easy to use, cleaner API, and try to hide the quirks of the underlying Win32 calls.
Note that I rearranged the post to highlight that \\?\ is indeed part of the solution and that auto-shrinking helps "make it work" for opening existing files when \\?\ is not used.
I have to agree that my mention of \\?\ support was rather buried before -- basically it was the closing sentence of the "solution" part. I guess I can't expect everyone to pore over every sentence. Hope it's clearer now. :)
Please be aware that 8.3 filename generation has to be disabled for any application that creates and reads files rapidly, and has more than a few thousand files per directory. Otherwise the performance is terrible -- there is a KB about it which I can't find right now.
So disabling 8.3 generation is much more common than you might think.
IMO you should prepend \\?\ automatically for all .NET APIs that support it (but still enforcing the other path restrictions of course). .NET APIs that wrap Win32 functions without \\?\ support, should use the proposed automatic path shrinking if the path is too long, so that they'll be also able to handle long paths whenever possible.
Of course you will still have problems with other applications that don't support long file paths, but these applications would not work with path shrinking either, if they don't shrink paths themselves.
Also while the proposed solution will only work if short file name generation is not deactivated, this solution would still be able to work in most (or many) cases.
What do you think?
Eric, I don't think you understand the algorithm that is being proposed. It wouldn't be blindly truncating and appending "~1" to the file name. It would use the *correct* short name for a given long name. I can't believe how anybody could imagine it working any other way.
As to the original question, given that \\?\ will be supported anyway, I'm not sure what the benefit of path shrinking would be. It would work some times, but not all times, and so you'd still need to work around the times it doesn't work with \\?\.
I think path shrinking should be limited only to those cases that won't be fixable by supporting \\?\ -- namely LoadLibrary and that sort of thing.
First, I think this is a great discussion to also reflect on the MSDN Technical Interoperability Forum, because I suspect there are many more things that haven't been thought about, and attracting more eyeballs could be valuable.
With regard to short names, I think people don't get that the short names are always there (right?) for any non-short name of a directory or a file. Nicht wahr? You want to see some, run edit.com (yes, that one) from a console window. Then notice what the console shows as the current directory path afterwards. (So would we be breaking cmd.exe with this stuff?)
The interoperability issues I am thinking of have to do with (1) shared directories that have these monster name snakes in them, (2) software that reads and writes related formats, such as ISO CD-ROMs and DVDs. Oh, yes, ahem, flash drives and that Windows Home Server on my workgroup LAN. There are APIs that have hard-coded limits for passing filenames back and forth too (there are ones in ODMA that I am probably too familiar with).
Then there are all those web file uploaders and downloaders, etc., etc. Probably one more way to break Media Player in an inscrutable way, etc.
I hope the wider conversation around this happened internally (on that internal list you mentioned in one of these posts). Is there some place where people triage things that will cost more in support incidents than the problem they solve?
Hmm, yes, I think this goes onto the Technical Interoperability forum and maybe some wider readership, though not sure what that is.
orcmid: Notice the comments above about how short name generation can be turned off. While it's true that turning off short name generation won't delete the short names that *already exist*, it obviously won't generate any new ones, so it'll only help you in some situations.
I believe the main problem with shrinking the paths is that it only works some of the time, and it's totally arbitrary whether it'll work on any given path (so your application might be OK on an operating system that has short name generation turned on, but if your customer has it turned off, it might not work for them... I imagine it's going to be very difficult to track those sorts of problems down -- better to fail consistently).
As part of the long-term plan, have you considered deprecating strings-as-paths altogether and using the Uri class instead?