Updated 6/10/08 2:20pm: clarified details of proposed solution
Here it is, Part 3 of the long path series, which started over a year ago. I apologize for leaving you hanging; the BCL team has been busy lighting up the web. Because of the delay, I’ll summarize the compatibility concerns as context for the proposed solution.
Recall from Part 1 that one way to bypass the MAX_PATH limit with Win32 File APIs is to prepend \\?\ to the file name. This allows you to create paths that are only subject to NTFS restrictions, and the length limit is 32K. However, \\?\ has another side-effect -- it bypasses all Win32 file name canonicalization.
BCL gets a lot of requests for long path support. Some specifically request that we allow the \\?\ prefix. This brings up the question: is \\?\ requested because it allows longer paths, or do users also want to create paths that don’t conform to Win32 naming conventions? Our investigation indicates that, while there are specialized areas where non-canonical file names are useful, the overwhelming majority of users just want longer paths.
Why is this distinction important? If you just want longer paths, you don’t necessarily want the side effect of turning off all Win32 file naming conventions. For example:
File names like this are problematic for other apps (independent of the path length: using \\?\ you can create a file name shorter than MAX_PATH that doesn’t adhere to Win32 naming conventions), to the extent that (I expect) a great majority of users will want the framework to enforce canonicalization, at least as the default behavior.
Note that the above statements are a commentary on unbridled use of \\?\. The problem could still be resolved as follows: behind the scenes we first canonicalize the path using GetFullPathName (since GetFullPathName isn’t subject to the MAX_PATH restriction) and then prepend the \\?\ prefix. Perhaps non-canonical names could be allowed on an opt-in basis.
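As a rough illustration of the canonicalize-then-prefix idea, here is a minimal Python sketch. It is not the BCL implementation: ntpath.normpath merely stands in for GetFullPathName (it collapses "." and ".." segments and normalizes slashes, but it does not apply every Win32 rule, such as trimming trailing spaces or dots), and the helper name add_long_path_prefix is hypothetical. The sketch assumes the caller passes a fully qualified path.

```python
import ntpath

LONG_PREFIX = "\\\\?\\"          # the \\?\ prefix
UNC_LONG_PREFIX = "\\\\?\\UNC\\"  # long form used for \\server\share paths


def add_long_path_prefix(path):
    """Canonicalize a fully qualified Windows path, then prepend \\\\?\\.

    Hypothetical helper: normpath only approximates Win32 canonicalization.
    """
    if path.startswith(LONG_PREFIX):
        return path  # already prefixed; leave it untouched

    canonical = ntpath.normpath(path)
    drive, rest = ntpath.splitdrive(canonical)
    if not drive or not rest.startswith("\\"):
        raise ValueError("expected a fully qualified path such as C:\\dir\\file")

    if canonical.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return UNC_LONG_PREFIX + canonical[2:]
    return LONG_PREFIX + canonical
```

Because the canonicalization happens before the prefix is added, the name that reaches the file system still follows the usual conventions, at least to the extent that the stand-in canonicalizer enforces them.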
Either way, suppose .NET lets you create paths up to 32K in length. Now you have a new problem: you have a file that, most likely, no other app on your system can use, because any such app would have to support the \\?\ syntax. Furthermore, many .NET APIs won’t even be able to work with this file: recall from Part 1 that this syntax only works with the Win32 file-related APIs, but not for general Win32 functions that accept paths (e.g. LoadLibrary).
This blog series has focused fairly heavily on nuances of the \\?\ prefix, simply because it’s commonly viewed as the workaround to the MAX_PATH limitation. Let’s switch focus to some reasonable goals in the absence of a unified solution (exposed, for example, by Win32 APIs).
Because of the compat concerns with Goal 2, we don’t want users to “accidentally” use this solution.
Fortunately, the Vista shell has provided a precedent of allowing longer path names in a compatible way. It’s called auto-path shrinking and it attempts to squeeze a file name into MAX_PATH by shrinking the long file names into the short name equivalents piecewise behind the scenes. Before describing that, note that the proposed solution is a hybrid approach:
1. Try to squeeze the file name into MAX_PATH characters using auto-path shrinking. This is only used for existing files, and for paths that don't have the \\?\ prefix (see below).
2. Allow use of the \\?\ prefix for creating as well as opening files (in general, allow this for every operation corresponding to a Win32 file API that supports it). We will not attempt to add the \\?\ prefix behind the scenes; at most we'll provide a helper to do so, such as AddLongPathPrefix. In any case, the user must intentionally request this and not stumble into using \\?\ by accident. This part is TBD: we think it makes sense to expose an option controlling whether we enforce the Win32 file name restrictions other than length, with enforcement of the file naming rules as the default.
Let's describe auto-path shrinking a bit more. If you pass in a file name that exceeds MAX_PATH:
It will try to shrink it under the MAX_PATH limit by using the short name equivalents:
This solution may seem odd at first (beyond the ironic spin that we’re coming full circle to short file names). But it’s very compelling for adoption in the framework, since paths of this form are acceptable to the Win32 APIs (the result is a valid Win32 file name).
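To make the piecewise shrinking concrete, here is a small Python sketch of the idea. It is only a model under stated assumptions: the short_names dictionary is a hypothetical stand-in for asking the file system for each component's short (8.3) name, and it is keyed by component name alone, whereas a real short name depends on the directory the component lives in. The function name shrink_path is likewise made up for illustration.

```python
import ntpath

MAX_PATH = 260  # Win32 limit; it counts the terminating null character


def shrink_path(path, short_names, limit=MAX_PATH):
    """Replace long components with short names until the path fits.

    short_names: hypothetical mapping of long component -> short (8.3) name.
    Returns the shrunk path, or None if it cannot be made to fit.
    """
    if len(path) < limit:
        return path  # already fits, nothing to do

    drive, rest = ntpath.splitdrive(path)
    parts = rest.split("\\")
    for i, part in enumerate(parts):
        short = short_names.get(part)
        if short is not None and len(short) < len(part):
            parts[i] = short
            candidate = drive + "\\".join(parts)
            if len(candidate) < limit:
                return candidate  # stop as soon as the path fits
    return None  # short names unavailable or still too long
```

The None result corresponds to the case where shrinking cannot help, for example when short name generation has been turned off and no short equivalents exist.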
Some important clarifications:
* This brings up two questions. One is that users can turn off short file name generation via a registry value. This is discussed below. Also, you’ll notice this solution is NT-focused, but Silverlight can run on Macs. We also intend to handle platform-specific path limits with long path efforts, instead of enforcing Windows MAX_PATH (as we do currently).
Allowing use of \\?\ will likely require a permission demand greater than FileIOPermission, perhaps even a demand for full trust. However, for many apps that need to work with long paths, this isn't a problem. We should investigate ways to relax this demand for partial trust scenarios like isolated storage.
Let’s look at some pros and cons of auto-shrinking:
We’re curious to hear your feedback about this approach.
If you're changing the filename parsing in the BCL, what about allowing names that specify alternate NTFS streams? i.e. c:\MyFile.Txt:extra_metadata
At the moment (well, last time I tried) it will get upset about the extra : character.
The source of the main title is an inside joke I am probably not going to ever explain within the blog.
Firstly, just swapping one very large limit for another isn't really going to solve the problem. People would simply start using very, very long directory names (think of how long URLs get when you start adding on form data -- and a couple of guids -- and a session id -- and the date -- and, and, and).
Secondly, making the limit bigger imposes horrid problems on small devices. My kids have some "clock radios" that can play MP3 files (current favorite: Oliver!). These have enough trouble with MAX_PATH names.
Thirdly, path names also have to fit into the user interface. The Explorer in Vista already has enormous troubles with the existing path sizes; the last thing it needs is to have more troubles.
Lastly, stop it, stop it, stop it. File paths are a SYSTEM thing to be solved by the SYSTEM groups. The last (deleted) thing I need to deal with is a zillion (deleted) "solutions" by a zillion (deleted) groups, all "solving" the problem in their own, incompatible way.
Suppose you DO come up with a "solution". How will my C++ app deal with it? Will I have to make a call from C++ into .Net just to call your lame API? How will I do it from my installer? Can I do it from the Java program that runs my builds?
Urk about short-name generation being disabled.
So, I agree, this is a SYSTEM issue and in particular a matter with regard to the file system.
I guess I don't know what the use case is that has exceeding MAX_PATH be locally useful while remaining globally safe.
More fodder: Zip files and directory-simulating Zip access,
and then there's the part: protocol and OPC
um, the file: URI,
oh, and who gets to figure out all the threat-modeling of this combined with NTFS streams, hiding root kits, etc.
and what about using long-path injection as a form of buffer-overrun exploit, although I suspect that would be hard (assuming the Win32 APIs never change and neither does the MAX_PATH constant, for obvious down-level protection reasons).
Just tossing things around here. The bigger problem with this conversation is that the inside one is completely separated from the outside one and who knows what well-trodden ground we are revisiting. So I think I will shut up now. I hate write-only feedback channels.
I think you're overanalysing this WAY too much.
The fundamental problem is that .NET says that "?" is invalid in a path. Remove that restriction. That's all you need to do.
"Oh but developers might create paths with invalid characters and spaces at the end!"
SO WHAT! Stop holding our hand. .NET is not VBScript for dummies.
Right now, we are forced to re-implement the whole System.IO namespace purely to get over this shortsighted and ridiculous piece of validation. In fact, I don't see why the framework has to validate paths at all: just pass it down to the API and let it throw an error if it's not happy with it.
Stop nannying us. We're not idiots. Stop making assumptions on our behalf.
I tried your sample code from your blog but I cannot get it to work on XP SP2. Is the \\?\ mechanism not supported for really long file names?
StringBuilder sb = new StringBuilder("C:\\SubDir\\SubDir");
The Directory does already exist. I get as return value: The filename, directory name, or volume label syntax is incorrect.
Is there a known limitation with the \\?\ approach?
Ahh found the issue. The fsutil command did lead the way:
C:\>fsutil fsinfo volumeinfo c:\
Volume Name : System
Volume Serial Number : 0xf43ee3c8
Max Component Length : 255
File System Name : NTFS
Supports Case-sensitive filenames
Preserves Case of filenames
It seems that the directory name or file name itself cannot be longer than 255 characters. The 32K limit applies to the full path, but the individual path parts still cannot be longer than 255 characters each.
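A quick way to catch this before calling into the file APIs is to check both limits up front. The following Python sketch is only an illustration: the function name find_length_problems is hypothetical, the 255 figure is the "Max Component Length" that fsutil reported for this particular volume, and 32767 is an assumed exact value for the "32K" limit.

```python
LONG_PREFIX = "\\\\?\\"
MAX_COMPONENT_LENGTH = 255  # "Max Component Length" from the fsutil output
MAX_TOTAL_LENGTH = 32767    # assumed exact value of the "32K" path limit


def find_length_problems(path):
    """Return a list of human-readable length violations (empty if none)."""
    problems = []
    if len(path) > MAX_TOTAL_LENGTH:
        problems.append("path exceeds %d characters" % MAX_TOTAL_LENGTH)

    # Skip the \\?\ prefix so that only real components are measured.
    body = path[len(LONG_PREFIX):] if path.startswith(LONG_PREFIX) else path
    for part in body.split("\\"):
        if len(part) > MAX_COMPONENT_LENGTH:
            problems.append(
                "component of %d characters exceeds %d"
                % (len(part), MAX_COMPONENT_LENGTH)
            )
    return problems
```

A single 256-character directory name is reported as a problem even though the whole path is far below the 32K limit, which matches the behavior described above.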