Larry Osterman's WebLog

Confessions of an Old Fogey

Why does Windows still place so much importance on filenames?


Earlier today, Adrian Kingsley-Hughes posted a rant (his word, not mine) about the fact that Windows still relies on text filenames.

The title says it all really. Why is it that Windows still place so much importance on filenames.

Take the following example - sorting out digital snaps. These are usually automatically given daft filenames such as IMG00032.JPG at the time they are stored by the camera. In an ideal world you’d only ever have one IMG00032.JPG on your entire system, but the world is far from perfect. Your camera might decide to restart its numbering system, or you might have two cameras using the same naming format. What happens then?

I guess I’m confused.  I could see a *very* strong argument against Windows’ dependency on file extensions, but I’m totally mystified about why having filenames is such a problem.

At some level, Adrian’s absolutely right – it IS possible to have multiple files on the hard disk named “recipe.txt”.  And that’s bad.  But is it the fault of Windows for allowing multiple files to have colliding names? Or is it the fault of the user for choosing poor names?  Maybe it’s a bit of both.

What would a better system look like?  Well, Adrian gives an example of what he’d like to see:

Why? Why is the filename the deciding factor? Why not something more unique? Something like a checksum? This way the operating system could decide is two files really are identical or not, and replace the file if it’s a copy, or create a copy if they are different. This would save time, and dramatically reduce the likelihood of data loss through overwriting.

But how would that system work?  What if we did just that?  Then you wouldn’t have two files named recipe.txt (which is good).

Unfortunately that solution introduces a new problem: You still have two files.  One named “2B1015DB-30CA-409E-9B07-234A209622B6” and the other named “5F5431E8-FF7C-45D4-9A2B-B30A9D9A791B”. It’s certainly true that those two files are uniquely named and you can always tell them apart.  But you’ve also lost a critical piece of information: the fact that they both contain recipes.

That’s the information that the filename conveys.  It’s human specific data that describes the contents of the file.  If we were to go with unique monikers, we’d lose that critical information.
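To make that concrete, here is a rough sketch in Python. The `content_name` helper and the sample recipe text are made up for illustration, and SHA-256 is just one possible checksum; nothing here reflects how Windows or any real file system names files. It shows that a content-derived name tells files apart reliably but says nothing about what they contain.

```python
import hashlib

def content_name(data: bytes) -> str:
    """Return a name derived only from the file's bytes (hypothetical scheme)."""
    return hashlib.sha256(data).hexdigest()

# Made-up file contents standing in for two different recipe.txt files.
pumpkin = b"Pumpkin pie: pumpkin, eggs, sugar, spices"
apple = b"Apple pie: apples, butter, sugar, cinnamon"

# Identical bytes always map to the same name; different bytes get different names.
print(content_name(pumpkin) == content_name(pumpkin))  # True
print(content_name(pumpkin) == content_name(apple))    # False

# Neither name tells a person that the file holds a recipe.
print(content_name(pumpkin)[:16], content_name(apple)[:16])
```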

But I don’t actually think that the dependency on filenames is really what’s annoying him.  It’s just a symptom of a different problem. 

Adrian’s rant is a perfect example of jumping to a solution without first understanding the problem.  It also shows why it’s so hard for Windows UI designers to figure out how to solve customer problems: this example is a customer request that we remove filenames from Windows.  Obviously something happened to annoy Adrian that was related to filenames, but the question is: What?  He doesn’t describe the problem, but we can hazard a guess about what happened from his text:

Here’s an example. I might have two files in separate folders called recipe.txt, but one is a recipe for a pumpkin pie, and the other for apple pie. OK, it was dumb of me to give the files the same name, but it’s in situations like this that the OS should be helping me, not hindering me and making me pay for my stupidity. After all, Windows knows, without asking me, that the files, even if they are the same size and created at exactly the same time, are different. Why does Windows need to ask me what to do? Sure, it doesn’t solve all problems, but it’s a far better solution than clinging to the notion of filenames as being the best metric by which to judge whether files are identical or not.

The key information here is the question: “Why does Windows need to ask me what to do?”  My guess is that he had two “recipe.txt” files in different directories and copied a recipe.txt from one directory to the other.  When you do that, Windows presents you with the following dialog:

[Image: Windows Copy Dialog]

My suspicion is that he’s annoyed because Windows is forcing him to make a choice about what to do when there’s a conflict.  The problem is that there’s no one answer that works for all users and all scenarios.  Even in my day-to-day work I’ve had reason to choose all three options, depending on what’s going on.  From the rant, it appears that Adrian would like it to choose “Copy, but keep both files” by default.  But what happens if you really *do* want to replace the old recipe.txt with a new version?  Maybe you edited the file offline on your laptop and you’re bringing the new copy back to your desktop machine.  Or maybe you’re copying a bunch of files from one drive to another (I do this regularly when I sync my music collection from home and work).  In that case, you want to ignore the existing copy of the file (or maybe you want to copy the file over to ensure that the metadata is in sync).

Windows can’t figure out what the right answer is here – so it prompts the user for advice about what to do.
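The three choices can be sketched as a small Python function. This is only an illustration under my own assumptions: `copy_with_policy`, the policy names, and the "name (2).ext" scheme for keeping both files are invented here and are not how Explorer is implemented. The point is that the code cannot pick `policy` for itself; someone has to supply it.

```python
import os
import shutil

def copy_with_policy(src: str, dst_dir: str, policy: str) -> str:
    """Copy src into dst_dir; policy is 'replace', 'skip', or 'keep_both'."""
    target = os.path.join(dst_dir, os.path.basename(src))
    if os.path.exists(target):
        if policy == "skip":
            return target  # leave the existing file untouched
        if policy == "keep_both":
            # Pick the first free "name (n).ext" so nothing is overwritten.
            stem, ext = os.path.splitext(target)
            n = 2
            while os.path.exists(f"{stem} ({n}){ext}"):
                n += 1
            target = f"{stem} ({n}){ext}"
        elif policy != "replace":
            raise ValueError(f"unknown policy: {policy}")
    shutil.copy2(src, target)  # copy2 also copies timestamps where it can
    return target
```

The laptop-to-desktop case would call this with "replace", the music-sync case with "skip", and the two-different-recipes case with "keep_both".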

Btw, Adrian’s answer to his rhetorical question is “the reason is legacy”.  Actually that’s not quite it.  The reason is that filenames provide valuable information for the user that would be lost if we went away from them.

Next time I want to spend a bit of time brainstorming about ways to solve his problem (assuming that the problem I identified is the real problem – it might not be). 

 

 

PS: I’m also not sure why he picked on Windows here.  Every operating system I know of has similar dependencies on filenames.  I think that’s another indication that he’s jumping to a solution without first describing the problem.

  • What I would venture is that using the file name as THE identity token of the file in the file system is what is causing this guy's trouble. Arguably the name is just metadata about the file, a very important part, but still just metadata, no different than the last write date or the permissions. One could argue that the user should be able to "name" the file whatever she wants, independently of how the OS determines the identity of the file. Copying the two files called recipe.txt to the same folder would then be just a matter of annoyance to the user, because she no longer knows which one is which. This could also be extended to the use of the extension paradigm to "tag" the file type, which should also be part of the metadata, not part of the file identity. Even folders could then be thought of as mere views of the underlying data, which makes it a bit more natural to express whether the same file is in two folders or has been copied.

    Now the interesting thought experiment here is how to design an API to deal with a file system such as this. One would open the file by its ID token, which would be resolved after the user picks a file in some sort of FileOpenDialog UI. That is almost like what one imagines the file system's internal API must be, after the directory is resolved to the actual entry in the MFT. The apps would now deal with those IDs directly instead of through the "view" of directories and file entries.

  • @Nobody.  I want to talk about that particular issue in the post after the next one. There are some interesting challenges involving user expectations to that solution.

  • (Please delete this if it's a double-post. The first time I tried to submit it I got no feedback to indicate whether it posted or not.)

    It sounds like he's mostly annoyed at that dialog box. It often doesn't present you with the information that you're actually going to use to make the decision (you may need to open one or both files to do that), it doesn't give you much idea of how many more conflicts are coming up, it doesn't let you defer the decision until you've copied the other files or seen the other conflicts. (I might be wrong - it's a while since I've copied lots of files like that on Windows and I'm on a Linux machine right now - but I think that's how it behaved last time I saw it.) Windows can't make the decision what to do by itself, but it's certainly possible to think of ways that the experience could be less painful.

    That said, I don't think his idea is entirely unworkable. Suppose that we're only talking about user documents - no system files, nothing that's cross-linked by filename or shortcut or anything like that to complicate the picture. Suppose that any time you copy or move a file and there's already a file with the target name it never overwrites the target - you just end up with two files with the same name in the same folder. Could a system like this work?

    It seems like it could. What happens when I have two copies of a file and want to keep just one? Well, I have to delete the one I don't want. That doesn't seem hard to understand. I still need to do some work, but it's my problem, not the system's, and I can do it in my own time. I couldn't use typed in paths to uniquely specify files any more, but perhaps such a system would always require picking files from some sort of GUI - most users get by rarely needing to type in the names of existing files, and when they do it's often to type the first few letters and then select an item in a list view. Applications already make a distinction to users between editing an existing document and creating and saving a new one, so we needn't end up with duplicates every time I open a document, edit it and save it. Now, you couldn't just drop this behaviour in to an existing operating system without breaking pretty much everything, but you could imagine one designed this way from the ground up, at least in its handling of user documents. I wouldn't be surprised if some purely object-oriented operating system tried something similar.

  • @Weeble: You're describing the Windows XP experience.  The file copy dialog was dramatically improved for Windows Vista.  How do you handle the "copying an updated file from the laptop" scenario if you never replace the existing file (where you *do* want to overwrite the file)?  What about the "updating my media library from home" scenario (where you *don't* want to overwrite the file)?

    Forcing the user to come back and clean up after the copy command can also result in a poor experience.  People would say "@#$@#$ windows, why doesn't it understand that I wanted to overwrite the file?"

    These decisions are tricky, which is why I decided to write the followup post.

  • What are the names of the different photos in your iPad/iPhone/iPod Touch Photos app? What are the names of the files for the notes in your phone's note-taking app? The names of the save-game files? The MP3s in your music library app?

    By framing the question of filenames in a filesystem context, you risk prematurely jumping to conclusions. What if, from the perspective of the user, there is no filesystem? Without a filesystem, you don't need names. Perhaps you need tags, dates, camera model, author, etc. etc. Perhaps these things are more or less convenient. Perhaps there is still a filesystem behind the scenes, but the user doesn't *necessarily* need to know that.

  • @Weeble, and a tiny bit @Nobody:

    That whole suggestion sounds like a bad idea to me.

    For starters, I think you're making the common case a lot of work in order to make the uncommon case easier. How often do I want more than one file with the same name in the same directory? Well, it's fairly hard to predict how I'd use that feature if it existed, but I suspect it would be fairly rarely. But how often do I copy a file from one place to another and want to overwrite the destination? Fairly often. And when I DO do it, it's often with a bunch of files at once. Now you're telling me I have to go through and clean up that? You say you can do that "on your own time", but I don't WANT to spend the time on it. I'd lose WAY more time to that, especially when you take into account the occasional mistake, than I do to dealing with the fact that names and files are in a 1-1 correspondence. (Disclaimer: names and files are not actually in a 1-1 correspondence, due to hard links.)

    Now you could use some unique ID that gets created when a file is created and remains unchanged throughout its life, independent of how the file contents change. (E.g. pick a GUID.) This could be an interesting interface. And it resolves this problem at the cost of creating another (preserving identity across what look to the OS like new-file creations; see below.) This may be more along the lines of what Nobody was considering.

    Second, you can't require a GUI -- I strongly feel that if it's not scriptable, it's not remotely acceptable. I don't think there's anything that's completely fundamentally wrong about such an interface, but there ARE a lot of questions you have to work out. If I say "type *.txt" at the command prompt, how will that play out in terms of what the shell, type, and OS do? What if I'm using something like Cygwin Bash where the *.txt gets expanded by the shell? [For a few reasons, I generally favor such an interface rather than have programs interpret wildcards.] How will the target program know how to interpret the resulting file names, since they no longer suffice to identify files? I think you'd have to rather completely rethink how program invocation and shells work, perhaps to the point of command line arguments actually representing typed entities. (E.g. *.txt would expand to a list of file objects, not just a list of strings.)

    Then, you say "Applications already make a distinction to users between editing an existing document and creating and saving a new one, so we needn't end up with duplicates every time I open a document, edit it and save it", but this is only true of the user's view. In fact, there are a number of programs for which this is actually NOT true if you look at the actual API calls it makes. Programs do things behind the scenes like 'del file.txt; create file.txt', or 'ren file.txt backup.txt; create file.txt'. Again, this isn't insurmountable; Windows hacks around this problem for some metadata currently. (See Raymond Chen: blogs.msdn.com/.../439261.aspx. Incidentally, this is why you get useful file creation dates on Windows and not on Unix.) However, this hack seems, well, hackish, and I'm not sure how much I'm comfortable depending on it for something vital. In particular, that thing I mentioned in the second big paragraph -- give each file a GUID -- would absolutely depend on this working nearly perfectly.

    I will be very interested to see the next couple entries though. I tend to get fairly passionate about some aspects of file system design. :-)
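    The save-by-rename behaviour described above can be seen with a small Python sketch. The `save_by_rename` helper is invented for illustration, and it assumes the file system reports a stable per-file number in `st_ino` (the inode number on Unix, the file index on Windows). It is not how any particular editor is written.

```python
import os
import tempfile

def save_by_rename(path: str, new_text: str) -> None:
    """Save the way many programs do: write a temp file, then rename it over the original."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_text)
    os.replace(tmp, path)  # the name survives; the underlying file object does not

folder = tempfile.mkdtemp()
path = os.path.join(folder, "file.txt")
with open(path, "w") as f:
    f.write("draft 1")

before = os.stat(path).st_ino  # identity of the original file
save_by_rename(path, "draft 2")
after = os.stat(path).st_ino   # identity of whatever now carries the name

# To the user this is the same document; to the OS it is a different file.
print(before != after)
```

    Any per-file ID scheme would have to carry the old identity across that rename, which is the part that would need to work nearly perfectly.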

  • It's not clearly a better system, but neither is it clearly an unworkable one. Perhaps a "duplicates happen" system with extra tools for cleaning up or synchronizing files would be more intuitive overall than a "no duplicates" system which forces immediate resolution of conflicts. After all, it does seem to be closer to how real-world objects - such as paper documents - work. It's certainly an interesting idea to consider.

    It's interesting to note that synchronizing files between multiple locations is fundamentally a hard problem. We're not even considering cases where the user really wants the resolution "merge the duplicated files". In that case we would probably say that they should be using some specialised application like a source control system. How do we decide where to draw the line between what should be built in to the file browser and what should be handled by another application?

  • That dialog was one of the best changes MS made to the explorer I can think of. There are scenarii for all three options (and I could think of a 4th that allows the user to specify a new name when keeping both instead of the default behavior). And I don't see how it doesn't give me all the information I need to make the right decision. Now if someone wants to complain about the XP dialog boxes, just go on, I doubt anyone would want to stop you.

    Also actually I totally DON'T agree that it's a bad thing to have several files with the same name on a disk - readme.txts or config files come to my mind. It's not just the filename but the absolute path that has lots of information. If I have two files that can be distinguished based on some tag, I can just as easily adjust the filename. Also how would I specify a specific file if there could be several with the same name in one position? Specify the distinguishing tag?

    Sounds like he had a specific problem and generalized from it without thinking about the hundreds of scenarios where his "obvious solution" wouldn't work.

  • I think that Adrian has a point here (though he's expressed it quite badly, and his solution sucks :)). In many cases, users don't care about the filename. When dealing with the photos on my camera, the camera automatically fills in the date (and depending on which camera I'm using, the place) that it was taken, later on I might tag the people who are in the photo, and give it a category tag, but I still just end up with DSC_0003.jpg as the filename. Because I couldn't put enough information in the name to be useful, it just gets ignored. I never even see my music files, since explorer can just show me the actual song details, I just copy/move them around like that (or my music software, which similarly knows more about songs than a filename could reasonably express).

    With regards to personal documents, the one area that you might care about filenames, I often end up with, say, Budget.xlsx, Budget2.xlsx, Budget3.xlsx: I gave it a name (budget), the other important piece of metadata is the date it was created, and that's stored with the file. The filesystem forced me to make up something to uniqify the filenames, when I could already tell them apart. (My grandmother and mother definitely do this too, so it's not just because I'm a "computer type")

    It seems to me that for data (files related to programs are a different story), the important part of a file is not its name, but its identity. The file that started out on my desktop as Recipe.txt, and is then copied to my laptop for further editing, and then back, should overwrite Recipe.txt, while my second recipe that I started working on at work called Recipe.txt, which I then copied to my home computer, should go alongside. If I copy both of those files to my laptop, rename one to "Hommus Recipe.txt" and the other to "Guacamole Recipe.txt", when I copy them back to the same directory, they should still overwrite the files that they "came from".

    Having filenames also forces me to actually come up with a name for something, which might not be immediately obvious. I definitely get a heap of files on my desktop or documents folder named Foo.txt, bar.txt etc over time, when I want to save some small snippet of text. (Onenote is great for this though)

  • @voo: "And I don't see how it doesn't give me all the information I need to make the right decision."

    There is at least one piece of highly relevant information that it fails to give you, and which would make things a lot better some of the time: "these files differ" or "these files are the same". I think this would be a wonderful addition to that dialog.

    The computer guy in me wants a "diff" button too that brings up something like WinMerge, at least for files that look like text, but at the same time I recognize this is probably not particularly appropriate for most people.

    @Weeble: "It's not clearly a better system, but neither is it clearly an unworkable one. Perhaps a "duplicates happen" system with extra tools for cleaning up or synchronizing files would be more intuitive overall than a "no duplicates" system which forces immediate resolution of conflicts."

    I'm still skeptical; there are a LOT of problems that need to be worked out. (E.g. a tool that helps YOU with resolving those conflicts will do nothing for what I was talking about from the command line point of view.) And I think that some of the reasons you might want multiple names could be better handled with other mechanisms, e.g. store the old version of the file in something that's a little like the "previous versions" thing, where you can retrieve it if need be.

    That said, I definitely like hearing about wacky ideas. And it's a little bit interesting: a lot of the reasons some people are going to Linux and such is because of Windows's ubiquity, to fight against the Windows "monoculture". But from another point of view, Windows is really the odd one out. What OSes are people using today besides Windows? Linux, OS X, Solaris, ... all the OSes I can think of that I suspect have a noticeable presence in the world have their roots in Unix, except for Windows. And then lots of people say "MS should toss out NT and build a Windows compatibility layer on Unix, the way Apple did." But in some sense, Windows is the lone standout from a Unix monoculture now. And that has problems too, albeit very different ones than the Windows monoculture. One of the problems is reinforcing a sort of Unix orthodoxy. (I think Rob Pike or someone briefly mentioned this in a presentation somewhere.) Out of all of those OSs, how likely is it that they would have done something like Transactional NTFS before MS? I think not at all.

    So don't interpret my earlier comment as "this would never work" so much as "there's a lot of things that someone would have to figure out how to do to make this work". I probably come off a little more opinionated in text than I actually am.

  • I thought I made it clear but I shall try to restate - the system that I was considering is quite obviously utterly incompatible with existing applications and file systems and would only work with an ecosystem of applications designed to support it from the ground up. It needs a mechanism for programs and scripts to communicate and store file identity other than filenames. It needs a mechanism to distinguish replacing the content of a file and creating a new file with the same name. Obviously it is not practical to retrofit this behaviour into a traditional file-system. That doesn't mean it's not useful to consider it as a theoretical way to manage documents.

    I think we are agreed that it looks like such a system would be awkward when we really want to "copy and replace the equivalent documents" or "copy only the documents that don't have equivalents" or some mix of both. I guess I'm just saying that I'd be interested to experiment with such a system to see how painful this is and if there are other ways to resolve those problems than by using filenames as identities. There seems to be some elegance to being able to say that "copying" a document is just that, no more and no less, as opposed to "copying and replacing". Elegance isn't an end to itself, but I find it can mean something is at least worth a second look.

  • I think you missed the point. When you copy a folder containing a file named "recipe.txt" over a folder that already contains a file named "recipe.txt", it would be better if Windows knew whether these files are identical.

    Suppose you are merging folders: you already have images from your camera in the pictures folder on your desktop. Now you are on vacation and download some pictures from your camera to your notebook. When you are back, you copy the pictures folder from your notebook over the pictures folder on your desktop.

    But you forgot that you had already downloaded image "IMG00032.JPG" to your desktop before the vacation and had not deleted it from the camera, so you downloaded the file again to your notebook. Now you see the prompt, with Windows asking you what to do. If Windows already knew, "Hey, these files are the same...", there would be nothing to ask ;-)

    The problem might be that it takes too much time:

    To calculate a checksum, you need to read the entire file. If you then decide you want to copy the file, you need to read it again.
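    As a rough illustration of that cost, here is a minimal Python sketch of an "are these the same?" check. The `same_content` helper is hypothetical, not something Windows does. It assumes both files are readable locally, and it compares bytes directly rather than computing a checksum. A size comparison avoids reading anything when the sizes differ, and the byte comparison stops at the first difference, but two identical files still have to be read in full.

```python
import os

def same_content(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if both files hold identical bytes."""
    # Cheap check first: different sizes mean different content, no reading needed.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False  # stop at the first differing chunk
            if not a:
                return True   # reached the end of both files
```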

  • I posted this reply/suggestion to the original:

    I'm not sure what you really expect Windows (or any other OS; they all act the same in this regard) to do here.

    Why don't you use a decent file manager and have it make the filenames properly unique as it moves them off the camera? For example, have it prefix them with the date & time of when they are being moved. Then they will not clash with existing filenames even if the camera has reset its counter.
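    A minimal sketch of that renaming rule in Python follows. The `prefixed_name` helper and the timestamp format are assumptions made for illustration, not a feature of any particular file manager.

```python
from datetime import datetime
from typing import Optional
import os

def prefixed_name(filename: str, when: Optional[datetime] = None) -> str:
    """Prefix a camera filename with the date and time of the move."""
    when = when or datetime.now()  # default to the moment the file is moved
    return f"{when:%Y%m%d-%H%M%S}_{os.path.basename(filename)}"

print(prefixed_name("IMG00032.JPG", datetime(2010, 5, 1, 12, 30, 0)))
# 20100501-123000_IMG00032.JPG
```

    Two cameras emptied within the same second could still collide under this scheme, so a real tool would need a tie-breaker.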
