Larry Osterman's WebLog

Confessions of an Old Fogey

Why does the NT redirector close file handles when the network connection breaks?

Yesterday, Raymond posted an article about power suspend and its behavior in XP and Vista.  It got almost as many comments as a post in the IE blog does :).

I want to write about one of the comments made to the article (by "Not Amused Again"):

James Schend: Nice one. Do you know why the damn programs have to ask before allowing things to go into suspend mode? It's because MS's freakin networking code is freakin *broken*. Open files are supposed to be reconnected after a suspend, and *they are not*, leading to losses in any open files. (Not that saving the files then allowing the suspend to continue works either, as the wonderful opportunistic file locking junk seems to predictably barf after suspends.)

 

A long time ago, in a building far, far away, I worked on the first version of the NT network filesystem (a different version was released with Windows 2000).  So I know a fair amount about this particular issue.

The answer to Not Amused Again's complaint is: "Because the alternative is worse".

Unlike some other network architectures (think NFS), CIFS attempts to provide a reliable model for client/server networking.  On a CIFS network, the behavior of network files is as close to the behavior of local files as possible.

That is a good thing, because it means that an application doesn't have to know that its files are opened over the network.  All the filesystem primitives that work locally also work over the network transparently.  That means that the local file sharing and locking rules are applied to files on the network.
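
To make that concrete, here's a minimal sketch (the paths and buffer size are placeholders): the same Win32 calls, with the same sharing semantics, work whether the file is local or reached through a UNC path on a CIFS share.

    #include <windows.h>
    #include <stdio.h>

    /* Minimal sketch - the paths below are placeholders.  The point is that the
       identical Win32 calls (and the identical sharing rules) apply to a local
       file and to a file reached through a UNC path on a CIFS share. */
    int main(void)
    {
        const char *paths[] = {
            "C:\\data\\report.txt",          /* local file              */
            "\\\\server\\share\\report.txt"  /* the same API, over CIFS */
        };

        for (int i = 0; i < 2; i++) {
            /* Deny other writers while we read; the redirector forwards this
               sharing mode to the server, so it's enforced remotely too. */
            HANDLE h = CreateFileA(paths[i], GENERIC_READ, FILE_SHARE_READ,
                                   NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
            if (h == INVALID_HANDLE_VALUE) {
                printf("open of %s failed: %lu\n", paths[i], GetLastError());
                continue;
            }

            char buffer[512];
            DWORD bytesRead = 0;
            if (ReadFile(h, buffer, sizeof(buffer), &bytesRead, NULL))
                printf("%s: read %lu bytes\n", paths[i], bytesRead);

            CloseHandle(h);
        }
        return 0;
    }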

The problem is that networks are inherently unreliable.  When someone trips over the connector to the key router between your client and the server, the connection between the two is going to be lost.  The client can re-establish the connection to the network share, but what should be done about the files that were opened over the network?

There are a couple of criteria that any solution to this problem must meet:

First off, the server is OBLIGATED to close the file when its connection with the client is lost.  It has no ability to keep the file open for the client.  So any strategy that involves the server keeping the client's state around is a non-starter (otherwise you have a DoS scenario associated with the client).  Any recovery strategy has to be done entirely on the client.

Secondly, it is utterly unacceptable to introduce the possibility of data corruption.  If reopening the file could result in data corruption, that scenario can't be allowed.

So let's see if we can figure out the rules for re-opening the file:

First off, what happens if you can't reopen the file?   Maybe you had the file opened in exclusive mode and once the connection was disconnected, someone else got in and opened it exclusively.  How are you going to tell the client that the file open failed?  What happens if someone deleted the file on the share once it was closed?  You can't return file not found, since the file was already opened.

The thing is, it turns out that failing to re-open the file is actually the BEST option you have.  The other options are even worse.

 

So let's say that you succeed in re-opening the file, and consider some of the scenarios that follow:

What happens if you had locks on the file?  Obviously you need to re-apply the locks, that's a no-brainer.  But what happens if they can't be re-applied?  The other thing to consider about locks is that a client that holds a lock on a region of the file assumes that no other client can write to that region of the file (remember: network files look just like local files).  So it assumes that nobody else has changed that region.  But what happens if someone else does change that region?  Now you've just introduced a data corruption error by re-opening the file.

This scenario is NOT far-fetched.  It's actually the usage pattern used by most file-based database applications (R:Base, dBASE, Microsoft Access, etc).  Modern client/server databases just keep their files open all the time, but non-client/server database apps let multiple clients open a single database file and use record locking to ensure that the database integrity is preserved (the applications lock a region of the file, alter it, then unlock it).  Since the server closed the file when the connection was lost, other applications could have come in, locked a region of the file, modified it, then unlocked it.  But YOUR client doesn't know this happened.  It thinks it still has the lock on the region of the file, so it owns the contents of that region.
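
For illustration, here's a rough sketch of that lock/alter/unlock pattern (not any particular database's code - the helper name, offsets, and record layout are invented, and it assumes a handle opened for synchronous I/O):

    #include <windows.h>

    /* Sketch of the record-locking pattern used by file-based databases:
       lock the byte range holding a record, rewrite it, unlock it - and
       assume nobody else touched that range while the lock was held. */
    BOOL UpdateRecord(HANDLE hFile, DWORD recordOffset,
                      const void *record, DWORD recordSize)
    {
        OVERLAPPED ov = {0};
        ov.Offset = recordOffset;            /* start of the byte range to lock */

        /* Take an exclusive byte-range lock on just this record. */
        if (!LockFileEx(hFile, LOCKFILE_EXCLUSIVE_LOCK, 0, recordSize, 0, &ov))
            return FALSE;

        /* While the lock is held, the client assumes nobody else - local or
           remote - can modify this range of the file. */
        DWORD written = 0;
        BOOL ok = SetFilePointer(hFile, recordOffset, NULL, FILE_BEGIN)
                      != INVALID_SET_FILE_POINTER
               && WriteFile(hFile, record, recordSize, &written, NULL)
               && written == recordSize;

        /* Release the range whether or not the write succeeded. */
        UnlockFileEx(hFile, 0, recordSize, 0, &ov);
        return ok;
    }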

Ok, so you decide that if the client has a lock on the file, we won't allow them to re-open the file.  Not that huge a restriction, but it means we won't re-open database files over the network.  You just pissed off a bunch of customers who wanted to put their shared database on the server.

 

Next, what happens if the client had the file opened exclusively?  That means that it knows that nobody else in the world has the file open, so it can assume that the file hasn't been modified by anyone else (and it may have cached data based on that assumption).  But while the connection was broken, the server closed the file and someone else could have opened and modified it, so the client can't re-open a file that was opened in exclusive mode.

Next let's consider the case where the file's not opened exclusively: There are four cases of interest, involving two sharing modes and two access rights: the sharing modes FILE_SHARE_READ and FILE_SHARE_WRITE (FILE_SHARE_DELETE isn't very interesting), and the access rights FILE_READ_DATA and FILE_WRITE_DATA.

There are four interesting combinations (a share mode that includes FILE_SHARE_WRITE collapses into the FILE_SHARE_WRITE case), laid out below.

FILE_READ_DATA + FILE_SHARE_READ: This is effectively the same as exclusive mode - nobody else can write to the file, and the client is only reading the file, thus it may cache the contents of the file.

FILE_READ_DATA + FILE_SHARE_WRITE: The client is only reading data, and it isn't caching the data being read (because others can write to the file).

FILE_WRITE_DATA + FILE_SHARE_READ: This client can write to the file and nobody else can write to it, thus it can cache the contents of the file.

FILE_WRITE_DATA + FILE_SHARE_WRITE: The client is only writing data, and it can't be caching (because others can write to the file).

For FILE_SHARE_READ, others can read the file but nobody else can write to it, so the client can (and will) cache the contents of the file.  For FILE_SHARE_WRITE, the client can make no such assumptions, so it can have no information cached about the file.

So this means that the ONLY circumstance in which it's reliable to re-open the file is when the file has never had any locks taken on it and it has been opened with FILE_SHARE_WRITE (that is, allowing others to write).
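
In Win32 terms, that open looks roughly like the sketch below (the path is a placeholder): the share mode lets other writers in, so the client caches nothing, and no byte-range locks are ever taken on the handle.

    #include <windows.h>

    /* Sketch of the one combination that could safely be re-opened: the handle
       allows other writers (FILE_SHARE_WRITE), so the client can't cache
       anything, and no LockFile/LockFileEx calls are ever made on it. */
    HANDLE OpenWithNoCachingAssumptions(void)
    {
        return CreateFileA("\\\\server\\share\\shared.log",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }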

 

So the number of scenarios where it's safe to re-open the file is pretty slim.  We spent a long time discussing this back in the NT 3.1 days and eventually decided that it wasn't worth the effort to fix this.

Since we can't re-open the files, the only option is to close the file.

As a point of information, the LAN Manager 2.0 redirector for OS/2 did have such a feature, but we decided that we shouldn't implement it for NT 3.1.  The main reason was that the majority of files opened on OS/2 were opened for SHARE_WRITE access (it was the default), while on NT the default is to open files in exclusive mode, so the majority of files can't be re-opened.

 

  • Considering the evolution of Windows--from standalone PCs with local files to networks with remote files and the complication of sleep/suspend--it sure seemed right for the OS to paper over the differences as much as possible. Long term, though, it seems to make work harder for app developers.

    I guess that's why the Internet is largely a stateless place, or the state is explicitly held by the clients. There's more work up front, you have to face the issues immediately in the design. With transparent access at the app level, it's easy to ignore those problem scenarios because they're relatively rare.
  • Good point Dave.  NFS is also stateless, which has its own set of horrible issues (try writing a reliable database that runs over NFS, it can't be done).  

    For grins, search for "IMAP NFS Crispin" to see some of Mark's comments about people who try to store their IMAP data stores on NFS volumes - without reliable file locking, it's impossible to do a remote database without corruption.

    That's why client/server is such a powerful paradigm, because it allows you to take a networked problem and turn it into a local problem.

    And I disagree that it pushes the problem to the app developers.  If you don't make F&P network access seamless to the app author, it dramatically reduces the value of the network.  I would have a significantly harder time if I couldn't run programs over the network, or copy files from the shell/command prompt.
  • Of course, this practice of pretending that remote files are local files has drawbacks of its own.

    The most annoying one is that applications with few exceptions assume that file operations will complete quickly and so complete them in the UI thread. Windows Explorer even does this frequently. As we all know, I/O operations on network-mounted filesystems in normal circumstances can block for quite a while if a connection needs to be re-established, and if the remote server has gone away completely that read operation might well block for ten seconds or more.

    This is most annoying in applications which, when they get focus, interrogate the active document on disk to see if another application has changed it in the mean time. If the filesystem has since gone away, the call blocks and the application appears to hang. I see this at least once a month at work when the admin reboots the file server to apply patches.

    There is an important distinction between local file operations and remote file operations, and while in ideal cases it's nice to pretend they're the same thing, in practice it just seems to lead to trouble. The lesson for all developers reading this is that you should perform all I/O in a separate, non-UI thread, even if you think it's "just" file I/O! It'll make life happier for people using slow USB Mass Storage devices and floppity disks, too.
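
    For illustration, a rough sketch of that advice (the window message, structure, and helper names are invented): hand the potentially slow file check to a worker thread and post the result back, so the UI thread never blocks on the network.

    #include <windows.h>

    #define WM_APP_FILE_CHECKED (WM_APP + 1)   /* hypothetical app-defined message */

    typedef struct {
        HWND hwndNotify;
        char path[MAX_PATH];
    } CheckRequest;

    static DWORD WINAPI CheckFileThread(LPVOID param)
    {
        CheckRequest *req = (CheckRequest *)param;

        /* This call may block for many seconds if the server has gone away -
           which is exactly why it doesn't belong on the UI thread. */
        DWORD attrs = GetFileAttributesA(req->path);

        PostMessageA(req->hwndNotify, WM_APP_FILE_CHECKED,
                     (WPARAM)(attrs != INVALID_FILE_ATTRIBUTES), 0);
        HeapFree(GetProcessHeap(), 0, req);
        return 0;
    }

    /* Called from the UI thread, e.g. when the app regains focus. */
    void BeginFileCheck(HWND hwnd, const char *path)
    {
        CheckRequest *req = (CheckRequest *)HeapAlloc(GetProcessHeap(), 0, sizeof(*req));
        if (!req) return;
        req->hwndNotify = hwnd;
        lstrcpynA(req->path, path, MAX_PATH);

        HANDLE hThread = CreateThread(NULL, 0, CheckFileThread, req, 0, NULL);
        if (hThread) CloseHandle(hThread);               /* fire and forget */
        else         HeapFree(GetProcessHeap(), 0, req); /* couldn't start  */
    }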
  • Most of the cases you described are problematic because files may change while disconnected.  If the server stored an ID that changed when a file is modified, and the client stored the ID that applied to the file before the disconnect, then the client could detect if it's safe to silently reconnect.

    There are scenarios when the client would still break, e.g. if the client is using a lock on a file to coordinate something else, so a temporarily unlocked file would be a problem even if the file is unmodified, but those programs will probably break anyway.
  • > and use record locking to ensure that the database integrity is
    > preserved (the files lock a region of the file, alter it, then unlock
    > it).

    So what happens if the client sends 4 SMB packets (lock, write, write, unlock -- the client did 2 write operations to update the record), and the connection dies between the two "write" packets?  Unless SMB is transactional *and* treats the entire lock-write-write-unlock sequence as a single transaction (I have no idea if it does or not; I doubt it though), then not only will the file not be in the assumed state, it won't even be *valid* anymore.  Even if the server closes the file and thereby invalidates all the client locks, the file will still have corrupt data in it, because the first write succeeded but the second failed.

    And telling the client about this does no good either, because the client can't fix the file's corrupt data (the connection is gone).

    So much for "making network file access look exactly like local file access".  You can't have your file closed on you in between two writes (especially if it's locked) when it's local.

    (This also probably explains part of why Access uses .ldb files, actually.  If the .ldb file is there but has no locks on it, then the database is considered corrupt and needs to be "repaired".  Normally the last Access process to have the .mdb open will delete the .ldb file when the .mdb file gets closed, or at least remove the last lock on the file.)
  • Ben, you're not pretending that remote files are local files.  You're asserting that applications can make the same assumptions about the reliability of remote files that they can about local files.  So an app that is developed against a local file will continue to work remotely.

    Any application that assumes that I/O to a local file will complete quickly is making an invalid assumption - you can't assume the local media is quick (think removable media (floppy or cdrom drives)).  

    Aryeh, you're sort-of right, but that would require modifications across the entire stack, from the filesystems up through the server (what happens when a local user changes the file contents?).  Also, how do you ensure that the value is kept in sync?  I guess you could use a change # (or write count or last write time), but what would cause the change # to be updated?  If the modified page writer flushes a previously modified page to disk, does that change the write count?

    The other problem is that this value must be persisted (maybe the reason for the failure was a server reboot), but some filesystems don't permit the persistence of arbitrary metadata (it might be a legacy filesystem like CDFS or FAT, for example).  So now the set of cases where it's possible to re-open the file is even further reduced.

    Solving this problem correctly is VERY hard, and the cost of not solving it correctly is a really subtle data corruption bug, which isn't good.
  • Larry,
    That was my point, essentially. Any program that assumes file I/O is "quick" (for some value of quick) and does it in a UI thread is going to suck on anything but hard disks. Including Explorer.

    Clearly application developers don't get it, so another alternative is to force them to care by making them acknowledge the problem somehow. One possibility that springs to mind is to force the apps to use async I/O, but that would be a pain for anyone who knows how to write a multithreaded app and wants to handle the asynchronicity themselves.
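
    A rough sketch of the async-I/O alternative (the path and buffer size are placeholders): open the file with FILE_FLAG_OVERLAPPED and let the read complete in the background while the app keeps working.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* FILE_FLAG_OVERLAPPED means ReadFile can return before the data arrives. */
        HANDLE h = CreateFileA("\\\\server\\share\\data.bin", GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                               OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        char buffer[4096];
        OVERLAPPED ov = {0};
        ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);
        if (!ov.hEvent) { CloseHandle(h); return 1; }

        if (!ReadFile(h, buffer, sizeof(buffer), NULL, &ov) &&
            GetLastError() != ERROR_IO_PENDING) {
            printf("read failed: %lu\n", GetLastError());
        } else {
            /* ... do other work here; the UI would stay responsive ... */
            DWORD bytesRead = 0;
            GetOverlappedResult(h, &ov, &bytesRead, TRUE);  /* wait for completion */
            printf("read %lu bytes\n", bytesRead);
        }

        CloseHandle(ov.hEvent);
        CloseHandle(h);
        return 0;
    }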

  • "Secondly, it is utterly unacceptable to introduce the possibility of data corruption."

    Larry,

    Thanks for covering this (I'm the NotAmused who started the topic); I do see your point.  My own apps now simply blanket-respond to the WM_POWER... messages with a "no way".

    Care to cover ISAM OpLocks problems...?
  • NAA, I'm not sure why FS oplocks are broken after suspend/resume, I never worked on the local filesystems, so I'm sorry, I can't :(

  • I understand that there is no clear-cut answer in many of these cases, but haven't they already been handled?

    When you open a file which is cached for off-line access and then reconnect to the server, how is that any different in terms of file integrity?

    I would really like to see this fixed in harmless cases (read-only files, opened with write sharing, etc.)
  • Gabe, it's not.  But the rules for CSC are pretty strict: the newer copy wins.  And there's a manual sync step going on with CSC, so it's possible to detect conflicts, resolve them, etc.

    When you're dealing with a live file, you don't have the luxury of resolving conflicts, you need to make a real-time binary decision: re-open or no.

    Oh, and read-only files are NOT a harmless case.  What happens if the file's marked read/write, modified, then marked read-only during the interim?  Remember, the cost of getting this wrong is data corruption.  Realistically, you could enable an optional behavior to allow reopening of read-only files but it would have to be off by default.


  • It seems that you ignore what SHOULD BE the most usual (and the only no-changes needed recoverable) case, simply because it was the only one... not because it was hard, or dangerous.  This irks me because this IS the most common case for networked files (read only, deny none).

    Even the presence of locks could have been allowed if the server kept a generation counter on every open-for-write, lock grant, or write request processed at the server. When the client received the initial lock success, it should remember that generation count. When reconnecting and reapplying locks, it would fail the reopen if the generation count coming back from the RElock request was not exactly what it expected (meaning other locks were granted or writes occurred, so the server cannot guarantee that state is good).  The generation count would not matter until a reopen or relock request.  Some optimizations on reopen can be done based on the original open mode (e.g. the server doesn't need to bump the generation count for writes on a SHARE_DENY_WRITE file, since only the client with it open could be doing the writes). Finally, the generation count doesn't even have to be on-disk, since no reconnects should be legal if the server went down with open client handles.

    Simply put, these issues would allow MANY other single-writer access recoveries to happen, which is the most common situation.

    In ANY form of unsuccessful reconnect (a failed reopen or relock), the client will be notified the next time it talks to the file; but this is NO DIFFERENT than what you have to do if the server goes down.
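
    A rough sketch of how that generation-counter check might look (nothing like this exists in the actual protocol; the types and names here are invented):

    /* Server side: a per-file counter, bumped on every mutating operation
       (open-for-write, lock grant, write). */
    typedef struct {
        unsigned long long generation;
    } ServerFileState;

    void Server_OnMutatingOperation(ServerFileState *file)
    {
        file->generation++;
    }

    /* Client side: remember the generation seen when the lock was granted,
       and refuse a silent reconnect if it has changed. */
    typedef struct {
        unsigned long long generationAtLock;
    } ClientHandleState;

    int Client_CanReconnect(const ClientHandleState *handle,
                            unsigned long long generationFromServer)
    {
        /* Any other lock grant or write in the interim means the cached view
           of the locked region can no longer be trusted: fail the reconnect. */
        return generationFromServer == handle->generationAtLock;
    }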
  • Marc,
     When the cost of failure is data corruption, you can't design features around the most common case; you need to design around the most reliable case.  Yes, it would be possible to kludge something together that worked most of the time.  But not something that was reliable to the point that people would put their mission-critical data on the server.

     There ARE some solutions for the read-only, deny-write case (which happens to be the "reading a program from a network share" case), and we strongly considered adding one.

     The lock problem is intractable (you can't solve it in the server, because you have to deal with local access to the file, which doesn't go through the server), but the other problems can be solved with sufficient plumbing (they require protocol changes, server changes, filesystem changes, etc, but they are possible).

     I made a conscious decision NOT to cover all the potential issues and solutions, figuring that what I'd posted was sufficient, but trust me, this is a harder problem than even what I've described.

     At the end of the day, we decided that the cost associated with such a feature wasn't worth the benefit it would bring - the reality is that applications need to deal with failing file accesses on local files too, so they're already going to have code to handle the case of a read failing; the only difference is how often it happens.

     By pushing the problem to the application, we ensure that the application will discard any cached data, which means that their data integrity will be maintained - we can't know what data they've cached, but the app does.

     There's a different team that owns the NT redirector now, and it's been 15+ years since that design decision was made; it's possible they might consider making a different decision, but maybe not.
  • > You just pissed off a bunch of customers who wanted to put their shared database on the server.

    Based on the years of problems I've had to fight as a contract IT support tech with lousy applications that use shared-file "databases", I really wish you guys had pissed off some Customers...

    ("Now we have to 're-index' the 'database' because it is 'corrupt'..."   "Do you have the right version of VREDIR.VXD?"  "Have you disabled OpLocks on your NT server?"  *sigh*)
  • I feel (just feel, no deep thinking performed) that maybe we can use the approach of a version control system.

    Once a file is opened over the network, a copy of the file is saved to the client's local disk and any read/write is directed to it.  The file content is synced only when the file is closed or flushed.

    On the remote side, the file is set read-only (i.e. exclusive write).  A timestamp is set on both sides to mark the expiration time.  If the expiration time comes and the file is not closed, it is forced closed.

    On the local side, the timestamp is checked on each write.  If expired, the write fails and the file is marked closed (as if some external program forced the file handle closed).  If the file closes this way, a last attempt to sync the file is made (the file on disk only; content not flushed is not synced).  If the remote side can't be updated, the content is rolled back to the last synced state.

    With buffered read/write and a "checkout" system, it seems that a workable system that can reopen files over the network could be made.  Of course the overhead of the disk cache can be large, but the size of the temp disk cache could be set by users just like the virtual memory pagefile.

    The other thought is: what if we go the other way, and make all programs think they are working on network files?