Larry Osterman's WebLog

Confessions of an Old Fogey

Why does the NT redirector close file handles when the network connection breaks?


Yesterday, Raymond posted an article about power suspend and its behavior in XP and Vista.  It got almost as many comments as a post in the IE blog does :).

I want to write about one of the comments made to the article (by "Not Amused Again"):

James Schend: Nice one. Do you know why the damn programs have to ask before allowing things to go into suspend mode? It's because MS's freakin networking code is freakin *broken*. Open files are supposed to be reconnected after a suspend, and *they are not*, leading to losses in any open files. (Not that saving the files then allowing the suspend to continue works either, as the wonderful opportunistic file locking junk seems to predictably barf after suspends.)


A long time ago, in a building far, far away, I worked on the first version of the NT network filesystem (a different version was released with Windows 2000).  So I know a fair amount about this particular issue.

The answer to Not Amused Again's complaint is: "Because the alternative is worse".

Unlike some other network architectures (think NFS), CIFS attempts to provide a reliable model for client/server networking.  On a CIFS network, the behavior of network files is as close to the behavior of local files as possible.

That is a good thing, because it means that an application doesn't have to realize that files are opened over the network.  All the filesystem primitives that work locally also work over the network transparently.  That means that the local file sharing and locking rules are applied to files on the network.

The problem is that networks are inherently unreliable.  When someone trips over the connector to the key router between your client and the server, the connection between the two is going to be lost.  The client can reconnect the connection to the network share, but what should be done about the files opened over the network?

There are a couple of criteria that any solution to this problem must have:

First off, the server is OBLIGATED to close the file when the connection with the client is disconnected.  It has no ability to keep the file open for the client.  So any strategy that involves the server keeping the client's state around is a non-starter (otherwise you have a DoS scenario associated with the client). Any recovery strategy has to be done entirely on the client. 

Secondly, it is utterly unacceptable to introduce the possibility of data corruption.  If there is a scenario where reopening the file can result in data corruption, then that scenario can't be allowed.

So let's see if we can figure out the rules for re-opening the file:

First off, what happens if you can't reopen the file?   Maybe you had the file opened in exclusive mode and once the connection was disconnected, someone else got in and opened it exclusively.  How are you going to tell the client that the file open failed?  What happens if someone deleted the file on the share once it was closed?  You can't return file not found, since the file was already opened.

The thing is, it turns out that failing to re-open the file is actually the BEST option you have.  The others are actually even worse than that scenario.


Let's say that you succeed in re-opening the file.  Let's consider some other scenarios:

What happens if you had locks on the file?  Obviously you need to re-apply the locks, that's a no-brainer.  But what happens if they can't be applied?  The other thing to consider about locks is that a client that has a lock open on a region of the file assumes that no other client can write to that region of the file (remember: network files look just like local files).  So they assume that nobody else has changed that region.  But what happens if someone else does change that region?  Now you just introduced a data corruption error by re-opening the file.

This scenario is NOT far-fetched.  It's actually the usage pattern used by most file based database applications (R:Base, D-Base, Microsoft Access, etc).  Modern client/server databases just keep their files open all the time, but non client/server database apps let multiple clients open a single database file and use record locking to ensure that the database integrity is preserved (each client locks a region of the file, alters it, then unlocks it).  Since the server closed the file when the connection was lost, other applications could have come in, locked a region of the file, modified it, then unlocked it.  But YOUR client doesn't know this happened.  It thinks it still has the lock on the region of the file, so it owns the contents of that region.
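To make the lost-lock problem concrete, here is a minimal sketch in Python.  It is a toy in-memory model, not real redirector or SMB code: the `Server` class, the client names "A" and "B", and the four-byte counter record are all assumptions made up for illustration.  It walks through the sequence described above and shows how a hypothetical transparent re-open would silently throw away another client's update.

```python
class Server:
    """Toy in-memory file server (hypothetical): one file plus byte-range locks."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.locks = {}  # (offset, length) -> owner name

    def lock(self, owner, offset, length):
        # Refuse the lock if another owner holds an overlapping range.
        for (o, n), holder in self.locks.items():
            if holder != owner and offset < o + n and o < offset + length:
                return False
        self.locks[(offset, length)] = owner
        return True

    def unlock(self, owner, offset, length):
        if self.locks.get((offset, length)) == owner:
            del self.locks[(offset, length)]

    def disconnect(self, owner):
        # The server keeps no state for a disconnected client: its locks vanish.
        self.locks = {k: v for k, v in self.locks.items() if v != owner}

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload


def lost_update_demo():
    server = Server(b"0100")           # a 4-byte "record" holding the value 100

    # Client A locks the record and reads it, intending to add 1.
    assert server.lock("A", 0, 4)
    a_view = int(server.read(0, 4))

    # The network drops; the server closes A's file and releases its locks.
    server.disconnect("A")

    # Client B locks the record, adds 50, and unlocks it.
    assert server.lock("B", 0, 4)
    b_view = int(server.read(0, 4))
    server.write(0, b"%04d" % (b_view + 50))
    server.unlock("B", 0, 4)

    # Hypothetical transparent re-open: A's lock is re-applied successfully,
    # and A finishes its update using the value it read before the drop.
    assert server.lock("A", 0, 4)
    server.write(0, b"%04d" % (a_view + 1))
    server.unlock("A", 0, 4)

    return int(server.read(0, 4))
```

In this model the final value is 101 rather than 151: every individual operation succeeded, yet B's update is gone.  That is the kind of silent corruption that a successful re-open can produce.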

Ok, so you decide that if the client has a lock on the file, we won't allow them to re-open the file.  Not that huge a restriction, but it means we won't re-open database files over the network.  You just pissed off a bunch of customers who wanted to put their shared database on the server.


Next, what happens if the client had the file opened exclusively?  That means that they know that nobody else in the world has the file open, so they can assume that the file's not been modified by anyone else.  That means that the client can't re-open the file if it's opened in exclusive mode.

Next let's consider the case where the file's not opened exclusively.  Two sharing modes and two access modes matter here: FILE_SHARE_READ and FILE_SHARE_WRITE (FILE_SHARE_DELETE isn't very interesting), and FILE_READ_DATA and FILE_WRITE_DATA.

There are four interesting combinations (the cases with more than one writer collapse into the FILE_SHARE_WRITE case), laid out in the table below.

FILE_READ_DATA
    FILE_SHARE_READ:  This is effectively the same as exclusive mode - nobody else can write to the file, and the client is only reading the file, thus it may cache the contents of the file.
    FILE_SHARE_WRITE: The client is only reading data, and it isn't caching the data being read (because others can write to the file).

FILE_WRITE_DATA
    FILE_SHARE_READ:  This client can write to the file and nobody else can write to it, thus it can cache the contents of the file.
    FILE_SHARE_WRITE: The client is only writing data, and it can't be caching (because others can write to the file).

For FILE_SHARE_READ, others can read the file but nobody else can write to it, so the client can and will cache the contents of the file.  For FILE_SHARE_WRITE, no assumptions can be made by the client, so the client can have no information cached about the file.

So this means that the ONLY circumstance in which it's reliable to re-open the file is when a file has never had any locks taken on it and when it has been opened for FILE_SHARE_WRITE mode.
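The rule above can be written down as a small predicate.  This is a sketch of the reasoning in this post, not the redirector's actual code: the function name `safe_to_reopen` and its `ever_locked` parameter are hypothetical, and only the numeric values of the FILE_SHARE_* flags are taken from the Win32 headers.

```python
# Share-mode flag values as defined in the Win32 headers.
FILE_SHARE_READ = 0x1
FILE_SHARE_WRITE = 0x2
FILE_SHARE_DELETE = 0x4


def safe_to_reopen(share_mode: int, ever_locked: bool) -> bool:
    """Hypothetical check: may a client silently re-open a file after a disconnect?"""
    if ever_locked:
        # The client may believe it still owns a locked region that
        # someone else modified while the connection was down.
        return False
    if not share_mode & FILE_SHARE_WRITE:
        # Exclusive or FILE_SHARE_READ-only opens let the client assume that
        # nobody else writes to the file, so it may hold stale cached data.
        return False
    # Opened with FILE_SHARE_WRITE and never locked: the client made no
    # assumptions about the file's contents, so a re-open can't invalidate any.
    return True
```

For example, `safe_to_reopen(0, False)` (an exclusive open) and `safe_to_reopen(FILE_SHARE_READ, False)` are both false, while `safe_to_reopen(FILE_SHARE_READ | FILE_SHARE_WRITE, False)` is true.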


So the number of scenarios where it's safe to re-open the file is pretty slim.  We spent a long time discussing this back in the NT 3.1 days and eventually decided that it wasn't worth the effort to fix this.

Since we can't re-open the files, the only option is to close the file.

As a point of information, the Lan Manager 2.0 redirector for OS/2 did have such a feature, but we decided that we shouldn't implement it for NT 3.1.  The main reason for this was that the majority of files opened in OS/2 were open for share_write access (it was the default), but for NT, the default is to open files in exclusive mode, so the majority of files can't be reopened.


  • vredir.vxd???  What's that?  Sounds like some windows 95 thingy...

  • > vredir.vxd???  What's that?  Sounds like some windows 95 thingy...

    Yeah-- the Windows 9X redirector for Microsoft File n' Print sharing. I spent many a day troubleshooting problems w/ cruddy "shared file database" applications "corrupting" files due to bugs in various versions of VREDIR.VXD. The trademark of these crappy apps was the instruction to add a value to the LanManServer parameters to disable oplocks.

    You're talking about the NT redirector, of course, but the repressed memories of fighting with these applications that were too low-rent to bother with real client/server database systems came welling back up. The problems are fewer now, since we've got the NT redirector on the desktops today, but I'm still horrified fairly regularly when I find new applications that continue to use such a mediocre and inefficient way to handle storing and sharing databases.
  • Funny (not "ha-ha", but curious), that *nix systems of all kinds have for ages been able to run completely off NFS (with no, absolutely zero, local disk needed), yet Windows has never been able to do this, and later incarnations need one or even two *GIGA*bytes of local storage to even start (I won't even mention how horrible the RAM requirements of Windows have become).
  • Mike,
     Talk to Mark Crispin sometime about trying to run a real-world server off NFS volumes.  It simply can't be done (no locking semantics, no reliable write-through semantics).

     The problem is that there are a huge number of times when it's CRITICAL that a client be able to confirm that a data write has been committed to the hard disk - without it, you can't do database transactions (if you can't be sure the write hit the hard disk, you can't commit the transaction).  NFS doesn't provide that support.  

    Try asking Oracle if they'll support storing the data of an Oracle database on an NFS drive.  They'll laugh you out of the office.

    And I have no idea where you got the idea that Win2K3 requires 2 gigabytes of local storage for F&P.
  • Well my Windows folder on a W2k3 server takes up 1.6 Gb of disc-space... That probably includes a bunch of cab files (drivers etc.) and backups for patches etc., but it still requires > 1Gb of storage... :)

    BUT; who cares? I mean, the cost of storage is so low today that it makes virtually no difference if an OS takes 1, 2 or even 10 Gb of harddisc space.
  • I'm not sure I understand the statement that corrupting customers' data is to be avoided at all costs when oplocks are enabled by default on nt/w2k/xp... servers.  We had a flaky switch which caused innumerable "delayed write failures", i.e. data corruption, since the redirector sends a one byte write to reserve space and sends the bulk of the data at some point in the future.   Once the one byte write completes, the write request is returned with a success error code.

    I agree in principle that network storage should be handled the same as local storage, but the fact is that it's not the same because it's much less reliable.  If you are writing an application that depends on network storage you have to take this into consideration and write code to handle it.  You can't depend on the redirector to do the right thing.  Unfortunately, very few dev groups do this.

    I believe the current behaviour of the redirector will change.  It will have to, in order to be useful in mission critical areas where large data sets need to be stored in a shared area.  Companies that are in this category have lots of money to spend.

    Also, I seem to remember seeing on msdn that access databases on network storage is not a supported configuration.  Doesn't stop everyone from doing it though.

    What does iscsi do when the network goes down?

  • Larry, did you know the phrase "Because the alternative is worse" is copyright Raymond? :)

    [3]... too many links!
  • Why do you need a generation counter if NTFS has a last-modified field? Sure, that doesn't solve the FAT32-server "problem". I don't care. A file server is typically set up as such, and if the documentation clearly says: "To support SMB reconnections, use NTFS", FAT32 is a non-issue.

    The "we can't solve it always, so we never solve it" attitude means 90% of the data loss was preventable.

    I'm not yet convinced a big protocol change is needed. Disconnections are pretty rare. You can take some time to figure it out afterwards; it may involve more than a few extra messages to resync server and client. Up front the only requirement is that both server and client keep their own state, so this can be compared afterwards - no wire protocol involved, I'd guess.
  • The really fun thing is that .NET apps running from a network share pack a major sad if the connection to the server is lost while they're running.

    What's weird is that most of the time they don't even simply abort cleanly, they just go off into la-la land and occasionally mutter about bizarre errors.
  • Why not add a type of lock that persists across disconnects to Windows Vista and Windows Server "Longhorn", so that during suspend and hibernation, just before disconnecting, the client takes this lock and the server tracks it during the disconnection? It will behave like a normal lock. When the client wakes up and reconnects, the client tells the server that it has the lock, and the lock is then converted into a normal lock. If the same client does not tell the server when it reconnects, all such locks are discarded.
  • you are a fuckin loser
  • A bit unclear about your description of NFS here as having "no locking semantics" and "no reliable write-through semantics". NFS has supported  fcntl-locking for a good long time now, and the write semantics have always been absolutely clear: once a write has been acknowledged to the client, it must have been committed to stable storage. Now, you can change the behaviour of the server if you prefer high performance to keeping all your data, but I'd advise against that!

    Which is not to say that there aren't plenty of problems with NFS. In particular, the fact that it is (unlike CIFS, from your description) designed to be "as close to the behaviour of local files as possible" means that when the network goes away, applications which are doing IO to network filesystems must block until the network comes back. And requiring all writes to be committed to disk before they are acknowledged means that NFS is slow. But these are simply consequences of the stated requirement: unchanged applications must run reliably on the network filesystem.

    (You are also, by the way, correct to say that Mark Crispin has strong views on this as on so many things.)
  • Chris,
     Everything I've heard about NFS is that NFS implements file record locking semantics as advisory - if you flock() a region of the file for write access, someone can still write to that region if they are running in another process using a different handle.

    This is why the documentation for flock explicitly states that it doesn't work on NFS.  Having said that, that referenced documentation is clearly broken; the comment that flock isn't supported on Win9x is just silly.

    Advisory record locks make it quite difficult to implement a reliable flat file shared database, since it means that you can't ensure that the database file isn't corrupted.  Instead you have to trust that everyone follows the same rules.
  • It's certainly true that UNIX file locking is advisory (there is an implementation of mandatory locking on many UNIX systems but it's not typically used). It's correct that you have to "trust that everyone follows the same rules" in respect of file locking, but that's not a big problem with a shared database file, since you already have to trust that they follow the same rules in respect of the format of the data in the file too. Typically you have a single implementation of the database library or whatever, in which case it makes no difference that the locks are advisory: so long as the library always acquires a lock at the appropriate points, it doesn't matter whether it *could* actually do IO without having done so.

    It is also true (per the PHP documentation you quote) that flock(2) doesn't lock files across NFS shares in general; only the fcntl(2) call with command F_SETLK ("an fcntl lock") has this effect. (Exception: some systems emulate flock(2) with fcntl locks, and indeed PHP's flock function is itself implemented in terms of fcntl locks on systems without an flock call.) This is an irritating historical issue, and it's not often documented as well as it should be (note, for instance, that flock and fcntl locks have rather different semantics) but it's not relevant to the question of whether file locking is available on NFS.