Starting with Windows XP and Windows Server 2003, we have a new shadow copy feature built into the operating system. In my previous posts, I already mentioned some of the high-level scenarios that this feature enables. One example is backup, and the ability to avoid the "open files" problem. Another example is the Shadow Copies for Shared Folders feature, also known as "Previous Versions". But I haven't mentioned how shadow copies are actually implemented.
First, I would like to point out that the VSS (Volume Shadow Copy Service) infrastructure allows third-party shadow copy implementations to be installed in the OS and makes them accessible through a common API. We won't describe this general infrastructure here. For now, we just note that the built-in shadow copy implementation is generally used if no other implementations are present in the system. Only this built-in implementation is the subject of the article below.
The main idea behind shadow copies is to have a static, immutable view of a certain file or directory. For example, no matter what you do with your Word document (overwriting it or even deleting it), you still have a stable image of this file as it was at, say, 12.00 PM. So if you seriously screw up the current document, you can quickly recover its contents by right-clicking on it in Explorer and selecting the "Previous Versions" tab. (Note, however, that you must access this file through a share with Shadow Copies for Shared Folders (SCSF) enabled; otherwise the Previous Versions tab won't be available.)
Under the covers, this versioning functionality is not implemented by copying the entire file each time the system creates a previous version. Instead, we use a copy-on-write approach that deals only with the fragments of data that changed. The unchanged data is internally shared between the original file and the previous version.
Now here comes the interesting stuff. The copy-on-write functionality is not implemented at the file system level. It is actually implemented below the file system, at the volume level, where you deal only with equal-size blocks (or sectors) of raw data. It would be much more complicated to implement a versioning scheme above the file system level, because you would need to keep track of all the file system metadata changes, for example. Being below the file system, you only need to deal with block/sector-level changes, and therefore you end up with a simpler implementation.
VOLSNAP.SYS just treats the volume contents as a uniform array of 16K blocks at the sector level (not the NTFS cluster level). From the NTFS point of view, this array of course includes the high-level NTFS data structures and the NTFS metadata such as $MFT, $Secure, etc. VOLSNAP doesn't need to know where $MFT is located, for example. Remember that the size of an NTFS cluster (usually 4K) is generally different from the size of a VOLSNAP block (16K), which in turn is different from the disk sector size (512 bytes). For example, one VOLSNAP block (16K) contains exactly 32 sectors of 512 bytes each.
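The size relationships above can be checked with a little arithmetic. The following is only a sketch using the sizes quoted in the text; the constant and function names are made up for illustration and are not part of VOLSNAP or any Windows API.

```python
# Sizes quoted in the text; all names below are hypothetical, for illustration only.
SECTOR_SIZE = 512          # bytes per disk sector
CLUSTER_SIZE = 4 * 1024    # typical NTFS cluster size
BLOCK_SIZE = 16 * 1024     # VOLSNAP block size

SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE      # sectors covered by one block
CLUSTERS_PER_BLOCK = BLOCK_SIZE // CLUSTER_SIZE    # 4K clusters covered by one block


def block_of_sector(sector: int) -> int:
    """Return the index of the 16K block that contains the given sector."""
    return sector // SECTORS_PER_BLOCK


def sectors_of_block(block: int) -> range:
    """Return the range of sector numbers covered by the given block."""
    first = block * SECTORS_PER_BLOCK
    return range(first, first + SECTORS_PER_BLOCK)
```

With these numbers, sectors 0 through 31 fall in block 0 and sector 32 is the first sector of block 1. Notice that nothing here refers to files, clusters in use, or $MFT: the mapping depends only on the sector number.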
To better understand the copy-on-write algorithm, let's walk through a practical scenario:
A read on the shadow copy works in the following way. Let's assume that the user reads the file above. VOLSNAP intercepts the read I/O as a sequence of reads for the sectors contained in blocks A through E. For the first blocks (A, B, C), VOLSNAP sends back the saved versions of these sectors as they were at 12.00 AM. For the remaining blocks (D, E), which weren't changed in the meantime, VOLSNAP sends back the current contents from the original volume.
In the end, the file system on top of the shadow copy device receives an exact copy of these sectors as they were at the time of shadow copy creation (12.00 AM). So the user sees an exact copy of the file as it was at 12.00 AM.
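To make the read path concrete, here is a toy in-memory model of block-level copy-on-write. It is a minimal sketch, not how VOLSNAP.SYS is actually written: the class, its methods, and the `diff_area` dictionary are all hypothetical names, and the write path assumes that the old contents of a block are saved just before the first overwrite that follows shadow copy creation.

```python
class ToySnapshotVolume:
    """Toy model of a volume with a single copy-on-write shadow copy.

    Hypothetical illustration only; not the real VOLSNAP data structures.
    """

    def __init__(self, blocks):
        self.blocks = list(blocks)    # current contents of the original volume
        self.diff_area = {}           # block index -> contents at snapshot time
        self.snapshot_active = False

    def create_snapshot(self):
        # Nothing is copied at creation time; blocks are saved lazily on write.
        self.diff_area = {}
        self.snapshot_active = True

    def write_block(self, index, data):
        # Assumed copy-on-write step: preserve the old contents once,
        # before the first overwrite after the snapshot was taken.
        if self.snapshot_active and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_block(self, index):
        # Read from the live (original) volume.
        return self.blocks[index]

    def read_snapshot_block(self, index):
        # Changed blocks come from the saved copies; unchanged blocks
        # are read straight from the original volume.
        if not self.snapshot_active:
            raise RuntimeError("no shadow copy exists")
        return self.diff_area.get(index, self.blocks[index])


# A file occupying blocks A..E (indices 0..4); A, B and C change after the snapshot.
vol = ToySnapshotVolume([b"A0", b"B0", b"C0", b"D0", b"E0"])
vol.create_snapshot()
for i, new_data in enumerate([b"A1", b"B1", b"C1"]):
    vol.write_block(i, new_data)

snapshot_view = [vol.read_snapshot_block(i) for i in range(5)]
current_view = [vol.read_block(i) for i in range(5)]
```

In this model only three blocks end up in `diff_area`, while blocks D and E are shared between the live volume and the shadow copy, which mirrors the sharing of unchanged data described earlier.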