Recently, I was working with a program that needs to cache the content of a network file on the local disk for faster, easier access. The network file is expected to change periodically according to an unpredictable schedule; the objective is to keep the local file in "close" sync with the online copy without a lot of cost or complex protocols.

The network content is static (when it's not changing!), so hosting it on a web server and accessing it via HTTP seems like a reasonable start. Implementation-wise, it turns out the file is needed immediately when accessed, so downloading it on-demand is not an option due to the possibility of lengthy network delays or connectivity issues. Fortunately, it's okay to use an out-of-date version as long as there's a good chance the next access will use the latest one. So a reasonable approach seems to be to wait for the local file to be needed, use it immediately in its current state, then update it to the latest version by asynchronously downloading the network file and replacing the local copy in the background.

IfModifiedSinceWebClientDemo sample

That approach satisfies the requirements of the scenario, but it's a little wasteful because the local file is likely to be used much more frequently than the online version gets updated - which means most of the downloads will end up being the same version that's already cached locally! Fortunately, we can improve things easily: the HTTP specification defines the If-Modified-Since header for exactly this purpose! By including that header with the HTTP request, the server "knows" whether the local file is out of date. If so, it returns the data for the network file as usual - but if the network file has not changed more recently, the web server returns a 304 Not Modified result and no content. This "short circuiting" of the HTTP response eliminates the need to send redundant data and reduces the network traffic to a single, short HTTP request/response pair.

 

When implementing a solution like this with the .NET Framework, two approaches spring to mind. The first is to use the low-level HttpWebRequest/HttpWebResponse classes and manage the entire operation directly. HttpWebRequest has an IfModifiedSince property that can be used to set the relevant HTTP header (and format it correctly), so this approach is straightforward and quite flexibile. However, it also requires the caller to manage the transfer of bits from the source to the destination (including reading, writing, buffering, etc.), and that's not really code we want to write. The second approach is to use something like the higher-level WebClient class's DownloadFileAsync method to do the entire download and call us back when everything has been taken care of. That seems preferable, so lets go ahead and set the WebClient.IfModifiedSince property and... umm... wait... WebClient doesn't have an IfModifiedSince property! And not only doesn't the property exist, you're not allowed to set it manually via the Headers property: "In addition, some other headers are also restricted when using a WebClient object. These restricted headers include, but are not limited to the following: ... If-Modified-Since ...".

Darn, I really wanted to use WebClient and avoid having to encode the If-Modified-Since header myself. If only it were possible to tweak the way WebClient initializes its underlying HttpWebRequest, we'd be set... Hey, what about the WebClient.GetWebRequest method? Isn't this exactly what it's for? Yes, it is! :)

 

To make this all work, I created IfModifiedSinceWebClient which is a WebClient subclass that adds an IfModifiedSince property and overrides GetWebRequest to set that DateTime value onto the underlying HttpWebRequest. Unfortunately, there are two issues: the destination file gets deleted before the download starts (so it ends up being 0 bytes when HTTP 304 is returned) and HTTP 304 is defined as a failure code, so WebClient thinks a successful (NOOP) download has failed. To address both issues and offer a seamless experience, IfModifiedSinceWebClient exposes a custom UpdateFileIfNewer method that's asynchronous (i.e., "fire and forget") and simple. Just pass it the path to a local file to create/update and a URI for the remote file. (You can optionally pass a "completed" method to be called with the result of the asynchronous update.) UpdateFileIfNewer sets the If-Modified-Since header and initiates a call to DownloadFileAsync, providing a temporary file path. If the remote file is not newer than the local one, no transfer occurs and the UpToDate result is passed to the completion method. If the remote file is newer (or the server doesn't support If-Modified-Since), the local file will be replaced with the just-downloaded copy and the Updated result will be returned. And if something goes wrong, the local file is left alone and the Error result is used.

 

Here's what a typical call looks like:

private void MyMethod()
{
    // ...

    var localFile = "LocalFile.txt";
    var uri = new Uri("http://example.com/NetworkFile.txt");
    IfModifiedSinceWebClient.UpdateFileIfNewer(localFile, uri, UpdateFileIfNewerCompleted);
    // Note: Download occurs asynchronously

    // ...
}

private void UpdateFileIfNewerCompleted(IfModifiedSinceWebClient.UpdateFileIfNewerResult result)
{
    switch (result)
    {
        case IfModifiedSinceWebClient.UpdateFileIfNewerResult.Updated:
            // ...
            break;
        case IfModifiedSinceWebClient.UpdateFileIfNewerResult.UpToDate:
            // ...
            break;
        case IfModifiedSinceWebClient.UpdateFileIfNewerResult.Error:
            // ...
            break;
    }
}

 

[Click here to download the source code for IfModifiedSinceWebClient and the sample application shown at the start of the post.]

(Don't forget to update the sample's localhost test URI to a valid URI for your environment.)

 

IfModifiedSinceWebClient is a simple subclass that adds If-Modified-Since functionality to the .NET Framework's WebClient. But that's only half the battle - the UpdateFileIfNewer method makes the "asynchronously update a file if necessary" scenario work by building on that with a temporary file, error detection, and result codes. The result is a seamless, unobtrusive way for an application to keep itself up to date with dynamically changing online content without incurring unnecessary network overhead. Although there are other, more sophisticated solutions to this problem, it's hard to beat the simplicity and compactness of IfModifiedSinceWebClient!

Aside: For bonus points, IfModifiedSinceWebClient could also set the local file's "last modified" time to the Last-Modified value returned by the server. I haven't done so in the sample because it doesn't seem like the subtle time skew (between the client and server clocks) will matter in most cases. However, I reserve the right to change my mind if practical experience contradicts me. :)