In this article, we will discuss a sample CLR function that illustrates accessing an external resource such as a file from inside a SQL Server stored procedure or function, . The sample highlights a number of different features and concepts, such as file-access, table-valued functions and impersonation.

Let us review first how to implement a table-valued function. If you have been playing with table-valued functions (TVFs) in SQL Server 2005 beta 2, you will notice that the contract is much simplified in the April CTP of SQL Server 2005. The basic requirements are as follows:

The TVF should be mapped to a static method in the CLR (let us call it the root method) that returns either an IEnumerator or an IEnumerable interface. The IEnumerable interface is supported mainly to support scenarios where you want to crack collections (such as arrays, lists) into tabular result sets. It is very straightforward to implement such TVFs because the managed classes that implement these collections already implement the IEnumerable interface. The IEnumerable interface has a single method called GetEnumerator that returns an IEnumerator interface. For cases where you’re not cracking such a collection, you can implement the IEnumerator interface itself (instead of the indirect route of implementing IEnumerable that returns an IEnumerator). The IEnumerator interface requires one property and two methods that need to be implemented. Following are the descriptions of these methods along with how SQL Server calls them.

  • MoveNext: This method moves the current position of the “result-set” to the next row. The initial position of the “result-set” is before the beginning, so the consumer (in this case SQL Server) will always call MoveNext first which positions it on the first row of the result set.
  • Current property: This property returns the current “row” of the result set as an Object as per the current row position
  • Reset: This method resets the current position to before the first row.

The root method should be annotated with a SqlFunctionAttribute that indicates what the fill-row method is (typically another static method in the same class as the root method). The fill-row method cracks the record represented by the Object returned by IEnumerator.Current into multiple column values using output parameters. Thus, the fill-row method has 1+n arguments where n is the number of columns returned by the TVF. The first argument is an input argument which is the Object that SQL Server got from calling IEnumerator.Current. The next n arguments are output arguments that the method should fill with the column values; the data types of these arguments should be compatible with the column data types as declared by the CREATE FUNCTION statement. 

Having reviewed the basic elements of building a TVF, let us look at our example scenario. The sample implements a table-valued function called GetFile, that given the path-name for a file, reads the contents of the file and returns it as a varbinary(max) value as a single row with a single column. You might wonder why this is not written as a scalar-valued function since after all it just returns a single scalar. The reason is performance: whenever dealing with large values, it is of course important to try and stream the values to SQL Server instead of buffering the entire contents along the way. This means using the SqlBytes data type as the return type of the function (well, actually the type of the single column returned by the function). SqlBytes allows streamed movement of binary data to SQL Server. In our case, we will construct a SqlBytes object using a managed FileStream object that represents the contents of the specified file. However we need a way to close this FileStream once SQL Server is done reading the contents of the file. Since SqlBytes does not implement the IDisposable interface, there is no way for SQL Server to close the file. So if we implemented this function as a scalar function, there will be no way to close the FileStream and we will have an open file handle even after the query that reads the file contents completes execution. On the other hand, if we model this function as a table-valued function, then the MoveNext method (which is part of the IEnumerator interface that needs to be implemented for the TVF) can close the FileStream once it reaches the end of the result-set (which in this case is just one row).

Having established the need for a table-valued function, let us review our implementation. The implementation consists of two classes: FileAccess which implements the root method (GetFile) and the fill-row method (called of course FillRow). The GetFile method returns a SingleFileLoader object that implements the IEnumerator interface.

The SingleFileLoader is constructed using the specified file name which it opens stashing away the FileStream in its private state. (It also impersonates the current caller before it opens the file; more on this later). MoveNext is coded such that it always returns only one row. When it reaches past this one row it closes the FileStream (the main reason why we chose the TVF approach). Current simply returns the FileStream that was created when the SingleFileLoader was constructed.

Here is the code (which will work only in the April CTP of SQL Server 2005 or later).


using System;
using System.Collections;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Data.SqlTypes;
using System.Security.Principal;
// This class implements a table-valued function that returns one row with one 
// column that retrieves the binary contents of a specified input file name.
// Returns NULL on most exception conditions such as inability to impersonate,
// file-doesn't exist etc.
public partial class SingleFileLoader : IEnumerator
{
    private FileStream fs;
    private bool IsBegin = true;
    private String fn;
    public SingleFileLoader(String FileName)
    {
        fn = FileName;
        SingleFileLoaderHelper();
    }
    private void SingleFileLoaderHelper()
    {
        WindowsImpersonationContext OriginalContext = null;
        try
        {
            //Impersonate the current SQL Security context
            WindowsIdentity CallerIdentity = SqlContext.WindowsIdentity;
            //WindowsIdentity might be NULL if calling context is a SQL login
            if (CallerIdentity != null)
            {
                OriginalContext = CallerIdentity.Impersonate();
                fs = new FileStream(fn, FileMode.Open);
            }
            else fs = null;
        }
        catch
        {
            //If file does not exist or for any problems with opening the file,
            // set filestream to null
            fs = null;
        }
        finally
        {
            //Revert the impersonation context; note that impersonation is needed only
            //when opening the file.
            //SQL Server will raise an exception if the impersonation is not undone
            // before returning from the function.
            if (OriginalContext != null)
                OriginalContext.Undo();
        }
    }
    public Object Current
    {
        get
        {
            return fs;
        }
    }
    public bool MoveNext()
    {
        //Ensure returns only one row
        if (IsBegin == true)
        {
            IsBegin = false;
            return true;
        }
        else
        {
            //Close the file after SQL Server is done with it
            if (fs != null) fs.Close();
            return false;
        }
    }
    public void Reset()
    {
IsBegin = true;
SingleFileLoaderHelper();
    }
}

public partial class FileAccess
{
    [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "FileContents varbinary(max)")]
    public static IEnumerator GetFile(String FileName)
    {
        return new SingleFileLoader(FileName);
    }
    public static void FillRow(Object obj, out SqlBytes sc)
    {
        //If non-existent file, return SQL NULL
        if (obj != null) sc = new SqlBytes((Stream)obj);
        else sc = SqlBytes.Null;
    }
}


Assuming that this code is compiled into an assembly called TVF.dll, the assembly and the table-valued function can be registered in SQL Server by executing the following DDL:


use MyDB
go
if exists (select * from sys.objects where name = 'GetFile')
      drop function GetFile
if exists (select * from sys.assemblies where name = 'TVF')
      drop assembly TVF
go
create assembly TVF
from 'c:\Projects\TVF\TVF\TVF.dll'
with permission_set = external_access
go
create function GetFile(@fn nvarchar(300))
returns table(filecontents varbinary(max))
as external name TVF.FileAccess.GetFile
go

Following are some simple test queries to test the TVF:

select *
from GetFile('C:\MyDir\function1.cs')
go
select *
from GetFile('NonExistentFile.cs') – should return NULL
go


Impersonation
As you might have noticed, there is code to impersonate the current security context in the constructor for SingleFileLoader - actually the bulk of the work is done by SingleFileLoaderHelper. SQL Server does not automatically impersonate the current caller when executing code in an EXTERNAL_ACCESS or UNSAFE assembly (it is not possible to access an external resource in the SAFE assembly, so impersonation is moot for that case). Hence by default, code will run and access resources under the SQL Server service account unless there is explicit code to impersonate. In this example, it is sufficient to impersonate only when the file is opened. Make sure to Undo the impersonation (done in the finally clause) since SQL Server will raise an error if the impersonation has not been reverted before execution returns to SQL Server.

The way you retrieve the current calling context is by calling SqlContext.WindowsIdentity. This might return NULL if the calling context is not an NT Authenticated login; in this example, we simply return a NULL value for the function when impersonation fails in this manner.

So what about OpenRowset(BULK)?
You might wonder why should we write this in the CLR, while SQL Server 2005 natively supports the OpenRowset(BULK ‘filename’, SINGLE_BLOB) syntax that does exactly the same thing. While this specific scenario is available natively, there are other such things that you can write in CLR with more flexibility that you cannot do with OpenRowset(BULK); for e.g. retrieve all files in a directory that have a specific extension  (I might post this sample in a future entry). You could also choose not to impersonate or to impersonate a fixed security context in the CLR based implementation while with OpenRowset(BULK) you always get impersonation of the caller.

The next question I can imagine someone asking is how this performs compared with the native OpenRowset(BULK). My unscientific measurements on the April CTP (IDW 14) reveal that the CLR TVF is slower than OpenRowset(BULK). However, there has been some recent performance work on improving the transition cost coming back from managed code to SQL Server which should bring the performance of the CLR TVF to be close to that of OpenRowset. This performance improvement will be available in the next CTP of SQL Server 2005.

- Balaji Rathakrishnan
Microsoft SQL Server

This posting is provided "AS IS" with no warranties, and confers no rights.