In my last post, I said we would look at how we could return a table that contains a directory listing of all the .xml files on my machine.  Using the SQL/CLR LoadFile method we created in the last post, our goal is to come up with a single T/SQL statement that can dump all the .xml documents sitting as files in the file-system into our new SQL Server 2005 database.

 

Of course, to get a directory listing, I could use good old, rather unsafe xp_cmdshell:

 

xp_cmdshell 'dir c:\*.xml /s /b'

 

(NOTE: In SQL Server 2005 we have turned xp_cmdshell off by default, to among other things, encourage people not to use it!)

 

The xp_cmdshell route works, sort of.  But what we also need is a ‘Last Write Time’ so we can do subsequent incremental loads.  This makes the xp_cmdshell route trickier; I have to come up with some ‘dir command’ and then start parsing the xp_cmdshell output.  Yuck!

 

Another xp_cmdshell problem   The account that is used to get the directory listing is the service account of the sqlservr.exe process.  What I want is the credentials of the user who is asking for the directory listing to be used to access the file system.

 

We are going to look at a new feature in SQL Server 2005 which will give us a safe, secure, fast and rich directory file listing, these are SQL CLR Table Valued Functions (also known as TVF) and a glimpse of the new security EXECUTE AS framework that will ensure the right, correctly privileged, security credentials are used to access the filesystem. [UPDATE 7/21: I am wrong here.  EXECUTE AS does NOT affect the security credentials we use when leaving the sqlserver.exe process.  We always use the service account credentials to access resources external to the process.]

 

Show me the TVF magic   TVFs are powerful, but there is a little bit of ‘magic source’ you just have to know, and that is the magic behind the ‘FillRow’ method.  The ‘FillRow’ method is specified by adding the SqlFunction ‘function attribute’ to the header of your C# function, like this:

 

[SqlFunction(FillRowMethodName = "FillRow")]

public static IEnumerable DirectoryList( ...

 

The method name specified in the FillRowMethodName is implicitly called by the SQL/CLR framework each time the MoveNext() method on the returned  IEnumerable object (or type that implements the IEnumerable interface) is called.  The FillRow method must have a signature that looks like this:

 

private static void FillRow(Object obj, out <col1_type> <col1>, out <col2_type> <col2>, ... )

 

Where the 1st parameter (Object obj), once cracked as an object array, contains the values of one output row.  The subsequent function parameters (out <col1_type> <col1> etc.)are ‘out’ parameters that contain the values that will appear in the columns for the row that is being constructed.  If you don’t get this now, don’t worry, the code is usually just a simple pivot of a list of array elements from the cracked Object to the out parameter variables.

 

Stream me a table of files   This how it is done:

 

using System;

using System.IO;

using System.Collections;

using Microsoft.SqlServer.Server;

 

public partial class UserDefinedFunctions

{

      [SqlFunction(FillRowMethodName = "FillRow")]

 

            public static IEnumerable DirectoryList(string rootDir, string wildCard, bool subDirectories)

            {

                  ArrayList rowsArray = new ArrayList(); // Already implements IEnumerable, so we don't have to

 

                  DirectorySearch(rootDir, wildCard, subDirectories, rowsArray);

 

                  return rowsArray;

            }

 

 

The entry function must be static and return a class that implements the IEnumerable interface.  Notice the ‘magic source’, the SqlFunction attribute that specifies the FillRow method name, this FillRow method get called implicitly each time the SQL/CLR framework implicit calls the MoveNext() method on the rowsArray collection, and it is required for SQL/CLR TVFs.

 

Next, we go down to the filesystem and search the Directories, and then Files for the files that match the wildCard, adding each matching file as an element to the rowsArray collection.

      private static void DirectorySearch(string directory, string wildCard, bool subDirectories, ArrayList rowsArray)

      {

            GetFiles(directory, wildCard, rowsArray);

 

            if (subDirectories)

            {

                  foreach (string d in Directory.GetDirectories(directory))

                  {

                        DirectorySearch(d, wildCard, subDirectories, rowsArray);

                  }

            }

      }

 

      private static void GetFiles(string d, string wildCard, ArrayList rowsArray)

      {

            foreach (string f in Directory.GetFiles(d, wildCard))

            {

                  FileInfo fi = new FileInfo(f);

 

                  object[] column = new object[2];

                  column[0] = fi.FullName;

                  column[1] = fi.LastWriteTime;

 

                  rowsArray.Add(column);

            }

      }

  

Here is the magic FillRow method, it gets called once each time the framework calls .MoveNext() on the underlying ArrayList method (which implements the inherited IEnumerable interface)

 

      private static void FillRow(Object obj, out string filename, out DateTime date)

      {

            object[] row = (object[])obj;

 

            filename = (string)row[0];

            date = (DateTime)row[1];

      }

};

 

 

The FillRow method passes in an object as the 1st parameter, this object is then cracked as an array whose elements need to be assigned to the out parameters which match the columns for the row being constructed.

 

Save the source above as a single file ‘directorylist.cs’, and build it as an assembly library:

 

csc /target:library /out:"c:\DirectoryList.dll" "directorylist.cs"

 

(TIP: There maybe several csc.exe’s hanging out on your machine.  You can find out which one you need by using the new sys.dm_clr_properties dynamic management view)

 

SELECT value FROM sys.dm_clr_properties WHERE name = 'directory'

 

value

-----------------------------------------------

C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\

 

To add the assembly and expose the C# function as a T/SQL function, run this:

 

          IF EXISTS (SELECT * FROM sys.objects WHERE name = 'DirectoryList')

                   DROP FUNCTION DirectoryList;

 

          IF EXISTS (SELECT * FROM sys.assemblies WHERE name = 'SqlClrAssembly')

                   DROP ASSEMBLY SqlClrAssembly;

 

          CREATE ASSEMBLY SqlClrAssembly

                   FROM 'C:\DOCUME~1\stuartpa\LOCALS~1\Temp\SqlClrObjects.dll'

                   WITH PERMISSION_SET = EXTERNAL_ACCESS;

      GO

 

          CREATE FUNCTION DirectoryList (   @root_directory nvarchar(max),

@wildcard nvarchar(max),

@subdirectories bit )

                   RETURNS TABLE (filename nvarchar(max), last_write_time datetime)

                   AS EXTERNAL NAME SqlClrAssembly.UserDefinedFunctions.DirectoryList;

      GO

 

By default SQL/CLR functionailty is what we call ‘Off By Default’ (or 'OBD'), this means that while we can do the above CREATE ASSEMBLY and CREATE FUNCTION, we cannot actually invoke the function.  To turn the SQL/CLR functionality on, run the following command:

 

          sp_configure "clr enabled", 1;

      GO

 

          RECONFIGURE

      GO

 

Now we can test the function we have just created:

 

      SELECT filename FROM dbo.DirectoryList( 'c:\windows', '*.xml', 1 )

 

filename                                                 last_write_time

-------------------------------------------------------- -----------------------

c:\windows\$NtServicePackUninstall$\dataspec.xml         2002-11-21 13:53:13.813

c:\windows\$NtServicePackUninstall$\filelist.xml         2002-08-29 05:00:00.000

c:\windows\$NtServicePackUninstall$\lclmm.xml            2002-08-29 05:00:00.000

c:\windows\$NtServicePackUninstall$\xsl-mappings.xml     2002-08-29 05:00:00.000

         

 

Putting it all together   As I promised in my last post, we can now write a single T/SQL statement to insert the contents of all my (well-formed) .xml files as rows in the xml_documents table.  Using the LoadFile SQL/CLR function and the xml_documents table created in my last post, with the DirectoryList SQL/CLR TVF we created in this post, the single T/SQL statement below dumps the contents (using dbo.LoadFile) of all the .xml files on my filesystem (as found with DirectoryList) into the XML data type column in the xml_documents table.

 

      /* Insert the contents of all .xml files on c: into the xml_documents table */

      INSERT INTO xml_documents

            SELECT dbo.LoadFile( filename )

            FROM dbo.DirectoryList( 'c:\windows', '*.xml', 0 )

            WHERE last_write_time > 'the last time I ran it'