Restrictive world of SqlClr computed columns and indexability

 

With Yukon entering the world of managed code, the world of indexes has just become a tad bit complicated but hey we are talking about smart/passionate Sql Server customers here – I am sure if we show them the right direction outlining the restrictions and the reasons why we have them, they will embrace it with loveJ.

 

Let us go over a simple tsql UDF and discuss indexing a little bit:

 

create function tsql_udf (@in int)

returns int

with schemabinding

as

begin

          return @in * @in

end

 

The following things can be done successfully on a computed column invoking the function tsql_udf:

 

  • It can be indexed

create table tsql_table

(

col int,

                   comp_col as dbo.tsql_udf(col)

          ) go

          -- create primary index index on col from tsql_table

          create clustered index pri_idx on tsql_table(col)

          go

          -- create secondary index index on comp_col from tsql_table

          create nonclustered index sec_idx on tsql_table(comp_col)

          go

 

  • It can be persisted

            create table tsql_persisted_table

          (

                   col int,

                    comp_col as dbo.tsql_udf(col) persisted

          )

          go

Property

Description

Notes

IsDeterministic

Function is deterministic or nondeterministic.

Local data access is allowed in deterministic functions. For example, functions that always return the same result any time they are called by using a specific set of input values and with the same state of the database would be labeled deterministic.

IsPrecise

Function is precise or imprecise.

Imprecise functions contain operations such as floating point operations.

IsSystemVerified

The precision and determinism properties of the function can be verified by SQL Server.

 

SystemDataAccess

Function accesses system data (system catalogs or virtual system tables) in the local instance of SQL Server.

 

UserDataAccess

Function accesses user data in the local instance of SQL Server.

Includes user-defined tables and temp tables, but not table-variables.

 

Whenever you create tsql user-defined function, Sql Server will do all the validation for you to see if the function is deterministic, precise, if it does data access or not. Computed columns that invoke a function can be persisted and/or indexed only if is deterministic, is system verifiable and if it does not do data access. Refer to the table given above to see what the function properties like deterministic and precise really mean.

 

SqlClr functions and indexability

 

As SqlClr functions are written in one of the .NET languages, there is no way the code can be verified for any of the required properties for indexability. So in Yukon, we introduced 4 new attributes namely IsDeterministic, IsPrecise, UserDataAccess and SystemDataAccess. Now the onus is on the user to set these attributes while writing his SqlClr function. We do try to avoid any harm by setting incorrect values for determinism that can be done by imposing restrictions.

 

Let us discuss indexability involving computed columns that invoke SqlClr functions in detail with an example. Let us consider the CLR equivalent of function tsql_udf that we saw above

 

using System;

public class testclass

{

    public static int clr_func(int i)

    {

          return i*i;

     }

}

 

Let us create a SqlClr function called clr_udf on method clr_func

 

So far so good. Everyone agrees that this function will exactly do the same thing as our tsql function tsql_udf  but unfortunately the computed column invoking the clr function the way it is written can neither be peristed nor indexed because by default every CLR function is non-deterministic, imprecise and cannot do data access. Even though we are fine with the “no data access” part of it, the determinism requirement is broken here.

 

Let us see what happens when we try to index the computed column that invoked the SqlClr function clr_udf

 

create table clr_indexed_table

(

     col int,

     comp_col as dbo.clr_udf(col)

)

go --  works fine

create clustered index pri_idx on clr_indexed_table(col)

go -- works fine

-- Executing the statement that creates the secondary index on comp_col throws an error

create nonclustered index sec_idx on clr_indexed_table(comp_col)

go -- throws an error

The error thrown is as follows:

“Column 'comp_col' in table 'clr_indexed_table' cannot be used in an index or statistics or as a partition key because it is non-deterministic.”

 

Trying to persist the computed column that invokes clr_udf will again fail saying that the function is not deterministic.

 

Let us try to modify our function to fix this problem by explicitly setting the attribute IsDeterministic and IsPrecise to true.

 

using System;

using Microsoft.SqlServer.Server;

public class testclass

{

     [SqlFunction(IsDeterministic=true, IsPrecise=true)]

     public static int clr_func(int i)

     {

          return i*i;

     }

}

 

Let us see what we can do after setting the IsDeterministic and IsPrecise attributes to true. Now the following can be successfully done on the computed column invoking clr_udf:

  • It can be persisted and once persisted, it can be indexed

create table clr_persisted_table

(

col int,

                   comp_col as dbo.clr_udf(col) persisted

          ) go

          -- create clustered and nonclustered indexes on col and comp_col from clr_table

          create clustered index pri_idx on clr_persisted_table(col)

          go

          create nonclustered index sec_idx on clr_persisted_table(comp_col)

          go

 

Let us see if we can create an index on our computed column without persisting the value in the table (remember – this was possible on our tsql counterpart)

 

create table clr_table

(

col int,

                   comp_col as dbo.clr_udf(col)

          ) go

          -- create clustered index on col from clr_table

          create clustered index pri_idx on clr_table(col)

          go

          -- create nonclustered index on comp_col from clr_table

          create nonclustered index sec_idx on clr_table(comp_col)

          go

 

Executing the statement that tries to create the secondary index throws an error saying

“Cannot create index or statistics 'sec_idx' on table 'clr_table' because SQL Server cannot verify that key column 'comp_col' is precise and deterministic. Consider removing column from index or statistics key, marking computed column persisted, or using non-CLR-derived column in key.”

 

This is our first step towards the complex world of indexability. From the above error message, we can infer that Sql Server does not completely trust the user when he sets the custom attributes IsDeterministic and IsPrecise because it tells you that it cannot verify if the clr function is deterministic.

 

Why don’t we trust the user? What can really go wrong here??

 

Assume if the user incorrectly sets the custom attribute IsDeterminsitic to true for a non-deterministic function and assume he is able to create an index on a computed column that invoked this function without persisting it. In this case, as the computed column is not persisted on the disc, it will be evaluated on the fly by executing the function every single time. This might lead to index corruption as the function might return different values for the same input as the function is non-deterministic.

 

So for the sake of user’s safety, Sql Server in this release REQUIRES the user to persist the computed columns (unlike the tsql case) to actually index the computed columns.

 

Let us talk about Data Access and computed columns now. There are two choices here. You can either be honest to the system when you do data access in your clr function and set your UserDataAccess and SystemDataAccess attributes to DataAccess.Read and SystemDataAccess.Read respectively. If you do this, you will be not be able to persist your computed column that invokes this function and that means no indexes on your computed columns. But if you try to trick us by lying to us, we will still get youJ. You can set your data access attributes to DataAccess.None and SystemDataAccess.None and then successfully persist the computed column invoking the function and even create your secondary index on the computed column after persisting it. But during run time, when you insert into the table, you will get an error complaining about your data access.

 

In the next blog, we will talk about SqlClr objects and indexed views which can get quite challenging with all the restrictions that we impose and how the user can get in to trouble if a few things are tweaked against these restrictions. After all, we love customers and we always want the best for themJ.

                       

 Sashi Parthasarathy

Sql Server