Unintended Consequences of Scalar-Valued User Defined Functions

Unintended Consequences of Scalar-Valued User Defined Functions

Rate This
  • Comments 6

Author:            Chuck Heinzelman

Reviewers:     Kevin Cox, Dan Jones, Lara Rubbelke

 

During a customer engagement, we noticed a query that was taking a long time to execute.  We saw that the query had a scalar-valued User Defined Function (UDF) encapsulating functionality in the SELECT clause.  To understand why this could be a problem, you need to understand how SQL Server deals with functions in the SELECT clause.  Let’s take a quick dive into how this works in SQL Server.

Assume that I have a table with the following structure:

CREATE TABLE dbo.Numbers

(

   Number integer NOT NULL PRIMARY KEY CLUSTERED

);

Now assume that this table has 100,000 rows in it – numbered 1 to 100,000.  As an aside, I like to have a table like this with about 1,000,000 rows in it for occasions where I need to duplicate data.  You never know when you will need a table of numbers!

If I execute the following SQL Statement, what will I get?

SELECT Number, GetDate() AS CurDate

FROM dbo.Numbers

WHERE Number <= 1000;

If you answered 1000 rows with the numbers 1 - 1000 and the exact same value for the CurDate column in every row, you would be correct.  This is because SQL Server only executes the GetDate() function once for the entire statement.

Now assume that I create the following function:

CREATE FUNCTION dbo.GetDateFunction()

RETURNS DateTime

AS

BEGIN

   RETURN GetDate();

END

If I execute the following SQL Statement, what will I get?

SELECT Number, dbo.GetDateFunction() AS CurDate

FROM dbo.Numbers

WHERE Number <= 1000;

If you said that my results would be the same as in the last example, you would be incorrect.  Since we encapsulated the GetDate() function in our own user-defined function, SQL Server had to execute it for every row returned, not just once for the entire query.  This knowledge is fundamental to understanding the performance of the queries that you write as well as ensuring that you get the results that you expect.

Unfortunately, our problem was not this simple.  We were dealing with encryption and decryption functions.  Take the following query as an example:

SELECT RowID, DecryptByKey(EncryptedColumn)
FROM dbo.TableWithManyRows;

In this case, even though the DecryptByKey function is a built-in function, it will be called once per row (each individual value needs to be decrypted).  Given that, what would be the harm of having this function embedded in a user defined function like this:

CREATE FUNCTION dbo.DecryptByKeyFunction(@Data varchar(255))

RETURNS varchar(255)

AS

BEGIN

   RETURN DecryptByKey(@Data);

END

And using it like this:

SELECT RowID, dbo.DecryptByKeyFunction(EncryptedColumn)
FROM dbo.TableWithManyRows;

Implementing it this way would allow you to easily swap out the encryption functionality for your entire system by changing a single function rather than changing every query that uses this logic.

The problem is that while the inline DecryptByKey function is called for each row in the query, the key is only opened once for the entire statement.  If you put the DecryptByKey function in your own user defined function, the key will need to be opened for each function call rather than once for the entire query.  In our scenario, this caused a huge performance issue.

 

Summary

Knowing the behavior of built-in and User Defined Functions is extremely important to understanding the performance and output of the queries that you write.  I am not trying to tell you to never use UDFs – just be sure to test your solutions thoroughly and understand all of the “unintended consequences” of encapsulating logic inside a UDF before rolling your solution out – even to your developers.

 

Leave a Comment
  • Please add 1 and 6 and type the answer here:
  • Post
  • Wouldn't the use of a table valued user defined function have much better performance, and retain the benefit of reusable code?

  • Considering GetDate function example, I read some time ago that it's a good idea to store the GetDate, when possible, in a variable and reuse the variable in the query. This makes sense especially when a UDF is used and the result is the same across the whole result dataset, as in the above Date example.

    Concerning the crypto example, wouldn't be ideal if a given cryptographic key could be applied to the whole dataset, even when the crypto functions are used inside of a UDF? I haven't worked with cryptographic functions so not sure if such functionality exists or is feasible from an architectural standpoint.

    There are situations in which the use of a UDF in inevitable, this coming with performance degradation. If I'm not mistaking in Oracle can be created an index on a deterministic UDF. I think it would be nice to have this kind of functionality also in SQL Server.

  • Ack, my comment got lost.  I think.

    It's not clear from this post whether there really is a lot of overhead with "openeing the key" for DecryptByKey.  What does that overhead involve?  It's not mentioned in the DecruptByKey documentation.

    Of course SQL internally has to call DecrptByKey or the UDF (or both) once per record.  Everything in SQL is done one record at a time under the hood (reading the value of a column from each record HAS to process one record at a time!).

    And it's also unfortunate that SQL can't make the inline function situation work as efficiently as the UDF situation.  

  • I think 'unintended consiquences' is a bit diplomatic. I've always been appauled by how naievely SQL treats UDFs, allowing the performance of a query to be wrecked just by encapsulating a fairly complex expression (but nevertheless one the optimizer can 'see though') into a UDF and hence forcing row-by-row execution.

    This just seems like poor implementation choices to me.

  • Hi Adrian.

    SQL Server 2005 has computed columns (persistent or nonpersistent):

    msdn.microsoft.com/.../ms191250(v=sql.90).aspx

    And, you may define indexes on computed columns:

    msdn.microsoft.com/.../ms189292(v=sql.90).aspx

  • What if you used a table valued function to store your crypto function.  Would this not eliminate both the problem of performance (since it would still be called once) and you could change the algorithm by changing one function?  I guess my question is, what are the performance implications of table valued UDFs?

Page 1 of 1 (6 items)