I recently blogged about canonical functions in Entity Framework and SQL Server’s LEN() function. The purpose of those two posts was to lay out the foundation for this one, and to allow it to be brief and coherent. Please read those two posts before proceeding.
If you’ve been using the Edm.Length() canonical function from the Beta 2 and Beta 3 releases, you may have noticed that it’s behavior (and the T-SQL generated for it), when used against the SqlClient provider, have changed.
In Beta 2 SqlClient used to generate the following T-SQL for Edm.Length(x):
LEN(x + '.') – LEN('.')
The purpose of generating a non-trivial expression was to compensate for LEN()’s behavior of ignoring trailing spaces by appending a non-blank character. Unfortunately, that approach had one big deficiency – it was returning a wrong result for string values at the 8KB boundary.
In Beta 3 we investigated the adoption of a different compensating formula, based on the DATALENGTH() function, but the formula was too complex, and it was returning wrong results over Unicode strings with surrogate characters. Thus we concluded that the provider should not try to alter its server’s behavior, and we mapped Edm.Length(x) to a plain LEN(x). You should be aware of that behavior, and design workarounds that suite your specific scenarios.
Going forward, we’ve requested a new string length function from the T-SQL team – one that accounts for all characters. When, whether, and in what form it will be implemented is still unknown, but we are convinced that’s the right approach to provide a canonical string length behavior in both Entity Framework and SQL Server.