A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages). The default collation for character data in SQL Azure databases is SQL_Latin1_General_CP1_CI_AS. This collation is also used across the SQL Azure infrastructure to sort and compare metadata that defines database objects. The server and database level collations are not configurable in SQL Azure. However, you can use a collation of your choice at the column and expression level. This article will show you how.
Although the server and database collations cannot be configured in SQL Azure, you still can query both of these properties, for example:
SELECT DATABASEPROPERTYEX('TestDB', 'Collation')
Currently, both of queries will return the default collation: SQL_Latin1_General_CP1_CI_AS.
If the solution you are building on SQL Azure requires a different collation for character data you will need to set the collation at the column level or use the expression level collation to explicitly cast to a specific collation. Keep reading to learn how.
When using SQL Server Management Studio’s Generate Script Wizard (more about using the Generate Script Wizard in this blog post), column collations are included by default. You can verify this by checking that the “Include collation” option is set to “True” (the default value).
This sample shows how to create columns with a specific collation:
CREATE TABLE t
id int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
c1 nvarchar(20) COLLATE SQL_Latin1_General_CP1_CI_AS,
c2 nvarchar(20) COLLATE Japanese_CI_AS
To retrieve column collation property for the example above:
SELECT name, collation_name FROM sys.columns
WHERE object_id = OBJECT_ID('t', 'U')
AND name <> 'id'
For more information on how to use column or expression level collations see COLLATE and Setting and Changing Collations in SQL Server Books Online.
Along with metadata objects (such as logins, databases, tables, views, and column names, etc.), the default collation also applies to identifiers for variables, GOTO labels, and temporary tables. Temporary tables and table-valued parameters (and associated user-defined table types) represent an interesting case to be aware of. While the default collation of databases in SQL Azure is the same as collation used for character data columns in temporary tables or user-defined table types, you can get a mismatch in collations and collation conflicts when you use column-level collations in your database. Also, the database_default option for setting a column-level collation will always map to SQL_Latin1_General_CP1_CI_AS.
If you need to create temporary tables (or table types) that will be used in conjunction with tables that have columns that use non-default collation you should create columns in temporary tables with matching collation. For example:
CREATE TABLE #TestTempTable
id int IDENTITY(1,1) PRIMARY KEY,
c2 nvarchar(20) COLLATE database_default,
c3 nvarchar(20) COLLATE Japanese_CI_AS,
Note that columns c1 and c2 will have the same collation in the example above.
The following queries will return a collation conflict error:
SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c1
SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c2
While these two will work:
SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c3
SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c2 COLLATE Japanese_CI_AS
The above correction in the Transact-SQL is an example of a declarative collation in the expression, that is executed when the statement is executed and doesn’t affect the data stored in the table.
It is also a good practice to ensure that Unicode strings are preceded with a capital letter N that stands for National Language in the SQL-92 standard, and must be uppercase. If the Unicode string is not preceded by the N-prefix the code page that corresponds to the current database collation will be used to convert characters if N-prefix which in some cases could cause data corruption.
In the example below first entry of Hiragana Letter E will be converted correctly, whereas the second entry will cause data corruption. This example also illustrates that column collation is not used for conversion.
INSERT INTO t
VALUES(N'え', N'え'), ('え', 'え')
SELECT * FROM t
Do you have questions, concerns, comments? Post them below and we will try to address them.
Ouch. This calls for heavy code rewriting ...
Thanks for the great article!
I use Arabic collation on SQL Server to retrieve arabic data correctly. Now that I've uploaded my app on the cloud and I'm using SQL Azure, what shall I do to retrieve the arabic data correctly?!
In the real world we have thousands of places in a given application where you can make a search like
SELECT * FROM T WHERE T.Name like '%João%'
We would have to change everything to something like
SELECT * FROM T WHERE T.Name like '%João%' COLLATE SQL_Latin1_General_CP1_CI_AI
It is an impossible task and preventing us from adopting SQL Azure.
Are there any plans to give us the power to change database collations in the near future?
Like the previous post stated, we really need at least CI_AI or no deal.
True. It is VERY restrictive to be unable to set the collation. I also need CI_AI. I want José and Jose to be equivalent, and _AS is just killing it.
I just don't see how forcing collation to expressions and columns is a good way of luring customers into Azure. "Sure, take your current database and .... re-create it! because it won't work as-is".