I am starting a series of articles describing my thoughts on UTF-16 support in SQL Server.  Before I discuss the topic itself, let me quote from "Supplementary Characters, Surrogate Pairs and SQL" to describe the difference between UTF-16 and UCS-2:

UTF-16: This encoding stores the basic Unicode characters using a single 16-bit unit and other characters using two 16-bit units. UTF-16 is the primary encoding mechanism used by Microsoft Windows 2000, Windows 2000 Server, Windows XP and Windows 2003 Server. SQL Server 2000, on the other hand, uses the UCS-2 encoding scheme to store Unicode data.

Note: The UCS-2 encoding scheme is actually a subset of the UTF-16 scheme. Every UCS-2 encoded code point is identical to the encoding of the same code point in UTF-16. Also, most new implementations using the Unicode standard now employ UTF-16, UTF-8 or UTF-32 instead of UCS-2.
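The subset relationship is easy to see in practice. A minimal Python sketch (my own illustration, using U+1D11E, the musical G clef, as the supplementary character): a BMP character encodes identically in UCS-2 and UTF-16, while a supplementary character is representable in UTF-16 only as a surrogate pair of two 16-bit units.

```python
# A BMP character ('A', U+0041) occupies a single 16-bit unit in both UCS-2 and UTF-16.
bmp = "A".encode("utf-16-le")
print(len(bmp) // 2)   # 1 code unit

# A supplementary character (U+1D11E) requires a surrogate pair: two 16-bit units.
supp = "\U0001D11E".encode("utf-16-le")
print(len(supp) // 2)  # 2 code units
print(supp.hex())      # the pair D834 DD1E, stored little-endian: 34d81edd
```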

I recommend you read the article above, the Wikipedia page on UTF-16/UCS-2, and also Supplementary Characters in SQL Server 2005. That article also explains why we say Windows supports the UTF-16 encoding even though the core Unicode data type, WCHAR, is not surrogate aware.

To answer the question of how SQL Server can support UTF-16 encoding for the nvarchar data type, we need to solve the following two issues:

How to deal with good data? Here, good data is data conforming to the UTF-16 encoding, i.e., containing no invalid characters (in particular, no unpaired surrogate code points). The answer is pretty simple: follow the correct length semantics and don't make the data invalid. The nvarchar type internally uses a WCHAR* to store Unicode values, where each WCHAR is a short (2 bytes). It is necessary to distinguish the number of characters from the number of WCHARs. For any character-based operation we expose to the end user, the system should treat a surrogate pair as a single character. For example, the len function returns the number of characters, so a surrogate pair counts as one character; and for any string function that takes an integer index as an input parameter, the index should mean a character index. Internally, however, we do not necessarily need to use character-based length semantics.
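The character-count versus WCHAR-count distinction can be seen directly. A Python sketch (not SQL Server code; Python's str counts code points, so its len plays the role of the surrogate-aware character count):

```python
s = "a\U0001D11E b"  # 'a', one supplementary character (a surrogate pair), space, 'b'

# Character count: what a surrogate-aware len() should report to the end user.
chars = len(s)                            # the pair counts as one character -> 4

# WCHAR count: what a UCS-2 style implementation sees internally.
wchars = len(s.encode("utf-16-le")) // 2  # the pair costs two 16-bit units -> 5

print(chars, wchars)  # 4 5
```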

There are several ways in which good UTF-16 data can become invalid, and we need to avoid all of them.  Using correct character semantics, as above, is one thing we certainly need to do, but there are other cases in which we might generate invalid data from good data.  For example, when sending or receiving data to or from the client, we might transfer the data in chunks, and we need to be careful not to split a surrogate pair across chunk boundaries.  Another case is the nvarchar(max) and ntext types, which store data across multiple physical pages; storing the two halves of a surrogate pair on different pages is a potential issue.
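The chunking concern can be sketched in Python (the function name and chunking scheme are my own illustration, not SQL Server's actual wire protocol code): when splitting a UTF-16 byte stream at a fixed size, the split point must be moved back whenever it would separate a high surrogate from its low surrogate.

```python
def safe_chunks(utf16le: bytes, chunk_bytes: int):
    """Split UTF-16LE data into chunks without breaking a surrogate pair."""
    assert chunk_bytes % 2 == 0      # always split on a code-unit boundary
    pos = 0
    while pos < len(utf16le):
        end = min(pos + chunk_bytes, len(utf16le))
        # Inspect the last full code unit before the proposed split point.
        unit = int.from_bytes(utf16le[end - 2:end], "little")
        if 0xD800 <= unit <= 0xDBFF and end < len(utf16le):
            end -= 2                 # a high surrogate would be orphaned: move the split back
        yield utf16le[pos:end]
        pos = end

data = "ab\U0001D11Ecd".encode("utf-16-le")      # the surrogate pair sits at units 2-3
chunks = list(safe_chunks(data, 6))              # a naive split at 6 bytes would cut the pair
print([c.decode("utf-16-le") for c in chunks])   # every chunk is independently valid UTF-16
```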

Now, let us consider the "bad" data. According to the UTF-16 standard, bad data here is data which contains an unpaired surrogate code point. The first question is how we prevent bad data from being persisted into the database. There are many ways to insert data into a database, such as converting from binary to the nvarchar type, bcp/bulk insert, calling an RPC, etc. Note that C#'s String type and the C++ WCHAR type have no UTF-16 validation checking, which means that SQL Server must rely on itself to do input validation. Actually, we already have some UTF-16 validation inside SQL Server: surrogate data (either paired or unpaired) is not supported in metadata, such as the names of database objects, and SQL Server does have a check that disallows surrogate data there.
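The validation itself is straightforward to sketch (again in Python; this is my own illustration, not SQL Server's internal check): scan the code units and confirm that every high surrogate is immediately followed by a low surrogate, and that no low surrogate appears on its own.

```python
def is_valid_utf16(units):
    """Check a sequence of 16-bit code units for unpaired surrogates."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:                 # high surrogate...
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False                      # ...with no low surrogate after it
            i += 2                                # consume the whole pair
        elif 0xDC00 <= u <= 0xDFFF:               # a lone low surrogate
            return False
        else:
            i += 1
    return True

print(is_valid_utf16([0x0041, 0xD834, 0xDD1E]))  # True: 'A' plus a proper pair
print(is_valid_utf16([0x0041, 0xD834, 0x0042]))  # False: unpaired high surrogate
```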

Another question I'd like to raise is what happens if we fail to do UTF-16 data validation.  For example, suppose the data stored on disk is already invalid (this can happen if we upgrade from an old version of SQL Server in which some invalid UTF-16 data was stored; of course, we could validate the data during upgrade, but that would be a size-of-data operation, which we usually want to avoid).  What is the result of calling a string function with such invalid data as input?  We can 1) throw an error and terminate the statement, 2) treat an unpaired surrogate code point like any other normal character, or 3) change the invalid data into valid data (such as a question mark, ?, which is what we do when converting nvarchar data to varchar when the data is not in the code page associated with the varchar type).   Case 2 raises another interesting question: can we just treat an unpaired surrogate code point as a valid character, and not do any data validation at all?
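The three options can be played out with Python's UTF-16 codec (my own illustration, not SQL Server behavior; note Python substitutes U+FFFD rather than SQL Server's question mark, but the idea is the same):

```python
# 'A' followed by a lone high surrogate (0xD83D): invalid UTF-16.
bad = "A".encode("utf-16-le") + b"\x3d\xd8"

# Option 1: throw an error and terminate.
try:
    bad.decode("utf-16-le")                  # strict validation raises
except UnicodeDecodeError:
    print("option 1: error raised")

# Option 3: replace the invalid unit with a valid substitute character.
print(bad.decode("utf-16-le", errors="replace"))           # 'A' + U+FFFD

# Option 2 (pass the lone surrogate through untouched) amounts to skipping
# validation entirely; Python exposes it as the 'surrogatepass' handler.
print(ascii(bad.decode("utf-16-le", errors="surrogatepass")))  # 'A\ud83d'
```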

Let us end today's talk with an example of UTF-16 support in SQL Server today.  In SQL Server 2005, we have the XML data type.  The XML data type is internally stored as UTF-16 (we use the varbinary(max) data type internally, but the content itself is UTF-16 encoded text).  In particular, unpaired surrogate code points are not allowed, and SQL Server automatically detects such invalid data during XML parsing.  However, some of the XQuery functions, such as substring, internally call SQL Server's substring function, which makes those functions surrogate-unsafe.
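To see what "surrogate unsafe" means in practice, here is a Python sketch (the function names are mine, not SQL Server's) contrasting a substring that indexes by 16-bit code units, the way a UCS-2 based engine would, with one that indexes by characters:

```python
def substring_by_units(s: str, start: int, length: int) -> str:
    """Substring by UTF-16 code-unit index: surrogate unsafe, like a UCS-2 engine."""
    units = s.encode("utf-16-le")
    return units[2 * start: 2 * (start + length)].decode("utf-16-le", errors="replace")

def substring_by_chars(s: str, start: int, length: int) -> str:
    """Substring by character index: surrogate safe."""
    return s[start:start + length]

s = "x\U0001D11Ey"                   # 'x', one supplementary character, 'y'
print(substring_by_chars(s, 0, 2))   # 'x' plus the whole character
print(substring_by_units(s, 0, 2))   # cuts the pair in half: 'x' + replacement char
```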