“Why?” my five year old asks. “Because I said” is my usual response. In the blog comments, we have noticed an abundance of “Why do we have to have a clustered index on our tables?” In this case “Because we said” is not going to cut it and really does not work with my daughter either.
SQL Azure requires clustered indexes for our replication technology. At any one time, we are keeping three replicas of data running – one primary replica and two secondary replicas. We use a quorum based commit scheme where data is written to the primary and one secondary replica before we consider the transaction committed. Each write is preformed to exactly the same leaf node in same spot of the data row of all three replicas. In other words, the data pages are exactly the same in all three replicas.
In SQL Server and SQL Azure, clustered indexes are organized as B-trees. Each page in an index B-tree is an index node. The top node of the B-tree is the root node. The bottom level of nodes in the index is the leaf nodes. Any index levels between the root and the leaf nodes are collectively the intermediate levels. Having B-trees maintained on each replica independently enables us to do certain performance optimizations, local maintenance like defrag and reducing cross machine traffic since we do not ship data that as needed to run physical recovery across machines.
In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.
Nonclustered indexes have the same B-tree structure as clustered indexes, however in SQL Server If the table is a heap, which means it does not have a clustered index, the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The whole pointer is a Row ID (RID). The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list. For more information, see Heap Structures. The reason we don’t support heap tables is that data pages are not ordered – a requirement for our replication.
In SQL Azure the ordering of the data pages are the key to our data replication, and that is why you need to have a clustered index on your tables. We believe that you are the best person to pick the clustered index and will pick that best clustered index for your database design to achieve maximum performance.
Do you have questions, concerns, comments? Post them below and we will try to address them.
"In SQL Azure the ordering of the data pages are the key to our data replication, and that is why you need to have a clustered index on your tables"
Is pretty close to because I said so...
You have pages in a heap, you have pages in a b-tree. A page changes, it needs to be replicated. What's the difference?
Index maintenance slows the loading of rows. In high volume loads of data, quite often, indexes are dropped or disabled to allow the rows to be loaded without indexing and then rebuild the indexes following.
Since a clustered index *is* the table, it can't be dropped to do such a load. So each row will have to find the correct page to fit into, if there's room then great, if there isn't... whoa, slow down for a page split.
While this may be good for the replication, it seems bad for loading performance.
Thank you for the explanation. It was helpful.
I just checked and with build 11.0.2065 I can now create heaps.
Was the clustered key limitation removed recently ?
Hello there, I am so excited I found your site, I really found you by error, while I was researching on google something else , Anyhow I am here now and would just like to say kudos for a fantastic post and a all round thrilling blog (I also love the theme/design), I don’t have time to go through it all at the moment but I have saved it and also included your RSS feeds. <a href = "www.keratintreatmentsalon.com/Brazilian_Blowout_MZBX.html" >Brazilian Blowout</a> and <a href = "www.keratintreatmentsalon.com/" >Keratin Hair Treatment</a>