Greg Linwood, a fellow SQL Server MVP, has started a series of articles in which he attempts to prove...
Paul - I just wante to make sure I understand correctly your second bullet point on "what is a heap":
"This means that rows will be inserted into the heap anywhere where there is space (don't let this statement confuse you - for a table that has nothing but inserts, records will always be appended to the last allocated page)"
Microsoft kb article 297861 indicates:
"In SQL Server 7.0, and later, SQL Server generally optimizes inserts into a heap with the assumption that saving space is more important than performance. That is the tradeoff you choose to make when you decide to leave a table as a heap. Therefore, an insert into a heap often spends time searching for a location to insert a new row."
Are you saying that if I have a heap which has never had any deletes or updates, SQL Server 7.0 and later do not try to optimize for space usage - they simnply insert the new rows at the last allocated page? Does the same apply if I start from an empty heap and insert millions of rows with a single INSERT/SELECT FROM statement?
The KB article really only applies when there is free space in the pages that comprise the heap. For a heap that is append-only, with no deletes, the free space will always be in the last allocated page - so there's nothing to optimize - the only place to insert the record is in the last allocated page. The same applies to the insert/select example you give.
Paul - thank you for the reply.
If I create a variable length table then, depending on the data inserted on a page, it might happen that there is free space in pages other then the last allocated one. For example:
create table T1(col1 varchar(8000))
insert into T1 select replicate('a', 100)
insert into T1 select replicate('a', 8000)
The above two inserts allocate two pages with free space left on both of them.
I used DBCC PAGE to see where would data go if I ran a third insert (by the way, thank you for the great posting on DBCC PAGE):
insert into T1 select replicate('a', 10)
Although there should be enough free space on the second page, in all my tests SQL Server consistently inserted the data in the first one. I ran into the same behaviour if I changed the second insert (the one that allocates the second page) to insert only 7800 characters:
insert into T1 select replicate('a', 7800)
Again, the third insert (inserting a 10 character row) would consistently use the first page, although there should be more than 200 bytes free on the second one.
I also ran a loop as in the following and watched the FreeSpace Scan/sec counter being high for the duration of the loop (I was running these tests on my local machine with nothing else running on SQL Server):
declare @counter int
set @counter = 1
while (@counter < 10000)
insert into T1 select replicate('a', 1000)
set @counter = @counter + 1
I ran my tests both on SQL Server 2000 and SQL Server 2005 with similar results.
All the above seems to indicate that even in the case of 'insert only' heaps, SQL Server still tries to optimize for space and will utilize a page which might not necessarily be the last allocated one. What am I missing?
Thanks again for your time and thank you for the great information and knowledge you are sharing in your posts.
Yes, when the records are randomly sized then it will make use of any free space in earlier pages.
Greg Linwood, a fellow SQL Server MVP, has started a series of articles in which he attempts to prove
Very nice blog on those fragmentation series..but where is you next one on the clustered index and non-clustered index?
My another question,
if all the rows from a page are deleted, that page gets deallocated and reclaimed by the database and if all pages are empty due to range deleteion in an extent, the extent will get deallocated and reclaimed by the database. Is that correct statement? Is that the same for the heap? I dont think so as "empty" page(due to deletion) do not reclaimed and deallocated for the heap.
Can you explain in that area in respect to the when a range deletion occurs on the page, what happens to the page?
You can ping me at firstname.lastname@example.org
I haven't done them yet - I'll probably redo most of the series at the end of the summer.
When pages and extents are deallocated is complicated, and the subject of a whole blog post. I'll get to it...
MVP Hugo Kornelius once reported that he encountered a situation in which it was possible to perform
Adding some piece of Information that could be useful for someone else
Deleting Rows from a Heap
When rows are deleted from a heap the Database Engine may use row or page locking for the operation.
----> As a result, the pages made empty by the delete operation remain allocated to the heap.
----> When empty pages are not deallocated, the associated space cannot be reused by other objects in the database.
To delete rows in a heap and deallocate pages, use one of the following methods.
• Specify the TABLOCK hint in the DELETE statement. Using the TABLOCK hint causes the delete operation to take an exclusive lock on the table instead of a row or page lock. This allows the pages to be deallocated. For more information about the TABLOCK hint, see Table Hints (Transact-SQL).
• Use TRUNCATE TABLE if all rows are to be deleted from the table.
• Create a clustered index on the heap before deleting the rows. You can drop the clustered index after the rows are deleted. This method is more time consuming than the previous methods and uses more temporary resources.
For more information about locking, see Locking in the Database Engine.