Scaling to Extremely Large Lists and Performant Access Methods
The whitepaper on the best ways to access and work with extremely large lists is now available. The paper, titled "Working with Large Lists in Office SharePoint Server 2007", evaluates the performance characteristics of large SharePoint lists under different loads and modes of operation. Although the whitepaper appears to focus exclusively on SharePoint Server, when it comes to list scalability and programmatic access to lists there is little difference between WSS 3.0 and Office SharePoint Server 2007, since they use the same base lists.
The whitepaper walks through the test itself and how it was created, and lists the access methods that were tested: browser access, SPList with For/Each, SPList with SPQuery, SPList with DataTable, SPListItems with DataTable, the Lists Web Service, Search, and the PortalSiteMapProvider. In addition to great charts comparing these programmatic methods, a results section in the paper shares the analysis. The perf tester, Steve Peschka, did a great job of pushing the product into the millions of items to show the real scale of lists, which methods work, and which should be avoided. It's a great read.
The conclusion of the paper and its findings are shared here:
There is documented guidance for Microsoft® Office SharePoint® Server 2007 regarding the maximum size of lists and list containers. For typical customer scenarios in which the standard Office SharePoint Server 2007 browser-based user interface is used, the recommendation is that a single list should not have more than 2,000 items per list container. A container in this case means the root of the list, as well as any folders in the list; a folder is a container because other list items are stored within it. A folder can contain items from the list as well as other folders, and each subfolder can contain more of each, and so on. For example, that means that you could have a list with 1,990 items in the root of the list, 10 folders that each contain 2,000 items, and so on. The maximum number of items supported in a list with recursive folders is 5 million items.
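To make the container arithmetic concrete, here is a small Python sketch that checks a proposed list layout against the two numbers quoted above. It is only an illustration of the guidance, not SharePoint object model code: the function name, the constants, and the layout dictionary are all hypothetical. It assumes that folders count toward their parent container's 2,000 entries (as the "1,990 items plus 10 folders" example suggests) and, since the excerpt does not say, it counts only items toward the 5 million total.

```python
# Illustrative only: models the quoted guidance, not any SharePoint API.
MAX_PER_CONTAINER = 2_000      # recommended entries per list root or folder
MAX_PER_LIST = 5_000_000       # supported items in a list with nested folders


def check_layout(containers):
    """Return a list of problems found in a proposed list layout.

    `containers` maps a container path ("/" is the list root) to a
    (items, subfolders) pair counting what is stored directly in it.
    Assumption: subfolders count toward the per-container limit.
    """
    problems = []
    total_items = 0
    for path, (items, subfolders) in containers.items():
        total_items += items
        entries = items + subfolders
        if entries > MAX_PER_CONTAINER:
            problems.append(f"{path}: {entries} entries exceeds {MAX_PER_CONTAINER}")
    if total_items > MAX_PER_LIST:
        problems.append(f"list: {total_items} items exceeds {MAX_PER_LIST}")
    return problems


# The layout from the example: 1,990 items and 10 folders in the root,
# each folder holding 2,000 items.
layout = {"/": (1_990, 10)}
layout.update({f"/folder{i}": (2_000, 0) for i in range(10)})
print(check_layout(layout))
```

With the layout from the example, every container sits at or under the limit, so no problems are reported; adding a single extra item to the root would push that one container over.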
In Office SharePoint Server 2007, virtually all end-user data is stored in a list. A document library, for example, is just a specialized list. The same is true for calendars, contacts, and other interfaces; they are all just customized versions of the basic SharePoint list, also referred to as an SPList. The individual items in the list are referred to as list items generally, or an SPListItem in an SPListItemCollection in the Office SharePoint Server 2007 object model. The findings in this article are equally important across all of the ways in which you store and work with data in an Office SharePoint Server 2007 site.
There are some scenarios in which you want to take advantage of the features of Office SharePoint Server 2007, but need to exceed the limit of 2,000 items per container. If you write your own interface for managing and retrieving the data, it’s quite possible that you can go past this limit without an adverse impact on farm performance. You may be able to manage larger lists to some extent by using views within Office SharePoint Server 2007 that are filtered such that there are never more than 2,000 items returned. Filtered views provide better performance than just trying to view one large flat list, but are not as efficient as breaking down the list into different containers if you are using the predefined browser-based Office SharePoint Server 2007 interface.
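As a rough illustration of what "breaking down the list into different containers" can mean in practice, the Python sketch below assigns sequentially numbered items to folders of at most 2,000 items each. This is a planning aid under stated assumptions, not SharePoint code: the folder naming scheme, the function names, and the idea of bucketing by a sequential item number are all hypothetical choices made for the example.

```python
import math

MAX_PER_CONTAINER = 2_000  # recommended entries per container


def folder_for(item_number, per_folder=MAX_PER_CONTAINER):
    """Name of the (hypothetical) folder holding the item with this 1-based number."""
    if item_number < 1:
        raise ValueError("item_number must be 1 or greater")
    bucket = (item_number - 1) // per_folder
    start = bucket * per_folder + 1
    end = start + per_folder - 1
    return f"{start:07d}-{end:07d}"


def folders_needed(total_items, per_folder=MAX_PER_CONTAINER):
    """How many full-or-partial folders are needed for `total_items` items."""
    return math.ceil(total_items / per_folder)


print(folder_for(1))              # first folder
print(folder_for(2_001))          # second folder
print(folders_needed(1_000_000))  # folders for one million items
print(folders_needed(5_000_000))  # folders for five million items
```

One thing the arithmetic makes visible: five million items at 2,000 per folder needs 2,500 folders, which is itself more than 2,000 entries in the root. If folders count toward their parent's limit, a list that large would need at least a second level of folders.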
If you develop your own interface, there are several different ways to retrieve list data, each with different performance characteristics. Some data access methods perform very well, but are only useful in a limited number of scenarios. Finally, there are also performance tradeoffs that need to be made with other data maintenance tasks in addition to data retrieval.
I especially recommend that developers read this, but IT professionals and information architects can also learn which access methods provide the best performance.
I've updated my list of key performance links with this paper.
Joel Oleson