Last week we talked about the collections API and how tomove things around on screen using collections. Today, I want to take a lookunder the hood to see what collections are how they work, and what the benefitsof collections are.

So why were collections created in the first place? Inessence collections are a storage and network optimization.


Go over to http://memo.hardrock.comone more time (yes, you’ve seen it, but bear with me). You will notice that theguys from Vertigo have upped the size of the collection: you can now view 600images in the collection of Rock memorabilia.

Pay close attention: the collection of all 600 images openspretty much immediately after the application is launched. Some images areblurry for a while depending on your network connection until more detail isloaded. It wouldn’t be possible to show content that quickly if a networkroundtrip was required to load each individual image. Even if each image wasonly a few hundred bytes in size, it would still take considerable amounts oftime to round trip and open each file (remember: modern browsers only allow upto 6 connections at the same time anyway – we would only be able to download 6images at a time, even if each image is loaded in 1/6th of a second,it would still take 100 seconds to load all of the images).

How does Seadragon/Deep Zoom solve this problem? We storeimages in texture atlases, in essence we cram image thumbnails into tiles; andin the same manner as single images, we store the texture atlases as a tiledimage pyramid.

The collection stores images up to level 8 (by default ifyou use deep zoom composer, this is customizable if you use the deep zoom toolsAPI), so images up to 256x256 are stored in a collections. For most largecollections of large images, the size overhead of storing this collection isnegligible.

Let’s take a look at how a sample collection may look:


For this set of 9 images, we are showing, say, level 8, 7,and 6. At the bottom of the pyramid, each image is stored in its own tile. Butif you look just one level up, at level 7 (a quarter of the size of level 8 bythe way), four images share a tile, and at level 6 all images share one tile.

It takes a collection of more than 65536 images to have morethan one tile at the lowest level of detail, where each image only takes up onepixel (thus making space for 256x256=65536 images).

Deep Zoom always loads the tip of the pyramid first, so formost collections <65536 images that means loading a pretty small tile to atleast show *something on screen*. The time it takes to blend in that top levelis usually enough time to load a couple more levels and started blending those.By loading just a handful of tiles in the hardrock demo, users can immediatelysee something on screen, and pretty quickly detail gets blended in as thenetwork downloads more detail in more tiles.

The observant reader will notice a limitation ofcollections: the assumption is that images displayed on screen are at leastsomewhat related. The collection optimization stops working if you makecollections that break that assumption. For example, if you have a collectionof several million images, and you decide to display 1000 of those images thatare all stored on separate tiles even at lower level of details, then deep zoomwill load a lot of tiles that share image data with images that aren’t evenshown on screen.

In our experiments, however, this hasn’t caused problems forcollections around 10000 items in size at all. It really needs to be a prettydegenerate case to cause noticeable “badness”.

Dynamic Collections

When we thought of the collection file format and thestorage thereof, we had another scenario in mind that isn’t properly exposed inSilverlight yet: we wanted to make it possible to quickly add and remove itemsto and from collections. Imagine you have a collection of search results; wewanted to make it possible to create or modify a collection of search resultson the fly without too much performance penalty.

And in fact, adding or removing collection items is O(1),i.e. it’s basically free algorithmically. Of course almost all of the time willbe spent in IO operations getting the tiles written to disk if you use dynamiccollections.

Collections in Deep Zoom meet the following goals:

(1)   Images are crammed into tiles in a “prettyoptimal” way (you can see some space between images in the example above, but thatusually compresses away pretty well)

(2)   Metadata to access image content in tiles isminimal (it’s the same metadata for each level in the collection, and the onlyinfo theoretically required is the index of the item and the aspect ratio ofthe image)

(3)   Minimal space waste: At each level of detail, atmost one tile is less than completely full.

When you look at the requirements above, the strange MortonLayout we use (as documented on MSDN) becomes clear: by using a fractal layout,we can ensure that each thumbnail for an individual image is relativelyspeaking, at the same offset from the top left corner as in any other level.Because of that, we only need to store the metadata for the location of athumbnail once, not again and again for each level. And since we use the MortonLayout, we only need to store a small number, or ordinal. In fact, ifcollections weren’t dynamic, we could use just the position in the list as thatordinal.

So when you need to add an item to a collection, you add itto the end using the Morton number to figure out the image thumbnail’slocation.

If you look at the above image of a collection:


Then you add a few images, it will look like this:


You can easily see how this scales to infinity whilemaintaining all the properties we need from dynamic collections.