Most applications have four parts – input, computation, storage and output. I’ll write about all of these over time, but today I want to focus on how to choose the storage part of the equation. This won’t be a full tutorial with every detail, but I will put forth some “rules of thumb” you can use as a starting point, and I’ll include some good pointers so you can research more.
NOTE: Utility Computing, or “the cloud”, or platform/software/architecture as a service, is a young discipline, and most certainly will change over time. That’s its advantage – that it can change quickly to meet your needs. However, that means information (like this blog entry) can be out of date. Make sure you check the latest documentation for Azure before you make your final decision, especially if the date on the post is older than six months or so. I’ll try and come back to update them, but check them nonetheless. Always start your search on the official site: http://www.microsoft.com/windowsazure/
Let’s start out with your options. You have four types of storage you can use for your applications:
· Blobs (block and page)
· Tables
· Queues
· SQL Azure databases
Here are some rules of thumb for when you use each – and again, these are only guidelines. I’ll point you to some documentation for more depth.
Blobs: Use these for binary data (in other words, not text), and think of them like files on your hard drive. There are two types – block and page.
Use block blobs for streaming, like when you want to start watching a movie before it even completes the download. You can store files up to 200GB at a pop. And they parallelize well.
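To make the block idea concrete, here’s a minimal sketch (plain Python, no Azure SDK) of the block-blob pattern: split the payload into independently uploadable blocks, give each an ID, then “commit” an ordered block list to assemble the final blob. In the real service each block would be uploaded separately – and in parallel – before the commit; the names and sizes here are illustrative assumptions.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # illustrative block size; blocks upload independently


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a payload into (block_id, chunk) pairs, as a block-blob upload would.

    Each chunk could be uploaded in parallel; the ordered list of IDs is what
    defines the final blob.
    """
    blocks = []
    for i, start in enumerate(range(0, len(data), block_size)):
        chunk = data[start:start + block_size]
        block_id = f"block-{i:08d}"  # any unique, consistent ID scheme works
        blocks.append((block_id, chunk))
    return blocks


def commit(blocks):
    """'Commit' the ordered block list into the final blob."""
    return b"".join(chunk for _, chunk in blocks)


data = bytes(range(256)) * 100
blocks = split_into_blocks(data, block_size=4096)
assert commit(blocks) == data  # reassembles exactly, in block-list order
```

Because blocks are independent until the commit, failed uploads can be retried individually – that’s what makes the parallel, resumable upload story work.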
Use page blobs when you need a LOT of storage – up to a terabyte – with the data stored in, well, 512-byte pages. You can access a “page” directly, with an address.
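The addressing works because every read and write lands on a page boundary. A quick sketch of the arithmetic (the 512-byte page size is the real granularity; the function name is just for illustration):

```python
PAGE_SIZE = 512  # page blobs are addressed in 512-byte pages


def page_aligned_range(offset: int, length: int):
    """Round an arbitrary byte range out to the enclosing page boundaries –
    the granularity at which a page blob is actually read or written."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE          # floor to page start
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE  # ceil to page end
    return start, end


# A 100-byte read at offset 1000 touches two pages: bytes 512 through 1536
assert page_aligned_range(1000, 100) == (512, 1536)
```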
Tables: Massive amounts of structured, non-relational data accessed using key/value pairs. If you’re used to “NoSQL”, you have the idea. You get one index on that pair, so choose the sort or search wisely. Not relational, but large, and fast.
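A toy model of that tradeoff, in plain Python (the class and key names are my own invention, not an Azure API): lookups on the one composite key are direct, while anything else means walking every row.

```python
class MiniTable:
    """Toy key/value table: one composite key, no secondary indexes."""

    def __init__(self):
        self._rows = {}

    def insert(self, partition_key, row_key, entity):
        self._rows[(partition_key, row_key)] = entity

    def get(self, partition_key, row_key):
        # Fast path: direct lookup on the only index the store gives you
        return self._rows.get((partition_key, row_key))

    def scan(self, predicate):
        # Slow path: any query not on the key means examining every row
        return [e for e in self._rows.values() if predicate(e)]


t = MiniTable()
t.insert("customers", "42", {"name": "Ada"})
assert t.get("customers", "42")["name"] == "Ada"          # indexed lookup
assert t.scan(lambda e: e["name"] == "Ada") == [{"name": "Ada"}]  # full scan
```

That’s why “choose the sort or search wisely” matters: the key you pick is the only query the store accelerates for you.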
Queues: This storage is used to transfer messages between blocks of code. If you think of the stateless-programming web-world, you need a way to tell one application something that isn’t event-based. This is how you do that. One thing to design for is “idempotency” – a queue message can be delivered more than once, so processing the same message a second time should have the same effect as processing it once.
SQL Azure Databases: If you need relational storage, want to leverage Transact-SQL code you already have, or need full ACID, this is for you. There are size restrictions here, but I’ll not detail them so this information lives a little longer. Check out http://microsoft.com/sqlazure for specifications, whitepapers, the lot.
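To show what “full ACID” buys you, here’s a small sketch using Python’s built-in sqlite3 purely as a local stand-in – SQL Azure speaks Transact-SQL over a real server, so treat this as an illustration of atomic transactions, not SQL Azure code. A transfer that would overdraw an account raises, and both updates roll back together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one atomic transaction: both updates commit, or neither does
        conn.execute("UPDATE accounts SET balance = balance - 150 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 150 "
                     "WHERE name = 'bob'")
        new_balance = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
        if new_balance < 0:
            raise ValueError("insufficient funds")
except ValueError:
    pass  # the exception rolled the whole transaction back

# Both rows are untouched – no half-applied transfer
row = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
assert row[0] == 100
```

That all-or-nothing guarantee across multiple statements is exactly what the blob/table/queue options don’t give you, and it’s the main reason to reach for the relational store.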
OK – I’ll end with a chart. This has some more information that you might find useful in your decision process:
[Chart: compares Blobs, Tables, Queues and SQL Azure databases on criteria such as size limits, server-side processing, and access from outside Azure, with links to more info on Azure Storage.]
Many thanks to my teammates, Stephanie Lemus and Rick Shahid, for their help with the information in this post.
Page blobs support random access. Their primary use case is providing the backing store for an Azure Drive which can be attached to an instance and used like an NTFS drive.
Hey Buck - nice post. I am curious about the Size Limit for an Azure Queue which is listed at 100 TB in the table above. I know the max message size for a single queue message is 8 KB  (plus any metadata), so does this imply (if I am doing my Kilo=>Mega=>Giga=>Tera math properly) 12,000,000,000+ messages can be queued up at one time in a single queue? I don't recall ever seeing any documented max size / limit before for queues, so this is pretty interesting, and I am just verifying.
(If correct, it is a pretty awesome limit: enqueuing at 500 messages per second, it would still take the better part of a year to fill up the queue - then as long again to drain it...)
codingoutloud - you're correct, that's a huge limit!