Windows Azure provides two main things - a place to run your code and a place to store your data. As you read this, keep in mind that Windows Azure is built for writing and hosting highly scalable and available applications - it is not just about moving your existing application into the cloud to let someone else deal with maintenance. This kind of application requires a different architecture and different building blocks.

Instances

An instance is a virtual machine where your role runs. You can create multiple instances of each role and spin up new instances in a very short space of time. An instance can be configured to run a Web role, a Worker role or both a Web and Worker role.

Roles

Your code runs in a role; this is basically where your application will execute. In these roles, you have access to pretty much everything in the .NET Framework - the only caveat is that your code will run in partial trust - so no COM interop or P/Invoke in these applications, thank you very much. There are also a few other permissions you can't have, such as access to machine-specific resources like the event log or direct access to the file system - but then accessing these in a highly scalable cloud-based app doesn't really make too much sense. For a detailed list of what you can and can't do, please see "Windows Azure SDK Trust Policy Reference" in the SDK documentation - sorry, I couldn't find a link to it on MSDN.

Currently there are 2 types of role:

Web Roles - Analogous to a .NET web project; you can run ASP.NET WebForms, MVC, ASP.NET Web Services and WCF services in these roles. These must be on ports 80 (HTTP) or 443 (HTTPS). Any sites or services you expose are going to have to be in a Web Role.

Worker Roles - These are similar to a Windows Service. A Worker Role has no user interaction - in fact you can't actually expose any functionality from a Worker Role. You can access any of your storage from here, but nobody is going to make direct calls into your code. You get a "Main" function, but if you are going to be monitoring storage objects (think queues) or executing tasks on time-based conditions, you are going to have to deal with the scheduling yourself. Expect your "Main" function to contain a loop, or more likely to be spinning off some threads to do the work.
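As a rough illustration of the "loop in Main" shape described above, here is a minimal Python sketch. It is not Azure SDK code (a real worker role would be written in .NET); the names `run_worker`, `poll_for_work` and `handle` are hypothetical, and a plain list stands in for a storage queue.

```python
import time
from typing import Callable, Optional


def run_worker(poll_for_work: Callable[[], Optional[str]],
               handle: Callable[[str], None],
               idle_sleep_seconds: float = 1.0,
               max_iterations: Optional[int] = None) -> int:
    """Poll for work in a loop and sleep when there is nothing to do.

    max_iterations exists only so this sketch can terminate; a real worker
    would keep looping until its instance is shut down.
    Returns the number of items handled.
    """
    handled = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        item = poll_for_work()
        if item is None:
            # Nothing to do: back off rather than spinning on the CPU.
            time.sleep(idle_sleep_seconds)
            continue
        handle(item)
        handled += 1
    return handled


if __name__ == "__main__":
    pending = ["resize-image-1", "resize-image-2"]  # stand-in for a queue
    done = []
    count = run_worker(lambda: pending.pop(0) if pending else None,
                       done.append,
                       idle_sleep_seconds=0.01,
                       max_iterations=5)
    print(count, done)
```

The scheduling decision (how long to sleep, when to stop) lives entirely in your own code, which is the point being made about Worker Roles.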

Storage

From the storage side of things, you have 3 options:

Tables - First of all, stop thinking about database tables. No really, stop thinking about database tables. If you are a SQL guy and a committed RDBMS fanatic, skip this bit and jump straight to using SQL Data Services. Still with me? Ok, you'd best sit down, as this is where the world changes, because to make truly scalable applications we have to throw out all the things that cause scalability problems. That means no transactions, very, very limited support for indexing, no transactions, effectively no query operations, no transactions, oh and no transactions. Ok, you get the point; the behaviours that you rely on in the transactional/relational world don't exist in Azure. Tables are very simple affairs that have columns and rows. Each row has a RowKey, which is indexed and is stored in sorted order (think SQL clustered index). None of the other columns are indexed, so if you try to query against them, you end up doing a scan of all the rows.

The power of table storage is its support for partitioning. Think of a partition as identifying the node on which that part of the table is stored. Spreading your data across multiple partitions means that you can scale out very, very well. Storage is also very cheap (well, I'm expecting it will be when the prices are announced), so don't be afraid to store the same data in more than one place. In transactional applications, you typically have quite a simple insert, where the data is split across a normalised schema, and complex queries, where the data is joined back up again. In highly scalable applications, you have complex inserts, where the same data is stored in several places but arranged in different ways, and simple queries - effectively "select * from partition", well, maybe with a "where rowkey ==".
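To make the "complex inserts, simple queries" idea concrete, here is a small in-memory Python model. It is only a sketch and not the Azure table API: `ToyTable`, `save_order` and the "customer:"/"day:" partition naming are all made up for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]


class ToyTable:
    """In-memory stand-in for a partitioned table (hypothetical, not Azure)."""

    def __init__(self) -> None:
        # partition key -> {row key -> row}
        self._partitions: Dict[str, Dict[str, Row]] = {}

    def insert(self, partition_key: str, row_key: str, row: Row) -> None:
        self._partitions.setdefault(partition_key, {})[row_key] = dict(row)

    def get(self, partition_key: str, row_key: str) -> Optional[Row]:
        # Cheap: goes straight to one partition and one row key.
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key: str) -> List[Row]:
        # "select * from partition", returned in row key order.
        rows = self._partitions.get(partition_key, {})
        return [rows[key] for key in sorted(rows)]

    def scan(self, prop: str, value: str) -> List[Row]:
        # Expensive: no index on other columns, so every row is visited.
        return [row
                for rows in self._partitions.values()
                for row in rows.values()
                if row.get(prop) == value]


def save_order(table: ToyTable, order: Row) -> None:
    """The 'complex insert': write the same order twice, arranged two ways."""
    table.insert("customer:" + order["customer"], order["id"], order)
    table.insert("day:" + order["day"], order["id"], order)


if __name__ == "__main__":
    table = ToyTable()
    save_order(table, {"id": "0001", "customer": "alice", "day": "2009-01-06"})
    save_order(table, {"id": "0002", "customer": "alice", "day": "2009-01-05"})
    print(table.query_partition("customer:alice"))
    print(table.query_partition("day:2009-01-05"))
```

Both "orders for a customer" and "orders on a day" become single-partition reads, at the cost of writing each order twice.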

Blobs - For storing large amounts of data - the current limit for a blob is 50GB, which should do for most applications! Because uploading 50GB in a single HTTP request isn't going to be a fun, fast or fulfilling way to write your applications, the API allows you to upload your blobs as a set of blocks (up to 4MB in each). Once all the blocks are on the server, they can be combined into a single blob.
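The block-then-combine flow can be sketched in a few lines of Python. This is an in-memory model under stated assumptions, not the real blob API: `ToyBlobStore`, `put_block`, `commit` and the block ID format are hypothetical names chosen for the example.

```python
from typing import Dict, Iterator, List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB per block, the limit quoted above


def split_into_blocks(data: bytes,
                      block_size: int = BLOCK_SIZE) -> Iterator[Tuple[str, bytes]]:
    """Yield (block_id, chunk) pairs, each chunk at most block_size bytes."""
    for index, offset in enumerate(range(0, len(data), block_size)):
        yield f"block-{index:06d}", data[offset:offset + block_size]


class ToyBlobStore:
    """In-memory stand-in for a blob service (hypothetical, not Azure)."""

    def __init__(self) -> None:
        self._uploaded: Dict[str, Dict[str, bytes]] = {}  # blocks not yet combined
        self.blobs: Dict[str, bytes] = {}                 # finished blobs

    def put_block(self, blob_name: str, block_id: str, content: bytes) -> None:
        # Blocks can arrive in any order, and can be retried individually.
        self._uploaded.setdefault(blob_name, {})[block_id] = content

    def commit(self, blob_name: str, block_ids: List[str]) -> None:
        # Combine the uploaded blocks, in the order given, into one blob.
        blocks = self._uploaded.get(blob_name, {})
        self.blobs[blob_name] = b"".join(blocks[block_id] for block_id in block_ids)
        self._uploaded.pop(blob_name, None)


if __name__ == "__main__":
    store = ToyBlobStore()
    payload = b"x" * 10
    ids = []
    for block_id, chunk in split_into_blocks(payload, block_size=4):
        store.put_block("demo.bin", block_id, chunk)
        ids.append(block_id)
    store.commit("demo.bin", ids)
    print(len(ids), store.blobs["demo.bin"] == payload)
```

Because each block is its own request, a failed upload only costs you one block's worth of retransmission rather than the whole blob.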

Queues - A first in, first out durable data store that can be accessed from any of your roles and indeed from outside your Azure deployment - you could access this from a client-based app (WinForms, WPF), but you would need to deploy your super secret storage key to the client. Typically you will be using queues to communicate between different roles, to balance work across different instances and to introduce reliability into your applications. The killer feature of queues is their ability to bring a message back to life after a specified amount of time. The programming model for a queue is:

  1. Pop message from queue
  2. Process message
  3. Delete message from queue

If you don't delete the message within a certain amount of time, the message will become visible again and will be returned the next time a pop request is issued. This means two things:

  1. If your application falls over during the Process message stage, all is not lost, you aren't going to lose a message - this is how we introduce reliability into our apps
  2. You need to make your code re-entrant - i.e. you have to be able to process the same message more than once and not fall over if the message has already been half processed.
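The pop / process / delete cycle and the "message comes back to life" behaviour can be modelled in memory like this. It is a sketch, not the Azure queue API: `ToyQueue`, `process_once` and the injected `clock` are hypothetical, and the 30 second timeout is just an example value.

```python
import itertools
from typing import Callable, Dict, List, Optional, Set, Tuple


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not Azure)."""

    def __init__(self, clock: Callable[[], float],
                 visibility_timeout: float = 30.0) -> None:
        self._clock = clock
        self._timeout = visibility_timeout
        self._ids = itertools.count(1)
        # message id -> (body, time at which the message is visible again)
        self._messages: Dict[int, Tuple[str, float]] = {}

    def put(self, body: str) -> int:
        message_id = next(self._ids)
        self._messages[message_id] = (body, self._clock())
        return message_id

    def pop(self) -> Optional[Tuple[int, str]]:
        """Return the oldest visible message and hide it for the timeout."""
        now = self._clock()
        for message_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[message_id] = (body, now + self._timeout)
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        self._messages.pop(message_id, None)


def process_once(message_id: int, body: str,
                 already_done: Set[int], results: List[str]) -> None:
    """Safe to call repeatedly: a second delivery is recognised and skipped."""
    if message_id in already_done:
        return
    results.append(body.upper())
    already_done.add(message_id)


if __name__ == "__main__":
    now = [0.0]  # fake clock so the timeout can be simulated
    queue = ToyQueue(clock=lambda: now[0], visibility_timeout=30.0)
    queue.put("hello")
    first = queue.pop()   # 1. pop
    now[0] = 31.0         # "crash" before deleting; the timeout expires
    second = queue.pop()  # the same message is delivered again
    done, results = set(), []
    process_once(*first, done, results)
    process_once(*second, done, results)  # 2. process (duplicate is skipped)
    queue.delete(second[0])               # 3. delete
    print(results)
```

In a real deployment the record of "already done" work would itself need to live in durable storage, since the instance that handled the first delivery may be the one that fell over.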

 

Ok, that was a quick tour of the building blocks. In future posts I'll talk about how we can use the behaviour of these things to build powerful and scalable applications.

Neil.