This post is part of a series diving into the implementation of the @home With Windows Azure project, which formed the basis of a webcast series by Developer Evangelists Brian Hitney and Jim O’Neil. Be sure to read the introductory post for the context of this and subsequent articles in the series.
In my last post, I started diving into the WebRole code of the Azure@home project covering what the application is doing on startup, and I’d promised to dive deeper into the default.aspx and status.aspx implementations next. Well, I’m going to renege on that! As I was writing that next post, I felt myself in a chicken-and-egg situation, where I couldn’t really talk in depth about the implementation of those ASP.NET pages without talking about Azure storage first, so I’m inserting this blog post in the flow to introduce Azure storage. If you’ve already had some experience with Azure and have set up and accessed an Azure storage account, much of this article may be old hat for you, but I wanted to make sure everyone had a firm foundation before moving on.
Azure storage is one of the two main components of an Azure service account (the other being a hosted service, namely a collection of web and worker roles). Each Azure project can support up to five storage accounts, and each account can accommodate up to 100 terabytes of data. That data can be partitioned across three storage constructs: blobs, queues, and tables.
Also available are Windows Azure drives, which provide mountable NTFS file volumes and are implemented on top of blob storage.
Azure@home uses only table storage; however, all storage shares some common attributes, among them the consistent RESTful API discussed later in this post.
To support the development and testing of applications with Windows Azure, you may be aware that the Windows Azure SDK includes a tool known as csrun, which provides a simulation of Azure compute and storage capabilities on your local machine – often referred to as the development fabric. csrun is a stand-alone application, but that’s transparent when you’re running your application locally, since Visual Studio communicates directly with the fabric to support common developer tasks like debugging. In terms of Azure storage, the development fabric simulates tables, blobs, and queues via a SQLExpress database (by default, but you can use the utility dsinit to point to a different instance of SQL Server on your local machine).
Development storage acts like a full-fledged storage account in an Azure data center and so requires the same type of authentication mechanisms – namely an account name and a key – but it’s a fixed and well-known value:
Account name: devstoreaccount1
Account key: Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
That’s probably not a value you’re going to commit to memory though, so as we’ll see in a subsequent post, there’s a way to configure a connection string (just as you might a database connection string) to easily reference your development storage account.
Keep in mind though that development storage is not implemented in exactly the same way as a true Azure storage account (for one thing, few of us have 100TB of disk space locally!), but for the majority of your needs it should suffice as you’re developing and testing your application. Also note that you can run your application in the development fabric but access storage (blobs, queues, and tables, but not drives) in the cloud by simply specifying an account name and key for a bona fide Azure storage account. You might do that as a second level of testing before incurring the expense and time to deploy your web and worker roles to the cloud.
Sooner or later, you’re going to outgrow the local development storage and want to test your application in the actual cloud. The first step toward doing so is creating a storage account within your Azure project. You accomplish this via the Windows Azure Developer Portal as I’ve outlined below. If you’re well-versed in these steps, feel free to skip ahead.
All access is via HTTP; therefore, a storage account is identified by a name used as the fifth-level domain name for the endpoint of the storage service. For instance, a storage account named snowball would have an endpoint of snowball.table.core.windows.net for accessing Azure table storage. Since the account name is part of a URI, that name must be unique across all of Windows Azure.
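The naming scheme can be sketched in a few lines of C#. This is only an illustration: StorageEndpoints and its methods are hypothetical helpers rather than part of the Azure SDK, and the blob and queue hosts are assumed to follow the same pattern as the table host, with only the service label changing.

```csharp
using System;

// Hypothetical helper (not part of the Azure SDK): derives the public
// endpoint of each storage service from a storage account name.
public static class StorageEndpoints
{
    private const string Suffix = "core.windows.net";

    public static Uri ForService(string accountName, string service)
    {
        if (string.IsNullOrEmpty(accountName))
            throw new ArgumentException("Account name is required.", "accountName");

        // The account name is the left-most label of the host name,
        // for example snowball.table.core.windows.net.
        return new Uri(string.Format("http://{0}.{1}.{2}/", accountName, service, Suffix));
    }

    public static Uri ForTables(string accountName) { return ForService(accountName, "table"); }
    public static Uri ForBlobs(string accountName)  { return ForService(accountName, "blob"); }
    public static Uri ForQueues(string accountName) { return ForService(accountName, "queue"); }
}
```

Calling StorageEndpoints.ForTables("snowball") yields a Uri whose Host is snowball.table.core.windows.net, which is why two accounts can never share a name.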
Once the service has been created, a project summary page with five sections is displayed. Those sections include:
With your storage account provisioned, you’re now equipped to create and manipulate your data in the cloud, and for Azure@home that specifically means two tables: client and workunit.
As depicted on the left, data is inserted into the client table by WebRole, and that data consists of a single row including the name, team number, and location (lat/long) of the ‘folder’ for the Folding@home client application. We’ll see in a subsequent post that multiple worker roles are polling this table waiting for a row to arrive so they can pass that same data on to the Folding@home console application provided by Stanford.
WebRole also reads from a second table, workunit, which contains rows reflecting both the progress of in-process work units and the statistics for completed work units. The progress data is added and updated in that table by the various worker roles. Reiterating what I mentioned above, Azure tables provide structured, non-relational, non-schematized storage (essentially a NoSQL offering). Let’s break down what that means:
Let’s take a look at the schema we defined for the two tables supporting the Azure@home project:
client table
The PartitionKey defined for this table is the UserName, and the RowKey is the PassKey. Was that the best choice? Well, in this case the point is moot, since there will be at most one record in this table.
workunit table
The PartitionKey defined for this table is the InstanceId, and the RowKey is a concatenation of the Name, Tag, and DownloadTime fields. Why those choices?
When I introduced Azure storage above, I mentioned how all the storage options share a consistent RESTful API. REST stands for Representational State Transfer, a term coined by Dr. Roy Fielding in his Ph.D. dissertation back in 2000. In a nutshell, REST is an architectural style that exploits the natural interfaces of the web (including a uniform API, resource based access, and using hypermedia to communicate state). In practice, RESTful interfaces on the web
The RESTful architecture employed by Azure table storage specifically subscribes to the Open Data Protocol (or OData), an open specification for data transfer on the web. OData is a formalization of the protocol used by WCF Data Services (née ADO.NET Data Services, née “Astoria”).
The obvious benefit of OData is that it makes Azure storage accessible to any client, any platform, any language that supports an HTTP stack and the Atom syndication format (an XML specialization) – PHP, Ruby, curl, Java, you name it. At the lowest levels, every request to Azure storage – retrieving data, updating a value, creating a table – occurs via an HTTP request/response cycle (“the uniform interface” in this RESTful implementation).
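To make that request/response cycle a little more concrete, here is a rough sketch of the URIs a client would issue a GET against to read rows from the workunit table. TableQueryUri is a hypothetical helper, the snowball account name is carried over from the earlier example, and the sketch deliberately leaves out the authentication and version headers that a real request must also carry.

```csharp
using System;

// Hypothetical helper: builds OData-style addresses for table storage reads.
// It only formats URIs; it does not sign or send any request.
public static class TableQueryUri
{
    // Address of a single entity: table(PartitionKey='...',RowKey='...')
    public static string ForEntity(string account, string table,
                                   string partitionKey, string rowKey)
    {
        return string.Format(
            "http://{0}.table.core.windows.net/{1}(PartitionKey='{2}',RowKey='{3}')",
            account, table,
            Uri.EscapeDataString(Quote(partitionKey)),
            Uri.EscapeDataString(Quote(rowKey)));
    }

    // All entities in one partition: table()?$filter=PartitionKey eq '...'
    public static string ForPartition(string account, string table, string partitionKey)
    {
        string filter = "PartitionKey eq '" + Quote(partitionKey) + "'";
        return string.Format(
            "http://{0}.table.core.windows.net/{1}()?$filter={2}",
            account, table, Uri.EscapeDataString(filter));
    }

    // OData string literals escape a single quote by doubling it.
    private static string Quote(string value)
    {
        return value.Replace("'", "''");
    }
}
```

For example, TableQueryUri.ForPartition("snowball", "workunit", "WorkerRole_IN_0") produces a URI that asks for every work unit recorded by one role instance (the instance ID shown is only an illustrative value). Any HTTP-capable client could issue that request, which is the point of the uniform interface.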
While it’s great to have such an open and common interface, as developers, our heads would quickly explode if we had to craft HTTP requests and parse HTTP responses for every data access operation (just as they would explode if we had to code to the core ODBC API or parse TDS for SQL Server). The abstraction of the RESTful interface that we crave is provided in the form of the StorageClient API for .NET, and there are abstractions available for PHP, Ruby, Java, and others. StorageClient provides a LINQ-enabled model, with client-side tracking, that abstracts all of the underlying HTTP implementation. If you’ve worked with LINQ to SQL or the ADO.NET Entity Framework, the programmatic model will look familiar.
To handle the object-“relational” mapping in Windows Azure there’s a bit of a manual process required to define your entities, which makes sense since we aren’t dealing with nicely schematized tables as with LINQ to SQL or the Entity Framework. In Azure@home that mapping is incorporated in the AzureAtHomeEntities project/namespace, which consists of a single code file: AzureAtHomeEntities.cs. There’s a good bit of code in that file, but it divides nicely into three sections:
ClientInformation entity definition
It should be fairly obvious from comparing the code below to the client table schema that the ClientInformation class maps directly to that table. Line 12 begins the constructor for a new entity (which we’ll eventually see being used in the WebRole code). In Lines 22 and 23, you’ll note the assignment of the PartitionKey and RowKey fields. Those fields (along with the read-only Timestamp field) are defined in the base class TableServiceEntity.
 1: public class ClientInformation : TableServiceEntity
 2: {
 3:     public String UserName { get; set; }
 4:     public String Team { get; set; }
 5:     public String ServerName { get; set; }
 6:     public Double Latitude { get; set; }
 7:     public Double Longitude { get; set; }
 8:     public String PassKey { get; set; }
 9:
10:     public ClientInformation() { }
11:
12:     public ClientInformation(String userName, String passKey, String teamName,
13:                              Double latitude, Double longitude, String serverName)
14:     {
15:         this.UserName = userName;
16:         this.PassKey = passKey;
17:         this.Team = teamName;
18:         this.Latitude = latitude;
19:         this.Longitude = longitude;
20:         this.ServerName = serverName;
21:
22:         this.PartitionKey = this.UserName;
23:         this.RowKey = this.PassKey;
24:     }
25: }
WorkUnit entity definition
This entity of course maps to the workunit table, and is only slightly more complex than the ClientInformation entity above, by virtue of the concatenation required to form the RowKey (Lines 26-29).
 1: public class WorkUnit : TableServiceEntity
 2: {
 3:     public String Name { get; set; }
 4:     public String Tag { get; set; }
 5:     public String InstanceId { get; set; }
 6:     public Int32 Progress { get; set; }
 7:     public String DownloadTime { get; set; }
 8:     public DateTime StartTime { get; set; }
 9:     public DateTime? CompleteTime { get; set; }
10:
11:     public WorkUnit() { }
12:     public WorkUnit(String name, String tag, String downloadTime, String instanceId)
13:     {
14:         this.Name = name;
15:         this.Tag = tag;
16:         this.Progress = 0;
17:         this.InstanceId = instanceId;
18:         this.StartTime = DateTime.UtcNow;
19:         this.CompleteTime = null;
20:         this.DownloadTime = downloadTime;
21:
22:         this.PartitionKey = this.InstanceId;
23:         this.RowKey = MakeKey(this.Name, this.Tag, this.DownloadTime);
24:     }
25:
26:     public String MakeKey(String n, String t, String d)
27:     {
28:         return n + "|" + t + "|" + d;
29:     }
30: }
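As a small illustration of the composite RowKey, the sketch below builds a key the same way MakeKey does and then splits it back into its parts. WorkUnitKey is a hypothetical helper that is not part of Azure@home, and the round trip assumes that none of the three values contains the | separator.

```csharp
using System;

// Hypothetical helper mirroring WorkUnit.MakeKey; not part of Azure@home.
public static class WorkUnitKey
{
    private const string Separator = "|";

    // Same concatenation as MakeKey: name|tag|downloadTime
    public static string Make(string name, string tag, string downloadTime)
    {
        return name + Separator + tag + Separator + downloadTime;
    }

    // Recovers the three parts; only reliable when no part contains '|'.
    public static string[] Split(string rowKey)
    {
        string[] parts = rowKey.Split('|');
        if (parts.Length != 3)
            throw new FormatException("Expected name|tag|downloadTime but got: " + rowKey);
        return parts;
    }
}
```

With made-up values, WorkUnitKey.Make("p6041", "R0C1G2", "July 1 12:00:00") gives "p6041|R0C1G2|July 1 12:00:00", and Split returns the original three strings. A key that yields more or fewer than three parts is rejected rather than silently misread.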
ClientDataContext
The ClientDataContext is the wrapper class for all of the data access, just as the DataContext is the entry class to LINQ to SQL and the ObjectContext is for the Entity Framework. Here, the base class is TableServiceContext (which in turn extends DataServiceContext). It’s via the TableServiceContext instance that you make the connection to Azure table storage (specifying endpoint and credentials) and enumerate entities in table storage. Under the covers, the context manages object tracking on the client and handles the translation of CRUD (Create-Read-Update-Delete) operations to the underlying HTTP requests that make up the Azure Table Service REST API.
In this somewhat simplistic implementation, two queries exist on the underlying context, each returning the complete contents of one of the tables (client or workunit); however, we could have included additional IQueryable<T> properties to provide options to return subsets of the data as well. Making these properties IQueryable enables us to compose additional query semantics in the WebRole and WorkerRole code.
 1: public class ClientDataContext : TableServiceContext
 2: {
 3:
 4:     public ClientDataContext(String baseAddress, StorageCredentials credentials)
 5:         : base(baseAddress, credentials)
 6:     {
 7:     }
 8:
 9:     public IQueryable<ClientInformation> Clients
10:     {
11:         get
12:         {
13:             return this.CreateQuery<ClientInformation>("client");
14:         }
15:     }
16:
17:     public IQueryable<WorkUnit> WorkUnits
18:     {
19:         get
20:         {
21:             return this.CreateQuery<WorkUnit>("workunit");
22:         }
23:     }
24: }
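To show what composing additional query semantics on an IQueryable property can look like, here is a minimal sketch that narrows a work unit query to the unfinished units of one role instance. So that it compiles without the Azure SDK, it uses a stand-in WorkUnitRow class and accepts any IQueryable source; both WorkUnitRow and WorkUnitQueries are hypothetical names, and in the real project the source would be the WorkUnits property of ClientDataContext.

```csharp
using System;
using System.Linq;

// Stand-in for the WorkUnit entity so the sketch compiles without the SDK.
public class WorkUnitRow
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public int Progress { get; set; }
}

public static class WorkUnitQueries
{
    // Adds a filter on top of an existing query without executing it.
    // Assumption: Progress is a percentage and 100 means the unit is done.
    public static IQueryable<WorkUnitRow> InProgressFor(
        IQueryable<WorkUnitRow> source, string instanceId)
    {
        return source.Where(w => w.PartitionKey == instanceId && w.Progress < 100);
    }
}
```

Because the method returns IQueryable<WorkUnitRow>, the caller can keep composing before anything is enumerated. Bear in mind that a query provider backed by table storage is likely to support fewer LINQ operators than an in-memory list does, so a query that works against test data should be checked against real storage as well.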
At this point we’ve got a lot of great scaffolding set up, but we haven’t actually created our tables yet, much less populated them with data! In the next post, I’ll revisit the WebRole implementation to show how to put the StorageClient API and the entities described above to use.