The purpose of the Windows Azure ISV blog series is to highlight some of the accomplishments from the ISVs we’ve worked with during their Windows Azure application development and deployment.  Today’s post is about Windows Azure ISV BrainCredits and how they’re using Windows Azure to deliver their online service.

BrainCredits provides a system to help people track all their informal learning on a virtual transcript, including instructor-led or self-study learning, such as webinars, classes, tutorials, books, conferences, blogs or videos.  The system is designed as a highly available, high-volume web-based Model-View-Controller (MVC) application and was built following an agile process of pushing small, incremental releases into production. To do this, the team needed an architecture that would support fast read operations and allow for very targeted updates without having to recompile or retest the entire application. They decided on a CQRS (Command Query Responsibility Segregation) style architecture. They also decided to host the application on Windows Azure to take advantage of fine-grained scaling: individual subsystems (web roles or worker roles) can be scaled independently, depending on traffic and background workload.

CQRS architectures essentially separate write actions from read actions. With BrainCredits, you’d have write actions, such as registering for an instructor-led class, and read actions, such as seeing your online resume. BrainCredits handles write actions by having the web role collect requests (aka commands) and route them to the worker role asynchronously via queues. This keeps UI response time very fast and also reduces the load on the web role. As a result, BrainCredits was able to deploy Small instances for their Web Roles, with each instance consuming a single core.
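To make the write/read split concrete, here is a minimal sketch in Python. It is not BrainCredits' code: the queue and the read model are in-memory stand-ins for a Windows Azure queue and a read-optimized store, and every name in it (`RegisterForClass`, `web_role_submit`, `worker_role_drain`, `web_role_read_transcript`) is hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterForClass:
    """A command: a request to change state (hypothetical name)."""
    user_id: str
    class_name: str


# In-memory stand-ins (assumptions) for a Windows Azure queue and a read store.
command_queue = queue.Queue()
transcript_view = {}  # user id -> list of class names, shaped for fast reads


def web_role_submit(command):
    """Write side, web role: enqueue the command and return immediately."""
    command_queue.put(command)
    return "accepted"


def worker_role_drain():
    """Write side, worker role: handle queued commands and update the read model."""
    handled = 0
    while True:
        try:
            command = command_queue.get_nowait()
        except queue.Empty:
            return handled
        transcript_view.setdefault(command.user_id, []).append(command.class_name)
        handled += 1


def web_role_read_transcript(user_id):
    """Read side: a plain lookup that never touches the command path."""
    return list(transcript_view.get(user_id, []))
```

Calling `web_role_submit(...)` returns right away, and the registration only shows up in `web_role_read_transcript(...)` once `worker_role_drain()` has run. That gap is the eventual consistency the rest of the design has to account for.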

The basic architecture is below:

To achieve asynchronous communication between web and worker roles, the following Windows Azure objects were used:

  • Windows Azure queues. The web role instances drop messages in a queue, alerting the worker role instances that a command needs to be handled.
  • Windows Azure blobs. Blobs store serialized commands, and each command queue message points to a specific blob. Note: a blob is used to store the command because a BrainCredits user can add free-form text to some of the commands being issued, resulting in unpredictable message sizes that occasionally exceed 8 KB, the then-current queue message size limit. With the new 64 KB message-size limit announced in August 2011, this indirection is likely unneeded.
  • Windows Azure table storage. Event sourcing data is captured in Windows Azure Tables. As events are raised by the domain, the events are stored in table storage and used by the CQRS framework to re-initialize a domain object during subsequent requests. Windows Azure Table Storage is also used for storing command data such as user UI clicks (which are unrelated to domain events). The Domain Event table allows BrainCredits system administrators to recreate all of the UI steps that a user took during their visits to the site (e.g. search performed, page loaded, etc.)
  • Windows Azure Cache. The cache is used for storing data between requests, so that the user gets some feedback on commands that are executing but have not yet completed. This allows BrainCredits to handle eventual consistency in the application and give the user the appearance of a synchronous experience in an asynchronous application.
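The way these four pieces fit together can be sketched roughly as follows. This is an illustration under stated assumptions, not the BrainCredits implementation and not the Windows Azure storage API: the dictionaries and the queue are in-memory stand-ins for blobs, queues, table storage and the cache, and the command fields, the `LearningAdded` event type and the no-duplicates rule are all invented for the example.

```python
import json
import queue
import uuid

# In-memory stand-ins (assumptions) for the four storage services.
blob_store = {}                # blob name -> serialized command
command_queue = queue.Queue()  # each message holds only a blob name
event_table = {}               # user id -> ordered list of domain events
status_cache = {}              # command id -> "pending" or "done"


def submit_command(command):
    """Web role: store the full command in a blob and queue a small pointer."""
    command_id = str(uuid.uuid4())
    blob_store[command_id] = json.dumps(command)
    command_queue.put(command_id)          # message size no longer depends on the text
    status_cache[command_id] = "pending"   # lets the UI show in-progress feedback
    return command_id


def rehydrate(user_id):
    """Rebuild a user's transcript by replaying that user's stored events."""
    transcript = []
    for event in event_table.get(user_id, []):
        if event["type"] == "LearningAdded":
            transcript.append(event["title"])
    return transcript


def process_next_command():
    """Worker role: follow the pointer, apply the command, record the event."""
    try:
        command_id = command_queue.get_nowait()
    except queue.Empty:
        return False
    command = json.loads(blob_store.pop(command_id))
    transcript = rehydrate(command["user_id"])
    if command["title"] not in transcript:  # hypothetical domain rule: no duplicates
        event = {"type": "LearningAdded",
                 "title": command["title"],
                 "notes": command["notes"]}
        event_table.setdefault(command["user_id"], []).append(event)
    status_cache[command_id] = "done"
    return True
```

Because the queue message carries only the blob name, a command with tens of kilobytes of free-form notes produces the same small message as a command with none. The web role can poll `status_cache` to tell the user a command is still pending until the worker marks it done.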

One point about VM size: A Small VM instance provides approx. 100 Mbps of bandwidth. If BrainCredits had found that reading and writing command and event content created a bottleneck that hurt total round-trip processing time for a large command or event, a larger VM size would have been a viable solution. However, based on testing, a Small instance provided very acceptable customer-facing performance. By keeping the VM size at Small (i.e. 1 core), the “idle-time” cost is kept to a minimum (e.g. 2 Small Web Role instances + 1 Small Worker Role instance equates to approx. $270 as a baseline monthly compute cost). Medium VMs would increase this minimum cost to about $540. It’s much more economical to scale out to multiple VM instances as needed, then scale back during less-busy time periods.
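The arithmetic behind those figures can be checked with a few lines of Python. The per-instance prices are derived from the numbers above ($270 and $540 for three instances) rather than taken from a price list, and the sketch assumes cost scales linearly with instance count and that instances run for the whole month; the function and constant names are made up for the example.

```python
# Per-instance monthly prices derived from the figures in the text (approximate).
SMALL_MONTHLY_USD = 270 / 3   # about $90 per Small instance
MEDIUM_MONTHLY_USD = 540 / 3  # about $180 per Medium instance


def monthly_compute_cost(instances, per_instance_usd):
    """Baseline monthly compute cost, assuming linear, full-month pricing."""
    return instances * per_instance_usd


baseline_small = monthly_compute_cost(3, SMALL_MONTHLY_USD)    # 2 web + 1 worker
baseline_medium = monthly_compute_cost(3, MEDIUM_MONTHLY_USD)  # same layout, Medium
scaled_out_small = monthly_compute_cost(4, SMALL_MONTHLY_USD)  # one extra Small all month
```

Under these assumptions, even running a fourth Small instance for the entire month stays below the Medium baseline, and scaling that instance back during quiet periods lowers the bill further.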

There are a few key points illustrated by the BrainCredits architecture:

  • Focus on End-user experience. By following a CQRS approach and handling updates asynchronously, Web response time is unaffected by long-running background processes.
  • Scalability. BrainCredits separated all background processing into a Worker Role. While it’s entirely possible to process queues asynchronously in a Web Role thread, this would impact scale-out options. With a separate Worker Role, the system may be fine-tuned to handle user-traffic and background-processing load independently.
  • Cost footprint. VM size selection is important when considering minimum run-rate. While it’s tempting to go for larger VMs, it’s often more cost-effective to choose the smallest VM size, based on memory, CPU, and network bandwidth needs. For specific details about VM sizes, see this MSDN article.

Remember that there are often several ways to solve a particular problem. Feel free to incorporate these solution patterns into your own application, improve upon them, or take a completely different approach. Please share your comments and suggestions here as well!

Stay tuned for the next post in the Windows Azure ISV Blog Series; next time we’ll share Digital Folio’s experience with Windows Azure.