The purpose of the Windows Azure ISV blog series is to highlight some of the accomplishments of the ISVs we’ve worked with during their Windows Azure application development and deployment.  Today’s post, written by Fielding Systems Founder and CEO Shawn Cutter, describes how the company uses Windows Azure to power and deliver its web-based services for its oil and gas industry customers.

-------------------------------------------------------------------------------------------------------------

Fielding Systems provides two powerful web-based services for midstream and upstream oil and gas companies of all sizes that help operators streamline production activities and increase production output with field operations management, remote monitoring, and production analysis.  Each application includes a full-featured mobile version that gives all users access to their data and operations from any modern smartphone, including iPhone, Android, and Windows Phone.

FieldVisor is a field automation and data capture application that can be used to track everything that takes place in an oil and gas production operation.  Users can track production, equipment, service, treatment history, tasks, and many other facets of a production operation.  FieldVisor focuses primarily on replacing pen-and-paper field operations with manual data entry and on providing robust analysis and reporting on that data.  ScadaVisor monitors remote devices such as flow meters, pump-off controllers, tanks, compressors, PLCs, artificial lifts, and various other SCADA devices in real time.  It is the only truly cloud-based service of its kind because it is supported by our own cloud-based communication and polling engine, VisorBridge, which gives Fielding Systems a definitive competitive advantage over other hosted SCADA service providers.

Originally, all Fielding Systems applications were hosted in our own data centers.  These applications were upgraded to take full advantage of Windows Azure and were completely migrated onto Windows Azure in July 2011.  The decision to move to the cloud was driven by a strong desire to focus our efforts and resources entirely on cutting-edge software technology rather than on the management of servers, backups, and networking required by each application.  Companies often spend valuable time and resources on infrastructure that merely supports the business instead of driving their technology forward; the cloud lets us keep technology the focal point.  The move to the cloud also delivers major cost savings by eliminating the excess server capacity and high software licensing costs involved in maintaining our own data center and colocations.  After a thorough evaluation of the various cloud options, we selected Windows Azure because it offered us a more powerful development platform.  Fully exploited, Windows Azure would give us speed to market and scalability well beyond the other offerings, which at the time were essentially just hosted virtual machines.

The initial migration to Windows Azure resulted in minimal savings over the existing data center costs for rack space, power, backups, and secondary hot site colocation.  However, since the migration, costs have continued to fall as we expand our offerings and customer base.

Architecture

FieldVisor, FieldVisor Mobile, ScadaVisor, and ScadaVisor Mobile exist in Windows Azure as separate multi-tenant web applications with single-tenant SQL Azure databases for each of our customers.  These applications are supported by a central SSO administration application that manages all users, roles, security, and other application configuration, along with a multi-threaded worker role that processes alarms and notifications, maintains customer databases, and performs remote data collection from field devices.

We currently utilize nearly every aspect of the Windows Azure cloud including:

  • Compute Instances: multiple web and worker roles
  • Blob Storage: Blob storage is used for incremental site upgrades and automated SQL Azure database backups using the BACPAC format.
  • Table Storage: Tables are used for centralized data reprocessing and performance counter data for complete system logging.
  • Queue Storage: Queues are used for event scheduling, real time device data requests, automated notifications, and worker role management. 
  • SQL Azure: All customer application data is stored in single-tenant databases along with a few central core management databases that are multi-tenant.
  • SQL Azure Reporting: Reporting in FieldVisor and ScadaVisor is supported by SQL Azure Reporting.  Reports are run on demand by users through the ReportViewer ASP.NET control and also on user-defined schedules controlled by a worker role process that manages all scheduled reports.
  • Caching: Caching is used for the session provider in each application along with cache support for each of the web applications.  Caching is also utilized heavily to limit the load on each SQL Azure database.
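The way we use caching to limit the load on SQL Azure is essentially the cache-aside pattern: check the cache first, and only go to the database on a miss.  A minimal sketch of the idea (in Python for brevity; our stack is .NET, and in production the dictionary below would be Windows Azure Caching in front of SQL Azure, with hypothetical names throughout):

```python
# Cache-aside sketch: a plain dict stands in for the distributed cache.
cache = {}

def get_production_data(well_id, load_from_sql):
    """Return data for a well, hitting the database only on a cache miss."""
    if well_id in cache:
        return cache[well_id]          # cache hit: no database round trip
    value = load_from_sql(well_id)     # cache miss: query the database
    cache[well_id] = value             # populate the cache for next time
    return value

sql_calls = []
def fake_load(well_id):
    sql_calls.append(well_id)          # stand-in for a real SQL Azure query
    return {"well": well_id, "daily_bbl": 120}

first = get_production_data(7, fake_load)
second = get_production_data(7, fake_load)  # served from the cache
```

The second lookup never touches the database, which is exactly how repeated dashboard and report queries avoid piling load onto each customer's SQL Azure database.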

We considered Table Storage for a number of processes while upgrading each application.  Given the complexity of our data and the fact that SQL Azure databases were already required, we decided to use databases for all central processes but adopted a pub/sub model for background processing, database inserts, automated imports/exports, and remote device polling.

SSO / Central Multi-Threaded Worker Roles

Our custom Single Sign-On (SSO) service, along with all worker roles that process scheduled tasks, notifications, automated imports/exports, and anything else we need done, resides on a few Small instances.  Most of the actual processing is performed as small units of work in SQL Azure, so the overhead required on these compute instances is low.

Applications

FieldVisor and ScadaVisor, along with their mobile versions and supporting web and OData services, are all housed on two Medium compute instances.  Our multi-tenant deployment process handles rollouts and manages IIS to spin up new sites and services.  The diagram depicts the single-tenant databases for each application.

SQL Azure Reporting

When we first deployed to production on Windows Azure, we had to keep our own instance of SQL Server 2008 R2 Reporting Services running to process all reports for both FieldVisor and ScadaVisor.  We have since moved all report processing to SQL Azure Reporting and were one of the first companies to take SQL Azure Reporting to production.

Worker Roles

All worker role instances are designed to run in a multi-threaded environment where each task processor has its own thread and a single master thread on each instance maintains all the other threads.  The master thread on one of the compute instances also functions as an enterprise master, ensuring there is one and only one decision maker that determines which processes need to run and when.  This worker role architecture was built to work around the fact that Windows Azure provides no SQL Agent or other task scheduler for these kinds of scheduled tasks.  The processes running on these worker roles provide both internal management functions and customer-facing services.  Some examples of these processes include:

  • Critical device alarm processing
  • Running customer scheduled reports
  • Internal health checks and reports
  • Sending email/SMS notifications
  • Database maintenance including re-indexing and blob backups
  • Automated imports and exports for customers
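The "one and only one decision maker" guarantee above comes down to an atomic lease acquisition: whichever instance claims the lease first becomes the enterprise master, and everyone else runs as a plain worker.  A minimal sketch of the idea (Python; `LeaseStore` is a hypothetical stand-in for whatever shared resource, such as a blob lease, actually arbitrates ownership):

```python
import threading

class LeaseStore:
    """Stand-in for a shared lease; the real arbiter would live in storage."""
    def __init__(self):
        self._lock = threading.Lock()
        self.holder = None

    def try_acquire(self, instance_id):
        # Atomically grant the lease to the first instance that asks.
        with self._lock:
            if self.holder is None:
                self.holder = instance_id
                return True
            return False

class WorkerInstance:
    def __init__(self, instance_id, lease):
        self.instance_id = instance_id
        # Only the instance that wins the lease runs the enterprise master.
        self.is_master = lease.try_acquire(instance_id)

lease = LeaseStore()
instances = [WorkerInstance(f"role_{i}", lease) for i in range(3)]
```

However many instances start up, exactly one ends up with `is_master` set, so scheduling decisions are never made twice.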

The master scheduler uses Windows Azure Queues to submit each task or group of tasks for processing.  The worker threads, running on multiple compute instances, monitor the various queues and pick up tasks as they arrive.  The multi-threaded design lets us maximize the resources on each instance and scale task processing out almost without limit.  The queues also make the worker roles resilient to an instance going offline: its tasks reappear on a queue, where they are picked up and processed by another instance.  Every unit of work is designed to be idempotent; each queue message contains all of the data a thread needs to process it, so if a worker disappears, any other worker instance can pick up the unit of work without risk of data corruption.
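Put together, the queue-driven flow looks roughly like this (a Python sketch with the standard library `queue` standing in for Windows Azure Queues; the message fields and tasks are hypothetical):

```python
import queue

task_queue = queue.Queue()
results = {}  # completed work, keyed by task id

def process(msg):
    # Each message is self-contained: it carries everything needed to do
    # the work.  Writing the result keyed by task id makes processing
    # idempotent -- a redelivered message just overwrites the same entry.
    results[msg["task_id"]] = msg["payload"].upper()

# Scheduler side: the master submits tasks to the queue.
for i, text in enumerate(["alarm", "report"]):
    task_queue.put({"task_id": i, "payload": text})

# Worker side: process one task, then simulate the instance dying before
# the message is deleted, so the queue redelivers it to another worker.
first = task_queue.get()
process(first)
task_queue.put(first)          # message becomes visible again
while not task_queue.empty():
    process(task_queue.get())  # redelivered task is safely reprocessed
```

The redelivered message is processed a second time without corrupting anything, which is the property that lets an instance vanish mid-task with no harm done.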

Multi-Tenancy

During the migration to Windows Azure, we decided to convert each application to run in multi-tenant mode rather than provisioning a separate deployment of each application for every customer.  This decision was based on two key drawbacks of per-customer provisioning:

  • Increased maintenance and rollout times for each application for all customers

  • Large amount of server resources required for each customer

Each application, under heavy usage, can consume a great deal of memory; if every customer had its own IIS application for each service, that would effectively exhaust the memory of even a Large instance.  We handle multi-tenancy through an additional data access layer that maps SQL Azure connections to the appropriate databases based on the services available to a particular customer and the identity of the user.  We currently host the core customer-facing services on two Medium instances, and as load increases, additional instances can be brought up to share the processing during peak hours of the day.  We monitor performance using performance counters, watching two key counters in particular, CPU and memory usage, and leverage the Compute API to manage our services.
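The mapping step in that data access layer can be illustrated as a lookup keyed on tenant and application (a Python sketch; the tenant names and connection strings are placeholders, not real customer data, and in production the registry lives in a central management database):

```python
# Hypothetical tenant registry mapping (tenant, application) to the
# connection string of that customer's single-tenant database.
TENANT_DATABASES = {
    ("acme_energy", "fieldvisor"): "Server=tcp:example;Database=acme_fieldvisor",
    ("acme_energy", "scadavisor"): "Server=tcp:example;Database=acme_scadavisor",
}

def connection_for(tenant, application):
    """Resolve the single-tenant database for this user's tenant and app."""
    try:
        return TENANT_DATABASES[(tenant, application)]
    except KeyError:
        # The customer is not provisioned for this service.
        raise PermissionError(f"{tenant} has no access to {application}")
```

Because every query flows through this layer, the shared web applications never open a connection to a database the authenticated user's tenant is not entitled to.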

We also created our own multi-application host deployment process and management layer: a worker role monitors blob storage for changes to zip packages and updates the applications when changes are detected.  This process has drastically reduced the administration and maintenance overhead of rolling out updates.
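The change-detection step of that deployment worker amounts to comparing the blob listing it saw on the last polling cycle against the current one (a Python sketch; in practice the digests would be blob ETags or content hashes, and the package names here are hypothetical):

```python
def changed_packages(previous, current):
    """Return package names that are new or whose digest has changed."""
    return sorted(name for name, digest in current.items()
                  if previous.get(name) != digest)

# Snapshot from the last polling cycle vs. the current blob listing.
before = {"fieldvisor.zip": "etag-a1", "scadavisor.zip": "etag-b2"}
after = {"fieldvisor.zip": "etag-a1", "scadavisor.zip": "etag-c3",
         "sso.zip": "etag-d4"}
```

Only the packages returned by the comparison are redeployed, so uploading one updated zip to blob storage rolls out exactly one application without touching the rest.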

Windows Azure Worker and Web Role Sizing

We load-tested each application individually for diagnostics, performance, and compute instance sizing before the full migration to Windows Azure.  The central SSO service and a handful of worker processes had already been running on two Small compute instances with no performance concerns.  We left those instances as they were, since the number of requests per second they handle is minimal and most of their processing consists of simple, short bursts with low I/O and memory usage even at peak times.  As more resources are needed, additional instances can be added to scale out throughput.

For the application instances, we tested several configurations to find the right one.  The tests consisted of recording the user request patterns in each application and replaying them in Test Studio.  The main configurations compared were 2-6 Small compute instances versus 2-3 Medium compute instances.  While Medium instances obviously have more memory, there are also differences in network and I/O capacity:

  • Small: 100Mbps
  • Medium: 200Mbps 

These differences became pronounced under heavy load.  When we generated 30-50 requests/second against a configuration of six Small compute instances, requests would eventually queue up and time out.  The Medium instances easily withstood the same load, and we believe we can continue to scale out rather than up to Large instances.  We can also split the Medium instances so that each pair handles separate services, going from four compute instances sharing the load of all services to two sets of Medium instances, each hosting 50% of the services.

Conclusion

For Fielding Systems, nearly the entire Windows Azure stack powers our drive to focus entirely on building incredibly scalable oil and gas software services rather than worrying about server capacity, backups, and administration.  Before choosing to upgrade and move everything to Windows Azure, we looked at other cloud offerings.  We saw the long-term benefits of a Platform as a Service (PaaS) design and understood the clear advantage it had over simply adding and managing virtual machines in some remote data center.  While our competition spends time and resources managing cost centers, we focus on improving our core technology: software.  We are constantly looking for ways to improve scale by leveraging Windows Azure further, with increased use of Windows Azure Caching and Windows Azure Service Bus.  The cloud is not just something we sell to our customers; we also use it internally for everything we can, with services like Office 365 and Dynamics CRM.  We are on the bleeding edge of technology and loving it.