Azure, AWS, Cloud ... exciting stuff. These are all modern incarnations of hosting environments, and developers are finding them attractive because of the resume-polishing impact that a successful service or application has.
When done right, this kind of app lowers costs to the business by reducing run costs and equipment costs, and by making the decision on when to add more capacity simple. But getting there is not easy. Keeping these apps running is the bigger challenge - so visibility into what is going well and what isn't is a critical part of building an app that can be managed remotely. Thus, I coin a term - the managed application.
A managed application is different from a running application in that it is designed to be run with a very low cost profile, one that goes beyond the deployment and hardware savings associated with the new forms of hosting environments. On top of those savings, a managed application reduces the largest run cost - the cost associated with downtime. When a service that is making money or saving money is down, the costs (opportunity cost, reputation, SLA-driven reimbursements, career success) start to add up quickly. Every second spent diagnosing a problem is a second in which these costs grow.
Based on access to operational data for some of the most popular services on the planet, let me share some fuzzed details (fuzzed to protect the specifics). In general, the measure I am interested in is MTTR - the mean time to restore when an incident is noted. Let's define an incident as "something a paying customer notices that results in dissatisfaction with the decision to use the service in the first place". The incident clock starts ticking when the customer notices. It stops when the customer can no longer notice. Whether the underlying problem is fixed or not is not important - what matters is that the impact is no longer noticeable. Thus, the acronym breaks down to "Mean Time To Restore" - restore service.
There are some lossy moments in calculating this. Not being able to see that the customer is noticing a problem is a challenge. If you don't know the service is impaired, you are wasting valuable time and the costs are mounting. Thus the time between when the incident starts and when the team responsible for keeping costs low (hello operations!) becomes aware of it is important - think of this as the monitoring latency that I talked about in my post last week.
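To make the arithmetic concrete, here is a minimal Python sketch of how the two numbers could be computed from an incident log. Everything in it is an assumption for illustration: the `Incident` record, its field names, and the sample timestamps are made up, not taken from any real service. It simply applies the definitions above: the restore clock runs from when the customer notices until the customer can no longer notice, and monitoring latency runs from when the customer notices until operations knows about it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    noticed_by_customer: datetime  # incident clock starts here
    detected_by_ops: datetime      # operations becomes aware of the problem
    impact_ended: datetime         # customer can no longer notice the problem


def mean_time_to_restore(incidents):
    """Average of (impact_ended - noticed_by_customer) across incidents."""
    seconds = mean(
        (i.impact_ended - i.noticed_by_customer).total_seconds()
        for i in incidents
    )
    return timedelta(seconds=seconds)


def mean_monitoring_latency(incidents):
    """Average of (detected_by_ops - noticed_by_customer) across incidents."""
    seconds = mean(
        (i.detected_by_ops - i.noticed_by_customer).total_seconds()
        for i in incidents
    )
    return timedelta(seconds=seconds)


# Made-up sample data: two incidents on the same (arbitrary) day.
SAMPLE_INCIDENTS = [
    Incident(
        noticed_by_customer=datetime(2010, 1, 4, 10, 0),
        detected_by_ops=datetime(2010, 1, 4, 10, 12),
        impact_ended=datetime(2010, 1, 4, 10, 40),
    ),
    Incident(
        noticed_by_customer=datetime(2010, 1, 4, 14, 0),
        detected_by_ops=datetime(2010, 1, 4, 14, 4),
        impact_ended=datetime(2010, 1, 4, 14, 20),
    ),
]

print("MTTR:", mean_time_to_restore(SAMPLE_INCIDENTS))
print("Monitoring latency:", mean_monitoring_latency(SAMPLE_INCIDENTS))
```

In practice the hard part is not the averaging but getting a trustworthy value for when the customer first noticed, which is exactly the lossy moment described above.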
So what do we do about it? How do we deal, as developers, with making sure these important elements are a part of our design? Read on - this week I introduce the "managed application" as an experience that development is ultimately in control of, and I would argue, should be held accountable for.
Developers who set out to create cloud-hosted applications need to invest time and resources in designing those applications so that they can be automatically managed by the cloud operating environment (“fabric” controller) and remotely managed where necessary. Hosted applications run in constrained environments that optimize traditional hardware and facility costs. In these environments, such as Windows Azure, the constraints come in the form of limits on what the development and operations teams in a business can directly configure. To live within these constrained environments, new development practices must be employed. Together, these practices allow developers to create a new class of applications that are designed to be remotely managed. I am calling this class of applications “Managed Applications”.
The three tenets that distinguish a managed application from traditional applications are:
These three tenets roughly align to the well-known application life cycle. An application goes through a development phase where it is designed, coded and tested. Then it goes through a deployment and readiness phase. The final phase is the production phase. It is in the production phase that a managed application exhibits substantially better cost performance than traditional application designs.
While the focus of this post is on the development of cloud-ready “managed” applications, many of the tenets are applicable to modern, on-premises deployments as well.
 See Application patterns for green IT for some ideas on this topic.