Architecting for Fitness using Complex Adaptive Systems

Fitness of complex adaptive systems (CAS) is an area of research that I have been studying over the last few years, specifically how these systems are structured and how they behave.

One of the characteristics of a CAS is that the rules that define its operation are relatively simple and its elements can self-organize, yet this leads to many possible outcomes through emergence: new patterns are created from a large number of relatively simple interactions. Self-organization and emergence are fundamental considerations in the development of a complex adaptive system.
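To make emergence concrete, here is a small sketch in Python. It is not taken from any particular CAS framework; it uses an elementary cellular automaton (Rule 30) as a stand-in, on the assumption that it is a fair illustration of simple local rules producing patterns that are hard to predict. The function names step and run are hypothetical.

```python
def step(cells, rule=30):
    """Apply one generation of an elementary cellular automaton.

    Each cell looks only at itself and its two neighbours (edges wrap around).
    The 8-bit rule number encodes the new value for each 3-cell neighbourhood.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        out.append((rule >> index) & 1)
    return out


def run(width=31, generations=15, rule=30):
    """Start from a single live cell and return every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(generations):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run():
        print("".join("#" if c else "." for c in row))
```

The whole rule fits in one integer, yet the printed pattern is irregular. That is the point of the sketch: the complexity comes from the interactions, not from the rule itself.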

A great example of a CAS is the game of chess. Chess has a number of relatively simple rules in which the pieces are bound to a set of movements. Even so, it is very difficult to predict the outcome of a game. The players’ decisions impact one another. There is feedback in the system. The rules of cause and effect are not sophisticated enough to explain the entire ecosystem of the game of chess. The actors play a huge role in the outcome of the game.

I also believe that the fitness of a system is not only a function of how healthy it is. Most often the health of a system is related to how well the operations team or the “run function” can maintain a steady state. Sure, the system may be healthy, but how does it respond when confronted with change? The overall fitness of a system has to strike a balance between maintaining a steady state and the ability to adapt. Fitness is also tied to purpose. Purpose is defined by an architectural agreement between the designer and the user, including governance. So our job as architects is to take the view not only of operations, but also of the end user or consumer of the system. A system that is maladaptive to the business would not be considered fit, but rather one that is in a state of decline and perhaps on its way to retirement. A system that lives at the “edge of chaos” is where it is most “buff”. But designing a “systems architecture” in this manner can be a bit unnerving, so let’s consider a few aspects.

Another clarification that is worth stating is that we as architects must design, and not engineer. As mentioned in my previous blog post, and to summarize here: architecture is about structural integrity (durability), driving and inspiring the right behaviors (functionality), and providing aesthetic appeal by making information useful, purposeful, and actionable (utility). Accepting that our designs may not have all the physical elements specified to the n-th degree gives other domain architects, software developers, and infrastructure engineers the flexibility to specify the best element for the job, and perhaps to swap elements, as long as they meet the architecture design specification.

Consider a cloud operating environment. One can make the assertion that this type of computing environment is by definition a CAS. The cloud environment on its own provides no value unless it is utilized in a way that captures the user’s imagination. (A beautiful beehive, a CAS, is worthless unless the bees can make honey.) If the cloud environment is over-engineered, it will most likely lose the characteristics that make the cloud attractive, such as elasticity and availability. What makes the cloud environment fit? An architect may have to design something that is intentionally incomplete, perhaps a foundation of building blocks like Azure, so that other people can design the "finished cloud service". The environment needs to be engaged so that users and the business can infuse it with their own aspirations, values, and ideas over time. Consider the Internet or any other social computing system that users interact with (Internet e-mail, Facebook, eBay, and to a certain degree an enterprise deployment of SharePoint). What drives adoption of these systems is their ability to grow and prosper through emergence.

Self-organization of a system requires a level of intelligence and connectedness. The network of a CAS has a (large) number of nodes and connections. These "nodes" of a system are networked, and information about each node can be transmitted or received. Consider a data center (our cloud environment) that has a lattice of homogeneous nodes, in this case physical servers. These servers are connected via a flat network where they can freely communicate with one another. Read this if you are interested in this concept of data centers. The quality of these connections should be enriched by simple rules and protocols that "couple" a set of servers to work in conjunction with one another or "decouple" servers when there is a health-related issue. Self-organization requires a natural tension between discipline and freedom. There should be enough discipline to keep the system operating in a manner where resiliency is balanced with the freedom of virtualized "workloads" to glide on top of these physical servers. This "workload" transportability is key, as it allows physical servers that are more fit to take on the work if other servers' health is in question or in decay. An architect must also consider the "pressure" that would be exerted on these nodes and what happens when things are upended.
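The coupling and decoupling idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Node class, the healthy flag, and the rebalance function are hypothetical names, and "fitness" is reduced to "healthy and carrying the fewest workloads", which a real scheduler would certainly refine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A physical server in the lattice (hypothetical model)."""
    name: str
    healthy: bool = True
    workloads: list = field(default_factory=list)


def rebalance(nodes):
    """Decouple unhealthy nodes by moving their workloads to fitter nodes.

    Simple rule: each displaced workload goes to the healthy node that
    currently carries the fewest workloads.
    """
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy nodes available")
    for node in nodes:
        if node.healthy:
            continue
        while node.workloads:
            workload = node.workloads.pop()
            target = min(healthy, key=lambda n: len(n.workloads))
            target.workloads.append(workload)
    return nodes


if __name__ == "__main__":
    cluster = [
        Node("srv-a", workloads=["web"]),
        Node("srv-b", healthy=False, workloads=["db", "queue"]),
        Node("srv-c"),
    ]
    for n in rebalance(cluster):
        print(n.name, n.healthy, n.workloads)
```

The rule is deliberately small. The discipline lives in one line (pick the least-loaded healthy node), and the workloads keep the freedom to land on any server that satisfies it.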

Rules provide discipline not only for end users; they also fuel operational excellence. These rules are the behaviors (verbs) of the entire system, including the servers themselves, but there are also rules imposed on the structures (subjects): the actors and operators that use and maintain the system. Freedom is reinforced by something from chaos theory called the strange attractor. The power of this force causes a random, unpredictable system to stay within observable boundaries without becoming either nonrandom or predictable. One such freedom is the user's ability to access "on demand" computing resources, which gives the appearance of infinite scale. This freedom also has to be monitored to check for preconditions of emergence: "noise", "heat", or any other indication of trouble in the system; contradictions between structures and behaviors; incongruences between user supply and demand; and unanticipated events or needs. These should all trigger some action, either manual or automatic, so that the correct response restores a balance between the right level of order and chaos.
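As a rough sketch of how monitoring might trigger a response, the Python below maps one observation of supply and demand to an action. Everything in it is an assumption made for illustration: the evaluate function, the metric names capacity and demand, the action labels, and the 80% and 30% thresholds are hypothetical and not drawn from any real monitoring product.

```python
def evaluate(metrics, high=0.80, low=0.30):
    """Return an action for one observation of supply (capacity) and demand.

    The thresholds are illustrative assumptions, not recommended values.
    """
    capacity = metrics["capacity"]
    demand = metrics["demand"]
    if capacity <= 0:
        # No supply at all is outside what automation should handle alone.
        return "alert_operator"
    utilization = demand / capacity
    if utilization > high:
        return "scale_out"   # demand is outrunning supply
    if utilization < low:
        return "scale_in"    # supply is sitting idle
    return "steady"


if __name__ == "__main__":
    for sample in ({"capacity": 100, "demand": 90},
                   {"capacity": 100, "demand": 50},
                   {"capacity": 100, "demand": 10}):
        print(sample, evaluate(sample))
```

The automatic actions cover the expected imbalance between supply and demand, while the operator alert stands in for the manual path when the system sees something it has no rule for.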

As you may notice, there is a duality that architects have to manage: the number of rules and the degrees of freedom. An architect will have to ask herself whether the design of the structure provides enough behavioral freedom for the end user while meeting the operational goals of maintaining health. To take this one step further, the system architecture for a CAS will determine the fitness of the system over time. Here is a short list to consider:

· Keep the number of rules for your application and infrastructure minimal, yet flexible enough to handle new potential business outcomes. Balance governance and adoption to accelerate purposeful innovation.

· Incent the right behavior so that users do not overutilize system resources. Promote efficiency as a common system design principle where all constituencies benefit by doing more with less.

· Keep the rules around operations simple enough to foster automation. The system should be responsive enough to be agile in the face of “negative” change, keeping the solution and infrastructure resilient and providing the required level of availability to the information.

· System architecture fitness should balance partitioning and integration so that resources in good health are leveraged and resources that are not healthy do not contaminate other resources. Integration with other systems may introduce feedback that alters the fitness of the system you are designing. Consider risks, threats, and vulnerabilities in external systems or environments that are difficult or impossible to incent, govern, manage, or control, and incorporate them into a resiliency strategy.

· Collecting “node intelligence” requires a level of instrumentation and connectivity that allows the system to make intelligent decisions based on historical data or learnings. Current and past states should be captured in an intelligent configuration management system that provides “a memory of the system” so that change can be agile and resilient.
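The last item, “a memory of the system”, can be sketched as a small history of configuration states with rollback. This is a minimal Python illustration under my own assumptions: the ConfigHistory class and its methods are hypothetical, states are plain dictionaries, and a real configuration management system would add persistence, auditing, and much more.

```python
import copy


class ConfigHistory:
    """Keeps the current and all previous configuration states (hypothetical)."""

    def __init__(self, initial):
        self._states = [copy.deepcopy(initial)]

    @property
    def current(self):
        # Return a copy so callers cannot alter the recorded state.
        return copy.deepcopy(self._states[-1])

    def apply(self, **changes):
        """Record a new state made of the current state plus the changes."""
        new_state = copy.deepcopy(self._states[-1])
        new_state.update(copy.deepcopy(changes))
        self._states.append(new_state)
        return self.current

    def rollback(self):
        """Return to the previous state, if one exists."""
        if len(self._states) > 1:
            self._states.pop()
        return self.current

    def history(self):
        """All recorded states, oldest first."""
        return copy.deepcopy(self._states)


if __name__ == "__main__":
    config = ConfigHistory({"replicas": 2, "region": "west"})
    config.apply(replicas=4)
    print(config.history())
    print(config.rollback())
```

Because every change is recorded rather than overwritten, a “negative” change can be undone quickly, which is one small way the design supports being both agile and resilient.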

The fact of the matter is that complex adaptive systems are all around us, but we take them for granted. They exist in nature, in your very own organization, and in the infrastructures that support us. Accepting complex adaptive systems has given me more choice and freedom, because it gives my constituencies more choice and freedom over the architecture of the system. I am interested in hearing your points of view.