Thomas Watson is often misquoted as saying, “I think there is a world market for maybe five computers.” Although there is little evidence that he said anything of the sort, the myth lives on because it illuminates a fundamental truth. For the computing needs and workloads of the time, the ‘five computers’ estimate may not have been far off the mark. But as computers became more efficient, they did more than serve the existing workloads better: they became so efficient that they naturally attracted new workloads built to take advantage of that efficiency.

Today we see a similar debate surrounding cloud computing. Many in the industry argue that there is room for perhaps only three cloud providers, and hence that all clouds will ultimately be public. Indeed, some worry about the efficiency gains and argue that private clouds will need to be much smaller than typical public deployments in order not to waste resources. After all, an enterprise’s current workload can be satisfied with a fraction of a typical cloud deployment.

The history of computing indicates otherwise. Rather than betting on long-term consolidation or on sizing clouds down to today’s workloads, it is more useful to look forward and identify the new computing paradigms and workloads that the efficiencies of cloud computing will enable. With a nod to Kevin Kelly’s recent book “What Technology Wants,” we need to look at the architectural characteristics of the cloud, as well as at business trends, to identify those workloads.

What the Cloud Wants

Innovation often arrives in great waves as the right technological precursors come together. Many inventions spring up nearly simultaneously from many different sources. The radio, the light bulb, and calculus were all invented nearly simultaneously by multiple people. Even the theory of evolution was developed by more than one person. This does not mean that any one inventor stole another’s work, but rather that the time was right for the invention. The individual inventor was almost superfluous, and the innovation was all but inevitable. Cloud computing is no exception: it was developed simultaneously by Amazon, Google, Microsoft, and others. Cloud computing is the result of the confluence of megaservices, virtualization, multi-core processors, and fast network connectivity.

Computing wants the cloud.

But if computing wants the cloud, what does the cloud want? Every new computing paradigm results in a new programming model, driven by the business needs of the time and the architectural characteristics of that paradigm. Client-server architecture gave us SQL databases, middle-tier business logic, and the presentation layer. Applications no longer ran in their entirety in user sessions or virtual machines on a mainframe. Rather, data was concentrated on a database tier, and computation was separated from the data layer as much as possible. This allowed for greater scalability, because the middle and presentation layers took load off the database tier, which was the one scale-up tier of the system. It also resulted in greater modularity, as multiple applications could rely on the same data without interfering with one another. SQL databases arose from the requirement that multiple users be able to view and update the same data consistently and efficiently.

The cloud can certainly accommodate these traditional client-server applications; indeed, it’s critical to the success of the cloud that it do so well. Similarly, Terminal Server remains a robust business forty years after the advent of SQL, even though it is architecturally closer to the mainframe paradigm. One of the amazing things about cloud computing is the enormous efficiency it can bring to existing architectures. But to be a leader in cloud computing, it’s essential to understand the new workloads that the platform will enable.

Technological Trends

Here are some of the technological trends that will help determine the new workloads.

  • The cloud wants massively parallel workloads—Rising core counts and the end of the MHz race mean that workloads must be massively parallelizable in order to scale. Any serial portion ultimately caps scalability. According to Amdahl’s law, even a job that is 95% parallelizable can be sped up by at most 20X, since the 5% serial share alone bounds the speedup at 1/0.05, no matter how many cores are applied to the work.

  • The cloud wants to consume data—There is a vast amount of business value locked away in corporate datacenters today. This data comes from credit card transactions, customer loyalty card usage, purchase tracking, click streams, RFID tags, mobile phones, and smart monitoring systems, not to mention Twitter and other social media. The volume of this data will continue to increase, even outstripping the exponential increase in available storage.

  • The cloud wants schedulable work—Many legacy workloads are highly variable on a variety of timescales. Capacity has to be reserved for short-lived spikes as well as for daily, weekly, and yearly patterns of usage. This fact is often used as an argument for greater consolidation in public clouds: the more workloads you aggregate, the idea goes, the more counter-cyclical workloads you are likely to have. Since many industries share the same day/night patterns and the same yearly patterns, this seems unlikely. To truly extract the most utility from a cloud (private or public), you need the ability to schedule work when the resources are available.

  • Data is unstructured—Much of the valuable business data does not reside in SQL databases. It consists of logs, often with an uncertain schema. Partly this is simply because SQL databases do not scale sufficiently. Often the data was collected because it was known to be valuable, even though it wasn’t clear how it would eventually be used.
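The 20X ceiling in the first trend follows directly from Amdahl’s law, S(n) = 1 / ((1 - p) + p/n), where p is the parallelizable fraction of the job and n is the number of cores. The Python sketch below simply evaluates that formula; the function names `amdahl_speedup` and `amdahl_limit` are our own, chosen for illustration only.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a job whose parallelizable
    share is `parallel_fraction` (between 0 and 1) when run on `cores` cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part is divided across the cores.
    return 1.0 / (serial + parallel_fraction / cores)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without limit."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0 else 1.0 / serial


if __name__ == "__main__":
    for cores in (1, 8, 64, 1024, 1_000_000):
        print(f"{cores:>9} cores: {amdahl_speedup(0.95, cores):6.2f}x")
    print(f"limit: {amdahl_limit(0.95):.2f}x")
```

Running it shows the speedup climbing quickly at first and then flattening toward 20X: the 5% serial share, not the number of cores, sets the limit.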

In summary, what the cloud wants is a massively parallel, flexible, batch-oriented workload that extracts business value from distributed, often unstructured data.

Collectively this workload is commonly known as “Big Data.”
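To make the shape of this workload concrete, the sketch below assumes a map/reduce-style aggregation (a common pattern for this kind of job, not one prescribed by the text) over hypothetical log partitions with no fixed schema. Each partition is parsed tolerantly and counted independently, which is what makes the work parallel and schedulable; the partial counts are then merged. The functions `parse_fields`, `map_partition`, `reduce_counts`, and `run_job`, the `key=value` log format, and the sample data are all illustrative assumptions.

```python
from collections import Counter
from functools import partial, reduce
from typing import Callable, Iterable


def parse_fields(line: str) -> dict:
    """Tolerantly extract key=value tokens from a free-form log line.

    Assumed (hypothetical) format: whitespace-separated key=value tokens.
    Tokens without '=' are ignored, so lines with unknown or drifting
    layouts still contribute whatever fields they happen to contain.
    """
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
    return fields


def map_partition(lines: Iterable[str], field: str) -> Counter:
    """Map step: count the values of `field` within one data partition."""
    counts = Counter()
    for line in lines:
        value = parse_fields(line).get(field)
        if value is not None:
            counts[value] += 1
    return counts


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge partial counts. Addition is associative and
    commutative, so partials can be combined in any order."""
    return left + right


def run_job(partitions: Iterable[Iterable[str]], field: str,
            map_fn: Callable = map) -> Counter:
    """Run the map step over every partition, then fold the results.

    `map_fn` defaults to the built-in (serial) map; any map-like callable,
    such as a process pool's map, could be passed instead.
    """
    partials = map_fn(partial(map_partition, field=field), partitions)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical partitions, e.g. log shards stored on different nodes.
    partitions = [
        ["ts=1 user=ann status=200 path=/home", "garbled line, no fields"],
        ["status=404 path=/missing", "user=bob status=200"],
        ["ts=3 status=500 retry=true"],
    ]
    print(run_job(partitions, "status"))
```

Because the map step touches only its own partition and the merge is associative, `map_fn` could be swapped for something like `concurrent.futures.ProcessPoolExecutor.map` or a cluster scheduler without changing the logic; the built-in `map` keeps the sketch serial and deterministic.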