All postings/content on this blog are provided "AS IS" with no warranties, and confer no rights. All entries in this blog are my opinion and don't necessarily reflect the opinion of my employer.
In an earlier blog post I talked about "Web as a Platform" (in Web 2.0's context) and briefly described a layered and componentized perspective in looking at the Web platform in general. And I thought it would be more clarifying to illustrate what a Web platform stack might look like, so this post is intended to describe (not define) a stack view of the Web platform.
Plus, the evolution of this stack is the result of collective innovation contributed by many brilliant minds, and not driven by any single entity. Just as Eric Schmidt said, "don't fight the Internet", I also don't think we need to model the Web after a specific prescribed framework. Rather, we can just allow the collective consciousness to continue to innovate organically.
Thus this is a description of the Web platform (not a definition), as it is merely an attempt at categorizing the observed patterns and trends, and their relationships and dependencies, in the Web 2.0 phenomenon, into a structured context. There are many ways to describe and categorize these patterns, so this view is not necessarily exact and accurate, but hopefully it can provide some clarity about the way the Web is evolving.
Architecture of the Web
Below is a high-level rendering of the layered components architecture view of the Web platform stack:
The choice of words used is questionable, but the intention here is to highlight the trends and patterns (and their relationships) and to convey the concepts effectively, without spending the time to make sure the terms are semantically accurate.
Just as I mentioned in my earlier blog post, layers towards the bottom of the stack are progressively closer to raw data and IT architectures, and layers towards the top are closer to people. In general, this layered stack view is used as I think lower layers serve as platforms that encapsulate the underlying complexities and provide abstraction and support to the upper layers. Even though this also kind of describes the evolutionary path (or a maturity model) of the Web in the past few years, I think this stack view is relevant as innovation is still occurring across this entire view.
Some examples to help clarify (nowhere near a comprehensive list; just intended to illustrate the categorizations):
In general, each component in each layer is worthy of a separate detailed analysis, and only a small fraction of the examples in each category have been listed. The intention, though, is to show that no individual site or service is representative of a layer component; it is the network effects created by the collection of sites and services in that category. Similarly, for the layers in the platform stack, it is the aggregation of individual components that really exemplifies the characteristics of that layer.
As a result, we can see that each layer in this stack view has dependencies on the services offered in the underlying layers. And each layer itself provides a level of abstraction and support to the layers above. Thus architecting solutions using the Web as a platform is quickly becoming a process of choosing a target layer (where the solution will reside), and choosing the appropriate combination of support services from the underlying layers.
This view can also be helpful in visualizing general trends of innovative development on the Web, and in identifying potential spots where opportunistic developments can occur, and areas where they have been turned over to systematic developments. For example, the current state (as of this writing) is that mainstream efforts are focused in the "Participation" layer, which is where many of the opportunistic developments are being turned into systematic ones (basically, gaining maturity). The "Interaction" and "Interpretation" layers are considered to be the subject of the next Web (or Web 3.0), and are where much of the research & development effort is focused.
One fundamental aspect is that the Web platform is created and maintained by the collective wisdom contributing to it. It is too big and too diverse for any one entity to own, even though many organizations have investments in multiple areas, some more than others. But it is interesting to see how this view of the Web platform is taking shape, based on the inter-dependencies and (almost "symbiotic") relationships established between the clusters of sites and organizations operating on the Web.
The general concept here is that Web 2.0 applications are taking on a new form. They are composite applications in nature, and increasingly can be created and hosted completely on the Web (cloud), without any dedicated on-premise infrastructure. And they are increasingly being implemented at higher levels of abstraction (moving up the stack).
For established enterprises, this marks a shift in Web application models: from approaches that open up enterprise data silos and provide value-added services to customers ("Applications" layer aspects), to a model where various higher-level components of the Web ("Integration" and "Participation" layer aspects) can be integrated and leveraged to connect to the communities. For emerging businesses, it is now possible to quickly establish an initial online presence by building completely on the cloud-based Web platform, while looking to add differentiating value with a variety of options (such as dedicated on-premise solutions).
From a user participation perspective, lower layers are progressively closer to people with higher technical expertise, but are populated by smaller communities. On the other hand, upper layers are progressively closer to larger communities as barriers to entry, from a technology perspective, are increasingly lower. This aspect demonstrates the power of network effects in enabling the participation age, and fueling the explosive pace of innovation towards creating a Web that connects/involves more people and is more relevant and intelligent.
Certainly, areas where boundaries are being pushed may still sound like science fiction, and it's fun to imagine that new breakthroughs will bring about sea changes that will overthrow all conventional wisdom. The blogosphere already has tons of speculation in that respect. I believe, though, that "could" does not equate to "should": change for change's sake will not add value; only changes that lead to better outcomes will gain adoption. Thus my assessment is that significant changes are surely imminent, but conventional wisdom will not fade into complete irrelevance either. Eventually, when the pendulum settles, we usually see a hybrid world, with some changes more dominant and some less. The Web is a place where rapid changes are occurring, and as architects and strategists, taking a pragmatic viewpoint when facing these changes may help us better plan the migration path between current and future states.
This post is part of a series:
I wanted to take the opportunity to talk about cloud-optimized architecture as an implementation model, as opposed to the popular perception of cloud computing as a deployment model. This is because, while cloud platforms like Windows Azure can run a variety of workloads, including many legacy/existing on-premises software and application migration scenarios that can run on Windows Server, I think Windows Azure’s platform-as-a-service model offers a few additional distinct technical advantages when we design an architecture that is optimized (or targeted) for the cloud platform.
Cloud platforms differ from hosting providers
First off, the major cloud platforms (regardless of how we classify them as IaaS or PaaS), at the time of this writing, impose certain limitations or constraints in the environment, which make them different from existing on-premises server environments (saving the public/private cloud debate for another time), and different from outsourced hosting managed service providers. Just to cite a few (according to my own understanding, at the time of this writing):
Again, this is just based on my understanding, and I’m really not trying to paint a “who’s better or worse” comparative perspective. The point is, these so-called “differences” exist because of many architectural and technical decisions and trade-offs made to provide the abstractions from the underlying infrastructure. For example, the list above is representative of the most common infrastructure approach of using homogeneous, commodity hardware and achieving performance through scale-out of the cloud environment (there’s another camp of vendors advocating big-machine and scale-up architectures that are more similar to existing on-premises workloads). Also, the list above may seem unfair to Google App Engine, but on the flip side of those constraints, App Engine is an environment that forces us to adopt distributed computing best practices and develop more efficient applications, which then operate in a highly abstracted cloud and benefit from automatic scalability, without our having to be concerned at all with the underlying infrastructure. Most importantly, the intention is to highlight that there are a few common themes across the list above – stateless application model, abstraction from infrastructure, etc.
Furthermore, if we take a cloud computing perspective, instead of trying to apply the traditional on-premises architecture principles, then these are not really “limitations”, but more like “requirements” for the new cloud computing development paradigm. That is, if we approach cloud computing not from a how to run or deploy a 3rd party/open-source/packaged or custom-written software perspective, but from a how to develop against the cloud platform perspective, then we may find more feasible and effective uses of cloud platforms than traditional software migration scenarios.
Windows Azure as an “application platform”
Fundamentally, this is about looking at Windows Azure as a cloud platform in its entirety, not just as a hosting environment for Windows Server workloads (which works too, but the focus of this article is on the cloud-optimized architecture side of things). In fact, Windows Azure got its name because it is something a little different from Windows Server (at the time of this writing). Technically, even though the Windows Azure guest VM OS is still Windows Server 2008 R2 Enterprise today, the application environment isn’t exactly the same as having your own Windows Server instances (even with the new VM Role option). It is more about leveraging the entire Windows Azure platform, as opposed to building solely on top of the Windows Server platform.
For example, below is my own interpretation of the platform capabilities baked into Windows Azure platform, which includes SQL Azure and Windows Azure AppFabric also as first-class citizens of the Windows Azure platform; not just Windows Azure.
I prefer using this view because I think there is value in looking at the Windows Azure platform holistically. Instead of thinking first about the compute (or hosting) capabilities in Windows Azure (where most people tend to focus), it’s actually more effective/feasible to think first from a data and storage perspective. Ultimately, code and applications mostly follow data and storage.
For one thing, the data and storage features in the Windows Azure platform are also a little different from having our own on-premises SQL Server or file storage systems (whether distributed or local to Windows Server file systems). The Windows Azure Storage services (Table, Blob, Queue, Drive, CDN, etc.) are highly distributed applications themselves that provide near-infinitely-scalable storage that works transparently across an entire data center. Applications just use the storage services, without needing to worry about their technical implementation and upkeep. By contrast, with traditional outsourced hosting providers that don’t yet have their own distributed application storage systems, we’d still have to figure out how to implement and deploy a highly scalable and reliable storage system when deploying our software. But of course, the Windows Azure Storage services require us to use new programming interfaces and models (primarily REST-based APIs), and thus the difference from existing on-premises Windows Server environments.
SQL Azure, similarly, is not just a plethora of hosted SQL Server instances dedicated to customers/applications. SQL Azure is actually a multi-tenant environment where each SQL Server instance can be shared among multiple databases/clients, and for reliability and data integrity purposes, each database has 3 replicas on different nodes and has an intricate data replication strategy implemented. The Inside SQL Azure article is a very interesting read for anyone who wants to dig into more details in this area.
Besides, in most cases, a piece of software that runs in the cloud needs to interact with data (SQL or no-SQL) and/or storage in some manner. And because data and storage options in the Windows Azure platform are a little different from their seeming counterparts in on-premises architectures, applications often require some changes as well (in addition to the differences in Windows Azure alone). However, if we look at these differences simply as requirements (what we have) in the cloud environment, instead of constraints/limits (what we don’t have) compared to on-premises environments, then it will take us down the path to build cloud-optimized applications, even though it might rule out a few application scenarios as well. And the benefit is that, by leveraging the platform components as they are, we don’t have to invest in the engineering efforts to architect, build, and deploy highly reliable and scalable data management and storage systems (e.g., build and maintain our own deployments of Cassandra, MongoDB, CouchDB, MySQL, memcached, etc.) to support applications; we can just use them as native services in the platform.
The platform approach allows us to focus our efforts on designing and developing the application to meet business requirements and improve user experience, by abstracting away the technical infrastructure for data and storage services (and many other interesting ones in AppFabric such as Service Bus and Access Control), and system-level administration and management requirements. Plus, this approach aligns better with the primary benefits of cloud computing – agility and simplified development (less cost as a result).
Smaller pieces, loosely coupled
Building for the cloud platform means designing for cloud-optimized architectures. And because the cloud platforms are a little different from traditional on-premises server platforms, this results in a new development paradigm. I previously touched on this topic with my presentation at JavaOne 2010, then later on at Cloud Computing Expo 2010 Santa Clara; just adding some more thoughts here. To clarify, this approach is more relevant to the current class of “public cloud” platform providers such as the ones identified earlier in this article, as they all employ homogeneous, commodity servers, with one of the goals being to greatly simplify and automate deployment, scaling, and management tasks.
Fundamentally, cloud-optimized architecture is one that favors smaller and loosely coupled components in a highly distributed systems environment, more than the traditional monolithic, accomplish-more-within-the-same-memory-or-process-or-transaction-space application approach. This is not just because, from a cost perspective, running 1000 hours’ worth of processing in one VM costs roughly the same as running one hour each in 1000 VMs in cloud platforms (although the cost differential is far greater between 1 server and 1000 servers in an on-premises environment). It is also because, at a similar cost, that unit of work can be accomplished in approximately one hour (in parallel), as opposed to ~1000 hours (sequentially). In addition, the resulting “smaller pieces, loosely coupled” architecture can scale more effectively and seamlessly than a traditional scale-up architecture (and usually costs less too). Thus, there are some distinct benefits we can gain by architecting a solution for the cloud (lots of small units of work running on thousands of servers), as opposed to trying to do the same thing we do in on-premises environments (fewer, larger transactions running on a few large servers in HA configurations).
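As a rough back-of-the-envelope sketch of the cost and time comparison above, the snippet below contrasts one VM working sequentially with 1000 VMs working in parallel. The hourly rate is a made-up number, and the model assumes the work divides perfectly with no coordination overhead and that billing is purely per VM-hour; real workloads and pricing will differ.

```python
def cost_and_elapsed(total_work_hours, vm_count, hourly_rate):
    """Return (cost, elapsed_hours) for splitting work evenly across VMs.

    Assumes perfectly divisible work and per-VM-hour billing;
    both are simplifications for illustration only.
    """
    elapsed_hours = total_work_hours / vm_count
    cost = elapsed_hours * vm_count * hourly_rate
    return cost, elapsed_hours


HYPOTHETICAL_RATE = 0.12  # assumed price per VM-hour, not a real quote

# 1000 hours of processing on a single VM, one after the other.
sequential = cost_and_elapsed(1000, 1, HYPOTHETICAL_RATE)

# The same 1000 hours of processing spread over 1000 VMs.
parallel = cost_and_elapsed(1000, 1000, HYPOTHETICAL_RATE)
```

Under these assumptions the two costs come out the same, while the elapsed time drops from 1000 hours to one hour.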
I like using the LEGO analogy below. From this perspective, the “small pieces, loosely coupled” fundamental design principle is sort of like building LEGO sets. To build bigger sets (from a scaling perspective), with LEGO we’d simply use more of the same pieces, as opposed to trying to use bigger pieces. And of course, the same pieces can allow us to scale down the solution as well (and not having to glue LEGO pieces together means they’re loosely coupled).
But this architecture also has some distinct impacts to the way we develop applications. For example, a set of distributed computing best practices emerge:
Asynchronous, event-driven design – This approach advocates off-loading as much work from user requests as possible. For example, many applications simply validate and store the incoming data, record it as an occurrence of an event, and return immediately. In essence it’s about divvying up the work that makes up one unit of work in a traditional monolithic architecture, as much as possible, so that each component only accomplishes what is minimally and logically required. The rest of the end-to-end business tasks and processes can then be off-loaded to other threads, which in cloud platforms can be distributed processes that run on other servers. This results in a more even distribution of load and better utilization of system resources (plus improved perceived performance from a user’s perspective), thus enabling simpler scale-out scenarios, as additional processing nodes and instances can simply be added to (or removed from) the overall architecture without any complicated management overhead. This is nothing new, of course; many applications that leverage Web-oriented architectures (WOA), such as Facebook, Twitter, etc., have applied this pattern for a long time in practice. Lastly, of course, this also aligns well with the common stateless “requirement” in the current class of cloud platforms.
Parallelization – Once the architecture is running in smaller and loosely coupled pieces, we can leverage parallelization of processes to further improve the performance and throughput of the resulting system architecture. Again, this wasn’t so prevalent in traditional on-premises environments because creating 1000 additional threads on the same physical server doesn’t give us much of a performance boost when it is already bearing a lot of traffic (even on really big machines). But in cloud platforms, this can mean running the processes on 1000 additional servers, and for some processes this would result in very significant differences. Google’s Web search infrastructure is a great example of this pattern; it has been publicized that each search query gets parallelized to the degree of ~500 distributed processes, then the individual results get pieced together by the search rank algorithms and presented to the user. Of course, this also aligns with the de-normalized data “requirement” in the current class of cloud platforms, as well as with SQL Azure’s implementation, which resulted in some sizing constraints and the consequent best practice of partitioning databases: parallelized processes can map to database shards, which avoids significantly increasing the concurrency levels on individual databases (something that can still degrade overall performance).
Idempotent operations – Now that we can run in a distributed but stateless environment, we need to make sure that the same process getting routed to multiple servers doesn’t result in multiple logical transactions or business state changes. There are processes that can tolerate, or even prefer, duplicate transactions, such as ad clicks; but there are also processes where multiple requests must not be handled as separate transactions. The stateless (and round-robin load-balancing in Windows Azure) nature of cloud platforms requires us to put more thought into scenarios such as when a user manages to send multiple submits from a shopping cart, as these requests would get routed to different servers (as opposed to stateful architectures where they’d get routed back to the same server with sticky sessions) and each server wouldn’t know about the existence of the process on the other server(s). There is no easy way around this, as the application ultimately needs to know how to handle conflicts due to concurrency. The most common approach is to implement some sort of transaction ID that uniquely identifies the unit of work (as opposed to simply relying on user context), then choose between last-writer wins, first-writer wins, or optimistic locking (though any form of locking would start to reduce the effectiveness of the overall architecture).
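To make the pattern a little more concrete, here is a minimal sketch in Python. It uses an in-process `queue.Queue` and a thread purely as stand-ins for a durable cloud queue and a separate worker instance; the names `handle_request`, `worker`, and `run_demo` are hypothetical and not part of any platform API.

```python
import queue
import threading

events = queue.Queue()   # stand-in for a durable cloud queue service
processed = []           # stand-in for downstream business state


def handle_request(payload):
    """Front end: validate, record the event, and return immediately."""
    if "order_id" not in payload:
        return {"status": "rejected"}
    events.put(payload)
    return {"status": "accepted"}


def worker():
    """Back end: drain events and do the remaining work off the request path."""
    while True:
        event = events.get()
        if event is None:            # sentinel used to stop the worker
            events.task_done()
            break
        processed.append(event["order_id"])
        events.task_done()


def run_demo():
    """Send three valid requests and one invalid one, then stop the worker."""
    thread = threading.Thread(target=worker)
    thread.start()
    responses = [handle_request({"order_id": i}) for i in range(3)]
    responses.append(handle_request({}))
    events.put(None)
    thread.join()
    return responses
```

The request handler never waits for the business processing to finish; adding capacity would mean starting more workers that read from the same queue.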
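A small scatter-gather sketch of this idea follows. Threads from Python's standard `concurrent.futures` module stand in for distributed processes on separate servers, and the partitions are plain lists; `search_partition` and `parallel_search` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(partition, term):
    """Search one partition; in a cloud platform this could run on its own server."""
    return [doc for doc in partition if term in doc]


def parallel_search(partitions, term, max_workers=4):
    """Scatter the query to every partition, then gather and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = pool.map(lambda p: search_partition(p, term), partitions)
        merged = [doc for part in partial_results for doc in part]
    # A real system would rank the merged results; sorting is a placeholder.
    return sorted(merged)
```

Each partition is searched independently, so the merge step is the only place where the partial results meet.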
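The transaction ID approach can be sketched as follows, using first-writer wins. The `OrderStore` class is a hypothetical in-memory stand-in for storage shared by all servers; in a real distributed store the check and the write would need to be a single atomic conditional insert, which this sketch does not attempt.

```python
class OrderStore:
    """In-memory stand-in for shared storage visible to every server."""

    def __init__(self):
        self._orders = {}

    def submit(self, transaction_id, order):
        """Apply the order once per transaction ID (first writer wins).

        Returns (stored_order, applied), where applied is False for duplicates.
        """
        if transaction_id in self._orders:
            # Duplicate submit: return the original result, change nothing.
            return self._orders[transaction_id], False
        self._orders[transaction_id] = order
        return order, True

    def count(self):
        """Number of distinct logical transactions recorded."""
        return len(self._orders)
```

With this in place, a double-clicked submit that lands on two different servers still results in one logical order, provided both requests carry the same transaction ID.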
De-normalized, partitioned data (sharding) – Many people perceive the sizing constraints in SQL Azure (currently at 50GB – also note it’s the DB size and not the actual file size which may contain other related content) as a major limitation in Windows Azure platform. However, if a project’s data can be de-normalized to a certain degree, and partitioned/sharded out, then it may fit well into SQL Azure and benefit from the simplicity, scalability, and reliability of the service. The resulting “smaller” databases actually can promote the use of parallelized processes, perform better (load more distributed than centralized), and improve overall reliability of the architecture (one DB failing is only a part of the overall architecture, for example).
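One simple way to route data to shards is to hash a partition key, sketched below. This is only one of several possible strategies (range-based or lookup-table partitioning are others), and the dictionaries here merely stand in for separate databases; `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Routes each key to one of several small stores (stand-ins for databases)."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)
```

Note that a plain modulo scheme like this reshuffles most keys when the shard count changes, so the number of shards is something to plan for up front.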
Shared nothing architecture – This means a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. With data sharded and maintained across many distributed nodes, the application itself can and should be developed using shared-nothing principles. But of course, many applications need access to shared resources. It is then a matter of deciding whether a particular resource needs to be shared for read or write access, and different strategies can be implemented on top of a shared nothing architecture to facilitate them, but mostly as exceptions to the overall architecture.
Fault-tolerance by redundancy and replication – This is also referred to as “design for failures” by many cloud computing experts. Because of the use of commodity servers in these cloud platform environments, system failures are a common thing (hardware failures occur almost constantly in massive data centers) and we need to make sure we design the application to withstand system failures. Similar to the thoughts around idempotency above, designing for failures basically means allowing requests to be processed again; “try-again” essentially.
Lastly, each of the topic areas above is worthy of an individual article and detailed analysis, and lots of content is available on the Web that provides a lot more insight. The point here is that each of the principles above actually has some relationship with, and dependency on, the others. It is the combination of these principles that contributes to an effective distributed computing and cloud-optimized architecture.
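The “try-again” idea can be sketched as a small retry helper with exponential backoff. `TransientError` is a hypothetical exception type standing in for whatever failures an application considers retryable, and the delays are arbitrary; retrying is only safe when the wrapped operation is idempotent.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    Re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so that the helper can be exercised without actually waiting.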
So far, rich Internet applications (RIA) don’t have to be concerned with data management on the client side, as most connected implementations are designed to be simply a visualization and interaction layer to data and services on the server side. In these cases, data is retrieved from the server as JSON or XML data types (or serialized binary objects for some platforms), then cached in-memory as objects and collections, part of the application, and eventually discarded as part of the application lifecycle.
McObject’s Perst is an open source, object-oriented embedded database system. Perst has been available in Java (SE, ME, and EE), .NET, .NET Compact Framework, and Mono, and recently McObject ported Perst to Silverlight as well. Its small library size (~1MB including support for generics; ~5,000 lines of code) and small in-memory footprint come packed with a very sophisticated set of features, such as:
Perst inherited much of its heritage from McObject’s eXtremeDB embedded database product, which has been used in devices such as MP3 players, WiMAX base stations, digital TV set top boxes, military and aerospace applications, etc. Yes, even embedded applications and firmware can use databases.
Interestingly, the C# code tree was initially produced from the Java version using the Java-to-C# converter in Visual Studio, followed by some additional changes to enable specific C# features. While not all Java applications can be converted this easily, this shows that some can. It also shows how similar C# and Java are.
As an embedded database engine, Perst presents a very viable in-memory data management solution for Silverlight applications. It can be instantiated and used directly as part of the application, instead of running in a separate process and memory space, or over the network on a back-end server as with most RIAs. Perst can also use Silverlight’s isolated storage to manage additional data that can be cached in a persistent manner. This enables many more complex scenarios than simply storing JSON, XML, CSV, or plain text data in isolated storage.
One can argue that the “connected client” design (as described above) works best for RIAs, as they’re rich Web applications intended to enhance the user experience when connected to the cloud to begin with. Essentially, if everyone can assume ubiquitous connectivity anytime and everywhere, then applications only need to live in the cloud, and all we need is a browser.
But we’re also starting to see very valid scenarios emerge to access the cloud outside of the browser. And applications in these scenarios are often more refined/specialized than their HTML-based counterparts. For example, iPhone apps, iTunes, Wii, Adobe AIR, Google Gears, Google Desktop, Apple Dashboard Widgets, Yahoo! Gadgets, Windows Gadgets, Firefox extensions and plug-ins, Internet Explorer 8 WebSlices, SideBars, and ToolBars, etc. These out-of-browser application models are very valid because it has been observed that users tend to spend more time with them than with the websites behind those clients.
There are many potential reasons for this observation, with some more prevalent in some scenarios. For example, local content is more visible to users and easier to access, without having to open a browser, find the website (sometimes needing to re-authenticate), and download the content. And of course, content and data can be stored locally for off-line access.
Now that Silverlight 3 also supports installation of applications locally on the desktop, we can add the option of caching some data locally, in addition to being a locally installed visualization and interaction layer to cloud-based services, or a rich disconnected gadget. And McObject’s Perst embedded database would again be a really compelling solution to help with managing the data.
This is especially important for out-of-browser and locally installed applications, as one of their primary benefits is to support off-line usage. Having a database management solution such as Perst can help increase the overall robustness of the user experience, by ensuring work continuity regardless of connection state.
Founded by embedded database and real-time systems experts, McObject offers proven data management technology that makes applications and devices smarter, more reliable and more cost-effective to develop and maintain. McObject counts among its customers industry leaders such as Chrysler, Maximizer Software, Siemens, Philips, EADS, JVC, Tyco Thermal Controls, F5 Networks, DIRECTV, CA, Motorola and Boeing.
This week I had the opportunity to speak at the IT Architect Regional Conference in San Diego, on the subject of architecting enterprise SOA security. It is an interesting event, with speakers from Microsoft, IBM, Oracle, TIBCO, Fair Isaac, and many other organizations. We even gave away a brand new Xbox 360 and a Zune!
In a nutshell, my presentation was intended to point out the security aspects of planning an enterprise SOA, along with a few topics that don't seem to be covered very often, with an emphasis on the future and on navigating the organizational and cultural issues.
A brief overview -
Basically, some of the fundamental changes in SOA, such as:
Then of course, these changes also bring along many questions, particularly ones that involve conflicting approaches, where each organization may come up with different solutions based on varying trade-offs.
In my opinion, trust-based architectures are much more flexible and scalable, and are implementable with today's technology standards. And we couldn't completely eliminate trust in an impersonation/delegation model anyway. For example, a connected node/system still has to "trust" service wrappers, agents, and/or local system components to verify user credentials against a centralized repository (such as Active Directory, LDAP, etc.).
On the other hand, having end-to-end security contexts is indeed conceptually more secure, as it can better address man-in-the-middle attacks, but in an SOA with a number of intermediaries between consumers and producers, there is still no effective solution for managing public keys to support end-to-end message-level data encryption.
It's always interesting to try to take a peek at what may be possible in the future.
Finally, some overall talking points. One important and interesting point that was kind of new to many people is that security in SOA has to be planned and designed just like another process layer. If we overlook security and do not plan it carefully, we may end up creating tightly coupled elements in the overall architecture, impacting the agility we intended to create.
The most visible example of this is trying to implement message-level encryption for the sake of data integrity (message digests) and confidentiality. In order to establish an end-to-end security context (so that intermediaries, including the ESB, are not able to decrypt sensitive data in transit to the destination), both the intended consumer and producer have to know exactly how to encrypt and decrypt data. And that depends on a previous exchange of public keys, which in this case has to occur directly between the consumer and producer endpoints. That in a way is tight coupling, as the consumer and producer endpoints have to know about each other, and are required to establish a one-to-one, peer-to-peer relationship in terms of the public key exchange used for encryption/decryption. To alleviate the situation, a centralized public key infrastructure can be implemented in an enterprise so that the management and decisions on public key usage can be externalized from endpoints and centralized. However, enterprise solutions in this area are still evolving, and we haven't yet seen effective solutions for doing similar things beyond the enterprise and on the Web.
Lastly, the most important point is that, just like SOA governance, security is also heavily shaped by the organization and its corporate culture. We have to take a process-first approach to the problem (instead of technology-first), then weave in the technology delivery part of it.
For those interested, the entire slide deck I used can be downloaded from my Windows Live SkyDrive. If you don't have Office 2007, you can download the free PowerPoint Viewer 2007.
When we look at authentication and authorization aspects of cloud computing, most discussions today point towards various forms of identity federation and claims-based authentication to facilitate transactions between service end points as well as intermediaries in the cloud. Even though they represent another form of paradigm shift from the self-managed and explicit implementations of user authentication and authorization, they have a much better chance at effectively managing access from the potentially large numbers of online users to an organization's resources.
So that represents using trust-based, identity assertion relationships to connect services in the cloud, but what do we do to authenticate end users to establish their identities? Most user-facing services today still use simple username and password type of knowledge-based authentication, with the exception of some financial institutions which have deployed various forms of secondary authentication (such as site keys, virtual keyboards, shared secret questions, etc.) to make popular phishing attacks a bit more difficult.
But identity theft remains one of the most prevalent issues in the cloud, and signs show that the rate and sophistication of attacks are still on the rise. The much publicized DNS poisoning type of flaws disclosed earlier by Dan Kaminsky at the Black Hat conference (and related posts on C|Net News, InformationWeek, Wired, ZDNet, CIO, InfoWorld, PC World, ChannelWeb, etc.) point out how fragile the cloud still is, from a security perspective, even at the network infrastructure level.
Strong User Authentication
Thus the most effective way to ensure users are adequately authenticated when using browsers to access services in the cloud is to facilitate an additional authentication factor outside of the browser (in addition to username/password). This is essentially multi-factor authentication, but the options available today are rather limited when considering requirements of scalability and usability.
The design and implementation of effective user authentication was the focus of my recently published article, "Strong User Authentication On the Web", as part of the 16th edition of the Architecture Journal. The article discussed a few viable options for implementing "strong" user authentication for end users in the cloud (not limited to multi-factor authentication), and an architectural perspective on many of the capabilities that together form a strong authentication system.
This is just one of the many ways to compose these capabilities together. As we move towards cloud computing, the line between internal security infrastructure and public cloud-based services will continue to blur.
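As one illustration of what an additional factor outside of the browser could look like, the sketch below computes a time-based one-time password in the style of RFC 6238 (built on the HOTP algorithm from RFC 4226), such as a code shown on a separate device. This post does not prescribe that mechanism; it is simply an assumed example, and key provisioning, clock drift, and rate limiting are all left out.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret, at_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238) using 30-second steps by default."""
    now = time.time() if at_time is None else at_time
    return hotp(secret, int(now // step), digits)


def verify(secret, submitted, at_time=None):
    """Check a submitted code against the current time step only."""
    return hmac.compare_digest(totp(secret, at_time), submitted)
```

The server and the user's device share the secret ahead of time, so the code entered alongside the password proves possession of the device rather than knowledge of another secret typed into the browser.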