Class Description: This Cloud Computing course at SQL University explains the Distributed Computing paradigms used by major vendors, and covers information useful to the data professional for implementing proper architecture designs.
Pre-Requisites: Familiarity with general computer programming and data development terminology, and industry experience in at least one of those disciplines
Instructor and Bio: Buck Woody – Bio available at: http://buckwoody.com
Class Detail: In this class we’ll focus on:
· What cloud computing is
· Where it can be used
· How it applies to you and your organization
Each day there will be a lecture, along with homework for the next class session. There will be a comprehensive final exam – it’s contained primarily in your work environment!
Class 3 – Cloud Computing – Objections to Cloud Computing
Welcome to the third class in SQL University on “Cloud” computing. If you haven’t had a chance to take a look at class one yet, you may want to switch to that post and learn about the definition of cloud computing, since I’ll be using those terms today.
In computing’s short history, we’ve moved from a centralized model (mainframes and large datacenter systems) to a distributed one (LANs and WANs). We’ve worked quite hard over the last couple of decades to move away from the mainframe and toward owning the systems and infrastructure where we run our code.
The “cloud”, however, is a kind of return to centralized computing, at least as far as control of the underlying systems is concerned. So it’s natural, especially at the non-architect level of IT, to question parts of the cloud paradigm. Most often the misunderstanding is that an organization should take all of its on-premise infrastructure and code and move it to “the cloud”. This would be a mistake. Distributed Computing systems like Windows Azure and SQL Azure are well suited to certain situations (see class two) and not as well suited to others.
There are, however, some legitimate concerns about moving to a Distributed Computing environment. And there are some equally valid responses. In today’s class I’ll address the three primary objections I’ve heard most often.
“I’ve spent 20+ years moving to direct-connected computing. When I hear ‘Cloud’ I think I need to replace everything I have, a very daunting task mixed into my already busy day.”
The answer to this concern involves two areas: computing history and technical vision. As I mentioned earlier, we have moved away from a centralized model once before, to a distributed LAN environment, and I heard the exact same concern then. Now the industry is moving to a more centralized model using a Distributed Computing paradigm (some computing on-premise, some in IaaS, some in PaaS, some in SaaS), and there is some inertia to overcome before it takes hold.
It’s also important to think about the vision of computing, and its purpose. If the goal is to control hardware and install operating systems, then on-premise only is a good way to go. But that should be a means to an end – the goal should be to enable the organization with technology. After an honest, thorough investigation, perhaps one of the cloud paradigms makes sense for a certain application. In that case, implement that application and move on to the next investigation. Many shops have already done this with things like Payroll or web services. They simply use a SaaS for that.
“I know my own security. I control the process from end-to-end, and my data is within my four walls. If I put my programs on the web I’m faced with the client, network, and vendor’s levels of security. That’s just too big a risk.”
Probably the most frequently asked question (at least directly) is about security in the cloud. And it’s a valid concern. When data leaves your organization, you need to be certain how it will be handled and who will have access to it.
There are several ways to think about this issue. The first is to understand your data, and which parts of it require high levels of security. If a particular datum requires a very high level of security, you can simply use a hybrid approach and not put the data in the cloud at all. Instead, you keep the data on-premise and send only an answer derived from it, such as a decision (“Yes, customer charge is approved”) or a computed value (101.12), back to the calling cloud application. This is possible in Windows Azure using the Application Fabric, and in fact is in use in many locations.
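As a rough illustration of the hybrid approach described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the account table, the names authorize_charge and cloud_checkout, and the credit figures are hypothetical, and a plain function call stands in for whatever connectivity layer actually links the cloud application to the on-premise system. The point is only that the sensitive record stays put and a small answer travels.

```python
from dataclasses import dataclass

# Hypothetical on-premise store: these records never leave this process.
_ON_PREM_ACCOUNTS = {
    "cust-001": {
        "card_number": "4111111111111111",
        "credit_limit": 500.00,
        "balance": 398.88,
    },
}


@dataclass(frozen=True)
class ChargeDecision:
    """The only thing the cloud side ever receives."""
    approved: bool
    remaining_credit: float


def authorize_charge(customer_id: str, amount: float) -> ChargeDecision:
    """Runs on-premise; returns a decision, never the underlying record."""
    account = _ON_PREM_ACCOUNTS.get(customer_id)
    if account is None:
        return ChargeDecision(False, 0.0)
    available = account["credit_limit"] - account["balance"]
    if amount <= 0 or amount > available:
        return ChargeDecision(False, round(available, 2))
    return ChargeDecision(True, round(available - amount, 2))


def cloud_checkout(customer_id: str, amount: float, authorize=authorize_charge) -> str:
    """Stands in for the cloud application; it sees only the decision."""
    decision = authorize(customer_id, amount)
    if decision.approved:
        return "Yes, customer charge is approved"
    return "No, customer charge is declined"
```

In a real deployment the authorize argument would be a remote call into your own datacenter, and that boundary is where the security review should concentrate.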
Another approach is to strongly encrypt the data prior to sending it to a Distributed Computing system. There are code factories that do this, as well as appliances and other hardware devices that allow you to encrypt and decrypt data prior to transmission. Many ATMs use this paradigm.
The final thought here is that security in a Distributed Computing environment has three parts: the facilities and hardware, the operating system and environment where the cloud provider runs, and the code you write (in the case of a PaaS solution). I discuss these areas further in this learning plan: http://blogs.msdn.com/b/buckwoody/archive/2010/12/07/windows-azure-learning-plan-security.aspx
“We’ve worked really hard tuning everything from the hardware to the network to get our programs running this quickly. There’s no way the web can handle that kind of speed.”
This concern is completely true, if the architecture of the application doesn’t change. The basic premise for a Distributed Computing system is that you should always try to co-locate the data payload close to the computing resources. In Windows Azure, you can choose where your data store is, and a wise choice is often to co-locate the code and the data.
Once again, making the selection for the proper application to move to the cloud is essential. If you plan to import and export terabytes of data each day to support a Business Intelligence system, then the latency question becomes a bigger issue. If, however, you can locate all that data in the cloud to begin with, process it there and deliver a much smaller report on a screen, the problem is mitigated. It all goes back to design and application candidates.
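To put rough numbers on the difference between shipping the raw data and shipping the report, here is a small back-of-the-envelope calculation in Python. The figures are assumptions chosen for illustration, not measurements: a 100 megabit-per-second link, 80% usable throughput, decimal units (1 GB = 10^9 bytes), and a 5 MB report.

```python
def transfer_hours(data_gigabytes: float,
                   link_megabits_per_second: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a payload over a network link.

    efficiency is the assumed fraction of the nominal link speed that is
    actually usable after protocol overhead and contention.
    """
    total_bits = data_gigabytes * 8e9  # decimal gigabytes to bits
    usable_bits_per_second = link_megabits_per_second * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Moving 1 TB (1000 GB) of raw data each day over the assumed link:
raw_hours = transfer_hours(1000, 100)

# Delivering a 5 MB (0.005 GB) report built where the data already lives:
report_seconds = transfer_hours(0.005, 100) * 3600

print(f"Raw data: {raw_hours:.1f} hours, report: {report_seconds:.1f} seconds")
```

Under these assumptions the daily terabyte takes longer than a day to move, while the report arrives in about half a second, which is why the choice of application candidate matters more than raw link tuning.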
Even with large sets of data, caching, trickle-feeds and so on can also work in your favor. There is more on this topic specifically addressing SQL Azure here: http://blogs.msdn.com/b/cbiyikoglu/archive/2010/01/05/evaluating-application-performance-and-throughput-in-sql-azure.aspx
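The caching idea mentioned above can be sketched in a few lines of Python. This is a minimal time-to-live cache under stated assumptions, not any particular Azure caching feature: the class name TTLCache, the loader callback, and the 60-second default are all hypothetical. Results fetched from a slow or distant data store are kept locally and reused until they expire.

```python
import time


class TTLCache:
    """Keeps loaded values for ttl_seconds so repeat reads skip the slow call."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # function that performs the slow lookup
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no round trip needed
        value = self._loader(key)  # missing or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade-off is staleness: a longer lifetime saves more round trips but lets readers see older data, so the right value depends on how quickly the underlying data changes.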
Along with the references shown above, check out “The Case Against Cloud Computing” at CIO Magazine: http://www.cio.com/article/477473/The_Case_Against_Cloud_Computing_Part_One