So I was talking to an old friend yesterday about building software, and we both came to the conclusion that there are some things you need to learn:

  • From books
  • From others
  • From doing

We then got onto the topic of cloud computing. He works in the same space, so it was a great context for our conversation.

Taking the thinking further, we both realized that when we decomposed our jobs into buckets, we arrived at:

  • Working with customers – This is the most important element, and the only way to sharpen your skills is to engage real customers. Books are pretty much useless in this area, and while you can learn some tricks from others, nothing beats logging hours in the field.
  • Core engineering – This is bread and butter stuff, and doesn’t really change depending on what you’re working on. It definitely gets ticks in all three categories from above.
  • Working in a team – Again, this doesn’t change all that much. Working with your own team and other teams is not necessarily specific to cloud. In terms of the list above, you could learn a little about it from books, but nothing compares to learning from others and from doing.

We then got to the parts that are unique to working in the cloud. This ended up being more about trading lessons learned, and while it definitely builds on the core concepts above, there are some aspects I’ve never encountered in any other type of project.

  1. Think big! Very few projects work at the scale of cloud computing projects. When you take into account the sheer size of the data center deployments (the hardware, facilities, and connectivity), nothing can prepare you for the types of problems you will need to solve in terms of core engineering and software development. How do you deploy, manage, test, recover, fail over, charge for, and audit a massive, geographically dispersed, mega-distributed system? You don’t only have to build new software; you need to build new ways to test, deploy, manage and support it.
  2. Shipping! The cloud world is not like the packaged product world; it’s not even like the online services world. Not only do you need to think in small bite-sized chunks to reduce the integration and dependency workload and risk, but you also need to plan for how you get each chunk out to thousands of live servers running live customer apps, in a completely passive and non-intrusive/non-disruptive way. You also have the hardware that comes with cloud, so not only are you shipping software, you’re rolling out hardware, and lots of it, to places you never knew about before. I liken it to landing jumbo jets on small islands.
  3. Simplicity! If you can’t describe to a layperson how you plan to define, develop, test, deploy, test, deploy, test, deploy again, and finally test once more, troubleshoot, maintain, update, and finally decommission the feature you want to build, then think harder or don’t do it. When you’re working in an environment where every bug is a potential go-home bug (the kind of bug that wipes out every user’s data, or something else so catastrophic that you can never recover, so you just go home and find a new job), you need to raise the bar on all aspects, and complexity is usually the root of all evil.
  4. Test! I have a saying that I only started using when I started working in the cloud: “Give me a tester before you give me a dev”. The reality of the cloud is that you cannot and will not ever test enough; the best you can do is test lots. And the types of tests are so diverse: a one-box environment for the developer, check-in tests for unit completeness, integration tests for fit and flow, perf/scale/load tests, environment tests for datacenter compatibility, regression testing, and pen testing both at the edge and at the platform boundaries. The key takeaway is that quality matters so much more in the cloud because the potential impact of a bug is much larger and more unpredictable than in most other environments. You also want to build self-testing into your code. You want your code/feature to be cognizant of where it is, so it can ask questions of itself and react accordingly. Think of it like the safety switch on a power point: make sure your code doesn’t need to touch the iron to know it’s hot!
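
To make the self-testing idea from point 4 a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Environment` fields, the environment names ("onebox", "production"), the thresholds, and the `self_check` / `run_feature` helpers are all assumptions, not something from a real platform. The point is only that the feature asks questions about where it is before it acts, and backs off with a reason if the answers look wrong.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Environment:
    # Hypothetical description of where the code is running.
    name: str                  # e.g. "onebox" or "production" (assumed labels)
    healthy_replicas: int      # replicas currently reporting healthy
    free_disk_fraction: float  # 0.0 (full) to 1.0 (empty)


def self_check(env: Environment,
               min_replicas: int = 2,
               min_free_disk: float = 0.10) -> List[str]:
    """Return the reasons the feature should NOT run here (empty list = go)."""
    problems = []
    # Only production needs redundancy before we touch anything (assumed policy).
    if env.name == "production" and env.healthy_replicas < min_replicas:
        problems.append(
            f"only {env.healthy_replicas} healthy replica(s), need {min_replicas}"
        )
    if env.free_disk_fraction < min_free_disk:
        problems.append("not enough free disk to proceed safely")
    return problems


def run_feature(env: Environment, action: Callable[[], object]) -> Tuple[str, object]:
    """Run the action only if the self-check passes; otherwise report why not."""
    problems = self_check(env)
    if problems:
        return ("skipped", problems)
    return ("ran", action())


if __name__ == "__main__":
    prod = Environment(name="production", healthy_replicas=1, free_disk_fraction=0.5)
    print(run_feature(prod, lambda: "data migrated"))
```

In this sketch the risky action never runs on the under-replicated production environment; the caller gets back "skipped" and the reason instead. A real system would have far richer checks, but the shape is the same: know it’s hot without touching the iron.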

Anyhoo, just some thoughts from my time as an apprentice of the cloud.

Enjoy :)
