Tell naïve developers that their team is embracing DevOps, and panic will fill their eyes. Their hearts will race, their muscles will tense, and their résumés will suddenly get refreshed. DevOps is the bogeyman to unfamiliar developers. The thought of being on call 24 hours a day, 365.25 days a year, to support the crappy code they wrote is enough to give developers the shakes.
I’ve been there. When my team first switched to DevOps, I was filled with visions of my son’s championship baseball game being interrupted, of being woken up at all hours of the night, and of sacrificing all the great food, gaming, sports, hobbies, and social events that life has to offer. I was terrified. I was also an ignorant imbecile.
Like a sappy movie bogeyman, DevOps actually turns out to be your best friend. Sure, at first it’s scary and mysterious. Then it’s obnoxious and messes with your friends. But in the end, you figure each other out, come to terms, and become best buddies. Soon, you wonder how you ever worked any other way. Think I’m delusional? I think you’re an ignorant imbecile. Read on, and decide for yourself.
On a DevOps team, development team members work directly with operations. When customer-impacting failures occur that operations can’t resolve, the development team is responsible for fixing its code, day or night, every day of the year. That’s right, developers like you and me.
The first people to see customer-impacting failures are typically tier 1 operations. These folks generally work in shifts, watching for alerts of failures, entering them into a tracking system, and escalating them to tier 2 operations if the failures appear to be persistent and serious.
For each tracked issue, tier 2 operations follow the troubleshooting guide written by the development team. The guide provides step-by-step instructions to identify and resolve common, correctable failures (everything from “try rebooting” to “examine the expiration date on the certificate and, if it’s expired, follow the corresponding procedure”). If the troubleshooting guide doesn’t identify and resolve the issue, tier 2 escalates the issue to tier 3. That’s you (and the folks responsible for dragging you out of bed).
If the alerting system is smart enough to ignore transient failures and spot patterns, then tiers 1 and 2 can be combined. If the alerting system also has automated recovery, then tier 2 doesn’t require as many folks. If your code is robust and your troubleshooting guide is simple and comprehensive, then very few issues make it to tier 3, which means you get to sleep.
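The transient-failure filtering described above can be sketched in a few lines. This is a hypothetical alerter, not any particular monitoring product: it escalates only when the same failure fires repeatedly within a time window, and treats isolated blips as transient (all names and thresholds are illustrative):

```python
import time
from collections import deque

class TransientFilter:
    """Escalate an alert only if the same failure fires `threshold`
    times within `window` seconds; isolated blips are ignored."""

    def __init__(self, threshold=3, window=300):
        self.threshold = threshold
        self.window = window
        self.events = {}  # failure key -> deque of timestamps

    def record(self, key, now=None):
        now = time.time() if now is None else now
        q = self.events.setdefault(key, deque())
        q.append(now)
        # Drop events that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        # Persistent failure: worth escalating to a human.
        return len(q) >= self.threshold
```

With a filter like this in front of the ticket queue, a single dropped connection never wakes anyone, but three database timeouts in five minutes do — which is exactly the judgment call tier 1 was making by hand.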
Shortly after your team switches to DevOps, you realize that service issues are constantly randomizing your team. (I’m assuming a typical dev team of 5 to 50 people.) A little soul searching leads to the following insights:
Notice that DevOps may mess up your life and personal relationships initially, but only because it forces you to do work you should have done in the first place—tough love.
If management expectations of speed and shortcuts kept you from doing the right things before, DevOps puts that shortsightedness up against the harsh reality of unhappy customers. There’s no shortcut to quality. Once the old technical debt is paid off, life returns to normal, only your development team is now working the right way.
I write more about paying off technical debt in “Debt and investment.”
Some teams stay with the weekly on-call schedule for years, but sophisticated teams take their insights a level deeper.
Again, notice that DevOps drives the development team to behave the way it should—writing great code with high availability, comprehensive instrumentation and troubleshooting, and reliable monitoring. Life is good for your team and your customers!
Instrumentation and troubleshooting instructions go together hand in glove. The instrumentation provides the error codes and context that operations uses to search the troubleshooting guide. Inadequate instrumentation makes the guide useless. An unclear guide leaves the issue unresolved, and it’s soon escalated to you, the developer.
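In practice, that linkage means every error your service logs should carry a stable code plus enough context to act on. Here’s a minimal sketch using Python’s standard logging module — the error codes, fields, and host name are all invented for illustration, not from any real guide:

```python
import json
import logging

logger = logging.getLogger("myservice")

def emit(error_code, message, **context):
    """Log a structured record whose error_code keys directly into
    the troubleshooting guide; returns the record for inspection."""
    record = {"error_code": error_code, "message": message, **context}
    logger.error(json.dumps(record))
    return record

# A certificate failure tier 2 can resolve by looking up the
# guide entry for CERT_EXPIRED -- no developer required.
emit("CERT_EXPIRED",
     "TLS certificate validation failed",
     host="payments.example.com",
     expired_on="2016-01-31")
```

The stable `error_code` is the search key; the context fields tell operations which machine and which certificate, so the guide’s steps can be followed without guessing.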
If your team doesn’t ship services, you might think DevOps is someone else’s problem. Think again. Not only will everyone soon be moving to a services model (even apps), but making development responsible for the quality of the main branch is completely analogous to DevOps.
Reverse integrating your team’s branch to the main branch is just like releasing an app or service to production. The main branch build runs at night, just like production. If your changes break the build or product functionality, you make hordes of people angry, just like a service break. The way to determine what’s wrong quickly is through great instrumentation. The way to alert yourself and your fellow team members to issues early is through build monitoring, more commonly known as unit and acceptance tests. Owning the quality of the main branch is a great introduction to DevOps.
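That “build monitoring” is nothing exotic — it’s ordinary tests gating every merge to main. A toy sketch, with the function under test invented purely for illustration:

```python
def apply_discount(price, percent):
    """Toy production code under test (invented for illustration)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # These assertions are the "alerts": if a change to main breaks
    # the behavior, the build fails and the team knows immediately.
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99
    assert apply_discount(200.0, 25) == 150.0

test_apply_discount()
```

Run on every reverse integration, tests like these play the same role tier 1 monitoring plays in production: they catch the break before it reaches anyone else.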
We’re all headed toward a DevOps world, and it’s a wonderful place to be.
In a DevOps world, no one gets away with writing crappy, untested, poorly instrumented code. Lazy developers who take shortcuts must constantly stay up late at night, suffering the consequences until they wise up.
In a DevOps world, strong developers enjoy their free time with family and friends. They know issues with their code will be rare, because their code is well-tested and designed to be resilient to failure. In the unusual event that a problem does arise, they know the time to fix it will be short due to their comprehensive instrumentation and troubleshooting instructions (automated or otherwise).
In a DevOps world, customers are delighted with high-quality software that rarely fails—not because the engineers got any smarter, but because they finally started working the right way.
It’s time to do what we should have been doing all along—writing great code, great instrumentation, and great tests. It’s time to embrace DevOps.