-
It's the end of the fiscal year. Most engineers associate this time with performance review season, but for principal-level engineers and higher it's also executive review season. Time to waste weeks of your life writing slides for executive presentations that will be rewritten five times before they are never presented.
Executive reviews aren't a waste of time—occasionally you need an experienced, authoritative voice to blow apart your assumptions and refocus your efforts on desired business results. Preparing isn't a waste of time—being forced to explain yourself to others always helps your thinking, and I've got no desire to look like an idiot in front of the person who approves my compensation. The real waste of time is focusing on getting the right slides for every season and situation instead of getting the right strategy.
Smart, high-level people simply don't know how to cope with executive reviews. They think it's a time to show off instead of a time to listen. They respond inappropriately to executive criticism of their badly presented, unsuitable slides. I've done it too—it's a trap set by our superiors filling out the poor templates dictated by their superiors. It's the misinformed leading the uninformed. Well, now it's time to break that cycle, avoid the pitfalls, and focus on what matters—valuable feedback on your clear and concise plans.
Wisdom to know the difference
Why do so many otherwise intelligent people bungle executive reviews? I believe there are two reasons—exuberance and confusion.
§ Exuberance over the importance of the moment. It's so important that we must cover every detail rather than focus on what's important. Twisted, huh? Who defines what's important? The executive. What's important to the executive? Ask. Press for clarity. Don't accept second-hand smoke from an assistant. It's not good for your health.
§ Confusion between executive reviews and presentations. Both use slide decks. Both involve presenting. The difference is that in presentations the presenter is in control. In executive reviews, executives and their posses are in control. Failure to recognize this difference leads to failure in your review.
The secret of my success
How do you handle executive reviews successfully, obtaining all their potential benefits? Here's a three-step guide:
1. Learn what is important to your executive.
2. Present that information in three slides or fewer.
3. Respond to questions with insight and to feedback with thanks.
That's it. Let's break it down.
A riddle, wrapped in a mystery, inside an enigma
You first must learn what's important to your executive, from both an informational and a philosophical perspective.
From an informational perspective, what does your executive want to know about your project? Alignment with other projects? Financial contribution? Market share impact? Value proposition? Competitive response? You want to learn how your plans are being evaluated.
From a philosophical perspective, what principles are most important to your executive? Transparency? Alignment? Loyalty? Integrity? Self-confidence? You want to learn how you are being evaluated.
How do you learn your executive's informational and philosophical perspectives? Ask peers who've already been through a review with your executive. Ask your boss and your skip-level boss. You can even get thirty minutes on your executive's calendar. Be sure not to rely on any one individual; instead, look for patterns in feedback from multiple sources.
Easy as 1, 2, 3
Now you are ready to create the slide deck. You need only three slides—the current situation, the desired situation, and the tactics to get from the first to the second. That's all you should have. Remember, you are not in control of the review—the executive is. All you can hope to do is frame the discussion. All the other slides can either be cut or left as appendix slides for reference.
The current situation slide may be a problem statement, a current scorecard, or a recap of progress to date. The proper context and information depend on what's important to your executive, which you learned earlier.
The desired situation slide may be a solution, a target scorecard, or a going-forward strategy. It should align with your first slide and resolve the issues it raised.
The tactics slide may be a timeline, bullet list, or table of next steps, typically also indicating risks and mitigations with associated asks. Careful what you say on this slide. It tends to convert directly to commitments.
Eric Aside
The tactics slide seems like it covers a lot of information. Of course, all three slides do. You need to understand how your executive likes information presented.
· Does she like eye charts with all the information on the slide?
· Does she want summary slides and expects you to have all the details in your head?
· Does she prefer detail slides in the appendix that she can reference?
I read the instructions
Many executive reviews require you to follow a template with predetermined slides for you to fill out. Having a template is terrific—it adds consistency and clearly sets expectations. Unfortunately, most templates are hideous, with four or five times the number of slides needed. The templates are either ancient or produced by assistants.
How do you handle a fifteen-slide template? Pick out the three crucial slides—the ones that describe the current situation, the desired situation, and the tactics to get from the first to the second. Get those three slides right—then ensure the other slides align consistently with your crucial slides. This keeps you on message no matter where the executive takes the conversation. If possible, skip past the other slides and stick to the crucial three.
The purpose of the review is to get valuable feedback on your clear and concise plans. By aligning your slides and staying focused, you can frame the discussion and get the input you need.
Eric Aside
A handy tip is to send out a pre-read three days before the review. Write a one-page document that addresses the executive's key questions or provides context that attendees should know before the meeting.
The pre-read helps the conversation stay focused. It also gets the executive in the right frame of mind and provides the opportunity to send clarifying questions to you in advance so you can better target your presentation.
Oh behave!
The last step in my three-step guide to executive reviews is about how to behave. First and foremost, do not be intimidated or enchanted by the executive. Executives used to have your job and they miss it (just ask if you don't believe me). They still use the bathroom. They still embarrass their kids. Do yourself a favor and get over yourself. Act like you've done this before.
The executive will typically take two actions during your review—ask questions and make comments. Most of the time you want to take notes and keep your mouth firmly shut. Use duct tape if you must. As the saying often attributed to Abraham Lincoln goes, "It is better to remain silent and be thought a fool than to open one's mouth and remove all doubt."
When should you say something? When you have something insightful and relevant to say. "I agree" doesn't cut it. "We're doing that" doesn't cut it. "When we fixed that issue our support calls dropped 67%" might be worth mentioning if it's relevant. When you open your mouth, be sure you are adding value, staying respectful, and being concise. "Yes," "No," and "I don't know" are often sufficient. The rest of the time you should listen.
Eric Aside
If you need help being succinct, tell a friend at the review to give you a signal when you need to stop talking—perhaps a flashing neon sign.
As I mentioned in "I'm listening
Remember, executive reviews aren't the time to show off. They are the time to receive valuable feedback on your plans. The time to show off is when you deliver on those plans.
How'd I do?
When the review is finally over, you'll likely feel awful. The executive asked many tough questions and made a bunch of pointed comments. Was it a disaster? How can you tell?
Luckily, it's easy to tell how the review went. The key is the level of detail the executive discussed, not the number of positive or negative comments. If the executive's questions and comments were all general and high level, then the review did not go well. The executive was questioning your basic strategy and assumptions. If the questions and comments were in the details, then the review went quite well. The executive agreed with your strategy and approach and was giving you feedback on small pieces.
Either way, executive reviews aren't about giving you an ego boost. They are about getting valuable feedback on your plans. Learn the information and principles the executive cares about, present your plans as simply as possible within that context, and be a professional during the review, and you'll get all the feedback you need to succeed.
Eric Aside
Are executive reviews really necessary to operate a business? At times they seem to act as a crutch for a lack of organization and aligned planning. When an executive staff's plans are aligned and the organization is built to match the plan, then all the executive should need to run the business are regular staff meetings and coordination meetings with executive peers. It also helps if the executive is accessible to everyone in the organization through blogs, one-on-ones, events, and casual encounters. A number of Microsoft organizations are now run this way.
-
The annual engineering awards are being given out this week at the Microsoft Engineering Forum. Annual reviews will soon follow. These are great opportunities to recognize impactful work. It's too bad most managers are tragically ignorant of how to recognize their employees or truly why they should.
If you are a manager, you're probably relishing this opportunity to heckle all those bad managers. Guess what? I'm talking about you. You don't know how to use recognition properly. You don't know why you should. "But I'm great at recognizing my employees," cries a clueless manager. "I'm always congratulating my team—I even shaved my head once." Let's call this manager, "Chaos."
Chaos thinks recognition is all about morale and motivation. Certainly, there are real benefits for Chaos being a team cheerleader. But if that's the depth of his use of recognition, then chaos is what he'll get.
Everybody wants results
What Chaos fails to recognize is that recognition is a form of reinforcement. Reinforcement drives behavior. Behavior drives results. Results are king at Microsoft. Good recognition focuses on reinforcing desired results (and correcting undesirable ones).
If you aren't thoughtful about the results you seek, recognition can easily drive detrimental behavior. For example, Chaos shaved his head when his team met a tough milestone. To meet that milestone, team members cut corners on quality and deferred fixing structural issues. Those issues increased the "bug debt," prolonged stabilization, and reduced release quality. More importantly, Chaos' team members learned that Chaos rewards cutting corners and doesn't respect quality.
Eric Aside
Cutting corners for rapid development is desirable for prototypes and new ideas, where learning about the problem space is the result you seek.
The end may justify the means
"Yeah, but we hit the milestone. We built team unity. That means something!" claims Chaos. Do the ends justify the means? No, not with your narrow view of the ends.
The ends for Chaos were a united team hitting a tough date. However, other ends were achieved—increased bug debt. If the ends had been defined as a united team hitting a tough date with zero bug debt, then the means could have been left to the team.
Chaos is getting cynical. He says, "So if the team robbed a bank to get money to pay a vendor to reduce bug debt then that would be okay?" No, it wouldn't. Chaos has introduced another end—jail time. Define the ends as a united team legally and responsibly hitting a tough date with zero bug debt, and that problem is avoided.
It's not easy to carefully define and communicate the ends you seek. It's much simpler to follow Chaos and reward haphazard results, ignoring the unintended consequences. However, setting clear expectations and recognizing your truly desired results pays huge dividends:
§ You micromanage less.
§ Your team innovates more.
§ You get the results you desire.
§ You have more time to focus on developing and recognizing the great work of your team.
The time has come to act and act quickly
Do you wait till the ends are achieved before recognizing them? Recognition is most effective when given immediately after achieving a goal. But you also want to recognize all the intermediate results that led to the end result.
For example, on the way to your united team legally and responsibly hitting a tough date with zero bug debt, your lead ran a great design review. You should immediately recognize her effort saying, "That was a terrific design review today. I love how you listened to everyone's opinion without critique. I also liked how you summarized that feedback and improved the design. Correcting those errors and misunderstandings early will save us significant time and lead us toward hitting our date with zero bug debt."
Notice the keys to great recognition:
§ It's immediate—the same day, or even better, the same minute.
§ It describes precisely what you liked—as opposed to the generic "nice job."
§ It's tied to the desired end result—reinforcing the real value of the effort.
You may not have the time or opportunity to recognize every positive step toward your end goal. However, you should keep your eyes open and address as many small victories as you can. Your team will love it, they'll better appreciate your expectations, and they'll be better aligned toward your desired result.
Let us celebrate
Chaos wonders, "Day-to-day recognition is great, but doesn't it detract from the big celebration at the end? Was I wrong to shave my head?" Chaos' grooming habits aside, a celebration marking a major achievement is just as important as day-to-day recognition. However, because it's a culmination of a long-term effort it's not as simple as sharing a few comments or shaving your head.
First and foremost, recognition should be for a carefully defined desired result, not something arbitrary like "great effort" or "someone said something nice." For example, don't give out an award for "teamwork." Give out an award for "teamwork across divisions and geographies that led to delivering a complete customer scenario."
Once you have clearly defined criteria for the recognition, plan the celebration. Here's what makes for a lasting positive experience:
§ Make it personal—if you defined the goal, the recognition should come from you. Include some personal remarks that express the spirit behind your goals and the way individuals or the team embodied that spirit.
§ Make it public—public ritual is important. It carries meaning, builds community, spreads the message, and creates a shared sense of purpose.
§ Make it lasting—the associated award should be substantial to the senses—heavy, bold, eye-catching, and tactile. Shaving your head meets many of these criteria, but so do heavy physical awards (like the Oscars) and giant signed banners on heavy poster board.
These guidelines give your celebration enduring emotional impact. Food and money don't cut it.
I'd like to thank the Academy
The last element to consider is, "Do you give the recognition to the team or to the individual?" While you can certainly give the whole team a celebration and a day off, recognition for carefully defined desired results should be called out to a small number of recipients (typically, no more than five). Any more than that dilutes the impact.
Why is the impact diluted when given to more than five recipients? Because when you recognize a larger group, there are sure to be individuals who didn't embody the spirit of your goals. The message you send is that tagging along is just as good as being the driver. It isn't. Who originated the action? Who drove it to fruition? Those are the people at the center and in the lead. Those are the people you recognize and encourage others to emulate.
Eric Aside
I've been involved in the internal Engineering Excellence (EE) Award program for some time. It recognizes broad improvement in the way we engineer our products and services. The criteria are around demonstrating that improvement and making it available for others (business and customer impact, plus adoption). The recipients are those individuals who came up with the initial idea, first put it in practice, and drove adoption. The ceremony has ritual (Bill Gates used to host it) and the awards themselves are bold, heavy, eye-catching, tactile, and will last a lifetime.
Over the past several years, the EE Awards have recognized and encouraged dramatic improvements in our engineering—more secure and reliable products, better customer feedback, and broader language support. Reviewers in specialized areas are noticing the difference. Hopefully, when enough improvements are broadly in place, everyone will notice.
An interesting side note: a few years ago we gave an EE Award to a tool that consolidated a large number of duplicate efforts, gained broad adoption, and automated processes that improved our software. Unfortunately, the tool itself wasn't well engineered, which hurt productivity and reflected poorly on the award program. A great example of needing to carefully define the ends you seek.
All right, let's review
Now that you know how to recognize your employees and why you should, how can you apply this to the annual review process?
§ Describe what you liked (and disliked) about the results your employees achieved and how they achieved them.
§ Talk about the small results and the big results. Tie them together.
§ Set carefully defined goals for the future that will drive personal development and stronger results for your business and customers.
Clarity around expectations tied to consistent feedback described in plain language is critical to getting great results, and a happy and secure team. Recognition requires careful thinking and deliberate action, but it's not difficult to do. Avoid the chaos and you can even keep your hair.
-
Call me "old school" but I believe in shipping. Trying isn't enough. Getting close isn't enough. Good ideas aren't enough. You've got to ship.
It used to be that interviews started with, "What have you shipped?" If you hadn't shipped recently, "Why?" Why? Because you can't deliver customer value if you don't deliver. You can't iterate and improve without finishing an iteration. You can't get customer feedback without customers.
People used to complain that promotions and rewards were disproportionally distributed to those who shipped. I say, "Absolutely, that's how it should be." Does this hurt quality? No, you set a high minimum quality bar and ship. Does it hurt innovation? No, innovators have always risked an initial drop in pay to receive a big payoff should they deliver.
Eric Aside
Some people complain that the big payoff doesn't exist at Microsoft for innovative ideas. Those people haven't shipped. The people who successfully ship innovative ideas are the ones who become our organizational and technical leaders.
It all starts with shipping. This is particularly apt with services, where everything literally starts with shipping, and where I'm focusing the rest of this column. Our critics claim that in the new world of services Microsoft has forgotten how to ship. Perhaps, but Microsoft has forgotten more about shipping than most companies will ever know. We just need some reminders and reeducation, especially when it comes to services.
Eric Aside
Does a focus on shipping drive death marches? No, death marches delay shipping. As I wrote in “Marching to death” (chapter 1), death marches result from a lack of planning and courage. This is particularly important to understand in the services world, where sustainable shipping is critical to long-term success.
I offer you my service
How much about shipping services has Microsoft forgotten or doesn't get, according to critics? Not as much as they would have you believe, but enough to make you think. Let's go over the herrings and the heartaches, mixed with a little happiness.
The red herrings:
§ Services make you think about everything differently.
§ Services center on data while packaged products center on functionality.
§ Services have greater security concerns than packaged products.
§ Services have serious issues with dependencies.
§ Services demand higher quality and faster iterations than packaged products.
The heartaches (and happiness):
§ Services run across hundreds of machines, not on a single client.
§ Services must scale out automatically.
§ Services are easier to switch than packaged products.
§ Service upgrades hit everyone instantly.
§ Services are living, changing things.
Let's break these down, starting with the red herrings.
What is that smell?
The first services red herring is a big one, "Services change everything." As I addressed in At your service
The last red herring is among the most common concerns raised about why shipping services differs from shipping packaged products—high availability and Internet time. Look, it's not okay for packaged products to never work or require a reboot every time you use them; at least it hasn't been for quite some time. The quality bar is no different for services, though there are plenty of services that fail constantly.
As for Internet time, that hit packaged products a decade ago with the introduction of Windows Update. And if you think that those patches are just security fixes, you haven't been paying attention. More and more we are fixing all kinds of experience issues shortly after customers report them, for services and packaged products. That's a great thing for customers.
However, gradually improving the customer experience every month or every day isn't enough. Both services and packaged products need to ship significant, orchestrated updates to deliver breakthrough customer value. Facebook wasn't going to gradually update itself into Twitter any more than Vista would gradually update itself into Windows 7. You must focus on what the customer is trying to accomplish, and sometimes that isn't a quick change.
Eric Aside
The best way to learn how to ship is to do it early and often. Make every build a shippable build. Build every day and rebuild the entire system at least every week. Deploy regular tech previews and betas. Deploy regular incremental updates and fixes into production. Ship early, ship often. Practice makes perfect.
There are too many of them
However, not everything about shipping packaged products applies to shipping services. There are mental, process, and team adjustments that you need to make.
First and foremost is that services run across hundreds or thousands of machines dispersed in multiple data centers worldwide. Sometimes functionality and data are replicated. Sometimes functionality and data are specialized. Usually, it's a combination of both for scale and reliability. Naturally, this presents design and synchronization problems, but plenty of books have been written about that (read, don't rediscover). The less obvious challenges are around debugging and deployment.
Why is debugging a service so tough? Timing issues are killer given multiple threads on multiple processors across multiple machines. Yikes! However, that's not even the toughest challenge.
What's the first thing you do when debugging an issue? Analyze the stack, right? With services the stack is split across servers and requests, making it nearly impossible to trace a specific user action. The good news is that there are new tools that help tie user actions together across machines. The bad news is that this isn't the toughest challenge either. The toughest challenge is that you're always debugging in the live environment. You don't get symbols, breakpoints, or the ability to step through code.
So let's recap. Debugging services means debugging nasty timing issues across multiple machines with no stack, symbols, or breakpoints on live code. There's only one solution—instrumentation—and lots of it, designed in from the beginning, knowing you'll soon be debugging across live machines with no stack, symbols, or breakpoints.
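To make the instrumentation point concrete, here is a minimal Python sketch. It is not a description of any particular Microsoft tool. It assumes every user action is stamped with a correlation ID when it first arrives and that each machine writes one JSON line per event; the helper names (make_logger, log_event, trace) and the machine names are hypothetical.

```python
import io
import json
import logging
import time
import uuid


def make_logger(stream):
    """Return a logger that writes one line per event to `stream` (hypothetical helper)."""
    logger = logging.getLogger("svc-" + uuid.uuid4().hex)  # unique name, so no shared handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_event(logger, correlation_id, machine, event, **fields):
    """Emit one JSON record tagged with the user action's correlation ID."""
    record = {"ts": time.time(), "correlation_id": correlation_id,
              "machine": machine, "event": event}
    record.update(fields)
    logger.info(json.dumps(record, sort_keys=True))


def trace(log_lines, correlation_id):
    """Rebuild one user action from interleaved log lines, ordered by timestamp."""
    events = [json.loads(line) for line in log_lines if line.strip()]
    matching = [e for e in events if e["correlation_id"] == correlation_id]
    return sorted(matching, key=lambda e: e["ts"])


if __name__ == "__main__":
    sink = io.StringIO()  # stands in for logs collected from many machines
    log = make_logger(sink)
    action = uuid.uuid4().hex
    log_event(log, action, "frontend-07", "request_received", path="/checkout")
    log_event(log, "someone-else", "frontend-02", "request_received", path="/search")
    log_event(log, action, "billing-03", "charge_failed", latency_ms=412)
    for e in trace(sink.getvalue().splitlines(), action):
        print(e["machine"], e["event"])
```

With records like these you can follow one customer's action from machine to machine without a stack, symbols, or breakpoints. Clocks on real machines drift, so treat the timestamp ordering as approximate.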
They're multiplying too rapidly!
Solving debugging brings us to the other huge challenge—deployment. Deployment needs to be completely automated and lightning fast. We're talking file copy installation, with fast file copy. No registry, no custom actions, and no manual anything.
Why does deployment need to be so fast and simple? Two reasons:
§ You're installing onto hundreds or thousands of machines worldwide while they are live. Installation must work and work fast with zero human intervention ever. The slightest bit of complexity will cause failures. Remember, five minutes times 1,000 machines equals three-and-a-half days. It had better just work.
§ The number of servers needs to grow and shrink dynamically based on load. Otherwise, you are wasting hardware, power, cooling, and bandwidth in order to meet the highest demand. Because your scale depends on load, it can change any time. When it changes, you need to build out more systems automatically and instantly.
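The arithmetic and the scaling rule behind those two reasons can be sketched in a few lines of Python. The function names, the 50-machine wave size, the per-server capacity, the 20 percent headroom, and the two-server minimum are all assumptions for illustration, not figures from any real deployment system.

```python
import math


def rollout_days(machines, minutes_per_machine, parallel=1):
    """Wall-clock days to install everywhere, `parallel` machines at a time."""
    waves = math.ceil(machines / parallel)
    return waves * minutes_per_machine / (60 * 24)


def servers_needed(load_rps, capacity_rps, headroom_pct=20, minimum=2):
    """Servers required for the current load plus headroom, never below `minimum`."""
    padded_load = load_rps * (100 + headroom_pct)
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-padded_load // (100 * capacity_rps))
    return max(minimum, needed)


if __name__ == "__main__":
    print(round(rollout_days(1000, 5), 2))              # one machine at a time
    print(round(rollout_days(1000, 5, parallel=50), 3))  # 50 machines per wave
    print(servers_needed(950, 100))                      # busy period
    print(servers_needed(50, 100))                       # quiet period
```

One machine at a time, five minutes each, is about three and a half days. Fifty at a time is twenty waves, or a hundred minutes, but only if every install succeeds without a human touching it.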
The happiness around deployment is that Azure will do most of the heavy lifting for you (so let it, don't reinvent). However, you still need to design your services to support file copy installation.
Life is so uncertain
Enough of the challenges you can predict. How about the unpredictable ones? The services landscape is in constant change. While some services are sticky because they hold your data (like Facebook or eBay), many aren't sticky at all (like search or news). A few minutes of downtime can cost you thousands of customers. Data compromise or loss can cost you millions of customers. They'll just switch. Our competitors will be happy to accept them. It cuts both ways, so you need to work hard to both welcome and keep new users.
When you update a service everyone gets the new version instantly, not over years. If there's a bug that only one customer in a thousand experiences, then that bug will hit thousands of customers instantly (law of truly large numbers). That means you need to resolve the issue quickly or roll back. Either way, it's a bad idea to update a service on a Friday and a good idea to have an emergency rollback button always at the ready.
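The one-in-a-thousand arithmetic is worth running with your own numbers. This small Python sketch assumes a hypothetical service with five million active users; the function names are made up for illustration.

```python
from datetime import date
from fractions import Fraction


def expected_affected(active_users, failure_rate):
    """Customers expected to hit a bug once an update reaches everyone."""
    return int(active_users * failure_rate)


def is_friday(day):
    """True when `day` is a Friday (date.weekday() counts Monday as 0)."""
    return day.weekday() == 4


if __name__ == "__main__":
    # Assumed audience size; substitute your own service's numbers.
    print(expected_affected(5_000_000, Fraction(1, 1000)))
    if is_friday(date.today()):
        print("It's Friday. Hold the update until next week.")
```

Using Fraction keeps the rate exact, so a rare bug doesn't get rounded away before you multiply it by a very large audience.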
Finally, it's important to realize that services are living, changing things. You'd think that because the servers are all yours with your image and your configuration that it would be a controlled environment—and it is until you turn on the switch. Once the server goes live, it changes. The memory usage changes, the data and layout on the disks change, the network traffic changes, and the load on the system changes. Services are like rivers not rocks. You can't ship and forget services. They need constant attention. To make your life easier, bake resilience in by automating the five Rs—retry, restart, reboot, reimage, and replace (though replace may require human hands at some point).
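One way to picture automating the five Rs is as an escalation ladder: try the cheapest remedy first and climb only when the machine is still unhealthy. The Python sketch below assumes you supply the actual remedies and the health check as callables; recover and the simulated server are hypothetical, not part of any real management API.

```python
from typing import Callable, Dict, Optional

# Ordered from cheapest to most drastic, as listed in the text.
FIVE_RS = ["retry", "restart", "reboot", "reimage", "replace"]


def recover(actions: Dict[str, Callable[[], None]],
            is_healthy: Callable[[], bool]) -> Optional[str]:
    """Escalate through the five Rs; return the step that restored health, or None."""
    for step in FIVE_RS:
        action = actions.get(step)
        if action is None:
            continue  # not automated here (replace may need human hands)
        try:
            action()
        except Exception:
            continue  # a failed remedy just means we escalate further
        if is_healthy():
            return step
    return None


if __name__ == "__main__":
    server = {"healthy": False}  # simulated machine that only a reboot fixes

    def reboot():
        server["healthy"] = True

    fixed_by = recover(
        {"retry": lambda: None, "restart": lambda: None, "reboot": reboot},
        lambda: server["healthy"],
    )
    print(fixed_by)
```

When recover returns None, automation has run out of rungs and it's time to page a person.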
The happiness that comes with these heartaches is threefold: customers willing to switch; an ideal idea-testing platform, because you can show customers different ideas and see which they prefer on a daily basis; and the ability to ship now and find the tricky intermittent Heisenbugs later (using your five Rs resilience to keep up availability).
Back to basics
There you have it. Some food for thought mixed in with the old basics of writing solid code that focuses on customers and their goals.
However, none of this is worth anything without shipping. Make shipping a priority and we all win. Sure, the quality bar has gone up, but we're not kids selling lemonade anymore. We need to ship quality experiences regularly, on both long and short time scales. We need to ship on the Internet, on the PC, and on the phone. We need to serve our customers well and delight them into sticking with us. It's a long journey, but it doesn't start until we ship.
-
During difficult economic times like these, people tend to whine less about common complaints that now seem trite. Mostly, I'm relieved not to hear how much e-mail is in Ingrid's Inbox, how Brian broke the build again, and how Suresh's service schedule slipped successive sprints.
However, it's during difficult days that we should patch plaguing problems. When are you going to be more motivated to mend malignant maladies? Surely, no additional alliteration is advisable.
A surprising number of common issues can be solved using two simple techniques—single-piece flow and checklists. There's a ton of behavioral and process theory behind why these simple methods are effective. The point is that they are effective and you and your team are less effective without them.
All too easy
Take Ingrid's Inbox. Like most Inboxes, Ingrid's is overflowing. She spends tons of time on it, yet it only gets bigger. She constantly loses track of mail, discovers mail, and revisits mail. It's hopeless.
Ingrid would have her mail under control if she followed single-piece flow. Single-piece flow tells her to handle one piece (one message) at a time till it's done. By "done" I mean she'll never look at that message again (except to answer a related message).
Here's how it works. Each time after Ingrid reads an e-mail message she does one of four actions:
1. She deletes the message (my favorite).
2. She files it away in a folder.
3. She forwards the message to someone else and then deletes or files it.
4. She answers the message and then deletes or files it.
That's it. She never opens a message and then leaves it in her Inbox. By the end of each day, every day, her Inbox is empty. She never misses a mail message, loses a message, or revisits a message.
Single-piece flow is efficient because it removes overhead and rework. There's overhead every time you context switch to look at a new message, and there's rework when you re-read a message. In single-piece flow, overhead is minimized and rework is eradicated.
Eric Aside
Sure, there are exceptions to the read-it-once rule, but they account for less than 5 percent of the mail I get in a day. Even those e-mail messages get filed in a special folder for more intensive attention. I also use Inbox rules to pre-filter mail from discussion groups.
If you're wondering how to get started and don't want to just delete all the mail in your Inbox, follow these steps:
1. Take mail you are actively working on right now and move it into a special folder.
2. Create folders for your remaining mail that correspond to your obligations and interests.
3. Search your Inbox for mail that fits each folder and move that mail into the folder.
4. Go through your special folder and clear out 90% of it using single-piece flow.
Now you are on your way.
Deja vu – all over again
Brian keeps breaking the build. You can punish Brian till he's afraid of checking in code regularly, but doing so only causes other problems.
A better solution is to give Brian a checklist. Checklists are wildly misunderstood, improperly developed, and underutilized. Regrettably, well-meaning compulsive people list everything possible Brian should check before he submits code to the build. That's not only a waste of time, it's also ineffective.
Brian's checklist should list only common causes of build breaks (less than one page’s worth). It should be in Brian's sight when he submits code. The goal is to be quick, easy, and effective.
Too long or complex and Brian won't follow it. Too short and it's not effective. Luckily, most mistakes are common mistakes. Thus, all the team needs to do is collect a list of the common or critical causes for build breaks and turn that into a checklist. The same is true for design review lists, code review lists, and all checklists.
Remember to update your checklists as your failure patterns change or they will become stale. Checklists prevent common errors and promote good habits. Any structured manual process you follow should have a simple checklist—unless you like being Brian the build breaker.
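As a minimal sketch of letting the data pick the checklist items, assume the team records one cause for each build break. The log entries, the `build_checklist` name, and the item cutoff below are all hypothetical; the only point is that the list comes from counting what actually breaks, plus anything flagged critical.

```python
from collections import Counter

# Hypothetical log: one recorded cause per build break (illustrative data only).
BUILD_BREAK_LOG = [
    "forgot to add new file",
    "did not build all flavors",
    "forgot to add new file",
    "skipped unit tests",
    "did not build all flavors",
    "forgot to add new file",
    "merge conflict markers left in",
]

def build_checklist(causes, max_items=3, critical=()):
    """Return the most common causes, most frequent first, plus critical ones."""
    counts = Counter(causes)
    items = [cause for cause, _ in counts.most_common(max_items)]
    # Critical causes always make the list, even when they are rare.
    for cause in critical:
        if cause not in items:
            items.append(cause)
    return items

checklist = build_checklist(
    BUILD_BREAK_LOG,
    max_items=2,
    critical=["merge conflict markers left in"],
)
for item in checklist:
    print("[ ]", item)
```

Rerunning the same count on fresh data every few months is one way to keep the checklist from going stale.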
Eric Aside
You may have battles deciding what goes in a checklist. You shouldn't. Remember, you list what the data says are the common or critical failure points, not everything that ever happened.
For example, I added a checklist to my e-mail signature several months ago, which I check and then delete before I send every mail. It lists the two mistakes I've made for years, but haven't made since:
· Check the Cc and Bcc lines for undesired aliases
· Ensure the subject line is accurate
Slip sliding away
Suresh's software squad slipped their schedule on successive sprints. Bad news for a service—or any project. Is it time to work weekends? No. Is it time to slap the squad silly? No. Is it time for single-piece flow and checklists? Yes.
Suresh's squad is made up the usual way—a few PMs including a service engineer, a bunch of developers, and a similar-sized bunch of testers. The PMs write specs, the developers code them, and the testers test them. The squad is using Scrum-like Sprints with a nicely prioritized backlog of features.
Unfortunately, the PMs are creating specs faster than they can be implemented. They waste time and effort on specs that change or are cut before they see daylight. The developers and testers jump from feature to feature as they get blocked. Nothing is in sync. Nothing gets finished. The schedule slips incessantly. In retrospectives, all Suresh's squad talks about is unblocking people.
Even though Suresh's squad is using Scrum-like Sprints, they aren't using single-piece flow. They aren't splitting into cross-discipline teams, with each team working on one feature at a time till it's done. They don't even have a checklist that defines done. It's doomed.
Once they create feature teams that spec, code, and test one feature at a time (sometimes called feature crews) and a checklist that defines done, there's a chance for progress. The single-piece flow removes blocking issues because everyone is working on the same problem. Instead of letting people jump ahead, the checklist keeps the team honest, motivating them to work together, finish, and stay focused. Now the team doesn't waste time context switching and can tell how long it really takes to complete a feature, leading to confident scheduling and higher-quality finished products and services.
Eric Aside
Many people wonder what to do with feature crew PMs once they finish writing specs, or developers once they hit code complete. For PMs there are two solutions: be part of multiple feature crews or be prepared to work with their dev and test peers on problem solving and development. For developers, the right solution is to work with the test team on tools, automation, unit tests, and component tests.
There's another clever solution written up by a former Microsoft employee, Corey Ladas. He calls it Scrumban and you can read about it on his and Bernie Thompson’s Lean Software Engineering site.
Our two weapons are
So there you have it—single-piece flow and checklists. Two enormously useful, remarkably simple, and yet woefully underutilized techniques for managing workload and building quality software.
Single-piece flow and checklists can be applied to individuals, small teams, and large divisions. They aren't controversial when used pragmatically, and have years of documented case history supporting their effectiveness.
Sure, you could create a whole grassroots movement around single-piece flow and checklists, but that seems a bit overblown for such simple ideas. Maybe you should just use them wisely. Enjoy the time you get back and the improvement in your results. The best things in life are often the most basic and simple.
-
It's Midyear Career Discussion time at Microsoft. Perhaps you just finished, but more than likely you're still trying to squeeze yours in. How'd it go? How will it go? For you? For your manager? Well, that depends.
It depends a bit on your prior performance and your manager's prior performance. It depends a bit on the feedback itself and how that feedback is given. It depends a bit on how your parents raised you and the comfort of your chair. But the biggest influence on the lasting impact of your Midyear Career Discussion is the way you and your manager respond to feedback.
Let me put this delicately to you. You have no frigging idea how to give and take feedback. Seriously—not one frigging idea. Think I'm wrong? You are only proving me right. If you actually knew how to give and take feedback your response would be a sincere and polite, "Thank you."
Thanks for the advice
In fact, there are only two valid responses to feedback, "Thank you" and "Go on."
The "Thank you" is simple and self-explanatory. Too bad most people don't use it. Most people defend themselves, explain their behavior and results, and describe how they are already taking the right steps.
Please, slowly and carefully shut your mouth, empty your mind, and listen. Perhaps you can even take notes. Then, when the generous soul is finished, say, "Thank you."
You don't say it to be polite. You say "Thank you" because you mean it. Your relationships, your life, and our products and services would reek far beyond their current stench if people were not kind enough to provide an outside perspective and help us improve. Thank goodness they are willing to do it. To ensure they continue it's essential to sincerely appreciate it.
Tell me more about me
In addition to "Thank you," a valid response to feedback is "Go on." As in:
§ "Could you talk more about that?"
§ "I don't quite understand—could you describe that further?"
§ "Thank you, that's helpful, what else can I do differently?"
Anything that encourages clear and continued feedback is appropriate.
Back off, man. I'm a scientist
What's inappropriate is anything that questions or cuts off feedback. This includes:
§ "I'm working on it." So what? You're doing the wrong thing, you haven't made much progress, or you are actually improving. Regardless, the feedback is valid and your comment is irrelevant and self-serving.
§ "I was trying to …" So that makes it better? Never confuse reasons with excuses. If you can get better, you should get better. No excuses.
§ "I disagree." So this is news? You're getting feedback. It's opinion. The fact that you like your current approach is not a revelation. When the feedback seems wrong, you've either missed something or left the wrong impression—that’s precious information.
Keep in mind that you don't have to follow whatever advice you get. All you are obligated to do is listen, consider the advice carefully, and thank the person for helping you.
Eric Aside
A particularly important time to keep your mouth closed, take notes, and simply say, "Thank you," is during an executive review.
Now it’s my turn!
Now that you know how to take feedback, it's time to learn how to ask for it and provide it. When asking for feedback and when providing it, there are three basic questions:
§ What is good [about what I'm doing or the work I've done]?
§ What could be better?
§ Any further comments?
You can structure feedback more, but the simplest complete ask is those three questions. And those are the three questions you want to answer when you provide feedback.
Eric Aside
Feedback is best provided immediately before or after behavior. The ideal is to provide positive feedback directly after desired conduct and corrective feedback just before it’s needed. In other words, apply feedback at precisely the moment it is most constructive.
For example, a guy on your team sends a great email but forgets to copy a stakeholder. You reply to him right away, “Great mail—concise and insightful.” Later, just before he’s supposed to send the next update you write, “Remember to copy all stakeholders.” The reminder is more useful at that time.
We have come full circle
When providing your feedback, start with what's good, talk about improvements, add on your other comments, remind about improvements, and then reiterate what's good. That order is important.
§ You start with what you like. It sets up the conversation on an upbeat note and prevents the impression that all is lost. If you start with what's wrong, your listener may never hear what's right.
§ Next, you talk about ways to improve. Ideally, your listener should focus on just one change. One change is all that most people can handle at a time. Pick the most impactful improvement and emphasize it.
§ Of course, you'll have plenty of other thoughts that aren't as important. Feel free to mention those in the context of "a few other comments."
§ Then come back to your main message—the one or perhaps two improvements that would make the most difference.
§ Finish with what is going well. It's important to end on a positive note.
Eric Aside
Remember to always focus on the behavior or outcome, not on the person. People can't change who they are, but they can improve their actions and results.
We don't have much time
That's it. Being concise is important. If you want your feedback to matter it should be clear, consumable, considerate, concise, and centered on the receiver. Your feedback isn’t about you and your glorious knowledge; it is about helping the recipient.
If you are on the receiving end of the feedback you should be just as concise. Feedback is precious, whether it's from a customer, a peer, or your manager. Don't get in the way. Encourage it. Savor it. Appreciate it. Thank you.
-
As I said in Nailing the nominals, the two keys to successful big projects (100K+ LOC) are thinking ahead and defining done. Thinking ahead is about design and planning. Defining done is about setting a quality bar and sticking to it. Yet many big projects go astray even when people think ahead and define done. Why?
Often failure is due to executives making poor decisions that place their own agendas above shipping. (Given you hit your quality bar, shipping is better, much better.) However, an even more frequent form of failure comes from engineering teams overthinking and overgeneralizing: trying to solve world hunger instead of feeding the kids in front of you.
Eric Aside
Once you've hit your quality bar, why is shipping always better than messing around with the code? After all, there may be competitive features an executive really wants added at the last minute. How bad can that be? Oh, it can be really bad.
Before you ship, everything is a guess—postponed bug fixes, key features, and design decisions are all guesses until you ship. Once you ship you learn the truth. Putting off the truth in favor of more guessing is never profitable. It upsets customers and partners, and the continuing chaos corrupts the code.
It's so tempting, so intellectually pleasing, and so self-serving to examine the customer problem you're solving and see the bigger problem, the more general problem, the problem you can solve for all time, all peoples, and all nations. Oh spare me, and spare your customers. Cultivating the green fields of broad ideas is not only self-serving, it's a recipe for feature-rich and value-poor products and services your customers will use begrudgingly, despise utterly, and abandon gleefully at the first opportunity. Green fields are full of maggots.
The horror
Yet, I can't walk down a hallway without hearing people trying to come up with ideas, algorithms, or class structures that "will work for this problem, and all problems like it." Evil. Evil! Warning! Watch out. This line of thinking is EVIL!
What's so evil about general solutions? After all, your code could be both a floor wax and a dessert topping. There are three primary pitfalls:
§ You rarely work through the full general solution in one ship cycle. The unfinished framework isn't quite right, but you've shipped it. Now you are stuck with an unworkable legacy code base—forever.
§ You introduce a broader test surface and a broader security attack surface. Neither is desirable.
§ You put the problem at the center instead of the customer. When the customer isn't at the center, your code loses its soul. It goes from being astounding to being adequate, from marvelous to mediocre.
Eric Aside
Why is the unfinished general framework not quite right? Because it's impossible to anticipate everything you and your customer need. After all, you are foolishly solving a general problem, instead of wisely solving a specific problem that you have a chance to iterate toward and get right.
You saved me from this fanatic
Before we break down the three primary pitfalls of general solutions, I need to calm down my agile zealot readers. Because agile methods put a premium on working software and customer collaboration, they tend to avoid the over-thinking trap. In particular, test-driven development and emergent design deliberately focus on keeping solutions as simple as possible and laser-targeted at customer requirements.
Because agile methods avoid the pitfalls of general solutions and general thinking, many agile zealots believe all big design up-front is vile. They are wrong. Sure, the regular refactoring and rework needed for emergent design isn't a problem for small code bases. However, when you get above 100,000 lines of code serious rework becomes extraordinarily expensive. That's why customer-focused, up-front architecture is essential for large projects.
Eric Aside
This was researched by Dr. Barry Boehm et al. in May 2004. They found that rework costs more than up-front design on projects larger than 100 KLOC. The larger the project, the more up-front design saves you.
The good news is that many agile methods like Scrum, continuous integration, and test-driven development (TDD) work well within a large project with a customer-focused architecture. These great techniques can keep a team locked onto the customer, instead of straying off into the green fields of self-gratifying intellectual exercises in futility.
Eric Aside
TDD is an Extreme Programming (XP) and agile technique for implementation design that produces tight, robust code. As a side benefit, it provides unit tests with great code coverage. The process is fairly simple:
1. Write a new unit test for a requirement of the API or class.
2. Compile and build your program, then run the unit test and ensure that it fails.
3. Write just enough code to make the new unit test pass (and the old ones).
4. Refactor the code as needed to remove duplication.
5. Repeat until all API or class requirements are tested.
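One pass through that loop might look like the sketch below, written with Python's standard `unittest` module. The `slugify` function and its two requirements are hypothetical, chosen only to keep the example small; the comments map each piece back to the numbered steps.

```python
import unittest

# Step 3: just enough code to make the tests pass. Before this function
# existed, running the tests failed (step 2), which showed that the tests
# really exercise the requirement.
def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())

class SlugifyTests(unittest.TestCase):
    # Step 1: each test states one requirement of the API.
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    # Step 5: the next requirement gets its own test and the loop repeats.
    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Hello   World "), "hello-world")

# Run the tests and keep the result so it can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests that drove the design stay behind as the unit tests mentioned above, which is where the code coverage side benefit comes from.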
Who will save your soul?
Okay, back to the three primary pitfalls of general solutions: incomplete, unworkable legacy code; expanded test and attack surface; and losing your soul.
Say you are developing a big product like Halo or SharePoint. In either case, you can start with a general framework or you can start with a customer story.
The green fields approach is to start with a general framework. The framework supports all forms of weapons, landscapes, and content viewers. Putting together a game or a Web site is simply a matter of choosing the desired combination of weapons, landscapes, and content viewers. Bingo! You've got yourself a cruddy game or Web site.
The game or Web site has no soul because you focused on the framework instead of staying with the story. In addition, to test the framework you've got to consider every combination of weapon, landscape, and content viewer—each of which also presents an attack vector for hackers. And, should you be foolish enough to expose the framework as an "extensible interface," all these problems become orders of magnitude more diabolical.
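To put rough numbers on that test surface, here is a small sketch. The option lists are made up and far shorter than anything real; the point is only how quickly the combinations multiply compared with the handful a story actually needs.

```python
from itertools import product

# Hypothetical option lists; real ones would be much longer.
weapons = ["pistol", "rifle", "plasma sword"]
landscapes = ["canyon", "ship interior", "snowfield", "beach"]
content_viewers = ["document", "calendar", "image", "video", "list"]

# A general framework has to hold up under every combination it allows.
framework_cases = list(product(weapons, landscapes, content_viewers))

# A story-driven design only ships the combinations the story calls for.
story_cases = [
    ("rifle", "canyon"),
    ("plasma sword", "ship interior"),
]

print(len(framework_cases))  # 3 * 4 * 5 = 60
print(len(story_cases))      # 2
```

With these lists, adding a single new weapon to the framework adds another 20 combinations to test and defend (4 landscapes times 5 viewers). Adding it to a story adds only the cases that story uses.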
Eric Aside
You should still have frameworks to help organize your code and classes, but the frameworks shouldn't be general purpose. They should serve a particular purpose defined by the customer story.
Anyone who has played Halo or compared recent versions of SharePoint to the original version knows the value of focusing on the customer story.
It's not gonna get any easier
What's even worse is when you design and develop Halo II and SharePoint 2.0. Inevitably, there are many considerations you didn't address in the prior release that invalidate significant portions of your framework. Unfortunately, those portions are already built and shipped, so you get all the enjoyment of legacy backward compatibility with little of the benefit. Aren't you glad you were just so very clever to focus on the general problem instead of the customer and the story?
"But frameworks and extensible interfaces are critical to our platform!" scream the clueless cretins. Yes, they are when the customer is a developer and the story involves building upon our platform. However, the customers of Halo and SharePoint aren't primarily developers. They are consumers and enterprise customers. The story line for them involves crushing the Covenant and sharing information. Focus on the customer, not on the framework.
Can I tell you a story?
What does it mean to focus on the customer and the story? It means don't solve the general problem—solve the customer problem. The problem customers are trying to solve.
It means knowing the story line (the scenario) for customers. Who are they? What are they trying to accomplish? How are they used to accomplishing it? How might our software help? What would that look like to the customer?
Remember, many of our customers are developers. It's no different for them. Who are they? What are they trying to accomplish? How are developers used to accomplishing it? How might our software help? What would that look like to the developer using our platform and tools?
Once you are focused on the customer and the story, design, develop, test, and deploy that story for those customers—nothing more, nothing less. Make that story compelling and delightful.
Along the way, certain generalizations will emerge. Avoid them unless they are needed for the story. Only engineer what's needed to realize the story for the customer. Everything more general is wasted effort because by the time you need it the story will be different.
Eric Aside
You are better off following the YAGNI philosophy of software development—only implement what you need when you need it. Never implement what you might need for a later feature. YAGNI stands for "You Ain't Gonna Need It." Sometimes the truth isn't polite.
Temptations always come along
So you code up one story (scenario) and start working on the next. Perhaps it's even the next release. Aren't you going to wish you had generalized? No. No, you aren't. If you had generalized you would have done it wrong. You wouldn't have known what you know now.
Say while working on the next story, you realize you need to add a new weapon, landscape, or content viewer. No problem—refactor the design and add what you need now that you know what it is and why you need it. The result is better code, tighter code, and better tested code because the tests can focus on the story line.
Focusing on the customer and their story line isn't difficult conceptually or even in practice, but it does require restraint. It is so tempting to solve the big problem for humanity. Resist that temptation. You can't please everyone, so do your best to at least please your customers. In time, humanity will sing your praises.
-
Plumbing channels waste water into a series of larger and larger pipes till it is expelled. That's because sewage flows downstream, which explains the quality of goods that test, operations, and sustained engineering teams receive. After all, they are downstream of design and development.
I've written about pushing quality upstream for testers in "Feeling testy" (chapter 4), and making services resilient to failure for operations using the five Rs in Crash dummies.
You make the call
Should the engineers who design, construct, and test the code be the same engineers who fix the bugs found after release? This is the quintessential question of sustained engineering (SE).
If the engineers who built the release fix the post-release bugs, you typically get better fixes, the engineers feel the pain of their mistakes, and the fixes can be integrated into the next release. Then again, the next release may not happen because its engineers are being randomized.
If you have a dedicated SE team you build up knowledge of the code base outside the core engineering team, you can potentially pay a vendor to sustain old releases, and you don't distract or jeopardize progress on new releases. Then again, SE teams get little love, their fixes can be misinformed, you duplicate effort, and the core engineering team isn't held accountable for their mistakes.
Tough call, huh? Nope, not at all. While both models can work, having the engineers who build the release also fix post-release bugs is far better. Only idiots believe a lack of accountability leads to long-term efficiency and high quality. Of course, the world is full of idiots, but I digress.
Someone's got to take responsibility
Yes, a dedicated SE team can work, but long term it will only cause grief for team members and customers. Why? Because you can mitigate post-release fixes distracting the core team, but you can't mitigate the problems with a dedicated SE team.
Let's go through those dedicated SE team problems again.
§ Little love. What would it take for the dedicated SE team to be appreciated as much as the core engineering team? A disaster, right? And what would it take on a day-to-day basis? Non-stop disasters. In other words, the conditions for loving the SE team are undesirable.
§ Misinformed fixes. To get a fix right, recognizing all the implications of changes, you need to deeply understand the impacted portion of the code base. Let's fantasize that the core engineering team has that level of depth. The core team is always considerably larger than the SE team. The SE team has no hope of truly appreciating the impact of fixes. Reality is only worse. Sure, you can have the SE team consult with the core team, but doing that all the time defeats the purpose.
§ Duplicate effort. Whenever you have two teams fixing issues in the same code you duplicate effort, by construction. You've got two teams learning the same code, debugging the same code, changing the same code, and testing the same code. There's no getting around it, unless you neglect to incorporate the fixes into the next release, which is even worse.
§ Accountability for mistakes. The whole point of the dedicated SE team is to avoid derailing the core engineering team, protecting them from dealing with fixes. The core team doesn't correct their mistakes in the old code, and doesn't know to prevent those mistakes from recurring in the new code. What's worse is that nothing rewards good behavior or discourages bad behavior. Conscientious heroes don't earn more time to write quality code, and careless villains never have to fix their past mistakes. Thus, we can never expect to improve. A great recipe for joyful competitors and sorrowful customers.
What do I do now?
In contrast, there's plenty you can do to avoid jeopardizing future releases while the core engineering team fixes prior mistakes. Let's run through the relentless, randomizing requests and resolve them.
§ Triviality. How do you avoid wasting the core team's time with issues that aren't software bugs, or have trivial workarounds? You have a small dedicated team triage the issues. Note this team isn't a development team. It's purely an evaluation team that determines which issues are worth fixing. That way, only worthwhile work is passed on to the core team.
§ Prioritization. How do you balance bug fixes for the last release with work on the new release? You have the dedicated evaluation team prioritize the fixes. There are four buckets: immediate fix (the rare "call the VP now" issue); urgent fix (next scheduled update); clear fix (next service pack or update); and don't fix. These buckets send clear signals to the core team about which bugs to fix at what time.
§ Unpredictability. How do you make inherently unpredictable post-release issues easy for the core team to schedule around? You make them regular events. Deploy one update per month. The urgent fixes each month are queued up by the evaluation team. The core team sets aside the necessary time each month and the fixes are designed, implemented, tested, and deployed on a predictable schedule. This is just as good for customers as it is for the core engineering team. Everyone likes predictability.
In addition, the evaluation team can create virtual images for easy debugging by the core team, improve the update experience for customers, and reflect customer needs and long-term sustainability features back into future releases.
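The four buckets and the monthly queue can be sketched as a small data structure. This is only an illustration under assumed names: the `Bucket`, `Issue`, and `plan_work` identifiers and the sample issues are hypothetical, and a real evaluation team would need far more tooling around it.

```python
from dataclasses import dataclass
from enum import Enum

class Bucket(Enum):
    IMMEDIATE = "immediate fix"  # the rare "call the VP now" issue
    URGENT = "urgent fix"        # next scheduled update
    CLEAR = "clear fix"          # next service pack or update
    DONT_FIX = "don't fix"       # not worth passing on to the core team

@dataclass
class Issue:
    title: str
    bucket: Bucket

def plan_work(issues):
    """Group triaged issues by bucket so the core team sees a clear queue."""
    plan = {bucket: [] for bucket in Bucket}
    for issue in issues:
        plan[issue.bucket].append(issue.title)
    return plan

# Hypothetical issues, already evaluated by the triage team.
triaged = [
    Issue("Data loss on upgrade", Bucket.IMMEDIATE),
    Issue("Crash when saving large file", Bucket.URGENT),
    Issue("Typo in settings dialog", Bucket.CLEAR),
    Issue("User wants a different default font", Bucket.DONT_FIX),
    Issue("Sync fails behind proxy", Bucket.URGENT),
]

plan = plan_work(triaged)
print("This month's update:", plan[Bucket.URGENT])
```

Only the immediate bucket interrupts the core team. Everything in the urgent bucket waits for the scheduled monthly update, which is what makes the work predictable.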
Eric Aside
Of course, it isn't as simple as a small evaluation team prioritizing issues. There's a bunch of orchestration and system support necessary to make SE run smoothly. That part is unavoidable. What is avoidable is duplicating effort, uninformed fixes, and ignoring accountability.
This won't hurt a bit
See, it's not that complicated. You save on staff. You get better fixes. You catch similar issues in advance. You achieve predictability. And you ensure the core engineering team is accountable for quality and learns from its mistakes. All it costs is a relatively tiny dedicated team to manage the monthly update process by evaluating and prioritizing issues. Even that team feels valued due to their differentiated and important role and their direct engagement with solving customer problems.
Yes, sewage flows downstream and no one likes cleaning it up. However, by putting some simple processes in place, you can reduce the sewage and have those responsible mop up the mess. To me that smells like justice.
Eric Aside
What do you do if you are stuck on a dedicated SE team and are experiencing little love, misinformed fixes, duplicate effort, and no accountability from the core team? Here are a few ideas:
§ Create a rotational program with the core team. Everyone spends a month or two a year on the SE team. It's not ideal, but I've already established that point.
§ Measure your efficiency and effectiveness, perhaps by the average time to resolve issues for each bucket, the regression rate, team morale, and customer acceptance of fixes (a balanced scorecard). Optimize, publish your results, and show the core engineering team how great work gets done.
§ You ship updates once a month—celebrate once a month.
-
Why? Why! Why do managers make stupid decisions that cause devastating churn and tawdry results? And it's not just managers, though they are particularly proficient at promoting poor performance—architects, leads, and individual contributors flood the lives of their team with wasteful, useless, misdirected activity, leaving us even less opportunity to deliver real value. What reason is there for this farce? Simple. We are optimizing—optimizing our obsolescence.
What kind of idiot optimizes their own undoing? The ordinary kind. You do it, your friends do it, and your boss does it. It's all those good intentions that pave the way to disaster. We optimize the wrong behavior to achieve the wrong results. It's wrong and avoidable, but hey, why think when you can cause mayhem with so little effort?
You want answers?
Let me save you some trouble and reading by giving you the answer first—optimize for desired results. It sounds simple and obvious, but people pervert that goal in so many imaginative ways that I better break it down word for word.
§ Optimize—Measure how good you are now, analyze how you could be better, alter your approach, and then measure again. Microsoft is great at optimizing, but we measure what's handy rather than what matters. So, we optimize for the wrong result. You can read more about this in my column, How do you measure yourself?
§ For—Have a purpose. Optimizing for the sake of optimizing is purely self-gratification—don't do it in public. Instead, be deliberate about your purpose. Think it through. Know what you are doing. Be a professional. Wake up.
§ Desired—Focus on what you want, not what you don't. This is a common trap. People optimize around the problem instead of the solution. Bureaucracies and slow software are built upon this misdirection. They focus on controlling people or code to prevent the wrong behavior. Their focus should be on making the right behavior fast and easy, and then catching the exceptions.
§ Results—Never optimize a step or algorithm in isolation. Instead, optimize the end result you seek. We have all experienced the impact of local versus global optimization. It kills our efficiency and innovation; it's killing our planet. Yet over and over again, people can't see beyond the problem at hand to consider the outcome they're truly after.
Can you recognize when you are de-optimized? Let's run through some examples and check.
Eric Aside
I realize it’s a little confusing to talk about optimizing both code issues and people issues in the same column. I couldn’t resist because the number of similarities is startling. If it’s easier for you, just think about whichever problem you prefer.
I think I can handle this
How do you handle run-time errors in your code? How fast does your code run when no errors arise? Is it a smooth or bumpy ride for error-free operation? The fastest, simplest path through your code should be the 80% case, not the 20% case. However, that doesn't mean you shortchange error handling; you just don't optimize around it. Trust your code will run error-free, making it run fast. Verify it was error-free, ensuring the right result. Trust but verify.
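In code, trust but verify often means attempting the operation directly and handling the exception, rather than pre-checking every input on the common path. Here is a minimal sketch of that shape; the `parse_port` function, its default value, and its range check are hypothetical.

```python
def parse_port(text, default=8080):
    """Return the port number in `text`, or `default` when it is unusable."""
    try:
        # Trust: the common case is a clean number, so convert it directly.
        port = int(text)
    except (TypeError, ValueError):
        # The rare bad input is still handled, just not optimized for.
        return default
    # Verify: confirm the result is right before handing it back.
    return port if 0 < port <= 65535 else default

print(parse_port("443"))    # 443
print(parse_port("abc"))    # 8080
print(parse_port("70000"))  # 8080
```

The error handling is all still there. It simply sits off the main path, so the error-free case stays short and easy to read.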
Likewise, how do you handle people and process errors? Do you check their every move? Do you have people do it your way, jump through hoops, and fill out redundant forms to ensure they aren't cheating? Or do you trust people to do the right thing, clearing the desired path of obstructions, and later verify that work was done properly? Trust but verify.
My altruistic readers, including managers, might claim that they do trust their coworkers. Really? How did you react the last time something went wrong? Did you quickly fix the root cause and move on, or did you start an inquisition randomizing your team for days or weeks? Do you micromanage or do you delegate? Do you specify every step or do you specify the result? Trust is hard. Luckily, you're being paid.
Déjà vu
How decoupled and cohesive is your code? Are the classes, functions, and functionality all intertwined and unmanageable, or are they independently testable and separable, each piece having its own purpose and task to perform?
Well-architected and layered code is far easier to test, maintain, and enhance. However, it doesn't perform quite as well as tightly coupled code. It's a tradeoff. If you optimize purely for speed, you eventually get unmaintainable spaghetti code. If you optimize purely for architecture, you can't be competitive in performance. How do you strike the right balance?
Most teams don't strike a balance between architecture and performance—they ride a rollercoaster:
1. The team starts with a nice architecture. It works great and everyone feels good.
2. They optimize it for performance. Now it works better—the original team clearly wasn't as sophisticated.
3. The code is unmanageable, it can't be enhanced, and performance has hit boundaries, so the team painfully refactors the code. Now it's manageable again and everyone is happy—the prior team clearly were neophytes.
4. The performance isn't competitive, so the team optimizes again for performance. Now it's competitive again—the prior team clearly had lost its way.
5. Now the code is unmanageable, so return to step 3.
There's another variation: the code is so twisted that the team can't fathom refactoring. Their product cycle keeps getting longer and the code keeps getting slower, requiring more memory and processing speed. That's a popular variation.
The right approach is to optimize for desired results—performant code that's easy to maintain and enhance. Instead of just measuring the speed (easy), you measure the speed and the code complexity. You seek the optimal balance of both. If you're really sophisticated, you'll also measure team health and customer satisfaction indicators, seeking a balance of all four. Wow, that's almost like running a business.
The beat of a different drummer
Let's try one more subtle example—product team structure. It's a war zone out there between traditional product development and the upstart Agile adherents. Who's right? Who cares! Never optimize around a step in isolation—optimize for desired results.
The desired result is delivering the most customer value in the shortest time. Remember, customer value is not measured by feature count; it's measured by delivering delightful end-to-end scenarios with high quality.
So how do you deliver high value quickly? You apply the Theory of Constraints (TOC). TOC says that the fastest way a project can accomplish anything is constrained by the slowest step. Say your user experience, content publishing, and operations teams are shared and can scale to your needs; your PM team can spec an average of four features in a month; your development team can code two features in a month; and your test team can validate three features in a month. There's not much point in your PM team going full speed, is there?
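Here's a minimal sketch of that arithmetic, using the team rates from the example. The dictionary layout and function names are my own illustration, not part of TOC itself.

```python
# Monthly capacity of each step, using the numbers from the example above.
# The dictionary layout and function names are illustrative assumptions.
capacity = {"pm": 4, "dev": 2, "test": 3}


def find_constraint(capacity):
    """Return (step, rate) for the slowest step, which caps the whole pipeline."""
    step = min(capacity, key=capacity.get)
    return step, capacity[step]


def excess_per_month(capacity):
    """How far each step would outrun the constraint if it ran at full speed."""
    _, limit = find_constraint(capacity)
    return {step: rate - limit for step, rate in capacity.items()}


step, rate = find_constraint(capacity)
print(f"Constraint: {step} at {rate} features/month")
print(excess_per_month(capacity))
```

Whatever the PM and test teams do, the project ships two features a month. The excess numbers are work that piles up or capacity that sits idle.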
Yet managers will push the PM team to keep writing specs the dev team can't process—optimizing locally instead of globally. Adding people to speed up the dev team doesn't work either (note The Mythical Man-Month and the economy)—again, the focus is too narrow.
The right solution is to pace the PM and test teams to the dev team. Put in buffers to account for variability between features, but never have the PM and test teams outpace the dev team. This TOC strategy is called Drum-Buffer-Rope. Because it's hard to precisely predict the dev team's pace, you constrain the size of buffers, avoiding too much work in the dev team's queue should the situation change.
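A rough way to see the effect is to simulate the spec backlog month by month. This is a sketch under simplifying assumptions: fixed monthly rates, no variability, and a hypothetical buffer_cap parameter standing in for the rope.

```python
def simulate(months, pm_rate, dev_rate, buffer_cap=None):
    """Return the spec backlog waiting for dev at the end of each month.

    buffer_cap=None lets PM run flat out. A number acts as the "rope":
    PM stops releasing specs once the dev queue would exceed that depth.
    """
    backlog = 0
    history = []
    for _ in range(months):
        specs = pm_rate
        if buffer_cap is not None:
            # Release only enough work to feed dev (the drum) this month
            # and leave the buffer full afterward.
            specs = min(pm_rate, max(0, buffer_cap + dev_rate - backlog))
        backlog += specs
        backlog -= min(backlog, dev_rate)  # dev consumes at its own pace
        history.append(backlog)
    return history


print(simulate(6, pm_rate=4, dev_rate=2))                # unpaced
print(simulate(6, pm_rate=4, dev_rate=2, buffer_cap=1))  # paced
```

With the rates from the example, the unpaced backlog grows by two specs every month, while the paced backlog holds steady at the buffer size. Real projects have variability, which is why the buffer exists at all.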
This is why Feature Crews work so well. You're optimizing for the desired result: working scenarios. In Feature Crews (an approach from Office), PM, dev, and test team members tie themselves to one piece of a scenario at a time till it's completely tested and integrated. They can't get ahead of each other. Versions of Scrum and Extreme Programming work the same way. It's not the combined teams that are essential (though communication is easier); it's the pacing of work together that optimizes the delivery of complete, high-quality customer value.
Don't panic
It's so easy to get caught up in the immediate and optimize around the issues directly in front of your face, instead of the ones you actually care about. People do it all the time—I guess we're programmed instinctively that way. That's a perfectly good reason to optimize the wrong behavior for the wrong results, but it’s a poor excuse.
You should know better, and if you don't, you have no right to draw a paycheck. Consider the result you desire to achieve; think it through; measure a balance of factors; and optimize as a whole. It's not that difficult. We attempt it every day as we balance our lives. The key is to be deliberate rather than juggle; to be planned rather than panicked. You can do it if you simply keep your sights on the finish line.
-
Is innovation the act of creating something new (as the dictionary claims) or is it building upon the work of others? To me this is a fundamental question that Microsoft as a company and as a culture has gotten horribly wrong. We deal with the consequences every day. It shakes our self-esteem and cripples our ability to innovate.
The right answer is that innovation is enhancing the work of others—nothing is new. Consider what people claim to be innovative—the iPhone, hybrid cars, and Facebook. Are those new? Are you kidding? Not only would those advances have been impossible without the foundational technologies behind them, but almost every aspect of those innovations had been created earlier in some form. The innovation was putting old technologies together into compelling new experiences and successfully marketing them to a broad audience.
This isn't an indictment of innovation or great advances like Facebook. Quite the opposite—this is a call to acknowledge that innovation is enhancing the work of others. Microsoft culture, in an act of misguided self-mutilation, has connected being innovative with creating something "new" on your own. This has directly led to three crippling consequences:
§ We doubt our innovative spirit as a company.
§ We create disconnected new features instead of compelling new experiences.
§ We reject work we didn't create ourselves as a company, team, or individual, which impairs our productivity, our intelligence, and, ironically, our ability to innovate (an advanced tragic case of Not Invented Here, also known as NIH).
Why the long face?
We doubt our innovative spirit as a company because much of our work goes into enhancing existing products and services, which in turn were built upon ideas from a wide variety of sources. Microsoft haters out there claim this proves Microsoft isn't innovative, and we accept that guilt. It sickens me.
Is Apple not innovative because it “borrowed” from Xerox PARC (Palo Alto Research Center)? Is Google not innovative because it enhanced Yahoo? Is Toyota not innovative because it built upon Henry Ford's ideas of the automobile and lean manufacturing? That's both stupid and insulting.
Being innovative is all about enhancing the work of others. We enhance our own great work and build upon the great ideas we find around the world in all fields. Yet our self-esteem is shaken, even while every new release is filled with innovation.
Perhaps the problem is that our innovation isn't as compelling as the iPhone. Why?
Compelling, and rich
We are so eager to promote our own individual new, small ideas that we fail to create compelling new experiences. You can't market a plethora of disconnected features to a broad audience, so our individual innovations are left unnoticed.
That's beginning to change as groups realize that customer value comes from designing compelling end-to-end experiences. More and more Microsoft products and services have a consistent feel and a complete story. It started with Hardware and Office, crystallized in Xbox 360, and is making its way into Windows—Windows Vista, Windows Server, Windows Live, and Windows Mobile. We don't always get it all right, nor do any of our competitors, but each new release shows progress.
Tragically, our company's culture of individual innovation works against creating compelling end-to-end experiences. You can't assemble seamless scenarios without selflessly supporting the salient story. You have less unbridled innovation from the individual engineer, but the result is greater innovation from the customer's perspective.
As any artist knows, constraints often lead to tremendous creativity. However, individual engineers must first let go of their own preconceptions and embrace the constraints of the larger composition before they can innovate in concert like a symphony instead of a soloist.
Our greatest successes both monetarily and in emotional connection came from working together over several years to get the experience right for customers. Some of the ideas may have come overnight, but as Thomas Edison said, "Genius is 1 percent inspiration and 99 percent perspiration." Those innovative engineers from groups like Office, Developer Tools, and SQL Server, with the patience to iterate, focus on the customer, and bring full experiences together, are the ones who receive the greatest rewards. Yet Microsoft engineers still cling to the notion that working independently from nothing is the secret to success.
I'll do it myself!
Unfortunately, Microsoft internal mythos is all about the individual hero rewriting the kernel memory protection scheme over a sleepless weekend. We value independence, passion, and boldness over compliance and conformity. I love that about Microsoft, but it needs an update.
The problem is that independence, passion, and boldness somehow became "do it yourself." If you didn't invent it, it's not what you need. If your team didn't invent it, it will only cause trouble. If Microsoft didn't invent it, it's dubious at best. That NIH worldview is not only wrong and shortsighted, but it slows our success and paralyzes our progress.
You can be independent, passionate, and bold while responsibly meeting requirements and enhancing a desired customer experience. You can be independent, passionate, and bold from the advantageous platform of others' foundational work.
Conversely, if you choose to go it alone you suffer greatly:
§ Your productivity drops.
§ Your intelligence drops.
§ Your innovation drops.
Maybe I could turn this thing into my advantage
When you reject work that was NIH, you lose the advantage of time, effort, bug-fixes, and knowledge that went into that work. Would you write an operating system from scratch to optimize a small application? Of course not. By using an existing operating system you save incredible effort.
The same is true of using existing libraries, services, tools, and methods. The key to gaining advantage instead of hardship from using the work of others is ensuring the things you depend upon are stable. Take a dependency on version n-1; be careful about using the latest and greatest. That goes for libraries, tools, and even techniques. Grief is avoidable.
Eric Aside
When using existing libraries, services, tools, and methods from outside Microsoft, we must be respectful of licenses, copyrights, and patents. Generally, you want to carefully research licenses and copyrights (your contact in Legal and Corporate Affairs can help), and never search, view, or speculate about patents. I was confused by this guidance till I wrote and reviewed one of my own patents. The legal claims section—the only section that counts—was indecipherable by anyone but a patent attorney. Ignorance is bliss and strongly recommended when it comes to patents.
I'm good enough, I'm smart enough
Rejecting work NIH also leads to brain decay. At Microsoft, we hire the brightest engineers available. Are they still the brightest five years later? Well, that depends. Did they learn something new every year, or did they rebuild the same thing over and over again?
If you have to reinvent a build system every time you switch teams, are you smarter? If you have to reinvent a collection class, a threading framework, or a test harness every time you switch teams, are you smarter? If you have to reinvent project management techniques, spec templates, and bug field definitions every time you switch teams, are you smarter?
No, you’re not smarter. You are getting dumber. Congratulations. You don't get smarter by repeating yourself or by repeating others and their mistakes. You get smarter by building upon the knowledge of those before you.
Those who do not learn from history
Ignoring what others have learned and gained before you, both good and bad, puts you at an innovation disadvantage. As I already mentioned, there's the productivity disadvantage—going it alone is never as fast as building upon the progress of others. However, there's also a significant contextual disadvantage.
If you ignore past work, you're liable to make the same mistakes that people made in the past—avoidable mistakes that cost you time. Sure, you might want to revisit those challenges. Perhaps there's an opportunity for a breakthrough. But do so knowingly, not like a dolt.
If you ignore past work you're also liable to unknowingly duplicate what others have already achieved. You'll feel smart and proud right up to the point that you look like an idiot. Meanwhile, you've lost time and accomplished nothing new. NIH results in nihilism. Fitting, I suppose.
If not me, who? And if not now, when?
Microsoft is filled with smart, passionate, and innovative people. We are an innovative company, in spite of ourselves at times. We need to stop thinking of innovation as an individual effort that appears miraculously from the void, and start thinking of innovation as it really is—a culmination of effort focused on thrilling the customer.
Where do your individual ideas fit? They fit within the context of the larger desired customer experience, and build upon the learning we've gained as a group. The result is ground-breaking innovation that expands the state-of-art and excites the hearts and minds of our customers.
So when you're going to work on a new team, project, or great new idea, understand how it makes the customer experience we've set out more compelling; and learn what others have discovered about your work. Note that I said "what others have discovered" not "if others have discovered." There is nothing new. Someone out there has tried something similar and will have work or knowledge you can use. You can ignore that and be slower, dumber, and less accomplished. Or you can take it to the next level and create real innovation.
Eric Aside
If you are looking to get ahead and build upon the great work of others, one of the best places is in the shared source community (internally at CodeBox and externally at CodePlex). By using personal branches, you can customize and enhance code and tools all you like, while still updating with the latest and greatest improvements from others. By submitting your changes to the main line, you advance the state-of-the-art and might even enhance your image too.
-
People are always looking for that amazing breakthrough technology or process that solves all their problems—enhances their love life, trims their waist, and improves the productivity of their development team. That's why process manias like Agile and Six Sigma are so enticing. Just splat the Scrum tag on your development team and "bam!"—your team is suddenly ten times as prolific.
It would be hilarious if it weren't true in the minds of the narcissistic ninnies with an addiction to magic methodologies. I'm happy that people are willing to try something new. I just wish they wouldn't confuse new with necessary, or minor fixes with major problems.
The truth is that there's nothing magic about producing great software with high quality on time. (And no, software services aren't different in this regard.) Human beings have had large teams building complex engineering marvels for thousands of years. There is no magic. There is solid design and disciplined execution. Period.
Eric Aside
When your code base is less than 100,000 lines, and you've got fewer than 15 people on the project, you don't need solid design and disciplined execution. You can wing it: use emergent design, have a loose upfront design bar, rewrite and refactor the code endlessly while the customer looks over your shoulder. When your code base and your project are bigger, it's solid design and disciplined execution or it's broken code, broken teams, and broken schedules.
Back to basics
I worked at Boeing for five years before coming to Microsoft. The aerospace industry uses the terms nominals and tolerances to refer to acceptable values and tight refinements. A nominal might be a support beam being placed two feet from a heating duct. A tolerance would be that location give or take 1.5 inches.
There were always inexperienced engineers who concerned themselves with tolerances. They were constantly looking for a clever technique to give them an extra fraction of an inch of tolerance, as if that mattered. Real engineers who designed and built the airplanes knew these tolerance tinkerers were misguided, out-of-touch, naïve fools who missed the point entirely.
The key to building an airplane isn't tinkering the tolerances—it's nailing the nominals. It's not, "Does the steel stringer bend within 0.058 inches of the specified skin?" It's, "Does the steel stringer pass right through a cooling vent?" Stop worrying about the fancy details and get the basics right, ding-a-ling!
Eric Aside
You might think I'm criticizing Boeing, but it's quite the opposite. When lives are in the balance, it's important to get the basics right—getting the passengers to their destination safely. If the stringer is a few millimeters off, you can shim it. If two key systems run through the same space, a minor accident can lead to a major catastrophe. Never let the details distract you from what customers truly care about.
I want to be a cowboy
The software industry suffers greatly from dingbats tinkering tolerances instead of nailing nominals. You can see it in most software, regardless of how or where it ships. The focus of the engineering was clearly on all kinds of fancy little features instead of getting the software to work right for basic cases.
Maybe the basic cases don't appeal to engineers. Maybe engineers can only relate to tinkerers, not to folks who want solutions that are simple and reliable. However, I don't think so.
I think nailing the nominals requires solid design and disciplined execution, both of which require software engineers to put the project and the customer ahead of their personal interests. Unfortunately, that's not what cowboy coders crave, which is why cowboy coders should either grow up or move out.
But it's so simple
Still here? Still believe the customer and project should come first? Still realize that true customer value only comes when you act as a team, putting the customer at the center instead of yourselves? Good, let's talk about the nominals of software.
As I said, the nominals revolve around solid design and disciplined execution. But that's true for any significant engineering venture. What does it mean for software? I'll keep it simple.
Solid design means:
§ Understanding what the customer is trying to accomplish with your software (product planning, value proposition/vision).
§ Thinking through the end-to-end experience, including pitfalls (experience design, scenarios).
§ Decomposing that experience into distinct engineerable pieces (engineering design, architecture).
Disciplined execution means:
§ Ordering work based on priority and dependency.
§ Creating a schedule based on data from past projects.
§ Establishing and upholding completion criteria for every piece.
In the simplest of terms, the nominals are "think before you act" and "define done."
Is it done?
I've written about experience and engineering design many times before (particularly in "The other side of quality," chapter 6), so this time let's drill down on defining done. For a distinct engineerable piece of the architecture, be that a feature, component, API, or Web service, what should the completion criteria be?
You could set bug limits or test code coverage, but those evaluate intermediate results. What do you care about at the end? What should the final result of an engineerable piece of the architecture be? Well, it should fully satisfy its part of the end-to-end scenario, including performance targets and avoiding pitfalls, and it should be secure and reliable.
What causes software to fail scenarios and reliability targets? Smart dedicated people, at Microsoft and around the globe, have been studying this problem for decades. The net result, which you can read in every study ever done, is that problems stem from failing to complete only five practices, five practices that should define done:
§ Documented design. Think before you act and capture it so that peers can do a …
§ Design review, which catches up to half the errors before you write a line of code, which you then …
§ Code review, which, if you use inspections, can catch more than 70% of errors, though some errors will only be caught by …
§ Code analysis, using tools like lint or PREfast that catch patterns humans can miss, but which doesn't entirely replace …
§ Unit and component testing, which catches the remaining issues, including stress, performance, and edge cases, as well as end-to-end scenario checks.
Without defining done, tasks never finish (sound familiar?). Even worse, cowboy coders can claim to be finished with no accountability for quality work. Ever wonder why great developers end up cleaning up after poor developers? It's because poor developers claim completion for crud. Without defining done, who else can keep their crappy code from customers?
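If you want completion criteria that can't be fudged, make them checkable. Here's a minimal sketch of what that could look like. The practice identifiers and the WorkItem class are hypothetical shorthand for the five practices above, not an existing tool.

```python
from dataclasses import dataclass, field

# The five practices from the list above; the identifiers are my own shorthand.
PRACTICES = (
    "documented_design",
    "design_review",
    "code_review",
    "code_analysis",
    "unit_and_component_testing",
)


@dataclass
class WorkItem:
    name: str
    completed: set = field(default_factory=set)

    def missing(self):
        """Practices not yet completed, in the order listed above."""
        return [p for p in PRACTICES if p not in self.completed]

    def is_done(self):
        """Done means all five practices are complete, nothing less."""
        return not self.missing()


item = WorkItem("login feature", {"documented_design", "code_review"})
print(item.is_done(), item.missing())
```

The point of the sketch is that "done" becomes a yes-or-no answer with a list of what's left, rather than a claim someone makes in a status meeting.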
It's not that complicated
Notice how none of the five practices that define done are terribly exotic or revolutionary? They are basic. Everyone knows you should do them. Yet engineers will fall over themselves to become certified Scrum Masters, create cool new tools, or learn the latest technology before they'll actually run design reviews or write comprehensive unit and component tests.
Nail the nominals before tinkering the tolerances. It's not magic. It's not complicated. It's not hard. It takes discipline. It takes putting the customer and the project before yourself. It takes being a professional instead of a chump. Think you can handle it?
-
When I'm discussing challenges with fellow engineers, the first topic that comes up isn't estimation—it's career and people challenges. That's why those issues are so rampant in these rants. However, "How do you generate task estimates?" is always among the top non-moaning-about-your-manager-or-mates topics. After all, estimation is predicting the future. There are so many unknowns and unforeseen issues that it's impossible to provide the accurate estimates demented despots demand. Isn't it? It must be. Right?
Wrong. Estimation is among the most trivial tasks an engineer has to perform on a regular basis. Get over yourself, it is. It's so easy that there are dozens of seemingly different methods that all give you remarkably accurate predictions of completion time. All those methods come down to one simple concept—how long it took last time is how long it will take this time. Nothing could be easier.
Yeah, you've got to understand the work well enough to compare it to previous work, but that isn't too tough either. No, the real challenge isn't task estimation; the real challenge is accepting the estimate. Estimation is easy; acceptance is hard.
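To show how little machinery "how long it took last time" needs, here's a minimal sketch. The history data is made up, and taking the median of similar past tasks is my assumption about how to pick a single number, not one of the formal methods.

```python
from statistics import median

# Hypothetical history: (kind of task, calendar days it actually took).
history = [
    ("small feature", 9),
    ("small feature", 11),
    ("small feature", 10),
    ("bug fix", 2),
    ("bug fix", 3),
]


def estimate_days(kind, history):
    """Estimate a new task as the median of similar past tasks."""
    past = [days for past_kind, days in history if past_kind == kind]
    if not past:
        raise ValueError(f"no past work similar to {kind!r}")
    return median(past)


print(estimate_days("small feature", history))
print(estimate_days("bug fix", history))
```

The only judgment call is deciding which past work counts as similar. Everything after that is a lookup.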
No one would accept the program
Let's pretend for a moment that you actually keep track of how long it takes you in calendar days to perform various tasks. (You do—the information is right there in your e-mail dates.) Let's further imagine that you provided those previous times as estimates for doing similar tasks today (you'd be quite accurate). What would the reaction be from your project leads and managers? My guess: "Oh come on, you've got to be kidding me!"
This is fun, so let's take it a step further. Let's say you told your project leads and managers that your estimates were based on hard dates collected from your previous project. What reasons would they give for not believing this hard data? Here's the big three:
§ Last time was different.
§ You get faster the second time.
§ Weird stuff happened last time.
Let's break down these feeble fallacies one at a time.
It's a different kind of flying altogether
The first excuse your manager or project lead will have to reject your hard schedule data from the previous project is that the previous project was different. Things have changed. Perhaps the build system and tools have changed, the design change request process changed, the requirements changed, management changed, or perhaps the moon is in a different position relative to Saturn this time.
Out of all those excuses, only two have a small chance of affecting your estimates—the tool and process changes. Every other factor is superfluous with little or no impact to cycle time.
Even the tool and process changes would have to be extreme to noticeably affect the accuracy of your estimates. Tool changes would have to cut end-to-end build times by a factor of 5. Process changes would have to reduce the time of weekly activities by days. Otherwise, the impact is just noise in the estimate.
Look, the more the world changes, the more it stays the same. Deal with it.
Eric Aside
Let's say a task takes you two weeks, give or take a day or two. The tool or process change would need to save you at least one full day every two weeks to matter.
I'm getting better
The second excuse to reject your hard schedule data from the previous project is that you get better the second time around. The funny thing is that you do get better the second time around. The problem is that you're not doing the same project (hopefully). The only things that are the same are the tools, process, project scope, and the general task of software engineering.
You should already be well versed in the general task of software engineering, so getting better at the details of the previous project has no impact on the estimates for the next project. Of course, if you're fresh out of school then the second project will take less time than the first.
If you did change tools and processes, your performance should actually be worse because it will be your first time using them. That's okay if the changes are small or the benefits are big. Just don't kid yourself about the impact.
You do want to compare the current project to a prior project with similar scope. The better the match, the more accurate the estimate. The big differences between estimation techniques are how they produce matches.
Oh no, not again
The final excuses your manager or project lead will have to reject your hard schedule data from the previous project are all the "weird" things that happened last time. There was that unexpected security patch, the feature that was far more complex than anticipated, the reorganization and associated project reset, not to mention the snowstorm, and that earthquake, yeah, the earthquake. There's no way you should count the earthquake!
You count the frigging earthquake. There's always a surprise patch, feature, reorganization, and natural disaster waiting for you over the course of a project. Always. Random events happen, but their impact on the schedule isn't as unpredictable. Thanks to Lyapunov's central limit theorem, their overall impact averages out. However long it took last time is likely to be nearly the same this time. That is, as long as you don't pretend this time will be different.
Same old wine
Okay, we've proven your project leads and managers are in denial. As a result, they force you to make ridiculous estimates you don't believe, only to blame you later for missing them. We've shown that accurate estimates are almost trivial to make. The big question remains, "How can you turn your trivial and accurate estimates into ones your project leads and managers will believe?"
That's where task hours are so handy. Instead of making your estimates in calendar days, you make them in task hours—the number of hours it would take if there were no earthquakes, e-mail, or bathroom breaks. Without those distractions, your estimates look far smaller and more reasonable, even though they're no different.
Task hour estimates are slightly harder to make because you don't have the data lying around in your inbox. However, you can estimate task hours quickly, easily, and accurately with a simple technique like planning poker (or its more accurate and sophisticated sibling, Wideband Delphi).
In planning poker, three or more engineers each estimate the same task privately around a table. They all reveal their estimates simultaneously so no one exerts undue influence. If the estimates match, you're done. If they differ, the high and low estimators explain themselves, the group discusses their thinking, and then the process repeats until the estimates agree. The process also surfaces assumptions before they become a problem.
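The mechanics of a single round can be sketched in a few lines. The function name, return shape, and engineer names below are hypothetical, and the discussion between rounds is the part no code can do for you.

```python
def poker_round(estimates):
    """Evaluate one round of simultaneously revealed estimates.

    estimates maps each engineer's name to a task-hour estimate.
    Returns (agreed_value, None) when everyone matches, otherwise
    (None, (low_name, high_name)) naming who explains their thinking.
    """
    if len(estimates) < 3:
        raise ValueError("planning poker needs three or more estimators")
    values = set(estimates.values())
    if len(values) == 1:
        return values.pop(), None
    low = min(estimates, key=estimates.get)
    high = max(estimates, key=estimates.get)
    return None, (low, high)


# Round one: the estimates differ, so the low and high estimators explain.
print(poker_round({"ana": 5, "bo": 8, "cy": 13}))
# Round two, after discussion: the estimates agree and the task is estimated.
print(poker_round({"ana": 8, "bo": 8, "cy": 8}))
```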
Once you have believable estimates in task hours, the argument isn't about how long the tasks will take. It's about how many hours you spend on-task in a week. Even after subtracting vacation, training, and big group meetings, most teams spend less than half of working hours on task. The rest of their time is in meetings, answering e-mail, lunch breaks, and so on. If your project leads and managers don't believe it, simply have the team spend two weeks tracking their hours. The numbers don't lie.
Eric Aside
Even after subtracting vacation days, training, off-site meetings, and other planned non-task time, most engineering teams only spend about 42% of their time on task. You can increase time on task by having days or afternoons set aside with no e-mail or meetings; having feature teams co-located and self-directed, which reduces formal meetings, design mistakes, and overall communication time; and by using methods like Scrum Sprints, which increase team focus.
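Converting task hours back into calendar time is simple division. In this sketch the 42% figure comes from the aside above, while the 40-hour working week and the function name are assumptions for illustration.

```python
def calendar_weeks(task_hours, hours_per_week=40, on_task_fraction=0.42):
    """Calendar weeks needed to deliver the given number of task hours.

    hours_per_week is an assumed working week; on_task_fraction is the
    share of those hours the team actually spends on task.
    """
    on_task_hours = hours_per_week * on_task_fraction
    return task_hours / on_task_hours


# A "one week" task of 40 task hours takes well over two calendar weeks.
print(round(calendar_weeks(40), 1))
```

Same work, same duration as the calendar-day estimate. The difference is that the debate moves from "Why does this take so long?" to "Why are we on task so little?"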
Your results may vary
As I mentioned earlier, there's a certain amount of randomness or "variance" that asserts itself over the course of a project. It averages out, but any one estimate has a chance of being off by some standard deviation. That deviation is a percentage; it scales with the size of the estimate. Thus, a two-day estimate will be accurate give or take a few hours, a two-week estimate might be off by a couple of days (in either direction), and a three-month estimate could be off by a couple of weeks.
As long as you avoid being overly optimistic, the randomness will even out. By the end of the project, your project deviation will be about the same as the deviation from any individual task. If you are overly optimistic ("It can't take this long next time!") your deviations will keep adding up, not averaging out.
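A quick simulation illustrates the difference between random misses and optimism. This is only a sketch: the uniform noise model, the 15% spread, the 20% optimism bias, and the fixed random seed are all assumptions chosen for illustration.

```python
import random


def project_overrun(n_tasks, spread, bias=0.0, seed=1):
    """Relative overrun of a whole project built from n_tasks estimates.

    Each task's actual time is estimate * (1 + bias + noise), where noise
    is uniform in [-spread, +spread]. A bias above zero models optimism:
    every task really takes that much longer than its estimate.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimate = 10.0            # task hours per task; arbitrary
    planned = n_tasks * estimate
    actual = sum(
        estimate * (1 + bias + rng.uniform(-spread, spread))
        for _ in range(n_tasks)
    )
    return (actual - planned) / planned


honest = project_overrun(200, spread=0.15)
optimistic = project_overrun(200, spread=0.15, bias=0.20)
print(f"honest estimates:     {honest:+.1%}")
print(f"optimistic estimates: {optimistic:+.1%}")
```

Under these assumptions the honest project lands close to plan because the highs and lows cancel, while the optimistic project misses by roughly the full bias, since every task errs in the same direction.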
The point is that there's little point in estimating a two-day task to the minute or a three-month task to the day. You just need order-of-magnitude estimates, like I talked about in my very first column seven years ago, "Dev schedules, flying pigs, and other fantasies" (chapter 1).
I want to believe
How would you estimate? Focus with your peers on understanding the tasks at hand and their order of magnitude using a technique like planning poker or Wideband Delphi (poker for well-understood tasks, Delphi for others). That's the easy part.
What truly matters in the end is believing and accepting your estimates, then scheduling accordingly. As I've written about before, over-committing is foolish. What's worse is that over-commitment can lead to "Marching to death" (chapter 1). As I said then, death marches are a strong indicator of weak, cowardly, deceitful, and irresponsible management.
Scheduling trouble is diabolical, but completely avoidable. When you prioritize your work properly, putting what's first first, you reduce pressure on your schedule. When you use your estimates to drive realistic commitments, you can deliver reliably to your customers and partners, build trust, and enhance your group's and our company's reputation. Deriving good estimates is easy. Trusting them and yourself is the challenge.
-
It's summertime. Time to sit out in the sun and daydream, perhaps on a vacation or a weekend afternoon. When your mind is relaxed at times like these, you often think of beautiful new ideas. You further develop those ideas and then, when the time is right, perhaps early in the next release cycle, you begin prototyping those beautiful notions. Before you know it, your beautiful ideas have blossomed into hideous, miserable nightmares that either die of exposure, or worse, live on to cause future generations of engineers to curse the day you were born.
Oh, if only it were a fairy tale. Instead, more often than not, prototypes of beautiful ideas become horrific, hairy hodgepodges of hacks that cannot be easily maintained, refactored, or understood. Why? What happened?
It's not that you should write prototypes more carefully, with unit tests and all the rest—you shouldn't. It's not that you should throw the prototypes away—though you should. No, the problem is that your entire philosophy about prototyping is dead wrong.
Explore the space
Usually, when engineers think of a new idea to try, they write a prototype. That's a major mistake and the wrong thing to do. It leads you down a path to destruction of all that was good about your idea. You see, you shouldn't write a prototype—you should write dozens of prototypes parameterized to try hundreds of cases, all designed to solve the same problem but from different angles.
That's how all other fields of study work. You don't do one experiment, try one approach, or use just your first guess. You do hundreds of experiments. Artists and producers call it, "exploring the space." Could you imagine if medical researchers only tried one idea at a time to cure diseases? Wouldn't you think that was idiotic? Hello?
That's so rad!
Naturally, you don't have all the time in the world to write dozens of sophisticated prototypes. Good, you're not supposed to. Prototyping isn't supposed to be like production engineering. It's supposed to be like experimenting. It's supposed to involve the software equivalents of duct tape, silly putty, and wire hangers, like VBScript, Word Art, and Perl. You're supposed to throw together prototypes in hours not days.
If it's taking you longer to write one prototype than it did to understand the problem, you are already off the end of the gangplank and are headed into the shark infested waters of failure and frustration. Spending serious time writing one prototype not only distracts you from exploring other possibilities, but it also puts so much investment in that one prototype that you can't help but use it as the basis for production code. Welcome to despair and desolation.
On the contrary, prototypes should be churned out so fast and furiously that no one would consider taking them seriously as the real thing. Prototypes are there to try out ideas, make mistakes, iterate, and gain insight. If you only write one of them then you've learned nothing, aside from the fact that you are a close-minded, ignorant, misguided fool.
Harness in the good energy
Hold on, the "I can't change" people are calling. They say, "You can't prototype quickly without frameworks, harnesses, and libraries. It takes time to build those tools. Creating dozens of prototypes with our shipping pressures just isn't realistic." Look, if you are consigned to being a calcified carcass of a coder just admit it and accept it.
Otherwise, you need to wake up and realize that prototypes don't have to follow the same rules as production code. They can be written in different languages and be built on different platforms. They can use tricks and shortcuts; scripts and canned animation; you name the kluge, short of stealing licensed code, and it's fair game for prototypes.
There are loads of libraries, heaps of harnesses, and enough frameworks to fill a football field to help with rapid prototype development. All you need to do is step out of the box and into the garden.
You still have a choice
Great design, like great research, depends on trying dozens of ideas. It also relies on collaboration, working with others to generate new ideas and directions. If there's a user interface (UI) involved, you want user experience (UX) folks to help inspire you. If it's a library or API, you want architects or clients as your co-conspirators.
Eric Aside
UX engineers are designers and usability experts. These folks are trained in creative design thinking, thus the perfect people for prototyping projects. Many even specialize in the design and usability of interfaces, making them ideal to help with API and library design.
Once you've generated dozens of ideas, prototyped them all, iterated, and gained the desired insight, you've got a number of options. You can:
§ Pick your subjective favorite.
§ Use a simple tool like a Pugh Concept Selection Matrix to thoughtfully choose between alternatives.
§ Combine aspects of each approach and experiment some more.
§ Postpone making a decision till constraints force a choice.
Postponing design decisions till constraints force your hand is called "set-based design." You keep all design options available, only culling ideas when they break a requirement. Because schedule requirements often hit before all alternatives have been considered, you eventually need to fall back on something like Pugh Concept Selection. Until you do, set-based design keeps your mind open and helps you find the optimal solution.
Eric Aside
Pugh Concept Selection uses a simple table to rate alternatives against requirements. Ratings are positive, negative, or zero, based on fit. Requirements are scaled according to significance. The alternative with the highest total wins. Download a handy spreadsheet.
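For readers who would rather see the arithmetic than open a spreadsheet, here is a minimal sketch of the table described above. The requirement names, weights, prototype names, and ratings are all made up for illustration; only the scheme (ratings of +1, 0, or -1, scaled by significance, highest total wins) comes from the text.

```python
# Hypothetical requirements and their significance weights (illustration only).
WEIGHTS = {"ease of use": 3, "performance": 2, "cost to build": 1}

# Hypothetical alternatives rated per requirement: +1 good fit, 0 neutral, -1 poor fit.
RATINGS = {
    "Prototype A": {"ease of use": 1, "performance": 0, "cost to build": -1},
    "Prototype B": {"ease of use": 0, "performance": 1, "cost to build": 1},
    "Prototype C": {"ease of use": -1, "performance": 1, "cost to build": 0},
}


def pugh_totals(weights, ratings):
    """Return the weighted total for each alternative."""
    totals = {}
    for name, scores in ratings.items():
        for requirement, value in scores.items():
            if value not in (-1, 0, 1):
                raise ValueError(f"{name}/{requirement}: rating must be -1, 0, or 1")
        totals[name] = sum(weights[req] * scores[req] for req in weights)
    return totals


def pugh_winner(weights, ratings):
    """Return the alternative with the highest weighted total."""
    totals = pugh_totals(weights, ratings)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(pugh_totals(WEIGHTS, RATINGS))
    print(pugh_winner(WEIGHTS, RATINGS))
```

With these invented numbers, Prototype B edges out Prototype A. Change one weight and the answer can flip, which is the point: the matrix makes your priorities explicit instead of leaving the choice to whoever argues loudest.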
Throwing it all away
By the time you've made your design choice, your duct tape and baling wire prototypes should be so fragile and convoluted that no one would consider keeping them for anything but nostalgia. That's the right result.
Using cobbled-together code as the basis of production work only leads to software that is difficult to maintain, susceptible to failure, and must be retrofitted to meet quality requirements, such as automated tests, code reviews, globalization, accessibility, security, privacy, manageability, and performance just to name a few. The result of this retrofitting is the hideous hodgepodge of hacks I mentioned earlier. Not the beautiful ideas that started you down this path.
Throw away your prototypes, using them only for idea and algorithm reference, not code. This shouldn't be painful because you spent so little time on each.
Temptations always come along
I know there is a great temptation to write solid prototype code using quality methods, just as there is a great temptation to write rapid production code using quick-and-dirty methods. You must avoid these temptations by any means necessary.
How you act and code during experimentation should not match how you act and code during production. People who don't understand this difference have a name. They are called "children."
If you are treating prototypes like production code, or vice-versa, you should get reamed on your review. If your boss thinks you should write prototypes like production code, or write production code like prototypes, he should be fired. Not that I feel strongly about it.
The result of production should be the on-time delivery of delightful customer experiences that meet a high quality bar. That's what managers should reward you for during production.
The result of prototyping should be insight and innovation that changes how you think about your product, service, and customer. That's what managers should reward you for during prototyping.
Do yourself a favor
The bottom line is that if you only write one prototype for your idea, you've done your idea, your team, your company, and yourself a disservice. You won't discover all the implications of your idea, good and bad. You won't discover new ways to implement your idea or expand upon it. You won't discover all the uses for your idea or how it connects with other ideas. All you'll have is your first guess. You'll have cheated yourself and your idea.
Remember, it doesn't take longer to implement dozens of prototypes—it takes a different mindset. Sure, you don't want that mindset when you're producing shipping products and services. But during the planning and experimentation phase, timely torrents of tossed-together trials are just what you need to "explore the space." Who knows? Maybe you'll make a few gold records of your own.
Eric Aside
One last point about prototypes that a friend of mine in UX mentioned. Some prototypes are "concept" prototypes. Concept prototypes are special. They are blue-sky, over-the-rainbow prototypes meant to see what's out there on a conceptual level. They aren't going to be a real product or service anytime soon. Their purpose is to see where truly broad thinking can take you, which might inform the more practical path you have for the next version or two.
-
It's annual review time at Microsoft. We differentiate pay between high, average, and low performers in the same roles. Thus, it's time to calibrate those who've made the most of their opportunities in the past year with those in the mainstream of solid engineers and those who haven't quite kept pace with peers.
Eric Aside
There are many people inside and outside of Microsoft who critique differentiated pay, saying it sabotages teams and teamwork. While I do agree team results should be a component of compensation, I don’t think differentiated pay is the problem (see “Beyond comparison” in chapter 9).
As a manager, this is also time for the whiners and the clueless to lament to me about their lack of opportunities to grow and demonstrate their true worth. As if managers hoard those opportunities, giving them out only in moments of weakness or pity. As if those opportunities are rare—hidden treasures available only to the select few with guile and charm. No, you fools, opportunities aren't rare and they aren't hidden. Opportunities are big, loud, and aromatic. They stand right in front of you in gorilla suits beating their chests all day long.
Yet many smart engineers don't notice. Huge, noisy, smelly gorillas in their face day after day, and they don't notice. Sometimes their manager hands them the opportunity, invites them to a meeting, or puts them on a project, and still the engineers, capable engineers, ignore it. They hand the opportunity to someone else. They give it only passing attention. They leave it sitting in a corner till it finally devolves from inattention.
Why?!? Why don't engineers notice these opportunities? Why do they toss them aside, only to complain in July about the lack of opportunity? Towering, raucous, pungent opportunities in gorilla suits, every day, ignored. Why?
I'm blind, man
I used to think people ignored these opportunities because they were lazy or apathetic. I still believe apathy is a real problem, but I no longer consider laziness a key cause. Instead, I believe people miss opportunities due to a concept called, "inattentional blindness." Basically, engineers don't notice flagrant opportunities because their minds are focused elsewhere and aren't paying attention.
There's a telling video you might have seen that asks viewers to count the number of times a basketball is passed amongst a group of people. During the video, a person in a gorilla suit walks into the middle of the group, faces the camera, beats its chest with gusto, and then walks out of frame. At the end of the video, viewers are asked if they saw the gorilla. Not only do people miss it (including me the first time), some people insist the gorilla must have been camouflaged. Then they watch the video again and notice the big gorilla in the middle of the frame, making a mockery of their perception.
It is all around us
Engineers don't notice the opportunities in gorilla suits because they are focused on their day-to-day duties of counting basketball passes. They are too distracted to notice. However, perhaps you are one of those who think the opportunities are camouflaged. Allow me to remove your blinders and list the opportunities that pass you by every day:
§ Killer features—sure, you know these exist. You might even know what some of them are. But how do you get the opportunity to work on them? I'll bet they've got designs and code that need reviewing, usability, unit, and automated tests that need writing or reviewing, and bugs that need fixing. I'll bet the developer working on them is out from time to time and needs a backup. But, of course, you're too busy.
§ Customer advocacy and business intelligence—you think you know the customer and business, but you don't. That's the opportunity. The more direct engagement you have with customers (product support, usability, feedback data), and the better you understand the business (VP talks, business plan, business model and metrics), the better you'll know what the killer features are and what the critical features are. But that's someone else's job, right?
§ Critical features—you think these dull features like setup, patching, privacy, compatibility, accessibility, and manageability are for losers. Yeah, they can be when a loser implements them. If you are knowledgeable about the customer and the business, you'll know how to go the extra mile to turn the mundane into the marvelous. But why bother when there are cooler things to implement?
§ Task forces, committees, virtual team projects—you could safely argue that these activities boil up straight from Hades. However, they only arise when there is a problem that someone above your pay grade wants solved. Because everyone hates these dysfunctional efforts, you've got an opportunity to actually lead the effort toward a real solution. Or you could let someone else do it.
§ Process and tool improvements—you probably love these, but no one will listen to your ideas. Stop making improvements your idea. Stop making it about you. Start looking at other people's processes and tools. Start thinking about how you can embrace and extend them. Start thinking about being better together as a team and a company (see "Controlling your boss for fun and profit"). Or you could just do your own thing.
§ Problems in general—you can hardly make it through five minutes without spotting one problem or another. Every problem is an opportunity. That's not just a cute phrase, it's true. Yeah, you'll never fix them all, but surely there are some worth pursuing. Or you could live with the status quo.
Poor pitiful me
With all these opportunities being there day after day, I'm stunned when engineers complain about the lack of chances to grow and prove their merit. Yes, you are too busy. Yes, you've got enough for your own job. Yes, there are cool ideas to work on that aren't what customers or the business thinks they need. Yes, someone else can run the meeting, write the white paper, or drive the change. Yes, doing your own thing is straightforward because no one else gets in the way. Yes, accepting the status quo avoids hassle. And yes, ignoring opportunities and being mediocre is always the easier option.
"But I don't have time to take advantage of these opportunities," whine the witless. Are you kidding me??!?!? As I described in my "Time enough" column (chapter 8), you never have enough time to do anything. Each day, each minute, you choose tasks from the unlimited list you have. The key is to prioritize, cutting out the interruptions and time-sucking activities that aren't "must do" in order to focus on the activities that make a difference for our business, our customers, and in turn, your career.
To the cowering clueless who say, "But there's nothing to cut; everything I have is 'must do'!" I say, "So you always accomplish everything you need to do? Really? You must be full of something to say that." Look, if you aren't accomplishing everything, you must be making choices. If you are making choices, you simply need to adjust those choices. That's what life is. Successful people make adjustments to focus a portion of their time on new opportunities.
This concept of creating time by choosing your obligations goes by a familiar phrase, "under commit and over deliver." Yet most neophytes over commit trying to impress management and their peers. At the end of the day, these neophytes under deliver, lose out on opportunities, and get ranked below those who met their obligations and went beyond.
I took the one less traveled by
I'm not saying it's easy. I'm saying it's necessary. No one will serve you success on a platter. Not your parents. Not your boss. Nobody. You've got to decide to take advantage of the myriad of opportunities that come your way every day.
You don't need to tackle all of them, just a few over the course of a year. If you are put on a panel you care about, actively participate. Become one of those people who actually contributes. I don't care if you are busy. Make time. That doesn't mean working longer or harder. It means dropping less important activities that mean more in the moment, but mean less in the long run.
Stop and consider the people you admire. It's not their number of accomplishments; it's the thoughtfulness and impact of their accomplishments. Give consideration to your work choices, be aware of opportunities, create space to take advantage of them, and you can become the person others admire.
-
Eric Aside
I'm taking June off to prepare for the annual event my organization runs internally for Microsoft engineers. (Not a Microsoft engineer? We can fix that.) This year the event is five days focused on various themes for improving engineers and engineering at the company. We've got one day focused on product quality, another day on software plus services, two days on design, and a day on security and privacy. Mixed in are sub-themes on innovation, environmental sustainability, project management, build and lab management, and talent and teams. It should be a great week!
We're taking some risks this year by making the event activity-based rather than lecture-based. Hands-on activities are more engaging and memorable, resulting in gaining real skills rather than "awareness." Mr. Wright could write a whole column on the abysmal absurdity of "awareness."
I. M. Wright will return next month with a column on whatever inane ignorant idiocy he encounters this month.
-
I heard a remark the other day that seemed stupid on the surface, but when I really thought about it I realized it was completely idiotic and irresponsible. The remark was that it's better to crash and let Watson report the error than it is to catch the exception and try to correct it.
Eric Aside
A lot of people have been flipping out (see comments below) about the statement that you should catch the exception. The more thoughtful readers point out security concerns with handling exceptions and the dangers of continuing an application with corrupted state. I couldn't agree more. If the failure or exception leaves the program compromised, you can't simply continue. My point is that just failing and giving up is wrong for users. One solution I talk about below is to fail, report, and reboot (restart) the application, like Office now does.
Watson is the internal name for the functionality behind the Send Error Report dialog box you see when an application running on Microsoft Windows crashes. (Always send it; we truly pay attention.)
From a technical perspective, there is some sense to the strategy of allowing the crash to complete and get reported. It's like the logic behind asserts—the moment you realize you are in a bad state, capture that state and abort. That way, when you are debugging later you'll be as close as possible to the cause of the problem. If you don't abort immediately, it's often impossible to reconstruct the state and identify what went wrong. That's why asserts are good, right? So, crashing is sensible, right?
Eric Aside
An assert is a programming construct that checks if a relationship the programmer believes should be true is actually true. If it isn't true, assert implementations typically abort the program when debugging, and log an error when running in production. Asserts are commonly used to check that parameters to a function are properly formed and to check that object states are consistent.
Oh please. Asserts and crashing are so 1990s. If you’re still thinking that way, you need to shut off your Walkman and join the twenty-first century, unless you write software just for yourself and your old school buddies. These days, software isn't expected to run only until its programmer got tired. It's expected to run and keep running. Period.
Struggle against reality
Hold on, an old school developer, I'll call him "Axl Rose," wants to inject "reality" into the discussion. "Look," says Axl, "you can't just wish bad machine states away, and you can't fix every bug no matter how late you party." You're right, Axl. While we need to design, test, and code our products and services as error-free as possible, there will always be bugs. What we in the new century have realized is that for many issues it's not the bugs that are the problem—it's how we respond to those bugs that matters.
Axl Rose responds to bugs by capturing data about them in hopes of identifying the cause. Enlightened engineers respond to bugs by expecting them, logging them, and making their software resilient to failure. Sure, we still want to fix the bugs we log because failures are costly to performance and impact the customer experience. However, cars, TVs, and networking fail all the time. They are just designed to be resilient to those failures so that crashes are rare.
Perhaps be less assertive
"But asserts are still good, right? Everyone says so," says Axl. No. Asserts as they are implemented today are evil. They are evil. I mean it, evil. They cause programs to be fragile instead of resilient. They perpetuate the mindset that you respond to failure by giving up instead of rolling back and starting over.
We need to change how asserts act. Instead of aborting, asserts should log problems and then trigger a recovery. I repeat—keep the asserts, but change how they act. You still want asserts to detect failures early. What's even more important is how you respond to those failures, including the ones that slip through.
Eric Aside
Just once more for emphasis—using asserts to detect problems early is good. Using asserts to avoid having to code against failures is bad.
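Here is one way an assert that logs and triggers recovery might look. Treat it as a sketch under assumptions, not a standard facility: the names check, RecoverableFailure, and run_action are invented for this example, and recovery here is simply "reset the state and run the whole action once more."

```python
import logging

logger = logging.getLogger("resilient_assert")


class RecoverableFailure(Exception):
    """Raised when a checked condition fails, so the caller can recover."""


def check(condition, message):
    """Detect the failure early, log it, and signal recovery instead of aborting."""
    if not condition:
        logger.error("check failed: %s", message)
        raise RecoverableFailure(message)


def run_action(action, recover):
    """Run an action; if a check fails, recover and run the full action once more."""
    try:
        return action()
    except RecoverableFailure:
        recover()
        return action()  # a second failure propagates to the caller


# Hypothetical example: a corrupted in-memory state that gets reset.
state = {"items": None}  # deliberately broken for the demonstration


def total():
    check(isinstance(state["items"], list), "items must be a list")
    return sum(state["items"])


def reset_state():
    state["items"] = []


result = run_action(total, reset_state)
print(result)
```

The check still fires at the moment the bad state is detected, so the log entry is as close to the cause as a classic assert would be. The difference is what happens next: the action is rolled back and restarted rather than abandoned.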
If at first you don't succeed
So, how do you respond appropriately to failure? Well, how do you? I mean, in real life, how do you respond to failure? Do you give up and walk away? I doubt you made it through the Microsoft interview process if that was your attitude.
When you experience failure, you start over and try again. Ideally, you take notes about what went wrong and analyze them to improve, but usually that comes later. In the moment, you simply dust yourself off and give it another go.
For Web services, the approach is called the five Rs—retry, restart, reboot, reimage, and replace. Let’s break them down:
§ Retry. First off, you try the failed action again. Often something just goofed the first time and it will work the second time.
§ Restart. If retrying doesn't work, often restarting does. For services, this often means rolling back and restarting a transaction; or unloading a DLL, reloading it, and performing the action again the way Internet Information Server (IIS) does.
§ Reboot. If restarting doesn't work, do what a user would do, and reboot the machine.
§ Reimage. If rebooting doesn't work, do what support would do, and reimage the application or entire box.
§ Replace. If reimaging doesn't do the trick, it's time to get a new device.
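The escalation in the list above can be sketched as a ladder: try the action, and each time it fails, apply the next recovery step and try again. Everything named here is hypothetical (run_with_five_rs, the step functions, the flaky action), and the steps are stand-ins that only record that they ran. A real restart, reboot, or reimage is specific to your service.

```python
import logging

logger = logging.getLogger("five_rs")


def run_with_five_rs(action, recovery_steps):
    """Try the action; on failure, apply each recovery step in order and try again."""
    try:
        return action()
    except Exception as exc:
        logger.warning("action failed: %r", exc)
    for name, step in recovery_steps:
        logger.warning("escalating to %s", name)
        try:
            step()
            return action()
        except Exception as exc:
            logger.warning("%s did not help: %r", name, exc)
    # Out of automated options; the last R is a human swapping hardware.
    raise RuntimeError("all recovery steps failed; replace the device")


# Hypothetical demonstration: an action that fails twice, then succeeds.
attempts = {"count": 0}
used = []


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


def make_step(name):
    # Stand-in recovery step that only records that it ran.
    return name, lambda: used.append(name)


steps = [make_step(n) for n in ("retry", "restart", "reboot", "reimage")]
outcome = run_with_five_rs(flaky, steps)
print(outcome, used)
```

Note that every failure is logged even when a later step succeeds, so the problem still gets reported while the user keeps working.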
Welcome to the jungle
Much of our software doesn't run as a service in a datacenter, and contrary to what Google might have you believe, customers don't want all software to depend on a service. For client software, the five Rs might seem irrelevant to you. Ah, to be so naïve and dismissive.
The five Rs apply just as well to client and application software on a PC and a phone. The key most engineers miss is defining the action, the scope of what gets retried or restarted.
On the Web it's easier to identify—the action is usually a transaction to a database or a GET or POST to a page. For client and application software, you need to think more about what action the user or subsystem is attempting.
Well-designed software will have custom error handling at the end of each action, just like I talked about in my column "A tragedy of error handling" (which appears in Chapter 6 of my book). Having custom error handling after actions makes applying the five Rs much simpler.
Unfortunately, lots of throwback engineers, like Axl Rose, use a Routine for Error Central Handling (RECH) instead, as I described in the same column. If your code looks like Axl's, you've got some work to do to separate out the actions, but it's worth it if a few actions harbor most crashes and you aren't able to fix the root cause.
Just like starting over
Let's check out some examples of applying the five Rs to client and application software:
§ Retry. PCs and devices are a bit more predictable than Web services, so failed operations will likely fail again. However, retrying works for issues that fail sporadically, like network connectivity or data contention. So, when saving a file, rather than blocking for what seems like an eternity and then failing, try blocking for a short timeout and then try again—a better result for the same time or less. Doing so asynchronously unblocks the user entirely and is even better, but it might be tricky.
Eric Aside
Care should be taken when retrying an action. Some APIs and components already have retries built into them. Be sure to understand the behavior of components you use in advance or suffer from compounding repetition caused by leaky abstraction.
§ Restart. What can you restart at the client level? How about device drivers, database connections, OLE objects, DLL loads, network connections, worker threads, dialogs, services, and resource handles. Of course, blindly restarting the components you depend upon is silly. You have to consider the kind of failure, and you need to restart the full action to ensure that you don't confuse state. Yes, it's not trivial. What kills me is that as a sophisticated user, restarting components is exactly what I do to fix half the problems I encounter. Why can't the code do the same? Why is the code so inept? Wait for it, the answer will come to you.
§ Reboot. If restarting components doesn't work or isn't possible due to a serious failure, you need to restart the client or application itself—a reboot. Most of the Office applications do this automatically now. They even recover most of their state as a bonus. There are some phone and game applications that purposefully freeze the screen and reboot the application or device in order to recover (works only for fast reboots).
§ Reimage. If rebooting the application doesn't work, what does product support tell you to do? Reinstall the software. Yes, this is an extreme measure, but these days installs and repairs are entirely programmable for most applications, often at a component level. You'll likely need to involve the user and might even have to check online for a fix. But if you're expecting the user to do it, then you should do it.
§ Replace. This is where we lose. If our software fails to correct the problem, the customer has few choices left. These days, with competitors aching to steal away our business, let's hope we've tried all the other options first.
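To make the Retry item in the list above concrete, here is a minimal sketch of saving with a short timeout and a bounded number of attempts. The save callable and its timeout parameter are assumptions made for the example, not a real API. If the component you call already retries internally, set attempts to 1 rather than compounding the repetition.

```python
import time


def save_with_retry(save, attempts=3, timeout_s=0.5, pause_s=0.0):
    """Call save(timeout_s) up to `attempts` times, returning the first success."""
    last_error = None
    for _ in range(attempts):
        try:
            return save(timeout_s)
        except OSError as exc:  # TimeoutError and ConnectionError are OSError subclasses
            last_error = exc
            time.sleep(pause_s)
    raise last_error


# Hypothetical stand-in for a save that is blocked the first time it is tried.
calls = []


def fake_save(timeout_s):
    calls.append(timeout_s)
    if len(calls) < 2:
        raise TimeoutError("file is locked")
    return "saved"


result = save_with_retry(fake_save, attempts=3, timeout_s=0.5)
print(result, calls)
```

Only errors that plausibly clear up on their own are retried. Anything else escapes immediately, which is where the later Rs take over.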
Let's not be hasty
Mr. Rose has another question, "Wait, we can't just unilaterally take these actions. Customers must be alerted and give permission, right?" Well Axl, that depends.
Certainly, there are cases where the customer must provide increased privileges to restart certain subsystems or repair installs. There are also cases when an action could be time consuming or have undesirable side effects. However, most actions are clear, quick, and solve the problem without requiring user intervention. Regardless, the key word here is "action."
There's no point in alerting the user about anything unless it's actionable. That goes for all messages. What's the point of telling me an operation failed if there's no action I can take to fix it or prevent it from happening again? Why not just tell me to put an axe through the screen? If there is a constructive action I can take, why doesn't the code just take it? And we have the audacity at times to think the customer is dumb? Unbelievable.
It's always the same
"Fine, this is extra work though," complains Axl, "and who says the software won't just be retrying, restarting, rebooting, and reimaging all the time? After all, if the bug happened once, it will happen again." Actually Axl, bugs come in two flavors—repeatable and random. Some people call these Bohrbugs and Heisenbugs, respectively.
Eric Aside
The terms Bohrbug and Heisenbug date back before the 1990s. Jim Gray talked about them in a 1985 paper, "Why Do Computers Stop and What Can Be Done About It?"
Using the five Rs will resolve random bugs, rendering them almost harmless. However, repeatable bugs will repeat, which is why logging these issues is so important. Even if the program or service doesn't crash, we still want the failure reported so we can recognize and repair the repeatable bugs, and perhaps even pin down the random bugs. The good news is that the nastiest bugs in this model, the repeatable ones, are by far the easiest to fix.
By putting in some extra work, we can make our software resilient to failure even if it isn't bug-free. It will just appear to be bug-free and continue operating that way indefinitely. All it takes is a change in how we think about errors—expecting, logging, and handling them instead of catching them. Isn't that worth the kudos (and free time) you'll get from family and friends when our software just works? Welcome to the new world.
Eric Aside
I don't expect this new approach to happen tomorrow. It's a big change, particularly in the client and application areas. It used to be that only geeks had computers, so users knew how to restart and repair drivers. Now, everything just has to work with little or no user intervention. Part of the solution is higher engineering quality, but that only goes so far. There will always be failures even if the code is bug free. Resilience to failure is the clear next step.