After decades as a professional software engineer, working for six different firms (large and small), I can honestly say that Microsoft is by far the best. I can also honestly say that Microsoft is far from perfect.

My monthly rants typically focus on problems that individual engineers or managers can change by being better individual engineers and managers, by using different approaches or tools, or by altering the way they think about issues. However, Microsoft also has system-wide issues. I know how to solve them, but Microsoft executives and engineers may not like my solutions.

Too bad. I’m throwing my ideas out there this month. Don’t like them? Come up with something better. No company is perfect, but we owe it to ourselves to never be satisfied with the status quo.

Eric Aside

All opinions expressed in this column (and every Hard Code column) are my own and do not represent Microsoft in any official or unofficial capacity.

There is something terribly wrong

What’s wrong with Microsoft? I’ve narrowed it down to five fundamental flaws.

  1. We’re top-heavy. We’ve up-leveled lead and group manager roles. Nearly 80 percent of development leads have been at Microsoft for six years or more, with roughly 50 percent here for 10 years or more. This up-leveling clogs career advancement, reduces the influx of new thinking, and drives the use of outdated practices. Up-leveling was supposed to deliver better and fewer managers, but opportunities for growth and innovation need to balance that goal.
  2. We’re overstaffed and overfunded. Having lots of engineers and money enables Microsoft to accomplish big, bold breakthroughs. It also enables Microsoft to isolate divisions, duplicate infrastructure and services, fail slowly, and generally throw money and people at problems instead of thinking and simplifying.
  3. Our reward system fails. The time we spend debating and marginally improving the review system is exceeded only by the time we spend in calibration, assessment, and the rest of the review process. Even so, we insult and drive away people we value, and we don’t promote people promptly when they are demonstrably ready.
  4. We disregard previous experience. Microsoft culture places little value on what you’ve done before—the value is in what you’ll do next. That’s really nice in theory, but in practice we ignore skills developed previously, don’t build upon gained knowledge, and repeat past mistakes. This destroys industry hires, slows innovation, stifles reuse, lowers productivity due to constant restarts, and demoralizes the engineering staff.
  5. We replicate infrastructure. Divisions, and often individual teams, reinvent their own build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics. We end up with wasted effort, poor systems, and difficulty working across teams that use different tools and methods.

A great Microsoft cultural edict is, “No whining. Accept it and move on, or come up with a better solution.” Let’s talk about why these five issues exist and what can be done about them.

Why? Why?

Being top-heavy is partially a result of being a successful, mature company. People move up, plateau eventually, and stay put. The company isn’t doubling its staff anymore, so the percentage of new blood declines. To fix this, we must grow much faster or make room somehow (org design or attrition).

Being overstaffed and overfunded is a direct result of success. We could trim, but that’s painful. We could change, but that’s hard and scary. The solution is to trim and change anyway.

Our reward system pays for individual performance based on a curve, and not everyone will be happy. The steady performers have trouble being promoted because they don’t stand out. Those who stumble get a disproportionate punishment. Even those who do a great job keeping up with their peers feel jilted by an average review. Only those who receive a well-deserved promotion (or expulsion) feel a sense of fairness. The solution is staring at us—it’s so obvious that it’s easy to dismiss (more below).

We disregard previous experience because, at first, it wasn’t important. All that mattered in Microsoft’s early days was “Can you code all day and night, loving every minute of it?” Now history matters. We’re in an established industry, with legacy codebases, huge projects, and billions on the line. We should still focus on the future, but we should also place people in roles where their past experience is most beneficial.

We replicate infrastructure when we don’t have a supported solution to turn to that operates well at the large scale of Microsoft projects. We do share Source Depot, Active Directory, Product Studio, SharePoint, Office, Exchange, and Windows Server. These are commercial products we build and use ourselves or internal products (not tools—products) that were designed and intended to be shared (though internal products don’t get as much love). Build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics will need to be well-supported and scalable products if we hope to share them.

Eric Aside

Source Depot is our internal source control system. Product Studio is our internal bug tracking system.

You got a better idea?

What should we do? Let’s start with people issues. Please keep in mind, I speak for no one but myself.

At the start of every major product cycle, like a new major revision of Office, high-level planning already focuses on direction, key scenarios, and tenets. Part of that planning should also specify the number and kind of engineering triads needed based on what the high-level plan specifically requires—not based on current staffing.

There aren’t that many types of triads in engineering: UI (heavy on designers, light on architects), kernel (heavy on architects, light on designers), and midlayer (average number of architects, some API design). These triads come in small, medium, and large sizes. Thus, staff planning comes down to selecting the number and size of each type of triad necessary to build the product planned.

To break down each triad type, we can use statistical values from our people data to determine the small, medium, and large makeup (how many in each discipline and level band). Microsoft is big enough to give us a valid sample size. These breakdowns should be reviewed as a sanity check and also to balance growth and new hires.
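As a rough sketch of that planning arithmetic, the snippet below takes the median headcount per discipline from sampled triads of each type and size, then multiplies by the number of triads the plan calls for. Everything in it is an assumption for illustration: the discipline names (`dev`, `test`, `pm`), the sample numbers, and the helper names `typical_makeup` and `staff_plan` are hypothetical, and level bands are left out to keep it short.

```python
from collections import Counter
from statistics import median

# Hypothetical people data: observed headcounts per discipline for existing
# triads, keyed by (triad type, size). All numbers are invented for illustration.
OBSERVED = {
    ("ui", "small"): [
        {"dev": 4, "test": 4, "pm": 3},
        {"dev": 6, "test": 5, "pm": 3},
        {"dev": 5, "test": 4, "pm": 2},
    ],
    ("kernel", "large"): [
        {"dev": 30, "test": 28, "pm": 6},
        {"dev": 34, "test": 30, "pm": 8},
    ],
}


def typical_makeup(samples):
    """Median headcount per discipline across the sampled triads."""
    disciplines = samples[0].keys()
    return {d: round(median(s[d] for s in samples)) for d in disciplines}


def staff_plan(plan, observed):
    """Total headcount per discipline for a plan of {(type, size): count}."""
    totals = Counter()
    for key, count in plan.items():
        for discipline, heads in typical_makeup(observed[key]).items():
            totals[discipline] += heads * count
    return dict(totals)


# Hypothetical plan: three small UI triads and one large kernel triad.
plan = {("ui", "small"): 3, ("kernel", "large"): 1}
print(staff_plan(plan, OBSERVED))
```

The point of the sketch is that the plan drives the headcount: the inputs are the triads the product needs, not the staff already on hand.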

Eric Aside

I did this triad breakdown exercise six years ago. It only took a day or two, and the data wasn’t that surprising or diverse (most triads of the same kind were roughly the same size). The biggest variances were in sustained engineering teams, due to basic differences in approach.

I’m focused on engineering product group triads in this column, but the same staff planning approach can be used in sales, finance, HR, consulting, operations, and anywhere else in a large, mature company like ours.

I’ve got a job to do

The next step is to place people into the key leadership roles, both individual contributors and managers. New triads need leaders who are good at building teams. People and architectural turnaround cases require leaders who’ve revamped teams before. There may be special technology requirements too.

To find the right leaders for these roles, midyear career discussions can include identifying skillsets (validated by managers) that are later used to place people. These key leaders are told why they were chosen to take their specific roles, making it clear that their special skillsets are valued and needed.

Once the key leaders have been placed, the teams can be built from existing and new staff. While many folks will continue to fit well in their current areas, no job is a given—everyone has a chance to change positions or look elsewhere. Internal candidates can’t fill entry-level-hire spots. Higher-level people can’t fill lower-level roles. However, people can get promoted into positions if they are ready. Those who can’t find a role that fits are given a few months to find a position elsewhere at the company or accept a severance package.

A variation of the solution I describe above has been used in large part by Office and Windows for years. I’ve added recording skillsets during midyear career discussions, emphasizing people’s past experience, preventing higher-level people from filling lower-level roles (avoids job inflation), and enforcing layoffs of extra people. My plan is harsh, but it ensures that we get balanced, smaller, and more efficient based on the people we need in order to build our software.

Eric Aside

You might be worried about losing high-potential employees during this process. Chances are good that they’d find positions during org shuffles, but we could use HR’s existing tracking of high-potential employees to catch any unfortunate exceptions. Those exceptions could be put on special projects until new positions open elsewhere at the company.

You also might be worried about staffing shifts to handle unforeseen issues or opportunities. Those plan changes happen all the time today. The purpose of high-level planning isn’t to create a perfect, immutable plan; it’s to think ahead about what you really want to achieve and how you’d like to achieve it, including how to best staff the project.

Just rewards

As for the review curve, we kill it. Period. Microsoft has slathered lipstick on our pig of a review model three different times (adding lip gloss tweaks annually). It doesn’t work—the review curve is still a pig.

Instead, we should replace the annual review with a second career discussion. Outrageous and unacceptable, right? Wrong. The three states of employee performance—doing well, moving up, and moving out—can all be addressed in a career discussion:

  • We already focus on moving out poor performers. Career discussions and one-on-ones with managers let people know in advance when they are falling behind. While people can be (and are) dismissed at any time, retaining only the employees we need to build each major revision of our software keeps everyone honest about who is meeting or exceeding expectations.
  • Today it takes too long to move people up, and we lose many as a result. Instead, we can focus semiannual career discussions and calibrations on promotions (and problems). We can design our triads to have an appropriate number of positions at each level, and we can reject candidates above the target level to prevent job inflation. We can make room for promotions at the desired rate and promote proven people promptly. That’s what they want—it’s the ultimate pay for performance.
  • Everyone else is performing well, working in roles we determined are essential for shipping our products and services. They all get paid the same generous salaries, taking into account their level and current market compensation. We can even give bonuses based on division results, encouraging collaboration and focusing on collective success.

In other words, we make the review system about growth, effectively filling essential roles, and promotions. A forced, abstract, insulting curve has no place in the system.

Eat it. EAT IT. Eat it.

Fixing the final fundamental flaw of Microsoft, replicating infrastructure, is just as obvious as the review system fix. We need our internal build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics to be supported and scalable products (preferably commercial products that get proper attention).

There are a few different ways to make our internal systems commercial products.

  • We could take our existing internal systems and productize them. This strategy isn’t likely to work because the systems weren’t designed for commercial use. They are effective, but often fragile, difficult to maintain, poorly documented, and tightly coupled to individual division business practices. Plus, every division has its own.
  • We could use existing commercial products (including open source) to replace our internal systems. This strategy isn’t likely to work either. Microsoft produces many of the largest and most complex software products in the world. Handling that scale and complexity is beyond the capacity of commercially available engineering systems that cover the full software lifecycle.
  • We could exclusively use our existing commercial products—Azure, Visual Studio, and Team Foundation Server—for our own infrastructure. At Microsoft, we call using our own products “eating our own dog food,” or “dogfood” for short. We dogfood all our products except build systems, deployment systems, test systems, localization procedures, monitoring, and engineering analytics. Clearly, it’s time we start dogfooding these as well.

Many Microsoft teams do use Azure, Visual Studio, and Team Foundation Server for their infrastructure, but not exclusively. The build, deployment, test, localization, monitoring, and engineering analytics capabilities in these products are still relatively young. It would be painful to switch to them before they fully mature, and most of our customers don’t operate at Microsoft’s scale.

But it was painful to switch to Exchange. It was taxing to switch to Windows Server, SharePoint, Outlook, and Active Directory. It’s always unpleasant—that’s why we call it dogfood. However, going through that pain has led to successful products that scale to meet the demands of any enterprise. Scaling Visual Studio and Team Foundation Server to build Windows, or scaling Azure to deploy, monitor, and operate Bing, won’t happen overnight. However, right now that’s not even our stated goal. It’s time Microsoft made the commitment to dogfood Azure, Visual Studio, and Team Foundation Server.

Eric Aside

Wonder what it would potentially take to re-architect Azure, Visual Studio, and Team Foundation Server so that they scale to the largest workloads? Look no further than the dozens of full-scale teams that design, develop, operate, and maintain each division’s solutions today. Don’t think it’s worth it? Look again.

Tough love

I love Microsoft, but we’re getting top-heavy, overstaffed, and inefficient. Our review system is broken, past experience is disregarded, and money and effort are wasted on incongruent infrastructure.

We can fix these flaws. It’s not that complicated, and it’s not that radical. We can switch from a seemingly arbitrary and insulting curve to a thoughtful people plan. We can switch from division-specific “not invented here” to our own proudly invented and used superstructure.

Don’t like my solutions? Write to me and suggest something better. Wish my ideas were implemented? Use your network within Microsoft to spread them around. Transformation starts with people like us.

It will take time. It will be painful. It will be difficult. But when has Microsoft shied away from a challenge? We have a chance to be the first technology company to remain a market leader from the birth of an industry through to its full maturity. Ford couldn’t do it. Philips and RCA couldn’t do it. IBM couldn’t do it. Apple is getting there. Microsoft can do it. Let’s beat the odds and make changes that will keep us on top for generations to come.