The internal response

During the spring of 2006, the engineers were working on a transition to a new motherboard, code-named Zephyr. But they postponed that transition to work on fixing the “bone pile” issues and “maximize the yield,” according to an email circulated by engineering manager Harjit Singh to the hardware team on March 10, 2006. The yield at that point was an abysmal 50 percent on the first pass. When the bad machines were reworked within the factory, the yield went up to 75 percent, which was hardly acceptable. Singh had said the process in the factories was “not repeatable” at a time when they were scheduled to triple production in the coming months. That meant that something could go slightly wrong and Microsoft would have no idea how to fix the problem.
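To put the two yield figures side by side, here is a rough back-of-the-envelope sketch. It assumes, which Singh's email as quoted does not state, that rework was attempted on every machine that failed the first pass. The function name and the batch size of 1,000 are purely illustrative.

```python
def implied_rework_recovery(first_pass_yield, yield_after_rework):
    """Fraction of first-pass failures that rework must have recovered.

    Assumption: every first-pass failure goes through rework once.
    """
    failed_first_pass = 1.0 - first_pass_yield
    recovered = yield_after_rework - first_pass_yield
    return recovered / failed_first_pass


# Figures quoted in the article: 50 percent first pass, 75 percent after rework.
first_pass = 0.50
after_rework = 0.75

recovery = implied_rework_recovery(first_pass, after_rework)

# Illustrative batch of 1,000 consoles (hypothetical batch size).
batch = 1000
good_first_pass = int(batch * first_pass)
good_after_rework = int(batch * after_rework)
left_in_bone_pile = batch - good_after_rework

print(f"Rework recovered {recovery:.0%} of first-pass failures")
print(f"Per {batch} built: {good_first_pass} pass at once, "
      f"{good_after_rework - good_first_pass} saved by rework, "
      f"{left_in_bone_pile} left over")
```

Under that assumption, rework rescued only half of the failed machines, and one console in four still ended up in the bone pile.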

Microsoft finally started throwing more engineers at the problem. It took engineers off projects such as reducing the costs of its wireless controllers because, Singh wrote, “if we don’t have the consoles, we don’t need the peripherals.” Marc Whitten was temporarily assigned to work on evaluations for adding new heat sinks to address thermal issues on the big chips. It was essentially “all hands on deck” for engineers, who were expected to devote 75 percent of their time to the yield/bone pile issues. But, again, those engineers didn’t achieve magical fixes. And it was late in the game.

The company pushed to stir up sales. Because Sony appeared to be falling off schedule a second time, the Microsoft executive team challenged the engineers to improve shipment targets by 25 percent through June, 2006. The team decided to postpone the launch of Zephyr (a board with an HDMI connector for better video quality) and concentrate on shipping Xenon boards. The target was to hit 80 percent first-pass yield, but that wasn’t realistic.

During the fall, Microsoft offered bundles, or packages of games and consoles together, which effectively amounted to a price cut. Some analysts accused Microsoft of “stuffing the channel,” or shipping more consoles to retailers than they wanted or needed for the holidays. Microsoft denied those claims.

“We’re a very responsible company,” Moore said to me in the May, 2007 interview. “How do you determine who stuffed the channel? What is the optimal number of units? We’re not in a position to force the product upon the retailers. It wouldn’t be the console wars without conspiracy theories.”

But problems with manufacturing continued. The yield on the consoles was only 85 percent at the beginning of 2008, not at the 90-percent-plus level that was expected at this point in the console cycle. That meant there was an extra billion dollars in cost that Microsoft hadn’t anticipated. But, again, Moore at the time said that the manufacturing problems Microsoft had were not enough to ruin the business model.

Dead Rising

Robert Delaware, a former game and hardware tester at one of Microsoft’s contractors, Redmond, Wash.-based VMC, saw the problems popping up in the spring of 2006 as he was testing consoles and new games. Like many testers, he was in the dark about the overall problems and only saw a small slice of the efforts to debug and fix consoles. He found a bug that could cause a reproducible crash on every game he tested.

“If you coordinated the music player with the dashboard, you could get almost every 360 to lock up,” he said in an interview. “I did it first on a combo DVD/audio disk. With NBA 2K6, you would select the music. The screen went black.”

The NBA 2K6 flaw resulted in the Red Rings of Death. Delaware, who as of this writing works at Microsoft as a game tester, agreed to go on the record for this story because he said he believes passionately in his work, which involved painstakingly playing games over and over again in order to uncover bugs. But he also believed passionately in the Xbox 360 gamer community. He is one of those thousands of employees who was good at his job, loved the games, and sought to push Microsoft to excellence. But he said he found it difficult to stay silent on problems that caused a number of his friends a great deal of consternation. He said he realized that speaking out could cost him his job and he plans on asking Microsoft to forgive him for it. He was courageous in coming forward, and he was not the source of emails cited in this story.

When the Capcom game “Dead Rising” (pictured left) shipped in the summer of 2006, another flaw emerged. Console owners reported putting the game disks into their machines and then seeing a breakdown. One of the reasons was the process by which Microsoft updated the Xbox Live online service. Many of the system owners were not connected online so they didn’t get live fixes downloaded to their machines. Thus, Microsoft had to ship Xbox console fixes with some of the consoles. In the case of Dead Rising, the game came with a bug fix that actually crippled some consoles. Microsoft often did a couple of firmware updates a year, and sometimes they led to broken machines.

Delaware was convinced that at least some of the problems reported about Dead Rising were related to the “2858 dashboard update” for Xbox Live, which he believed was embedded in the game. Another explanation was that games such as Dead Rising demanded more from the consoles, which gave out under stress. Delaware believed that the practice of embedding console and dashboard updates in game disks was responsible for some hardware failures.

In October, 2006, Delaware was concerned about how much information Microsoft was sharing with its testing partner and offered a warning about the Xbox Live updates. “With the upcoming Wii and PS 3 launches, MS cannot afford the bad PR that might result from Update related issues,” he said. “Asking these questions from the start might help to prevent possible criticism of our testing process, in the event of update-related problems. The last thing VMC needs is to take the blame for problems that Microsoft has (known) about from the start.”

Shooting too high, moving too fast, stretched too thin?

So what exactly was wrong with the machines? As time would reveal, there was no single reason for the failures, though many of the problems could be blamed on the ATI graphics chip, which could overheat so much it warped the motherboard. This put stress on bad solder joints, causing them to fail early in the machine’s life. Sometimes the heat sinks on top of the GPU were put on the wrong way, resulting in heat problems. Finally, games would sometimes crash because of sub-par memory. Infineon had been brought aboard as the second supplier behind Samsung for the GDDR3 memory used in the Xbox 360. This new kind of memory chip was specified for 700 megahertz, but the Infineon parts were falling short of that target. Microsoft had to set up a line for sorting through the good parts and the bad parts, contributing to a shortage of consoles.

Problems with the DVD drive also lasted longer than expected. And the console was also one of the first products that had to meet new environmental standards in Europe, which prohibited the use of lead in solder (which, when melted, fuses electronic components together). Paul Wang, a Microsoft test engineer on the Xbox 360, said in a speech in 2007 before a Silicon Valley engineering group that the lead-free solder created a lot of problems.

Perhaps Microsoft had too many balls in the air. As it launched the Xbox 360, it was already planning other big related projects. Cobalt was a next-generation HD-DVD movie player (pictured above) that Microsoft would launch in the holidays of 2006 as a $199 add-on. A project code-named Zephyr (later renamed the Xbox 360 Elite) would add a more expensive hard disk drive with 120 gigabytes of storage and a new HDMI connector. And the company’s engineers were hard at work with suppliers on a project code-named Falcon, which was an effort to create a motherboard, or main circuit board inside the box, to reduce the costs of the console itself during 2007.

“That was really annoying, that they had us working on so many things,” an engineering source said.

As the engineers wound down one task, they had to move to the next. There was no team set aside to deal with the low yields, in part because the poor yields hadn’t been anticipated.

Fessing up

In September, 2006, the company made a rare admission. It said in a statement that the quality of the consoles it made during 2005 wasn’t as high as it expected and therefore it would extend the policy of free replacement for consoles made during 2005, even though the warranties had expired.

While some industry observers expected Microsoft to cut the costs on the console and offer consumers a price cut during the second holiday season, that was never in the plans. Instead, the company was ramping up to offer a new accessory, an HD-DVD drive that could play next-generation DVD movies for the low price of $199. That was meant to parry Sony’s PlayStation 3, which had a Blu-ray next-generation DVD drive built into the system.

Fortunately, Sony was stumbling. Besides being a year late, Sony faced costs on the Blu-ray drive that were so high it had to price its consoles at $499 and $599, far above Microsoft’s prices of $299 and $399. Market analysts estimated the cost of Sony’s box was around $800 or more. Microsoft didn’t need to offer a price cut after all, particularly since Sony was struggling to build enough consoles for its fall launch.

It was looking like, even with the manufacturing problems, Microsoft was going to do just fine. In mid-November 2006, about 645,079 units, or 8.8 percent of the total units shipped, had been returned for repair. Sony had been more conscientious about delaying its launch when the product wasn’t ready. Jack Tretton, the president of Sony’s U.S. games division, said he was proud that Sony waited. But it was going to pay for that cautiousness.

The three Microsoft contract manufacturers had now ramped up to full production. They produced more than 12 million consoles. In early 2007, Peter Moore announced that Microsoft had hit its revised target and had shipped more than 10.4 million consoles into the market before the end of 2006. It only sold through 9 million consoles. There were a lot of leftover consoles sitting on shelves.

At about the same time, in December, 2006, Microsoft took another step to appease consumers who had failed machines. It said that it was extending the warranty of free repair for all consoles made for customers in the U.S. and Canada. Now the machines would carry a one-year warranty, not just 90 days.

A drastic measure

Microsoft decided to shut down manufacturing of the Xbox 360 in January, 2007. Between January and June, it didn’t build any new machines. Part of the reason was that it had made too many machines earlier, but the other part was to track down the source of its quality problems. It did that even though it was launching a new version of the console, dubbed the Xbox 360 Elite (pictured left), with a bigger hard drive and a new HDMI connector for $480. It had a new board, dubbed Zephyr, inside. That new black version of the console was made in limited quantities for hardcore fans who might otherwise be attracted to buying the most expensive PS 3.

Since it had an oversupply, Microsoft had a cushion. It could afford to suspend manufacturing as it sold off its inventory of consoles. The company stopped promoting the console heavily and shaved a million units off the goal for the fiscal year ended June 30, 2007. The problem was that Nintendo’s Wii was starting to catch fire. Nintendo had come up with a cheap and innovative console, which was proving extremely popular at $250, about $50 to $150 cheaper than Microsoft’s Xbox 360 models. During the six-month period ending June 30, 2007, Microsoft would sell only 1.2 million consoles. It was a disastrous slowdown, compared to the Wii’s sell-out success.

In January, 2007, at the Consumer Electronics Show, Bill Gates announced that the Xbox 360 would be capable of serving as an IPTV set-top box, meaning that telephone companies could use it to deliver video programming and other services that could compete with cable TV providers. Gates also said that because the Xbox 360 had beaten the PS 3 to market and didn’t include a hard drive on every machine, Microsoft would be able to ride a silicon cost-reduction curve faster than Sony, which saddled every unit with Blu-ray and hard drive costs. At any given point, the Xbox 360’s costs would be lower and more easily reduced, giving Microsoft a fundamental advantage over Sony in pricing, Gates said.

The Cassinghams stir discontent

The topic of defects flared up again in February, 2007, when Rob and Mindy Cassingham (pictured left) of Moab, Utah, revealed that they had the frustrating experience of returning seven defective Xbox 360s. They were die-hard fans who had driven across the country to be present at the launch of the Xbox 360.

They used several in their retail gaming center, where kids came in to play against each other. That prompted critics to blame the Cassinghams for over-using the machines. But several of the machines that broke down were actually personal machines that the Cassinghams used in their own homes. Consumer outrage flared, and Moore again offered apologies to customers. One game magazine, 360 Gamer, conducted a poll that found many console owners reported more than one console failure.

Fixing bugs in the trenches

Microsoft quietly acknowledged that it was no longer using Wistron as a manufacturing partner. In April, 2007, Larry Yang, the general manager of Xbox division hardware, said in a group email that he was resigning that job and would take another position within Microsoft. He would still stay and help Microsoft get out its most important redesign, dubbed Falcon.

Falcon was the code name for the cost-reduced Xbox 360 that included a major redesign of its CPU. The transition to new chips, scheduled for once every couple of years, was a big deal. Microsoft’s partners had been making their chips with 90-nanometer technology since 2005. Now they would make the CPUs with 65-nanometer technology, meaning the widths between the circuits were smaller. With smaller circuits, the same chip design can fit in a smaller area. With less area, the chip can be manufactured with less material, resulting in lower costs. Quality also gets better since it’s easier to make smaller chips than big chips. Shifting to a new generation of chips was thus the best way to reduce the cost of a game console and improve its quality.
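As a rough illustration of why the shrink matters, consider an idealized case in which every feature of the chip scales in proportion to the process node. That is an assumption made here for the sake of the arithmetic; real chips rarely shrink that cleanly, and the function name below is illustrative only. Under that assumption, the area of the same design falls with the square of the ratio between the two nodes.

```python
def ideal_area_ratio(old_node_nm, new_node_nm):
    """Area of a design at the new node relative to the old node.

    Assumption: perfect linear scaling of every feature, so area
    scales with the square of the node ratio.
    """
    return (new_node_nm / old_node_nm) ** 2


# Nodes named in the article: 90 nanometers down to 65 nanometers.
ratio = ideal_area_ratio(90, 65)

print(f"Same design at 65 nm occupies about {ratio:.0%} of its 90 nm area")
```

In this best case the same design takes up a little more than half the silicon it did before, which is why a move to a new chip generation was such an attractive way to cut costs.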

An email sent out by S. Srini, director of Xbox manufacturing, on May 27, 2007, said that the overall yield of units in production was 85 percent. But the email said that the IBM microprocessors were still exhibiting “excessive failures” of 4 percent to 8 percent because of heat problems. There was still no “fresh start” manufacturing going on.

The hardware engineers worked mainly to address the overheating of the graphics chip from ATI. The chip was overheating and causing both the chip and the circuit board to warp, cracking the joints that were held together by the lead-free solder. Given enough time, many units would suffer this problem. To fix the problem, Microsoft put epoxy on both the IBM microprocessor and the ATI graphics chip. They also took out a 50-cent extruded aluminum heat sink (which sits on top of a chip and dissipates heat away from it) from the graphics chip and replaced it with a $5 heat sink that could do the job more efficiently. They also used a pipe to move heat in front of a fan.

Beyond the temporary solution of the heat sink change, Microsoft made more changes. Microsoft tried to get Falcon out as fast as it could. While the system box would remain the same size, there would be fewer components inside. That, in turn, would leave more room for air flow and reduce the use of fans in the system, which meant it wouldn’t be as noisy as before. When rivals such as Sony made this kind of switch before, they had big shortages because it was always difficult to make the transition.

Engineers were under pressure from the top executives to describe the nature of the problem. When the engineers finally had the situation under control, Microsoft announced its free replacement program on July 5, 2007.

“Hopefully consumers will recognize we’re trying to do the right thing,” Moore said in an interview that day. “It’s a courageous step because it is not an inconsequential step. We are not burying our heads in the sand.”

The announcement put a dent in the cottage industry for repairing Xbox 360s (such as the book pictured at left). Even so, in early 2008, companies such as game retailer GameStop were making a fortune repairing thousands of Xbox 360s a week and then returning them as refurbished units to store shelves at a discounted price. Most of GameStop’s repairs were successful and involved re-soldering the graphics chip to the main board; unsuccessful repairs were passed on to Microsoft.

Peer Schneider, vice president of content publishing at gamer web site IGN, said, “The extended warranty and the reimbursement policies are important steps to win back consumer trust. The only question from Xbox fans is: what took Microsoft so long?”