Just outside Des Moines, Iowa, Facebook has developed a whole new way of moving data around its data centers.
The social networking company wants to keep up with an ever-growing number of users. To that end, it has redesigned the network for its newest operational data center, in Altoona, Iowa, according to a Facebook blog post today. The new fabric architecture, which increases capacity by a factor of 10, is aimed at handling traffic between servers.
[aditude-amp id="flyingcarpet" targeting='{"env":"staging","page_type":"article","post_id":1605884,"post_type":"story","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,cloud,enterprise,social,","session":"B"}']“We are constantly optimizing internal application efficiency, but nonetheless the rate of our machine-to-machine traffic growth remains exponential, and the volume has been doubling at an interval of less than a year,” Facebook network engineer Alexey Andreyev wrote in the blog post.
Having racked up more than 1.3 billion monthly active users, Facebook operates at enormous scale, and a few years ago the company opted to optimize its operations by running its own data centers. Along the way, it has gradually tuned up many aspects of those facilities. Now it has made sweeping changes to the way bits flow through the network inside its data centers.
Improvements have come to the way Facebook computes, stores, and moves data, and the company has built its own hardware for each of those tasks. Today’s news makes clear that the do-it-yourself mindset continues.
Some of the infrastructure innovations at Facebook scale have trickled down to other companies in the past few years, as startups in data analytics, networking, and storage have created similar technology. Don’t be surprised to see networking companies start to emulate the new fabric.
Facebook originally set out to employ a cluster architecture for its networks, with each cluster spanning hundreds of cabinets full of servers, Andreyev wrote. But Facebook ran into problems with that configuration, namely that it required hefty, expensive, proprietary networking hardware to aggregate all the connections. And on top of that, Andreyev explained, “the need for so many ports in a box is orthogonal to the desire to provide the highest bandwidth infrastructure possible.” Plus, depending on just a few of these powerful boxes could be risky.
What’s more, the cluster architecture doesn’t always make sense if Facebook wants to distribute applications beyond a single cluster. And Facebook is rapidly updating its applications while also taking on more users, making the cluster model imperfect. Hence the redesign.
“For our next-generation data center network design we challenged ourselves to make the entire data center building one high-performance network, instead of a hierarchically oversubscribed system of clusters,” Andreyev wrote.
Thinking about the network at the scale of an entire building, rather than as a collection of individual parts, is classic Google, and Facebook is once again taking cues from the search company as it continues to grow and provide web services for the world.
[aditude-amp id="medium1" targeting='{"env":"staging","page_type":"article","post_id":1605884,"post_type":"story","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,cloud,enterprise,social,","session":"B"}']
Here’s how Facebook went about putting the model into practice in Altoona.
Facebook disaggregated its network into pods, each of which consists of four “fabric switches” that are together responsible for as many as 48 top-of-rack switches.
“The smaller port density of the fabric switches makes their internal architecture very simple, modular, and robust, and there are several easy-to-find options available from multiple sources,” Andreyev wrote.
To wire together the pods across the building, Facebook builds one level up and connects the fabric switches to heavy-duty spine switches. There can be as many as 48 spine switches in a “spine plane,” and each data center can contain four spine planes.
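For those who like to see the structure laid out, here is a minimal Python sketch of the topology as described: four fabric switches and up to 48 top-of-rack switches per pod, four spine planes per data center, and up to 48 spine switches per plane. The class and method names are purely illustrative (this is not Facebook’s tooling), and the assumption that each of a pod’s four fabric switches fans out into one of the four spine planes is ours, not stated explicitly in the post.

```python
# Illustrative model of the fabric described above -- not Facebook's actual tooling.
# Counts come from the blog post: 4 fabric switches and up to 48 top-of-rack (ToR)
# switches per pod, 4 spine planes per data center, up to 48 spine switches per plane.
from dataclasses import dataclass, field

FABRIC_SWITCHES_PER_POD = 4   # assumed: one uplink path into each spine plane
MAX_TORS_PER_POD = 48
SPINE_PLANES = 4
MAX_SPINES_PER_PLANE = 48

@dataclass
class ServerPod:
    name: str
    tor_switches: list = field(default_factory=list)

    def add_rack(self, tor_name: str) -> None:
        # A pod's four fabric switches jointly serve at most 48 ToR switches.
        if len(self.tor_switches) >= MAX_TORS_PER_POD:
            raise ValueError(f"{self.name} is full ({MAX_TORS_PER_POD} ToR switches)")
        self.tor_switches.append(tor_name)

@dataclass
class Fabric:
    server_pods: list = field(default_factory=list)
    edge_pods: list = field(default_factory=list)
    spines_per_plane: int = 1  # can grow toward MAX_SPINES_PER_PLANE

    def uplinks_per_pod(self) -> int:
        # Assumed: each of a pod's fabric switches fans out to every spine in its plane.
        return FABRIC_SWITCHES_PER_POD * self.spines_per_plane

    def summary(self) -> str:
        racks = sum(len(p.tor_switches) for p in self.server_pods)
        spines = SPINE_PLANES * self.spines_per_plane
        return f"{len(self.server_pods)} pods, {racks} racks, {spines} spine switches"
```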
[aditude-amp id="medium2" targeting='{"env":"staging","page_type":"article","post_id":1605884,"post_type":"story","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,cloud,enterprise,social,","session":"B"}']
As for connecting with the outside world, Facebook constructed “edge pods” that feature “edge switches.”
Altogether, the new design sounds like a major win for the company.
“This highly modular design allows us to quickly scale capacity in any dimension, within a simple and uniform framework. When we need more compute capacity, we add server pods,” Andreyev wrote. “When we need more intra-fabric network capacity, we add spine switches on all planes. When we need more extra-fabric connectivity, we add edge pods or scale uplinks on the existing edge switches.”
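In terms of the illustrative sketch above, each of those moves touches only one kind of element. The helper functions below are hypothetical and simply mirror the three scaling dimensions Andreyev describes; they build on the classes defined earlier.

```python
# Hypothetical scaling operations mirroring the three dimensions Andreyev describes.
def add_compute(fabric: Fabric, pod_name: str) -> None:
    """More compute capacity: add a server pod."""
    fabric.server_pods.append(ServerPod(name=pod_name))

def add_intra_fabric_capacity(fabric: Fabric, count: int = 1) -> None:
    """More machine-to-machine bandwidth: add spine switches on all planes at once."""
    if fabric.spines_per_plane + count > MAX_SPINES_PER_PLANE:
        raise ValueError("spine planes are full")
    fabric.spines_per_plane += count

def add_extra_fabric_connectivity(fabric: Fabric, edge_pod_name: str) -> None:
    """More connectivity to the outside world: add an edge pod."""
    fabric.edge_pods.append(edge_pod_name)

fabric = Fabric()
add_compute(fabric, "pod-1")
fabric.server_pods[0].add_rack("tor-1")
add_intra_fabric_capacity(fabric, 2)
add_extra_fabric_connectivity(fabric, "edge-1")
print(fabric.summary())  # "1 pods, 1 racks, 12 spine switches"
```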
In step with the new fabric, Facebook has carved out sections in the middle of its new Altoona data center to house the spine and edge gear. Even so, Facebook saved time despite working with a new site plan.
[aditude-amp id="medium3" targeting='{"env":"staging","page_type":"article","post_id":1605884,"post_type":"story","post_chan":"none","tags":null,"ai":false,"category":"none","all_categories":"business,cloud,enterprise,social,","session":"B"}']
“In the end, the amount of time for site network turn-up in Altoona — from concrete floor to bits flowing through switches — was greatly reduced,” Andreyev wrote.
Read the full blog post for more detail on the new fabric.