High Availability Design

Author Ermias Teffera
Dec 25, 2015

It is ugly when networks go down. People live online nowadays, and the lifeblood of an organization rests on highly available networks. If you are the network designer looking to implement a high availability design, there are components at every layer of Cisco’s three-tier hierarchical model that you need to know before carrying out any highly available design. In this article, I will describe these components to help you design a highly available network.

High Availability (HA) design means your network stays online even in the face of devastating failure. To make your network highly available, there are two principles to take into consideration. The first is redundant physical devices: you have to have at least two of them. The second is the logical protocols: depending on the requirements of your network environment, you have to use the right protocols to make those devices work redundantly. You can also throw in some other things, like redundant uplink carriers or redundant IT people. As the old saying goes, "two is one, one is none." This is fun to say, but realize that in the design world you are facing the uphill battle of cost. You can make essentially any network redundant given enough money.
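To put a rough number on "two is one, one is none", here is a minimal Python sketch of the usual availability arithmetic for redundant devices. It assumes the devices fail independently of each other, which real networks only approximate, and the 99% per-device availability is a made-up figure for illustration, not a vendor number.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(single: float, count: int) -> float:
    """Availability of `count` redundant devices, assuming independent failures.

    The group is down only when every device is down at the same time,
    so the combined availability is 1 - (1 - single) ** count.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    return 1.0 - (1.0 - single) ** count


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_device = 0.99  # hypothetical availability of one gateway
    for count in (1, 2, 3):
        combined = parallel_availability(per_device, count)
        hours = downtime_hours_per_year(combined)
        print(f"{count} device(s): {combined:.6f} available, "
              f"about {hours:.2f} hours down per year")
```

Under these assumptions, the second device is where most of the gain comes from, and each additional device buys less. That is the cost battle in a nutshell.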

There were many occasions when I would have sold my left leg (maybe even an arm) to bring the network back online because everybody was screaming and pulling their hair out. But there wasn’t much I could do, because the company didn’t have redundancy and its only gateway had caught fire. So I doubt I even need to argue how important it is to make a network redundant. Once you convince management to spend however much money redundancy takes, the question remains: what is the standard design guideline for HA? Well, Cisco has already made it clear that there are components each layer of the three-tier hierarchical model (Access, Distribution, and Core) has to use to be redundant. Let’s start with the Access layer.

The Access layer devices need redundant supervisors and redundant power supplies. The supervisor module is essentially the intelligence of the switch; the chassis itself has no intelligence, in other words it is dumb. So by adding a redundant supervisor module you are giving the switch two brains. In the three-tier hierarchical design, each access switch has uplink connections to the distribution layer switches (as seen in the image): one uplink goes to the primary distribution switch and a second uplink goes to the secondary distribution switch. How fast those links fail over really depends on how you set up your network. If you use Spanning Tree Protocol (STP), Cisco says your network should recover within 30 to 50 seconds; if you use Rapid Spanning Tree Protocol (RSTP), it should recover in about 2 seconds. If you strip out spanning tree altogether and use a Layer 3 (routed, no-switchport) design, which is a very common approach in most enterprise networks, your network will recover in about 200 milliseconds in an ideal design, which means nobody even notices any downtime. The other thing I should mention while on the Access layer is StackWise, which is the ability to take stackable switches and essentially build your own modular switch with the stacking cable. Some recent editions of StackWise also provide redundant power across the stack.
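If you wonder where the 30 to 50 second range for classic STP comes from, it falls out of the default IEEE 802.1D timers (max age 20 seconds, forward delay 15 seconds). The sketch below works through that arithmetic and then estimates what each recovery time means for a single traffic flow. The recovery figures for RSTP and Layer 3 are the ones quoted above, and the 50 packets per second flow is my own assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StpTimers:
    """Classic STP timers in seconds; defaults are the IEEE 802.1D values."""
    max_age: int = 20
    forward_delay: int = 15

    def direct_failure_recovery(self) -> int:
        # The blocked port walks through listening and learning,
        # each lasting one forward delay.
        return 2 * self.forward_delay

    def indirect_failure_recovery(self) -> int:
        # The switch first waits for the stored BPDU to age out,
        # then goes through listening and learning.
        return self.max_age + 2 * self.forward_delay


def packets_lost(recovery_seconds: float, packets_per_second: int) -> int:
    """Rough count of packets a steady flow loses while the network recovers."""
    return round(recovery_seconds * packets_per_second)


if __name__ == "__main__":
    timers = StpTimers()
    recovery_seconds = {
        "STP (direct failure)": timers.direct_failure_recovery(),
        "STP (indirect failure)": timers.indirect_failure_recovery(),
        "RSTP (figure from the article)": 2.0,
        "Layer 3 (figure from the article)": 0.2,
    }
    flow_pps = 50  # hypothetical steady flow, e.g. one voice stream
    for design, seconds in recovery_seconds.items():
        lost = packets_lost(seconds, flow_pps)
        print(f"{design}: {seconds} s, roughly {lost} packets lost")
```

This is only back-of-the-envelope math, but it shows why a 200 millisecond recovery goes unnoticed while a 50 second one gets your phone ringing.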

As you move one layer up to the Distribution layer, you have redundant paths up to the core layer and down to the access layer, plus Layer 3 load sharing. You can use a Layer 2 technology like STP or RSTP here, but Layer 3 has become the standard feature set between distribution and core layer switches. By doing that you get not only the “ninja-fast” failover speed but also what I like to call failure domain scoping, meaning how big the failure domain is. If you use Layer 2 and somehow bust out a spanning-tree loop on your network, the whole enterprise collapses, whereas if you put Layer 3 boundaries between different sections of the network you can keep a failure contained to one section. At this layer you also have dynamic routing failover capabilities. This basically means you can tune the hello timers so low that when a device fails, the others know about it so quickly nobody even notices. The last point to mention at this layer is the First Hop Redundancy Protocol (FHRP) family, which consists of three gateway protocols: Hot Standby Router Protocol (HSRP), Virtual Router Redundancy Protocol (VRRP), and Gateway Load Balancing Protocol (GLBP). They allow two or more devices to act as one by creating a virtual IP that they share; in case one catches fire, the other one takes over. That is FHRP.
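To make the "two devices act as one" idea concrete, here is a small Python model of how an HSRP-style election picks which gateway answers for the virtual IP. It is a simplified sketch and not how any switch actually implements the protocol: the `Gateway` class, the switch names, and the addresses are all hypothetical. The rule it models is the documented one, where the highest priority wins, the highest IP address breaks ties, and the default priority is 100.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class Gateway:
    """One physical distribution switch taking part in the FHRP group."""
    name: str
    address: IPv4Address
    priority: int = 100  # default priority in both HSRP and VRRP
    alive: bool = True


def elect_active(gateways: List[Gateway]) -> Optional[Gateway]:
    """Return the gateway that should own the virtual IP, or None if all are down.

    Highest priority wins; the highest interface address breaks a tie.
    """
    candidates = [g for g in gateways if g.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda g: (g.priority, g.address))


if __name__ == "__main__":
    # Hypothetical pair of distribution switches sharing one virtual gateway.
    group = [
        Gateway("dist-sw1", IPv4Address("10.0.0.2"), priority=110),
        Gateway("dist-sw2", IPv4Address("10.0.0.3")),
    ]
    print("Active gateway:", elect_active(group).name)

    group[0].alive = False  # dist-sw1 "catches fire"
    print("After failure:", elect_active(group).name)
```

In real HSRP the standby only takes over once the hold timer expires without a hello from the active router (10 seconds by default, with a 3 second hello), which is exactly why tuning those timers down matters for failover speed.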

The goal of the Core layer is speed and speed only. This layer is considered the backbone of the network and includes high-end switches and high-speed cabling such as fiber. This layer does not route traffic within the LAN. In addition, no packet manipulation is done by devices in this layer. Rather, this layer is concerned with speed and the reliable delivery of packets. In terms of redundancy, you essentially place the same things in the architecture as in the last two layers.
