Creatica Wiki: Redundancy and Spanning Tree Protocol (STP)

Multiple inter-switch links are good for redundancy but would create bridge loops (without STP).

Bridge loops bring the Ethernet network down because frames keep looping indefinitely (Ethernet frames have no TTL), saturating the links and exhausting the CPUs of the switches. Broadcast frames are the worst, because every switch floods them out of all its ports.

IEEE 802.1d and PVST+

STP prevents bridge loops on networks with redundant inter-switch links.

STP works by blocking one of the ports on each redundant link and unblocking it when the active path fails.

STP elects the root bridge by exchanging Bridge Protocol Data Units (BPDUs). Each BPDU carries the switch priority and the Burned-In Address (BIA), the switch’s unique MAC address. Here is the format of a BPDU:

Protocol Id | Ver | Msg Type | Flags | Root Id | Root Path Cost | Bridge Id | Port Id | Msg Age | Max Age | Hello Time | Forward Delay

The switch with the lowest priority becomes the root bridge; if the priorities are equal (the default is 32,768), the switch with the lowest BIA wins. All of its inter-switch ports are put into the forwarding state and are called designated ports. Other switches select the fastest link (the best root path cost) to reach the root bridge. If the cost is the same, they prefer the lower BIA of the sending switch and, next, the lower designated port id. The ports on these links facing the root bridge (upstream ports) are called root ports. Downstream ports with the best path cost to the root bridge are called designated ports.

A port that is neither a root port nor a designated port becomes a blocking port and prevents the loop. In the blocking state it still receives the BPDUs propagated down from the root switch, so that it can track topology changes. It starts the transition to a non-blocking state if it does not receive any BPDU within the max age time (20 sec on Cisco switches, 10 times the BPDU Hello interval), unless Cisco !LoopGuard is enabled, which keeps the port in the blocking (Loop Inconsistent) state even if no BPDUs are received. The transition to the forwarding state takes another 30 seconds (15 sec in the listening state and 15 sec in the learning state).
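The election rules described above boil down to "lowest tuple wins". The sketch below illustrates that idea only; the `Bridge` class, the helper names, the MAC addresses and the port names are hypothetical, and a real switch compares these values as they arrive in BPDUs rather than from a global list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int   # default is 32768
    mac: str        # burned-in address, e.g. "00:00:0c:11:11:11"

    @property
    def bridge_id(self):
        # Tuples compare left to right: priority first, then the BIA.
        return (self.priority, bytes.fromhex(self.mac.replace(":", "")))


def elect_root(bridges):
    """Return the bridge with the lowest (priority, BIA) pair."""
    return min(bridges, key=lambda b: b.bridge_id)


def pick_root_port(offers):
    """offers: (root_path_cost, sender_bridge_id, sender_port_id, local_port).

    The lowest cost wins, then the lower sender bridge ID, then the lower
    sender port id.
    """
    best = min(offers, key=lambda o: o[:3])
    return best[3]


switches = [
    Bridge("sw1", 32768, "00:00:0c:aa:aa:aa"),
    Bridge("sw2", 32768, "00:00:0c:11:11:11"),
    Bridge("sw3", 32768, "00:00:0c:bb:bb:bb"),
]
root = elect_root(switches)   # equal priorities, so the lowest BIA decides

# Two equal-cost offers heard on two local ports (made-up upstream bridges).
offers = [
    (38, (32768, bytes.fromhex("00000ccccccc")), 0x8001, "Gi0/1"),
    (38, (32768, bytes.fromhex("00000c222222")), 0x8002, "Gi0/2"),
]
root_port = pick_root_port(offers)
print(root.name, root_port)
```

Lowering the priority of any one switch (for example to 4096) makes it win the election regardless of its BIA, which is how the root is normally placed on purpose.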

Topology Change Notifications (TCNs) are transmitted on the root port towards the root switch, and retransmitted until acknowledged, when a non-!PortFast port changes its state from forwarding or learning to blocking or becomes disabled, or when a port changes its state from learning to forwarding and the switch has at least one designated port. Here is the format of a TCN packet:

802.3 / 802.2 Headers | Protocol Id | Ver | Msg Type (0x80) | Padding | FCS
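The TCN body between the headers and the padding is only four bytes long. As a rough illustration, assuming a 2-byte Protocol Id, a 1-byte version and a 1-byte message type as in the configuration BPDU, it can be built and recognised like this (the function names are hypothetical; the 802.3/802.2 headers, padding and FCS are left to the network stack):

```python
import struct

TCN_TYPE = 0x80   # message type of a Topology Change Notification


def build_tcn_bpdu() -> bytes:
    """Return the 4-byte TCN body: protocol id 0, version 0, type 0x80."""
    return struct.pack(">HBB", 0x0000, 0x00, TCN_TYPE)


def is_tcn(payload: bytes) -> bool:
    """True if payload starts with a well-formed TCN body."""
    if len(payload) < 4:
        return False
    protocol_id, _version, msg_type = struct.unpack(">HBB", payload[:4])
    return protocol_id == 0 and msg_type == TCN_TYPE


tcn = build_tcn_bpdu()
print(tcn.hex(), is_tcn(tcn))
```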

The root switch sets the TC-Flag bit in its BPDUs and transmits them every 2 sec for 35 sec (Max Age + Forward Delay). It also reduces the aging time of the MAC entries in its address table for the affected VLAN from 300 sec to 15 sec (the Forward Delay). The other switches do the same upon receiving a BPDU with the TC-Flag set. After these 35 sec, all switches restore the normal aging time upon receiving a normal BPDU (TC-Flag not set).

STP is enabled by default in PVST+ mode on Cisco switches. Cisco IOS verification and debug commands:

 # show spanning-tree
 # show spanning-tree summary
 # show spanning-tree vlan 1 detail
 # debug spanning-tree events

A unidirectional failure of the link between the designated port and the blocking port may cause the blocking port to stop receiving BPDUs and transition to the forwarding state, thus creating a bridge loop. Such failures are often associated with fibre optic links, when one fibre is unplugged or damaged, but they can also happen on copper. Cisco's Unidirectional Link Detection (UDLD) feature prevents this by putting the port into the error-disabled state (UDLD must be configured in aggressive mode to do that) after 45 sec of unidirectional link failure (3 times the UDLD message interval of 15 sec). It therefore beats STP by 5 sec: STP (without !LoopGuard) would put the blocking port into the forwarding state in 50 sec. UDLD can be used by other protocols as well. !LoopGuard works similarly, but it is used only by STP.

Configuration and verification commands:

 config# udld {enable | aggressive | disable | message time seconds}
 # show udld neighbors
 # show udld <interface #>

STP converges slowly because of its timers (50 sec in total): 20 sec to detect the problem (max age), plus 30 sec for the port to move from the listening state to the learning state and then to the forwarding state (2 forward delay intervals of 15 sec each).
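The timer arithmetic used in the paragraphs above can be checked with a short calculation. The constants are the defaults quoted in this page; the function names are only illustrative.

```python
HELLO = 2                    # sec between BPDUs
MAX_AGE = 10 * HELLO         # 20 sec
FORWARD_DELAY = 15           # sec spent in listening, and again in learning
UDLD_MESSAGE_INTERVAL = 15   # sec


def stp_convergence(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Max age to detect the failure, then listening plus learning."""
    return max_age + 2 * forward_delay


def tc_flag_duration(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """How long the root keeps the TC-Flag set in its BPDUs."""
    return max_age + forward_delay


def udld_detection(message_interval=UDLD_MESSAGE_INTERVAL, missed=3):
    """UDLD declares the link unidirectional after three missed messages."""
    return missed * message_interval


print(stp_convergence())                      # 50
print(tc_flag_duration())                     # 35
print(udld_detection())                       # 45
print(stp_convergence() - udld_detection())   # 5 sec head start for UDLD
```

The margin shrinks or disappears if the STP timers are tuned down without also lowering the UDLD message interval.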

The Cisco !PortFast feature eliminates the 30 sec waiting time on ports that are connected to end points, such as computers, that don’t send BPDUs. Cisco BPDUGuard is usually also enabled on these ports to prevent a switch from being connected. When a BPDU is received on such a port, the port is put into the error-disabled state.

Configuration and verification commands:

 config-if# spanning-tree portfast
 config-if# spanning-tree bpduguard enable
 # show running-config interface <interface #>

The Cisco !UplinkFast feature accelerates the transition of a blocked port to a root port upon direct failure of the root port link. The !BackboneFast feature does the same for indirect failures. !UplinkFast is meant to be implemented on access layer switches only, because it sends dummy multicast frames to the other switches to update their MAC address tables. Distribution or core layer switches would have a hard time sending thousands of these dummy multicasts for their huge CAM tables. Therefore, when !UplinkFast is enabled, the switch priority is increased to 49152 and the path cost of its ports is increased by 3000, so that the switch is unlikely to become the root bridge or a transit switch.

Configuration and verification commands:

 config# spanning-tree uplinkfast
 config# spanning-tree backbonefast
 # show spanning-tree

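Cisco !RootGuard prevents a new switch with a better priority (numerically lower) or a lower BIA from becoming the root switch. A port with !RootGuard enabled that receives a superior BPDU is put into the root-inconsistent (blocking) state.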

 config-if# spanning-tree guard root
 # show running-config interface <interface #>

Cisco BPDUFilter, if configured on a per-port basis, prevents the port from sending BPDUs and also ignores BPDUs received on it. If configured globally, it applies only to !PortFast ports, which lose their !PortFast status upon receiving a BPDU.

 config-if# spanning-tree bpdufilter enable
 # show running-config interface <interface #>

STP in its standard IEEE 802.1d form does not account for VLANs. Cisco has a proprietary modification called PVST+ that runs one instance of STP per VLAN. The VLAN id is added to the switch priority. Per-VLAN instances are used mostly to load balance traffic over all inter-switch links, so that bandwidth is not wasted on idle redundant links.
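To make "VLAN id is added to the switch priority" concrete, here is a small sketch. It assumes the extended system ID scheme, in which the configurable priority is a multiple of 4096 (the upper 4 bits of the 16-bit field) and the VLAN id occupies the lower 12 bits; the function names are hypothetical.

```python
def pvst_bridge_priority(priority: int, vlan_id: int) -> int:
    """Priority value advertised for one VLAN (assumes extended system ID)."""
    if priority % 4096 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 in 0..61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in 1..4094")
    return priority + vlan_id


def split_bridge_priority(value: int):
    """Reverse operation: return (configured priority, vlan id)."""
    steps, vlan_id = divmod(value, 4096)
    return steps * 4096, vlan_id


print(pvst_bridge_priority(32768, 10))   # 32778
print(split_bridge_priority(32778))      # (32768, 10)
```

This is why a switch left at the default priority shows 32769 for VLAN 1 and 32778 for VLAN 10.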

IEEE 802.1w and RPVST+

The IEEE 802.1w standard (Rapid STP, or RSTP) was developed to shorten the convergence time. It includes features such as !PortFast (called Edge Port) and a modified !UplinkFast (dummy multicasts are disabled, and the switch priority and cost are not changed). Its protocol version is 2; the normal STP version is 0.

A full duplex link is considered point-to-point. On these links the election happens without the listening and learning stages, by exchanging just a couple of BPDUs with the Proposal and Agreement flags. A half duplex link is considered shared, which significantly increases its cost, but it can be configured as point-to-point.

BPDUs are now sent in both directions (even by the blocking port) and are treated as keep-alive messages between the switches. Downstream switches continue to send BPDUs even if they do not receive any from the root switch. The Max Age is decreased to 6 sec (3 times the Hello interval, not 10).

Three port states (Disabled, Listening and Blocking) are combined into one, called Discarding. The non-designated (blocking) port is now known under two new names: an Alternate port receives BPDUs from a different switch, and a Backup port receives BPDUs from its own switch (probably because two of its ports are connected to the same hub).

A topology change is triggered only when a non-edge port goes into the forwarding state. A TCN is no longer sent. Instead, the switch that has detected the topology change sends its own BPDUs with the TC-Flag set via its designated and root ports for 4 sec (2 Hello intervals). Other switches forward these TC BPDUs the same way. All switches flush their CAM tables for the ports on which this TC BPDU was flooded.
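The Proposal, Agreement and TC flags mentioned above all live in the one-byte Flags field of the RSTP BPDU. The decoder below is a sketch that assumes the usual 802.1w bit layout (bit 0 TC, bit 1 Proposal, bits 2-3 port role, bit 4 Learning, bit 5 Forwarding, bit 6 Agreement, bit 7 TC Acknowledgment); the function name and the sample values are made up for illustration.

```python
PORT_ROLES = {0: "unknown", 1: "alternate/backup", 2: "root", 3: "designated"}


def decode_rstp_flags(flags: int) -> dict:
    """Split the RSTP Flags byte into named fields (assumed bit layout)."""
    return {
        "topology_change": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "port_role": PORT_ROLES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "topology_change_ack": bool(flags & 0x80),
    }


# A designated port that is still discarding proposes to become forwarding.
proposal = decode_rstp_flags(0x0E)
# The downstream root port answers with an agreement.
agreement = decode_rstp_flags(0x78)
print(proposal["port_role"], proposal["proposal"])
print(agreement["port_role"], agreement["agreement"])
```

In 802.1d only the TC and TC Acknowledgment bits are used, so the same byte parsed from a version 0 BPDU would report the port role as unknown.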

Configuration and verification commands:

 config# spanning-tree mode rapid-pvst
 # show spanning-tree summary

802.1w is backward compatible with 802.1d. In other words, a switch with 802.1w enabled can still talk to an 802.1d switch by exchanging regular 802.1d BPDUs. The difference between PVST and PVST+ is that the latter multicasts its non-native VLAN BPDUs to the Cisco proprietary multicast address, so that Cisco switches located downstream of a non-Cisco switch (not capable of PVST+) can still receive those BPDUs for non-native VLANs.

IEEE 802.1s and MST

Multiple Spanning Tree (MST) is the IEEE alternative to Cisco PVST (per-VLAN spanning tree). Both allow load balancing traffic over all inter-switch links (trunks) instead of wasting bandwidth on blocked redundant links. With PVST this is achieved by manipulating costs and priorities per VLAN spanning tree, so that one spanning tree forwards traffic over one topology and blocks it over another topology (a redundant link), while another spanning tree forwards traffic over the second topology (the redundant link) and blocks it over the first.

PVST runs one STP instance per VLAN. An MST instance has one or more VLANs mapped to it. The default MST instance is 0. A maximum of 16 instances is allowed; in other words, at most 16 topologies can be created. Each instance of MST runs RSTP. All switches in an MST region must have the same Region name and Revision number, and the VLAN-to-MSTI mapping should also be the same. MST has new cost values for each link bandwidth that are higher than in STP or RSTP. These values can be used with RSTP, if desired.

MST changed the BPDU structure. The STP Info for all instances is packed into one BPDU. The Region Name, Revision Number, MD5 digest of the VLAN-to-MSTI mapping and Instance-0 STP Info are always present in every BPDU. If more than one instance is defined, the STP Info of each remaining instance is appended to the BPDU as an Mrecord.
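The region membership rule (same name, same revision, same VLAN-to-MSTI mapping) can be sketched as a simple comparison. This is a simplified model: real switches compare an MD5 digest of the mapping carried in the BPDU rather than the table itself, and the `MstConfig` class, the region name and the VLAN numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MstConfig:
    name: str
    revision: int
    vlan_to_instance: dict = field(default_factory=dict)

    def instance_for(self, vlan: int) -> int:
        # VLANs without an explicit mapping belong to the default instance 0.
        return self.vlan_to_instance.get(vlan, 0)


def same_region(a: MstConfig, b: MstConfig) -> bool:
    """True if both switches would consider themselves in one MST region."""
    if (a.name, a.revision) != (b.name, b.revision):
        return False
    return all(a.instance_for(v) == b.instance_for(v) for v in range(1, 4095))


sw1 = MstConfig("CAMPUS", 1, {10: 1, 20: 2})
sw2 = MstConfig("CAMPUS", 1, {10: 1, 20: 2, 30: 0})   # 30 -> 0 is the default anyway
sw3 = MstConfig("CAMPUS", 2, {10: 1, 20: 2})          # revision differs

print(same_region(sw1, sw2), same_region(sw1, sw3))
```

A switch that fails this comparison is treated as being outside the region, so the link to it becomes a boundary port.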

Configuration and verification commands:

 config# spanning-tree mode mst
 config# spanning-tree mst configuration
 config-mst# name <Region_Name>
 config-mst# revision <number>
 config-mst# instance <instance #> vlan <vlans>
 config-mst# exit
 # show spanning-tree summary

MST is backward compatible with RSTP and STP, collectively called Common Spanning Trees (CSTs). Whatever STP protocol is running on the switch connected to a Boundary port, the MST switch matches it by sending the appropriate BPDUs. Once the port role is determined for Instance-0 on the boundary port (for example, forwarding designated port), all other instances have to take the same role; in other words, there is no load balancing on boundary ports.

Instance-0 is called the Internal Spanning Tree (IST); the other instances are called MST instances (MSTIs). The IST elects two root bridges: the Main Root Bridge, which is the root for both the CST and the IST, and a root bridge for the region, called the IST Master or CIST Regional Root. Within the MST Region all MST switches select their root ports based on the location of the IST Master. If the CST Root Bridge is located inside the MST Region, it is also the IST Master. If the CST Root Bridge is located outside the MST Region (on a non-MST bridge), the MST switch with the lowest cost to that CST Root Bridge becomes the IST Master.