Corrupted frames are the devil’s spawn. A few noisy links causing frame corruption can quickly degrade network performance, and troubleshooting them is getting harder. These integrity errors generally occur when signal noise causes a binary ‘1’ to be mistaken for a binary ‘0’ or vice versa. This post takes a look at integrity errors and the impact of corrupted frames in a cut-through switched network. Throughout this post I’ll use the term ‘CRC errors’ to refer to frame integrity errors which were detected by CRC comparison.
FCS and the CRC

The Frame Check Sequence (FCS) is a 4-byte (32-bit) trailer added to the end of every Ethernet frame. The originator of the frame calculates a cyclic redundancy check (CRC) code against the layer-2 frame it is sending, and sends this CRC code as the frame’s FCS trailer. The receiver calculates its own CRC over the received frame and compares it with the received CRC code. If there’s a mismatch, the receiver will drop the received frame and increment the ‘CRC Errors’ counter on the receive interface.

Disambiguation

There are other error checking mechanisms in the TCP/IP stack which often get confused with each other. The IP header (just the header, not the payload!) and the full TCP segment both use a 16-bit ones’ complement checksum to detect errors. Although checksums and CRCs fulfill a similar function, they are implemented differently and the terms are not interchangeable.

How CRC works in store and forward mode

In a traditional network which uses store-and-forward switching, the entire frame will be read into the switch’s buffer before a switching decision is made. In store-and-forward mode the switch can wait for the CRC code in the FCS trailer, and check it against the CRC calculated from the received frame. The switch will either discard the frame (incrementing its Rx CRC-error counter for that interface) or forward the frame out an egress interface.

Troubleshooting store-and-forward CRC errors

Thus, troubleshooting CRC errors in a store-and-forward world allows you to make one very important assumption: “Received frames with detected corruption do not get propagated”. When a corrupt frame is detected, you can deduce that the corruption was introduced either within the sender switch or on the link between the sender and receiver interfaces. So if you observe CRC-error counters incrementing on a particular link, you know where the problem lies immediately. You don’t yet know ‘what’ caused the problem but you know approximately ‘where’ the problem is; i.e. at either end of a single physical link. Troubleshooting steps would look loosely like:
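One part of that process, spotting which receive interface has a CRC-error counter that is actually incrementing, can be sketched in Python. This is only a minimal illustration under stated assumptions: the interface names and counter values are made up, and the function `incrementing_crc_interfaces` is a hypothetical helper, not part of any vendor tool.

```python
from typing import Dict, List


def incrementing_crc_interfaces(
    before: Dict[str, int], after: Dict[str, int]
) -> List[str]:
    """Return interfaces whose Rx CRC-error counter grew between two polls."""
    suspects = []
    for interface, new_count in after.items():
        old_count = before.get(interface, 0)
        if new_count > old_count:
            suspects.append(interface)
    return sorted(suspects)


# Hypothetical Rx CRC-error counters, polled a few minutes apart.
poll_1 = {"sw1:eth1": 10, "sw1:eth2": 0, "sw2:eth7": 3}
poll_2 = {"sw1:eth1": 10, "sw1:eth2": 0, "sw2:eth7": 42}

suspect_links = incrementing_crc_interfaces(poll_1, poll_2)
print(suspect_links)
```

A counter that is non-zero but static (like `sw1:eth1` here) points to an old event. In a store-and-forward network, only the link behind the incrementing counter needs attention.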
Cut-through and the corruption propagation pandemic

The switch may not be able to prevent the forwarding of corrupt frames when it operates in cut-through switching mode. Remember that cut-through switches will begin forwarding the frame out the egress interface before the full frame is received. This reduces switching latency but introduces a thorny problem if the received frame was corrupted. By the time the CRC is processed the frame is already outbound on the wire. The frame transmission has to be completed but it needs an FCS trailer.

What FCS value should the switch use?

Remember that the switch still has to calculate a CRC and append the FCS for this transmitted frame. If it calculates a new CRC value for the known-corrupt frame, the frame corruption would be masked and go undetected until it arrives at its destination. The compromise here is to ‘stomp’ the outbound CRC. Stomping the CRC ensures that the next-hop receiver will correctly identify this frame as having a CRC error. Being honest, I’m not sure how this stomping is actually implemented. You could set all 1s or you could re-use the received CRC as long as you knew the L2 frame hadn’t changed. Have a look at the Cisco Nexus 5000 stomp procedure. If the sending device increments the Stomp and Tx CRC error counters, you know that the device knowingly propagated CRC-errored frames. That’s nice, but if you’ve got a large network with a single bad top-of-rack cable then you’d see Tx and Rx CRC errors all across your cut-through switching domain. The behavior on the next-hop cut-through switch would be the same: mark the Rx/Tx/Stomp counters, but still propagate the frame. CRC errors, spreading like the plague!

Troubleshooting cut-through CRC errors

Let’s get this straight. The dodgy links causing the CRC errors will still cause the same level of pain and upset to your customers, and application owners will still observe the impact as IP packet loss. Corrupt frames will be propagated farther than normal within your network, but hopefully there aren’t enough errored frames for that to be a bandwidth concern. No, the problem here is identifying the source of the CRC errors. Your monitoring system will now detect CRC errors at multiple points in the network for a single noisy-link event. The more ‘truly-errored’ links in your network, the harder it will be to trace them back through the network.

Actions and summary
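To illustrate why one noisy link lights up counters across a cut-through domain, and how the stomp counters can help trace it back, here is a small Python simulation. It is a sketch under several assumptions. `zlib.crc32` uses the same CRC-32 polynomial as Ethernet, but on-the-wire bit ordering is glossed over. The stomp is modelled as a bit-inverted CRC, which is just one possible implementation. `CutThroughSwitch`, `flip_bit`, and the `likely_source` heuristic are hypothetical names invented for this example.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import List, Optional


def fcs_for(payload: bytes) -> bytes:
    """CRC-32 of the frame contents, packed as a 4-byte trailer."""
    return struct.pack("<I", zlib.crc32(payload))


def stomped_fcs_for(payload: bytes) -> bytes:
    """Assumed stomp: bit-invert the correct CRC so it can never match."""
    return struct.pack("<I", zlib.crc32(payload) ^ 0xFFFFFFFF)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over everything but the trailer and compare."""
    return fcs_for(frame[:-4]) == frame[-4:]


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Model line noise by inverting a single bit of the frame."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


@dataclass
class CutThroughSwitch:
    name: str
    rx_crc: int = 0
    tx_stomp: int = 0

    def forward(self, frame: bytes) -> bytes:
        """The frame body is already on the wire, so a bad frame is stomped, not dropped."""
        if fcs_ok(frame):
            return frame
        self.rx_crc += 1
        self.tx_stomp += 1
        return frame[:-4] + stomped_fcs_for(frame[:-4])


def likely_source(path: List[CutThroughSwitch]) -> Optional[str]:
    """First switch that saw more Rx CRC errors than its upstream neighbour stomped."""
    upstream_stomps = 0  # the originating host is assumed to send clean frames
    for switch in path:
        if switch.rx_crc > upstream_stomps:
            return switch.name
        upstream_stomps = switch.tx_stomp
    return None


payload = bytes(range(60))
good_frame = payload + fcs_for(payload)
path = [CutThroughSwitch("sw1"), CutThroughSwitch("sw2"), CutThroughSwitch("sw3")]

frame = good_frame
for hop, switch in enumerate(path):
    if hop == 1:  # the noisy link sits between sw1 and sw2
        frame = flip_bit(frame, 13)
    frame = switch.forward(frame)

rx_counters = {s.name: s.rx_crc for s in path}
source = likely_source(path)
print(rx_counters, source)
```

Both `sw2` and `sw3` record an Rx CRC error for the same event, but only `sw2` received a bad frame that its upstream neighbour had not stomped, so the link into `sw2` is the one to inspect. Real counters are polled at different times and cover many flows, so treat this comparison as a starting point rather than proof.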
Here’s an interesting cut-through war story where the EtherType was being mangled and thus the dot1q header was not interpreted. This led to unicast flooding of corrupt frames on the default VLAN.

As we learned in the previous lesson, the first step in a switch's operational logic is to receive an Ethernet frame from the transmitting node. Depending on the type of switching methodology in use, the switch needs to receive and examine a different number of bytes before going to the next operational step and ultimately switching the frame to the outgoing port or ports. There are two main switching modes supported on Cisco switches: store-and-forward and cut-through.
Both switching modes base their forwarding decisions on the destination MAC address of the Ethernet frames. They also learn MAC addresses and build their MAC tables as they examine the source MAC address (SMAC) field in the Ethernet header as frames are being forwarded. These switching modes differ in how much of the frame must be received and examined by the switch before the frame starts being forwarded out the egress port. Figure 1 compares each of the three modes and shows how much information must be received in each mode. Let's look at each one in detail.

Store-and-Forward Mode

Historically, the first widely used forwarding method at the Ethernet layer was referred to as "store-and-forward" switching. In this switching method, the frame has to be received entirely before a forwarding decision is made based on the destination MAC address lookup. Once the frame is received and buffered, the switch compares the frame's FCS field against its own CRC calculation to ensure the integrity and correctness of the data. If the CRC values don't match, the frame is marked as invalid and dropped. If the values match, the destination and source MAC addresses are examined before the frame is forwarded. This method creates higher latency than the other modes and, by default, discards frames smaller than 64 bytes (runts) and larger than 1518 bytes (giants).

Figure 2. Example of Store-and-Forward Switching Mode

Figure 2 shows an example of a switch receiving a frame and validating its integrity. Note that the frame is first received in its entirety before the next actions are performed.

Cut-Through Switching

An Ethernet switch that uses cut-through switching can make a forwarding decision as soon as it gets the first few bytes of the incoming frame. The switch does not have to wait for the rest of the frame before it starts switching the frame to the outgoing port.
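To put rough numbers on that difference, the short Python calculation below compares how long a switch must wait before it can start forwarding in each mode. It is a back-of-the-envelope sketch: the 1 Gbit/s link speed is an assumption, the preamble and lookup time are ignored, and `wait_before_forwarding_us` is a hypothetical helper name.

```python
def wait_before_forwarding_us(bytes_needed: int, link_bps: float) -> float:
    """Time to receive `bytes_needed` bytes at `link_bps`, in microseconds."""
    return bytes_needed * 8 / link_bps * 1e6


LINK_BPS = 1e9           # assumed 1 Gbit/s ingress link
FULL_FRAME_BYTES = 1518  # largest standard frame; store-and-forward receives all of it
DEST_MAC_BYTES = 6       # the destination MAC is the first 6 bytes of the frame

store_and_forward_us = wait_before_forwarding_us(FULL_FRAME_BYTES, LINK_BPS)
cut_through_us = wait_before_forwarding_us(DEST_MAC_BYTES, LINK_BPS)

print(f"store-and-forward: {store_and_forward_us:.3f} us")
print(f"cut-through:       {cut_through_us:.3f} us")
```

Under these assumptions a full-size frame costs roughly 12 microseconds of buffering per hop in store-and-forward mode, against a small fraction of a microsecond for a switch that only waits for the destination MAC address.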
Fragment-Free Mode

Switches operating in this mode must receive and examine the first 64 bytes of the frame and then make a forwarding decision. Why exactly 64 bytes? In an Ethernet LAN, collision fragments are detected within the first 64 bytes. This switching mode is no longer widely used, so we only mention it for reference.

Fast-forward switching (referred to just as cut-through)

A cut-through switch can make a forwarding decision as soon as it gets the destination MAC address of the frame, which means it needs only the first 6 bytes. It does not have to wait for the rest of the Ethernet frame to make its forwarding decision. An example of this behavior is shown in Figure 3.

Figure 3. Example of Cut-Through Switching Mode

However, more sophisticated cut-through switches today do not necessarily take this approach. They may parse an incoming frame until they have enough information from the frame content to perform all additional features. For example, if there is an Access Control List (ACL) configured on the interface, the switch must receive the frame up to and including the IP and transport-layer headers (20 bytes for the IPv4 header and 20 bytes for the TCP header) to match the information there against the interface access list. Together with the 14-byte Ethernet header, this means a total of 54 bytes up to that point. Another example would be if quality of service (QoS) or any other advanced feature is configured.

Unlike store-and-forward switching, cut-through switching does not drop invalid Ethernet frames. They get forwarded to the next nodes until some device along the path checks the FCS, finds it invalid, and drops the frame. A primary advantage of this switching approach is that the amount of time the switch takes to start forwarding the packet (referred to as the switch's latency) is much lower than with store-and-forward switching.

Configuring and Verifying switching modes

Most modern switch platforms come with cut-through switching mode enabled by default.
You can check that using the show switching-mode command.

    SW1# show switching-mode
    Configured switching mode: Cut through
    Module Number    Operational Mode
    1                Cut-Through

If you want to enable the store-and-forward mode, you can use the following simple procedure.

    SW1# configure terminal
    Enter configuration commands, one per line. End with CNTL/Z.
    SW1(config)# switching-mode store-forward
    SW1(config)# end
    SW1# show switching-mode
    Configured switching mode: Store and Forward
    Module Number    Operational Mode
    1                Store and Forward

In Summary

So in summary, the most important points about the different switching modes are: