What does a device do with a frame when the CRC value on arrival doesn't match the CRC value placed in the FCS by the source device?

Corrupted frames are the devil's spawn.  A few noisy links causing frame corruption can quickly degrade network performance, and troubleshooting them is getting harder.  These integrity errors generally occur when signal noise causes a binary ‘1’ to be mistaken for a binary ‘0’ or vice versa.  This post takes a look at integrity errors and the impact of corrupted frames in a cut-through switched network.  Throughout this post I’ll use the term ‘CRC errors’ to refer to frame integrity errors that were detected by CRC comparison.


FCS and the CRC

The Frame Check Sequence (FCS) is a 4-byte (32-bit) trailer added to the end of every Ethernet frame.  The originator of the frame calculates a cyclic redundancy check (CRC) code over the layer-2 frame it is sending, and sends this CRC code as the frame's FCS trailer.  The receiver calculates its own CRC over the received frame and compares it with the received FCS.  If there’s a mismatch, the receiver will drop the frame and increment the ‘CRC Errors’ counter on the receive interface.
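Here's a minimal sketch of that sender/receiver exchange.  It assumes Python's `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet FCS, and it assumes the FCS bytes are laid out least-significant byte first, as they commonly appear in packet captures.  The frame starts at the destination MAC (no preamble/SFD), and the `Interface` class and its `rx_crc_errors` counter are hypothetical stand-ins for a real switch port.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Sender: compute CRC-32 over the frame and append it as a 4-byte FCS trailer."""
    crc = zlib.crc32(frame) & 0xFFFFFFFF
    # Assumption: FCS stored least-significant byte first.
    return frame + struct.pack("<I", crc)


def fcs_matches(received: bytes) -> bool:
    """Receiver: recompute CRC-32 over the frame and compare it with the received FCS."""
    frame, fcs = received[:-4], received[-4:]
    return struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF) == fcs


class Interface:
    """Hypothetical receive interface with a 'CRC Errors' counter."""

    def __init__(self, name: str):
        self.name = name
        self.rx_crc_errors = 0

    def receive(self, received: bytes):
        if not fcs_matches(received):
            self.rx_crc_errors += 1  # mismatch: drop the frame and count it
            return None
        return received[:-4]  # FCS checked and stripped, frame accepted


if __name__ == "__main__":
    frame = bytes(range(60))            # arbitrary 60-byte frame, 64 bytes with FCS
    wire = bytearray(append_fcs(frame))
    wire[10] ^= 0x01                    # noise flips a single bit in transit
    eth1 = Interface("Ethernet1/1")
    print(eth1.receive(bytes(wire)), eth1.rx_crc_errors)
```

A single flipped bit is always caught by CRC-32, so the receive call returns `None` and the counter reads 1.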

Disambiguation

There are other error-checking mechanisms in the TCP/IP stack that often get confused with the CRC.  The IP header (just the header, not the payload!) and the full TCP segment both use a 16-bit ones' complement checksum to detect errors.  Although checksums and CRCs fulfill a similar function, they are implemented differently and the terms are not interchangeable.
The 32-bit CRC used in the layer-2 FCS is a strong error detection mechanism.  However, it is worth calling out that no mechanism is foolproof, and that errors can also creep in within the switch itself.  Check out router freak’s excellent post on error detection.
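To see why the two aren't interchangeable, here's a small sketch comparing an RFC 1071-style ones' complement checksum (the kind used by the IPv4 header and TCP) with `zlib.crc32`.  The example data is made up: reordering two 16-bit words leaves the checksum unchanged, because addition doesn't care about order, while the CRC changes.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)     # end-around carry
    return ~total & 0xFFFF


original = bytes.fromhex("12345678")
swapped = bytes.fromhex("56781234")                  # same two 16-bit words, reordered

print(hex(internet_checksum(original)), hex(internet_checksum(swapped)))
print(hex(zlib.crc32(original)), hex(zlib.crc32(swapped)))
```

The checksum line prints the same value twice; the CRC line prints two different values, since CRC-32 detects every error pattern confined to 32 consecutive bits.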
For the rest of this post I’ll refer exclusively to the layer-2 frame check sequence and the cyclic redundancy check used to implement it.

How CRC works in store-and-forward mode

In a traditional network that uses store-and-forward switching, the entire frame is read into the switch’s buffer before a switching decision is made.  In store-and-forward mode the switch can wait for the CRC code in the FCS trailer and check it against the CRC calculated from the received frame.  The switch will then either discard the frame (incrementing its Rx CRC-error counter for that interface) or forward it out an egress interface.
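The store-and-forward decision can be sketched as a port that buffers bytes until the end of the frame and only then checks the FCS.  This is a simplified model, not how any particular ASIC works: `StoreAndForwardPort` and its methods are hypothetical, and the FCS layout follows the same least-significant-byte-first assumption used with `zlib.crc32` elsewhere in this post.

```python
import struct
import zlib


def crc_ok(frame_with_fcs: bytes) -> bool:
    """Compare the CRC computed over the frame with the trailing 4-byte FCS."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


class StoreAndForwardPort:
    """Hypothetical ingress port of a store-and-forward switch."""

    def __init__(self):
        self.buffer = bytearray()
        self.rx_crc_errors = 0

    def on_bytes(self, chunk: bytes) -> None:
        self.buffer += chunk  # keep buffering; no forwarding decision yet

    def on_end_of_frame(self):
        frame, self.buffer = bytes(self.buffer), bytearray()
        if not crc_ok(frame):
            self.rx_crc_errors += 1
            return None  # discarded here, so the corruption is not propagated
        return frame  # handed on to the forwarding lookup
```

The key property is that nothing leaves the switch until `on_end_of_frame` has seen the whole frame, which is what makes the troubleshooting assumption below hold.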

Troubleshooting store-and-forward CRC errors

Thus, troubleshooting CRC errors in a store-and-forward world allows you to make one very important assumption: “Received frames with detected corruption do not get propagated.”

When a corrupt frame is detected, you can deduce that the corruption was introduced either within the sending switch or on the link between the sender and receiver interfaces.  So if you observe CRC-error counters incrementing on a particular link, you immediately know where the problem lies.

You don’t yet know ‘what’ caused the problem but you know approximately ‘where’ the problem is; i.e. at either end of a single physical link.  Troubleshooting steps would look loosely like:

  • Clear counters and monitor.
  • Look for other interfaces with CRCs (errors on multiple ports might indicate a board/fabric problem).
  • Check transceiver light levels on both the transmit and receive side.
  • Shift traffic, check fiber connectors, clean fibers, replace transceivers, swap slots, etc.
  • Verify the ‘Operational’ switching mode.  Note that some platforms won’t cut through unless the frame size exceeds 576 bytes, for example.  Check your platform-specific behavior.
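The "clear counters and monitor" step boils down to comparing two polls of the Rx CRC counters.  Here's a minimal sketch; the interface names and counter values are hypothetical, and in practice they would come from your monitoring system or `show interface` output.

```python
def crc_deltas(before: dict, after: dict) -> dict:
    """Interfaces whose Rx CRC counter increased between two polls.

    A counter that went down (e.g. it was cleared between polls) is ignored.
    """
    return {
        ifname: after[ifname] - before.get(ifname, 0)
        for ifname in after
        if after[ifname] > before.get(ifname, 0)
    }


# Hypothetical snapshots taken a few minutes apart after clearing counters.
poll_1 = {"Eth1/1": 0, "Eth1/2": 0, "Eth1/3": 0, "Eth2/1": 0}
poll_2 = {"Eth1/1": 0, "Eth1/2": 42, "Eth1/3": 0, "Eth2/1": 0}

for ifname, delta in sorted(crc_deltas(poll_1, poll_2).items()):
    print(f"{ifname}: +{delta} Rx CRC errors")
```

A single incrementing port points at one link; several incrementing ports on the same module would be the board/fabric hint from the list above.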

Cut-through and the corruption propagation pandemic

The switch may not be able to prevent the forwarding of corrupt frames when it operates in cut-through switching mode.  Remember that cut-through switches will begin forwarding the frame out the egress interface before the full frame is received.

This reduces switching latency but introduces a thorny problem if the received frame was corrupted.  By the time the CRC is processed, the frame is already outbound on the wire.  The transmission has to be completed, and the outgoing frame still needs an FCS trailer.

What FCS value should the switch use?

Remember that the switch still has to calculate the CRC and append the FCS for this transmitted frame.  If it calculates a new CRC value for the known-corrupt frame, the corruption would be masked and go undetected until the frame arrives at its destination.

The compromise here is to ‘stomp’ the outbound CRC.  Stomping the CRC ensures that the next-hop receiver will correctly identify this frame as having a CRC error.  To be honest, I’m not sure how this stomping is actually implemented.  You could set all 1s, or you could re-use the received CRC as long as you knew the L2 frame hadn’t changed.
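To make the trade-off concrete, here's a sketch comparing the three options discussed above: recomputing a fresh CRC, re-using the received FCS, and writing all 1s.  This is purely illustrative and not a description of any vendor's stomp implementation; `egress_fcs` and its strategy names are hypothetical, and the FCS layout assumes `zlib.crc32` packed least-significant byte first.

```python
import struct
import zlib


def fcs_for(body: bytes) -> bytes:
    """4-byte FCS (CRC-32, least-significant byte first) for a frame body."""
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def egress_fcs(body: bytes, received_fcs: bytes, strategy: str) -> bytes:
    """FCS a cut-through switch appends to a frame whose ingress CRC check failed."""
    if strategy == "recompute":   # fresh CRC over the corrupt bits: corruption is masked
        return fcs_for(body)
    if strategy == "reuse":       # body unchanged, so the stale FCS still mismatches
        return received_fcs
    if strategy == "all_ones":    # fixed 'stomp' pattern
        return b"\xff\xff\xff\xff"
    raise ValueError(f"unknown strategy: {strategy}")


def next_hop_accepts(wire: bytes) -> bool:
    """What the next-hop receiver's CRC check concludes."""
    return fcs_for(wire[:-4]) == wire[-4:]


sent_body = bytes(range(64))
sent_fcs = fcs_for(sent_body)
corrupt = bytearray(sent_body)
corrupt[20] ^= 0x04                 # a '1' read as a '0' (or vice versa) on the ingress link
corrupt = bytes(corrupt)

for strategy in ("recompute", "reuse", "all_ones"):
    wire = corrupt + egress_fcs(corrupt, sent_fcs, strategy)
    print(strategy, next_hop_accepts(wire))
```

Only "recompute" gets the corrupt frame accepted by the next hop, which is exactly the masking problem; re-using the received FCS is guaranteed to fail the next check because the frame body hasn't changed since it already failed once.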

Have a look at the Cisco Nexus 5000 stomp procedure.  If the sending device increments the Stomp and Tx CRC error counters, you know that the device knowingly propagated CRC-errored frames.  That’s nice, but if you’ve got a large network with a single bad top-of-rack cable, you’d see Tx and Rx CRCs all across your cut-through switching domain.  The behavior on the next-hop cut-through switch would be the same: increment the Rx/Tx/Stomp counters, but still propagate the frame.  CRC errors, spreading like the plague!
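Here's a toy simulation of that spread.  It assumes a hypothetical four-switch cut-through path (`tor-1`, `spine-1`, `spine-2`, `tor-9`) where every switch behaves as described: count the Rx CRC error, stomp, and forward anyway.  The counter names only loosely mirror the Nexus-style Rx CRC and Stomp counters.

```python
from dataclasses import dataclass


@dataclass
class CutThroughSwitch:
    """Hypothetical cut-through switch with Rx CRC and Tx stomp counters."""
    name: str
    rx_crc: int = 0
    tx_stomp: int = 0

    def forward(self, frame_is_bad: bool) -> bool:
        # Egress transmission has already started when the CRC verdict arrives,
        # so a bad frame is counted, stomped and still sent on.
        if frame_is_bad:
            self.rx_crc += 1
            self.tx_stomp += 1
        return frame_is_bad


def send_along_path(path, corrupted_on_first_link: bool) -> bool:
    bad = corrupted_on_first_link
    for switch in path:
        bad = switch.forward(bad)
    return bad  # what the destination host finally receives


path = [CutThroughSwitch(n) for n in ("tor-1", "spine-1", "spine-2", "tor-9")]
for _ in range(100):
    send_along_path(path, corrupted_on_first_link=True)

for sw in path:
    print(sw.name, sw.rx_crc, sw.tx_stomp)
```

One bad cable in front of `tor-1` produces identical Rx CRC and stomp counts on all four switches, which is why the counters alone no longer tell you where the fault is.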

Troubleshooting cut-through CRC errors

Let’s get this straight.  The dodgy links causing the CRCs will still cause the same level of pain and upset to your customers, and application owners will still observe the impact as IP packet loss.  Corrupt frames will be propagated farther than normal within your network, but hopefully there aren’t enough errored frames for that to be a bandwidth concern.

No, the problem here is identifying the source of the CRCs.  Your monitoring system will now detect CRCs at multiple points in the network for a single noisy-link event.  The more ‘truly errored’ links in your network, the harder it will be to trace them back through the network.
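One way to use the stomp counters for tracing, building on the point that an incrementing Stomp counter means the sender knowingly propagated bad frames: a link whose receiver counts more Rx CRC errors than its upstream sender stomped must have added corruption of its own.  This is a sketch under simplifying assumptions (per-link counter deltas polled over the same interval, a single known path); the link names and numbers are hypothetical.

```python
def likely_origins(hops):
    """Links where corruption entered, rather than just being inherited.

    hops: upstream-to-downstream list of
          (link_name, rx_crc_delta_at_receiver, tx_stomp_delta_at_sender)
    """
    return [link for link, rx_crc, upstream_stomp in hops if rx_crc > upstream_stomp]


# Hypothetical counter deltas along one path.  Servers don't stomp, so the
# first link's upstream stomp count is 0.
hops = [
    ("server-a -> tor-1", 120, 0),
    ("tor-1 -> spine-1", 120, 120),   # everything here was inherited from upstream
    ("spine-1 -> tor-9", 125, 120),   # 5 extra errors: a second noisy link
]

print(likely_origins(hops))
```

With these numbers the sketch flags the server-facing link and the `spine-1 -> tor-9` link, while the middle link is correctly treated as a victim of propagation.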

Actions and summary

  • Be aware that your CRC troubleshooting approach needs to change if you enable cut-through switching.
  • Monitor every port. It’s very likely that the problems will originate at the edge of your network.  If you see Rx CRC at the edge of your network you’re back to single-link troubleshooting.  If you have trouble monitoring all your links, check out StatSeeker which is very well suited to this job.  There are some good engineer reviews by the Lone SysAdmin, LameJournal and the NetworkingNerd.
  • Treat CRC errors seriously and act fast.  CRCs really hurt your customers, so you should already be reacting quickly.  However, be aware that CRC fault location becomes much harder in a cut-through environment when there are multiple errored links.  So detect and correct noisy links early, or triage and troubleshooting will get harder still.

Here’s an interesting cut-through war story where the EtherType was being mangled, so the dot1q header was not interpreted.  This led to unicast flooding of corrupt frames on the default VLAN.
Do you have any war stories? I’d love to hear from you in the comments.

As we learned in the previous lesson, the first step in a switch's operational logic is to receive an Ethernet frame from the transmitting node. Depending on the switching method in use, the switch needs to receive and examine a different number of bytes before moving on to the next operational step and ultimately switching the frame to the outgoing port or ports. There are two main switching modes supported on Cisco switches:

  • Cut-Through mode, which has two forms:
    • Fragment-free switching
    • Fast-forward switching
  • Store-and-Forward mode

Both switching modes base their forwarding decisions on the destination MAC address (DMAC) of the Ethernet frames. They also learn MAC addresses and build their MAC tables by examining the source MAC address (SMAC) field in the Ethernet header as frames are forwarded. The modes differ in how much of the frame must be received and examined by the switch before the frame starts being forwarded out the egress port.
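The learn-on-SMAC, forward-on-DMAC behavior common to both modes can be sketched in a few lines.  This is a deliberately minimal model with no VLANs, aging, or table limits; `MacTable` and the port names are hypothetical, and the frame is assumed to start at the destination MAC (preamble and SFD already stripped).

```python
class MacTable:
    """Minimal MAC learning and forwarding sketch (hypothetical, no VLANs or aging)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address (bytes) -> port name

    def handle(self, frame: bytes, in_port: str) -> set:
        dmac, smac = frame[0:6], frame[6:12]   # DMAC comes first, then SMAC
        self.table[smac] = in_port             # learn: SMAC is reachable via in_port
        out = self.table.get(dmac)
        if out is None or dmac[0] & 0x01:      # unknown unicast or group address: flood
            return self.ports - {in_port}
        return set() if out == in_port else {out}


sw = MacTable(["p1", "p2", "p3"])
host_a = bytes.fromhex("00aa00000001")
host_b = bytes.fromhex("00bb00000002")
print(sw.handle(host_b + host_a + b"\x08\x00", "p1"))  # B unknown yet: flooded
print(sw.handle(host_a + host_b + b"\x08\x00", "p2"))  # A learned on p1
```

Note that only the first 6 bytes are needed for the forwarding lookup, while learning needs the next 6; that difference is what the cut-through modes below exploit.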

Figure 1. Switching Modes based on Frame Bytes Received

Figure 1 compares each of the three modes and shows how much information must be received in each mode. Let's look at each one in detail.

Store-and-Forward Mode

Historically, the first widely used forwarding method at the Ethernet layer was referred to as "store-and-forward" switching. In this switching method, the frame has to be received entirely before a forwarding decision is made based on a destination MAC address lookup. Once the frame is received and buffered, the switch compares the CRC value carried in the frame's FCS field against its own CRC calculation over the frame to ensure the integrity and correctness of the data. If the CRC values don't match, the frame is marked as invalid and dropped. If the values match, the destination and source MAC addresses are examined before the frame is forwarded.

This method creates higher latency than the other two and, by default, discards frames smaller than 64 bytes (runts) and larger than 1518 bytes (giants).
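The default size check is simple enough to show directly.  This sketch hard-codes the 64-byte and 1518-byte limits from the paragraph above (FCS included, untagged frames); real platforms typically allow 1522 bytes for 802.1Q-tagged frames and let you raise the maximum for jumbo frames, which this sketch ignores.

```python
# Default store-and-forward size limits, in bytes, FCS included (untagged frames).
RUNT_MIN_BYTES = 64
GIANT_MAX_BYTES = 1518


def classify_frame(length: int) -> str:
    """Return 'runt', 'giant' or 'ok' for a fully received frame of `length` bytes."""
    if length < RUNT_MIN_BYTES:
        return "runt"    # e.g. a collision fragment: discarded
    if length > GIANT_MAX_BYTES:
        return "giant"   # larger than the default maximum: discarded
    return "ok"


for length in (60, 64, 1518, 1600):
    print(length, classify_frame(length))
```

A store-and-forward switch can apply this check because it knows the final length before it forwards anything; a cut-through switch has already started transmitting by then.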

Figure 2. Example of Store-and-Forward Switching Mode

Figure 2 shows an example of a switch receiving a frame and validating its integrity. Note that it is first received in its entirety before the next actions are performed.

Cut-Through Switching

An Ethernet switch that uses cut-through switching can make a forwarding decision as soon as it receives the first few bytes of the incoming frame. The switch does not have to wait for the rest of the frame before it starts switching the frame to the outgoing port.

Fragment-Free Mode

Switches operating in this mode must receive and examine the first 64 bytes of the frame before making a forwarding decision. Why exactly 64 bytes? In an Ethernet LAN, collisions are detected within the first 64 bytes of a frame, so waiting for 64 bytes filters out collision fragments. This switching mode is no longer widely used, so we only mention it for reference.

Fast-forward switching (often referred to simply as cut-through)

A cut-through switch can make a forwarding decision as soon as it gets the destination MAC address of the frame, which means it needs only the first 6 bytes. It does not have to wait for the rest of the Ethernet frame to make its forwarding decision. An example of this behavior is shown in Figure 3.

Figure 3. Example of Cut-Through Switching Mode

However, more sophisticated cut-through switches today do not necessarily take this approach. They may parse an incoming frame until they have enough information from the frame content to apply any additional features. For example, if an Access Control List (ACL) is configured on the interface, the switch must receive the frame up to the IP and transport-layer headers (20 bytes for the IPv4 header and 20 bytes for the TCP header, assuming no options) to match that information against the interface access list. Together with the 14-byte Ethernet header, that is 54 bytes before a decision can be made. Quality of service (QoS) policies and other advanced features can have a similar effect.
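The byte counts above can be summarized in a small helper.  It's a sketch under the same assumptions as the text: an untagged 14-byte Ethernet header, IPv4 and TCP headers without options, and an ACL that needs the full TCP header; `bytes_before_decision` and the mode names are hypothetical.

```python
ETH_HEADER = 14   # DMAC (6) + SMAC (6) + EtherType (2), untagged
IPV4_HEADER = 20  # minimum IPv4 header, no options
TCP_HEADER = 20   # minimum TCP header, no options


def bytes_before_decision(mode: str, frame_len: int, acl: bool = False) -> int:
    """How many bytes must arrive before the switch can make its forwarding decision."""
    if mode == "store-and-forward":
        return frame_len                 # the whole frame, FCS included
    if mode == "fragment-free":
        needed = 64
    elif mode == "fast-forward":
        needed = 6                       # just the destination MAC
    else:
        raise ValueError(f"unknown mode: {mode}")
    if acl:
        needed = max(needed, ETH_HEADER + IPV4_HEADER + TCP_HEADER)
    return min(needed, frame_len)


for mode, acl in [("fast-forward", False), ("fast-forward", True),
                  ("fragment-free", False), ("store-and-forward", False)]:
    print(mode, "ACL" if acl else "no ACL", bytes_before_decision(mode, 1518, acl))
```

With an ACL configured, a fast-forward switch ends up waiting for 54 bytes instead of 6, which is still well short of a full 1518-byte frame.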

Unlike store-and-forward switching, cut-through switching does not drop invalid Ethernet frames. They get forwarded to the next nodes until a device along the path detects the invalid FCS and drops the frame.

A primary advantage of this switching approach is that the time the switch takes to start forwarding the frame (referred to as the switch's latency) is much lower than with store-and-forward switching.
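A quick back-of-the-envelope calculation shows the size of that difference.  This sketch only counts the time needed to receive the required bytes at line rate on an assumed 10 Gb/s link; it ignores lookup and pipeline delays, preamble, and inter-frame gap, so treat the numbers as a lower bound on the waiting time, not as measured switch latency.

```python
def wait_ns(nbytes: int, link_bps: float) -> float:
    """Time to receive `nbytes` at `link_bps` bits per second, in nanoseconds."""
    return nbytes * 8 / link_bps * 1e9


LINK_10G = 10e9  # assumed link speed

for label, nbytes in [("fast-forward (DMAC only)", 6),
                      ("cut-through with ACL", 54),
                      ("fragment-free", 64),
                      ("store-and-forward, 1518-byte frame", 1518)]:
    print(f"{label:36s} {wait_ns(nbytes, LINK_10G):8.1f} ns")
```

At 10 Gb/s each byte takes 0.8 ns, so waiting for a full 1518-byte frame costs about 1.2 µs per hop, versus a few nanoseconds to read the destination MAC.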

Configuring and Verifying switching modes

Most modern switch platforms come with cut-through switching mode enabled by default. You can check that using the show switching-mode command.

SW1# show switching-mode
Configured switching mode: Cut through
Module Number    Operational Mode
1                Cut-Through

If you want to enable the store-and-forward mode, you can use the following simple procedure. 

SW1# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
SW1(config)# switching-mode store-forward
SW1(config)# end
SW1# show switching-mode
Configured switching mode: Store and Forward
Module Number    Operational Mode
1                Store and Forward

In Summary

The most important points about the different switching modes are:

  • In store-and-forward mode, switches receive and store the entire frame before making any operational decision. This approach is good for keeping the integrity and validity of the frames but creates additional network latency.
  • In cut-through switching mode, switches receive only a fraction of the frame and immediately start making a forwarding decision. In this approach, switches do not drop invalid frames but forward them to the next node. However, the network latency is lower than with the store-and-forward approach.
