What is a Class 3 Discard?
A Class 3 Discard occurs when a switch drops a Fibre Channel frame. The switch records that it dropped the frame, but no notification is provided to either the sender or the receiver. Higher-level functions on the host or storage array are typically designed to recover from the missing frame.
What is a Class of Service?
Fibre Channel defines several classes of service, each of which specifies the characteristics of frame delivery. Class 3 is the most common class in enterprise SANs. It is a connectionless service that does not guarantee frame delivery, and so it avoids the acknowledgment (hand-shaking) overhead that would be needed to confirm that each frame arrives.
Why are Class 3 Discards a Problem?
If a Fibre Channel frame is dropped by a switch, no notification is provided to either the sender or the receiver. Higher-level functions on the host or storage array must recover from the loss, and that recovery can be disruptive to the environment. Dropped frames have several side effects. Completing a single failed data request can require repeating the full exchange several times, which increases link utilization and significantly delays completion of the request. A high volume of Class 3 Discards may also cause a switch to perform a Link Reset in order to re-negotiate the number of Buffer-to-Buffer Credits it has available. Finally, leaving the root cause of a Class 3 Discard unresolved leaves the SAN more likely to suffer significant service degradation or an outage when further problems compound it.
Required to identify:
Network Switch Probe (software only). SAN Performance Probe hardware may be required to resolve the problem.
What are Common Causes of Class 3 Discards?
A Class 3 Discard occurs when the switch receives a frame and is unable to pass it along to the next point on the path to its destination. Class 3 Discards can happen for several reasons:
- The outbound link is being reset and therefore cannot accept any frames for transmission (including during server reboots)
- The source does not have authority to talk to the destination, typically due to zoning restrictions
- The destination no longer exists
- The switch does not have sufficient Buffer-to-Buffer Credits for the outbound link (as with an overloaded target or a congested ISL). Class 3 Discards caused by credit shortages indicate serious performance problems in the SAN fabric.
A cyclical pattern of Class 3 Discards usually indicates a host or target that is trying to communicate with something that is no longer part of the fabric, or that it is no longer zoned to reach. For example, a server may still have a device mounted that it can no longer reach, or a storage array may be attempting to communicate with an old control host. In both cases, the Class 3 Discards are expected to appear on the host or target port.
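One way to test for a cyclical pattern, assuming the discard events for a port have been exported as timestamps in seconds (the export format is an assumption, not a documented Network Switch Probe feature), is to check whether the gaps between events are roughly constant. The minimal sketch below uses the coefficient of variation of those gaps; `looks_cyclical`, `max_cv`, and `min_events` are hypothetical names and tuning values, not documented thresholds.

```python
from statistics import mean, pstdev


def looks_cyclical(timestamps, max_cv=0.1, min_events=4):
    """Return True if discard events recur at a roughly fixed interval.

    timestamps: event times in seconds (hypothetical export format).
    max_cv, min_events: illustrative tuning values, not documented thresholds.
    """
    if len(timestamps) < min_events:
        return False  # too few events to call anything a pattern
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    avg_gap = mean(gaps)
    if avg_gap <= 0:
        return False  # all events share one timestamp; no interval to measure
    # Coefficient of variation: spread of the gaps relative to their mean.
    return pstdev(gaps) / avg_gap <= max_cv


if __name__ == "__main__":
    print(looks_cyclical([0, 60, 120, 180, 241]))  # True: roughly every minute
    print(looks_cyclical([0, 5, 100, 110, 400]))   # False: irregular bursts
```

A positive result only flags the pattern; identifying the unreachable or un-zoned device still requires the zoning checks described later in this document.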
How to Spot a Class 3 Discard
The Network Switch Probe tracks the number of Class 3 Discard events that have occurred. These counts can be viewed in the VI - Health - Physical Layer report.
For example, a Top X Trend Chart can be configured to show Storage Port Discards.
Correlating Class 3 Discards with Other Events
Class 3 Discards often coincide with other events. During a server reboot, for example, Loss of Sync, Loss of Signal, and Link Failure / Link Reset events may occur. Each time the server loses its link, any exchanges still in flight are dropped (usually while traversing an ISL), and these drops are typically reported as Class 3 Discards.
How to Resolve Class 3 Discards
Compare to Link Resets
The first step is to determine whether a Link Reset occurred immediately before the Class 3 Discard. If so, the Link Reset very likely caused the discard (possibly as part of a server reboot), and the investigation should be refocused on finding the cause of the Link Reset. Once the Link Reset is resolved, the Class 3 Discard will likely be resolved as well. If the Class 3 Discard is not immediately preceded by a Link Reset, it may be caused by a zoning or Buffer-to-Buffer Credit issue. For example, a Top X Trend Chart can be configured to show HBA Port Link Resets in the VI - Health - Physical Layer report.
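The Link Reset comparison can be sketched as a time-window check, assuming discard and Link Reset events have been exported with timestamps in seconds. Because the reset may occur on a host port while the discard is counted on an ISL, the caller should pass reset times from every port on the affected path. The function name, the 5-second window standing in for "immediately before", and the returned strings are illustrative assumptions.

```python
from __future__ import annotations

from collections.abc import Iterable


def triage_discard(
    discard_time: float,
    link_reset_times: Iterable[float],
    window_s: float = 5.0,
) -> str:
    """Suggest the next investigation step for one Class 3 Discard.

    Times are in seconds. window_s is a hypothetical value for
    "immediately before"; tune it to the sampling interval of your data.
    """
    # Only resets at or shortly before the discard count; later resets cannot be the cause.
    if any(0.0 <= discard_time - t <= window_s for t in link_reset_times):
        return "find the cause of the link reset"
    return "check zoning and buffer-to-buffer credits"
```

With interval-based counters rather than individual events, the interval start time can stand in for the event time, with the window widened to at least one sampling interval.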
Check for Zoning Problems
A zoning problem may occur with a request from either a host or a target. A constant stream of Class 3 Discards over time is often the result of a zoning issue on the host side or on an ISL. The first step is to determine whether the requestor is a host or a target.
Host Zoning
If the requestor is a host, compare the host's requests to the zoning table. If the host is requesting a device it is not zoned to, fix the invalid zoning; this is likely to resolve the Class 3 Discard problem. If the host's requests are valid, however, move on to examining the target.
Target Zoning
Note: Use the SAN Performance Probe to determine which devices the target is trying to reach. Compare these requests to the zoning table. If the target is requesting a device it is not zoned to, fix the invalid zoning; this is likely to resolve the Class 3 Discard problem. If the target's requests are valid, however, the problem is likely caused by a lack of Buffer-to-Buffer Credits.
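The same comparison applies to host and target requests: each observed (source, destination) pair is valid only if some zone contains both ports. The minimal sketch below assumes requests and zones have already been exported as plain port identifiers (for example WWPNs); the input format and the `unzoned_requests` name are hypothetical, and the simple membership model ignores aliases and other features of real zoning configurations.

```python
def unzoned_requests(requests, zones):
    """Return the (source, destination) pairs that share no zone.

    requests: iterable of (source, destination) port identifiers, e.g. WWPNs
              observed by a probe (hypothetical input format).
    zones: mapping of zone name -> set of member port identifiers.
    """
    invalid = {
        (src, dst)
        for src, dst in requests
        # A request is allowed only if some zone contains both ends.
        if not any(src in members and dst in members for members in zones.values())
    }
    return sorted(invalid)


if __name__ == "__main__":
    zones = {"z_host1_array1": {"host1", "array1"}}
    seen = [("host1", "array1"), ("host1", "array_old"), ("host1", "array_old")]
    print(unzoned_requests(seen, zones))  # [('host1', 'array_old')]
```

For the target check, pass the target's requests with the target port as the source; a storage array still polling an old control host would show up the same way.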
Buffer-to-Buffer Credit Issues
A switch may exhaust its Buffer-to-Buffer Credits for a destination either by sending a higher volume of data than the target can process, or by sending a very large number of small frames (management frames, for example). In fact, if a very large number of Class 3 Discards occur, a switch may initiate a Link Reset in order to re-negotiate the number of Buffer-to-Buffer Credits it has available.
High Utilization
The first step is to check the utilization on the outbound link. If the utilization on the outbound link is very high, the likely problem is an insufficient number of Buffer-to-Buffer Credits.
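Utilization here is the observed throughput divided by the link's nominal capacity in the same direction. A minimal sketch, assuming throughput and capacity are both given in MB/s; the 0.8 cut-off for "very high" is a hypothetical value, since this document does not specify one.

```python
def outbound_utilization(throughput_mb_s: float, capacity_mb_s: float) -> float:
    """Fraction of the outbound link's nominal capacity in use."""
    if capacity_mb_s <= 0:
        raise ValueError("capacity_mb_s must be positive")
    return throughput_mb_s / capacity_mb_s


def utilization_is_high(throughput_mb_s: float, capacity_mb_s: float,
                        threshold: float = 0.8) -> bool:
    # threshold is a hypothetical cut-off for "very high"; tune it for your fabric.
    return outbound_utilization(throughput_mb_s, capacity_mb_s) >= threshold
```

For example, an 8GFC link (nominally about 800 MB/s per direction) carrying 720 MB/s is at 90% utilization and would be flagged with the default threshold.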
High Volume of Small Frames
If the outbound link utilization is not very high, the next step is to determine whether a large number of very small frames are being sent. To do this, compare the amount of data being sent (in MB per second) with the number of frames being sent (in frames per second) on the outbound link; dividing the first by the second gives the average frame size. If this comparison shows that a large number of very small frames are being sent, the likely problem is an insufficient number of Buffer-to-Buffer Credits.
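The MB/s-to-frames/s comparison reduces to an average frame size. A minimal sketch, assuming both rates come from the same link and sampling interval and that 1 MB = 1,000,000 bytes; the 512-byte cut-off for "small" is a hypothetical value, chosen only because it is well below the 2112-byte maximum payload of a full-size Fibre Channel frame.

```python
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def average_frame_bytes(mb_per_s: float, frames_per_s: float) -> float:
    """Average bytes per frame on a link over the same sampling interval."""
    if frames_per_s <= 0:
        raise ValueError("frames_per_s must be positive")
    return mb_per_s * BYTES_PER_MB / frames_per_s


def mostly_small_frames(mb_per_s: float, frames_per_s: float,
                        small_bytes: float = 512) -> bool:
    # small_bytes is a hypothetical cut-off, not a documented threshold.
    return average_frame_bytes(mb_per_s, frames_per_s) < small_bytes
```

For example, 10 MB/s carried in 200,000 frames/s averages 50 bytes per frame, which points to a stream of small frames, whereas 200 MB/s in 100,000 frames/s averages 2,000 bytes, close to full-size frames.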
Other Causes
Other potential causes exist, but they are generally too complex to diagnose within the scope of this document. One example is Class 3 Discards on a path that spans an FCIP link, where the WAN link drops but the Fibre Channel switch is not aware of the FCIP outage. In such cases, we recommend engaging Virtual Instruments Professional Services for further troubleshooting support.
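The resolution steps in this section run in a fixed order, and each later check applies only if the earlier ones come up clean. A minimal sketch of that ordering, assuming each check has already been reduced to a yes/no finding; the `Findings` fields and the returned strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Results of the individual checks (hypothetical structure)."""
    link_reset_before_discard: bool
    host_requests_unzoned: bool
    target_requests_unzoned: bool
    outbound_utilization_high: bool
    many_small_frames: bool


def next_step(f: Findings) -> str:
    """Apply the checks in the order this document recommends."""
    if f.link_reset_before_discard:
        return "find the cause of the link reset"
    if f.host_requests_unzoned:
        return "fix host zoning"
    if f.target_requests_unzoned:
        return "fix target zoning"
    # Zoning is valid, so a credit shortage is the likely cause.
    if f.outbound_utilization_high or f.many_small_frames:
        return "address insufficient buffer-to-buffer credits"
    # Nothing matched, e.g. an FCIP path whose WAN link drops unnoticed.
    return "investigate other causes / engage Professional Services"
```

Ordering matters: a host zoning fault is reported even if utilization is also high, because the earlier check is resolved first.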