VirtualWisdom Health - Buffer-to-Buffer Credit Depletion

badge_man · September 8, 2020, 5:59pm

What are Buffer-to-Buffer Credits?

A Buffer-to-Buffer Credit is part of the default link-based flow control mechanism for Class 3 service. These “credits” are used to start and stop a transmitting device in order to keep the receiving device from overflowing its buffer, while still maintaining maximum throughput. The number of credits available at any time is the number of receive buffer “slots” available (initially set during FC Login). As each frame is transmitted, the transmitting node decrements the number of credits remaining. When the transmitting node receives an R_RDY character from the receiver, indicating that a frame has been processed, the transmitting node increments the number of credits remaining. If the remaining credits count reaches zero, the transmitting node stops sending and waits for an R_RDY character.
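The credit accounting described above can be sketched in a few lines of Python. This is a toy model for illustration only, not any vendor's implementation; the class name CreditCounter and its method names are hypothetical.

```python
class CreditCounter:
    """Toy model of a transmitter's buffer-to-buffer credit counter."""

    def __init__(self, negotiated_credits):
        # Number of receive buffers advertised at login (hypothetical name).
        self.max_credits = negotiated_credits
        self.credits = negotiated_credits

    def can_send(self):
        # Transmission is only allowed while at least one credit remains.
        return self.credits > 0

    def send_frame(self):
        # Each transmitted frame consumes one credit.
        if not self.can_send():
            raise RuntimeError("zero credits: must wait for R_RDY")
        self.credits -= 1

    def receive_r_rdy(self):
        # Each R_RDY returns one credit, capped at the negotiated count.
        self.credits = min(self.credits + 1, self.max_credits)


tx = CreditCounter(negotiated_credits=3)
for _ in range(3):
    tx.send_frame()
print(tx.can_send())  # False: all three credits are in use
tx.receive_r_rdy()
print(tx.can_send())  # True: one credit has been returned
```

With three credits, the transmitter stalls after three frames and resumes as soon as a single R_RDY arrives.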

Why is a Lack of Buffer-to-Buffer Credits a Problem?

If a port’s credit counter (the number of Buffer-to-Buffer Credits it has) becomes zero, it cannot transmit data until it receives an R_RDY character. If it remains in this state for long, it will impact throughput and performance.

Required to identify: Network Switch Probe (software only) or SAN Performance Probe (hardware).

What are Common Causes of a Lack of Buffer-to-Buffer Credits?

Mismatch in sending and receiving speed (a 2Gb HBA requesting data from 8Gb Storage, or a 4Gb HBA requesting data from four 4Gb Storage Ports, for example)
CPU Utilization (due to Backup Servers/Jobs, for example)
PCI Bandwidth (4Gb HBA in a 33MHz PCI slot, for example)
Failing Hardware
Firmware Issues (“HP 4Gb Fibre Channel Pass-Thru Module for HP BladeSystem c-Class May Cause HBA Ports to Exhibit Low SAN Bandwidth or Become Unresponsive Under Certain Conditions,” for example)
Large numbers of extremely small frames being sent
A long-distance link
Problems with the optics (faulty, dirty, mismatched or disconnected cables, patch panels or SFP transceivers) which corrupt or otherwise prevent R_RDY characters from being received accurately by the transmitting node
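To see why long-distance links and small frames appear in the list above, it can help to estimate how many credits are needed to keep a link busy for one round trip. The sketch below is a rough approximation under stated assumptions: about 5 µs of propagation delay per km of fibre, a nominal throughput of 800 MB/s for an 8Gb link, and a full-size frame of 2148 bytes. The function name credits_to_fill_link is hypothetical, and the model ignores processing delay at the receiver.

```python
import math

FIBER_DELAY_S_PER_KM = 5e-6  # assumed propagation delay in fibre, ~5 µs per km
FULL_FRAME_BYTES = 2148      # assumed size of a full-size frame on the wire


def credits_to_fill_link(distance_km, throughput_mb_per_s,
                         frame_bytes=FULL_FRAME_BYTES):
    """Estimate the credits needed to keep transmitting for one round trip."""
    # Time to place one frame on the wire at the given throughput.
    serialization_s = frame_bytes / (throughput_mb_per_s * 1e6)
    # Time for a frame to reach the far end and its R_RDY to come back.
    round_trip_s = 2 * distance_km * FIBER_DELAY_S_PER_KM
    return max(1, math.ceil(round_trip_s / serialization_s))


full = credits_to_fill_link(10, 800)
small = credits_to_fill_link(10, 800, frame_bytes=512)
print(full, small)
```

Under these assumptions, the same 10 km link needs several times more credits when the frames are small, because more frames fit into the round-trip time. With too few credits, the transmitter sits at zero credits while waiting for R_RDYs to return.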

How to Spot a Lack of Buffer-to-Buffer Credits

Buffer credit information is represented in different ways, depending on each manufacturer's Fibre Channel switch implementation. For example:

Cisco switches log the number of transitions out of a zero-credit buffer-to-buffer state
Brocade switches log the number of 2.5 µs intervals that are spent in a zero-credit buffer-to-buffer state
McData switches do not provide this information via the SNMP Fibre Alliance MIB
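The difference between the two counter styles matters when turning raw counters into something comparable. The sketch below assumes two successive readings of a cumulative counter taken one polling interval apart; the function names are hypothetical, and this is not how VirtualWisdom itself is documented to compute its metrics.

```python
TICK_S = 2.5e-6  # a Brocade-style counter tick represents 2.5 µs at zero credit


def pct_time_at_zero(prev_count, curr_count, poll_interval_s):
    """Convert a delta of 2.5 µs zero-credit ticks into % of the interval."""
    ticks = curr_count - prev_count
    if ticks < 0:
        # Counter wrapped or was reset between polls; the sample is unusable.
        raise ValueError("counter went backwards; skip this sample")
    return 100.0 * ticks * TICK_S / poll_interval_s


def transitions_per_second(prev_count, curr_count, poll_interval_s):
    """Cisco-style transition counters only yield a rate, not a duration."""
    delta = curr_count - prev_count
    if delta < 0:
        raise ValueError("counter went backwards; skip this sample")
    return delta / poll_interval_s


# 1,200,000 ticks in a 300 s interval is about 3 s at zero credit (roughly 1%).
print(pct_time_at_zero(0, 1_200_000, 300))
print(transitions_per_second(0, 600, 300))
```

A tick counter converts directly into time spent at zero credit, whereas a transition counter says how often the port left the zero-credit state but not how long it stayed there.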

The buffer information from switches is correlated by VirtualWisdom and its probes over time. There are several distinct differences between what can be monitored with the Network Switch Probe vs. the SAN Performance Probe.

Using the Network Switch Probe:

Metrics are only available on the switch transmit port (Host Read, Storage Write)
Metrics are gathered and reported every 5 minutes (assuming the default polling interval) and displayed as a percentage based on the average rate per second over the 5-minute period
Metrics within the 5-minute period could be evenly distributed across the period with minimal impact, or could be in bursts that significantly impact performance
Buffer-to-buffer credit information can be viewed in the Live Reports under Analysis.
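The point about averaging can be made concrete with two hypothetical per-second traces that produce the same 5-minute average. The numbers below are invented for illustration and do not come from a real switch.

```python
def summarize(per_second_pct):
    """Return (interval average, worst single second) for per-second % values."""
    average = sum(per_second_pct) / len(per_second_pct)
    return average, max(per_second_pct)


INTERVAL_S = 300  # default 5-minute polling interval

# Same total zero-credit time, distributed differently (hypothetical data).
even = [2.0] * INTERVAL_S                        # 2% of every second
burst = [100.0] * 6 + [0.0] * (INTERVAL_S - 6)   # six fully stalled seconds

print(summarize(even))   # (2.0, 2.0)
print(summarize(burst))  # (2.0, 100.0)
```

Both traces report 2% for the interval, but only the second one contains six seconds in which the port could not transmit at all. A 5-minute average cannot distinguish the two cases.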

Using the SAN Performance Probe:

Metrics are available on both the transmit and receive channels (limited to the storage link being measured)
Three separate metrics are monitored: Min B2B Credit, Max B2B Credit (max observed), and % time at buffer zero (based on max observed)
Finer granularity is available than with the Network Switch Probe
Buffer-to-buffer credit information for all three metrics can be viewed in the Live Reports under Analysis.

Correlating Buffer-to-Buffer Credits with Other Events; How to Resolve

When performance or throughput problems are observed, close attention should be paid to the buffer-to-buffer credit metrics. For example, under “normal” performance conditions where metrics such as Read ECT, Read Array Latency and Read MB/sec indicate acceptable throughput, the buffer-to-buffer credit metrics are likely to indicate that very little time is being spent in the zero buffer-to-buffer credit state.

However, if increasing demand and response times are being seen (as shown below via Read ECT, Read Array Latency and Read MB/sec), then the buffer-to-buffer credit state should be checked. The Live Report below shows a clear correlation between the periods of high demand, high response time and significant time spent with zero buffer-to-buffer credits:
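Where the underlying samples can be exported, the same kind of correlation can also be checked numerically. The sketch below computes a Pearson correlation coefficient between % time at zero credits and Read ECT. The sample values and variable names are hypothetical, and a high coefficient suggests a relationship without proving the cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical samples taken at the same timestamps.
pct_time_at_zero = [0.1, 0.2, 5.0, 12.0, 0.3]
read_ect_ms = [4, 5, 18, 35, 5]

r = pearson(pct_time_at_zero, read_ect_ms)
print(round(r, 3))
```

A coefficient close to 1 indicates that response time rises and falls together with the time spent at zero credits, which is the pattern to look for in the report.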

In the Live Report below, the Network Switch Probe metric for another set of servers shows a worst-case % of time at zero buffer credits on a switch of 0.1%, which is an acceptable level.

In the SAN Performance Probe example above, the ultimate cause of the situation is a slow-draining HBA port that is blocking the storage port. Contributing factors include a speed mismatch (the HBA at 4Gb while the storage is at 8Gb) and a storage port that is struggling to meet demand. In this case, reducing the HBA Queue Depth setting is likely to help. See the HBA Queue Depth chapter in this guide for further information.