Ethtool principle introduction and troubleshooting ideas for network card packet loss (with ethtool source code download)
2022-07-05 07:08:00 【Enlaihe】
Table of Contents
1. Understand the process of receiving packets
Transferring packets received by the NIC to host memory (NIC and driver interaction)
Notifying the kernel to process packets (driver and Linux kernel interaction)
NIC interrupt handler
4. Packet loss troubleshooting ideas
Appendix A: Common ethtool command options
Appendix B: ethtool - utility for controlling network drivers and hardware
I previously wrote about soft-interrupt packet loss caused by excessive traffic load on LVS network cards, in "RPS and RFS multi-queue NIC performance tuning in practice" [1] (search for the article online). Most people are unlikely to run into that problem unless their load is high. This time I want to share a troubleshooting approach for a more common problem: packet loss on server network cards. Troubleshooting end-to-end packet loss covers a much wider scope; for that, you may want to read the earlier article "How to use MTR to diagnose network problems" [2] first (search for the article online). On Linux, the usual tool for analyzing NIC packet loss is naturally ethtool.
ethtool is used to view and modify the driver parameters and hardware settings of network devices, especially wired Ethernet devices. You can change Ethernet card parameters as needed, including auto-negotiation, speed, duplex, and Wake-on-LAN. Configuring the Ethernet card properly lets the machine communicate effectively over the network. The tool also reports a great deal of information about the Ethernet devices attached to your Linux system.
1. Understand the process of receiving packets
The following analysis comes from the Meituan technical team, with thanks.
Receiving a packet is a complex process that involves many low-level details, but it roughly consists of the following steps:
- The NIC receives a packet.
- The packet is transferred from the NIC's hardware buffer to server memory.
- The kernel is notified to process it.
- The packet is processed layer by layer by the TCP/IP protocol stack.
- The application reads the data from the socket buffer through read().
Transferring packets received by the NIC to host memory (NIC and driver interaction)
After the NIC receives a packet, it first has to hand the data to the kernel. The bridge between the two is the rx ring buffer, a region shared by the NIC and the driver. In fact, the rx ring buffer does not store the packet data itself but descriptors, each of which points to the address where the packet is actually stored. The process is as follows:
- The driver allocates a buffer in memory to receive a packet, called an sk_buff;
- The driver adds the address and size of that buffer (the receive descriptor) to the rx ring buffer. The buffer address in the descriptor is the physical address used for DMA;
- The driver notifies the NIC that a new descriptor is available;
- The NIC fetches the descriptor from the rx ring buffer and so learns the address and size of the buffer;
- The NIC receives a new packet;
- The NIC writes the new packet directly into the sk_buff through DMA.
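The descriptor hand-off above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not driver code: the class name `RxRing`, its methods, and the sizes are hypothetical, and "DMA" is just a copy between Python objects.

```python
from collections import deque


class RxRing:
    """Toy model of an rx ring buffer (hypothetical names, not a real driver API)."""

    def __init__(self, ring_size, fifo_size):
        self.ring_size = ring_size  # maximum number of descriptors in the ring
        self.fifo_size = fifo_size  # packets the NIC can hold internally
        self.free = deque()         # descriptors that point at empty buffers
        self.filled = deque()       # buffers already written by "DMA"
        self.fifo = deque()         # packets waiting inside the NIC
        self.rx_fifo_errors = 0     # packets dropped because the FIFO was full

    def post_buffers(self, count):
        """Driver side: allocate buffers and post their descriptors to the ring."""
        posted = 0
        while posted < count and len(self.free) + len(self.filled) < self.ring_size:
            self.free.append(bytearray())
            posted += 1
        self._dma()
        return posted

    def receive(self, packet):
        """NIC side: accept a packet from the wire, or drop it if the FIFO is full."""
        if len(self.fifo) >= self.fifo_size:
            self.rx_fifo_errors += 1
            return False
        self.fifo.append(packet)
        self._dma()
        return True

    def _dma(self):
        """Move packets from the NIC FIFO into buffers named by free descriptors."""
        while self.fifo and self.free:
            buf = self.free.popleft()
            buf.extend(self.fifo.popleft())
            self.filled.append(buf)

    def consume(self):
        """Driver side: take one filled buffer off the ring, freeing its slot."""
        return bytes(self.filled.popleft()) if self.filled else None
```

If the driver stops calling `post_buffers`, packets first pile up in the small FIFO and are then counted in `rx_fifo_errors`, which mirrors the overflow case the article goes on to describe.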
When the driver cannot keep up with the rate at which the NIC receives packets, it does not allocate buffers in time, so the NIC cannot write received packets into an sk_buff promptly and they pile up. Once the NIC's internal buffer is full, some data is discarded, which causes packet loss. This kind of loss is counted as rx_fifo_errors: it shows up as growth in the fifo field of /proc/net/dev and in the overruns counter of ifconfig.
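As a rough sketch of how the fifo field could be read programmatically, the function below parses text in the /proc/net/dev layout. The function name and the sample text are illustrative assumptions; on a real machine the text would come from reading /proc/net/dev.

```python
# Names of the eight receive columns in /proc/net/dev, in order.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo",
             "frame", "compressed", "multicast")


def parse_proc_net_dev(text):
    """Return {interface: {rx_field: value}} from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:          # the two header lines contain no colon
            continue
        name, data = line.split(":", 1)
        values = data.split()
        if len(values) < len(RX_FIELDS):
            continue
        rx_values = [int(v) for v in values[:len(RX_FIELDS)]]
        stats[name.strip()] = dict(zip(RX_FIELDS, rx_values))
    return stats


# Illustrative sample only; the counter values are made up.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1234      10    0    0    0     0          0         0     1234      10    0    0    0     0       0          0
  eth0:   61846     833    0    0   42     0          0         0     9028     122    0    0    0     0       0          0
"""

if __name__ == "__main__":
    print(parse_proc_net_dev(SAMPLE)["eth0"]["fifo"])
```

Sampling this value twice and comparing the results shows whether the FIFO is still overflowing or the counter is only historical.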
Notifying the kernel to process packets (driver and Linux kernel interaction)
At this point the packet has been transferred into the sk_buff. As mentioned above, this is a buffer that the driver allocated in memory and that was filled through DMA. DMA writes data directly to memory without involving the CPU, which means the kernel does not actually know that new data has arrived in memory. So how does the kernel find out? The answer is an interrupt: the interrupt tells the kernel that new data has arrived and needs further processing.
Interrupts come in two kinds, hard interrupts and soft interrupts, so we first need a brief look at how they differ:
- Hard interrupt: generated by the hardware itself, so it can arrive at any time. When the CPU receives a hard interrupt, it runs the interrupt handler. The handler only does critical work that can be finished quickly; the remaining time-consuming work is deferred until after the interrupt and is done by a soft interrupt. Hard interrupts are also known as the top half.
- Soft interrupt: raised by the interrupt handler of the corresponding hard interrupt. It is usually implemented in code ahead of time and is not random. (Applications can also trigger soft interrupts, but that is unrelated to the NIC receive path discussed in this article.) Soft interrupts are also known as the bottom half.
《Linux Hard interrupt and soft interrupt 》
After the NIC copies the packet into the kernel buffer sk_buff through DMA, it immediately raises a hardware interrupt. When the CPU receives it, it first runs the top half, that is, the NIC's interrupt handler, which is part of the NIC driver. The handler then raises a soft interrupt to enter the bottom half, which starts consuming the data in the sk_buff and hands it to the kernel protocol stack.
《DPDK Network card packet receiving process 》
Interrupts let the system respond quickly to incoming NIC data. With heavy traffic, however, a large number of interrupt requests are generated, the CPU spends most of its time handling interrupts, and efficiency drops sharply. To solve this, the kernel and drivers now process data using NAPI (New API). Its principle can be summarized as interrupt plus polling: under heavy traffic, after one interrupt the driver polls and receives a certain number of packets before returning, which avoids repeated interrupts.
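The saving from interrupt plus polling can be illustrated with a deliberately simplified simulation. It is a sketch under stated assumptions, not how the kernel implements NAPI: time is split into ticks, the function name `count_interrupts` and the `budget` parameter are hypothetical, and polling is assumed to stop as soon as the backlog is empty.

```python
def count_interrupts(arrivals, budget=None):
    """Count interrupts for a list of per-tick packet arrivals.

    budget=None models one interrupt per packet.
    budget=N models interrupt + polling: the first packet raises one
    interrupt, then up to N packets are polled per tick without further
    interrupts until the backlog is empty.
    """
    interrupts = 0
    backlog = 0
    polling = False
    for arrived in arrivals:
        if budget is None:
            interrupts += arrived
            continue
        backlog += arrived
        if arrived and not polling:
            interrupts += 1          # interrupt switches the device to polling
            polling = True
        if polling:
            backlog -= min(backlog, budget)
            if backlog == 0:
                polling = False      # queue drained: interrupts are re-enabled
    return interrupts


if __name__ == "__main__":
    burst = [100, 100, 0, 1]
    print(count_interrupts(burst))             # one interrupt per packet
    print(count_interrupts(burst, budget=64))  # interrupt + polling
```

Under load the polled variant takes a single interrupt for the whole burst, while sparse traffic still gets roughly one interrupt per packet, so latency for light traffic is not sacrificed.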
2. ifconfig output explained
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.135 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::20c:29ff:fe9b:52d3 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:9b:52:d3 txqueuelen 1000 (Ethernet)
RX packets 833 bytes 61846 (60.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 122 bytes 9028 (8.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
(1) RX errors
The total number of receive errors, including too-long-frame errors, ring buffer overflow errors, CRC errors, frame synchronization errors, FIFO overruns, and missed packets.
(2) RX dropped
The packet made it into the ring buffer but was dropped while being copied to memory, because of insufficient memory or other system reasons.
(3) RX overruns
The number of FIFO overruns. They occur when the I/O arriving at the ring buffer (also known as the driver queue) exceeds what the kernel can handle; the ring buffer here is the buffer that sits in front of the IRQ request. An increase in overruns means packets were dropped at the NIC's physical layer before they reached the ring buffer. A CPU that cannot service interrupts in time is one reason the ring buffer fills up. On the problem machine, the loss was caused by unevenly distributed interrupts (all of the load fell on core0) because no interrupt affinity had been set.
(4) RX frame
The number of misaligned frames.
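To watch these four counters from a script, one option is to pull them out of the ifconfig text with a regular expression. The sketch below is an assumption-laden helper (the name `parse_ifconfig_rx` is hypothetical); it accepts both the newer `errors 0` layout and the older `errors:0` layout that appear in this article.

```python
import re

# Matches "errors 0 dropped 0 overruns 0 frame 0" and the older
# "errors:0 dropped:0 overruns:35345 frame:0" layout. The TX line ends in
# "carrier", not "frame", so it is never matched.
_RX_LINE = re.compile(
    r"errors[ :](\d+)\s+dropped[ :](\d+)\s+overruns[ :](\d+)\s+frame[ :](\d+)"
)


def parse_ifconfig_rx(text):
    """Return the RX error counters from ifconfig output, or None if absent."""
    match = _RX_LINE.search(text)
    if match is None:
        return None
    errors, dropped, overruns, frame = (int(g) for g in match.groups())
    return {"errors": errors, "dropped": dropped,
            "overruns": overruns, "frame": frame}


if __name__ == "__main__":
    new_style = "RX errors 0 dropped 0 overruns 0 frame 0"
    old_style = "RX packets:346547657 errors:0 dropped:0 overruns:35345 frame:0"
    print(parse_ifconfig_rx(new_style))
    print(parse_ifconfig_rx(old_style))
```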
3. How the NIC works
If the receive process described above is not detailed enough, the following plain-text explanation may help.
NIC packet reception
A packet on the wire is first picked up by the NIC. The NIC verifies the packet's CRC to guarantee its integrity, then strips the packet header to obtain the frame. It then checks the destination MAC address in the frame and discards the frame if that address differs from the NIC's own MAC address (except in promiscuous mode).
The NIC copies the frame into its internal FIFO buffer and triggers a hardware interrupt. (For NICs with a ring buffer, it seems the frame can first be stored in the ring buffer and a software interrupt triggered afterwards; the next article will explain in detail where a frame goes inside Linux. The ring buffer is shared by the NIC and the driver. It is memory on the device, but visible to the operating system. In the Linux kernel source, NIC drivers allocate this space with kcalloc, so a ring buffer usually has an upper limit, and its size should be the number of frames it can hold, not a byte count. On some systems the ethtool command cannot change the ring parameters to set the ring buffer size; I am not sure why, but perhaps the driver does not support it.)
Through its hard-interrupt handler, the NIC driver builds an sk_buff and copies the frame from the NIC FIFO into the skb in memory, then hands it to the kernel. (A NAPI-capable NIC should place frames directly in the ring buffer and, instead of triggering a hard interrupt for each one, use a soft interrupt to copy the data out of the ring buffer and deliver it straight to the upper layers. In one soft-interrupt pass each NIC can process up to weight frames.)
During this process the NIC chip filters frames by MAC address to reduce system load (except in promiscuous mode).
NIC packet transmission
The NIC driver prepends a 14-byte MAC header to the IP packet to form a frame (without CRC). The frame (without CRC) contains the sender's and the receiver's MAC addresses. Because the driver creates the MAC header, it can fill in any address and can therefore also impersonate another host.
The driver copies the frame (without CRC) into the buffer inside the NIC chip, and the NIC takes over from there.
The NIC chip wraps the incomplete frame (which still lacks the CRC) into a packet that can be sent, adding the header synchronization information and the CRC, and puts it on the wire. One IP packet has now been sent, and every NIC attached to the same cable can see it.
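The 14-byte MAC header and the trailing CRC can be made concrete with a short sketch. The helper names are hypothetical, the destination address and payload are made-up examples, and padding short payloads to the Ethernet minimum is left out. The sketch relies on the Ethernet frame check sequence using the same CRC-32 as `zlib.crc32`, appended least significant byte first.

```python
import struct
import zlib


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Prepend the 14-byte MAC header: 6-byte dst, 6-byte src, 2-byte type."""
    header = dst_mac + src_mac + struct.pack("!H", ethertype)
    return header + payload


def add_fcs(frame):
    """Append the 4-byte CRC, which is the step the NIC performs in hardware."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)


def fcs_ok(frame_with_fcs):
    """Receiver-side check: recompute the CRC and compare with the trailer."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


if __name__ == "__main__":
    src = bytes.fromhex("000c299b52d3")  # the MAC shown in the ifconfig output
    dst = bytes.fromhex("ffffffffffff")  # broadcast, chosen only for illustration
    frame = build_frame(dst, src, 0x0800, b"example IP packet bytes")
    wire = add_fcs(frame)
    print(len(frame) - len(b"example IP packet bytes"), fcs_ok(wire))
```

A frame whose recomputed CRC does not match the trailer is what a receiving NIC counts under rx_crc_errors.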
NIC interrupt handler
Every device that generates interrupts has a corresponding interrupt handler, which is part of the device driver. Each NIC has an interrupt handler that acknowledges to the NIC that the interrupt has been received and copies the packets in the NIC's buffer to memory.
When the NIC receives a packet from the network, it has to notify the kernel that the packet has arrived, so it raises an interrupt immediately. The kernel responds by running the interrupt handler that the NIC registered. The handler acknowledges the hardware, copies the newest packets to memory, and then reads more packets from the NIC.
This work is important, urgent, and hardware-related. The kernel usually needs to copy packets into system memory quickly, because the NIC's receive buffer has a fixed size that is much smaller than system memory. If the copy is delayed, the NIC's FIFO buffer will inevitably overflow: incoming packets fill the NIC's buffer and subsequent packets can only be discarded. This is probably also where the overruns counter in ifconfig comes from.
Once the packets have been copied to system memory, the interrupt's task is done and control returns to the program that was running before the interrupt.
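Because each interrupt is serviced by a particular CPU, it can help to check how a NIC's interrupts are spread across cores. The sketch below sums the per-CPU columns of /proc/interrupts-style text for lines that mention a device name. The function name, the sample text, and the queue names in it are illustrative assumptions; real interrupt names depend on the driver.

```python
def irq_counts_per_cpu(text, device):
    """Sum per-CPU interrupt counts for lines whose description mentions device."""
    lines = text.splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())     # header row: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        if device not in line or ":" not in line:
            continue
        counts = line.split(":", 1)[1].split()[:ncpu]
        if len(counts) == ncpu and all(c.isdigit() for c in counts):
            totals = [t + int(c) for t, c in zip(totals, counts)]
    return totals


# Illustrative sample only; numbers and IRQ names are made up.
SAMPLE = """\
           CPU0       CPU1
 24:     123456          0   PCI-MSI 1048576-edge      eth0-TxRx-0
 25:       5000         10   PCI-MSI 1048577-edge      eth0-TxRx-1
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    print(irq_counts_per_cpu(SAMPLE, "eth0"))
```

A result that is heavily skewed toward one core suggests that interrupt handling is concentrated there.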
Buffer access
The NIC's kernel buffer lives in the host's memory and is controlled by the kernel, whereas the NIC itself has a FIFO buffer or a ring buffer; the two should be distinguished. The FIFO is relatively small, and any data in it is moved into the kernel buffer as quickly as possible.
- The buffer on the NIC belongs neither to kernel space nor to user space. It is a hardware buffer that sits between the NIC and the operating system;
- The kernel buffer is in kernel space, in memory. Kernel code uses it as a buffer for data read from or written to hardware;
- The user buffer is in user space, in memory. User programs use it as a buffer for data read from or written to hardware;
- In addition, to speed up data exchange, a kernel buffer can be mapped into user space so that kernel code and user programs can access the same region at the same time.
For NICs with a ring buffer, the ring buffer is shared by the driver and the NIC, so the kernel can access it directly. Usually the kernel copies frames into its own kernel space for processing (delivering them to the upper-layer protocols; after that each skb is passed along by pointer until the user obtains the data). For ring-buffer NICs, therefore, the bulk of the copying happens when frames are moved from the ring buffer into kernel-controlled host memory.
4. Packet loss troubleshooting ideas
The NIC works at the data link layer, which performs some verification and packs data into frames. We can check for verification errors to determine whether there is a transmission problem, and then check at the software level whether packets are being lost because a buffer is too small.
Check the hardware first
Suppose a machine frequently raises packet-loss alarms. First check whether anything is wrong at the lower layers:
(1) Check whether the link mode is normal
# ethtool eth0 | egrep 'Speed|Duplex'
Speed: 1000Mb/s
Duplex: Full
(2) Check whether the CRC check is normal
# ethtool -S eth0 | grep crc
rx_crc_errors: 0
If Speed, Duplex, and CRC all look fine, physical interference can basically be ruled out.
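The same check can be scripted by parsing the `Key: value` lines that ethtool prints. This is a minimal sketch: the function names and the `expected_speed` default are assumptions for illustration, and the sample text simply repeats the two fields shown above.

```python
def parse_ethtool_settings(text):
    """Turn 'Key: value' lines from `ethtool <iface>` output into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():    # skips headers such as "Settings for eth0:"
            settings[key.strip()] = value.strip()
    return settings


def link_problems(settings, expected_speed="1000Mb/s"):
    """Return human-readable findings; an empty list means the link looks fine."""
    problems = []
    if settings.get("Speed") != expected_speed:
        problems.append(f"Speed is {settings.get('Speed')}, expected {expected_speed}")
    if settings.get("Duplex") != "Full":
        problems.append(f"Duplex is {settings.get('Duplex')}, expected Full")
    return problems


if __name__ == "__main__":
    sample = "Settings for eth0:\n\tSpeed: 1000Mb/s\n\tDuplex: Full\n"
    print(link_problems(parse_ethtool_settings(sample)))
```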
overruns and buffer size
for i in `seq 1 100`; do ifconfig eth2 | grep RX | grep overruns; sleep 1; done
RX packets:346547657 errors:0 dropped:0 overruns:35345 frame:0
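What matters in the loop above is whether the overruns counter keeps growing between samples, not its absolute value. A small sketch of that comparison follows; the function name is hypothetical and the second and third sample lines are made-up values for illustration.

```python
import re


def overrun_deltas(lines):
    """Given ifconfig RX lines sampled over time, return the growth per interval."""
    counts = []
    for line in lines:
        match = re.search(r"overruns[ :](\d+)", line)
        if match:
            counts.append(int(match.group(1)))
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    samples = [
        "RX packets:346547657 errors:0 dropped:0 overruns:35345 frame:0",
        "RX packets:346549000 errors:0 dropped:0 overruns:35345 frame:0",  # made up
        "RX packets:346552000 errors:0 dropped:0 overruns:35400 frame:0",  # made up
    ]
    print(overrun_deltas(samples))
```

A run of zeros means the counter is historical; positive deltas mean packets are still being dropped and the ring size is worth examining.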
-g, --show-ring: Queries the specified ethernet device for rx/tx ring parameter information.
-G, --set-ring: Changes the rx/tx ring parameters of the specified ethernet device.
ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256
ethtool -G eth0 rx 2048
ethtool -G eth0 tx 2048
ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 2048
RX Mini: 0
RX Jumbo: 0
TX: 2048
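To decide whether there is room to enlarge the rings, the maximums and the current settings can be compared automatically. The sketch below parses text in the `ethtool -g` layout shown above; the function name and the returned dictionary keys (`max`, `current`) are assumptions made for this example.

```python
def parse_ring_params(text):
    """Split `ethtool -g` output into {'max': {...}, 'current': {...}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            current = sections.setdefault("max", {})
        elif line.startswith("Current hardware settings"):
            current = sections.setdefault("current", {})
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if value.strip().isdigit():
                current[key.strip()] = int(value)
    return sections


# Sample copied from the first `ethtool -g eth0` output in this article.
SAMPLE = """\
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256
"""

if __name__ == "__main__":
    rings = parse_ring_params(SAMPLE)
    print(rings["max"]["RX"] - rings["current"]["RX"])  # RX headroom in descriptors
```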
Red Hat official solution
Issue
Why is rx_crc_errors incrementing in the receive counters of the ethtool -S output?
$ ethtool -S <Interface_name> | grep -i error
rx_error_bytes: 0
tx_error_bytes: 0
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 9244
rx_align_errors: 0
Resolution
- Change the cable.
- Check switch configuration.
- Change the network interface card.
Root Cause
- Most of the time, an incrementing value of rx_crc_errors means the problem is in Layer 1 of the networking model.
- When a packet is received at the interface, it goes through a data integrity check called a cyclic redundancy check. If the packet fails that check, it is counted under rx_crc_errors.
- The switch was forcing the NIC to operate in half-duplex mode. Fixing the switch to tell the NIC to operate in full-duplex mode resolved the issue.
Diagnostic Steps
Check the ethtool -S output and find where the drops and errors are.
$ ethtool -S <Interface_name> | grep -i error
rx_error_bytes: 0
tx_error_bytes: 0
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 9244 >>>>>>
rx_align_errors: 0
Check the numbers corresponding to rx_crc_errors.
ethtool p1p1
Settings for p1p1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseT/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: 10000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
This shows the interface type, link mode, speed, and other information for p1p1, and whether a cable is currently connected (for a twisted-pair cable, Supported ports shows TP; for optical fiber, it shows FIBRE). The three key fields here are:
Supported ports: [FIBRE]
Speed: 10000Mb/s
Link detected: yes
ethtool -S p1p1 | grep -i error
rx_errors: 0
tx_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
rx_length_errors: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_csum_offload_errors: 0
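With this many counters, it is easier to let a script pick out the ones that are not zero. The sketch below does that for `ethtool -S` style text; the function name is hypothetical, and the sample reuses the counters from the Red Hat example earlier in the article.

```python
def nonzero_error_counters(text):
    """Return {counter_name: value} for error counters greater than zero."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        tokens = value.split()       # tolerate trailing markers after the number
        if not sep or not tokens or not tokens[0].isdigit():
            continue
        if "error" in name.lower() and int(tokens[0]) > 0:
            result[name.strip()] = int(tokens[0])
    return result


SAMPLE = """\
rx_error_bytes: 0
tx_error_bytes: 0
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 9244
rx_align_errors: 0
"""

if __name__ == "__main__":
    print(nonzero_error_counters(SAMPLE))
```

Counter names differ between drivers, so the substring match on "error" is only a starting point; some drivers report drops under names such as missed or discard counters.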
ethtool -p <Interface_name>
ethtool -p eth0
ethtool -i p1p1
driver: ixgbe
version: 5.1.0-k-rh7.6
firmware-version: 0x80000960, 18.3.6
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
ethtool -s eth0 speed 100
Appendix A: Common ethtool command options
Option | Description |
-a | Shows the pause (flow control) parameters of the NIC: whether Autonegotiate, RX, and TX are on or off. |
-A | Changes the pause (flow control) parameters of the NIC: turns Autonegotiate, RX, and TX on or off. |
-c | Displays the coalescing (interrupt moderation) settings of the specified Ethernet card. |
-C | Changes the coalescing settings of the specified Ethernet card. |
-g | Displays the rx/tx ring parameters of the specified Ethernet card. |
-G | Changes the rx/tx ring parameters of the specified Ethernet card. |
-i | Displays NIC driver information, such as the driver name and version. |
-d | Displays a register dump. Some NIC drivers do not support this option. |
-e | Displays an EEPROM dump. Some NIC drivers do not support this option. |
-E | Modifies bytes in the NIC EEPROM. |
-k | Displays the state (on or off) of the NIC offload parameters, including rx-checksumming, tx-checksumming, and so on. |
-K | Changes the state of the NIC offload parameters. |
-p | Identifies the physical location of the NIC that corresponds to ethX, usually by making the LED on the NIC port blink. An optional argument N gives the blink duration in seconds. |
-r | Restarts auto-negotiation if auto-negotiation is on. |
-s | Changes some NIC settings, including speed, half/full duplex mode, and auto-negotiation. Changes take effect only when the -s option is given. |
-S | Displays NIC- and driver-specific statistics, such as bytes received/sent and broadcast packets received/sent. |
-t | Makes the NIC run a self-test. There are two modes: offline and online. |
Appendix B: ethtool - utility for controlling network drivers and hardware
Introduction
ethtool is the standard Linux utility for controlling network drivers and hardware, particularly for wired Ethernet devices. It can be used to:
- Get identification and diagnostic information
- Get extended device statistics
- Control speed, duplex, autonegotiation and flow control for Ethernet devices
- Control checksum offload and other hardware offload features
- Control DMA ring sizes and interrupt moderation
- Control receive queue selection for multiqueue devices
- Upgrade firmware in flash memory
Most features are dependent on support in the specific driver. See the manual page for full information.
Bugs
Bug reports should be sent to the maintainer, Michal Kubecek, and to the netdev mailing list.
Development
See the development page.
Download
Older versions are available in the Sourceforge 'gkernel' project.