By the System Operations SIG
In the article "AK47 Invincible: All Memory Leaks Wiped Out", we shared methods and tools for troubleshooting slab memory leaks. This time we share a case of a "memory leak" that is better hidden and harder to track down.
1. Symptoms
The customer received a system alert: on some nodes of a K8S cluster, used memory kept rising. According to top, no process was using much memory. Free memory was running low, yet no consumer of that memory could be found. The memory had mysteriously disappeared, and we needed to find out where it had gone.
Running top and sorting the output by memory showed that the largest processes used only about 800M each, which comes nowhere near the 9G reported as used.
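As a rough cross-check of what top shows, the resident memory of all processes can be summed from procfs. The sketch below is only an illustration: the function name sum_rss_kib is hypothetical, and summing VmRSS counts shared pages more than once, so the result is an upper bound on process memory rather than an exact figure.

```python
import os


def sum_rss_kib(proc_root="/proc"):
    """Sum the VmRSS value (in KiB) of every numeric PID directory under proc_root."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])  # the value is reported in kB
                        break
        except OSError:
            # The process exited during the scan or the file is not readable.
            continue
    return total
```

If the sum returned for the affected node is far below the used figure, as it was here, the missing memory is probably not held by user processes.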
2. Problem analysis
2.1 Where did the memory go?
Before analyzing the specific problem, let's first classify system memory, which makes it easier to spot where usage is abnormal. By the nature of its use, memory can be roughly divided into application memory and kernel memory. These two kinds of usage plus free memory should add up to something close to total memory. This classification quickly narrows down where the problem lies.
In this classification, allocpage refers to memory requested directly from the buddy system through APIs such as __get_free_pages/alloc_pages (it does not include slab or vmalloc).
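The classification above can be approximated from /proc/meminfo. The following is a minimal sketch, not the method SysOM uses: the function names are hypothetical, the choice of fields is our assumption, and because several meminfo values overlap or are missing on some kernels, the remainder is only a rough estimate of memory that no counter accounts for (which includes allocpage memory).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (KiB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def unaccounted_kib(info):
    """Estimate memory that is neither free nor attributed to a known category."""
    # Application memory: anonymous pages plus page cache (Cached already includes Shmem).
    app = info.get("AnonPages", 0) + info.get("Cached", 0) + info.get("Buffers", 0)
    # Kernel memory that does have counters.
    kernel = (info.get("Slab", 0) + info.get("KernelStack", 0)
              + info.get("PageTables", 0) + info.get("VmallocUsed", 0))
    return info["MemTotal"] - info.get("MemFree", 0) - app - kernel
```

Feeding the contents of /proc/meminfo from the affected node into parse_meminfo and then unaccounted_kib gives a first impression of how much memory falls outside the usual counters. A remainder of several gigabytes points toward direct page allocations.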
2.1.1 Memory analysis
By calculating application memory and kernel memory separately according to the memory map, you can tell which part is abnormal. However, computing these metrics is tedious, and many of the values overlap. To address this pain point, the memory dashboard of the SysOM operations platform presents memory usage visually and directly reports whether a memory leak exists. In this case, SysOM showed directly that allocpage was leaking, with usage close to 6G.
2.1.2 allocpage memory
Since allocpage memory is what is taking up too much, can we check its usage directly through file nodes in sysfs or procfs? Unfortunately not. This memory is requested by the kernel or a driver calling __get_free_page/alloc_pages directly to obtain one or more contiguous pages from the buddy system, and there is no system-level interface for querying the details of its usage. When this kind of memory leaks, memory appears to "vanish into thin air": the leak is hard to notice, and its cause is hard to investigate. SysOM addresses this difficulty by covering both the accounting of such memory and the diagnosis of its cause.
So the next step is to use SysAK, the diagnostic tool of SysOM, to dynamically capture how this kind of memory is being used.
2.2 Troubleshooting allocpage memory
2.2.1 Dynamic diagnosis
For kernel memory leaks, we can use SysAK to trace allocations dynamically. Start the following command and wait 10 minutes.
sysak memleak -t page -i 600
The diagnosis showed that within 10 minutes, memory allocated by the receive_mergeable function went unreleased 4919 times, amounting to about 300M. Based on this result, we need to go through the code to confirm whether the allocation and release logic around receive_mergeable is correct.
2.2.2 Allocation and release summary
1) Each call to page_to_skb allocates an skb with a 128-byte linear data area.
2) The data area is obtained by calling alloc_pages_node, which requests 32K of memory (order=3) from the buddy system at a time.
3) Every skb holds a reference on the 32K head page, so the 32K is released back to the buddy system only after all of those skbs have been freed.
4) receive_mergeable is responsible for allocating the memory but not for releasing it. The head page's reference count is decremented only when the application reads the data from the socket's receive queue (Recv-Q), and the page is released back to the buddy system when its reference count reaches 0.
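The four points above can be illustrated with a toy model. This is not kernel code: the class and function names are hypothetical, and the model only mimics the reference-counting behavior described here, where a 32K buffer stays pinned as long as a single skb carved from it remains unread.

```python
class HeadPage:
    """Toy model of one order-3 (32 KiB) buffer obtained from the buddy system."""

    SIZE = 32 * 1024

    def __init__(self):
        self.refs = 0
        self.freed = False

    def get(self):
        """An skb built on this buffer takes a reference."""
        self.refs += 1

    def put(self):
        """An skb is consumed; the buffer is freed only when the last reference drops."""
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


def pinned_bytes(num_pages, skbs_per_page, consumed_per_page):
    """Return the bytes still pinned after the application consumed some skbs per buffer."""
    pinned = 0
    for _ in range(num_pages):
        page = HeadPage()
        for _ in range(skbs_per_page):
            page.get()
        for _ in range(consumed_per_page):
            page.put()
        if not page.freed:
            pinned += HeadPage.SIZE
    return pinned
```

In this model, pinned_bytes(10, 4, 3) is the same as pinned_bytes(10, 4, 0): leaving one skb per buffer unread pins as much memory as leaving all of them unread, which is why a slow consumer can hold far more memory than the payload it has not yet read.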
When the application consumes data slowly, memory requested by receive_mergeable may not be released in time, and in the worst case a single skb pins 32K of memory. Use sysak skcheck to check what remains in the sockets' receive and send queues.
The output shows that only the nginx process has leftover data in its receive queue: the Recv-Q of socket fd=11 holds close to 3M of unread data. After process 146935 was killed directly, system memory returned to normal. So the root cause of the problem is that nginx was not reading its data in time.
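Where sysak skcheck is not available, a similar first look at receive-queue backlog can be taken from /proc/net/tcp, whose fifth column holds tx_queue:rx_queue in hexadecimal. The sketch below is an assumption-laden stand-in and not how skcheck works: the function name is hypothetical, it covers IPv4 TCP only, and it sees only the network namespace the file was read from.

```python
def sockets_with_backlog(proc_net_tcp_text, min_rx_bytes=1):
    """List sockets from /proc/net/tcp text whose receive queue holds unread bytes."""
    results = []
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with an index such as "0:"; the header row does not.
        if len(fields) < 10 or not fields[0].rstrip(":").isdigit():
            continue
        tx_hex, rx_hex = fields[4].split(":")
        rx_queue = int(rx_hex, 16)
        if rx_queue >= min_rx_bytes:
            results.append({
                "local": fields[1],
                "remote": fields[2],
                "tx_queue": int(tx_hex, 16),
                "rx_queue": rx_queue,
                "inode": int(fields[9]),
            })
    # Largest backlog first.
    return sorted(results, key=lambda r: r["rx_queue"], reverse=True)
```

The inode in each result can then be matched against the socket:[inode] links under /proc/<pid>/fd to find the owning process. Note that rx_queue counts unread payload bytes, which can be much smaller than the memory the queued skbs actually pin.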
3. Conclusion
After communicating with the business team, we confirmed that a business configuration issue left one nginx thread not processing its data. As a result, memory allocated by the NIC driver was not released in time, and because allocpage memory is not counted in any statistics, the memory appeared to vanish into thin air.
Verifying the conclusion
Was there really data left in the receive queue? To check, we used the files command of the crash tool to find the sock that corresponds to the fd:
socket = file->private_data
sock = socket->sk
Across repeated observations, the skbs on sk_receive_queue did not change for a long time. This confirms that nginx was not processing the skbs on its receive queue in time, so the memory allocated in the NIC driver was not released.
4. A puzzle about the memory leak
One thing was very confusing during troubleshooting: according to sockstat and slabtop, both tcp mem and skbuff_head_cache usage looked normal, which further concealed the memory held by the network stack.
tcp mem = 32204*4K=125M
The number of skbs was between 15,000 and 30,000.
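The tcp mem figure above is a page count, which is why it is multiplied by 4K. As a small sketch of that conversion, the function below reads the TCP line of /proc/net/sockstat-style text. The function name is hypothetical and a 4K page size is assumed.

```python
def tcp_mem_bytes(sockstat_text, page_size=4096):
    """Convert the 'mem' page count on the TCP line of /proc/net/sockstat to bytes."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            tokens = line.split()[1:]
            # Tokens alternate between a name and its integer value.
            stats = dict(zip(tokens[0::2], map(int, tokens[1::2])))
            return stats["mem"] * page_size
    raise ValueError("no TCP line found in sockstat text")
```

With the value from this case, mem 32204 comes to roughly 125M, far below the several gigabytes that were missing, which is what made this counter misleading.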
According to the earlier analysis, one skb occupies 32K of memory in the worst case, so 20,000 skbs would take at most about 600M. How could usage reach several G? Was the analysis wrong? As shown in the figure below, an skb may also carry several frag pages in its nonlinear area, and each frag page may itself be a compound page.
Reading skb memory with crash showed that some skbs had 17 frag pages while carrying only 10 bytes of data.
Further analysis showed that the frag pages have order 3, which means one frag page occupies 32K of memory.
In the extreme case, one skb may occupy (1+17)*8 = 144 pages. The slabinfo output in the figure above shows 15033 active objects in skbuff_head_cache, so the theoretical maximum total memory is 144*15033*4K = 8.2G. The 6G consumed in the scenario we encountered is therefore entirely possible.
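The two estimates in this section can be reproduced with a few lines of arithmetic. The function names below are hypothetical. The inputs are the figures quoted in the text (order-3 buffers, 4K pages, 17 frag pages, 15033 active skbs), and the model assumes every skb pins a full buffer for its head and for each frag.

```python
def head_only_mib(skb_count, order=3, page_kib=4):
    """Worst case when each skb pins only its own head buffer of 2**order pages."""
    return skb_count * (2 ** order) * page_kib / 1024


def worst_case_pages_per_skb(frags_per_skb, order=3):
    """Pages pinned by one skb: one head buffer plus one buffer per frag page."""
    return (1 + frags_per_skb) * (2 ** order)


def worst_case_total_gib(skb_count, frags_per_skb, order=3, page_kib=4):
    """Theoretical maximum memory pinned by skb_count skbs, in GiB."""
    pages = worst_case_pages_per_skb(frags_per_skb, order) * skb_count
    return pages * page_kib / (1024 ** 2)


print(head_only_mib(20000))
print(worst_case_pages_per_skb(17))
print(worst_case_total_gib(15033, 17))
```

The first value corresponds to the roughly 600M head-only estimate, and the last to the theoretical maximum of a little over 8G, which leaves ample room for the 6G actually observed.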
Original site copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207071347276447.html