当前位置:网站首页>Five simple ways to troubleshoot with Stace
Five simple ways to troubleshoot with Stace
2022-07-07 17:58:00 【-Flying crane-】
I've always been surprised that few people realize that they can use Strace Everything you do . It has always been one of my first debugging tools , Because it is usually used by me Lunux Available on the system , And it can be used to solve such a wide range of problems .
strace What is it? ?
Strace It is a simple tool for tracking system calls of executable programs . In its simplest form , It can track a binary executable from beginning to end , And output a line of text , It contains system calls , Parameters and the return value of each system call in each process life cycle . And it can do more :
- It can filter based on a specified system call or a set of system calls .
- It can count the number of times a particular system call is used 、 The time spent and the number of successes and errors to analyze the use of system calls .
- It tracks the signals sent to the process .
- It can go through pid Attach to any running process .
If you have used other Unix System , This is similar to “truss". the other one ( More comprehensive ) yes Sun Of Dtrace.
How to use it ?
Only superficial research is done here , Importance comes in no order .
1. Find out which configuration file was read when the program started
Have you ever tried to figure out why some programs don't read the configuration files you think should be read ? Have to solve custom compiled or specific release versions from what you think “ error ” The location of the read configuration table ? The simple way :
$ strace php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
lstat64("/usr/local/lib/php.ini", {
st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
readlink("/usr/local/lib/php.ini", "/usr/local/Zend/etc/php.ini", 4096) = 27
lstat64("/usr/local/Zend/etc/php.ini", {
st_mode=S_IFREG|0664, st_size=40971, ...}) = 0
So this version of PHP from /usr/local/lib/php.ini Read php.ini( But it first tries /usr/local/bin). If I only care about specific system calls , Then the more complicated method :
$ strace -e open php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
This same method can be applied to many other situations . Install multiple versions in different paths , And want to know exactly which version is loaded ? wait .
2. Why does this program not open my file ?
I once encountered a program , It silently refuses to read files that it does not have read permission , But you only find out after swearing for a long time , Because you once thought it didn't actually find this file . ok , You already know what to do :
$ strace -e open,access 2>&1 | grep your-filename
Find a open() or access() Failure of system call .
3. What is the process doing now ?
Once a process suddenly monopolized a large number CPU Do you ? Or a process seems to hang ? Then you find pid, And then do it :
[email protected]:~# strace -p 15427
Process 15427 attached - interrupt to quit
futex(0x402f4900, FUTEX_WAIT, 2, NULL
Process 15427 detached
ah . So in this case , It hangs on a call futex() in . By the way , under these circumstances , It doesn't tell us more —— Hang on futex It can be caused by many things ( Pay special effect is a kind of in linux The locking mechanism in the kernel ). The above is from normal work but idle Apache The subprocess is waiting to be processed by a request . however "trace -p" Very useful , Because it eliminates many guesses , And eliminate the need to restart the application to get more print logs ( Or recompile it ).
4. Where is the time spent ?
You can always recompile your application with analysis enabled , And get accurate information , Especially about what parts of your own code will spend time on what you should do . But usually it can quickly strace Attach to the process to see where it currently spends time , In particular, diagnosing problems is very useful . yes 90% Of CPU The usage rate is because it is actually doing real work , Or because something got out of control . This is what you have to do :
[email protected]:~# strace -c -p 11084
Process 11084 attached - interrupt to quit
Process 11084 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
94.59 0.001014 48 21 select
2.89 0.000031 1 21 getppid
2.52 0.000027 1 21 time
------ ----------- ----------- --------- --------- ----------------
100.00 0.001072 63 total
[email protected]:~#
Use -c -p start-up strace after , You just have to wait , And then use ctrl-c sign out .Strace Performance analysis data will be sent . under these circumstances , It is a free Postgres“postmaster” process , Most of its time is spent select() Waiting quietly . under these circumstances , It's in each select() Call between calls getppid() and time(), This is a fairly standard event cycle . You can also use “ls” function “ From a to Z ”:
[email protected]:~# strace -c >/dev/null ls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
23.62 0.000205 103 2 getdents64
18.78 0.000163 15 11 1 open
15.09 0.000131 19 7 read
12.79 0.000111 7 16 old_mmap
7.03 0.000061 6 11 close
4.84 0.000042 11 4 munmap
4.84 0.000042 11 4 mmap2
4.03 0.000035 6 6 6 access
3.80 0.000033 3 11 fstat64
1.38 0.000012 3 4 brk
0.92 0.000008 3 3 3 ioctl
0.69 0.000006 6 1 uname
0.58 0.000005 5 1 set_thread_area
0.35 0.000003 3 1 write
0.35 0.000003 3 1 rt_sigaction
0.35 0.000003 3 1 fcntl64
0.23 0.000002 2 1 getrlimit
0.23 0.000002 2 1 set_tid_address
0.12 0.000001 1 1 rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00 0.000868 87 10 total
As you expected , It spends most of its time in two calls to read the directory ( Only twice , Because it runs on a small directory ).
5. Why my *** Unable to connect to the server ?
It is frustrating to debug why some processes fail to connect to the remote server .DNS May fail , The connection may hang , The server may send back something that doesn't meet expectations, and so on . You can use tcpdump To analyze so many such situations , This is also a very good tool , But a lot of times strace It will make you more convenient , Just because it will only return the relevant data of the system call initiated by your process . for example , If you try to figure out one of the hundreds of running processes connected to the same database server ( Choose the right connection among them tcpdump It's a nightmare ),strace Make life easier . This is a port 80 Connect to www.news.com Of "nc" Tracking example , No problem :
$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, "\\24\\0\\0\\0\\26\\0\\1\\3\\255\\373NH\\0\\0\\0\\0\\0\\0\\0\\0", 20, 0, {
sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {
sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\213\\321\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\213\\321\\201\\200\\0\\1\\0\\1\\0\\1\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 153
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "k\\374\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "k\\374\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\\\\\2\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\\\\\2\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("216.239.122.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL) = 1 (out [3])
So what will happen here ? Notice and /var/run/nscd/socket Have you tried the connection of ? They mean nc First try connecting to NSCD - Name service cache daemon - Usually used to rely on NIS、YP、LDAP Or similar directory protocol for name lookup . under these circumstances , The connection fails . Then it moves to DNS(DNS Is the port 53, Therefore, in the following connection is “sin_port=htons(53)”. You can see it and execute “sendto()” call , Send a message containing www.news Of DNS Data packets .com. Then it reads back a packet . For whatever reason , It tried three times , The last request was slightly different . My best guess is why in this case www.news.com It's a CNAME( One “ Alias ”), And multiple requests may just nc How to deal with its artifacts . Last , It finally found it IP issue connect(). Notice that it returns EINPROGRESS. This means that the connection is non blocking - nc Want to continue processing . And then it calls select(), When the connection is successful . Try to “read” and “write” Add to strace In the system call list of , And input a string when connecting , You'll get something like this :
read(0, "test\\n", 1024) = 5
write(3, "test\\n", 5) = 5
poll([{
fd=3, events=POLLIN, revents=POLLIN}, {
fd=0, events=POLLIN}], 2, -1) = 1
read(3, "\"-//IETF//"..., 1024) = 216
write(1, "\"-//IETF//"..., 216) = 216
This indicates that it reads from standard input “ test ”+ Line break , And write it back to the network connection , And then call poll() Wait for a response , Read the reply from the network connection and write it to the standard output . Everything seems normal
边栏推荐
- Machine vision (1) - Overview
- Youth experience and career development
- VSCode关于C语言的3个配置文件
- 原生js验证码
- Easy to understand [linear regression of machine learning]
- 【分布式理论】(二)分布式存储
- Define menus using XML resource files
- 请将磁盘插入“U盘(H)“的情况&无法访问 磁盘结构损坏且无法读取
- List selection JS effect with animation
- Alertdialog create dialog
猜你喜欢
科学家首次观察到“电子漩涡” 有助于设计出更高效的电子产品
Vscode three configuration files about C language
Dragging the custom style of Baidu map to the right makes the global map longitude 0 unable to be displayed normally
2021-06-28
【深度学习】3分钟入门
Self made dataset in pytoch for dataset rewriting
Numberpick的功能和用法
面试官:页面很卡的原因分析及解决方案?【测试面试题分享】
serachview的功能和用法
Target detection 1 -- actual operation of Yolo data annotation and script for converting XML to TXT file
随机推荐
Mui side navigation anchor positioning JS special effect
[OKR target management] value analysis
Functions and usage of viewflipper
textSwitch文本切换器的功能和用法
Self made dataset in pytoch for dataset rewriting
[deep learning] 3 minutes introduction
面试官:页面很卡的原因分析及解决方案?【测试面试题分享】
What skills can you master to be a "master tester" when doing software testing?
【分布式理论】(二)分布式存储
【可信计算】第十二次课:TPM授权与会话
mui侧边导航锚点定位js特效
Simple loading animation
使用Stace排除故障的5种简单方法
Youth experience and career development
线上比赛相关规则补充说明
使用popupwindow創建对话框风格的窗口
深入浅出图解CNN-卷积神经网络
Functions and usage of imageswitch
Understanding of 12 methods of enterprise management
[distributed theory] (II) distributed storage