当前位置:网站首页>Five simple ways to troubleshoot with Stace
Five simple ways to troubleshoot with Stace
2022-07-07 17:58:00 【-Flying crane-】
I've always been surprised that few people realize that they can use Strace Everything you do . It has always been one of my first debugging tools , Because it is usually used by me Lunux Available on the system , And it can be used to solve such a wide range of problems .
strace What is it? ?
Strace It is a simple tool for tracking system calls of executable programs . In its simplest form , It can track a binary executable from beginning to end , And output a line of text , It contains system calls , Parameters and the return value of each system call in each process life cycle . And it can do more :
- It can filter based on a specified system call or a set of system calls .
- It can count the number of times a particular system call is used 、 The time spent and the number of successes and errors to analyze the use of system calls .
- It tracks the signals sent to the process .
- It can go through pid Attach to any running process .
If you have used other Unix System , This is similar to “truss". the other one ( More comprehensive ) yes Sun Of Dtrace.
How to use it ?
Only superficial research is done here , Importance comes in no order .
1. Find out which configuration file was read when the program started
Have you ever tried to figure out why some programs don't read the configuration files you think should be read ? Have to solve custom compiled or specific release versions from what you think “ error ” The location of the read configuration table ? The simple way :
$ strace php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
lstat64("/usr/local/lib/php.ini", {
st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
readlink("/usr/local/lib/php.ini", "/usr/local/Zend/etc/php.ini", 4096) = 27
lstat64("/usr/local/Zend/etc/php.ini", {
st_mode=S_IFREG|0664, st_size=40971, ...}) = 0
So this version of PHP from /usr/local/lib/php.ini Read php.ini( But it first tries /usr/local/bin). If I only care about specific system calls , Then the more complicated method :
$ strace -e open php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
This same method can be applied to many other situations . Install multiple versions in different paths , And want to know exactly which version is loaded ? wait .
2. Why does this program not open my file ?
I once encountered a program , It silently refuses to read files that it does not have read permission , But you only find out after swearing for a long time , Because you once thought it didn't actually find this file . ok , You already know what to do :
$ strace -e open,access 2>&1 | grep your-filename
Find a open() or access() Failure of system call .
3. What is the process doing now ?
Once a process suddenly monopolized a large number CPU Do you ? Or a process seems to hang ? Then you find pid, And then do it :
[email protected]:~# strace -p 15427
Process 15427 attached - interrupt to quit
futex(0x402f4900, FUTEX_WAIT, 2, NULL
Process 15427 detached
ah . So in this case , It hangs on a call futex() in . By the way , under these circumstances , It doesn't tell us more —— Hang on futex It can be caused by many things ( Pay special effect is a kind of in linux The locking mechanism in the kernel ). The above is from normal work but idle Apache The subprocess is waiting to be processed by a request . however "trace -p" Very useful , Because it eliminates many guesses , And eliminate the need to restart the application to get more print logs ( Or recompile it ).
4. Where is the time spent ?
You can always recompile your application with analysis enabled , And get accurate information , Especially about what parts of your own code will spend time on what you should do . But usually it can quickly strace Attach to the process to see where it currently spends time , In particular, diagnosing problems is very useful . yes 90% Of CPU The usage rate is because it is actually doing real work , Or because something got out of control . This is what you have to do :
[email protected]:~# strace -c -p 11084
Process 11084 attached - interrupt to quit
Process 11084 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
94.59 0.001014 48 21 select
2.89 0.000031 1 21 getppid
2.52 0.000027 1 21 time
------ ----------- ----------- --------- --------- ----------------
100.00 0.001072 63 total
[email protected]:~#
Use -c -p start-up strace after , You just have to wait , And then use ctrl-c sign out .Strace Performance analysis data will be sent . under these circumstances , It is a free Postgres“postmaster” process , Most of its time is spent select() Waiting quietly . under these circumstances , It's in each select() Call between calls getppid() and time(), This is a fairly standard event cycle . You can also use “ls” function “ From a to Z ”:
[email protected]:~# strace -c >/dev/null ls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
23.62 0.000205 103 2 getdents64
18.78 0.000163 15 11 1 open
15.09 0.000131 19 7 read
12.79 0.000111 7 16 old_mmap
7.03 0.000061 6 11 close
4.84 0.000042 11 4 munmap
4.84 0.000042 11 4 mmap2
4.03 0.000035 6 6 6 access
3.80 0.000033 3 11 fstat64
1.38 0.000012 3 4 brk
0.92 0.000008 3 3 3 ioctl
0.69 0.000006 6 1 uname
0.58 0.000005 5 1 set_thread_area
0.35 0.000003 3 1 write
0.35 0.000003 3 1 rt_sigaction
0.35 0.000003 3 1 fcntl64
0.23 0.000002 2 1 getrlimit
0.23 0.000002 2 1 set_tid_address
0.12 0.000001 1 1 rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00 0.000868 87 10 total
As you expected , It spends most of its time in two calls to read the directory ( Only twice , Because it runs on a small directory ).
5. Why my *** Unable to connect to the server ?
It is frustrating to debug why some processes fail to connect to the remote server .DNS May fail , The connection may hang , The server may send back something that doesn't meet expectations, and so on . You can use tcpdump To analyze so many such situations , This is also a very good tool , But a lot of times strace It will make you more convenient , Just because it will only return the relevant data of the system call initiated by your process . for example , If you try to figure out one of the hundreds of running processes connected to the same database server ( Choose the right connection among them tcpdump It's a nightmare ),strace Make life easier . This is a port 80 Connect to www.news.com Of "nc" Tracking example , No problem :
$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, "\\24\\0\\0\\0\\26\\0\\1\\3\\255\\373NH\\0\\0\\0\\0\\0\\0\\0\\0", 20, 0, {
sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {
sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\213\\321\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\213\\321\\201\\200\\0\\1\\0\\1\\0\\1\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 153
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "k\\374\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "k\\374\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\\\\\2\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\\\\\2\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("216.239.122.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL) = 1 (out [3])
So what will happen here ? Notice and /var/run/nscd/socket Have you tried the connection of ? They mean nc First try connecting to NSCD - Name service cache daemon - Usually used to rely on NIS、YP、LDAP Or similar directory protocol for name lookup . under these circumstances , The connection fails . Then it moves to DNS(DNS Is the port 53, Therefore, in the following connection is “sin_port=htons(53)”. You can see it and execute “sendto()” call , Send a message containing www.news Of DNS Data packets .com. Then it reads back a packet . For whatever reason , It tried three times , The last request was slightly different . My best guess is why in this case www.news.com It's a CNAME( One “ Alias ”), And multiple requests may just nc How to deal with its artifacts . Last , It finally found it IP issue connect(). Notice that it returns EINPROGRESS. This means that the connection is non blocking - nc Want to continue processing . And then it calls select(), When the connection is successful . Try to “read” and “write” Add to strace In the system call list of , And input a string when connecting , You'll get something like this :
read(0, "test\\n", 1024) = 5
write(3, "test\\n", 5) = 5
poll([{
fd=3, events=POLLIN, revents=POLLIN}, {
fd=0, events=POLLIN}], 2, -1) = 1
read(3, "\"-//IETF//"..., 1024) = 216
write(1, "\"-//IETF//"..., 216) = 216
This indicates that it reads from standard input “ test ”+ Line break , And write it back to the network connection , And then call poll() Wait for a response , Read the reply from the network connection and write it to the standard output . Everything seems normal
边栏推荐
- 【4500字归纳总结】一名软件测试工程师需要掌握的技能大全
- Functions and usage of imageswitch
- Based on pytorch, we use CNN to classify our own data sets
- Main work of digital transformation
- js拉下帷幕js特效显示层
- serachview的功能和用法
- [OKR target management] case analysis
- [re understand the communication model] the application of reactor mode in redis and Kafka
- 线上比赛相关规则补充说明
- TaffyDB开源的JS数据库
猜你喜欢

Automated testing: a practical skill that everyone wants to know about robot framework
![[tpm2.0 principle and Application guide] Chapter 1-3](/img/28/7f6e848d5c12d175214d6cc5de7c8b.png)
[tpm2.0 principle and Application guide] Chapter 1-3

alertDialog創建对话框

面试官:页面很卡的原因分析及解决方案?【测试面试题分享】

Test for 3 months, successful entry "byte", my interview experience summary

What skills can you master to be a "master tester" when doing software testing?
![[answer] if the app is in the foreground, the activity will not be recycled?](/img/b7/a749d7220c22f92080b71fd3859b8d.png)
[answer] if the app is in the foreground, the activity will not be recycled?

imageswitcher的功能和用法

科学家首次观察到“电子漩涡” 有助于设计出更高效的电子产品

Native JS verification code
随机推荐
Introduction to OTA technology of Internet of things
【分布式理论】(二)分布式存储
Management by objectives [14 of management]
【OKR目标管理】案例分析
Show progress bar above window
Vscode three configuration files about C language
2021年全国平均工资出炉,你达标了吗?
第2章搭建CRM项目开发环境(数据库设计)
Mysql 索引命中级别分析
textSwitch文本切换器的功能和用法
基于百度飞浆平台(EasyDL)设计的人脸识别考勤系统
Robot engineering lifelong learning and work plan-2022-
SD_DATA_SEND_SHIFT_REGISTER
swiper左右切换滑块插件
Simple loading animation
DatePickerDialog and trimepickerdialog
Pro2:修改div块的颜色
Taffydb open source JS database
【可信计算】第十一次课:TPM密码资源管理(三) NV索引与PCR
深入浅出【机器学习之线性回归】