当前位置:网站首页>Five simple ways to troubleshoot with Stace
Five simple ways to troubleshoot with Stace
2022-07-07 17:58:00 【-Flying crane-】
I've always been surprised that few people realize that they can use Strace Everything you do . It has always been one of my first debugging tools , Because it is usually used by me Lunux Available on the system , And it can be used to solve such a wide range of problems .
strace What is it? ?
Strace It is a simple tool for tracking system calls of executable programs . In its simplest form , It can track a binary executable from beginning to end , And output a line of text , It contains system calls , Parameters and the return value of each system call in each process life cycle . And it can do more :
- It can filter based on a specified system call or a set of system calls .
- It can count the number of times a particular system call is used 、 The time spent and the number of successes and errors to analyze the use of system calls .
- It tracks the signals sent to the process .
- It can go through pid Attach to any running process .
If you have used other Unix System , This is similar to “truss". the other one ( More comprehensive ) yes Sun Of Dtrace.
How to use it ?
Only superficial research is done here , Importance comes in no order .
1. Find out which configuration file was read when the program started
Have you ever tried to figure out why some programs don't read the configuration files you think should be read ? Have to solve custom compiled or specific release versions from what you think “ error ” The location of the read configuration table ? The simple way :
$ strace php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
lstat64("/usr/local/lib/php.ini", {
st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
readlink("/usr/local/lib/php.ini", "/usr/local/Zend/etc/php.ini", 4096) = 27
lstat64("/usr/local/Zend/etc/php.ini", {
st_mode=S_IFREG|0664, st_size=40971, ...}) = 0
So this version of PHP from /usr/local/lib/php.ini Read php.ini( But it first tries /usr/local/bin). If I only care about specific system calls , Then the more complicated method :
$ strace -e open php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
This same method can be applied to many other situations . Install multiple versions in different paths , And want to know exactly which version is loaded ? wait .
2. Why does this program not open my file ?
I once encountered a program , It silently refuses to read files that it does not have read permission , But you only find out after swearing for a long time , Because you once thought it didn't actually find this file . ok , You already know what to do :
$ strace -e open,access 2>&1 | grep your-filename
Find a open() or access() Failure of system call .
3. What is the process doing now ?
Once a process suddenly monopolized a large number CPU Do you ? Or a process seems to hang ? Then you find pid, And then do it :
[email protected]:~# strace -p 15427
Process 15427 attached - interrupt to quit
futex(0x402f4900, FUTEX_WAIT, 2, NULL
Process 15427 detached
ah . So in this case , It hangs on a call futex() in . By the way , under these circumstances , It doesn't tell us more —— Hang on futex It can be caused by many things ( Pay special effect is a kind of in linux The locking mechanism in the kernel ). The above is from normal work but idle Apache The subprocess is waiting to be processed by a request . however "trace -p" Very useful , Because it eliminates many guesses , And eliminate the need to restart the application to get more print logs ( Or recompile it ).
4. Where is the time spent ?
You can always recompile your application with analysis enabled , And get accurate information , Especially about what parts of your own code will spend time on what you should do . But usually it can quickly strace Attach to the process to see where it currently spends time , In particular, diagnosing problems is very useful . yes 90% Of CPU The usage rate is because it is actually doing real work , Or because something got out of control . This is what you have to do :
[email protected]:~# strace -c -p 11084
Process 11084 attached - interrupt to quit
Process 11084 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
94.59 0.001014 48 21 select
2.89 0.000031 1 21 getppid
2.52 0.000027 1 21 time
------ ----------- ----------- --------- --------- ----------------
100.00 0.001072 63 total
[email protected]:~#
Use -c -p start-up strace after , You just have to wait , And then use ctrl-c sign out .Strace Performance analysis data will be sent . under these circumstances , It is a free Postgres“postmaster” process , Most of its time is spent select() Waiting quietly . under these circumstances , It's in each select() Call between calls getppid() and time(), This is a fairly standard event cycle . You can also use “ls” function “ From a to Z ”:
[email protected]:~# strace -c >/dev/null ls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
23.62 0.000205 103 2 getdents64
18.78 0.000163 15 11 1 open
15.09 0.000131 19 7 read
12.79 0.000111 7 16 old_mmap
7.03 0.000061 6 11 close
4.84 0.000042 11 4 munmap
4.84 0.000042 11 4 mmap2
4.03 0.000035 6 6 6 access
3.80 0.000033 3 11 fstat64
1.38 0.000012 3 4 brk
0.92 0.000008 3 3 3 ioctl
0.69 0.000006 6 1 uname
0.58 0.000005 5 1 set_thread_area
0.35 0.000003 3 1 write
0.35 0.000003 3 1 rt_sigaction
0.35 0.000003 3 1 fcntl64
0.23 0.000002 2 1 getrlimit
0.23 0.000002 2 1 set_tid_address
0.12 0.000001 1 1 rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00 0.000868 87 10 total
As you expected , It spends most of its time in two calls to read the directory ( Only twice , Because it runs on a small directory ).
5. Why my *** Unable to connect to the server ?
It is frustrating to debug why some processes fail to connect to the remote server .DNS May fail , The connection may hang , The server may send back something that doesn't meet expectations, and so on . You can use tcpdump To analyze so many such situations , This is also a very good tool , But a lot of times strace It will make you more convenient , Just because it will only return the relevant data of the system call initiated by your process . for example , If you try to figure out one of the hundreds of running processes connected to the same database server ( Choose the right connection among them tcpdump It's a nightmare ),strace Make life easier . This is a port 80 Connect to www.news.com Of "nc" Tracking example , No problem :
$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, "\\24\\0\\0\\0\\26\\0\\1\\3\\255\\373NH\\0\\0\\0\\0\\0\\0\\0\\0", 20, 0, {
sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {
sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\213\\321\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\213\\321\\201\\200\\0\\1\\0\\1\\0\\1\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 153
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "k\\374\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "k\\374\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\\\\\2\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\\\\\2\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("216.239.122.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL) = 1 (out [3])
So what will happen here ? Notice and /var/run/nscd/socket Have you tried the connection of ? They mean nc First try connecting to NSCD - Name service cache daemon - Usually used to rely on NIS、YP、LDAP Or similar directory protocol for name lookup . under these circumstances , The connection fails . Then it moves to DNS(DNS Is the port 53, Therefore, in the following connection is “sin_port=htons(53)”. You can see it and execute “sendto()” call , Send a message containing www.news Of DNS Data packets .com. Then it reads back a packet . For whatever reason , It tried three times , The last request was slightly different . My best guess is why in this case www.news.com It's a CNAME( One “ Alias ”), And multiple requests may just nc How to deal with its artifacts . Last , It finally found it IP issue connect(). Notice that it returns EINPROGRESS. This means that the connection is non blocking - nc Want to continue processing . And then it calls select(), When the connection is successful . Try to “read” and “write” Add to strace In the system call list of , And input a string when connecting , You'll get something like this :
read(0, "test\\n", 1024) = 5
write(3, "test\\n", 5) = 5
poll([{
fd=3, events=POLLIN, revents=POLLIN}, {
fd=0, events=POLLIN}], 2, -1) = 1
read(3, "\"-//IETF//"..., 1024) = 216
write(1, "\"-//IETF//"..., 216) = 216
This indicates that it reads from standard input “ test ”+ Line break , And write it back to the network connection , And then call poll() Wait for a response , Read the reply from the network connection and write it to the standard output . Everything seems normal
边栏推荐
- SD_DATA_RECEIVE_SHIFT_REGISTER
- 深入浅出图解CNN-卷积神经网络
- MySQL index hit level analysis
- 手机app外卖订餐个人中心页面
- Create dialog style windows with popupwindow
- What skills can you master to be a "master tester" when doing software testing?
- js拉下帷幕js特效显示层
- 自动化测试:Robot FrameWork框架大家都想知道的实用技巧
- How to implement safety practice in software development stage
- 第2章搭建CRM项目开发环境(搭建开发环境)
猜你喜欢
第3章业务功能开发(安全退出)
Please insert the disk into "U disk (H)" & unable to access the disk structure is damaged and cannot be read
漫画 | 宇宙第一 IDE 到底是谁?
线上比赛相关规则补充说明
基于百度飞浆平台(EasyDL)设计的人脸识别考勤系统
Automated testing: a practical skill that everyone wants to know about robot framework
做软件测试 掌握哪些技术才能算作 “ 测试高手 ”?
Functions and usage of serachview
Create dialog style windows with popupwindow
YARN Capacity Scheduler容量调度器(超详细解读)
随机推荐
【重新理解通信模型】Reactor 模式在 Redis 和 Kafka 中的应用
[OKR target management] value analysis
Taffydb open source JS database
Simple loading animation
VSCode关于C语言的3个配置文件
【4500字归纳总结】一名软件测试工程师需要掌握的技能大全
【网络攻防原理与技术】第3章:网络侦察技术
使用popupwindow創建对话框风格的窗口
DatePickerDialog和trimepickerDialog
深度学习机器学习各种数据集汇总地址
yolo训练过程中批量导入requirments.txt中所需要的包
使用 xml资源文件定义菜单
手机版像素小鸟游js戏代码
Numberpick的功能和用法
Mobile pixel bird game JS play code
Based on pytorch, we use CNN to classify our own data sets
第3章业务功能开发(用户访问项目)
Dateticket and timeticket, functions and usage of date and time selectors
Function and usage of numberpick
viewflipper的功能和用法