当前位置:网站首页>Five simple ways to troubleshoot with Stace

Five simple ways to troubleshoot with Stace

2022-07-07 17:58:00 -Flying crane-

I've always been surprised that few people realize that they can use Strace Everything you do . It has always been one of my first debugging tools , Because it is usually used by me Lunux Available on the system , And it can be used to solve such a wide range of problems .

strace What is it? ?

Strace It is a simple tool for tracking system calls of executable programs . In its simplest form , It can track a binary executable from beginning to end , And output a line of text , It contains system calls , Parameters and the return value of each system call in each process life cycle . And it can do more :

  • It can filter based on a specified system call or a set of system calls .
  • It can count the number of times a particular system call is used 、 The time spent and the number of successes and errors to analyze the use of system calls .
  • It tracks the signals sent to the process .
  • It can go through pid Attach to any running process .

If you have used other Unix System , This is similar to “truss". the other one ( More comprehensive ) yes Sun Of Dtrace.

How to use it ?

Only superficial research is done here , Importance comes in no order .

1. Find out which configuration file was read when the program started

Have you ever tried to figure out why some programs don't read the configuration files you think should be read ? Have to solve custom compiled or specific release versions from what you think “ error ” The location of the read configuration table ? The simple way :

$ strace php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4
lstat64("/usr/local/lib/php.ini", {
    st_mode=S_IFLNK|0777, st_size=27, ...}) = 0
readlink("/usr/local/lib/php.ini", "/usr/local/Zend/etc/php.ini", 4096) = 27
lstat64("/usr/local/Zend/etc/php.ini", {
    st_mode=S_IFREG|0664, st_size=40971, ...}) = 0

So this version of PHP from /usr/local/lib/php.ini Read php.ini( But it first tries /usr/local/bin). If I only care about specific system calls , Then the more complicated method :

$ strace -e open php 2>&1 | grep php.ini
open("/usr/local/bin/php.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/php.ini", O_RDONLY) = 4

This same method can be applied to many other situations . Install multiple versions in different paths , And want to know exactly which version is loaded ? wait .

2. Why does this program not open my file ?

I once encountered a program , It silently refuses to read files that it does not have read permission , But you only find out after swearing for a long time , Because you once thought it didn't actually find this file . ok , You already know what to do :

$ strace -e open,access 2>&1 | grep your-filename

Find a open() or access() Failure of system call .

3. What is the process doing now ?

Once a process suddenly monopolized a large number CPU Do you ? Or a process seems to hang ? Then you find pid, And then do it :

[email protected]:~# strace -p 15427
Process 15427 attached - interrupt to quit
futex(0x402f4900, FUTEX_WAIT, 2, NULL 
Process 15427 detached

ah . So in this case , It hangs on a call futex() in . By the way , under these circumstances , It doesn't tell us more —— Hang on futex It can be caused by many things ( Pay special effect is a kind of in linux The locking mechanism in the kernel ). The above is from normal work but idle Apache The subprocess is waiting to be processed by a request . however "trace -p" Very useful , Because it eliminates many guesses , And eliminate the need to restart the application to get more print logs ( Or recompile it ).

4. Where is the time spent ?

You can always recompile your application with analysis enabled , And get accurate information , Especially about what parts of your own code will spend time on what you should do . But usually it can quickly strace Attach to the process to see where it currently spends time , In particular, diagnosing problems is very useful . yes 90% Of CPU The usage rate is because it is actually doing real work , Or because something got out of control . This is what you have to do :

[email protected]:~# strace -c -p 11084
Process 11084 attached - interrupt to quit
Process 11084 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.59    0.001014          48        21           select
  2.89    0.000031           1        21           getppid
  2.52    0.000027           1        21           time
------ ----------- ----------- --------- --------- ----------------
100.00    0.001072                    63           total
[email protected]:~# 

Use -c -p start-up strace after , You just have to wait , And then use ctrl-c sign out .Strace Performance analysis data will be sent . under these circumstances , It is a free Postgres“postmaster” process , Most of its time is spent select() Waiting quietly . under these circumstances , It's in each select() Call between calls getppid() and time(), This is a fairly standard event cycle . You can also use “ls” function “ From a to Z ”:

[email protected]:~# strace -c >/dev/null ls
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.62    0.000205         103         2           getdents64
 18.78    0.000163          15        11         1 open
 15.09    0.000131          19         7           read
 12.79    0.000111           7        16           old_mmap
  7.03    0.000061           6        11           close
  4.84    0.000042          11         4           munmap
  4.84    0.000042          11         4           mmap2
  4.03    0.000035           6         6         6 access
  3.80    0.000033           3        11           fstat64
  1.38    0.000012           3         4           brk
  0.92    0.000008           3         3         3 ioctl
  0.69    0.000006           6         1           uname
  0.58    0.000005           5         1           set_thread_area
  0.35    0.000003           3         1           write
  0.35    0.000003           3         1           rt_sigaction
  0.35    0.000003           3         1           fcntl64
  0.23    0.000002           2         1           getrlimit
  0.23    0.000002           2         1           set_tid_address
  0.12    0.000001           1         1           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    0.000868                    87        10 total

As you expected , It spends most of its time in two calls to read the directory ( Only twice , Because it runs on a small directory ).

5. Why my *** Unable to connect to the server ?

It is frustrating to debug why some processes fail to connect to the remote server .DNS May fail , The connection may hang , The server may send back something that doesn't meet expectations, and so on . You can use tcpdump To analyze so many such situations , This is also a very good tool , But a lot of times strace It will make you more convenient , Just because it will only return the relevant data of the system call initiated by your process . for example , If you try to figure out one of the hundreds of running processes connected to the same database server ( Choose the right connection among them tcpdump It's a nightmare ),strace Make life easier . This is a port 80 Connect to www.news.com Of "nc" Tracking example , No problem :

$ strace -e poll,select,connect,recvfrom,sendto nc www.news.com 80
sendto(3, "\\24\\0\\0\\0\\26\\0\\1\\3\\255\\373NH\\0\\0\\0\\0\\0\\0\\0\\0", 20, 0, {
    sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
connect(3, {
    sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
    sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(3, {
    sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
    fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\213\\321\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
    fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\213\\321\\201\\200\\0\\1\\0\\1\\0\\1\\0\\0\\3www\\4news\\3com\\0\\0\\34\\0\\1\\300\\f"..., 1024, 0, {
    sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 153
connect(3, {
    sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
    fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "k\\374\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
    fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "k\\374\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
    sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
    sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, 28) = 0
poll([{
    fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
sendto(3, "\\\\\\2\\1\\0\\0\\1\\0\\0\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1", 30, MSG_NOSIGNAL, NULL, 0) = 30
poll([{
    fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
recvfrom(3, "\\\\\\2\\201\\200\\0\\1\\0\\2\\0\\0\\0\\0\\3www\\4news\\3com\\0\\0\\1\\0\\1\\300\\f"..., 1024, 0, {
    sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("62.30.112.39")}, [16]) = 106
connect(3, {
    sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("216.239.122.102")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, NULL, [3], NULL, NULL)        = 1 (out [3])

So what will happen here ? Notice and /var/run/nscd/socket Have you tried the connection of ? They mean nc First try connecting to NSCD - Name service cache daemon - Usually used to rely on NIS、YP、LDAP Or similar directory protocol for name lookup . under these circumstances , The connection fails . Then it moves to DNS(DNS Is the port 53, Therefore, in the following connection is “sin_port=htons(53)”. You can see it and execute “sendto()” call , Send a message containing www.news Of DNS Data packets .com. Then it reads back a packet . For whatever reason , It tried three times , The last request was slightly different . My best guess is why in this case www.news.com It's a CNAME( One “ Alias ”), And multiple requests may just nc How to deal with its artifacts . Last , It finally found it IP issue connect(). Notice that it returns EINPROGRESS. This means that the connection is non blocking - nc Want to continue processing . And then it calls select(), When the connection is successful . Try to “read” and “write” Add to strace In the system call list of , And input a string when connecting , You'll get something like this :

read(0, "test\\n", 1024)                 = 5
write(3, "test\\n", 5)                   = 5
poll([{
    fd=3, events=POLLIN, revents=POLLIN}, {
    fd=0, events=POLLIN}], 2, -1) = 1
read(3, "\"-//IETF//"..., 1024) = 216
write(1, "\"-//IETF//"..., 216) = 216

This indicates that it reads from standard input “ test ”+ Line break , And write it back to the network connection , And then call poll() Wait for a response , Read the reply from the network connection and write it to the standard output . Everything seems normal

原网站

版权声明
本文为[-Flying crane-]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207071522592054.html