strace - trace system calls and signals
strace [ -dffhiqrtttTvxx ] [ -acolumn ] [ -eexpr ] ... [ -ofile ] [ -ppid ] ... [ -sstrsize ] [ -uusername ] [ -Evar=val ] ... [
-Evar ] ... [ command [ arg ... ] ]
strace -c [ -eexpr ] ... [ -Ooverhead ] [ -Ssortby ] [ command [ arg ... ] ]
In the simplest case strace runs the specified command until it exits. It intercepts and records the system calls which are called by a process and the signals which are received by a process. The name of each system call, its arguments and its return value are printed on standard error or to the file specified with the -o option.
As the man page excerpt above suggests, this article is about strace: how you can use it and when it can be used. Before going any further, the best thing to do is simply to run strace, get a feel for its output, and start analyzing that output so that you understand the information being printed to the screen. Here is an easy example that runs strace against the parent Apache process on a CentOS installation:
strace -p`cat /var/run/httpd.pid`
Assuming this is the first strace command you have run, let's take a moment to analyze it. The first and most obvious part is the command itself, followed by the "-p" switch. The "-p" switch tells strace that you want to attach to an existing process by its process ID. In this case we are reading the process ID from Apache's PID file; however, it can also be typed in manually, such as strace -p12345, where 12345 is a placeholder for the real process ID.
You may or may not see output immediately. If the process is sleeping or waiting for work, nothing may be printed to the screen, in which case you may need to wait a moment or verify the process ID you are trying to attach to.
To detach, press CTRL+C.
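Since a mistyped or stale process ID is a common reason for seeing nothing, a small check before attaching can help. The following is a minimal sketch and not part of strace itself: pid_from_file is a hypothetical helper name, and the demonstration uses a temporary file with a made-up PID rather than a real PID file.

```shell
# Hypothetical helper: print the PID stored in a PID file, or fail if the
# file is unreadable or does not contain a plain number.
pid_from_file() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    pid=$(tr -d '[:space:]' < "$1")
    case "$pid" in
        ''|*[!0-9]*) echo "not a numeric PID: $pid" >&2; return 1 ;;
    esac
    printf '%s\n' "$pid"
}

# Demonstration with a temporary file and a made-up PID.
demo_pidfile=$(mktemp)
echo 12345 > "$demo_pidfile"
pid_from_file "$demo_pidfile"

# Real use (needs root or ownership of the target process):
#   strace -p "$(pid_from_file /var/run/httpd.pid)"
```

Quoting the command substitution keeps an empty or malformed result from silently turning into a different strace invocation.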
By now you should see that strace can provide a wealth of data about what a process is currently doing. Unfortunately, there is no "easy" way to jump into using strace because of the sheer amount of data it produces, so this section provides some examples that you can re-create on your own server to see how it can be used.
For this example we are simply using Apache. We already know why it is broken, and a normal Apache restart would also tell you why, but the point is to demonstrate how you could use strace to identify a similar problem in software that does not pinpoint the problem itself or that only reports a convoluted error code. With that said, follow these steps:
(DO NOT FOLLOW THESE UNLESS YOU ARE ON YOUR OWN SERVER, OR A TESTING ENVIRONMENT!!!)
[root@dev ~]# cd /tmp
[root@dev tmp]# mv /etc/httpd/conf/httpd.conf /etc/httpd/conf/httpd.conf.bak
[root@dev tmp]# strace /etc/init.d/httpd restart
Now within the information that was printed on your screen, you should find a line that looks like this:
waitpid(-1, httpd: Could not open configuration file /etc/httpd/conf/httpd.conf: No such file or directory
The error shown above makes the underlying problem pretty obvious; however, information like this, as minor as it may seem, can save you hours of work. Other messages you may see here include "Permission denied", "Inappropriate ioctl for device", and "No such file or directory". These are very common when tracing software such as Apache, MySQL, vsftpd and so forth, and many of them are routine rather than fatal, which can make some of the output a bit confusing. Keep in mind that on a Linux system, software that fails generally quits where it is at. With that said, the information you are looking for is normally printed right before the strace output stops. When you are finished with this example, move httpd.conf.bak back to httpd.conf and restart Apache.
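The easiest way to look for errors is to read through the data and look for a return value of "-1", which is the error indicator; strace follows it with the error name and description, for example ENOENT (No such file or directory). A return value of "0" generally indicates success, and many calls return a positive value on success instead, such as a file descriptor or a byte count. Keep in mind that strace will not provide useful information for every problem; however, it can help you determine what the software is doing in the background, which may assist you in troubleshooting or debugging.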
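Reading a long trace by eye gets tedious, so searching a saved trace for failed calls can be a useful first pass. The sketch below is only an illustration: the trace lines are hand-written samples in strace's usual format rather than captured output, and failed_calls is a hypothetical helper name.

```shell
# Hand-written sample trace (illustrative only, not captured output).
trace=$(mktemp)
cat > "$trace" <<'EOF'
open("/etc/ld.so.cache", O_RDONLY) = 3
close(3) = 0
open("/etc/httpd/conf/httpd.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "httpd: Could not open configurat"..., 90) = 90
exit_group(1) = ?
EOF

# Hypothetical helper: keep only calls that returned -1 and show the last
# few, since the failure that matters is usually near the end of the trace.
failed_calls() {
    grep -E '= -1 [A-Z]+' "$1" | tail -n 5
}

failed_calls "$trace"
```

With a real trace you would point failed_calls at the file written by the -o option. Expect some noise: failed lookups while the loader searches for libraries are normal and do not indicate a fault.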
Options and Examples
The following "switches" and examples are ones that I would personally suggest, and use. Along with them are excerpts from the strace man page.
-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call
-ff If the -o filename option is in effect, each processes trace is written to filename.pid where pid is the numeric process id of each process. This is incompatible with -c, since no per-process counts are kept.
-v Print unabbreviated versions of environment, stat, termios, etc. calls. These structures are very common in calls and so the default behavior displays a reasonable subset of structure members. Use this option to get all of the details
-o filename Write the trace output to the file filename rather than to stderr. Use filename.pid if -ff is used. If the argument begins with ‘|’ or with ‘!’ then the rest of the argument is treated as a command and all output is piped to it. This is convenient for piping the debugging output to a program without affecting the redirections of executed programs.
strace process id from pid file:
strace -p`cat /var/run/file.pid`
strace a process id and output to a file
strace -p12345 -o /tmp/filename.txt
strace a process and follow all forks
strace -ff -p12345
combining all of the above
strace -ff -o /tmp/outfile.txt -p`cat /var/run/httpd.pid`
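Because -ff together with -o writes one file per process (filename.pid, as the man page excerpt above describes), a busy server can leave you with many trace files. One way to decide which to open first is to count the failed calls in each. This is a rough sketch under stated assumptions: the files are hand-made stand-ins with hypothetical PIDs and contents, and count_failures is a hypothetical helper name.

```shell
# Hand-made stand-ins for per-process trace files; PIDs and contents are
# hypothetical, not real strace output.
dir=$(mktemp -d)
printf '%s\n' 'read(3, "", 4096) = 0' > "$dir/outfile.txt.2001"
printf '%s\n' \
    'open("/missing", O_RDONLY) = -1 ENOENT (No such file or directory)' \
    'close(3) = 0' > "$dir/outfile.txt.2002"

# Hypothetical helper: print "<file>: <number of failed calls>" for every
# per-process file that shares the given prefix.
count_failures() {
    for f in "$1".*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "${f##*/}" "$(grep -c -- '= -1 ' "$f")"
    done
}

count_failures "$dir/outfile.txt"
```

Against a real run you would pass the same path you gave to -o, for example /tmp/outfile.txt.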
Hopefully by now you have a solid basic understanding of how to use strace and how it can save you time and effort when troubleshooting a stubborn issue. Again, strace will not always provide the information you need; however, when you are running out of ideas or options, it is a great tool to turn to. To become more versed in strace, get familiar with its options, learn how to use them correctly, understand the information they provide, and, of course, use it. The best way to become good at something is to practice.