Tagged: debugging

Brief GDB Basics

In this post I would like to go through some of the very basic cases in which gdb can come in handy. I’ve seen people avoid using gdb, saying it is a CLI tool and therefore it would be hard to use. Instead, they opted for this:

std::cout << "qwewtrer" << std::endl;
DEBUG("stupid segfault already?");

That’s just stupid. In fact, printing a back trace in gdb is as easy as writing two letters. I don’t appreciate lengthy debugging sessions that much either, but it’s something you simply cannot avoid in software development. What you can do to speed things up is to know the right tools and to be able to use them efficiently. One of them is GNU debugger.

Example program

All the examples in the text will be referring to the following short piece of code. I have it stored as segfault.c and it’s basically a program that calls a function which results in segmentation fault. The code looks like this:

/* Just a segfault within a function. */

#include <stdio.h>
#include <unistd.h>

void segfault(void)
{
	int *null = NULL;
	*null = 0;
}

int main(void)
{
	printf("PID: %d\n", getpid());
	fflush(stdout);

	segfault();

	return 0;
}

Debugging symbols

One more thing, before we proceed to gdb itself. Well, two actually. In order to get anything more than a bunch of hex addresses you need to compile your binary without stripping symbols and with debug info included. Let me explain.

Symbols (in this case) can be thought of simply variable and function names. You can strip them from your binary either during compilation/linking (by passing -s argument to gcc) or later with strip(1) utility from binutils. People do this, because it can significantly reduce size of the resulting object file. Let’s see how it works exactly. First, compile the code with striping the symbols:

[astro@desktop ~/MyBook/code]$ gcc -s segfault.c

Now let’s fire up gdb:

[astro@desktop ~/MyBook/code]$ gdb ./a.out
GNU gdb (GDB) Fedora (7.3.1-48.fc15)
Reading symbols from /mnt/MyBook/code/a.out...(no debugging symbols found)...done.

Notice the last line of the output. gdb is complaining that it didn’t find any debuging symbols. Now, let’s try to run the program and display stack trace after it crashes:

(gdb) run
Starting program: /mnt/MyBook/code/a.out 
PID: 21568

Program received signal SIGSEGV, Segmentation fault.
0x08048454 in ?? ()
(gdb) bt
#0  0x08048454 in ?? ()
#1  0x0804848d in ?? ()
#2  0x4ee4a3f3 in __libc_start_main (main=0x804845c, argc=1, ubp_av=0xbffff1a4, init=0x80484a0, fini=0x8048510, 
    rtld_fini=0x4ee1dfc0 , stack_end=0xbffff19c) at libc-start.c:226
#3  0x080483b1 in ?? ()

You can imagine, that this won’t help you very much with the debugging. Now let’s see what happens when the code is compiled with symbols, but without the debuginfo.

[astro@desktop ~/MyBook/code]$ gcc segfault.c 
[astro@desktop ~/MyBook/code]$ gdb ./a.out 
GNU gdb (GDB) Fedora (7.3.1-48.fc15)
Reading symbols from /mnt/MyBook/code/a.out...(no debugging symbols found)...done.
(gdb) run
Starting program: /mnt/MyBook/code/a.out 
PID: 21765

Program received signal SIGSEGV, Segmentation fault.
0x08048454 in segfault ()
(gdb) bt
#0  0x08048454 in segfault ()
#1  0x0804848d in main ()

As you can see, gdb still complains about the symbols in the beginning, but the results are much better. The program crashed when it was executing segfault() function, so we can start looking for any problems from there. Now let’s see what we get when debuginfo get’s compiled in.

[astro@desktop ~/MyBook/code]$ gcc -g segfault.c 
[astro@desktop ~/MyBook/code]$ gdb ./a.out 
GNU gdb (GDB) Fedora (7.3.1-48.fc15)
Reading symbols from /mnt/MyBook/code/a.out...done.
(gdb) run
Starting program: /mnt/MyBook/code/a.out 
PID: 21934

Program received signal SIGSEGV, Segmentation fault.
0x08048454 in segfault () at segfault.c:9
9		*null = 0;

That’s more like it! gdb printed the exact line from the code that caused the program to crash! That means, every time you try to use gdb to get some useful directions for debugging, make sure, that you don’t strip symbols and have debuginfo available!

Start, Stop, Interrupt, Continue

These are the basic commands to control your application’s runtime. You can start a program by writing

(gdb) run

When a program is running, you can interrupt it with the usual Ctrl-C, which will send SIGINTR to the debugged process. When the process is interrupted, you can examine it (this is described later in the post) and then either stop it completely or let it continue. To stop the execution, write

(gdb) kill

If you’d like to let your program carry on executing, use

(gdb) continue

I should point out, that in gdb, you can abbreviate most of the commands to as little as a single character. For instance r can be used for run, k for kill, c for continue and so on :).

Stack traces

Stack traces are very powerful when you need to localize the point of failure. Seeing a stack trace will point you directly to the function, that caused you program to crash. If your project is small or you keep your functions short and straight-forward, this could be all you’ll ever need from a debugger. You can display stack trace in case of a segmentation fault or generally anytime when the program is interrupted. The stack trace can be displayed by a backtrace or bt command

(gdb) bt
#0  0x08048454 in segfault () at segfault.c:9
#1  0x0804848d in main () at segfault.c:17

You see, that the program stopped (more precisely was killed by the kernel with a SIGSEGV signal) at line 9 of segfault.c file while it was executing a function segfault(). The segfault function was called directly from the main() function.

Listing source code

When the program is interrupted (and compiled it with debuginfo), you can list the code directly by using the list command. It will show the precise line of code (with some context) where the program was interrupted. This can be more convenient, because you don’t have to go back into your editor and search for the place of the crash by line numbers.

(gdb) list
4 #include 
5
6 void segfault(void)
7 {
8   int *null = NULL;
9 *null = 0;
10 }
11
12 int main(void)
13 {

We know (from the stack trace), that the program has stopped at line 9. This command will show you exactly what is going on around there.

Breakpoints

Up to this point, we only interrupted the program by sending a SIGTERM to it manually. This is not very useful in practice though. In most cases, you will want the program stop at some exact place during the execution, to be able to inspect what is going on, what values do the variables have and possibly to manually step further through the program. To achieve this, you can use breakpoints. By attaching a breakpoint to a line of code, you say that you want the debugger to interrupt every time the program wants to execute the particular line and wait for your instructions.

A breakpoint can be set by a break command (before the program is executed) like this

(gdb) break 8
Breakpoint 2 at 0x4005c0: file segfault.c, line 8.

I’m using line number to specify, where to put the break, but you can use also function name and file name. There are multiple variants of arguments to break command.

You can list the breakpoints you have set up by writing info breakpoints:

(gdb) info breakpoints 
Num     Type           Disp Enb Address            What
1       breakpoint     keep n   0x00000000004005d8 in main at segfault.c:14
2       breakpoint     keep y   0x00000000004005c0 in segfault at segfault.c:8

To disable a break point, use disable <Num> command with the number you find in the info.

Stepping through the code

When gdb stops your application, you can resume the execution manually step-by-step through the instructions. There are several commands to help you with that. You can use the step and next commands to advance to the following line of code. However, these two commands are not entirely the same. Next will ‘jump’ over function calls and run them at once. Step, on the other hand, will allow you to descend into the function and execute it line-by-line as well. When you decide you’ve had enough of stepping, use the continue command to resume the execution uninterrupted to the next break point.

Breakpoint 1, segfault () at segfault.c:8
8		int *null = NULL;
(gdb) step
9		*null = 0;

There are multiple things you can do during the process of stepping through a running program. You can dump values of variables using the print command, even set values to variables (using set command). And this is definitely not all. Gdb is great! It really can save a lot of time and lets you focus on the important parts of software development. Think of it the next time you try to bisect errors in the program by inappropriate debug messages :-).

Sources

Advertisements

Core dumps in Fedora

This post will demonstrate a way of obtaining and examining a core dump on Fedora Linux. Core file is a snapshot of working memory of some process. Normally there’s not much use for such a thing, but when it comes to debugging software it’s more than useful. Especially for those┬áhard-to-reproduce random bugs. When your program crashes in such a way, it might be your only source of information, since the problem doesn’t need to come up in the next million executions of your application.

The thing is, creation of core dumps is disabled by default in Fedora, which is fine since the user doesn’t want to have some magic file spawned in his home folder every time an app goes down. But we’re here to fix stuff, so how do you turn it on? Well, there’s couple of thing that might prevent the cores to appear.

1. Permissions

First, make sure that the program has writing permission for the directory it resides in. The core files are created in the directory of the executable. From my experience, core dump creation doesn’t work on programs executed from NTFS drives mounted through ntfs3g.

2. ulimit

This is the place where the core dump creation is disabled. You can see for yourself by using the ulimit command in bash:

astro@desktop:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 15976
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

To enable core dumps set some reasonable size for core files. I usually opt for unlimited, since disk space is not an issue for me:

ulimit -c unlimited

This setting is local only for the current shell though. To keep this settings, you need to put the above line into your ~/.bashrc or (which is cleaner) adjust the limits in /etc/security/limits.conf.

3. Ruling out ABRT

In Fedora, cores are sent to the Automatic Bug Reporting Tool — ABRT. So they can be posted to the RedHat bugzilla to the developers to analyse. The kernel is configured so that all core dumps are pipelined right to abrt. This is set in /proc/sys/kernel/core_pattern. My settings look like this

|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t %h %e 636f726500

That means, that all core files are passed to the standard input of abrt-hook-ccpp. Change this settings simply to “core” i.e.:

core

Then the core files will be stored in the same directory as the executable and will be called core.PID.

4. Send Right Signals

Not every process termination leads to dumping core. Keep in mind, that core file will be created only if the process receives this signals:

  • SIGSEGV
  • SIGFPE
  • SIGABRT
  • SIGILL
  • SIGQUIT

Example program

Here’s a series of steps to test whether your configuration is valid and the cores will appear where they should. You can use this simple program to test it:

/* Print PID and loop. */

#include <stdio.h>
#include <unistd.h>

void infinite_loop(void)
{
    while(1);
}

int main(void)
{
    printf("PID: %d\n", getpid());
    fflush(stdout);

    infinite_loop();

    return 0;
}

Compile the source, run the program and send a signal like following to get a memory dump:

gcc infinite.c
astro@desktop:~$ ./a.out &
[1] 19233
PID: 19233
astro@desktop:~$ kill -SEGV 19233
[1]+  Segmentation fault      (core dumped) ./a.out
astro@desktop:~$ ls core*
core.19233

Analysing Core Files

If you already have a core, you can open it using GNU Debugger (gdb). For instance, to open the core file, that was created earlier in this post and displaying a backtrace, do the following:

astro@desktop:~$ gdb a.out core.19233
GNU gdb (GDB) Fedora (7.3.1-47.fc15)
Copyright (C) 2011 Free Software Foundation, Inc.
Reading symbols from /home/astro/a.out...(no debugging symbols found)...done.
[New LWP 19233]
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0  0x08048447 in infinite_loop ()
Missing separate debuginfos, use: debuginfo-install glibc-2.14.1-5.i686
(gdb) bt
#0  0x08048447 in infinite_loop ()
#1  0x0804847a in main ()
(gdb)

Sources