Monday, June 04, 2012

Debugging mutex locks in threaded programs using GDB

A common problem I have while debugging a threaded program is trying to determine which thread currently has a mutex lock when another thread is stuck waiting at the lock.  This could be because the two threads are deadlocked, or perhaps the first thread simply forgot to unlock the mutex.

Piecing together information from StackOverflow and my own experience, here's a possible way to go about it:

1. Run your program using GDB

$ gdb program
(gdb) run

2. When the program freezes, stop it using Ctrl-C

Program received signal SIGINT, Interrupt.
0x0012d422 in __kernel_vsyscall ()

3. Do a backtrace to see where we're deadlocked.

(gdb) backtrace

#0  0x0012d422 in __kernel_vsyscall ()
#1  0x0013aaf9 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2  0x0013613b in _L_lock_748 () from /lib/tls/i686/cmov/
#3  0x00135f61 in __pthread_mutex_lock (mutex=0xb2601660) at pthread_mutex_lock.c:61
#4  0x08167cc9 in ObjectManager::lockObject (this=0x8280998, ObjectId=2992640368) at program/src/ObjectManager.cpp:1212
#5  0x08164929 in ObjectManager::executeActions (this=0x8280998, actions=...) at program/src/ObjectManager.cpp:424
#6  0x08174d09 in main (argc=1, argv=0xbffff6b4) at program/src/main.cpp:110

From this, we can see that we're stuck at line 1212 in ObjectManager.cpp (frame 4).  (Look for the frame right before the frame containing __pthread_mutex_lock)

4. Switch to the frame.

(gdb) frame 4
#4  0x08167cc9 in ObjectManager::lockObject (this=0x8280998, ObjectId=2992640368) at program/src/ObjectManager.cpp:1212
1212            pthread_mutex_lock(&mutex);

5. Determine which thread currently holds the lock we're attempting to obtain.

Look for the mutex.__data.__owner value.  This is a thread ID, or in GDB parlance, a light-weight process (LWP) ID.

(gdb) print mutex
$2 = {__data = {__lock = 2, __count = 0, __owner = 19617, __kind = 0, __nusers = 1, {__spins = 0, __list = {__next = 0x0}}}, __size = "\002\000\000\000\000\000\000\000\241L\000\000\000\000\000\000\001\000\000\000\000\000\000",
  __align = 2}

(gdb) info threads
  9 Thread 0xb47d0b70 (LWP 19619)  0x0012d422 in __kernel_vsyscall ()
  8 Thread 0xb4fd1b70 (LWP 19618)  0x0012d422 in __kernel_vsyscall ()
  7 Thread 0xb57d2b70 (LWP 19617)  0x0012d422 in __kernel_vsyscall ()
  6 Thread 0xb5fd3b70 (LWP 19616)  0x0012d422 in __kernel_vsyscall ()
  5 Thread 0xb67d4b70 (LWP 19615)  0x0012d422 in __kernel_vsyscall ()
  4 Thread 0xb6fd5b70 (LWP 19614)  0x0012d422 in __kernel_vsyscall ()
  3 Thread 0xb77d6b70 (LWP 19613)  0x0012d422 in __kernel_vsyscall ()
  2 Thread 0xb7fd7b70 (LWP 19610)  0x0012d422 in __kernel_vsyscall ()
* 1 Thread 0xb7fd8800 (LWP 19607)  0x0012d422 in __kernel_vsyscall ()

6. Switch to the owning thread.

(gdb) thread 7
[Switching to thread 7 (Thread 0xb57d2b70 (LWP 19617))]#0  0x0012d422 in __kernel_vsyscall ()

At this point, you could run a backtrace on the owning thread to see what it's doing and perhaps get clues as to why it never unlocked the mutex our original thread is still gamely waiting for.


colorfulFOOL said...

Thank you, this really saved me with university task.

Евгений Бачинин said...

Thank's. Very helpful!!!