HomeAbout MeProjectsContactMeow

Debugging a Strange Linux Issue

22 May 2023

A few days ago, after not touching my Arch install for months, I decided to do an update. This installed a new kernel, so of course I did a reboot.

After grumbling about my up-time, I went about my day. Things were fine until I tried to start thunar, and it just... didn't.

...

...

...

Oh, there it is!

Yes, my computer has a new issue. Thunar takes almost exactly 25 seconds to actually start up. This issue also applies to pavucontrol, my screenshot utility, my podcast manager, and a few other programs.

Luckily, Linux has a lot of useful tools built-in for debugging crazy issues like this! The first one I like to reach for is strace. This is a lovely utility for logging system calls to the terminal. I find this is an absolutely perfect level of abstraction in a lot of situations. It's high-level enough that I'm not stepping through assembler instructions, but still low-level enough that I have a precise view of what's going on.

I fire up strace with strace pavucontrol. It spits out a ton of logs, before eventually hanging:

  write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
  futex(0x55cfbbf8d300, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x55cfbbf88b78, FUTEX_WAKE_PRIVATE, 1) = 1
  eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK)   = 11
  futex(0x7fe8833f1ba0, FUTEX_WAKE_PRIVATE, 1) = 1
  futex(0x55cfbbf8dff0, FUTEX_WAKE_PRIVATE, 1) = 0
  write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
  futex(0x55cfbbf88b78, FUTEX_WAKE_PRIVATE, 1) = 1
  poll([{fd=11, events=POLLIN}], 1, 25000

Well, there's our 25 second timeout - 25000 milliseconds. At least, I assume this is that, but I'm really just making a guess. Anyway, what's fd=11?

lsof is the next tool to reach for. First, I start pavucontrol a second time.

I grab its PID with ps -aux | grep pavucontrol:

  stephen  2319431  0.6  0.0  10480  3800 pts/0    S+   22:39   0:00 strace pavucontrol
  stephen  2319434  1.0  0.1 316492 36768 pts/0    Sl+  22:39   0:00 pavucontrol
  stephen  2319680  0.0  0.0   9112  2400 pts/1    S+   22:39   0:00 grep --color=auto pavucontrol

Now we have its PID - 2319434. We can feed this into lsof to get a bit more information about the file descriptor.

lsof -p 2319434

  ...
  pavucontr 2319434 stephen    0u      CHR              136,0      0t0        3 /dev/pts/0
  pavucontr 2319434 stephen    1u      CHR              136,0      0t0        3 /dev/pts/0
  pavucontr 2319434 stephen    2u      CHR              136,0      0t0        3 /dev/pts/0
  pavucontr 2319434 stephen    3u     unix 0x00000000b0fa1eb9      0t0  5191565 type=STREAM (CONNECTED)
  pavucontr 2319434 stephen    4u  a_inode               0,14        0     1076 [eventfd:245]
  pavucontr 2319434 stephen    5u  a_inode               0,14        0     1076 [eventfd:246]
  pavucontr 2319434 stephen    6u      CHR              136,0      0t0        3 /dev/pts/0
  pavucontr 2319434 stephen    7u     unix 0x000000005f742058      0t0  5205220 type=STREAM (CONNECTED)
  pavucontr 2319434 stephen    8u  a_inode               0,14        0     1076 [eventfd:247]
  pavucontr 2319434 stephen    9u     unix 0x0000000000946e4a      0t0  5205222 type=STREAM (CONNECTED)
  pavucontr 2319434 stephen   10u  a_inode               0,14        0     1076 [eventfd:248]
  pavucontr 2319434 stephen   11u  a_inode               0,14        0     1076 [eventfd:254]

From the last line (the one with 11u in it) we can see it's an anon_inode eventfd. I have no idea what this means but we can work with it.

Years of experience with Linux has given me trust issues, so let's do a quick confirm. ls -l /proc/2319434/fd

  lrwx------ 1 stephen stephen 64 May 18 22:39 11 -> 'anon_inode:[eventfd]'

Yup. Still an anon_inode eventfd. And I still have no clue what this means. I did try to research it but didn't get too far. I guess I need some more prerequisite knowledge here!

From my years of experience with C (there was a time when it was my favourite language, for some reason), gdb is what I want to reach for next. We can use it to step through pavucontrol line by line. It gives us a bit more granularity than just using strace.

I start pavucontrol once again, this time using gdb pavucontrol.

    $ gdb pavucontrol
  GNU gdb (GDB) 13.1
  Copyright (C) 2023 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  Type "show copying" and "show warranty" for details.
  This GDB was configured as "x86_64-pc-linux-gnu".
  Type "show configuration" for configuration details.
  For bug reporting instructions, please see:
  <https://www.gnu.org/software/gdb/bugs/>.
  Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.

  For help, type "help".
  Type "apropos word" to search for commands related to "word"...
  Reading symbols from pavucontrol...

  This GDB supports auto-downloading debuginfo from the following URLs:
    <https://debuginfod.archlinux.org>
  Enable debuginfod for this session? (y or [n]) y
  Debuginfod has been enabled.
  To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
  Downloading separate debug info for /usr/bin/pavucontrol
  Reading symbols from /home/stephen/.cache/debuginfod_client/74e85fe333bf717a0434dec5c4444aa66cd73d41/debuginfo...
  (gdb) r
  Starting program: /usr/bin/pavucontrol
  Downloading separate debug info for /lib64/ld-linux-x86-64.so.2
  Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000
  Downloading separate debug info for /usr/lib/libgtkmm-3.0.so.1
  Downloading separate debug info for /usr/lib/libatkmm-1.6.so.1
  ... [lots more of this]
  Downloading separate debug info for /usr/lib/libgpg-error.so.0
  Downloading separate debug info for /usr/lib/gtk-3.0/modules/libcanberra-gtk-module.so
  Downloading separate debug info for /usr/lib/gio/modules/libgvfsdbus.so
  Downloading separate debug info for /usr/lib/gvfs/libgvfscommon.so
  [New Thread 0x7ffff25ff6c0 (LWP 3875580)]
  [New Thread 0x7fffebfff6c0 (LWP 3875581)]
  [New Thread 0x7ffff1dfe6c0 (LWP 3875582)]

  (pavucontrol:3873842): Gtk-WARNING **: 19:39:10.626: Theme parsing error: gtk.css:3611:22: 'none' is not a valid color name
  ^C
  Thread 1 "pavucontrol" received signal SIGINT, Interrupt.
  0x00007ffff6913c0f in __GI___poll (fds=0x55555574b570, nfds=1, timeout=25000) at ../sysdeps/unix/sysv/linux/poll.c:29
  Downloading source file /usr/src/debug/glibc/glibc/io/../sysdeps/unix/sysv/linux/poll.c
  29        return SYSCALL_CANCEL (poll, fds, nfds, timeout);

Alright, so there's a few things there I want to point out. First of all, gdb helpfully asks if we want to download debuginfo automatically. To be honest, I'm not entirely sure on the mechanism here, but it's a massive quality-of-life improvement, so of course we say yes. Then gdb downloads all the debuginfo, obviously. Once this is done, and the program is allowed to run for a few seconds, we Ctrl+C it to pause execution.

We're apparently hanging at that SYSCALL_CANCEL line. A simple p timeout shows that yes, our timeout is indeed 25000. Exactly what we were seeing before!

Armed with this new information, I started doing more research. Eventually, I stumbled my way onto this post. To be honest, I'm not entirely sure how I got there, but the fix, sudo pacman -Rndd xdg-desktop-portal xdg-desktop-portal-gtk, immediately had things starting instantly again!

So, why did I make this post? I think documenting my debugging steps is useful - even if only for myself. This is basically my very generic process for digging into issues when I have so little to work on that I can't even search it up.