This is a little compilation of some useful configuration options and basic methods to debug a Linux kernel. There is good documentation in kernel.org and elsewhere but back when I was learning these things I felt that there are few places where you can get a high level overview. We will go through more advanced methods of profiling and tracing in another post.
It is assumed that we already know how to compile a kernel, use menuconfig and what kernel options are.
Basic kernel configuration
First, there are a few options that we want to activate in order to have them available in our running kernel. Some of them will have a performance toll and won’t be present in a production system. Others will just increase kernel size. You can get an overview of what is available in Kconfig.debug, and also you might find this kernel symbol reference handy.
This option exists specifically to unhide debugging options in menuconfig.
BUG() panics will output the file name and line number of the BUG call as well as the EIP and oops trace. This aids debugging but costs about 70-100K of memory.
This option inserts code to into the compiled executable which saves the frame information in registers or on the stack at different points which allows a debugger such as gdb to more accurately construct stack back traces while debugging the kernel.
This will give us more information in stack traces from kernel oopses. The kernel will print out symbolic crash information and symbolic stack backtraces. This increases the size of the kernel somewhat, as all symbols have to be loaded into the kernel image.
If a kernel crashes, this option will allow us to get an output that includes function names and not only addresses so it will be much easier to discover what is going on. This is an example taken from the Ubuntu wiki
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008 printing eip: c022a7b5 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: thinkpad_acpi ppdev speedstep_lib cpufreq_conservative cpufreq_userspace cpufreq_ondemand cpufreq_stats cpufreq_powersave freq_table video bay dock ac sbs button container battery lp irtty_sir sir_dev pcmcia parport_pc parport snd_cs46xx gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss nsc_ircc snd_pcm snd_seq_dummy irda crc_ccitt snd_seq_oss psmouse i2c_piix4 snd_seq_midi snd_rawmidi snd_seq_midi_event serio_raw pcspkr snd_seq i2c_core snd_timer snd_seq_device snd soundcore snd_page_alloc shpchp pci_hotplug intel_agp yenta_socket rsrc_nonstatic pcmcia_core agpgart evdev ext3 jbd mbcache sg sr_mod cdrom sd_mod uhci_hcd usbcore ata_piix ata_generic libata scsi_mod e100 mii thermal processor fan fuse apparmor commoncap CPU: 0 EIP: 0060:[<c022a7b5>] Not tainted VLI EFLAGS: 00010202 (2.6.22-12-generic #1) EIP is at acpi_ns_internalize_name+0xd/0x83 eax: 00000008 ebx: 00000000 ecx: 00000000 edx: c7879e54 esi: d0b980c0 edi: c7879e54 ebp: c7879e70 esp: c7879de8 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process modprobe (pid: 4467, ti=c7878000 task=ce5c94c0 task.ti=c7878000) Stack: 00000000 00000000 d0b97e60 00008080 c01c4390 d0b97e60 00000000 00000000 d0b980c0 00000000 c7879e70 c022a85c d0b97e60 c795d030 c7c604e0 c01c44ef 00000004 d0b97e60 c7acea18 c01c3884 00008080 00000004 00000004 00000080 Call Trace: [<c01c4390>] __sysfs_new_dirent+0x20/0x50 [<c022a85c>] acpi_ns_get_node+0x31/0x93 [<c01c44ef>] sysfs_make_dirent+0x2f/0x50 [<c01c3884>] sysfs_add_file+0x74/0x90 [<d0b910b7>] drv_acpi_handle_init+0x37/0x90 [thinkpad_acpi] [<c0231aef>] acpi_ut_release_mutex+0x5b/0x63 [<c0233ac0>] acpi_method_notify_enable+0x15/0x34 [<d0b5ba32>] cmos_init+0x52/0x70 [thinkpad_acpi] [<d0b5c21f>] thinkpad_acpi_module_init+0x27f/0x69a [thinkpad_acpi] [<c014a811>] sys_init_module+0x151/0x1a00 [<c01fb8cf>] prio_tree_insert+0x1f/0x250 [<c01041d2>] sysenter_past_esp+0x6b/0xa9 ======================= Code: c7 44 24 14 01 00 00 00 8b 54 24 14 8d 04 96 e9 f2 fe ff ff 83 c4 18 89 d0 5b 5e 5f 5d c3 55 57 89 d7 56 53 83 ec 1c 85 c0 74 67 <80> 38 00 74 62 85 d2 74 5e 89 04 24 89 e0 e8 b5 fb ff ff 8b 4c EIP: [<c022a7b5>] acpi_ns_internalize_name+0xd/0x83 SS:ESP 0068:c7879de8
We can manually check the symbol to address map in /proc/kallsyms. This output is in nm format.
# head -20 /proc/kallsyms 0000000000000000 A irq_stack_union 0000000000000000 A __per_cpu_start 0000000000004000 A cpu_debug_store 0000000000005000 A cpu_tss_rw 0000000000008000 A gdt_page 0000000000009000 A exception_stacks 000000000000e000 A entry_stack_storage 000000000000f000 A cpu_llc_id 000000000000f020 A cpu_llc_shared_map 000000000000f060 A cpu_core_map 000000000000f0a0 A cpu_sibling_map 000000000000f0e0 A cpu_info 000000000000f1d0 A cpu_number 000000000000f1d8 A this_cpu_off 000000000000f1e0 A x86_cpu_to_acpiid 000000000000f1e4 A x86_cpu_to_apicid 000000000000f1e6 A x86_bios_cpu_apicid 000000000000f1e8 A sched_core_priority 000000000000f200 A cpu_loops_per_jiffy 000000000000f220 A pmc_prev_left
Also, it is easy to access this information from kernel code. We can just use kallsyms_lookup_name() from kallsyms.c.
Normally kallsyms only contains the symbols of functions for nicer OOPS messages and backtraces (i.e., symbols from the text and inittext sections). This is sufficient for most cases. And only in very rare cases (e.g., when a debugger is used) all symbols are required (e.g., names of variables from the data sections, etc).
This option makes sure that all symbols are loaded into the kernel image (i.e., symbols from all sections) in cost of increased kernel size (depending on the kernel configuration, it may be 300KiB or something like this).
This option enables the complete Linux kernel .config file contents to be saved in the kernel. It provides documentation of which kernel options are used in a running kernel or in an on-disk kernel. This information can be extracted from the kernel image file with the script scripts/extract-ikconfig and used as input to rebuild the current kernel or to build another kernel.
Enable access to .config through /proc/config.gz in a running kernel.
$ zcat /proc/config.gz | head -20 # # Automatically generated file; DO NOT EDIT. # Linux/x86 4.16.4-1 Kernel Configuration # CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_MMU=y CONFIG_ARCH_MMAP_RND_BITS_MIN=28 CONFIG_ARCH_MMAP_RND_BITS_MAX=32 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8 CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16 CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y
We can check what parameters were used to invoke our running kernel in /proc/cmdline
$ cat /proc/cmdline 8250.nr_uarts=0 bcm2708_fb.fbwidth=656 bcm2708_fb.fbheight=416 bcm2708_fb.fbswap=1 vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 dwc_otg.lpm_enable=0 console=ttyS0,115200 console=tty1 root=PARTUUID=9f51643f-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait
These are normally provided by the bootloader (GRUB or uboot), but sometimes those are locked so we can use CONFIG_CMDLINE and CONFIG_CMDLINE_OVERRIDE to work around this when we are building a kernel.
- CONFIG_CMDLINE. Enter arguments here that should be compiled into the kernel image and used at boot time. If the boot loader provides a command line at boot time, it is appended to this string to form the full kernel command line, when the system boots.
- CONFIG_CMDLINE_OVERRIDE. have the kernel ignore the boot loader command line, and use ONLY the built-in command line. This is used to work around broken boot loaders.
Analyzing running modules
Probably the most basic means of debugging is reading printk output. There are several logging levels, and not all of them are active by default. Values go from emergency which is level 0 to debug which is level 8, with the default being warning which is level 4.
We can activate maximum verbose logging with
echo 8 > /proc/sys/kernel/printk
We can check the current logging level with
$ cat /proc/sys/kernel/printk 7 4 1 7 current default minimum boot-time-default
We can only see debug messages ( level 8 ), if we compile our kernel with CONFIG_DEBUG. Note that if we are generating too much logging we can severely affect system performance, specially if we are sending kernel logs through slow UART. For this situations, it is a good idea to use printk_ratelimited() or printk_once() if we are trying to introduce printk statements in places where they can get called too often, like the network packet stack.
printk_ratelimited(KERN_WARNING "this line will be limited. Iteration %d\n", count);
will result in many iterations being suppresed with a message like
[6886.360547] __ratelimit: 190 callbacks suppressed
an example of printk_once() could be
printk_once(KERN_INFO "this will only be printed once\n");
Using pr_debug() globally will cause a tremendous amount of logging so it is not very practical. To make the debug level more manageable, dynamic debugging was introduced.
We can activate it with CONFIG_DYNAMIC_DEBUG. We can see in printk.h that dynamic_pr_debug() will take over in this case.
/* If you are writing a driver, please use dev_dbg instead */ #if defined(CONFIG_DYNAMIC_DEBUG) #include <linux/dynamic_debug.h> /* dynamic_pr_debug() uses pr_fmt() internally so we don't need it here */ #define pr_debug(fmt, ...) \ dynamic_pr_debug(fmt, ##__VA_ARGS__) #elif defined(DEBUG) #define pr_debug(fmt, ...) \ printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) #else #define pr_debug(fmt, ...) \ no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) #endif
In order to control what we want to print, a file will appear in dynamic_debug/control. We now have absolute control about what printk lines we want to be logged. We can hide (-p ) or show (+p ) log messages at the file level, function level or even line level.
echo file tpm_nsc.c line 346-373 +p > /sys/kernel/dynamic_debug/control echo file tpm_nsc.c function init_nsc +p > /sys/kernel/dynamic_debug/control
We can also configure this at module load time with the dyndbg option, for example
# modprobe snd_hda_intel dyndbg=+p
If we have this option selected, then we can mount debugfs like so
mount -t debugfs none /sys/kernel/debug
For instance, if we are compiling Zswap with DEBUG_FS we can check different status parameters such as
# cat /sys/kernel/debug/zswap/written_back_pages
This is really powerful. We can use good old gdb to analyze our kernel. We can set breakpoints, see memory contents and all the rest. We will need gdb or some cross-gdb if we are working with a different architecture, for example an embedded ARM board.
In our kernel we will want to activate
- CONFIG_KGDB. Include in-kernel hooks for kgdb, the Linux kernel source level debugger
- CONFIG_KGDB_SERIAL_CONSOLE. Share a serial console with kgdb. Sysrq-g must be used to break in initially.
- CONFIG_MAGIC_SYSRQ . This will activate the magic SysRq Key.
First we configure kgdboc ( kgdb over console ) with the tty and baud rate of our UART.
echo "ttyS0,115200" > /sys/module/kgdboc/parameters/kgdboc
Next, we tell our kernel to enter in debug mode. This will freeze waiting for a gdb connection.
echo g > /proc/sysrq-trigger
We will probably also want to activate kgdbcon in order to send printk statements to the gdb session.
echo 1 > /sys/module/kgdb/parameters/kgdb_use_con
Finally, we connect our (cross)gdb instance to the QEMU instance or real board. First, we run the debugger passing it the kernel that we are running, compiled with debug symbols (CONFIG_DEBUG_INFO). Then, we configure the serial port and attach with the target remote command, just like when we use gdbserver .
./toolchain-mips_34kc_gcc-5.2.0_musl-1.1.11/bin/mips-openwrt-linux-gdb ./build_dir/target-mips_34kc_musl-1.1.11/linux-ar71xx_generic/linux-4.1.13/vmlinux set serial baud 115200 target remote /dev/ttyUSB0
We can also have the kernel wait for the gdb connection at early boot, but then we have to configure kgdboc in the kernel command line, and use the kgdbwait option. In this example we also are deactivating Kernel Address Space Layout Randomization so the memory locations are not messed up.
console=ttyS0,115200 kgdboc=ttyS0,11520 nokaslr kgdbwait
The kernel will be frozen until we connect with gdb. This will allow us to debug from early in the boot process if we need to.
kdb is a shell interface that lets us connect through a keyboard or serial console and inspect memory, registers and logs. It isn’t as powerful as kgdb in terms of inspecting execution following the source code. More details can be found in the kernel docs, but I recommend just sticking to kgdb.
Kcore provides a virtual ELF core file of the live kernel accessible at /proc/kcore. This can be read with gdb and other ELF tools. No modifications can be made of the kernel memory using this mechanism.
This is very cool, because we can analyze a picture of the memory at the moment of invoking gdb. We need a copy of the running kernel compiled with debug symbols (CONFIG_DEBUG_INFO).
Unlike with kgdb, this is a limited simulation of the memory contents, and we cannot do fancy stuff such as control execution. Still we can access memory contents. For instance, if your kernel with symbols is vmlinux we would invoke gdb just like we would run it to analyze a core dump from any executable, but our core dump will be /proc/kcore.
# gdb vmlinux /proc/kcore (gdb) print jiffies $1 = 16467921357 (gdb)
We can refresh memory contents with
(gdb) core-file /proc/kcore
You can read a more detailed example in this article.