Tricky Problems? MariaDB debug container

MariaDB does have bugs, and users sometimes run into them. Sometimes developers spend a long time looking at bug reports and code and still cannot see how the situation occurred. During their analysis, developers ask questions like:

  • I wonder if this was already fixed in {unreleased version}? But how can I ask a user to test that?
  • Can I get the user to provide a good stack trace that would help me understand this better? Users sometimes find this hard.
  • What exact hardware and kernel configuration is this bug occurring on? And how would I reproduce this?
  • I’d like a copy of this data so I can attempt some potentially damaging operation to understand this better. But user data isn’t always shareable, or easy to copy.
  • An rr recording of how MariaDB got into this state would help me understand this better. Can I get one?
  • I know what debugger information I’d get if I were to attack this problem locally; can I ask the user to get it for me?

Users also ask tricky questions of MariaDB like:

  • Where is MariaDB spending all its CPU time?
  • Can I record a flame graph on what MariaDB is doing?

MariaDB Foundation has developed the container image quay.io/mariadb-foundation/mariadb-debug to give developers more information for solving MariaDB issues. It also lets developers ask users for more information in ways that won’t consume a lot of time.

What is quay.io/mariadb-foundation/mariadb-debug?

quay.io/mariadb-foundation/mariadb-debug is a container image. It is based on the quay.io/mariadb-foundation/mariadb-devel image. As described in the previous blog post, that image contains the latest development version of MariaDB for every major supported stable branch (and the unstable 10.8 branch). This image behaves identically to the Docker Library docker.io/library/mariadb image that you may be using.

Differences are:

  • Debug info packages are installed, so it’s easier to resolve stack traces.
  • Extra debugging tools are installed in the container.
  • curl is installed for easy upload to the MariaDB Private FTP server (very recently added, so it may not be in all images).

Notably, this isn’t a debug build of the server (i.e. one compiled with -DCMAKE_BUILD_TYPE=Debug).

Is my MariaDB runtime error fixed already?

The quay.io/mariadb-foundation/mariadb-devel images are sufficient to test this. If the problem is a crash or needs further debugging, substitute the quay.io/mariadb-foundation/mariadb-debug image in the steps below.

The steps are:

  1. podman pull quay.io/mariadb-foundation/mariadb-devel:{same major version} ; we’ll use 10.5 in the examples below.
  2. Stop your existing container.
  3. Run the quay.io/mariadb-foundation/mariadb-devel:10.5 image the same way you ran your previous 10.5 image, e.g. podman run -v {volume}:/var/lib/mysql -p 3306:3306 quay.io/mariadb-foundation/mariadb-devel:10.5
  4. Run your previous SQL/tests to see if the same behavior is observed.
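The first three steps above can be scripted. Below is a minimal sketch, assuming your data lives in a podman volume. The helper name test_on_devel, the PODMAN override variable, the "-devel" container name suffix and the 3306 port mapping are illustrative assumptions, not something the images require.

```shell
#!/usr/bin/env bash
# Sketch only: container, volume and port values are examples.
set -euo pipefail

# Override PODMAN (e.g. PODMAN=echo) for a dry run that only prints commands.
PODMAN=${PODMAN:-podman}

# test_on_devel MAJOR CONTAINER VOLUME   (hypothetical helper)
test_on_devel() {
    local major=$1 container=$2 volume=$3
    local image="quay.io/mariadb-foundation/mariadb-devel:${major}"

    # 1. Fetch the latest development build for this major version.
    "$PODMAN" pull "$image"
    # 2. Stop the existing server so it releases the data volume.
    "$PODMAN" stop "$container"
    # 3. Start the devel image on the same data under a new name, so it
    #    cannot clash with the old container if that one still exists.
    "$PODMAN" run -d --rm --name "${container}-devel" \
        -v "${volume}:/var/lib/mysql" -p 3306:3306 "$image"
}

# Usage: test_on_devel 10.5 mdb105 db105
# Step 4 (re-running your SQL/tests) is still up to you.
```

Running it with PODMAN=echo first is a cheap way to check the commands before anything is stopped.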

I need a backtrace for a bug report

This follows steps similar to those above, where the quay.io/mariadb-foundation/mariadb-devel:10.5 image was used to determine whether the bug still exists. This time we use the debug image to produce detailed information for a bug report. The steps are:

  1. Create the data directory (or stop the current container, taking note of the data volume mounted on /var/lib/mysql).
  2. Start the debug container. Note that the capability CAP_SYS_PTRACE is needed for all debugging.
  3. Trigger the crash.
  4. Record the backtrace.
  5. Copy the backtrace out of the container for a bug report.

Below is a step-by-step guide to reproducing the existing bug MDEV-26412.

Create data directory

This creates a new, freshly initialized data directory on the new volume db105. The example below uses the Docker Library mariadb image to create it. However, the quay.io/mariadb-foundation/mariadb-devel and quay.io/mariadb-foundation/mariadb-debug images achieve the same result.

$ podman volume create db105
db105

$ podman run -v db105:/var/lib/mysql -e MARIADB_DATABASE=test -e MARIADB_USER=bob -e MARIADB_PASSWORD=pluto -e MARIADB_RANDOM_ROOT_PASSWORD=1 -d --name mdb105 --rm mariadb:10.5
91b59609aa83f474422aa3f7b6b693243782a61698153f4dac1e689744edf937

$ podman kill mdb105
91b59609aa83f474422aa3f7b6b693243782a61698153f4dac1e689744edf937
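If you script these steps, note that podman run -d returns as soon as the container starts. The entrypoint may still be initializing the data directory at that point, so stopping the container straight away can interrupt initialization. Below is a minimal sketch of waiting first. It assumes the entrypoint logs the text "MariaDB init process done" when it finishes; check your image's logs to confirm. wait_for_init is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: wait for first-time initialization before stopping the container.
# Assumption: the entrypoint logs "MariaDB init process done" when finished.
set -euo pipefail

PODMAN=${PODMAN:-podman}

# wait_for_init CONTAINER [TIMEOUT_SECONDS]   (hypothetical helper)
wait_for_init() {
    local container=$1 timeout=${2:-60} logs i
    for ((i = 0; i < timeout; i++)); do
        # Capture the logs into a variable rather than piping them into
        # "grep -q", which can fail spuriously under pipefail.
        logs=$("$PODMAN" logs "$container" 2>&1) || true
        if [[ $logs == *"MariaDB init process done"* ]]; then
            return 0
        fi
        sleep 1
    done
    echo "timed out waiting for $container to initialize" >&2
    return 1
}

# Usage, after the "podman run ... -d --name mdb105 ..." command above:
#   wait_for_init mdb105 && podman stop mdb105
```

This only applies to first-time initialization. On an already initialized volume the message never appears, and the helper times out.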

Start the debug container

Here we start the debug container on the previous volume by mounting db105 on /var/lib/mysql. We use --user mysql so the permissions are correct. We add CAP_SYS_PTRACE so we can debug and trace the process. Finally, we run gdb --args mysqld to start the server instance under a debugger.

$ podman run -ti -v db105:/var/lib/mysql --user mysql --cap-add CAP_SYS_PTRACE --name mdb105 quay.io/mariadb-foundation/mariadb-debug:10.5 gdb --args mysqld
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"…
Reading symbols from mysqld…
Reading symbols from /usr/lib/debug/.build-id/c3/b5a5cf349a1d49643b9eaab6e9d667fcc2d005.debug…
(gdb) r
Starting program: /usr/sbin/mysqld

We entered the r (run) command above to start the server under the debugger.

Trigger the crash

The bug we are triggering can be reproduced with plain SQL. We use the MariaDB monitor installed in the container to run it.

$ podman exec -ti mdb105 mysql -u bob -ppluto test
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.5.14-MariaDB-88b339805d7a9ddebc3fd61e9dee83270dbf474d mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [test]> CREATE TABLE v0 ( v1 BIGINT ( 67 ) NOT NULL ) ;
Query OK, 0 rows affected (0.033 sec)

MariaDB [test]>  CREATE TABLE v2 ( v4 INT , v3 INT NOT NULL UNIQUE KEY CHECK ( -128 | ( str_to_date ( CHAR ( 33 ) , 'x' ) ) ) ) ;
Query OK, 0 rows affected (0.017 sec)

MariaDB [test]>  DROP FUNCTION IF EXISTS v0 ;
Query OK, 0 rows affected, 1 warning (0.000 sec)

MariaDB [test]>  INSERT INTO v0 SELECT DISTINCT * FROM v0 FULL JOIN v2 ON ( SELECT v0 . v1 ) ;
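For the bug report, it helps to keep the reproduction SQL in a file that can be replayed without typing. Below is a minimal sketch that reuses the bob/pluto credentials and the test database created earlier. The helper names write_repro_sql and replay_sql and the repro.sql file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep the reproduction SQL in a file and replay it non-interactively.
set -euo pipefail

PODMAN=${PODMAN:-podman}

# write_repro_sql FILE: save the statements used in the session above.
write_repro_sql() {
    cat > "$1" <<'EOF'
CREATE TABLE v0 ( v1 BIGINT ( 67 ) NOT NULL ) ;
CREATE TABLE v2 ( v4 INT , v3 INT NOT NULL UNIQUE KEY CHECK ( -128 | ( str_to_date ( CHAR ( 33 ) , 'x' ) ) ) ) ;
DROP FUNCTION IF EXISTS v0 ;
INSERT INTO v0 SELECT DISTINCT * FROM v0 FULL JOIN v2 ON ( SELECT v0 . v1 ) ;
EOF
}

# replay_sql CONTAINER FILE: -i keeps stdin open so the redirected file
# reaches the mysql client; -t is omitted because stdin is not a terminal.
replay_sql() {
    local container=$1 file=$2
    "$PODMAN" exec -i "$container" mysql -u bob -ppluto test < "$file"
}

# Usage:
#   write_repro_sql repro.sql
#   replay_sql mdb105 repro.sql
```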

Looking back at the terminal where we started the container, we can see that gdb has stopped at the location of the crash:

Version: '10.5.14-MariaDB-1:10.5.14+maria~focal' as '10.5.14-MariaDB-88b339805d7a9ddebc3fd61e9dee83270dbf474d' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
[New Thread 0x7fc5f05eb700 (LWP 33)]
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 26 "mysqld" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc5f05eb700 (LWP 33)]
Item_field::fix_outer_field (this=0x7fc568014a90, thd=0x7fc568000c58, from_field=0x7fc5f05e9750, reference=0x7fc568014bd8) at ./sql/item.cc:5636
5636 ./sql/item.cc: No such file or directory.

Following the pattern from our Knowledge Base article on creating backtraces, we store the trace in /var/lib/mysql because it’s writable by the mysql user that the debugger is running as.

(gdb) set logging file /var/lib/mysql/gdb_output.txt
(gdb) set pagination off
(gdb) set logging on
Copying output to /var/lib/mysql/gdb_output.txt.
Copying debug output to /var/lib/mysql/gdb_output.txt.
(gdb) thread apply all bt -frame-arguments all full

This produces a large amount of backtrace output (the end of it is shown below). We finish with set logging off.

.....
#5  0x00007fc5f3d850b3 in __libc_start_main (main=0x55ac89d1bab0 <main(int, char**)>, argc=1, argv=0x7fff5f7f1dc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff5f7f1db8) at ../csu/libc-start.c:308
        self = <optimized out>
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {94199544867104, 284094712732557841, 94199535158800, 140734795554240, 0, 0, -283743969626709487, -253704532302519791}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x1, 0x7fff5f7f1dc8}, data = {prev = 0x0, cleanup = 0x0, canceltype = 1}}}
        not_first_call = <optimized out>
#6  0x000055ac89d4c63e in _start () at ./sql/mysqld.cc:4321
No symbol table info available.

(gdb) set logging off
Done logging to /var/lib/mysql/gdb_output.txt.
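The interactive gdb commands above can also be passed on the gdb command line. Below is a sketch that assumes the same volume, container name and image as above; run_under_gdb is a hypothetical helper name. In batch mode gdb runs each -ex command in order and disables pagination, so the "--Type <RET> for more" prompt should not interrupt the output.

```shell
#!/usr/bin/env bash
# Sketch: the interactive gdb session above as one non-interactive command.
# "run" returns once the server stops (here, on the SIGSEGV); the backtrace
# is then logged and gdb exits.
set -euo pipefail

PODMAN=${PODMAN:-podman}

# run_under_gdb VOLUME NAME IMAGE   (hypothetical helper)
run_under_gdb() {
    local volume=$1 name=$2 image=$3
    local log=/var/lib/mysql/gdb_output.txt
    "$PODMAN" run -ti -v "${volume}:/var/lib/mysql" --user mysql \
        --cap-add CAP_SYS_PTRACE --name "$name" "$image" \
        gdb -batch \
            -ex "set logging file $log" \
            -ex "set logging on" \
            -ex run \
            -ex "thread apply all bt -frame-arguments all full" \
            -ex "set logging off" \
            --args mysqld
}

# Usage (then trigger the crash from another terminal, as before):
#   run_under_gdb db105 mdb105 quay.io/mariadb-foundation/mariadb-debug:10.5
```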

We can then copy the file out of the container and attach it to the bug report. Include the SQL and steps used to trigger the crash, along with the SELECT VERSION() information.

$ podman cp mdb105:/var/lib/mysql/gdb_output.txt /tmp/
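Below is a sketch of bundling these pieces together. It assumes the local image tag still points at the image the crash occurred on, which is not the case if you have re-pulled it since. collect_report and the output directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gather the backtrace and version details into one directory.
set -euo pipefail

PODMAN=${PODMAN:-podman}

# collect_report CONTAINER IMAGE OUTDIR   (hypothetical helper)
collect_report() {
    local container=$1 image=$2 outdir=$3
    mkdir -p "$outdir"
    # Same copy step as above.
    "$PODMAN" cp "${container}:/var/lib/mysql/gdb_output.txt" "$outdir/"
    # A server stopped in gdb cannot answer SELECT VERSION(), so ask the
    # mysqld binary in a throwaway container from the same image instead.
    "$PODMAN" run --rm "$image" mysqld --version > "$outdir/version.txt"
}

# Usage:
#   collect_report mdb105 quay.io/mariadb-foundation/mariadb-debug:10.5 /tmp/mdev-report
```

The Version: line printed at startup in the gdb session carries the same information and can be quoted in the report as well.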

Variant: Attach the debugger to a running instance

In the example above we ran the gdb debugger as the main process of the container. An alternative is to run the container normally and attach gdb when needed. This has the advantage of not needing a separate initialization step:

$ podman run --cap-add CAP_SYS_PTRACE -d --rm --name mdb105 -e MARIADB_DATABASE=test -e MARIADB_USER=bob -e MARIADB_PASSWORD=pluto -e MARIADB_RANDOM_ROOT_PASSWORD=1 quay.io/mariadb-foundation/mariadb-debug:10.5
e4aafa995837524264bd0db024ef808f1efb394528264feb0d059bb9dbe980c0

Then use podman exec to run the debugger in the container as user mysql, attaching it to process 1 (the main process, mariadbd):

$ podman exec -ti --user mysql mdb105 gdb -p 1
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
...
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
0x00007f2d3c1d7aff in __GI___poll (fds=fds@entry=0x7ffe82371350, nfds=nfds@entry=2, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29	../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) c

As the server is already running, we use c to continue execution. The steps for triggering the crash and recording the backtrace are the same as before.
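When you only need a snapshot of what every thread is doing, gdb can attach, print the stacks and detach without an interactive session. Below is a minimal sketch, assuming the container was started with CAP_SYS_PTRACE as above; snapshot_threads is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: attach gdb in batch mode to the running server (PID 1 in the
# container), print every thread's stack, then exit, which detaches.
set -euo pipefail

PODMAN=${PODMAN:-podman}

# snapshot_threads CONTAINER   (hypothetical helper)
snapshot_threads() {
    local container=$1
    "$PODMAN" exec --user mysql "$container" \
        gdb -p 1 -batch -ex "thread apply all bt"
}

# Usage: snapshot_threads mdb105 > /tmp/threads.txt
```

The server is paused while gdb collects the stacks. Taking several snapshots a little apart and comparing them is one rough way to see where a busy server is spending its time, though a proper profiler gives a more complete picture.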

Conclusion

This is only the beginning of what the debug containers can be used for. In bug reports, developers may ask for other information to be printed from the debugger. You can use a debug container with a non-container data directory by passing that directory in as a volume to the container. The debugger can also be used during initialization to find hardware/kernel incompatibilities, as was done in Docker Library issue #338.

We welcome feedback on what can be done to make this container image more useful, and on any other innovative uses you find for it.