Our most recent MariaDB Server release introduced some regressions, starting with the 10.6 series and affecting 10.7 through 10.9 as well. This blog post explains the problems in the hope of minimizing their impact. We will likely release a new version of MariaDB correcting these problems soon.
There was a bug in the InnoDB storage engine where a fulltext index could go out of sync with the actual table data. This would happen when only one new row was inserted between the last InnoDB sync (which happens asynchronously) and a server shutdown. The only way to fix the index is to rebuild it. Note that this problem was present in older releases too, but there it happened silently. The difference in 10.6.9 is that an assertion now catches the problem and stops the server to prevent further corruption. More details can be found in MDEV-29342.
A fix has already been pushed to GitHub. Tables with fulltext indexes were likely affected by this bug and need to be rebuilt with OPTIMIZE TABLE after upgrading to 10.6.10. As a potential workaround, rebuilding the fulltext index in 10.6.9 could prevent the server from crashing (at least until the data corruption happens again). Hopefully this workaround holds for you until an upgrade is possible.
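To make the rebuild step concrete, here is a minimal sketch of how the affected tables could be listed and the corresponding OPTIMIZE TABLE statements generated. It assumes that fulltext indexes show up in `information_schema.STATISTICS` with `INDEX_TYPE = 'FULLTEXT'`. The helper names `quote_identifier` and `build_optimize_statements` and the example schema and table names are hypothetical, chosen only for illustration. The script only prints statements; running the query and the statements against a server is left to the client of your choice.

```python
# Query that lists every table with at least one FULLTEXT index.
# Assumption: run it with any MariaDB client and feed the rows to the helper below.
FULLTEXT_TABLES_QUERY = """
SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.STATISTICS
WHERE INDEX_TYPE = 'FULLTEXT'
ORDER BY TABLE_SCHEMA, TABLE_NAME
"""


def quote_identifier(name):
    """Quote a schema or table name with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_optimize_statements(tables):
    """Return one OPTIMIZE TABLE statement per (schema, table) pair, skipping duplicates."""
    statements = []
    seen = set()
    for schema, table in tables:
        if (schema, table) in seen:
            continue
        seen.add((schema, table))
        statements.append(
            f"OPTIMIZE TABLE {quote_identifier(schema)}.{quote_identifier(table)};"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical rows, shaped like the result of FULLTEXT_TABLES_QUERY.
    example_rows = [("shop", "products"), ("blog", "posts")]
    for statement in build_optimize_statements(example_rows):
        print(statement)
```

One caveat: if `innodb_optimize_fulltext_only` is enabled on the server, OPTIMIZE TABLE only processes pending fulltext index changes instead of rebuilding the table, so it is worth checking that the setting is off before relying on the generated statements for a rebuild.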
Normally, InnoDB tables should be resilient to crashes, and that is largely the case. InnoDB also has a checksum mechanism to detect data pages that contain corrupted data. Since the very first version of InnoDB, the go-to approach has been to always crash the server if a checksum error is encountered. This is effectively the least risky behavior to implement: we have detected that something is wrong, we don’t know what caused it, so we stop everything and let the user decide how best to proceed.
Unfortunately, this approach has a drawback: it guarantees system downtime. If even a single row in a table is corrupted, no requests can be served. This is where MDEV-13542 comes into play. We made the choice to introduce this new functionality into 10.6 onward even though 10.6 is already GA. We normally introduce code into GA releases only as bug fixes. We thought of this code as a bug fix rather than a feature because of the numerous benefits of not always crashing the server when a corrupt page is detected.
In hindsight, one could argue that this would have been better reserved for a newer version. But since development started on 10.6, and numerous other bugs were uncovered during that development, 10.6 is the version where the code landed.
Nevertheless, this work introduced two crash recovery bugs, MDEV-29374 and MDEV-29383, which cause data corruption. There is no action users can take to avoid hitting these bugs, other than preventing MariaDB Server from crashing in the first place (a crash can happen, for example, when an Out of Memory killer script terminates the mariadbd process). For MDEV-29383, avoid performing backups with mariabackup after a crash recovery. Both of these bugs have already been fixed in the source tree.
For a complete list of changes you can view the release notes of MariaDB 10.6.9.
And as noted in the opening paragraph of this blog entry, there will soon be a new point release of MariaDB Server.