MariaDB upgrades to PCRE-8.34

Today we upgraded the PCRE library bundled with MariaDB-10.0 to PCRE-8.34. This PCRE release brings a number of improvements, stability and performance fixes, and better compatibility with Perl regular expressions.

I’d like to give details on the PCRE changes that especially affected MariaDB.

PCRE now supports [[:<:]] and [[:>:]], as used in the BSD POSIX library (written by Henry Spencer), to mean “start of word” and “end of word”, respectively. This is good news for projects (like MariaDB) that are migrating from Henry Spencer’s library to PCRE, as this non-standard syntax seems to be quite widely used. Many thanks to Philip Hazel and the PCRE team, who kindly added this extension to PCRE and gave us a patch before the final 8.34 release, so we were able to fix an incompatibility with MySQL RLIKE earlier (see “MDEV-5357 REGEXP word boundaries don’t work”, fixed in MariaDB-10.0.7).

PCRE-8.34 has also fixed a crash caused by a stack overrun in pcre_compile() when the pattern contains very deeply nested parentheses. PCRE now has a compile-time limit (250 by default) on the depth of nesting of parentheses. This works perfectly fine for programs using the OS default stack size settings: instead of crashing, pcre_compile() now safely returns an error. Unfortunately, this new limit did not help us, because MariaDB uses a smaller per-thread stack size, which is needed to handle tens of thousands of concurrent connections and is controlled by the @@thread_stack MariaDB system variable, with a default value of 288KB (versus the default POSIX thread stack size of 8MB). With the default @@thread_stack=288KB, MariaDB would still crash with a verbatim copy of PCRE-8.34 on a query like this:

SELECT 'a' RLIKE REPEAT('(', 1000);

The exact nesting depth that triggers the crash may vary between operating systems. It was about 210 in my tests on a Fedora box, which is a little smaller than the new PCRE default limit.

To prevent the crash, we had to keep our patch that adds a callback function to PCRE. See pcre/mariadb-patches/pcre_stack_guard.diff for details. pcre_compile() calls this callback function every time it meets a parenthesis in the regular expression pattern, before going into the next recursion level. If the remaining thread stack space gets dangerously small, mysqld signals this through the callback function result, pcre_compile() returns with an error, and the whole SQL query fails with the error message:

mysql> SELECT 'a' RLIKE REPEAT('(',1000);
ERROR 1436 (HY000): Thread stack overrun:
263672 bytes used of a 294912 byte stack, and 32000 bytes needed.
Use 'mysqld --thread_stack=#' to specify a bigger stack

If you give threads more stack by starting mysqld with, say, --thread-stack=589824, they don’t run out of stack so early, so the compiled-in PCRE limit is hit instead, which is indicated by a different error message:

mysql> SELECT 'a' RLIKE REPEAT('(',1000);
ERROR 1139 (42000): Got error 'parentheses are too deeply nested at offset 251' from regexp
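
The two failure modes above can be illustrated with a small C sketch. It is not the code from pcre_stack_guard.diff: every name in it (check_pattern, stack_guard, guard_ctx, FRAME_PAD and so on) is hypothetical, and the per-level stack cost is an assumed value. It only shows the general idea of a recursive parenthesis scanner that asks a guard callback before each new recursion level and also enforces a fixed nesting limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the code from
   pcre_stack_guard.diff, only an illustration of the same idea. */

#define NEST_LIMIT 250   /* mirrors PCRE's default parenthesis nesting limit */
#define FRAME_PAD  1024  /* assumed per-level stack cost, for illustration only */

enum scan_result { SCAN_OK, SCAN_TOO_DEEP, SCAN_STACK_LOW };

/* Guard callback: returns nonzero when recursion should stop. */
typedef int (*stack_guard_fn)(void *ctx);

struct guard_ctx {
    uintptr_t base;    /* address of a local in the outermost frame */
    size_t    budget;  /* stack bytes allowed, analogous to @@thread_stack */
    size_t    margin;  /* bytes that must stay free for the next level */
};

/* Estimate stack use as the distance between a local variable here and the
   base address; the absolute value makes the growth direction irrelevant. */
static int stack_guard(void *arg)
{
    const struct guard_ctx *ctx = arg;
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    size_t used = here > ctx->base ? here - ctx->base : ctx->base - here;
    return used + ctx->margin >= ctx->budget;
}

/* Recursive scan: one recursion level per '('. Unbalanced ')' at the top
   level is simply ignored, since this sketch only tracks nesting depth. */
static enum scan_result scan(const char **p, int depth,
                             stack_guard_fn guard, void *ctx)
{
    volatile char pad[FRAME_PAD];  /* stands in for real per-level locals */
    pad[0] = 0;
    pad[FRAME_PAD - 1] = 0;

    while (**p != '\0') {
        char c = *(*p)++;
        if (c == '(') {
            if (depth + 1 > NEST_LIMIT)
                return SCAN_TOO_DEEP;       /* built-in nesting limit */
            if (guard != NULL && guard(ctx))
                return SCAN_STACK_LOW;      /* guard asked us to stop */
            enum scan_result r = scan(p, depth + 1, guard, ctx);
            if (r != SCAN_OK)
                return r;
        } else if (c == ')' && depth > 0) {
            return SCAN_OK;                 /* closes the current level */
        }
    }
    return SCAN_OK;
}

/* Entry point: records the base address, then scans with the guard enabled. */
enum scan_result check_pattern(const char *pattern, size_t budget, size_t margin)
{
    volatile char base_marker = 0;
    struct guard_ctx ctx;
    const char *cursor;

    assert(pattern != NULL);
    ctx.base = (uintptr_t)&base_marker;
    ctx.budget = budget;
    ctx.margin = margin;
    cursor = pattern;
    return scan(&cursor, 0, stack_guard, &ctx);
}
```

Under these assumptions, calling check_pattern() on a pattern of 1000 opening parentheses with a small budget (for example 32KB with a 4KB margin) is expected to return SCAN_STACK_LOW, while a 2MB budget lets the scan reach the nesting limit and return SCAN_TOO_DEEP. The depth at which the guard fires depends on the compiler and platform, just as the crash depth does in the real server.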

We’ll be watching whether future versions of PCRE add some built-in means to control the stack size used. In the meantime, we’ll have to keep compiling the bundled patched version even if PCRE is installed in the system.

Please see http://www.pcre.org/changelog.txt for the full list of changes in PCRE-8.34. We’d like to thank the PCRE team for a good release.