InnoDB holepunch compression vs the filesystem in MariaDB 10.1

InnoDB holepunch experiments

After excellent blogs by Mark Callaghan (see links below), I decided to use some of my time to experiment with how different filesystems behave when the holepunch feature is used in MariaDB 10.1. First of all, MariaDB 10.1 does not use holepunch by default, even if a table is page compressed (the term used in MariaDB). The holepunch feature in MariaDB is enabled with the innodb-use-trim=1 configuration variable and naturally requires support for the fallocate system call with the FALLOC_FL_PUNCH_HOLE and FALLOC_FL_KEEP_SIZE flags. Support for these is checked during the cmake build phase.
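To make the system call concrete, here is a minimal sketch of punching a hole from Python via ctypes. It is not MariaDB's code; the helper name punch_hole is hypothetical, and it assumes 64-bit Linux (where off_t is 64 bits) with a libc that exports fallocate. The two flag values are the ones defined in <linux/falloc.h>.

```python
import ctypes
import errno
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) in fd, keeping the file size.

    Hypothetical helper. Returns True if the hole was punched, False if
    the platform or filesystem does not support the operation.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:
        return False  # libc without fallocate (not Linux)

    # int fallocate(int fd, int mode, off_t offset, off_t len);
    # assumption: off_t is 64 bits wide (64-bit Linux).
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int

    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        if err in (errno.EOPNOTSUPP, errno.ENOSYS):
            return False  # filesystem or kernel cannot punch holes
        raise OSError(err, os.strerror(err))
    return True
```

After a successful call the file size reported by stat is unchanged, and reads from the punched range return zeros.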

I used CentOS Linux release 7.1.1503 (Core) with the 3.10.0-229.el7.x86_64 Linux kernel and a few SSD drives in a RAID-0 setup (Intel X25-E Extreme SSDSA2SH032 G1GN 2.5-inch 32GB SATA II SLC Internal Solid State Drive (SSD)). On this system I used ext4, btrfs (v3.16.2), and xfs as filesystems. Note that NVMFS, for which this feature was designed, currently can’t be used as a filesystem on normal SSDs.

I used LinkBench with a 2.5x database size, resulting in 26G of uncompressed tables and 16G of holepunch tables using zip compression.
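With holepunch, a file's apparent size does not shrink; only the space actually allocated on disk does. The post does not say how the sizes above were measured, so the following is only a sketch of one way to compare the two numbers for a tablespace file. The helper name sparse_stats is hypothetical, and it assumes st_blocks is reported in 512-byte units, as it is on Linux.

```python
import os


def sparse_stats(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    Hypothetical helper. Assumes st_blocks counts 512-byte units,
    which holds on Linux.
    """
    st = os.stat(path)
    apparent = st.st_size          # length the file claims to have
    allocated = st.st_blocks * 512  # space actually backed by disk blocks
    return apparent, allocated
```

For a holepunched tablespace, allocated should come out noticeably smaller than apparent. This is the same distinction as between `ls -l` and `du` output.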

Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db
ext4 | 6.43 sec | 6.53 sec
btrfs | 1.74 sec | 13.82 sec
xfs | 6.66 sec | 1 min 1.73 sec

Clearly, dropping holepunch compressed tables on xfs takes about an order of magnitude longer (roughly 9x) than dropping uncompressed ones, meaning that the holepunch feature is not useful on xfs if the workload requires dropping big tables. On ext4 there is basically no difference, and on btrfs the difference is about 8x. However, the problem could be that the database is too small, so I also tested LinkBench with a 5x database size.
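The slowdown factors can be computed directly from the 2.5x table above. This short sketch uses the measured drop times in seconds; the names drop_times and slowdown are only illustrative.

```python
# Drop times in seconds from the 2.5x LinkBench table:
# (uncompressed, holepunch compressed).
drop_times = {
    "ext4": (6.43, 6.53),
    "btrfs": (1.74, 13.82),
    "xfs": (6.66, 61.73),  # 1 min 1.73 sec
}


def slowdown(uncompressed, holepunch):
    """How many times longer the holepunch drop took."""
    return holepunch / uncompressed


for fs, (plain, punched) in drop_times.items():
    print(f"{fs}: {slowdown(plain, punched):.1f}x")
# ext4: 1.0x
# btrfs: 7.9x
# xfs: 9.3x
```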

Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db
ext4 | 7.75 sec | 26.55 sec
btrfs | 11.69 sec | 33.18 sec

A similar test was done using a normal hard disk and 20x LinkBench, using only ext4. First, loading holepunch compressed tables took significantly longer than loading an uncompressed database. Likewise, dropping the database took significantly longer:

Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db
ext4 | 40.44 sec | 5 min 55.65 sec

Finally, a similar test was done using an ioMemory SX300-1600 with VSL driver 4.2.1 build 1137 and NVMFS 1.1.1, using a 20x LinkBench database. In this setting no significant difference was found.

Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db
nvmfs | 3.30 sec | 3.65 sec

Conclusions

Based on my experiments and those of others, it's clear that many of these filesystems have a large overhead when dealing with large sparse files. This conclusion leads to the need for an alternative design for holepunch, i.e. one where the feature could be used without the punch hole operation, resulting in a slightly denser file than would otherwise be possible, based on the true Page Compression architecture.

Links

https://bugs.mysql.com/bug.php?id=78277
http://smalldatum.blogspot.com/2015/08/first-day-with-innodb-transparent-page.html
http://smalldatum.blogspot.com/2015/09/second-day-with-innodb-transparent-page.html
http://smalldatum.blogspot.com/2015/09/third-day-with-innodb-transparent-page.html
http://smalldatum.blogspot.com/2015/10/wanted-file-system-on-which-innodb.html
https://mariadb.com/kb/en/mariadb/compression/