InnoDB holepunch compression vs the filesystem in MariaDB 10.1
InnoDB holepunch experiments
After the excellent blog posts by Mark Callaghan (see the links below), I decided to spend some time experimenting with how different filesystems behave when the holepunch feature is used in MariaDB 10.1. First of all, MariaDB 10.1 does not use holepunch by default, even if a table is page compressed (a term used in MariaDB). The holepunch feature in MariaDB is enabled with the innodb-use-trim=1 configuration variable and naturally requires support for the fallocate system call with the FALLOC_FL_PUNCH_HOLE and FALLOC_FL_KEEP_SIZE flags. Support for these is checked during the cmake build phase.
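To make the fallocate call concrete, here is a minimal Python sketch of punching holes in a file while keeping its size. It is not MariaDB's code: it assumes 64-bit Linux, calls fallocate from libc through ctypes, and the helper names (punch_hole, allocated_bytes, demo) are made up for illustration. It pretends every 16 KB page compressed to half its size and punches out the unused second half.

```python
import ctypes
import ctypes.util
import errno
import os
import tempfile

# Flag values from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumption: 64-bit Linux, where off_t is a 64-bit integer.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) without changing the file size.

    Returns False if the filesystem does not support hole punching.
    """
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))


def allocated_bytes(fd):
    # st_blocks is reported in 512-byte units
    return os.fstat(fd).st_blocks * 512


def demo(directory=None, page_size=16384, pages=64):
    """Write `pages` full pages, then punch out the second half of each."""
    half = page_size // 2
    with tempfile.TemporaryFile(dir=directory) as f:
        f.write(b"x" * page_size * pages)
        f.flush()
        os.fsync(f.fileno())
        before = allocated_bytes(f.fileno())
        # all() stops at the first page the filesystem refuses to punch
        supported = all(
            punch_hole(f.fileno(), i * page_size + half, half)
            for i in range(pages)
        )
        os.fsync(f.fileno())
        after = allocated_bytes(f.fileno())
        size = os.fstat(f.fileno()).st_size
    return supported, size, before, after


if __name__ == "__main__":
    supported, size, before, after = demo()
    print("hole punching supported:", supported)
    print("file size:", size, "allocated before:", before, "after:", after)
```

On a filesystem that supports hole punching, the file size stays the same while the allocated byte count typically drops. That gap is what makes the file sparse.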
I used CentOS Linux release 7.1.1503 (Core) with the 3.10.0-229.el7.x86_64 Linux kernel and a few SSD drives in a RAID-0 setup (Intel X25-E Extreme SSDSA2SH032 G1GN 2.5-inch 32GB SATA II SLC Internal Solid State Drive (SSD)). On this system I used ext4, btrfs (v3.16.2), and xfs as file systems. Note that NVMFS, for which this feature was designed, cannot currently be used as a file system on normal SSDs.
I used LinkBench with a 2.5x database size, resulting in 26G of uncompressed tables and 16G of holepunch tables using zip compression.
Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db |
---|---|---|
ext4 | 6.43sec | 6.53sec |
btrfs | 1.74sec | 13.82sec |
xfs | 6.66sec | 1 min 1.73sec |
Clearly, dropping holepunch compressed tables on xfs takes roughly nine times longer than dropping uncompressed ones, which means that the holepunch feature is not useful on xfs if the workload requires dropping big tables. On ext4 there is basically no difference, and on btrfs the difference is about 8x. However, the database might simply be too small to be representative, so I also tested LinkBench with a 5x database size.
Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db |
---|---|---|
ext4 | 7.75sec | 26.55sec |
btrfs | 11.69sec | 33.18sec |
A similar test was done using a normal HD and a 20x LinkBench database, using only ext4. First, loading holepunch compressed tables took significantly longer than loading an uncompressed database. Dropping the database also took significantly longer:
Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db |
---|---|---|
ext4 | 40.44sec | 5 minutes 55.65sec |
Finally, a similar test was done using an ioMemory SX300-1600 with VSL driver 4.2.1 build 1137 and NVMFS 1.1.1, using a 20x LinkBench database. In this setup no significant difference was found.
Filesystem | Time to drop uncompressed db | Time to drop holepunch compressed db |
---|---|---|
nvmfs | 3.30sec | 3.65sec |
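To compare the tables above at a glance, the following small sketch computes the slowdown factor (holepunch drop time divided by uncompressed drop time) for each measurement. The numbers are copied from the tables. The labels and function names are mine, and the 5x run is assumed to have used the same SSD setup as the 2.5x run.

```python
# (uncompressed, holepunch) drop times in seconds, copied from the tables above
DROP_TIMES_SEC = {
    ("ext4", "SSD, 2.5x"): (6.43, 6.53),
    ("btrfs", "SSD, 2.5x"): (1.74, 13.82),
    ("xfs", "SSD, 2.5x"): (6.66, 61.73),      # 1 min 1.73 sec
    ("ext4", "SSD, 5x"): (7.75, 26.55),
    ("btrfs", "SSD, 5x"): (11.69, 33.18),
    ("ext4", "HD, 20x"): (40.44, 355.65),     # 5 min 55.65 sec
    ("nvmfs", "ioMemory, 20x"): (3.30, 3.65),
}


def slowdown(uncompressed_sec, holepunch_sec):
    """How many times longer the holepunch drop took."""
    return holepunch_sec / uncompressed_sec


def slowdown_table(times):
    return {key: slowdown(*pair) for key, pair in times.items()}


if __name__ == "__main__":
    for (fs, setup), factor in slowdown_table(DROP_TIMES_SEC).items():
        print(f"{fs:6s} {setup:14s} {factor:5.2f}x")
```

Only ext4 at the 2.5x size and nvmfs stay close to 1x. Every other combination is roughly 3x to 9x slower.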
Conclusions
Based on my experiments and those of others, it is clear that many of these filesystems have a large overhead when dealing with large sparse files. This leads to the need for an alternative design for holepunch, i.e. one where the feature could be used without the punch hole operation, resulting in a slightly denser file than would otherwise be possible with the true Page Compression architecture.
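One rough way to probe a filesystem for this overhead outside MariaDB is to time the deletion of a dense file against a sparse file of the same apparent size. The sketch below is not how the numbers in this post were measured. It assumes that the cost of dropping a table is dominated by unlinking its data file. It creates the holes by seeking over the second half of every 16 KB page rather than by punching them, and the file names and function names are hypothetical. With the small default size the timings are mostly noise, so the page count has to be raised a lot to see anything meaningful.

```python
import os
import tempfile
import time

PAGE_SIZE = 16 * 1024  # InnoDB's default page size


def write_file(path, pages, sparse):
    """Write `pages` pages of data.

    If `sparse` is true, only the first half of each page is written and
    the second half is skipped with seek(), leaving a hole in every page.
    """
    half = PAGE_SIZE // 2
    with open(path, "wb") as f:
        for _ in range(pages):
            if sparse:
                f.write(b"x" * half)
                f.seek(half, os.SEEK_CUR)
            else:
                f.write(b"x" * PAGE_SIZE)
        # Give both files the same apparent size
        f.truncate(pages * PAGE_SIZE)
        f.flush()
        os.fsync(f.fileno())


def time_unlink(path):
    """Return the seconds taken to delete `path`."""
    start = time.perf_counter()
    os.unlink(path)
    return time.perf_counter() - start


def compare(directory=None, pages=1024):
    """Return (dense_seconds, sparse_seconds) for deleting the two files."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        dense = os.path.join(d, "dense.ibd")
        sparse = os.path.join(d, "sparse.ibd")
        write_file(dense, pages, sparse=False)
        write_file(sparse, pages, sparse=True)
        return time_unlink(dense), time_unlink(sparse)


if __name__ == "__main__":
    dense_sec, sparse_sec = compare()
    print(f"dense: {dense_sec:.6f}s  sparse: {sparse_sec:.6f}s")
```

Pass `directory` to point the test at a mount of the filesystem under study, for example a scratch directory on xfs or btrfs.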
Links
https://bugs.mysql.com/bug.php?id=78277
http://smalldatum.blogspot.com/2015/08/first-day-with-innodb-transparent-page.html
http://smalldatum.blogspot.com/2015/09/second-day-with-innodb-transparent-page.html
http://smalldatum.blogspot.com/2015/09/third-day-with-innodb-transparent-page.html
http://smalldatum.blogspot.com/2015/10/wanted-file-system-on-which-innodb.html
https://mariadb.com/kb/en/mariadb/compression/