10.7 preview feature: InnoDB Bulk Insert

Everyone wants performance. Speeding up the insert into empty InnoDB tables and partitions is a common enough use case to optimize. Thirunarayanan Balathandayuthapani from the MariaDB Corporation wrote this improvement to the bulk inserting into InnoDB tables that is available as a preview for testing.

To achieve this, if the table is empty then InnoDB will pre-sort the records for each index, and build the indexes one page at a time. If the transaction does bulk insert operation then InnoDB should create buffer of size innodb_sort_buffer_size for each index.

If the buffer ran out of memory then InnoDB will create a temporary file for storing the data. Once transaction statement gets committed then InnoDB should sort the buffer/file blocks and do bulk insert operation for each index.

In case of externally stored field, InnoDB bulk insert operation will store the data in temporary created blob file. Criteria to store the field data in blob file is more than 2000 bytes.

A current limitation is that the bulk insert is from a single large INSERT … VALUES SQL, or a LOAD DATA SQL.
For improving load performance disabling foreign_key_checks=0 and unique_checks=0 (in 10.6+ due to MDEV-515) are existing mechanisms.

Example Performance

In the example below, HammerDB has been used to generate a scale 1, TPROCH dataset.

As the tbl files are one per virtual user when I generated these, they where concatenated so they could be used with LOAD DATA INFILE.

The table schema for these where created with mariadb-dump –no-data tproch from another instance.

These tests where on a basic 8 cpu laptop with 16G RAM, with CPU governor set to performance. To eliminate the performance of the encrypted storage I’ve included innodb_flush_log_at_trx_commit=0 as a mitigation.

Here I’ve compared using this preview, against the 10.7 baseline development version.

$ podman run --rm --detach \
 --env MARIADB_RANDOM_ROOT_PASSWORD=1 \
 --env MARIADB_DATABASE=tproch \
 --env MARIADB_USER=tproch \
 --env MARIADB_PASSWORD=tproch \
 --name mdb107_base \
 -P \
 -v ./:/docker-entrypoint-initdb.d:z \
 quay.io/mariadb-foundation/mariadb-devel:10.7 \
--innodb-buffer-pool-size=6G \
--innodb-log-file-size=4G \
--innodb_flush_log_at_trx_commit=0

The docker-entrypoint-initdb.d contains the tables.sql to create the table on startup.

$ podman port mdb107_base 3306
0.0.0.0:35545

Using the port number above, the following was used to LOAD DATA of the tables in parallel:

$  for t in customer lineitem nation orders part partsupp region supplier; do 
  tbl=${t^^*};
  echo table=$tbl
  time mariadb -h 127.0.0.1 --port 35545  -u tproch -ptproch \
    --local-infile=1 \
    -e "set @@foreign_key_checks=0;
    load data local infile '$t.tbl' into table $tbl FIELDS terminated by '|';" tproch
 done

The following timing results where obtained for each table:

Tag of container: quay.io/mariadb-foundation/mariadb-devel	customer	lineitem	nation	orders	part	partsupp	region	supplier	total time
10.7	0m1.076s	1m55.075s	0m0.006s	0m16.041s	0m1.271s	0m6.647s	0m0.010s	0m0.238s	140.364s
10.7-mdev-24621-innodb-bulk-insert	0m4.175s	1m47.870s	0m0.006s	0m13.670s	0m1.191s	0m7.718s	0m0.006s	0m0.261s	134.897s

Real time measurements for loading each table

How to try the 10.7 InnoDB Bulk Insert data type preview yourself?

There are two ways to preview this feature:

Tarball

Go to tarball downloads

Container

You can run the container called quay.io/mariadb-foundation/mariadb-devel:10.7-mdev-24621-innodb-bulk-insert as I have performed above.

Feedback Welcome

If you come across any problems in this feature preview, with the design, or especially cases where there is a performance drop, please let us know with a JIRA bug/feature request on the MDEV project. You are welcome to chat about it on Zulip.