10.7 preview feature: Natural sort

Natural sort order is the ordering of strings in alphabetical order, while numbers are treated as numbers. This understanding of sorting is closer to human comprehension than to a machine. You can find an example of this feature in the Windows file manager. There the files are sorted in natural order. Try to create four folders “b1”, “a11”, “a2”, “a1”.

There are several programming languages which have natural sort. In PHP it is built-in function natsort, while as an third-party module in Python it is natsort, in Perl it is Sort::Naturally and in Matlab it is sort_nat. It is possible to get the same behavior using some workarounds such as by adding leading zeros.

In MariaDB 10.7.0, based on MDEV-4742, natural sort is available via the natural_sort_key() function.

Let’s take a look at a couple of examples. First we’ll look at the Windows file manager case from before. Here is our test data:

create table t1 (c text);
insert into t1 values ('b1'),('a2'),('a11'),('a1');
select c from t1;
+------+
| c    |
+------+
| b1   |
| a2   |
| a11  |
| a1   |
+------+

And now let’s examine the results with normal (unnatural) ordering with the order by clause

select c from t1 order by c;
+------+
| c    |
+------+
| a1   |
| a11  |
| a2   |
| b1   |
+------+

By wrapping the column with natural_sort_key() the results will look like so:

select c from t1 order by natural_sort_key(c);
+------+
| c    |
+------+
| a1   |
| a2   |
| a11  |
| b1   |
+------+

Let’s take a look at another example of sorting. This time we’ll be sorting MariaDB versions:

# Test data
create table t2 (c text);
insert into t2 values ('5.5.31'),('10.7.0'),('10.2.1'),('10.1.22'),('10.3.32'),('10.2.12');
select c from t2;
+---------+
| c       |
+---------+
| 5.5.31  |
| 10.7.0  |
| 10.2.1  |
| 10.1.22 |
| 10.3.32 |
| 10.2.12 |
+---------+

# Unnatural sort

select c from t2 order by c;
+---------+
| c       |
+---------+
| 10.1.22 |
| 10.2.1  |
| 10.2.12 |
| 10.3.32 |
| 10.7.0  |
| 5.5.31  |
+---------+

# Natural sort

select c from t2 order by natural_sort_key(c);
+---------+
| c       |
+---------+
| 5.5.31  |
| 10.1.22 |
| 10.2.1  |
| 10.2.12 |
| 10.3.32 |
| 10.7.0  |
+---------+

And finally, let’s sort some IPs:

create table t3 (c text);
insert into t3 values ('192.167.3.1'),('192.167.1.12'),('100.200.300.400'),('100.50.60.70'),('100.8.9.9'),('127.0.0.1'),('0.0.0.0');
select * from t3;
+-----------------+
| c               |
+-----------------+
| 192.167.3.1     |
| 192.167.1.12    |
| 100.200.300.400 |
| 100.50.60.70    |
| 100.8.9.9       |
| 127.0.0.1       |
| 0.0.0.0         |
+-----------------+

# Unnatural sort

select * from t3 order by c;
+-----------------+
| c               |
+-----------------+
| 0.0.0.0         |
| 100.200.300.400 |
| 100.50.60.70    |
| 100.8.9.9       |
| 127.0.0.1       |
| 192.167.1.12    |
| 192.167.3.1     |
+-----------------+

# Natural sort

select * from t3 order by natural_sort_key(c);
+-----------------+
| c               |
+-----------------+
| 0.0.0.0         |
| 100.8.9.9       |
| 100.50.60.70    |
| 100.200.300.400 |
| 127.0.0.1       |
| 192.167.1.12    |
| 192.167.3.1     |
+-----------------+

Another thing to note is that you can use natural_sort_key() with generated (virtual) columns. There is a limitation: the value can not be stored permanently in the table.

create table t(c varchar(3), k varchar(3) as (natural_sort_key(c)) invisible);
show create table t;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                            |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `c` varchar(3) DEFAULT NULL,
  `k` varchar(3) GENERATED ALWAYS AS (natural_sort_key(`c`)) VIRTUAL INVISIBLE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

# Insert data

insert into t(c) values ('b1'),('a2'),('a11'),('a1');

# No sorting

select * from t;
+------+
| c    |
+------+
| b1   |
| a2   |
| a11  |
| a1   |
+------+

# Unnatural sort

select * from t order by c;
+------+
| c    |
+------+
| a1   |
| a11  |
| a2   |
| b1   |
+------+

# Natural sort

select * from t order by k;
+------+
| c    |
+------+
| a1   |
| a2   |
| a11  |
| b1   |
+------+

If you try to force storing in generated columns via STORED, you will get to an error.

create table t2(c varchar(1), k varchar(2)  as (natural_sort_key(c)) stored);
ERROR 1901 (HY000): Function or expression 'natural_sort_key()' cannot be used in the GENERATED ALWAYS AS clause of `k`

This is a purely artificial limitation to prevent natural_sort_key() values from being persistently stored while we might expect changes in the function implementation based on your feedback. As soon as the sorting order stabilizes, the function will be enabled for use in stored and indexed generated columns.

How to try the 10.7 natural sort preview yourself?

Tarball

Download the binary: mariadb-10.7.0-mdev-4742-natural-sort-linux-x86_64.tar.gz.

Container

Use the container quay.io/mariadb-foundation/mariadb-devel:10.7-mdev-4742-natural-sort.

# Start the server container
docker run --rm -d -e MARIADB_USER=anel \
  -e MARIADB_PASSWORD=anel -e MARIADB_DATABASE=natsort \
  -e MARIADB_RANDOM_ROOT_PASSWORD=1 \
  --name mariadb-natsort \
   quay.io/mariadb-foundation/mariadb-devel:10.7-mdev-4742-natural-sort
10.7-mdev-4742-natural-sort: Pulling from mariadb-foundation/mariadb-devel
5297cdb712ed: Pull complete 
4a3bb3ae52c1: Pull complete 
110d0c5ac1d6: Pull complete 
Digest: sha256:8e2cbb009eeee5c5e76bc107e91cd4423a323a80e65d42d7aa745bb2cb697483
Status: Downloaded newer image for quay.io/mariadb-foundation/mariadb-devel:10.7-mdev-4742-natural-sort
21ee676ae4e33f7db43a90a902991d25ff825aefe3119f98903266bfd96f32e9

# Start the client container
docker exec -it mariadb-natsort mariadb -uanel -panel natsort
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 5
Server version: 10.7.0-MariaDB-MDEV-4742-natural-sort mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [natsort]> create table t1 (c text);
Query OK, 0 rows affected (0.047 sec)

MariaDB [natsort]> insert into t1 values ('b1'),('a2'),('a11'),('a1');
Query OK, 4 rows affected (0.001 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB [natsort]> select c from t1;
+------+
| c    |
+------+
| b1   |
| a2   |
| a11  |
| a1   |
+------+
MariaDB [natsort]> select * from t1 order by natural_sort_key(c);
+------+
| c    |
+------+
| a1   |
| a2   |
| a11  |
| b1   |
+------+
4 rows in set (0.000 sec)

Binder online MariaDB Jupyter Notebook

Or try the feature online in Binder.

Feedback Welcome

If you come across any problems in this feature preview, with the design, or edge cases that don’t work as expected, please let us know with a JIRA bug/feature request in the MDEV project. You are welcome to chat about it on Zulip.