Big Vector Search Benchmark: a 10-database comparison

I have benchmarked MariaDB Vector before, but it was a while ago. Users kept asking about Milvus, new pgvector alternatives were gaining popularity, and I simply wanted to see whether MariaDB had gotten any better. This benchmark round includes more databases, a larger dataset, and none of the legacy datasets that only add noise and have little relevance in 2026.

Dataset

We are in the AI era now. Vector search is mostly used for embeddings generated by LLMs, yet most ann-benchmarks datasets are pre-AI and use, for example, image transformations and filters to construct vectors. While useful for certain purposes, they are not the main use case for MariaDB Vector, and reporting those results would be misleading and distracting from what matters to users. This is why this benchmark uses only one dataset — dbpedia-openai-1000k, one million texts from DBpedia converted to 1536-dimensional embeddings by OpenAI. It is a big dataset, more than double the size of anything I had used before, and I was anxious to see how MariaDB would cope.
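One practical note on this dataset: OpenAI embeddings come unit-normalized, so cosine (angular) and Euclidean nearest-neighbor orderings coincide, which is why databases indexing this dataset with either metric can be compared on the same recall scale. A quick pure-Python sketch of the underlying identity (the vectors are made up for illustration):

```python
import math

def normalize(v):
    # scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_sim(a, b):
    # for unit vectors the dot product IS the cosine similarity
    return sum(x * y for x, y in zip(a, b))

def euclidean_sq(a, b):
    # squared Euclidean distance
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = normalize([0.3, -1.2, 0.5])
b = normalize([1.0, 0.4, -0.2])

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b),
# so ranking by Euclidean distance equals ranking by cosine similarity.
assert abs(euclidean_sq(a, b) - (2 - 2 * cosine_sim(a, b))) < 1e-12
```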

Databases

For this benchmark I have selected:

  • MariaDB 12.3
  • MariaDB 11.8
  • pgvector master branch as it was on 2026-02-08
  • pgvecto.rs as of tag pg16-v0.3.0-alpha.1
  • pgvectorscale master branch as it was on 2026-02-08
  • RediSearch version 6.2.13
  • Weaviate version 1.19.0-beta.1, weaviate-client 3.16.0
  • OpenSearch version 2.6.0
  • Qdrant version 1.5.1
  • Milvus version 2.4.1, index_type="HNSW"

Except for MariaDB, these are the versions that ann-benchmarks uses; some are very new, others, unfortunately, are not.

Setup

This benchmark was run with my fork of ann-benchmarks, which adds MariaDB support. It turned out that the dataset was too big for ann-benchmarks in its default settings: I had to increase numerous timeouts, as well as the JVM heap for OpenSearch (from -Xmx3G to -Xmx32G). Additionally, I had to patch the Milvus support in ann-benchmarks to prevent cheating; this bug was reported to ann-benchmarks.

Hardware-wise, it was run on an Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz. This CPU only supports AVX2, not AVX-512, so on a newer CPU most, if not all, of the participating databases would have been faster.
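For context on the numbers below: QPS is simply the number of queries completed per wall-clock second on the client side. A minimal, single-threaded sketch of such a measurement loop (the `search` callable and the query set here are hypothetical stand-ins for a real database client; ann-benchmarks itself adds warm-up, parameter sweeps, and repeated runs):

```python
import time

def measure_qps(search, queries):
    # run every query once, single-threaded, and report queries per second
    start = time.perf_counter()
    for q in queries:
        search(q)  # e.g. a k-NN call against the database under test
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed

# toy stand-in: a "search" that linearly scans a small list
data = list(range(1000))
qps = measure_qps(lambda q: min(data, key=lambda x: abs(x - q)), range(100))
print(f"{qps:.0f} QPS")
```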

Results

Without further ado, here are the results, zoomed to the practically important recall range of 90%-98%:
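The recall axis is the standard recall@k: the fraction of the true k nearest neighbors that the approximate index actually returned, averaged over all test queries. A minimal sketch for a single query (the IDs are illustrative):

```python
def recall_at_k(approx_ids, true_ids):
    # fraction of the exact k nearest neighbors found by the ANN index
    k = len(true_ids)
    return len(set(approx_ids) & set(true_ids)) / k

# ground truth for one query vs. what an ANN index returned
true_ids = [3, 17, 42, 56, 90, 7, 11, 23, 61, 78]    # exact 10-NN
approx_ids = [3, 17, 42, 56, 90, 7, 11, 23, 61, 99]  # one miss
print(recall_at_k(approx_ids, true_ids))  # → 0.9
```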

Observations

In the Recall-QPS chart there are three clearly separated groups of results: MariaDB and RediSearch delivering 850-1000 QPS at 94% recall; Weaviate/pgvectorscale/pgvecto.rs delivering 470-570 QPS; and OpenSearch/Qdrant/Milvus with 90-150 QPS at 94% recall. pgvector, at 250 QPS, lies between the second and third groups.

MariaDB 12.3 is clearly faster than 11.8 and has the highest QPS for recall > 95%, while RediSearch is slightly faster at lower recalls.

The Recall-Build chart also shows three clearly distinct groups: both MariaDB versions and Qdrant build the index in under 15 minutes, closely followed by pgvectorscale; Milvus and Weaviate take a bit over an hour; and RediSearch, pgvector, pgvecto.rs, and OpenSearch needed 2.5-3 hours for the same dataset and the same recall level. All databases deliver very high recall at these build times, and explicitly targeting a lower recall does not result in any notable speedup.

Conclusions

Some databases are faster at creating the index, others at searching it. RediSearch has the fastest search but the slowest index build; Qdrant has the slowest search but the fastest index build; Weaviate is average at both, a reasonable trade-off. That is understandable: nobody can excel at everything. Unless it's MariaDB, which breaks this pattern by actually excelling at everything, delivering both the fastest index search and the fastest index build times!

For PostgreSQL users, pgvectorscale is now the preferred option, at least according to this benchmark. Its index build is much faster than pgvector's, and its searches are reasonably fast. And while pgvecto.rs searches about as fast as pgvectorscale, it needs roughly 7x longer to build the index (on the dbpedia-openai-1000k dataset).