Tag Archives: vector search
This is the last post in the “MariaDB Vector: How it works” series. The first three covered storage, in-memory representation, and HNSW modifications, all of which were done in MariaDB 11.8. This post talks about a new feature in MariaDB 12.3: optimized distance calculation.
As I mentioned earlier, distance calculation is the most time-consuming part of vector search, taking 80–90% of the total search time. It is also linear in the number of dimensions: computing the distance between vectors of 1536 dimensions takes twice as long as computing it between vectors of 768 dimensions.
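To make the linear cost concrete, here is a minimal sketch of a straightforward distance loop. It assumes squared Euclidean distance purely for illustration (the excerpt does not say which metric is meant), and the function name `squared_euclidean` is hypothetical, not part of MariaDB.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors.

    Illustrative only: one subtraction, one multiplication and one
    addition per dimension, so the work grows linearly with len(a).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    total = 0.0
    for x, y in zip(a, b):
        diff = x - y
        total += diff * diff
    return total


# A 1536-dimensional pair runs the loop body twice as many times
# as a 768-dimensional pair.
d768 = squared_euclidean([1.0] * 768, [0.0] * 768)
d1536 = squared_euclidean([1.0] * 1536, [0.0] * 1536)
```

The loop body runs once per dimension, which is why doubling the vector length doubles the time in a naive implementation like this one.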
…
In the previous parts of this series we’ve seen how MariaDB stores vector indexes in a table and how it implements HNSW for good performance. But MariaDB does not implement plain HNSW; it calls its vector search algorithm mHNSW, a modified HNSW. Let’s see how exactly it was modified.
Not so greedy!
HNSW, like many, if not most, graph-based vector search algorithms, is greedy. Think of it this way: when it needs to find just one nearest vector (ef=1), it will walk the graph, always choosing the node that takes it closest to the target at this particular step.
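The greedy walk for ef=1 can be sketched as follows. This is a simplified single-layer illustration of the textbook idea, not MariaDB’s mHNSW code; the names `greedy_search`, `graph` and `vectors`, and the use of squared Euclidean distance, are assumptions made for the example.

```python
def squared_distance(a, b):
    # Assumed metric for the sketch: squared Euclidean distance.
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def greedy_search(graph, vectors, entry, target):
    """Greedy nearest-neighbor walk on one graph layer (ef=1).

    graph:   dict mapping node id -> list of neighbor node ids
    vectors: dict mapping node id -> vector (list of floats)
    Returns the node where no neighbor is closer to the target.
    """
    current = entry
    current_dist = squared_distance(vectors[current], target)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = squared_distance(vectors[neighbor], target)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            # No neighbor improves on the current node: stop here,
            # even if a closer node exists elsewhere in the graph.
            return current, current_dist
        current, current_dist = best, best_dist


# Four points on a line, each linked to its immediate neighbors.
vectors = {0: [0.0], 1: [1.0], 2: [2.0], 3: [3.0]}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
node, dist = greedy_search(graph, vectors, entry=0, target=[2.9])
```

Because the walk only ever moves to a strictly closer neighbor, it stops at the first node with no better neighbor, which is exactly what makes it greedy.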
…
In the first post of this series, I described how the vector index is stored in a table and how it achieves full transactional behavior and ACID properties compatible with the storage engine of the table the user created. But while the table provides persistent storage of the index, it is the in-memory part that gives it its performance. This is how it works.
Distance calculations
This is the most performance-sensitive part of HNSW. According to various estimates, distance calculations account for 80–90% of search time. And the time of this operation grows linearly with the vector length.
…
You might have seen that MariaDB Vector is fast. And is getting faster. But why? How does it achieve that? And why is it said to use the mHNSW (modified HNSW) algorithm? What did it modify in the conventional HNSW that all other databases are using? Let’s take it apart and analyze it piece by piece.
Introduction to HNSW
This post is not a full description of HNSW. There are many HNSW descriptions online and they are good, better than what I could’ve written. I will only show the basic concepts behind HNSW, concepts that are crucial for the rest of the post.
…
I have benchmarked MariaDB Vector before, but it was a while ago. Users kept asking about Milvus. New pgvector alternatives were gaining popularity. And I simply wanted to see if MariaDB got any better. This benchmark round includes more databases, a larger dataset, and no irrelevant datasets that only add noise but don’t really help today in 2026.
Dataset
This is the age of AI. Vector search is used for embeddings generated by LLMs. Most ann-benchmarks datasets are pre-AI and use, for example, image transformations and filters to construct vectors. While useful for certain purposes, they are not the main use case for MariaDB Vector, and providing these results would be misleading and would distract from what matters to users.
…
Continue reading “Big Vector Search Benchmark: 10 databases comparison”
Mirror, mirror on the wall — what do you measure when you measure us all?
Is it skill, or is it voice?
Is it code, or is it conversation?
DB-Engines is not a looking glass of perfection, but a mirror of perception — reflecting the chorus of those who search, speak, teach, compare, and build. And like every enchanted mirror, it shows not only what is, but what the world believes it sees.
MariaDB today stands firmly among the world’s top relational databases.
Not by inheritance, and not by illusion, but by the millions who use it, trust it, and shape it.
…
Continue reading “Mirror, Mirror on DB-Engines: The MariaDB Story”