MariaDB Vector — native vector search in MariaDB

MariaDB Vector is native vector similarity search built into MariaDB Server. It is generally available in MariaDB 11.8 LTS (2025), needs no extension and no separate vector database, and is available on Amazon RDS for MariaDB. You store embeddings in a VECTOR column next to your relational data and query both — vector similarity and ordinary SQL filters — in a single transactional statement.
- Native VECTOR(N) data type and VECTOR INDEX index type (modified HNSW); up to 16,383 dimensions; euclidean and cosine distance.
- Full ACID transactions, concurrent reads and writes.
- Benchmarked ahead of pgvector, pgvectorscale, Qdrant, Milvus and Weaviate on both search speed and index build time.
Quick start
Create a table with a vector column and an index, insert embeddings, and search. Build the index for the distance function you will search with — here, cosine.
-- 1. A vector column and index (built into the server; no extension needed)
CREATE TABLE documents (
  id BIGINT UNSIGNED PRIMARY KEY,
  owner_id BIGINT,
  content TEXT,
  embedding VECTOR(1536) NOT NULL, -- match your model's output dimensions
  VECTOR INDEX (embedding) M=8 DISTANCE=cosine
) ENGINE=InnoDB;

-- 2. Insert an embedding (VEC_FromText takes a JSON array of floats)
INSERT INTO documents (id, owner_id, content, embedding)
VALUES (1, 42, 'a document', VEC_FromText('[...]')); -- 1536 floats from your embedding model

-- 3. Find the 10 nearest matches, optionally filtered by ordinary SQL
SELECT id, content
FROM documents
WHERE owner_id = 42
ORDER BY VEC_DISTANCE(embedding, VEC_FromText('[...]')) -- the query's 1536-float embedding
LIMIT 10;
The optimizer uses the vector index when the query orders by a bare VEC_DISTANCE call and has a LIMIT. VEC_DISTANCE_COSINE or VEC_DISTANCE_EUCLIDEAN works the same way, provided the function matches the distance the index was built with. The WHERE clause shows the key advantage of vector search inside a relational database: similarity search and SQL filtering in a single query, with no second system to maintain.
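The index-use rule above can be sketched on a toy table. This is a minimal illustration, not part of the official quick start: the table name demo_items and its 3-dimensional vectors are hypothetical, chosen only so the values fit on one line.

```sql
-- Hypothetical demo table with a cosine vector index.
CREATE TABLE demo_items (
  id INT PRIMARY KEY,
  embedding VECTOR(3) NOT NULL,
  VECTOR INDEX (embedding) DISTANCE=cosine
) ENGINE=InnoDB;

INSERT INTO demo_items VALUES
  (1, VEC_FromText('[1,0,0]')),
  (2, VEC_FromText('[0,1,0]')),
  (3, VEC_FromText('[0.9,0.1,0]'));

-- Index-eligible: bare VEC_DISTANCE plus LIMIT; the metric follows the index.
SELECT id FROM demo_items
ORDER BY VEC_DISTANCE(embedding, VEC_FromText('[1,0,0]'))
LIMIT 2;

-- Index-eligible: explicit function that matches the index's cosine metric.
SELECT id FROM demo_items
ORDER BY VEC_DISTANCE_COSINE(embedding, VEC_FromText('[1,0,0]'))
LIMIT 2;

-- Not index-eligible: euclidean query against a cosine index.
-- The statement still runs, but as a scan rather than an index lookup.
SELECT id FROM demo_items
ORDER BY VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText('[1,0,0]'))
LIMIT 2;
```

Using plain VEC_DISTANCE in application code avoids the mismatch case, because the function takes its metric from the index.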
Prefer to watch or run code? Everything you Need to Know to Start Building Apps with AI and RAG is a full walkthrough of the concepts and code; the Java RAG demo (MariaDB + OpenAI, no frameworks) and the official MariaDB Knowledge Base RAG demo (Python) are runnable starting points.
What you get in MariaDB 11.8 LTS
- A native VECTOR(N) data type, storing 32-bit floats; up to 16,383 dimensions.
- A specialized VECTOR INDEX using a modified HNSW algorithm. Options: DISTANCE (euclidean, the default, or cosine) and M (3–100; higher is more accurate but slower and larger). One vector index per table; the indexed column must be NOT NULL.
- Distance functions VEC_DISTANCE_EUCLIDEAN and VEC_DISTANCE_COSINE, plus VEC_DISTANCE, which automatically uses the distance the index was built with.
- Conversion functions VEC_FromText (JSON array to vector) and VEC_ToText (vector to JSON array).
- Full transactional support and all isolation levels; concurrent reads and writes.
- SIMD hardware acceleration on Intel (AVX2, AVX512), ARM Neon, and IBM Power10 VSX.
- Built into the server — no extension — and available on Amazon RDS for MariaDB.
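Several items in the list above can be seen together in one short sketch: the index options, the NOT NULL requirement, and the text conversion functions. The table name feature_demo, the 4-dimensional column, and the choice of M=16 are illustrative assumptions, not recommendations.

```sql
-- Hypothetical table showing the VECTOR INDEX options explicitly.
CREATE TABLE feature_demo (
  id INT PRIMARY KEY,
  embedding VECTOR(4) NOT NULL,                      -- indexed vector column must be NOT NULL
  VECTOR INDEX (embedding) M=16 DISTANCE=euclidean   -- M may range from 3 to 100
) ENGINE=InnoDB;

-- VEC_FromText: JSON array text to vector.
INSERT INTO feature_demo VALUES (1, VEC_FromText('[0.5,-1,2,0.25]'));

-- VEC_ToText: stored vector back to JSON array text, for inspection or export.
SELECT id, VEC_ToText(embedding) AS embedding_json
FROM feature_demo;
```

The sample values were picked because they are exactly representable as 32-bit floats; arbitrary decimals may come back from VEC_ToText with small rounding differences.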
Full SQL reference: MariaDB Vector documentation.
How fast is it?
In a 10-database benchmark on a realistic one-million-vector dataset (dbpedia-openai-1000k, 1536-dimensional OpenAI embeddings), MariaDB led the field on both search throughput and index build time:
- 850–1,000 queries per second at 94% recall, ahead of pgvectorscale, pgvector, and the dedicated vector databases Qdrant, Milvus and Weaviate.
- Index built in under 15 minutes, where pgvector and several others needed 2.5–3 hours.
- Most engines trade search speed against build time. MariaDB led on both.
Full results and methodology: https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/
MariaDB Vector vs pgvector, pgvectorscale, and dedicated vector databases
MariaDB Vector stores embeddings inside the same relational database as the rest of your data, so it offers things a bolt-on vector store cannot:
- One system. No separate vector database to deploy, secure, scale, and keep in sync with your primary data.
- No extension. Vector search is part of MariaDB Server, unlike pgvector, which is a PostgreSQL extension you must install and enable.
- Transactional integrity. Inserts, updates and searches are fully ACID, with all isolation levels.
- Performance. In the benchmark above, MariaDB was faster than pgvector and pgvectorscale — the option now commonly recommended for PostgreSQL users — and faster than the dedicated vector databases tested.
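The transactional point in the list above can be tried with a small experiment. This is a sketch under stated assumptions: the table name tx_demo and its 2-dimensional vectors are hypothetical, and autocommit is assumed to be at its default (on), so the first insert commits immediately.

```sql
-- Hypothetical demo table; the index uses the default (euclidean) distance.
CREATE TABLE tx_demo (
  id INT PRIMARY KEY,
  embedding VECTOR(2) NOT NULL,
  VECTOR INDEX (embedding)
) ENGINE=InnoDB;

-- Committed immediately under autocommit.
INSERT INTO tx_demo VALUES (1, VEC_FromText('[1,0]'));

START TRANSACTION;
-- A vector insert that is part of an open transaction.
INSERT INTO tx_demo VALUES (2, VEC_FromText('[0,1]'));
-- A vector search inside the same transaction.
SELECT id FROM tx_demo
ORDER BY VEC_DISTANCE_EUCLIDEAN(embedding, VEC_FromText('[0,1]'))
LIMIT 1;
ROLLBACK;

-- After the rollback, only the first, committed row remains.
SELECT COUNT(*) AS rows_left FROM tx_demo;
```

The same pattern applies when an embedding and its relational row must be written together: both land in one commit or neither does.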
How MariaDB Vector works
A four-part engineering series by MariaDB Server architect Sergei Golubchik describes the implementation in depth: the modified HNSW index, its transactional InnoDB backing, and how search and inserts behave.
Using MariaDB Vector with AI frameworks and languages
MariaDB Vector works with the major AI application frameworks:
- LangChain, MariaDB Vector Store – Python
- LangChain.js, MariaDB Vector Store – Node.js
- LangChain4j, MariaDB Embedding Store – Java
- LlamaIndex, MariaDB Vector Store – Python
- Spring AI, MariaDB Vector Store – Java
- MCP (Model Context Protocol), MariaDB MCP server – Python
- VectorDBBench – vector-database benchmarking
The MariaDB MCP Server connects AI agents and assistants to MariaDB for both SQL operations and vector-based semantic search. Read Build Smarter with MariaDB MCP Server or visit github.com/mariadb/mcp.
Embeddings themselves are generated in your application or model tier (OpenAI, Llama, Claude, Gemini, or an open model such as a sentence-transformers model) and stored in MariaDB.
Something missing? Suggest it to foundation@mariadb.org, or add to the list on the Vector Framework Integration documentation page.
Use cases
- Semantic search and RAG. Find documents by meaning, and ground LLM answers in your own data. Build a support assistant or internal knowledge base from your existing tables.
- Recommendations. Personalized product and content recommendations from user behavior and item embeddings, expressed in natural language rather than keyword queries.
- Similarity search. Find similar images, documents, or products without manual labeling.
- Machine learning. Store and retrieve vector representations for clustering and nearest-neighbor lookups.
Background: what is vector similarity search?
A vector embedding represents text, images, or other data as a list of numbers, produced by an AI model, such that similar items have nearby vectors. Vector search finds the items whose embeddings are closest to a query embedding, using a distance metric such as cosine or euclidean. This powers semantic search, recommendations, and retrieval-augmented generation. Embeddings from different models are not interchangeable, because each model places items in its own space; pick one embedding model and use it consistently. MariaDB Vector accelerates the nearest-neighbor lookup with an approximate-nearest-neighbor index, so it stays fast as data grows.
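To make the two distance metrics concrete, here is a small worked example on hand-picked 2-dimensional vectors; real embeddings have hundreds or thousands of dimensions. It assumes the usual definitions: euclidean distance is the straight-line length between two points, and cosine distance is one minus the cosine of the angle between them.

```sql
SELECT
  -- (0,0) to (3,4) is the hypotenuse of a 3-4-5 triangle: length 5.
  VEC_DISTANCE_EUCLIDEAN(VEC_FromText('[0,0]'), VEC_FromText('[3,4]')) AS euclid_3_4,
  -- Same direction, different length: the angle is zero, so cosine distance is 0.
  VEC_DISTANCE_COSINE(VEC_FromText('[1,0]'), VEC_FromText('[2,0]')) AS cos_same_direction,
  -- Perpendicular vectors: the cosine of the angle is 0, so cosine distance is 1.
  VEC_DISTANCE_COSINE(VEC_FromText('[1,0]'), VEC_FromText('[0,1]')) AS cos_perpendicular;
```

The middle column shows why the metric choice matters: cosine ignores vector length and compares direction only, while euclidean distance would report those two vectors as 1 apart. The values in the comments are the mathematical results; displayed values may show floating-point rounding.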
Documentation
- MariaDB Vector documentation — SQL reference (VECTOR data type, VECTOR INDEX, VEC_DISTANCE and conversion functions)
- Vector Framework Integration documentation
Content, talks and blogs
Getting started — tutorials and demos
- 2026 What exactly is a vector in AI and RAG? (video, 1 min)
- 2024-11 Try RAG with MariaDB on your own data — Robert Silén (blog)
- Demo: RAG with the MariaDB Knowledge Base — official Python example
- Demo: RAG with MariaDB, OpenAI and Java, no frameworks — runnable example
Performance
- 2026-03 Big Vector Search Benchmark: 10 databases comparison
- 2025-02 Vector indexes, large server, dbpedia-openai dataset: MariaDB, Qdrant and pgvector
- 2025-02 Vector indexes, MariaDB & pgvector, large server, dbpedia-openai dataset
- 2025-01 Evaluating vector indexes in MariaDB and pgvector: part 1 · part 2
- 2025-01 Vector indexes, large server, small dataset: part 1 · part 2
- 2025-01 Vector indexes, large server, large dataset: part 1
Talks
- MariaDB Vector, a new Open Source vector database you are already familiar with — Sergei Golubchik, 2025 (27 min)
- MariaDB Vector: A storage engine for LLMs — Kaj Arnö and Jonah Harris, 2025 (12 min)
- Why your AI data should be in an RDBMS — SFSCON 2024 (15 min)
- Vector Similarity Search in Relational Databases — FrOSCon 2024 (43 min)
- Making AI transparent with RAG on your own data — Robert Silén, foss-north 2025 (43 min)
- Overall introduction (12 min)
- Technical introduction (33 min)
Community projects and use cases
- 2025-09 Can you do RAG with Full Text Search in MariaDB? — the SemantiQ project (code)
- 2025-06 YouTube Semantic Search — MariaDB AI RAG Hackathon innovation-track winner (code)
- 2025-05 Model Context Protocol (MCP) — hackathon integration-track winner (since adopted by MariaDB plc)
- Metadata-Hub — multimodal semantic search, Helsinki Python Hackathon runner-up
Hackathons
- 2025-05-29 Helsinki Python Hackathon — a role model for future MariaDB hackathons
- 2025-05-28 Helsinki Python meetup with AI RAG and MariaDB Foundation
- 2025-03-10 Join our AI Hackathon with MariaDB Vector
Announcements
- 2025-06-09 MariaDB Vector is part of the 11.8 LTS release
- 2024-07-29 Finally here: MariaDB Vector Preview
- 2024-05-10 MariaDB is soon a vector database, too
Contributions
- 2024-08-30 Intel improving the performance of MariaDB Vector
- 2024-08-19 Amazon contributes to MariaDB Vector
- 2024-05-17 MariaDB Vector at Intel Vision – AI everywhere
Frequently asked questions
Does MariaDB support vector search natively?
Yes. Native vector search has been built into MariaDB Server since MariaDB 11.8 LTS (2025). No extension is required.
Is MariaDB Vector open source?
Yes. It is part of MariaDB Community Server. This is a difference from MySQL, whose vector indexing is available through the proprietary HeatWave service.
Do I need a separate vector database such as Pinecone, Qdrant, or Milvus?
No. Embeddings are stored in a VECTOR column in MariaDB and searched with SQL, alongside the rest of your data.
Is MariaDB Vector available on AWS?
Yes, on Amazon RDS for MariaDB 11.8.
Which distance metrics are supported?
Euclidean (L2) and cosine distance.
How many dimensions can a vector have?
Up to 16,383.
Can I combine vector search with normal SQL filters?
Yes. A WHERE clause and an ORDER BY VEC_DISTANCE_* … LIMIT run in the same query, in one transaction.
Does it work with LangChain, LlamaIndex, and Spring AI?
Yes. There are official integrations for LangChain (Python and JavaScript), LangChain4j, LlamaIndex, Spring AI, and a MariaDB MCP server.
How do I generate embeddings?
With an embedding model — OpenAI, Llama, Claude, Gemini, or an open model — in your application or model tier. MariaDB stores and searches the resulting vectors; it does not generate them.
Is MariaDB Vector production-ready?
Yes. It is generally available in the 11.8 long-term-support release.
Which MariaDB versions support vector search?
It was introduced as a preview in MariaDB 11.7 and is generally available in 11.8 LTS and later.
How does it compare to pgvector?
In a 10-database benchmark on one million OpenAI embeddings, MariaDB had higher query throughput at high recall and built its index far faster than pgvector, while producing an index about half the size. See the benchmark above.
What index type does MariaDB Vector use?
A modified HNSW algorithm. There is one vector index per table, built for either euclidean or cosine distance.
Why isn’t my query using the index?
The index is used only when the query has a bare ORDER BY VEC_DISTANCE_*(column, vector) with a LIMIT, and the distance function matches the one the index was built with. Wrapping the distance in an expression, or using a different distance function, causes a full table scan.
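One way to check which case a query falls into is to compare EXPLAIN output for a bare and a wrapped distance expression. This is a diagnostic sketch: the table name explain_demo is hypothetical, and the assumption is that the plan names the vector index in its key column when the index is chosen. Exact EXPLAIN output varies by server version and is not reproduced here.

```sql
-- Hypothetical demo table with a cosine vector index.
CREATE TABLE explain_demo (
  id INT PRIMARY KEY,
  embedding VECTOR(3) NOT NULL,
  VECTOR INDEX (embedding) DISTANCE=cosine
) ENGINE=InnoDB;

INSERT INTO explain_demo VALUES
  (1, VEC_FromText('[1,0,0]')),
  (2, VEC_FromText('[0,1,0]')),
  (3, VEC_FromText('[0.9,0.1,0]'));

-- Bare distance call with LIMIT: the form the index rule requires.
EXPLAIN SELECT id FROM explain_demo
ORDER BY VEC_DISTANCE_COSINE(embedding, VEC_FromText('[1,0,0]'))
LIMIT 2;

-- Distance wrapped in an expression: expect a plan without the vector index.
EXPLAIN SELECT id FROM explain_demo
ORDER BY VEC_DISTANCE_COSINE(embedding, VEC_FromText('[1,0,0]')) + 0
LIMIT 2;
```

If the distance value is needed in the result set, select it as an extra column and keep the ORDER BY expression bare.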