MariaDB Vector — native vector search in MariaDB

MariaDB Vector is native vector similarity search built into MariaDB Server. It is generally available in MariaDB 11.8 LTS (2025), needs no extension and no separate vector database, and is available on Amazon RDS for MariaDB. You store embeddings in a VECTOR column next to your relational data and query both — vector similarity and ordinary SQL filters — in a single transactional statement.

  • Native VECTOR(N) data type and VECTOR INDEX index type (modified HNSW); up to 16,383 dimensions; euclidean and cosine distance.
  • Full ACID transactions, concurrent reads and writes.
  • Ahead of pgvector, pgvectorscale, Qdrant, Milvus and Weaviate on both search speed and index build time in a 10-database benchmark on one million 1536-dimensional OpenAI embeddings.

Quick start

Create a table with a vector column and an index, insert embeddings, and search. Build the index for the distance function you will search with — here, cosine.

-- 1. A vector column and index (built into the server; no extension needed)
CREATE TABLE documents (
    id          BIGINT UNSIGNED PRIMARY KEY,
    owner_id    BIGINT,
    content     TEXT,
    embedding   VECTOR(1536) NOT NULL,        -- match your model's output dimensions
    VECTOR INDEX (embedding) M=8 DISTANCE=cosine
) ENGINE=InnoDB;

-- 2. Insert an embedding (VEC_FromText takes a JSON array of floats)
INSERT INTO documents (id, owner_id, content, embedding)
VALUES (1, 42, 'a document', VEC_FromText('[...]'));   -- 1536 floats from your embedding model

-- 3. Find the 10 nearest matches, optionally filtered by ordinary SQL
SELECT id, content
FROM documents
WHERE owner_id = 42
ORDER BY VEC_DISTANCE(embedding, VEC_FromText('[...]'))   -- the query's 1536-float embedding
LIMIT 10;

The optimizer uses the vector index only when the query has a bare ORDER BY on a distance function (not wrapped in any other expression) plus a LIMIT. The function can be VEC_DISTANCE, or VEC_DISTANCE_COSINE or VEC_DISTANCE_EUCLIDEAN provided it matches the distance the index was built with. The WHERE clause shows the key advantage of vector search inside a relational database: similarity search and SQL filtering in a single query, with no second system to maintain.
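To make the semantics of the quick-start query concrete, here is a minimal Python sketch of the exact answer it describes: keep the rows that pass the WHERE filter, sort them by distance to the query embedding, and return the first k. It assumes cosine distance is 1 minus cosine similarity, uses 3-dimensional toy vectors in place of 1536-dimensional embeddings, and the names Doc and exact_knn are hypothetical. It is a brute-force reference, not how MariaDB executes the query; the vector index exists to return an approximation of this result without computing the distance to every row.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Doc(NamedTuple):
    # Hypothetical in-memory stand-in for a row of the documents table.
    id: int
    owner_id: int
    embedding: Tuple[float, ...]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed definition: 1 - cosine similarity (0.0 for identical directions).
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def exact_knn(docs: Sequence[Doc], query: Sequence[float],
              owner_id: int, k: int = 10) -> List[int]:
    # WHERE owner_id = ?  : keep only rows that pass the SQL filter
    rows = [d for d in docs if d.owner_id == owner_id]
    # ORDER BY distance   : closest embeddings first
    rows.sort(key=lambda d: cosine_distance(d.embedding, query))
    # LIMIT k             : return the ids of the k nearest rows
    return [d.id for d in rows[:k]]


docs = [
    Doc(1, 42, (1.0, 0.0, 0.0)),
    Doc(2, 42, (0.0, 1.0, 0.0)),
    Doc(3, 7, (1.0, 0.1, 0.0)),   # close to the query, but filtered out by owner_id
    Doc(4, 42, (0.9, 0.1, 0.0)),
]
print(exact_knn(docs, (1.0, 0.0, 0.0), owner_id=42, k=2))  # [1, 4]
```

Document 3 is nearer to the query than document 2, yet it never appears, because the relational filter is applied in the same statement as the similarity ordering.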

Prefer to watch or run code? Everything you Need to Know to Start Building Apps with AI and RAG is a full walkthrough of the concepts and code; the Java RAG demo (MariaDB + OpenAI, no frameworks) and the official MariaDB Knowledge Base RAG demo (Python) are runnable starting points.

What you get in MariaDB 11.8 LTS

  • A native VECTOR(N) data type, storing 32-bit floats; up to 16,383 dimensions.
  • A specialized VECTOR INDEX using a modified HNSW algorithm. Options: DISTANCE (euclidean, the default, or cosine) and M (3–100; higher values give more accurate results but slower searches and inserts and a larger index). One vector index per table; the indexed column must be NOT NULL.
  • Distance functions VEC_DISTANCE_EUCLIDEAN and VEC_DISTANCE_COSINE, plus VEC_DISTANCE, which uses whichever distance the column's vector index was built with.
  • Conversion functions VEC_FromText (JSON array to vector) and VEC_ToText (vector to JSON array).
  • Full transactional support and all isolation levels; concurrent reads and writes.
  • SIMD hardware acceleration on Intel (AVX2, AVX512), ARM Neon, and IBM Power10 VSX.
  • Built into the server — no extension — and available on Amazon RDS for MariaDB.
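Because a VECTOR column stores 32-bit floats, values read back with VEC_ToText can differ from what you inserted in the last digits. A minimal Python sketch, assuming VEC_ToText output is parsed as a JSON array; the helper names and the sample output string are hypothetical, and the tolerances are illustrative:

```python
import json
import math
import struct
from typing import List, Sequence


def as_float32(values: Sequence[float]) -> List[float]:
    # Round each value to the nearest 32-bit float, the precision VECTOR stores.
    packed = struct.pack(f"<{len(values)}f", *values)
    return list(struct.unpack(f"<{len(values)}f", packed))


def parse_vector_text(text: str) -> List[float]:
    # VEC_ToText returns a JSON array of numbers, so json.loads can read it back.
    return [float(x) for x in json.loads(text)]


def same_vector(original: Sequence[float], stored: Sequence[float]) -> bool:
    # Compare at float32 precision rather than requiring exact equality.
    return len(original) == len(stored) and all(
        math.isclose(a, b, rel_tol=1e-6, abs_tol=1e-7)
        for a, b in zip(as_float32(original), stored)
    )


original = [0.1, 0.2, 0.3]
print(as_float32(original)[0])  # 0.10000000149011612, not 0.1
# "[0.1,0.2,0.3]" stands in for a hypothetical VEC_ToText result.
print(same_vector(original, parse_vector_text("[0.1,0.2,0.3]")))  # True
```

Comparing at float32 precision avoids false mismatches when an application checks stored embeddings against its own copies.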

Full SQL reference: MariaDB Vector documentation.

How fast is it?

In a 10-database benchmark on a realistic one-million-vector dataset (dbpedia-openai-1000k, 1536-dimensional OpenAI embeddings), MariaDB led the field on both search throughput and index build time:

  • 850–1000 queries per second at 94% recall, ahead of pgvectorscale, pgvector, and the dedicated vector databases Qdrant, Milvus and Weaviate.
  • Index built in under 15 minutes, where pgvector and several others needed 2.5–3 hours.
  • Most engines trade search speed against build time. MariaDB led on both.

Figure: Recall vs. queries-per-second (up and to the right is better)
Figure: Recall vs. index build time

Full results and methodology: https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/
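The recall figure measures how many of the true nearest neighbors the approximate index returned. A minimal Python sketch of the common recall@k definition, averaged over queries; it illustrates the metric and is not the benchmark's scoring code, whose exact settings are described in the full results. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    # Share of the true k nearest neighbors that the approximate search returned.
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(results: Sequence[Tuple[Sequence[int], Sequence[int]]], k: int) -> float:
    # Average recall@k over (approximate, exact) result pairs, one pair per query.
    return sum(recall_at_k(a, e, k) for a, e in results) / len(results)


exact = list(range(1, 11))                 # true 10 nearest ids from a brute-force scan
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 99]   # the index missed id 10 and returned 99
print(recall_at_k(approx, exact, k=10))    # 0.9
```

On this reading, 94% recall means that, on average, about 94 of every 100 true nearest neighbors appeared in the results.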

MariaDB Vector vs pgvector, pgvectorscale, and dedicated vector databases

MariaDB Vector stores embeddings inside the same relational database as the rest of your data, so it offers things a bolt-on vector store cannot:

  • One system. No separate vector database to deploy, secure, scale, and keep in sync with your primary data.
  • No extension. Vector search is part of MariaDB Server, unlike pgvector, which is a PostgreSQL extension you must install and enable.
  • Transactional integrity. Inserts, updates and searches are fully ACID, with all isolation levels.
  • Performance. In the benchmark above, MariaDB was faster than pgvector and pgvectorscale — the option now commonly recommended for PostgreSQL users — and faster than the dedicated vector databases tested.

How MariaDB Vector works

A four-part engineering series by MariaDB Server architect Sergei Golubchik describes the implementation in depth — the modified HNSW index, its transactional InnoDB backing, and how search and inserts behave:

Using MariaDB Vector with AI frameworks and languages

MariaDB Vector works with the major AI application frameworks:

The MariaDB MCP Server connects AI agents and assistants to MariaDB for both SQL operations and vector-based semantic search. Read Build Smarter with MariaDB MCP Server or visit github.com/mariadb/mcp.

Embeddings themselves are generated by an embedding model in your application or model tier (OpenAI, Llama, Gemini, or an open model such as a sentence-transformers model) and stored in MariaDB.

Something missing? Suggest it to foundation@mariadb.org, or add to the list on the Vector Framework Integration documentation page.
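Since the application tier owns the embeddings, it also passes them to MariaDB. A minimal Python sketch, assuming a DB-API driver with ? placeholders such as MariaDB Connector/Python (drivers that use %s need the placeholders changed) and the documents table from the quick start; DIMS, vector_param, insert_params and search_params are hypothetical names. It binds the embedding's JSON text as a parameter to VEC_FromText instead of splicing it into the SQL string, and rejects embeddings of the wrong length before they reach the server:

```python
import json
from typing import Sequence, Tuple

DIMS = 1536  # must match VECTOR(1536) in the quick-start table

# ? placeholders assume a qmark-style DB-API driver such as MariaDB Connector/Python.
INSERT_SQL = (
    "INSERT INTO documents (id, owner_id, content, embedding) "
    "VALUES (?, ?, ?, VEC_FromText(?))"
)
SEARCH_SQL = (
    "SELECT id, content FROM documents WHERE owner_id = ? "
    "ORDER BY VEC_DISTANCE(embedding, VEC_FromText(?)) LIMIT 10"
)


def vector_param(embedding: Sequence[float]) -> str:
    # Serialize an embedding as the JSON array text that VEC_FromText parses.
    if len(embedding) != DIMS:
        raise ValueError(f"expected {DIMS} floats, got {len(embedding)}")
    return json.dumps([float(x) for x in embedding], separators=(",", ":"))


def insert_params(doc_id: int, owner_id: int, content: str,
                  embedding: Sequence[float]) -> Tuple[int, int, str, str]:
    # Parameters for INSERT_SQL, in placeholder order.
    return (doc_id, owner_id, content, vector_param(embedding))


def search_params(owner_id: int, query_embedding: Sequence[float]) -> Tuple[int, str]:
    # Parameters for SEARCH_SQL, in placeholder order.
    return (owner_id, vector_param(query_embedding))


# With a live cursor `cur`, these would be used as, for example:
#   cur.execute(INSERT_SQL, insert_params(1, 42, "a document", embedding))
#   cur.execute(SEARCH_SQL, search_params(42, query_embedding))
```

Binding the vector keeps the statement text identical across calls and avoids quoting problems with user content. Running EXPLAIN on the search statement is one way to check that the vector index is used.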

Use cases

  • Semantic search and RAG. Find documents by meaning, and ground LLM answers in your own data. Build a support assistant or internal knowledge base from your existing tables.
  • Recommendations. Personalized product and content recommendations built from user behavior and item embeddings, with requests phrased in natural language rather than as keyword queries.
  • Similarity search. Find similar images, documents, or products without manual labeling.
  • Machine learning. Store and retrieve vector representations for clustering and nearest-neighbor lookups.

Background: what is vector similarity search?

A vector embedding represents text, images, or other data as a list of numbers, produced by an AI model, such that similar items have nearby vectors. Vector search finds the items whose embeddings are closest to a query embedding, using a distance metric such as cosine or euclidean. This powers semantic search, recommendations, and retrieval-augmented generation. Embeddings from different models are not interchangeable, because each model places items in its own space; pick one embedding model and use it consistently. MariaDB Vector accelerates the nearest-neighbor lookup with an approximate-nearest-neighbor index, so it stays fast as data grows.
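The choice of metric matters mostly when vector lengths vary. A minimal Python sketch using the common textbook definitions (euclidean as L2 distance, cosine distance as 1 minus cosine similarity), not MariaDB's internal implementation: cosine ignores length, euclidean does not, and for unit-length vectors the two produce the same ranking because the squared euclidean distance equals twice the cosine distance.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line (L2) distance: sensitive to vector length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity: depends only on the angle between the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length.
    n = math.hypot(*v)
    return [x / n for x in v]


# Same direction, different length: cosine sees no difference, euclidean does.
print(cosine_distance([1.0, 0.0], [5.0, 0.0]), euclidean([1.0, 0.0], [5.0, 0.0]))  # 0.0 4.0

# For unit vectors, euclidean**2 == 2 * cosine_distance, so both rank neighbors the same.
q = normalize([1.0, 2.0, 3.0])
items = [normalize(v) for v in ([3.0, 2.0, 1.0], [1.0, 2.0, 2.0], [0.0, 1.0, 0.0])]
by_l2 = sorted(range(len(items)), key=lambda i: euclidean(q, items[i]))
by_cos = sorted(range(len(items)), key=lambda i: cosine_distance(q, items[i]))
print(by_l2 == by_cos)  # True
```

If your embedding model already returns unit-length vectors, either metric gives the same nearest neighbors; what matters is that the index and the queries use the same one.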

Documentation

Content, talks and blogs

Getting started — tutorials and demos

Everything you Need to Know to Start Building Apps with AI and RAG — Alejandro Duarte, MariaDB plc (video, ~40 min)

Performance

Talks

Get to know MariaDB’s Rocket-Fast Native Vector Search — Sergei Golubchik, FOSDEM 2025 (37 min)
AI-first applications with MariaDB Vector — Vicențiu Ciorbaru, 2025 (22 min)

Community projects and use cases

Hackathons

Announcements

Contributions

Frequently asked questions

Does MariaDB support vector search natively?
Is MariaDB Vector open source?
Do I need a separate vector database such as Pinecone, Qdrant, or Milvus?
Is MariaDB Vector available on AWS?
Which distance metrics are supported?
How many dimensions can a vector have?
Can I combine vector search with normal SQL filters?
Does it work with LangChain, LlamaIndex, and Spring AI?
How do I generate embeddings?
Is MariaDB Vector production-ready?
Which MariaDB versions support vector search?
How does it compare to pgvector?
What index type does MariaDB Vector use?
Why isn’t my query using the index?