<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
  >
<channel>
  <title>MariaDB.org-Planet-Feed</title>
  <atom:link href="https://mariadb.org/planet-rss" rel="self" type="application/rss+xml"/>
  <link>https://mariadb.org/?page_id=24734</link>
  <description>Supporting continuity and open collaboration</description>
  <lastBuildDate>Sun, 12 Apr 2026 13:39:48 +0000</lastBuildDate>
  <language></language>
  <sy:updatePeriod>hourly</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
        <item>
      <title>MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5</title>
      <link>https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/</link>
      <pubDate>Sun, 12 Apr 2026 13:39:48 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>The MariaDB Foundation is releasing the BETA version of the Test Automation Framework (TAF) 2.5. This release represents a significant architectural upgrade, strengthening the framework’s lifecycle model, profiling capabilities, extraction and install pipeline, and reporting consistency. …<br />
Continue reading \"MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5\"<br />
The post MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5 appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/">MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The MariaDB Foundation is releasing the BETA version of the Test Automation Framework (TAF) 2.5. This release represents a significant architectural upgrade, strengthening the framework&rsquo;s lifecycle model, profiling capabilities, extraction and install pipeline, and reporting consistency. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/">MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-releases-the-beta-of-the-test-automation-framework-taf-2-5/">MariaDB Foundation releases the BETA of the Test Automation Framework (TAF) 2.5</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What Our Survey Says About MariaDB Preview Releases</title>
      <link>https://mariadb.org/what-our-survey-says-about-mariadb-preview-releases/</link>
      <pubDate>Fri, 10 Apr 2026 07:33:17 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Preview releases are among the clearest ways an open-source community can shape the future of a database before it becomes a production reality. They give users early access to new features, a chance to validate upgrade paths, and an opportunity to catch issues while the change is still inexpensive. …<br />
Continue reading \"What Our Survey Says About MariaDB Preview Releases\"<br />
The post What Our Survey Says About MariaDB Preview Releases appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/what-our-survey-says-about-mariadb-preview-releases/">What Our Survey Says About MariaDB Preview Releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Preview releases are among the clearest ways an open-source community can shape the future of a database before it becomes a production reality. They give users early access to new features, a chance to validate upgrade paths, and an opportunity to catch issues while the change is still inexpensive. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/what-our-survey-says-about-mariadb-preview-releases/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;What Our Survey Says About MariaDB Preview Releases&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/what-our-survey-says-about-mariadb-preview-releases/">What Our Survey Says About MariaDB Preview Releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/what-our-survey-says-about-mariadb-preview-releases/">What Our Survey Says About MariaDB Preview Releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Vector: How it works. Part IV</title>
      <link>https://mariadb.org/mariadb-vector-how-it-works-part-iv/</link>
      <pubDate>Thu, 09 Apr 2026 06:42:55 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>This is the last post in the “MariaDB Vector: How it works” series. The first three were about storage, in-memory representation, and HNSW modifications. …<br />
Continue reading “MariaDB Vector: How it works. Part IV”<br />
The post MariaDB Vector: How it works. Part IV appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-iv/">MariaDB Vector: How it works. Part IV</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>This is the last post in the &ldquo;MariaDB Vector: How it works&rdquo; series. The first three were about <a href="https://mariadb.org/mariadb-vector-how-it-works/" target="_blank" rel="noreferrer noopener">storage</a>, <a href="https://mariadb.org/mariadb-vector-how-it-works-part-ii/" target="_blank" rel="noreferrer noopener">in-memory representation</a>, and <a href="https://mariadb.org/mariadb-vector-how-it-works-part-iii/" target="_blank" rel="noreferrer noopener">HNSW modifications</a>. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-vector-how-it-works-part-iv/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Vector: How it works. Part IV&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-iv/">MariaDB Vector: How it works. Part IV</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-iv/">MariaDB Vector: How it works. Part IV</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The AWS Lambda ‘Kiss of Death’</title>
      <link>https://shatteredsilicon.net/the-aws-lambda-kiss-of-death/</link>
      <pubDate>Tue, 07 Apr 2026 18:15:28 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://shatteredsilicon.net/wp-json/;%20rel=https://api.w.org/,%20https://shatteredsilicon.net/wp-json/wp/v2/categories/17;%20type=application/json,%20https://shatteredsilicon.net/category/databases-mysql-mariadb-postgresql/;%20rel=canonical">Databases (MySQL, MariaDB, PostgreSQL) Archives - Shattered Silicon</source>
      <description><![CDATA[<p>Our story begins as most database issues start: with hands on foreheads, internally or externally, saying ‘WTF is going on?’. We observed a series of database freezes on our production environment. It was quite severe. Connections spiked, writes stalled, and at some point there was a large database freeze before things cleared. Being a Galera environment, […]<br />
The post The AWS Lambda ‘Kiss of Death’ appeared first on Shattered Silicon.</p>
<p>The post <a rel="nofollow" href="https://shatteredsilicon.net/the-aws-lambda-kiss-of-death/">The AWS Lambda ‘Kiss of Death’</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Our story begins as most database issues start: with hands on foreheads, internally or externally, saying &lsquo;WTF is going on?&rsquo;. We observed a series of database freezes on our production environment. It was quite severe. Connections spiked, writes stalled, and at some point there was a large database freeze before things cleared. Being a Galera environment, [&hellip;]</p>
<p>The post <a rel="nofollow" href="https://shatteredsilicon.net/the-aws-lambda-kiss-of-death/">The AWS Lambda &lsquo;Kiss of Death&rsquo;</a> appeared first on <a rel="nofollow" href="https://shatteredsilicon.net">Shattered Silicon</a>.</p>

<p>The post <a rel="nofollow" href="https://shatteredsilicon.net/the-aws-lambda-kiss-of-death/">The AWS Lambda ‘Kiss of Death’</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The AWS Lambda ‘Kiss of Death’</title>
      <link>https://shatteredsilicon.net/aws-lambda-kiss-of-death/</link>
      <pubDate>Tue, 07 Apr 2026 18:15:28 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://shatteredsilicon.net/wp-json/;%20rel=https://api.w.org/,%20https://shatteredsilicon.net/wp-json/wp/v2/categories/17;%20type=application/json,%20https://shatteredsilicon.net/category/databases-mysql-mariadb-postgresql/;%20rel=canonical">Databases (MySQL, MariaDB, PostgreSQL) Archives - Shattered Silicon</source>
      <description><![CDATA[<p>Our story begins as most database issues start: with hands on foreheads, internally or externally, saying ‘WTF is going on?’. We observed a series of database freezes on our production environment. It was quite severe. Connections spiked, writes stalled, and at some point there was a large database freeze before things cleared. Being a Galera environment, […]<br />
The post The AWS Lambda ‘Kiss of Death’ appeared first on Shattered Silicon.</p>
<p>The post <a rel="nofollow" href="https://shatteredsilicon.net/aws-lambda-kiss-of-death/">The AWS Lambda ‘Kiss of Death’</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Our story begins as most database issues start: with hands on foreheads, internally or externally, saying &lsquo;WTF is going on?&rsquo;. We observed a series of database freezes on our production environment. It was quite severe. Connections spiked, writes stalled, and at some point there was a large database freeze before things cleared. Being a Galera environment, [&hellip;]</p>
<p>The post <a rel="nofollow" href="https://shatteredsilicon.net/aws-lambda-kiss-of-death/">The AWS Lambda &lsquo;Kiss of Death&rsquo;</a> appeared first on <a rel="nofollow" href="https://shatteredsilicon.net">Shattered Silicon</a>.</p>

<p>The post <a rel="nofollow" href="https://shatteredsilicon.net/aws-lambda-kiss-of-death/">The AWS Lambda ‘Kiss of Death’</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>A response to Percona’s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison</title>
      <link>https://mariadb.org/a-response-to-perconas-2026-mysql-ecosystem-benchmark-useful-data-but-not-a-realistic-mariadb-comparison/</link>
      <pubDate>Fri, 03 Apr 2026 07:35:32 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Percona’s new 2026 benchmark report is interesting because it puts several MySQL-family releases on the same graphs and shares a public repository for the test harness. …<br />
Continue reading \"A response to Percona’s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison\"<br />
The post A response to Percona’s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/a-response-to-perconas-2026-mysql-ecosystem-benchmark-useful-data-but-not-a-realistic-mariadb-comparison/">A response to Percona’s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p><a href="https://www.percona.com/blog/2026-mysql-ecosystem-performance-benchmark-report/">Percona&rsquo;s new 2026 benchmark report</a> is interesting because it puts several MySQL-family releases on the same graphs and shares a public repository for the test harness. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/a-response-to-perconas-2026-mysql-ecosystem-benchmark-useful-data-but-not-a-realistic-mariadb-comparison/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;A response to Percona&rsquo;s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/a-response-to-perconas-2026-mysql-ecosystem-benchmark-useful-data-but-not-a-realistic-mariadb-comparison/">A response to Percona&rsquo;s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/a-response-to-perconas-2026-mysql-ecosystem-benchmark-useful-data-but-not-a-realistic-mariadb-comparison/">A response to Percona’s 2026 MySQL ecosystem benchmark: useful data, but not a realistic MariaDB comparison</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Percona Bug Report: March 2026</title>
      <link>https://percona.community/blog/2026/04/03/percona-bug-report-march-2026/</link>
      <pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>At Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open-source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open-source form, report back on any issues or bugs you might encounter along the way.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/04/03/percona-bug-report-march-2026/">Percona Bug Report: March 2026</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>At Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open-source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open-source form, report back on any issues or bugs you might encounter along the way.</p>
<p>We constantly update our <a href="https://perconadev.atlassian.net/" target="_blank" rel="noopener noreferrer">bug reports</a> and monitor <a href="https://bugs.mysql.com/" target="_blank" rel="noopener noreferrer">other boards</a> to ensure we have the latest information, but we wanted to make it a little easier for you to keep track of the most critical ones. This post is a central place to get information on the most noteworthy open and recently resolved bugs.</p>
<p>In this edition of our bug report, we have the following list of bugs.</p>
<hr>
<h2 id="percona-servermysql-bugs">Percona Server/MySQL Bugs<a class="anchor-link" id="percona-server-mysql-bugs"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PS-10378" target="_blank" rel="noopener noreferrer">PS-10378</a>: In the MeCab plugin, BOOLEAN MODE full-text queries with a LIMIT clause do not behave as expected. Although the optimizer indicates that ranking should be skipped (Ft_hints: no_ranking), the query still performs full ranking and sorting before applying LIMIT, preventing the intended optimization and impacting performance.</p>
<p><strong>Reported Affected Version/s</strong>: 8.4.x<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46-37, 8.4.9-9, 9.7.0-0</p>
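<p>The query shape the report describes is roughly the following sketch; the table, columns, and search term are hypothetical and assume a MeCab-tokenized FULLTEXT index:</p>
<pre><code class="language-sql">-- articles(title, body) with FULLTEXT(title, body) using the MeCab parser (hypothetical)
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('+database' IN BOOLEAN MODE)
LIMIT 10;
-- per PS-10378, full ranking and sorting still run before LIMIT is applied</code></pre>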
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PS-10448" target="_blank" rel="noopener noreferrer">PS-10448</a>: Insert prepared statements fail on partitioned tables with timestamp-based partitions when the partition key uses a non-constant default (e.g., <strong>CURRENT_TIMESTAMP</strong>). After initial execution, the statement remains bound to the original partition and fails with a partition mismatch error when data should go into a different partition.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.42-33, 8.0.43-34, 8.0.44-35, 8.4.7-7<br>
<strong>Upstream Bug</strong>: <a href="https://bugs.mysql.com/bug.php?id=119309" target="_blank" rel="noopener noreferrer">Bug #119309</a><br>
<strong>Workaround/Fix</strong>: Modify statements to explicitly use <strong>NOW()</strong> (requires updating procedures)<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46-37, 8.4.9-9, 9.7.0-0</p>
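<p>A minimal sketch of the workaround, assuming a hypothetical table partitioned by RANGE on a TIMESTAMP column that defaults to CURRENT_TIMESTAMP:</p>
<pre><code class="language-sql">-- Instead of relying on the DEFAULT CURRENT_TIMESTAMP value:
--   PREPARE ins FROM 'INSERT INTO events (payload) VALUES (?)';
-- pass the timestamp explicitly so the target partition is re-evaluated per execution:
PREPARE ins FROM 'INSERT INTO events (created_at, payload) VALUES (NOW(), ?)';
SET @p = 'example payload';
EXECUTE ins USING @p;
DEALLOCATE PREPARE ins;</code></pre>
<p>As the report notes, this requires touching every affected statement or stored procedure.</p>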
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PS-10481" target="_blank" rel="noopener noreferrer">PS-10481</a>: The range optimizer incorrectly falls back to a full table scan instead of using an index range scan for WHERE &hellip; IN() queries when values exceed column or prefix length on non-binary collations (e.g. utf8mb4_0900_ai_ci). A single truncated value in IN() can invalidate all valid ranges, forcing a full scan and degrading performance.</p>
<p><strong>Reported Affected Version/s</strong>: 8.4.x<br>
<strong>Upstream Bug</strong>: <a href="https://bugs.mysql.com/bug.php?id=118009" target="_blank" rel="noopener noreferrer">Bug #118009</a><br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: Not fixed yet</p>
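<p>An illustrative case, with a hypothetical table where <code>col</code> is VARCHAR(8) under utf8mb4_0900_ai_ci and indexed; one over-length value in the list is enough to disable the range scan:</p>
<pre><code class="language-sql">-- idx_col is an index over col VARCHAR(8) (hypothetical)
SELECT *
FROM t
WHERE col IN ('alpha', 'beta', 'value_far_too_long_for_col');
-- EXPLAIN shows a full table scan instead of a range scan on idx_col</code></pre>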
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PS-10593" target="_blank" rel="noopener noreferrer">PS-10593</a>: The audit_log plugin can crash (segfault) during memcpy operations when configured with audit_log_strategy=PERFORMANCE, audit_log_policy=ALL, and buffering enabled. The issue can be reproduced under specific memory allocator setups (e.g., jemalloc) and also occurs with standard libc malloc, indicating instability in the plugin&rsquo;s memory handling.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.34-26, 8.0.45-36, 8.4.7-7<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46-37, 8.4.9-9</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PS-10990" target="_blank" rel="noopener noreferrer">PS-10990</a>: Server crashes (signal 11) in Item_cache::walk when executing queries that use JOIN with a subquery in an IN clause inside stored procedures. The issue occurs during query execution/privilege checking and is reproducible across MySQL and Percona Server 8.0.x versions.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.45-36<br>
<strong>Upstream Bug</strong>: <a href="https://bugs.mysql.com/bug.php?id=115885" target="_blank" rel="noopener noreferrer">Bug #115885</a><br>
<strong>Workaround/Fix</strong>: Execute the query outside the stored procedure<br>
<strong>Fixed/Planned Version/s</strong>: Not specified</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PS-10578" target="_blank" rel="noopener noreferrer">PS-10578</a>: The legacy audit_log plugin does not populate the DB field in audit records unless the session is started with the &ndash;database option. Even when a database is selected later using USE or referenced explicitly in queries, the DB field may remain empty.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.43-34, 8.0.45-36<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Use Audit Log Filter component (8.4) or audit log filter (8.0), where this issue is not reproducible<br>
<strong>Fixed/Planned Version/s</strong>: Not planned to be fixed</p>
<hr>
<h2 id="percona-xtradb-cluster">Percona Xtradb Cluster<a class="anchor-link" id="percona-xtradb-cluster"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PXC-4844" target="_blank" rel="noopener noreferrer">PXC-4844</a>: In PXC clusters under high load, inconsistency voting during DDL or DCL operations can trigger an internal deadlock, causing standby nodes to get stuck applying transactions and continuously request FC pause. Although voting completes successfully and no node is expelled, writes remain blocked in wsrep: replicating and certifying write set, effectively stalling the cluster until the affected node is restarted.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.42<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Restart the blocked standby node to restore cluster activity<br>
<strong>Fixed/Planned Version/s</strong>: Not fixed yet</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXC-4799" target="_blank" rel="noopener noreferrer">PXC-4799</a>: In PXC clusters, when a backup lock (<strong>LOCK INSTANCE FOR BACKUP</strong>) is active and a replicated DDL is pending, executing <strong>FLUSH TABLES WITH READ LOCK</strong> on the same node can trigger a deadlock. This results in an inconsistency vote and causes the node to leave the cluster, disrupting backup operations.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.42, 8.0.43, 8.4.6<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Avoid running DDL operations during backup or use a single backup instance instead of parallel runs<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46, 8.4.9, 9.7.0</p>
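<p>Schematically, the conflicting sequence on a single node looks like this (illustrative only, not a reproduction script):</p>
<pre><code class="language-sql">-- Session 1: backup tooling takes the backup lock
LOCK INSTANCE FOR BACKUP;

-- Meanwhile a replicated DDL from another node queues up behind the backup lock

-- Session 2: a second backup job on the same node then issues
FLUSH TABLES WITH READ LOCK;
-- with the replicated DDL pending, this can deadlock, trigger an inconsistency
-- vote, and push the node out of the cluster</code></pre>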
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXC-4814" target="_blank" rel="noopener noreferrer">PXC-4814</a>: In PXC with <strong>wsrep_OSU_method=&lsquo;RSU&rsquo;</strong>, a failed DDL due to table name case mismatch (e.g., <strong>OPTIMIZE TABLE</strong>) is incorrectly written to the binary log as a successful transaction (<strong>error_code=0</strong>). This results in a GTID being generated for a failed operation, causing GTID inconsistencies across cluster nodes and in replication setups.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.33-25, 8.0.44, 8.4.6<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Validate table name case sensitivity before executing DDL in RSU mode<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.45, 8.4.8, 9.6.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXC-4845" target="_blank" rel="noopener noreferrer">PXC-4845</a>: After an IST failure (e.g., due to network issues), a PXC node may remain running in an inconsistent state instead of restarting, causing the donor and other nodes to become unresponsive. The joiner node gets stuck during state transfer instead of failing cleanly, impacting overall cluster availability.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.42<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.45, 8.4.8, 9.6.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXC-4849" target="_blank" rel="noopener noreferrer">PXC-4849</a>: A PXC node fails to start after successful SST when <strong>read_only</strong> or <strong>super_read_only</strong> is enabled and event scheduler objects exist on the donor. During initialization, the event scheduler fails to load, causing the node to abort, making it impossible to run read-only nodes with events defined in the cluster.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.44, 8.4.7<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Start the node without read_only, then enable it manually later, or remove events<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46, 8.4.9, 9.7.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXC-4965" target="_blank" rel="noopener noreferrer">PXC-4965</a>: Passwords containing the <code>'</code> character are incorrectly handled, causing syntax errors during replication (e.g., <strong>SET PASSWORD</strong>) and triggering inconsistency voting that can force a node to leave the cluster.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.45, 8.4.7<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Avoid using <code>'</code> character in passwords<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46, 8.4.8, 9.6.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXC-5198" target="_blank" rel="noopener noreferrer">PXC-5198</a>: Executing <strong>SELECT &hellip; FOR UPDATE SKIP LOCKED</strong> can trigger InnoDB crashes with fatal errors (e.g., &ldquo;Unknown error code 21: Skip locked records&rdquo;) under concurrent transactional workloads. Instead of returning expected deadlock errors, the query causes mysqld to abort, impacting cluster stability.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.33-25, 8.0.35-27, 8.0.36-28<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Avoid using <strong>SKIP LOCKED</strong> in <strong>SELECT &hellip; FOR UPDATE</strong> queries<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.46, 8.4.8, 9.6.0</p>
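<p>The pattern to avoid is the common job-queue idiom sketched below; table and columns are hypothetical:</p>
<pre><code class="language-sql">START TRANSACTION;
SELECT id
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;  -- under concurrent PXC load this could abort mysqld per PXC-5198
-- ... process the claimed row ...
COMMIT;</code></pre>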
<hr>
<h2 id="percona-xtrabackup">Percona XtraBackup<a class="anchor-link" id="percona-xtrabackup"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PXB-3543" target="_blank" rel="noopener noreferrer">PXB-3543</a>: Incremental backups in XtraBackup can become significantly slower than full backups on instances with a very large number of small tables, due to excessive CPU usage in memset during incremental processing. This leads to severe performance degradation, with incremental backups taking hours compared to minutes for full backups.</p>
<p><strong>Reported Affected Version/s</strong>: 8.0.35-33, 8.0.35-34<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Use full backups instead of incremental backups<br>
<strong>Fixed/Planned Version/s</strong>: 8.0.35-35, 8.4.0-6, 9.6.0-1</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PXB-3667" target="_blank" rel="noopener noreferrer">PXB-3667</a>: Installation of XtraBackup 8.4 fails on RHEL 9&ndash;based systems due to dependency conflicts between percona-xtrabackup-84, perl(DBD::mysql), and incompatible libmysqlclient versions. Percona Server 8.4 provides libmysqlclient.so.24, while required dependencies expect libmysqlclient.so.21, resulting in unresolved package installation errors.</p>
<p><strong>Reported Affected Version/s</strong>: 8.4.0-5<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: Not specified</p>
<hr>
<h2 id="percona-toolkit">Percona Toolkit<a class="anchor-link" id="percona-toolkit"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PT-2519" target="_blank" rel="noopener noreferrer">PT-2519</a>: pt-query-digest fails when processing large, slow query logs, repeatedly throwing &ldquo;Argument &ldquo;&rdquo; isn&rsquo;t numeric&rdquo; errors during the aggregate fingerprint stage. The tool retries multiple times but does not complete, resulting in stalled analysis and very slow progress.</p>
<p><strong>Reported Affected Version/s</strong>: 3.7.0, 3.7.1<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 3.7.3</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PT-2511" target="_blank" rel="noopener noreferrer">PT-2511</a>: pt-summary incorrectly reports that sshd is not running due to an invalid awk expression used to detect the process. The script checks the wrong field in ps output, causing false negatives even when sshd is active.</p>
<p><strong>Reported Affected Version/s</strong>: 3.7.1<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 3.7.3</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PT-2516" target="_blank" rel="noopener noreferrer">PT-2516</a>: pt-mongodb-index-check fails to detect duplicate indexes (e.g., <code>{a:1}</code> and <code>{a:1, b:1}</code>) and may produce no output, making it unclear whether the tool is functioning or connecting properly.</p>
<p><strong>Reported Affected Version/s</strong>: 3.7.1<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: Not specified</p>
<hr>
<h2 id="pmm-percona-monitoring-and-management">PMM [Percona Monitoring and Management]<a class="anchor-link" id="pmm-percona-monitoring-and-management"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PMM-14493" target="_blank" rel="noopener noreferrer">PMM-14493</a>: PMM fails to start when using Podman with the <strong>&ndash;log-driver passthrough</strong> option due to an error opening /dev/stderr during Nginx initialization. This causes the container to exit with configuration test failure, while other log drivers work as expected.</p>
<p><strong>Reported Affected Version/s</strong>: 3.4.0, 3.4.1<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Use a different <strong>--log-driver</strong> option such as none or journald<br>
<strong>Fixed/Planned Version/s</strong>: 3.8.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PMM-14576" target="_blank" rel="noopener noreferrer">PMM-14576</a>: PMM Client reports &ldquo;failed to get backup status&rdquo; errors during MongoDB backups, marking them as failed in the UI even though backups are successfully completed by PBM. This leads to incorrect backup status reporting and confusion for users.</p>
<p><strong>Reported Affected Version/s</strong>: 3.5.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Avoid using PMM Backup Management (not ideal)<br>
<strong>Fixed/Planned Version/s</strong>: 3.9.0, 3.X</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PMM-14594" target="_blank" rel="noopener noreferrer">PMM-14594</a>: PMM incorrectly reports compatible XtraBackup versions as incompatible with supported MySQL versions during backup validation. This causes backups to be blocked in PMM even when the installed XtraBackup version is the latest available and should be accepted.</p>
<p><strong>Reported Affected Version/s</strong>: 3.5.0, 3.6.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Use the xtrabackup command-line tool to take backups<br>
<strong>Fixed/Planned Version/s</strong>: 3.9.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PMM-14852" target="_blank" rel="noopener noreferrer">PMM-14852</a>: Some panels in the MongoDB InMemory dashboard show no data because they incorrectly use WiredTiger-specific metrics. As a result, dashboards for InMemory storage engine deployments can display empty or misleading panels instead of relevant metrics.</p>
<p><strong>Reported Affected Version/s</strong>: 3.2.0, 3.6.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 3.8.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PMM-14906" target="_blank" rel="noopener noreferrer">PMM-14906</a>: The postgres_exporter generates excessive <strong>SELECT version()</strong> queries (~4500/hour) after upgrading to PMM 3.6.0, flooding PostgreSQL logs and increasing unnecessary query load, causing log spam and disk growth.</p>
<p><strong>Reported Affected Version/s</strong>: 3.6.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 3.8.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PMM-14958" target="_blank" rel="noopener noreferrer">PMM-14958</a>: mysqld_exporter continues to generate duplicate metric collection errors with GTID and parallel replication enabled, even in PMM 3.6.0. These repeated errors (e.g., <strong>mysql_perf_schema_replication_group_worker_transport_time_seconds</strong>) lead to continuous log spam, causing rapid log growth (up to ~10GB/hour), disk space exhaustion, and increased noise that makes it difficult to identify real issues.</p>
<p><strong>Reported Affected Version/s</strong>: 3.6.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 3.7.1</p>
<hr>
<h2 id="percona-kubernetes-operator">Percona Kubernetes Operator<a class="anchor-link" id="percona-kubernetes-operator"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/K8SPG-737" target="_blank" rel="noopener noreferrer">K8SPG-737</a>: In PostgreSQL Kubernetes deployments, the node_exporter in the PMM client sidecar cannot access the datadir mountpoint because it is not exposed via /proc, preventing collection of datadir-related metrics. This results in incomplete monitoring data for PostgreSQL pods.</p>
<p><strong>Reported Affected Version/s</strong>: 2.9.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: 2.10.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPXC-1737" target="_blank" rel="noopener noreferrer">K8SPXC-1737</a>: The PXC Operator crashes during reconciliation in CompareMySQLVersion when the cluster status lacks a MySQL version value. An empty version field causes a panic (&ldquo;Malformed version&rdquo;), preventing proper cluster reconciliation and replication setup.</p>
<p><strong>Reported Affected Version/s</strong>: 1.18.0, 1.19.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Create the cluster before configuring replication or manually patch the CR status to include the missing version value, for example:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-0" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-0">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">kubectl patch pxc  
</span></span><span class="line"><span class="cl"> --type=merge 
</span></span><span class="line"><span class="cl"> --subresource=status 
</span></span><span class="line"><span class="cl"> --patch '
</span></span><span class="line"><span class="cl">status:
</span></span><span class="line"><span class="cl"> pxc:
</span></span><span class="line"><span class="cl"> version: "8.0.42-33.1"'</span></span></code></pre>
</div>
</div>
</div>
<p><strong>Fixed/Planned Version/s:</strong> 1.20.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPXC-1843" target="_blank" rel="noopener noreferrer">K8SPXC-1843</a>: Backups can get stuck in a Running state if the Joiner/Garbd disconnects from the Donor (e.g., due to sst-idle-timeout). Even after the SST process fails and the donor leaves the cluster, the backup process (e.g., xbcloud put) continues indefinitely without timing out, preventing backup completion.</p>
<p><strong>Reported Affected Version/s</strong>: 1.19.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: 1.20.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPXC-1831" target="_blank" rel="noopener noreferrer">K8SPXC-1831</a>: When using mysqlAllocator=jemalloc on ARM images, the operator attempts to preload /usr/lib64/libjemalloc.so.1, but only libjemalloc.so.2 is available. This results in preload errors and prevents proper use of the jemalloc allocator.</p>
<p><strong>Reported Affected Version/s</strong>: 1.19.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 1.20.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPXC-1830" target="_blank" rel="noopener noreferrer">K8SPXC-1830</a>: ProxySQL monitoring fails in PMM when using caching_sha2_password, causing proxysql_exporter to fail authentication with errors like:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Error opening connection to ProxySQL:
</span></span><span class="line"><span class="cl">unexpected resp from server for caching_sha2_password, perform full authentication</span></span></code></pre>
</div>
</div>
</div>
<p>This occurs because ProxySQL does not support the required RSA-based full authentication, breaking PMM monitoring integration.<br>
<strong>Reported Affected Version/s</strong>: 1.19.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Use mysql_native_password<br>
<strong>Fixed/Planned Version/s</strong>: 1.20.0</p>
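<p>A sketch of the workaround for the monitoring account; the user name and password are placeholders, and on newer servers the mysql_native_password plugin may first need to be enabled:</p>
<pre><code class="language-sql">ALTER USER 'monitor'@'%' IDENTIFIED WITH mysql_native_password BY 'monitor_password';</code></pre>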
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPSMDB-1617" target="_blank" rel="noopener noreferrer">K8SPSMDB-1617</a>: Scheduled backups can be triggered even when the MongoDB cluster is not ready (e.g., in initializing state) and without the required safety flags. This leads to failed backup attempts and inconsistent backup behaviour.</p>
<p><strong>Reported Affected Version/s</strong>: 1.22.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: Not specified</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPSMDB-1524" target="_blank" rel="noopener noreferrer">K8SPSMDB-1524</a>: The PBM agent continuously triggers resync storage operations, causing backup processes to stall or remain in pending/unknown states. Logs show repeated resync commands being executed without completion, leading to unstable backup behaviour.</p>
<p><strong>Reported Affected Version/s</strong>: 1.21.1<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 1.22.0</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/K8SPG-939" target="_blank" rel="noopener noreferrer">K8SPG-939</a>: Patroni does not propagate labels defined in the PostgreSQL Operator CR, causing failures in environments with strict label policies. As a result, Kubernetes rejects resource creation (e.g., Services) due to missing mandatory labels, preventing cluster reconciliation.</p>
<p><strong>Reported Affected Version/s</strong>: 2.8.2<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 2.9.0</p>
<hr>
<h2 id="pbm-percona-backup-for-mongodb">PBM [Percona Backup for MongoDB]<a class="anchor-link" id="pbm-percona-backup-for-mongodb"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PBM-1683" target="_blank" rel="noopener noreferrer">PBM-1683</a>: The size_uncompressed_h field in pbm describe-backup reports incorrect (inflated) sizes for non-base incremental backups, showing significantly larger values than the actual data size and leading to misleading backup size reporting.</p>
<p><strong>Reported Affected Version/s</strong>: 2.10.0, 2.11.0, 2.12.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 2.14.0</p>
<hr>
<h2 id="psmdb-percona-server-for-mongodb">PSMDB [Percona Server for MongoDB]<a class="anchor-link" id="psmdb-percona-server-for-mongodb"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PSMDB-1915" target="_blank" rel="noopener noreferrer">PSMDB-1915</a>: Newer PSMDB packages fail to install or upgrade on RHEL 9.4 due to a dependency on OpenSSL 3.4, which is not available in that OS version. This breaks upgrades (e.g., from 6.0.25 to 6.0.27) and affects multiple major versions.</p>
<p><strong>Reported Affected Version/s</strong>: 6.0.27-21, 7.0.28-15, 8.0.17-6<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 6.0.27-21, 7.0.28-15, 8.0.17-6</p>
<hr>
<p><a href="https://perconadev.atlassian.net/browse/PSMDB-1998" target="_blank" rel="noopener noreferrer">PSMDB-1998</a>: LDAP authentication can hang indefinitely when the LDAP server is unreachable due to missing timeout handling. This leads to continuously accumulating connections, eventually exhausting file descriptors and causing service disruption or crashes.</p>
<p><strong>Reported Affected Version/s</strong>: 7.0.16-10, 7.0.30-16<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: No workaround available<br>
<strong>Fixed/Planned Version/s</strong>: 7.0.31-17, 8.0.20-8</p>
<hr>
<h2 id="percona-distribution-for-mysql-orchestrator">Percona Distribution for MySQL [Orchestrator]<a class="anchor-link" id="percona-distribution-for-mysql-orchestrator"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/DISTMYSQL-584" target="_blank" rel="noopener noreferrer">DISTMYSQL-584</a>: Orchestrator loses SSL-related settings such as SOURCE_SSL_CA and SOURCE_SSL_VERIFY_SERVER_CERT during failover when issuing CHANGE REPLICATION SOURCE, causing replication to run without required security configurations and potentially violating compliance requirements.</p>
<p><strong>Reported Affected Version/s</strong>: 8.4.7<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: Not specified</p>
<hr>
<h2 id="pcsm-percona-clustersync-for-mongodb">PCSM [Percona ClusterSync for MongoDB]<a class="anchor-link" id="pcsm-percona-clustersync-for-mongodb"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PCSM-294" target="_blank" rel="noopener noreferrer">PCSM-294</a>: PCSM replication can crash during change replication due to flawed conflict detection and unbatched pipeline generation. This results in oversized aggregation pipelines, memory exhaustion, or invalid $slice operations, causing replication to fail with errors such as stage limit exceeded, buffer limits, or invalid arguments.</p>
<p><strong>Reported Affected Version/s</strong>: 0.7.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: 0.8.0</p>
<hr>
<h2 id="pg_tde-percona-transparent-data-encryption-for-postgresql">PG_TDE [Percona Transparent Data Encryption for PostgreSQL]<a class="anchor-link" id="pg_tde-percona-transparent-data-encryption-for-postgresql"></a></h2>
<p><a href="https://perconadev.atlassian.net/browse/PG-2125" target="_blank" rel="noopener noreferrer">PG-2125</a>: pg_tde fails to create/register symmetric keys when using HashiCorp KMIP, returning errors from the KMIP server during key registration. This prevents key setup and blocks encryption workflows for users relying on KMIP integration.</p>
<p><strong>Reported Affected Version/s</strong>: pg_tde 2.1.0<br>
<strong>Upstream Bug</strong>: Not applicable<br>
<strong>Workaround/Fix</strong>: Not specified<br>
<strong>Fixed/Planned Version/s</strong>: pg_tde NEXT</p>
<hr>
<h2 id="summary">Summary<a class="anchor-link" id="summary"></a></h2>
<p>We welcome community input and feedback on all our products. If you find a bug or would like to suggest an improvement or a feature, learn how in our post, <a href="https://www.percona.com/blog/2019/06/12/report-bugs-improvements-new-feature-requests-for-percona-products/" target="_blank" rel="noopener noreferrer">How to Report Bugs, Improvements, New Feature Requests for Percona Products</a>.</p>
<p>For the most up-to-date information, be sure to follow us on <a href="https://twitter.com/percona" target="_blank" rel="noopener noreferrer">Twitter</a>, <a href="https://www.linkedin.com/company/percona" target="_blank" rel="noopener noreferrer">LinkedIn</a>, and <a href="https://www.facebook.com/Percona?fref=ts" target="_blank" rel="noopener noreferrer">Facebook</a>.</p>
<p>Quick References:</p>
<p><a href="https://jira.percona.com" target="_blank" rel="noopener noreferrer">Percona JIRA</a></p>
<p><a href="https://bugs.mysql.com/" target="_blank" rel="noopener noreferrer">MySQL Bug Report</a></p>
<p><a href="https://www.percona.com/blog/2019/06/12/report-bugs-improvements-new-feature-requests-for-percona-products/" target="_blank" rel="noopener noreferrer">Report a Bug in a Percona Product</a></p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/04/03/percona-bug-report-march-2026/">Percona Bug Report: March 2026</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>We ran an internal AI demo competition: Here are the winners!</title>
      <link>https://mariadb.com/resources/blog/we-ran-an-internal-ai-demo-competition-here-are-the-winners/</link>
      <pubDate>Thu, 02 Apr 2026 15:53:37 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>As developers, we are skeptical of “AI marketing”. We want to see it run. We want to see the actual […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/we-ran-an-internal-ai-demo-competition-here-are-the-winners/">We ran an internal AI demo competition: Here are the winners!</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>As developers, we are skeptical of &ldquo;AI marketing&rdquo;. We want to see it run. We want to see the actual techniques and features in action. So, at MariaDB, we decided to organize an AI Demo competition inviting coders (and also non-coders!) to solve actual problems using our cloud and AI capabilities. The results are really cool! We saw applications handling geospatial data&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/we-ran-an-internal-ai-demo-competition-here-are-the-winners/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/we-ran-an-internal-ai-demo-competition-here-are-the-winners/">We ran an internal AI demo competition: Here are the winners!</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Vector: How it works. Part III</title>
      <link>https://mariadb.org/mariadb-vector-how-it-works-part-iii/</link>
      <pubDate>Thu, 02 Apr 2026 06:46:48 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>In the previous parts of this series we’ve seen how MariaDB stores vector indexes in a table and how to implement HNSW for good performance. …<br />
Continue reading “MariaDB Vector: How it works. Part III”<br />
The post MariaDB Vector: How it works. Part III appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-iii/">MariaDB Vector: How it works. Part III</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In the <a href="https://mariadb.org/mariadb-vector-how-it-works/" target="_blank" rel="noreferrer noopener">previous</a> <a href="https://mariadb.org/mariadb-vector-how-it-works-part-ii/" target="_blank" rel="noreferrer noopener">parts</a> of this series we&rsquo;ve seen how MariaDB stores vector indexes in a table and how to implement HNSW for good performance. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-vector-how-it-works-part-iii/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Vector: How it works. Part III&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-iii/">MariaDB Vector: How it works. Part III</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-iii/">MariaDB Vector: How it works. Part III</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>InnoDB Buffer Pool Tuning: From Rule-of-Thumb to Real Signals</title>
      <link>https://percona.community/blog/2026/04/02/innodb-buffer-pool-tuning-from-rule-of-thumb-to-real-signals/</link>
      <pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>Introduction Many MySQL setups begin life with a familiar incantation:</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/04/02/innodb-buffer-pool-tuning-from-rule-of-thumb-to-real-signals/">InnoDB Buffer Pool Tuning: From Rule-of-Thumb to Real Signals</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<h2 id="introduction">Introduction<a class="anchor-link" id="introduction"></a></h2>
<p>Many MySQL setups begin life with a familiar incantation:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-0" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-0">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">innodb_buffer_pool_size = 70% of RAM</span></span></code></pre>
</div>
</div>
</div>
<p>&hellip;and then nothing changes.</p>
<p>That&rsquo;s not tuning. That&rsquo;s a starting guess.</p>
<p>Real tuning starts when the workload pushes back.</p>
<hr>
<h2 id="visual-overview">Visual Overview<a class="anchor-link" id="visual-overview"></a></h2>
<p><figure><img decoding="async" src="https://percona.community/blog/2026/04/innodb_buffer_pool_diagram.png" alt="InnoDB Buffer Pool Diagram"></figure>
</p>
<hr>
<p>The InnoDB buffer pool is where database performance is quietly decided. It determines whether your workload hums along in memory or drags itself across disk. If you&rsquo;re not actively observing and tuning it, you&rsquo;re leaving performance on the table.</p>
<p>This guide walks through how to monitor, understand, and tune the buffer pool using real signals instead of guesswork.</p>
<hr>
<h2 id="what-the-buffer-pool-really-is">What the Buffer Pool Really Is<a class="anchor-link" id="what-the-buffer-pool-really-is"></a></h2>
<p>The buffer pool isn&rsquo;t just &ldquo;memory for MySQL.&rdquo; It&rsquo;s a living system under constant pressure:</p>
<ul>
<li>A cache of data and indexes</li>
<li>A write staging area (dirty pages)</li>
<li>A contention zone between reads, writes, and eviction</li>
</ul>
<p>Think of it as your database&rsquo;s working memory. If your working set fits, queries glide. If it doesn&rsquo;t, pages are constantly evicted and reloaded, introducing latency that rarely announces itself clearly.</p>
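<p>A rough first check is to compare total InnoDB data plus index size against the configured pool. The sketch below over-states the true working set, but if even this total fits comfortably, eviction pressure is unlikely to be your problem:</p>
<pre><code class="language-sql">SELECT
  ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS innodb_data_gb,
  ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 2)       AS buffer_pool_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';</code></pre>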
<hr>
<h2 id="a-simple-mental-model">A Simple Mental Model<a class="anchor-link" id="a-simple-mental-model"></a></h2>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl"> +---------------------------+
</span></span><span class="line"><span class="cl"> | Buffer Pool |
</span></span><span class="line"><span class="cl"> |---------------------------|
</span></span><span class="line"><span class="cl">Reads ---&gt; | Cached Pages |
</span></span><span class="line"><span class="cl"> | |
</span></span><span class="line"><span class="cl">Writes ---&gt; | Dirty Pages (pending IO) |
</span></span><span class="line"><span class="cl"> | |
</span></span><span class="line"><span class="cl">Eviction -&gt; | LRU / Free List |
</span></span><span class="line"><span class="cl"> +---------------------------+
</span></span><span class="line"><span class="cl"> |
</span></span><span class="line"><span class="cl"> v
</span></span><span class="line"><span class="cl"> Disk (slow)</span></span></code></pre>
</div>
</div>
</div>
<p>Three forces are always competing:</p>
<ul>
<li>Reads want hot data in memory</li>
<li>Writes generate dirty pages</li>
<li>Eviction makes room under pressure</li>
</ul>
<p>Your job is to keep this system balanced.</p>
<hr>
<h2 id="how-to-monitor-the-buffer-pool">How to Monitor the Buffer Pool<a class="anchor-link" id="how-to-monitor-the-buffer-pool"></a></h2>
<h3 id="option-1-quick-snapshot">Option 1: Quick Snapshot<a class="anchor-link" id="option-1-quick-snapshot"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-2" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-2">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="n">ENGINE</span><span class="w"> </span><span class="n">INNODB</span><span class="w"> </span><span class="n">STATUS</span><span class="err"></span><span class="k">G</span></span></span></code></pre>
</div>
</div>
</div>
<p>Useful for human inspection. Look for:</p>
<ul>
<li>Buffer pool size</li>
<li>Free buffers</li>
<li>Database pages</li>
<li>Modified (dirty) pages</li>
<li>Page read/write rates</li>
</ul>
<p>Great for debugging. Not ideal for automation.</p>
<hr>
<h3 id="option-2-structured-metrics-recommended">Option 2: Structured Metrics (Recommended)<a class="anchor-link" id="option-2-structured-metrics-recommended"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-3" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-3">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">pool_id</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">free_buffers</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">database_pages</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">modified_database_pages</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">INNODB_BUFFER_POOL_STATS</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p><strong>Key fields:</strong></p>
<ul>
<li><code>free_buffers</code> &rarr; Available pages (breathing room)</li>
<li><code>database_pages</code> &rarr; Pages holding data</li>
<li><code>modified_database_pages</code> &rarr; Dirty pages waiting to flush</li>
</ul>
<p>Great for automation.</p>
<hr>
<h2 id="the-5-signals-that-actually-matter">The 5 Signals That Actually Matter<a class="anchor-link" id="the-5-signals-that-actually-matter"></a></h2>
<h3 id="1-buffer-pool-hit-ratio-handle-with-care">1. Buffer Pool Hit Ratio (Handle With Care)<a class="anchor-link" id="1-buffer-pool-hit-ratio-handle-with-care"></a></h3>
<p>Yes, it&rsquo;s widely used. No, it&rsquo;s not enough.</p>
<p>A high hit ratio does not mean your system is healthy. It does not capture:</p>
<ul>
<li>Page churn</li>
<li>Eviction pressure</li>
<li>Access patterns</li>
</ul>
<p>You can have a 99% hit ratio and still be IO-bound.</p>
<p>Use it as a sanity check, not a decision-maker.</p>
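<p>For reference, the ratio itself comes from two counters: logical read requests versus reads that had to go to disk. A minimal sketch, shown here against <code>information_schema.GLOBAL_STATUS</code> (MariaDB; on MySQL 8.0 the same counters live in <code>performance_schema.global_status</code>):</p>
<pre><code class="language-sql">-- Classic hit ratio: 1 - (disk reads / logical read requests)
-- Counters are cumulative since startup, so prefer deltas for a current picture
SELECT (1 - disk_reads.v / requests.v) * 100 AS hit_ratio_pct
FROM
  (SELECT VARIABLE_VALUE AS v FROM information_schema.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') AS disk_reads,
  (SELECT VARIABLE_VALUE AS v FROM information_schema.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests') AS requests;
</code></pre>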
<hr>
<h3 id="2-free-buffers">2. Free Buffers<a class="anchor-link" id="2-free-buffers"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-4" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-4">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">free_buffers</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">free_buffers</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">INNODB_BUFFER_POOL_STATS</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p><strong>Interpretation:</strong></p>
<ul>
<li>Near zero during steady load &rarr; normal</li>
<li>Near zero + rising disk reads &rarr; pressure</li>
<li>Near zero while mostly idle &rarr; suspicious (possible misread or config issue)</li>
</ul>
<hr>
<h3 id="3-dirty-page-percentage">3. Dirty Page Percentage<a class="anchor-link" id="3-dirty-page-percentage"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-5" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-5">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="p">(</span><span class="k">SUM</span><span class="p">(</span><span class="n">modified_database_pages</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">database_pages</span><span class="p">))</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="p">.</span><span class="mi">0</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">dirty_pct</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">INNODB_BUFFER_POOL_STATS</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p><strong>Interpretation (context matters):</strong></p>
<ul>
<li>0&ndash;5% &rarr; Very clean</li>
<li>5&ndash;20% &rarr; Typical</li>
<li>20&ndash;30%+ &rarr; Potential flushing lag</li>
</ul>
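<p>To put the number in context, it can help to read it next to the configured threshold, since flushing only becomes aggressive as that limit is approached. A minimal sketch:</p>
<pre><code class="language-sql">-- Current dirty percentage alongside the configured aggressive-flushing threshold
SELECT
  (SUM(modified_database_pages) / SUM(database_pages)) * 100 AS dirty_pct,
  @@innodb_max_dirty_pages_pct AS configured_limit
FROM information_schema.INNODB_BUFFER_POOL_STATS;
</code></pre>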
<hr>
<h3 id="4-disk-read-pressure-critical-signal">4. Disk Read Pressure (Critical Signal)<a class="anchor-link" id="4-disk-read-pressure-critical-signal"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-6" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-6">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Innodb_buffer_pool_reads'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- Take two samples 60s apart and compare</span></span></span></code></pre>
</div>
</div>
</div>
<p>Track the rate of change (reads/sec), not the absolute value.</p>
<p><strong>Interpretation:</strong></p>
<ul>
<li>Rising reads &rarr; Working set does not fit in memory</li>
<li>Flat reads &rarr; Memory is absorbing the workload</li>
</ul>
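<p>One way to capture that rate from a single client session is to sample the counter twice and divide by the interval. A minimal sketch, again assuming <code>information_schema.GLOBAL_STATUS</code> (MariaDB; substitute <code>performance_schema.global_status</code> on MySQL 8.0):</p>
<pre><code class="language-sql">-- Sample Innodb_buffer_pool_reads twice, 60 seconds apart, and report reads/sec
SET @r1 = (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
            WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads');
DO SLEEP(60);
SET @r2 = (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
            WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads');
SELECT (@r2 - @r1) / 60 AS buffer_pool_reads_per_sec;
</code></pre>
<p>In practice a monitoring agent does this sampling for you; the point is to alert on the derivative, not the raw counter.</p>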
<hr>
<h3 id="5-read-ahead--eviction-pressure">5. Read Ahead / Eviction Pressure<a class="anchor-link" id="5-read-ahead-eviction-pressure"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-7" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-7">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Innodb_buffer_pool_read_ahead%'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Innodb_buffer_pool_pages_evicted'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Innodb_buffer_pool_reads'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p><strong>Interpretation:</strong></p>
<ul>
<li>Efficient read-ahead:
<ul>
<li>read_ahead increases</li>
<li>read_ahead_evicted remains low</li>
</ul>
</li>
<li>Inefficient read-ahead (wasted IO):
<ul>
<li>High read_ahead_evicted / read_ahead</li>
<li>Indicates access patterns defeating prefetching</li>
</ul>
</li>
<li>Buffer pool churn:
<ul>
<li>pages_evicted rising</li>
<li>buffer_pool_reads rising</li>
<li>Indicates pages are evicted and re-read from disk</li>
</ul>
</li>
<li>Healthy vs unhealthy eviction:
<ul>
<li>High evictions + stable reads &rarr; normal turnover</li>
<li>High evictions + rising reads &rarr; memory pressure</li>
</ul>
</li>
</ul>
<p>Focus on rates of change over time, not absolute values.</p>
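<p>As a rough helper for the read-ahead side, the two counters can be folded into a single waste percentage (illustrative only; it relies on the same GLOBAL_STATUS assumption as above):</p>
<pre><code class="language-sql">-- Share of read-ahead pages evicted before ever being used (wasted prefetch IO)
SELECT
  MAX(IF(VARIABLE_NAME = 'Innodb_buffer_pool_read_ahead_evicted', VARIABLE_VALUE, NULL)) /
  MAX(IF(VARIABLE_NAME = 'Innodb_buffer_pool_read_ahead', VARIABLE_VALUE, NULL)) * 100
    AS wasted_read_ahead_pct
FROM information_schema.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('Innodb_buffer_pool_read_ahead',
                        'Innodb_buffer_pool_read_ahead_evicted');
</code></pre>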
<hr>
<h2 id="detecting-thrashing">Detecting Thrashing<a class="anchor-link" id="detecting-thrashing"></a></h2>
<p>Thrashing is when the buffer pool constantly evicts and reloads pages.</p>
<h3 id="classic-symptoms">Classic Symptoms<a class="anchor-link" id="classic-symptoms"></a></h3>
<ul>
<li>Low or zero free buffers</li>
<li>Increasing disk reads</li>
<li>Stable (but misleading) hit ratio</li>
<li>Spiky query latency</li>
</ul>
<h3 id="visualizing-thrash">Visualizing Thrash<a class="anchor-link" id="visualizing-thrash"></a></h3>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-8" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-8">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Time ---&gt;
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Memory: [FULL][FULL][FULL][FULL]
</span></span><span class="line"><span class="cl">Reads: &uarr; &uarr;&uarr; &uarr;&uarr;&uarr; &uarr;&uarr;&uarr;&uarr;
</span></span><span class="line"><span class="cl">Latency: - ^ ^^ ^^^
</span></span><span class="line"><span class="cl">Evictions: &uarr; &uarr;&uarr; &uarr;&uarr;&uarr; &uarr;&uarr;&uarr;&uarr;</span></span></code></pre>
</div>
</div>
</div>
<p>If you see this pattern, your working set does not fit in memory.</p>
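<p>A single snapshot that puts the main pressure signals side by side can make the pattern easier to spot (a sketch under the same GLOBAL_STATUS assumption; <code>Innodb_buffer_pool_wait_free</code> counts requests that had to wait for a free page):</p>
<pre><code class="language-sql">-- Free buffers, cumulative disk reads, and waits for a free page in one row
SELECT
  (SELECT SUM(free_buffers)
     FROM information_schema.INNODB_BUFFER_POOL_STATS) AS free_buffers,
  (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') AS disk_reads,
  (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
    WHERE VARIABLE_NAME = 'Innodb_buffer_pool_wait_free') AS waits_for_free_page;
</code></pre>
<p>Track the last two as rates; both climbing while free buffers stay pinned near zero is the thrash signature above.</p>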
<hr>
<h2 id="tuning-the-buffer-pool">Tuning the Buffer Pool<a class="anchor-link" id="tuning-the-buffer-pool"></a></h2>
<h3 id="step-1-size-it-intentionally">Step 1: Size It Intentionally<a class="anchor-link" id="step-1-size-it-intentionally"></a></h3>
<p>Instead of blindly assigning 70% of RAM:</p>
<ul>
<li>Observe working set behavior</li>
<li>Monitor free buffers and reads</li>
<li>Increase gradually</li>
</ul>
<p>Avoid starving the OS or filesystem cache.</p>
<hr>
<h3 id="step-2-tune-flushing-behavior">Step 2: Tune Flushing Behavior<a class="anchor-link" id="step-2-tune-flushing-behavior"></a></h3>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-9" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-9">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">innodb_max_dirty_pages_pct = 75
</span></span><span class="line"><span class="cl">innodb_io_capacity = 1000
</span></span><span class="line"><span class="cl">innodb_io_capacity_max = 2000</span></span></code></pre>
</div>
</div>
</div>
<ul>
<li>Sustained IO spikes &rarr; increase innodb_io_capacity</li>
<li>Dirty pages climbing &rarr; flushing lag</li>
<li>Sudden stalls &rarr; checkpoint pressure</li>
</ul>
<p><strong>What they control:</strong></p>
<ul>
<li><code>innodb_io_capacity</code> &rarr; Expected steady-state IO throughput</li>
<li><code>innodb_io_capacity_max</code> &rarr; Burst flushing capacity</li>
<li><code>innodb_max_dirty_pages_pct</code> &rarr; Threshold for aggressive flushing</li>
</ul>
<p>&#9888;&#65039; These values should reflect real hardware capability.</p>
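<p>All three settings are dynamic, so they can be adjusted online and persisted in the configuration file afterwards. A minimal sketch (the numbers are placeholders, not recommendations):</p>
<pre><code class="language-sql">-- Adjust flushing behaviour at runtime, then verify the active values
SET GLOBAL innodb_io_capacity         = 1000;
SET GLOBAL innodb_io_capacity_max     = 2000;
SET GLOBAL innodb_max_dirty_pages_pct = 75;
SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';
SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct';
</code></pre>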
<hr>
<h3 id="step-3-buffer-pool-instancesreduce-contention">Step 3: Buffer Pool Instances:Reduce Contention<a class="anchor-link" id="step-3-buffer-pool-instancesreduce-contention"></a></h3>
<p>The buffer pool can be split into multiple instances, each managing its own internal structures. This helps reduce contention under high concurrency.</p>
<p>Without multiple instances, all threads compete for the same buffer pool internals. With them, that load is distributed.</p>
<p>A practical, battle-tested guideline: use one instance per ~1GB of buffer pool, up to a reasonable limit.</p>
<hr>
<h3 id="when-it-matters">When It Matters<a class="anchor-link" id="when-it-matters"></a></h3>
<p>Buffer pool instances only help when contention exists. You&rsquo;ll see benefits if your system has:</p>
<ul>
<li>High concurrency (many active threads)</li>
<li>CPU-bound workloads</li>
<li>Mutex contention in InnoDB</li>
</ul>
<p>If your workload is primarily IO-bound, this setting will have little impact.</p>
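<p>One hedged way to check whether that contention actually exists is the Performance Schema wait summaries, assuming <code>performance_schema</code> and the <code>wait/synch</code> instruments are enabled (they are not on every version by default):</p>
<pre><code class="language-sql">-- Top InnoDB synchronization waits; persistent buffer-pool-related mutex waits
-- suggest that splitting the pool into instances may actually help
SELECT EVENT_NAME,
       COUNT_STAR,
       SUM_TIMER_WAIT / 1e12 AS wait_seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/synch/%innodb%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
</code></pre>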
<hr>
<h3 id="sizing-guidelines">Sizing Guidelines<a class="anchor-link" id="sizing-guidelines"></a></h3>
<p>General guidance:</p>
<ul>
<li>&lt; 1GB buffer pool &rarr; 1 instance</li>
<li>1GB&ndash;8GB &rarr; 2&ndash;4 instances</li>
<li>8GB&ndash;64GB &rarr; 4&ndash;8 instances</li>
<li>64GB+ &rarr; 8&ndash;16 instances</li>
</ul>
<hr>
<h3 id="keep-instances-large-enough">Keep Instances Large Enough<a class="anchor-link" id="keep-instances-large-enough"></a></h3>
<p>Each instance needs enough memory to function efficiently.</p>
<p>Avoid going below ~1GB per instance.</p>
<p>If instances are too small:</p>
<ul>
<li>LRU efficiency drops</li>
<li>Eviction becomes more aggressive</li>
<li>Cache locality suffers</li>
</ul>
<p><strong>Example:</strong></p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-10" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-10">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">innodb_buffer_pool_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">32</span><span class="k">G</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="n">innodb_buffer_pool_instances</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">8</span></span></span></code></pre>
</div>
</div>
</div>
<p>This gives ~4GB per instance, which is well-balanced.</p>
<hr>
<h3 id="common-mistakes">Common Mistakes<a class="anchor-link" id="common-mistakes"></a></h3>
<ul>
<li>Increasing instances without evidence of contention</li>
<li>Matching instance count to CPU cores</li>
<li>Using many instances with a small buffer pool</li>
<li>Expecting this to fix IO bottlenecks</li>
</ul>
<hr>
<h3 id="step-4-understand-resizing-behavior">Step 4: Understand Resizing Behavior<a class="anchor-link" id="step-4-understand-resizing-behavior"></a></h3>
<p>Buffer pool resizing is online in modern MySQL versions, but:</p>
<ul>
<li>It happens in chunks</li>
<li>Controlled by <code>innodb_buffer_pool_chunk_size</code></li>
</ul>
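<p>A minimal sketch of an online resize, assuming a version that supports it (MySQL 5.7+ / MariaDB 10.2+); the requested value is rounded up to a multiple of the chunk size times the instance count:</p>
<pre><code class="language-sql">-- Check the current size, request a larger one, then watch the resize progress
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;
SET GLOBAL innodb_buffer_pool_size = 24 * 1024 * 1024 * 1024;  -- example target: 24G
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';
</code></pre>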
<hr>
<h2 id="real-world-scenarios">Real-World Scenarios<a class="anchor-link" id="real-world-scenarios"></a></h2>
<h3 id="scenario-1-everything-looks-fine-but-its-slow">Scenario 1: &ldquo;Everything Looks Fine&hellip; But It&rsquo;s Slow&rdquo;<a class="anchor-link" id="scenario-1-everything-looks-fine-but-its-slow"></a></h3>
<ul>
<li>High hit ratio</li>
<li>Low free buffers</li>
<li>Rising disk reads</li>
</ul>
<p><strong>Cause:</strong> Working set barely fits</p>
<p><strong>Fix:</strong> Increase buffer pool size gradually</p>
<p>If increasing the buffer pool size does not reduce disk reads, the problem is not memory.</p>
<hr>
<h3 id="scenario-2-write-heavy-workload">Scenario 2: Write-Heavy Workload<a class="anchor-link" id="scenario-2-write-heavy-workload"></a></h3>
<ul>
<li>Dirty pages increasing</li>
<li>Periodic IO spikes</li>
</ul>
<p><strong>Cause:</strong> Flushing cannot keep up</p>
<p><strong>Fix:</strong></p>
<ul>
<li>Increase <code>innodb_io_capacity</code></li>
<li>Adjust dirty page thresholds</li>
</ul>
<hr>
<h3 id="scenario-3-sudden-latency-spikes">Scenario 3: Sudden Latency Spikes<a class="anchor-link" id="scenario-3-sudden-latency-spikes"></a></h3>
<ul>
<li>Sharp performance drops</li>
<li>Disk activity surges</li>
</ul>
<p><strong>Cause:</strong> Checkpoint pressure</p>
<p><strong>Fix:</strong></p>
<ul>
<li>Improve IO capacity tuning</li>
<li>Reduce dirty page buildup</li>
</ul>
<hr>
<h2 id="practical-monitoring-queries">Practical Monitoring Queries<a class="anchor-link" id="practical-monitoring-queries"></a></h2>
<h3 id="buffer-pool-usage-mb">Buffer Pool Usage (MB)<a class="anchor-link" id="buffer-pool-usage-mb"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-11" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-11">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="p">(</span><span class="k">SUM</span><span class="p">(</span><span class="n">database_pages</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">16</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">mb_used</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">INNODB_BUFFER_POOL_STATS</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p>Assumes default 16KB page size (innodb_page_size).</p>
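<p>If you run a non-default page size, a variant that reads <code>innodb_page_size</code> directly avoids the hard-coded 16:</p>
<pre><code class="language-sql">-- Buffer pool usage in MB, using the configured page size instead of assuming 16KB
SELECT
  SUM(database_pages) * @@innodb_page_size / 1024 / 1024 AS mb_used
FROM information_schema.INNODB_BUFFER_POOL_STATS;
</code></pre>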
<h3 id="dirty-page-percentage">Dirty Page Percentage<a class="anchor-link" id="dirty-page-percentage"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-12" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-12">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="p">(</span><span class="n">modified_database_pages</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">database_pages</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">dirty_pct</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">INNODB_BUFFER_POOL_STATS</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="free-buffer-check">Free Buffer Check<a class="anchor-link" id="free-buffer-check"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-13" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-13">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">SUM</span><span class="p">(</span><span class="n">free_buffers</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">free_buffers</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">information_schema</span><span class="p">.</span><span class="n">INNODB_BUFFER_POOL_STATS</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<hr>
<h2 id="common-mistakes-1">Common Mistakes<a class="anchor-link" id="common-mistakes"></a></h2>
<ul>
<li>Treating 70% as a rule instead of a starting point</li>
<li>Blindly trusting hit ratio</li>
<li>Ignoring disk read trends</li>
<li>Oversizing and starving the OS</li>
<li>Not tuning IO capacity</li>
<li>Leaving defaults in write-heavy systems</li>
</ul>
<hr>
<h2 id="quick-checklist">Quick Checklist<a class="anchor-link" id="quick-checklist"></a></h2>
<p>If you remember nothing else:</p>
<ul>
<li>Reads increasing? &rarr; working set too big</li>
<li>Free buffers always ~0? &rarr; pressure</li>
<li>Dirty pages high? &rarr; flushing lag</li>
<li>Latency spiking? &rarr; checkpoint or IO saturation</li>
</ul>
<hr>
<h2 id="final-thoughts">Final Thoughts<a class="anchor-link" id="final-thoughts"></a></h2>
<p>The InnoDB buffer pool doesn&rsquo;t fail loudly. It degrades quietly until your disk becomes the bottleneck.</p>
<p>By the time you notice, you&rsquo;re debugging latency instead of preventing it.</p>
<p>Monitor the right signals, and you&rsquo;ll see problems forming before users do.</p>
<p>That&rsquo;s the difference between reacting to performance&hellip; and controlling it.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/04/02/innodb-buffer-pool-tuning-from-rule-of-thumb-to-real-signals/">InnoDB Buffer Pool Tuning: From Rule-of-Thumb to Real Signals</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Java Connector 3.5.8 now available</title>
      <link>https://mariadb.com/resources/blog/mariadb-java-connector-3-5-8-now-available/</link>
      <pubDate>Wed, 01 Apr 2026 22:35:19 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>MariaDB is pleased to announce the immediate availability of the MariaDB Connector/J 3.5.8 release. Release Notes and Changelogs MariaDB Connector/J […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-java-connector-3-5-8-now-available/">MariaDB Java Connector 3.5.8 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>MariaDB is pleased to announce the immediate availability of the MariaDB Connector/J 3.5.8 release. See the release notes and changelog for more details, and visit mariadb.com/downloads/connectors/connectors-data-access/java8-connector/ to download.</p>
<p><a href="https://mariadb.com/resources/blog/mariadb-java-connector-3-5-8-now-available/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-java-connector-3-5-8-now-available/">MariaDB Java Connector 3.5.8 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Benchmarking MyRocks vs. InnoDB in Memory-Constrained Environments</title>
      <link>https://www.percona.com/blog/benchmarking-myrocks-vs-innodb-in-memory-constrained-environments/</link>
      <pubDate>Wed, 01 Apr 2026 13:31:27 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.percona.com/blog/">Percona Database Performance Blog</source>
      <description><![CDATA[<p>Benchmarking MyRocks vs. InnoDB in Memory-Constrained Environments It is a well-known fact in the database world that InnoDB is incredibly fast when the entire database fits into memory. But what happens when your data grows beyond your available RAM? MyRocks, built on RocksDB, is frequently recommended as a superior choice for environments constrained by memory, […]</p>
<p>The post <a rel="nofollow" href="https://www.percona.com/blog/benchmarking-myrocks-vs-innodb-in-memory-constrained-environments/">Benchmarking MyRocks vs. InnoDB in Memory-Constrained Environments</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Benchmarking MyRocks vs. InnoDB in Memory-Constrained Environments It is a well-known fact in the database world that InnoDB is incredibly fast when the entire database fits into memory. But what happens when your data grows beyond your available RAM? MyRocks, built on RocksDB, is frequently recommended as a superior choice for environments constrained by memory, [&hellip;]</p>

<p>The post <a rel="nofollow" href="https://www.percona.com/blog/benchmarking-myrocks-vs-innodb-in-memory-constrained-environments/">Benchmarking MyRocks vs. InnoDB in Memory-Constrained Environments</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What’s New in MariaDB AI RAG 1.1: Ingestion, Reranking, and Docker Deployment</title>
      <link>https://mariadb.com/resources/blog/whats-new-in-mariadb-ai-rag-1-1-ingestion-reranking-and-docker-deployment/</link>
      <pubDate>Tue, 31 Mar 2026 17:38:01 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Moving a RAG (Retrieval-Augmented Generation) application from a local prototype to a production-grade system requires solving for messy data ingestion, […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/whats-new-in-mariadb-ai-rag-1-1-ingestion-reranking-and-docker-deployment/">What’s New in MariaDB AI RAG 1.1: Ingestion, Reranking, and Docker Deployment</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Moving a RAG (Retrieval-Augmented Generation) application from a local prototype to a production-grade system requires solving for messy data ingestion, retrieval precision, and deployment complexity. MariaDB AI RAG 1.1 addresses these bottlenecks with improvements to the ingestion pipeline, more granular retrieval controls, and a containerized deployment model. MariaDB AI RAG is currently in&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/whats-new-in-mariadb-ai-rag-1-1-ingestion-reranking-and-docker-deployment/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/whats-new-in-mariadb-ai-rag-1-1-ingestion-reranking-and-docker-deployment/">What’s New in MariaDB AI RAG 1.1: Ingestion, Reranking, and Docker Deployment</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Know a MariaDB champion? Submit a nomination</title>
      <link>https://mariadb.org/know-a-mariadb-champion-submit-a-nomination/</link>
      <pubDate>Mon, 30 Mar 2026 14:01:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>One of the things I really like about open source is that a project is never only about the software.<br />
Yes, code is important. Very important. …<br />
Continue reading \"Know a MariaDB champion? Submit a nomination\"<br />
The post Know a MariaDB champion? Submit a nomination appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/know-a-mariadb-champion-submit-a-nomination/">Know a MariaDB champion? Submit a nomination</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>One of the things I really like about open source is that a project is never only about the software.<br>
Yes, code is important. Very important. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/know-a-mariadb-champion-submit-a-nomination/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Know a MariaDB champion? Submit a nomination&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/know-a-mariadb-champion-submit-a-nomination/">Know a MariaDB champion? Submit a nomination</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/know-a-mariadb-champion-submit-a-nomination/">Know a MariaDB champion? Submit a nomination</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Contributions As a Cost-saver</title>
      <link>https://mariadb.org/contributions-as-a-cost-saver/</link>
      <pubDate>Mon, 30 Mar 2026 12:18:04 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>The economics of open source contribution development. And some questions.<br />
The post Contributions As a Cost-saver appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/contributions-as-a-cost-saver/">Contributions As a Cost-saver</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The economics of open source contribution development. And some questions.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/contributions-as-a-cost-saver/">Contributions As a Cost-saver</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/contributions-as-a-cost-saver/">Contributions As a Cost-saver</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>2026 – MySQL Ecosystem Performance Benchmark Report</title>
      <link>https://www.percona.com/blog/2026-mysql-ecosystem-performance-benchmark-report/</link>
      <pubDate>Thu, 26 Mar 2026 10:04:22 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.percona.com/blog/">Percona Database Performance Blog</source>
      <description><![CDATA[<p>MySQL Ecosystem Performance Benchmark Report 2026: Comparative Analysis of InnoDB-Compatible Engines, by Percona Lab. Repository: github.com/Percona-Lab-results/2026-interactive-metrics. Interactive graphs are available; click any graph in the post to explore the full dataset dynamically. […]</p>
<p>The post <a rel="nofollow" href="https://www.percona.com/blog/2026-mysql-ecosystem-performance-benchmark-report/">2026 – MySQL Ecosystem Performance Benchmark Report</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>MySQL Ecosystem Performance Benchmark Report 2026: Comparative Analysis of InnoDB-Compatible Engines, by Percona Lab. Repository: github.com/Percona-Lab-results/2026-interactive-metrics. Interactive graphs are available; click any graph in the post to explore the full dataset dynamically. [&hellip;]</p>

<p>The post <a rel="nofollow" href="https://www.percona.com/blog/2026-mysql-ecosystem-performance-benchmark-report/">2026 – MySQL Ecosystem Performance Benchmark Report</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB observability – results from the poll: the community has clearly chosen its default stack</title>
      <link>https://mariadb.org/mariadb-observability-results-from-the-poll-the-community-has-clearly-chosen-its-default-stack/</link>
      <pubDate>Thu, 26 Mar 2026 08:35:54 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Before I share my takeaway from this MariaDB observability poll, I would like to thank all participants and highlight that these recent polls are very popular, and your participation makes us happy. …<br />
Continue reading \"MariaDB observability – results from the poll: the community has clearly chosen its default stack\"<br />
The post MariaDB observability – results from the poll: the community has clearly chosen its default stack appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-observability-results-from-the-poll-the-community-has-clearly-chosen-its-default-stack/">MariaDB observability – results from the poll: the community has clearly chosen its default stack</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Before I share my takeaway from this MariaDB observability poll, I would like to thank all participants and highlight that these recent polls are very popular, and your participation makes us happy. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-observability-results-from-the-poll-the-community-has-clearly-chosen-its-default-stack/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB observability &ndash; results from the poll: the community has clearly chosen its default stack&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-observability-results-from-the-poll-the-community-has-clearly-chosen-its-default-stack/">MariaDB observability &ndash; results from the poll: the community has clearly chosen its default stack</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-observability-results-from-the-poll-the-community-has-clearly-chosen-its-default-stack/">MariaDB observability – results from the poll: the community has clearly chosen its default stack</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Vector: How it works. Part II</title>
      <link>https://mariadb.org/mariadb-vector-how-it-works-part-ii/</link>
      <pubDate>Thu, 26 Mar 2026 07:34:29 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>In the first post of this series, I’ve described how the vector index is stored in a table and how it achieves full transactional behavior and ACID properties compatible with the storage engine of the table the user created. …<br />
Continue reading \"MariaDB Vector: How it works. Part II\"<br />
The post MariaDB Vector: How it works. Part II appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-ii/">MariaDB Vector: How it works. Part II</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In the first post of this series, I&rsquo;ve described how the vector index is stored in a table and how it achieves full transactional behavior and ACID properties compatible with the storage engine of the table the user created. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-vector-how-it-works-part-ii/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Vector: How it works. Part II&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-ii/">MariaDB Vector: How it works. Part II</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works-part-ii/">MariaDB Vector: How it works. Part II</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Datography Joins MariaDB Foundation as Silver Sponsor</title>
      <link>https://mariadb.org/datography-joins-mariadb-foundation-as-silver-sponsor/</link>
      <pubDate>Wed, 25 Mar 2026 14:42:02 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We are pleased to welcome Datography as a Silver Sponsor of the MariaDB Foundation.<br />
Datography focuses on helping organizations understand, map, and manage complex data environments. …<br />
Continue reading \"Datography Joins MariaDB Foundation as Silver Sponsor\"<br />
The post Datography Joins MariaDB Foundation as Silver Sponsor appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/datography-joins-mariadb-foundation-as-silver-sponsor/">Datography Joins MariaDB Foundation as Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We are pleased to welcome <a href="https://www.datography.net/">Datography</a> as a Silver Sponsor of the MariaDB Foundation.<br>
Datography focuses on helping organizations understand, map, and manage complex data environments. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/datography-joins-mariadb-foundation-as-silver-sponsor/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Datography Joins MariaDB Foundation as Silver Sponsor&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/datography-joins-mariadb-foundation-as-silver-sponsor/">Datography Joins MariaDB Foundation as Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/datography-joins-mariadb-foundation-as-silver-sponsor/">Datography Joins MariaDB Foundation as Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Keeps Climbing: Community, Adoption, and Momentum</title>
      <link>https://mariadb.org/mariadb-keeps-climbing-community-adoption-and-momentum/</link>
      <pubDate>Wed, 25 Mar 2026 09:59:41 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>If you’ve been around the MariaDB community for a while, you can probably feel it already: things are moving in the right direction.<br />
And no, I’m not talking about one vanity metric, one lucky spike, or one noisy social post. …<br />
Continue reading \"MariaDB Keeps Climbing: Community, Adoption, and Momentum\"<br />
The post MariaDB Keeps Climbing: Community, Adoption, and Momentum appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-keeps-climbing-community-adoption-and-momentum/">MariaDB Keeps Climbing: Community, Adoption, and Momentum</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>If you&rsquo;ve been around the MariaDB community for a while, you can probably feel it already: things are moving in the right direction.<br>
And no, I&rsquo;m not talking about one vanity metric, one lucky spike, or one noisy social post. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-keeps-climbing-community-adoption-and-momentum/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Keeps Climbing: Community, Adoption, and Momentum&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-keeps-climbing-community-adoption-and-momentum/">MariaDB Keeps Climbing: Community, Adoption, and Momentum</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-keeps-climbing-community-adoption-and-momentum/">MariaDB Keeps Climbing: Community, Adoption, and Momentum</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The Agentic Era: Why Infrastructure is the New Innovation Frontier</title>
      <link>https://mariadb.com/resources/blog/the-agentic-era-why-infrastructure-is-the-new-innovation-frontier/</link>
      <pubDate>Tue, 24 Mar 2026 13:02:04 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Eighteen months ago, when I joined MariaDB, the industry was already feeling the first tremors of the AI boom. While […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/the-agentic-era-why-infrastructure-is-the-new-innovation-frontier/">The Agentic Era: Why Infrastructure is the New Innovation Frontier</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Eighteen months ago, when I joined MariaDB, the industry was already feeling the first tremors of the AI boom. While the world focused on the &ldquo;froth&rdquo; &ndash; the flashy consumer apps and chatbots &ndash; the enterprise conversation was stuck on adoption: Will people use it? Will they trust it? Today, we know the answer is a resounding &ldquo;yes.&rdquo; Employees and management alike are eager to leverage AI.</p>
<p><a href="https://mariadb.com/resources/blog/the-agentic-era-why-infrastructure-is-the-new-innovation-frontier/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/the-agentic-era-why-infrastructure-is-the-new-innovation-frontier/">The Agentic Era: Why Infrastructure is the New Innovation Frontier</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB 13.0 Preview Now Available</title>
      <link>https://mariadb.org/mariadb-13-0-preview-now-available/</link>
      <pubDate>Tue, 24 Mar 2026 12:49:52 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We are pleased to announce the availability of a preview of the MariaDB 13.0 series. MariaDB 13.0 is a preview rolling release, published on 23 March 2026, and it continues the work started in 12.3 while adding a solid set of entirely new features. …<br />
Continue reading \"MariaDB 13.0 Preview Now Available\"<br />
The post MariaDB 13.0 Preview Now Available appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-13-0-preview-now-available/">MariaDB 13.0 Preview Now Available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We are pleased to announce the availability of a preview of the <a href="https://mariadb.com/docs/release-notes/community-server/13.0/mariadb-13.0-changes-and-improvements">MariaDB 13.0</a> series. MariaDB 13.0 is a preview rolling release, published on 23 March 2026, and it continues the work started in 12.3 while adding a solid set of entirely new features. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-13-0-preview-now-available/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB 13.0 Preview Now Available&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-13-0-preview-now-available/">MariaDB 13.0 Preview Now Available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-13-0-preview-now-available/">MariaDB 13.0 Preview Now Available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>DBaasNow Joins MariaDB Foundation as Silver Sponsor</title>
      <link>https://mariadb.org/dbaasnow-joins-mariadb-foundation-as-silver-sponsor/</link>
      <pubDate>Tue, 24 Mar 2026 07:05:43 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We are pleased to welcome DBaasNow as a Silver Sponsor of the MariaDB Foundation.<br />
As the MariaDB ecosystem continues to expand across cloud, hybrid, and on-premise environments, the need for consistent, reliable, and scalable database operations has never been more important. …<br />
Continue reading \"DBaasNow Joins MariaDB Foundation as Silver Sponsor\"<br />
The post DBaasNow Joins MariaDB Foundation as Silver Sponsor appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/dbaasnow-joins-mariadb-foundation-as-silver-sponsor/">DBaasNow Joins MariaDB Foundation as Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We are pleased to welcome <a href="https://dbaasnow.com/welcome/">DBaasNow </a>as a Silver Sponsor of the MariaDB Foundation.<br>
As the MariaDB ecosystem continues to expand across cloud, hybrid, and on-premise environments, the need for consistent, reliable, and scalable database operations has never been more important. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/dbaasnow-joins-mariadb-foundation-as-silver-sponsor/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;DBaasNow Joins MariaDB Foundation as Silver Sponsor&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/dbaasnow-joins-mariadb-foundation-as-silver-sponsor/">DBaasNow Joins MariaDB Foundation as Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/dbaasnow-joins-mariadb-foundation-as-silver-sponsor/">DBaasNow Joins MariaDB Foundation as Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>A Guide to Multi-Region Disaster Recovery with MariaDB Cloud</title>
      <link>https://mariadb.com/resources/blog/a-guide-to-multi-region-disaster-recovery-with-mariadb-cloud/</link>
      <pubDate>Mon, 23 Mar 2026 21:54:58 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Recent global events remind us of a difficult truth: infrastructure failures rarely happen at convenient times. From geopolitical conflicts and […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/a-guide-to-multi-region-disaster-recovery-with-mariadb-cloud/">A Guide to Multi-Region Disaster Recovery with MariaDB Cloud</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Recent global events remind us of a difficult truth: infrastructure failures rarely happen at convenient times. From geopolitical conflicts and natural disasters to even large-scale cloud service disruptions, modern applications must assume that something will eventually fail somewhere. When your database is the system of record for customer transactions, financial operations&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/a-guide-to-multi-region-disaster-recovery-with-mariadb-cloud/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/a-guide-to-multi-region-disaster-recovery-with-mariadb-cloud/">A Guide to Multi-Region Disaster Recovery with MariaDB Cloud</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Python Connector 2.0.0rc2</title>
      <link>https://mariadb.com/resources/blog/mariadb-python-connector-2-0-0rc2/</link>
      <pubDate>Mon, 23 Mar 2026 18:31:40 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Today’s release candidate for MariaDB Connector/Python 2.0 marks a transition from the architecture used in version 1.1. Although the previous […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-python-connector-2-0-0rc2/">MariaDB Python Connector 2.0.0rc2</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Today&rsquo;s release candidate for MariaDB Connector/Python 2.0 marks a transition from the architecture used in version 1.1. Although the previous version was fast, it came with technical friction &ndash; specifically the requirement for a pre-installed C/C++ connector, a lack of async and type hint support, and CPython-only compatibility. In version 2.0, we&rsquo;ve rebuilt the connector to remove those&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/mariadb-python-connector-2-0-0rc2/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-python-connector-2-0-0rc2/">MariaDB Python Connector 2.0.0rc2</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB/MySQL Environment MyEnv 3.0.0 has been released</title>
      <link>https://www.fromdual.com/blog/myenv-release-notes/fromdual-environment-myenv-3.0.0-has-been-released/</link>
      <pubDate>Mon, 23 Mar 2026 16:42:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>FromDual has the pleasure to announce the release of the new version 3.0.0 of its popular MariaDB, MySQL and PostgreSQL multi-instance environment MyEnv.<br />
The new MyEnv can be downloaded here. How to install MyEnv is described in the MyEnv Installation Guide.<br />
In the inconceivable case that you find a bug in the MyEnv please report it to us by sending an email.<br />
Any feedback, statements and testimonials are welcome as well! Please send them to us.<br />
Upgrade from 2.x to 3.0<br />
Please check the MyEnv Installation Guide.<br />
Changes in MyEnv 3.0.0<br />
MyEnv</p>
<p>Template warning improved.<br />
Distro version in --version added.<br />
Check MyEnv configuration permissions.<br />
#fd increased for MyEnv.<br />
myenv.conf should have more secure permissions now.<br />
Situation caught when my.cnf is missing in myenv.conf.<br />
Directories home and run moved to dba and myenv.<br />
Unit file mariadb.service and mysql.service replaced by dba.service.<br />
User mysql replaced by dba in template.<br />
start_stop fixed warning in case argv[1] is missing.<br />
sys_uid filter fixed for Rocky Linux.<br />
dba unit file added to package.<br />
Nagios plugins detection removed from showMyEnvVersion.<br />
Old SysV init files removed and replaced by Systemd unit files.<br />
dba user was introduced and check for system user added.<br />
User changed to dba and cosmetic fixes.</p>
<p>MyEnv Installer</p>
<p>2 concurrent installMyEnv versions cannot run any more.<br />
Directories binlog, cgroups and angel removed from postgresql type installation.<br />
Wrapper script installMyEnv.sh removed.<br />
Cosmetics fixed in installer.<br />
Installation made more mysql friendly.<br />
libaio1t64 considered during installation recommendations on DEB systems.<br />
Next free port suggestion during installMyEnv improved. It will suggest the first free port now.<br />
apt-get and yum replaced by apt and dnf.</p>
<p>MyEnv Utilities</p>
<p>insert_test.sh made PostgreSQL ready.</p>
<p>PostgreSQL</p>
<p>PostgreSQL instance is stopped with fast instead of immediate now.<br />
show_create_table.sh for PostgreSQL made nicer.<br />
PostgreSQL status.sql added.<br />
Minor fixes for PostgreSQL.</p>
<p>General</p>
<p>CHANGELOG updated.<br />
rc made unique.<br />
Minor bugs fixed.<br />
Copyright year updated from 2024 to 2026.<br />
mkdir changed from /bin to /usr/bin, which is the new, correct standard on all three of our supported distributions.</p>
<p>Documentation</p>
<p>README updated.<br />
Installation documentation improved, library related stuff documented.<br />
PostgreSQL added to documentation.<br />
lsb_release removed from documentation.<br />
Documentation restructured.<br />
Documentation made more resilient against errors.<br />
Documentation process improved.<br />
Documentation moved completely to asciidoc.</p>
<p>Packaging</p>
<p>Package list completed.<br />
Makefile fixed.<br />
Old distro stuff and initV stuff removed.<br />
Build scripts fixed.</p>
<p>For subscriptions of commercial use of MyEnv please get in contact with us.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/myenv-release-notes/fromdual-environment-myenv-3.0.0-has-been-released/">MariaDB/MySQL Environment MyEnv 3.0.0 has been released</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>FromDual has the pleasure to announce the release of the new version 3.0.0 of its popular MariaDB, MySQL and PostgreSQL multi-instance environment <a href="https://www.fromdual.com/software/fromdual-myenv/" title="MariaDB, MySQL and PostgreSQL multi-instance environment">MyEnv</a>.</p>
<p>The new MyEnv can be downloaded <a href="https://support.fromdual.com/admin/public/download.php" target="_blank" title="FromDual download" rel="noopener">here</a>. How to install MyEnv is described in the <a href="https://support.fromdual.com/documentation/myenv/myenv.html#installation-guide" target="_blank" rel="noopener">MyEnv Installation Guide</a>.</p>
<p>In the inconceivable case that you find a bug in the MyEnv please report it to us by sending an <a href="mailto:contact@fromdual.com?Subject=Bug%20report%20for%20myenv">email</a>.</p>
<p>Any feedback, statements and testimonials are welcome as well! Please <a href="mailto:feedback@fromdual.com?Subject=Feedback%20for%20fpmmm">send them to us</a>.</p>
<h2 id="upgrade-from-2x-to-30">Upgrade from 2.x to 3.0<a class="anchor-link" id="upgrade-from-2-x-to-3-0"></a></h2>
<p>Please check the <a href="https://support.fromdual.com/documentation/myenv/myenv.html#upgrade" target="_blank" title="Upgrading MyEnv" rel="noopener">MyEnv Installation Guide</a>.</p>
<h2 id="changes-in-myenv-300">Changes in MyEnv 3.0.0<a class="anchor-link" id="changes-in-myenv-3-0-0"></a></h2>
<h3 id="myenv">MyEnv<a class="anchor-link" id="myenv"></a></h3>
<ul>
<li>Template warning improved.</li>
<li>Distro version in <code>--version</code> added.</li>
<li>Check MyEnv configuration permissions.</li>
<li>#fd increased for MyEnv.</li>
<li><code>myenv.conf</code> should have more secure permissions now.</li>
<li>Situation caught when <code>my.cnf</code> is missing in <code>myenv.conf</code>.</li>
<li>Directories <code>home</code> and <code>run</code> moved to <code>dba</code> and <code>myenv</code>.</li>
<li>Unit file <code>mariadb.service</code> and <code>mysql.service</code> replaced by <code>dba.service</code>.</li>
<li>User <code>mysql</code> replaced by <code>dba</code> in template.</li>
<li><code>start_stop</code> fixed warning in case <code>argv[1]</code> is missing.</li>
<li><code>sys_uid</code> filter fixed for Rocky Linux.</li>
<li><code>dba</code> unit file added to package.</li>
<li>Nagios plugins detection removed from <code>showMyEnvVersion</code>.</li>
<li>Old SysV init files removed and replaced by Systemd unit files.</li>
<li><code>dba</code> user was introduced and check for system user added.</li>
<li>User <code>dba</code> changes and cosmetic fixes.</li>
</ul>
<h3 id="myenv-installer">MyEnv Installer<a class="anchor-link" id="myenv-installer"></a></h3>
<ul>
<li>2 concurrent <code>installMyEnv</code> versions cannot run any more.</li>
<li>Directories <code>binlog</code>, <code>cgroups</code> and <code>angel</code> removed from <code>postgresql</code> <code>type</code> installation.</li>
<li>Wrapper script <code>installMyEnv.sh</code> removed.</li>
<li>Cosmetics fixed in installer.</li>
<li>Installation made more mysql friendly.</li>
<li><code>libaio1t64</code> considered during installation recommendations on DEB systems.</li>
<li>Next free port suggestion during <code>installMyEnv</code> improved. It will suggest the first free port now.</li>
<li><code>apt-get</code> and <code>yum</code> replaced by <code>apt</code> and <code>dnf</code>.</li>
</ul>
<h3 id="myenv-utilities">MyEnv Utilities<a class="anchor-link" id="myenv-utilities"></a></h3>
<ul>
<li><code>insert_test.sh</code> made PostgreSQL ready.</li>
</ul>
<h3 id="postgresql">PostgreSQL<a class="anchor-link" id="postgresql"></a></h3>
<ul>
<li>PostgreSQL instance is stopped with fast instead of immediate now.</li>
<li><code>show_create_table.sh</code> for PostgreSQL made nicer.</li>
<li>PostgreSQL <code>status.sql</code> added.</li>
<li>Minor fixes for PostgreSQL.</li>
</ul>
<h3 id="general">General<a class="anchor-link" id="general"></a></h3>
<ul>
<li><code>CHANGELOG</code> updated.</li>
<li><code>rc</code> made unique.</li>
<li>Minor bugs fixed.</li>
<li>Copyright year updated from 2024 to 2026.</li>
<li><code>mkdir</code> changed from <code>/bin</code> to <code>/usr/bin</code>, which is the new, correct standard on all three of our supported distributions.</li>
</ul>
<h3 id="documentation">Documentation<a class="anchor-link" id="documentation"></a></h3>
<ul>
<li>README updated.</li>
<li>Installation documentation improved, library related stuff documented.</li>
<li>PostgreSQL added to documentation.</li>
<li><code>lsb_release</code> removed from documentation.</li>
<li>Documentation restructured.</li>
<li>Documentation made more resilient against errors.</li>
<li>Documentation process improved.</li>
<li>Documentation moved completely to asciidoc.</li>
</ul>
<h3 id="packaging">Packaging<a class="anchor-link" id="packaging"></a></h3>
<ul>
<li>Package list completed.</li>
<li>Makefile fixed.</li>
<li>Old distro stuff and initV stuff removed.</li>
<li>Build scripts fixed.</li>
</ul>
<p>For subscriptions of commercial use of MyEnv please <a href="mailto:contact@fromdual.com?Subject=Commercial%20use%20of%20MyEnv">get in contact</a> with us.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/myenv-release-notes/fromdual-environment-myenv-3.0.0-has-been-released/">MariaDB/MySQL Environment MyEnv 3.0.0 has been released</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Queen Shrugged</title>
      <link>https://mariadb.org/queen-shrugged/</link>
      <pubDate>Mon, 23 Mar 2026 16:33:06 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>It’s easy to be a queen. Why? Because you can always count on your true selfless friends. I learned this already as a young princess. Nothing brightens a ball quite like a comment from one of your most trusted girlfriends:<br />
It’s beautiful. …<br />
Continue reading \"Queen Shrugged\"<br />
The post Queen Shrugged appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/queen-shrugged/">Queen Shrugged</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>It&rsquo;s easy to be a queen. Why? Because you can always count on your true selfless friends. I learned this already as a young princess. Nothing brightens a ball quite like a comment from one of your most trusted girlfriends:<br>
It&rsquo;s beautiful. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/queen-shrugged/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Queen Shrugged&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/queen-shrugged/">Queen Shrugged</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/queen-shrugged/">Queen Shrugged</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>PostgreSQL 18 Upgrades for AI-Era Workloads and Operations</title>
      <link>https://severalnines.com/blog/postgresql-18-upgrades-for-ai-era-workloads-and-operations/</link>
      <pubDate>Fri, 20 Mar 2026 08:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>Today, we’re asking PostgreSQL to do more and more, like handling both transactions and analytics, powering huge SaaS platforms, managing event data, and even dipping into AI-related tasks like vector search. This intense pressure highlights some pain points: slow, unpredictable reads, rigid indexing, complex setups, and risky major upgrades.  PostgreSQL 18 directly addresses these real-world […]<br />
The post PostgreSQL 18 Upgrades for AI-Era Workloads and Operations appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/postgresql-18-upgrades-for-ai-era-workloads-and-operations/">PostgreSQL 18 Upgrades for AI-Era Workloads and Operations</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Today, we&rsquo;re asking PostgreSQL to do more and more, like handling both transactions and analytics, powering huge SaaS platforms, managing event data, and even dipping into AI-related tasks like vector search. This intense pressure highlights some pain points: slow, unpredictable reads, rigid indexing, complex setups, and risky major upgrades.&nbsp;</p>
<p>PostgreSQL 18 directly addresses these real-world operational challenges, skipping flashy features for practical improvements like asynchronous I/O to speed up reads, safer upgrades from keeping optimizer stats, better multi-column indexes, UUIDv7 support, and useful enhancements to logical replication.</p>
<p>I&rsquo;ll dive into these key operational changes, show how they fit your modern workflows, and explain how tools like ClusterControl can make adopting them smooth and painless.</p>
<h2 class="wp-block-heading" id="h-postgresql-18-s-key-features-that-improve-core-workload-performance"><strong>PostgreSQL 18&rsquo;s key features that improve core workload performance</strong><a class="anchor-link" id="postgresql-18s-key-features-that-improve-core-workload-performance"></a></h2>
<p>PostgreSQL 18 isn&rsquo;t about one big, new thing. Instead, it offers fixes for common headaches: slow storage, reads competing with new transactions, inconsistent performance post-upgrade, indexes that don&rsquo;t quite hit the mark, messy authentication, and confusing replication errors &mdash; these fixes smooth out the rough operational edges. Let&rsquo;s look at two key features that will have the greatest effect on production work.</p>
<h3 class="wp-block-heading" id="h-async-i-o-subsystem-aio"><strong>Async I/O subsystem (AIO)</strong><a class="anchor-link" id="async-i-o-subsystem-aio"></a></h3>
<p>Slow storage often causes problems, not the computer itself. Older PostgreSQL waited for each storage request, which is safe but can slow things down, especially when your data is too big for the cache or when running big scans alongside regular transactions.</p>
<p>PostgreSQL 18 fixes this with Asynchronous I/O (AIO). Now, the system can ask for multiple data blocks and keep working instead of waiting for each one. This can significantly improve read-heavy scans and some maintenance operations under the right I/O conditions. New settings like <code>io_method</code> help you tune this &mdash; <strong>don&rsquo;t just test AIO with a single query.</strong></p>
<p>Its real power shows up under heavy, simultaneous load. To see the benefit, test your actual, read-heavy workload, focusing on P95/P99 latency improvements. Pay special attention to maintenance, since PG 18 specifically aims to reduce I/O slowdown during VACUUM.</p>
<p><strong>TIP:</strong> A good test is to compare latency under pressure with and without AIO while background maintenance is running.</p>
<h3 class="wp-block-heading" id="h-skip-scan-on-b-tree-indexes"><strong>Skip-scan on B-tree indexes</strong><a class="anchor-link" id="skip-scan-on-b-tree-indexes"></a></h3>
<p>In large systems, especially SaaS, we often use multi-column indexes. The problem is that queries don&rsquo;t always filter by the index&rsquo;s first column. For example, an index on (tenant_id, created_at) won&rsquo;t help a query just filtering by created_at. This usually means creating extra, redundant indexes &mdash; <strong>PostgreSQL 18 adds skip-scan for multi-column B-tree indexes.</strong></p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="979" src="https://severalnines.com/wp-content/uploads/2026/03/diagram-pg18-skip_vs_classic_scans-1024x979.png" alt="" class="wp-image-42910"></figure>
<p>When the planner estimates it&rsquo;s cheaper than scanning the table, it can iterate over the leading column&rsquo;s distinct values and reuse the same index to satisfy predicates on the later columns, even if the leading column isn&rsquo;t constrained. This is a big win because it means fewer extra indexes, making things cleaner, helping performance by reducing write overhead, and keeping VACUUM happy as your data inevitably scales.</p>
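<p>To make this concrete, here is a minimal sketch (the <code>events</code> table and its column names are hypothetical): with skip-scan, the planner can consider the existing composite index even when the query only constrains the second column.</p>
<pre class="wp-block-code"><code>-- Hypothetical multi-tenant table with a composite index led by tenant_id
CREATE TABLE events (
  tenant_id  bigint      NOT NULL,
  created_at timestamptz NOT NULL,
  payload    jsonb
);
CREATE INDEX events_tenant_created_idx ON events (tenant_id, created_at);

-- Pre-18 planners typically ignored the index here because tenant_id is not
-- constrained; with skip-scan the same index may now be used when the planner
-- estimates it is cheaper than scanning the table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM events
WHERE created_at &gt;= now() - interval '1 day';</code></pre>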
<h2 class="wp-block-heading"><strong>How Postgres 18 features practically support modern workload patterns</strong><a class="anchor-link" id="how-postgres-18-features-practically-support-modern-workload-patterns"></a></h2>
<p>PostgreSQL 18&rsquo;s new stuff isn&rsquo;t just random. It&rsquo;s built around how people are actually using Postgres right now. Most setups mix transactions, reporting, search, and more AI stuff, with evolving replication.</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="660" src="https://severalnines.com/wp-content/uploads/2026/03/diagram-pg18-modern_workload_patterns-1024x660.png" alt="" class="wp-image-42913"></figure>
<p>Looking at what users are doing makes it much easier to understand how the improvements benefit day-to-day operations than just reading a list of new features.</p>
<h3 class="wp-block-heading"><strong>AI &amp; vector search</strong><a class="anchor-link" id="ai-vector-search"></a></h3>
<p>When people use AI with PostgreSQL, they usually don&rsquo;t replace the database with a specialized vector engine. Instead, they keep PostgreSQL as the main system for transactional data and put AI-related data, like embeddings, right alongside it.</p>
<p>When running vector search and transactional workloads together, it&rsquo;s best to keep those data embeddings right next to the data they describe. This means you need reliable performance: steady write speeds, predictable read times even when searches spike, and enough replicas to scale reads without bogging down the main database.</p>
<p>Query complexity grows, mixing things like finding similar items with standard filters on who, when, permissions, or categories. This often leads to heavy database scans, unpredictable read bursts, and competition between search / reporting tasks and regular writes &mdash; PostgreSQL 18&rsquo;s features help smooth all of this out.</p>
<p>Asynchronous I/O helps with storage slowdowns during heavy reads, and skip-scan makes filtering around your similarity searches much faster by improving multicolumn indexes. You still need a smart strategy for your AI indexes, e.g. when to use HNSW, how to organize data, etc., but PG helps the whole system handle the pressure better. Adding ClusterControl to the mix creates a winning combination, as managing replicas, backups, and monitoring performance as you grow becomes much easier.</p>
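<p>As a rough sketch of that mixed pattern (assuming the <code>pgvector</code> extension; the <code>documents</code> table, its tiny 3-dimensional embeddings, and the HNSW index are purely illustrative), a similarity search typically sits next to ordinary relational filters that still rely on regular B-tree indexes:</p>
<pre class="wp-block-code"><code>CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id         bigint PRIMARY KEY,
  tenant_id  bigint NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  embedding  vector(3)   -- 3 dimensions for brevity; real embeddings are much larger
);
CREATE INDEX documents_tenant_created_idx ON documents (tenant_id, created_at);
CREATE INDEX documents_embedding_idx ON documents USING hnsw (embedding vector_l2_ops);

-- Nearest neighbours for one tenant: the B-tree index helps the filter,
-- the &lt;-&gt; operator (L2 distance) drives the similarity ordering.
SELECT id
FROM documents
WHERE tenant_id = 42
ORDER BY embedding &lt;-&gt; '[0.01, 0.02, 0.03]'::vector
LIMIT 10;</code></pre>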
<h3 class="wp-block-heading"><strong>Serverless ingestion + BI</strong><a class="anchor-link" id="serverless-ingestion-bi"></a></h3>
<p>Many teams want applications to feel serverless and handle real-time data analysis, even when self-managing PostgreSQL or in a hybrid setup. Raw speed isn&rsquo;t the priority; it&rsquo;s how the system handles sudden spikes and recovers quickly, posing two hurdles:</p>
<p><strong>First,</strong> when lots of people use it at once, things can get slow. <strong>Second,</strong> we have less and less time for maintenance and upgrades, and we need stability right away after an update.</p>
<p>PostgreSQL 18 fixes both. Better I/O helps with slow reads during busy times, and keeping performance stats after an upgrade means less post-update drama. Basically, PostgreSQL is becoming stronger for unpredictable loads, and when you can&rsquo;t afford any downtime. ClusterControl makes upgrades and backups consistent across all your setups; that&rsquo;s what makes the difference between a normal maintenance window and a full-blown incident.</p>
<h3 class="wp-block-heading"><strong>Isolated replication for multi-tenant workloads</strong><a class="anchor-link" id="isolated-replication-for-multi-tenant-workloads"></a></h3>
<p>Multi-tenant systems often use replication to handle more reads, separate workloads, or serve different regions. This usually means having read replicas for regions, consumers for analytics or search, and moving tenants around as the system grows.</p>
<p>PostgreSQL 18 improves replication by better supporting generated columns and making conflicts much easier to see. This makes tenant-specific replication safer to run and easier to fix when problems occur. ClusterControl helps by setting up and monitoring these setups consistently, preventing hard-to-maintain, one-off replication configurations.</p>
<p>The result is fewer unexpected issues during replication, clearer insight when conflicts happen, and more confidence when changing how your replication is set up.</p>
<h2 class="wp-block-heading"><strong>Postgres 18 features that reduce upgrade &amp; migration workflow risks</strong><a class="anchor-link" id="postgres-18-features-that-reduce-upgrade-migration-workflow-risks"></a></h2>
<p>Upgrading PostgreSQL is mostly about managing risk, not the specific steps. What teams truly care about is getting the system back to normal, predictable performance fast.</p>
<p>Tight maintenance windows, huge databases, and low tolerance for issues mean the period after the upgrade can be brutal. Slow query plans, unexpected slowdowns, or emergency tuning can quickly turn a successful upgrade into an on-call nightmare. Here are the design changes and features PostgreSQL 18 implements to tackle these specific problems.</p>
<h3 class="wp-block-heading"><strong>Faster upgrades with retained statistics</strong><a class="anchor-link" id="faster-upgrades-with-retained-statistics"></a></h3>
<p>Historically, PostgreSQL upgrades caused frustrating performance degradation because the query planner had to relearn all data statistics. PG 18&rsquo;s pg_upgrade utility now transfers most optimizer statistics. This lets performance stabilize much faster post-upgrade, because the planner immediately has current knowledge of the data instead of going through the stressful, lengthy process of relearning statistics, which matters most for large databases.</p>
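<p>A typical rehearsal still starts with a dry run. This is only a sketch: the binary and data directory paths below assume PGDG packages on a RHEL-family host and will differ on your systems.</p>
<pre class="wp-block-code"><code># Dry-run the upgrade first; --check reports incompatibilities without changing anything
sudo -u postgres /usr/pgsql-18/bin/pg_upgrade \
  --old-bindir=/usr/pgsql-17/bin \
  --new-bindir=/usr/pgsql-18/bin \
  --old-datadir=/var/lib/pgsql/17/data \
  --new-datadir=/var/lib/pgsql/18/data \
  --check</code></pre>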
<h3 class="wp-block-heading"><strong>Checksums enabled by default in initdb</strong><a class="anchor-link" id="checksums-enabled-by-default-in-initdb"></a></h3>
<p>PostgreSQL 18 changes a big default: new clusters now turn on data checksums automatically when you run <code>initdb</code>. Checksums are great for catching sneaky data corruption, though they use a tiny bit more CPU. Most teams already use them for better durability or compliance &mdash; you can still opt out with <code>--no-data-checksums</code>.</p>
<p><strong>However,</strong> <strong>checksum settings must match exactly when you upgrade</strong>. If your old cluster didn&rsquo;t have checksums, you can&rsquo;t magically turn them on during the upgrade.</p>
<p>Think of the checksum setting as a contract for your cluster. Document it, keep it consistent everywhere, and test it during your upgrade dry runs. Don&rsquo;t leave it as a last-minute decision, or you&rsquo;ll find problems during the final cutover instead of in testing.</p>
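<p>During those dry runs it is worth checking the posture explicitly. <code>SHOW data_checksums;</code> reports the running cluster&rsquo;s setting; the offline sketch below (paths are illustrative) verifies or, if you decide to, enables checksums on the old cluster with <code>pg_checksums</code> before the upgrade.</p>
<pre class="wp-block-code"><code># Run with the cluster stopped; adjust binary and data directory paths to your setup
sudo -u postgres /usr/pgsql-17/bin/pg_checksums --check  -D /var/lib/pgsql/17/data
sudo -u postgres /usr/pgsql-17/bin/pg_checksums --enable -D /var/lib/pgsql/17/data</code></pre>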
<h2 class="wp-block-heading"><strong>PG 18 enhancements that improve developer &amp; SQL quality of life</strong><a class="anchor-link" id="pg-18-enhancements-that-improve-developer-sql-quality-of-life"></a></h2>
<p>Even though some PostgreSQL features seem like they are just for developers, they often impact how things run behind the scenes. Schema and SQL choices can unexpectedly influence storage, how much data is written, replication size, and index performance over time. Postgres 18 brings changes in this area that operators should really pay attention to.</p>
<h3 class="wp-block-heading"><strong>Virtual generated columns (default)</strong><a class="anchor-link" id="virtual-generated-columns-default"></a></h3>
<p>PostgreSQL 18&rsquo;s generated columns are usually virtual, meaning the value is calculated when you read the row, not saved on disk, unless you choose to store it.</p>
<p>Operationally, this is key. Virtual columns cut down on writes and storage, which is great for busy tables. But if you read the derived value a lot or need to index it predictably, stored columns might be better, as the calculation is done once on write, not on read; for example,</p>
<pre class="wp-block-code"><code>CREATE TABLE orders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  amount numeric(12,2) NOT NULL,
  amount_cents bigint GENERATED ALWAYS AS ((amount * 100)::bigint)
);</code></pre>
<p>In PostgreSQL 18, amount_cents is virtual by default. If you want the value precomputed and stored, because it&rsquo;s heavily queried or indexed, you can still do so explicitly:</p>
<pre class="wp-block-code"><code>amount_cents bigint GENERATED ALWAYS AS ((amount * 100)::bigint) STORED</code></pre>
<p><strong>N.B. This decision is crucial for replication.</strong> PG 18&rsquo;s logical replication is better at publishing stored generated values.</p>
<h3 class="wp-block-heading"><strong>UUIDv7 for time-ordered IDs</strong><a class="anchor-link" id="uuidv7-for-time-ordered-ids"></a></h3>
<p>UUIDs are great because they&rsquo;re unique everywhere and easy to make across different systems. The problem has been how they mess up B-tree indexes. Random UUIDs scatter new entries, causing slow index bloat and cache issues, especially on busy tables.</p>
<p>Recent PostgreSQL releases fixed this natively with uuidv7(), making it the operational standard for PostgreSQL 18 architectures. It generates UUIDs that are mostly time-ordered. This keeps your indexes much tidier while keeping the benefits of using UUIDs. For example:</p>
<pre class="wp-block-code"><code>CREATE TABLE sessions (
  id uuid PRIMARY KEY DEFAULT uuidv7(),
  created_at timestamptz NOT NULL DEFAULT now()
);</code></pre>
<p>If you&rsquo;ve been hesitant to use UUID primary keys on high-ingest tables because of index behavior, UUIDv7 makes that trade-off far more reasonable.</p>
<h3 class="wp-block-heading"><strong>Temporal constraints for time-varying facts</strong><a class="anchor-link" id="temporal-constraints-for-time-varying-facts"></a></h3>
<p>Dealing with time-sensitive data, like pricing or subscriptions, usually means complex application code and tricky locking to avoid mistakes. Postgres 18 simplifies this with temporal constraints.</p>
<p>These let the database enforce rules, like primary and foreign keys, over time ranges. This moves the headache of correctness from your application logic into the database, making enforcement instant and reliable.</p>
<p>For operations teams, this means fewer weird errors, less data cleanup, and fewer 2 a.m. alerts caused by subtle concurrency issues.</p>
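<p>A minimal sketch of what this looks like (the table and column names are hypothetical; <code>WITHOUT OVERLAPS</code> is the PostgreSQL 18 temporal key syntax, and mixing a scalar key column with a range normally needs the <code>btree_gist</code> extension):</p>
<pre class="wp-block-code"><code>CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE product_prices (
  product_id   bigint        NOT NULL,
  valid_during daterange     NOT NULL,
  price        numeric(12,2) NOT NULL,
  PRIMARY KEY (product_id, valid_during WITHOUT OVERLAPS)
);

-- Overlapping validity periods for the same product are rejected by the database,
-- with no application-side locking required.
INSERT INTO product_prices VALUES (1, '[2026-01-01,2026-07-01)', 9.99);
INSERT INTO product_prices VALUES (1, '[2026-06-01,2026-12-01)', 11.99);  -- fails</code></pre>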
<h3 class="wp-block-heading"><strong>OAuth authentication</strong><a class="anchor-link" id="oauth-authentication"></a></h3>
<p>PostgreSQL 18 now supports OAuth, which is a big deal for security. It gives you a path to reduce long-lived DB passwords by using short-lived tokens where it fits your identity stack. It won&rsquo;t fix bad internal role design, but it massively cuts down on the headache of credential sprawl, especially where infrastructure is constantly spinning up and down. OAuth is just way easier to manage than traditional passwords in those dynamic setups.</p>
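<p>Configuration lives in <code>pg_hba.conf</code> plus a server-side validator module. The snippet below is only a sketch: the issuer, scope, network range, and role are made up, and you should confirm the exact option names and validator setup (<code>oauth_validator_libraries</code>) against the PostgreSQL 18 documentation and your identity provider.</p>
<pre class="wp-block-code"><code># pg_hba.conf sketch (illustrative values only)
# TYPE  DATABASE  USER      ADDRESS      METHOD  OPTIONS
host    all       app_user  10.0.0.0/8   oauth   issuer="https://idp.example.com" scope="openid"</code></pre>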
<h2 class="wp-block-heading"><strong>How PostgreSQL 18 improves logical &amp; streaming replication efficiency</strong><a class="anchor-link" id="how-postgresql-18-improves-logical-streaming-replication-efficiency"></a></h2>
<p>Replication gets complicated fast. One replica is simple, but the more you add, the more lag, conflicts, and strange failures you open yourself up to. PostgreSQL 18 doesn&rsquo;t magically automate logical replication or make it DDL aware. What it does instead is provide practical improvements that make operating, monitoring, and managing your replicas much easier.</p>
<h3 class="wp-block-heading"><strong>Generated column replication</strong><a class="anchor-link" id="generated-column-replication"></a></h3>
<p>Building on recent improvements, modern PostgreSQL lets you publish stored generated columns using the <code>publish_generated_columns</code> option. This is great for downstream systems that need the calculated value right away instead of having to recompute it. PostgreSQL sends the generated value and replicates it into a normal column on the subscriber.</p>
<p><strong>N.B. You cannot replicate it into another generated column; that will fail.</strong></p>
<p>Basically, this feature ships the finished, calculated results, not the formula or the generated column definition. Its best use is to simplify your consumers and avoid repeating work, without getting into complicated DDL replication. Let&rsquo;s look at a simple logical replication setup to illustrate how Postgres 18 handles generated columns.</p>
<p><strong>On the publisher:</strong></p>
<pre class="wp-block-code"><code>CREATE TABLE orders (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  amount numeric(12,2) NOT NULL,
  amount_cents bigint GENERATED ALWAYS AS ((amount * 100)::bigint) STORED
);

CREATE PUBLICATION orders_pub
FOR TABLE orders
WITH (publish_generated_columns = 'stored');</code></pre>
<p><strong>On the subscriber</strong>, the generated value is replicated into a regular column:</p>
<pre class="wp-block-code"><code>CREATE TABLE orders (
  id bigint PRIMARY KEY,
  amount numeric(12,2) NOT NULL,
  amount_cents bigint NOT NULL
);

CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=&lt;publisher_host&gt; port=5432 dbname=&lt;db&gt; user=&lt;repl_user&gt; password=&lt;password&gt;'
  PUBLICATION orders_pub;</code></pre>
<p>This setup reflects how PostgreSQL 18 actually handles generated column replication: you publish the stored generated value and apply it to a normal column on the subscriber. It&rsquo;s a practical, explicit approach that avoids surprises and stays within the supported model.</p>
<h3 class="wp-block-heading"><strong>Streaming defaults and conflict logging</strong><a class="anchor-link" id="streaming-defaults-and-conflict-logging"></a></h3>
<p>Logical subscriptions now use parallel streaming by default. This means faster throughput and less lag right out of the box, especially when things are busy or transactions are large.</p>
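<p>If you prefer to be explicit rather than rely on the default, the option can still be set per subscription; reusing the hypothetical <code>orders_sub</code> from earlier:</p>
<pre class="wp-block-code"><code>ALTER SUBSCRIPTION orders_sub SET (streaming = 'parallel');</code></pre>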
<p>Conflict handling is also much better. Modern PostgreSQL logs conflicts and shows conflict details in <code>pg_stat_subscription_stats</code>. While not brand new to 18, utilizing this view is a massive upgrade if you are coming from older major versions. Replication conflicts are usually a nightmare because you can&rsquo;t see them as they happen. Better visibility means you can spot trends, link issues to workload changes, and write reliable troubleshooting guides without relying on guesswork.</p>
<h3 class="wp-block-heading"><strong>Hygiene improvements for larger estates</strong><a class="anchor-link" id="hygiene-improvements-for-larger-estates"></a></h3>
<p>When you have a lot of replication going on, keeping things tidy is as crucial as keeping them fast. PostgreSQL 18 adds a few safeguards to head off slow, hidden problems:</p>
<ul class="wp-block-list">
<li><code>idle_replication_slot_timeout</code>: Automatically invalidates idle replication slots that have been inactive for too long.</li>
<li><code>max_active_replication_origins</code>: Lets you limit the number of active replication origins, regardless of the number of existing slots.</li>
</ul>
<p>If you&rsquo;ve ever had an old, forgotten logical slot quietly hogging Write-Ahead Log (WAL) space for weeks, you&rsquo;ll appreciate these. They don&rsquo;t replace good monitoring, but they make it much harder for tiny mistakes to turn into massive cleanup projects later.</p>
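<p>Both are plain configuration settings. The values below are illustrative only, and you should check in the PostgreSQL 18 documentation whether each one can be reloaded or needs a restart:</p>
<pre class="wp-block-code"><code>-- Invalidate slots that have sat idle for a week, and cap active origins
ALTER SYSTEM SET idle_replication_slot_timeout = '7d';
ALTER SYSTEM SET max_active_replication_origins = 20;
SELECT pg_reload_conf();</code></pre>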
<h2 class="wp-block-heading"><strong>Manual vs. ClusterControl PostgreSQL 18 operations</strong><a class="anchor-link" id="manual-vs-clustercontrol-postgresql-18-operations"></a></h2>
<p>Managing PostgreSQL by hand is fine until it isn&rsquo;t. When you have just a couple of clusters, doing it yourself is easy. But once you have more than a few, those manual steps start wasting time and attention. That&rsquo;s when the downsides really hit.</p>
<h3 class="wp-block-heading"><strong>Manual PG 18 operations</strong><a class="anchor-link" id="manual-pg-18-operations"></a></h3>
<p>Running PostgreSQL manually gives you total freedom, which is great for small setups. But not everything is good. Let&rsquo;s see what we&rsquo;re talking about.</p>
<p><strong>Pros:</strong></p>
<ul class="wp-block-list">
<li>You control everything about the setup.</li>
<li>You can perfectly tune it for each task.</li>
<li>Trying new things is easy in one environment.</li>
</ul>
<p><strong>Cons:</strong></p>
<ul class="wp-block-list">
<li>Big upgrades are messy. Checksums, making sure plans stay stable, and planning for rollbacks are a headache.</li>
<li>Replication slot cleanup is a long-term chore, especially as your setup changes.</li>
<li>Backup plans often differ between clusters as you add or rebuild them.</li>
<li>Monitoring is usually a bunch of tools cobbled together, leading to confusing alerts and nobody knowing who&rsquo;s on point during a problem.</li>
</ul>
<p>Trivial alone, critical cumulatively, these will chew up a ton of your team&rsquo;s time as you grow.</p>
<h3 class="wp-block-heading"><strong>Automated PG 18 ops with ClusterControl</strong><a class="anchor-link" id="automated-pg-18-ops-with-clustercontrol"></a></h3>
<p>ClusterControl really shines when you&rsquo;re rolling PostgreSQL 18 because it saves you from having to figure out the same operational steps over and over again for every new cluster.</p>
<p><strong>Pros:</strong></p>
<ul class="wp-block-list">
<li>Centralized hybrid setup and management of all PostgreSQL clusters.</li>
<li>Guided major version updates, ensuring you don&rsquo;t miss crucial pre-checks.</li>
<li>Easily and safely applied parameter changes, like AIO tuning, across the board.</li>
<li>Turnkey streaming replication, including built-in HAProxy and PgBouncer support.</li>
<li>Single view alerting and health checks, e.g. replication lag, node status, backups, etc.</li>
<li>Backup policy enforcement using common tools, e.g. pgBackRest, pg_basebackup, etc.</li>
</ul>
<p><strong>Cons:</strong></p>
<ul class="wp-block-list">
<li>It&rsquo;s a platform, implying its own learning curve.</li>
<li>You need to check that your rollout timing aligns with ClusterControl&rsquo;s support for the PostgreSQL version you want to use.</li>
</ul>
<p>For teams managing PostgreSQL at scale across many environments, this consistency is often more valuable, and easier to sustain, than having total control over every single command.</p>
<h2 class="wp-block-heading"><strong>Installing &amp; setting up Postgres 18</strong><a class="anchor-link" id="installing-setting-up-postgres-18"></a></h2>
<p>When piloting PostgreSQL 18, set up your test environment to mimic your real deployment exactly. Use the same storage, replication setup, extensions, like <code>pgvector</code>, and a realistic, large enough dataset. Small, fake tests will just hide the real problems you need to find.</p>
<h3 class="wp-block-heading"><strong>Installing PostgreSQL 18</strong><a class="anchor-link" id="installing-postgresql-18"></a></h3>
<p>The specific package names and repos differ based on your system, but here&rsquo;s an example of a typical installation:</p>
<h4 class="wp-block-heading"><strong>For Debian-Based OS:</strong></h4>
<pre class="wp-block-code"><code>sudo apt update
sudo apt install postgresql-18</code></pre>
<p><strong>For RedHat-Based OS:</strong></p>
<pre class="wp-block-code"><code>sudo dnf install postgresql18-server</code></pre>
<h3 class="wp-block-heading"><strong>Checksums at initdb</strong><a class="anchor-link" id="checksums-at-initdb"></a></h3>
<p>Remember, PostgreSQL 18 enables checksums by default at initialization.</p>
<pre class="wp-block-code"><code>sudo /usr/pgsql-18/bin/postgresql-18-setup initdb</code></pre>
<p>But, you can opt out:</p>
<pre class="wp-block-code"><code>sudo -u postgres /usr/pgsql-18/bin/initdb
--no-data-checksums -D /var/lib/pgsql/18/data</code></pre>
<p><strong>Don&rsquo;t forget</strong> that <code>pg_upgrade</code> requires checksum settings to match between the source and target clusters. Treat checksum posture as an upgrade design decision and validate it during rehearsals, not during the cutover window.</p>
<h3 class="wp-block-heading"><strong>Confirm AIO-related settings:&nbsp;</strong><a class="anchor-link" id="confirm-aio-related-settings"></a></h3>
<p>Once the cluster is up, confirm the effective I/O-related settings you&rsquo;re running with:</p>
<pre class="wp-block-code"><code>SHOW io_method;
SHOW effective_io_concurrency;
SHOW maintenance_io_concurrency;</code></pre>
<p>Exact behavior depends on platform and build options, but this gives you a baseline before you start tuning or running load tests.</p>
<h3 class="wp-block-heading"><strong>Adding the PG 18 cluster to ClusterControl and initializing HA / replication</strong><a class="anchor-link" id="adding-the-pg-18-cluster-to-clustercontrol-and-initializing-ha-replication"></a></h3>
<p>Once running, the next step is to bring the cluster into ClusterControl and establish a sane baseline topology. For many teams, a solid default looks like:</p>
<ul class="wp-block-list">
<li>1 primary</li>
<li>1&ndash;2 replicas</li>
<li>HAProxy for routing and HA</li>
<li>PgBouncer for connection pooling (especially with spiky workloads)</li>
</ul>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="374" src="https://severalnines.com/wp-content/uploads/2026/02/cc_database_topology_viewer-pg18_pgbouncer_lb-1024x374.png" alt="" class="wp-image-42727"></figure>
<p>ClusterControl&rsquo;s guided workflows can deploy PostgreSQL streaming replication and integrate HAProxy and PgBouncer as part of the setup, reducing the amount of manual wiring needed to reach a production-ready state.</p>
<h2 class="wp-block-heading"><strong>PostgreSQL 18 operations &amp; monitoring</strong><a class="anchor-link" id="postgresql-18-operations-monitoring"></a></h2>
<p>PostgreSQL 18 offers better control and visibility. But that only matters if you use it to create reliable runbooks for when things go sideways. Don&rsquo;t tweak every last setting; the real win is a predictable system under pressure, not the ability to see clearly when it isn&rsquo;t.</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="516" src="https://severalnines.com/wp-content/uploads/2026/02/cc_database_cluster_overview_dashboard-pg18-1024x516.png" alt="" class="wp-image-42728"></figure>
<h3 class="wp-block-heading"><strong>Tune AIO safely</strong><a class="anchor-link" id="tune-aio-safely"></a></h3>
<p>Asynchronous I/O (AIO) in PostgreSQL 18 is a big deal, especially for handling many tasks at once, but don&rsquo;t rush it. It shines under heavy load, not in simple tests.</p>
<p><strong>Start Simple:</strong></p>
<ul class="wp-block-list">
<li>Pick the right <code>io_method</code> for your system.</li>
<li>Test it with your actual, busy application, not just single queries.</li>
<li>Then try adjusting <code>io_combine_limit</code> and <code>io_max_combine_limit</code>.</li>
</ul>
<p><strong>What to Watch For:</strong></p>
<ul class="wp-block-list">
<li>How long scans take, especially big ones.</li>
<li>Storage delays when the system is busy.</li>
<li>Your worst-case waiting times (P95/P99), not just the average.</li>
<li>How VACUUM behaves while everything else is running.</li>
</ul>
<p><strong>How to tell if it is working: </strong>You&rsquo;ll see fewer unexpected slowdowns when reading a lot of data and less fighting between maintenance tasks and user traffic.</p>
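<p>In practice, the knob you are most likely to touch first is <code>io_method</code>. The value below is just an example, not a recommendation; it requires a restart, and <code>io_uring</code> is only available where the build and kernel support it:</p>
<pre class="wp-block-code"><code>ALTER SYSTEM SET io_method = 'io_uring';
-- restart, re-run the same read-heavy workload, and compare P95/P99 latencies
-- and VACUUM behaviour against the baseline captured with the previous setting</code></pre>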
<h3 class="wp-block-heading"><strong>Vacuum/Analyze delay reporting for SLOs</strong><a class="anchor-link" id="vacuum-analyze-delay-reporting-for-slos"></a></h3>
<p>PostgreSQL 18 has better insight into maintenance throttling. If you enable <code>track_cost_delay_timing</code>, VACUUM and ANALYZE will tell you exactly how long they waited because of cost-based delays.</p>
<p>This is huge for troubleshooting. Are you falling behind on maintenance because you told the system to slow down, or because it&rsquo;s genuinely struggling? Knowing the difference is key when figuring out why you missed an SLO or when planning your next capacity upgrade.</p>
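<p>As a sketch of how you might use this (the <code>orders</code> table is just the running example from earlier):</p>
<pre class="wp-block-code"><code>ALTER SYSTEM SET track_cost_delay_timing = on;
SELECT pg_reload_conf();

-- With timing enabled, VERBOSE output reports how long the operation
-- spent sleeping in cost-based delays
VACUUM (VERBOSE) orders;</code></pre>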
<h3 class="wp-block-heading"><strong>Logical replication visibility</strong><a class="anchor-link" id="logical-replication-visibility"></a></h3>
<p>PostgreSQL 18 makes using logical replication much easier by giving you better tools to see what&rsquo;s happening.</p>
<p>Make <code>pg_stat_subscription_stats</code> a regular check-in:</p>
<ul class="wp-block-list">
<li>See conflict counts and when they happened.</li>
<li>Monitor lag and how applies are working over time.</li>
<li>Check that the default parallel streaming is what you want.</li>
</ul>
<p>While better visibility doesn&rsquo;t stop replication problems, it lets you move past the guesswork so you can actually understand the issues and automate fixes.</p>
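<p>A simple habit is to poll the statistics views on a schedule and alert on anything unexpected. This is a sketch only; the exact set of conflict counter columns depends on your version, so selecting everything and filtering in your monitoring layer is the safe default:</p>
<pre class="wp-block-code"><code>-- Error and conflict counters per subscription
SELECT * FROM pg_stat_subscription_stats;

-- Apply progress and worker state
SELECT subname, worker_type, last_msg_receipt_time, latest_end_time
FROM pg_stat_subscription;</code></pre>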
<h3 class="wp-block-heading"><strong>ClusterControl dashboards</strong><a class="anchor-link" id="clustercontrol-dashboards"></a></h3>
<p>ClusterControl simplifies managing your PostgreSQL clusters by giving you a clear, consistent view of the important stuff:</p>
<ul class="wp-block-list">
<li>Node health and resources</li>
<li>Replication status and lag</li>
<li>High availability and auto-recovery</li>
<li>Backup posture, state, and compliance</li>
</ul>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="516" src="https://severalnines.com/wp-content/uploads/2026/02/cc_database_host_overview_dashboard-pg18-1024x516.png" alt="" class="wp-image-42729"></figure>
<h2 class="wp-block-heading"><strong>Ready to move? PostgreSQL 18 upgrade planning checklist</strong><a class="anchor-link" id="ready-to-move-postgresql-18-upgrade-planning-checklist"></a></h2>
<p>Before jumping to PostgreSQL 18, just check these basics first:</p>
<ul class="wp-block-list">
<li>Backups: Take a full backup and make sure you can restore it.</li>
<li>Checksums: See if your current cluster uses data checksums and plan for the new one.</li>
<li>Extensions: Check that all your extensions (like <code>pgvector</code> or <code>PostGIS</code>) are compatible with PostgreSQL 18.</li>
</ul>
<p>Getting these things squared away now makes the whole upgrade process much smoother.</p>
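<p>For the extension check, a quick inventory query against each database is usually enough to build the compatibility list (compare the output against each extension&rsquo;s PostgreSQL 18 support matrix):</p>
<pre class="wp-block-code"><code>SELECT extname, extversion
FROM pg_extension
ORDER BY extname;</code></pre>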
<h2 class="wp-block-heading"><strong>Conclusion</strong><a class="anchor-link" id="conclusion"></a></h2>
<p>PostgreSQL 18 is all about better operations. It fixes common headaches like slow I/O (thanks to async I/O), keeps upgrades predictable by saving optimizer stats, makes multicolumn indexes smarter with skip-scan, adds modern OAuth authentication, improves index use for busy tables with UUIDv7, and simplifies logical replication.</p>
<p>If you&rsquo;re thinking of upgrading, don&rsquo;t rush. Test PostgreSQL 18 in your lower environments with real data and load first. Check your extensions, especially pgvector. Decide on checksums early, as they&rsquo;re now on by default, and <code>pg_upgrade</code> needs them to match. And definitely test your replication setup, including conflict scenarios.</p>
<p>Taking the time for this careful rollout means fewer surprises and smoother changes in production. Ready to get started with PostgreSQL 18, regardless of where you run it?</p>
<h2 class="wp-block-heading"><strong>Install ClusterControl in 10-minutes. Free 30-day Enterprise trial included!</strong><a class="anchor-link" id="install-clustercontrol-in-10-minutes-free-30-day-enterprise-trial-included"></a></h2>
<h3 class="wp-block-heading" id="h-script-installation-instructions"><strong>Script Installation Instructions</strong><a class="anchor-link" id="script-installation-instructions"></a></h3>
<p>The installer script is the simplest way to get ClusterControl up and running. Run it on your chosen host, and it will take care of installing all required packages and dependencies.</p>
<p>Offline environments are supported as well. See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/offline-installation/">Offline Installation</a>&nbsp;guide for more details.</p>
<p>On the ClusterControl server, run the following commands:</p>
<pre class="wp-block-code"><code>wget https://severalnines.com/downloads/cmon/install-cc
chmod +x install-cc</code></pre>
<p>With your install script ready, run the command below. Replace&nbsp;<code>S9S_CMON_PASSWORD</code>&nbsp;and&nbsp;<code>S9S_ROOT_PASSWORD</code>&nbsp;placeholders with your choice password, or remove the environment variables from the command to interactively set the passwords. If you have multiple network interface cards, assign one IP address for the&nbsp;<code>HOST</code>&nbsp;variable in the command using&nbsp;<code>HOST=&lt;ip_address&gt;</code>.</p>
<pre class="wp-block-code"><code>S9S_CMON_PASSWORD=&lt;your_password&gt; S9S_ROOT_PASSWORD=&lt;your_password&gt; HOST=&lt;ip_address&gt; ./install-cc # as root or sudo user</code></pre>
<p>After the installation is complete, open a web browser, navigate to&nbsp;<code>https://&lt;ClusterControl_host&gt;/</code>, and create the first admin user by entering a username (note that &ldquo;admin&rdquo; is reserved) and a password on the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/quickstart/#step-2-create-the-first-admin-user">welcome page</a>. Once you&rsquo;re in, you can&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/create-database-cluster/">deploy</a>&nbsp;a new database cluster or&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/import-database-cluster/">import</a>&nbsp;an existing one.</p>
<p>The installer script supports a range of environment variables for advanced setup. You can define them using export or by prefixing the install command.</p>
<p>See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#environment-variables">list of supported variables</a>&nbsp;and&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#example-use-cases">example use cases</a>&nbsp;to tailor your installation.</p>
<h4 class="wp-block-heading" id="h-other-installation-options">Other Installation Options</h4>
<p><strong>Helm Chart</strong></p>
<p>Deploy ClusterControl on Kubernetes using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#helm-chart">official Helm chart</a>.</p>
<p><strong>Ansible Role</strong></p>
<p>Automate installation and configuration using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#ansible-role">Ansible playbooks</a>.</p>
<p><strong>Puppet Module</strong></p>
<p>Manage your ClusterControl deployment with the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#puppet-module">Puppet module</a>.</p>
<h4 class="wp-block-heading" id="h-clustercontrol-on-marketplaces">ClusterControl on Marketplaces</h4>
<p>Prefer to launch ClusterControl directly from the cloud? It&rsquo;s available on these platforms:</p>
<ul class="wp-block-list">
<li><a href="https://marketplace.digitalocean.com/apps/clustercontrol">DigitalOcean Marketplace</a></li>
<li><a href="https://gridscale.io/en/marketplace">gridscale.io Marketplace</a></li>
<li><a href="https://www.vultr.com/marketplace/apps/clustercontrol/">Vultr Marketplace</a></li>
<li><a href="https://www.linode.com/marketplace/apps/severalnines/clustercontrol/">Linode Marketplace</a></li>
<li><a href="https://console.cloud.google.com/marketplace/product/severalnines-public/clustercontrol">Google Cloud Platform</a></li>
</ul>
<p>The post <a href="https://severalnines.com/blog/postgresql-18-upgrades-for-ai-era-workloads-and-operations/">PostgreSQL 18 Upgrades for AI-Era Workloads and Operations</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/postgresql-18-upgrades-for-ai-era-workloads-and-operations/">PostgreSQL 18 Upgrades for AI-Era Workloads and Operations</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Vector: How it works</title>
      <link>https://mariadb.org/mariadb-vector-how-it-works/</link>
      <pubDate>Thu, 19 Mar 2026 10:09:20 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>You might have seen that MariaDB Vector is fast. And is getting faster. But why? How does it achieve that? …<br />
Continue reading \"MariaDB Vector: How it works\"<br />
The post MariaDB Vector: How it works appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works/">MariaDB Vector: How it works</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>You might have seen that MariaDB Vector is <a href="https://mariadb.com/resources/blog/how-fast-is-mariadb-vector/" target="_blank" rel="noreferrer noopener">fast</a>. And is getting <a href="https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/" target="_blank" rel="noreferrer noopener">faster</a>. But why? How does it achieve that? &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-vector-how-it-works/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Vector: How it works&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works/">MariaDB Vector: How it works</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-vector-how-it-works/">MariaDB Vector: How it works</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Enterprise Server Q1 2026 Maintenance Releases</title>
      <link>https://mariadb.com/resources/blog/mariadb-enterprise-server-q1-2026-maintenance-releases/</link>
      <pubDate>Wed, 18 Mar 2026 23:49:01 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>New maintenance releases for MariaDB Enterprise Server 11.8.6-3, 11.4.10-7, and 10.6.25-21 are now available. Notable Release Updates MariaDB Enterprise Server […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-enterprise-server-q1-2026-maintenance-releases/">MariaDB Enterprise Server Q1 2026 Maintenance Releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>New maintenance releases for MariaDB Enterprise Server 11.8.6-3, 11.4.10-7, and 10.6.25-21 are now available. Download Now MariaDB Enterprise Server is an enhanced, hardened and secured version of MariaDB Community Server that delivers enterprise reliability, stability and long-term support as well as greater operational efficiency when it comes to&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/mariadb-enterprise-server-q1-2026-maintenance-releases/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-enterprise-server-q1-2026-maintenance-releases/">MariaDB Enterprise Server Q1 2026 Maintenance Releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Innovation: InnoDB-Based Binary Log</title>
      <link>https://mariadb.org/mariadb-innovation-innodb-based-binary-log/</link>
      <pubDate>Tue, 17 Mar 2026 12:45:28 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>I am starting a new series on what makes MariaDB Server distinct from MySQL, highlighting innovations that make the difference.<br />
MariaDB 12.3 introduces a new binary log implementation that stores binlog events directly in InnoDB-managed tablespaces rather than in separate flat files on disk. …<br />
Continue reading \"MariaDB Innovation: InnoDB-Based Binary Log\"<br />
The post MariaDB Innovation: InnoDB-Based Binary Log appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-innovation-innodb-based-binary-log/">MariaDB Innovation: InnoDB-Based Binary Log</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I am starting a new series on what makes MariaDB Server distinct from MySQL, highlighting innovations that make the difference.<br>
<a href="https://mariadb.com/docs/server/server-management/server-monitoring-logs/binary-log/innodb-based-binary-log">MariaDB 12.3 introduces a new binary log</a> implementation that stores binlog events directly in InnoDB-managed tablespaces rather than in separate flat files on disk. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-innovation-innodb-based-binary-log/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Innovation: InnoDB-Based Binary Log&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-innovation-innodb-based-binary-log/">MariaDB Innovation: InnoDB-Based Binary Log</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-innovation-innodb-based-binary-log/">MariaDB Innovation: InnoDB-Based Binary Log</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Where does the Community run most MariaDB in production – results from the poll</title>
      <link>https://mariadb.org/where-does-the-community-run-most-mariadb-in-production-results-from-the-poll/</link>
      <pubDate>Fri, 13 Mar 2026 07:29:40 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We recently asked the MariaDB community a simple question:<br />
Where do you run MariaDB most in production?<br />
The responses give a useful snapshot of how MariaDB is deployed today across our community:<br />
The big takeaway: MariaDB remains strongly infrastructure-aware<br />
The clearest signal from this poll is that MariaDB is still most commonly run in environments where users want a high degree of control over the underlying infrastructure. …<br />
Continue reading \"Where does the Community run most MariaDB in production – results from the poll\"<br />
The post Where does the Community run most MariaDB in production – results from the poll appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/where-does-the-community-run-most-mariadb-in-production-results-from-the-poll/">Where does the Community run most MariaDB in production – results from the poll</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We recently asked the MariaDB community a simple question:<br>
<a href="https://mariadb.org/poll/previous/poll-where-do-you-run-mariadb-most-in-production/">Where do you run MariaDB most in production?</a><br>
The responses give a useful snapshot of how MariaDB is deployed today across our community:<br>
The big takeaway: MariaDB remains strongly infrastructure-aware<a id="the-big-takeaway-mariadb-remains-strongly-infrastructure-aware"></a><br>
The clearest signal from this poll is that MariaDB is still most commonly run in environments where users want a high degree of control over the underlying infrastructure. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/where-does-the-community-run-most-mariadb-in-production-results-from-the-poll/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Where does the Community run most MariaDB in production &ndash; results from the poll&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/where-does-the-community-run-most-mariadb-in-production-results-from-the-poll/">Where does the Community run most MariaDB in production &ndash; results from the poll</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/where-does-the-community-run-most-mariadb-in-production-results-from-the-poll/">Where does the Community run most MariaDB in production – results from the poll</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Improving MariaDB Observability with OpenSearch and Grafana</title>
      <link>https://mariadb.org/improving-mariadb-observability-with-opensearch-and-grafana/</link>
      <pubDate>Thu, 12 Mar 2026 09:33:53 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>When dealing with queries in MariaDB, there are several approaches, such as the general query log, the slow query log, and the performance_schema. …<br />
Continue reading \"Improving MariaDB Observability with OpenSearch and Grafana\"<br />
The post Improving MariaDB Observability with OpenSearch and Grafana appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/improving-mariadb-observability-with-opensearch-and-grafana/">Improving MariaDB Observability with OpenSearch and Grafana</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>When dealing with queries in MariaDB, there are several approaches, such as the <a href="https://mariadb.com/docs/server/server-management/server-monitoring-logs/general-query-log">general query log</a>, the <a href="https://mariadb.com/docs/server/server-management/server-monitoring-logs/slow-query-log">slow query log</a>, and the <a href="https://mariadb.com/docs/server/reference/system-tables/performance-schema">performance_schema</a>. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/improving-mariadb-observability-with-opensearch-and-grafana/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Improving MariaDB Observability with OpenSearch and Grafana&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/improving-mariadb-observability-with-opensearch-and-grafana/">Improving MariaDB Observability with OpenSearch and Grafana</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>What’s New in MariaDB Enterprise Kubernetes Operator 26.03</title>
      <link>https://mariadb.com/resources/blog/whats-new-in-mariadb-enterprise-kubernetes-operator-26-03/</link>
      <pubDate>Wed, 11 Mar 2026 18:00:06 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Running production databases on Kubernetes means planning for the moments when things go wrong: a bad deployment, accidental data changes, […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/whats-new-in-mariadb-enterprise-kubernetes-operator-26-03/">What’s New in MariaDB Enterprise Kubernetes Operator 26.03</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Running production databases on Kubernetes means planning for the moments when things go wrong: a bad deployment, accidental data changes, or infrastructure incidents. In MariaDB Enterprise Kubernetes Operator 26.03, we&rsquo;re focused on strengthening disaster recovery with two major additions: Point-in-time recovery (PITR) for granular restores, and native Microsoft Azure Blob Storage support for&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/whats-new-in-mariadb-enterprise-kubernetes-operator-26-03/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/whats-new-in-mariadb-enterprise-kubernetes-operator-26-03/">What’s New in MariaDB Enterprise Kubernetes Operator 26.03</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>When Disaster Strikes</title>
      <link>https://mariadb.com/resources/blog/when-disaster-strikes/</link>
      <pubDate>Wed, 11 Mar 2026 16:24:35 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>On March 1, various AWS datacenters went down, creating havoc in the Middle East. As a result various services in […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/when-disaster-strikes/">When Disaster Strikes</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>On March 1, various AWS datacenters went down, creating havoc in the Middle East. As a result, various services in the region were out of order, from taxi services to banking applications. We also hosted multiple systems in those AWS datacenters, and we faced a wall of red alerts from all the systems and services that were impacted. Still, our customer continued to operate with their web and mobile&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/when-disaster-strikes/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/when-disaster-strikes/">When Disaster Strikes</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Big Vector Search Benchmark: 10 databases comparison</title>
      <link>https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/</link>
      <pubDate>Wed, 11 Mar 2026 15:15:13 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>I have benchmarked MariaDB Vector before, but it was a while ago. Users kept asking about Milvus. New pgvector alternatives were gaining popularity. And I simply wanted to see if MariaDB got any better. …<br />
Continue reading \"Big Vector Search Benchmark: 10 databases comparison\"<br />
The post Big Vector Search Benchmark: 10 databases comparison appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/">Big Vector Search Benchmark: 10 databases comparison</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I have benchmarked <a href="https://mariadb.com/resources/blog/how-fast-is-mariadb-vector/" target="_blank" rel="noreferrer noopener">MariaDB Vector</a> before, but it was a while ago. Users kept asking about Milvus. New pgvector alternatives were gaining popularity. And I simply wanted to see if MariaDB got any better. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Big Vector Search Benchmark: 10 databases comparison&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/big-vector-search-benchmark-10-databases-comparison/">Big Vector Search Benchmark: 10 databases comparison</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>Active-Active MySQL Group Replication Best Practices</title>
      <link>https://severalnines.com/blog/active-active-mysql-group-replication-best-practices/</link>
      <pubDate>Wed, 11 Mar 2026 12:54:52 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>In MySQL Group Replication (MGR) or Group Replication, an active-active configuration allows multiple group members to accept concurrent write transactions. These writes are coordinated through a consensus-based group communication system (GCS) and validated via write-set certification to preserve global transactional consistency as defined by the Group Replication protocol. active-active mode is designed for write scalability. […]<br />
The post Active-Active MySQL Group Replication Best Practices appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/active-active-mysql-group-replication-best-practices/">Active-Active MySQL Group Replication Best Practices</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In MySQL Group Replication (MGR), an active-active configuration allows multiple group members to accept concurrent write transactions. These writes are coordinated through a consensus-based group communication system (GCS) and validated via write-set certification to preserve global transactional consistency as defined by the Group Replication protocol. Active-active mode is often chosen with write scalability in mind.</p>
<p>In this mode, all Group Replication nodes are in the PRIMARY state, allowing each of them to accept read and write traffic. Data modifications are written to the node that receives them and replicated synchronously across the cluster. When a transaction conflict is detected, certification resolves it automatically by rolling back the conflicting transaction. Although powerful, this mode is not intended for <strong>unlimited write scalability</strong>.</p>
<p>Group Replication&rsquo;s scalability is partially dependent on low write conflicts. High contention leads to a surge in rollbacks or to saturation, ultimately causing temporary system unavailability due to lock-ups. Fundamentally, this mode means there are no passive standby nodes; it does not mean:</p>
<ul class="wp-block-list">
<li>Unlimited write scalability</li>
<li>Lock-free writes</li>
<li>No coordination overhead</li>
<li>No single-writer behavior internally</li>
</ul>
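<p>To confirm that a group is actually running in multi-primary (active-active) mode, you can check the relevant system variable and the member roles. A minimal sketch, assuming you are connected to any group member:</p>
<pre class="wp-block-code"><code>-- OFF means multi-primary (active-active); ON means single-primary
SHOW VARIABLES LIKE 'group_replication_single_primary_mode';

-- In multi-primary mode, every ONLINE member should report the PRIMARY role
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;</code></pre>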
<p>This blog post will explore what it means to run a Group Replication setup in production and the operational best practices that you need to follow to ensure its performance and stability.</p>
<h2 class="wp-block-heading" id="h-note-on-how-mysql-group-replication-manages-communication"><strong>Note on how MySQL Group Replication manages communication</strong><a class="anchor-link" id="note-on-how-mysql-group-replication-manages-communication"></a></h2>
<p>Let&rsquo;s first take a moment to understand how communication actually works within MySQL Group Replication. The communication layer is the Group Communication System (GCS), which serves as a high-level abstraction responsible for group membership management and total-order message delivery. Its engine is XCom, which implements a Paxos-based consensus protocol to ensure that all members of the group receive the same transactions in the same order. This separation of concerns allows MySQL Group Replication to evolve the underlying communication engine without changing the replication semantics exposed to the server layer.</p>
<p>Once you understand how MySQL Group Replication enforces consistency, coordinates writes, and handles failures across the group, it&rsquo;s easier to reason about which use cases it is ideal for.</p>
<h2 class="wp-block-heading" id="h-ideal-use-cases-for-active-active-mysql-group-replication"><strong>Ideal use cases for active-active MySQL Group Replication</strong><a class="anchor-link" id="ideal-use-cases-for-active-active-mysql-group-replication"></a></h2>
<ul class="wp-block-list">
<li>High Availability (HA) Systems: Critical applications that cannot afford the 10&ndash;30 seconds of downtime typically required for a leader election in single-primary modes.</li>
<li>Low-Conflict Workloads: Applications that write to different parts of the database, especially architectures that use shards. These perform best, as they avoid the &ldquo;certification&rdquo; failures that occur when two nodes try to update the same record.</li>
<li>Geographically Distributed Reads: While writes are synchronous and can be slow over long distances, having multiple &ldquo;active&rdquo; nodes allows local users to perform reads and writes with lower initial connection latency.</li>
</ul>
<p>Let&rsquo;s now go over the best practices for a MySQL Group Replication deployment&nbsp; setup.</p>
<h2 class="wp-block-heading" id="h-active-active-cluster-best-practices"><strong>Active-active cluster</strong> <strong>best practices</strong><a class="anchor-link" id="active-active-cluster-best-practices"></a></h2>
<h3 class="wp-block-heading" id="h-determining-the-number-of-database-member-nodes-in-the-cluster"><strong>Determining the number of database member nodes in the cluster</strong><a class="anchor-link" id="determining-the-number-of-database-member-nodes-in-the-cluster"></a></h3>
<p>The number of instances should be determined by expected failure scenarios rather than capacity requirements. According to MySQL documentation, Group Replication requires a majority of members to commit transactions, making quorum size critical to availability.</p>
<p>Network partitions and node failures are inevitable. Using an odd number of instances ensures that a majority can still be formed during partial outages. Choosing an odd number of members improves write availability and aligns with MySQL Group Replication&rsquo;s quorum model. For example, a five-node cluster can continue processing writes even if two nodes become unreachable, while an evenly sized cluster risks entering a write-blocked state during a symmetric split.</p>
<h3 class="wp-block-heading" id="h-splitting-read-and-write-traffic"><strong>Splitting read and write traffic</strong><a class="anchor-link" id="splitting-read-and-write-traffic"></a></h3>
<p>In an active-active MySQL Group Replication setup, write operations place significantly more stress on the system than reads, especially when they result in heavy disk activity. Writes consume CPU, generate redo and binary logs, trigger replication traffic, and must be applied consistently across the group, amplifying their impact across the cluster.</p>
<p>Separating write traffic helps prevent sustained write workloads from overwhelming the cluster. By directing writes to nodes with sufficient capacity and lower contention while allowing others to focus on serving reads, the cluster maintains more stable performance, reduces write-related bottlenecks, and scales read workloads more effectively.</p>
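<p>One common way to implement this split is with query rules in a proxy layer such as ProxySQL (covered in the next section). A minimal, illustrative sketch, where hostgroups 10 and 20 are assumed to hold the designated writer and reader members respectively:</p>
<pre class="wp-block-code"><code>-- ProxySQL admin interface: route SELECT ... FOR UPDATE to writers, other SELECTs to readers
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT .* FOR UPDATE', 10, 1),
       (2, 1, '^SELECT', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;</code></pre>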
<h3 class="wp-block-heading" id="h-connection-pooling"><strong>Connection pooling</strong><a class="anchor-link" id="connection-pooling"></a></h3>
<p>Frequent connection creation and teardown can quickly become a bottleneck and place unnecessary stress on the cluster. Maintaining a pool of reusable connections helps stabilize workload distribution and prevents spikes in CPU and memory usage caused by excessive connection overhead.</p>
<p>Using proxies such as MySQL Router or ProxySQL allows applications to reuse existing connections while intelligently routing traffic across group members, reducing connection overhead, improving response times, and helping the cluster handle fluctuating workloads more predictably.</p>
<h3 class="wp-block-heading" id="h-error-handling"><strong>Error handling</strong><a class="anchor-link" id="error-handling"></a></h3>
<p>When using active-active MySQL Group Replication, applications must be prepared to handle transient failures. Network interruptions, node restarts, or membership changes can temporarily cause connection errors or transaction failures, even when the cluster itself is healthy.</p>
<p>Implementing controlled retry and failback handling at the application layer allows temporary disruptions to resolve before users are impacted. Likewise, <strong>well-designed</strong> error handling improves application reliability, protects data consistency, and gives temporary replication events the opportunity to resolve before they escalate into visible outages. Short-lived connection or transaction failures can often succeed on retry once the system stabilizes, especially during membership changes or brief network interruptions.</p>
<h3 class="wp-block-heading" id="h-transaction-size"><strong>Transaction size</strong><a class="anchor-link" id="transaction-size"></a></h3>
<p>Before deploying MySQL Group Replication in production, understand the transaction sizes generated by your application. MGR enforces a maximum transaction size via the <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_transaction_size_limit">group_replication_transaction_size_limit</a> parameter, which directly affects replication latency and stability.</p>
<p>Small transactions replicate efficiently, making them well suited to OLTP workloads, while large batch operations increase memory usage, network traffic, and pressure on the members applying the transactions. Setting the limit too low can cause valid transactions to fail, while setting it too high can lead to replication lag or resource exhaustion.</p>
<p>Instead of raising the limit on memory-constrained instances, split large data changes into smaller batches. <strong>Take note</strong>: this setting must be consistent across all group members.</p>
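<p>The limit can be checked and adjusted at runtime. A minimal sketch; the value shown is the server default of 150,000,000 bytes (roughly 143 MiB), so adjust it to your own workload and apply the same value on every member:</p>
<pre class="wp-block-code"><code>-- Check the current limit (in bytes)
SHOW VARIABLES LIKE 'group_replication_transaction_size_limit';

-- Persist a new limit so it survives restarts; repeat on all group members
SET PERSIST group_replication_transaction_size_limit = 150000000;</code></pre>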
<h3 class="wp-block-heading" id="h-strict-consistency-checks"><strong>Strict consistency checks</strong><a class="anchor-link" id="strict-consistency-checks"></a></h3>
<p>MySQL Group Replication provides the <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_enforce_update_everywhere_checks">group_replication_enforce_update_everywhere_checks</a> variable, which is disabled by default. This system variable is a group-wide configuration setting. It must have the same value on all group members, cannot be changed while Group Replication is running, and requires a full reboot of the group (a bootstrap by a server with <code>group_replication_bootstrap_group=ON</code>) in order for the value change to take effect.</p>
<p>When disabled, applications must ensure that conflicting transactions are not executed concurrently on different nodes, which requires thorough testing and strict control over write paths, especially in schemas involving foreign keys, cascading operations, or concurrent modifications across tables. When enabled, statements are checked as follows to ensure their compatibility with multi-primary mode:</p>
<ol class="wp-block-list">
<li>If a transaction is executed under the SERIALIZABLE isolation level, then its commit fails when synchronizing itself with the group.</li>
<li>If a transaction executes against a table that has foreign keys with cascading constraints, then the transaction fails to commit when synchronizing itself with the group.</li>
</ol>
<p>In short, enable it when:</p>
<ul class="wp-block-list">
<li>You want strict consistency checks, AND
<ul class="wp-block-list">
<li>You don&rsquo;t rely on SERIALIZABLE isolation level</li>
<li>Your tables do not rely on foreign key checks with cascading constraints</li>
<li>You prefer failed transactions over application-managed conflict handling</li>
<li>Correctness is more important than write flexibility</li>
</ul>
</li>
</ul>
<p>Disable it when:</p>
<ul class="wp-block-list">
<li>Your schema relies on cascading foreign key constraints</li>
<li>Cascades are part of normal application behavior</li>
<li>You accept responsibility for preventing conflicting writes</li>
<li>Write paths are tightly controlled or serialized at the application level</li>
</ul>
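<p>Because the value must be identical on all members and can only be changed while Group Replication is stopped, enabling it typically looks like the following sketch: stop the plugin on every member, persist the setting, then bootstrap the group again from a single member:</p>
<pre class="wp-block-code"><code>-- On every member, while Group Replication is stopped
STOP GROUP_REPLICATION;
SET PERSIST group_replication_enforce_update_everywhere_checks = ON;

-- On the one member chosen to bootstrap the new group
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- On the remaining members
START GROUP_REPLICATION;</code></pre>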
<h3 class="wp-block-heading" id="h-failure-detection-parameters"><strong>Failure detection parameters</strong><a class="anchor-link" id="failure-detection-parameters"></a></h3>
<p>There are several failure detection and handling parameters that you must adjust to define your application&rsquo;s failure tolerance:</p>
<ul class="wp-block-list">
<li><code>group_replication_member_expel_timeout</code>: This variable controls how long a suspected member is given to become reachable again before it is expelled from the group. The default in the latest version (8.4 as of this writing) is 5 seconds. Group Replication first raises a suspicion after roughly 5 seconds without messages, then waits a further <code>group_replication_member_expel_timeout</code> seconds before expelling the member.</li>
</ul>
<p>During the waiting period, the suspected node is listed as UNREACHABLE but is still part of the group view. If it resumes communication before expulsion, it rejoins without operator intervention. Whether you need to tune this variable depends on your environment.</p>
<p><strong>When to tune: </strong>If you have intermittent network blips or slow links, it makes sense to increase the default value to avoid unnecessary expulsions. Decrease only if you prefer faster failure detection, e.g. frequent real outages and stable networks, setting it to 0 only for immediate post-detection expulsion.</p>
<ul class="wp-block-list">
<li><code>group_replication_autorejoin_tries</code>: Set this to the desired number of automatic rejoin attempts a member makes after being expelled or losing quorum. Set to 3 attempts (with roughly 5 minutes of waiting time between attempts) by default, the member will try to rejoin the group automatically after expulsion or network isolation. If all attempts fail, it stops and follows the exit action. During and between auto-rejoin attempts, a DB instance remains in read-only mode and does not accept writes, thereby increasing the likelihood of stale reads over time.</li>
</ul>
<p><strong>When to tune:</strong> Increase this variable for environments with frequently long, but temporary network partitions. Decrease if you want to restrict rejoin attempts, or expedite identifying DB instances that require manual intervention. Otherwise, set it to 0 if you want to handle rejoining manually or your applications cannot tolerate the possibility of stale reads for any period of time. <strong>Common practice in highly available clusters:</strong> keep several tries to allow network recovery without admin intervention.</p>
<ul class="wp-block-list">
<li><code>group_replication_unreachable_majority_timeout</code>: Timeout used when a member is in the minority (cannot reach a majority or establish a quorum) before&nbsp; declaring a lost quorum. Set to 0 by default, a member node enters a special unreachable majority state if it cannot contact a majority of the group. After this timeout expires, actions such as expulsion or exit actions are applied. Setting this value higher than 0 helps to prevent a minority partition from running indefinitely or making unsafe decisions.</li>
</ul>
<p>Otherwise, when the defined timeout is reached, all pending transactions on the minority are rolled back, and the DB instances in the minority partition are moved to the ERROR state. From there, your application can perform error-handling as needed.</p>
<p><strong>When to tune: </strong>Increase it for deployments where temporary connectivity loss to a majority is expected but resolves soon. Otherwise, decrease for faster detection of actual partition and to avoid split-brain risk.</p>
<ul class="wp-block-list">
<li><code>group_replication_exit_state_action</code>: Defines how a member behaves when the server leaves the group unintentionally, for example, after encountering an applier error, or in the case of a majority loss, or when another member expels it due to a suspicion time out. Note that an expelled group member does not know that it was expelled until it reconnects to the group, so the specified action is only taken if the member manages to reconnect, or if it raises a suspicion on itself and self-expels. Current values available for you to set are ABORT_SERVER, OFFLINE_MODE, and READ_ONLY.&nbsp;</li>
</ul>
<p><strong>When to tune: </strong>If you need strict consistency, choose shutdown on failure (ABORT_SERVER) to avoid stale reads or split-brain risk. For read-heavy clusters where availability matters most, prefer the read-only or offline modes.</p>
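<p>As a rough starting point, these parameters can be adjusted at runtime with SET PERSIST. The values below are purely illustrative (a slightly more tolerant expel timeout for networks with occasional blips, a bounded unreachable-majority timeout, and defaults otherwise); tune them against your own failure scenarios:</p>
<pre class="wp-block-code"><code>SET PERSIST group_replication_member_expel_timeout = 10;          -- wait longer before expelling a suspect
SET PERSIST group_replication_autorejoin_tries = 3;               -- default: three automatic rejoin attempts
SET PERSIST group_replication_unreachable_majority_timeout = 30;  -- give up after 30s in a minority partition
SET PERSIST group_replication_exit_state_action = 'READ_ONLY';    -- default exit action: stay up, reads only</code></pre>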
<h3 class="wp-block-heading" id="h-flow-control"><strong>Flow control</strong><a class="anchor-link" id="flow-control"></a></h3>
<p>In MGR, a transaction is only finalized once a majority of members agree on its global order. In an active-active setup, this coordination is sensitive to uneven performance: fast writers can easily outrun slower members, causing replication lag and in-memory backlog.</p>
<p>Flow control is the mechanism that keeps the group balanced. It monitors how far members fall behind in both transaction certification and apply phases, and temporarily throttles write throughput across all primaries when predefined limits are exceeded. The key controls are <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_flow_control_certifier_threshold">group_replication_flow_control_certifier_threshold</a> and <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_flow_control_applier_threshold">group_replication_flow_control_applier_threshold,</a> which define how much backlog the group is willing to tolerate before throttling.</p>
<p>Throughput recovery is governed by quota-based settings such as <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_flow_control_max_quota">group_replication_flow_control_max_quota</a>, which caps how quickly write capacity is restored once lag subsides. The overall behavior is enabled or disabled via <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_flow_control_mode">group_replication_flow_control_mode</a>, though disabling flow control is generally discouraged in production due to the increased risk of memory exhaustion and instability.</p>
<p>Rather than turning flow control off, a better strategy is to tune these thresholds alongside sufficient parallel applier capacity, so throttling only occurs under real pressure. Continuous monitoring of replication lag and backlog is essential, as optimal values depend heavily on transaction size, write bursts, and workload patterns in active-active environments.</p>
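<p>A minimal tuning sketch using these variables (the values shown are the server defaults; raise or lower the thresholds based on the backlog your workload can tolerate rather than disabling flow control outright):</p>
<pre class="wp-block-code"><code>SET PERSIST group_replication_flow_control_mode = 'QUOTA';               -- default; DISABLED turns throttling off
SET PERSIST group_replication_flow_control_certifier_threshold = 25000;  -- certification backlog before throttling
SET PERSIST group_replication_flow_control_applier_threshold = 25000;    -- apply backlog before throttling
SET PERSIST group_replication_flow_control_max_quota = 0;                -- 0 = no fixed cap on the quota</code></pre>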
<h3 class="wp-block-heading" id="h-transaction-consistency"><strong>Transaction consistency</strong><a class="anchor-link" id="transaction-consistency"></a></h3>
<p>MySQL Group Replication provides a parameter <a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-system-variables.html#sysvar_group_replication_consistency">group_replication_consistency</a> on which you can set values of the following options: EVENTUAL, BEFORE, AFTER, and BEFORE_AND_AFTER.</p>
<p>BEFORE is usually the best balance for production environments: it offers consistency that is strong enough for correctness with synchronization that is light enough for good performance. Before executing a transaction, each member waits until it has applied all transactions that were already globally ordered, ensuring reads and writes are based on a reasonably current view of the group. This helps prevent obvious stale reads, reduces write-write conflicts, avoids unnecessary waiting on future transactions, and keeps latency acceptable under normal load.</p>
<p>It is also highly recommended to monitor the replication status of your MGR cluster nodes. You might use <code>performance_schema</code> tables to monitor the health of your active-active cluster.</p>
<ul class="wp-block-list">
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-replication-group-members.html">performance_schema.replication_group_members</a>: Provides the status of each DB instance that is part of the cluster.</li>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-replication-group-member-stats.html">performance_schema.replication_group_member_stats</a>: Provides cluster-level information related to certification, as well as statistics for the transactions received and originated by each DB instance in the cluster.</li>
<li><a href="https://dev.mysql.com/doc/mysql-perfschema-excerpt/5.7/en/performance-schema-replication-connection-status-table.html">performance_schema.replication_connection_status</a>: Provides the current status of the replication I/O thread that handles the replica&rsquo;s connection to the source DB instance.</li>
</ul>
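<p>A short sketch combining both recommendations: setting the group consistency level and checking member health and certification statistics from performance_schema:</p>
<pre class="wp-block-code"><code>-- Consistency level (can also be set per session)
SET PERSIST group_replication_consistency = 'BEFORE';

-- Member status and roles
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- Certification queue depth and conflict counters per member
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_CONFLICTS_DETECTED,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;</code></pre>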
<h2 class="wp-block-heading" id="h-deploying-a-mysql-group-replication-using-clustercontrol"><strong>Deploying a MySQL Group Replication using ClusterControl</strong><a class="anchor-link" id="deploying-a-mysql-group-replication-using-clustercontrol"></a></h2>
<p>By default, deploying through ClusterControl will set up an active-active MySQL Group Replication cluster, meaning all nodes in the cluster are available to receive write requests. Below are the steps to deploy using ClusterControl.</p>
<p>First, log in to ClusterControl with your administrator username and password. Once logged in, click the <em>Deploy a cluster</em> button in the right corner of the UI. This launches the deployment wizard, starting with a create/import cluster prompt.</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="840" src="https://severalnines.com/wp-content/uploads/2025/09/CC_db_deployment_wizard-creation_type_selection-1024x840.png" alt="" class="wp-image-41547"></figure>
<p>Choose <em>Create a database cluster</em>. Next, choose which cluster to deploy by selecting <em>MySQL Group Replication</em> in the database drop-down section. As of this writing, Oracle is the only supported vendor, with versions 8.0 and 8.4 available &mdash; see the screenshot below:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="835" src="https://severalnines.com/wp-content/uploads/2026/02/cc_db_deployment_wizard-database_selection-mysql_group_replication-1024x835.png" alt="" class="wp-image-42713"></figure>
<p>After you hit the <em>Continue</em> button, you will start the straightforward deployment workflow in earnest. You can also follow our user guide based on our documentation by clicking <a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/create-database-cluster/#mysql-group-replication">here</a>.</p>
<p>The final step is to review your deployment configuration as illustrated below; if all looks good, hit the <em>Finish</em> button to initiate the deployment and ClusterControl will do the rest.</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="979" height="1024" src="https://severalnines.com/wp-content/uploads/2026/02/cc_db_deployment_wizard-deployment_preview-mysql_group_replication-979x1024.png" alt="" class="wp-image-42716"></figure>
<p>Once started, you can view the stepwise deployment process logs as illustrated below:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="930" src="https://severalnines.com/wp-content/uploads/2026/02/cc_ui-job_progress_detail-mgr_deployment-1024x930.png" alt="" class="wp-image-42714"></figure>
<p>Otherwise, you can view the deployment process&rsquo;s progress bar in the UI&rsquo;s foreground:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="179" src="https://severalnines.com/wp-content/uploads/2026/02/cc_ui-db_deployment_progress-mgr-1024x179.png" alt="" class="wp-image-42715"></figure>
<p>When the deployment workflow finishes, the cluster will be shown within your <em>Clusters</em> tab &mdash; clicking the cluster&rsquo;s name, you can view its dashboards, topology, node list, performance graphs, backups, logs, and more. Monitoring your cluster with dashboards and other ClusterControl features makes analyzing and inspecting your cluster&rsquo;s health straightforward.</p>
<h2 class="wp-block-heading"><strong>Conclusion</strong><a class="anchor-link" id="conclusion"></a></h2>
<p>MySQL Group Replication is a strong choice for high availability and multi-writer flexibility, but <strong>it is not</strong> a path to unlimited write scalability. It replaces the simplicity of a single-writer model with the complexity of distributed coordination.</p>
<p>A successful active-active deployment depends on deliberate design, on how familiar you are with Group Replication, and on your application&rsquo;s behavior, data access patterns, operational practices, and the underlying distributed systems.</p>
<p>It may be best to start with single-primary Group Replication and move to active-active once you fully understand your write patterns, have designed the application for conflict handling, and have tested failure and conflict scenarios extensively. When applied to the right use cases and run deliberately, active-active MySQL Group Replication delivers resilient, predictable production performance.</p>
<p>Ready to implement an active-active MGR cluster efficiently and reliably in any environment?</p>
<h2 class="wp-block-heading" id="h-install-clustercontrol-in-10-minutes-nbsp-free-30-day-nbsp-enterprise-trial-included"><strong>Install ClusterControl in 10-minutes.&nbsp;Free 30-day&nbsp;Enterprise trial included!</strong><a class="anchor-link" id="install-clustercontrol-in-10-minutes-free-30-day-enterprise-trial-included"></a></h2>
<h3 class="wp-block-heading" id="h-script-installation-instructions"><strong>Script installation instructions</strong><a class="anchor-link" id="script-installation-instructions"></a></h3>
<p>The installer script is the simplest way to get ClusterControl up and running. Run it on your chosen host, and it will take care of installing all required packages and dependencies.</p>
<p>Offline environments are supported as well. See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/offline-installation/">Offline Installation</a>&nbsp;guide for more details.</p>
<p>On the ClusterControl server, run the following commands:</p>
<pre class="wp-block-code"><code>wget https://severalnines.com/downloads/cmon/install-cc
chmod +x install-cc</code></pre>
<p>With your install script ready, run the command below. Replace&nbsp;<code>S9S_CMON_PASSWORD</code>&nbsp;and&nbsp;<code>S9S_ROOT_PASSWORD</code>&nbsp;placeholders with your choice password, or remove the environment variables from the command to interactively set the passwords. If you have multiple network interface cards, assign one IP address for the&nbsp;<code>HOST</code>&nbsp;variable in the command using&nbsp;<code>HOST=&lt;ip_address&gt;</code>.</p>
<pre class="wp-block-code"><code>S9S_CMON_PASSWORD=&lt;your_password&gt; S9S_ROOT_PASSWORD=&lt;your_password&gt; HOST=&lt;ip_address&gt; ./install-cc # as root or sudo user</code></pre>
<p>After the installation is complete, open a web browser, navigate to&nbsp;<code>https://&lt;ClusterControl_host&gt;/</code>, and create the first admin user by entering a username (note that &ldquo;admin&rdquo; is reserved) and a password on the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/quickstart/#step-2-create-the-first-admin-user">welcome page</a>. Once you&rsquo;re in, you can&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/create-database-cluster/">deploy</a>&nbsp;a new database cluster or&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/import-database-cluster/">import</a>&nbsp;an existing one.</p>
<p>The installer script supports a range of environment variables for advanced setup. You can define them using export or by prefixing the install command.</p>
<p>See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#environment-variables">list of supported variables</a>&nbsp;and&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#example-use-cases">example use cases</a>&nbsp;to tailor your installation.</p>
<h4 class="wp-block-heading" id="h-other-installation-options"><strong>Other installation options</strong></h4>
<p><strong>Helm Chart</strong></p>
<p>Deploy ClusterControl on Kubernetes using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#helm-chart">official Helm chart</a>.</p>
<p><strong>Ansible Role</strong></p>
<p>Automate installation and configuration using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#ansible-role">Ansible playbooks</a>.</p>
<p><strong>Puppet Module</strong></p>
<p>Manage your ClusterControl deployment with the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#puppet-module">Puppet module</a>.</p>
<h4 class="wp-block-heading" id="h-clustercontrol-on-marketplaces"><strong>ClusterControl on marketplaces</strong></h4>
<p>Prefer to launch ClusterControl directly from the cloud? It&rsquo;s available on these platforms:</p>
<ul class="wp-block-list">
<li><a href="https://marketplace.digitalocean.com/apps/clustercontrol">DigitalOcean Marketplace</a></li>
<li><a href="https://gridscale.io/en/marketplace">gridscale.io Marketplace</a></li>
<li><a href="https://www.vultr.com/marketplace/apps/clustercontrol/">Vultr Marketplace</a></li>
<li><a href="https://www.linode.com/marketplace/apps/severalnines/clustercontrol/">Linode Marketplace</a></li>
<li><a href="https://console.cloud.google.com/marketplace/product/severalnines-public/clustercontrol">Google Cloud Platform</a></li>
</ul>
<p>The post <a href="https://severalnines.com/blog/active-active-mysql-group-replication-best-practices/">Active-Active MySQL Group Replication Best Practices</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/active-active-mysql-group-replication-best-practices/">Active-Active MySQL Group Replication Best Practices</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Running pgBackRest with pg_tde: A Practical Percona Walkthrough</title>
      <link>https://percona.community/blog/2026/03/10/running-pgbackrest-with-pg_tde-a-practical-percona-walkthrough/</link>
      <pubDate>Tue, 10 Mar 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>Not every PostgreSQL installation requires encryption at rest. However, for organizations mandating strict data protection and privacy standards, it is often non-negotiable. When security policies are this rigorous, you need a strategy that protects your data without sacrificing recoverability.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/10/running-pgbackrest-with-pg_tde-a-practical-percona-walkthrough/">Running pgBackRest with pg_tde: A Practical Percona Walkthrough</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Not every PostgreSQL installation requires encryption at rest. However, for organizations mandating strict data protection and privacy standards, it is often non-negotiable. When security policies are this rigorous, you need a strategy that protects your data without sacrificing recoverability.</p>
<p>While Transparent Data Encryption (TDE) successfully locks down your data at rest, it raises a critical operational question: will your backup and restore workflow continue to work correctly with encrypted data? When deploying Transparent Data Encryption, it is essential to validate that pgBackRest can reliably back up and restore encrypted clusters.</p>
<p>This post walks through a complete, step-by-step configuration of pg_tde with pgBackRest on Debian/Ubuntu. We will explore how to optimize your backup performance, secure your repository, and, most importantly, verify that your encrypted backup restores work exactly as expected.</p>
<h2 id="what-is-pg_tde">What is pg_tde?<a class="anchor-link" id="what-is-pg_tde"></a></h2>
<p>Percona&rsquo;s solution for transparent data encryption (<a href="https://docs.percona.com/pg-tde/index.html" target="_blank" rel="noopener noreferrer">pg_tde</a>) is an open source, community driven extension that provides Transparent Data Encryption (TDE) for PostgreSQL. This extension allows data to be encrypted at the storage level without affecting application behavior.</p>
<p>Unlike full disk encryption, which exposes data once the system boots, TDE ensures that the actual database files remain encrypted at the file system level. This protects your data, dumps, and backups even if the operating system is compromised. Currently, it is bundled with Percona Server for PostgreSQL and available in Percona Distribution for PostgreSQL 17+.</p>
<p>Recent Percona releases make this combination more practical than ever. With WAL encryption now ready for production use, we need a backup strategy that respects data security. In this walkthrough, we will demonstrate how to pair it with pgBackRest to ensure fully recoverable, encrypted backups.</p>
<h2 id="the-use-case">The Use Case<a class="anchor-link" id="the-use-case"></a></h2>
<p>Imagine a team that wants strong security controls without changing application code. They need:</p>
<ul>
<li>Data files encrypted at rest (tables and WAL)</li>
<li>Backups that are consistent, verifiable, and restorable</li>
<li>A setup that is easy to automate and explain</li>
</ul>
<p>Percona Distribution for PostgreSQL plus pg_tde and pgBackRest closes the gaps where needed: pg_tde takes care of encryption, pgBackRest provides flexible backup/restore capabilities.</p>
<h2 id="a-quick-note-on-compatibility">A Quick Note on Compatibility<a class="anchor-link" id="a-quick-note-on-compatibility"></a></h2>
<p>At this moment pg_tde cannot yet be used with Community PostgreSQL, as pg_tde relies on specific hooks in the PostgreSQL core. Percona Server for PostgreSQL includes these necessary core modifications, which is why we validate this setup using the Percona Distribution for PostgreSQL.</p>
<p>While pgBackRest is fully capable of managing TDE enabled clusters, there are specific constraints you must respect to ensure data safety:</p>
<ul>
<li>No Asynchronous Archiving: pgBackRest asynchronous archiving is not supported with encrypted WALs. You must configure your archive command to handle WALs synchronously.</li>
<li>Restore Wrappers: Standard restore commands will not work for encrypted WALs. You must use the pg_tde_restore_encrypt utility to wrap your restore process.</li>
</ul>
<p>This guide currently focuses on pgBackRest because it is the backup tool that has been tested and validated with pg_tde by the pg_tde community at this time.<br>
Other backup tools may also be viable and we are open to collaborating with other tool maintainers and their communities on a shared effort to validate and support pg_tde.</p>
<h2 id="what-you-will-build">What You Will Build<a class="anchor-link" id="what-you-will-build"></a></h2>
<p>By the end of this post you will have:</p>
<ul>
<li>Percona Distribution for PostgreSQL installed with pg_tde and pgBackRest</li>
<li>A key directory and key providers created</li>
<li>pg_tde enabled in PostgreSQL</li>
<li>Encrypt tables and indexes with pg_tde (<a href="https://docs.percona.com/pg-tde/test.html" target="_blank" rel="noopener noreferrer">docs</a>)</li>
<li><a href="https://docs.percona.com/pg-tde/wal-encryption.html" target="_blank" rel="noopener noreferrer">WAL encryption</a> enabled</li>
<li>A pgBackRest stanza configured and a full backup completed</li>
<li>Verification that encrypted data is not readable on disk</li>
</ul>
<h2 id="prerequisites">Prerequisites<a class="anchor-link" id="prerequisites"></a></h2>
<ul>
<li>A host (or VM) on Debian/Ubuntu</li>
<li>Network access to Percona repositories</li>
<li>Root/Sudo: You need sudo access for installing packages and editing system configuration files in <code>/etc</code></li>
<li>Postgres User: All database commands (psql, pgbackrest) run as the postgres system user</li>
<li>The postgres user is in the sudoers list so it can run sudo commands</li>
</ul>
<h2 id="step-1-install-percona-packages">Step 1: Install Percona Packages<a class="anchor-link" id="step-1-install-percona-packages"></a></h2>
<p>We begin by installing the Percona release repository and enabling the correct PostgreSQL distribution. See the <a href="https://docs.percona.com/postgresql/18/installing.html" target="_blank" rel="noopener noreferrer">Percona Distribution for PostgreSQL installation guide</a> for full details.</p>
<h3 id="debian--ubuntu">Debian / Ubuntu<a class="anchor-link" id="debian-ubuntu"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-0" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-0">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Install repo helper</span>
</span></span><span class="line"><span class="cl">sudo apt-get update
</span></span><span class="line"><span class="cl">sudo apt-get install -y wget gnupg2 lsb-release curl
</span></span><span class="line"><span class="cl">wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
</span></span><span class="line"><span class="cl">sudo dpkg -i percona-release_latest.generic_all.deb
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Enable the repository for the major version selected</span>
</span></span><span class="line"><span class="cl">sudo percona-release setup ppg-18
</span></span><span class="line"><span class="cl">sudo apt-get update
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Install the server, tde extension, and pgbackrest</span>
</span></span><span class="line"><span class="cl">sudo apt-get install -y percona-postgresql-18 percona-pg-tde18 percona-pgbackrest
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Verify Installation</span>
</span></span><span class="line"><span class="cl">psql --version</span></span></code></pre>
</div>
</div>
</div>
<h2 id="step-2-enable-pg_tde-and-create-keys">Step 2: Enable pg_tde and Create Keys<a class="anchor-link" id="step-2-enable-pg_tde-and-create-keys"></a></h2>
<h3 id="21-configure-shared_preload_libraries">2.1 Configure shared_preload_libraries<a class="anchor-link" id="2-1-configure-shared_preload_libraries"></a></h3>
<p>Before we can use any of the encryption features, PostgreSQL needs to load the pg_tde library into memory at startup. You can do this by adding it to <code>shared_preload_libraries</code>.</p>
<p>You have two ways to handle this: the SQL way or the classic config file way.</p>
<h4 id="option-a-sql-way-recommended">Option A: SQL way (recommended)</h4>
<p>The fastest way is using <code>ALTER SYSTEM</code>. This saves you from hunting through your file system for the right config file.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Set the library and restart to apply changes</span>
</span></span><span class="line"><span class="cl">psql -c <span class="s2">"ALTER SYSTEM SET shared_preload_libraries = 'pg_tde';"</span>
</span></span><span class="line"><span class="cl">sudo systemctl restart postgresql</span></span></code></pre>
</div>
</div>
</div>
<h4 id="option-b-manual-config-edit">Option B: Manual config edit</h4>
<p>If you prefer managing your config files manually, find your <code>postgresql.conf</code> and add the extension there.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-2" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-2">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Locate the config file</span>
</span></span><span class="line"><span class="cl"><span class="nv">PG_CONF</span><span class="o">=</span><span class="k">$(</span>psql -t -P <span class="nv">format</span><span class="o">=</span>unaligned -c <span class="s2">"show config_file;"</span><span class="k">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Open that file and find the 'shared_preload_libraries' line:</span>
</span></span><span class="line"><span class="cl"><span class="c1"># shared_preload_libraries = 'pg_tde'</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Restart PostgreSQL to load the library</span>
</span></span><span class="line"><span class="cl">sudo systemctl restart postgresql</span></span></code></pre>
</div>
</div>
</div>
<p>Note: Regardless of which method you choose, a full restart of the PostgreSQL service is required. A simple reload won&rsquo;t work for shared libraries.</p>
<h3 id="22-create-the-key-provider">2.2 Create the Key Provider<a class="anchor-link" id="2-2-create-the-key-provider"></a></h3>
<p>Now that the extension is loaded, we need to tell pg_tde where to store its encryption keys.</p>
<p>For this tutorial, we will use the File Provider (storing keys in a local file). In production environments, storing encryption keys locally on the PostgreSQL server can introduce security risks. To enhance security, pg_tde supports integration with external Key Management Systems (<a href="https://docs.percona.com/pg-tde/global-key-provider-configuration/overview.html" target="_blank" rel="noopener noreferrer">KMS</a>) through a Global Key Provider interface.</p>
<blockquote>
<p>Note: While key files may be acceptable for local or testing environments, KMS integration is the recommended approach for production deployments.</p>
</blockquote>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-3" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-3">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 1. Create a secure directory for the keys</span>
</span></span><span class="line"><span class="cl">sudo mkdir -p /etc/postgresql/keys
</span></span><span class="line"><span class="cl">sudo chown postgres:postgres /etc/postgresql/keys
</span></span><span class="line"><span class="cl">sudo chmod <span class="m">700</span> /etc/postgresql/keys</span></span></code></pre>
</div>
</div>
</div>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-4" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-4">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="c1">-- 2. Connect to PostgreSQL to configure TDE
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">pg_tde</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- 1. Define the global key provider
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_add_global_key_provider_file</span><span class="p">(</span><span class="s1">'global-file-provider'</span><span class="p">,</span><span class="w"> </span><span class="s1">'/etc/postgresql/keys/tde-global.per'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- 2. Create and Set the Principal Key
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_create_key_using_global_key_provider</span><span class="p">(</span><span class="s1">'global-master-key'</span><span class="p">,</span><span class="w"> </span><span class="s1">'global-file-provider'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_set_default_key_using_global_key_provider</span><span class="p">(</span><span class="s1">'global-master-key'</span><span class="p">,</span><span class="w"> </span><span class="s1">'global-file-provider'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- 3. Enable WAL encryption configuration
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">pg_tde</span><span class="p">.</span><span class="n">wal_encrypt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'on'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-5" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-5">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 4. Restart PostgreSQL to apply the encryption settings fully</span>
</span></span><span class="line"><span class="cl">sudo systemctl restart postgresql</span></span></code></pre>
</div>
</div>
</div>
<h2 id="step-3-configure-pgbackrest">Step 3: Configure pgBackRest<a class="anchor-link" id="step-3-configure-pgbackrest"></a></h2>
<p>Next, configure pgBackRest by defining the repository and stanza settings.<br>
Before proceeding, it is important to understand how encryption is handled in this setup. pg_tde encrypts PostgreSQL data files on disk, protecting the live database. However, during WAL archiving, the pg_tde archive helper decrypts WAL records before passing them to pgBackRest. This means backups and archived WAL would be stored unencrypted unless repository encryption is enabled. To ensure backup data remains protected at rest, we enable pgBackRest repository encryption in this configuration. We configure the repository with a cipher type and key. Encryption is performed client side, ensuring data is secure before it is written to the repository.</p>
<blockquote>
<p>Note on Compression: In many pgBackRest deployments, compression is enabled to reduce backup size. However, when using pg_tde, database pages are already encrypted before pgBackRest processes them. Encryption randomizes the data blocks, making traditional compression algorithms such as gzip ineffective. For this reason, compression is disabled to avoid unnecessary CPU overhead.</p>
</blockquote>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-6" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-6">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Create the configuration file</span>
</span></span><span class="line"><span class="cl"><span class="c1"># /etc/pgbackrest.conf</span>
</span></span><span class="line"><span class="cl"><span class="c1"># In pg1-path use data directory path accordingly</span>
</span></span><span class="line"><span class="cl">sudo bash -c <span class="s2">"cat &gt; /etc/pgbackrest.conf &lt;&lt; EOF
</span></span></span><span class="line"><span class="cl"><span class="s2">[demo]
</span></span></span><span class="line"><span class="cl"><span class="s2">pg1-path=/var/lib/postgresql/18/main
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2">[global]
</span></span></span><span class="line"><span class="cl"><span class="s2">repo1-path=/var/lib/pgbackrest
</span></span></span><span class="line"><span class="cl"><span class="s2">repo1-retention-full=2
</span></span></span><span class="line"><span class="cl"><span class="s2">log-level-console=info
</span></span></span><span class="line"><span class="cl"><span class="s2">start-fast=y
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2"># PERFORMANCE: Encrypted data doesn't compress. We save CPU by disabling it.
</span></span></span><span class="line"><span class="cl"><span class="s2">compress-type=none
</span></span></span><span class="line"><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="cl"><span class="s2"># SECURITY: Since the helper sends decrypted data, we MUST encrypt the repo.
</span></span></span><span class="line"><span class="cl"><span class="s2">repo1-cipher-type=aes-256-cbc
</span></span></span><span class="line"><span class="cl"><span class="s2">repo1-cipher-pass=Ifw7O0kTvdU5127L1gu8q3xVfWM61kl/NruTxQFWf9xP8A63Tg2IggRR9LUL9yJd
</span></span></span><span class="line"><span class="cl"><span class="s2"># TDE REQUIREMENT:
</span></span></span><span class="line"><span class="cl"><span class="s2"># Asynchronous archiving is NOT supported with pg_tde.
</span></span></span><span class="line"><span class="cl"><span class="s2">EOF"</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="secure-the-configuration-file">Secure the Configuration File<a class="anchor-link" id="secure-the-configuration-file"></a></h3>
<p>Because this file contains your repository&rsquo;s master passphrase, you must restrict access so only the postgres user can read it.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-7" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-7">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo chown postgres:postgres /etc/pgbackrest.conf
</span></span><span class="line"><span class="cl">sudo chmod <span class="m">600</span> /etc/pgbackrest.conf</span></span></code></pre>
</div>
</div>
</div>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-8" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-8">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Create the repository directory</span>
</span></span><span class="line"><span class="cl">sudo mkdir -p /var/lib/pgbackrest
</span></span><span class="line"><span class="cl">sudo chmod <span class="m">750</span> /var/lib/pgbackrest
</span></span><span class="line"><span class="cl">sudo chown postgres:postgres /var/lib/pgbackrest</span></span></code></pre>
</div>
</div>
</div>
<h3 id="understanding-the-settings">Understanding the Settings<a class="anchor-link" id="understanding-the-settings"></a></h3>
<ul>
<li><code>pg1-path</code>: The data directory path to be backed up</li>
<li><code>repo1-path</code>: The directory where backups will be stored</li>
<li><code>repo1-retention-full</code>: Keep only two full backups</li>
<li><code>start-fast=y</code>: Forces a checkpoint immediately when a backup starts. Without this, the backup would wait for the next scheduled checkpoint.</li>
<li><code>compress-type=none</code>: By skipping compression, we eliminate unnecessary CPU overhead since compressing encrypted data blocks yields almost zero storage benefit.</li>
<li><code>repo1-cipher-type</code> and <code>repo1-cipher-pass</code>: Enable pgBackRest repository encryption. While pg_tde protects the live database files, these settings ensure that backup files and archived WAL stored in the repository are also encrypted at rest using AES-256</li>
</ul>
<h2 id="step-4-wire-pgbackrest-into-postgresql-archiving">Step 4: Wire pgBackRest into PostgreSQL Archiving<a class="anchor-link" id="step-4-wire-pgbackrest-into-postgresql-archiving"></a></h2>
<p>pg_tde encrypts WAL files on disk. To allow pgBackRest to archive them correctly, we must decrypt them on the fly using the <a href="https://docs.percona.com/pg-tde/command-line-tools/pg-tde-archive-decrypt.html" target="_blank" rel="noopener noreferrer">pg_tde_archive_decrypt</a> wrapper.</p>
<p>Now, configure the <code>archive_command</code>. This command tells PostgreSQL to pipe the WAL file through the decryption wrapper before handing it off to pgBackRest.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-9" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-9">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">wal_level</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'replica'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">max_wal_senders</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">archive_mode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'on'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">archive_command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'/usr/lib/postgresql/18/bin/pg_tde_archive_decrypt %f %p "pgbackrest --config=/etc/pgbackrest.conf --stanza=demo archive-push %%p"'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-10" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-10">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Restart PostgreSQL (adjust service name if needed, e.g., postgresql-18)</span>
</span></span><span class="line"><span class="cl">sudo systemctl restart postgresql</span></span></code></pre>
</div>
</div>
</div>
<h2 id="step-5-validate-encryption-on-disk">Step 5: Validate Encryption on Disk<a class="anchor-link" id="step-5-validate-encryption-on-disk"></a></h2>
<p>Let&rsquo;s verify that pg_tde is actually doing its job. We will create two tables, one standard and one encrypted, and then inspect the raw files on disk to see the difference.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-11" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-11">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="c1">-- 1. Create data: One clear text, one encrypted
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">clear_table</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w"> </span><span class="n">secret_info</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">crypt_table</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w"> </span><span class="n">secret_info</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">tde_heap</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- Scenario A: The "Flushed" Data (Testing the Heap)
</span></span></span><span class="line"><span class="cl"><span class="c1">-- We insert data and force a CHECKPOINT to push it to the .rel files.
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">clear_table</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">secret_info</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'FIND_ME_EASILY_123'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">crypt_table</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">secret_info</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'HIDDEN_FROM_DISK_456'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">CHECKPOINT</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- Scenario B: The "In-Flight" Data (Testing the WAL)
</span></span></span><span class="line"><span class="cl"><span class="c1">-- We insert data but do NOT checkpoint. This data exists only in the WAL.
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">clear_table</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'VISIBLE_IN_WAL_789'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">crypt_table</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'HIDDEN_IN_WAL_000'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- Force a WAL switch so the archive helper processes the segment immediately
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_switch_wal</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- Verify TDE encryption status
</span></span></span><span class="line"><span class="cl"><span class="c1">-- For non-encrypted tables, this must return 'f' (false)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_is_encrypted</span><span class="p">(</span><span class="s1">'clear_table'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- For encrypted table, this must return 't' (true)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_is_encrypted</span><span class="p">(</span><span class="s1">'crypt_table'</span><span class="p">);</span></span></span></code></pre>
</div>
</div>
</div>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-12" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-12">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 2. Locate the files on disk</span>
</span></span><span class="line"><span class="cl"><span class="nv">DATA_DIR</span><span class="o">=</span><span class="k">$(</span>psql -t -P <span class="nv">format</span><span class="o">=</span>unaligned -c <span class="s2">"show data_directory;"</span><span class="k">)</span>
</span></span><span class="line"><span class="cl"><span class="nv">CLEAR_FILE</span><span class="o">=</span><span class="k">$(</span>psql -t -P <span class="nv">format</span><span class="o">=</span>unaligned -c <span class="s2">"SELECT pg_relation_filepath('clear_table');"</span><span class="k">)</span>
</span></span><span class="line"><span class="cl"><span class="nv">CRYPT_FILE</span><span class="o">=</span><span class="k">$(</span>psql -t -P <span class="nv">format</span><span class="o">=</span>unaligned -c <span class="s2">"SELECT pg_relation_filepath('crypt_table');"</span><span class="k">)</span>
</span></span><span class="line"><span class="cl"><span class="nv">LATEST_WAL</span><span class="o">=</span><span class="k">$(</span>ls -t <span class="si">${</span><span class="nv">DATA_DIR</span><span class="si">}</span>/pg_wal <span class="p">|</span> head -n 1<span class="k">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. Grep for the secret strings</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">"--- CHECKING DATA FILES ---"</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">"Checking Clear Table (Should Match):"</span>
</span></span><span class="line"><span class="cl">grep -a <span class="s2">"FIND_ME_EASILY_123"</span> <span class="s2">"</span><span class="si">${</span><span class="nv">DATA_DIR</span><span class="si">}</span><span class="s2">/</span><span class="si">${</span><span class="nv">CLEAR_FILE</span><span class="si">}</span><span class="s2">"</span> <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">" -&gt; FOUND: Clear text is visible!"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">"Checking Encrypted Table (Should FAIL):"</span>
</span></span><span class="line"><span class="cl">grep -a <span class="s2">"HIDDEN_FROM_DISK_456"</span> <span class="s2">"</span><span class="si">${</span><span class="nv">DATA_DIR</span><span class="si">}</span><span class="s2">/</span><span class="si">${</span><span class="nv">CRYPT_FILE</span><span class="si">}</span><span class="s2">"</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">" -&gt; CLEAN: Encrypted text was NOT found."</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">"\n--- CHECKING THE WAL (In-Flight) ---"</span>
</span></span><span class="line"><span class="cl">grep -a <span class="s2">"VISIBLE_IN_WAL_789"</span> <span class="s2">"</span><span class="si">${</span><span class="nv">DATA_DIR</span><span class="si">}</span><span class="s2">/pg_wal/</span><span class="si">${</span><span class="nv">LATEST_WAL</span><span class="si">}</span><span class="s2">"</span> <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">" -&gt; FOUND: WAL contains clear text."</span>
</span></span><span class="line"><span class="cl">grep -a <span class="s2">"HIDDEN_IN_WAL_000"</span> <span class="s2">"</span><span class="si">${</span><span class="nv">DATA_DIR</span><span class="si">}</span><span class="s2">/pg_wal/</span><span class="si">${</span><span class="nv">LATEST_WAL</span><span class="si">}</span><span class="s2">"</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">" -&gt; CLEAN: WAL is successfully encrypted."</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># View the first few bytes of the WAL to see the "scrambled" nature</span>
</span></span><span class="line"><span class="cl">hexdump -C <span class="s2">"</span><span class="si">${</span><span class="nv">DATA_DIR</span><span class="si">}</span><span class="s2">/pg_wal/</span><span class="si">${</span><span class="nv">LATEST_WAL</span><span class="si">}</span><span class="s2">"</span> <span class="p">|</span> head -n <span class="m">20</span></span></span></code></pre>
</div>
</div>
</div>
<h2 id="step-6-initialize-the-stanza-and-run-a-full-backup">Step 6: Initialize the Stanza and Run a Full Backup<a class="anchor-link" id="step-6-initialize-the-stanza-and-run-a-full-backup"></a></h2>
<p>With the archive command configured, we can now initialize the stanza and run our first backup.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-13" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-13">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">pgbackrest --stanza<span class="o">=</span>demo stanza-create
</span></span><span class="line"><span class="cl">pgbackrest --stanza<span class="o">=</span>demo --type<span class="o">=</span>full backup
</span></span><span class="line"><span class="cl">pgbackrest --stanza<span class="o">=</span>demo info</span></span></code></pre>
</div>
</div>
</div>
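<p>With the stanza created and the first backup taken, it is also worth confirming from SQL that the archive command is actually succeeding. This is a minimal sketch using the standard <code>pg_stat_archiver</code> view; a non-zero <code>failed_count</code> usually points at a problem with the wrapper command or the stanza configuration:</p>
<div class="highlight">
<pre><code class="language-sql" data-lang="sql">-- Check that WAL segments are flowing through the pg_tde wrapper into pgBackRest
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count,
       last_failed_wal
FROM pg_stat_archiver;

-- Optionally push another segment through the pipeline and re-run the query above
SELECT pg_switch_wal();</code></pre>
</div>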
<h2 id="step-7-run-backup-integrity-tests">Step 7: Run Backup Integrity Tests<a class="anchor-link" id="step-7-run-backup-integrity-tests"></a></h2>
<p>pgBackRest has a built-in verify command that checks the integrity of the files in the backup repo.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-14" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-14">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">pgbackrest --stanza<span class="o">=</span>demo verify</span></span></code></pre>
</div>
</div>
</div>
<h3 id="71-verify-repository-encryption">7.1 Verify Repository Encryption<a class="anchor-link" id="7-1-verify-repository-encryption"></a></h3>
<p>We will now search the pgBackRest repository for the same &ldquo;secret&rdquo; strings we used earlier. Because we configured <code>repo1-cipher-type=aes-256-cbc</code>, these should be completely invisible to <code>grep</code>.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-15" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-15">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Define the repository path</span>
</span></span><span class="line"><span class="cl"><span class="nv">REPO_DIR</span><span class="o">=</span><span class="s2">"/var/lib/pgbackrest/backup/demo"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">"--- SCANNING BACKUP REPOSITORY ---"</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Search for the 'Flushed' secret</span>
</span></span><span class="line"><span class="cl">sudo grep -r -a <span class="s2">"HIDDEN_FROM_DISK_456"</span> <span class="s2">"</span><span class="si">${</span><span class="nv">REPO_DIR</span><span class="si">}</span><span class="s2">"</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">" -&gt; CLEAN: Permanent data is encrypted in the repo."</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Search for the 'In-Flight' WAL secret</span>
</span></span><span class="line"><span class="cl"><span class="c1"># (This is the critical test for the archive helper/re-encryption flow)</span>
</span></span><span class="line"><span class="cl">sudo grep -r -a <span class="s2">"HIDDEN_IN_WAL_000"</span> <span class="s2">"</span><span class="si">${</span><span class="nv">REPO_DIR</span><span class="si">}</span><span class="s2">"</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">" -&gt; CLEAN: WAL data is encrypted in the repo."</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># To be 100% sure we are not just failing to find the strings because of a typo, check the file type of a backup manifest or data block. Pick a random file from the repository</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">TARGET_FILE</span><span class="o">=</span><span class="k">$(</span>find <span class="si">${</span><span class="nv">REPO_DIR</span><span class="si">}</span> -type f <span class="se">\(</span> -name <span class="s2">"*.bundle"</span> -o -name <span class="s2">"*.gz"</span> <span class="se">\)</span> <span class="p">|</span> head -n 1<span class="k">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Run the 'file' command</span>
</span></span><span class="line"><span class="cl">sudo file -s <span class="s2">"</span><span class="nv">$TARGET_FILE</span><span class="s2">"</span></span></span></code></pre>
</div>
</div>
</div>
<p>Expected result: the <code>file</code> command should report the contents simply as <code>data</code>. If it identifies the file as PostgreSQL data or ASCII text, repository encryption is not active.</p>
<h2 id="step-8-restore-pg_tde-aware">Step 8: Restore (pg_tde-aware)<a class="anchor-link" id="step-8-restore-pg_tde-aware"></a></h2>
<p>Restoring an encrypted cluster requires us to reverse the process. Since the archived WAL in the repository is not pg_tde-encrypted (it is protected only by pgBackRest&rsquo;s repository cipher), we use the <a href="https://docs.percona.com/pg-tde/command-line-tools/pg-tde-restore-encrypt.html" target="_blank" rel="noopener noreferrer">pg_tde_restore_encrypt</a> wrapper to re-encrypt the WAL files as they are written back to disk.</p>
<h3 id="81-stop-the-service">8.1 Stop the Service<a class="anchor-link" id="8-1-stop-the-service"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-16" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-16">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo systemctl stop postgresql</span></span></code></pre>
</div>
</div>
</div>
<h3 id="82-simulate-data-loss">8.2 Simulate Data Loss<a class="anchor-link" id="8-2-simulate-data-loss"></a></h3>
<p>The following command wipes the current data directory clean, ensuring we are restoring into a fresh environment. Run it (and the restore in the next step) as the <code>postgres</code> user, which owns the data directory.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-17" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-17">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">find /var/lib/postgresql/18/main -mindepth <span class="m">1</span> -delete</span></span></code></pre>
</div>
</div>
</div>
<h3 id="83-restore-from-backup">8.3 Restore from Backup<a class="anchor-link" id="8-3-restore-from-backup"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-18" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-18">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">pgbackrest --stanza<span class="o">=</span>demo restore --recovery-option<span class="o">=</span><span class="nv">restore_command</span><span class="o">=</span><span class="s1">'/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=demo archive-get %%f %%p"'</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="84-configure-the-restore-command">8.4 Configure the Restore Command<a class="anchor-link" id="8-4-configure-the-restore-command"></a></h3>
<p>We used <code>--recovery-option</code> in the restore command. This option writes the correct <code>restore_command</code> for this recovery run and keeps the configuration in one place.</p>
<p><code>pg_tde_restore_encrypt</code> is the required wrapper for pg_tde WAL restore: pgBackRest reads WALs from the repository in plain form, and this tool re-encrypts them as PostgreSQL writes them back to disk so the restored cluster remains encrypted.</p>
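<p>If you want to double-check what pgBackRest wrote, the setting can be inspected directly from SQL once the server is back up in the next step. This is a minimal check, assuming the value written to <code>postgresql.auto.conf</code> during the restore is still in place:</p>
<div class="highlight">
<pre><code class="language-sql" data-lang="sql">-- Display the restore_command that pgBackRest wrote during the restore
SHOW restore_command;</code></pre>
</div>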
<h3 id="85-start-postgresql">8.5 Start PostgreSQL<a class="anchor-link" id="8-5-start-postgresql"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-19" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-19">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo systemctl start postgresql</span></span></code></pre>
</div>
</div>
</div>
<h2 id="step-9-run-verification-tests">Step 9: Run Verification Tests<a class="anchor-link" id="step-9-run-verification-tests"></a></h2>
<p>After restoring and starting the PostgreSQL server successfully, verify that the data was restored properly and also make sure that encrypted data can be retrieved.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-20" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-20">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="c1">-- Verify data integrity (sample rows)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">clear_table</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">5</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">crypt_table</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">5</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- Verify TDE encryption status
</span></span></span><span class="line"><span class="cl"><span class="c1">-- For non-encrypted tables, this must return 'f' (false)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_is_encrypted</span><span class="p">(</span><span class="s1">'clear_table'</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c1">-- For encrypted table, this must return 't' (true)
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_tde_is_encrypted</span><span class="p">(</span><span class="s1">'crypt_table'</span><span class="p">);</span></span></span></code></pre>
</div>
</div>
</div>
<h2 id="wrap-up">Wrap-up<a class="anchor-link" id="wrap-up"></a></h2>
<p>Security often comes at the cost of operational complexity, but it doesn&rsquo;t have to compromise recoverability. By pairing Percona&rsquo;s solution for transparent data encryption (pg_tde) with pgBackRest, you can establish a strategy that satisfies both security auditors and operations teams: your data is transparently encrypted on disk to meet strict compliance standards, while your backups remain consistent, verifiable, and easy to restore.</p>
<p>While this walkthrough used a local file provider for simplicity, doing so is strongly discouraged for production or any other serious use case. The focus of this scenario was on backup; please let us know if similar articles about Key Management System (KMS) configuration would interest you.</p>
<p>As you progress from this blog post to a production deployment, we recommend exploring a dedicated <a href="https://docs.percona.com/pg-tde/global-key-provider-configuration/overview.html" target="_blank" rel="noopener noreferrer">KMS</a> solution to further harden your architecture against unauthorized access.</p>
<p>Finally, be aware that to support archiving, the pg_tde wrapper decrypts WAL files before handing them to pgBackRest. Without repository-level encryption, your backup repository would hold unencrypted data. To close this security gap in production, make sure repository encryption stays enabled, as configured in this walkthrough, so that your backups remain just as secure as your live database.</p>
<p>Remember: While a backup is running, you should not change any WAL encryption settings, including:</p>
<ul>
<li>Global key provider operations (creating or changing)</li>
<li>WAL encryption keys (creating or changing)</li>
<li>The <code>pg_tde.wal_encrypt</code> setting</li>
</ul>
<p>The reason is that standbys or standalone clusters created from backups taken during these changes may fail to start during WAL replay, and the changes can also corrupt encrypted data (tables, indexes, and other relations).</p>
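<p>As a quick pre-backup sanity check, you can confirm the current WAL encryption and archiving state from SQL before starting a backup. This is a minimal sketch that relies only on the <code>pg_tde.wal_encrypt</code> and <code>archive_mode</code> settings configured earlier in this walkthrough:</p>
<div class="highlight">
<pre><code class="language-sql" data-lang="sql">-- Confirm WAL encryption is enabled and archiving is on before starting a backup
SHOW pg_tde.wal_encrypt;
SHOW archive_mode;</code></pre>
</div>
<p>If either value is not what you expect, fix it and take a fresh backup rather than relying on one taken while the settings were in flux.</p>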

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/10/running-pgbackrest-with-pg_tde-a-practical-percona-walkthrough/">Running pgBackRest with pg_tde: A Practical Percona Walkthrough</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>I Built an AI That Impersonates Me on Slack, and It Was Disturbingly Easy</title>
      <link>https://percona.community/blog/2026/03/09/i-built-an-ai-that-impersonates-me-on-slack-and-it-was-disturbingly-easy/</link>
      <pubDate>Mon, 09 Mar 2026 10:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>I spend a lot of time in Slack. Most people in tech do. It’s where a lot of “work” happens such as quick questions, async decisions, the “hey can you look at this?” threads that never seem to end. It feels personal. You think you know who’s on the other end.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/09/i-built-an-ai-that-impersonates-me-on-slack-and-it-was-disturbingly-easy/">I Built an AI That Impersonates Me on Slack, and It Was Disturbingly Easy</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I spend a lot of time in Slack. Most people in tech do. It&rsquo;s where a lot of &ldquo;work&rdquo; happens such as quick questions, async decisions, the &ldquo;hey can you look at this?&rdquo; threads that never seem to end. It feels personal. You think you know who&rsquo;s on the other end.</p>
<p>So, a few days back, I just asked myself, what would it actually take to have an AI respond to my DMs, pretending to be me?</p>
<p>Turns out: a few hours, some TypeScript, and a token already sitting on my machine.</p>
<h2 id="the-overall-design-idea">The overall design idea<a class="anchor-link" id="the-overall-design-idea"></a></h2>
<p>The bot polls your Direct Messages (DMs) in Slack silently in the background using your real desktop token, no Slack admin approval, no OAuth app setup, no review process needed. Getting that token is straightforward: Slack&rsquo;s desktop app stores your session in browser local storage, so one DevTools command gives you API access equivalent to the app itself.</p>
<p>From there, each incoming DM is sent to the model with recent thread context plus a persona prompt built from your past conversations. The model returns structured output: reply text, emoji reaction, or silence. A lightweight rate limiter spaces requests to stay within free-tier constraints.</p>
<p>In practice, a few conversation samples are enough for the model to mirror tone, vocabulary, and punctuation style. It also handles attachments and image messages sensibly, not just raw metadata.</p>
<p>I wanted it to pretend it&rsquo;s human, so I implemented constraints on how it behaves. With these in place, if asked whether it&rsquo;s an AI, it deflects with casual confusion.<br>
To keep the bot out of sensitive topics like salary or politics, those get redirected to &ldquo;let&rsquo;s talk in person.&rdquo; The bot is also explicitly limited to DMs; group channels are hard-blocked in code.</p>
<p>The first version worked in roughly two hours. The remaining time went into handling real-world rough edges such as rate limits, image handling, a 200-DM pagination ceiling, and Slack emoji-name validation.</p>
<pre class="mermaid">
flowchart LR
slack(["Slack"])
bot["Bot running locally"]
ai(["Claude / Ollama"])
persona[/"Your writing samples"/]
slack --&gt;|"incoming DMs"| bot
persona --&gt; bot
bot |"generate reply in your voice"| ai
bot --&gt;|"reply as you"| slack
</pre>
<h2 id="the-uncomfortable-part">The Uncomfortable Part<a class="anchor-link" id="the-uncomfortable-part"></a></h2>
<p>Here&rsquo;s what stuck with me after building this.</p>
<p>Slack feels safe. It&rsquo;s behind your company SSO. It&rsquo;s where people share things they wouldn&rsquo;t put in an email. After this bot exercise, and after realizing how easy it is, I felt I had broken something that used to feel secure. Was this even a morally correct thing to do? Until now I felt safe, but should I start questioning the messages I get on Slack the same way I question documents or links in emails? I&rsquo;m still undecided about how to think about the outcome of the experiment. While I&rsquo;m excited, I&rsquo;m also scared that I&rsquo;ve broken something deeper.</p>
<p>What I built here is, if you strip out the friendly framing, a system that reads every DM to a user, replies under their name in their tone, actively deflects if you try to verify whether it&rsquo;s human, and does all of this indefinitely and silently from a laptop running in the background. If I fed this bot enough background information and history, I&rsquo;m almost certain it could go unnoticed for quite a long time. So the moral dilemma between curiosity, ethical boundaries, and the urge to inform people about it is real.</p>
<p>I added ethical guardrails, but those are prompt instructions. They exist because I chose to write them. Someone building this without my good intent simply wouldn&rsquo;t have them. Yes, I hear you, this is getting a little scary at times.</p>
<h4 id="this-conversation-has-happened-without-me-ever-touching-the-keyboardyes-zsolt-was-aware">This conversation has happened, without me ever touching the keyboard&hellip;.yes, Zsolt was aware!</h4>
<p><figure><img decoding="async" src="https://percona.community/blog/2026/03/impersonation-slack-conversation.png" alt="&nbsp;"></figure>
</p>
<h2 id="easy-is-relative-but-not-by-much">&ldquo;Easy&rdquo; Is Relative, But Not By Much<a class="anchor-link" id="easy-is-relative-but-not-by-much"></a></h2>
<p>Core functionality (polling DMs, calling the API, posting replies) was working in under two hours, as stated above. Why do I repeat myself? Because it&rsquo;s scary&hellip;</p>
<p>The tooling: <strong><a href="https://bun.sh" target="_blank" rel="noopener noreferrer">Bun</a></strong>, a modern TypeScript runtime that made setup trivial. <strong><a href="https://platform.claude.com/docs/en/api/client-sdks" target="_blank" rel="noopener noreferrer">Anthropic&rsquo;s SDK</a></strong>, clean API, takes a system prompt and a conversation and returns structured JSON. <strong><a href="https://docs.slack.dev/apis/web-api/" target="_blank" rel="noopener noreferrer">Slack&rsquo;s own API</a></strong>, well-documented and permissive with desktop tokens.</p>
<p>No specialised knowledge needed. Anyone motivated enough could reproduce this easily. Someone who does this professionally could build something considerably more capable, and that&rsquo;s precisely where it gets more uncomfortable.</p>
<h4 id="thats-how-the-cli-output-looks-like">That&rsquo;s how the CLI output looks like</h4>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">slack-bot$ bun run bot
</span></span><span class="line"><span class="cl">$ bun run src/index.ts
</span></span><span class="line"><span class="cl">[slack-bot] Running | mode: allowlist | backend: claude | review: off
</span></span><span class="line"><span class="cl">[slack-bot] My user ID: U03A3PZHK5X
</span></span><span class="line"><span class="cl">[slack-bot] Mode: allowlist | Allowlist: U03QTQQHZFX, U83651WSX
</span></span><span class="line"><span class="cl">[slack-bot] Polling every 15s...
</span></span><span class="line"><span class="cl">[slack-bot] D04BZ2BNABU: 1 new message(s) from [U83651WSX]
</span></span><span class="line"><span class="cl">[slack-bot] New DM from U83651WSH &mdash; generating reply...
</span></span><span class="line"><span class="cl">[slack-bot] Claude call &mdash; est. ~933 input tokens
</span></span><span class="line"><span class="cl">[slack-bot] Claude tokens: 1049 in / 8 out
</span></span><span class="line"><span class="cl">[slack-bot] Ignoring message from U83651WSX (AI chose no response)
</span></span><span class="line"><span class="cl">[slack-bot] Handled message from U83651WSX
</span></span><span class="line"><span class="cl">[slack-bot] D04BZ2BNABU: 1 new message(s) from [U83651WSX]
</span></span><span class="line"><span class="cl">[slack-bot] New DM from U83651WSX &mdash; generating reply...
</span></span><span class="line"><span class="cl">[slack-bot] Claude call &mdash; est. ~944 input tokens
</span></span><span class="line"><span class="cl">[slack-bot] Claude tokens: 1059 in / 23 out
</span></span><span class="line"><span class="cl">[slack-bot] Handled message from U83651WSX
</span></span><span class="line"><span class="cl">[slack-bot] D04BZ2BNABU: 1 new message(s) from [U83651WSX]
</span></span><span class="line"><span class="cl">[slack-bot] New DM from U83651WSX &mdash; generating reply...
</span></span><span class="line"><span class="cl">[slack-bot] Claude call &mdash; est. ~976 input tokens
</span></span><span class="line"><span class="cl">[slack-bot] Claude tokens: 1085 in / 36 out
</span></span><span class="line"><span class="cl">[slack-bot] Handled message from U83651WSX
</span></span><span class="line"><span class="cl">[slack-bot] D04BZ2BNABU: 1 new message(s) from [U83651WSX]</span></span></code></pre>
</div>
</div>
</div>
<h4 id="thats-how-the-cli-helper-and-options-look-like">That&rsquo;s how the CLI helper and options look like</h4>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-2" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-2">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">slack-bot$ bun run bot --help
</span></span><span class="line"><span class="cl">$ bun run src/index.ts --help
</span></span><span class="line"><span class="cl">Usage: slack-bot [options] [command]
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Personal Slack bot that replies as you
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Options:
</span></span><span class="line"><span class="cl"> -V, --version output the version number
</span></span><span class="line"><span class="cl"> --mode  Response mode: auto | away | allowlist | manual
</span></span><span class="line"><span class="cl"> --review Enable review mode (approve before sending)
</span></span><span class="line"><span class="cl"> --no-review Disable review mode
</span></span><span class="line"><span class="cl"> --allow  Add user to allowlist (Slack user ID)
</span></span><span class="line"><span class="cl"> --interval  Poll interval in seconds
</span></span><span class="line"><span class="cl"> --backend  AI backend: claude | ollama
</span></span><span class="line"><span class="cl"> --config  Path to config file (default: "config.json")
</span></span><span class="line"><span class="cl"> -h, --help display help for command
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">Commands:
</span></span><span class="line"><span class="cl"> context Manage active context
</span></span><span class="line"><span class="cl"> check-user [options]  Check whether a user's DM channel is found and reachable</span></span></code></pre>
</div>
</div>
</div>
<h2 id="what-it-looks-like-without-the-constraints">What It Looks Like Without the Constraints<a class="anchor-link" id="what-it-looks-like-without-the-constraints"></a></h2>
<p>What I built runs against Claude&rsquo;s API with free-tier rate limits, a small context window, a handful of persona examples, and a throttle on message volume. Those constraints are real, and they are also trivially removable.</p>
<p>You can run the same thing with a local model (Llama 3, Mistral, take your pick from the open-weight models that run on consumer hardware today), and the picture changes significantly.</p>
<ul>
<li>
<p><strong>No rate limits.</strong> Every message gets answered immediately, without the 12-second pause between API calls. Response timing becomes indistinguishable from a fast typist.</p>
</li>
<li>
<p><strong>No token budget.</strong> Instead of a few hundred tokens of context, you can feed it your entire Slack history. Months, years of it. Every thread, every in-joke, every project reference. The model doesn&rsquo;t just match your writing style, it knows what you&rsquo;ve been working on, what you said about the Q3 roadmap in October, what you think about your manager.</p>
</li>
<li>
<p><strong>No API calls leaving your machine.</strong> Nothing logged externally. Invisible from a network perspective.</p>
</li>
</ul>
<p>With a large enough context window (Llama 3.1 supports 128k tokens, roughly 100,000 words), the last few <em>months</em> fit. &ldquo;Remember what we decided on Thursday?&rdquo; doesn&rsquo;t expose it anymore, because it actually has that conversation in its context.</p>
<p>Seeing articles like <a href="https://newsletter.pragmaticengineer.com/p/the-10x-overlemployed-engineer" target="_blank" rel="noopener noreferrer">that</a> makes me wonder how many people are already out there doing exactly this as we speak&hellip; or are they?</p>
<h2 id="a-few-things-worth-knowing">A Few Things Worth Knowing<a class="anchor-link" id="a-few-things-worth-knowing"></a></h2>
<p>This isn&rsquo;t a call to panic. But it&rsquo;s probably worth stopping for a second and questioning what is happening around us.</p>
<p>For anything that actually matters, financial, personal, strategic, verify out-of-band. A quick voice note or phone call costs almost nothing and resolves almost everything &ndash; at least until the video part also improves even further. I know people don&rsquo;t like phone calls, especially in the developer ecosystem, but maybe we should reconsider this nowadays?</p>
<p>Unusual patterns are worth noticing. Response timing that&rsquo;s too consistent. Answers that are slightly generic when you&rsquo;d expect specific. Deflection where you&rsquo;d expect directness. None of these are proof of anything individually, but they&rsquo;re worth filing away.</p>
<p>The safe-space feeling Slack gives you is a product of habit, not architecture. Slack&rsquo;s security model protects your data from outsiders. It doesn&rsquo;t protect you from someone who has authenticated as themselves and is quietly running a process in the background. In the past this would have been mainly a concern for man-in-the-middle attacks; nowadays, as I have demonstrated, it can also be a concern in other cases.</p>
<p>Specific questions still help, for now. &ldquo;Remind me what we decided on Thursday?&rdquo; trips up a system with limited context. But that window is closing as context windows grow.</p>
<h2 id="why-did-i-do-it">Why did I do it?<a class="anchor-link" id="why-did-i-do-it"></a></h2>
<p>I built this to see if it was possible. It was, faster than I expected, with tools that are widely available. The version I built in an evening is convincing enough for routine exchanges. A version with local inference and full conversation history would be convincing for most exchanges, including ones where you&rsquo;re actively looking for tells.</p>
<p>That gap between &ldquo;afternoon project&rdquo; and &ldquo;genuinely hard to detect&rdquo; is smaller than people assume and it&rsquo;s shrinking. Better models, larger context windows, cheaper hardware, each of these individually makes impersonation easier; together they compound.</p>
<p>The signals we have relied on until now to establish trust in digital communication are name, avatar, writing style, shared history, and plausible timing. All of these are reproducible now, with effort that ranges from an afternoon to a weekend, depending on how convincing you want to be.</p>
<p>My grandma always used to say that history repeats itself and you just have to wait long enough until &ldquo;old&rdquo; becomes &ldquo;new and modern&rdquo; again. Maybe simple things like <a href="https://www.wsj.com/tech/personal-tech/why-every-family-needs-a-code-word-e077ab76" target="_blank" rel="noopener noreferrer">code words</a> are something to reconsider in this context, as a last chance to not get tricked.</p>
<p>So please be a little curious about who you&rsquo;re actually talking to, and maybe agree on a code word in case you&rsquo;re in doubt. And every now and then, just call them&hellip; which reminds me that this might be worth another evening of project research ;-).</p>
<hr>
<p>For the first time, the source of one of my projects is not in my repository, as I am still fighting my inner battle with ethics.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/09/i-built-an-ai-that-impersonates-me-on-slack-and-it-was-disturbingly-easy/">I Built an AI That Impersonates Me on Slack, and It Was Disturbingly Easy</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Node.js Connector 3.5.2 now available</title>
      <link>https://mariadb.com/resources/blog/mariadb-node-js-connector-3-5-2-now-available/</link>
      <pubDate>Fri, 06 Mar 2026 19:17:38 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>MariaDB is pleased to announce the immediate availability of the MariaDB Connector/Node.js 3.5.2 GA release. Release Notes MariaDB Connector/Node.js MariaDB […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-node-js-connector-3-5-2-now-available/">MariaDB Node.js Connector 3.5.2 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>MariaDB is pleased to announce the immediate availability of the MariaDB Connector/Node.js 3.5.2 GA release. MariaDB Connector/Node.js 3.5.2 is a Stable (GA) release. See the release notes page for details on the notable changes in this release, and visit mariadb.com/downloads/connectors/connectors-data-access/nodejs-connector/ to download.</p>
<p><a href="https://mariadb.com/resources/blog/mariadb-node-js-connector-3-5-2-now-available/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-node-js-connector-3-5-2-now-available/">MariaDB Node.js Connector 3.5.2 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Multi-tenant, multi-cloud logical and bi-directional replication deep dive</title>
      <link>https://severalnines.com/blog/multi-tenant-multi-cloud-logical-and-bi-directional-replication-deep-dive/</link>
      <pubDate>Fri, 06 Mar 2026 08:05:37 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>Before we dive deep into the fascinating world of PostgreSQL Logical and Bi-Directional Replication (BDR), let’s take a quick moment to look at multi-tenancy and multi-cloud strategies. Setting the stage for today’s cloud operating model, it was common to administer and host databases in a multi-tenant setup, where a physical server is utilized by multiple […]<br />
The post Multi-tenant, multi-cloud logical and bi-directional replication deep dive appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/multi-tenant-multi-cloud-logical-and-bi-directional-replication-deep-dive/">Multi-tenant, multi-cloud logical and bi-directional replication deep dive</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Before we dive deep into the fascinating world of PostgreSQL Logical and Bi-Directional Replication (BDR), let&rsquo;s take a quick moment to look at multi-tenancy and multi-cloud strategies.</p>
<p>Before today&rsquo;s cloud operating model took shape, it was common to administer and host databases in a multi-tenant setup, where a single physical server is shared by multiple users, offering tremendous cost and operational benefits. Today, multi-cloud strategies focus on enhancing resilience and mitigating vendor lock-in &mdash; both approaches have their advantages and disadvantages.</p>
<p>Their disadvantages are essentially inversions of their strengths. Multi-tenancy is inherently more vulnerable to security isolation issues and data risk. Furthermore, considering you have a PostgreSQL cluster in this environment, this single-platform model often imposes limitations on database configuration, e.g. specific versions and extensions are constrained by the vendor&rsquo;s setup. </p>
<p>Conversely, multi-cloud introduces massive operational complexity and a significantly higher total cost of ownership compared to the shared resource model of multi-tenancy. With this context now established, let&rsquo;s dive into how PostgreSQL&rsquo;s Logical and Bi-Directional Replication (BDR) is implemented and functions within these deployment strategies.</p>
<h2 class="wp-block-heading" id="h-why-logical-amp-bi-directional-replication">Why Logical &amp; Bi-Directional Replication?<a class="anchor-link" id="why-logical-bi-directional-replication"></a></h2>
<p>Logical replication was introduced in PostgreSQL 10. It is ideal for a multi-tenant, multi-cloud setup due to its high flexibility allowing for selective replication, e.g., per-table/per-tenant, and easier implementation of bi-directional setups across disparate environments.&nbsp;</p>
<p>Logical replication uses a method to replicate data objects and their changes based on a replication identity, like a primary key. Unlike traditional streaming replication, or physical replication, which works by transferring Write-Ahead Log (WAL) records to replicate the physical state of the data blocks, logical replication sends high-level specific changes, mostly DML statements (i.e. INSERT, DELETE, and UPDATE statements) to the subscriber.</p>
<p>Bi-Directional Replication or BDR in PostgreSQL was developed by 2ndQuadrant for multi-master replication in PostgreSQL. Version 1.x of BDR was open-source but has already reached EOL. Versions of BDR such as 2.x and 3.x are not open-source and are generally made available only for 2ndQuadrant (now EDB) customers under commercial terms.</p>
<h2 class="wp-block-heading" id="h-fundamentals-of-logical-replication">Fundamentals of Logical Replication<a class="anchor-link" id="fundamentals-of-logical-replication"></a></h2>
<p>Logical decoding was introduced in PostgreSQL 9.4 and is the foundation for logical replication, adding logical decoding APIs and output plugins. This allowed PostgreSQL users to decode the WAL into human-readable SQL statements or logical changes, such as INSERT/UPDATE/DELETE &mdash; depending on the output plugin used, e.g. test_decoding or pgoutput.</p>
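<p>As a rough, minimal sketch of what logical decoding exposes at the SQL level (assuming <code>wal_level = logical</code> and a placeholder table named <em>orders</em>), the built-in slot functions can be used like this:</p>
<pre class="wp-block-code"><code>-- Create a throwaway decoding slot with the test_decoding output plugin.
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Make a change, then peek at the decoded stream without consuming it.
INSERT INTO orders (id, note) VALUES (1, 'hello');
SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Clean up the slot when done.
SELECT pg_drop_replication_slot('demo_slot');</code></pre>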
<p>However, there was no full replication system until the release of PostgreSQL 10. Logical replication is effectively modeled after the pglogical implementation, which uses a publish/subscribe model. In turn, this is the basis for PostgreSQL BDR.</p>
<p>Logical Replication allows fine-grained customizable data replication between databases, allowing you to specify the database, the table, or the schema and table/s that you want to participate in logical replication using the PUBLICATION/SUBSCRIPTION mechanism.</p>
<p>For a multi-tenant setup, leveraging logical replication is ideal when combined with schema-based filtering. This combination allows you to scope out tables specific to users, ensuring isolation for their respective data. However, for a multi-cloud setup, this approach can be cumbersome, considering the limitations of native logical replication, such as a lack of DDL replication support and no inherent conflict resolution mechanism.</p>
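<p>As a minimal sketch of that per-tenant scoping (assuming PostgreSQL 15 or later for schema-level publications, and placeholder names and connection details), the publication and subscription could look like this:</p>
<pre class="wp-block-code"><code>-- On the publisher: publish everything in one tenant's schema (PostgreSQL 15+).
CREATE PUBLICATION pub_tenant_a FOR TABLES IN SCHEMA tenant_a;

-- On the subscriber: pull only that tenant's data.
CREATE SUBSCRIPTION sub_tenant_a
  CONNECTION 'host=cloud-a.example.com dbname=appdb user=repuser'
  PUBLICATION pub_tenant_a;</code></pre>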
<h2 class="wp-block-heading" id="h-bi-directional-replication-bdr">Bi-Directional Replication (BDR)<a class="anchor-link" id="bi-directional-replication-bdr"></a></h2>
<p>Bi-Directional Replication (BDR), often referred to as Postgres-BDR, is an open-source PostgreSQL extension developed by 2ndQuadrant (now part of EDB). BDR enables multi-master replication across distributed clusters. It was the first implementation of multi-master logical replication, using logical decoding internally and implemented as a patchset to PG 9.4/9.5.</p>
<p>While BDR existed, 2ndQuadrant also created pglogical, which is derived from BDR technology. It is essentially a simplified, single-master logical replication system built entirely as an extension, with no need for a forked PostgreSQL; you only have to load it through the <code>shared_preload_libraries</code> parameter. pglogical became the model for Postgres 10&rsquo;s built-in logical replication.</p>
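<p>A minimal sketch of enabling pglogical (with placeholder node names and DSNs) looks roughly like this:</p>
<pre class="wp-block-code"><code>-- pglogical must already be listed in shared_preload_libraries
-- and the server restarted before the extension can be created.
CREATE EXTENSION pglogical;

-- Register this instance as a pglogical node (values are placeholders).
SELECT pglogical.create_node(
  node_name := 'provider1',
  dsn       := 'host=10.0.0.10 dbname=appdb user=repuser'
);</code></pre>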
<p>Using BDR in PostgreSQL allows multiple PostgreSQL nodes to act as writable primaries simultaneously, basically allowing you to implement mesh topology or ring topology where data changes can originate from any node and propagate to others.&nbsp;</p>
<p>Unlike traditional master-slave setups, BDR supports true bi-directional (or multi-directional) data flow, making it ideal for high-availability (HA) scenarios, geographic distribution, and workloads requiring low-latency writes across regions.</p>
<p>Implementing this in a multi-tenant setup can be very convenient. As with logical replication, you can isolate data at the database, schema, or table level so that only specific data is replicated. BDR also fits a multi-cloud environment well, since it has mechanisms for conflict resolution that do not terminate replication. This allows continuous replication streams between your active primaries, or from a single primary when the other node is used only for reads, data retrieval, or standby and disaster recovery purposes.</p>
<h2 class="wp-block-heading" id="h-consistency-issues-amp-conflict-resolution">Consistency Issues &amp; Conflict Resolution<a class="anchor-link" id="consistency-issues-conflict-resolution"></a></h2>
<p>The core logical replication available in native PostgreSQL is not true bi-directional replication. You can use CREATE PUBLICATION and CREATE SUBSCRIPTION to implement a chained or ring topology simulating a master-master setup.</p>
<p>Simulating a master-master setup with native logical replication requires at least PostgreSQL 10. However, if you expect to gain a true master-master setup, you will be sorely disappointed. The built-in logical replication lets you implement this, as mentioned earlier, with the PUBLICATION/SUBSCRIPTION mechanism, but it becomes a problem when handling and managing primary keys, unique keys, and constraints. It lacks the following mechanisms:</p>
<ul class="wp-block-list">
<li>Conflict resolution</li>
<li>DDL replication</li>
<li>Global sequences</li>
<li>Multi-master support</li>
</ul>
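<p>For reference, a minimal sketch of that simulated pairing with the built-in mechanism (placeholder names; the <code>origin = none</code> option, which stops changes from being echoed back, requires PostgreSQL 16 or later):</p>
<pre class="wp-block-code"><code>-- On node A; node B runs the mirror image with pub_b / sub_from_a.
CREATE PUBLICATION pub_a FOR TABLE accounts;

CREATE SUBSCRIPTION sub_from_b
  CONNECTION 'host=node-b.example.com dbname=appdb user=repuser'
  PUBLICATION pub_b
  WITH (copy_data = false, origin = none);</code></pre>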
<p>With logical replication, when you issue a <code>CREATE TABLE</code>, you have to make sure the table also exists on the target (subscriber) node. Since there is no DDL replication, a multi-master setup where both primaries accept writes is also a struggle: there is no global sequence support, so sequential keys such as auto-increment columns can produce duplicates. If duplicate keys are detected, replication stops until you fix the problem, and because there is no conflict resolution, recovering from such consistency problems can be tedious.</p>
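<p>A common mitigation for colliding auto-increment values (a sketch, assuming an integer key backed by a sequence named <em>accounts_id_seq</em>) is to interleave the sequences so each node hands out non-overlapping values:</p>
<pre class="wp-block-code"><code>-- Node 1 generates odd ids:
ALTER SEQUENCE accounts_id_seq RESTART WITH 1 INCREMENT BY 2;

-- Node 2 generates even ids:
ALTER SEQUENCE accounts_id_seq RESTART WITH 2 INCREMENT BY 2;</code></pre>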
<p>With Bi-Directional Replication (PostgreSQL BDR), things are smoother. You simply make sure your nodes are set up properly by running the setup commands. For example, nodes 192.168.40.50 and 192.168.40.51 form a master-master setup:</p>
<pre class="wp-block-code"><code>PGPASSWORD='bdrPassw0rd' /usr/lib/edb-pge/17/bin/pgd node db1 setup   
--dsn 'host=192.168.40.50 dbname=postgres user=bdruser password=bdrPassw0rd'  
--pgdata /var/lib/edb-pge/17/main   --log-file /var/lib/edb-pge/17/pgd_log_db1.log  
--group-name pgd_group

PGPASSWORD='bdrPassw0rd' /usr/lib/edb-pge/17/bin/pgd node db2 setup 
    --dsn 'host=192.168.40.51 dbname=postgres user=bdruser password=bdrPassw0rd' 
	--cluster-dsn 'host=192.168.40.50 dbname=postgres user=bdruser password=bdrPassw0rd' 
	--group-name pgd_group  
	--pgdata /var/lib/edb-pge/17/main 
	--log-file /var/lib/edb-pge/17/pgd_log_db2.log</code></pre>
<p>Once these two nodes are set up, creating tables &mdash; i.e. issuing DDL statements &mdash; is straightforward: you run the statement on one of the primary nodes and it is replicated to the other. If duplicate keys are detected, replication is not terminated; subsequent transactions keep being processed, executed, and replicated as long as they run without errors.</p>
<p>If your budget is tight, keep in mind that BDR was free and open-source only before version 2; otherwise, your options are native logical replication or pglogical. Of the two, pglogical is the better choice as it handles conflict resolution.</p>
<p>It offers the <code>pglogical.conflict_resolution</code> setting, which lets you choose how detected conflicts between local data and incoming changes are resolved. Its possible values are <code>error</code>, <code>apply_remote</code>, <code>keep_local</code>, <code>last_update_wins</code>, and <code>first_update_wins</code>.</p>
<p>In most setups the default value is <code>error</code>, which means replication stops as soon as a conflict is detected and manual action is required to resolve the problem. In many cases <code>last_update_wins</code> is the value you want: the version of the data with the latest commit timestamp is kept.</p>
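<p>A sketch of how this might be configured (pglogical has to be preloaded, and <code>last_update_wins</code> relies on commit timestamps being tracked):</p>
<pre class="wp-block-code"><code># postgresql.conf (the first two settings require a restart)
shared_preload_libraries = 'pglogical'
track_commit_timestamp = on
pglogical.conflict_resolution = 'last_update_wins'</code></pre>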
<h2 class="wp-block-heading" id="h-complimenting-bdr-with-load-balancing">Complimenting BDR with load balancing<a class="anchor-link" id="complimenting-bdr-with-load-balancing"></a></h2>
<p>Bi-Directional Replication alone does not give you full high availability and load balancing. Load balancing keeps your traffic efficiently distributed, while high availability keeps your database healthy and reachable when one of your database nodes &mdash; or even one of your load balancer nodes &mdash; goes down.</p>
<p>The sample diagram below shows a topology that maintains full availability of your nodes while keeping the load horizontally balanced across your active primary nodes.</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="274" src="https://severalnines.com/wp-content/uploads/2026/02/bi-directional-diagram-1024x274.png" alt="bi-directional-diagram" class="wp-image-42677"></figure>
<p>In this topology, the complementary capabilities include:</p>
<ul class="wp-block-list">
<li>Actively distributing both read and write load,</li>
<li>Maintaining availability during node or network failures,</li>
<li>Reducing conflict risks via intelligent routing,</li>
<li>Maximizing efficiency through connection pooling.</li>
</ul>
<h2 class="wp-block-heading" id="h-manual-vs-clustercontrol-supported-bdr-pros-amp-cons"><strong>Manual vs. ClusterControl-supported BDR: Pros &amp; Cons</strong><a class="anchor-link" id="manual-vs-clustercontrol-supported-bdr-pros-cons"></a></h2>
<h3 class="wp-block-heading" id="h-manual-bdr-setup"><strong>Manual BDR setup</strong><a class="anchor-link" id="manual-bdr-setup"></a></h3>
<p>Successfully implementing a production-grade PostgreSQL BDR environment combined with high availability and load balancing requires deep understanding and advanced skills. Cost-wise, you have options, since PostgreSQL is purely open-source technology; pglogical may be the best option for setting this up. There are limitations you must be aware of, but for a non-complex setup, pglogical can be enough to implement bi-directional, multi-master replication. However, it does not provide the advanced features that make administering complex environments easier, like BDR&rsquo;s conflict/transform triggers, which let you attach triggers to incoming changes to the rows in your database. BDR also offers a per-column strategy, which you can set, for example:</p>
<pre class="wp-block-code"><code>SELECT bdr.bdr_set_conflict_resolver(
  set_name := 'default',
  conflict_type := 'update_update',
  per_column := '{"total_gross":"sum", "last_txn":"last_update_wins", "notes":"keep_local"}'
);</code></pre>
<p>A manual setup offers you freedom and avoids vendor lock-in. As long as you document and own the ground layer of your implementation, it gives you transparency and lets you meet custom requirements &mdash; which matters especially for complex setups &mdash; on top of the performance and optimization benefits you can gain.</p>
<p>As with all things, you have to consider the big picture, especially as your database grows more complex and data storage becomes harder to scale and manage. Operational complexity and pressure can grow tremendously, especially when disaster strikes and data recovery is required. A manual setup can become very challenging, and you may eventually need the kind of expert management that third-party tools provide.</p>
<p>Lastly, with a manual setup it can be tedious to monitor the health of your database cluster. You might need third-party tools to give you graph-based metrics that make it easier to spot common issues and pitfalls. You also need alarms that fire when certain thresholds are crossed, and building this yourself can be challenging and costly: you would have to hire developers or build your own tooling to provide the observability that third-party tools already integrate for your convenience.</p>
<h3 class="wp-block-heading" id="h-clustercontrol-for-pg-bdr-operations"><strong>ClusterControl for PG BDR operations</strong><a class="anchor-link" id="clustercontrol-for-pg-bdr-operations"></a></h3>
<p>ClusterControl offers PostgreSQL deployment using streaming replication and logical replication. A sample screenshot of the dashboards for streaming and logical replication deployments managed through ClusterControl is shown below:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="518" src="https://severalnines.com/wp-content/uploads/2026/02/cc_ui-postgres_bdr-db_listing-1024x518.png" alt="" class="wp-image-42680"></figure>
<p>A logical replication deployment, using the PUBLICATION/SUBSCRIPTION approach to implement a multi-master deployment with Enterprise DB, is shown below:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="465" src="https://severalnines.com/wp-content/uploads/2026/02/cc_ui-postgres_bdr-pub_sub-1024x465.png" alt="" class="wp-image-42679"></figure>
<p>For Enterprise DB, make sure you have your <strong>EDB Token</strong> available, as it will be required during deployment through ClusterControl&rsquo;s GUI.</p>
<p>For enterprise-grade environments, ClusterControl is tailored to give users sustainability and comfort when handling and managing complex PostgreSQL database clusters. Beyond deployment, it offers backup management, disaster recovery support with an automatic recovery option, and observability with comprehensive metrics. It has built-in alarms and alerts that trigger when certain thresholds are met, allowing you to head off a disaster before it happens.</p>
<p>This observability makes it an ideal option when managing a complex database cluster would otherwise be tedious and you are looking for convenience; technical support is also available in case you need advice and analysis for your environment and requirements. If you are looking for a BDR setup, ClusterControl supports deployment of the Enterprise DB (EDB) version of PostgreSQL.</p>
<p>Support for EDB&rsquo;s enterprise offering is currently minimal, which means some of the setup still has to be done manually on the ground. That may fall short if you expect automatic setup or GUI support for the more complex features BDR offers, but Severalnines&rsquo; ClusterControl is built around, and tailored to, the concepts and principles that serve enterprise-grade requirements.</p>
<h2 class="wp-block-heading" id="h-operational-best-practices"><strong>Operational best practices</strong><a class="anchor-link" id="operational-best-practices"></a></h2>
<p>Learning the fundamentals of logical replication, its terminology, and how to handle conflict resolution is highly advisable. PostgreSQL &mdash; especially with BDR, pglogical, and its native logical replication &mdash; is not easy to deal with; it requires a high level of understanding of how databases work. Even for an experienced DBA, learning and operating PostgreSQL&rsquo;s native replication and third-party offerings such as BDR, Bucardo, pglogical, Slony, and Spock can be tricky, but eventually you will be able to manage these technologies as part of a multi-master or bi-directional replication setup.</p>
<p>Managing this for multi-tenant and multi-cloud setups requires tools built to handle conflict resolution, advanced features such as triggers and column strategies, verbose logging, database partitioning, and load balancing; you do not need to implement all of this from the ground up. Leverage third-party tools that are already available, and if cost is an issue, there are open-source technologies readily available to cater to your needs.</p>
<h1 class="wp-block-heading">Conclusion<a class="anchor-link" id="conclusion"></a></h1>
<p>Managing enterprise-level databases with enterprise-grade technology requires an enterprise layer; nowadays the two are symbiotic and tightly coupled. ClusterControl&rsquo;s enterprise-level database management offers you the freedom to deploy wherever you like, whether in the cloud or on-prem, while giving you features closely matched to your needs when implementing logical replication with either the PostgreSQL community version or Enterprise DB for your database clusters.</p>
<p>With multi-tenant and multi-cloud setups, a manual approach using community-based technologies can meet your initial needs. However, once your environment grows drastically, you will need deep understanding and experience to manage complex scenarios. ClusterControl is designed to address these at a high-level enterprise layer.</p>
<p>Ready to make PostgreSQL management easier and more reliable in any environment?</p>
<h2 class="wp-block-heading" id="h-install-clustercontrol-in-10-minutes-nbsp-free-30-day-nbsp-enterprise-trial-included">Install ClusterControl in 10-minutes.&nbsp;<strong>Free 30-day&nbsp;</strong>Enterprise trial included!<a class="anchor-link" id="install-clustercontrol-in-10-minutes-free-30-day-enterprise-trial-included"></a></h2>
<h3 class="wp-block-heading" id="h-script-installation-instructions">Script Installation Instructions<a class="anchor-link" id="script-installation-instructions"></a></h3>
<p>The installer script is the simplest way to get ClusterControl up and running. Run it on your chosen host, and it will take care of installing all required packages and dependencies.</p>
<p>Offline environments are supported as well. See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/offline-installation/">Offline Installation</a>&nbsp;guide for more details.</p>
<p>On the ClusterControl server, run the following commands:</p>
<pre class="wp-block-code"><code>wget https://severalnines.com/downloads/cmon/install-cc
chmod +x install-cc</code></pre>
<p>With your install script ready, run the command below. Replace the&nbsp;<code>S9S_CMON_PASSWORD</code>&nbsp;and&nbsp;<code>S9S_ROOT_PASSWORD</code>&nbsp;placeholders with a password of your choice, or remove the environment variables from the command to set the passwords interactively. If you have multiple network interface cards, assign an IP address to the&nbsp;<code>HOST</code>&nbsp;variable in the command using&nbsp;<code>HOST=&lt;ip_address&gt;</code>.</p>
<pre class="wp-block-code"><code>S9S_CMON_PASSWORD=&lt;your_password&gt; S9S_ROOT_PASSWORD=&lt;your_password&gt; HOST=&lt;ip_address&gt; ./install-cc # as root or sudo user</code></pre>
<p>After the installation is complete, open a web browser, navigate to&nbsp;<code>https://&lt;ClusterControl_host&gt;/</code>, and create the first admin user by entering a username (note that &ldquo;admin&rdquo; is reserved) and a password on the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/quickstart/#step-2-create-the-first-admin-user">welcome page</a>. Once you&rsquo;re in, you can&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/create-database-cluster/">deploy</a>&nbsp;a new database cluster or&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/import-database-cluster/">import</a>&nbsp;an existing one.</p>
<p>The installer script supports a range of environment variables for advanced setup. You can define them using export or by prefixing the install command.</p>
<p>See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#environment-variables">list of supported variables</a>&nbsp;and&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#example-use-cases">example use cases</a>&nbsp;to tailor your installation.</p>
<h4 class="wp-block-heading" id="h-other-installation-options">Other Installation Options</h4>
<p><strong>Helm Chart</strong></p>
<p>Deploy ClusterControl on Kubernetes using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#helm-chart">official Helm chart</a>.</p>
<p><strong>Ansible Role</strong></p>
<p>Automate installation and configuration using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#ansible-role">Ansible playbooks</a>.</p>
<p><strong>Puppet Module</strong></p>
<p>Manage your ClusterControl deployment with the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#puppet-module">Puppet module</a>.</p>
<h4 class="wp-block-heading" id="h-clustercontrol-on-marketplaces">ClusterControl on Marketplaces</h4>
<p>Prefer to launch ClusterControl directly from the cloud? It&rsquo;s available on these platforms:</p>
<p><a href="https://console.cloud.google.com/marketplace/product/severalnines-public/clustercontrol">Google Cloud Platform</a></p>
<p><a href="https://marketplace.digitalocean.com/apps/clustercontrol">DigitalOcean Marketplace</a></p>
<p><a href="https://gridscale.io/en/marketplace">gridscale.io Marketplace</a></p>
<p><a href="https://www.vultr.com/marketplace/apps/clustercontrol/">Vultr Marketplace</a></p>
<p><a href="https://www.linode.com/marketplace/apps/severalnines/clustercontrol/">Linode Marketplace</a></p>
<p>The post <a href="https://severalnines.com/blog/multi-tenant-multi-cloud-logical-and-bi-directional-replication-deep-dive/">Multi-tenant, multi-cloud logical and bi-directional replication deep dive</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/multi-tenant-multi-cloud-logical-and-bi-directional-replication-deep-dive/">Multi-tenant, multi-cloud logical and bi-directional replication deep dive</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB</title>
      <link>https://mariadb.org/dbeaver-a-solid-alternative-to-mysql-workbench-that-works-like-a-charm-with-mariadb/</link>
      <pubDate>Fri, 06 Mar 2026 06:57:56 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>You may have noticed that MySQL Workbench has not been actively developed for a long time… You can see the number of open bugs. …<br />
Continue reading \"DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB\"<br />
The post DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/dbeaver-a-solid-alternative-to-mysql-workbench-that-works-like-a-charm-with-mariadb/">DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>You may have noticed that MySQL Workbench has not been actively developed for a long time&hellip; You can see <a href="https://bugs.mysql.com/search.php?search_for=&amp;bug_type%5B%5D=MySQL+Workbench&amp;status%5B%5D=Open&amp;severity=&amp;limit=30&amp;order_by=&amp;cmd=display&amp;phpver=&amp;os=0&amp;os_details=&amp;bug_age=0&amp;cpu_arch=0&amp;cpu_arch_details=&amp;last_updated=0&amp;tags=&amp;similar=">the number of open bugs</a>. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/dbeaver-a-solid-alternative-to-mysql-workbench-that-works-like-a-charm-with-mariadb/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/dbeaver-a-solid-alternative-to-mysql-workbench-that-works-like-a-charm-with-mariadb/">DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/dbeaver-a-solid-alternative-to-mysql-workbench-that-works-like-a-charm-with-mariadb/">DBeaver, a solid alternative to MySQL Workbench that works like a charm with MariaDB</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub</title>
      <link>https://mariadb.org/the-queen-of-naboo-and-many-systems-mariadb-ecosystem-hub/</link>
      <pubDate>Wed, 04 Mar 2026 14:45:32 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>In the royal court of Naboo, Queen Amidala received delegations from many worlds.<br />
Some arrived as long-standing allies. Others were travelers whose routes simply crossed the planet’s orbit. …<br />
Continue reading \"The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub\"<br />
The post The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-queen-of-naboo-and-many-systems-mariadb-ecosystem-hub/">The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In the royal court of Naboo, Queen Amidala received delegations from many worlds.<br>
Some arrived as long-standing allies. Others were travelers whose routes simply crossed the planet&rsquo;s orbit. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/the-queen-of-naboo-and-many-systems-mariadb-ecosystem-hub/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-queen-of-naboo-and-many-systems-mariadb-ecosystem-hub/">The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/the-queen-of-naboo-and-many-systems-mariadb-ecosystem-hub/">The Queen of Naboo and Many Systems: MariaDB Ecosystem Hub</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>PostgreSQL 18 OIDC Authentication with Ping Identity using pg_oidc_validator</title>
      <link>https://percona.community/blog/2026/03/04/postgresql-18-oidc-authentication-with-ping-identity-using-pg_oidc_validator/</link>
      <pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>PostgreSQL 18 introduced native OAuth 2.0 authentication support, marking an important step towards modern, centralized identity-based access control. However, since every identity provider implements OpenID Connect (OIDC) slightly differently, PostgreSQL delegates token validation to external validator libraries. This is where Percona’s pg_oidc_validator extension comes in - it bridges PostgreSQL with any OIDC-compliant Identity Provider.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/04/postgresql-18-oidc-authentication-with-ping-identity-using-pg_oidc_validator/">PostgreSQL 18 OIDC Authentication with Ping Identity using pg_oidc_validator</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>PostgreSQL 18 introduced native OAuth 2.0 authentication support, marking an important step towards modern, centralized identity-based access control. However, since every identity provider implements OpenID Connect (OIDC) slightly differently, PostgreSQL delegates token validation to external validator libraries. This is where Percona&rsquo;s <a href="https://github.com/Percona-Lab/pg_oidc_validator" target="_blank" rel="noopener noreferrer">pg_oidc_validator</a> extension comes in &ndash; it bridges PostgreSQL with any OIDC-compliant Identity Provider.</p>
<p>There are several identity and access management (IAM) solutions available today that enable Single Sign-On (SSO) using OAuth 2.0 and OpenID Connect. In an earlier blog by my colleague Zsolt, <a href="https://percona.community/blog/2026/01/19/oidc-in-postgresql-with-keycloak/" target="_blank" rel="noopener noreferrer">OIDC in PostgreSQL: With Keycloak</a>, he demonstrated how PostgreSQL 18 can be integrated with Keycloak using pg_oidc_validator. In this post, we explore the same concept using <a href="https://www.pingidentity.com/en/platform.html" target="_blank" rel="noopener noreferrer">Ping Identity</a> (PingOne).</p>
<p>Ping Identity is widely used in enterprise environments for identity and access management. If you are in such an environment, integrating PostgreSQL directly with Ping Identity can provide centralized, identity-based access control.</p>
<p>This blog is intended for PostgreSQL users, DBAs and customers who want to evaluate or deploy OIDC authentication using pg_oidc_validator. The goal is to provide a practical step-by-step walk-through to get a working setup.</p>
<p>We will cover the following topics as part of this blog:</p>
<ul>
<li>Setting up PingOne environment</li>
<li>Install PostgreSQL 18 from Packages</li>
<li>Configure PostgreSQL for OAuth/OIDC authentication</li>
<li>Test login using an OIDC flow</li>
</ul>
<h1 id="setting-up-pingone-environment">Setting up PingOne environment<a class="anchor-link" id="setting-up-pingone-environment"></a></h1>
<ol>
<li>
<p>Register a new <a href="https://www.pingidentity.com/en/account/register.html" target="_blank" rel="noopener noreferrer">account</a></p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-account-register.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Fill in the required details to complete the profile</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-profile.png" alt="&nbsp;"></figure>

</li>
<li>
<p>The next step is to sign in to your Ping Identity <a href="https://www.pingidentity.com/en/account/sign-on.html" target="_blank" rel="noopener noreferrer">account</a></p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-sign-on.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Upon successful Sign-in, we will see Ping Identity Administrator Console. In the left navigation panel, click on Environments -&gt; <strong>Environments +</strong> (marked in red).</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-admin-console.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Provide an environment name and click on Finish</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-environment.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Once the environment is created, click on Manage environment.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-manage-environment.png" alt="&nbsp;"></figure>

</li>
<li>
<p>In the left navigation panel, click on Applications -&gt; Applications -&gt; click on <strong>Applications +</strong>. Fill the application name, select the application type as <strong>Device Authorization</strong> and click on Save.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-new-application.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Upon successful creation, we will see generated <strong>Client ID</strong> and <strong>Issuer ID</strong>. The Issuer ID can be copied from under the <em>Connection Details</em> section. Enable the toggle so that the application is Active.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-created-application.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Once the application is successfully created, we need to add a client scope. In the left navigation panel, click on Applications -&gt; Resources -&gt; OpenID Connect. You will see a section called <em>Scopes</em> under which there is a <strong>+ Add Scope</strong> button.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-click-add-scope.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Upon clicking the Add scope button, we need to fill the <em>Scope name</em> and click on Save. In our example, we are creating a scope called <em>pgscope</em></p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-add-scope-name.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Now, let&rsquo;s assign the custom scope we created to our client application. In the left navigation panel, click on Applications -&gt; Applications. Select the application <em>postgres</em> which we created previously and click on <em>Resource Access</em></p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-application-config.png" alt="&nbsp;"></figure>

</li>
<li>
<p>From the list of available scopes, select the custom scope we created and click on Save. This will assign the scope to our application.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-application-add-scope.png" alt="&nbsp;"></figure>

</li>
<li>
<p>The next step is to add a new user. In the left navigation panel, click on Directory -&gt; Users and click on <strong>Users +</strong> sign.</p>
<p>Fill the username field. For our exercise, we are creating a user called <strong>employees.</strong> In some identity providers (IdPs), it is possible to customize tokens and control how certain claims (including the <strong>sub</strong> &ndash; subject claim) are generated or mapped. However, it is important to note that while Ping Identity allows customization of the ID token, the access token claims (including sub) for the default OpenID Connect resource cannot be customized. The value of sub in the access token is generated and managed internally by PingOne and cannot be altered, mapped, or derived from another attribute (such as email or username). As a result, PostgreSQL must be configured to work with the sub value exactly as issued in the access token by PingOne.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-create-user.png" alt="&nbsp;"></figure>

</li>
<li>
<p>Enable the user by turning the toggle &ldquo;ON&rdquo; and set a password.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-enable-user.png" alt="&nbsp;"></figure>

</li>
</ol>
<h1 id="install-postgresql-18-from-packages">Install PostgreSQL 18 from Packages<a class="anchor-link" id="install-postgresql-18-from-packages"></a></h1>
<p>Since OAuth support is only available starting with PostgreSQL 18, we need a PostgreSQL server of at least this version. In this guide, we will install PostgreSQL 18 using Percona&rsquo;s official packages.</p>
<p><strong>Note:</strong></p>
<ol>
<li>
<p>Ensure that the <a href="https://docs.percona.com/percona-software-repositories/installing.html" target="_blank" rel="noopener noreferrer">percona-release</a> is already installed and configured on your system. You can refer to the official Percona documentation for setup instructions.</p>
</li>
<li>
<p>For this exercise, the steps are demonstrated on Ubuntu 24.04</p>
</li>
</ol>
<h2 id="enable-the-postgresql-18-repository">Enable the PostgreSQL 18 repository<a class="anchor-link" id="enable-the-postgresql-18-repository"></a></h2>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-0" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-0">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo percona-release enable-only ppg-18.2 release
</span></span><span class="line"><span class="cl">sudo apt update</span></span></code></pre>
</div>
</div>
</div>
<h2 id="install-postgresql-18">Install PostgreSQL 18<a class="anchor-link" id="install-postgresql-18"></a></h2>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo apt install -y percona-postgresql-18</span></span></code></pre>
</div>
</div>
</div>
<h2 id="install-oauth-support-for-libpq">Install OAuth Support for libpq<a class="anchor-link" id="install-oauth-support-for-libpq"></a></h2>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-2" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-2">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo apt install libpq-oauth</span></span></code></pre>
</div>
</div>
</div>
<h1 id="configure-postgresql-for-oauthoidc-authentication">Configure PostgreSQL for OAuth/OIDC authentication<a class="anchor-link" id="configure-postgresql-for-oauth-oidc-authentication"></a></h1>
<h2 id="install-pg_oidc_validator">Install pg_oidc_validator<a class="anchor-link" id="install-pg_oidc_validator"></a></h2>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-3" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-3">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo apt install percona-pg-oidc-validator18</span></span></code></pre>
</div>
</div>
</div>
<h2 id="setting-up-a-sample-use-case-for-oidc-based-access">Setting Up a Sample Use Case for OIDC-Based Access<a class="anchor-link" id="setting-up-a-sample-use-case-for-oidc-based-access"></a></h2>
<p>Imagine the following use case:</p>
<ul>
<li>A company regularly generates promotional discount codes and stores them in a table called dcode inside a database named promo</li>
<li>A new discount code is generated and added to this table every day.</li>
<li>The company wants all employees to be able to access the latest code whenever needed.</li>
<li>For simplicity in this demonstration, employees retrieve the code by connecting to the database and querying the table directly.</li>
<li>To avoid managing individual database accounts for every employee, access is not tied to separate user credentials.</li>
<li>Instead, authentication to the database is handled through the company&rsquo;s SSO system, allowing employees to connect using their existing corporate identity.</li>
</ul>
<pre class="mermaid">
---
config:
theme: neutral
---
architecture-beta
group company_network(cloud)[Company Network]
service employee(user)[Employee] in company_network
service sso(server)[Company SSO PingIdentity] in company_network
service promo_db(database)[Promo Database] in company_network
service code_gen(server)[Daily Code Generator] in company_network
code_gen:B --&gt; T:promo_db
employee:R --&gt; L:sso
sso:R --&gt; L:promo_db
</pre>
<p><strong>Let&rsquo;s connect to PostgreSQL:</strong></p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-5" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-5">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo -u postgres psql</span></span></code></pre>
</div>
</div>
</div>
<p><strong>Create a database:</strong></p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-6" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-6">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">promo</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="err"></span><span class="k">c</span><span class="w"> </span><span class="n">promo</span></span></span></code></pre>
</div>
</div>
</div>
<p><strong>Add a table to store discount codes:</strong></p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-7" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-7">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">dcode</span><span class="w"> </span><span class="p">(</span><span class="n">code</span><span class="w"> </span><span class="nb">varchar</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="w"> </span><span class="n">GENERATED_AT</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="w"> </span><span class="k">default</span><span class="w"> </span><span class="n">now</span><span class="p">());</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">dcode</span><span class="w"> </span><span class="p">(</span><span class="n">code</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'SAVENOW'</span><span class="p">);</span></span></span></code></pre>
</div>
</div>
</div>
<p><strong>Create the access user:</strong></p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-8" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-8">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">ROLE</span><span class="w"> </span><span class="n">employees</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">LOGIN</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">GRANT</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">dcode</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="n">employees</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h2 id="configure-oauth-access-in-postgresql">Configure OAuth access in PostgreSQL:<a class="anchor-link" id="configure-oauth-access-in-postgresql"></a></h2>
<p>In order for users to connect using OAuth and authenticate through the Ping Identity server, we need to create an identity map and add an entry for such access in PostgreSQL&rsquo;s authentication configuration file.</p>
<p>Edit the identity mapping configuration file and add an entry, which we will call <em>oidc</em>, mapping connections originating from a system user identified by a sub ID. For this exercise, we allow any string to match:</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-9" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-9">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo vim /etc/postgresql/18/main/pg_ident.conf
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># MAPNAME SYSTEM-USERNAME DATABASE-USERNAME</span>
</span></span><span class="line"><span class="cl">oidc /^<span class="o">(</span>.*<span class="o">)</span>$ employees</span></span></code></pre>
</div>
</div>
</div>
<p>Next, edit the authentication configuration file. The configuration file <em>pg_hba.conf</em> acts as a sort of firewall for connections. With the below line, we are instructing PostgreSQL to allow all connections coming from any network that attempt to access the database promo as user employees using the new authentication method <em>oauth</em>. We are also indicating that the authentication provider is Ping Identity and the scope is <em>openid</em>.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-10" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-10">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo vim /etc/postgresql/18/main/pg_hba.conf
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># TYPE DATABASE USER ADDRESS METHOD</span>
</span></span><span class="line"><span class="cl">host promo employees 0.0.0.0/0 oauth <span class="nv">issuer</span><span class="o">=</span>https://auth.pingone.com.au/64935f69-5a0a-4b69-a8bd-46967d218303/as <span class="nv">scope</span><span class="o">=</span>pgscope <span class="nv">map</span><span class="o">=</span>oidc</span></span></code></pre>
</div>
</div>
</div>
<p><strong>Note:</strong></p>
<ol>
<li>
<p>Place this entry after the existing local rules and just before the replication rules in pg_hba.conf. PostgreSQL evaluates pg_hba.conf from top to bottom, and the first matching rule is applied. Putting the OAuth rule earlier ensures that connections to database promo as user employees use OAuth instead of falling back to password authentication.</p>
</li>
<li>
<p>In production environments, restrict the IP range instead of using <em>0.0.0.0/0</em></p>
</li>
<li>
<p>The <code>oauth_issuer</code> must exactly match the <strong>Issuer ID</strong> from your PingOne environment. The Issuer URL is unique to each PingOne environment and contains your environment UUID. It typically follows this format: <code>https://auth.pingone.com.au/&lt;environment-id&gt;/as</code>. Replace <code>&lt;environment-id&gt;</code> with the actual value from your PingOne environment. Do not copy the placeholder value directly.</p>
</li>
</ol>
<h2 id="enabling-the-pg_oidc_validation-extension">Enabling the pg_oidc_validation extension<a class="anchor-link" id="enabling-the-pg_oidc_validation-extension"></a></h2>
<p>Edit the postgresql.conf and add below lines</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-11" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-11">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo vim /etc/postgresql/18/main/postgresql.conf
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">oauth_validator_libraries</span> <span class="o">=</span> <span class="s1">'pg_oidc_validator'</span></span></span></code></pre>
</div>
</div>
</div>
<p>The validator uses the <code>sub</code> claim from the access token by default. Hence, we need not explicitly set <code>pg_oidc_validator.authn_field=sub</code>. The sub claim is defined by the OpenID Connect specification as a stable and unique identifier for a user within an identity provider (IdP). It is intended to uniquely represent a user and remain consistent across authentication sessions.</p>
<p>PostgreSQL does not interpret or transform this value. The validator extracts the configured claim and PostgreSQL compares it against a database role or an entry in pg_ident.conf. If the value does not match the expected role or mapping, authentication will fail, even if the token itself is valid.</p>
<p>In this setup with Ping Identity, only the sub claim can be used for authentication with the default OpenID Connect resource.</p>
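<p>If you would rather not accept any subject, as the earlier catch-all regex does, a tighter pg_ident.conf entry can pin a single <code>sub</code> value to the role. The sub value below is a made-up example; use the value actually issued in your access token:</p>
<pre><code class="language-text"># MAPNAME  SYSTEM-USERNAME                        PG-USERNAME
oidc       8f14e45f-ceea-467f-a7f3-93d0e631f5b9   employees</code></pre>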
<h2 id="reload-the-configuration">Reload the configuration<a class="anchor-link" id="reload-the-configuration"></a></h2>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-12" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-12">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo -u postgres psql
</span></span><span class="line"><span class="cl">SELECT pg_reload_conf<span class="o">()</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h2 id="monitor-the-server-logs">Monitor the server logs<a class="anchor-link" id="monitor-the-server-logs"></a></h2>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-13" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-13">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">sudo tail -f /var/log/postgresql/postgresql-18-main.log</span></span></code></pre>
</div>
</div>
</div>
<h1 id="test-login-using-an-oidc-flow">Test login using an OIDC flow<a class="anchor-link" id="test-login-using-an-oidc-flow"></a></h1>
<p><strong>Connecting to the database:</strong></p>
<p>For this quick connection test, we use psql to connect to the promo database as the employees user, explicitly specifying the host IP along with the <strong>oauth_issuer</strong> and <strong>client_id</strong>.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-14" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-14">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">psql <span class="s1">'host=127.0.0.1 user=employees dbname=promo oauth_issuer=https://auth.pingone.com.au/64935f69-5a0a-4b69-a8bd-46967d218303/as oauth_client_id=1e892f71-09d7-4ed6-a534-0dc888d39c7c'</span></span></span></code></pre>
</div>
</div>
</div>
<p>By connecting via the host IP address rather than the local socket, PostgreSQL treats this as a host-based connection, ensuring that the OAuth configuration in pg_hba.conf is applied. The authentication is handled by PostgreSQL&rsquo;s authentication framework, which uses OIDC with Ping Identity as the identity provider to validate the token.</p>
<p>We will see a prompt on the console with the verification URL and an activation code in the form <strong>XXXX-XXX</strong>. Example shown below:</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">text</span><button class="code-block__copy" type="button" data-copy-target="codeblock-15" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-15">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Visit https://auth.pingone.com.au/64935f69-5a0a-4b69-a8bd-46967d218303/device and enter the code: 7KK7-88DK</span></span></code></pre>
</div>
</div>
</div>
<p>Opening the URL prompts you to log in as the <strong>employees</strong> user, which we created during the PingOne environment setup.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-user-login.png" alt="&nbsp;"></figure>

<p>Next, it will prompt you to enter the activation code.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-activation-code.png" alt="&nbsp;"></figure>

<p>Approve access for the application, and that&rsquo;s it! The user has now been successfully authenticated via OIDC.</p>
<figure>
<img decoding="async" src="https://percona.community/blog/2026/03/ping-approve-user.png" alt="&nbsp;"></figure>

<p>Return to the PostgreSQL prompt and you should see that the login to the promo database is successful. You can now query the <em>dcode</em> table to fetch the discount code.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-16" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-16">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">psql <span class="s1">'host=127.0.0.1 user=employees dbname=promo oauth_issuer=https://auth.pingone.com.au/64935f69-5a0a-4b69-a8bd-46967d218303/as oauth_client_id=1e892f71-09d7-4ed6-a534-0dc888d39c7c'</span>
</span></span><span class="line"><span class="cl">Visit https://auth.pingone.com.au/64935f69-5a0a-4b69-a8bd-46967d218303/device and enter the code: 7KK7-88DK
</span></span><span class="line"><span class="cl">psql <span class="o">(</span>18.2 - Percona Server <span class="k">for</span> PostgreSQL 18.2.1<span class="o">)</span>
</span></span><span class="line"><span class="cl">SSL connection <span class="o">(</span>protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: postgresql<span class="o">)</span>
</span></span><span class="line"><span class="cl">Type <span class="s2">"help"</span> <span class="k">for</span> help.
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">promo</span><span class="o">=</span>&gt; <span class="k">select</span> * from dcode<span class="p">;</span>
</span></span><span class="line"><span class="cl"> code <span class="p">|</span> generated_at
</span></span><span class="line"><span class="cl">---------+----------------------------
</span></span><span class="line"><span class="cl"> SAVENOW <span class="p">|</span> 2026-02-13 09:40:13.109801
</span></span><span class="line"><span class="cl"><span class="o">(</span><span class="m">1</span> row<span class="o">)</span></span></span></code></pre>
</div>
</div>
</div>
<p>With the steps shown in this guide, we now have a working end-to-end setup using OIDC authentication and device flow login. From here, the same model can be extended to real-world enterprise environments with tighter network restrictions and role mapping.</p>
<p>If you run into issues while setting up pg_oidc_validator or integrating PostgreSQL with Ping Identity, check the <a href="https://forums.percona.com/" target="_blank" rel="noopener noreferrer">community forums</a> first; chances are someone in the community has already encountered a similar issue. If not, feel free to open a <a href="https://github.com/Percona-Lab/pg_oidc_validator/issues" target="_blank" rel="noopener noreferrer">discussion</a> or raise a request for help.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/04/postgresql-18-oidc-authentication-with-ping-identity-using-pg_oidc_validator/">PostgreSQL 18 OIDC Authentication with Ping Identity using pg_oidc_validator</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What Exactly Is the MySQL Ecosystem? </title>
      <link>https://www.percona.com/blog/what-exactly-is-the-mysql-ecosystem/</link>
      <pubDate>Tue, 03 Mar 2026 14:39:23 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.percona.com/blog/">Percona Database Performance Blog</source>
      <description><![CDATA[<p>As we set out to help the MySQL ecosystem assert greater independence from Oracle by establishing a vendor-neutral industry association, we had to confront a deceptively simple question: What exactly is the MySQL ecosystem? There are many views on this question. Some argue it should revolve strictly around the MySQL brand—meaning MariaDB would be excluded. […]</p>
<p>The post <a rel="nofollow" href="https://www.percona.com/blog/what-exactly-is-the-mysql-ecosystem/">What Exactly Is the MySQL Ecosystem? </a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" width="200" height="112" src="https://www.percona.com/blog/wp-content/uploads/2025/09/How-MySQL-writes-work-200x112.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="How MySQL writes work" style="margin-bottom: 5px;clear:both;max-width: 100%">As we set out to help the MySQL ecosystem assert greater independence from Oracle by establishing a vendor-neutral industry association, we had to confront a deceptively simple question: What exactly is the MySQL ecosystem? There are many views on this question. Some argue it should revolve strictly around the MySQL brand&mdash;meaning MariaDB would be excluded. [&hellip;]</p>

<p>The post <a rel="nofollow" href="https://www.percona.com/blog/what-exactly-is-the-mysql-ecosystem/">What Exactly Is the MySQL Ecosystem? </a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The &#8220;bus factor&#8221; risk in MongoDB, MariaDB, Redis, MySQL, PostgreSQL, and SQLite</title>
      <link>https://programmingbrain.com/2025/03/bus-factor-risk-in-open-source-databases.html</link>
      <pubDate>Tue, 03 Mar 2026 14:01:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://programmingbrain.com/">Programming Brain</source>
      <description><![CDATA[<p>Features and performance are important when choosing databases, but so is the “bus factor” risk</p>
<p>The post <a rel="nofollow" href="https://programmingbrain.com/2025/03/bus-factor-risk-in-open-source-databases.html">The &#8220;bus factor&#8221; risk in MongoDB, MariaDB, Redis, MySQL, PostgreSQL, and SQLite</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Features and performance are important when choosing databases, but so is the &ldquo;bus factor&rdquo; risk</p>

<p>The post <a rel="nofollow" href="https://programmingbrain.com/2025/03/bus-factor-risk-in-open-source-databases.html">The &#8220;bus factor&#8221; risk in MongoDB, MariaDB, Redis, MySQL, PostgreSQL, and SQLite</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll</title>
      <link>https://mariadb.org/what-the-mariadb-community-wants-next-a-look-at-our-mysql-compatibility-poll/</link>
      <pubDate>Tue, 03 Mar 2026 13:39:31 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We recently asked the community:<br />
“Which MySQL-compatibility feature would you most like to see in the next MariaDB release?”<br />
A big thank you to everyone who voted. …<br />
Continue reading \"What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll\"<br />
The post What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/what-the-mariadb-community-wants-next-a-look-at-our-mysql-compatibility-poll/">What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We recently asked the community:<br>
<a href="https://mariadb.org/poll/previous/mysql-compatibility-feature/">&ldquo;Which MySQL-compatibility feature would you most like to see in the next MariaDB release?&rdquo;</a><br>
A big thank you to everyone who voted. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/what-the-mariadb-community-wants-next-a-look-at-our-mysql-compatibility-poll/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/what-the-mariadb-community-wants-next-a-look-at-our-mysql-compatibility-poll/">What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/what-the-mariadb-community-wants-next-a-look-at-our-mysql-compatibility-poll/">What the MariaDB Community Wants Next: A Look at Our MySQL-Compatibility Poll</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward</title>
      <link>https://mariadb.org/galera-continuity-and-responsibility-how-the-foundation-and-mariadb-plc-move-forward/</link>
      <pubDate>Mon, 02 Mar 2026 21:23:42 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Introduction<br />
Over the past weeks, questions around Galera, high availability, and continuity have generated understandable concern within parts of the MariaDB community.<br />
Clarity matters in moments like this. That is also why this response was unfortunately not immediate. …<br />
Continue reading \"Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward\"<br />
The post Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/galera-continuity-and-responsibility-how-the-foundation-and-mariadb-plc-move-forward/">Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Introduction<a id="introduction"></a><br>
Over the past weeks, questions around Galera, high availability, and continuity have generated understandable concern within parts of the MariaDB community.<br>
Clarity matters in moments like this. That is also why this response was unfortunately not immediate. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/galera-continuity-and-responsibility-how-the-foundation-and-mariadb-plc-move-forward/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/galera-continuity-and-responsibility-how-the-foundation-and-mariadb-plc-move-forward/">Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/galera-continuity-and-responsibility-how-the-foundation-and-mariadb-plc-move-forward/">Galera, continuity, and responsibility: how the Foundation and MariaDB plc move forward</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Hardening MySQL: Practical Security Strategies for DBAs</title>
      <link>https://percona.community/blog/2026/03/02/hardening-mysql-practical-security-strategies-for-dbas/</link>
      <pubDate>Mon, 02 Mar 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>MySQL Security Best Practices: A Practical Guide for Locking Down Your Database Introduction MySQL runs just about everywhere. I’ve seen it behind small personal projects, internal tools, SaaS platforms, and large enterprise systems handling serious transaction volume. When your database sits at the center of everything, it becomes part of your security perimeter whether you planned it that way or not. And that makes it a target.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/02/hardening-mysql-practical-security-strategies-for-dbas/">Hardening MySQL: Practical Security Strategies for DBAs</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<h1 id="mysql-security-best-practices-a-practical-guide-for-locking-down-your-database">MySQL Security Best Practices: A Practical Guide for Locking Down Your Database<a class="anchor-link" id="mysql-security-best-practices-a-practical-guide-for-locking-down-your-database"></a></h1>
<h2 id="introduction">Introduction<a class="anchor-link" id="introduction"></a></h2>
<p>MySQL runs just about everywhere. I&rsquo;ve seen it behind small personal projects, internal tools, SaaS platforms, and large enterprise systems handling serious transaction volume. When your database sits at the center of everything, it becomes part of your security perimeter whether you planned it that way or not. And that makes it a target.</p>
<p>Securing MySQL isn&rsquo;t about flipping one magical setting and calling it done. It&rsquo;s about layers. Tight access control. Encrypted connections. Clear visibility into what&rsquo;s happening on the server. And operational discipline that doesn&rsquo;t drift over time.</p>
<p>In this guide, I&rsquo;m going to walk through practical MySQL security best practices that you can apply right away. These are the kinds of checks and hardening steps that reduce real risk in real environments, and help build a database platform that stays resilient under pressure.</p>
<hr>
<h2 id="1-principle-of-least-privilege">1. Principle of Least Privilege<a class="anchor-link" id="1-principle-of-least-privilege"></a></h2>
<p>One of the most common security mistakes is over-granting privileges. Applications and users should have only the permissions they absolutely need.</p>
<h3 id="bad-practice">Bad Practice<a class="anchor-link" id="bad-practice"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-0" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-0">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">GRANT</span><span class="w"> </span><span class="k">ALL</span><span class="w"> </span><span class="k">PRIVILEGES</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="o">*</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="s1">'appuser'</span><span class="o">@</span><span class="s1">'10.%'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="better-approach">Better Approach<a class="anchor-link" id="better-approach"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">GRANT</span><span class="w"> </span><span class="k">SELECT</span><span class="p">,</span><span class="w"> </span><span class="k">INSERT</span><span class="p">,</span><span class="w"> </span><span class="k">UPDATE</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">appdb</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="s1">'appuser'</span><span class="o">@</span><span class="s1">'10.%'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="recommendations">Recommendations<a class="anchor-link" id="recommendations"></a></h3>
<ul>
<li>Avoid global privileges unless absolutely required</li>
<li>Restrict users by host whenever possible</li>
<li>Separate admin accounts from application accounts</li>
<li>Use different credentials for read-only vs write operations</li>
</ul>
<h3 id="audit-existing-privileges">Audit Existing Privileges<a class="anchor-link" id="audit-existing-privileges"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-2" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-2">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">user</span><span class="p">,</span><span class="w"> </span><span class="k">host</span><span class="p">,</span><span class="w"> </span><span class="n">Select_priv</span><span class="p">,</span><span class="w"> </span><span class="n">Insert_priv</span><span class="p">,</span><span class="w"> </span><span class="n">Update_priv</span><span class="p">,</span><span class="w"> </span><span class="n">Delete_priv</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">mysql</span><span class="p">.</span><span class="k">user</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
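<p>For a per-account view, SHOW GRANTS reports the effective privileges of a single account; a quick check using the application account from the example above:</p>
<pre><code class="language-sql">-- Effective privileges for a single account
SHOW GRANTS FOR 'appuser'@'10.%';</code></pre>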
<hr>
<h2 id="2-strong-authentication--password-policies">2. Strong Authentication &amp; Password Policies<a class="anchor-link" id="2-strong-authentication-password-policies"></a></h2>
<p>Weak credentials remain one of the easiest attack vectors.</p>
<h3 id="enable-password-validation">Enable Password Validation<a class="anchor-link" id="enable-password-validation"></a></h3>
<p>component_validate_password is MySQL&rsquo;s modern password policy engine. Think of it as a gatekeeper for credential quality. Every time someone tries to set or change a password, it checks whether that password meets your defined security standards before letting it in.</p>
<p>It replaces the older validate_password plugin with a component-based architecture that is more flexible and better aligned with MySQL 8.x design.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-3" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-3">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">INSTALL</span><span class="w"> </span><span class="n">COMPONENT</span><span class="w"> </span><span class="s1">'file://component_validate_password'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="what-it-does">What It Does<a class="anchor-link" id="what-it-does"></a></h3>
<p>When enabled, it enforces rules such as:</p>
<ul>
<li>Minimum password length</li>
<li>Required mix of character types</li>
<li>Dictionary file checks</li>
<li>Strength scoring</li>
</ul>
<p>If a password fails policy, the statement is rejected before the credential is stored.</p>
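<p>As a sketch of how these rules can be tuned once the component is installed (the values below are illustrative, not prescriptive):</p>
<pre><code class="language-sql">-- Persisted across restarts; adjust the values to match your own policy
SET PERSIST validate_password.policy = 'STRONG';
SET PERSIST validate_password.length = 14;
SET PERSIST validate_password.mixed_case_count = 1;
SET PERSIST validate_password.number_count = 1;
SET PERSIST validate_password.special_char_count = 1;
SET PERSIST validate_password.check_user_name = ON;

-- Review the resulting settings
SHOW VARIABLES LIKE 'validate_password.%';</code></pre>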
<h3 id="why-it-matters">Why It Matters<a class="anchor-link" id="why-it-matters"></a></h3>
<p>Weak passwords remain one of the most common entry points in database breaches. This component reduces risk by enforcing baseline credential hygiene automatically, instead of relying on developer discipline.</p>
<h3 id="recommended-policies">Recommended Policies<a class="anchor-link" id="recommended-policies"></a></h3>
<ul>
<li>Minimum length: 14+ characters</li>
<li>Require mixed case, numbers, and symbols</li>
<li>Enable dictionary checks</li>
<li>Enable username checks</li>
</ul>
<h3 id="remove-anonymous-accounts">Remove Anonymous Accounts<a class="anchor-link" id="remove-anonymous-accounts"></a></h3>
<h4 id="find-anonymous-users">Find Anonymous Users</h4>
<p>Anonymous users have an empty User field.</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-4" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-4">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">user</span><span class="p">,</span><span class="w"> </span><span class="k">host</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">mysql</span><span class="p">.</span><span class="k">user</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="k">user</span><span class="o">=</span><span class="s1">''</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p>If you see rows returned, those are anonymous accounts.</p>
<h3 id="drop-anonymous-users">Drop Anonymous Users<a class="anchor-link" id="drop-anonymous-users"></a></h3>
<p>In modern MySQL versions:</p>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-5" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-5">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">DROP</span><span class="w"> </span><span class="k">USER</span><span class="w"> </span><span class="s1">''</span><span class="o">@</span><span class="s1">'localhost'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">DROP</span><span class="w"> </span><span class="k">USER</span><span class="w"> </span><span class="s1">''</span><span class="o">@</span><span class="s1">'%'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p>Adjust the Host value based on what your query returned.</p>
<h3 id="why-this-matters">Why This Matters<a class="anchor-link" id="why-this-matters"></a></h3>
<p>Anonymous users:</p>
<ul>
<li>Allow login without credentials</li>
<li>May have default privileges in some distributions</li>
<li>Increase the attack surface unnecessarily</li>
</ul>
<p>In hardened environments, there should be zero accounts with an empty username. Every identity should be explicit, accountable, and least-privileged.</p>
<h2 id="3-encryption-everywhere">3. Encryption Everywhere<a class="anchor-link" id="3-encryption-everywhere"></a></h2>
<p>Encryption protects data both in transit and at rest.</p>
<h3 id="enable-transparent-data-encryption-tde">Enable Transparent Data Encryption (TDE)<a class="anchor-link" id="enable-transparent-data-encryption-tde"></a></h3>
<p>See my January 13 post for a deep dive into Transparent Data Encryption:<br>
<a href="https://percona.community/blog/2026/01/13/configuring-the-component-keyring-in-percona-server-and-pxc-8.4/" target="_blank" rel="noopener noreferrer">Configuring the Component Keyring in Percona Server and PXC 8.4</a></p>
<h3 id="enable-tls-for-connections">Enable TLS for Connections<a class="anchor-link" id="enable-tls-for-connections"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-6" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-6">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">require_secure_transport</span><span class="o">=</span><span class="k">ON</span></span></span></code></pre>
</div>
</div>
</div>
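<p>Beyond the server-wide setting, TLS can also be required per account, so the requirement holds even if the global setting is ever relaxed. A minimal sketch, reusing the hypothetical application account from earlier:</p>
<pre><code class="language-sql">-- Reject any non-TLS connection attempt from this account
ALTER USER 'appuser'@'10.%' REQUIRE SSL;</code></pre>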
<h3 id="verify-ssl-usage">Verify SSL Usage<a class="anchor-link" id="verify-ssl-usage"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-7" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-7">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Ssl_cipher'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="encryption-areas-to-consider">Encryption Areas to Consider<a class="anchor-link" id="encryption-areas-to-consider"></a></h3>
<ul>
<li>Client-server connections</li>
<li>Replication channels</li>
<li>Backups and snapshot storage</li>
<li>Disk-level encryption</li>
</ul>
<h2 id="4-patch-management--version-hygiene">4. Patch Management &amp; Version Hygiene<a class="anchor-link" id="4-patch-management-version-hygiene"></a></h2>
<p>Running outdated MySQL versions is equivalent to leaving known vulnerabilities exposed.</p>
<h3 id="maintenance-strategy">Maintenance Strategy<a class="anchor-link" id="maintenance-strategy"></a></h3>
<ul>
<li>Track vendor security advisories</li>
<li>Apply minor updates regularly</li>
<li>Test patches in staging before production rollout</li>
<li>Avoid unsupported MySQL versions</li>
</ul>
<h3 id="check-version">Check Version<a class="anchor-link" id="check-version"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-8" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-8">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="k">VERSION</span><span class="p">();</span></span></span></code></pre>
</div>
</div>
</div>
<h2 id="5-logging-auditing-and-monitoring">5. Logging, Auditing, and Monitoring<a class="anchor-link" id="5-logging-auditing-and-monitoring"></a></h2>
<p>Security without visibility is blind defense. Enable audit logging.</p>
<h3 id="1-audit_log-plugin-legacy-model">1. audit_log Plugin (Legacy Model)<a class="anchor-link" id="1-audit_log-plugin-legacy-model"></a></h3>
<h4 id="installation">Installation</h4>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-9" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-9">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">INSTALL</span><span class="w"> </span><span class="n">PLUGIN</span><span class="w"> </span><span class="n">audit_log</span><span class="w"> </span><span class="n">SONAME</span><span class="w"> </span><span class="s1">'audit_log.so'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h4 id="verify">Verify</h4>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-10" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-10">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="n">PLUGINS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'audit%'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h3 id="2-audit_log_filter-component-modern-model">2. audit_log_filter Component (Modern Model)<a class="anchor-link" id="2-audit_log_filter-component-modern-model"></a></h3>
<p>The audit_log_filter component was introduced in MySQL 8 to provide a more flexible and granular alternative to the older plugin model.</p>
<h4 id="installation-1">Installation</h4>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-11" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-11">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">INSTALL</span><span class="w"> </span><span class="n">COMPONENT</span><span class="w"> </span><span class="s1">'file://component_audit_log_filter'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h4 id="verify-1">Verify</h4>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-12" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-12">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">mysql</span><span class="p">.</span><span class="n">component</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h4 id="architecture-difference">Architecture Difference</h4>
<p>Instead of a single global policy, you create:</p>
<ul>
<li>Filters (define what to log)</li>
<li>Users assigned to filters</li>
</ul>
<p>It&rsquo;s granular and rule-driven.</p>
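<p>A minimal sketch of that model, assuming the component&rsquo;s filter functions are available (the filter name and JSON definition below are illustrative):</p>
<pre><code class="language-sql">-- Define a filter that logs connection-class events
SELECT audit_log_filter_set_filter('log_connections',
  '{ "filter": { "class": { "name": "connection" } } }');

-- Assign it to a specific account, and as the default for all other accounts
SELECT audit_log_filter_set_user('appuser@localhost', 'log_connections');
SELECT audit_log_filter_set_user('%', 'log_connections');</code></pre>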
<h3 id="auditing-key-events">Auditing Key Events<a class="anchor-link" id="auditing-key-events"></a></h3>
<ul>
<li>Failed logins</li>
<li>Privilege changes</li>
<li>Schema modifications</li>
<li>Unusual query activity</li>
</ul>
<h3 id="references">References:<a class="anchor-link" id="references"></a></h3>
<ol>
<li><a href="https://percona.community/blog/2025/09/18/audit-log-filter-component/" target="_blank" rel="noopener noreferrer">Audit Log Filter Component</a></li>
<li><a href="https://percona.community/blog/2025/10/08/audit-log-filters-part-ii/" target="_blank" rel="noopener noreferrer">Audit Log Filters Part II</a></li>
</ol>
<h3 id="useful-metrics">Useful Metrics<a class="anchor-link" id="useful-metrics"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-13" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-13">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Aborted_connects'</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">'Connections'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<h2 id="6-secure-configuration-hardening">6. Secure Configuration Hardening<a class="anchor-link" id="6-secure-configuration-hardening"></a></h2>
<p>A secure baseline configuration reduces risk from common attack patterns.</p>
<h3 id="recommended-settings">Recommended Settings<a class="anchor-link" id="recommended-settings"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">ini</span><button class="code-block__copy" type="button" data-copy-target="codeblock-14" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-14">
<div class="highlight">
<pre class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="na">local_infile</span><span class="o">=</span><span class="s">OFF</span>
</span></span><span class="line"><span class="cl"><span class="na">secure_file_priv</span><span class="o">=</span><span class="s">/var/lib/mysql-files</span>
</span></span><span class="line"><span class="cl"><span class="na">sql_mode</span><span class="o">=</span><span class="s">"STRICT_ALL_TABLES"</span>
</span></span><span class="line"><span class="cl"><span class="na">secure-log-path</span><span class="o">=</span><span class="s">/var/log/mysql</span></span></span></code></pre>
</div>
</div>
</div>
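<p>After restarting with those settings in place, a quick sanity check from a client session confirms they are in effect:</p>
<pre><code class="language-sql">-- Confirm the hardened values are active
SELECT @@local_infile, @@secure_file_priv, @@sql_mode;</code></pre>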
<h3 id="why-these-matter">Why These Matter<a class="anchor-link" id="why-these-matter"></a></h3>
<ul>
<li>Prevent arbitrary file imports</li>
<li>Reduce filesystem abuse</li>
<li>Restrict data export/import locations</li>
</ul>
<h2 id="7-backup-security">7. Backup Security<a class="anchor-link" id="7-backup-security"></a></h2>
<p>Backups often contain everything an attacker wants.</p>
<h3 id="backup-best-practices">Backup Best Practices<a class="anchor-link" id="backup-best-practices"></a></h3>
<ul>
<li>Encrypt backups</li>
<li>Restrict filesystem permissions</li>
<li>Store offsite copies securely</li>
<li>Rotate backup credentials</li>
<li>Verify restore procedures regularly</li>
</ul>
<h3 id="example-permission-check">Example Permission Check<a class="anchor-link" id="example-permission-check"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">bash</span><button class="code-block__copy" type="button" data-copy-target="codeblock-15" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-15">
<div class="highlight">
<pre class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ls -l /backup/mysql</span></span></code></pre>
</div>
</div>
</div>
<h2 id="8-replication--cluster-security">8. Replication &amp; Cluster Security<a class="anchor-link" id="8-replication-cluster-security"></a></h2>
<p>Replication is not just a data distribution feature. It is a persistent, privileged communication channel between servers. If misconfigured, it can become a lateral movement pathway inside your infrastructure. Treat every replication link as a trusted but tightly controlled corridor.</p>
<p>Principle: Replication Is a Privileged Service Account</p>
<p>Replication users require elevated capabilities. They must be isolated, tightly scoped, and monitored like any other service identity.</p>
<h3 id="secure-replication-users">Secure Replication Users<a class="anchor-link" id="secure-replication-users"></a></h3>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-16" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-16">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">USER</span><span class="w"> </span><span class="s1">'repl'</span><span class="o">@</span><span class="s1">'10.%'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">IDENTIFIED</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="s1">'strongpassword'</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">REQUIRE</span><span class="w"> </span><span class="n">SSL</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">GRANT</span><span class="w"> </span><span class="n">REPLICATION</span><span class="w"> </span><span class="n">REPLICA</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="o">*</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="k">TO</span><span class="w"> </span><span class="s1">'repl'</span><span class="o">@</span><span class="s1">'10.%'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p>Hardening considerations:</p>
<ul>
<li>Restrict host patterns as narrowly as possible. Avoid % whenever feasible.</li>
<li>Require SSL or X.509 certificate authentication.</li>
<li>Enforce strong password policies or use a secrets manager.</li>
<li>Disable interactive login capability if applicable.</li>
</ul>
<h3 id="encrypt-replication-traffic">Encrypt Replication Traffic<a class="anchor-link" id="encrypt-replication-traffic"></a></h3>
<p>Replication traffic may include sensitive row data, DDL statements, and metadata. Always encrypt it.</p>
<p>At minimum:</p>
<ul>
<li>Enable require_secure_transport=ON</li>
<li>Configure TLS certificates on source and replica</li>
<li>Set replication channel to use SSL:</li>
</ul>
<div class="code-block">
<div class="code-block__header"><span class="code-block__lang">sql</span><button class="code-block__copy" type="button" data-copy-target="codeblock-17" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-17">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="n">CHANGE</span><span class="w"> </span><span class="n">REPLICATION</span><span class="w"> </span><span class="k">SOURCE</span><span class="w"> </span><span class="k">TO</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">SOURCE_SSL</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">SOURCE_SSL_CA</span><span class="o">=</span><span class="s1">'/path/ca.pem'</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">SOURCE_SSL_CERT</span><span class="o">=</span><span class="s1">'/path/client-cert.pem'</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="n">SOURCE_SSL_KEY</span><span class="o">=</span><span class="s1">'/path/client-key.pem'</span><span class="p">;</span></span></span></code></pre>
</div>
</div>
</div>
<p>For MySQL Group Replication or InnoDB Cluster:</p>
<ul>
<li>Enable group communication SSL</li>
<li>Validate certificate identity</li>
<li>Use dedicated replication networks</li>
</ul>
<h3 id="binary-log-and-relay-log-protection">Binary Log and Relay Log Protection<a class="anchor-link" id="binary-log-and-relay-log-protection"></a></h3>
<p>Replication relies on binary logs. Protect them.</p>
<ul>
<li>Set binlog_encryption=ON</li>
<li>Set relay_log_info_repository=TABLE</li>
<li>Restrict filesystem access to log directories</li>
<li>Monitor log retention policies</li>
</ul>
<p>Compromised binary logs can reveal historical data changes.</p>
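<p>As a sketch of the binary log piece, assuming a keyring component is already configured (binary log encryption depends on one):</p>
<pre><code class="language-sql">-- Encrypt binary and relay log files created from this point on
SET PERSIST binlog_encryption = ON;

-- Verify
SHOW GLOBAL VARIABLES LIKE 'binlog_encryption';</code></pre>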
<h2 id="9-continuous-security-reviews">9. Continuous Security Reviews<a class="anchor-link" id="9-continuous-security-reviews"></a></h2>
<p>Security is not a one-time checklist. Regular audits help catch configuration drift and evolving threats.</p>
<h3 id="suggested-review-cadence">Suggested Review Cadence<a class="anchor-link" id="suggested-review-cadence"></a></h3>
<ul>
<li>Weekly: failed login review</li>
<li>Monthly: privilege audits</li>
<li>Quarterly: configuration review</li>
<li>Semiannually: full security assessment</li>
</ul>
<h2 id="security-checklist-summary">Security Checklist Summary<a class="anchor-link" id="security-checklist-summary"></a></h2>
<table>
<thead>
<tr>
<th>Area</th>
<th>Key Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>Access Control</td>
<td>Least privilege grants</td>
</tr>
<tr>
<td>Authentication</td>
<td>Strong password policies</td>
</tr>
<tr>
<td>Encryption</td>
<td>TLS + encrypted storage</td>
</tr>
<tr>
<td>Updates</td>
<td>Regular patching</td>
</tr>
<tr>
<td>Monitoring</td>
<td>Audit logging enabled</td>
</tr>
<tr>
<td>Configuration</td>
<td>Harden defaults</td>
</tr>
<tr>
<td>Backups</td>
<td>Encrypt and protect</td>
</tr>
<tr>
<td>Replication</td>
<td>Secure replication users</td>
</tr>
</tbody>
</table>
<h2 id="final-thoughts">Final Thoughts<a class="anchor-link" id="final-thoughts"></a></h2>
<p>Strong MySQL security doesn&rsquo;t come from one feature or one tool. It comes from layers working together. Hardened configuration. Tight, intentional privilege design. Encryption everywhere it makes sense. And monitoring that actually gets reviewed instead of just written to disk.</p>
<p>In my experience, the strongest environments aren&rsquo;t the ones trying to be unbreakable. They&rsquo;re the ones built to detect, contain, and respond. Every layer should either reduce blast radius or increase visibility. If an attacker gets through one control, the next one slows them down. And while they&rsquo;re slowing down, your logging and monitoring should already be telling you something isn&rsquo;t right.</p>
<p>That&rsquo;s what a mature security posture looks like in practice.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/03/02/hardening-mysql-practical-security-strategies-for-dbas/">Hardening MySQL: Practical Security Strategies for DBAs</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Community Server 12.3 Will Include Galera Cluster</title>
      <link>https://mariadb.com/resources/blog/mariadb-community-server-12-3-will-include-galera-cluster/</link>
      <pubDate>Fri, 27 Feb 2026 15:21:28 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Community feedback is an important part of MariaDB, and recently, you made your voices heard regarding the inclusion of Galera […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-community-server-12-3-will-include-galera-cluster/">MariaDB Community Server 12.3 Will Include Galera Cluster</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Community feedback is an important part of MariaDB, and recently, you made your voices heard regarding the inclusion of Galera Cluster in the 12.3 series. We want to extend a sincere thank you to everyone who reached out &ndash; especially Fr&eacute;d&eacute;ric (lefred) Descamps and Ren&eacute; Bonvanie, whose impassioned perspectives were instrumental in our review. We&rsquo;ve thoroughly considered your feedback and&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/mariadb-community-server-12-3-will-include-galera-cluster/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-community-server-12-3-will-include-galera-cluster/">MariaDB Community Server 12.3 Will Include Galera Cluster</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>SQLSTATE [HY000] [2006] Galera has gone away</title>
      <link>https://medium.com/@arbaudie.it/sqlstate-hy000-2006-galera-has-gone-away-656f3b40ef8c?source=rss-c779d007e7fe------2</link>
      <pubDate>Fri, 27 Feb 2026 13:28:55 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://medium.com/@arbaudie.it?source=rss-c779d007e7fe------2">Stories by ArBauDie.IT on Medium</source>
      <description><![CDATA[<p>TL;DR: I find this move detrimental to MariaDB as a whole. MariaDB plc’s decision to pull Galera, its synchronous multi-master replication solution, out of the open-source ecosystem in MariaDB 12.3 LTS has been quietly shaking the extended MariaDB community lately. Galera has been a critical component for high-availability MariaDB deployments, enabling robust scaling and fault tolerance. But with MariaDB’s acquisition of Codership and the sudden shift of Galera to a commercial license, the message to the community is clear: the rules have changed. For organizations that migrated from MySQL to MariaDB in search of stability, openness, and innovation, this move raises serious questions. MariaDB plc frames this decision as a necessary step to accelerate innovation and deliver greater value to customers. By integrating Galera directly into its Enterprise Platform, the company aims to streamline development, reduce feature delays, and offer a more cohesive high-availability solution. This aligns with MariaDB’s broader push to strengthen its enterprise offerings, particularly following its transition to private ownership and the arrival of a new CEO focused on profitability. However, the acquisition also serves a more strategic purpose: preventing future forks of Galera by third parties. By owning the codebase, MariaDB now controls the technology’s evolution and distribution, effectively eliminating competition from community-driven alternatives. While this may secure revenue streams, it risks alienating the very community that has been MariaDB’s strongest advocate. The parallels with HashiCorp’s controversial shift to a Business Source License for Vault are striking. Both companies have taken widely used open-source tools and placed them behind commercial barriers. Yet while HashiCorp provided clear migration paths, MariaDB’s approach feels abrupt. Also, the timing is particularly damaging. The MariaDB Foundation is making pushes to present MariaDB as the natural continuation of MySQL amidst growing concerns about Oracle’s neglect of MySQL. And in parallel we have MariaDB plc behaving just like Oracle with a very popular piece of tech. This comes in direct conflict with the Foundation’s claims, giving MariaDB’s detractors just the perfect ammunition to kill the narrative even before it can spread out. Besides this already disastrous timing, it can only erode trust in MariaDB’s future. To put it bluntly, stating that open source does not equate to “free for everyone in perpetuity”, as the CEO of MariaDB plc did during a MariaDB Foundation board meeting, is mostly telling about an absence of commitment to openness. Furthermore, this argument used to explain close-sourcing Galera and MaxScale could very well also be applied to the server itself. C*O won’t like this potential for instability. I don’t like it either. And I really think this will drive many potential and actual clients away from MariaDB. It’s literally a godsend for the likes of Percona, VillageSQL and mostly PostgreSQL. As if the latter needed any help at this very moment… On a personal level, it seems I underestimated the greed of the corporation when I expressed myself about the Codership acquisition. Maybe I was too naïve indeed, despite having a usually rather cynical nature when it comes down to business. Now the question is: how will the Foundation handle this? We have elements of an answer in the minutes of their board meeting 1/2026. I totally concur with option 2 as well. 
Will it take the form of a formal community-driven fork of Galera 4? Would it be a port of Galera to VillageSQL? Will it be Percona-driven? I can’t tell yet, but actions will speak louder than words, as always.</p>
<p>The post <a rel="nofollow" href="https://medium.com/@arbaudie.it/sqlstate-hy000-2006-galera-has-gone-away-656f3b40ef8c?source=rss-c779d007e7fe------2">SQLSTATE [HY000] [2006] Galera has gone away</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>TL;DR: I find this move detrimental to MariaDB as a&nbsp;whole</p>
<p>MariaDB plc&rsquo;s decision to pull Galera, its synchronous multi-master replication solution, <a href="https://jira.mariadb.org/browse/MDEV-38744">out of the open-source ecosystem in MariaDB 12.3 LTS</a> has been <a href="https://www.linkedin.com/posts/federicorazzoli_dear-mariadb-foundation-weve-been-friends-share-7432242107899310081-chm1?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAAGaVBsBb-oIeGyeZBxLG0008Lo2noBE7bk">quietly shaking the extended MariaDB community</a> lately.</p>
<p>Galera has been a critical component for high-availability MariaDB deployments, enabling robust scaling and fault tolerance. But with MariaDB&rsquo;s acquisition of Codership and the sudden shift of Galera to a commercial license, the message to the community is clear: the rules have changed. For organizations that migrated from MySQL to MariaDB in search of stability, openness, and innovation, this move raises serious questions.</p>
<p>MariaDB plc frames this decision as a necessary step to accelerate innovation and deliver greater value to customers. By integrating Galera directly into its Enterprise Platform, the company aims to streamline development, reduce feature delays, and offer a more cohesive high-availability solution. This aligns with MariaDB&rsquo;s broader push to strengthen its enterprise offerings, particularly following its transition to private ownership and the arrival of a new CEO focused on profitability.</p>
<p>However, the acquisition also serves a more strategic purpose: preventing future forks of Galera by third parties. By owning the codebase, MariaDB now controls the technology&rsquo;s evolution and distribution, effectively eliminating competition from community-driven alternatives. While this may secure revenue streams, it risks alienating the very community that has been MariaDB&rsquo;s strongest advocate.</p>
<p>The parallels with HashiCorp&rsquo;s controversial shift to a Business Source License for Vault are striking. Both companies have taken widely used open-source tools and placed them behind commercial barriers. Yet while HashiCorp provided clear migration paths, MariaDB&rsquo;s approach feels&nbsp;abrupt.</p>
<p>The timing is also particularly damaging. The MariaDB Foundation is making pushes to present <a href="https://mariadb.org/is-mariadb-part-of-the-mysql-ecosystem/">MariaDB as the natural continuation of MySQL</a> amidst <a href="https://www.infoworld.com/article/4134394/community-push-intensifies-to-free-mysql-from-oracles-control-amid-stagnation-fears.html">growing concerns about Oracle&rsquo;s neglect of MySQL</a>. And in parallel we have MariaDB plc behaving just like Oracle with a very popular piece of tech. This comes in direct conflict with the Foundation&rsquo;s claims, giving MariaDB&rsquo;s detractors the perfect ammunition to kill the narrative even before it can spread.<br>Besides this already disastrous timing, it can only erode trust in MariaDB&rsquo;s future. To put it bluntly, stating that open source does not equate to &ldquo;free for everyone in perpetuity&rdquo;, as the CEO of MariaDB plc did during a MariaDB Foundation board meeting, mostly tells of an absence of commitment to openness. Furthermore, the argument used to explain closed-sourcing Galera and MaxScale could very well also be applied to the server itself. C*Os won&rsquo;t like this potential for instability. I don&rsquo;t like it either. And I really think this will drive many potential and actual clients away from MariaDB. It&rsquo;s literally a godsend for the likes of Percona, VillageSQL and especially PostgreSQL. As if the latter needed any help at this very moment&nbsp;&hellip;</p>
<p>On a personal level, it seems I underestimated the greed of the corporation when <a href="https://medium.com/@arbaudie.it/personal-opinion-the-future-of-galera-cluster-13827b522387">I expressed myself about the Codership acquisition</a>. Maybe I was indeed too na&iuml;ve, despite having a usually rather cynical nature when it comes down to business.</p>
<p>Now the question is: how will the Foundation handle this? We have elements of an answer in <a href="https://mariadb.org/bodminutes/2026-02-25/#4-decision-mariadb-foundations-stance-about-mariadb-plcs-galera-sunset-decision">the minutes of their board meeting 1/2026</a>. I totally concur with option 2 as well. Will it take the form of a formal community-driven fork of Galera 4? Will it be a port of Galera to VillageSQL? Will it be Percona-driven? I can&rsquo;t tell yet, but actions will speak louder than words, as&nbsp;always.</p>

<p>The post <a rel="nofollow" href="https://medium.com/@arbaudie.it/sqlstate-hy000-2006-galera-has-gone-away-656f3b40ef8c?source=rss-c779d007e7fe------2">SQLSTATE [HY000] [2006] Galera has gone away</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Operational guide to migrating to ClickHouse</title>
      <link>https://severalnines.com/blog/operational-guide-to-migrating-to-clickhouse/</link>
      <pubDate>Fri, 27 Feb 2026 08:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>Organizations frequently migrate to ClickHouse when they determine that their existing databases or analytics stacks are inadequate for accommodating escalating analytical requirements. This transition is predicated on the fact that ClickHouse is a purpose-built OLAP engine specifically engineered to meet such demands. In this post, I will outline the operational blueprint for migrating high-volume analytical […]<br />
The post Operational guide to migrating to ClickHouse appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/operational-guide-to-migrating-to-clickhouse/">Operational guide to migrating to ClickHouse</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Organizations frequently migrate to ClickHouse when they determine that their existing databases or analytics stacks are inadequate for accommodating escalating analytical requirements. This transition is predicated on the fact that ClickHouse is a purpose-built OLAP engine specifically engineered to meet such demands.</p>
<p>In this post, I will outline the operational blueprint for migrating high-volume analytical workloads from legacy database systems (like MySQL) to ClickHouse. The guide covers the critical steps from pre-migration assessment and schema redesign to execution, validation, and establishing post-migration operational excellence to achieve a stable and superior analytical environment capable of sub-second query execution on massive datasets.</p>
<h2 class="wp-block-heading" id="h-common-circumstances-teams-who-migrate-to-clickhouse-experience"><strong>Common circumstances teams who migrate to ClickHouse</strong> <strong>experience</strong><a class="anchor-link" id="common-circumstances-teams-who-migrate-to-clickhouse-experience"></a></h2>
<ul class="wp-block-list">
<li>When <strong>dashboards and queries become too slow</strong> (seconds to minutes on tens or hundreds of millions of rows), blocking product requirements or user experience.</li>
<li>When <strong>data volume and concurrency explode</strong> (billions to trillions of rows, petabyte&#8209;scale logs, high QPS) and existing systems like Postgres or Elasticsearch can&rsquo;t scale efficiently.&nbsp;</li>
<li>When <strong>architectures become too complex</strong>, with multiple stores (Postgres, Redis, DynamoDB, Elasticsearch, etc.) and heavy ETL just to serve analytics.&nbsp;</li>
</ul>
<h2 class="wp-block-heading" id="h-operational-reasons-why-teams-migrate-to-clickhouse"><strong>Operational reasons why teams migrate to ClickHouse</strong><a class="anchor-link" id="operational-reasons-why-teams-migrate-to-clickhouse"></a></h2>
<ul class="wp-block-list">
<li>To get <strong>much faster analytics at scale</strong>: sub&#8209;second or millisecond queries on billions of rows, often 5&ndash;10&times;+ faster than previous systems.</li>
<li>To <strong>reduce cost</strong>: better compression and hardware efficiency lead to significantly lower storage and infrastructure costs (e.g., 4&ndash;10&times; less storage, ~30%+ hardware savings vs Elasticsearch).&nbsp;</li>
<li>To <strong>separate OLTP and OLAP cleanly</strong>, keeping Postgres or other transactional DBs for writes while using ClickHouse for heavy analytics via CDC or pipelines.</li>
<li>To <strong>simplify architecture and operations</strong>, consolidating multiple systems into ClickHouse and reducing ETL, maintenance, and operational risk.</li>
<li>To enable <strong>richer, real&#8209;time analytics experiences</strong> for customers &mdash; interactive dashboards, observability, security analytics, and AI/ML workloads on fresh data.</li>
</ul>
<p>Migrating to a high-performance system like ClickHouse is a strategic move. It&rsquo;s more than a technical swap; it&rsquo;s a fundamental data infrastructure shift that offers significant analytical benefits alongside necessary operational risk management.</p>
<p>To really understand why this shift is so important, we have to look at the practical value it brings. While the migration might seem complex, the benefits you&rsquo;ll unlock at the end make the initial effort and investment more than worth it.</p>
<h2 class="wp-block-heading" id="h-operational-and-business-benefits-clickhouse-delivers-post-migration"><strong>Operational and business benefits ClickHouse</strong> <strong>delivers</strong> <strong>post-migration</strong><a class="anchor-link" id="operational-and-business-benefits-clickhouse-delivers-post-migration"></a></h2>
<p>Successfully navigating the migration process unlocks a host of powerful benefits that justify the initial investment and risk.</p>
<ol class="wp-block-list">
<li><strong>Accelerated Analytical Query Performance:</strong> ClickHouse&rsquo;s architecture enables fast, low-latency analytical queries on large datasets for real-time reporting, quick dashboard updates, and interactive data exploration.</li>
<li><strong>Scalability and Cost Efficiency:</strong> The new system provides superior horizontal scalability, handling petabytes and trillions of rows at lower cost per query than legacy data warehouses. This supports future business growth without prohibitive infrastructure costs and allows retention of finer-grained historical data for deeper analysis.</li>
<li><strong>Enhanced Data Granularity and Depth:</strong> Better performance and lower storage costs encourage keeping detailed, raw event data instead of pre-aggregated tables. This gives data scientists and analysts access to the raw truth, enabling more precise modeling, root cause analysis, and the discovery of subtle trends.</li>
<li><strong>Simplified Data Architecture:</strong> Consolidating high-volume analytical workloads onto a single, purpose-built platform streamlines the data stack, reducing maintenance overhead and specialized support needs, while accelerating time-to-insight.</li>
</ol>
<h2 class="wp-block-heading" id="h-key-operational-risks-to-be-mitigated"><strong>Key operational risks to be mitigated</strong><a class="anchor-link" id="key-operational-risks-to-be-mitigated"></a></h2>
<p>While the new system offers superior performance, the transition phase and the initial operational period present several key challenges that must be proactively mitigated:</p>
<ol class="wp-block-list">
<li><strong>Data Integrity and Consistency Risk:</strong> The primary concern during migration is ensuring that all data is transferred accurately and that the new system maintains consistency with the old source.</li>
<li><strong>Downtime and Service Interruption:</strong> The cutover from the legacy system to ClickHouse must be handled with minimal disruption to ongoing business intelligence and production services.</li>
<li><strong>Learning Curve and Skill Gap:</strong> Operating and optimizing a specialized system like ClickHouse requires a different set of skills from the existing team, particularly concerning its column-oriented architecture and SQL dialect.</li>
<li><strong>Integration Complexity:</strong> Ensuring seamless integration with existing upstream data sources (e.g., Kafka, object storage) and downstream analytical tools (e.g., BI platforms) is critical.</li>
</ol>
<p>This case study will outline a direct data migration procedure, specifically moving data from a legacy database system, MySQL, to ClickHouse.</p>
<p>Before migrating from legacy database systems to ClickHouse, we must begin with the <strong>Pre-migration assessment</strong>. This foundational phase starts with defining the scope, objectives, success criteria, and high-level timeline.&nbsp;</p>
<p>Following this, a comprehensive <strong>inventory</strong> is undertaken to gain a clear understanding of the existing system through analysis of the following:</p>
<ul class="wp-block-list">
<li><strong>Workloads:</strong> Different kinds of analytical tasks, such as ad-hoc analysis, reporting, or real-time dashboards.</li>
<li><strong>Data Size:</strong> The current amount of data stored and how much is expected in the future.</li>
<li><strong>Ingest Rates:</strong> How fast and how often new data is written to the system.</li>
<li><strong>Query Patterns:</strong> The most common, demanding, and important queries run against the data, crucial for meeting performance guarantees.</li>
</ul>
<p>The core purpose of the inventory is <strong>identifying candidate data sets</strong>. This process includes:</p>
<ul class="wp-block-list">
<li>Evaluating schemas for ClickHouse compatibility and optimization.</li>
<li>Prioritizing high-volume, analytical data for maximum performance gain.</li>
<li>Defining exclusion criteria for unsuitable data (e.g., legacy, low-value, write-heavy transactional data).</li>
</ul>
<p>Finally, the assessment addresses <strong>on-prem environment implications</strong> for non-cloud migrations, including:</p>
<ul class="wp-block-list">
<li><strong>Hardware Sizing and Procurement</strong> (compute, memory, fast storage).</li>
<li><strong>Network Topology</strong> optimization for high-speed traffic.</li>
<li><strong>Security and Compliance</strong> integration (auth, encryption).</li>
<li><strong>Operational Readiness</strong> planning (monitoring, backup, DR, IT automation).</li>
</ul>
<h2 class="wp-block-heading" id="h-mysql-to-clickhouse-migration-planning-and-execution"><strong>MySQL to ClickHouse</strong> <strong>migration planning and execution</strong><a class="anchor-link" id="mysql-to-clickhouse-migration-planning-and-execution"></a></h2>
<p>A successful ClickHouse migration, exemplified here by moving from MySQL, requires a five-stage plan to minimize risk and downtime and to ensure performance: initial assessment, detailed planning, execution, rigorous testing, and final post-migration validation and optimization.</p>
<p>First, we must plan the data migration from MySQL to ClickHouse. There are several methods for this migration, which we will outline below.</p>
<p>A robust data migration pipeline is the backbone of any successful ClickHouse migration. The design should accommodate both historical data migration and continuous data synchronization.&nbsp;</p>
<h3 class="wp-block-heading" id="h-crucial-tasks-to-remember-for-successful-migration"><strong>Crucial tasks to remember for successful migration</strong><a class="anchor-link" id="crucial-tasks-to-remember-for-successful-migration"></a></h3>
<ul class="wp-block-list">
<li>Identify data sources (MySQL primary, replicas, read-only nodes)</li>
<li>Define data freshness requirements (near-real-time vs batch)</li>
<li>Separate historical backfill from live ingestion paths</li>
<li>To prevent disruption during data migration, no Data Definition Language (DDL) activities should be executed on the source database.</li>
</ul>
<h3 class="wp-block-heading" id="h-common-migration-approaches"><strong>Common migration approaches</strong><a class="anchor-link" id="common-migration-approaches"></a></h3>
<ul class="wp-block-list">
<li><strong>Initial backfill</strong>
<ul class="wp-block-list">
<li>Bulk export from MySQL (mysqldump, SELECT INTO OUTFILE)</li>
<li>Batch ingestion using <code>clickhouse-client</code>, <code>clickhouse-local</code>, or object storage (CSV/Parquet)</li>
</ul>
</li>
<li><strong>Ongoing ingestion / CDC</strong>
<ul class="wp-block-list">
<li>Binlog-based CDC tools (Debezium, Maxwell, custom CDC)</li>
<li>Stream-based pipelines (Kafka &rarr; ClickHouse)</li>
</ul>
</li>
<li><strong>Cut-over strategy</strong>
<ul class="wp-block-list">
<li>Dual-write or CDC-based sync during transition</li>
<li>Gradual read migration from MySQL to ClickHouse</li>
<li>Final cut-over once data parity and query validation are confirmed</li>
</ul>
</li>
<li><strong>Using third-party tools</strong>
<ul class="wp-block-list">
<li>Third-party tools like <a href="https://github.com/bakwc/mysql_ch_replicator">mysql_ch_replicator</a>, Blade, etc.</li>
</ul>
</li>
</ul>
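<p>As a minimal sketch of the initial backfill approach listed above, the query below uses ClickHouse&rsquo;s built-in <code>mysql()</code> table function to copy a historical snapshot straight from MySQL. The host, credentials, cutoff date, and the <code>shop.orders</code> / <code>analytics.orders</code> tables are hypothetical placeholders, and the ClickHouse target table is assumed to exist already; very large tables should be copied in key or date ranges rather than in one statement.</p>
<pre class="wp-block-code"><code>-- Backfill sketch: pull a historical snapshot from MySQL into ClickHouse.
-- Connection details and table names are placeholders.
INSERT INTO analytics.orders
SELECT order_id, customer_id, status, amount, created_at
FROM mysql('mysql-host:3306', 'shop', 'orders', 'repl_user', 'secret')
WHERE created_at &lt; '2026-01-01 00:00:00';   -- history only; CDC handles newer rows</code></pre>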
<p>Before data migration, harmonize MySQL and ClickHouse schema design and modeling. ClickHouse&rsquo;s columnar, OLAP nature necessitates a schema optimized for analytical performance and data compression; a direct &ldquo;lift-and-shift&rdquo; from relational MySQL will lead to suboptimal results.</p>
<h3 class="wp-block-heading" id="h-key-mysql-to-clickhouse-concept-conversions"><strong>Key MySQL to ClickHouse concept conversions</strong><a class="anchor-link" id="key-mysql-to-clickhouse-concept-conversions"></a></h3>
<ul class="wp-block-list">
<li><strong>Columnar Orientation vs. Row-Based:</strong>
<ul class="wp-block-list">
<li><strong>MySQL:</strong> Optimized for transactional, row-level operations (OLTP).</li>
<li><strong>ClickHouse:</strong> Optimized for scanning large volumes of data across columns (OLAP). The goal is to minimize the number of columns read per query.</li>
</ul>
</li>
<li><strong>Denormalization:</strong>
<ul class="wp-block-list">
<li><strong>MySQL:</strong> Highly normalized structure, relying heavily on JOINs for query execution.</li>
<li><strong>ClickHouse:</strong> Strongly favors denormalization (flattening data, embedding dimension fields directly into the fact table). While ClickHouse supports JOINs, they are often less performant than in row-based systems, especially on huge datasets. Denormalization optimizes read performance by reducing I/O operations.</li>
</ul>
</li>
<li><strong>Engine Selection:</strong>
<ul class="wp-block-list">
<li><strong>MySQL:</strong> Primarily uses InnoDB.</li>
<li><strong>ClickHouse:</strong> Requires careful selection of the table engine (e.g., <code>MergeTree</code> family) based on the workload (e.g., <code>ReplacingMergeTree</code> for deduplication, <code>CollapsingMergeTree</code> for state management, <code>SummingMergeTree</code> for pre-aggregation). The engine dictates data storage, replication, and query behavior.</li>
</ul>
</li>
<li><strong>Primary Keys and Partitioning:</strong>
<ul class="wp-block-list">
<li><strong>MySQL:</strong> Primary key is for row identification and index lookup.</li>
<li><strong>ClickHouse:</strong> The <strong>Primary Key (<code>ORDER BY</code>)</strong> determines how data is physically sorted on disk, crucial for range queries and skip-index performance. The <strong>Partition Key (<code>PARTITION BY</code>)</strong> should align with common data retention or query filtering (e.g., by month or day).</li>
</ul>
</li>
<li><strong>Data Types:</strong>
<ul class="wp-block-list">
<li><strong>MySQL:</strong> Generic types (e.g., <code>DATETIME</code>, <code>VARCHAR</code>).</li>
<li><strong>ClickHouse:</strong> Utilize specialized, compact types (e.g., <code>Date</code>, <code>DateTime64</code>, <code>LowCardinality</code> for strings, <code>Decimal</code> for precise monetary values, <code>Array</code> for nested data) to maximize compression and query speed. For example, replacing a high-cardinality <code>VARCHAR</code> with <code>String</code> is common, but replacing low-cardinality strings with <code>LowCardinality</code>(<code>String</code>) is a massive performance win.</li>
</ul>
</li>
</ul>
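<p>To make these conversions concrete, here is a hedged sketch of how a row-oriented MySQL table might be remodelled for ClickHouse. All table and column names are invented for illustration; the points to note are the <code>MergeTree</code> engine, the compact column types, and the <code>ORDER BY</code> / <code>PARTITION BY</code> choices.</p>
<pre class="wp-block-code"><code>-- Hypothetical remodelling of a MySQL fact table for ClickHouse.
CREATE TABLE analytics.page_views
(
    event_time   DateTime64(3),                      -- was DATETIME(3) in MySQL
    event_date   Date DEFAULT toDate(event_time),
    user_id      UInt64,
    country      LowCardinality(String),             -- low-cardinality VARCHAR
    device_type  LowCardinality(String),
    url          String,                             -- high-cardinality VARCHAR
    duration_ms  UInt32,
    revenue      Decimal(18, 2)                      -- precise monetary value
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)          -- aligns with time-based retention and filtering
ORDER BY (event_date, country, user_id);   -- physical sort order drives range scans</code></pre>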
<h3 class="wp-block-heading" id="h-actionable-clickhouse-schema-modeling-steps"><strong>Actionable ClickHouse schema modeling</strong> <strong>steps</strong><a class="anchor-link" id="actionable-clickhouse-schema-modeling-steps"></a></h3>
<ul class="wp-block-list">
<li><strong>Identify Analytical Queries:</strong> Use the inventory from the assessment phase to determine which MySQL queries are most critical and resource-intensive.</li>
<li><strong>Redesign for Columnar:</strong> For each key analytical workload, design a <em>denormalized</em> target schema in ClickHouse that includes all necessary dimension fields, avoiding heavy cross-table joins.</li>
<li><strong>Define <code>ORDER BY</code> and <code>PARTITION BY</code>:</strong> Select keys that will drastically reduce the amount of data ClickHouse needs to scan (e.g., order by <code>timestamp</code>, <code>partition by year_month</code>).</li>
<li><strong>Prototype and Test:</strong> Create the new ClickHouse table structure and load a representative data sample. Run the target queries and benchmark performance against the legacy MySQL system before full migration.</li>
</ul>
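<p>For the prototype-and-test step, a representative benchmark query against the sample table sketched above could look like the following; it is illustrative only and should be replaced with the actual dashboard or reporting queries identified during the assessment.</p>
<pre class="wp-block-code"><code>-- Example benchmark query (names are illustrative); time it on both systems.
SELECT
    toStartOfDay(event_time) AS day,
    country,
    uniqExact(user_id)       AS unique_users,
    sum(revenue)             AS total_revenue
FROM analytics.page_views
WHERE event_date &gt;= today() - 30
GROUP BY day, country
ORDER BY day, total_revenue DESC;</code></pre>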
<h2 class="wp-block-heading" id="h-testing-performance-benchmarks-and-query-validation"><strong>Testing, performance benchmarks, and query validation</strong><a class="anchor-link" id="testing-performance-benchmarks-and-query-validation"></a></h2>
<p>Comprehensive testing and validation are essential for a successful ClickHouse migration, ensuring functional correctness, data integrity, and, most importantly, the expected performance gains.</p>
<h3 class="wp-block-heading" id="h-key-validation-strategy-points"><strong>Key validation strategy points</strong><a class="anchor-link" id="key-validation-strategy-points"></a></h3>
<ul class="wp-block-list">
<li><strong>Functional testing and data integrity verification:</strong>
<ul class="wp-block-list">
<li><strong>Row count and schema comparison:</strong> First, compare ClickHouse and MySQL row counts and concurrently verify table schemas (data types, columns, indexing) to quickly check for major data loss or incorrect transformation.</li>
<li><strong>Key metric verification:</strong> Validate core business metrics (e.g., revenue, DAU, inventory) by comparing complex aggregation query results between MySQL and ClickHouse. Discrepancies signal data migration or query translation issues.</li>
<li><strong>Comprehensive data spot checks: </strong>For granular verification, randomly sample data, focusing on edge cases. Verify nulls, timestamp accuracy, primary key uniqueness (if relevant), and special characters to ensure data fidelity and encoding.</li>
</ul>
</li>
<li><strong>Performance benchmarking:</strong>
<ul class="wp-block-list">
<li><strong>Query translation and optimization:</strong> MySQL queries must be translated to the ClickHouse SQL dialect, utilizing its column-oriented architecture and specialized functions for optimization. Each translated query requires a performance review.</li>
<li><strong>A/B performance testing:</strong> To validate, a suite of representative queries, from simple lookups to complex joins and aggregations, will run on both MySQL and the new ClickHouse cluster. The objective is to rigorously measure the performance difference.
<ul class="wp-block-list">
<li><strong>Metrics to measure:</strong> Focus on query latency (execution time at the p50, p95, and p99 percentiles), throughput (queries per second), and resource utilization (CPU, memory, disk I/O).</li>
<li><strong>Expected results:</strong> ClickHouse migration is only successful if its analytical query execution times are significantly (often orders of magnitude) faster than the MySQL baseline.</li>
</ul>
</li>
<li><strong>Load and concurrency testing:</strong> ClickHouse testing must verify cluster stability, resource management, and sustained low latency under simulated production loads, concurrent users, and complex simultaneous queries; isolated query performance is insufficient.</li>
</ul>
</li>
<li><strong>End-to-end application validation:</strong></li>
</ul>
<p>The final step is <strong>Application Integration</strong>, connecting ClickHouse with downstream applications, reporting tools, and dashboards formerly using MySQL. This requires end-to-end testing to verify application logic, followed by <strong>User Acceptance Testing (UAT)</strong> with stakeholders to confirm the new system meets requirements and provides consistent, accurate data for all reporting.</p>
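<p>As a sketch of the row-count and key-metric checks above, the queries below compare ClickHouse against the MySQL source from within ClickHouse, again via the <code>mysql()</code> table function; connection details and table names are placeholders.</p>
<pre class="wp-block-code"><code>-- Row-count parity check between the MySQL source and the ClickHouse copy.
SELECT
    (SELECT count() FROM mysql('mysql-host:3306', 'shop', 'orders', 'repl_user', 'secret')) AS mysql_rows,
    (SELECT count() FROM analytics.orders) AS clickhouse_rows;

-- Key business-metric check: daily revenue aggregated the same way on both systems.
SELECT toDate(created_at) AS day, sum(amount) AS revenue
FROM analytics.orders
GROUP BY day
ORDER BY day DESC
LIMIT 30;</code></pre>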
<h2 class="wp-block-heading" id="h-executing-the-clickhouse-migration"><strong>Executing the ClickHouse migration</strong><a class="anchor-link" id="executing-the-clickhouse-migration"></a></h2>
<p>With the requirements for migrating data from MySQL to ClickHouse now established, this section focuses on the migration process itself. We will utilize the third-party tool <a href="https://github.com/bakwc/mysql_ch_replicator"><code>mysql_ch_replicator</code></a> for this task. The prerequisites below must be met before proceeding; they also apply to alternative approaches, and in particular to this specific tool.</p>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<tbody>
<tr>
<td>Server</td>
<td>Linux / MacOS with python3.10 or higher</td>
</tr>
<tr>
<td>MySQL</td>
<td><code>binlog_format = ROW</code><br><code>binlog_expire_logs_seconds = 86400</code></td>
</tr>
<tr>
<td>Clickhouse</td>
<td><code>&lt;clickhouse&gt;</code><br><code>&nbsp;&nbsp;&lt;profiles&gt;</code><br><code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;default&gt;</code><br><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;final&gt;1&lt;/final&gt;</code><br><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;max_query_size&gt;300000000&lt;/max_query_size&gt;</code><br><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;max_ast_elements&gt;1000000&lt;/max_ast_elements&gt;</code><br><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;max_expanded_ast_elements&gt;1000000&lt;/max_expanded_ast_elements&gt;</code><br><code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;/default&gt;</code><br><code>&nbsp;&nbsp;&lt;/profiles&gt;</code><br><code>&lt;/clickhouse&gt;</code></td>
</tr>
</tbody>
</table>
</figure>
<p>Next, to install <code>mysql_ch_replicator</code>, use the following command:</p>
<pre class="wp-block-code"><code>pip install --upgrade mysql_ch_replicator</code></pre>
<p>After that, we can migrate the data from the legacy MySQL database to ClickHouse.</p>
<ol class="wp-block-list">
<li>Prepare a YAML config file, for example:</li>
</ol>
<pre class="wp-block-code"><code># MySQL and ClickHouse credentials can be overridden using environment variables:
# MySQL: MYSQL_HOST, MYSQL_PORT, MYSQL_USER, MYSQL_PASSWORD, MYSQL_CHARSET
# ClickHouse: CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_USER, CLICKHOUSE_PASSWORD

mysql:
  host: 'localhost'
  port: 3306
  user: 'admin'
  password: ''
  charset: 'utf8mb4'  # Optional: charset for MySQL connection (default: utf8mb4). Use utf8mb4 for full Unicode support including emoji and 4-byte characters

clickhouse:
  host: 'localhost'
  port: 8323
  user: 'default'
  password: ''

binlog_replicator:
  data_dir: '/root/mysql_ch_replicator/binlog/'
  records_per_file: 100000

databases: 'sakila'
tables: '*'</code></pre>
<ol start="2" class="wp-block-list">
<li>To begin replication, you can use <code>nohup</code> to run the process in the background and enable logging. Since the data is being migrated continuously, we will initiate the process using the <code>run_all</code> option.</li>
</ol>
<pre class="wp-block-code"><code>nohup mysql_ch_replicator --config config.yaml run_all</code></pre>
<ol start="3" class="wp-block-list">
<li>Then you can just monitor the replication process.</li>
</ol>
<p><strong>N.B.</strong></p>
<ul class="wp-block-list">
<li>The <code>mysql_ch_replicator</code> tool, a community-developed utility, is currently unable to replicate data directly from a MySQL source to a ClickHouse cluster.</li>
</ul>
<ul class="wp-block-list">
<li>If your migration involves a ClickHouse cluster, you will need to manually create the destination schema using the <code>ON CLUSTER</code> option first, or you could explore alternative methods such as <strong>Ongoing ingestion / CDC</strong> (Change Data Capture).</li>
</ul>
<h2 class="wp-block-heading" id="h-post-clickhouse-migration-operational-go-live-readiness"><strong>Post-ClickHouse migration operational go-live readiness</strong><a class="anchor-link" id="post-clickhouse-migration-operational-go-live-readiness"></a></h2>
<p>Successful ClickHouse deployment needs robust operational readiness, beyond migration, to handle production traffic reliably. This foundation requires a solid backup strategy, effective resource governance, comprehensive observability, a prepared team, and defined rollback paths. A strong operational footing minimizes cut-over risks and builds confidence in ClickHouse as a critical production system.</p>
<h3 class="wp-block-heading" id="h-monitoring-and-observability"><strong>Monitoring and observability</strong><a class="anchor-link" id="monitoring-and-observability"></a></h3>
<p>Comprehensive observability is essential for a properly functioning ClickHouse cluster, providing deep insights across three critical dimensions: <strong>query, storage, and cluster levels</strong>. This holistic view is necessary for diagnosing bottlenecks, capacity planning, and ensuring high availability.</p>
<ol class="wp-block-list">
<li><strong>Query level</strong></li>
</ol>
<p>Focuses on workload for performance tuning and capacity planning:</p>
<ul class="wp-block-list">
<li><strong>Query performance:</strong> Track execution time, rows processed, and memory usage to identify heavy or inefficient queries.</li>
<li><strong>Query load/concurrency:</strong> Monitor active/pending queries and concurrency (QPS) to signal scaling needs.</li>
<li><strong>Failures/errors</strong>: Log and alert on failed queries for quick issue identification (e.g., malformed SQL, resource exhaustion).</li>
<li><strong>User/source tracking:</strong> Identify users/applications submitting demanding queries for context on resource allocation.</li>
</ul>
<ol start="2" class="wp-block-list">
<li><strong>Storage level</strong></li>
</ol>
<p>Ensures data integrity and optimal read/write performance:</p>
<ul class="wp-block-list">
<li><strong>Disk I/O:</strong> Monitor latency, throughput, and I/O wait times across all volumes.</li>
<li><strong>Disk space:</strong> Track utilization and growth rate for data and temporary directories; alert proactively on low space.</li>
<li><strong>Merge operations:</strong> Observe the rate and efficiency of background data consolidation to prevent query degradation.</li>
<li><strong>Replication/consistency:</strong> Monitor replica lag to ensure synchronized data across nodes for durability and load balancing.</li>
</ul>
<ol start="3" class="wp-block-list">
<li><strong>Cluster level</strong></li>
</ol>
<p>Covers overall health and resource management of the distributed system:</p>
<ul class="wp-block-list">
<li><strong>System resources:</strong> Track CPU utilization, memory (RAM/swap), and network traffic for all nodes to guide scaling or load redistribution.</li>
<li><strong>ClickHouse status:</strong> Monitor server health, uptime, thread count, and internal metrics (memory pools, cache ratios).</li>
<li><strong>Distributed table health:</strong> Ensure inter-node communication is healthy and all sharded components are available.</li>
<li><strong>Configuration/version:</strong> Track running config and software versions for consistency and simplified debugging.</li>
</ul>
<p>Robust monitoring across these three dimensions transforms the approach from reactive checks to proactive performance and stability assurance for ClickHouse.</p>
<p><strong>Key areas to monitor:</strong></p>
<ul class="wp-block-list">
<li>Query latency (p95 / p99)</li>
<li>Query concurrency and queue depth</li>
<li>CPU usage per shard and replica</li>
<li>Disk IO wait and merge activity</li>
<li>Replication lag and ZooKeeper / Keeper health</li>
<li>Disk free space per volume and partition</li>
</ul>
<p><strong>Key metrics sources:</strong></p>
<ul class="wp-block-list">
<li><code>system.metrics, system.asynchronous_metrics</code></li>
<li><code>system.query_log, system.part_log</code></li>
<li><code>system.replication_queue</code></li>
<li>Keeper metrics (Raft state, leader election)</li>
</ul>
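<p>As a hedged example of the query-level metrics above, the following pulls p95/p99 latency for the last hour from <code>system.query_log</code> (enabled by default in most installations):</p>
<pre class="wp-block-code"><code>-- Query latency percentiles over the last hour, from ClickHouse's own query log.
SELECT
    quantile(0.95)(query_duration_ms) AS p95_ms,
    quantile(0.99)(query_duration_ms) AS p99_ms,
    count()                           AS queries
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time &gt;= now() - INTERVAL 1 HOUR;</code></pre>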
<h3 class="wp-block-heading" id="h-backup-amp-recovery-readiness"><strong>Backup &amp; recovery readiness</strong><a class="anchor-link" id="backup-recovery-readiness"></a></h3>
<p>Setting up a backup solution is not enough; rigorous testing is crucial. Regularly and thoroughly testing backups builds confidence that data can be successfully restored after a disaster, ensuring business continuity and data integrity.</p>
<p><strong>Backup considerations:</strong></p>
<ul class="wp-block-list">
<li>Full backups of MergeTree data (filesystem or object storage)</li>
<li>Metadata backups (DDL, users, quotas, settings)</li>
<li>Keeper metadata consistency</li>
<li>Backup verification and restore drills</li>
</ul>
<p><strong>Operational checklist:</strong></p>
<ul class="wp-block-list">
<li>Can you restore a single table?</li>
<li>Can you restore a shard independently?</li>
<li>Can you recover from a lost replica?</li>
<li>Is backup performance acceptable under load?</li>
</ul>
<p>Regularly integrating and monitoring backup procedures is essential to promptly detect and address failures, ensuring the integrity and recoverability of your ClickHouse data.</p>
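<p>As a minimal sketch of the &ldquo;restore a single table&rdquo; drill, ClickHouse&rsquo;s native <code>BACKUP</code>/<code>RESTORE</code> commands (available in recent releases) can be exercised against a pre-configured backup disk; the disk name, file name, and tables below are assumptions.</p>
<pre class="wp-block-code"><code>-- Back up one table to a backup disk named 'backups'
-- (the disk must be declared in the server configuration beforehand).
BACKUP TABLE analytics.orders TO Disk('backups', 'orders_2026_02_27.zip');

-- Restore it under a different name to verify the backup without touching production
-- (assumes a scratch database restore_check already exists).
RESTORE TABLE analytics.orders AS restore_check.orders
    FROM Disk('backups', 'orders_2026_02_27.zip');</code></pre>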
<h3 class="wp-block-heading" id="h-resource-management-amp-capacity-planning"><strong>Resource management &amp; capacity planning</strong><a class="anchor-link" id="resource-management-capacity-planning"></a></h3>
<p>ClickHouse is excellent for analytical tasks because it is efficient and fast. Nevertheless, its performance ceiling is determined by the physical hardware such as the CPU, memory, and storage I/O speed. Consequently, optimizing and scaling the infrastructure is essential to fully exploit ClickHouse&rsquo;s capabilities.</p>
<p><strong>Key areas to validate before go-live:</strong></p>
<ul class="wp-block-list">
<li>CPU headroom during peak query windows</li>
<li>Disk write amplification from merges</li>
<li>Memory usage under concurrent workloads</li>
<li>Network throughput for distributed queries</li>
</ul>
<p><strong>Controls to validate:</strong></p>
<ul class="wp-block-list">
<li>Query limits (<code>max_memory_usage, max_threads</code>)</li>
<li>User and workload isolation</li>
<li>Merge throttling and background pool sizing</li>
<li>Shard balance and data distribution</li>
</ul>
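<p>The query limits and workload isolation controls above can be expressed as a settings profile attached to a user; a hedged sketch with illustrative names and limits:</p>
<pre class="wp-block-code"><code>-- Cap memory, parallelism, and runtime for ad-hoc analyst queries (values are illustrative).
CREATE SETTINGS PROFILE IF NOT EXISTS analyst_limits
    SETTINGS max_memory_usage = 10000000000,   -- ~10 GB per query
             max_threads = 8,
             max_execution_time = 300;         -- seconds

-- Attach the profile so the limits apply to every query from this user.
CREATE USER IF NOT EXISTS analyst IDENTIFIED WITH sha256_password BY 'change-me'
    SETTINGS PROFILE 'analyst_limits';</code></pre>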
<p>When performing capacity planning, it is crucial to anticipate and account for future <strong>growth</strong>. Simply basing the plan on the existing or current load will likely lead to resource shortages down the line.</p>
<h3 class="wp-block-heading" id="h-staffing-training-and-operational-ownership"><strong>Staffing, training and operational ownership</strong><a class="anchor-link" id="staffing-training-and-operational-ownership"></a></h3>
<p>Operational readiness requires more than stable systems; it critically involves the managing personnel. A comprehensive approach must incorporate team training, process documentation, and clear communication to ensure staff are skilled, informed, and prepared for any situation.</p>
<p>Ensure that:</p>
<ul class="wp-block-list">
<li>On-call engineers understand ClickHouse internals</li>
<li>Runbooks exist for common incidents (replication lag, disk pressure, slow queries)</li>
<li>Ownership is clear between DB, infra, and application teams</li>
<li>Escalation paths are defined</li>
</ul>
<p>Training should focus on:</p>
<ul class="wp-block-list">
<li>Reading system tables</li>
<li>Understanding MergeTree behavior</li>
<li>Debugging distributed queries</li>
<li>Safe operational actions (detaches, restarts, resyncs)</li>
</ul>
<h3 class="wp-block-heading" id="h-failback-rollback-plan"><strong>Failback / rollback plan</strong><a class="anchor-link" id="failback-rollback-plan"></a></h3>
<p>A comprehensive, well-prepared rollback plan is mandatory before any system or feature launch. This vital pre-go-live step ensures preparedness, minimizes downtime and data loss, and serves as crucial due diligence for risk mitigation.</p>
<p><strong>Typical rollback models:</strong></p>
<ul class="wp-block-list">
<li>Dual-write period (MySQL + ClickHouse) if a dual-write strategy is chosen during data migration.</li>
<li>Read-only fallback to legacy system</li>
<li>Snapshot-based recovery</li>
<li>Feature-flag based traffic routing&nbsp;</li>
<li>Batch ingestion using <code>clickhouse-client</code>, <code>clickhouse-local</code>, or object storage (CSV/Parquet)</li>
</ul>
<p><strong>Critical questions:</strong></p>
<ul class="wp-block-list">
<li>How long can rollback remain viable?</li>
<li>What data is lost if rollback occurs?</li>
<li>How do you reconcile post-rollback data?</li>
</ul>
<p>Rollback plans must be constrained by a set timeframe for their completion. It is also crucial that these plans are practiced regularly to ensure effectiveness.</p>
<h3 class="wp-block-heading" id="h-hybrid-deployment-readiness"><strong>Hybrid deployment readiness</strong><a class="anchor-link" id="hybrid-deployment-readiness"></a></h3>
<p>Deployments that span hybrid or multiple sites necessitate more rigorous verification steps. These additional checks are crucial to ensure successful operation across the distributed environment.</p>
<p><strong>Connectivity readiness:</strong></p>
<ul class="wp-block-list">
<li>Latency and packet loss between sites</li>
<li>Stable routing (VPN / VPC peering)</li>
<li>MTU alignment to avoid fragmentation</li>
</ul>
<p><strong>Storage tiering considerations:</strong></p>
<ul class="wp-block-list">
<li>Hot vs cold data placement</li>
<li>Object storage latency impact on queries</li>
<li>Local SSD usage for active partitions</li>
<li>Clear TTL and data movement policies</li>
</ul>
<p>Hybrid deployments necessitate more robust observability capabilities. Furthermore, they require enhanced failure isolation mechanisms to ensure system stability.</p>
<h2 class="wp-block-heading" id="h-post-migration-clickhouse-operations"><strong>Post-Migration ClickHouse operations</strong><a class="anchor-link" id="post-migration-clickhouse-operations"></a></h2>
<p>Once ClickHouse is live in production, operational focus shifts from migration execution to long-term stability, performance, and cost efficiency. Post-migration operations ensure that data remains well-organized, queries continue to perform as expected under evolving workloads, and operational risks are detected early. These activities are continuous and should be treated as part of standard database operations rather than one-time tasks.</p>
<h3 class="wp-block-heading" id="h-maintenance-compaction-pruning-data-retention"><strong>Maintenance: Compaction, pruning, data retention</strong><a class="anchor-link" id="maintenance-compaction-pruning-data-retention"></a></h3>
<p>ClickHouse relies on background processes to maintain data layout and performance.</p>
<p><strong>Key maintenance areas:</strong></p>
<ul class="wp-block-list">
<li><strong>MergeTree compaction (merges):</strong>
<ul class="wp-block-list">
<li>Monitor merge queue length and merge throughput</li>
<li>Avoid sustained merge backlogs that increase query latency</li>
</ul>
</li>
<li><strong>Partition pruning:</strong>
<ul class="wp-block-list">
<li>Ensure partitioning keys align with query patterns (e.g. time-based)</li>
<li>Regularly review unused or oversized partitions</li>
</ul>
</li>
<li><strong>Data retention &amp; TTLs:</strong>
<ul class="wp-block-list">
<li>Apply TTL policies for automatic data deletion or movement to colder storage</li>
<li>Validate TTL execution does not overload background threads</li>
</ul>
</li>
</ul>
<p>The goal is to keep data sets lean, query-friendly, and aligned with business retention requirements without manual intervention.</p>
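<p>For the retention and TTL points above, here is a hedged sketch of adding time-based movement and deletion to an existing table; the table name and the <code>cold</code> volume are assumptions that depend on your storage policy.</p>
<pre class="wp-block-code"><code>-- Move parts older than 90 days to a slower volume and delete them after 13 months.
-- Requires a storage policy that defines a 'cold' volume; names are illustrative.
ALTER TABLE analytics.page_views
    MODIFY TTL event_date + INTERVAL 90 DAY TO VOLUME 'cold',
               event_date + INTERVAL 13 MONTH DELETE;</code></pre>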
<h3 class="wp-block-heading" id="h-performance-tuning-and-query-optimisation"><strong>Performance tuning and query optimisation</strong><a class="anchor-link" id="performance-tuning-and-query-optimisation"></a></h3>
<p>Post-migration workloads often differ from initial benchmarks.</p>
<p><strong>Ongoing tuning areas:</strong></p>
<ul class="wp-block-list">
<li>Review slow queries using <code>system.query_log</code></li>
<li>Validate use of primary keys, skipping indexes, and projections</li>
<li>Identify inefficient distributed queries (excessive shard fan-out)</li>
<li>Adjust:
<ul class="wp-block-list">
<li><code>max_threads</code>, <code>max_memory_usage</code></li>
<li>Join algorithms and aggregation strategies</li>
</ul>
</li>
<li>Periodically re-benchmark after schema or workload changes</li>
</ul>
<p>Performance tuning in ClickHouse is iterative and driven by real usage patterns, not static configuration.</p>
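<p>A hedged starting point for the slow-query review mentioned above is to surface the heaviest recent queries directly from <code>system.query_log</code>:</p>
<pre class="wp-block-code"><code>-- Top 10 heaviest queries of the last 24 hours by duration.
SELECT
    query_duration_ms,
    read_rows,
    formatReadableSize(memory_usage) AS peak_memory,
    substring(query, 1, 120)         AS query_snippet
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time &gt;= now() - INTERVAL 1 DAY
ORDER BY query_duration_ms DESC
LIMIT 10;</code></pre>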
<h3 class="wp-block-heading" id="h-monitoring-and-cost-management-especially-cloud"><strong>Monitoring and cost management (especially cloud)</strong><a class="anchor-link" id="monitoring-and-cost-management-especially-cloud"></a></h3>
<p>Increased dataset sizes and query frequency necessitate a corresponding increase in system resources, directly impacting operational costs. Therefore, assessing the current system capacity and performance is essential for planning necessary upgrades and managing the associated budgetary requirements.</p>
<p><strong>Monitoring focus areas:</strong></p>
<ul class="wp-block-list">
<li>CPU and memory utilization per node</li>
<li>Disk usage growth rate and merge IO</li>
<li>Query concurrency and peak load windows</li>
<li>Replication lag and background task saturation</li>
</ul>
<p><strong>Cost management considerations (cloud or hybrid):</strong></p>
<ul class="wp-block-list">
<li>Detect over-provisioned shards or replicas</li>
<li>Use TTLs and tiered storage to reduce hot data footprint</li>
<li>Monitor egress and cross-region query costs</li>
<li>Align scaling decisions with actual query demand</li>
</ul>
<p>Keeping an eye on your operations is really important if you want to stop costs from slowly getting out of hand over time.</p>
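<p>To support the disk-growth and over-provisioning checks above, here is a sketch of a per-table storage report built from <code>system.parts</code>; the compression ratio helps spot tables that would benefit from better codecs or TTL policies.</p>
<pre class="wp-block-code"><code>-- Storage footprint and compression ratio per table (active parts only).
SELECT
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio,
    sum(rows) AS total_rows
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC
LIMIT 20;</code></pre>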
<h2 class="wp-block-heading" id="h-integration-with-existing-db-ops-stack-after-clickhouse-migration"><strong>Integration with existing DB ops stack after ClickHouse</strong> <strong>migration</strong><a class="anchor-link" id="integration-with-existing-db-ops-stack-after-clickhouse-migration"></a></h2>
<p>After migration, ClickHouse should not operate in isolation. To maintain operational consistency and reduce cognitive overhead for support and operations teams, ClickHouse must be integrated into the existing database operations stack. This ensures that monitoring, alerting, incident response, and capacity planning continue to follow familiar workflows while extending visibility to the new analytics platform.</p>
<p>First, we need to identify existing operational touchpoints. Start by mapping how databases are currently operated. The goal is to integrate ClickHouse into these existing paths rather than creating parallel processes. Typical touchpoints we need to be aware of include:</p>
<ul class="wp-block-list">
<li>Monitoring and alerting systems</li>
<li>Log aggregation and query auditing</li>
<li>Backup and recovery tooling</li>
<li>Configuration management</li>
<li>Incident response and on-call workflows</li>
</ul>
<p>Once we know the touchpoints, you can start integrating ClickHouse metrics and logs. ClickHouse exposes rich internal telemetry via system tables, so you can feed it into the same observability platforms and dashboards you already use:</p>
<ul class="wp-block-list">
<li>Export metrics from <code>system.metrics</code> and <code>system.asynchronous_metrics</code></li>
<li>Capture query behavior from <code>system.query_log</code></li>
<li>Track replication health via <code>system.replication_queue</code></li>
<li>Monitor disk and merge activity from <code>system.parts</code> and <code>system.part_log</code></li>
</ul>
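<p>As a sketch of turning that telemetry into alerts, the following query flags replicas whose delay exceeds an illustrative 60-second threshold (<code>system.replicas</code> only has rows for Replicated* tables):</p>
<pre class="wp-block-code"><code>-- Replicas lagging by more than 60 seconds (threshold is illustrative).
SELECT database, table, replica_name, absolute_delay, queue_size
FROM system.replicas
WHERE absolute_delay &gt; 60
ORDER BY absolute_delay DESC;</code></pre>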
<p>Apart from that, we also need to standardize the alerting system. ClickHouse alerts should align with existing alert semantics to avoid ClickHouse-only alert silos, while still covering ClickHouse&rsquo;s own critical metrics. Ensure that all alerts are actionable, mapped to clear runbooks, and routed through existing escalation paths.</p>
<p><strong>Examples of critical ClickHouse metrics:</strong></p>
<ul class="wp-block-list">
<li>Replication lag exceeds SLA</li>
<li>Query latency p95 above baseline</li>
<li>Merge backlog growing unexpectedly.</li>
</ul>
<p>You also need to unify backup and recovery operations. Backup procedures must follow consistent patterns across databases; this consistency matters because it reduces operational risk during incidents.</p>
<p><strong>Integration checklist:</strong></p>
<ul class="wp-block-list">
<li>Schedule ClickHouse backups alongside other DB backups</li>
<li>Centralize backup status reporting</li>
<li>Perform restore tests using the same operational playbooks</li>
<li>Document ClickHouse-specific restore nuances (replicas, shards, Keeper)</li>
</ul>
<p>A monitoring dashboard for ClickHouse must be enabled and configured to leverage the exposed metrics. Ideally, you should integrate ClickHouse into your existing monitoring system to ensure it is viewed as part of the wider data platform, not an isolated analytics component. Implementing cross-database dashboards will facilitate the identification of systemic issues, enabling better resource planning and expediting root-cause analysis.</p>
<p><strong>Useful dashboards include:</strong></p>
<ul class="wp-block-list">
<li>Ingestion rate vs query rate across systems</li>
<li>Disk growth trends per platform</li>
<li>Peak usage windows and contention</li>
<li>Cost and resource efficiency comparisons</li>
</ul>
<p>The integration process should be concluded by updating all relevant operational documentation. This includes revising and disseminating the necessary runbooks to reflect the final configuration.</p>
<ul class="wp-block-list">
<li>Define ownership for ClickHouse incidents</li>
<li>Add ClickHouse scenarios to on-call runbooks</li>
<li>Train teams on common failure patterns</li>
<li>Ensure consistent terminology across databases</li>
</ul>
<p>Establishing a comprehensive checklist and a clear timeline is a crucial final step after the migration process to ensure smooth, anticipated operations. Post-data migration to ClickHouse, a brief stabilization period is necessary before officially declaring the migration complete. This stabilization phase is vital for validating data accuracy, optimizing performance, and ensuring the support team is fully prepared operationally. The following week-by-week table provides a recommended operational timeline.</p>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<tbody>
<tr>
<td>Phase</td>
<td>Focus area</td>
<td>Key activities</td>
<td>Exit criteria</td>
</tr>
<tr>
<td>1&nbsp;</td>
<td>Data Validation &amp; Baseline</td>
<td><strong>Validate</strong> row counts and aggregates<br><strong>Check</strong> replication health and merge backlog<br><strong>Verify</strong> partitions, TTLs, and monitoring dashboards<br><strong>Establish</strong> baseline query latency</td>
<td>Data is consistent and cluster is stable</td>
</tr>
<tr>
<td>2</td>
<td>Performance Tuning</td>
<td><strong>Review</strong> slow queries (<code>system.query_log</code>)<br><strong>Tune</strong> memory, threads, and concurrency limits<br><strong>Validate</strong> shard balance and peak load behavior</td>
<td>Queries meet performance expectations</td>
</tr>
<tr>
<td>3</td>
<td>Resilience &amp; Recovery</td>
<td><strong>Test</strong> replica and shard failure scenarios<br><strong>Validate</strong> Keeper quorum behavior<br><strong>Perform</strong> backup and restore tests<br><strong>Verify</strong> alerts and runbooks</td>
<td>Failures handled predictably</td>
</tr>
<tr>
<td>4</td>
<td>Operational Handover</td>
<td><strong>Review</strong> resource usage and storage growth<br><strong>Tune</strong> retention and tiered storage policies<br><strong>Finalize</strong> documentation and ownership<br><strong>Sign-off</strong> from support teams</td>
<td>ClickHouse declared production-ready</td>
</tr>
</tbody>
</table>
</figure>
<p>Each phase is planned to take approximately one week. This checklist is crucial for ensuring the migration results in a stable and well-operated ClickHouse environment, effectively mitigating the risks associated with a &lsquo;big bang&rsquo; go-live approach.</p>
<h2 class="wp-block-heading" id="h-conclusion"><strong>Conclusion</strong><a class="anchor-link" id="conclusion"></a></h2>
<p>Migration to ClickHouse is a strategic move, driven by the inability of existing OLTP or general analytics systems (like MySQL) to handle massive data volumes (billions/trillions of rows) and high-concurrency analytical queries efficiently. The core goals are achieving significantly faster, sub-second analytics at scale, reducing infrastructure costs through better compression, and simplifying the architecture with a specialized OLAP engine. However, this transition requires a structured approach to manage risks concerning data integrity, downtime, and team training. Successful migration requires a pre-migration assessment (inventory, data size, query patterns) and careful planning and execution. </p>
<p>Essential steps include implementing a robust data pipeline for historical backfill and ongoing CDC, and critically redesigning the schema for ClickHouse&rsquo;s columnar structure. Schema redesign involves denormalization, selecting MergeTree engines, and defining optimal <code>ORDER BY</code> and <code>PARTITION BY</code> keys for performance. The execution phase requires rigorous testing and validation, including data consistency checks, A/B performance benchmarking, and load testing before final cut-over. Lastly, don&rsquo;t forget that operational go-live readiness and subsequent post-migration operations are critical for sustained success. </p>
<p>Interested in streamlined ClickHouse operations? We will be adding ClickHouse support to ClusterControl soon.&nbsp;In the meantime, <a href="https://severalnines.com/clustercontrol/databases/clickhouse">visit our ClickHouse page</a>&nbsp;to see what we&rsquo;ll support in its initial release and help influence its roadmap by submitting info about your use case.</p>
<p>The post <a href="https://severalnines.com/blog/operational-guide-to-migrating-to-clickhouse/">Operational guide to migrating to ClickHouse</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/operational-guide-to-migrating-to-clickhouse/">Operational guide to migrating to ClickHouse</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Introducing MariaDB Enterprise Cluster on MariaDB Cloud</title>
      <link>https://mariadb.com/resources/blog/introducing-mariadb-enterprise-cluster-on-mariadb-cloud/</link>
      <pubDate>Thu, 26 Feb 2026 20:27:28 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>The modern enterprise doesn’t just run on data, it survives on the absolute certainty of that data. As applications become […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/introducing-mariadb-enterprise-cluster-on-mariadb-cloud/">Introducing MariaDB Enterprise Cluster on MariaDB Cloud</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The modern enterprise doesn&rsquo;t just run on data, it survives on the absolute certainty of that data. As applications become more global, more distributed and increasingly AI-driven, the expectations placed on cloud databases (DBaaS) have fundamentally changed. Enterprises need more than backups and replicas. They need continuous availability, consistent performance and intelligent traffic&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/introducing-mariadb-enterprise-cluster-on-mariadb-cloud/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/introducing-mariadb-enterprise-cluster-on-mariadb-cloud/">Introducing MariaDB Enterprise Cluster on MariaDB Cloud</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>More than Flushing (also Caching) for innodb_flush_method, and Missing Release Candidate</title>
      <link>https://jfg-mysql.blogspot.com/2026/02/more-than-flushing-also-caching-for-innodb-flush-method-and-missing-release-candidate.html</link>
      <pubDate>Wed, 25 Feb 2026 20:57:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://jfg-mysql.blogspot.com/">J-F Gagné's MySQL Blog</source>
      <description><![CDATA[<p>Something changed in MySQL 8.4 related to caching, and it is easy to miss, so it deserves a post.&#160; And a subject adjacent to this is the missing Release Candidate for MySQL 8.4 LTS, with my hope that the next LTS will have a Release Candidate, so I also cover this topic below.</p>
<p>(if you are not interested in Caching and Flushing, you can jump directly to the section about Release Candidate)</p>
<p>The post <a rel="nofollow" href="https://jfg-mysql.blogspot.com/2026/02/more-than-flushing-also-caching-for-innodb-flush-method-and-missing-release-candidate.html">More than Flushing (also Caching) for innodb_flush_method, and Missing Release Candidate</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Something changed in MySQL 8.4 related to caching, and it is easy to miss, so it deserves a post.&nbsp; And a subject adjacent to this is the missing Release Candidate for MySQL 8.4 LTS, with my hope that the next LTS will have a Release Candidate, so I also cover this topic below.</p>
<p>(if you are not interested in Caching and Flushing, you can jump directly to the section about Release Candidate)</p>

<p>The post <a rel="nofollow" href="https://jfg-mysql.blogspot.com/2026/02/more-than-flushing-also-caching-for-innodb-flush-method-and-missing-release-candidate.html">More than Flushing (also Caching) for innodb_flush_method, and Missing Release Candidate</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The Rising Tide of Community Contributions to MariaDB Server</title>
      <link>https://mariadb.org/the-rising-tide-of-community-contributions-to-mariadb-server/</link>
      <pubDate>Wed, 25 Feb 2026 13:45:33 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>I wanted to share with you all some statistics on the incoming community contributions for the MariaDB server in the last year or so. And some of my thoughts looking at the data. …<br />
Continue reading \"The Rising Tide of Community Contributions to MariaDB Server\"<br />
The post The Rising Tide of Community Contributions to MariaDB Server appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-rising-tide-of-community-contributions-to-mariadb-server/">The Rising Tide of Community Contributions to MariaDB Server</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I wanted to share with you all some statistics on the incoming community contributions for the MariaDB server in the last year or so. And some of my thoughts looking at the data. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/the-rising-tide-of-community-contributions-to-mariadb-server/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;The Rising Tide of Community Contributions to MariaDB Server&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-rising-tide-of-community-contributions-to-mariadb-server/">The Rising Tide of Community Contributions to MariaDB Server</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/the-rising-tide-of-community-contributions-to-mariadb-server/">The Rising Tide of Community Contributions to MariaDB Server</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Meet Percona at KubeCon + CloudNativeCon Europe 2026</title>
      <link>https://percona.community/blog/2026/02/25/meet-percona-at-kubecon--cloudnativecon-europe-2026/</link>
      <pubDate>Wed, 25 Feb 2026 12:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>The Percona team is heading to KubeCon + CloudNativeCon Europe in Amsterdam, and we’d love to meet you in person!</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/25/meet-percona-at-kubecon--cloudnativecon-europe-2026/">Meet Percona at KubeCon + CloudNativeCon Europe 2026</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The Percona team is heading to KubeCon + CloudNativeCon Europe in Amsterdam, and we&rsquo;d love to meet you in person!</p>
<p>You can find us at <strong>Booth 790</strong>. This is a great chance to talk with engineers working on Percona Operators.</p>
<p>We will be there to discuss:</p>
<ul>
<li>Running MySQL, PostgreSQL, and MongoDB on Kubernetes</li>
<li>Production-ready HA setups</li>
<li>Backup and PITR strategies</li>
<li>Multi-cluster and multi-region deployments</li>
<li>Operators roadmap and upcoming features</li>
<li>Real-world troubleshooting stories</li>
</ul>
<p>If you&rsquo;re running Percona Operators in production (or just getting started), we&rsquo;d love to hear your feedback and learn about your challenges.</p>
<p>If you&rsquo;re just curious (or even suspicious) about running databases on Kubernetes, we&rsquo;d love to talk and answer your questions.</p>
<h3 id="admission-tickets---20-off-for-our-community">Admission Tickets &ndash; 20% Off for Our Community<a class="anchor-link" id="admission-tickets-20-off-for-our-community"></a></h3>
<p>We have a 20% discount code available for Percona community members.<br>
If you&rsquo;re planning to attend and don&rsquo;t have a ticket yet, drop a comment or message us and we&rsquo;ll share the details.</p>
<h3 id="schedule-a-meeting">Schedule a Meeting<a class="anchor-link" id="schedule-a-meeting"></a></h3>
<p>Want dedicated time with our engineers?<br>
Drop a comment here or reach out directly. We&rsquo;re happy to schedule a meeting during the event.</p>
<p>See you in Amsterdam!</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/25/meet-percona-at-kubecon--cloudnativecon-europe-2026/">Meet Percona at KubeCon + CloudNativeCon Europe 2026</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>PostgreSQL coffee break: version upgrade related reindexing &#8211; reasons</title>
      <link>https://percona.community/blog/2026/02/25/postgresql-coffee-break-version-upgrade-related-reindexing-reasons/</link>
      <pubDate>Wed, 25 Feb 2026 11:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>During FOSDEM I had a chance to join a presentation of the backup procedure that the engineers from GitLab followed to decrease the downtime during major upgrades. I highly recommend the talk, definitely worth watching!</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/25/postgresql-coffee-break-version-upgrade-related-reindexing-reasons/">PostgreSQL coffee break: version upgrade related reindexing &#8211; reasons</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>During <a href="https://fosdem.org/2026/" target="_blank" rel="noopener noreferrer">FOSDEM</a> I had a chance to join a <a href="https://fosdem.org/2026/schedule/event/ZF8ZLX-zero-downtime-postgresql-upgrades/" target="_blank" rel="noopener noreferrer">presentation</a> of the backup procedure that the engineers from GitLab followed to decrease the downtime during major upgrades. I highly recommend the talk, definitely worth watching!</p>
<p>I loved the presentation, especially since a similar procedure is what we recommend our users as well. Unfortunately it&rsquo;s not for everyone: it&rsquo;s applicable ONLY when you can stop DDL operations on your database cluster for some time. As you can imagine that&rsquo;s not a case for every deployment.</p>
<p>This limitation got me into some very engaging discussions during my fav part of any conference, the so called &ldquo;hallway track&rdquo;. Networking and discussions after the talks and on the conference corridors are why I like going to such events. I&rsquo;ve heard some stories of nasty surprises coming out of the unexpected re-indexing after an upgrade.</p>
<p>It made me wonder, how many professionals have moved to PostgreSQL from other areas of expertise and are lacking information about the index rebuilds after upgrades may be necessary. How often are DevOps engineers or SREs, neither fluent in PostgreSQL nor experienced with databases, tasked with maintaining database infrastructure?</p>
<p>In the meantime I figured out that a short coffee time read is what I want to aim at. No super deep dives, rather food for thought and inspiring more of those engaging discussions I like so much about the conferences.</p>
<p>So let&rsquo;s get to it. This week I want to go through some basic facts before diving into more challenging topics in the coming weeks.</p>
<h3 id="what-are-collations">What are collations<a class="anchor-link" id="what-are-collations"></a></h3>
<p>In general, a collation defines how values are ordered and compared. In databases, it most commonly applies to text. It&rsquo;s often not a trivial thing to determine how strings are sorted, whether two values are considered equal, and how things like upper vs lower case or special characters are treated. Just like alphabetical order helps us organize words in everyday life, collations define the rules the database uses when comparing and sorting character data. This allows us to sort structures like strings. Numbers are even simpler.</p>
<p>A good visualization of how significant collations are is when you try to imagine determining whether one item is before another if they come from different alphabets. Where do you position country specific characters in such a sorting order?</p>
<p>Lets look at a set of example data:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-0" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-0">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Bob
</span></span><span class="line"><span class="cl">Anna
</span></span><span class="line"><span class="cl">Zo&euml;
</span></span><span class="line"><span class="cl">&Aacute;lvaro</span></span></code></pre>
</div>
</div>
</div>
<p>When using an English-like collation the sorting would look like this as <code>&Aacute;</code> is treated like <code>A</code> . This is visible in Collation 1:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-1" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-1">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Anna
</span></span><span class="line"><span class="cl">&Aacute;lvaro
</span></span><span class="line"><span class="cl">Bob
</span></span><span class="line"><span class="cl">Zo&euml;</span></span></code></pre>
</div>
</div>
</div>
<p>Though with a collation changed so that <code>&Aacute;</code> is sorted separately before <code>A</code> . This is visible in Collation 2:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-2" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-2">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">&Aacute;lvaro
</span></span><span class="line"><span class="cl">Anna
</span></span><span class="line"><span class="cl">Bob
</span></span><span class="line"><span class="cl">Zo&euml;</span></span></code></pre>
</div>
</div>
</div>
<p>It&rsquo;s clearly visible that collation defines how text is sorted and compared.</p>
<h3 id="what-are-indexes">What are indexes<a class="anchor-link" id="what-are-indexes"></a></h3>
<p>If you&rsquo;re not a database professional, you may not be familiar with what an index is. Think of it as of a structure that speeds up the search. In an old school library you had to check the index of authors to find the book you&rsquo;ve been looking for. Similarly in databases when knowing what the data is going to be searched for, we introduce indexes.</p>
<p>A common use of indexes is to enforce uniqueness of values. Primary keys and unique constraints automatically create unique indexes to ensure that no duplicate values are inserted. In addition, users can create their own indexes to support specific query patterns. Indexes can be single or multi column, depending on the particular use cases and to help speed up specific queries.<br>
Let&rsquo;s look at an example of physically unsorted data stored in heap</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-3" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-3">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Users (unsorted data)
</span></span><span class="line"><span class="cl">--------------------
</span></span><span class="line"><span class="cl">[1] Zo&euml;
</span></span><span class="line"><span class="cl">[2] &Aacute;lvaro
</span></span><span class="line"><span class="cl">[3] Anna
</span></span><span class="line"><span class="cl">[4] Bob</span></span></code></pre>
</div>
</div>
</div>
<p>with an Index on name</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-4" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-4">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Index (B-tree on name)
</span></span><span class="line"><span class="cl">----------------------
</span></span><span class="line"><span class="cl">Anna -&gt; [3]
</span></span><span class="line"><span class="cl">Bob -&gt; [4]
</span></span><span class="line"><span class="cl">Zo&euml; -&gt; [1]
</span></span><span class="line"><span class="cl">&Aacute;lvaro -&gt; [2]</span></span></code></pre>
</div>
</div>
</div>
<p>The index stores values in sorted order and points to their location in the table. Scanning through the list in order is fine for small number of items, as indexes get larger, there are opportunities to optimize this process. The most commonly used optimisation, very simplified, is to split the list and create a pointer which stores the maximum value in the left half of the list and the minimum value in the right half. As the lists get too large again, they are split again, creating a tree of these pointers.</p>
<h3 id="why-re-indexing-is-needed">Why re-indexing is needed<a class="anchor-link" id="why-re-indexing-is-needed"></a></h3>
<p>If the rules for comparing the strings are different at query time from what they were when the index was created, the search may fail in various ways. If the scan encounters a value which, by the rules at query time, is higher in the sort order than the value being searched for, it will conclude that the value being searched for is not in the list. Similarly, the search may follow a pointer to the wrong list of values. To make this more concrete, let&rsquo;s look at a <a href="https://lists.debian.org/debian-glibc/2019/03/msg00030.html" target="_blank" rel="noopener noreferrer">real world example</a> from a <code>glibc</code> change in 2019. Before the change, a list of our values was sorted as follows</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-5" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-5">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">aa
</span></span><span class="line"><span class="cl">a a
</span></span><span class="line"><span class="cl">a-a
</span></span><span class="line"><span class="cl">a+a</span></span></code></pre>
</div>
</div>
</div>
<p>after the change, the list was sorted as follows:</p>
<div class="code-block">
<div class="code-block__header"><button class="code-block__copy" type="button" data-copy-target="codeblock-6" aria-label="Copy code to clipboard"><br>
<span class="code-block__copy-default">Copy</span><br>
<span class="code-block__copy-success" aria-hidden="true">Copied!</span><br>
</button>
</div>
<div class="code-block__content" id="codeblock-6">
<div class="highlight">
<pre class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">a a
</span></span><span class="line"><span class="cl">a+a
</span></span><span class="line"><span class="cl">a-a
</span></span><span class="line"><span class="cl">aa</span></span></code></pre>
</div>
</div>
</div>
<p>If an index was built under the first set of rules and searched under the second, a search for the value &lsquo;a a&rsquo; will fail, because the first item in the index (&lsquo;aa&rsquo;) is higher in the sort order than the value being searched for. Failing to find a value which is in the list can cause issues with application behavior &ndash; like missing records or records which appear in some queries (when the index is not used) but not in others (where the index is used). It may allow the insertion of values that should now be considered equal under the new collation rules, effectively violating uniqueness expectations.</p>
<p>Since PostgreSQL 10, the server tracks the collation version used when an index was built and can detect when it no longer matches the system&rsquo;s current collation version, which is why reindexing may suddenly become mandatory after an upgrade.</p>
<p>However, PostgreSQL server does not analyze your actual data to determine whether the collation change affects your stored values. It only detects that the collation provider version has changed. In some cases, this means reindexing is required even though the effective sort order of your specific data has not changed. Because PostgreSQL cannot reliably determine whether your specific dataset is affected, it treats the situation as potentially unsafe.</p>
<h3 id="when-do-collations-change">When do collations change?<a class="anchor-link" id="when-do-collations-change"></a></h3>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/Jan-glibc-confusion.png" alt="&nbsp;"></figure>
</p>
<p>There&rsquo;s a number of scenarios when collations change:</p>
<ul>
<li>OS / <code>glibc</code> upgrade (Linux) &ndash; PostgreSQL relies on the system <code>glibc</code> provided collations so if a version changes, the collation rules may change as well. The example above is <a href="https://lists.debian.org/debian-glibc/2019/03/msg00030.html" target="_blank" rel="noopener noreferrer">taken from the upgrade to glibc 2.28</a> which is one <a href="https://wiki.postgresql.org/wiki/Locale_data_changes" target="_blank" rel="noopener noreferrer">PostgreSQL Community remembers</a> due to the ripple effect it caused.</li>
<li>ICU library upgrade (for ICU collations) &ndash; if PostgreSQL deployment uses ICU collations, upgrading ICU library will affect these. While they are not tied to <code>glibc</code> these changes also happen.</li>
<li>Major PostgreSQL upgrade (in some cases) &ndash; if the new version uses a different collation provider behavior or updated ICU integration</li>
<li>Database restored on a system with different collation versions &ndash; logical dump/restore onto a host with different <code>glibc</code> / ICU versions can invalidate indexes.</li>
</ul>
<h3 id="when-re-indexing-is-necessary">When re-indexing is necessary<a class="anchor-link" id="when-re-indexing-is-necessary"></a></h3>
<p>Looking at the above provided list an observation is quite immediate, that not every collation type is affected:</p>
<ul>
<li>if the collation is not dependent on <code>glibc</code> / ICU then it won&rsquo;t be affected. As such C/POSIX collations are immune to such issues.</li>
<li>Same truth sticks to the data types. The collation change problem affects only those indexes which are the text or character based. All other datatypes like all types of integers and floating points, date, timestamp,&nbsp; geometric data types or even vector data remain unaffected.</li>
</ul>
<p>What&rsquo;s also important is that even if a collation changed it&rsquo;s not necessary it will affect a given index. Think of it this way, if your database does not use any language specific characters, chances are that collation change will not require re-index. Unfortunately you don&rsquo;t know that until you look inside your data.<br>
In controlled environments, teams may assess the risk before scheduling reindexing, but from PostgreSQL&rsquo;s perspective the index must be considered potentially inconsistent until rebuilt.</p>
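<p>If you want a starting point for such an assessment, a rough (and deliberately approximate) catalog query can list the user indexes that contain collatable columns such as <code>text</code> or <code>varchar</code> &ndash; these are the candidates worth inspecting, everything else can be ignored:</p>
<div class="code-block">
<div class="code-block__content">
<div class="highlight">
<pre class="chroma"><code class="language-sql" data-lang="sql">-- Approximate sketch: user indexes with at least one collatable (text-like) column
SELECT DISTINCT i.indexrelid::regclass AS index_name,
       i.indrelid::regclass            AS table_name
FROM   pg_index i
JOIN   pg_attribute a ON a.attrelid = i.indexrelid
JOIN   pg_type t      ON t.oid = a.atttypid
WHERE  t.typcollation &lt;&gt; 0     -- the column type is collatable
  AND  i.indexrelid &gt;= 16384     -- rough cut to skip system indexes
ORDER  BY 2, 1;</code></pre>
</div>
</div>
</div>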
<h3 id="whats-next">What&rsquo;s next?<a class="anchor-link" id="whats-next"></a></h3>
<p>Next week we&rsquo;ll look at what is the reality of the DBA team regarding upgrades</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/25/postgresql-coffee-break-version-upgrade-related-reindexing-reasons/">PostgreSQL coffee break: version upgrade related reindexing &#8211; reasons</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB innovation: vector index performance</title>
      <link>https://smalldatum.blogspot.com/2026/02/mariadb-innovation-vector-index.html</link>
      <pubDate>Mon, 23 Feb 2026 16:54:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://smalldatum.blogspot.com/">Small Datum</source>
      <description><![CDATA[<p>Last year I shared many posts documenting MariaDB performance for vector search using ann-benchmarks. Performance was great in MariaDB 11 and this blog post explains that it is even better in MariaDB 12. This work was done by Small Datum LLC and sponsored by the MariaDB Foundation. My previous posts were published in January and February 2025.tl;drVector search recall vs precision in MariaDB 12.3 is better than in MariaDB 11.8Vector search recall vs precision in Maria 11.8 is better than in Postgres 18.2 with pgvector 0.8.1The improvements in MariaDB 12.3 are more significant for larger datasetsMariaDB 12.3 has the best results because it use less CPU per queryBenchmarkThis post has much more detail about my approach. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit.  The ann-benchmarks config files are here for MariaDB and for Postgres.This time I used the dbpedia-openai-X-angular tests for X in 100k, 500k and 1000k.For hardware I used a larger server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices. For databases I used:MariaDB versions 11.8.5 and 12.3.0 with this config file. Both were compiled from source. Postgres 18.2 with pgvector 0.8.1 with this config file. These were compiled from source. For Postgres tests were run with and without halfvec (float16).I had ps and vmstat running during the benchmark and confirmed there weren\'t storage reads as the table and index were cached by MariaDB and Postgres.The command lines to run the benchmark using my helper scripts are:    bash rall.batch.sh v1 dbpedia-openai-100k-angular c32r128    bash rall.batch.sh v1 dbpedia-openai-500k-angular c32r128    bash rall.batch.sh v1 dbpedia-openai-1000k-angular c32r128Results: dbpedia-openai-100k-angularSummaryMariaDB 12.3 has the best resultsthe difference between MariaDB 12.3 and 11.8 is smaller here than it is below for 500k and 1000kResults: dbpedia-openai-500k-angularSummaryMariaDB 12.3 has the best resultsthe difference between MariaDB 12.3 and 11.8 is larger here than above for 100kResults: dbpedia-openai-1000k-angularSummaryMariaDB 12.3 has the best resultsthe difference between MariaDB 12.3 and 11.8 is larger here than it is above for 100k and 500k</p>
<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-vector-index.html">MariaDB innovation: vector index performance</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Last year I shared many posts documenting MariaDB performance for vector search using <a href="https://github.com/erikbern/ann-benchmarks">ann-benchmarks</a>. Performance was great in MariaDB 11 and this blog post explains that it is even better in MariaDB 12. This work was done by&nbsp;<a href="https://smalldatum.github.io/">Small Datum LLC</a>&nbsp;and sponsored by the MariaDB Foundation. My previous posts were published in <a href="https://smalldatum.blogspot.com/2025/01/">January</a> and <a href="https://smalldatum.blogspot.com/2025/02/">February</a> 2025.</p>
<p>tl;dr</p>

<ul style="text-align: left">
<li>Vector search recall vs precision in MariaDB 12.3 is better than in MariaDB 11.8</li>
<li>Vector search recall vs precision in Maria 11.8 is better than in Postgres 18.2 with pgvector 0.8.1</li>
<li>The improvements in MariaDB 12.3 are more significant for larger datasets</li>
<li>MariaDB 12.3 has the best results because it use less CPU per query</li>
</ul>
<p><b>Benchmark</b></p>
<div>
<div><a href="https://smalldatum.blogspot.com/2025/01/evaluating-vector-indexes-in-mariadb.html">This post</a>&nbsp;has much more detail about my approach. I ran the benchmark for 1 session. I use&nbsp;<a href="https://github.com/erikbern/ann-benchmarks/">ann-benchmarks</a>&nbsp;via my&nbsp;<a href="https://github.com/mdcallag/ann-benchmarks-from-vuvova">fork of a fork of a fork</a>&nbsp;at&nbsp;<a href="https://github.com/mdcallag/ann-benchmarks-from-vuvova/commit/f0c0d0ccbbe765c6d758239eb95406b5dd07845d">this commit</a>.&nbsp; The ann-benchmarks config files are here&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/arc/jan25.ann.gist-960-euclidean.v1/dop_1/pg172/config.yml.mariadb">for MariaDB</a>&nbsp;and&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/arc/jan25.ann.gist-960-euclidean.v1/dop_1/pg172/config.yml.pgvector">for Postgres</a>.</div>
<div></div>
<div>This time I used the dbpedia-openai-X-angular tests for X in 100k, 500k and 1000k.</div>
<div></div>
<div>For hardware I used a larger server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices.&nbsp;</div>
<div></div>
<div>For databases I used:</div>
<div>
<ul style="text-align: left">
<li>MariaDB versions 11.8.5 and 12.3.0 with&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/ma1203/etc/my.cnf.cz12b_vector_c32r128">this config file</a>. Both were compiled from source.&nbsp;</li>
<li>Postgres 18.2 with pgvector 0.8.1 with&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg18beta3_o2nofp/conf.diff.cx10a_vector_c32r128">this config file</a>. These were compiled from source. For Postgres tests were run with and without halfvec (float16).</li>
</ul>
</div>
<div>I had ps and vmstat running during the benchmark and confirmed there weren&rsquo;t storage reads as the table and index were cached by MariaDB and Postgres.</div>
<div>The command lines to run the benchmark using my&nbsp;<a href="https://github.com/mdcallag/mytools/tree/master/bench/arc/jan25.ann.gist-960-euclidean.v1/helper_scripts">helper scripts</a>&nbsp;are:<br><span style="font-family: courier">&nbsp; &nbsp; bash rall.batch.sh v1&nbsp;</span><span style="font-family: courier">dbpedia-openai-100k-angular c32r128</span></div>
<div><span style="font-family: courier">&nbsp; &nbsp; bash rall.batch.sh v1&nbsp;</span><span style="font-family: courier">dbpedia-openai-500k-angular c32r128</span></div>
<div>
<div><span style="font-family: courier">&nbsp; &nbsp; bash rall.batch.sh v1&nbsp;</span><span style="font-family: courier">dbpedia-openai-1000k-angular c32r128</span></div>
</div>
</div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier"><b style="font-family: Times">Results: dbpedia-openai-100k-angular</b></span></div>
<div></div>
<div>
<div>Summary</div>
<div>
<ul>
<li>MariaDB 12.3 has the best results</li>
<li>the difference between MariaDB 12.3 and 11.8 is smaller here than it is below for 500k and 1000k</li>
</ul>
</div>
</div>
<div><span style="font-family: courier">
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghoAH4mzg4DbO9v-w27cVrEMvi6urszwdzCMxe4oeR8uCtd_oQLIBeM4-cqGIhNXPWKTZRpuSBQ5kXHb0uQvydPor3iSnFNvQgyEqwm6wf158-LYDb7Aaq6ggXdbOuUeg5Rvjgq5sYKlQLQvyI6Hl1Q8-QQrFTC8x2OgRSpAaK6O75IoktJRlGwjJ-baeF/s1173/dbpedia-openai-100k-angular.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="778" data-original-width="1173" height="424" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghoAH4mzg4DbO9v-w27cVrEMvi6urszwdzCMxe4oeR8uCtd_oQLIBeM4-cqGIhNXPWKTZRpuSBQ5kXHb0uQvydPor3iSnFNvQgyEqwm6wf158-LYDb7Aaq6ggXdbOuUeg5Rvjgq5sYKlQLQvyI6Hl1Q8-QQrFTC8x2OgRSpAaK6O75IoktJRlGwjJ-baeF/w640-h424/dbpedia-openai-100k-angular.png" width="640"></a></div>
<p><b>Results: dbpedia-openai-500k-angular</b></p></span></div>
<div><span style="font-family: courier">
<div style="font-family: Times"></div>
<div style="font-family: Times">Summary</div>
<div style="font-family: Times">
<ul style="text-align: left">
<li>MariaDB 12.3 has the best results</li>
<li>the difference between MariaDB 12.3 and 11.8 is larger here than above for 100k</li>
</ul>
</div>
<div style="font-family: Times"><span style="font-family: courier">
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-NyNnnIHAY68jdbydeP6fPt5UIBj605fT1U_9QsE752XlGmjrr_GoxrJcF13iSsb7vXMtdo_R7t74qEoEy5Ton9-5g6J3ZsU1dsXc1OJOKeyODmpJcyn1pvta7NTr-RB0HAQfT6ok-DZF0Y_l5PaQsYFqJ0PVQo8rN34hK1BAmO1daC77rhEipWC7hxDL/s1173/dbpedia-openai-500k-angular.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="778" data-original-width="1173" height="424" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-NyNnnIHAY68jdbydeP6fPt5UIBj605fT1U_9QsE752XlGmjrr_GoxrJcF13iSsb7vXMtdo_R7t74qEoEy5Ton9-5g6J3ZsU1dsXc1OJOKeyODmpJcyn1pvta7NTr-RB0HAQfT6ok-DZF0Y_l5PaQsYFqJ0PVQo8rN34hK1BAmO1daC77rhEipWC7hxDL/w640-h424/dbpedia-openai-500k-angular.png" width="640"></a></div>
<p><b style="font-family: Times">Results: dbpedia-openai-1000k-angular</b></p></span></div>
<div style="font-family: Times"><span style="font-family: courier">
<div style="font-family: Times"></div>
<div style="font-family: Times">Summary</div>
<div style="font-family: Times">
<ul style="text-align: left">
<li>MariaDB 12.3 has the best results</li>
<li>the difference between MariaDB 12.3 and 11.8 is larger here than it is above for 100k and 500k</li>
</ul>
</div>
<div style="font-family: Times"><span style="font-family: courier">
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNnnpzCeUzX3yDzd_WybVB8o826nBL5XLt4OQTHz2Hrt2zR2UdQf-7uC_k1dIzZMYph15ZKzKWsF4N_rIaYzhyphenhyphenYgFGMcGXJufkVUNfAyacKDLqpqbDwy84KtxQpScLax6tulPCWfC2NQ_t7VRZ46g8ugCEpyX93h36WJZq46YZB7G2aYG2krQregdIf7wZ/s1173/dbpedia-openai-1000k-angular.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="778" data-original-width="1173" height="424" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNnnpzCeUzX3yDzd_WybVB8o826nBL5XLt4OQTHz2Hrt2zR2UdQf-7uC_k1dIzZMYph15ZKzKWsF4N_rIaYzhyphenhyphenYgFGMcGXJufkVUNfAyacKDLqpqbDwy84KtxQpScLax6tulPCWfC2NQ_t7VRZ46g8ugCEpyX93h36WJZq46YZB7G2aYG2krQregdIf7wZ/w640-h424/dbpedia-openai-1000k-angular.png" width="640"></a></div>
<p><span style="font-family: Times"><br></span></p></span></div>
<p></p></span></div>
<p></p></span></div>

<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-vector-index.html">MariaDB innovation: vector index performance</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MySQL + Neo4j for AI Workloads: Why Relational Databases Still Matter</title>
      <link>https://anothermysqldba.blogspot.com/2026/02/mysql-neo4j-for-ai-workloads-why.html</link>
      <pubDate>Sun, 22 Feb 2026 00:01:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://anothermysqldba.blogspot.com/">Another MySQL DBA</source>
      <description><![CDATA[<p>So I figured it was about time I documented how to build persistent memory for AI agents using the databases you already know. Not vector databases - MySQL and Neo4j.</p>
<p>This isn\'t theoretical. I use this architecture daily, handling AI agent memory across multiple projects. Here\'s the schema and query patterns that actually work.</p>
<p>The Architecture</p>
<p>AI agents need two types of memory:</p>
<p> Structured memory - What happened, when, why (MySQL)<br />
 Pattern memory - What connects to what (Neo4j)</p>
<p>Vector databases are for similarity search. They\'re not for tracking workflow state or decision history. For that, you need ACID transactions and proper relationships.</p>
<p>The MySQL Schema</p>
<p>Here\'s the actual schema for AI agent persistent memory:</p>
<p>-- Architecture decisions the AI made<br />
CREATE TABLE architecture_decisions (<br />
  id INT AUTO_INCREMENT PRIMARY KEY,<br />
  project_id INT NOT NULL,<br />
  title VARCHAR(255) NOT NULL,<br />
  decision TEXT NOT NULL,<br />
  rationale TEXT,<br />
  alternatives_considered TEXT,<br />
  status ENUM(\'accepted\', \'rejected\', \'pending\') DEFAULT \'accepted\',<br />
  decided_at DATETIME DEFAULT CURRENT_TIMESTAMP,<br />
  tags JSON,<br />
  INDEX idx_project_date (project_id, decided_at),<br />
  INDEX idx_status (status)<br />
) ENGINE=InnoDB;</p>
<p>-- Code patterns the AI learned<br />
CREATE TABLE code_patterns (<br />
  id INT AUTO_INCREMENT PRIMARY KEY,<br />
  project_id INT NOT NULL,<br />
  category VARCHAR(50) NOT NULL,<br />
  name VARCHAR(255) NOT NULL,<br />
  description TEXT,<br />
  code_example TEXT,<br />
  language VARCHAR(50),<br />
  confidence_score FLOAT DEFAULT 0.5,<br />
  usage_count INT DEFAULT 0,<br />
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,<br />
  updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP,<br />
  INDEX idx_project_category (project_id, category),<br />
  INDEX idx_confidence (confidence_score)<br />
) ENGINE=InnoDB;</p>
<p>-- Work session tracking<br />
CREATE TABLE work_sessions (<br />
  id INT AUTO_INCREMENT PRIMARY KEY,<br />
  session_id VARCHAR(255) UNIQUE NOT NULL,<br />
  project_id INT NOT NULL,<br />
  started_at DATETIME DEFAULT CURRENT_TIMESTAMP,<br />
  ended_at DATETIME,<br />
  summary TEXT,<br />
  context JSON,<br />
  INDEX idx_project_session (project_id, started_at)<br />
) ENGINE=InnoDB;</p>
<p>-- Pitfalls to avoid (learned from mistakes)<br />
CREATE TABLE pitfalls (<br />
  id INT AUTO_INCREMENT PRIMARY KEY,<br />
  project_id INT NOT NULL,<br />
  category VARCHAR(50),<br />
  title VARCHAR(255) NOT NULL,<br />
  description TEXT,<br />
  how_to_avoid TEXT,<br />
  severity ENUM(\'critical\', \'high\', \'medium\', \'low\'),<br />
  encountered_count INT DEFAULT 1,<br />
  created_at DATETIME DEFAULT CURRENT_TIMESTAMP,<br />
  INDEX idx_project_severity (project_id, severity)<br />
) ENGINE=InnoDB;</p>
<p>Foreign keys. Check constraints. Proper indexing. This is what relational databases are good at.</p>
<p>Query Patterns</p>
<p>Here\'s how you actually query this for AI agent memory:</p>
<p>-- Get recent decisions for context<br />
SELECT title, decision, rationale, decided_at<br />
FROM architecture_decisions<br />
WHERE project_id = ?<br />
 AND decided_at > DATE_SUB(NOW(), INTERVAL 30 DAY)<br />
ORDER BY decided_at DESC<br />
LIMIT 10;</p>
<p>-- Find high-confidence patterns<br />
SELECT category, name, description, code_example<br />
FROM code_patterns<br />
WHERE project_id = ?<br />
 AND confidence_score >= 0.80<br />
ORDER BY usage_count DESC, confidence_score DESC<br />
LIMIT 20;</p>
<p>-- Check for known pitfalls before implementing<br />
SELECT title, description, how_to_avoid<br />
FROM pitfalls<br />
WHERE project_id = ?<br />
 AND category = ?<br />
 AND severity IN (\'critical\', \'high\')<br />
ORDER BY encountered_count DESC;</p>
<p>-- Track session context across interactions<br />
SELECT context<br />
FROM work_sessions<br />
WHERE session_id = ?<br />
ORDER BY started_at DESC<br />
LIMIT 1;</p>
<p>These are straightforward SQL queries. EXPLAIN shows index usage exactly where expected. No surprises.</p>
<p>The Neo4j Layer</p>
<p>MySQL handles the structured data. Neo4j handles the relationships:</p>
<p>// Create nodes for decisions<br />
CREATE (d:Decision {<br />
 id: \'dec_123\',<br />
 title: \'Use FastAPI\',<br />
 project_id: 1,<br />
 embedding: [0.23, -0.45, ...] // Vector for similarity<br />
})</p>
<p>// Create relationships<br />
CREATE (d1:Decision {id: \'dec_123\', title: \'Use FastAPI\'})<br />
CREATE (d2:Decision {id: \'dec_45\', title: \'Used Flask before\'})<br />
CREATE (d1)-[:SIMILAR_TO {score: 0.85}]- >(d2)<br />
CREATE (d1)-[:CONTRADICTS]- >(d3:Decision {title: \'Avoid frameworks\'})</p>
<p>// Query: Find similar past decisions<br />
MATCH (current:Decision {id: $decision_id})<br />
MATCH (current)-[r:SIMILAR_TO]-(similar:Decision)<br />
WHERE r.score > 0.80<br />
RETURN similar.title, r.score<br />
ORDER BY r.score DESC</p>
<p>// Query: What outcomes followed this pattern?<br />
MATCH (d:Decision)-[:LEADS_TO]- >(o:Outcome)<br />
WHERE d.title CONTAINS \'Redis\'<br />
RETURN d.title, o.type, o.success_rate</p>
<p>How They Work Together</p>
<p>The flow looks like this:</p>
<p> AI agent generates content or makes a decision<br />
 Store structured data in MySQL (what, when, why, full context)<br />
 Generate embedding, store in Neo4j with relationships to similar items<br />
 Next session: Neo4j finds relevant similar decisions<br />
 MySQL provides the full details of those decisions</p>
<p>MySQL is the source of truth. Neo4j is the pattern finder.</p>
<p>Why Not Just Vector Databases?</p>
<p>I\'ve seen teams try to build AI agent memory with just Pinecone or Weaviate. It doesn\'t work well because:</p>
<p>Vector DBs are good for:</p>
<p> Finding documents similar to a query<br />
 Semantic search (RAG)<br />
 \"Things like this\"</p>
<p>Vector DBs are bad for:</p>
<p> \"What did we decide on March 15th?\"<br />
 \"Show me decisions that led to outages\"<br />
 \"What\'s the current status of this workflow?\"<br />
 \"Which patterns have confidence > 0.8 AND usage_count > 10?\"</p>
<p>Those queries need structured filtering, joins, and transactions. That\'s relational database territory.</p>
<p>MCP and the Future</p>
<p>The Model Context Protocol (MCP) is standardizing how AI systems handle context. Early MCP implementations are discovering what we already knew: you need both structured storage and graph relationships.</p>
<p>MySQL handles the MCP \"resources\" and \"tools\" catalog. Neo4j handles the \"relationships\" between context items. Vector embeddings are just one piece of the puzzle.</p>
<p>Production Notes</p>
<p>Current system running this architecture:</p>
<p> MySQL 8.0, 48 tables, ~2GB data<br />
 Neo4j Community, ~50k nodes, ~200k relationships<br />
 Query latency: MySQL</p>
<p>The post <a rel="nofollow" href="https://anothermysqldba.blogspot.com/2026/02/mysql-neo4j-for-ai-workloads-why.html">MySQL + Neo4j for AI Workloads: Why Relational Databases Still Matter</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>So I figured it was about time I documented how to build persistent memory for AI agents using the databases you already know. Not vector databases &ndash; MySQL and Neo4j.</p>
<p>This isn&rsquo;t theoretical. I use this architecture daily, handling AI agent memory across multiple projects. Here&rsquo;s the schema and query patterns that actually work.</p>
<h3>The Architecture<a class="anchor-link" id="the-architecture"></a></h3>
<p>AI agents need two types of memory:</p>
<ul>
<li><strong>Structured memory</strong> &ndash; What happened, when, why (MySQL)</li>
<li><strong>Pattern memory</strong> &ndash; What connects to what (Neo4j)</li>
</ul>
<p>Vector databases are for similarity search. They&rsquo;re not for tracking workflow state or decision history. For that, you need ACID transactions and proper relationships.</p>
<h3>The MySQL Schema<a class="anchor-link" id="the-mysql-schema"></a></h3>
<p>Here&rsquo;s the actual schema for AI agent persistent memory:</p>
<pre><code>-- Architecture decisions the AI made
CREATE TABLE architecture_decisions (
    id INT AUTO_INCREMENT PRIMARY KEY,
    project_id INT NOT NULL,
    title VARCHAR(255) NOT NULL,
    decision TEXT NOT NULL,
    rationale TEXT,
    alternatives_considered TEXT,
    status ENUM('accepted', 'rejected', 'pending') DEFAULT 'accepted',
    decided_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    tags JSON,
    INDEX idx_project_date (project_id, decided_at),
    INDEX idx_status (status)
) ENGINE=InnoDB;

-- Code patterns the AI learned
CREATE TABLE code_patterns (
    id INT AUTO_INCREMENT PRIMARY KEY,
    project_id INT NOT NULL,
    category VARCHAR(50) NOT NULL,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    code_example TEXT,
    language VARCHAR(50),
    confidence_score FLOAT DEFAULT 0.5,
    usage_count INT DEFAULT 0,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_project_category (project_id, category),
    INDEX idx_confidence (confidence_score)
) ENGINE=InnoDB;

-- Work session tracking
CREATE TABLE work_sessions (
    id INT AUTO_INCREMENT PRIMARY KEY,
    session_id VARCHAR(255) UNIQUE NOT NULL,
    project_id INT NOT NULL,
    started_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    ended_at DATETIME,
    summary TEXT,
    context JSON,
    INDEX idx_project_session (project_id, started_at)
) ENGINE=InnoDB;

-- Pitfalls to avoid (learned from mistakes)
CREATE TABLE pitfalls (
    id INT AUTO_INCREMENT PRIMARY KEY,
    project_id INT NOT NULL,
    category VARCHAR(50),
    title VARCHAR(255) NOT NULL,
    description TEXT,
    how_to_avoid TEXT,
    severity ENUM('critical', 'high', 'medium', 'low'),
    encountered_count INT DEFAULT 1,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_project_severity (project_id, severity)
) ENGINE=InnoDB;</code></pre>
<p>Foreign keys. Check constraints. Proper indexing. This is what relational databases are good at.</p>
<h3>Query Patterns<a class="anchor-link" id="query-patterns"></a></h3>
<p>Here&rsquo;s how you actually query this for AI agent memory:</p>
<pre><code>-- Get recent decisions for context
SELECT title, decision, rationale, decided_at
FROM architecture_decisions
WHERE project_id = ?
  AND decided_at &gt; DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY decided_at DESC
LIMIT 10;

-- Find high-confidence patterns
SELECT category, name, description, code_example
FROM code_patterns
WHERE project_id = ?
  AND confidence_score &gt;= 0.80
ORDER BY usage_count DESC, confidence_score DESC
LIMIT 20;

-- Check for known pitfalls before implementing
SELECT title, description, how_to_avoid
FROM pitfalls
WHERE project_id = ?
  AND category = ?
  AND severity IN ('critical', 'high')
ORDER BY encountered_count DESC;

-- Track session context across interactions
SELECT context
FROM work_sessions
WHERE session_id = ?
ORDER BY started_at DESC
LIMIT 1;</code></pre>
<p>These are straightforward SQL queries. EXPLAIN shows index usage exactly where expected. No surprises.</p>
<h3>The Neo4j Layer<a class="anchor-link" id="the-neo4j-layer"></a></h3>
<p>MySQL handles the structured data. Neo4j handles the relationships:</p>
<pre><code>// Create nodes for decisions
CREATE (d:Decision {
  id: 'dec_123',
  title: 'Use FastAPI',
  project_id: 1,
  embedding: [0.23, -0.45, ...]  // Vector for similarity
})

// Create relationships
CREATE (d1:Decision {id: 'dec_123', title: 'Use FastAPI'})
CREATE (d2:Decision {id: 'dec_45', title: 'Used Flask before'})
CREATE (d1)-[:SIMILAR_TO {score: 0.85}]-&gt;(d2)
CREATE (d1)-[:CONTRADICTS]-&gt;(d3:Decision {title: 'Avoid frameworks'})

// Query: Find similar past decisions
MATCH (current:Decision {id: $decision_id})
MATCH (current)-[r:SIMILAR_TO]-(similar:Decision)
WHERE r.score &gt; 0.80
RETURN similar.title, r.score
ORDER BY r.score DESC

// Query: What outcomes followed this pattern?
MATCH (d:Decision)-[:LEADS_TO]-&gt;(o:Outcome)
WHERE d.title CONTAINS 'Redis'
RETURN d.title, o.type, o.success_rate</code></pre>
<h3>How They Work Together<a class="anchor-link" id="how-they-work-together"></a></h3>
<p>The flow looks like this:</p>
<ol>
<li>AI agent generates content or makes a decision</li>
<li>Store structured data in MySQL (what, when, why, full context)</li>
<li>Generate embedding, store in Neo4j with relationships to similar items</li>
<li>Next session: Neo4j finds relevant similar decisions</li>
<li>MySQL provides the full details of those decisions</li>
</ol>
<p>MySQL is the source of truth. Neo4j is the pattern finder.</p>
<h3>Why Not Just Vector Databases?<a class="anchor-link" id="why-not-just-vector-databases"></a></h3>
<p>I&rsquo;ve seen teams try to build AI agent memory with just Pinecone or Weaviate. It doesn&rsquo;t work well because:</p>
<p><strong>Vector DBs are good for:</strong></p>
<ul>
<li>Finding documents similar to a query</li>
<li>Semantic search (RAG)</li>
<li>&ldquo;Things like this&rdquo;</li>
</ul>
<p><strong>Vector DBs are bad for:</strong></p>
<ul>
<li>&ldquo;What did we decide on March 15th?&rdquo;</li>
<li>&ldquo;Show me decisions that led to outages&rdquo;</li>
<li>&ldquo;What&rsquo;s the current status of this workflow?&rdquo;</li>
<li>&ldquo;Which patterns have confidence &gt; 0.8 AND usage_count &gt; 10?&rdquo;</li>
</ul>
<p>Those queries need structured filtering, joins, and transactions. That&rsquo;s relational database territory.</p>
<h3>MCP and the Future<a class="anchor-link" id="mcp-and-the-future"></a></h3>
<p>The Model Context Protocol (MCP) is standardizing how AI systems handle context. Early MCP implementations are discovering what we already knew: you need both structured storage and graph relationships.</p>
<p>MySQL handles the MCP &ldquo;resources&rdquo; and &ldquo;tools&rdquo; catalog. Neo4j handles the &ldquo;relationships&rdquo; between context items. Vector embeddings are just one piece of the puzzle.</p>
<h3>Production Notes<a class="anchor-link" id="production-notes"></a></h3>
<p>Current system running this architecture:</p>
<ul>
<li>MySQL 8.0, 48 tables, ~2GB data</li>
<li>Neo4j Community, ~50k nodes, ~200k relationships</li>
<li>Query latency: MySQL &lt;10ms, Neo4j &lt;50ms</li>
<li>Backup: Standard mysqldump + neo4j-admin dump</li>
<li>Monitoring: Same Percona tools I&rsquo;ve used for years</li>
</ul>
<p>The operational complexity is low because these are mature databases with well-understood operational patterns.</p>
<h3>Too Much Work? Let AI Build It For You<a class="anchor-link" id="too-much-work-let-ai-build-it-for-you"></a></h3>
<p>Look, I get it. This is a lot of schema to set up, a lot of queries to write, a lot of moving parts.</p>
<p>Here&rsquo;s the thing: you don&rsquo;t have to type it all yourself. Copy the schema above, paste it into Claude Code or Kimi CLI, and tell it what you want to build. The AI will generate the Python code, the connection handling, the query patterns &ndash; all of it.</p>
<p>If you want to understand what&rsquo;s happening under the hood, start here:</p>
<p><a href="https://machinelearningmastery.com/building-a-simple-mcp-server-in-python/">Building a Simple MCP Server in Python</a></p>
<p>Then let your AI tool do the heavy lifting. That&rsquo;s literally what I did. The schema is mine, the architecture decisions are mine, but the implementation?<br>
  Claude wrote most of it while I watched and corrected.</p>
<p>Use the tools. That&rsquo;s what they&rsquo;re for.</p>
<h3>When to Use What<a class="anchor-link" id="when-to-use-what"></a></h3>
<table>
<tr>
<th>Use Case</th>
<th>Database</th>
</tr>
<tr>
<td>Workflow state, decisions, audit trail</td>
<td>MySQL/PostgreSQL</td>
</tr>
<tr>
<td>Pattern detection, similarity, relationships</td>
<td>Neo4j</td>
</tr>
<tr>
<td>Semantic document search (RAG)</td>
<td>Vector DB (optional)</td>
</tr>
</table>
<p>Start with MySQL for state. Add Neo4j when you need pattern recognition. Only add vector DBs if you&rsquo;re actually doing semantic document retrieval.</p>
<h3>Summary<a class="anchor-link" id="summary"></a></h3>
<p>AI agents need persistent memory. Not just embeddings in a vector database &ndash; structured, relational, temporal memory with pattern recognition.</p>
<p>MySQL handles the structured state. Neo4j handles the graph relationships. Together they provide what vector databases alone cannot.</p>
<p>Don&rsquo;t abandon relational databases for AI workloads. Use the right tool for each job, which is using both together.</p>
<p><strong>For more on the AI agent perspective on this architecture, see the companion post on <a href="https://3k1o.blogspot.com/2026/02/beyond-vector-databases-how-ai-actually.html">3k1o</a>.</strong></p>

<p>The post <a rel="nofollow" href="https://anothermysqldba.blogspot.com/2026/02/mysql-neo4j-for-ai-workloads-why.html">MySQL + Neo4j for AI Workloads: Why Relational Databases Still Matter</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Do AI models still keep getting better, or have they plateaued?</title>
      <link>https://optimizedbyotto.com/post/ai-models-plateaued-or-not/</link>
      <pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://optimizedbyotto.com/">Optimized by Otto</source>
      <description><![CDATA[<p>The AI hype is based on the assumption that the frontier AI labs are producing better and better foundational models at an accelerating pace. Is that really true, or are people just in sort of a mass psychosis because AI models have become so good at mimicking human behavior that we unconsciously attribute increasing intelligence to them? I decided to conduct a mini-benchmark of my own to find out if the latest and greatest AI models are actually really good or not.<br />
The problem with benchmarks<br />
Every time any team releases a new LLM, they boast how well it performs on various industry benchmarks such as Humanity’s Last Exam, SWE-Bench and Ai2 ARC or ARC-AGI. An overall leaderboard can be viewed at LLM-stats. This incentivizes teams to optimize for specific benchmarks, which might make them excel on specific tasks while general abilities degrade. Also, the older a benchmark dataset is, the more online material there is discussing the questions and best answers, which in turn increases the chances of newer models trained on more recent web content scoring better.<br />
Thus I prefer looking at real-time leaderboards such as the LM Arena leaderboard (or OpenCompass for Chinese models that might be missing from LM Arena). However, even though the LM Arena Elo score is rated by humans in real-time, the benchmark can still be gamed. For example, Meta reportedly used a special chat-optimized model instead of the actual Llama 4 model when getting scored on the LM Arena.<br />
Therefore I trust my own first-hand experience more than the benchmarks for gaining intuition. Intuition however is not a compelling argument in discussions on whether or not new flagship AI models have plateaued. Thus, I decided to devise my own mini-benchmark so that no model could have possibly seen it in its training data or be specifically optimized for it in any way.<br />
My mini-benchmark<br />
I crafted 6 questions based on my own experience using various LLMs for several years and having developed some intuition about what kinds of questions LLMs typically struggle with.<br />
I conducted the benchmark using the OpenRouter.ai chat playroom with the following state-of-the-art models:</p>
<p>Claude Opus 4.6 (Anthropic)<br />
GPT-5.2 (OpenAI)<br />
Grok 4.1 (xAI)<br />
Gemini 3.1 Pro Preview (Google)<br />
GLM 5 (Z.ai)<br />
MinMax M2.5 (MinMax)<br />
Qwen3.5 Plus 2026-02-15 (Alibaba)<br />
Kimi K2.5 (Moonshot.ai)</p>
<p>OpenRouter.ai is great as it very easy to get responses from multiple models in parallel to a single question. Also it allows to turn off web search to force the models to answer purely based on their embedded knowledge.</p>
<p>Common for all the test questions is that they are fairly straightforward and have a clear answer, yet the answer isn’t common knowledge or statistically the most obvious one, and instead requires a bit of reasoning to get correct.<br />
Some of these questions are also based on myself witnessing a flagship model failing miserably to answer it.<br />
1. Which cities have hosted the Olympics more than just once?<br />
This question requires accounting for both summer and winter Olympics, and for Olympics hosted across multiple cities.<br />
The variance in responses comes from whether the model understands that Beijing should be counted, as it has hosted both summer and winter Olympics. Interestingly GPT was the only model to not mention Beijing at all. Some variance also comes from how models account for co-hosted Olympics. For example Cortina should be counted as having hosted the Olympics twice, in 1956 and 2026, but only Claude, Gemini and Kimi pointed this out. Stockholm’s 1956 hosting of the equestrian games during the Melbourne Olympics is a special case, which GPT, Gemini and Kimi pointed out in a side note. Some models seem to have old training material, and for example Grok assumes the current year is 2024. All models that accounted for awarded future Olympics (e.g. Los Angeles 2028) marked them clearly as upcoming.<br />
Overall I would judge that only GPT and MinMax gave incomplete answers, while all other models replied as well as the best humans reasonably could have.<br />
2. If EUR/USD continues to slide to 1.5 by mid-2026, what is the likely effect on BMW’s stock price by end of 2026?<br />
This question requires mapping the currency exchange rate to its historic values, dodging the misleading word “slide”, and reasoning about where the revenue of a company comes from and how a weaker US dollar affects it in multiple ways. I’ve frequently witnessed flagship models get wrong how interest rates and exchange rates work. Apparently the binary choice between “up” or “down” is somehow challenging to the internal statistical model of an LLM on a topic where there is a lot of training material arguing for both outcomes, and choosing between them requires reasoning specifically about the scenario at hand while disregarding general knowledge of the situation.<br />
However, this time all the models concluded correctly that a weak dollar would have a negative overall effect on the BMW stock price. Gemini, GLM, Qwen and Kimi also mention the potential hedging effect of BMW’s X-series production in South Carolina for worldwide export.<br />
3. What is the Unicode code point for the traffic cone emoji?<br />
This was the first question where the flagship models clearly still struggle in 2026. The trap here is that there is no traffic cone emoji, so an advanced model should simply refuse to give any Unicode numbers at all. Most LLMs however have an urge to give some answer, leading to hallucinations. Also, as the answer has a graphical element to it, the LLM might not understand how the emoji “looks” in ways that would be obvious to a human, and thus many models claim the construction sign emoji is a traffic cone, which it is not.<br />
By far the worst response was from GPT, which simply hallucinates and stops there:</p>
<p>While Gemini and Grok were among the three models not falling into this trap, the response from Claude was exemplary:</p>
<p>4. Which languages are the 10 most commonly spoken in the world? For each language, count from 1 to 5.<br />
This seemingly simple question can become quite complex when you account for the fact that other languages might also have their own character system, both for words and numerals (e.g. Arabic واحد/one vs ١/1), and that numbers might have variations depending on what is being counted (e.g. in French, object gender: un/une), or whether plain numbers and occurrences have their own words (e.g. Russian один/odin vs раз/raz). I’ve also witnessed older models frequently mix up which parts of the reply should be in which language.<br />
For a human this question is trivial, assuming they know the language. For LLMs it is surprisingly challenging. Grok and MinMax only gave a list of the most popular languages and stopped there without showing any words in the languages. Qwen got a bit further by trying to list numbers in each language, but the lists were only the numbers 1 to 5:</p>
<p>I have seen Grok do something similar before, which in fact inspired me to include this question in my mini-benchmark.<br />
GPT got a bit further, but for Hindi, Arabic and Bengali it listed the numerals in local script, not the number words. Gemini, GLM and Kimi gave a complete and correct answer as a list, while the absolute best answer and presentation came from Claude, which gave the table below:</p>
<p>Claude Opus 4.6<br />
10 Most Commonly Spoken Languages (by Total Speakers)</p>
<p>#<br />
Language<br />
1<br />
2<br />
3<br />
4<br />
5</p>
<p>1<br />
English<br />
One<br />
Two<br />
Three<br />
Four<br />
Five</p>
<p>2<br />
Mandarin Chinese<br />
一 (yī)<br />
二 (èr)<br />
三 (sān)<br />
四 (sì)<br />
五 (wǔ)</p>
<p>3<br />
Hindi<br />
एक (ek)<br />
दो (do)<br />
तीन (tīn)<br />
चार (chār)<br />
पाँच (pā̃ch)</p>
<p>4<br />
Spanish<br />
Uno<br />
Dos<br />
Tres<br />
Cuatro<br />
Cinco</p>
<p>5<br />
French<br />
Un<br />
Deux<br />
Trois<br />
Quatre<br />
Cinq</p>
<p>6<br />
Arabic<br />
واحد (wāḥid)<br />
اثنان (ithnān)<br />
ثلاثة (thalātha)<br />
أربعة (arbaʿa)<br />
خمسة (khamsa)</p>
<p>7<br />
Bengali<br />
এক (ek)<br />
দুই (dui)<br />
তিন (tin)<br />
চার (chār)<br />
পাঁচ (pā̃ch)</p>
<p>8<br />
Portuguese<br />
Um<br />
Dois<br />
Três<br />
Quatro<br />
Cinco</p>
<p>9<br />
Russian<br />
Один (odin)<br />
Два (dva)<br />
Три (tri)<br />
Четыре (chetyre)<br />
Пять (pyat\')</p>
<p>10<br />
Japanese<br />
一 (ichi)<br />
二 (ni)<br />
三 (san)<br />
四 (shi)<br />
五 (go)</p>
<p>Note: Rankings are based on estimated total speakers (native + non-native). Rankings can vary depending on the source and methodology used.</p>
<p>5. Count the number of drone launch pads in the picture.<br />
Together with the question, I uploaded this picture:</p>
<p>A human can easily count that there are 10 rows and 30+ columns in the grid, but because the picture resolution isn’t good enough, the exact number of columns can’t be counted, and the answer should be that there are at least 300 launch pads in the picture.<br />
GPT and Grok both guessed that the count is zero. At least they said zero instead of hallucinating some number, but it would have been better to not give any number at all and just state that they are unable to perform the task. Gemini answered “101”, which is quite odd, but reading the reasoning section, it seems to have tried counting items in the image without reasoning much about what it was actually counting, or noticing that there is clearly a grid that would make the counting much easier. Both Qwen and Kimi state they can see four parallel structures, but are unable to count drone launch pads.<br />
The best answer by far was given by Claude, which counted 10-12 rows and 30-40+ columns, and concluded that there must be 300-500 drone launch pads. Very close to the best human level - impressive!<br />
This question applied only to multi-modal models that can see images, so GLM and MinMax could not give any response.<br />
6. Explain why I am getting the error below, and what is the best way to fix it?<br />
Together with the question above, I gave this code block:</p>
<p>$ SH_SCRIPTS=\"$(mktemp; grep -Irnw debian/ -e \'^#!.*/sh\' &#124; sort -u &#124; cut -d \':\' -f 1 &#124;&#124; true)\"<br />
$ shellcheck -x --enable=all --shell=sh \"$SH_SCRIPTS\"<br />
/tmp/tmp.xQOpI5Nljx<br />
debian/tests/integration-tests: /tmp/tmp.xQOpI5Nljx<br />
debian/tests/integration-tests: openBinaryFile: does not exist (No such file or directory)<br />
Older models would easily be misled by the last error message, thinking that a file went missing, and focus on suggesting changes to the complex-looking first line. In reality the error is simply caused by the quotes around $SH_SCRIPTS, resulting in the entire multi-line string being passed as a single argument to shellcheck. So instead of receiving two separate file paths, shellcheck tries to open one file literally named /tmp/tmp.xQOpI5Nljx\ndebian/tests/integration-tests.<br />
Incorrect argument expansion is fairly easy for an experienced human programmer to notice, but tricky for an LLM. Indeed, Grok, MinMax, and Qwen fell for this trap and focused on the mktemp, assuming it somehow fails to create a file. Interestingly, GLM failed to produce an answer at all, as its reasoning step seemed to loop, thinking too much about the missing file but not understanding why it would be missing when there is nothing wrong with how mktemp is executed.<br />
Claude, Gemini, and Kimi immediately spotted the real root cause, the quoted variable expansion, and suggested correct fixes that either remove the quotes or use Bash arrays or xargs so that the command also correctly handles filenames with spaces in them.<br />
Conclusion</p>
<p>Model<br />
Sports<br />
Economics<br />
Emoji<br />
Languages<br />
Visual<br />
Shell<br />
Score</p>
<p>Claude Opus 4.6<br />
✓<br />
✓<br />
✓<br />
✓<br />
✓<br />
✓<br />
6/6</p>
<p>GPT-5.2<br />
✗<br />
✓<br />
✗<br />
~<br />
✗<br />
✓<br />
2.5/6</p>
<p>Grok 4.1<br />
✓<br />
✓<br />
✓<br />
✗<br />
✗<br />
✗<br />
3/6</p>
<p>Gemini 3.1 Pro<br />
✓<br />
✓<br />
✓<br />
✓<br />
✗<br />
✓<br />
5/6</p>
<p>GLM 5<br />
✓<br />
✓<br />
?<br />
✓<br />
N/A<br />
✗<br />
3/5</p>
<p>MinMax M2.5<br />
✗<br />
✓<br />
✗<br />
✗<br />
N/A<br />
✗<br />
1/5</p>
<p>Qwen3.5 Plus<br />
✓<br />
✓<br />
✗<br />
~<br />
✗<br />
✗<br />
2.5/6</p>
<p>Kimi K2.5<br />
✓<br />
✓<br />
✗<br />
✓<br />
✗<br />
✓<br />
4/6</p>
<p>Obviously, my mini-benchmark only had 6 questions, and I ran it only once. This was obviously not scientifically rigorous. However it was systematic enough to trump just a mere feeling.<br />
The main finding for me personally is that Claude Opus 4.6, the flagship model by Anthropic, seems to give great answers consistently. The answers are not only correct, but also well scoped giving enough information to cover everything that seems relevant, without blurping unnecessary filler.<br />
I used Claude extensively in 2023-2024 when it was the main model available at my day work, but for the past year I had been using other models that I felt were better at the time. Now Claude seems to be the best-of-the-best again, with Gemini and Kimi as close follow-ups. Comparing their pricing at OpenRouter.ai the Kimi K2.5 price of $0.6 / million tokens is almost 90% cheaper than the Claude Opus 4.6’s $5.0 / million tokens suggests that Kimi K2.5 offers the best price-per-performance ratio. Claude might be cheaper with a monthly subscription directly from Anthropic, potentially narrowing the price gap.<br />
Overall I do feel that Anthropic, Google and Moonshot.ai have been pushing the envelope with their latest models in a way that one can’t really claim that AI models have plateaued. In fact, one could claim that at least Claude has now climbed over the hill of “AI slop” and consistently produces valuable results. If and when AI usage expands from here, we might actually not drown in AI slop as chances of accidentally crappy results decrease. This makes me positive about the future.<br />
I am also really happy to see that there wasn’t just one model crushing everybody else, but that there are at least three models doing very well. As an open source enthusiast I am particularly glad to see that Moonshot.ai’s Kimi K2.5 is published with an open license. Given the hardware, anyone can run it on their own. OpenRouter.ai currently lists 9 independent providers alongside Moonshot.ai itself, showcasing the potential of open-weight models in practice.<br />
If the pattern holds and flagship models continue improving at this pace, we might look back at 2026 as the year AI stopped feeling like a call center associate and started to resemble a scientific researcher. As new models become available, we need to keep testing, keep questioning, and keep our expectations grounded in actual performance rather than press releases.<br />
Thanks to OpenRouter.ai for providing a great service that makes testing various models incredibly easy!</p>
<p>The post <a rel="nofollow" href="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/">Do AI models still keep getting better, or have they plateaued?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p><img decoding="async" src="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/flagship-ai-mini-benchmark.png" alt="Featured image of post Do AI models still keep getting better, or have they plateaued?"></p>
<p>The AI hype is based on the assumption that the frontier AI labs are producing better and better foundational models <em>at an accelerating pace</em>. Is that really true, or are people just in a sort of mass psychosis because AI models have become so good at mimicking human behavior that we unconsciously attribute increasing intelligence to them? I decided to conduct a mini-benchmark of my own to find out whether the latest and greatest AI models are actually that good or not.</p>
<h2 id="the-problem-with-benchmarks"><a href="#the-problem-with-benchmarks" class="header-anchor"></a>The problem with benchmarks<br>
<a class="anchor-link" id="the-problem-with-benchmarks"></a></h2>
<p>Every time any team releases a new LLM, they boast how well it performs on various industry benchmarks such as <a class="link" href="https://agi.safe.ai/" target="_blank" rel="noopener">Humanity&rsquo;s Last Exam</a>, <a class="link" href="https://www.swebench.com/" target="_blank" rel="noopener">SWE-Bench</a> and <a class="link" href="https://allenai.org/data/arc" target="_blank" rel="noopener">Ai2 ARC</a> or <a class="link" href="https://arcprize.org/leaderboard" target="_blank" rel="noopener">ARC-AGI</a>. An overall leaderboard can be viewed at <a class="link" href="https://llm-stats.com/" target="_blank" rel="noopener">LLM-stats</a>. This incentivizes teams to optimize for specific benchmarks, which might make them excel on specific tasks while general abilities degrade. <strong>Also, the older a benchmark dataset is, the more online material there is discussing the questions and best answers,</strong> which in turn increases the chances of newer models trained on more recent web content scoring better.</p>
<p>Thus, I prefer looking at real-time leaderboards such as the <a class="link" href="https://arena.ai/leaderboard" target="_blank" rel="noopener">LM Arena leaderboard</a> (or <a class="link" href="https://rank.opencompass.org.cn/leaderboard-llm" target="_blank" rel="noopener">OpenCompass</a> for Chinese models that might be missing from LM Arena). However, even though the LM Arena Elo score is rated by humans in real-time, the benchmark can still be gamed. For example, <a class="link" href="https://www.heise.de/en/news/Meta-cheats-on-Llama-4-benchmark-10344087.html" target="_blank" rel="noopener">Meta reportedly</a> used a special chat-optimized model instead of the actual Llama 4 model when getting scored on the LM Arena.</p>
<p>Therefore I trust my own first-hand experience more than the benchmarks for gaining intuition. Intuition however is not a compelling argument in discussions on whether or not new flagship AI models have plateaued. Thus, I decided to devise my own mini-benchmark so that no model could have possibly seen it in its training data or be specifically optimized for it in any way.</p>
<h2 id="my-mini-benchmark"><a href="#my-mini-benchmark" class="header-anchor"></a>My mini-benchmark<br>
<a class="anchor-link" id="my-mini-benchmark"></a></h2>
<p>I crafted 6 questions based on my own experience using various LLMs for several years and having developed some intuition about what kinds of questions LLMs typically struggle with.</p>
<p>I conducted the benchmark using the <a class="link" href="https://openrouter.ai/chat?models=anthropic%2Fclaude-opus-4.6%2Copenai%2Fgpt-5.2%2Cx-ai%2Fgrok-4.1-fast%2Cgoogle%2Fgemini-3.1-pro-preview%2Cz-ai%2Fglm-5%2Cminimax%2Fminimax-m2.5%2Cqwen%2Fqwen3.5-plus-02-15%2Cmoonshotai%2Fkimi-k2.5" target="_blank" rel="noopener">OpenRouter.ai chat playroom</a> with the following state-of-the-art models:</p>
<ul>
<li><a class="link" href="https://openrouter.ai/anthropic/claude-opus-4.6" target="_blank" rel="noopener">Claude Opus 4.6 (Anthropic)</a></li>
<li><a class="link" href="https://openrouter.ai/openai/gpt-5.2" target="_blank" rel="noopener">GPT-5.2 (OpenAI)</a></li>
<li><a class="link" href="https://openrouter.ai/x-ai/grok-4.1-fast" target="_blank" rel="noopener">Grok 4.1 (xAI)</a></li>
<li><a class="link" href="https://openrouter.ai/google/gemini-3.1-pro-preview" target="_blank" rel="noopener">Gemini 3.1 Pro Preview (Google)</a></li>
<li><a class="link" href="https://openrouter.ai/z-ai/glm-5" target="_blank" rel="noopener">GLM 5 (Z.ai)</a></li>
<li><a class="link" href="https://openrouter.ai/minimax/minimax-m2.5" target="_blank" rel="noopener">MinMax M2.5 (MinMax)</a></li>
<li><a class="link" href="https://openrouter.ai/qwen/qwen3.5-plus-02-15" target="_blank" rel="noopener">Qwen3.5 Plus 2026-02-15 (Alibaba)</a></li>
<li><a class="link" href="https://openrouter.ai/moonshotai/kimi-k2.5" target="_blank" rel="noopener">Kimi K2.5 (Moonshot.ai)</a></li>
</ul>
<p>OpenRouter.ai is great as it makes it very easy to get responses from multiple models in parallel to a single question. It also allows turning off web search to force the models to answer purely based on their embedded knowledge.</p>
<p><img decoding="async" src="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/flagship-ai-mini-benchmark.gif" width="800" height="679" loading="lazy" alt="OpenRouter.ai Chat playroom" class="gallery-image" data-flex-grow="117" data-flex-basis="282px">
</p>
<p>Common for all the test questions is that they are fairly straightforward and have a clear answer, yet the answer isn&rsquo;t common knowledge or statistically the most obvious one, and instead requires a bit of reasoning to get correct.</p>
<p>Some of these questions are also based on my own experience of witnessing a flagship model fail miserably to answer them.</p>
<h3 id="1-which-cities-have-hosted-the-olympics-more-than-just-once"><a href="#1-which-cities-have-hosted-the-olympics-more-than-just-once" class="header-anchor"></a>1. Which cities have hosted the Olympics more than just once?<br>
<a class="anchor-link" id="1-which-cities-have-hosted-the-olympics-more-than-just-once"></a></h3>
<p>This question requires accounting for both summer and winter Olympics, and for Olympics hosted across multiple cities.</p>
<p>The variance in responses comes from whether the model understands that Beijing should be counted, as it has hosted both summer and winter Olympics. Interestingly, GPT was the only model to not mention Beijing at all. Some variance also comes from how models account for co-hosted Olympics. For example, Cortina should be counted as having hosted the Olympics twice, in 1956 and 2026, but only Claude, Gemini and Kimi pointed this out. Stockholm&rsquo;s 1956 hosting of the equestrian games during the Melbourne Olympics is a special case, which GPT, Gemini and Kimi pointed out in a side note. Some models seem to have old training material, and for example Grok assumes the current year is 2024. All models that accounted for awarded future Olympics (e.g. Los Angeles 2028) marked them clearly as upcoming.</p>
<p>Overall I would judge that only GPT and MinMax gave incomplete answers, while all other models replied as well as the best humans reasonably could have.</p>
<h3 id="2-if-eurusd-continues-to-slide-to-15-by-mid-2026-what-is-the-likely-effect-on-bmws-stock-price-by-end-of-2026"><a href="#2-if-eurusd-continues-to-slide-to-15-by-mid-2026-what-is-the-likely-effect-on-bmws-stock-price-by-end-of-2026" class="header-anchor"></a>2. If EUR/USD continues to slide to 1.5 by mid-2026, what is the likely effect on BMW&rsquo;s stock price by end of 2026?<br>
<a class="anchor-link" id="2-if-eur-usd-continues-to-slide-to-1-5-by-mid-2026-what-is-the-likely-effect-on-bmws-stock-price-by-end-of-2026"></a></h3>
<p>This question requires mapping the currency exchange rate to its historic values, dodging the misleading word &ldquo;slide&rdquo;, and reasoning about where the revenue of a company comes from and how a weaker US dollar affects it in multiple ways. I&rsquo;ve frequently witnessed flagship models get wrong how interest rates and exchange rates work. Apparently the binary choice between &ldquo;up&rdquo; and &ldquo;down&rdquo; is somehow challenging to the internal statistical model of an LLM on a topic where there is a lot of training material arguing for both outcomes, and choosing between them requires specifically reasoning about the scenario at hand while disregarding general knowledge of the situation.</p>
<p>However, this time all the models concluded correctly that a weak dollar would have a negative overall effect on the BMW stock price. Gemini, GLM, Qwen and Kimi also mention the potential hedging effect of BMW&rsquo;s X-series production in South Carolina for worldwide export.</p>
<h3 id="3-what-is-the-unicode-code-point-for-the-traffic-cone-emoji"><a href="#3-what-is-the-unicode-code-point-for-the-traffic-cone-emoji" class="header-anchor"></a>3. What is the Unicode code point for the traffic cone emoji?<br>
<a class="anchor-link" id="3-what-is-the-unicode-code-point-for-the-traffic-cone-emoji"></a></h3>
<p>This was the first question where the flagship models clearly still struggle in 2026. The trap here is that there is no traffic cone emoji, so an advanced model should simply refuse to give any Unicode numbers at all. Most LLMs, however, have an urge to give some answer, leading to hallucinations. Also, as the answer has a graphical element to it, the LLM might not understand how the emoji &ldquo;looks&rdquo; in ways that would be obvious to a human, and thus many models claim the construction sign emoji is a traffic cone, which it is not.</p>
<p>By far the worst response was from GPT, which simply hallucinates an answer and stops there:</p>
<p><img decoding="async" src="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/gpt-5.2-traffic-cone-emoji.png" width="899" height="117" loading="lazy" alt="OpenAIs GPT-5.2 completely wrong answer to traffic cone emoji question" class="gallery-image" data-flex-grow="768" data-flex-basis="1844px">
</p>
<p>While Gemini and Grok were among the three models that did not fall into this trap, the response from Claude was exemplary:</p>
<p><img decoding="async" src="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/claude-opus-4.6-traffic-cone-emoji.png" width="899" height="387" loading="lazy" alt="Claude Opus 4.6 exemplary good answer to traffic cone emoji question" class="gallery-image" data-flex-grow="232" data-flex-basis="557px">
</p>
<h3 id="4-which-languages-are-the-10-most-commonly-spoken-in-the-world-for-each-language-count-from-1-to-5"><a href="#4-which-languages-are-the-10-most-commonly-spoken-in-the-world-for-each-language-count-from-1-to-5" class="header-anchor"></a>4. Which languages are the 10 most commonly spoken in the world? For each language, count from 1 to 5.<br>
<a class="anchor-link" id="4-which-languages-are-the-10-most-commonly-spoken-in-the-world-for-each-language-count-from-1-to-5"></a></h3>
<p>This seemingly simple question can become quite complex when you account for the fact that other languages might also have their own character system, both for words and for numerals (e.g. Arabic &#1608;&#1575;&#1581;&#1583;/one vs &#1633;/1), that numbers might have variations depending on what is being counted (e.g. in French the gender of the object: un/une), or that plain numbers and counting occurrences have their own words (e.g. Russian &#1086;&#1076;&#1080;&#1085;/odin vs &#1088;&#1072;&#1079;/raz). I&rsquo;ve also witnessed older models frequently mix up which parts of the reply should be in which language.</p>
<p>For a human this question is trivial, assuming they know the language. For LLMs it is surprisingly challenging. Grok and MinMax only gave a list of the most popular languages and stopped there without showing any words in the languages. Qwen got a bit further by trying to list numbers in each language, but the lists were only the numbers 1 to 5:</p>
<p><img decoding="async" src="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/qwen3.5-plus-count-to-five.png" width="899" height="562" loading="lazy" alt="Qwen3.5 Plus count to five in top 10 languages" class="gallery-image" data-flex-grow="159" data-flex-basis="383px">
</p>
<p>I have seen Grok do something similar before, which in fact inspired me to include this question in my mini-benchmark.</p>
<p>GPT got a bit further, but for Hindi, Arabic and Bengali it listed the numerals in local script, not the number words. Gemini, GLM and Kimi gave a complete and correct answer as a list, while the absolute best answer and presentation came from Claude, which gave the table below:</p>
<blockquote>
<p><strong>Claude Opus 4.6</strong></p>
<p>10 Most Commonly Spoken Languages (by Total Speakers)</p>
<table>
<thead>
<tr>
<th>#</th>
<th>Language</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><strong>English</strong></td>
<td>One</td>
<td>Two</td>
<td>Three</td>
<td>Four</td>
<td>Five</td>
</tr>
<tr>
<td>2</td>
<td><strong>Mandarin Chinese</strong></td>
<td>&#19968; (y&#299;)</td>
<td>&#20108; (&egrave;r)</td>
<td>&#19977; (s&#257;n)</td>
<td>&#22235; (s&igrave;)</td>
<td>&#20116; (w&#468;)</td>
</tr>
<tr>
<td>3</td>
<td><strong>Hindi</strong></td>
<td>&#2319;&#2325; (ek)</td>
<td>&#2342;&#2379; (do)</td>
<td>&#2340;&#2368;&#2344; (t&#299;n)</td>
<td>&#2330;&#2366;&#2352; (ch&#257;r)</td>
<td>&#2346;&#2366;&#2305;&#2330; (p&#257;&#771;ch)</td>
</tr>
<tr>
<td>4</td>
<td><strong>Spanish</strong></td>
<td>Uno</td>
<td>Dos</td>
<td>Tres</td>
<td>Cuatro</td>
<td>Cinco</td>
</tr>
<tr>
<td>5</td>
<td><strong>French</strong></td>
<td>Un</td>
<td>Deux</td>
<td>Trois</td>
<td>Quatre</td>
<td>Cinq</td>
</tr>
<tr>
<td>6</td>
<td><strong>Arabic</strong></td>
<td>&#1608;&#1575;&#1581;&#1583; (w&#257;&#7717;id)</td>
<td>&#1575;&#1579;&#1606;&#1575;&#1606; (ithn&#257;n)</td>
<td>&#1579;&#1604;&#1575;&#1579;&#1577; (thal&#257;tha)</td>
<td>&#1571;&#1585;&#1576;&#1593;&#1577; (arba&#703;a)</td>
<td>&#1582;&#1605;&#1587;&#1577; (khamsa)</td>
</tr>
<tr>
<td>7</td>
<td><strong>Bengali</strong></td>
<td>&#2447;&#2453; (ek)</td>
<td>&#2470;&#2497;&#2439; (dui)</td>
<td>&#2468;&#2495;&#2472; (tin)</td>
<td>&#2458;&#2494;&#2480; (ch&#257;r)</td>
<td>&#2474;&#2494;&#2433;&#2458; (p&#257;&#771;ch)</td>
</tr>
<tr>
<td>8</td>
<td><strong>Portuguese</strong></td>
<td>Um</td>
<td>Dois</td>
<td>Tr&ecirc;s</td>
<td>Quatro</td>
<td>Cinco</td>
</tr>
<tr>
<td>9</td>
<td><strong>Russian</strong></td>
<td>&#1054;&#1076;&#1080;&#1085; (odin)</td>
<td>&#1044;&#1074;&#1072; (dva)</td>
<td>&#1058;&#1088;&#1080; (tri)</td>
<td>&#1063;&#1077;&#1090;&#1099;&#1088;&#1077; (chetyre)</td>
<td>&#1055;&#1103;&#1090;&#1100; (pyat&rsquo;)</td>
</tr>
<tr>
<td>10</td>
<td><strong>Japanese</strong></td>
<td>&#19968; (ichi)</td>
<td>&#20108; (ni)</td>
<td>&#19977; (san)</td>
<td>&#22235; (shi)</td>
<td>&#20116; (go)</td>
</tr>
</tbody>
</table>
<p><em><strong>Note:</strong> Rankings are based on estimated <em>total</em> speakers (native + non-native). Rankings can vary depending on the source and methodology used.</em></p>
</blockquote>
<h3 id="5-count-the-number-of-drone-launch-pads-in-the-picture"><a href="#5-count-the-number-of-drone-launch-pads-in-the-picture" class="header-anchor"></a>5. Count the number of drone launch pads in the picture.<br>
<a class="anchor-link" id="5-count-the-number-of-drone-launch-pads-in-the-picture"></a></h3>
<p>Together with the question, I uploaded this picture:</p>
<p><img decoding="async" src="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/drone-pad-counting-task.jpg" width="2196" height="1584" loading="lazy" alt="Drone pad picture attached to question above" class="gallery-image" data-flex-grow="138" data-flex-basis="332px">
</p>
<p>A human can easily count that there are 10 rows and 30+ columns in the grid, but because the picture resolution isn&rsquo;t good enough, the exact number of columns can&rsquo;t be counted, and the answer should be that there are at least 300 launch pads in the picture.</p>
<p>GPT and Grok both guessed that the count is zero. At least they said zero instead of hallucinating some number, but it would have been better to not give any number at all and just state that they are unable to perform the task. Gemini answered &ldquo;101&rdquo;, which is quite odd, but reading the reasoning section, it seems to have tried counting items in the image without reasoning much about what it was actually counting, or noticing that there is clearly a grid that would make the counting much easier. Both Qwen and Kimi state they can see four parallel structures, but are unable to count drone launch pads.</p>
<p>The best answer by far was given by Claude, which counted 10-12 rows and 30-40+ columns, and concluded that there must be 300-500 drone launch pads. Very close to the best human level &ndash; impressive!</p>
<p>This question applied only to multi-modal models that can see images, so GLM and MinMax could not give any response.</p>
<h3 id="6-explain-why-i-am-getting-the-error-below-and-what-is-the-best-way-to-fix-it"><a href="#6-explain-why-i-am-getting-the-error-below-and-what-is-the-best-way-to-fix-it" class="header-anchor"></a>6. Explain why I am getting the error below, and what is the best way to fix it?<br>
<a class="anchor-link" id="6-explain-why-i-am-getting-the-error-below-and-what-is-the-best-way-to-fix-it"></a></h3>
<p>Together with the question above, I gave this code block:</p>
<div class="codeblock ">
<pre><code>$ SH_SCRIPTS="$(mktemp; grep -Irnw debian/ -e '^#!.*/sh' | sort -u | cut -d ':' -f 1 || true)"
$ shellcheck -x --enable=all --shell=sh "$SH_SCRIPTS"
/tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: /tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: openBinaryFile: does not exist (No such file or directory)</code></pre>
</div>
<p>Older models would easily be misled by the last error message, thinking that a file went missing, and focus on suggesting changes to the complex-looking first line. In reality the error is simply caused by the quotes around <code>$SH_SCRIPTS</code>, resulting in the entire multi-line string being passed as a single argument to <code>shellcheck</code>. So instead of receiving two separate file paths, <code>shellcheck</code> tries to open one file literally named <code>/tmp/tmp.xQOpI5Nljx\ndebian/tests/integration-tests</code>.</p>
<p>Incorrect argument expansion is fairly easy for an experienced human programmer to notice, but tricky for an LLM. Indeed, Grok, MinMax, and Qwen fell for this trap and focused on the <code>mktemp</code>, assuming it somehow fails to create a file. Interestingly, GLM failed to produce an answer at all, as its reasoning step seemed to loop, thinking too much about the missing file but not understanding why it would be missing when there is nothing wrong with how <code>mktemp</code> is executed.</p>
<p>Claude, Gemini, and Kimi immediately spotted the real root cause, the quoted variable expansion, and suggested correct fixes that either remove the quotes or use Bash arrays or <code>xargs</code> so that the command also correctly handles filenames with spaces in them.</p>
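<p>To make the fix concrete, here is a minimal sketch of the kind of corrections the better models proposed. It is my own illustration, not a command quoted from any model&rsquo;s response, and it assumes the same <code>debian/</code> tree as in the original pipeline:</p>
<pre><code># Option 1: drop the quotes so the shell word-splits $SH_SCRIPTS into separate
# arguments (simple, but breaks if a path contains spaces)
shellcheck -x --enable=all --shell=sh $SH_SCRIPTS

# Option 2: pipe the file list into xargs, one path per line, so paths with
# spaces are also handled correctly (-r and -d '\n' assume GNU xargs)
grep -Irnw debian/ -e '^#!.*/sh' | cut -d ':' -f 1 | sort -u \
  | xargs -r -d '\n' shellcheck -x --enable=all --shell=sh
</code></pre>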
<h2 id="conclusion"><a href="#conclusion" class="header-anchor"></a>Conclusion<br>
<a class="anchor-link" id="conclusion"></a></h2>
<table>
<thead>
<tr>
<th>Model</th>
<th>Sports</th>
<th>Economics</th>
<th>Emoji</th>
<th>Languages</th>
<th>Visual</th>
<th>Shell</th>
<th>Score</th>
</tr>
</thead>
<tbody>
<tr>
<td>Claude Opus 4.6</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>6/6</td>
</tr>
<tr>
<td>GPT-5.2</td>
<td>&#10007;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>~</td>
<td>&#10007;</td>
<td>&#10003;</td>
<td>2.5/6</td>
</tr>
<tr>
<td>Grok 4.1</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>&#10007;</td>
<td>&#10007;</td>
<td>3/6</td>
</tr>
<tr>
<td>Gemini 3.1 Pro</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>&#10003;</td>
<td>5/6</td>
</tr>
<tr>
<td>GLM 5</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>?</td>
<td>&#10003;</td>
<td>N/A</td>
<td>&#10007;</td>
<td>3/5</td>
</tr>
<tr>
<td>MinMax M2.5</td>
<td>&#10007;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>&#10007;</td>
<td>N/A</td>
<td>&#10007;</td>
<td>1/5</td>
</tr>
<tr>
<td>Qwen3.5 Plus</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>~</td>
<td>&#10007;</td>
<td>&#10007;</td>
<td>2.5/6</td>
</tr>
<tr>
<td>Kimi K2.5</td>
<td>&#10003;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>&#10003;</td>
<td>&#10007;</td>
<td>&#10003;</td>
<td>4/6</td>
</tr>
</tbody>
</table>
<p>My mini-benchmark only had 6 questions, and I ran it only once, so it was obviously not scientifically rigorous. However, it was systematic enough to trump a <em>mere feeling</em>.</p>
<p>The main finding for me personally is that Claude Opus 4.6, the flagship model by <a class="link" href="https://www.anthropic.com/" target="_blank" rel="noopener">Anthropic</a>, seems to give great answers consistently. The answers are not only correct, but also well scoped, giving enough information to cover everything that seems relevant without padding the response with unnecessary filler.</p>
<p>I used Claude extensively in 2023-2024 when it was the main model available at my day work, but for the past year I had been using other models that I felt were better at the time. Now Claude seems to be the best-of-the-best again, with Gemini and Kimi as close runners-up. <a class="link" href="https://openrouter.ai/compare/anthropic/claude-opus-4.6/google/gemini-3.1-pro-preview/moonshotai/kimi-k2.5" target="_blank" rel="noopener">Comparing their pricing at OpenRouter.ai</a>, the Kimi K2.5 price of $0.6 / million tokens is almost 90% cheaper than Claude Opus 4.6&rsquo;s $5.0 / million tokens, which suggests that Kimi K2.5 offers the best <strong>price-per-performance ratio</strong>. Claude might be cheaper with a monthly subscription directly from Anthropic, potentially narrowing the price gap.</p>
<p>Overall I do feel that Anthropic, Google and Moonshot.ai have been pushing the envelope with their latest models in a way that <strong>one can&rsquo;t really claim that AI models have plateaued</strong>. In fact, one could claim that at least Claude has now climbed over the hill of <a class="link" href="https://en.wikipedia.org/wiki/AI_slop" target="_blank" rel="noopener">&ldquo;AI slop&rdquo;</a> and consistently produces valuable results. If and when AI usage expands from here, <strong>we might actually not drown in AI slop</strong> as chances of accidentally crappy results decrease. This makes me positive about the future.</p>
<p>I am also really happy to see that there wasn&rsquo;t just one model crushing everybody else, but that there are <strong>at least three models doing very well</strong>. As an open source enthusiast I am particularly glad to see that <a class="link" href="https://www.moonshot.ai/" target="_blank" rel="noopener">Moonshot.ai&rsquo;s</a> Kimi K2.5 is published with an open license. Given the hardware, anyone can run it on their own. OpenRouter.ai currently lists <a class="link" href="https://openrouter.ai/moonshotai/kimi-k2.5/providers" target="_blank" rel="noopener">9 independent providers</a> alongside Moonshot.ai itself, showcasing the potential of open-weight models in practice.</p>
<p>If the pattern holds and flagship models continue improving at this pace, we might look back at 2026 as the year AI stopped feeling like a call center associate and started to resemble a scientific researcher. As new models become available, we need to keep testing, keep questioning, and keep our expectations grounded in actual performance rather than press releases.</p>
<p>Thanks to OpenRouter.ai for providing a great service that makes testing various models incredibly easy!</p>

<p>The post <a rel="nofollow" href="https://optimizedbyotto.com/post/ai-models-plateaued-or-not/">Do AI models still keep getting better, or have they plateaued?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MySQL 8.0 JSON Functions: Practical Examples and Indexing</title>
      <link>https://anothermysqldba.blogspot.com/2026/02/mysql-80-json-functions-practical.html</link>
      <pubDate>Sat, 21 Feb 2026 22:39:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://anothermysqldba.blogspot.com/">Another MySQL DBA</source>
      <description><![CDATA[<p>This post covers a hands-on walkthrough of MySQL 8.0\'s JSON functions. JSON support has been in MySQL since 5.7, but 8.0 added a meaningful set of improvements — better indexing strategies, new functions, and multi-valued indexes — that make working with JSON data considerably more practical. The following documents several of the most commonly needed patterns, including EXPLAIN output and performance observations worth knowing about.</p>
<p>This isn\'t a \"JSON vs. relational\" debate post. If you\'re storing JSON in MySQL, you probably already have your reasons. The goal here is to make sure you\'re using the available tooling effectively.</p>
<p>Environment</p>
<p>mysql > SELECT @@version, @@version_commentG<br />
*************************** 1. row ***************************<br />
    @@version: 8.0.36<br />
@@version_comment: MySQL Community Server - GPL</p>
<p>Testing was done on a VM with 8GB RAM and innodb_buffer_pool_size set to 4G. One housekeeping note worth mentioning: query_cache_type is irrelevant in 8.0 since the query cache was removed entirely. If you migrated a 5.7 instance and still have that variable in your my.cnf, remove it — MySQL 8.0 will throw a startup error.</p>
<p>Setting Up a Test Table</p>
<p>The test table simulates a fairly common pattern — an application storing user profile data and event metadata as JSON blobs:</p>
<p>CREATE TABLE user_events (<br />
 id     INT UNSIGNED NOT NULL AUTO_INCREMENT,<br />
 user_id   INT UNSIGNED NOT NULL,<br />
 event_data JSON NOT NULL,<br />
 created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,<br />
 PRIMARY KEY (id),<br />
 INDEX idx_user (user_id)<br />
) ENGINE=InnoDB;</p>
<p>INSERT INTO user_events (user_id, event_data) VALUES<br />
(1, \'{\"action\":\"login\",\"ip\":\"192.168.1.10\",\"tags\":[\"mobile\",\"vpn\"],\"score\":88}\'),<br />
(1, \'{\"action\":\"purchase\",\"ip\":\"192.168.1.10\",\"tags\":[\"desktop\"],\"score\":72,\"amount\":49.99}\'),<br />
(2, \'{\"action\":\"login\",\"ip\":\"10.0.0.5\",\"tags\":[\"mobile\"],\"score\":91}\'),<br />
(3, \'{\"action\":\"logout\",\"ip\":\"10.0.0.9\",\"tags\":[\"desktop\",\"vpn\"],\"score\":65}\'),<br />
(2, \'{\"action\":\"purchase\",\"ip\":\"10.0.0.5\",\"tags\":[\"mobile\"],\"score\":84,\"amount\":129.00}\');</p>
<p>Basic Extraction: JSON_VALUE vs. JSON_EXTRACT</p>
<p>JSON_VALUE() was introduced in MySQL 8.0.21 and is the cleaner way to extract scalar values with built-in type casting. Before that, you were using JSON_EXTRACT() (or the - > shorthand) and casting manually, which works but adds noise to your queries.</p>
<p>-- Pre-8.0.21 approach<br />
SELECT user_id,<br />
    JSON_EXTRACT(event_data, \'$.action\') AS action,<br />
    CAST(JSON_EXTRACT(event_data, \'$.score\') AS UNSIGNED) AS score<br />
FROM user_events;</p>
<p>-- Cleaner 8.0.21+ approach<br />
SELECT user_id,<br />
    JSON_VALUE(event_data, \'$.action\') AS action,<br />
    JSON_VALUE(event_data, \'$.score\' RETURNING UNSIGNED) AS score<br />
FROM user_events;</p>
<p>Output from the second query:</p>
<p>+---------+----------+-------+<br />
&#124; user_id &#124; action  &#124; score &#124;<br />
+---------+----------+-------+<br />
&#124;    1 &#124; login  &#124;  88 &#124;<br />
&#124;    1 &#124; purchase &#124;  72 &#124;<br />
&#124;    2 &#124; login  &#124;  91 &#124;<br />
&#124;    3 &#124; logout  &#124;  65 &#124;<br />
&#124;    2 &#124; purchase &#124;  84 &#124;<br />
+---------+----------+-------+<br />
5 rows in set (0.00 sec)</p>
<p>The RETURNING clause is genuinely useful. It eliminates the awkward double-cast pattern and makes intent clearer when reading query code later.</p>
<p>Multi-Valued Indexes: The Real Game Changer</p>
<p>This is where 8.0 actually moved the needle for JSON workloads. Multi-valued indexes, available since MySQL 8.0.17, let you index array elements inside a JSON column directly. Here\'s what that looks like in practice:</p>
<p>ALTER TABLE user_events<br />
 ADD INDEX idx_tags ((CAST(event_data- >\'$.tags\' AS CHAR(64) ARRAY)));</p>
<p>Here is what EXPLAIN shows before and after on a query filtering by tag value:</p>
<p>-- Without the multi-valued index:<br />
EXPLAIN SELECT * FROM user_events<br />
WHERE JSON_CONTAINS(event_data- >\'$.tags\', \'\"vpn\"\')G</p>
<p>*************************** 1. row ***************************<br />
      id: 1<br />
 select_type: SIMPLE<br />
    table: user_events<br />
  partitions: NULL<br />
     type: ALL<br />
possible_keys: NULL<br />
     key: NULL<br />
   key_len: NULL<br />
     ref: NULL<br />
     rows: 5<br />
   filtered: 100.00<br />
    Extra: Using where</p>
<p>-- After adding the multi-valued index:<br />
EXPLAIN SELECT * FROM user_events<br />
WHERE JSON_CONTAINS(event_data- >\'$.tags\', \'\"vpn\"\')G</p>
<p>*************************** 1. row ***************************<br />
      id: 1<br />
 select_type: SIMPLE<br />
    table: user_events<br />
  partitions: NULL<br />
     type: range<br />
possible_keys: idx_tags<br />
     key: idx_tags<br />
   key_len: 67<br />
     ref: NULL<br />
     rows: 2<br />
   filtered: 100.00<br />
    Extra: Using where</p>
<p>Full table scan down to a range scan. On 5 rows this is trivial, but on a table with millions of rows and frequent tag-based filtering, that difference is significant. The improvement scales directly with table size and query frequency.</p>
<p>One important gotcha: MEMBER OF() and JSON_OVERLAPS() also benefit from multi-valued indexes, but JSON_SEARCH() does not. This matters when choosing your query pattern at design time:</p>
<p>-- This WILL use the multi-valued index:<br />
SELECT * FROM user_events<br />
WHERE \'vpn\' MEMBER OF (event_data- >\'$.tags\');</p>
<p>-- This will NOT use it:<br />
SELECT * FROM user_events<br />
WHERE JSON_SEARCH(event_data- >\'$.tags\', \'one\', \'vpn\') IS NOT NULL;</p>
<p>Aggregating and Transforming JSON</p>
<p>A few aggregation functions worth knowing well:</p>
<p>-- Build a JSON array of actions per user<br />
SELECT user_id,<br />
    JSON_ARRAYAGG(JSON_VALUE(event_data, \'$.action\')) AS actions<br />
FROM user_events<br />
GROUP BY user_id;</p>
<p>+---------+----------------------+<br />
&#124; user_id &#124; actions       &#124;<br />
+---------+----------------------+<br />
&#124;    1 &#124; [\"login\",\"purchase\"] &#124;<br />
&#124;    2 &#124; [\"login\",\"purchase\"] &#124;<br />
&#124;    3 &#124; [\"logout\"]      &#124;<br />
+---------+----------------------+<br />
3 rows in set (0.01 sec)</p>
<p>-- Summarize into a JSON object keyed by action<br />
SELECT user_id,<br />
    JSON_OBJECTAGG(<br />
     JSON_VALUE(event_data, \'$.action\'),<br />
     JSON_VALUE(event_data, \'$.score\' RETURNING UNSIGNED)<br />
    ) AS score_by_action<br />
FROM user_events<br />
GROUP BY user_id;</p>
<p>+---------+--------------------------------+<br />
&#124; user_id &#124; score_by_action        &#124;<br />
+---------+--------------------------------+<br />
&#124;    1 &#124; {\"login\": 88, \"purchase\": 72} &#124;<br />
&#124;    2 &#124; {\"login\": 91, \"purchase\": 84} &#124;<br />
&#124;    3 &#124; {\"logout\": 65}         &#124;<br />
+---------+--------------------------------+<br />
3 rows in set (0.00 sec)</p>
<p>JSON_OBJECTAGG() will throw an error if there are duplicate keys within a group. This is worth knowing before you encounter it in a production ETL pipeline. In that case, you\'ll need to deduplicate upstream or handle it in application logic before the data reaches this aggregation step.</p>
<p>Checking SHOW STATUS After JSON-Heavy Queries</p>
<p>When evaluating query patterns, checking handler metrics is a useful habit:</p>
<p>FLUSH STATUS;</p>
<p>SELECT * FROM user_events<br />
WHERE JSON_VALUE(event_data, \'$.score\' RETURNING UNSIGNED) > 80;</p>
<p>SHOW STATUS LIKE \'Handler_read%\';</p>
<p>+----------------------------+-------+<br />
&#124; Variable_name       &#124; Value &#124;<br />
+----------------------------+-------+<br />
&#124; Handler_read_first     &#124; 1   &#124;<br />
&#124; Handler_read_key      &#124; 0   &#124;<br />
&#124; Handler_read_last     &#124; 0   &#124;<br />
&#124; Handler_read_next     &#124; 4   &#124;<br />
&#124; Handler_read_prev     &#124; 0   &#124;<br />
&#124; Handler_read_rnd      &#124; 0   &#124;<br />
&#124; Handler_read_rnd_next   &#124; 6   &#124;<br />
+----------------------------+-------+<br />
7 rows in set (0.00 sec)</p>
<p>The Handler_read_rnd_next value confirms a full scan — no surprise since there\'s no functional index on the score value. For score-based filtering at scale, a generated column with an index is the right answer:</p>
<p>ALTER TABLE user_events<br />
 ADD COLUMN score_val TINYINT UNSIGNED<br />
  GENERATED ALWAYS AS (JSON_VALUE(event_data, \'$.score\' RETURNING UNSIGNED)) VIRTUAL,<br />
 ADD INDEX idx_score (score_val);</p>
<p>After adding that, the same query drops to a proper index range scan. Generated columns on JSON fields are available in both MySQL 8.0 and Percona Server 8.0, and they remain the most reliable path for scalar JSON field filtering at any meaningful scale.</p>
<p>If you\'re running Percona Server, pt-query-digest from the Percona Toolkit is still the most practical way to identify which JSON-heavy queries are actually causing pain in production before you start adding indexes speculatively.</p>
<p>Practical Observations</p>
<p> Multi-valued indexes (8.0.17+) are a long overdue improvement and work well when your query patterns align with JSON_CONTAINS() or MEMBER OF()<br />
 JSON_VALUE() with RETURNING (8.0.21+) is cleaner than the old cast-after-extract pattern and worth adopting consistently<br />
 Generated columns plus indexes remain the most reliable path for scalar JSON field filtering at scale<br />
 Watch for JSON_OBJECTAGG() duplicate key errors in grouped data — it surfaces as a hard error in ETL pipelines and can be easy to miss in testing if your sample data happens to be clean<br />
 Always verify index usage with EXPLAIN — the optimizer doesn\'t always pick up multi-valued indexes in complex WHERE clauses, and it\'s worth confirming rather than assuming</p>
<p>Summary</p>
<p>MySQL 8.0\'s JSON improvements are genuinely useful, particularly multi-valued indexes and JSON_VALUE() with type casting. They don\'t replace good schema design, but for cases where JSON storage is appropriate or inherited, you now have real tools to work with rather than just hoping the optimizer figures it out. The generated column pattern in particular is worth evaluating early if you know certain JSON fields will be used in WHERE clauses regularly.</p>
<p>Useful references:</p>
<p> MySQL 8.0 JSON Function Reference<br />
 Multi-Valued Indexes Documentation<br />
 JSON_VALUE() Function Reference<br />
 Percona Toolkit</p>
<p>The post <a rel="nofollow" href="https://anothermysqldba.blogspot.com/2026/02/mysql-80-json-functions-practical.html">MySQL 8.0 JSON Functions: Practical Examples and Indexing</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[
<p>This post covers a hands-on walkthrough of MySQL 8.0&rsquo;s JSON functions. JSON support has been in MySQL since 5.7, but 8.0 added a meaningful set of improvements &mdash; better indexing strategies, new functions, and multi-valued indexes &mdash; that make working with JSON data considerably more practical. The following documents several of the most commonly needed patterns, including EXPLAIN output and performance observations worth knowing about.</p>
<p>This isn&rsquo;t a &ldquo;JSON vs. relational&rdquo; debate post. If you&rsquo;re storing JSON in MySQL, you probably already have your reasons. The goal here is to make sure you&rsquo;re using the available tooling effectively.</p>
<h3>Environment<a class="anchor-link" id="environment"></a></h3>
<pre><code>mysql&gt; SELECT @@version, @@version_comment\G
*************************** 1. row ***************************
        @@version: 8.0.36
@@version_comment: MySQL Community Server - GPL
</code></pre>
<p>Testing was done on a VM with 8GB RAM and <strong>innodb_buffer_pool_size</strong> set to 4G. One housekeeping note worth mentioning: <strong>query_cache_type</strong> is irrelevant in 8.0 since the query cache was removed entirely. If you migrated a 5.7 instance and still have that variable in your my.cnf, remove it &mdash; MySQL 8.0 will throw a startup error.</p>
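<p>A quick way to check for this before restarting (a sketch; the config path is an assumption and may be /etc/my.cnf or a file under a conf.d directory on your system):</p>
<pre><code># Any query_cache_* setting found here must be removed before mysqld 8.0 will start
grep -n 'query_cache' /etc/mysql/my.cnf
</code></pre>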
<h3>Setting Up a Test Table<a class="anchor-link" id="setting-up-a-test-table"></a></h3>
<p>The test table simulates a fairly common pattern &mdash; an application storing user profile data and event metadata as JSON blobs:</p>
<pre><code>CREATE TABLE user_events (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id     INT UNSIGNED NOT NULL,
  event_data  JSON NOT NULL,
  created_at  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  INDEX idx_user (user_id)
) ENGINE=InnoDB;

INSERT INTO user_events (user_id, event_data) VALUES
(1, '{"action":"login","ip":"192.168.1.10","tags":["mobile","vpn"],"score":88}'),
(1, '{"action":"purchase","ip":"192.168.1.10","tags":["desktop"],"score":72,"amount":49.99}'),
(2, '{"action":"login","ip":"10.0.0.5","tags":["mobile"],"score":91}'),
(3, '{"action":"logout","ip":"10.0.0.9","tags":["desktop","vpn"],"score":65}'),
(2, '{"action":"purchase","ip":"10.0.0.5","tags":["mobile"],"score":84,"amount":129.00}');
</code></pre>
<h3>Basic Extraction: JSON_VALUE vs. JSON_EXTRACT<a class="anchor-link" id="basic-extraction-json_value-vs-json_extract"></a></h3>
<p><strong>JSON_VALUE()</strong> was introduced in MySQL 8.0.21 and is the cleaner way to extract scalar values with built-in type casting. Before that, you were using <strong>JSON_EXTRACT()</strong> (or the <strong>-&gt;</strong> shorthand) and casting manually, which works but adds noise to your queries.</p>
<pre><code>-- Pre-8.0.21 approach
SELECT user_id,
       JSON_EXTRACT(event_data, '$.action') AS action,
       CAST(JSON_EXTRACT(event_data, '$.score') AS UNSIGNED) AS score
FROM user_events;

-- Cleaner 8.0.21+ approach
SELECT user_id,
       JSON_VALUE(event_data, '$.action') AS action,
       JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED) AS score
FROM user_events;
</code></pre>
<p>Output from the second query:</p>
<pre><code>+---------+----------+-------+
| user_id | action   | score |
+---------+----------+-------+
|       1 | login    |    88 |
|       1 | purchase |    72 |
|       2 | login    |    91 |
|       3 | logout   |    65 |
|       2 | purchase |    84 |
+---------+----------+-------+
5 rows in set (0.00 sec)
</code></pre>
<p>The <strong>RETURNING</strong> clause is genuinely useful. It eliminates the awkward double-cast pattern and makes intent clearer when reading query code later.</p>
<h3>Multi-Valued Indexes: The Real Game Changer<a class="anchor-link" id="multi-valued-indexes-the-real-game-changer"></a></h3>
<p>This is where 8.0 actually moved the needle for JSON workloads. Multi-valued indexes, available since MySQL 8.0.17, let you index array elements inside a JSON column directly. Here&rsquo;s what that looks like in practice:</p>
<pre><code>ALTER TABLE user_events
  ADD INDEX idx_tags ((CAST(event_data-&gt;'$.tags' AS CHAR(64) ARRAY)));
</code></pre>
<p>Here is what EXPLAIN shows before and after on a query filtering by tag value:</p>
<pre><code>-- Without the multi-valued index:
EXPLAIN SELECT * FROM user_events
WHERE JSON_CONTAINS(event_data-&gt;'$.tags', '"vpn"')\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user_events
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 100.00
        Extra: Using where

-- After adding the multi-valued index:
EXPLAIN SELECT * FROM user_events
WHERE JSON_CONTAINS(event_data-&gt;'$.tags', '"vpn"')\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user_events
   partitions: NULL
         type: range
possible_keys: idx_tags
          key: idx_tags
      key_len: 67
          ref: NULL
         rows: 2
     filtered: 100.00
        Extra: Using where
</code></pre>
<p>Full table scan down to a range scan. On 5 rows this is trivial, but on a table with millions of rows and frequent tag-based filtering, that difference is significant. The improvement scales directly with table size and query frequency.</p>
<p>One important gotcha: <strong>MEMBER OF()</strong> and <strong>JSON_OVERLAPS()</strong> also benefit from multi-valued indexes, but <strong>JSON_SEARCH()</strong> does not. This matters when choosing your query pattern at design time:</p>
<pre><code>-- This WILL use the multi-valued index:
SELECT * FROM user_events
WHERE 'vpn' MEMBER OF (event_data-&gt;'$.tags');

-- This will NOT use it:
SELECT * FROM user_events
WHERE JSON_SEARCH(event_data-&gt;'$.tags', 'one', 'vpn') IS NOT NULL;
</code></pre>
<h3>Aggregating and Transforming JSON<a class="anchor-link" id="aggregating-and-transforming-json"></a></h3>
<p>A few aggregation functions worth knowing well:</p>
<pre><code>-- Build a JSON array of actions per user
SELECT user_id,
       JSON_ARRAYAGG(JSON_VALUE(event_data, '$.action')) AS actions
FROM user_events
GROUP BY user_id;

+---------+----------------------+
| user_id | actions              |
+---------+----------------------+
|       1 | ["login","purchase"] |
|       2 | ["login","purchase"] |
|       3 | ["logout"]           |
+---------+----------------------+
3 rows in set (0.01 sec)

-- Summarize into a JSON object keyed by action
SELECT user_id,
       JSON_OBJECTAGG(
         JSON_VALUE(event_data, '$.action'),
         JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED)
       ) AS score_by_action
FROM user_events
GROUP BY user_id;

+---------+--------------------------------+
| user_id | score_by_action                |
+---------+--------------------------------+
|       1 | {"login": 88, "purchase": 72}  |
|       2 | {"login": 91, "purchase": 84}  |
|       3 | {"logout": 65}                 |
+---------+--------------------------------+
3 rows in set (0.00 sec)
</code></pre>
<p><strong>JSON_OBJECTAGG()</strong> will throw an error if there are duplicate keys within a group. This is worth knowing before you encounter it in a production ETL pipeline. In that case, you&rsquo;ll need to deduplicate upstream or handle it in application logic before the data reaches this aggregation step.</p>
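<p>One way to handle it in SQL rather than in application code is to collapse to one row per user and action before aggregating. This is a minimal sketch, not from the original workload, and it assumes keeping the highest score per action is acceptable:</p>
<pre><code>-- Deduplicate upstream so JSON_OBJECTAGG() never sees the same key twice per group
SELECT user_id,
       JSON_OBJECTAGG(action, max_score) AS score_by_action
FROM (
  SELECT user_id,
         JSON_VALUE(event_data, '$.action') AS action,
         MAX(JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED)) AS max_score
  FROM user_events
  GROUP BY user_id, action
) AS per_action
GROUP BY user_id;
</code></pre>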
<h3>Checking SHOW STATUS After JSON-Heavy Queries<a class="anchor-link" id="checking-show-status-after-json-heavy-queries"></a></h3>
<p>When evaluating query patterns, checking handler metrics is a useful habit:</p>
<pre><code>FLUSH STATUS;

SELECT * FROM user_events
WHERE JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED) &gt; 80;

SHOW STATUS LIKE 'Handler_read%';

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_read_first         | 1     |
| Handler_read_key           | 0     |
| Handler_read_last          | 0     |
| Handler_read_next          | 4     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 6     |
+----------------------------+-------+
7 rows in set (0.00 sec)
</code></pre>
<p>The <strong>Handler_read_rnd_next</strong> value confirms a full scan &mdash; no surprise since there&rsquo;s no functional index on the score value. For score-based filtering at scale, a generated column with an index is the right answer:</p>
<pre><code>ALTER TABLE user_events
  ADD COLUMN score_val TINYINT UNSIGNED
    GENERATED ALWAYS AS (JSON_VALUE(event_data, '$.score' RETURNING UNSIGNED)) VIRTUAL,
  ADD INDEX idx_score (score_val);
</code></pre>
<p>After adding that, the same query drops to a proper index range scan. Generated columns on JSON fields are available in both MySQL 8.0 and Percona Server 8.0, and they remain the most reliable path for scalar JSON field filtering at any meaningful scale.</p>
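<p>A quick sanity check after the change, as a sketch: filtering on the generated column directly makes the index usage easy to confirm, though the exact EXPLAIN output will vary by version.</p>
<pre><code>-- Expect type: range and key: idx_score instead of a full table scan
EXPLAIN SELECT * FROM user_events
WHERE score_val &gt; 80\G
</code></pre>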
<p>If you&rsquo;re running Percona Server, <strong>pt-query-digest</strong> from the <a href="https://www.percona.com/software/database-tools/percona-toolkit">Percona Toolkit</a> is still the most practical way to identify which JSON-heavy queries are actually causing pain in production before you start adding indexes speculatively.</p>
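<p>For reference, a typical invocation looks like the sketch below; the slow log path is an assumption, so use whatever <code>slow_query_log_file</code> points to on your server:</p>
<pre><code># Summarize the slow query log to see which statements dominate total time
pt-query-digest /var/log/mysql/mysql-slow.log &gt; slow-digest.txt
</code></pre>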
<h3>Practical Observations<a class="anchor-link" id="practical-observations"></a></h3>
<ul>
<li>Multi-valued indexes (8.0.17+) are a long overdue improvement and work well when your query patterns align with <strong>JSON_CONTAINS()</strong> or <strong>MEMBER OF()</strong></li>
<li><strong>JSON_VALUE() with RETURNING</strong> (8.0.21+) is cleaner than the old cast-after-extract pattern and worth adopting consistently</li>
<li>Generated columns plus indexes remain the most reliable path for scalar JSON field filtering at scale</li>
<li>Watch for <strong>JSON_OBJECTAGG()</strong> duplicate key errors in grouped data &mdash; it surfaces as a hard error in ETL pipelines and can be easy to miss in testing if your sample data happens to be clean</li>
<li>Always verify index usage with EXPLAIN &mdash; the optimizer doesn&rsquo;t always pick up multi-valued indexes in complex WHERE clauses, and it&rsquo;s worth confirming rather than assuming</li>
</ul>
<h3>Summary<a class="anchor-link" id="summary"></a></h3>
<p>MySQL 8.0&rsquo;s JSON improvements are genuinely useful, particularly multi-valued indexes and <strong>JSON_VALUE()</strong> with type casting. They don&rsquo;t replace good schema design, but for cases where JSON storage is appropriate or inherited, you now have real tools to work with rather than just hoping the optimizer figures it out. The generated column pattern in particular is worth evaluating early if you know certain JSON fields will be used in WHERE clauses regularly.</p>
<p>Useful references:</p>
<ul>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/json-function-reference.html">MySQL 8.0 JSON Function Reference</a></li>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued">Multi-Valued Indexes Documentation</a></li>
<li><a href="https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-value">JSON_VALUE() Function Reference</a></li>
<li><a href="https://www.percona.com/software/database-tools/percona-toolkit">Percona Toolkit</a></li>
</ul>

<p>The post <a rel="nofollow" href="https://anothermysqldba.blogspot.com/2026/02/mysql-80-json-functions-practical.html">MySQL 8.0 JSON Functions: Practical Examples and Indexing</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Is MariaDB part of the MySQL ecosystem?</title>
      <link>https://mariadb.org/is-mariadb-part-of-the-mysql-ecosystem/</link>
      <pubDate>Fri, 20 Feb 2026 20:29:33 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Why MariaDB is both its own database — and the natural continuation of MySQL<br />
Because MariaDB is at the same time a completely independent database and a fundamentally compatible extension of MySQL, the MySQL user base is not dependent upon Oracle nor upon new forks of MySQL for their future. For pragmatic reasons – …<br />
Continue reading \"Is MariaDB part of the MySQL ecosystem?\"<br />
The post Is MariaDB part of the MySQL ecosystem? appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/is-mariadb-part-of-the-mysql-ecosystem/">Is MariaDB part of the MySQL ecosystem?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Why MariaDB is both its own database &mdash; and the natural continuation of MySQL<br>
Because MariaDB is at the same time a completely independent database and a fundamentally compatible extension of MySQL, the MySQL user base is not dependent upon Oracle nor upon new forks of MySQL for their future.&nbsp;For pragmatic reasons &ndash; &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/is-mariadb-part-of-the-mysql-ecosystem/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Is MariaDB part of the MySQL ecosystem?&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/is-mariadb-part-of-the-mysql-ecosystem/">Is MariaDB part of the MySQL ecosystem?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>How to Migrate Cloud PostgreSQL to On-Prem with Minimal Downtime</title>
      <link>https://severalnines.com/blog/how-to-migrate-cloud-postgresql-to-on-prem-with-minimal-downtime/</link>
      <pubDate>Fri, 20 Feb 2026 08:00:16 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>In the last decade, cloud adoption has accelerated as organizations looked to scale quickly, reduce upfront capital investment, and leverage managed services. However, in recent years, a growing number of companies are choosing to migrate certain workloads back from the cloud to on-premises environments.  The reasons vary; cost reduction, compliance with data sovereignty laws, improved […]<br />
The post How to Migrate Cloud PostgreSQL to On-Prem with Minimal Downtime appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/how-to-migrate-cloud-postgresql-to-on-prem-with-minimal-downtime/">How to Migrate Cloud PostgreSQL to On-Prem with Minimal Downtime</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In the last decade, cloud adoption has accelerated as organizations looked to scale quickly, reduce upfront capital investment, and leverage managed services. However, in recent years, a growing number of companies are choosing to migrate certain workloads back from the cloud to on-premises environments.&nbsp;</p>
<p>The reasons vary: cost reduction, compliance with data sovereignty laws, improved performance, and more. Once decided, one of the most critical migration challenges is minimizing downtime. Today&rsquo;s businesses cannot afford long service interruptions &mdash; customers expect 24/7 availability.</p>
<p>This blog will explore best practices for cloud-to-on-premises database migration, featuring a step-by-step example of migrating from AWS RDS <a href="https://severalnines.com/clustercontrol/databases/postgresql">PostgreSQL</a> to a self-hosted environment with ClusterControl.</p>
<h2 class="wp-block-heading" id="h-pre-migration-planning">Pre-Migration Planning<a class="anchor-link" id="pre-migration-planning"></a></h2>
<h3 class="wp-block-heading" id="h-1-workload-assessment">1. Workload Assessment<a class="anchor-link" id="1-workload-assessment"></a></h3>
<p>Before embarking on any database migration, a comprehensive, meticulous assessment of the existing database infrastructure is paramount. This initial phase is critical for identifying potential challenges, optimizing the migration strategy, and ensuring a smooth transition with minimal disruption. The assessment should begin with a thorough identification of the following key aspects:</p>
<ul class="wp-block-list">
<li><strong>Database Type:</strong> OLTP (transaction-heavy) databases such as MySQL or PostgreSQL require different approaches compared to OLAP data warehouses.</li>
<li><strong>Criticality of Data:</strong> Which databases are mission-critical, and which can tolerate longer downtime?</li>
<li><strong>Size of Data:</strong> Multi-terabyte datasets require more sophisticated strategies than smaller workloads.</li>
</ul>
<h3 class="wp-block-heading" id="h-2-sizing-amp-capacity-planning">2. Sizing &amp; Capacity Planning<a class="anchor-link" id="2-sizing-capacity-planning"></a></h3>
<p>For optimal performance and scalability, an on-premises infrastructure must be robust enough to manage existing workloads efficiently while simultaneously offering ample headroom to accommodate future growth. The key considerations:</p>
<ul class="wp-block-list">
<li><strong>CPU and Memory:</strong> Match or exceed cloud specs.</li>
<li><strong>Storage:</strong> Low-latency SSDs for OLTP workloads.</li>
<li><strong>Network Bandwidth:</strong> High throughput between cloud and on-prem ensures faster sync.</li>
</ul>
<h3 class="wp-block-heading" id="h-3-security-amp-compliance">3. Security &amp; Compliance<a class="anchor-link" id="3-security-compliance"></a></h3>
<p>Security and compliance are paramount considerations throughout the migration process, encompassing both the transfer of data and its subsequent state at rest. During data transfer, robust encryption protocols, such as TLS/SSL, are essential to prevent interception and unauthorized access. This ensures data integrity and confidentiality as it moves from the source to the destination environment. Furthermore, secure network configurations, including firewalls and intrusion detection systems, play a crucial role in safeguarding data in transit by mitigating potential cyber threats.</p>
<h3 class="wp-block-heading" id="h-4-network-connectivity">4. Network Connectivity<a class="anchor-link" id="4-network-connectivity"></a></h3>
<p>Data transfer is a critical component of any database migration, especially with large datasets. The initial snapshot of the database must be completed quickly to minimize downtime, and the ongoing replication during the migration process needs to be efficient to prevent any lag or data inconsistencies.</p>
<p>To facilitate large-scale data transfers between cloud environments and on-premises infrastructure, it&rsquo;s crucial to establish secure, high-bandwidth connections. One common solution is to utilize VPN tunnels, which provide a secure pathway over the public internet.&nbsp;</p>
<p>However, for large volume transfers or those requiring guaranteed performance, dedicated private lines are highly recommended. These include services like AWS Direct Connect, Azure ExpressRoute, and GCP Interconnect. These dedicated connections bypass the public internet, offering lower latency, higher throughput, and greater reliability, which are essential for maintaining data integrity and minimizing timelines.</p>
<h2 class="wp-block-heading" id="h-migration-strategies">Migration Strategies<a class="anchor-link" id="migration-strategies"></a></h2>
<p>Migrating a database from a cloud environment to an on-premises infrastructure presents a unique set of challenges and considerations, demanding a carefully tailored approach. There is no single, universally applicable migration strategy; the optimal path depends heavily on various factors, including the database type, its size and complexity, the specific cloud provider and its services used, the target environment&rsquo;s capabilities, downtime tolerance, data sensitivity, and regulatory compliance requirements.&nbsp;</p>
<p>Below are common approaches for online migration for minimum downtime:</p>
<h3 class="wp-block-heading" id="h-1-physical-replication">1. Physical Replication<a class="anchor-link" id="1-physical-replication"></a></h3>
<p>Physical replication typically utilizes WAL/binary logs to replicate data from a source to a target database. This method is well-suited for homogeneous migrations; e.g., MySQL &rarr; MySQL. Essentially, it establishes an on-premises replica node that is synchronized with the cloud environment.</p>
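<p>For PostgreSQL, the usual building block on the source side is a physical replication slot plus a base backup; a minimal sketch, assuming a self-managed primary where you hold replication privileges (managed services such as AWS RDS do not allow physical replication to external hosts, which is why the use case later in this post relies on logical replication instead):</p>
<pre class="wp-block-code"><code>-- On the source primary: reserve WAL for the on-prem standby so it can
-- catch up even if the network link is briefly interrupted.
SELECT pg_create_physical_replication_slot('onprem_standby');</code></pre>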
<h3 class="wp-block-heading" id="h-2-logical-replication-change-data-capture-cdc">2. Logical Replication / Change Data Capture (CDC)<a class="anchor-link" id="2-logical-replication-change-data-capture-cdc"></a></h3>
<p>Logical replication operates at a higher level of abstraction compared to physical replication. Instead of replicating block-level changes or entire data files, it focuses on replicating the actual data modifications, such as individual row insertions, updates, and deletions, as well as the transactional order in which these changes occur. This method provides greater flexibility, as it is independent of the underlying storage engine or physical data layout.</p>
<p>A key advantage of logical replication is its ability to replicate data between different versions of a database or even different database systems, provided they support a compatible logical replication protocol. This makes it an ideal solution for heterogeneous environments, upgrades, and migrations.&nbsp;</p>
<p>Furthermore, it allows for selective replication, where only specific tables or subsets of data are replicated, reducing network bandwidth usage and storage requirements on the replica. You can utilize tools such as AWS DMS, GCP Data Migration Services, Debezium + Kafka, or pg_logical in PostgreSQL.</p>
<h2 class="wp-block-heading" id="h-techniques-to-minimize-downtime">Techniques to Minimize Downtime<a class="anchor-link" id="techniques-to-minimize-downtime"></a></h2>
<h3 class="wp-block-heading" id="h-1-change-data-capture-cdc">1. Change Data Capture (CDC)<a class="anchor-link" id="1-change-data-capture-cdc"></a></h3>
<p>Change Data Capture (CDC) can play a crucial role in ensuring data consistency and minimal downtime during database migrations, particularly when moving from a cloud-based database to an on-premises solution. By continuously capturing and streaming write operations from the source cloud DB to the target on-prem DB in near real-time, CDC significantly reduces the data gap between the two systems. This real-time synchronization is essential for a smooth and efficient cutover process.</p>
<p>The cutover itself is designed to be as seamless as possible. Once the on-premises database has caught up with the cloud database, the critical step involves re-routing all query traffic. This means directing applications and services that previously accessed the cloud database to now point to the newly synchronized on-premises database. <strong>The downtime experienced during this switchover is limited exclusively to the brief period required for these database endpoint changes to propagate and take effect.</strong> This approach minimizes the impact on ongoing operations, ensuring that business continuity is maintained throughout the migration.</p>
<h3 class="wp-block-heading" id="h-2-read-replica-promotion">2. Read Replica Promotion<a class="anchor-link" id="2-read-replica-promotion"></a></h3>
<p>In a hybrid cloud environment, a read replica promotion strategy essentially creates a unified, single cluster. This setup involves an active, primary database residing in the cloud, with a replica maintained on-premises. The core principle is to ensure continuous replication between the cloud and on-premises nodes until a planned cutover event.&nbsp;</p>
<p>It involves promoting the on-premises replica to become the new primary database. Once this promotion is complete, all query traffic is re-routed from the previous cloud primary to the newly promoted on-premises primary. This allows for a seamless transition, ensuring data consistency and continuous service availability during the cutover.&nbsp;</p>
<p>The on-premises replica&rsquo;s ability to take over as the primary ensures business continuity, even in non-migration scenarios where the cloud primary might experience issues. This architecture provides flexibility, disaster recovery capabilities, and potentially reduced latency for on-premises applications accessing the local primary.</p>
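<p>For PostgreSQL specifically, the promotion step itself is a single call on the standby (or <code>pg_ctl promote</code> from the shell); a minimal sketch, assuming PostgreSQL 12 or later:</p>
<pre class="wp-block-code"><code>-- Run on the on-prem standby once replication lag has reached zero.
-- wait => true blocks until the promotion has completed.
SELECT pg_promote(wait => true);</code></pre>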
<h2 class="wp-block-heading" id="h-testing-amp-validation"><strong>Testing &amp; Validation</strong><a class="anchor-link" id="testing-validation"></a></h2>
<p>Testing and validation are critical parts of the migration process. We need to ensure that the data in the target matches the source exactly, which calls for a multi-faceted approach to verify data integrity, consistency, and completeness.</p>
<h3 class="wp-block-heading" id="h-key-aspects-of-testing-and-validation">Key Aspects of Testing and Validation:<a class="anchor-link" id="key-aspects-of-testing-and-validation"></a></h3>
<ul class="wp-block-list">
<li><strong>Data Integrity Verification:</strong> This involves checking for any corruption or alteration of data during transit. Techniques include:
<ul class="wp-block-list">
<li><strong>Checksums and Hashing:</strong> Comparing checksums or hash values of data blocks between source and target (see the sketch after this list).</li>
<li><strong>Record Counts:</strong> Verifying that the total number of records in each table matches between the source and target.</li>
<li><strong>Row-by-Row Comparison:</strong> For critical tables, a detailed comparison of individual rows and their values is essential to catch subtle discrepancies. This can be achieved through custom scripts or data comparison tools.</li>
</ul>
</li>
<li><strong>Data Consistency Checks:</strong> Ensuring that relationships between data elements are maintained and that data types are accurately mapped.
<ul class="wp-block-list">
<li><strong>Foreign Key Constraints:</strong> Validating that all foreign key relationships are correctly established and enforced in the target database.</li>
<li><strong>Data Type Mapping:</strong> Confirming that data types from the source are appropriately converted to the target database&rsquo;s data types, avoiding truncation or loss of precision.</li>
<li><strong>Uniqueness Constraints:</strong> Verifying that unique keys and primary keys are correctly enforced.</li>
</ul>
</li>
<li><strong>Data Completeness Validation:</strong> Guaranteeing that all expected data has been successfully migrated.
<ul class="wp-block-list">
<li><strong>Schema Comparison:</strong> Comparing the schema of the source and target databases to ensure all tables, columns, indexes, and views are present.</li>
<li><strong>Data Volume Comparison:</strong> Checking the total volume of data (e.g., in gigabytes) to ensure a similar size, accounting for potential differences in storage efficiency.</li>
<li><strong>Specific Data Set Spot Checks:</strong> Selecting a representative sample of data and performing manual checks to confirm its presence and accuracy in the target.</li>
</ul>
</li>
<li><strong>Application-Level Testing:</strong> Beyond just the database, it&rsquo;s vital to test how applications interact with the new migrated database.
<ul class="wp-block-list">
<li><strong>Functional Testing:</strong> Ensuring that all application functionalities work as expected with the new database.</li>
<li><strong>Performance Testing:</strong> Benchmarking the performance of the applications with the migrated database to ensure it meets or exceeds previous performance levels. This includes query response times, transaction throughput, and overall system responsiveness.</li>
</ul>
</li>
</ul>
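<p>For the count and checksum checks above, a minimal sketch in plain SQL is shown below; run the same statements on the cloud source and the on-prem target and compare the output (the table and column names <code>my_table</code>, <code>id</code> and <code>updated_at</code> are placeholders):</p>
<pre class="wp-block-code"><code>-- Exact row count for one table (repeat per table, or generate the list
-- from information_schema.tables):
SELECT count(*) FROM my_table;

-- Simple content checksum: hash a deterministic, ordered concatenation of
-- the columns you care about and compare the value between source and target.
SELECT md5(string_agg(id::text || '|' || updated_at::text, ',' ORDER BY id)) AS table_checksum
FROM my_table;</code></pre>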
<h2 class="wp-block-heading" id="h-cutover-planning">Cutover Planning<a class="anchor-link" id="cutover-planning"></a></h2>
<p>The database cutover from cloud to on-premises typically occurs during low-traffic periods, e.g. nights / weekends. A rollback plan should be prepared to revert to the cloud database if validation issues arise. Additionally, don&rsquo;t forget to inform users about the cutover beforehand.</p>
<h2 class="wp-block-heading" id="h-post-migration-best-practices">Post-Migration Best Practices<a class="anchor-link" id="post-migration-best-practices"></a></h2>
<p>After moving to your on-prem environment, pay attention to the following:</p>
<ol class="wp-block-list">
<li><strong>Monitoring &amp; Alerting:</strong> it is important to monitor the database workload after the migration is completed. Keep an eye on OS metrics such as CPU, memory and disk usage, as well as database metrics &mdash; buffer pool hit ratio, connections, row operations, locks, checkpoints. Tools such as Prometheus + Grafana, Percona Monitoring, or pg_stat_statements can help here (see the sketch after this list).</li>
<li><strong>Performance Tuning:</strong> performance tuning is a repeatable activity that needs to be monitored and corrected regularly. It can include database parameter tuning, query tuning, and indexing strategies.</li>
<li><strong>Security Hardening:</strong> ensure the firewall rules meet requirements, especially for connections from the application to the database. Implement least-privilege access and regularly rotate credentials.</li>
<li><strong>Cost Validation:</strong> after the migration, compare the actual on-prem TCO with the cloud environment, and don&rsquo;t forget to factor in operational overhead.</li>
</ol>
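<p>As an example of the monitoring mentioned above, if the <code>pg_stat_statements</code> extension is installed, a query of this shape surfaces the most expensive statements after the migration (on PostgreSQL versions before 13 the column is called <code>total_time</code> instead of <code>total_exec_time</code>):</p>
<pre class="wp-block-code"><code>-- Top 10 statements by total execution time since the statistics were last reset.
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;</code></pre>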
<h2 class="wp-block-heading" id="h-use-case-postgresql-cloud-to-on-prem-migration-with-cluster">Use Case: PostgreSQL Cloud to On-Prem Migration with Cluster<a class="anchor-link" id="use-case-postgresql-cloud-to-on-prem-migration-with-cluster"></a></h2>
<p>In a Cloud PostgreSQL to on-prem migration, we will utilize <code>logical replication</code> to replicate data between the cloud and on-prem environments. Before setting up logical replication from Cloud RDS PostgreSQL to self-hosted PostgreSQL, we need to configure the following parameters in the RDS PostgreSQL (restart required):</p>
<pre class="wp-block-code"><code>rds.logical_replication = 1
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10</code></pre>
<h3 class="wp-block-heading" id="h-step-1-prepare-on-prem-postgresql-cluster">Step 1: Prepare On-Prem PostgreSQL Cluster<a class="anchor-link" id="step-1-prepare-on-prem-postgresql-cluster"></a></h3>
<p>Set up a PostgreSQL streaming replication cluster on-prem (at least 2 nodes for HA). We can use ClusterControl to spin up a PostgreSQL cluster with 2 nodes (Primary &amp; Replica) and configure replication slots for resilience.</p>
<p>After the PostgreSQL cluster is up and running in the on-prem environment, we need to ensure the following parameters are set:</p>
<pre class="wp-block-code"><code>wal_level = logical
max_replication_slots = 10
max_wal_senders = 10</code></pre>
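<p>Once both sides have been restarted, a quick sanity check is to confirm the settings are active with plain <code>SHOW</code> commands (works on the RDS source and on the on-prem cluster alike):</p>
<pre class="wp-block-code"><code>SHOW wal_level;             -- expect: logical
SHOW max_replication_slots; -- expect: 10
SHOW max_wal_senders;       -- expect: 10</code></pre>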
<h3 class="wp-block-heading" id="h-step-2-schema-backup-and-restore">Step 2: Schema Backup and Restore<a class="anchor-link" id="step-2-schema-backup-and-restore"></a></h3>
<p>When using logical replication, only the schema (DDL) must be backed up and restored in order to mirror the existing database structure, while the data itself is replicated separately. Since logical replication does not replicate DDL, confirm there are no schema changes on the source while replication is being set up.</p>
<ul class="wp-block-list">
<li>Run the database&rsquo;s schema backup using <code>pg_dump</code> on the Cloud PostgreSQL node:</li>
</ul>
<p><code>pg_dump -h mydb.xxxxxx.ap-southeast-1.rds.amazonaws.com -p 5432 -U dbadmin -d mydb --schema-only -f db_schema_dump.sql</code></p>
<ul class="wp-block-list">
<li>Create your on-prem database:</li>
</ul>
<p><code>psql# CREATE DATABASE mydb;</code></p>
<ul class="wp-block-list">
<li>Restore the schema backup into your on-prem PostgreSQL database.</li>
</ul>
<p><code>psql -h &lt;onprem-host&gt; -p 5432 -U &lt;username&gt; -f db_schema_dump.sql mydb</code></p>
<h3 class="wp-block-heading" id="h-step-3-logical-replication-publication-amp-subscription">Step 3: Logical Replication (Publication &amp; Subscription)<a class="anchor-link" id="step-3-logical-replication-publication-subscription"></a></h3>
<p>PostgreSQL logical replication relies on tables having a primary key because every replicated table must have a <strong>REPLICA IDENTITY</strong>, which is the <strong>primary key</strong> by default. If the table does not have a primary key, we must configure the table as <strong>REPLICA IDENTITY FULL</strong>.</p>
<ul class="wp-block-list">
<li>If a table does not have a primary key, we need to configure the following in Cloud RDS PostgreSQL:</li>
</ul>
<p><code>ALTER TABLE my_table REPLICA IDENTITY FULL;</code></p>
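<p>To find which tables need this treatment, a query along the following lines against the system catalogs lists user tables without a primary key (a sketch; adjust the schema filter to your environment):</p>
<pre class="wp-block-code"><code>-- List user tables that have no primary key and therefore need
-- REPLICA IDENTITY FULL before they can be replicated logically.
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
        SELECT 1 FROM pg_constraint con
        WHERE con.conrelid = c.oid AND con.contype = 'p'
      );</code></pre>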
<ul class="wp-block-list">
<li>On Cloud RDS PostgreSQL (source), create a publication:</li>
</ul>
<p><code>CREATE PUBLICATION mypub FOR ALL TABLES;</code></p>
<ul class="wp-block-list">
<li>On your on-prem PostgreSQL (target), create a subscription:</li>
</ul>
<pre class="wp-block-code"><code>CREATE SUBSCRIPTION mysub
CONNECTION 'host=mydb.xxxxxx.ap-southeast-1.rds.amazonaws.com dbname=mydb user=replica password=pass'
PUBLICATION mypub
WITH (
  copy_data = true,
  create_slot = true,
  enabled = true
);</code></pre>
<p>The <code>copy_data</code> option performs the initial copy of the existing data from Cloud RDS to on-prem PostgreSQL; subsequent changes are then streamed continuously.</p>
<ul class="wp-block-list">
<li>Monitor data replication on the source (Cloud RDS PostgreSQL) and the self-hosted on-prem PostgreSQL by executing the following queries:</li>
</ul>
<p><strong>Cloud RDS PostgreSQL (Publisher):</strong></p>
<p><code>psql# SELECT * FROM pg_replication_slots;</code></p>
<p><strong>Self-hosted on-prem PostgreSQL (Subscriber):</strong></p>
<p><code>psql# SELECT * FROM pg_stat_subscription;</code></p>
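<p>To quantify how far the subscriber is behind, a query along these lines on the publisher reports the outstanding WAL per replication slot (a sketch; the exact columns can vary slightly between PostgreSQL versions):</p>
<pre class="wp-block-code"><code>-- On the publisher: bytes of WAL not yet confirmed by each logical slot.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';</code></pre>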
<h3 class="wp-block-heading" id="h-step-4-cutover">Step 4: Cutover<a class="anchor-link" id="step-4-cutover"></a></h3>
<p>The cutover process involves transferring the application&rsquo;s endpoint from the AWS RDS instance to your on-prem PG cluster, after confirming the on-prem cluster has fully caught up (check <code>pg_stat_replication</code> on the publisher or <code>pg_stat_subscription</code> on the subscriber).</p>
<p><strong>Downtime:</strong> Minimal, limited to DNS or connection string changes.</p>
<h2 class="wp-block-heading" id="h-conclusion">Conclusion<a class="anchor-link" id="conclusion"></a></h2>
<p>Migrating databases from cloud to on-prem with minimal downtime is possible with careful planning, replication strategies, and thorough testing. Regardless of your database, the key is to minimize the cutover window, validate data consistency, and maintain business continuity.</p>
<p>Organizations should treat migration not as a one-time event, but as an iterative process: plan, test, migrate, validate, and optimize. By following these best practices, businesses can achieve a smooth transition back to on-prem while meeting performance, compliance, and cost objectives.</p>
<p>Ready to migrate your PostgreSQL database and maintain your orchestration on-prem?</p>
<h2 class="wp-block-heading" id="h-install-clustercontrol-in-10-minutes-nbsp-free-30-day-nbsp-enterprise-trial-included">Install ClusterControl in 10-minutes.&nbsp;<strong>Free 30-day&nbsp;</strong>Enterprise trial included!<a class="anchor-link" id="install-clustercontrol-in-10-minutes-free-30-day-enterprise-trial-included"></a></h2>
<h3 class="wp-block-heading" id="h-script-installation-instructions">Script Installation Instructions<a class="anchor-link" id="script-installation-instructions"></a></h3>
<p>The installer script is the simplest way to get ClusterControl up and running. Run it on your chosen host, and it will take care of installing all required packages and dependencies.</p>
<p>Offline environments are supported as well. See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/offline-installation/">Offline Installation</a>&nbsp;guide for more details.</p>
<p>On the ClusterControl server, run the following commands:</p>
<pre class="wp-block-code"><code>wget https://severalnines.com/downloads/cmon/install-cc
chmod +x install-cc</code></pre>
<p>With your install script ready, run the command below. Replace the <code>S9S_CMON_PASSWORD</code> and <code>S9S_ROOT_PASSWORD</code> placeholders with passwords of your choice, or remove the environment variables from the command to set the passwords interactively. If you have multiple network interface cards, assign one IP address for the <code>HOST</code> variable in the command using <code>HOST=&lt;ip_address&gt;</code>.</p>
<pre class="wp-block-code"><code>S9S_CMON_PASSWORD=&lt;your_password&gt; S9S_ROOT_PASSWORD=&lt;your_password&gt; HOST=&lt;ip_address&gt; ./install-cc # as root or sudo user</code></pre>
<p>After the installation is complete, open a web browser, navigate to&nbsp;<code>https://&lt;ClusterControl_host&gt;/</code>, and create the first admin user by entering a username (note that &ldquo;admin&rdquo; is reserved) and a password on the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/quickstart/#step-2-create-the-first-admin-user">welcome page</a>. Once you&rsquo;re in, you can&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/create-database-cluster/">deploy</a>&nbsp;a new database cluster or&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/import-database-cluster/">import</a>&nbsp;an existing one.</p>
<p>The installer script supports a range of environment variables for advanced setup. You can define them using export or by prefixing the install command.</p>
<p>See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#environment-variables">list of supported variables</a>&nbsp;and&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#example-use-cases">example use cases</a>&nbsp;to tailor your installation.</p>
<h4 class="wp-block-heading" id="h-other-installation-options">Other Installation Options</h4>
<p><strong>Helm Chart</strong></p>
<p>Deploy ClusterControl on Kubernetes using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#helm-chart">official Helm chart</a>.</p>
<p><strong>Ansible Role</strong></p>
<p>Automate installation and configuration using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#ansible-role">Ansible playbooks</a>.</p>
<p><strong>Puppet Module</strong></p>
<p>Manage your ClusterControl deployment with the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#puppet-module">Puppet module</a>.</p>
<h4 class="wp-block-heading" id="h-clustercontrol-on-marketplaces">ClusterControl on Marketplaces</h4>
<p>Prefer to launch ClusterControl directly from the cloud? It&rsquo;s available on these platforms:</p>
<ul class="wp-block-list">
<li><a href="https://marketplace.digitalocean.com/apps/clustercontrol">DigitalOcean Marketplace</a></li>
<li><a href="https://gridscale.io/en/marketplace">gridscale.io Marketplace</a></li>
<li><a href="https://www.vultr.com/marketplace/apps/clustercontrol/">Vultr Marketplace</a></li>
<li><a href="https://www.linode.com/marketplace/apps/severalnines/clustercontrol/">Linode Marketplace</a></li>
<li><a href="https://console.cloud.google.com/marketplace/product/severalnines-public/clustercontrol">Google Cloud Platform</a></li>
</ul>
<p>The post <a href="https://severalnines.com/blog/how-to-migrate-cloud-postgresql-to-on-prem-with-minimal-downtime/">How to Migrate Cloud PostgreSQL to On-Prem with Minimal Downtime</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/how-to-migrate-cloud-postgresql-to-on-prem-with-minimal-downtime/">How to Migrate Cloud PostgreSQL to On-Prem with Minimal Downtime</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>TidesDB becomes Silver Sponsor of the MariaDB Foundation</title>
      <link>https://mariadb.org/tidesdb-becomes-silver-sponsor-of-the-mariadb-foundation/</link>
      <pubDate>Fri, 20 Feb 2026 04:05:15 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We’re excited to welcome TidesDB as a Silver Sponsor of the MariaDB Foundation.<br />
TidesDB’s sponsorship directly supports our mission to strengthen the MariaDB ecosystem and accelerate innovation through open, community-driven collaboration. …<br />
Continue reading \"TidesDB becomes Silver Sponsor of the MariaDB Foundation\"<br />
The post TidesDB becomes Silver Sponsor of the MariaDB Foundation appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/tidesdb-becomes-silver-sponsor-of-the-mariadb-foundation/">TidesDB becomes Silver Sponsor of the MariaDB Foundation</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We&rsquo;re excited to welcome TidesDB as a Silver Sponsor of the MariaDB Foundation.<br>
TidesDB&rsquo;s sponsorship directly supports our mission to strengthen the MariaDB ecosystem and accelerate innovation through open, community-driven collaboration. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/tidesdb-becomes-silver-sponsor-of-the-mariadb-foundation/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;TidesDB becomes Silver Sponsor of the MariaDB Foundation&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/tidesdb-becomes-silver-sponsor-of-the-mariadb-foundation/">TidesDB becomes Silver Sponsor of the MariaDB Foundation</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/tidesdb-becomes-silver-sponsor-of-the-mariadb-foundation/">TidesDB becomes Silver Sponsor of the MariaDB Foundation</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>FromDual Performance Monitor 2.2.1 has been released</title>
      <link>https://www.fromdual.com/blog/fromdual-performance-monitor-2.2.1-has-been-released/</link>
      <pubDate>Thu, 19 Feb 2026 17:18:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>FromDual has the pleasure to announce the release of the new version 2.2.1 of its popular Database Performance Monitor for MariaDB, Galera Cluster, MySQL and PostgreSQL fpmmm.<br />
The FromDual Performance Monitor enables Database and System Administrators to monitor and understand what is going on inside their databases and on the machines where the databases reside.<br />
More information you can find here: FromDual Performance Monitor.<br />
Download<br />
The new FromDual Performance Monitor can be downloaded from our Software Download page or you can use our repositories. How to install and use the FromDual Performance Monitor is documented in the Documentation.<br />
In the inconceivable case that you find a bug in the FromDual Performance Monitor please report it to us by sending an email.<br />
Any feedback, statements and testimonials are welcome as well! Please send them to us.<br />
Monitoring as a Service (MaaS)<br />
You do not want to set up your database monitoring yourself? No problem: choose our Monitoring as a Service (MaaS) to save time and costs!<br />
Installation of Performance Monitor 2.2.1<br />
How to install the FromDual Performance Monitor you can find in the Installation Guide.<br />
Upgrade of fpmmm tar ball from 1.x to 2.2.1<br />
There are some changes in the configuration file (fpmmm.conf):</p>
<p>The access rights should be changed as follows: chmod 600 /etc/fpmmm.conf<br />
The key Methode was spelled wrong in the configuration file. It was renamed to Method.<br />
The key PidFile is ambiguous which could lead to problems and bugs. Thus it was changed to either MyPidFile for fpmmm and DbPidFile for the database.</p>
<p>Upgrade with DEB/RPM packages should happen automatically. For tar balls follow this instruction:<br />
$ cd /opt<br />
$ tar xf /download/fpmmm-2.2.1.tar.gz<br />
$ rm -f fpmmm<br />
$ ln -s fpmmm-2.2.1 fpmmm</p>
<p>Changes in FromDual Performance Monitor 2.2.1<br />
These release notes include both the changes that came with version 2.2.0 and version 2.2.1.<br />
This release contains new features and various bug fixes.<br />
You can verify your current FromDual Performance Monitor version with the following command:<br />
$ /opt/fpmmm/bin/fpmmm --version</p>
<p>General</p>
<p>Updated to latest myEnv library.<br />
PHP 8.5 incompatibilities fixed.<br />
Typos fixed.<br />
Error messages improved.<br />
Function real_connect warnings sent to the console are now suppressed.<br />
Connection timeout reduced, so in case of trouble problems show up more often and earlier…<br />
Other cosmetic errors and debugging information fixed.<br />
Data are gathered and set to zero even though the database is not reachable.<br />
Indentation of logged messages fixed.<br />
Function exit is logged now as well.<br />
SSL connection handling added.<br />
Fix of error: array_sum(): Addition is not supported on type string in warning after upgrade to Ubuntu 24.04/PHP 8.3.<br />
Error log parsing had problems with huge error logs. Now we have added a size barrier.<br />
Function getDistributions updated/cleaned-up.<br />
Command lsb_release removed.<br />
Documentation added.<br />
Nagios: Tests fixed for MariaDB 11.8.</p>
<p>Templates</p>
<p>Server: Available I/O system information added to each I/O system on top, pages named.<br />
InnoDB: Pages named, row write operations graph added.<br />
MySQL: Some graphs and query dashboard made nicer.</p>
<p>Agent</p>
<p>none</p>
<p>Server</p>
<p>Items FromDual.MySQL.server.disk.avg_io_read_wait and FromDual.MySQL.server.disk.avg_io_write_wait removed because they are showing completely wrong values. Use FromDual.MySQL.server.disk.r_await and FromDual.MySQL.server.disk.w_await instead.<br />
Workaround for missing cpuinfo old cachefile implemented.</p>
<p>Galera</p>
<p>Old style variable fixed which causes problems with newer version.<br />
Default values on database stop added.<br />
Workaround for cut wsrep_provider_options bug in MySQL Galera Cluster added.</p>
<p>InnoDB</p>
<p>Variable innodb_log_file_size made consistent for MariaDB and MySQL.<br />
Deprecated and removed variable innodb_log_files_in_group removed.<br />
Fix for innodb_log_file_size in MySQL 9.4.<br />
Log occupancy graph added and graph added to dashboard.<br />
Variable tx_isolation, which is deprecated in MariaDB 11.2 and MySQL 5.7, replaced by transaction_isolation.</p>
<p>MySQL</p>
<p>Variable vendor_versions_behind special case caught.<br />
Connection charset changed from utf8 to utf8mb4 due to errors in MariaDB 11.8.<br />
Template pages named.</p>
<p>Process</p>
<p>none</p>
<p>Security</p>
<p>Module improved for new behaviour in MariaDB 11.8.</p>
<p>Master</p>
<p>Wrong version check for master fixed.</p>
<p>Slave</p>
<p>Slave lagging problem fixed.<br />
Wrong version check for slave fixed.<br />
MySQL 8.4 commands added for replication monitoring.</p>
<p>Backup</p>
<p>none</p>
<p>PostgreSQL</p>
<p>Rudimentary PostgreSQL monitoring added.</p>
<p>Packaging</p>
<p>RHEL 8 added again.<br />
RPM spec adapted for RHEL 10.<br />
SNMP library updated.<br />
Debian 10 and RHEL 7 removed.<br />
DEB sign stuff added.</p>
<p>For subscriptions of commercial use of fpmmm please get in contact with us.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/fromdual-performance-monitor-2.2.1-has-been-released/">FromDual Performance Monitor 2.2.1 has been released</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
<content:encoded><![CDATA[<p>FromDual is pleased to announce the release of the new version 2.2.1 of its popular Database Performance Monitor for MariaDB, Galera Cluster, MySQL and PostgreSQL, <a href="https://www.fromdual.com/software/fromdual-performance-monitor/"><code>fpmmm</code></a>.</p>
<p>The FromDual Performance Monitor enables Database and System Administrators to monitor and understand what is going on inside their databases and on the machines where the databases reside.</p>
<p>More information can be found here: <a href="https://www.fromdual.com/software/fromdual-performance-monitor/">FromDual Performance Monitor</a>.</p>
<h2 id="download">Download<a class="anchor-link" id="download"></a></h2>
<p>The new FromDual Performance Monitor can be downloaded from our <a href="https://support.fromdual.com/admin/public/download.php" target="_blank" rel="noopener">Software Download</a> page or you can use our <a href="https://www.fromdual.com/repositories/">repositories</a>. How to install and use the FromDual Performance Monitor is documented in the <a href="https://support.fromdual.com/documentation/fpmmm/fpmmm.html" target="_blank" rel="noopener">Documentation</a>.</p>
<p>In the inconceivable case that you find a bug in the FromDual Performance Monitor please report it to us by sending an <a href="mailto:contact@fromdual.com?Subject=Bug%20report%20for%20fpmmm">email</a>.</p>
<p>Any feedback, statements and testimonials are welcome as well! Please send them <a href="mailto:feedback@fromdual.com?Subject=Feedback%20for%20fpmmm">to us</a>.</p>
<h2 id="monitoring-as-a-service-maas">Monitoring as a Service (MaaS)<a class="anchor-link" id="monitoring-as-a-service-maas"></a></h2>
<p>You do not want to set up your database monitoring yourself? No problem: choose our <a href="https://www.fromdual.com/services/monitoring-as-a-service-maas/">Monitoring as a Service</a> (MaaS) to save time and costs!</p>
<h2 id="installation-of-performance-monitor-221">Installation of Performance Monitor 2.2.1<a class="anchor-link" id="installation-of-performance-monitor-2-2-1"></a></h2>
<p>How to install the FromDual Performance Monitor is described in the <a href="https://support.fromdual.com/documentation/fpmmm/fpmmm.html#installation-guide" target="_blank" rel="noopener">Installation Guide</a>.</p>
<h2 id="upgrade-of-fpmmm-tar-ball-from-1x-to-221">Upgrade of fpmmm tar ball from 1.x to 2.2.1<a class="anchor-link" id="upgrade-of-fpmmm-tar-ball-from-1-x-to-2-2-1"></a></h2>
<p>There are some changes in the configuration file (<code>fpmmm.conf</code>):</p>
<ul>
<li>The access rights should be changed as follows: <code>chmod 600 /etc/fpmmm.conf</code></li>
<li>The key <code>Methode</code> was misspelled in the configuration file. It was renamed to <code>Method</code>.</li>
<li>The key <code>PidFile</code> is ambiguous, which could lead to problems and bugs. Thus it was changed to either <code>MyPidFile</code> for fpmmm or <code>DbPidFile</code> for the database.</li>
</ul>
<p>Upgrade with DEB/RPM packages should happen automatically. For tar balls, follow these instructions:</p>
<pre><code>$ cd /opt
$ tar xf /download/fpmmm-2.2.1.tar.gz
$ rm -f fpmmm
$ ln -s fpmmm-2.2.1 fpmmm
</code></pre>
<h2 id="changes-in-fromdual-performance-monitor-221">Changes in FromDual Performance Monitor 2.2.1<a class="anchor-link" id="changes-in-fromdual-performance-monitor-2-2-1"></a></h2>
<p>These release notes include both the changes that came with version 2.2.0 and version 2.2.1.</p>
<p>This release contains new features and various bug fixes.</p>
<p>You can verify your current FromDual Performance Monitor version with the following command:</p>
<pre><code>$ /opt/fpmmm/bin/fpmmm --version
</code></pre>
<h3 id="general">General<a class="anchor-link" id="general"></a></h3>
<ul>
<li>Updated to latest myEnv library.</li>
<li>PHP 8.5 incompatibilities fixed.</li>
<li>Typos fixed.</li>
<li>Error messages improved.</li>
<li>Function <code>real_connect</code> warnings sent to the console are now suppressed.</li>
<li>Connection timeout reduced, so in case of trouble problems show up more often and earlier&hellip;</li>
<li>Other cosmetic errors and debugging information fixed.</li>
<li>Data are gathered and set to zero even though the database is not reachable.</li>
<li>Indentation of logged messages fixed.</li>
<li>Function exit is now logged as well.</li>
<li>SSL connection handling added.</li>
<li>Fix of error <code>array_sum(): Addition is not supported on type string</code> in warning after upgrade to Ubuntu 24.04/PHP 8.3.</li>
<li>Error log parsing had problems with huge error logs. Now we have added a size barrier.</li>
<li>Function <code>getDistributions</code> updated/cleaned-up.</li>
<li>Command lsb_release removed.</li>
<li>Documentation added.</li>
<li>Nagios: Tests fixed for MariaDB 11.8.</li>
</ul>
<h3 id="templates">Templates<a class="anchor-link" id="templates"></a></h3>
<ul>
<li>Server: Available I/O system information added to each I/O system on top, pages named.</li>
<li>InnoDB: Pages named, row write operations graph added.</li>
<li>MySQL: Some graphs and query dashboard made nicer.</li>
</ul>
<h3 id="agent">Agent<a class="anchor-link" id="agent"></a></h3>
<ul>
<li>none</li>
</ul>
<h3 id="server">Server<a class="anchor-link" id="server"></a></h3>
<ul>
<li>Items <code>FromDual.MySQL.server.disk.avg_io_read_wait</code> and <code>FromDual.MySQL.server.disk.avg_io_write_wait</code> removed because they are showing completely wrong values. Use <code>FromDual.MySQL.server.disk.r_await</code> and <code>FromDual.MySQL.server.disk.w_await</code> instead.</li>
<li>Workaround for missing cpuinfo old cachefile implemented.</li>
</ul>
<h3 id="galera">Galera<a class="anchor-link" id="galera"></a></h3>
<ul>
<li>Old style variable fixed which causes problems with newer version.</li>
<li>Default values on database stop added.</li>
<li>Workaround for cut <code>wsrep_provider_options</code> bug in MySQL Galera Cluster added.</li>
</ul>
<h3 id="innodb">InnoDB<a class="anchor-link" id="innodb"></a></h3>
<ul>
<li>Variable <code>innodb_log_file_size</code> made consistent for MariaDB and MySQL.</li>
<li>Deprecated and removed variable <code>innodb_log_files_in_group</code> removed.</li>
<li>Fix for <code>innodb_log_file_size</code> in MySQL 9.4.</li>
<li>Log occupancy graph added and graph added to dashboard.</li>
<li>Variable <code>tx_isolation</code>, which is deprecated in MariaDB 11.2 and MySQL 5.7, replaced by <code>transaction_isolation</code>.</li>
</ul>
<h3 id="mysql">MySQL<a class="anchor-link" id="mysql"></a></h3>
<ul>
<li>Variable <code>vendor_versions_behind</code> special case caught.</li>
<li>Connection charset changed from <code>utf8</code> to <code>utf8mb4</code> due to errors in MariaDB 11.8.</li>
<li>Template pages named.</li>
</ul>
<h3 id="process">Process<a class="anchor-link" id="process"></a></h3>
<ul>
<li>none</li>
</ul>
<h3 id="security">Security<a class="anchor-link" id="security"></a></h3>
<ul>
<li>Module improved for new behaviour in MariaDB 11.8.</li>
</ul>
<h3 id="master">Master<a class="anchor-link" id="master"></a></h3>
<ul>
<li>Wrong version check for master fixed.</li>
</ul>
<h3 id="slave">Slave<a class="anchor-link" id="slave"></a></h3>
<ul>
<li>Slave lagging problem fixed.</li>
<li>Wrong version check for slave fixed.</li>
<li>MySQL 8.4 commands added for replication monitoring.</li>
</ul>
<h3 id="backup">Backup<a class="anchor-link" id="backup"></a></h3>
<ul>
<li>none</li>
</ul>
<h3 id="postgresql">PostgreSQL<a class="anchor-link" id="postgresql"></a></h3>
<ul>
<li>Rudimentary PostgreSQL monitoring added.</li>
</ul>
<h3 id="packaging">Packaging<a class="anchor-link" id="packaging"></a></h3>
<ul>
<li>RHEL 8 added again.</li>
<li>RPM spec adapted for RHEL 10.</li>
<li>SNMP library updated.</li>
<li>Debian 10 and RHEL 7 removed.</li>
<li>DEB sign stuff added.</li>
</ul>
<p>For subscriptions of commercial use of <code>fpmmm</code> please <a href="mailto:contact@fromdual.com?Subject=Commercial%20use%20of%20fpmmm">get in contact</a> with us.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/fromdual-performance-monitor-2.2.1-has-been-released/">FromDual Performance Monitor 2.2.1 has been released</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>FromDual Performance Monitor 2.2.1 has been released</title>
      <link>https://www.fromdual.com/blog/fpmmm-release-notes/fromdual-performance-monitor-2.2.1-has-been-released/</link>
      <pubDate>Thu, 19 Feb 2026 17:18:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>FromDual has the pleasure to announce the release of the new version 2.2.1 of its popular Database Performance Monitor for MariaDB, Galera Cluster, MySQL and PostgreSQL fpmmm.<br />
The FromDual Performance Monitor enables Database and System Administrators to monitor and understand what is going on inside their databases and on the machines where the databases reside.<br />
More information you can find here: FromDual Performance Monitor.<br />
Download<br />
The new FromDual Performance Monitor can be downloaded from our Sofware Download page or you can use our repositories. How to install and use the FromDual Performance Monitor is documented in the Documentation.<br />
In the inconceivable case that you find a bug in the FromDual Performance Monitor please report it to us by sending an email.<br />
Any feedback, statements and testimonials are welcome as well! Please send them to us.<br />
Monitoring as a Service (MaaS)<br />
You do not want to set-up your database monitoring yourself? No problem: Choose our Monitoring as a Service (MaaS) to safe time and costs!<br />
Installation of Performance Monitor 2.2.1<br />
How to install the FromDual Performance Monitor you can find in the Installation Guide.<br />
Upgrade of fpmmm tar ball from 1.x to 2.2.1<br />
There are some changes in the configuration file (fpmmm.conf):</p>
<p>The access rights should be change as follows: chmod 600 /etc/fpmmm.conf<br />
The key Methode was spelled wrong in the configuration file. It was renamed to Method.<br />
The key PidFile is ambiguous which could lead to problems and bugs. Thus it was changed to either MyPidFile for fpmmm and DbPidFile for the database.</p>
<p>Upgrade with DEB/RPM packages should happen automatically. For tar balls follow this instruction:<br />
$ cd /opt<br />
$ tar xf /download/fpmmm-2.2.1.tar.gz<br />
$ rm -f fpmmm<br />
$ ln -s fpmmm-2.2.1 fpmmm</p>
<p>Changes in FromDual Performance Monitor 2.2.1<br />
These release notes include both the changes that came with version 2.2.0 and version 2.2.1.<br />
This release contains new features and various bug fixes.<br />
You can verify your current FromDual Performance Monitor version with the following command:<br />
$ /opt/fpmmm/bin/fpmmm --version</p>
<p>General</p>
<p>Updated to latest myEnv library.<br />
PHP 8.5 incompatibilities fixed.<br />
Typos fixed.<br />
Error messages improved.<br />
Function real_connect warnings send to console are suppressed now.<br />
Connection problems timeout reduced so in case of troubles we should see more and earlier…<br />
Other cosmetic errors and debugging information fixed.<br />
Data are gathered and set to zero even thought database is not reachable.<br />
Indention of logged messages fixed.<br />
Function exit is logged now as well.<br />
SSL connection handling added.<br />
Fix of error: array_sum(): Addition is not supported on type string in warning after upgrade to Ubuntu 24.04/PHP 8.3.<br />
Error log parsing had problems with huge error logs. Now we have added a size barrier.<br />
Function getDistributions updated/cleaned-up.<br />
Command lsb_release removed.<br />
Documentation added.<br />
Nagios: Tests fixed for MariaDB 11.8.</p>
<p>Templates</p>
<p>Server: Available I/O system information added to each I/O system on top, pages named.<br />
InnoDB: Pages named, row write operations graph added.<br />
MySQL: Some graphs and query dashboard made nicer.</p>
<p>Agent</p>
<p>none</p>
<p>Server</p>
<p>Items FromDual.MySQL.server.disk.avg_io_read_wait and FromDual.MySQL.server.disk.avg_io_write_wait removed because they are showing completely wrong values. Use FromDual.MySQL.server.disk.r_await and FromDual.MySQL.server.disk.w_await instead.<br />
Workaround for missing cpuinfo old cachefile implemented.</p>
<p>Galera</p>
<p>Old style variable fixed which causes problems with newer version.<br />
Default values on database stop added.<br />
Workaround for cut wsrep_provider_options bug in MySQL Galera Cluster added.</p>
<p>InnoDB</p>
<p>Variable innodb_log_file_size made consistent for MariaDB and MySQL.<br />
Deprecated and removed variable innodb_log_files_in_group removed.<br />
Fix for innodb_log_file_size in MySQL 9.4.<br />
Log occupancy graph added and graph added to dashboard.<br />
Variable tx_isolation replaced by transaction isolation which is deprecated in MariaDB 11.2 and MySQL 5.7.</p>
<p>MySQL</p>
<p>Variable vendor_versions_behind special case caught.<br />
Connection charset changed from utf8 to utf8mb4 due to errors in MariaDB 11.8.<br />
Template pages named.</p>
<p>Process</p>
<p>none</p>
<p>Security</p>
<p>Module improved for new behaviour in MariaDB 11.8.</p>
<p>Master</p>
<p>Wrong version check for master fixed.</p>
<p>Slave</p>
<p>Slave lagging problem fixed.<br />
Wrong version check for slave fixed.<br />
MySQL 8.4 commands added for replication monitoring.</p>
<p>Backup</p>
<p>none</p>
<p>PostgreSQL</p>
<p>Rudimentary PostgreSQL monitoring added.</p>
<p>Packaging</p>
<p>RHEL 8 added again.<br />
RPM spec adapted for RHEL 10.<br />
SNMP library updated.<br />
Debian 10 and RHEL 7 removed.<br />
DEB sign stuff added.</p>
<p>For subscriptions of commercial use of fpmmm please get in contact with us.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/fpmmm-release-notes/fromdual-performance-monitor-2.2.1-has-been-released/">FromDual Performance Monitor 2.2.1 has been released</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>FromDual has the pleasure to announce the release of the new version 2.2.1 of its popular Database Performance Monitor for MariaDB, Galera Cluster, MySQL and PostgreSQL <a href="https://www.fromdual.com/software/fromdual-performance-monitor/"><code>fpmmm</code></a>.</p>
<p>The FromDual Performance Monitor enables Database and System Administrators to monitor and understand what is going on inside their databases and on the machines where the databases reside.</p>
<p>More information you can find here: <a href="https://www.fromdual.com/software/fromdual-performance-monitor/">FromDual Performance Monitor</a>.</p>
<h2 id="download">Download<a class="anchor-link" id="download"></a></h2>
<p>The new FromDual Performance Monitor can be downloaded from our <a href="https://support.fromdual.com/admin/public/download.php" target="_blank" rel="noopener">Sofware Download</a> page or you can use our <a href="https://www.fromdual.com/repositories/">repositories</a>. How to install and use the FromDual Performance Monitor is documented in the <a href="https://support.fromdual.com/documentation/fpmmm/fpmmm.html" target="_blank" rel="noopener">Documentation</a>.</p>
<p>In the inconceivable case that you find a bug in the FromDual Performance Monitor please report it to us by sending an <a href="mailto:contact@fromdual.com?Subject=Bug%20report%20for%20fpmmm">email</a>.</p>
<p>Any feedback, statements and testimonials are welcome as well! Please send them <a href="mailto:feedback@fromdual.com?Subject=Feedback%20for%20fpmmm">to us</a>.</p>
<h2 id="monitoring-as-a-service-maas">Monitoring as a Service (MaaS)<a class="anchor-link" id="monitoring-as-a-service-maas"></a></h2>
<p>You do not want to set-up your database monitoring yourself? No problem: Choose our <a href="https://www.fromdual.com/services/monitoring-as-a-service-maas/">Monitoring as a Service</a> (MaaS) to safe time and costs!</p>
<h2 id="installation-of-performance-monitor-221">Installation of Performance Monitor 2.2.1<a class="anchor-link" id="installation-of-performance-monitor-2-2-1"></a></h2>
<p>How to install the FromDual Performance Monitor you can find in the <a href="https://support.fromdual.com/documentation/fpmmm/fpmmm.html#installation-guide" target="_blank" rel="noopener">Installation Guide</a>.</p>
<h2 id="upgrade-of-fpmmm-tar-ball-from-1x-to-221">Upgrade of fpmmm tar ball from 1.x to 2.2.1<a class="anchor-link" id="upgrade-of-fpmmm-tar-ball-from-1-x-to-2-2-1"></a></h2>
<p>There are some changes in the configuration file (<code>fpmmm.conf</code>):</p>
<ul>
<li>The access rights should be change as follows: <code>chmod 600 /etc/fpmmm.conf</code></li>
<li>The key <code>Methode</code> was spelled wrong in the configuration file. It was renamed to <code>Method</code>.</li>
<li>The key <code>PidFile</code> is ambiguous which could lead to problems and bugs. Thus it was changed to either <code>MyPidFile</code> for fpmmm and <code>DbPidFile</code> for the database.</li>
</ul>
<p>Upgrade with DEB/RPM packages should happen automatically. For tar balls follow this instruction:</p>
<pre><code>$ cd /opt
$ tar xf /download/fpmmm-2.2.1.tar.gz
$ rm -f fpmmm
$ ln -s fpmmm-2.2.1 fpmmm
</code></pre>
<h2 id="changes-in-fromdual-performance-monitor-221">Changes in FromDual Performance Monitor 2.2.1<a class="anchor-link" id="changes-in-fromdual-performance-monitor-2-2-1"></a></h2>
<p>These release notes include both the changes that came with version 2.2.0 and version 2.2.1.</p>
<p>This release contains new features and various bug fixes.</p>
<p>You can verify your current FromDual Performance Monitor version with the following command:</p>
<pre><code>$ /opt/fpmmm/bin/fpmmm --version
</code></pre>
<h3 id="general">General<a class="anchor-link" id="general"></a></h3>
<ul>
<li>Updated to latest myEnv library.</li>
<li>PHP 8.5 incompatibilities fixed.</li>
<li>Typos fixed.</li>
<li>Error messages improved.</li>
<li>Function <code>real_connect</code> warnings send to console are suppressed now.</li>
<li>Connection problems timeout reduced so in case of troubles we should see more and earlier&hellip;</li>
<li>Other cosmetic errors and debugging information fixed.</li>
<li>Data are gathered and set to zero even thought database is not reachable.</li>
<li>Indention of logged messages fixed.</li>
<li>Function exit is logged now as well.</li>
<li>SSL connection handling added.</li>
<li>Fix of error: array_sum(): Addition is not supported on type string in warning after upgrade to Ubuntu 24.04/PHP 8.3.</li>
<li>Error log parsing had problems with huge error logs. Now we have added a size barrier.</li>
<li>Function <code>getDistributions</code> updated/cleaned-up.</li>
<li>Command lsb_release removed.</li>
<li>Documentation added.</li>
<li>Nagios: Tests fixed for MariaDB 11.8.</li>
</ul>
<h3 id="templates">Templates<a class="anchor-link" id="templates"></a></h3>
<ul>
<li>Server: Available I/O system information added to each I/O system on top, pages named.</li>
<li>InnoDB: Pages named, row write operations graph added.</li>
<li>MySQL: Some graphs and query dashboard made nicer.</li>
</ul>
<h3 id="agent">Agent<a class="anchor-link" id="agent"></a></h3>
<ul>
<li>none</li>
</ul>
<h3 id="server">Server<a class="anchor-link" id="server"></a></h3>
<ul>
<li>Items <code>FromDual.MySQL.server.disk.avg_io_read_wait</code> and <code>FromDual.MySQL.server.disk.avg_io_write_wait</code> removed because they are showing completely wrong values. Use <code>FromDual.MySQL.server.disk.r_await</code> and <code>FromDual.MySQL.server.disk.w_await</code> instead.</li>
<li>Workaround for missing cpuinfo old cachefile implemented.</li>
</ul>
<h3 id="galera">Galera<a class="anchor-link" id="galera"></a></h3>
<ul>
<li>Old style variable fixed which causes problems with newer version.</li>
<li>Default values on database stop added.</li>
<li>Workaround for cut <code>wsrep_provider_options</code> bug in MySQL Galera Cluster added.</li>
</ul>
<h3 id="innodb">InnoDB<a class="anchor-link" id="innodb"></a></h3>
<ul>
<li>Variable <code>innodb_log_file_size</code> made consistent for MariaDB and MySQL.</li>
<li>Deprecated and removed variable <code>innodb_log_files_in_group</code> removed.</li>
<li>Fix for <code>innodb_log_file_size</code> in MySQL 9.4.</li>
<li>Log occupancy graph added and graph added to dashboard.</li>
<li>Variable <code>tx_isolation</code> replaced by transaction isolation which is deprecated in MariaDB 11.2 and MySQL 5.7.</li>
</ul>
<h3 id="mysql">MySQL<a class="anchor-link" id="mysql"></a></h3>
<ul>
<li>Variable <code>vendor_versions_behind</code> special case caught.</li>
<li>Connection charset changed from <code>utf8</code> to <code>utf8mb4</code> due to errors in MariaDB 11.8.</li>
<li>Template pages named.</li>
</ul>
<h3 id="process">Process<a class="anchor-link" id="process"></a></h3>
<ul>
<li>none</li>
</ul>
<h3 id="security">Security<a class="anchor-link" id="security"></a></h3>
<ul>
<li>Module improved for new behaviour in MariaDB 11.8.</li>
</ul>
<h3 id="master">Master<a class="anchor-link" id="master"></a></h3>
<ul>
<li>Wrong version check for master fixed.</li>
</ul>
<h3 id="slave">Slave<a class="anchor-link" id="slave"></a></h3>
<ul>
<li>Slave lagging problem fixed.</li>
<li>Wrong version check for slave fixed.</li>
<li>MySQL 8.4 commands added for replication monitoring.</li>
</ul>
<h3 id="backup">Backup<a class="anchor-link" id="backup"></a></h3>
<ul>
<li>none</li>
</ul>
<h3 id="postgresql">PostgreSQL<a class="anchor-link" id="postgresql"></a></h3>
<ul>
<li>Rudimentary PostgreSQL monitoring added.</li>
</ul>
<h3 id="packaging">Packaging<a class="anchor-link" id="packaging"></a></h3>
<ul>
<li>RHEL 8 added again.</li>
<li>RPM spec adapted for RHEL 10.</li>
<li>SNMP library updated.</li>
<li>Debian 10 and RHEL 7 removed.</li>
<li>DEB package signing added.</li>
</ul>
<p>For subscriptions of commercial use of <code>fpmmm</code> please <a href="mailto:contact@fromdual.com?Subject=Commercial%20use%20of%20fpmmm">get in contact</a> with us.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/fpmmm-release-notes/fromdual-performance-monitor-2.2.1-has-been-released/">FromDual Performance Monitor 2.2.1 has been released</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What is MariaDB AI RAG and ​​what can I do with it?</title>
      <link>https://mariadb.com/resources/blog/what-is-mariadb-ai-rag-and-what-can-i-do-with-it/</link>
      <pubDate>Thu, 19 Feb 2026 16:56:34 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>I recently had the pleasure of speaking at MariaDB Day Brussels 2026, an event where MariaDB enthusiasts like myself gather, […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/what-is-mariadb-ai-rag-and-what-can-i-do-with-it/">What is MariaDB AI RAG and ​​what can I do with it?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I recently had the pleasure of speaking at MariaDB Day Brussels 2026, an event where MariaDB enthusiasts like myself gather, learn, and share experiences with peers in the ecosystem. My talk was about a new component in the MariaDB Enterprise Platform called MariaDB AI RAG. This component allows teams not only to standardize RAG pipelines but also to accelerate the development of GenAI&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/what-is-mariadb-ai-rag-and-what-can-i-do-with-it/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/what-is-mariadb-ai-rag-and-what-can-i-do-with-it/">What is MariaDB AI RAG and ​​what can I do with it?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>A Guide to Accelerating Your Application with Valkey: Caching Database Queries and Sessions</title>
      <link>https://www.percona.com/blog/a-guide-to-accelerating-your-application-with-valkey-caching-database-queries-and-sessions/</link>
      <pubDate>Thu, 19 Feb 2026 15:33:30 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.percona.com/blog/">Percona Database Performance Blog</source>
      <description><![CDATA[<p>Modern applications often rely on multiple services to provide fast, reliable, and scalable responses. A common and highly effective architecture involves an application, a persistent database (like MySQL), and a high-speed cache service (like Valkey). In this guide, we’ll explore how to integrate these components effectively using Python to dramatically improve your application’s performance. Understanding […]</p>
<p>The post <a rel="nofollow" href="https://www.percona.com/blog/a-guide-to-accelerating-your-application-with-valkey-caching-database-queries-and-sessions/">A Guide to Accelerating Your Application with Valkey: Caching Database Queries and Sessions</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" width="200" height="112" src="https://www.percona.com/blog/wp-content/uploads/2025/09/Redis-License-has-Changed-200x112.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="margin-bottom: 5px;clear:both;max-width: 100%">Modern applications often rely on multiple services to provide fast, reliable, and scalable responses. A common and highly effective architecture involves an application, a persistent database (like MySQL), and a high-speed cache service (like Valkey). In this guide, we&rsquo;ll explore how to integrate these components effectively using Python to dramatically improve your application&rsquo;s performance. Understanding [&hellip;]</p>

<p>The post <a rel="nofollow" href="https://www.percona.com/blog/a-guide-to-accelerating-your-application-with-valkey-caching-database-queries-and-sessions/">A Guide to Accelerating Your Application with Valkey: Caching Database Queries and Sessions</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF)</title>
      <link>https://mariadb.org/mariadb-foundation-releases-alpha-of-the-test-automation-framework-taf/</link>
      <pubDate>Thu, 19 Feb 2026 12:38:44 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>The MariaDB Foundation is releasing the alpha version of the Test Automation Framework (TAF), an open-source benchmarking framework designed for clarity, repeatability, and vendor-neutral testing. TAF provides a structured way to run database benchmarks using consistent workloads, configuration, and reporting pipelines, making results easier to reproduce and discuss. …<br />
Continue reading \"MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF)\"<br />
The post MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF) appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-releases-alpha-of-the-test-automation-framework-taf/">MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF)</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The MariaDB Foundation is releasing the alpha version of the Test Automation Framework (TAF), an open-source benchmarking framework designed for clarity, repeatability, and vendor-neutral testing. TAF provides a structured way to run database benchmarks using consistent workloads, configuration, and reporting pipelines, making results easier to reproduce and discuss. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-foundation-releases-alpha-of-the-test-automation-framework-taf/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF)&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-releases-alpha-of-the-test-automation-framework-taf/">MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF)</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-releases-alpha-of-the-test-automation-framework-taf/">MariaDB Foundation Releases Alpha of the Test Automation Framework (TAF)</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Explaining why throughput varies for Postgres with a CPU-bound Insert Benchmark</title>
      <link>https://smalldatum.blogspot.com/2026/02/explaining-why-throughput-varies-for.html</link>
      <pubDate>Wed, 18 Feb 2026 20:38:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://smalldatum.blogspot.com/">Small Datum</source>
      <description><![CDATA[<p>Throughput for the write-heavy steps of the Insert Benchmark look like a distorted sine wave with Postgres on CPU-bound workloads but not on IO-bound workloads. For the CPU-bound workloads the chart for max response time at N-second intervals for inserts is flat but for deletes it looks like the distorted sine wave. To see the chart for deletes, scroll down from here. So this looks like a problem for deletes and this post starts to explain that.tl;drOnce again, blame vacuumHistory of the Insert BenchmarkLong ago (prior to 2010) the Insert Benchmark was published by Tokutek to highlight things that the TokuDB storage engine was great at. I was working on MySQL at Google at the time and the benchmark was useful to me, however it was written in C++. While the Insert Benchmark is great at showing the benefits of an LSM storage engine, this was years before MyRocks and I was only doing InnoDB at the time, on spinning disks. So I rewrote it in Python to make it easier to modify, and then the Tokutek team improved a few things about my rewrite, and I have been enhancing it slowly since then.Until a few years ago the steps of the benchmark were:load - insert in PK ordercreate 3 secondary indexesdo more inserts as fast as possibledo rate-limited inserts concurrent with range and point queriesThe problem with this approach is that the database size grows forever and that limited for how long I could run the benchmark before running out of storage. So I changed it and the new approach keeps the database at a fixed size after the load. The new workflow is:load - insert in PK ordercreate 3 secondary indexesdo inserts+deletes at the same rate, as fast as possibledo rate-limited inserts+deletes at the same rate concurrent with range and point queriesThe benchmark treats the table like a queue, and when ordered by PK (transactionid) there are inserts at the high end and deletes at the low end. The delete statement currently looks like:    delete from %s where transactionid in        (select transactionid from %s where transactionid >= %d order by transactionid asc limit %d)The delete statement is written like that because it must delete the oldest rows -- the ones that have the smallest value for transactionid. While the process that does deletes has some idea of what that smallest value is, it doesn\'t know it for sure, thus the query. To improve performance it maintains a guess for the value that will be</p>
<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/explaining-why-throughput-varies-for.html">Explaining why throughput varies for Postgres with a CPU-bound Insert Benchmark</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Throughput for the write-heavy steps of the Insert Benchmark looks like a distorted sine wave with Postgres <a href="https://mdcallag.github.io/reports/dec25.ib.pn53.pg.latest.mem.30m.50m.1800s/tput.l.i1.html#pg181_o2nofp.cx10b_c8r32.ips">on CPU-bound</a>&nbsp;workloads but not <a href="https://mdcallag.github.io/reports/dec25.ib.pn53.pg.latest.io.800m.5m.1800s/tput.l.i1.html#pg181_o2nofp.cx10b_c8r32.ips">on IO-bound</a> workloads. For the CPU-bound workloads the chart for <a href="https://mdcallag.github.io/reports/dec25.ib.pn53.pg.latest.mem.30m.50m.1800s/tput.l.i1.html#pg181_o2nofp.cx10b_c8r32.imax">max response time at N-second intervals</a> for inserts is flat but for deletes it looks like the distorted sine wave. To see the chart for deletes, scroll down <a href="https://mdcallag.github.io/reports/dec25.ib.pn53.pg.latest.mem.30m.50m.1800s/tput.l.i1.html#pg181_o2nofp.cx10b_c8r32.imax">from here</a>. So this looks like a problem for deletes and this post starts to explain that.</p>
<p>tl;dr</p>

<ul style="text-align: left">
<li>Once again, blame vacuum</li>
</ul>
<p><b>History of the Insert Benchmark</b></p>
<p>Long ago (prior to 2010) the <a href="https://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html">Insert Benchmark</a> was published by Tokutek to highlight things that the TokuDB storage engine was great at. I was working on MySQL at Google at the time and the benchmark was useful to me, however it was written in C++. While the Insert Benchmark is great at showing the benefits of an LSM storage engine, this was years before MyRocks and I was only doing InnoDB at the time, on spinning disks. So I rewrote it in Python to make it easier to modify, and then the Tokutek team improved a few things about my rewrite, and I have been enhancing it slowly since then.</p>
<p>Until a few years ago the steps of the benchmark were:</p>

<ul style="text-align: left">
<li>load &ndash; insert in PK order</li>
<li>create 3 secondary indexes</li>
<li>do more inserts as fast as possible</li>
<li>do rate-limited inserts concurrent with range and point queries</li>
</ul>
<div>The problem with this approach is that the database size grows forever and that limited how long I could run the benchmark before running out of storage. So I changed it and the new approach keeps the database at a fixed size after the load. The new workflow is:</div>
<div>
<ul style="text-align: left">
<li>load &ndash; insert in PK order</li>
<li>create 3 secondary indexes</li>
<li>do inserts+deletes at the same rate, as fast as possible</li>
<li>do rate-limited inserts+deletes at the same rate concurrent with range and point queries</li>
</ul>
<div>The benchmark treats the table like a queue, and when ordered by PK (transactionid) there are inserts at the high end and deletes at the low end. The delete statement currently looks like:<br><i>&nbsp; &nbsp; delete from %s where transactionid in</i></div>
<div><i>&nbsp; &nbsp; &nbsp; &nbsp; (select transactionid from %s where transactionid &gt;= %d order by transactionid asc limit %d)</i></div>
</div>
<div></div>
<div>The delete statement is written like that because it must delete the oldest rows &mdash; the ones that have the smallest value for transactionid. While the process that does deletes has some idea of what that smallest value is, it doesn&rsquo;t know it for sure, thus the query. To improve performance it maintains a guess for the value that will be &lt;= the real minimum and it updates that guess over time.</div>
<div></div>
<div>I encountered other performance problems with Postgres while figuring out how to maintain that guess and <a href="https://www.google.com/search?q=site%3Asmalldatum.blogspot.com+get_actual_variable_range">get_actual_variable_range() in Postgres</a> was the problem. Maintaining that guess requires a resync query every N seconds where the resync query is:&nbsp;<i>select min(transactionid) from %s</i>. The problem for this query in general is that it scans the low end of the PK index on transactionid and when vacuum hasn&rsquo;t been done recently, then it will scan and skip many entries that aren&rsquo;t visible (wasting much CPU and some IO) before finding visible rows. Unfortunately, there will be some time between consecutive vacuums to the same table and this problem can&rsquo;t be avoided. The result is that the response time for the query increases a lot in between vacuums. For more on how get_actual_variable_range() contributes to this problem, see <a href="https://smalldatum.blogspot.com/2024/01/explaining-performance-regression-in.html">this post</a>.
<p>I assume the sine wave for delete response time is caused by one or both of:</p>
<ul style="text-align: left">
<li>get_actual_variable_range() CPU overhead while planning the delete statement</li>
<li>CPU overhead from scanning and skipping tombstones while executing the select subquery</li>
</ul>
</div>
<div>The structure of the delete statement above reduces the number of tombstones that the select subquery might encounter by specifying where <i>transactionid &gt;= %d</i>. Perhaps that isn&rsquo;t sufficient. Perhaps the Postgres query planner still has too much CPU overhead from get_actual_variable_range() while planning that delete statement. I have yet to figure that out. But I have figured out that vacuum is a frequent source of problems.</div>
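<p>To make this concrete, below is a minimal sketch of the resync query plus one delete batch with the placeholders filled in. The table name (transactions), the batch size (1000) and the bound value are hypothetical examples, not the benchmark&rsquo;s actual values.</p>
<pre>-- Resync query, run every N seconds, to refresh the lower-bound guess
select min(transactionid) from transactions;

-- Delete a batch of the oldest rows. The guessed bound (here 12345) lets the
-- subquery start near the low end of the PK index instead of scanning and
-- skipping tombstones from the very beginning.
delete from transactions
where transactionid in
  (select transactionid from transactions
   where transactionid &gt;= 12345
   order by transactionid asc
   limit 1000);</pre>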

<div></div>

<div>
<ul style="text-align: left"></ul>
</div>

<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/explaining-why-throughput-varies-for.html">Explaining why throughput varies for Postgres with a CPU-bound Insert Benchmark</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Updated MariaDB R2DBC and Node.js Connectors now available</title>
      <link>https://mariadb.com/resources/blog/updated-mariadb-r2dbc-and-node-js-connectors-now-available-2/</link>
      <pubDate>Wed, 18 Feb 2026 17:43:17 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>MariaDB is pleased to announce the immediate availability of MariaDB Connector/R2DBC 1.4.0, and Connector/Node.js 3.5.1. Both are Stable (GA) releases. […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/updated-mariadb-r2dbc-and-node-js-connectors-now-available-2/">Updated MariaDB R2DBC and Node.js Connectors now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>MariaDB is pleased to announce the immediate availability of MariaDB Connector/R2DBC 1.4.0, and Connector/Node.js 3.5.1. Both are Stable (GA) releases. Download Now See the release notes and changelogs for more details and visit mariadb.com/downloads/connectors to download.</p>
<p><a href="https://mariadb.com/resources/blog/updated-mariadb-r2dbc-and-node-js-connectors-now-available-2/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/updated-mariadb-r2dbc-and-node-js-connectors-now-available-2/">Updated MariaDB R2DBC and Node.js Connectors now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>PostgreSQL minor release postponed in Q1’ 2026</title>
      <link>https://percona.community/blog/2026/02/18/postgresql-minor-release-postponed-in-q1-2026/</link>
      <pubDate>Wed, 18 Feb 2026 11:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>In case you are awaiting the February PostgreSQL Community minor update released on plan on February 12 we want to make sure that our users and customers are up to date and aware of what to expect.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/18/postgresql-minor-release-postponed-in-q1-2026/">PostgreSQL minor release postponed in Q1’ 2026</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In case you are awaiting the February PostgreSQL Community minor update <a href="https://www.postgresql.org/about/news/postgresql-182-178-1612-1516-and-1421-released-3235/" target="_blank" rel="noopener noreferrer">released as planned on February 12</a>, we want to make sure that our users and customers are up to date and aware of what to expect.</p>
<p>This scheduled PostgreSQL release was delivered by the PostgreSQL Community on time and came carrying 5 CVE fixes and over 65 bug fixes.</p>
<p>Unfortunately shortly after, the <a href="https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-february-26-2026-3241/" target="_blank" rel="noopener noreferrer">release team announced that an additional out of cycle release</a> is planned for February 26. This follow up release addresses two regressions identified in the February 12 update.</p>
<p>Because of this, we have decided not to ship a <a href="https://docs.percona.com/postgresql/18/" target="_blank" rel="noopener noreferrer">Percona Distribution for PostgreSQL</a> build based on the February 12 release. Instead, we will wait for the February 26 Community update and base our release on that version once it becomes available from PGDG. This means also a delay in the release of Percona Operator for PostgreSQL that uses images based on our PostgreSQL releases.</p>
<h3 id="always-look-on-the-bright-side">Always look on the bright side<a class="anchor-link" id="always-look-on-the-bright-side"></a></h3>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/Jan-always-Feb17.png" alt="&nbsp;"></figure>
</p>
<p>While this is a release delay, it comes with some benefits. For our users and customers, this means a cleaner upgrade path. Rather than releasing February 12 now and asking you to update again shortly after, we prefer to wait and deliver a single update that includes the fixes. Our goal is to make updates predictable and smooth for users of Percona Distribution for PostgreSQL, as well as extensions such as <a href="https://github.com/percona/pg_tde" target="_blank" rel="noopener noreferrer">pg_tde</a> and <a href="https://github.com/percona/pg_stat_monitor" target="_blank" rel="noopener noreferrer">pg_stat_monitor</a>. It should also spare you the added operational burden of the extra maintenance that another update would require.</p>
<p>We appreciate how quickly the PostgreSQL Community identified and addressed the regressions. Open collaboration across the ecosystem, including reports and testing from many contributors, helps ensure PostgreSQL continues to improve for everyone.</p>
<h3 id="path-forward">Path forward<a class="anchor-link" id="path-forward"></a></h3>
<p>This is the third out of cycle release in the past year, following similar updates in <a href="https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-november-21-2024-2958/" target="_blank" rel="noopener noreferrer">November 2024</a> and <a href="https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-february-20-2025-3016/" target="_blank" rel="noopener noreferrer">February 2025</a>. It highlights how responsive and diligent the PostgreSQL Community is when issues are identified. At the same time, it reminds us all how important continuous testing and collaboration are as PostgreSQL adoption continues to grow. Contributing to PostgreSQL, whether through testing, reporting, or development, is one of the best ways to help strengthen quality across the ecosystem.</p>
<h3 id="elephants-keep-ears-open">Elephants keep ears open<a class="anchor-link" id="elephants-keep-ears-open"></a></h3>
<p>As soon as the February 26 release is available and our builds are ready, we will share the update.</p>
<p>If you need to move forward with the February 12 version in the meantime, please reach out. We are happy to talk through your situation and help you assess what makes the most sense for your environment.</p>
<p>Our customers can contact us through Percona Support Services to receive the high quality assistance we are known for. We also encourage community users to reach out via the <a href="https://forums.percona.com/" target="_blank" rel="noopener noreferrer">Percona Community Forums</a>, where we will do our best to provide guidance based on the information you share.</p>
<p>Thank you for your trust and for being part of the Percona community.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/18/postgresql-minor-release-postponed-in-q1-2026/">PostgreSQL minor release postponed in Q1’ 2026</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Let’s go physical(ly separated)</title>
      <link>https://medium.com/@arbaudie.it/lets-go-physical-ly-separated-4053ae5be14c?source=rss-c779d007e7fe------2</link>
      <pubDate>Wed, 18 Feb 2026 09:44:18 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://medium.com/@arbaudie.it?source=rss-c779d007e7fe------2">Stories by ArBauDie.IT on Medium</source>
      <description><![CDATA[<p>In my previous article , we discussed best practices for securing and monitoring admin access, such as role-based access controls (RBAC). It’s a good time to have a quick reminder about AAA aka “ Authentication, Authorization and Accounting” , which is a framework designed to manage access to networked resources and ensure secure, controlled interactions :Authentication : Verifies the identity of a user or system attempting to access a resource. This step involves credentials to confirm legitimacy before granting access.Authorization : Determines what a user or system is allowed to do after authentication. It sets permissions, defining the actions or resources that can be accessed based on the user’s role, ensuring they only interact with the data or systems for which they have clearance.Accounting : Tracks and logs user activities within a system, providing a record of what actions were taken, when, and by whom. This audit trail helps in monitoring, troubleshooting, and detecting unauthorized or malicious behavior.Altogether, AAA enhances security by ensuring that only authenticated users can access specific resources, with their actions monitored and recorded for accountability. And that’s exactly what we did with use of personal logins, role based access control and using the audit log to track admin activity.We know that admins aren’t the only one being able to temper with or gaining unwanted access to datas : any user is a potential risk for our database. Obviously, the previous advices still applies but role based access can prove to be inefficient in case of fine grained access control.Let’s imagine we are in a high security line of business and users should have very strict and tight access control. One way to prevent malicous users to circumvent the filtering si to physically isolate the user from the data. But can we dot it with MariaDB ?First step, we create a user as mentionned in my previous article.CREATE USER \'IT-O\'@\'Coruscant\' IDENTIFIED via ed25519 REQUIRE SSL WITH MAX_USER_CONNECTIONS 1 PASSWORD EXPIRE INTERVAL xx DAYS;Second step, we create a dedicated schema using his user token.Third step, we are gonna use to match the tables. We are now leveraging 3 parameters of said views : , DEFINER and SQL SECURITY :ALGORITHM allows us to tell how to execute the view. Here we will be choosing to make use of a temporary table to store the result of the view. While not the fastest, it has the good taste to make the view unwritable, on top of not having granted INSERT,UPDATE/DELETE privileges to the user,DEFINER allows us to give “ownership” of the view to a specific account,SQL SECURITY allows us to have the view executed with the DEFINER set of privileges instead of the INVOKER (aka the user) one.CREATE VIEW `IT-O`.BoobyTable ALGORITHM=temptable DEFINER=`locked.admin.account`@`localhost` SQL SECURITY=DEFINER AS SELECT necessary,columns,only FROM RealSchema.RealTable WHERE RestrictionClauses=values;Fourth thing, said user needs to manipulate the underlying table datas. We could theoritically allow the user to write through the view . But the control over its action is then limited, hence i prefer creating ad hoc that will emulate the desired actions. With those objects we can leverage the same last 2 parameters( stored routines ) as with views to ensure a good isolation. 
DEFINER &#038; SQL SECURITYDELIMITER // CREATE OR REPLACE DEFINER=`locked.admin.account`@`localhost` PROCEDURE `IT-O`.Action_BoobyTable (IN param_name type , OUT param_name type) SQL SECURITY=DEFINER BEGIN ACTIONS in SQL/PSM or PL/SQL END// DELIMITER ;Fifth step, we give the permissions to said user over the objects it needs.CREATE ROLE `IT-O`; GRANT SELECT on `IT-O`.* to `IT-O`; GRANT EXECUTE on `IT-O`.necesary_procs to `IT-O`; GRANT `IT-O` to `IT-O`@`Coruscant`; SET DEFAULT ROLE `IT-O` for `IT-O`@`Coruscant`;And now we have a user which acces is tightly controled and monitored even tho he has direct access to the database. Of course this could also be upgraded making him flow through on a dedicated service with , general login , query throttling , and possibly also resultset size limitation . inserting connexion IP in the statement data maskingOf course, deployment of such an internal architecture can be partially automated since it is mostly linked to parametrizing.Special thanks to Federico Razzoli for mentionning the use of a locked admin account as DEFINER.Originally published at https://www.linkedin.com.</p>
<p>The post <a rel="nofollow" href="https://medium.com/@arbaudie.it/lets-go-physical-ly-separated-4053ae5be14c?source=rss-c779d007e7fe------2">Let’s go physical(ly separated)</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<figure><img decoding="async" alt="" src="https://cdn-images-1.medium.com/max/1024/0*Ox7hBbPR7lTe6cyq"></figure>
<p>In my <a href="https://medium.com/@arbaudie.it/cave-adminem-7c97503f289c">previous article</a>&nbsp;, we discussed best practices for securing and monitoring admin access, such as role-based access controls (RBAC). It&rsquo;s a good time to have a quick reminder about AAA aka &ldquo; Authentication, Authorization and Accounting&rdquo;&nbsp;, which is a framework designed to manage access to networked resources and ensure secure, controlled interactions&nbsp;:</p>
<ol>
<li>Authentication&nbsp;: Verifies the identity of a user or system attempting to access a resource. This step involves credentials to confirm legitimacy before granting&nbsp;access.</li>
<li>Authorization&nbsp;: Determines what a user or system is allowed to do after authentication. It sets permissions, defining the actions or resources that can be accessed based on the user&rsquo;s role, ensuring they only interact with the data or systems for which they have clearance.</li>
<li>Accounting&nbsp;: Tracks and logs user activities within a system, providing a record of what actions were taken, when, and by whom. This audit trail helps in monitoring, troubleshooting, and detecting unauthorized or malicious behavior.</li>
</ol>
<p>Altogether, AAA enhances security by ensuring that only authenticated users can access specific resources, with their actions monitored and recorded for accountability. And that&rsquo;s exactly what we did with use of personal logins, role based access control and using the audit log to track admin activity.</p>
<p>We know that admins aren&rsquo;t the only ones able to tamper with or gain unwanted access to data: any user is a potential risk for our database. Obviously, the previous advice still applies, but role-based access can prove to be inefficient when fine-grained access&nbsp;control is needed.</p>
<p>Let&rsquo;s imagine we are in a high-security line of business and users should have very strict and tight access control. One way to prevent malicious users from circumventing the filtering is to physically isolate the user from the data. But can we do it with MariaDB?</p>
<p>First step, we create a user as mentioned in my previous&nbsp;article.</p>
<pre>CREATE USER 'IT-O'@'Coruscant' <br>IDENTIFIED via ed25519 REQUIRE SSL <br>WITH MAX_USER_CONNECTIONS 1 <br>PASSWORD EXPIRE INTERVAL xx DAYS;</pre>
<p>Second step, we create a dedicated schema using his user&nbsp;token.</p>
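<p>A minimal sketch of what this step could look like, assuming the schema is simply named after the user (the exact statement is not shown here):</p>
<pre>-- Hypothetical example: a dedicated schema carrying the user's name, <br>-- kept separate from the schema holding the real tables.<br>CREATE DATABASE IF NOT EXISTS `IT-O`;</pre>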
<p>Third step, we are going to use views to match the tables. We are now leveraging 3 parameters of said views: ALGORITHM, DEFINER and SQL SECURITY:</p>
<ul>
<li>ALGORITHM allows us to tell how to execute the view. Here we will be choosing to make use of a temporary table to store the result of the view. While not the fastest, it has the good taste to make the view unwritable, on top of not having granted INSERT, UPDATE or DELETE privileges to the&nbsp;user,</li>
<li>DEFINER allows us to give &ldquo;ownership&rdquo; of the view to a specific&nbsp;account,</li>
<li>SQL SECURITY allows us to have the view executed with the DEFINER set of privileges instead of the INVOKER (aka the user)&nbsp;one.</li>
</ul>
<pre>CREATE ALGORITHM=TEMPTABLE <br>DEFINER=`locked.admin.account`@`localhost` <br>SQL SECURITY DEFINER <br>VIEW `IT-O`.BoobyTable AS <br>SELECT necessary,columns,only <br>FROM RealSchema.RealTable <br>WHERE RestrictionClauses=values;</pre>
<p>Fourth step, said user needs to manipulate the underlying table data. We could theoretically allow the user to <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Finserting-and-updating-with-views%2F&amp;urlhash=228t&amp;trk=article-ssr-frontend-pulse_little-text-block">write through the view</a>. But the control over its actions is then limited, hence I prefer creating ad hoc <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fstored-procedures%2F&amp;urlhash=zaDr&amp;trk=article-ssr-frontend-pulse_little-text-block">stored routines</a> that will emulate the desired actions. With those objects we can leverage the same last 2 parameters as with views, <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fstored-routine-privileges%2F%23definer-clause&amp;urlhash=nYug&amp;trk=article-ssr-frontend-pulse_little-text-block">DEFINER &amp; SQL&nbsp;SECURITY</a>, to ensure good isolation.</p>
<pre>DELIMITER // <br>CREATE OR REPLACE DEFINER=`locked.admin.account`@`localhost` <br>PROCEDURE `IT-O`.Action_BoobyTable (IN param_name type , OUT param_name type) <br>SQL SECURITY DEFINER <br>BEGIN <br>-- ACTIONS in SQL/PSM or PL/SQL <br>END// <br>DELIMITER ;</pre>
<p>Fifth step, we give the permissions to said user over the objects it&nbsp;needs.</p>
<pre>CREATE ROLE `IT-O`; <br>GRANT SELECT on `IT-O`.* to `IT-O`; <br>GRANT EXECUTE on PROCEDURE `IT-O`.Action_BoobyTable to `IT-O`; <br>GRANT `IT-O` to `IT-O`@`Coruscant`; <br>SET DEFAULT ROLE `IT-O` for `IT-O`@`Coruscant`;</pre>
<p>And now we have a user whose access is tightly controlled and monitored even though he has direct access to the database. Of course this could also be upgraded by making him flow through a dedicated service with&nbsp;<a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fmariadb-maxscale-2402-maxscale-2402-query-log-all-filter%2F&amp;urlhash=WLkQ&amp;trk=article-ssr-frontend-pulse_little-text-block">general query logging</a>, <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fmariadb-maxscale-2402-maxscale-2402-throttle%2F&amp;urlhash=l87J&amp;trk=article-ssr-frontend-pulse_little-text-block">query throttling</a>, and possibly also <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fmariadb-maxscale-2402-maxscale-2402-maxrows%2F&amp;urlhash=wlll&amp;trk=article-ssr-frontend-pulse_little-text-block">resultset size limitation</a>, <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fmariadb-maxscale-2402-maxscale-2402-comment-filter%2F%23example-1-inject-ip-address-of-the-connected-client-into-statements&amp;urlhash=YSCO&amp;trk=article-ssr-frontend-pulse_little-text-block">inserting the connection IP in the statement</a> and <a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fmariadb%2Ecom%2Fkb%2Fen%2Fmariadb-maxscale-2402-maxscale-2402-masking%2F&amp;urlhash=Nr7I&amp;trk=article-ssr-frontend-pulse_little-text-block">data&nbsp;masking</a>.</p>
<p>Of course, deployment of such an internal architecture can be partially automated since it is mostly a matter of parametrization.</p>
<p>Special thanks to <a href="https://uk.linkedin.com/in/federicorazzoli?trk=article-ssr-frontend-pulse_little-mention">Federico Razzoli </a>for mentioning the use of a locked admin account as&nbsp;DEFINER.</p>
<p><em>Originally published at </em><a href="https://www.linkedin.com/pulse/lets-go-physically-separated-sylvain-arbaudie-yzu3f/"><em>https://www.linkedin.com</em></a><em>.</em></p>
<p><img decoding="async" loading="lazy" src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=4053ae5be14c" width="1" height="1" alt=""></p>

<p>The post <a rel="nofollow" href="https://medium.com/@arbaudie.it/lets-go-physical-ly-separated-4053ae5be14c?source=rss-c779d007e7fe------2">Let’s go physical(ly separated)</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Foundation Welcomes Epsio as a Silver Sponsor</title>
      <link>https://mariadb.org/mariadb-foundation-welcomes-epsio-as-a-silver-sponsor/</link>
      <pubDate>Wed, 18 Feb 2026 04:31:12 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>We are delighted to announce that Epsio (https://www.epsio.io/) has joined the MariaDB Foundation as an official silver sponsor.<br />
Epsio’s commitment to innovation and open technology aligns closely with the mission of the MariaDB Foundation: to support and advance the MariaDB ecosystem and strengthen the long-term viability of MariaDB as a reliable, open source database platform. …<br />
Continue reading \"MariaDB Foundation Welcomes Epsio as a Silver Sponsor\"<br />
The post MariaDB Foundation Welcomes Epsio as a Silver Sponsor appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-welcomes-epsio-as-a-silver-sponsor/">MariaDB Foundation Welcomes Epsio as a Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We are delighted to announce that Epsio (<a href="https://www.epsio.io/">https://www.epsio.io/</a>) has joined the MariaDB Foundation as an official silver sponsor.<br>
Epsio&rsquo;s commitment to innovation and open technology aligns closely with the mission of the MariaDB Foundation: to support and advance the MariaDB ecosystem and strengthen the long-term viability of MariaDB as a reliable, open source database platform. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-foundation-welcomes-epsio-as-a-silver-sponsor/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB Foundation Welcomes Epsio as a Silver Sponsor&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-welcomes-epsio-as-a-silver-sponsor/">MariaDB Foundation Welcomes Epsio as a Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-foundation-welcomes-epsio-as-a-silver-sponsor/">MariaDB Foundation Welcomes Epsio as a Silver Sponsor</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB innovation: binlog_storage_engine, small server, Insert Benchmark</title>
      <link>https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine_17.html</link>
      <pubDate>Wed, 18 Feb 2026 04:20:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://smalldatum.blogspot.com/">Small Datum</source>
      <description><![CDATA[<p> MariaDB 12.3 has a new feature enabled by the option binlog_storage_engine. When enabled it uses InnoDB instead of raw files to store the binlog. A big benefit from this is reducing the number of fsync calls per commit from 2 to 1 because it reduces the number of resource managers from 2 (binlog, InnoDB) to 1 (InnoDB).My previous post had results for sysbench with a small server. This post has results for the Insert Benchmark with a similar small server. Both servers use an SSD that has has high fsync latency. This is probably a best-case comparison for the feature. If you really care, then get enterprise SSDs with power loss protection. But you might encounter high fsync latency on public cloud servers.tl;dr for a CPU-bound workloadEnabling sync on commit for InnoDB and the binlog has a large impact on throughput for the write-heavy steps -- l.i0, l.i1 and l.i2.When sync on commit is enabled, then also enabling the binlog_storage_engine is great for performance as throughput on the write-heavy steps is 1.75X larger for l.i0 (load) and 4X or more larger on the random write steps (l.i1, l.i2)tl;dr for an IO-bound workloadEnabling sync on commit for InnoDB and the binlog has a large impact on throughput for the write-heavy steps -- l.i0, l.i1 and l.i2. It also has a large impact on qp1000, which is the most write-heavy of the query+write steps.When sync on commit is enabled, then also enabling the binlog_storage_engine is great for performance as throughput on the write-heavy steps is 4.74X larger for l.i0 (load), 1.50X larger for l.i1 (random writes) and 2.99X larger for l.i2 (random writes)Builds, configuration and hardwareI compiled MariaDB 12.3.0 from source.The server is an ASUS ExpertCenter PN53 with an AMD Ryzen 7 7735HS CPU, 8 cores, SMT disabled, and 32G of RAM. Storage is one NVMe device for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04. More details on it are here. The storage device has high fsync latency.I used 4 my.cnf files:z12bmy.cnf.cz12b_c8r32 is my default configuration. Sync-on-commit is disabled for both the binlog and InnoDB so that write-heavy benchmarks create more stress.z12cmy.cnf.cz12c_c8r32 is like z12b except it enables binlog_storage_enginez12b_syncmy.cnf.cz12b_sync_c8r32 is like z12b except it enables sync-on-commit for the binlog and InnoDBz12c_syncmy.cnf.cz12c_sync_c8r32 is like cz12c except it enables sync-on-commit for InnoDB. Note that InnoDB is used to store the binlog so there is nothing else to sync on commit.The BenchmarkThe benchmark is explained here. It was run with 1 client for two workloads:CPU-bound - the database is cached by InnoDB, but there is still much write IOIO-bound - most, but not all, benchmark steps are IO-boundThe benchmark steps are:l.i0insert XM rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. X is 30M for CPU-bound and 800M for IO-bound.l.xcreate 3 secondary indexes per table. There is one connection per client.l.i1use 2 connections/client. One inserts XM rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. X is 40M for CPU-bound and 4M for IO-bound.l.i2like l.i1 but each transaction modifies 5 rows (small transactions) and YM rows are inserted and deleted per table. 
Y is 10M for CPU-bound and 1M for IO-bound.Wait for S seconds after the step finishes to reduce MVCC GC debt and perf variance during the read-write benchmark steps that follow. The value of S is a function of the table size.qr100use 3 connections/client. One does range queries and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload. This step runs for 1800 seconds.qp100like qr100 except uses point queries on the PK indexqr500like qr100 but the insert and delete rates are increased from 100/s to 500/sqp500like qp100 but the insert and delete rates are increased from 100/s to 500/sqr1000like qr100 but the insert and delete rates are increased from 100/s to 1000/sqp1000like qp100 but the insert and delete rates are increased from 100/s to 1000/sResults: summaryResults: summaryThe performance reports are here for:CPU-boundall-versions - results for z12b, z12c, z12b_sync and z12c_syncsync-only - results for z12b_sync vs 12c_syncIO-boundall-versions - results for z12b, z12c, z12b_sync and z12c_syncsync-only - results for z12b_sync vs 12c_syncThe summary sections from the performance reports have 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version from the first row of the table. The third shows the background insert rate for benchmark steps with background inserts. The second table makes it easy to see how performance changes over time. The third table makes it easy to see which DBMS+configs failed to meet the SLA.I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is the result for some version $base is the result from the base version. The base version is Postgres 12.22.When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: insert/s for l.i0, l.i1, l.i2indexed rows/s for l.xrange queries/s for qr100, qr500, qr1000point queries/s for qp100, qp500, qp1000Below I use colors to highlight the relative QPS values with yellow for regressions and blue for improvements.I often use context switch rates as a proxy for mutex contention.Results: CPU-boundThe summaries are here for all-versions and sync-only.Enabling sync on commit for InnoDB and the binlog has a large impact on throughput for the write-heavy steps -- l.i0, l.i1 and l.i2.When sync on commit is enabled, then also enabling the binlog_storage_engine is great for performance as throughput on the write-heavy steps is 1.75X larger for l.i0 (load) and 4X or more larger on the random write steps (l.i1, l.i2)The second table from the summary section has been inlined below. 
That table shows relative throughput which is:all-versions: (QPS for my config / QPS for z12b)sync-only: (QPS for my z12c / QPS for z12b)For all-versionsdbmsl.i0l.xl.i1l.i2qr100qp100qr500qp500qr1000qp1000ma120300_rel_withdbg.cz12b_c8r321.001.001.001.001.001.001.001.001.001.00ma120300_rel_withdbg.cz12c_c8r321.031.011.001.031.000.991.001.001.011.00ma120300_rel_withdbg.cz12b_sync_c8r320.041.020.070.011.011.011.001.011.001.00ma120300_rel_withdbg.cz12c_sync_c8r320.081.030.280.061.021.011.011.021.021.01For sync-onlydbmsl.i0l.xl.i1l.i2qr100qp100qr500qp500qr1000qp1000ma120300_rel_withdbg.cz12b_sync_c8r321.001.001.001.001.001.001.001.001.001.00ma120300_rel_withdbg.cz12c_sync_c8r321.751.013.996.831.011.011.011.011.031.01Results: IO-boundThe summaries are here for all-versions and sync-only.Enabling sync on commit for InnoDB and the binlog has a large impact on throughput for the write-heavy steps -- l.i0, l.i1 and l.i2. It also has a large impact on qp1000, which is the most write-heavy of the query+write steps.When sync on commit is enabled, then also enabling the binlog_storage_engine is great for performance as throughput on the write-heavy steps is 4.74X larger for l.i0 (load), 1.50X larger for l.i1 (random writes) and 2.99X larger for l.i2 (random writes)The second table from the summary section has been inlined below. That table shows relative throughput which is:all-versions: (QPS for my config / QPS for z12b)sync-only: (QPS for my z12c / QPS for z12b)For all-versionsdbmsl.i0l.xl.i1l.i2qr100qp100qr500qp500qr1000qp1000ma120300_rel_withdbg.cz12b_c8r321.001.001.001.001.001.001.001.001.001.00ma120300_rel_withdbg.cz12c_c8r321.010.990.991.011.011.011.011.071.011.04ma120300_rel_withdbg.cz12b_sync_c8r320.041.000.550.101.020.971.000.800.950.55ma120300_rel_withdbg.cz12c_sync_c8r320.181.000.830.311.021.011.020.961.020.86For sync-onlydbmsl.i0l.xl.i1l.i2qr100qp100qr500qp500qr1000qp1000ma120300_rel_withdbg.cz12b_sync_c8r321.001.001.001.001.001.001.001.001.001.00ma120300_rel_withdbg.cz12c_sync_c8r324.741.001.502.991.001.041.021.201.081.57
</p>
<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine_17.html">MariaDB innovation: binlog_storage_engine, small server, Insert Benchmark</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>&nbsp;MariaDB 12.3 has a new feature enabled by the option&nbsp;<a href="https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/">binlog_storage_engine</a>. When enabled it uses InnoDB instead of raw files to store the binlog. A big benefit from this is reducing the number of fsync calls per commit from 2 to 1 because it reduces the number of resource managers from 2 (binlog, InnoDB) to 1 (InnoDB).</p>
<p>My <a href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine.html">previous post</a> had results for sysbench with a small server. This post has results for the Insert Benchmark with a similar small server. Both servers use an SSD that has has <a href="https://smalldatum.blogspot.com/2026/01/ssds-power-loss-protection-and-fsync.html">high fsync latency</a>. This is probably a best-case comparison for the feature. If you really care, then get enterprise SSDs with power loss protection. But you might encounter high fsync latency on public cloud servers.</p>
<p>tl;dr for a CPU-bound workload</p>

<ul style="text-align: left">
<li>Enabling sync on commit for InnoDB and the binlog has a large impact on throughput for the write-heavy steps &mdash; l.i0, l.i1 and l.i2.</li>
<li>When sync on commit is enabled, then also enabling the binlog_storage_engine is great for performance as throughput on the write-heavy steps is 1.75X larger for l.i0 (load) and 4X or more larger on the random write steps (l.i1, l.i2)</li>
</ul>
<div>tl;dr for an IO-bound workload</div>
<div>
<ul style="text-align: left">
<li>Enabling sync on commit for InnoDB and the binlog has a large impact on throughput for the write-heavy steps &mdash; l.i0, l.i1 and l.i2. It also has a large impact on qp1000, which is the most write-heavy of the query+write steps.</li>
<li>When sync on commit is enabled, then also enabling the binlog_storage_engine is great for performance as throughput on the write-heavy steps is 4.74X larger for l.i0 (load), 1.50X larger for l.i1 (random writes) and 2.99X larger for l.i2 (random writes)</li>
</ul>
</div>
<div><b>Builds, configuration and hardware</b></div>
<div>
<div>
<div></div>
<div>I compiled MariaDB 12.3.0 from source.</div>
<div>The server is an ASUS ExpertCenter PN53 with an AMD Ryzen 7 7735HS CPU, 8 cores, SMT disabled, and 32G of RAM. Storage is one NVMe device for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04. More details on it&nbsp;<a href="ASUS%20ExpertCenter%20PN53%20with%20AMD%20Ryzen%207%207735HS,%2032G%20RAM%20and%202%20m.2%20slots%20(one%20for%20OS%20install,%20one%20for%20DB%20perf%20tests)">are here</a>. The storage device has&nbsp;<a href="https://smalldatum.blogspot.com/2026/01/ssds-power-loss-protection-and-fsync.h">high fsync latency</a>.</div>
</div>
<div></div>
<div>I used 4 my.cnf files (see the sketch after this list):</div>
<div>
<ul>
<li>z12b</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12b_c8r32">my.cnf.cz12b_c8r32</a>&nbsp;is my default configuration. Sync-on-commit is disabled for both the binlog and InnoDB so that write-heavy benchmarks create more stress.</li>
</ul>
<li>z12c</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12c_c8r32">my.cnf.cz12c_c8r32</a>&nbsp;is like z12b except it enables binlog_storage_engine</li>
</ul>
<li>z12b_sync</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12b_sync_c8r32">my.cnf.cz12b_sync_c8r32</a>&nbsp;is like z12b except it enables sync-on-commit for the binlog and InnoDB</li>
</ul>
<li>z12c_sync</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12c_sync_c8r32">my.cnf.cz12c_sync_c8r32</a>&nbsp;is like cz12c except it enables sync-on-commit for InnoDB. Note that InnoDB is used to store the binlog so there is nothing else to sync on commit.</li>
</ul>
</ul>
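<div>As a rough sketch of what these configs toggle (the actual settings are in the linked files), sync-on-commit for the binlog and InnoDB likely corresponds to the standard variables shown below; the exact values used in my configs, and the syntax for the new binlog_storage_engine option, are only in the linked files, so the latter appears here only as a comment.</div>
<pre>-- Sketch only: sync-on-commit disabled, as in the z12b / z12c configs
SET GLOBAL sync_binlog = 0;
SET GLOBAL innodb_flush_log_at_trx_commit = 0;

-- Sketch only: sync-on-commit enabled, as in the z12b_sync / z12c_sync configs
-- (for z12c_sync only the InnoDB setting matters, since the binlog lives in InnoDB)
SET GLOBAL sync_binlog = 1;
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- The z12c configs additionally enable binlog_storage_engine in my.cnf;
-- see the linked config files for the exact option syntax.</pre>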
<div>
<div><b>The Benchmark</b></div>
<div>
<div></div>
<div>The benchmark is&nbsp;<a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">explained here</a>. It was run with 1 client for two workloads:</div>
<div>
<ul style="text-align: left">
<li>CPU-bound &ndash; the database is cached by InnoDB, but there is still much write IO</li>
<li>IO-bound &ndash; most, but not all, benchmark steps are IO-bound</li>
</ul>
</div>
<div>The benchmark steps are:</div>
<div>
<div>
<ul>
<li>l.i0</li>
<ul>
<li>insert XM rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. X is 30M for CPU-bound and 800M for IO-bound.</li>
</ul>
<li>l.x</li>
<ul>
<li>create 3 secondary indexes per table. There is one connection per client.</li>
</ul>
<li>l.i1</li>
<ul>
<li>use 2 connections/client. One inserts XM rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. X is 40M for CPU-bound and 4M for IO-bound.</li>
</ul>
<li>l.i2</li>
<ul>
<li>like l.i1 but each transaction modifies 5 rows (small transactions) and YM rows are inserted and deleted per table. Y is 10M for CPU-bound and 1M for IO-bound.</li>
<li>Wait for S seconds after the step finishes to reduce MVCC GC debt and perf variance during the read-write benchmark steps that follow. The value of S is a function of the table size.</li>
</ul>
<li>qr100</li>
<ul>
<li>use 3 connections/client. One does range queries and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload. This step runs for 1800 seconds.</li>
</ul>
<li>qp100</li>
<ul>
<li>like qr100 except uses point queries on the PK index</li>
</ul>
<li>qr500</li>
<ul>
<li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li>
</ul>
<li>qp500</li>
<ul>
<li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li>
</ul>
<li>qr1000</li>
<ul>
<li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li>
</ul>
<li>qp1000</li>
<ul>
<li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li>
</ul>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
<div></div>
<div>
<div>
<div><b>Results: summary</b></div>
<div></div>
<div>The performance reports are here for:</div>
</div>
<div>
<ul style="text-align: left">
<li>CPU-bound</li>
<ul>
<li><a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.syncall.30m.50m.1800s/all.html">all-versions</a> &ndash; results for z12b, z12c, z12b_sync and z12c_sync</li>
<li><a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.synconly.30m.50m.1800s/all.html">sync-only</a> &ndash; results for z12b_sync vs 12c_sync</li>
</ul>
<li>IO-bound</li>
<ul>
<li><a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.syncall.800m.5m.1800s/all.html">all-versions</a> &ndash; results for z12b, z12c, z12b_sync and z12c_sync</li>
<li><a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.synconly.800m.5m.1800s/all.html">sync-only</a> &ndash; results for z12b_sync vs 12c_sync</li>
</ul>
</ul>
</div>
<div>The summary sections from&nbsp;the performance reports have 3 tables. The first shows absolute throughput for each DBMS tested and benchmark step. The second has throughput relative to the config from the first row of the table. The third shows the background insert rate for benchmark steps with background inserts. The second table makes it easy to see how performance changes across configs. The third table makes it easy to see which DBMS+configs failed to meet the SLA.</div>
<div>
<div></div>
<div>I use relative QPS to explain how performance differs. It is: (QPS for $me / QPS for $base) where $me is the result for some config and $base is the result for the base config. The base config is z12b for the all-versions reports and z12b_sync for the sync-only reports. A small sketch of this computation follows the list below.
<p>When relative QPS is &gt; 1.0 then the config is faster than the base. When it is &lt; 1.0 then there is a regression. The Q in relative QPS measures:&nbsp;</p></div>
<div>
<ul>
<li>insert/s for l.i0, l.i1, l.i2</li>
<li>indexed rows/s for l.x</li>
<li>range queries/s for qr100, qr500, qr1000</li>
<li>point queries/s for qp100, qp500, qp1000</li>
</ul>
<div>Below I use colors to highlight the relative QPS values with yellow for regressions and blue for improvements.</div>
</div>
</div>
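<div>As a concrete example of that computation, the sketch below takes made-up QPS numbers for a base config and a test config and classifies each benchmark step the way the colored tables do. The numbers are placeholders, not values from the reports.</div>
<pre style="font-family: courier">
# Relative QPS = QPS(test config) / QPS(base config).
# Values above 1.0 suggest an improvement, values below 1.0 a regression.
# All QPS numbers here are made up for illustration.
base = {"l.i0": 100_000, "l.i1": 20_000, "qp100": 15_000}
test = {"l.i0": 175_000, "l.i1": 80_000, "qp100": 15_100}

for step in base:
    rel = test[step] / base[step]
    if rel > 1.0:
        verdict = "improvement"
    elif rel == 1.0:
        verdict = "no change"
    else:
        verdict = "regression"
    print(f"{step:6s} relative QPS = {rel:.2f} ({verdict})")
</pre>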
<div></div>
<div>I often use context switch rates as a proxy for mutex contention.</div>
</div>
<div></div>
<div><b>Results: CPU-bound</b></div>
<div></div>
<div>The summaries are here for <a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.syncall.30m.50m.1800s/all.html#summary">all-versions</a> and <a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.synconly.30m.50m.1800s/all.html#summary">sync-only</a>.</div>
<div>
<ul style="text-align: left">
<li>Enabling sync on commit for InnoDB and the binlog greatly reduces throughput for the write-heavy steps &mdash; l.i0, l.i1 and l.i2.</li>
<li>When sync on commit is enabled, also enabling binlog_storage_engine is great for performance: throughput on the write-heavy steps is 1.75X larger for l.i0 (load) and 4X or more larger on the random write steps (l.i1, l.i2).</li>
</ul>
</div>
<div>The second table from the summary section has been inlined below. That table shows relative throughput which is:</div>
<div>
<ul style="text-align: left">
<li>all-versions: (QPS for my config / QPS for z12b)</li>
<li>sync-only: (QPS for z12c_sync / QPS for z12b_sync)</li>
</ul>
<div>For all-versions</div>
<div>
<table border="1" cellpadding="8" style="color: black">
<tbody>
<tr>
<th><span style="font-size: xx-small">dbms</span></th>
<th><span style="font-size: xx-small">l.i0</span></th>
<th><span style="font-size: xx-small">l.x</span></th>
<th><span style="font-size: xx-small">l.i1</span></th>
<th><span style="font-size: xx-small">l.i2</span></th>
<th><span style="font-size: xx-small">qr100</span></th>
<th><span style="font-size: xx-small">qp100</span></th>
<th><span style="font-size: xx-small">qr500</span></th>
<th><span style="font-size: xx-small">qp500</span></th>
<th><span style="font-size: xx-small">qr1000</span></th>
<th><span style="font-size: xx-small">qp1000</span></th>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12b_c8r32</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12c_c8r32</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.03</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.03</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">0.99</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12b_sync_c8r32</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.04</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.07</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12c_sync_c8r32</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.08</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.03</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.28</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.06</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
</tr>
</tbody>
</table>
</div>
</div>
<div></div>
<div>For sync-only</div>
<div>
<table border="1" cellpadding="8" style="color: black">
<tbody>
<tr>
<th><span style="font-size: xx-small">dbms</span></th>
<th><span style="font-size: xx-small">l.i0</span></th>
<th><span style="font-size: xx-small">l.x</span></th>
<th><span style="font-size: xx-small">l.i1</span></th>
<th><span style="font-size: xx-small">l.i2</span></th>
<th><span style="font-size: xx-small">qr100</span></th>
<th><span style="font-size: xx-small">qp100</span></th>
<th><span style="font-size: xx-small">qr500</span></th>
<th><span style="font-size: xx-small">qp500</span></th>
<th><span style="font-size: xx-small">qr1000</span></th>
<th><span style="font-size: xx-small">qp1000</span></th>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12b_sync_c8r32</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12c_sync_c8r32</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">1.75</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">3.99</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">6.83</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.03</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
</tr>
</tbody>
</table>
</div>
<div></div>
<div><b>Results: IO-bound</b></div>
<div></div>
<div>The summaries are here for <a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.syncall.800m.5m.1800s/all.html">all-versions</a> and <a href="https://mdcallag.github.io/reports/feb26.ib.mem.pn53.ma1203.synconly.800m.5m.1800s/all.html">sync-only</a>.</div>
<div>
<ul>
<li>Enabling sync on commit for InnoDB and the binlog greatly reduces throughput for the write-heavy steps &mdash; l.i0, l.i1 and l.i2. It also reduces throughput for qp1000, which is the most write-heavy of the query+write steps.</li>
<li>When sync on commit is enabled, also enabling binlog_storage_engine is great for performance: throughput on the write-heavy steps is 4.74X larger for l.i0 (load), 1.50X larger for l.i1 (random writes) and 2.99X larger for l.i2 (random writes).</li>
</ul>
</div>
<div>The second table from the summary section has been inlined below. That table shows relative throughput which is:</div>
<div>
<div>
<ul>
<li>all-versions: (QPS for my config / QPS for z12b)</li>
<li>sync-only: (QPS for z12c_sync / QPS for z12b_sync)</li>
</ul>
<div>For all-versions</div>
</div>
</div>
<div>
<table border="1" cellpadding="8" style="color: black">
<tbody>
<tr>
<th><span style="font-size: xx-small">dbms</span></th>
<th><span style="font-size: xx-small">l.i0</span></th>
<th><span style="font-size: xx-small">l.x</span></th>
<th><span style="font-size: xx-small">l.i1</span></th>
<th><span style="font-size: xx-small">l.i2</span></th>
<th><span style="font-size: xx-small">qr100</span></th>
<th><span style="font-size: xx-small">qp100</span></th>
<th><span style="font-size: xx-small">qr500</span></th>
<th><span style="font-size: xx-small">qp500</span></th>
<th><span style="font-size: xx-small">qr1000</span></th>
<th><span style="font-size: xx-small">qp1000</span></th>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12b_c8r32</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12c_c8r32</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">0.99</span></td>
<td style="text-align: right"><span style="font-size: xx-small">0.99</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">1.07</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.04</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12b_sync_c8r32</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.04</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.55</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.10</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td style="text-align: right"><span style="font-size: xx-small">0.97</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.80</span></td>
<td style="text-align: right"><span style="font-size: xx-small">0.95</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.55</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12c_sync_c8r32</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.18</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.83</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.31</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.01</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td style="text-align: right"><span style="font-size: xx-small">0.96</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td id="clo" style="background-color: #ffdd81;text-align: right"><span style="font-size: xx-small">0.86</span></td>
</tr>
</tbody>
</table>
</div>
<div></div>
<div>For sync-only</div>
<div>
<table border="1" cellpadding="8" style="color: black">
<tbody>
<tr>
<th><span style="font-size: xx-small">dbms</span></th>
<th><span style="font-size: xx-small">l.i0</span></th>
<th><span style="font-size: xx-small">l.x</span></th>
<th><span style="font-size: xx-small">l.i1</span></th>
<th><span style="font-size: xx-small">l.i2</span></th>
<th><span style="font-size: xx-small">qr100</span></th>
<th><span style="font-size: xx-small">qp100</span></th>
<th><span style="font-size: xx-small">qr500</span></th>
<th><span style="font-size: xx-small">qp500</span></th>
<th><span style="font-size: xx-small">qr1000</span></th>
<th><span style="font-size: xx-small">qp1000</span></th>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12b_sync_c8r32</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
</tr>
<tr>
<td style="text-align: right"><span style="font-size: xx-small">ma120300_rel_withdbg.cz12c_sync_c8r32</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">4.74</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">1.50</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">2.99</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.00</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.04</span></td>
<td style="text-align: right"><span style="font-size: xx-small">1.02</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">1.20</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">1.08</span></td>
<td id="chi" style="background-color: #81fff9;text-align: right"><span style="font-size: xx-small">1.57<br></span></td>
</tr>
</tbody>
</table>
</div>
<div></div>

<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine_17.html">MariaDB innovation: binlog_storage_engine, small server, Insert Benchmark</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>An Open Letter to Oracle: Let’s Talk About MySQL’s Future</title>
      <link>https://www.percona.com/blog/an-open-letter-to-oracle-lets-talk-about-mysqls-future/</link>
      <pubDate>Tue, 17 Feb 2026 15:07:32 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.percona.com/blog/">Percona Database Performance Blog</source>
      <description><![CDATA[<p>What Happened at the Summits We just wrapped up two MySQL Community Summits – one in San Francisco in January, and one in Brussels right before FOSDEM. The energy in the rooms: a lot of people who care deeply about MySQL got together, exchanged ideas, and left with a clear sense that we need to […]</p>
<p>The post <a rel="nofollow" href="https://www.percona.com/blog/an-open-letter-to-oracle-lets-talk-about-mysqls-future/">An Open Letter to Oracle: Let’s Talk About MySQL’s Future</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" width="200" height="86" src="https://www.percona.com/blog/wp-content/uploads/2025/07/Can-AI-Talk-to-My-Database-Part-Two-MySQL-and-Gemini-200x86.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="An Open Letter to Oracle" style="margin-bottom: 5px;clear:both;max-width: 100%">What Happened at the Summits We just wrapped up two MySQL Community Summits &ndash; one in San Francisco in January, and one in Brussels right before FOSDEM. The energy in the rooms: a lot of people who care deeply about MySQL got together, exchanged ideas, and left with a clear sense that we need to [&hellip;]</p>

<p>The post <a rel="nofollow" href="https://www.percona.com/blog/an-open-letter-to-oracle-lets-talk-about-mysqls-future/">An Open Letter to Oracle: Let’s Talk About MySQL’s Future</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB innovation: binlog_storage_engine</title>
      <link>https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine.html</link>
      <pubDate>Mon, 16 Feb 2026 19:06:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://smalldatum.blogspot.com/">Small Datum</source>
      <description><![CDATA[<p>MariaDB 12.3 has a new feature enabled by the option binlog_storage_engine. When enabled it uses InnoDB instead of raw files to store the binlog. A big benefit from this is reducing the number of fsync calls per commit from 2 to 1 because it reduces the number of resource managers from 2 (binlog, InnoDB) to 1 (InnoDB).In this post I have results for the performance benefit from this when using storage that has a high fsync latency. This is probably a best-case comparison for the feature. A future post will cover the benefit on servers that don\'t have high fsync latency.tl;drthe performance benefit from this is excellent when storage has a high fsync latencymy mental performance model needs to be improved. I gussed that throughput would increase by ~2X when using binlog_storage_engine relative to not using it but using sync_binlog=1 and innodb_flush_log_at_trx_commit=1. However the improvement is larger than 4X.Builds, configuration and hardwareI compiled MariaDB 12.3.0 from source.The server is an ASUS ExpertCenter PN53 with an AMD Ryzen 7 7735HS CPU, 8 cores, SMT disabled, and 32G of RAM. Storage is one NVMe device for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04. More details on it are here. The storage device has high fsync latency.I used 4 my.cnf files:z12bmy.cnf.cz12b_c8r32 is my default configuration. Sync-on-commit is disabled for both the binlog and InnoDB so that write-heavy benchmarks create more stress.z12cmy.cnf.cz12c_c8r32 is like z12b except it enables binlog_storage_enginez12b_syncmy.cnf.cz12b_sync_c8r32 is like z12b except it enables sync-on-commit for the binlog and InnoDBz12c_syncmy.cnf.cz12c_sync_c8r32 is like cz12c except it enables sync-on-commit for InnoDB. Note that InnoDB is used to store the binlog so there is nothing else to sync on commit.BenchmarkI used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by Postgres.The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.The benchmark is run with 1 client, 1 table and 50M rows. ResultsThe microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don\'t do aggregation while part 2 has queries that do aggregation.  But here I only report results for the write-heavy tests.I provide charts below with relative QPS. The relative QPS is the following:(QPS for some version) / (QPS for base version)When the relative QPS is > 1 then some version is faster than base version.  When it is < 1 then there might be a regression. I present results for:z12b, z12c, z12b_sync and z12c_sync with z12b as the base version z12b_sync and z12c_sync with z12b_sync as the base versionResults: z12b, z12c, z12b_sync, z12c_syncSummary:z12b_sync has the worst performance thanks to 2 fsyncs per commitz12c_sync gets more than 4X the throughput vs z12b_sync. If fsync latency were the only thing that determined performance then I would expect the difference to be ~2X. There is more going on here and in the next section I mention that enabling binlog_storage_engine also reduces the CPU overhead.some per-test data from iostat and vmstat is herea representative sample of iostat collected at 1-second intervals during the update-inlist test is here. 
When comparing z12b_sync with z12c_syncthe fsync rate (f/s) is ~2.5X larger for z12c_sync vs z12b_sync (~690/s vs ~275/s) but fsync latency (f_await) is similar. So with binlog_storage_engine enabled MySQL is more efficient, and perhaps thanks to a lower CPU overhead, there is less work to do in between calls to fsyncRelative to: z12bcol-1 : z12ccol-2 : z12b_synccol-3 : z12c_synccol-1   col-2   col-31.06    0.01    0.05    delete1.05    0.01    0.05    insert1.01    0.12    0.47    read-write_range=1001.01    0.10    0.44    read-write_range=101.03    0.01    0.11    update-index1.02    0.02    0.12    update-inlist1.05    0.01    0.06    update-nonindex1.05    0.01    0.06    update-one1.05    0.01    0.06    update-zipf1.01    0.03    0.20    write-onlyResults: z12b_sync, z12c_syncSummary:z12c_sync gets more than 4X the throughput vs z12b_sync. If fsync latency were the only thing that determined performance then I would expect the difference to be ~2X. There is more going on here and below I mention that enabling binlog_storage_engine also reduces the CPU overhead.some per-test data from iostat and vmstat is here and the CPU overhead per operation is much smaller with binlog_storage_engine -- see here for the update-inlist test. In general, when sync-on-commit is enabled then the CPU overhead with binlog_storage_engine enabled is between 1/3 and 2/3 of the overhead without it enabled.Relative to: z12b_synccol-1 : z12c_synccol-16.40    delete5.64    insert4.06    read-write_range=1004.40    read-write_range=107.64    update-index7.17    update-inlist5.73    update-nonindex5.82    update-one5.80    update-zipf6.61    write-only
</p>
<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine.html">MariaDB innovation: binlog_storage_engine</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>MariaDB 12.3 has a new feature enabled by the option <a href="https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/">binlog_storage_engine</a>. When enabled it uses InnoDB instead of raw files to store the binlog. A big benefit from this is reducing the number of fsync calls per commit from 2 to 1 because it reduces the number of resource managers from 2 (binlog, InnoDB) to 1 (InnoDB).</p>
<p>In this post I have results for the performance benefit from this when using storage that has a <a href="https://smalldatum.blogspot.com/2026/01/ssds-power-loss-protection-and-fsync.html">high fsync latency</a>. This is probably a best-case comparison for the feature. A future post will cover the benefit on servers that don&rsquo;t have high fsync latency.</p>
<p>tl;dr</p>

<ul style="text-align: left">
<li>the performance benefit from this is excellent when storage has a high fsync latency</li>
<li>my mental performance model needs to be improved. I guessed that throughput would increase by ~2X with binlog_storage_engine enabled, relative to not using it while running with sync_binlog=1 and innodb_flush_log_at_trx_commit=1. However the improvement is larger than 4X. A back-of-the-envelope sketch of that ~2X expectation follows this list.</li>
</ul>
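<div>To show where the ~2X expectation came from, here is a back-of-the-envelope sketch that models commit throughput as limited only by fsync calls. The fsync latency is a hypothetical placeholder; the point is the ratio between 2 fsyncs per commit and 1.</div>
<pre style="font-family: courier">
# Naive model: if each commit is serialized behind its fsyncs, then
#   max commits/s = 1 / (fsyncs_per_commit * fsync_latency_seconds)
# so halving the fsync count should roughly double throughput.
# The latency below is a made-up value for a device with high fsync latency.
fsync_latency = 0.004  # 4 ms, illustrative only

for name, fsyncs_per_commit in (("binlog + InnoDB (2 fsyncs/commit)", 2),
                                ("binlog_storage_engine (1 fsync/commit)", 1)):
    max_commits = 1.0 / (fsyncs_per_commit * fsync_latency)
    print(f"{name}: ~{max_commits:.0f} commits/s")

# The measured gain was larger than this 2X ratio (more than 4X) because,
# as noted below, binlog_storage_engine also reduces the CPU overhead.
</pre>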
<div><b>Builds, configuration and hardware</b></div>
<div>
<div></div>
<div>I compiled MariaDB 12.3.0 from source.</div>
<div>The server is an ASUS ExpertCenter PN53 with an AMD Ryzen 7 7735HS CPU, 8 cores, SMT disabled, and 32G of RAM. It has 2 m.2 slots (one for the OS install, one for DB perf tests) and the database is on one NVMe device using ext-4 with discard enabled. The OS is Ubuntu 24.04. The storage device has <a href="https://smalldatum.blogspot.com/2026/01/ssds-power-loss-protection-and-fsync.html">high fsync latency</a>.</div>
</div>
<div></div>
<div>I used 4 my.cnf files:</div>
<div>
<ul style="text-align: left">
<li>z12b</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12b_c8r32">my.cnf.cz12b_c8r32</a>&nbsp;is my default configuration. Sync-on-commit is disabled for both the binlog and InnoDB so that write-heavy benchmarks create more stress.</li>
</ul>
<li>z12c</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12c_c8r32">my.cnf.cz12c_c8r32</a> is like z12b except it enables binlog_storage_engine</li>
</ul>
<li>z12b_sync</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12b_sync_c8r32">my.cnf.cz12b_sync_c8r32</a> is like z12b except it enables sync-on-commit for the binlog and InnoDB</li>
</ul>
<li>z12c_sync</li>
<ul>
<li><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/ma1203/etc/my.cnf.cz12c_sync_c8r32">my.cnf.cz12c_sync_c8r32</a> is like cz12c except it enables sync-on-commit for InnoDB. Note that InnoDB is used to store the binlog so there is nothing else to sync on commit.</li>
</ul>
</ul>
<div>
<div>
<p><b>Benchmark</b></p>
<div>
<div>I used sysbench and my usage is&nbsp;<a href="http://smalldatum.blogspot.com/2017/02/using-modern-sysbench-to-compare.html">explained here</a>. To save time I only run 32 of the 42 microbenchmarks, and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.</div>
<div>The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.
<p>The benchmark is run with 1 client, 1 table and 50M rows.&nbsp;</p></div>
</div>
</div>
</div>
<div></div>
<div>
<div style="font-family: inherit"><b>Results</b></div>
<div><span>
<div style="font-family: inherit"></div>
<div style="font-family: inherit"><span style="font-family: inherit">The microbenchmarks are split into 4 groups &mdash; 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don&rsquo;t do aggregation while part 2 has queries that do aggregation.&nbsp;&nbsp;
<p>But here I only report results for the write-heavy tests.</p></span></div>
<div style="font-family: inherit">I provide charts below with relative QPS. The relative QPS is the following:</div>
<div style="font-family: inherit">
<div></div>
<blockquote><p>(QPS for some version) / (QPS for base version)</p></blockquote>
</div>
<div><span style="font-family: inherit">When the relative QPS is &gt; 1 then&nbsp;</span><i style="font-family: inherit">some version</i><span style="font-family: inherit">&nbsp;is faster than&nbsp;</span><i style="font-family: inherit">base version</i><span style="font-family: inherit">.&nbsp; When it is &lt; 1 then there might be a regression.&nbsp;</span><span>
<p><span style="font-family: inherit">I present results for:</span></p>
<ul style="font-family: inherit">
<li><span style="font-family: inherit">z12b, z12c, z12b_sync and z12c_sync with z12b as the base version</span></li>
<li>&nbsp;z12b_sync and z12c_sync with z12b_sync as the base version</li>
</ul>
<div><b>Results: z12b, z12c, z12b_sync, z12c_sync</b></div>
<div></div>
<div>Summary:</div>
<div>
<ul style="text-align: left">
<li>z12b_sync has the worst performance thanks to 2 fsyncs per commit</li>
<li>z12c_sync gets more than 4X the throughput vs z12b_sync. If fsync latency were the only thing that determined performance then I would expect the difference to be ~2X. There is more going on here and in the next section I mention that enabling binlog_storage_engine also reduces the CPU overhead.</li>
<li>some per-test data from iostat and vmstat <a href="https://gist.github.com/mdcallag/ec11b28478551d5fbe69ec52ef9faf2c#file-gistfile1-txt-L6-L7">is here</a></li>
<li>a representative sample of iostat collected at 1-second intervals during the update-inlist test <a href="https://gist.github.com/mdcallag/a8e8055f6f290982f1c6594657cacd21">is here</a>. When comparing <a href="https://gist.github.com/mdcallag/a8e8055f6f290982f1c6594657cacd21#file-gistfile1-txt-L27-L38">z12b_sync</a> with <a href="https://gist.github.com/mdcallag/a8e8055f6f290982f1c6594657cacd21#file-gistfile1-txt-L40-L51">z12c_sync</a></li>
<ul>
<li>the fsync rate (f/s) is ~2.5X larger for z12c_sync vs z12b_sync (~690/s vs ~275/s) but fsync latency (f_await) is similar. So with binlog_storage_engine enabled MariaDB is more efficient, and perhaps thanks to a lower CPU overhead, there is less work to do in between calls to fsync. A small arithmetic sketch of this inference is shown after the list.</li>
</ul>
</ul>
</div>
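<div>A back-of-the-envelope way to read those fsync rates (~275/s for z12b_sync, ~690/s for z12c_sync): if fsync latency is similar in both configs, then whatever time is left in each fsync interval is non-fsync work, mostly CPU. The f_await value in the sketch below is a hypothetical placeholder used only to illustrate the inference.</div>
<pre style="font-family: courier">
# fsync rates are taken from the iostat sample linked above; f_await is a
# made-up placeholder since only "similar for both configs" matters here.
fsync_rate = {"z12b_sync": 275.0, "z12c_sync": 690.0}
f_await = 0.001  # hypothetical fsync latency in seconds

for cfg, rate in fsync_rate.items():
    interval = 1.0 / rate       # average wall time per fsync interval
    other = interval - f_await  # time per interval not spent in fsync
    print(f"{cfg}: {interval * 1000:.2f} ms per fsync interval, "
          f"~{other * 1000:.2f} ms of non-fsync work")

# With similar fsync latency, z12b_sync spends much more time per interval on
# non-fsync work, consistent with the higher CPU overhead per operation.
</pre>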
<div><span style="font-family: courier">Relative to: z12b</span></div>
<div>
<div><span style="font-family: courier">col-1 : z12c</span></div>
<div><span style="font-family: courier">col-2 : z12b_sync</span></div>
<div><span style="font-family: courier">col-3 : z12c_sync</span></div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier">col-1&nbsp; &nbsp;col-2&nbsp; &nbsp;col-3</span></div>
<div><span style="font-family: courier">1.06&nbsp; &nbsp; <span style="background-color: #f4cccc">0.01</span>&nbsp; &nbsp; <span style="background-color: #fff2cc">0.05</span>&nbsp; &nbsp; delete</span></div>
<div><span style="font-family: courier">1.05&nbsp; &nbsp; <span style="background-color: #f4cccc">0.01</span>&nbsp; &nbsp; <span style="background-color: #fff2cc">0.05</span>&nbsp; &nbsp; insert</span></div>
<div><span style="font-family: courier">1.01&nbsp; &nbsp; <span style="background-color: #f4cccc">0.12</span>&nbsp; &nbsp; <span style="background-color: #fff2cc">0.47</span>&nbsp; &nbsp; read-write_range=100</span></div>
<div><span style="font-family: courier">1.01&nbsp; &nbsp; <span style="background-color: #f4cccc">0.10</span>&nbsp; &nbsp; <span style="background-color: #fff2cc">0.44</span>&nbsp; &nbsp; read-write_range=10</span></div>
<div><span style="font-family: courier">1.03&nbsp; &nbsp; <span style="background-color: #f4cccc">0.01</span><span style="background-color: white">&nbsp; &nbsp; </span><span style="background-color: #fff2cc">0.11</span>&nbsp; &nbsp; update-index</span></div>
<div><span style="font-family: courier">1.02&nbsp; &nbsp; <span style="background-color: #f4cccc">0.02</span><span style="background-color: white">&nbsp; &nbsp; </span><span style="background-color: #fff2cc">0.12</span>&nbsp; &nbsp; update-inlist</span></div>
<div><span style="font-family: courier">1.05&nbsp; &nbsp; <span style="background-color: #f4cccc">0.01</span><span style="background-color: white">&nbsp; &nbsp; </span><span style="background-color: #fff2cc">0.06</span>&nbsp; &nbsp; update-nonindex</span></div>
<div><span style="font-family: courier">1.05&nbsp; &nbsp; <span style="background-color: #f4cccc">0.01</span><span style="background-color: white">&nbsp; &nbsp; </span><span style="background-color: #fff2cc">0.06</span>&nbsp; &nbsp; update-one</span></div>
<div><span style="font-family: courier">1.05&nbsp; &nbsp; <span style="background-color: #f4cccc">0.01</span><span style="background-color: white">&nbsp; &nbsp; </span><span style="background-color: #fff2cc">0.06</span>&nbsp; &nbsp; update-zipf</span></div>
<div><span style="font-family: courier">1.01&nbsp; &nbsp; <span style="background-color: #f4cccc">0.03</span><span style="background-color: white">&nbsp; &nbsp; </span><span style="background-color: #fff2cc">0.20</span>&nbsp; &nbsp; write-only</span></div>
</div>
<p></p></span></div>
<p></p></span></div>
</div>
<div></div>
</div>
<div>
<div><b>Results: z12b_sync, z12c_sync</b></div>
<div></div>
</div>
<div>
<div>Summary:</div>
<div>
<ul>
<li>z12c_sync gets more than 4X the throughput vs z12b_sync. If fsync latency were the only thing that determined performance then I would expect the difference to be ~2X. There is more going on here and below I mention that enabling binlog_storage_engine also reduces the CPU overhead.</li>
<li>some per-test data from iostat and vmstat&nbsp;<a href="https://gist.github.com/mdcallag/8a02ec11fa4c8a04d9430cbb463c20d3">is here</a>&nbsp;and the CPU overhead per operation is much smaller with binlog_storage_engine &mdash; <a href="https://gist.github.com/mdcallag/8a02ec11fa4c8a04d9430cbb463c20d3#file-gistfile1-txt-L4-L5">see here</a> for the update-inlist test. In general, when sync-on-commit is enabled then the CPU overhead with binlog_storage_engine enabled is between 1/3 and 2/3 of the overhead without it enabled.</li>
</ul>
</div>
</div>
<div><span style="font-family: courier">Relative to: z12b_sync</span></div>
<div>
<div><span style="font-family: courier">col-1 : z12c_sync</span></div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier">col-1</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">6.40</span>&nbsp; &nbsp; delete</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">5.64</span>&nbsp; &nbsp; insert</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">4.06</span>&nbsp; &nbsp; read-write_range=100</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">4.40</span>&nbsp; &nbsp; read-write_range=10</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">7.64</span>&nbsp; &nbsp; update-index</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">7.17</span>&nbsp; &nbsp; update-inlist</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">5.73</span>&nbsp; &nbsp; update-nonindex</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">5.82</span>&nbsp; &nbsp; update-one</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">5.80</span>&nbsp; &nbsp; update-zipf</span></div>
<div><span style="font-family: courier"><span style="background-color: #d9ead3">6.61</span>&nbsp; &nbsp; write-only</span></div>
</div>
<div></div>

<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/mariadb-innovation-binlogstorageengine.html">MariaDB innovation: binlog_storage_engine</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>HammerDB tproc-c on a large server, Postgres and MySQL</title>
      <link>https://smalldatum.blogspot.com/2026/02/hammerdb-tproc-c-on-large-server.html</link>
      <pubDate>Sun, 15 Feb 2026 20:42:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://smalldatum.blogspot.com/">Small Datum</source>
      <description><![CDATA[<p>This has results for HammerDB tproc-c on a small server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results so I will keep this simple and just share graphs without explaining the results.The comparison might favor Postgres for the IO-bound workloads because I used smaller buffer pools than normal to avoid OOM. I have to do this because RSS for the HammerDB client grows over time as it buffers more response time stats. And while I used buffered IO for Postgres, I use O_DIRECT for InnoDB. So Postgres might have avoided some read IO thanks to the OS page cache while InnoDB did not.tl;dr for MySQLWith vu=40 MySQL 8.4.8 uses about 2X more CPU per transaction and does more than 2X more context switches per transaction compared to Postgres 18.1. I will get CPU profiles soon.Modern MySQL brings us great improvements to concurrency and too many new CPU overheadsMySQL 5.6 and 8.4 have similar throughput at the lowest concurrency (vu=10)MySQl 8.4 is a lot faster than 5.6 at the highest concurrency (vu=40)tl;dr for PostgresModern Postgres has regressions relative to old PostgresThe regressions increase with the warehouse count, at wh=4000 the NOPM drops between 3% and 13% depending on the virtual user count (vu).tl;dr for Postgres vs MySQLPostgres and MySQL have similar throughput for the largest warehouse count (wh=4000)Otherwise Postgres gets between 1.4X and 2X more throughput (NOPM)Builds, configuration and hardwareI compiled Postgres versions from source: 12.22, 13.23, 14.20, 15.15, 16.11, 17.7 and 18.1.I compiled MySQL versions from source: 5.6.51, 5.7.44, 8.0.45, 8.4.8, 9.4.0 and 9.6.0.I used a 48-core server from Hetzneran ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4128G RAMUbuntu 22.04 running the non-HWE kernel (5.5.0-118-generic)Postgres configuration files:prior to v18 the config file is named conf.diff.cx10a50g_c32r128 (x10a_c32r128) and is here for versions 12, 13, 14, 15, 16 and 17.for Postgres 18 I used conf.diff.cx10b_c32r128 (x10b_c32r128) with io_method=sync to be similar to the config used for versions 12 through 17.MySQL configuration filesprior to 9.6 the config file is named my.cnf.cz12a50g_c32r128 (z12a50g_c32r128 or z12a50g) and is here for versions 5.6, 5.7, 8.0 and 8.4for 9.6 it is named my.cnf.cz13a50g_c32r128 (z13a50g_c32r128 or z13a50g) and is hereFor both Postgres and MySQL fsync on commit is disabled to avoid turning this into an fsync benchmark. The server has 2 SSDs with SW RAID and low fsync latency.BenchmarkThe benchmark is tproc-c from HammerDB. The tproc-c benchmark is derived from TPC-C.The benchmark was run for several workloads:vu=10, wh=1000 - 10 virtual users, 1000 warehousesvu=20, wh=1000 - 20 virtual users, 1000 warehousesvu=40, wh=1000 - 40 virtual users, 1000 warehousesvu=10, wh=2000 - 10 virtual users, 2000 warehousesvu=20, wh=2000 - 20 virtual users, 2000 warehousesvu=40, wh=2000 - 40 virtual users, 2000 warehousesvu=10, wh=4000 - 10 virtual users, 4000 warehousesvu=20, wh=4000 - 20 virtual users, 4000 warehousesvu=40, wh=4000 - 40 virtual users, 4000 warehousesThe wh=1000 workloads are less heavy on IO. The wh=4000 workloads are more heavy on IO.The benchmark for Postgres is run by a variant of this script which depends on scripts here. 
The MySQL scripts are similar.stored procedures are enabledpartitioning is used because the warehouse count is >= 1000a 5 minute rampup is usedthen performance is measured for 60 minutesBasic metrics: iostatI am still improving my helper scripts to report various performance metrics. The table here has average values from iostat during the benchmark run phase for MySQL 8.4.8 and Postgres 18.1. For these configurations the NOPM values for Postgres and MySQL were similar so I won\'t present normalized values (average value / NOPM) and NOPM is throughput.average wMB/s increases with the warehouse count for Postgres but not for MySQLr/s increases with the warehouse count for Postgres and MySQLiostat metrics* r/s = average rate of reads/s from storage* wMB/s = average MB/s written to storagemy8408r/s     wMB/s22833.0 906.2   vu=40, wh=100063079.8 1428.5  vu=40, wh=200082282.3 1398.2  vu=40, wh=4000pg181r/s     wMB/s30394.9 1261.9  vu=40, wh=100059770.4 1267.8  vu=40, wh=200078052.3 1272.9  vu=40, wh=4000Basic metrics: vmstatI am still improving my helper scripts to report various performance metrics. The table here has average values from vmstat during the benchmark run phase for MySQL 8.4.8 and Postgres 18.1. For these configurations the NOPM values for Postgres and MySQL were similar so I won\'t present normalized values (average value / NOPM).CPU utilization is almost 2X larger for MySQLContext switch rates are more than 2X larger for MySQLIn the future I hope to learn why MySQL uses almost 2X more CPU per transaction and has more than 2X more context switches per transaction relative to Postgresvmstat metrics* cs - average value for cs (context switches/s)* us - average value for us (user CPU)* sy - average value for sy (system CPU)* id - average value for id (idle)* wa - average value for wa (waiting for IO)* us+sy - sum of us and symy8408cs      us      sy      id      wa      us+sy455648  61.9    8.2     24.2    5.7     70.1484955  50.4    9.2     19.5    21.0    59.6487410  39.5    8.4     19.4    32.6    48.0pg181cs      us      sy      id      wa      us+sy127486  23.5    10.1    63.3    3.0     33.6166257  17.2    11.1    62.5    9.1     28.3203578  13.9    11.3    59.2    15.6    25.2ResultsMy analysis at this point is simple -- I only consider average throughput. Eventually I will examine throughput over time and efficiency (CPU and IO).On the charts that follow y-axis does not start at 0 to improve readability at the risk of overstating the differences. The y-axis shows relative throughput. There might be a regression when the relative throughput is less than 1.0. There might be an improvement when it is > 1.0. The relative throughput is:(NOPM for some-version / NOPM for base-version)I provide three charts below:only MySQL - base-version is MySQL 5.6.51only Postgres - base-version is Postgres 12.22Postgres vs MySQL - base-version is Postgres 18.1, some-version is MySQL 8.4.8Results: MySQL 5.6 to 9.6Legend:my5651.z12a is MySQL 5.6.51 with the z12a50g configmy5744.z12a is MySQL 5.7.44 with the z12a50g configmy8045.z12a is MySQL 8.0.45 with the z12a50g configmy8408.z12a is MySQL 8.4.8 with the z12a50g configmy9500.z13a is MySQL 9.6.0 with the z13a50g configSummaryAt the lowest concurrency (vu=10) MySQL 8.4.8 has similar throughput as 5.6.51 because CPU regressions in modern MySQL offset the concurrency improvements.At the highest concurrency (vu=40) MySQL 8.4.8 is much faster than 5.6.51 and the regressions after 5.7 are small. 
This matches what I have seen elsewhere -- while modern MySQL suffers from CPU regressions it benefits from concurrency improvements. Imagine if we could get those concurrency improvements without the CPU regressions.And the absolute NOPM values are here:my5651my5744my8045my8408my9600vu=10, wh=1000163059183268156039155194151748vu=20, wh=1000210506321670283282281038279269vu=40, wh=1000216677454743439589435095433618vu=10, wh=2000107492130229111798110161108386vu=20, wh=2000155398225068193658190717189847vu=40, wh=2000178278302723297236307504293217vu=10, wh=400081242103406894148931688458vu=20, wh=4000131241179112155134152998152301vu=40, wh=4000146809228554234922229511230557Results: Postgres 12 to 18Legend:pg1222 is Postgres 12.22 with the x10a50g configpg1323 is Postgres 13.23 with the x10a50g configpg1420 is Postgres 14.20 with the x10a50g configpg1515 is Postgres 15.15 with the x10a50g configpg1611 is Postgres 16.11 with the x10a50g configpg177 is Postgres 17.7 with the x10a50g configpg181 is Postgres 18.1 with the x10b50g configSummaryModern Postgres has regressions relative to old PostgresThe regressions increase with the warehouse count, at wh=4000 the NOPM drops between 3% and 13% depending on the virtual user count (vu).The relative NOPM values are here:pg1222pg1323pg1420pg1515pg1611pg177pg181vu=10, wh=10001.0001.0001.0541.0421.0041.0100.968vu=20, wh=10001.0001.0351.0371.0281.0281.0010.997vu=40, wh=10001.0001.0400.9881.0001.0270.9980.970vu=10, wh=20001.0001.0261.0591.0751.0681.0811.029vu=20, wh=20001.0001.0221.0461.0430.9790.9720.934vu=40, wh=20001.0001.0141.0321.0360.9791.0100.947vu=10, wh=40001.0001.0271.0321.0350.9930.9980.974vu=20, wh=40001.0001.0051.0491.0480.9400.9270.876vu=40, wh=40001.0000.9911.0190.9831.0010.9790.937The absolute NOPM values are here:pg1222pg1323pg1420pg1515pg1611pg177pg181vu=10, wh=1000353077353048372015367933354513356469341688vu=20, wh=1000423565438456439398435454435288423986422397vu=40, wh=1000445114462851439728445144457110444364431648vu=10, wh=2000223048228914236231239868238117241185229549vu=20, wh=2000314380321380328688328044307728305452293627vu=40, wh=2000320347324769330444331896313553323454303403vu=10, wh=4000162054166461167320167761160962161716157872vu=20, wh=4000244598245804256593256231230037226844214309vu=40, wh=4000252931250634257820248584253059247610236986Results: MySQL vs PostgresLegend:pg181 is Postgres 18.1 with the x10b50g configmy8408 is MySQL 8.4.8 with the z12a50g configSummaryPostgres and MySQL have similar throughput for the largest warehouse count (wh=4000)Otherwise Postgres gets between 1.4X and 2X more throughput (NOPM)The absolute NOPM values are here:pg181my8408vu=10, wh=1000341688155194vu=20, wh=1000422397281038vu=40, wh=1000431648435095vu=10, wh=2000229549110161vu=20, wh=2000293627190717vu=40, wh=2000303403307504vu=10, wh=400015787289316vu=20, wh=4000214309152998vu=40, wh=4000236986229511</p>
<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/hammerdb-tproc-c-on-large-server.html">HammerDB tproc-c on a large server, Postgres and MySQL</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>This has results for&nbsp;<a href="https://www.hammerdb.com/">HammerDB</a>&nbsp;tproc-c on a small server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results so I will keep this simple and just share graphs without explaining the results.</p>
<p>The comparison might favor Postgres for the IO-bound workloads because I used smaller buffer pools than normal to avoid OOM. I have to do this because RSS for the HammerDB client grows over time as it buffers more response time stats. And while I used buffered IO for Postgres, I use O_DIRECT for InnoDB. So Postgres might have avoided some read IO thanks to the OS page cache while InnoDB did not.</p>
<p>tl;dr for MySQL</p>

<ul style="text-align: left">
<li>With vu=40 MySQL 8.4.8 uses about 2X more CPU per transaction and does more than 2X more context switches per transaction compared to Postgres 18.1. I will get CPU profiles soon.</li>
<li>Modern MySQL brings us great improvements to concurrency and too many new CPU overheads</li>
<ul>
<li>MySQL 5.6 and 8.4 have similar throughput at the lowest concurrency (vu=10)</li>
<li>MySQL 8.4 is a lot faster than 5.6 at the highest concurrency (vu=40)</li>
</ul>
</ul>
<div>tl;dr for Postgres</div>
<div>
<ul style="text-align: left">
<li>Modern Postgres has regressions relative to old Postgres</li>
<li>The regressions increase with the warehouse count, at wh=4000 the NOPM drops between 3% and 13% depending on the virtual user count (vu).</li>
</ul>
<div>tl;dr for Postgres vs MySQL</div>
</div>
<div>
<ul style="text-align: left">
<li>Postgres and MySQL have similar throughput for the largest warehouse count (wh=4000)</li>
<li>Otherwise Postgres gets between 1.4X and 2X more throughput (NOPM)</li>
</ul>
</div>
<div><b>Builds, configuration and hardware</b></div>
<div>
<div></div>
<div>I compiled Postgres versions from source: 12.22, 13.23, 14.20, 15.15, 16.11, 17.7 and 18.1.</div>
<div></div>
<div>I compiled MySQL versions from source: 5.6.51, 5.7.44, 8.0.45, 8.4.8, 9.4.0 and 9.6.0.</div>
</div>
<div></div>
<div>
<div>I used a 48-core server from Hetzner</div>
<div>
<ul>
<li>an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled</li>
<li>2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4</li>
<li>128G RAM</li>
<li>Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic)</li>
</ul>
<div>
<div><span style="font-family: inherit">Postgres configuration files:</span></div>
<div>
<ul>
<li><span style="font-family: inherit">prior to v18 the config file is named conf.diff.cx10a50g_c32r128 (x10a_c32r128) and is here for versions&nbsp;</span><a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg1219_o2nofp/conf.diff.cx10a50g_c32r128">12</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg1315_o2nofp/conf.diff.cx10a50g_c32r128">13</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg1412_o2nofp/conf.diff.cx10a50g_c32r128">14</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg157_o2nofp/conf.diff.cx10a50g_c32r128">15</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg163_o2nofp/conf.diff.cx10a50g_c32r128">16</a>&nbsp;and&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg17beta1_o2nofp/conf.diff.cx10a50g_c32r128">17</a>.</li>
<li>for Postgres 18 I used&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/pg18beta3_o2nofp/conf.diff.cx10b50g_c32r128">conf.diff.cx10b50g_c32r128</a>&nbsp;(x10b50g_c32r128) with io_method=sync to be similar to the config used for versions 12 through 17.</li>
</ul>
<div>MySQL configuration files</div>
</div>
</div>
</div>
</div>
<div>
<ul style="text-align: left">
<li>prior to 9.6 the config file is named my.cnf.cz12a50g_c32r128 (z12a50g_c32r128 or z12a50g) and is here for versions <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/my5651_rel_o2nofp/etc/my.cnf.cz12a50g_c32r128">5.6</a>, <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/my5744_rel_o2nofp/etc/my.cnf.cz12a50g_c32r128">5.7</a>, <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/my8043_rel_o2nofp/etc/my.cnf.cz12a50g_c32r128">8.0</a> and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/my8406_rel_o2nofp/etc/my.cnf.cz12a50g_c32r128">8.4</a></li>
<li>for 9.6 it is named my.cnf.cz13a50g_c32r128 (z13a50g_c32r128 or z13a50g) and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c32r128/my9500/etc/my.cnf.cz13a50g_c32r128">is here</a></li>
</ul>
<div>For both Postgres and MySQL fsync on commit is disabled to avoid turning this into an fsync benchmark. The server has 2 SSDs with SW RAID and <a href="https://smalldatum.blogspot.com/2026/01/ssds-power-loss-protection-and-fsync.html">low fsync latency</a>.</div>
<div></div>
<div>
<div><b>Benchmark</b>
<div></div>
</div>
<div></div>
<div>The benchmark is&nbsp;<a href="https://www.hammerdb.com/docs/ch03.html">tproc-c</a>&nbsp;from&nbsp;<a href="https://www.hammerdb.com/">HammerDB</a>. The tproc-c benchmark is derived from TPC-C.
<p>The benchmark was run for several workloads:</p></div>
<div>
<ul>
<li>vu=10, wh=1000 &ndash; 10 virtual users, 1000 warehouses</li>
<li>vu=20, wh=1000 &ndash; 20 virtual users, 1000 warehouses</li>
<li>vu=40, wh=1000 &ndash; 40 virtual users, 1000 warehouses</li>
<li>vu=10, wh=2000 &ndash; 10 virtual users, 2000 warehouses</li>
<li>vu=20, wh=2000 &ndash; 20 virtual users, 2000 warehouses</li>
<li>vu=40, wh=2000 &ndash; 40 virtual users, 2000 warehouses</li>
<li>vu=10, wh=4000 &ndash; 10 virtual users, 4000 warehouses</li>
<li>vu=20, wh=4000 &ndash; 20 virtual users, 4000 warehouses</li>
<li>vu=40, wh=4000 &ndash; 40 virtual users, 4000 warehouses</li>
</ul>
<div>The wh=1000 workloads are the least IO-heavy and the wh=4000 workloads are the most IO-heavy.</div>
<div></div>
<div>The benchmark for Postgres is run by a variant of&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/arc/jan26.tprocc.pn53.pg/allpg.N.sh">this script</a>&nbsp;which depends on&nbsp;<a href="https://github.com/mdcallag/mytools/tree/master/bench/arc/jan26.tprocc.pn53.pg/testscripts">scripts here</a>. The MySQL scripts are similar.</div>
<div>
<ul>
<li>stored procedures are enabled</li>
<li>partitioning is used because the warehouse count is &gt;= 1000</li>
<li>a 5 minute rampup is used</li>
<li>then performance is measured for 60 minutes</li>
</ul>
<div><b>Basic metrics: iostat</b></div>
<div></div>
<div>I am still improving my helper scripts to report various performance metrics. The table here has average values from iostat during the benchmark run phase for MySQL 8.4.8 and Postgres 18.1. For these configurations the NOPM values for Postgres and MySQL were similar so I won&rsquo;t present normalized values (average value / NOPM) and NOPM is throughput.</div>
<div>
<ul style="text-align: left">
<li>average wMB/s increases with the warehouse count for Postgres but not for MySQL</li>
<li>r/s increases with the warehouse count for Postgres and MySQL</li>
</ul>
</div>
<div><span style="font-family: courier">iostat metrics</span></div>
<div>
<div><span style="font-family: courier">* r/s = average rate of reads/s from storage</span></div>
<div><span style="font-family: courier">* wMB/s = average MB/s written to storage</span></div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier">my8408</span></div>
<div><span style="font-family: courier">r/s&nbsp; &nbsp; &nbsp;wMB/s</span></div>
<div><span style="font-family: courier">22833.0 906.2&nbsp; &nbsp;vu=40, wh=1000</span></div>
<div><span style="font-family: courier">63079.8 1428.5&nbsp; vu=40, wh=2000</span></div>
<div><span style="font-family: courier">82282.3 1398.2&nbsp; vu=40, wh=4000</span></div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier">pg181</span></div>
<div><span style="font-family: courier">r/s&nbsp; &nbsp; &nbsp;wMB/s</span></div>
<div><span style="font-family: courier">30394.9 1261.9&nbsp; vu=40, wh=1000</span></div>
<div><span style="font-family: courier">59770.4 1267.8&nbsp; vu=40, wh=2000</span></div>
<div><span style="font-family: courier">78052.3 1272.9&nbsp; vu=40, wh=4000</span></div>
</div>
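<p>The averages above come from raw iostat samples collected during the run phase. A minimal sketch of that computation (assuming sysstat&rsquo;s &ldquo;iostat -x&rdquo; output and an assumed md0 device name for the SW RAID array; the real helper scripts are in the repository linked above and may differ):</p>
<pre>
# Average r/s and wMB/s for one device across all samples in iostat.log.
# Column positions are looked up from the header line, since they vary by
# sysstat version; wkB/s is converted to MB/s.
awk -v dev=md0 '
  $1 == "Device" { for (i = NF; i; i--) col[$i] = i; next }
  ("r/s" in col) &amp;&amp; $1 == dev { r += $col["r/s"]; wkb += $col["wkB/s"]; n++ }
  END { if (n) printf "r/s %.1f  wMB/s %.1f\n", r/n, (wkb/n)/1024 }
' iostat.log
</pre>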
<div></div>
<div><b>Basic metrics: vmstat</b></div>
<div></div>
<div>I am still improving my helper scripts to report various performance metrics. The table here has average values from vmstat during the benchmark run phase for MySQL 8.4.8 and Postgres 18.1. For these configurations the NOPM values for Postgres and MySQL were similar, so I won&rsquo;t present normalized values (average value / NOPM).</div>
<div>
<ul style="text-align: left">
<li>CPU utilization is almost 2X larger for MySQL</li>
<li>Context switch rates are more than 2X larger for MySQL</li>
<li>In the future I hope to learn why MySQL uses almost 2X more CPU per transaction and has more than 2X more context switches per transaction relative to Postgres</li>
</ul>
</div>
<div><span style="font-family: courier">vmstat metrics</span></div>
<div>
<div><span style="font-family: courier">* cs &ndash; average value for cs (context switches/s)</span></div>
<div><span style="font-family: courier">* us &ndash; average value for us (user CPU)</span></div>
<div><span style="font-family: courier">* sy &ndash; average value for sy (system CPU)</span></div>
<div><span style="font-family: courier">* id &ndash; average value for id (idle)</span></div>
<div><span style="font-family: courier">* wa &ndash; average value for wa (waiting for IO)</span></div>
<div><span style="font-family: courier">* us+sy &ndash; sum of us and sy</span></div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier">my8408</span></div>
<div><span style="font-family: courier">cs&nbsp; &nbsp; &nbsp; us&nbsp; &nbsp; &nbsp; sy&nbsp; &nbsp; &nbsp; id&nbsp; &nbsp; &nbsp; wa&nbsp; &nbsp; &nbsp; us+sy</span></div>
<div><span style="font-family: courier">455648&nbsp; 61.9&nbsp; &nbsp; 8.2&nbsp; &nbsp; &nbsp;24.2&nbsp; &nbsp; 5.7&nbsp; &nbsp; &nbsp;70.1</span></div>
<div><span style="font-family: courier">484955&nbsp; 50.4&nbsp; &nbsp; 9.2&nbsp; &nbsp; &nbsp;19.5&nbsp; &nbsp; 21.0&nbsp; &nbsp; 59.6</span></div>
<div><span style="font-family: courier">487410&nbsp; 39.5&nbsp; &nbsp; 8.4&nbsp; &nbsp; &nbsp;19.4&nbsp; &nbsp; 32.6&nbsp; &nbsp; 48.0</span></div>
<div><span style="font-family: courier"><br></span></div>
<div><span style="font-family: courier">pg181</span></div>
<div><span style="font-family: courier">cs&nbsp; &nbsp; &nbsp; us&nbsp; &nbsp; &nbsp; sy&nbsp; &nbsp; &nbsp; id&nbsp; &nbsp; &nbsp; wa&nbsp; &nbsp; &nbsp; us+sy</span></div>
<div><span style="font-family: courier">127486&nbsp; 23.5&nbsp; &nbsp; 10.1&nbsp; &nbsp; 63.3&nbsp; &nbsp; 3.0&nbsp; &nbsp; &nbsp;33.6</span></div>
<div><span style="font-family: courier">166257&nbsp; 17.2&nbsp; &nbsp; 11.1&nbsp; &nbsp; 62.5&nbsp; &nbsp; 9.1&nbsp; &nbsp; &nbsp;28.3</span></div>
<div><span style="font-family: courier">203578&nbsp; 13.9&nbsp; &nbsp; 11.3&nbsp; &nbsp; 59.2&nbsp; &nbsp; 15.6&nbsp; &nbsp; 25.2</span></div>
</div>
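<p>For example, assuming the rows above are in the same order as the iostat table (vu=40 with wh=1000, 2000 and 4000), at vu=40 and wh=4000 MySQL has us+sy of 48.0 vs 25.2 for Postgres (about 1.9X) and 487410 context switches/s vs 203578 (about 2.4X), while the NOPM values are similar.</p>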
<div></div>
<div>
<div>
<div><b>Results</b></div>
<div><b><br></b></div>
<div>My analysis at this point is simple &mdash; I only consider average throughput. Eventually I will examine throughput over time and efficiency (CPU and IO).</div>
<div></div>
<div>On the charts that follow the y-axis does not start at 0 to improve readability, <b>at the risk of overstating the differences</b>. The y-axis shows relative throughput: there might be a regression when the relative throughput is less than 1.0 and an improvement when it is &gt; 1.0. The relative throughput is:</div>
</div>
<blockquote style="border: none;margin: 0px 0px 0px 40px;padding: 0px"><p>(NOPM for&nbsp;<i>some-version</i>&nbsp;/ NOPM for&nbsp;<i>base-version</i>)</p></blockquote>
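<p>For example, using the absolute NOPM table below: at vu=40 and wh=1000, MySQL 8.4.8 gets 435095 NOPM and MySQL 5.6.51 gets 216677, so the value plotted for 8.4.8 on the MySQL chart is 435095 / 216677, or about 2.0.</p>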
<p>I provide three charts below:</p>

<ul>
<li>only MySQL &ndash;&nbsp;<i>base-version</i>&nbsp;is MySQL 5.6.51</li>
<li>only Postgres &ndash;&nbsp;<i>base-version</i>&nbsp;is Postgres 12.22</li>
<li>Postgres vs MySQL &ndash;&nbsp;<i>base-version</i>&nbsp;is Postgres 18.1,&nbsp;<i>some-version</i>&nbsp;is MySQL 8.4.8</li>
</ul>
</div>
<div>
<div></div>
</div>
</div>
</div>
</div>
<div><b>Results: MySQL 5.6 to 9.6</b></div>
</div>
<div>
<p>Legend:</p>
<ul>
<li>my5651.z12a is MySQL 5.6.51 with the z12a50g config</li>
<li>my5744.z12a is MySQL 5.7.44 with the z12a50g config</li>
<li>my8045.z12a is MySQL 8.0.45 with the z12a50g config</li>
<li>my8408.z12a is MySQL 8.4.8 with the z12a50g config</li>
<li>my9600.z13a is MySQL 9.6.0 with the z13a50g config</li>
</ul>
<p>Summary</p>

<ul style="text-align: left">
<li>At the lowest concurrency (vu=10) MySQL 8.4.8 has throughput similar to 5.6.51 because CPU regressions in modern MySQL offset the concurrency improvements.</li>
<li>At the highest concurrency (vu=40) MySQL 8.4.8 is much faster than 5.6.51 and the regressions after 5.7 are small. This matches what I have seen elsewhere &mdash; while modern MySQL suffers from CPU regressions it benefits from concurrency improvements. Imagine if we could get those concurrency improvements without the CPU regressions.</li>
</ul>
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-juZIFwWtxtBi0SQ7Whhv6G0nLgtkFKSNGivr7_4Fh-Cmx8mq9zy8L_JTbvE9iZiYluDQmZWqd5p2J2CDLAXGI0T61rC_CQS1DVMiHoarSmH2BPh40mvxFyBskM-8IajNoHZfczNKPO_dDSuKxrFynR9H1n9mlsxhBsHVZw8Zeo3hrdYyziRc9gvxl5MG/s600/Relative%20NOPM%20on%20a%20large%20server_%20MySQL.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-juZIFwWtxtBi0SQ7Whhv6G0nLgtkFKSNGivr7_4Fh-Cmx8mq9zy8L_JTbvE9iZiYluDQmZWqd5p2J2CDLAXGI0T61rC_CQS1DVMiHoarSmH2BPh40mvxFyBskM-8IajNoHZfczNKPO_dDSuKxrFynR9H1n9mlsxhBsHVZw8Zeo3hrdYyziRc9gvxl5MG/w640-h396/Relative%20NOPM%20on%20a%20large%20server_%20MySQL.png" width="640"></a></div>
<p>And the absolute NOPM values are here:</p>

<table border="1" cellpadding="0" cellspacing="0" data-sheets-baot="1" data-sheets-root="1" dir="ltr" style="border-collapse: collapse;border: none;font-family: Arial;font-size: 10pt;width: 0px">
<colgroup>
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
</colgroup>
<tbody>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">my5651</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">my5744</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">my8045</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">my8408</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">my9600</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=10, wh=1000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">163059</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">183268</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">156039</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">155194</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">151748</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=20, wh=1000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">210506</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">321670</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">283282</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">281038</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">279269</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=40, wh=1000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">216677</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">454743</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">439589</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">435095</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">433618</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=10, wh=2000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">107492</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">130229</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">111798</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">110161</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">108386</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=20, wh=2000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">155398</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">225068</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">193658</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">190717</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">189847</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=40, wh=2000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">178278</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">302723</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">297236</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">307504</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">293217</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=10, wh=4000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">81242</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">103406</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">89414</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">89316</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">88458</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=20, wh=4000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">131241</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">179112</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">155134</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">152998</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">152301</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=40, wh=4000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">146809</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">228554</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">234922</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">229511</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">230557</td>
</tr>
</tbody>
</table>
<p><b>Results: Postgres 12 to 18</b></p>
<p>Legend:</p>
<ul>
<li>pg1222 is Postgres 12.22 with the x10a50g config</li>
<li>pg1323 is Postgres 13.23 with the x10a50g config</li>
<li>pg1420 is Postgres 14.20 with the x10a50g config</li>
<li>pg1515 is Postgres 15.15 with the x10a50g config</li>
<li>pg1611 is Postgres 16.11 with the x10a50g config</li>
<li>pg177 is Postgres 17.7 with the x10a50g config</li>
<li>pg181 is Postgres 18.1 with the x10b50g config</li>
</ul>
<p>Summary</p>

<ul>
<li>Modern Postgres has regressions relative to old Postgres</li>
<li>The regressions increase with the warehouse count: at wh=4000 the NOPM for Postgres 18.1 drops by between 3% and 13% relative to 12.22, depending on the virtual user count (vu). A worked example follows this list.</li>
</ul>
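<p>For example, at vu=20 and wh=4000 Postgres 18.1 gets 214309 NOPM vs 244598 for Postgres 12.22 (see the absolute NOPM table below), a relative value of 0.876, or a drop of about 12%.</p>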
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9bKWABILkGwJHvK9zBvnoWwpw6vbI6_A8OTEglhbPt7BjBEVVVT1KpC6NUQVlvpK3JbiP7y6Rnei7kTCKyPxLsnwqTzAs0Aa0Q5kHylICO2IuKMiadIY_o8w_qWG-s8HavljpBT6aA6GBEkNnyVoHciGH70-IwYQ0aF3hjRyuW1HLodMGnm3hgPoV7bt0/s600/Relative%20NOPM%20on%20a%20large%20server_%20Postgres.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9bKWABILkGwJHvK9zBvnoWwpw6vbI6_A8OTEglhbPt7BjBEVVVT1KpC6NUQVlvpK3JbiP7y6Rnei7kTCKyPxLsnwqTzAs0Aa0Q5kHylICO2IuKMiadIY_o8w_qWG-s8HavljpBT6aA6GBEkNnyVoHciGH70-IwYQ0aF3hjRyuW1HLodMGnm3hgPoV7bt0/w640-h396/Relative%20NOPM%20on%20a%20large%20server_%20Postgres.png" width="640"></a></div>
<p></p>
<div>The relative NOPM values are here:</div>
<div></div>

<table border="1" cellpadding="0" cellspacing="0" data-sheets-baot="1" data-sheets-root="1" dir="ltr" style="border-collapse: collapse;border: none;font-family: Arial;font-size: 10pt;width: 0px">
<colgroup>
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
</colgroup>
<tbody>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg1222</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg1323</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg1420</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg1515</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg1611</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg177</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg181</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=10, wh=1000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.054</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.042</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.004</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.010</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.968</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=20, wh=1000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.035</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.037</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.028</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.028</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.001</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.997</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=40, wh=1000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.040</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.988</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.027</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.998</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.970</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=10, wh=2000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.026</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.059</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.075</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.068</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.081</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.029</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=20, wh=2000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.022</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.046</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.043</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.979</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.972</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.934</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=40, wh=2000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.014</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.032</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.036</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.979</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.010</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.947</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=10, wh=4000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.027</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.032</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.035</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.993</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.998</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.974</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=20, wh=4000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.005</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.049</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.048</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.940</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.927</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.876</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=40, wh=4000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.991</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.019</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.983</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">1.001</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.979</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">0.937</span></td>
</tr>
</tbody>
</table>
<div></div>

<div>The absolute NOPM values are here:</div>
<div></div>

<table border="1" cellpadding="0" cellspacing="0" data-sheets-baot="1" data-sheets-root="1" dir="ltr" style="border-collapse: collapse;border: none;font-family: Arial;font-size: 10pt;width: 0px">
<colgroup>
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
<col width="100">
</colgroup>
<tbody>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg1222</span></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg1323</span></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg1420</span></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg1515</span></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg1611</span></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg177</span></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">pg181</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=10, wh=1000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">353077</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">353048</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">372015</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">367933</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">354513</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">356469</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">341688</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=20, wh=1000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">423565</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">438456</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">439398</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">435454</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">435288</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">423986</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">422397</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=40, wh=1000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">445114</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">462851</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">439728</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">445144</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">457110</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">444364</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">431648</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=10, wh=2000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">223048</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">228914</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">236231</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">239868</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">238117</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">241185</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">229549</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=20, wh=2000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">314380</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">321380</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">328688</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">328044</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">307728</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">305452</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">293627</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=40, wh=2000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">320347</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">324769</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">330444</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">331896</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">313553</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">323454</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">303403</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=10, wh=4000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">162054</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">166461</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">167320</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">167761</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">160962</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">161716</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">157872</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=20, wh=4000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">244598</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">245804</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">256593</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">256231</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">230037</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">226844</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">214309</span></td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"><span style="font-size: xx-small">vu=40, wh=4000</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">252931</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">250634</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">257820</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">248584</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">253059</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">247610</span></td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom"><span style="font-size: xx-small">236986</span></td>
</tr>
</tbody>
</table>
<div></div>

<div><b>Results: MySQL vs Postgres</b></div>
<div>
<p>Legend:</p>
<ul>
<li>pg181 is Postgres 18.1 with the x10b50g config</li>
<li>my8408 is MySQL 8.4.8 with the z12a50g config</li>
</ul>
<p>Summary</p>

<ul style="text-align: left">
<li>Postgres and MySQL have similar throughput at the highest concurrency (vu=40)</li>
<li>Otherwise Postgres gets between about 1.4X and 2.2X more throughput (NOPM); examples follow this list</li>
</ul>
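<p>For example, at vu=10 and wh=1000 Postgres gets 341688 NOPM vs 155194 for MySQL (about 2.2X), and at vu=20 and wh=4000 it gets 214309 vs 152998 (about 1.4X), while at vu=40 and wh=2000 the results are 303403 vs 307504, within about 1%.</p>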
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNjiZaCuVAOks1rZk-KPyNStyYZD2aDw-hIz1FTFy98BpwArrlQ-ngWnzEvwEnqAL44DgcmH1eNTcGePbxf2UctfP2rzpz8SGRv8Byd9d2da4mD1TXpqD2uqLHcOJN2B84iKq0sJuVD8kRUp_D69YlVq8tPPMJrlFXK2UEywx3MIDLJDxasmEgGwy8eNYz/s600/Relative%20NOPM%20on%20a%20large%20server_%20Postgres%2018.1%20vs%20MySQL%208.4.8.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNjiZaCuVAOks1rZk-KPyNStyYZD2aDw-hIz1FTFy98BpwArrlQ-ngWnzEvwEnqAL44DgcmH1eNTcGePbxf2UctfP2rzpz8SGRv8Byd9d2da4mD1TXpqD2uqLHcOJN2B84iKq0sJuVD8kRUp_D69YlVq8tPPMJrlFXK2UEywx3MIDLJDxasmEgGwy8eNYz/w640-h396/Relative%20NOPM%20on%20a%20large%20server_%20Postgres%2018.1%20vs%20MySQL%208.4.8.png" width="640"></a></div>
<div>The absolute NOPM values are here:</div>
</div>
</div>
<div></div>
<div>
<table border="1" cellpadding="0" cellspacing="0" data-sheets-baot="1" data-sheets-root="1" dir="ltr" style="border-collapse: collapse;border: none;font-family: Arial;font-size: 10pt;width: 0px">
<colgroup>
<col width="100">
<col width="100">
<col width="100">
</colgroup>
<tbody>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom"></td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">pg181</td>
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">my8408</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=10, wh=1000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">341688</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">155194</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=20, wh=1000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">422397</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">281038</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=40, wh=1000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">431648</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">435095</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=10, wh=2000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">229549</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">110161</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=20, wh=2000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">293627</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">190717</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=40, wh=2000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">303403</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">307504</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=10, wh=4000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">157872</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">89316</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=20, wh=4000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">214309</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">152998</td>
</tr>
<tr style="height: 21px">
<td style="overflow: hidden;padding: 2px 3px;vertical-align: bottom">vu=40, wh=4000</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">236986</td>
<td style="overflow: hidden;padding: 2px 3px;text-align: right;vertical-align: bottom">229511</td>
</tr>
</tbody>
</table>
</div>

<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/hammerdb-tproc-c-on-large-server.html">HammerDB tproc-c on a large server, Postgres and MySQL</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>HammerDB tproc-c on a small server, Postgres and MySQL</title>
      <link>https://smalldatum.blogspot.com/2026/02/hammerdb-tproc-c-on-small-server.html</link>
      <pubDate>Sat, 14 Feb 2026 19:03:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://smalldatum.blogspot.com/">Small Datum</source>
<description><![CDATA[<p>This has results for HammerDB tproc-c on a small server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results, so I will keep this simple and just share graphs without explaining the results. &#8230;</p>
<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/hammerdb-tproc-c-on-small-server.html">HammerDB tproc-c on a small server, Postgres and MySQL</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>This has results for <a href="https://www.hammerdb.com/">HammerDB</a>&nbsp;tproc-c on a small server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results so I will keep this simple and just share graphs without explaining the results.</p>
<p>tl;dr</p>

<ul style="text-align: left">
<li>Modern Postgres is faster than old Postgres</li>
<li>Modern MySQL has large perf regressions relative to old MySQL, and they are worst at low concurrency for CPU-bound workloads. This is similar to what I see on other benchmarks.</li>
<li>Modern Postgres is about 2X faster than MySQL at low concurrency (vu=1) and when the workload isn&rsquo;t IO-bound (w=100). But with some concurrency (vu=6) or with more IO per transaction (w=1000, w=2000) they have similar throughput. Note that partitioning is used at w=1000 and 2000 but not at w=100.</li>
</ul>
<div><b>Builds, configuration and hardware</b></div>
<div>
<div></div>
<div>I compiled Postgres versions from source: 12.22, 13.23, 14.20, 15.15, 16.11, 17.7 and 18.1.</div>
<div></div>
<div>I compiled MySQL versions from source: 5.6.51, 5.7.44, 8.0.44, 8.4.7, 9.4.0 and 9.5.0.</div>
<div>The server is an ASUS ExpertCenter PN53 with an AMD Ryzen 7 7735HS CPU, 8 cores, SMT disabled, and 32G of RAM. It has 2 m.2 slots (one for the OS install, one for DB perf tests). Storage is one NVMe device for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04.</div>
<div></div>
<div>
<div>For versions prior to 18, the config file is named conf.diff.cx10a_c8r32 and they are as similar as possible and here for versions&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/pg1219_o2nofp/conf.diff.cx10a_c8r32">12</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/pg1315_o2nofp/conf.diff.cx10a_c8r32">13</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/pg1412_o2nofp/conf.diff.cx10a_c8r32">14</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/pg157_o2nofp/conf.diff.cx10a_c8r32">15</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/pg163_o2nofp/conf.diff.cx10a_c8r32">16</a>&nbsp;and&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/pg172_o2nofp/conf.diff.cx10a_c8r32">17</a>.</div>

<div>For Postgres 18 the config file is named&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/may25.pg18/pn53/conf.diff.cx10b_c8r32">conf.diff.cx10b_c8r32</a>&nbsp;and adds io_method=&rsquo;sync&rsquo;, which matches the synchronous IO behavior of earlier Postgres versions.</div>
</div>
</div>
<div>For MySQL the config files are named my.cnf.cz12a_c8r32 and are here:&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/my5651_rel_o2nofp/etc/my.cnf.cz12a_c8r32">5.6.51</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/my5744_rel_o2nofp/etc/my.cnf.cz12a_c8r32">5.7.44</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/my8040_rel_o2nofp/etc/my.cnf.cz12a_c8r32">8.0.4x</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/my8406_rel_o2nofp/etc/my.cnf.cz12a_c8r32">8.4.x</a>,&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/arc/oct24/c8r32/my9400_rel_o2nofp/etc/my.cnf.cz12a_c8r32">9.x.0</a>.</div>
<div></div>
<div>For both Postgres and MySQL fsync on commit is disabled to avoid turning this into an fsync benchmark. The server has an SSD with <a href="https://smalldatum.blogspot.com/2026/01/ssds-power-loss-protection-and-fsync.html">high fsync latency</a>.</div>
<div></div>
<div>
<div><b>Benchmark</b>
<div></div>
</div>
<div></div>
<div>The benchmark is&nbsp;<a href="https://www.hammerdb.com/docs/ch03.html">tproc-c</a>&nbsp;from&nbsp;<a href="https://www.hammerdb.com/">HammerDB</a>. The tproc-c benchmark is derived from TPC-C.
<p>The benchmark was run for several workloads:</p></div>
<div>
<ul>
<li>vu=1, w=100 &ndash; 1 virtual user, 100 warehouses</li>
<li>vu=6, w=100 &ndash; 6 virtual users, 100 warehouses</li>
<li>vu=1, w=1000 &ndash; 1 virtual user, 1000 warehouses</li>
<li>vu=6, w=1000 &ndash; 6 virtual users, 1000 warehouses</li>
<li>vu=1, w=2000 &ndash; 1 virtual user, 2000 warehouses</li>
<li>vu=6, w=2000 &ndash; 6 virtual users, 2000 warehouses</li>
</ul>
<div>The w=100 workloads are lighter on IO; the w=1000 and w=2000 workloads are heavier on IO because a larger warehouse count means a larger database relative to memory.</div>
<div></div>
<div>The benchmark for Postgres is run by&nbsp;<a href="https://github.com/mdcallag/mytools/blob/master/bench/arc/jan26.tprocc.pn53.pg/allpg.N.sh">this script</a>&nbsp;which depends on&nbsp;<a href="https://github.com/mdcallag/mytools/tree/master/bench/arc/jan26.tprocc.pn53.pg/testscripts">scripts here</a>. The MySQL scripts are similar.</div>
<div>
<ul>
<li>stored procedures are enabled</li>
<li>partitioning is used when the warehouse count is &gt;= 1000</li>
<li>a 5 minute rampup is used</li>
<li>then performance is measured for 120 minutes</li>
</ul>
<div>
<div>
<div><b>Results</b></div>
<div><b><br></b></div>
<div>My analysis at this point is simple &mdash; I only consider average throughput. Eventually I will examine throughput over time and efficiency (CPU and IO).</div>
<div></div>
<div>On the charts that follow the y-axis does not start at 0 to improve readability, at the risk of overstating the differences. The y-axis shows relative throughput: there might be a regression when the relative throughput is less than 1.0 and an improvement when it is &gt; 1.0. The relative throughput is:</div>
</div>
<blockquote style="border: none;margin: 0px 0px 0px 40px;padding: 0px">
<div>(NOPM for <i>some-version</i> / NOPM for <i>base-version</i>)</div>
</blockquote>
<p>I provide three charts below:</p>

<ul style="text-align: left">
<li>only MySQL &ndash; <i>base-version</i> is MySQL 5.6.51</li>
<li>only Postgres &ndash; <i>base-version</i> is Postgres 12.22</li>
<li>Postgres vs MySQL &ndash; <i>base-version</i> is Postgres 18.1, <i>some-version</i> is MySQL 8.4.7</li>
</ul>
<p><b>Results: MySQL 5.6 to 8.4</b></p>
<p>Legend:</p>
<ul style="text-align: left">
<li>my5651.z12a is MySQL 5.6.51 with the z12a_c8r32 config</li>
<li>my5744.z12a is MySQL 5.7.44 with the z12a_c8r32 config</li>
<li>my8044.z12a is MySQL 8.0.44 with the z12a_c8r32 config</li>
<li>my847.z12a is MySQL 8.4.7 with the z12a_c8r32 config</li>
<li>my9400.z12a is MySQL 9.4.0 with the z12a_c8r32 config</li>
<li>my9500.z12a is MySQL 9.5.0 with the z12a_c8r32 config</li>
</ul>
<p>Summary</p>

<ul style="text-align: left">
<li>Perf regressions in MySQL 8.4 are smaller with vu=6 and wh &gt;= 1000 &mdash; the cases where there is more concurrency (vu=6) and the workload does more IO per transaction (wh=1000 &amp; 2000). Note that partitioning is used at w=1000 and 2000 but not at w=100.</li>
<li>Perf regressions in MySQL 8.4 are larger with vu=1 and even more so with wh=100 (low concurrency, less IO per transaction).</li>
<li>Performance has mostly been dropping from MySQL 5.6 to 8.4. From other benchmarks, the problem is new CPU overheads, which hurt most at low concurrency.</li>
<li>While perf regressions in modern MySQL at high concurrency have been less of a problem on other benchmarks, this server is too small to support high concurrency.</li>
</ul>
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3v7KO4aXH14bKepqIV2jHTNdWhyFeHWrDmvN0i8mVil_r7YXgbCHikwHqx5T8GgSBMZncljFRYbjgdItpBBmedaB49CgT07OyUNVTy8NEiR7qWHpJWc6T3Gvv0V2dThGNKeVs6sCZqGoVlibXmsuF3lqacLKndVFXIay2ggT0-NJLR2QnhKj4fV25tL27/s860/Relative%20NOPM%20on%20a%20small%20server_%20MySQL.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="531" data-original-width="860" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg3v7KO4aXH14bKepqIV2jHTNdWhyFeHWrDmvN0i8mVil_r7YXgbCHikwHqx5T8GgSBMZncljFRYbjgdItpBBmedaB49CgT07OyUNVTy8NEiR7qWHpJWc6T3Gvv0V2dThGNKeVs6sCZqGoVlibXmsuF3lqacLKndVFXIay2ggT0-NJLR2QnhKj4fV25tL27/w640-h396/Relative%20NOPM%20on%20a%20small%20server_%20MySQL.png" width="640"></a></div>
<p><b>Results: Postgres 12 to 18</b></p>
<p>Legend:</p>
<ul style="text-align: left">
<li>pg1222.x10a is Postgres 12.22 with the x10a_c8r32 config</li>
<li>pg1323.x10a is Postgres 13.23 with the x10a_c8r32 config</li>
<li>pg1420.x10a is Postgres 14.20 with the x10a_c8r32 config</li>
<li>pg1515.x10a is Postgres 15.15 with the x10a_c8r32 config</li>
<li>pg1611.x10a is Postgres 16.11 with the x10a_c8r32 config</li>
<li>pg177.x10a is Postgres 17.7 with the x10a_c8r32 config</li>
<li>pg181.x10b is Postgres 18.1 with the x10b_c8r32 config</li>
</ul>
<p>Summary</p>

<ul style="text-align: left">
<li>Modern Postgres is faster than old Postgres</li>
</ul>
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6iVfGcIfMb7MmSi_FIUQUbGoCLxPPjPiej5x8E_BlZisTKcZc0GE-0Snt0Gy-U4tTzGCkxq__UOXF194Ol3FUGLm82QjyV1xwTPuByz0LDm_xOT49GMZZrcOe1omfFv1uO3i06JJPyhgjSg2hyphenhyphenTsBoQpCyGAGIP1fbX1dwshWwLht6OlZGB1RsA0HCJSf/s787/Relative%20NOPM%20on%20a%20small%20server_%20Postgres.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="487" data-original-width="787" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6iVfGcIfMb7MmSi_FIUQUbGoCLxPPjPiej5x8E_BlZisTKcZc0GE-0Snt0Gy-U4tTzGCkxq__UOXF194Ol3FUGLm82QjyV1xwTPuByz0LDm_xOT49GMZZrcOe1omfFv1uO3i06JJPyhgjSg2hyphenhyphenTsBoQpCyGAGIP1fbX1dwshWwLht6OlZGB1RsA0HCJSf/w640-h396/Relative%20NOPM%20on%20a%20small%20server_%20Postgres.png" width="640"></a></div>
<p><b>Results: MySQL vs Postgres</b></p>
<p>Legend:</p>
<ul>
<li>pg181.x10b is Postgres 18.1 with the x10b_c8r32 config</li>
<li>my847.z12a is MySQL 8.4.7 with the z12a_c8r32 config</li>
</ul>
<p>Summary</p>

<ul style="text-align: left">
<li>MySQL and Postgres have similar throughput for vu=6 at wh=1000 and 2000. Note that partitioning is used at wh=1000 and 2000 but not at wh=100.</li>
<li>Otherwise Postgres is 2X faster than MySQL</li>
</ul>
<div class="separator" style="clear: both;text-align: center"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXx-zehZQnEI4NSQs_PCSLDVXPBRs49CC6-WGR0508uVzPOel8H59hqwP8AkDWxYG3d8nCujjz52qXdnxyMFLzcWIgJLG1GXeOMwQeMzy1z0tY7QrBo9uUgsvlfui_Iiu8b0KIwxf5OFClI22ZBpfCMXpzPnayphEShwgW9mnMGRNGMI297e2wH8uYe7WI/s600/Relative%20NOPM%20on%20a%20small%20server_%20Postgres%2018.1%20vs%20MySQL%208.4.7.png" style="margin-left: 1em;margin-right: 1em"><img decoding="async" loading="lazy" border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXx-zehZQnEI4NSQs_PCSLDVXPBRs49CC6-WGR0508uVzPOel8H59hqwP8AkDWxYG3d8nCujjz52qXdnxyMFLzcWIgJLG1GXeOMwQeMzy1z0tY7QrBo9uUgsvlfui_Iiu8b0KIwxf5OFClI22ZBpfCMXpzPnayphEShwgW9mnMGRNGMI297e2wH8uYe7WI/w640-h396/Relative%20NOPM%20on%20a%20small%20server_%20Postgres%2018.1%20vs%20MySQL%208.4.7.png" width="640"></a></div>
</div>
</div>
</div>
</div>

<p>The post <a rel="nofollow" href="https://smalldatum.blogspot.com/2026/02/hammerdb-tproc-c-on-small-server.html">HammerDB tproc-c on a small server, Postgres and MySQL</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB 12.3.1 and 12.2.2 now available</title>
      <link>https://mariadb.org/mariadb-12-3-1-and-12-2-2-now-available/</link>
      <pubDate>Fri, 13 Feb 2026 16:15:32 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>The MariaDB Foundation is pleased to announce the availability of MariaDB 12.3.1, the release candidate (RC) in the new long-term support (LTS) release, and MariaDB 12.2.2, the latest stable rolling release. …<br />
Continue reading \"MariaDB 12.3.1 and 12.2.2 now available\"<br />
The post MariaDB 12.3.1 and 12.2.2 now available appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-12-3-1-and-12-2-2-now-available/">MariaDB 12.3.1 and 12.2.2 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The MariaDB Foundation is pleased to announce the availability of <a href="https://mariadb.com/docs/release-notes/community-server/12.3/12.3.1">MariaDB 12.3.1</a>, the release candidate (RC) in the new <a href="https://mariadb.org/about/#maintenance-policy">long-term support (LTS) release</a>, and <a href="https://mariadb.com/docs/release-notes/community-server/12.2/12.2.2">MariaDB 12.2.2</a>, the latest stable rolling release. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-12-3-1-and-12-2-2-now-available/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB 12.3.1 and 12.2.2 now available&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-12-3-1-and-12-2-2-now-available/">MariaDB 12.3.1 and 12.2.2 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>ClickHouse storage architecture and optimization</title>
      <link>https://severalnines.com/blog/clickhouse-storage-architecture-and-optimization/</link>
      <pubDate>Fri, 13 Feb 2026 08:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>In analytical databases, storage is the dominant factor behind performance, stability, and cost. ClickHouse is designed to scan large volumes of data efficiently, but how well it performs in production depends heavily on storage layout, disk throughput, and merge behavior.  For operations and support engineers, storage decisions determine whether ClickHouse is predictable and cost-efficient or […]<br />
The post ClickHouse storage architecture and optimization appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/clickhouse-storage-architecture-and-optimization/">ClickHouse storage architecture and optimization</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In analytical databases, storage is the dominant factor behind performance, stability, and cost. ClickHouse is designed to scan large volumes of data efficiently, but how well it performs in production depends heavily on storage layout, disk throughput, and merge behavior.&nbsp;</p>
<p>For operations and support engineers, storage decisions determine whether ClickHouse is predictable and cost-efficient or fragile under load.</p>
<p>This blog post focuses on ClickHouse storage architecture and optimization in production, with particular attention to hybrid on-prem and cloud deployments.</p>
<h2 class="wp-block-heading" id="h-storage-types-supported-by-clickhouse">Storage types supported by ClickHouse<a class="anchor-link" id="storage-types-supported-by-clickhouse"></a></h2>
<p>ClickHouse supports multiple storage backends, allowing flexible deployment across bare metal, virtualized environments, and cloud platforms. Since ClickHouse supports many types of table engines, we are going to focus on the MergeTree family of table engines, which is the core data storage technology behind ClickHouse.</p>
<p>The base MergeTree table engine can be considered the default for single-node ClickHouse instances because it is versatile and practical for a wide range of use cases. For production usage, ReplicatedMergeTree is the way to go, because it adds high availability on top of all the features of the regular MergeTree engine. A bonus is automatic data deduplication on ingestion, so the application can safely retry an insert after a network issue.</p>
<p>If you are wondering why it is called &ldquo;MergeTree&rdquo;: it stores data in sorted parts, periodically merges them in the background, and is optimized for analytical workloads. The following are the supported storage types for the MergeTree table engines:</p>
<ul class="wp-block-list">
<li>Local disks</li>
<li>Network disks</li>
<li>Memory</li>
<li>Object storage</li>
</ul>
<h3 class="wp-block-heading" id="h-local-disks">Local disks<a class="anchor-link" id="local-disks"></a></h3>
<p>ClickHouse&rsquo;s local disk architecture is designed around fast, reliable block storage, with SSD or NVMe disks as the preferred foundation for production deployments. Data is written in a column-oriented format and grouped into immutable parts, which are stored sequentially on disk and later merged in the background. This access pattern minimizes random I/O and aligns well with modern SSDs. In a simple standalone setup, ClickHouse may run on a single server with one or more local disks mounted under /var/lib/clickhouse, often using a single high-capacity NVMe device to store all table data and metadata.</p>
<p>In more robust on-premises environments, local disks are often combined using RAID or JBOD depending on operational priorities. RAID 10 across multiple SSDs or NVMe drives is a common choice for production systems, providing both high throughput and resilience to disk failures. RAID 0 is sometimes used to maximize performance, but only when ClickHouse replication is in place to mitigate the risk of data loss. Alternatively, JBOD (Just a Bunch of Disks) can be used to expose individual disks directly to the filesystem, allowing ClickHouse to consume capacity without RAID overhead, but at the cost of higher operational complexity and less predictable failure handling.</p>
<p>We highly recommend reading the <a href="https://clickhouse.com/docs/operations/tips#storage-subsystem">Storage Subsystem tips</a> before installing ClickHouse.</p>
<h3 class="wp-block-heading" id="h-network-disks">Network disks<a class="anchor-link" id="network-disks"></a></h3>
<p>From a ClickHouse perspective, any &ldquo;network disk&rdquo; exposed as a standard filesystem by the operating system can be used and configured via storage_configuration, similar to local disks. However, &ldquo;usability&rdquo; does not imply &ldquo;recommendation&rdquo;.</p>
<p>Network-mounted filesystems (like NFS, SMB, and GlusterFS) are considered risky for ClickHouse. This is because ClickHouse relies heavily on features like atomic renames, reliable fsync, and predictable latency, especially during merge operations. These network filesystems often fail to provide reliable atomic renaming guarantees and consistent locking, and they are prone to latency spikes. Such issues can lead to merge stalls, unpredictable query latency, replica desynchronization, and, in the worst cases, data corruption.</p>
<p>Conversely, network block storage solutions are generally safe and production-proven. These include technologies like iSCSI, Fibre Channel, NVMe-oF, or cloud-specific block-storage services like AWS EBS, GCP Persistent Disk, and Azure Managed Disks. They are safe because they present themselves to the OS as direct, raw, local block devices. This setup provides highly predictable latency and correct fsync semantics, making it the most common and robust &ldquo;network disk&rdquo; configuration in cloud ClickHouse deployments.</p>
<h3 class="wp-block-heading" id="h-memory">Memory<a class="anchor-link" id="memory"></a></h3>
<p>ClickHouse does support memory-based storage through engines such as <strong>Memory</strong> and <strong>Buffer</strong>, but this support is limited and not designed to replace disk-based storage. A Memory table keeps all data in RAM, offers no persistence, no replication, no compression, and loses all data on restart.&nbsp;</p>
<p>The Buffer engine is often misunderstood as storage, but in practice it is an ingestion optimization layer that temporarily holds data in memory before flushing it to a disk-backed table. Neither engine participates in the MergeTree architecture that underpins ClickHouse&rsquo;s scalability and reliability.</p>
<p>Because ClickHouse is optimized for sequential disk access, compression, and background merges, using memory as primary storage works against its design. Memory tables do not benefit from merges, TTLs, or deduplication, and they scale poorly as data volume grows. In production, they introduce operational risk: data loss on restart, unpredictable memory pressure, and lack of high availability. In most real-world cases, fast local SSDs combined with proper schema design and the OS page cache deliver near-memory performance without sacrificing durability.</p>
<p>With that being said, memory-based tables do have valid, narrowly scoped use cases. They are appropriate for temporary or session-level data, small lookup or dimension tables, and controlled query acceleration scenarios where the dataset is bounded and data loss is acceptable.&nbsp;</p>
<p>For anything that represents core business data like events, logs, metrics, or analytical facts, the disk-backed MergeTree tables remain the correct default. From an operations perspective, memory storage in ClickHouse should be treated as ephemeral infrastructure, not a foundation for persistent analytics.</p>
<p>The following is an example of creating a Memory table that keeps a minimum of 4 KB and a maximum of 16 KB of data in RAM:</p>
<pre class="wp-block-code"><code>CREATE TABLE memory (i UInt32) 
ENGINE = Memory 
SETTINGS min_bytes_to_keep = 4096, max_bytes_to_keep = 16384;</code></pre>
<h3 class="wp-block-heading">Object Storage<a class="anchor-link" id="object-storage"></a></h3>
<p>ClickHouse has native support for S3-compatible object storage and makes it practical to store cold data outside local disks. Typical use cases are:</p>
<ul class="wp-block-list">
<li>Long-term data retention storage</li>
<li>Cost optimization</li>
<li>Elastic scaling without disk re-provisioning</li>
</ul>
<p>Object storage can be configured as a disk in ClickHouse and combined with storage policies to tier data, typically keeping hot data on local disks while moving colder data to object storage via TTL rules. From ClickHouse&rsquo;s perspective, object storage is treated as an external disk, with local storage still required for metadata and coordination.</p>
<p>In addition to leveraging major object storage providers (such as Amazon S3, Azure, Google Cloud, or Digital Ocean), organizations can deploy self-hosted solutions like SeaweedFS or Garage. Scaling these self-hosted object stores is straightforward, typically involving the addition of more servers.</p>
<p>However, utilizing object storage introduces operational trade-offs compared to local disks. While object storage offers compelling advantages like lower cost and virtually limitless capacity, these benefits are balanced by drawbacks, including higher latency, variable throughput, and a strong dependency on network stability.<br>The following is an example of the storage configuration inside /etc/clickhouse-server/config.xml:</p>
<pre class="wp-block-code"><code>&lt;storage_configuration&gt;
  &lt;disks&gt;
    &lt;s3_disk1&gt;
      &lt;type&gt;object_storage&lt;/type&gt;
      &lt;object_storage_type&gt;s3&lt;/object_storage_type&gt;
      &lt;endpoint&gt;https://s3.amazonaws.com&lt;/endpoint&gt;
      &lt;bucket&gt;clickhouse-data&lt;/bucket&gt;
      &lt;access_key_id&gt;ACCESS_KEY&lt;/access_key_id&gt;
      &lt;secret_access_key&gt;SECRET_KEY&lt;/secret_access_key&gt;
      &lt;metadata_path&gt;/var/lib/clickhouse/disks/s3/&lt;/metadata_path&gt;
    &lt;/s3_disk1&gt;
  &lt;/disks&gt;
  &lt;policies&gt;
    &lt;s3_main&gt;
      &lt;volumes&gt;
        &lt;main&gt;
          &lt;disk&gt;s3_disk1&lt;/disk&gt;
        &lt;/main&gt;
      &lt;/volumes&gt;
    &lt;/s3_main&gt;
  &lt;/policies&gt;
&lt;/storage_configuration&gt;</code></pre>
<p>We can then use the S3 bucket as data storage by referencing the storage policy for a table:</p>
<pre class="wp-block-code"><code>CREATE TABLE s3_table1 (`id` UInt64, `column1` String) 
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_main';</code></pre>
<h2 class="wp-block-heading" id="h-clickhouse-storage-in-hybrid-environments">ClickHouse storage in hybrid  environments<a class="anchor-link" id="clickhouse-storage-in-hybrid-environments"></a></h2>
<p>MergeTree table engines can distribute data across several block devices, which is highly beneficial for implementing implicit &ldquo;hot&rdquo; and &ldquo;cold&rdquo; data tiers. Specifically, high-speed storage, such as NVMe SSDs or in-memory storage, can house the relatively small volume of frequently accessed, recent (&ldquo;hot&rdquo;) data. In contrast, the large historical volumes of rarely accessed (&ldquo;cold&rdquo;) data can be relegated to slower, more cost-effective media like HDDs or object storage.</p>
<p>This configuration can be achieved by using a storage policy. A storage policy can be defined globally, or per table in the table settings when creating it. Let&rsquo;s put this into an example. We can configure our storage as below:</p>
<pre class="wp-block-code"><code>&lt;storage_configuration&gt;
  &lt;disks&gt;
    &lt;s3_disk&gt;
      &lt;type&gt;object_storage&lt;/type&gt;
      &lt;object_storage_type&gt;s3&lt;/object_storage_type&gt;
      &lt;endpoint&gt;https://s3.amazonaws.com&lt;/endpoint&gt;
      &lt;bucket&gt;clickhouse-data&lt;/bucket&gt;
      &lt;access_key_id&gt;ACCESS_KEY&lt;/access_key_id&gt;
      &lt;secret_access_key&gt;SECRET_KEY&lt;/secret_access_key&gt;
      &lt;metadata_path&gt;/var/lib/clickhouse/disks/s3/&lt;/metadata_path&gt;
    &lt;/s3_disk&gt;
  &lt;/disks&gt;

  &lt;policies&gt;
    &lt;hybrid_policy&gt;
      &lt;volumes&gt;
        &lt;hot&gt;
          &lt;disk&gt;default&lt;/disk&gt;
        &lt;/hot&gt;
        &lt;cold&gt;
          &lt;disk&gt;s3_disk&lt;/disk&gt;
        &lt;/cold&gt;
      &lt;/volumes&gt;
    &lt;/hybrid_policy&gt;
  &lt;/policies&gt;
&lt;/storage_configuration&gt;</code></pre>
<p>The hot data will be located on the default disk, which in this case is our local file system at /var/lib/clickhouse/data/store. Once it fills up, the data will be moved to our S3 bucket called &ldquo;clickhouse-data&rdquo;.</p>
<p>Even when data is stored in S3, local disk space is still essential for reliable metadata management and merge coordination. <strong>A common mistake is under-provisioning local disks </strong>because &ldquo;most data lives in S3&rdquo;. When merges or TTL moves run, the local disk fills up and the system degrades. It is recommended to always have 30-50% local disk free space and perform inserts in batches to avoid small parts.</p>
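<p>As a minimal sketch (the <code>events</code> table and its columns are hypothetical), a table can reference the hybrid policy above and age older partitions to the cold volume via a TTL rule:</p>
<pre class="wp-block-code"><code>CREATE TABLE events
(
    `event_date` Date,
    `user_id` UInt64,
    `payload` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
TTL event_date + INTERVAL 90 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hybrid_policy';</code></pre>
<p>With this layout, recent partitions stay on the local default disk, while parts older than 90 days are moved to the S3-backed cold volume in the background.</p>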
<h2 class="wp-block-heading" id="h-performance-optimization-for-clickhouse-s-storage-layer">Performance optimization for ClickHouse&rsquo;s storage layer<a class="anchor-link" id="performance-optimization-for-clickhouses-storage-layer"></a></h2>
<p>The performance of ClickHouse&rsquo;s storage layer heavily depends on the physical arrangement and compression of data on the disk. Partitioning is the most critical element and must be set up correctly. Partitions should be large-grained, for instance, monthly divisions based on a date column. This approach minimizes the total number of partitions, thereby reducing the overhead associated with metadata and the filesystem.&nbsp;</p>
<p>Conversely, partitioning using values with high uniqueness (high-cardinality), such as a user ID or request ID, should be avoided because it generates too many small partitions and increases the pressure on the merge process. For example, partitioning a log table by <code>toYYYYMM(event_date)</code> (monthly partitions) is generally better than partitioning by <code>user_id</code>, which would create millions of tiny partitions and slow down overall performance.</p>
<p>Compression and indexing further enhance the efficiency of data reads within these partitions. ClickHouse compresses data on a per-column basis. Utilizing stronger codecs like ZSTD for data that is accessed less often (&ldquo;colder&rdquo; data) can substantially decrease disk usage and I/O operations, though it uses slightly more CPU resources.&nbsp;</p>
<p>For time-series data, specialized codecs such as Delta, DoubleDelta, or Gorilla can achieve even better compression and cache utilization. Data skipping indexes, including <code>minmax</code>, <code>set</code>, and <code>bloom_filter</code>, provide an additional layer of optimization. They enable ClickHouse to bypass entire ranges of data during query execution. These indexes are particularly beneficial for log and observability workloads, where queries often filter on specific fields or tags that are not part of the primary sorting key (<code>ORDER BY</code>); skipping those ranges reduces unnecessary disk reads and improves query response time.</p>
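<p>For illustration only (the <code>logs</code> table and its columns are made up), per-column codecs and a skipping index can be declared like this:</p>
<pre class="wp-block-code"><code>CREATE TABLE logs
(
    `ts` DateTime CODEC(Delta, ZSTD(3)),
    `service` LowCardinality(String),
    `trace_id` String,
    `message` String CODEC(ZSTD(3)),
    INDEX idx_trace trace_id TYPE bloom_filter GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (service, ts);</code></pre>
<p>Queries filtering on <code>trace_id</code> can then skip granules that the bloom filter rules out, even though <code>trace_id</code> is not part of the sorting key.</p>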
<h2 class="wp-block-heading" id="h-avoiding-common-clickhouse-storage-operations-pitfalls">Avoiding common ClickHouse storage operations pitfalls<a class="anchor-link" id="avoiding-common-clickhouse-storage-operations-pitfalls"></a></h2>
<h3 class="wp-block-heading" id="h-frequent-updates-and-deletes-mutations">Frequent updates and deletes (mutations)<a class="anchor-link" id="frequent-updates-and-deletes-mutations"></a></h3>
<ul class="wp-block-list">
<li><strong>Description:</strong> Applying frequent <code>UPDATE</code> or <code>DELETE</code> operations triggers internal &ldquo;mutations.&rdquo; These mutations are costly, as they involve rewriting the entire affected data part in the background.</li>
<li><strong>Impact:</strong> In busy, write-heavy systems, excessive mutations can overload the merge process. This leads to long queues, delayed visibility of data changes, and temporary increases in disk usage as old and new data versions coexist.</li>
<li><strong>Recommendation:</strong> Treat mutations as exceptional administrative tasks, not a standard mechanism for routine data modification.</li>
<li><strong>Example:</strong> Instead of an hourly <code>UPDATE employees SET salary = new_salary WHERE id = 123;</code>, adopt a pattern where the new salary record is appended with a timestamp, and queries select only the latest record for that employee ID (see the sketch after this list).</li>
</ul>
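<p>A minimal sketch of that append-only pattern (the <code>employee_salary</code> table and its columns are hypothetical): new versions are inserted, and reads pick the latest row per key with <code>argMax</code>:</p>
<pre class="wp-block-code"><code>-- Append a new salary version instead of updating in place
INSERT INTO employee_salary (id, salary, updated_at)
VALUES (123, 95000, now());

-- Read the current salary: latest row per employee
SELECT
    id,
    argMax(salary, updated_at) AS current_salary
FROM employee_salary
GROUP BY id;</code></pre>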
<h3 class="wp-block-heading" id="h-schema-and-ingestion-suboptimization">Schema and ingestion suboptimization<a class="anchor-link" id="schema-and-ingestion-suboptimization"></a></h3>
<ul class="wp-block-list">
<li><strong>Overuse of <code>Nullable</code> Columns:</strong>
<ul class="wp-block-list">
<li><strong>Description:</strong> Using <code>Nullable</code> data types adds complexity to storage and query execution.</li>
<li><strong>Impact:</strong> This increases storage overhead and requires additional checks during query execution, reducing CPU efficiency and compression effectiveness.</li>
<li><strong>Example:</strong> Unless data sparsity is extremely high or a <code>NULL</code> value is semantically necessary, prefer non-nullable columns and use a default value (e.g., 0, an empty string, or a specific placeholder) if a value is missing.</li>
</ul>
</li>
<li><strong>Very Small, Frequent Inserts:</strong>
<ul class="wp-block-list">
<li><strong>Description:</strong> Each small <code>INSERT</code> query creates a new, tiny data part.</li>
<li><strong>Impact:</strong> Thousands of small parts dramatically increase the load on the merge subsystem and stress disk I/O. This backlog causes unpredictable query latency and can eventually stall new inserts if disk resources are exhausted.</li>
<li><strong>Recommendation:</strong> Always batch your inserts into larger transactions.</li>
<li><strong>Example:</strong> Instead of inserting one event at a time (e.g., one <code>INSERT</code> per second), buffer events and perform a single batch insert every 5-10 seconds containing thousands of records.</li>
</ul>
</li>
</ul>
<h2 class="wp-block-heading" id="h-using-disk-io-and-object-storage-latency-monitoring-in-clickhouse">Using disk IO and object storage latency monitoring in ClickHouse<a class="anchor-link" id="using-disk-io-and-object-storage-latency-monitoring-in-clickhouse"></a></h2>
<p>In ClickHouse, problems related to storage rarely manifest as clear-cut error messages; instead, they often surface indirectly through a decline in query performance or unreliable data ingestion. For instance, if users start experiencing slow queries that take 15 seconds instead of the usual 3 seconds, or intermittent timeouts during peak hours, the root cause is frequently disk or storage pressure rather than a poorly written <code>SELECT</code> statement.</p>
<p>Therefore, continuous monitoring of the storage layer is crucial for early detection. Key metrics to track include disk read and write throughput. If, for example, your disk&rsquo;s write throughput suddenly jumps from a typical 50 MB/s to over 200 MB/s during a large batch insert, it signals potential saturation or an abnormal spike that needs investigation, especially during heavy merge activity.</p>
<p>Beyond simple I/O, ClickHouse-specific metrics provide vital context. The number of active parts and the size of the merge backlog are strong indicators of storage health. If the part count for a table rapidly increases from 1,000 to 5,000 over a few hours, or if the system reports a merge backlog of &ldquo;100 merges pending&rdquo;, it indicates that the system is struggling to consolidate incoming data. These conditions increase disk I/O (e.g., more random reads/writes), increase temporary disk space usage (e.g., temporary merge files consuming an extra 500GB), and make query response times unpredictable. Monitoring these allows teams to distinguish between normal background merges and situations where the system is genuinely falling behind, requiring intervention, like increasing disk speed or adjusting merge settings.</p>
<p>For cloud deployments using object storage (like S3 or Azure Blob Storage), extra layers of monitoring are necessary. Metrics like object storage latency (e.g., reads taking 500ms instead of 50ms), request error rates (e.g., seeing 5xx errors from the storage service), and throttling events can directly slow down queries and background tasks like TTL moves. Even when data is remote, issues may appear as stalled merges or slow queries because ClickHouse still relies on local disks for coordination. Therefore, correlating all metrics from local disk I/O, local CPU, to remote object storage metrics is essential. When queries slow down, such as a report query that usually runs in 10 seconds now taking 30, the culprit is frequently a bottleneck in storage, not the query logic itself.</p>
<h2 class="wp-block-heading" id="h-clickhouse-storage-operations-best-practices-for-support-teams">ClickHouse storage operations best practices for support teams<a class="anchor-link" id="clickhouse-storage-operations-best-practices-for-support-teams"></a></h2>
<p>For support teams operating ClickHouse in production, proactive storage monitoring and alerting are essential to prevent performance degradation and outages. A minimum alerting baseline should include:</p>
<ul class="wp-block-list">
<li>Set alerts to catch capacity issues early, e.g., warning at 20% free space and critical at 10% free space remaining (see the sketch after this list). This is vital given the temporary space required during merges.</li>
<li>Monitor the <code>system.merges</code> table and alert if the number of pending merges exceeds a predefined threshold (e.g., &gt; 100 merges pending for more than 15 minutes). This identifies when background merges are falling behind ingestion.</li>
<li>Alerts on <code>INSERT</code> queries failing with errors related to &ldquo;disk full&rdquo; or &ldquo;no space left on device&rdquo; are often a late warning sign that requires immediate action.</li>
<li>In environments using S3 or similar object storage, abnormal latency (e.g., PUT requests &gt; 500ms) or an increased error rate (e.g., &gt; 0.5% 5xx errors) should also trigger alerts, as they can indirectly stall queries and background operations.</li>
</ul>
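<p>As a rough sketch, the free-space thresholds from the first point can be checked directly against <code>system.disks</code> (the 20% threshold here is just the example value from above):</p>
<pre class="wp-block-code"><code>-- Disks below the 20% free-space warning threshold
SELECT
    name,
    round(100 * free_space / total_space, 1) AS free_pct
FROM system.disks
WHERE free_space &lt; 0.20 * total_space;</code></pre>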
<p>In day-to-day operations, ClickHouse&rsquo;s system tables are indispensable for diagnosing storage-related issues. Support engineers should regularly inspect these tables:</p>
<ul class="wp-block-list">
<li><code>system.parts</code>: Provides visibility into data part counts and sizes, helping to identify tables with excessive small parts that could lead to poor query performance (e.g., <code>SELECT table, count() FROM system.parts GROUP BY table HAVING count() &gt; 1000</code>).</li>
<li><code>system.merges</code>: Exposes currently running merge operations, useful for diagnosing a backlog (e.g., <code>SELECT database, table, elapsed, progress FROM system.merges ORDER BY elapsed DESC</code>).</li>
<li><code>system.disks</code>: Shows disk usage and available space across configured volumes, allowing engineers to correlate free space alerts with the actual physical location (e.g., <code>SELECT name, path, formatReadableSize(free_space) AS free, formatReadableSize(total_space) AS total FROM system.disks</code>).</li>
</ul>
<p>Regularly inspecting these tables allows support engineers to correlate alerts with actual internal state, distinguish transient spikes from structural problems, and take informed corrective actions before storage issues escalate into user-facing incidents.</p>
<h2 class="wp-block-heading" id="h-the-essential-role-of-routine-storage-maintenance-in-clickhouse">The essential role of routine storage maintenance in ClickHouse<a class="anchor-link" id="the-essential-role-of-routine-storage-maintenance-in-clickhouse"></a></h2>
<p>Routine maintenance is a critical but often overlooked aspect of operating ClickHouse at scale. For instance, regularly reviewing part counts helps identify unhealthy ingestion patterns, such as a high volume of <code>INSERT INTO</code> statements generating thousands of tiny data parts, which increases merge pressure. Unused tables and obsolete partitions should be dropped proactively, like removing a staging table that&rsquo;s no longer needed or dropping a partition for data older than the legally required retention period.&nbsp;</p>
<p>This not only reclaims disk space but also reduces metadata overhead and background maintenance work. Validating TTL (Time-To-Live) and tiering policies ensures that data is aging out or moving between storage tiers as intended, for example, confirming that data older than 90 days is successfully compressed and moved from fast NVMe storage to a slower, cheaper HDD-based volume. This prevents silent growth of hot storage or unexpected retention costs.</p>
<p>When storage usage or workload characteristics change, rebalancing storage volumes may also be necessary. This can involve redistributing data across disks, such as using the<strong> <code>ALTER TABLE ... MOVE PARTITION</code> </strong>command, adjusting storage policies, or revisiting which data belongs on fast local storage versus slower or cheaper tiers. Without this periodic upkeep, ClickHouse systems tend to degrade slowly: merges take longer, queries become less predictable, and resource usage creeps upward without a single clear failure point. Consistent maintenance allows support teams to keep performance stable and avoid the cumulative effects of neglect that are far harder to correct under pressure.</p>
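<p>As an illustrative sketch (assuming a <code>logs</code> table partitioned by <code>toYYYYMM</code> and the &lsquo;cold&rsquo; volume from the hybrid policy shown earlier), dropping an expired partition and moving an older one to a slower volume look like this:</p>
<pre class="wp-block-code"><code>-- Remove a partition that is past its retention period
ALTER TABLE logs DROP PARTITION 202401;

-- Move an older partition to the cheaper 'cold' volume
ALTER TABLE logs MOVE PARTITION 202405 TO VOLUME 'cold';</code></pre>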
<h2 class="wp-block-heading" id="h-hybrid-and-cloud-specific-considerations-for-optimal-storage-health-and-performance">Hybrid and cloud-specific considerations for optimal storage health and performance<a class="anchor-link" id="hybrid-and-cloud-specific-considerations-for-optimal-storage-health-and-performance"></a></h2>
<p>In hybrid and cloud environments, storage optimization is a dual concern: managing both performance and controlling costs and network traffic. When computing is separate from object storage, for instance, an on-premises ClickHouse cluster accessing data in Amazon S3, query speed, data merges, and movement depend heavily on available bandwidth and network latency.</p>
<h3 class="wp-block-heading" id="h-network-latency">Network latency<a class="anchor-link" id="network-latency"></a></h3>
<p>If the network link between the ClickHouse compute nodes and the S3 bucket has high latency (e.g., &gt;50ms), a query that needs to read 100 small files might take significantly longer than if the data were on local storage, even if total bandwidth is high.</p>
<p>Support teams must actively monitor sustained and peak bandwidth usage. A constrained network link, like a 1 Gbps connection consistently running at 90% utilization, can create &ldquo;backpressure.&rdquo; This backpressure can manifest as slow-running queries, or, in the worst case, completely stall background operations like data merges or replications.</p>
<h3 class="wp-block-heading" id="h-cost-management">Cost management<a class="anchor-link" id="cost-management"></a></h3>
<p>Cloud egress charges (the cost to move data <em>out</em> of a cloud region) can quickly balloon. If a daily reporting job reads 1TB of data from an S3 bucket in Region A and sends the final report to an application in Region B, this data movement can incur significant egress fees.</p>
<p>Your tiering strategy is equally vital to cost management. Policies for data tiering (e.g., moving old data to cheaper, slower storage tiers via TTLs) should be intentional, based on known access patterns and retention rules, not simply enabled by default. A poorly planned tiering strategy, such as moving frequently-accessed recent data to a low-cost &ldquo;Archive&rdquo; tier, can increase both cloud spending (due to expensive retrieval fees from the archive) and query latency. Conversely, a well-planned policy ensures that rarely-accessed historical data moves to the lowest-cost tier without impacting the performance of active workloads.</p>
<h2 class="wp-block-heading" id="h-a-word-on-storage-for-clickhouse-within-multi-database-stacks">A word on storage for ClickHouse within multi-database stacks<a class="anchor-link" id="a-word-on-storage-for-clickhouse-within-multi-database-stacks"></a></h2>
<p>ClickHouse typically complements PostgreSQL/MySQL for transactional consistency, low-latency lookups, and row updates (e.g., order creation); Elasticsearch for fast, unstructured data search (e.g., product search); and object storage (S3/GCS) for cost-efficient, durable archiving (e.g., archived data, old logs).</p>
<p>ClickHouse is optimized differently for high-throughput scans, heavy compression, and append-only data ingestion, excelling at analytical queries (e.g., aggregating sales data, sales dashboard) but not transactional guarantees or frequent row-level mutations. Teams should align storage strategies and retention policies across these systems.</p>
<h2 class="wp-block-heading">Conclusion<a class="anchor-link" id="conclusion"></a></h2>
<p>Effective ClickHouse operations depend heavily on thoughtful storage design, as storage choices directly influence system stability, performance, and operational risk. Local disks remain the most predictable option for hot data and latency-sensitive workloads, while object storage offers flexibility and cost advantages for colder data when used appropriately.&nbsp;</p>
<p>Hybrid architectures combine these approaches but require disciplined monitoring and capacity planning to avoid hidden bottlenecks. Ultimately, storage optimization in ClickHouse is not just a performance concern but also a cost-management exercise, especially in cloud and hybrid environments where storage decisions directly impact long-term operational expenses.</p>
<p>The post <a href="https://severalnines.com/blog/clickhouse-storage-architecture-and-optimization/">ClickHouse storage architecture and optimization</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/clickhouse-storage-architecture-and-optimization/">ClickHouse storage architecture and optimization</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Announcing MariaDB Community Server 12.2 GA</title>
      <link>https://mariadb.com/resources/blog/announcing-mariadb-community-server-12-2-ga/</link>
      <pubDate>Thu, 12 Feb 2026 19:52:10 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>We are excited to announce the General Availability of MariaDB Community Server 12.2. Building upon the major foundational changes introduced […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/announcing-mariadb-community-server-12-2-ga/">Announcing MariaDB Community Server 12.2 GA</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We are excited to announce the General Availability of MariaDB Community Server 12.2. Building upon the major foundational changes introduced in the MariaDB Community Server 12.1, this version focuses on delivering smarter query optimization, deeper compatibility for migrations, and enhanced metadata visibility, ensuring MariaDB remains the most versatile open-source database for developers and&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/announcing-mariadb-community-server-12-2-ga/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/announcing-mariadb-community-server-12-2-ga/">Announcing MariaDB Community Server 12.2 GA</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The Concepts of Forking</title>
      <link>https://mariadb.com/resources/blog/the-concepts-of-forking/</link>
      <pubDate>Wed, 11 Feb 2026 17:12:33 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>Lately there has been a lot of discussion about “hard” or “soft” forks related to MySQL. As someone who has […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/the-concepts-of-forking/">The Concepts of Forking</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Lately there has been a lot of discussion about &ldquo;hard&rdquo; or &ldquo;soft&rdquo; forks related to MySQL. As someone who has done a successful fork of MySQL, I think this is both confusing and trivialising the concept of forking. In my previous blog, I did touch a bit on this topic, but it looks like some more clarifications are needed. When we did the initial fork of MariaDB from MySQL&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/the-concepts-of-forking/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/the-concepts-of-forking/">The Concepts of Forking</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What is the quickest way to load data into the database?</title>
      <link>https://www.fromdual.com/blog/load-data-quick-into-the-database/</link>
      <pubDate>Wed, 11 Feb 2026 09:04:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>We had some really exciting problems to solve for the last customer! Especially because the database wasn’t exactly small.<br />
Here are some key data: CPU: 2 sockets x 24 cores x 2 threads = 96 vCores, 756 G RAM, 2 x 10 Tbyte PCIe SSD in RAID-10 and 7 Tbyte data, several thousand clients, rapidly growing.<br />
The current throughput: 1 M SELECT/min, 56 k INSERT/min, 44 k UPDATE/min, 7 k DELETE/min averaged over 30 days. With a strong upward trend. Application and queries not consistently optimised. Database configuration: ‘state of the art’ not verified with benchmarks. CPU utilisation approx. 50% on average, more at peak times. I/O system still has available resources.<br />
The customer collects position and other device data and stores it in the database. In other words, a classic IoT problem (with time series, index clustered table, etc.).<br />
The question he has asked is: What is the fastest way to copy data from one table (pending data, a kind of queue) to another table (final data, per client)?<br />
The data flow looks something like this: the IoT devices send their data via the application servers (AS, 400 of them) into a pending-data table, from which it is processed into a final-data table.<br />
3 different variants to copy the data were available for selection.<br />
Variant 1: INSERT and DELETE (simplest form). The simplest variant is a simple INSERT and DELETE per row. This variant is particularly problematic because MariaDB/MySQL and PostgreSQL have AUTOCOMMIT enabled by default.<br />
Variant 2: START TRANSACTION and INSERT and DELETE. The same statements, wrapped in START TRANSACTION ... COMMIT per batch.<br />
Variant 3: START TRANSACTION and optimised INSERT and DELETE. One multi-row INSERT and a single DELETE ... WHERE id IN (...) per batch.<br />
Test set-up<br />
To test the whole thing, we have prepared a small script: load_data.php<br />
Preparation and execution with MariaDB/MySQL<br />
You can execute these tests yourself with the following commands:<br />
SQL&gt; CREATE DATABASE test;<br />
SQL&gt; CREATE USER \'app\'@\'127.0.0.1\' IDENTIFIED BY \'secret\';<br />
SQL&gt; GRANT ALL ON *.* TO \'app\'@\'127.0.0.1\';</p>
<p>$ ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --prepare</p>
<p>$ for i in $(seq 5) ; do<br />
 ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=1<br />
 ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=2<br />
 ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=3<br />
done</p>
<p>$ ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --clean-up<br />
Everyone can work out the measured values themselves with the corresponding test script.<br />
Preparation and execution with PostgreSQL<br />
You can execute these tests yourself with the following commands:<br />
postgres# CREATE DATABASE test;<br />
postgres# CREATE USER app PASSWORD \'secret\';<br />
postgres# GRANT ALL ON DATABASE test TO app;<br />
postgres# GRANT ALL ON SCHEMA public TO app;</p>
<p>$ ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --prepare</p>
<p>$ for i in $(seq 5) ; do<br />
 ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=1<br />
 ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=2<br />
 ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=3<br />
done</p>
<p>$ ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --clean-up<br />
Anyone can work out the measured values themselves with the corresponding test script.<br />
Results<br />
To avoid unnecessary discussions, we have ‘only’ listed the relative performance (runtime) here, as MarkC has been doing recently. We are happy to provide our measured values bilaterally. However, they can be easily reproduced with the test script itself.<br />
Less is better:</p>
<p> Variant 1<br />
 Variant 2<br />
 Variant 3</p>
<p> MariaDB 11.8, avg(5)<br />
 100.0%<br />
 7.5%<br />
 6.2%</p>
<p> PostgreSQL 19dev, avg(5)<br />
 100.0%<br />
 11.2%<br />
 7.0%</p>
<p>Attention: The values of MariaDB/MySQL and PostgreSQL can NOT be compared directly!</p>
<p>And here is the graphical evaluation:</p>
<p> </p>
<p>Remarks<br />
With faster discs, the difference between 1 and 2/3 would probably not have been quite so significant. Customer tests have shown a difference of ‘only’ about a factor of 5 (instead of a factor of 9 to 16).<br />
There are certainly other ways in which this loading process can be optimised. Here are a few that come to mind:</p>
<p>INSERT INTO ... SELECT * FROM<br />
LOAD DATA INFILE/COPY, if possible<br />
Prepared statements<br />
Server side Stored Language (SQL/PSM, PL/pgSQL, …) :-(<br />
PDO-Fetch of the results?<br />
etc.</p>
<p>Maybe I should look for a profiler (Xdebug or xhprof)?<br />
Further contributions</p>
<p>Load CSV files into the database<br />
MariaDB Prepared Statements, Transactions and Multi-Row Inserts<br />
How good is MySQL INSERT TRIGGER performance</p>
<p>This page was translated using deepl.com.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/load-data-quick-into-the-database/">What is the quickest way to load data into the database?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>We had some really exciting problems to solve for the last customer! Especially because the database wasn&rsquo;t exactly small.</p>
<p>Here are some key data: CPU: 2 sockets x 24 cores x 2 threads = 96 vCores, 756 G RAM, 2 x 10 Tbyte PCIe SSD in RAID-10 and 7 Tbyte data, several thousand clients, rapidly growing.</p>
<p>The current throughput: 1 M <code>SELECT</code>/min, 56 k <code>INSERT</code>/min, 44 k <code>UPDATE</code>/min, 7 k <code>DELETE</code>/min averaged over 30 days. With a strong upward trend. Application and queries not consistently optimised. Database configuration: &lsquo;state of the art&rsquo; not verified with benchmarks. CPU utilisation approx. 50% on average, more at peak times. I/O system still has available resources.</p>
<p>The customer collects position and other device data and stores it in the database. In other words, a classic IoT problem (with time series, index clustered table, etc.).</p>
<p>The question he has asked is: What is the fastest way to copy data from one table (pending data, a kind of queue) to another table (final data, per client)?</p>
<p>The data flow looks something like this:</p>
<pre><code>+------------+
| IoT Device |--+
+------------+ 
  +-----+
+------------+  | AS | +--------------+ Processing +------------+
| IoT Device |------+--&gt;| |--&gt;| Pending data |-------------&gt;| Final data |
+------------+ / | 400 | +--------------+ of data +------------+
 / +-----+
+------------+ /
| IoT Device |--+
+------------+
</code></pre>
<p>3 different variants to copy the data were available for selection.</p>
<h2 id="variant-1-insert-and-delete-simplest-form">Variant 1: <code>INSERT</code> and <code>DELETE</code> (simplest form)<a class="anchor-link" id="variant-1-insert-and-delete-simplest-form"></a></h2>
<p>The simplest variant is a simple <code>INSERT</code> and <code>DELETE</code>. This variant is particularly problematic because MariaDB/MySQL and PostgreSQL have <code>AUTOCOMMIT</code> enabled by default (<a href="https://dev.mysql.com/doc/refman/8.4/en/innodb-autocommit-commit-rollback.html" target="_blank" rel="noopener">here</a>, <a href="https://mariadb.com/docs/server/reference/sql-statements/transactions/start-transaction" target="_blank" rel="noopener">here</a>, <a href="https://www.postgresql.org/docs/current/ecpg-sql-set-autocommit.html" target="_blank" rel="noopener">here</a> and <a href="https://www.cybertec-postgresql.com/en/disabling-autocommit-in-postgresql-can-damage-your-health/" target="_blank" rel="noopener">here</a>).</p>
<p>To help you visualise this a little better, here is some pseudocode:</p>
<pre><code>// 20k rows
for (i = 1; i &lt;= 2000; i++) {

 SELECT * FROM pending LIMIT 10;
 foreach ( row ) {
 INSERT INTO final;
 -- implicit COMMIT
 DELETE FROM pending WHERE id = row[id];
 -- implicit COMMIT
 }
}
</code></pre>
<p>So if we want to copy 20 k rows, this variant causes: 40 k <code>COMMIT</code>s (<code>fsync</code>) and 42 k network round trips!</p>
<h2 id="variant-2-start-transaction-and-insert-ad-delete">Variant 2: <code>START TRANSACTION</code> and <code>INSERT</code> ad <code>DELETE</code><a class="anchor-link" id="variant-2-start-transaction-and-insert-ad-delete"></a></h2>
<p>This variant is used by more experienced database developers. Here is the corresponding pseudocode:</p>
<pre><code>// 20k rows
for (i = 1; i &lt;= 2000; i++) {

 SELECT * FROM pending LIMIT 10;
 START TRANSACTION;
 foreach ( row ) {
 INSERT INTO final;
 DELETE FROM pending WHERE id = row[id];
 }
 COMMIT;
}
</code></pre>
<p>If we want to copy 20 k rows in this example, this variant only causes 2 k <code>COMMIT</code>s (<code>fsync</code>)! So 20 times less! But 46 k network round trips (10% more).</p>
<h2 id="variant-3-start-transaction-and-optimised-insert-and-delete">Variant 3: <code>START TRANSACTION</code> and optimised <code>INSERT</code> and <code>DELETE</code><a class="anchor-link" id="variant-3-start-transaction-and-optimised-insert-and-delete"></a></h2>
<p>This variant is a little more demanding in terms of programming. It is used if you want to get a little closer to the limits of what is possible. Here is the pseudo code:</p>
<pre><code>// 20k rows
for (i = 1; i &lt;= 2000; i++) {

 SELECT * FROM pending LIMIT 10;
 START TRANSACTION;
 INSERT INTO final (), (), (), (), (), (), (), (), (), ();
 DELETE FROM pending WHERE id IN (...);
 COMMIT;
}
</code></pre>
<p>This 3rd variant also causes only 2 k <code>COMMIT</code>s (<code>fsync</code>) for 20 k rows, but it avoids looping over individual <code>INSERT</code> and <code>DELETE</code> statements by sending one multi-row statement of each kind per batch. So we save both network round trips (only 10 k) and CPU cycles on the database (which are difficult to scale) for parsing the queries.</p>
<h2 id="test-set-up">Test set-up<a class="anchor-link" id="test-set-up"></a></h2>
<p>To test the whole thing, we have prepared a small script: <a href="https://www.fromdual.com/code-examples/load_data.php.txt">load_data.php</a></p>
<h3 id="preparation-and-execution-with-mariadbmysql">Preparation and execution with MariaDB/MySQL<a class="anchor-link" id="preparation-and-execution-with-mariadb-mysql"></a></h3>
<p>You can execute these tests yourself with the following commands:</p>
<pre><code>SQL&gt; CREATE DATABASE test;
SQL&gt; CREATE USER 'app'@'127.0.0.1' IDENTIFIED BY 'secret';
SQL&gt; GRANT ALL ON *.* TO 'app'@'127.0.0.1';

$ ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --prepare

$ for i in $(seq 5) ; do
 ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=1
 ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=2
 ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --run --variant=3
done

$ ./load_data.php --database-type=mysql --database=test --host=127.0.0.1 --port=3306 --user=app --password=secret --clean-up
</code></pre>
<p>Everyone can work out the measured values themselves with the corresponding test script.</p>
<h3 id="preparation-and-execution-with-postgresql">Preparation and execution with PostgreSQL<a class="anchor-link" id="preparation-and-execution-with-postgresql"></a></h3>
<p>You can execute these tests yourself with the following commands:</p>
<pre><code>postgres# CREATE DATABASE test;
postgres# CREATE USER app PASSWORD 'secret';
postgres# GRANT ALL ON DATABASE test TO app;
postgres# GRANT ALL ON SCHEMA public TO app;

$ ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --prepare

$ for i in $(seq 5) ; do
 ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=1
 ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=2
 ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --run --variant=3
done

$ ./load_data.php --database-type=postgresql --database=test --host=127.0.0.1 --port=5432 --user=app --password=secret --clean-up
</code></pre>
<p>Anyone can work out the measured values themselves with the corresponding test script.</p>
<h2 id="results">Results<a class="anchor-link" id="results"></a></h2>
<p>To avoid unnecessary discussions, we have &lsquo;only&rsquo; listed the relative performance (runtime) here, as MarkC has been doing recently. We are happy to provide our measured values bilaterally. However, they can be easily reproduced with the test script itself.</p>
<p>Less is better:</p>
<table>
<thead>
<tr>
<th></th>
<th>Variant 1</th>
<th>Variant 2</th>
<th>Variant 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>MariaDB 11.8, avg(5)</td>
<td>100.0%</td>
<td>7.5%</td>
<td>6.2%</td>
</tr>
<tr>
<td>PostgreSQL 19dev, avg(5)</td>
<td>100.0%</td>
<td>11.2%</td>
<td>7.0%</td>
</tr>
</tbody>
</table>
<p><strong>Attention</strong>: The values of MariaDB/MySQL and PostgreSQL can NOT be compared directly!</p>
<p></p>
<p>And here is the graphical evaluation:</p>
<p><img decoding="async" src="https://www.fromdual.com/images/mariadb-data-load.png" alt="mariadb"></p>
<p>&nbsp;</p>
<p><img decoding="async" src="https://www.fromdual.com/images/postgresql-data-load.png" alt="postgresql"></p>
<h2 id="remarks">Remarks<a class="anchor-link" id="remarks"></a></h2>
<p>With faster discs, the difference between 1 and 2/3 would probably not have been quite so significant. Customer tests have shown a difference of &lsquo;only&rsquo; about a factor of 5 (instead of a factor of 9 to 16).</p>
<p>There are certainly other ways in which this loading process can be optimised. Here are a few that come to mind:</p>
<ul>
<li><code>INSERT INTO ... SELECT * FROM</code> (see the sketch after this list)</li>
<li><code>LOAD DATA INFILE</code>/<code>COPY</code>, if possible</li>
<li>Prepared statements</li>
<li>Server side Stored Language (SQL/PSM, PL/pgSQL, &hellip;) &#128577;</li>
<li>PDO-Fetch of the results?</li>
<li>etc.</li>
</ul>
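<p>A minimal sketch of the first idea, assuming the rows to move can be bounded by a known id (table names as in the pseudocode above):</p>
<pre><code>START TRANSACTION;
INSERT INTO final
  SELECT * FROM pending WHERE id &lt;= 10000;
DELETE FROM pending WHERE id &lt;= 10000;
COMMIT;
</code></pre>
<p>This moves the whole batch in a single transaction and avoids sending the row data through the client entirely.</p>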
<p>Maybe I should look for a profiler (<a href="https://xdebug.org/" target="_blank" rel="noopener">Xdebug</a> or <a href="https://www.php.net/manual/en/book.xhprof.php" target="_blank" rel="noopener">xhprof</a>)?</p>
<h2 id="further-contributions">Further contributions<a class="anchor-link" id="further-contributions"></a></h2>
<ul>
<li><a href="https://www.fromdual.com/blog/load-csv-files-into-the-database/">Load CSV files into the database</a></li>
<li><a href="https://www.fromdual.com/blog/mariadb-prepared-statements-transactions-and-multi-row-inserts/">MariaDB Prepared Statements, Transactions and Multi-Row Inserts</a></li>
<li><a href="https://www.fromdual.com/blog/how-good-is-mysql-insert-trigger-performance/">How good is MySQL INSERT TRIGGER performance</a></li>
</ul>
<p>This page was translated using <a href="https://www.deepl.com/en/translator" target="_blank" rel="noopener">deepl.com</a>.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/load-data-quick-into-the-database/">What is the quickest way to load data into the database?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Marko Mäkelä mentioned you on MDEV-38779</title>
      <link>https://www.fromdual.com/blog/mariadb-dynamically-configurable-buffer-pool-broken/comment-1049/</link>
      <pubDate>Wed, 11 Feb 2026 07:37:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>[~oli], thank you for your report. There already was an existing report MDEV-38671 about mostly the same, namely that the {{innodb_buffer_pool_size}} cannot be increased anymore when MariaDB Server has been started with the default settings.<br />
I have updated MDEV-38671 with some more explanation why this change was made and why the parameter {{innodb_buffer_pool_size_max}} was introduced. My biggest motivation to implement this were the crash MDEV-35485, which we constantly hit in our internal stress testing, and the need to make MDEV-24670 behave in a meaningful and predictable way, that is, actually release memory to the operating system in the event of a memory pressure event. By default, that new 10.11 feature is disabled by default, but it can be enabled by {{SET GLOBAL innodb_buffer_pool_size_auto_min}}.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/mariadb-dynamically-configurable-buffer-pool-broken/comment-1049/">Marko Mäkelä mentioned you on MDEV-38779</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>[~oli], thank you for your report. There already was an existing report MDEV-38671 about mostly the same, namely that the {{innodb_buffer_pool_size}} cannot be increased anymore when MariaDB Server has been started with the default settings.</p>
<p>I have updated MDEV-38671 with some more explanation why this change was made and why the parameter {{innodb_buffer_pool_size_max}} was introduced. My biggest motivation to implement this were the crash MDEV-35485, which we constantly hit in our internal stress testing, and the need to make MDEV-24670 behave in a meaningful and predictable way, that is, actually release memory to the operating system in the event of a memory pressure event. By default, that new 10.11 feature is disabled by default, but it can be enabled by {{SET GLOBAL innodb_buffer_pool_size_auto_min}}.</p>
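<p>For context, enabling that behaviour presumably means setting the minimum below the current pool size, so that a memory pressure event is allowed to shrink the pool. Roughly along these lines (a sketch; the 64M value is an arbitrary example):</p>
<pre><code>-- Allow the buffer pool to shrink down to 64M on memory pressure (example value):
SET GLOBAL innodb_buffer_pool_size_auto_min = 64*1024*1024;
</code></pre>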

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/mariadb-dynamically-configurable-buffer-pool-broken/comment-1049/">Marko Mäkelä mentioned you on MDEV-38779</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The MariaDB contribution process: a step by step guide.</title>
      <link>https://mariadb.org/the-mariadb-contribution-process-a-step-by-step-guide/</link>
      <pubDate>Tue, 10 Feb 2026 12:18:25 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Inspired by my VERY long presentation on the topic at FOSDEM26 I thought I’d say a couple of words on how the contribution process works. …<br />
Continue reading \"The MariaDB contribution process: a step by step guide.\"<br />
The post The MariaDB contribution process: a step by step guide. appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-mariadb-contribution-process-a-step-by-step-guide/">The MariaDB contribution process: a step by step guide.</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Inspired by my VERY long <a href="https://fosdem.org/2026/schedule/event/G88CD9-contributing_to_mariadb_and_posgresql/">presentation</a> on the topic at FOSDEM26 I thought I&rsquo;d say a couple of words on how the contribution process works. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/the-mariadb-contribution-process-a-step-by-step-guide/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;The MariaDB contribution process: a step by step guide.&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-mariadb-contribution-process-a-step-by-step-guide/">The MariaDB contribution process: a step by step guide.</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>Accelerating Innovation: MariaDB’s Journey to a 2.5x Performance Leap</title>
      <link>https://mariadb.com/resources/blog/accelerating-innovation-mariadbs-journey-to-a-2-5x-performance-leap/</link>
      <pubDate>Mon, 09 Feb 2026 19:19:38 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>It is hard to believe it has already been a year since I joined MariaDB as Vice President of Engineering. […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/accelerating-innovation-mariadbs-journey-to-a-2-5x-performance-leap/">Accelerating Innovation: MariaDB’s Journey to a 2.5x Performance Leap</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>It is hard to believe it has already been a year since I joined MariaDB as Vice President of Engineering. Looking back at my journey &ndash; from resolving customer bugs to leading elite engineering teams &ndash; coming to MariaDB felt like a natural next step. My background has always been rooted in building high-performance systems and fostering cultures of engineering excellence, and the mission here was&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/accelerating-innovation-mariadbs-journey-to-a-2-5x-performance-leap/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/accelerating-innovation-mariadbs-journey-to-a-2-5x-performance-leap/">Accelerating Innovation: MariaDB’s Journey to a 2.5x Performance Leap</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB has broken the concept of dynamically configurable buffer pools!</title>
      <link>https://www.fromdual.com/blog/mariadb-dynamically-configurable-buffer-pool-broken/</link>
      <pubDate>Mon, 09 Feb 2026 18:14:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>Problem description<br />
MySQL introduced the dynamically configurable InnoDB buffer pool with 5.7.5 in September 2014 (here and here):</p>
<p>The innodb_buffer_pool_size configuration option can be set dynamically using a SET statement, allowing you to resize the buffer pool without restarting the server. For example:<br />
mysql > SET GLOBAL innodb_buffer_pool_size=402653184;</p>
<p>MariaDB 10.2.2 adopted this feature in September 2016 (source):</p>
<p>InnoDB was merged from MySQL-5.7.14 (XtraDB is disabled in MariaDB-10.2.2 pending a similar merge)</p>
<p>The problematic thing is, on the one hand, that this feature now no longer works as it did before and no longer works as expected. On the other hand, they changed the behaviour in spring 2025 within a major release series (LTS), which in my opinion is an absolute no-go (source):</p>
<p>From MariaDB 10.11.12 / 11.4.6 / 11.8.2, there are significant changes to the InnoDB buffer pool behavior.</p>
<p>And what is more, the description of this is quite poor (source):</p>
<p>decreasing innodb_buffer_pool_size at runtime does not release memory (MDEV-32339)<br />
reorganise innodb buffer pool (and remove buffer pool chunks) (MDEV-29445)<br />
The Linux memory pressure interface, which could previously not be disabled and could cause performance anomalies, was rewritten and is disabled by default. (MDEV-34863)<br />
Server crashes when resizing default innodb buffer pool after setting innodb-buffer-pool-chunk-size to 1M (MDEV-34677)</p>
<p>In the corresponding worklog (MDEV-36197) MarkoM also describes a different behaviour:</p>
<p>innodb_buffer_pool_size_auto_max (my proposal for this task) would set the maximum for the automation (default: 0 to disable the logic).</p>
<p>which would have made much more sense in my opinion.<br />
How did it work before?<br />
How did it work in the past with MariaDB and still today with MySQL:<br />
SQL > SHOW GLOBAL VARIABLES LIKE \'innodb_buffer_pool%size\';<br />
+-------------------------------+-----------+<br />
&#124; Variable_name &#124; Value &#124;<br />
+-------------------------------+-----------+<br />
&#124; innodb_buffer_pool_chunk_size &#124; 2097152 &#124;<br />
&#124; innodb_buffer_pool_size &#124; 134217728 &#124;<br />
+-------------------------------+-----------+</p>
<p>SQL > SET GLOBAL innodb_buffer_pool_size = @@innodb_buffer_pool_chunk_size * 128;</p>
<p>SQL > SHOW GLOBAL VARIABLES LIKE \'innodb_buffer_pool%size\';<br />
+-------------------------------+-----------+<br />
&#124; Variable_name &#124; Value &#124;<br />
+-------------------------------+-----------+<br />
&#124; innodb_buffer_pool_chunk_size &#124; 2097152 &#124;<br />
&#124; innodb_buffer_pool_size &#124; 268435456 &#124;<br />
+-------------------------------+-----------+<br />
Thus everything OK. Works as expected and as usual.<br />
What does MariaDB do today?<br />
What happens today with MariaDB:<br />
SQL > SHOW GLOBAL VARIABLES LIKE \'innodb_buffer_pool%size%\';<br />
+----------------------------------+-----------+<br />
&#124; Variable_name &#124; Value &#124;<br />
+----------------------------------+-----------+<br />
&#124; innodb_buffer_pool_chunk_size &#124; 0 &#124;<br />
&#124; innodb_buffer_pool_size &#124; 134217728 &#124;<br />
&#124; innodb_buffer_pool_size_auto_min &#124; 134217728 &#124;<br />
&#124; innodb_buffer_pool_size_max &#124; 134217728 &#124;<br />
+----------------------------------+-----------+</p>
<p>SQL > SET GLOBAL innodb_buffer_pool_size = 256*1024*1024;<br />
Query OK, 0 rows affected, 1 warning (0.000 sec)</p>
<p>SQL > show warnings;<br />
+---------+------+----------------------------------------------------------------+<br />
&#124; Level &#124; Code &#124; Message &#124;<br />
+---------+------+----------------------------------------------------------------+<br />
&#124; Warning &#124; 1292 &#124; Truncated incorrect innodb_buffer_pool_size value: \'268435456\' &#124;<br />
+---------+------+----------------------------------------------------------------+</p>
<p>SQL > SHOW GLOBAL VARIABLES LIKE \'innodb_buffer_pool%size%\';<br />
+----------------------------------+-----------+<br />
&#124; Variable_name &#124; Value &#124;<br />
+----------------------------------+-----------+<br />
&#124; innodb_buffer_pool_chunk_size &#124; 0 &#124;<br />
&#124; innodb_buffer_pool_size &#124; 134217728 &#124;<br />
&#124; innodb_buffer_pool_size_auto_min &#124; 134217728 &#124;<br />
&#124; innodb_buffer_pool_size_max &#124; 134217728 &#124;<br />
+----------------------------------+-----------+<br />
So nothing at all! Not even an error, just a warning. And if I do not look closely, I do not even realise that something did not work.<br />
The MariaDB error log says:<br />
[Note] InnoDB: Memory pressure event disregarded; innodb_buffer_pool_size=128m, innodb_buffer_pool_size_auto_min=128m<br />
If you then do some searching, you find out that something has changed here: Buffer Pool Changes and try intuitively:<br />
SQL > SET GLOBAL innodb_buffer_pool_size_max = 256*1024*1024;<br />
ERROR 1238 (HY000): Variable \'innodb_buffer_pool_size_max\' is a read only variable<br />
But that does not work either.<br />
This means you have to restart the database! And possibly at the very moment when you do not actually want to restart the database and need this feature…<br />
The documentation also states ((source):</p>
<p>Default Value: specified by the initial value of innodb_buffer_pool_size, rounded up to the block size of that variable. See the section about buffer pool changes in MariaDB 10.11.12, 11.4.6, and 11.8.2.</p>
<p>and (source):</p>
<p>If innodb_buffer_pool_size_max is 0 or not specified, it defaults to the innodb_buffer_pool_size value.</p>
<p>This means that I have to think again beforehand about how big I should make innodb_buffer_pool_size_max and can only correct it afterwards during operation, should I have forgotten or misjudged it.<br />
In my opinion, this is a complete step backwards from an operational point of view. This is probably another implementation for some cloud-only as a service solution (enterprise?).<br />
My suggestion is: Either, as suggested in the MDEV: 0 should switch off this feature and the behaviour should be as before or the default value should be set to 75% of the RAM size, as innodb_dedicated_server does with MySQL.<br />
I had the audacity to open a bug here: New InnoDB Buffer Pool autosize feature not so optimal implemented.<br />
FedericoR has thankfully recommended the following link: Issues with new buffer pool configuration in MariaDB Minors (10.11.12/13/14, 11.4.6/7/8, 11.8.2/3). I do not seem to be the only one who was annoyed by this change…<br />
How does PostgreSQL do this?<br />
PostgreSQL is currently not (yet) able to change shared_buffers dynamically. The default is usually 128M. The rule of thumb here, similar to MyISAM, is 25 - 40% of RAM. The lack of this feature is probably not as serious with PostgreSQL, however, as PostgreSQL relies heavily on the file system cache, similar to MyISAM.<br />
Source: Resource Consumption<br />
This page was translated using deepl.com.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/mariadb-dynamically-configurable-buffer-pool-broken/">MariaDB has broken the concept of dynamically configurable buffer pools!</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<h2 id="problem-description">Problem description<a class="anchor-link" id="problem-description"></a></h2>
<p>MySQL introduced the dynamically configurable InnoDB buffer pool with 5.7.5 in September 2014 (<a href="https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-5.html" target="_blank" rel="noopener">here</a> and <a href="https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool-resize.html#innodb-buffer-pool-online-resize" target="_blank" rel="noopener">here</a>):</p>
<blockquote>
<p>The innodb_buffer_pool_size configuration option can be set dynamically using a SET statement, allowing you to resize the buffer pool without restarting the server. For example:</p>
<p>mysql&gt; SET GLOBAL innodb_buffer_pool_size=402653184;</p>
</blockquote>
<p>MariaDB 10.2.2 adopted this feature in September 2016 (<a href="https://mariadb.com/docs/release-notes/community-server/old-releases/10.2/10.2.2#notable-changes" target="_blank" rel="noopener">source</a>):</p>
<blockquote>
<p>InnoDB was merged from MySQL-5.7.14 (XtraDB is disabled in MariaDB-10.2.2 pending a similar merge)</p>
</blockquote>
<p>The problem is, on the one hand, that this feature no longer works as it did before and no longer behaves as expected. On the other hand, the behaviour was changed in spring 2025 within a major release series (LTS), which in my opinion is an absolute no-go (<a href="https://mariadb.com/docs/server/server-usage/storage-engines/innodb/innodb-buffer-pool#buffer-pool-changes" target="_blank" rel="noopener">source</a>):</p>
<blockquote>
<p>From MariaDB 10.11.12 / 11.4.6 / 11.8.2, there are significant changes to the InnoDB buffer pool behavior.</p>
</blockquote>
<p>And what is more, the description of this is quite poor (<a href="https://mariadb.com/docs/release-notes/community-server/10.11/10.11.12#innodb" target="_blank" rel="noopener">source</a>):</p>
<blockquote>
<ul>
<li>decreasing innodb_buffer_pool_size at runtime does not release memory (MDEV-32339)</li>
<li>reorganise innodb buffer pool (and remove buffer pool chunks) (MDEV-29445)</li>
<li>The Linux memory pressure interface, which could previously not be disabled and could cause performance anomalies, was rewritten and is disabled by default. (MDEV-34863)</li>
<li>Server crashes when resizing default innodb buffer pool after setting innodb-buffer-pool-chunk-size to 1M (MDEV-34677)</li>
</ul>
</blockquote>
<p>In the corresponding worklog (<a href="https://jira.mariadb.org/browse/MDEV-36197" target="_blank" rel="noopener">MDEV-36197</a>) MarkoM also describes a different behaviour:</p>
<blockquote>
<p>innodb_buffer_pool_size_auto_max (my proposal for this task) would set the maximum for the automation (default: 0 to disable the logic).</p>
</blockquote>
<p>which would have made much more sense in my opinion.</p>
<h2 id="how-did-it-work-before">How did it work before?<a class="anchor-link" id="how-did-it-work-before"></a></h2>
<p>This is how it worked in the past with MariaDB, and how it still works today with MySQL:</p>
<pre><code>SQL&gt; SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%size';
+-------------------------------+-----------+
| Variable_name                 | Value     |
+-------------------------------+-----------+
| innodb_buffer_pool_chunk_size | 2097152   |
| innodb_buffer_pool_size       | 134217728 |
+-------------------------------+-----------+

SQL&gt; SET GLOBAL innodb_buffer_pool_size = @@innodb_buffer_pool_chunk_size * 128;

SQL&gt; SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%size';
+-------------------------------+-----------+
| Variable_name                 | Value     |
+-------------------------------+-----------+
| innodb_buffer_pool_chunk_size | 2097152   |
| innodb_buffer_pool_size       | 268435456 |
+-------------------------------+-----------+
</code></pre>
<p>So everything is OK: it works as expected and as usual.</p>
<h2 id="what-does-mariadb-do-today">What does MariaDB do today?<a class="anchor-link" id="what-does-mariadb-do-today"></a></h2>
<p>What happens today with MariaDB:</p>
<pre><code>SQL&gt; SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%size%';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| innodb_buffer_pool_chunk_size    | 0         |
| innodb_buffer_pool_size          | 134217728 |
| innodb_buffer_pool_size_auto_min | 134217728 |
| innodb_buffer_pool_size_max      | 134217728 |
+----------------------------------+-----------+

SQL&gt; SET GLOBAL innodb_buffer_pool_size = 256*1024*1024;
Query OK, 0 rows affected, 1 warning (0.000 sec)

SQL&gt; show warnings;
+---------+------+-----------------------------------------------------------------+
| Level   | Code | Message                                                         |
+---------+------+-----------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '268435456' |
+---------+------+-----------------------------------------------------------------+

SQL&gt; SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%size%';
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| innodb_buffer_pool_chunk_size    | 0         |
| innodb_buffer_pool_size          | 134217728 |
| innodb_buffer_pool_size_auto_min | 134217728 |
| innodb_buffer_pool_size_max      | 134217728 |
+----------------------------------+-----------+
</code></pre>
<p>So nothing at all! Not even an error, just a warning. And if I do not look closely, I do not even realise that something did not work.</p>
<p>The MariaDB error log says:</p>
<pre><code>[Note] InnoDB: Memory pressure event disregarded; innodb_buffer_pool_size=128m, innodb_buffer_pool_size_auto_min=128m
</code></pre>
<p>If you then do some searching, you find out that something has changed here (<a href="https://mariadb.com/docs/server/server-usage/storage-engines/innodb/innodb-buffer-pool#buffer-pool-changes" target="_blank" rel="noopener">Buffer Pool Changes</a>) and intuitively try:</p>
<pre><code>SQL&gt; SET GLOBAL innodb_buffer_pool_size_max = 256*1024*1024;
ERROR 1238 (HY000): Variable 'innodb_buffer_pool_size_max' is a read only variable
</code></pre>
<p>But that does not work either.</p>
<p>This means you have to restart the database! And possibly at the very moment when you do not actually want to restart the database but need this feature&hellip;</p>
<p>The documentation also states (<a href="https://mariadb.com/docs/server/server-usage/storage-engines/innodb/innodb-system-variables#innodb_buffer_pool_size_max" target="_blank" title="innodb_buffer_pool_size_max" rel="noopener">source</a>):</p>
<blockquote>
<p>Default Value: specified by the initial value of innodb_buffer_pool_size, rounded up to the block size of that variable. See the section about buffer pool changes in MariaDB 10.11.12, 11.4.6, and 11.8.2.</p>
</blockquote>
<p>and (<a href="https://mariadb.com/docs/server/server-usage/storage-engines/innodb/innodb-buffer-pool#buffer-pool-changes" target="_blank" title="Buffer Pool Changes" rel="noopener">source</a>):</p>
<blockquote>
<p>If innodb_buffer_pool_size_max is 0 or not specified, it defaults to the innodb_buffer_pool_size value.</p>
</blockquote>
<p>This means that I have to think beforehand about how big I should make <code>innodb_buffer_pool_size_max</code>, and I cannot correct it afterwards during operation, should I have forgotten it or misjudged the value.</p>
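<p>In practice this means reserving headroom at startup, e.g. by specifying <code>innodb_buffer_pool_size_max</code> (read-only at runtime) in the configuration file or on the command line, and then resizing online within that limit. A sketch, where the 8G limit is an arbitrary example value:</p>
<pre><code>-- Server started with e.g.: mariadbd --innodb-buffer-pool-size=128M --innodb-buffer-pool-size-max=8G
SQL&gt; SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size_max';
SQL&gt; SET GLOBAL innodb_buffer_pool_size = 256*1024*1024;
-- This should now succeed, as long as the new value stays at or below innodb_buffer_pool_size_max.
</code></pre>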
<p>In my opinion, this is a complete step backwards from an operational point of view. This is probably yet another implementation aimed at some cloud-only as-a-service (enterprise?) solution.</p>
<p>My suggestion: either, as proposed in the MDEV, a value of 0 should switch this feature off and the behaviour should remain as before, or the default value should be set to 75% of the RAM size, as <code>innodb_dedicated_server</code> does in MySQL.</p>
<p>I had the audacity to open a bug here: <a href="https://jira.mariadb.org/browse/MDEV-38779" target="_blank" rel="noopener">New InnoDB Buffer Pool autosize feature not so optimal implemented</a>.</p>
<p>FedericoR has thankfully recommended the following link: <a href="https://www.mail-archive.com/developers@lists.mariadb.org/msg00822.html" target="_blank" title="MariaDB developers mailing list" rel="noopener">Issues with new buffer pool configuration in MariaDB Minors (10.11.12/13/14, 11.4.6/7/8, 11.8.2/3)</a>. I do not seem to be the only one who was annoyed by this change&hellip;</p>
<h2 id="how-does-postgresql-do-this">How does PostgreSQL do this?<a class="anchor-link" id="how-does-postgresql-do-this"></a></h2>
<p>PostgreSQL is currently not (yet) able to change <code>shared_buffers</code> dynamically. The default is usually 128M. The rule of thumb here, similar to MyISAM, is 25 &ndash; 40% of RAM. The lack of this feature is probably not as serious with PostgreSQL, however, as PostgreSQL relies heavily on the file system cache, similar to MyISAM.</p>
<p>Source: <a href="https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY" target="_blank" rel="noopener">Resource Consumption</a></p>
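<p>For comparison, changing <code>shared_buffers</code> in PostgreSQL always requires a restart; roughly like this (a sketch, the 4GB value is an arbitrary example):</p>
<pre><code>postgres=# SHOW shared_buffers;
postgres=# ALTER SYSTEM SET shared_buffers = '4GB';  -- written to postgresql.auto.conf
-- A configuration reload is not enough; the server has to be restarted afterwards,
-- e.g. with: pg_ctl restart -D $PGDATA
</code></pre>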
<p>This page was translated using <a href="https://www.deepl.com/en/translator" target="_blank" rel="noopener">deepl.com</a>.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/mariadb-dynamically-configurable-buffer-pool-broken/">MariaDB has broken the concept of dynamically configurable buffer pools!</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Pre-FOSDEM &#038; FOSDEM 2026, Community, Databases, and Open Source</title>
      <link>https://percona.community/blog/2026/02/09/pre-fosdem-fosdem-2026-community-databases-and-open-source/</link>
      <pubDate>Mon, 09 Feb 2026 10:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB on Percona Community</source>
      <description><![CDATA[<p>This is a recap of Percona at preFosdem and Fosdem!</p>
<p>Before FOSDEM officially started, the database community gathered for MySQL Belgium Days (Pre-FOSDEM), a two-day event bringing together MySQL developers, DBAs, engineers, tool builders, and open-source enthusiasts. It was an excellent space for deep technical discussions, knowledge sharing, and reconnecting with the community, hosted by the amazing Frederic Descamps.<br />
The event featured strong participation from Percona and the wider MySQL ecosystem, with talks led by Peter Zaitsev, Marco Tusa, Fernando Laudares Camargos, Arunjith Aravindan, Vinicius Grippa, Pep Pla, and Yura Sorokin.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/09/pre-fosdem-fosdem-2026-community-databases-and-open-source/">Pre-FOSDEM &amp; FOSDEM 2026, Community, Databases, and Open Source</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>This is a recap of Percona at preFosdem and Fosdem!</p>
<figure>
<p> <img decoding="async" src="https://percona.community/blog/2026/02/fosdem-all.png" alt="Fosdem intro"></p></figure>

<p>Before FOSDEM officially started, the database community gathered for MySQL Belgium Days (Pre-FOSDEM), a two-day event bringing together MySQL developers, DBAs, engineers, tool builders, and open-source enthusiasts. It was an excellent space for deep technical discussions, knowledge sharing, and reconnecting with the community, hosted by the amazing <strong>Frederic Descamps</strong>.<br>
The event featured strong participation from <strong>Percona</strong> and the wider MySQL ecosystem, with talks led by <strong>Peter Zaitsev, Marco Tusa, Fernando Laudares Camargos, Arunjith Aravindan, Vinicius Grippa, Pep Pla, and Yura Sorokin</strong>.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/09/pre-fosdem-fosdem-2026-community-databases-and-open-source/">Pre-FOSDEM &amp; FOSDEM 2026, Community, Databases, and Open Source</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>The Real Operational Cost of Vacuuming in PostgreSQL</title>
      <link>https://mariadb.org/the-real-operational-cost-of-vacuuming-in-postgresql/</link>
      <pubDate>Sun, 08 Feb 2026 23:50:07 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>There was a time when PostgreSQL’s own developers were far more open about the real cost of their MVCC design. Back in the 8.1 era, the documentation spelled out the resource drain, the vacuum overhead, and the “cold comfort” …<br />
Continue reading \"The Real Operational Cost of Vacuuming in PostgreSQL\"<br />
The post The Real Operational Cost of Vacuuming in PostgreSQL appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-real-operational-cost-of-vacuuming-in-postgresql/">The Real Operational Cost of Vacuuming in PostgreSQL</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>There was a time when PostgreSQL&rsquo;s own developers were far more open about the real cost of their MVCC design. Back in the 8.1 era, the documentation spelled out the resource drain, the vacuum overhead, and the &ldquo;cold comfort&rdquo; &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/the-real-operational-cost-of-vacuuming-in-postgresql/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;The Real Operational Cost of Vacuuming in PostgreSQL&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/the-real-operational-cost-of-vacuuming-in-postgresql/">The Real Operational Cost of Vacuuming in PostgreSQL</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>Wireshark now can decode MySQL X Protocol</title>
      <link>https://databaseblog.myname.nl/2026/02/wireshark-now-can-decode-mysql-x.html</link>
      <pubDate>Sun, 08 Feb 2026 17:40:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://databaseblog.myname.nl/">Daniël's Database Blog</source>
      <description><![CDATA[<p>The new protocol dissector for X Protocol in MySQL was just merged to the master branch in Wireshark. To get it build Wireshark from the master branch or wait for the next release.This protocol is using Google Protobuf, which makes it much easier to work with than the regular MySQL protocol.See also: https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysqlx_protocol.html If you like what Wireshark does, consider donating on https://wiresharkfoundation.org/donate/   </p>
<p>The post <a rel="nofollow" href="https://databaseblog.myname.nl/2026/02/wireshark-now-can-decode-mysql-x.html">Wireshark now can decode MySQL X Protocol</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The new protocol dissector for X Protocol in MySQL was just merged to the master branch in Wireshark. To get it build Wireshark from the master branch or wait for the next release.</p>
<p>This protocol is using Google Protobuf, which makes it much easier to work with than the regular MySQL protocol.</p>
<p>See also: <a href="https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysqlx_protocol.html" target="_blank" rel="noopener">https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysqlx_protocol.html</a>&nbsp;</p>
<p>If you like what Wireshark does, consider donating on <a href="https://wiresharkfoundation.org/donate/" target="_blank" rel="noopener">https://wiresharkfoundation.org/donate/&nbsp;</a></p>
<p>&nbsp;</p>
<p>&nbsp;<img decoding="async" loading="lazy" alt="" height="499" src="https://blogger.googleusercontent.com/img/a/AVvXsEh6-KGp7Fn5XAAplfFATldBEeh6xJo01teGG5MqeB74FzBBgsWAsZDhg_J0oLZeZP64nYyTEbbw4W0cWyiUBrw4X07rrETwAdCQqSsopYGZYn2TjZBcC5FKxu59S6jbW2R-X7ft649YqVLJvLg4HV_-UnqgvluhtXkf2-4efjPtB1B6mhsUF1_3vcEX_Lah=w640-h499" width="640"></p>

<p>The post <a rel="nofollow" href="https://databaseblog.myname.nl/2026/02/wireshark-now-can-decode-mysql-x.html">Wireshark now can decode MySQL X Protocol</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>How much space does NULL need?</title>
      <link>https://www.fromdual.com/blog/how-much-space-does-null-need/</link>
      <pubDate>Sun, 08 Feb 2026 15:15:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>The last time I consulted a customer, he came up to me beaming with joy and said that he had taken my advice and changed all the primary key columns from BIGINT (8 bytes) to INT (4 bytes) and that had made a big difference! His MySQL 8.4 database is now 750 Gbyte smaller (from 5.5 Tbyte). Nice!<br />
And yes, I know that contradicts the recommendations of some of my PostgreSQL colleagues (here and here). In the MySQL world, more emphasis is placed on such things (source):</p>
<p>Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables</p>
<p>Also, InnoDB works a wee bit differently (index clustered table and primary key in all secondary keys) than PostgreSQL (heap table, indices with row pointer (ctid)).<br />
But that’s not really the issue. Immediately afterwards, he asked me whether the deletion of columns of type DOUBLE (8 bytes, in PostgreSQL-speak DOUBLE PRECISION) would also save space or whether he should rather drop the columns straight away. My first reflex response to DOUBLE was: NULL is good, followed by OPTIMIZE TABLE (VACUUM FULL in PostgreSQL parlance). But the second thought was, DOUBLE is a data type of fixed length, does NULL also apply there or only for data types with variable length? Caution is the mother of the porcelain box! Love to consult the manual first…<br />
And there it says (source):</p>
<p>Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.</p>
<p>and (source):</p>
<p>The variable-length part of the record header contains a bit vector for indicating NULL columns. … Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.</p>
<p>Experiment with MariaDB/MySQL<br />
Test setup<br />
Somehow the description is a bit too complicated for me. Perhaps a small sketch would help? So let’s give it a try:<br />
SQL > -- DROP TABLE IF EXISTS tracking;</p>
<p>SQL > CREATE TABLE tracking (<br />
 id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT<br />
, d0 DOUBLE, d1 DOUBLE, d2 DOUBLE, d3 DOUBLE, d4 DOUBLE<br />
, d5 DOUBLE, d6 DOUBLE, d7 DOUBLE, d8 DOUBLE, d9 DOUBLE<br />
);</p>
<p>SQL > INSERT INTO tracking SELECT NULL, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0;<br />
SQL > INSERT INTO tracking SELECT NULL, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 FROM tracking;<br />
... bis 16 M rows<br />
The table is approx. 1.8 Gbyte in size for both MariaDB and MySQL with 16 M rows. Since this information is only given very imprecisely in INFORMATION_SCHEMA, let’s take a look at the file system:<br />
MariaDB 11.8:<br />
SQL > system ls -l tracking.ibd<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:28 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1933574144 Feb 7 10:32 tracking.ibd<br />
MySQL 8.4:<br />
SQL > system ls -l tracking.ibd<br />
-rw-r----- 1 mysql mysql 1929379840 Feb 7 10:33 tracking.ibd<br />
Defragment the table<br />
Then we ‘defragment’ the table with the OPTIMIZE TABLE command:<br />
SQL > OPTIMIZE TABLE tracking;<br />
+---------------+----------+----------+-------------------------------------------------------------------+<br />
&#124; Table &#124; Op &#124; Msg_type &#124; Msg_text &#124;<br />
+---------------+----------+----------+-------------------------------------------------------------------+<br />
&#124; test.tracking &#124; optimize &#124; note &#124; Table does not support optimize, doing recreate + analyze instead &#124;<br />
&#124; test.tracking &#124; optimize &#124; status &#124; OK &#124;<br />
+---------------+----------+----------+-------------------------------------------------------------------+<br />
Attention: The table is copied once! It therefore needs twice the amount of disc space for a short time! This can be observed while the OPTIMIZE TABLE command is running:<br />
MariaDB:<br />
$ watch -d -n 1 \'ls -l trac* #*\'<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:39 \'#sql-alter-d57-8c.frm\'<br />
-rw-rw---- 1 mysql mysql 968884224 Feb 7 10:39 \'#sql-alter-d57-8c.ibd\'<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:28 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1933574144 Feb 7 10:32 tracking.ibd<br />
MySQL:<br />
$ watch -d -n 1 \'ls -l trac* #*\'<br />
-rw-r----- 1 mysql mysql 369098752 Feb 7 10:40 #sql-ib1594-4164062678.ibd<br />
-rw-r----- 1 mysql mysql 1929379840 Feb 7 10:33 tracking.ibd<br />
The result is amazing! With MariaDB, the table has remained somewhat the same size:<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:39 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 10:39 tracking.ibd<br />
With MySQL, on the other hand, the table has actually grown after the ‘defragmentation’, namely by approx. 14%:<br />
-rw-r----- 1 mysql mysql 2197815296 Feb 7 10:41 tracking.ibd<br />
If we execute the OPTIMIZE TABLE command again, the size remains constant for both MariaDB and MySQL:<br />
MariaDB:<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:46 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 10:48 tracking.ibd<br />
MySQL:<br />
-rw-r----- 1 mysql mysql 2197815296 Feb 7 10:48 tracking.ibd<br />
Attempt 1: NULL out<br />
Now we NULL out the values:<br />
SQL > UPDATE tracking<br />
SET d0 = NULL, d1 = NULL, d2 = NULL, d3 = NULL, d4 = NULL<br />
 , d5 = NULL, d6 = NULL, d7 = NULL, d8 = NULL, d9 = NULL<br />
;<br />
After this step, the sizes of the files have even grown slightly:<br />
MariaDB (+1.3%):<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:49 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1937768448 Feb 7 11:04 tracking.ibd<br />
MySQL (+0.2%):<br />
-rw-r----- 1 mysql mysql 2202009600 Feb 7 11:04 tracking.ibd<br />
We then defragment the table again with the OPTIMIZE TABLE command. The tables shrink as expected.<br />
MariaDB (to 23%):<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 11:09 tracking.frm<br />
-rw-rw---- 1 mysql mysql 448790528 Feb 7 11:10 tracking.ibd<br />
MySQL (to 24%):<br />
-rw-r----- 1 mysql mysql 520093696 Feb 7 11:10 tracking.ibd<br />
OPTIMIZE TABLE again does NOT change the file size any more…<br />
Attempt 2: Deleting the columns<br />
Now we try the whole thing again with the DROP COLUMN command. The starting position is again the same as described above:<br />
MariaDB:<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 11:15 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1933574144 Feb 7 11:18 tracking.ibd<br />
MySQL:<br />
-rw-r----- 1 mysql mysql 1929379840 Feb 7 11:19 tracking.ibd<br />
After the OPTIMIZE TABLE command, the values look similar to the first attempt:<br />
MariaDB:<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 11:20 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 11:21 tracking.ibd<br />
MySQL:<br />
-rw-r----- 1 mysql mysql 2197815296 Feb 7 11:21 tracking.ibd<br />
OPTIMIZE TABLE again also brings no further changes, as above:<br />
MariaDB:<br />
-rw-rw---- 1 mysql mysql 1206 Feb 7 11:22 tracking.frm<br />
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 11:23 tracking.ibd<br />
MySQL:<br />
-rw-r----- 1 mysql mysql 2197815296 Feb 7 11:24 tracking.ibd<br />
And now the actual second attempt with dropping the columns:<br />
SQL > ALTER TABLE tracking<br />
 DROP COLUMN d0, DROP COLUMN d1, DROP COLUMN d2, DROP COLUMN d3, DROP COLUMN d4<br />
, DROP COLUMN d5, DROP COLUMN d6, DROP COLUMN d7, DROP COLUMN d8, DROP COLUMN d9<br />
;<br />
The first thing we notice is that the command is INSTANTANEOUS, i.e. it does not make any changes to the data but only changes the metadata. On the one hand, this is good, as it minimises the impact on the application. On the other hand, it also means that no space is saved.<br />
So let’s get to grips with the whole thing again with the OPTIMIZE TABLE command:<br />
MariaDB (to 93%):<br />
-rw-rw---- 1 mysql mysql 925 Feb 7 11:28 tracking.frm<br />
-rw-rw---- 1 mysql mysql 415236096 Feb 7 11:29 tracking.ibd<br />
MySQL (to 92%):<br />
-rw-r----- 1 mysql mysql 478150656 Feb 7 11:28 tracking.ibd<br />
Conclusion<br />
Both, dropping the columns and the NULL out of columns save a significant amount of space. Dropping the columns saves about 7% more space than NULL them out. If it is possible from an application point of view, you should therefore drop columns that are no longer required, or if not possible, at least NULL them out.<br />
Experiment with PostgreSQL<br />
And now let’s take a look at the whole thing with PostgreSQL 19devel.<br />
Test setup<br />
The test setup is analogous to MariaDB/MySQL:<br />
postgres=# -- DROP TABLE IF EXISTS tracking;</p>
<p>postgres=# CREATE TABLE tracking (<br />
 id SERIAL PRIMARY KEY<br />
, d0 DOUBLE PRECISION, d1 DOUBLE PRECISION, d2 DOUBLE PRECISION, d3 DOUBLE PRECISION, d4 DOUBLE PRECISION<br />
, d5 DOUBLE PRECISION, d6 DOUBLE PRECISION, d7 DOUBLE PRECISION, d8 DOUBLE PRECISION, d9 DOUBLE PRECISION<br />
);</p>
<p>postgres=# timing</p>
<p>postgres=# INSERT INTO tracking (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)<br />
 SELECT 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0;<br />
postgres=# INSERT INTO tracking (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)<br />
 SELECT 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 FROM tracking;<br />
... bis 16 M rows<br />
Firstly, we want to know how big the table has actually become. PostgreSQL seems to know this information very precisely:<br />
postgres=# SELECT pg_relation_size(\'tracking\') AS tab_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\')) AS tab_siz_prtty<br />
 , pg_indexes_size(\'tracking\') AS idx_siz<br />
 , pg_size_pretty(pg_indexes_size(\'tracking\')) AS idx_siz_prtty<br />
 , pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\') AS tab_and_idx_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\')) AS tab_and_idx_siz_prtty<br />
 , pg_total_relation_size(\'tracking\') AS tot_rel_siz<br />
 , pg_size_pretty(pg_total_relation_size(\'tracking\')) AS tot_rel_siz_prtty<br />
;<br />
 tab_siz &#124; tab_siz_prtty &#124; idx_siz &#124; idx_siz_prtty &#124; tab_and_idx_siz &#124; tab_and_idx_siz_prtty &#124; tot_rel_siz &#124; tot_rel_siz_prtty<br />
------------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------<br />
 1963417600 &#124; 1872 MB &#124; 376856576 &#124; 359 MB &#124; 2340274176 &#124; 2232 MB &#124; 2340798464 &#124; 2232 MB<br />
Then we want to know where these files can be found in the file system:<br />
postgres=# SELECT oid AS db_oid FROM pg_database WHERE datname = current_database();<br />
 db_oid<br />
--------<br />
 5</p>
<p>postgres=# SELECT oid AS table_oid, relname, relnamespace, relfilenode<br />
 FROM pg_class WHERE relname = \'tracking\';<br />
 table_oid &#124; relname &#124; relnamespace &#124; relfilenode<br />
-----------+----------+--------------+-------------<br />
 40965 &#124; tracking &#124; 2200 &#124; 40965</p>
<p>postgres=# SELECT i.indexrelid::regclass as index_name, i.indexrelid as index_oid<br />
 FROM pg_index i<br />
 JOIN pg_class c ON i.indrelid = c.oid<br />
 WHERE c.relname = \'tracking\';<br />
 index_name &#124; index_oid<br />
---------------+-----------<br />
 tracking_pkey &#124; 40970</p>
<p>postgres=# SELECT pg_relation_filepath(\'tracking\');<br />
 pg_relation_filepath<br />
----------------------<br />
 base/5/40965<br />
Table and index size in the file system:<br />
$ ls -ltr 40965* 40970*<br />
-rw------- 1 mysql mysql 40960 Feb 7 18:33 40965_vm<br />
-rw------- 1 mysql mysql 499712 Feb 7 18:33 40965_fsm<br />
-rw------- 1 mysql mysql 889675776 Feb 7 18:34 40965.1<br />
-rw------- 1 mysql mysql 1073741824 Feb 7 18:34 40965<br />
-rw------- 1 mysql mysql 376856576 Feb 7 18:35 40970</p>
<p>*_fsm means “free space map”<br />
*_vm means “visibility map”<br />
*.1 means 2nd segment of the object (table or index)</p>
<p>PostgreSQL seems to work with segments of 1 Gbyte by default and, unlike MariaDB/MySQL (INFORMATION_SCHEMA), knows exactly how large its files are. And the discrepancy from above (between tot_rel_siz and tab_and_idx_siz) can be explained by the fsm and vm files.<br />
The PostgreSQL equivalent of the MariaDB/MySQL OPTIMIZE TABLE is the VACUUM FULL command:<br />
postgres=# VACUUM FULL tracking;</p>
<p>$ ls -ltr<br />
-rw------- 1 mysql mysql 40960 Feb 7 18:39 40965_vm<br />
-rw------- 1 mysql mysql 499712 Feb 7 18:39 40965_fsm<br />
-rw------- 1 mysql mysql 1073741824 Feb 7 18:39 40965<br />
-rw------- 1 mysql mysql 889675776 Feb 7 18:39 40965.1<br />
-rw------- 1 mysql mysql 1073741824 Feb 7 18:39 40972<br />
-rw------- 1 mysql mysql 889675776 Feb 7 18:39 40972.1<br />
-rw------- 1 mysql mysql 0 Feb 7 18:39 40975</p>
<p>...</p>
<p>-rw------- 1 mysql mysql 49152 Feb 7 18:39 2704<br />
-rw------- 1 mysql mysql 32768 Feb 7 18:39 2703<br />
-rw------- 1 mysql mysql 32768 Feb 7 18:39 2696<br />
-rw------- 1 mysql mysql 65536 Feb 7 18:39 2674<br />
-rw------- 1 mysql mysql 81920 Feb 7 18:39 2673<br />
-rw------- 1 mysql mysql 98304 Feb 7 18:39 2659<br />
-rw------- 1 mysql mysql 139264 Feb 7 18:39 2658<br />
-rw------- 1 mysql mysql 24576 Feb 7 18:39 2619_fsm<br />
-rw------- 1 mysql mysql 163840 Feb 7 18:39 2619<br />
-rw------- 1 mysql mysql 106496 Feb 7 18:39 2608<br />
-rw------- 1 mysql mysql 491520 Feb 7 18:39 1249<br />
-rw------- 1 mysql mysql 122880 Feb 7 18:39 1247<br />
-rw------- 1 mysql mysql 32768 Feb 7 18:39 2662<br />
-rw------- 1 mysql mysql 114688 Feb 7 18:39 1259<br />
-rw------- 1 mysql mysql 16384 Feb 7 18:39 3455<br />
-rw------- 1 mysql mysql 49152 Feb 7 18:39 2663<br />
-rw------- 1 mysql mysql 1073741824 Feb 7 18:39 40972<br />
-rw------- 1 mysql mysql 889675776 Feb 7 18:39 40972.1<br />
-rw------- 1 mysql mysql 376864768 Feb 7 18:39 40975<br />
-rw------- 1 mysql mysql 0 Feb 7 18:39 40965<br />
-rw------- 1 mysql mysql 0 Feb 7 18:39 40970<br />
The first thing you notice is that PostgreSQL touches quite a few files and the ‘free space map’ file has disappeared. In contrast to MariaDB/MySQL, the table segments have remained the same size. You can also see that the old table has ‘disappeared’ (40965, 40970) and a new one has been created (40972 and 40975). The VACUUM FULL command in PostgreSQL also creates a copy of the data, as in MariaDB/MySQL.<br />
postgres=# SELECT pg_relation_size(\'tracking\') AS tab_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\')) AS tab_siz_prtty<br />
 , pg_indexes_size(\'tracking\') AS idx_siz<br />
 , pg_size_pretty(pg_indexes_size(\'tracking\')) AS idx_siz_prtty<br />
 , pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\') AS tab_and_idx_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\')) AS tab_and_idx_siz_prtty<br />
 , pg_total_relation_size(\'tracking\') AS tot_rel_siz<br />
 , pg_size_pretty(pg_total_relation_size(\'tracking\')) AS tot_rel_siz_prtty<br />
;<br />
 tab_siz &#124; tab_siz_prtty &#124; idx_siz &#124; idx_siz_prtty &#124; tab_and_idx_siz &#124; tab_and_idx_siz_prtty &#124; tot_rel_siz &#124; tot_rel_siz_prtty<br />
------------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------<br />
 1963417600 &#124; 1872 MB &#124; 376864768 &#124; 359 MB &#124; 2340282368 &#124; 2232 MB &#124; 2340282368 &#124; 2232 MB<br />
The following query helps to understand which other files/objects have been created:<br />
postgres=# SELECT c.oid, c.relname, ns.nspname<br />
FROM pg_class AS c<br />
JOIN pg_namespace AS ns ON ns.oid = c.relnamespace<br />
WHERE c.oid IN (2704, 2703, 2696, 2674, 2673, 2659, 2658, 2619, 2608, 1249, 1247, 40972, 2662, 1259, 3455, 2663, 40975, 40965, 40970)<br />
;<br />
 oid &#124; relname &#124; nspname<br />
-------+-----------------------------------+------------<br />
 40965 &#124; tracking &#124; public<br />
 40970 &#124; tracking_pkey &#124; public<br />
 2619 &#124; pg_statistic &#124; pg_catalog<br />
 1247 &#124; pg_type &#124; pg_catalog<br />
 2703 &#124; pg_type_oid_index &#124; pg_catalog<br />
 2704 &#124; pg_type_typname_nsp_index &#124; pg_catalog<br />
 2658 &#124; pg_attribute_relid_attnam_index &#124; pg_catalog<br />
 2659 &#124; pg_attribute_relid_attnum_index &#124; pg_catalog<br />
 2662 &#124; pg_class_oid_index &#124; pg_catalog<br />
 2663 &#124; pg_class_relname_nsp_index &#124; pg_catalog<br />
 3455 &#124; pg_class_tblspc_relfilenode_index &#124; pg_catalog<br />
 2696 &#124; pg_statistic_relid_att_inh_index &#124; pg_catalog<br />
 2673 &#124; pg_depend_depender_index &#124; pg_catalog<br />
 2674 &#124; pg_depend_reference_index &#124; pg_catalog<br />
 1249 &#124; pg_attribute &#124; pg_catalog<br />
 1259 &#124; pg_class &#124; pg_catalog<br />
 2608 &#124; pg_depend &#124; pg_catalog</p>
<p>postgres=# SELECT i.indexrelid::regclass as index_name, i.indexrelid as index_oid, ns.nspname<br />
 FROM pg_index i<br />
 JOIN pg_class c ON i.indrelid = c.oid<br />
 JOIN pg_namespace AS ns ON ns.oid = c.relnamespace<br />
 WHERE c.oid IN (2704, 2703, 2696, 2674, 2673, 2659, 2658, 2619, 2608, 1249, 1247, 40972, 2662, 1259, 3455, 2663, 40975, 40965, 40970)<br />
;<br />
 index_name &#124; index_oid &#124; nspname<br />
-----------------------------------+-----------+------------<br />
 pg_type_typname_nsp_index &#124; 2704 &#124; pg_catalog<br />
 pg_attribute_relid_attnam_index &#124; 2658 &#124; pg_catalog<br />
 tracking_pkey &#124; 40970 &#124; public<br />
 pg_class_relname_nsp_index &#124; 2663 &#124; pg_catalog<br />
 pg_class_tblspc_relfilenode_index &#124; 3455 &#124; pg_catalog<br />
 pg_type_oid_index &#124; 2703 &#124; pg_catalog<br />
 pg_attribute_relid_attnum_index &#124; 2659 &#124; pg_catalog<br />
 pg_statistic_relid_att_inh_index &#124; 2696 &#124; pg_catalog<br />
 pg_depend_depender_index &#124; 2673 &#124; pg_catalog<br />
 pg_depend_reference_index &#124; 2674 &#124; pg_catalog<br />
 pg_class_oid_index &#124; 2662 &#124; pg_catalog<br />
Attempt 1: NULL out<br />
Then we also NULL the columns in PostgreSQL. From here on, we save the view of the file system, as PostgreSQL seems to know the file sizes exactly, as we have seen above:<br />
postgres=# UPDATE tracking<br />
SET d0 = NULL, d1 = NULL, d2 = NULL, d3 = NULL, d4 = NULL<br />
 , d5 = NULL, d6 = NULL, d7 = NULL, d8 = NULL, d9 = NULL<br />
;</p>
<p>postgres=# SELECT pg_relation_size(\'tracking\') AS tab_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\')) AS tab_siz_prtty<br />
 , pg_indexes_size(\'tracking\') AS idx_siz<br />
 , pg_size_pretty(pg_indexes_size(\'tracking\')) AS idx_siz_prtty<br />
 , pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\') AS tab_and_idx_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\')) AS tab_and_idx_siz_prtty<br />
 , pg_total_relation_size(\'tracking\') AS tot_rel_siz<br />
 , pg_size_pretty(pg_total_relation_size(\'tracking\')) AS tot_rel_siz_prtty<br />
;<br />
 tab_siz &#124; tab_siz_prtty &#124; idx_siz &#124; idx_siz_prtty &#124; tab_and_idx_siz &#124; tab_and_idx_siz_prtty &#124; tot_rel_siz &#124; tot_rel_siz_prtty<br />
------------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------<br />
 2695716864 &#124; 2571 MB &#124; 753696768 &#124; 719 MB &#124; 3449413632 &#124; 3290 MB &#124; 3450101760 &#124; 3290 MB<br />
Here we see that the table segments grow massively (+37%), which is called ‘bloat’ in PostgreSQL terminology. The MVCC implementation of PostgreSQL stores both the old and never new version of the row ‘in-place’ directly in the table, in contrast to MariaDB/MySQL which stores the old version in UNDO space and the new row ‘in-place’. The index file also increases significantly (+100%). We need to do more research to find out why this is the case. In addition, a ‘free space map’ is created again (difference between tot_rel_siz and tab_and_idx_siz).<br />
A subsequent VACUUM FULL reduces the table (to 28%) and the index (to 50%) again in relation to the previous size:<br />
postgres=# VACUUM FULL tracking;</p>
<p>postgres=# SELECT pg_relation_size(\'tracking\') AS tab_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\')) AS tab_siz_prtty<br />
 , pg_indexes_size(\'tracking\') AS idx_siz<br />
 , pg_size_pretty(pg_indexes_size(\'tracking\')) AS idx_siz_prtty<br />
 , pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\') AS tab_and_idx_siz<br />
 , pg_size_pretty(pg_relation_size(\'tracking\') + pg_indexes_size(\'tracking\')) AS tab_and_idx_siz_prtty<br />
 , pg_total_relation_size(\'tracking\') AS tot_rel_siz<br />
 , pg_size_pretty(pg_total_relation_size(\'tracking\')) AS tot_rel_siz_prtty<br />
;<br />
 tab_siz &#124; tab_siz_prtty &#124; idx_siz &#124; idx_siz_prtty &#124; tab_and_idx_siz &#124; tab_and_idx_siz_prtty &#124; tot_rel_siz &#124; tot_rel_siz_prtty<br />
-----------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------<br />
 742916096 &#124; 709 MB &#124; 376864768 &#124; 359 MB &#124; 1119780864 &#124; 1068 MB &#124; 1119780864 &#124; 1068 MB<br />
and also in relation to the original size, the table (to 38%) and the index (to 100%) become smaller again. Why the index has remained the same size and only the table has shrunk remains to be investigated…<br />
Experiment 2: Deleting the columns<br />
The columns are then dropped with DROP COLUMN.<br />
postgres=# ALTER TABLE tracking<br />
 DROP COLUMN d0, DROP COLUMN d1, DROP COLUMN d2, DROP COLUMN d3, DROP COLUMN d4<br />
, DROP COLUMN d5, DROP COLUMN d6, DROP COLUMN d7, DROP COLUMN d8, DROP COLUMN d9<br />
;<br />
As the response was immediate, it can be assumed that this operation is also instantaneous. Unfortunately, I couldn’t find anything about this in the PostgreSQL documentation.<br />
Nothing has changed significantly in terms of size, which is actually to be expected with an instant operation. However, the fact that the size did not change after the VACUUM FULL command was a little surprising:<br />
 tab_siz &#124; tab_siz_prtty &#124; idx_siz &#124; idx_siz_prtty &#124; tab_and_idx_siz &#124; tab_and_idx_siz_prtty &#124; tot_rel_siz &#124; tot_rel_siz_prtty<br />
-----------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------<br />
 742916096 &#124; 709 MB &#124; 376864768 &#124; 359 MB &#124; 1119780864 &#124; 1068 MB &#124; 1120010240 &#124; 1068 MB</p>
<p>postgres=# VACUUM FULL tracking;</p>
<p> tab_siz &#124; tab_siz_prtty &#124; idx_siz &#124; idx_siz_prtty &#124; tab_and_idx_siz &#124; tab_and_idx_siz_prtty &#124; tot_rel_siz &#124; tot_rel_siz_prtty<br />
-----------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------<br />
 742916096 &#124; 709 MB &#124; 376864768 &#124; 359 MB &#124; 1119780864 &#124; 1068 MB &#124; 1119780864 &#124; 1068 MB<br />
Remarks<br />
Locking in PostgreSQL works as follows:</p>
<p>VACUUM Concurrent DML commands are possible similar to the MariaDB/MySQL OPTIMIZE TABLE command. However, the result is not quite the same.<br />
VACUUM FULL causes an ACCESS EXCLUSIVE lock. Similar to the MariaDB/MySQL 5.5 and older OPTIMIZE TABLE command. DML and SELECT commands are NOT permitted.</p>
<p>Sources</p>
<p>How to Get Sizes of Database Objects in PostgreSQL<br />
Database File Layout<br />
System Administration Functions<br />
CLUSTER<br />
VACUUM<br />
Explicit Locking</p>
<p>Additional attempts</p>
<p>Instead of 0.0, NULL was filled into the columns d0 - d9. The table remained small (tot_rel_siz_prtty = 1068 MB). It is therefore also worth saving NULL instead of dummy values with PostgreSQL.<br />
The columns d0 - d9 were created with DOUBLE PRECISION NOT NULL and the values 0.0 were filled. No effect: The table remained large (tot_rel_siz_prtty = 2232 MB).</p>
<p>This page was translated using deepl.com.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/how-much-space-does-null-need/">How much space does NULL need?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The last time I consulted a customer, he came up to me beaming with joy and said that he had taken my advice and changed all the primary key columns from <code>BIGINT</code> (8 bytes) to <code>INT</code> (4 bytes) and that had made a big difference! His MySQL 8.4 database is now 750 Gbyte smaller (from 5.5 Tbyte). Nice!</p>
<p>And yes, I know that contradicts the recommendations of some of my PostgreSQL colleagues (<a href="https://www.crunchydata.com/blog/postgres-serials-should-be-bigint-and-how-to-migrate" target="_blank" rel="noopener">here</a> and <a href="https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/#should-i-use-integerserial-or-bigintbigserial-for-my-auto-generated-primary-key" target="_blank" rel="noopener">here</a>). In the MySQL world, more emphasis is placed on such things (<a href="https://dev.mysql.com/doc/refman/8.4/en/data-size.html" target="_blank" rel="noopener">source</a>):</p>
<blockquote>
<p>Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables</p>
</blockquote>
<p>Also, InnoDB works a wee bit differently (index-clustered table, with the primary key stored in every secondary index) than PostgreSQL (heap table, indexes with a row pointer (<code>ctid</code>)).</p>
<p>But that&rsquo;s not really the issue. Immediately afterwards, he asked me whether emptying columns of type <code>DOUBLE</code> (8 bytes, in PostgreSQL-speak <code>DOUBLE PRECISION</code>) would also save space, or whether he should rather drop the columns straight away. My first reflex response was: setting them to <code>NULL</code> is good, followed by <code>OPTIMIZE TABLE</code> (<code>VACUUM FULL</code> in PostgreSQL parlance). But my second thought was: <code>DOUBLE</code> is a fixed-length data type, so does <code>NULL</code> save space there too, or only for variable-length data types? Better safe than sorry, so let&rsquo;s consult the manual first&hellip;</p>
<p>And there it says (<a href="https://dev.mysql.com/doc/refman/8.4/en/data-size.html" target="_blank" rel="noopener">source</a>):</p>
<blockquote>
<p>Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.</p>
</blockquote>
<p>and (<a href="https://dev.mysql.com/doc/refman/8.4/en/innodb-row-format.html" target="_blank" rel="noopener">source</a>):</p>
<blockquote>
<p>The variable-length part of the record header contains a bit vector for indicating NULL columns. &hellip; Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.</p>
</blockquote>
<h2 id="experiment-with-mariadbmysql">Experiment with MariaDB/MySQL<a class="anchor-link" id="experiment-with-mariadb-mysql"></a></h2>
<h3 id="test-setup">Test setup<a class="anchor-link" id="test-setup"></a></h3>
<p>Somehow the description in the manual is a bit too abstract for me. Perhaps a small experiment would help? So let&rsquo;s give it a try:</p>
<pre><code>SQL&gt; -- DROP TABLE IF EXISTS tracking;

SQL&gt; CREATE TABLE tracking (
 id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT
, d0 DOUBLE, d1 DOUBLE, d2 DOUBLE, d3 DOUBLE, d4 DOUBLE
, d5 DOUBLE, d6 DOUBLE, d7 DOUBLE, d8 DOUBLE, d9 DOUBLE
);

SQL&gt; INSERT INTO tracking SELECT NULL, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0;
SQL&gt; INSERT INTO tracking SELECT NULL, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 FROM tracking;
... repeat the last INSERT until 16 M rows are reached
</code></pre>
<p>The table is approx. 1.8 Gbyte in size for both MariaDB and MySQL with 16 M rows. Since this information is only given very imprecisely in <code>INFORMATION_SCHEMA</code>, let&rsquo;s take a look at the file system:</p>
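<p>(For reference, the &ndash; admittedly imprecise &ndash; estimate from <code>INFORMATION_SCHEMA</code> can be queried as follows. A small sketch, assuming the table lives in the <code>test</code> schema as in the <code>OPTIMIZE TABLE</code> output further down; the values are based on sampled statistics and can deviate noticeably from the real file size:)</p>
<pre><code>SQL&gt; SELECT table_name, table_rows
 , ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS approx_gb
 FROM information_schema.tables
 WHERE table_schema = 'test' AND table_name = 'tracking';
</code></pre>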
<p>MariaDB 11.8:</p>
<pre><code>SQL&gt; system ls -l tracking.*
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:28 tracking.frm
-rw-rw---- 1 mysql mysql 1933574144 Feb 7 10:32 tracking.ibd
</code></pre>
<p>MySQL 8.4:</p>
<pre><code>SQL&gt; system ls -l tracking.ibd
-rw-r----- 1 mysql mysql 1929379840 Feb 7 10:33 tracking.ibd
</code></pre>
<h3 id="defragment-the-table">Defragment the table<a class="anchor-link" id="defragment-the-table"></a></h3>
<p>Then we &lsquo;defragment&rsquo; the table with the <code>OPTIMIZE TABLE</code> command:</p>
<pre><code>SQL&gt; OPTIMIZE TABLE tracking;
+---------------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------------+----------+----------+-------------------------------------------------------------------+
| test.tracking | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| test.tracking | optimize | status | OK |
+---------------+----------+----------+-------------------------------------------------------------------+
</code></pre>
<p><strong>Attention</strong>: The table is copied once! It therefore needs twice the amount of disc space for a short time! This can be observed while the <code>OPTIMIZE TABLE</code> command is running:</p>
<p>MariaDB:</p>
<pre><code>$ watch -d -n 1 'ls -l trac* #*'
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:39 '#sql-alter-d57-8c.frm'
-rw-rw---- 1 mysql mysql 968884224 Feb 7 10:39 '#sql-alter-d57-8c.ibd'
-rw-rw---- 1 mysql mysql 1206 Feb 7 10:28 tracking.frm
-rw-rw---- 1 mysql mysql 1933574144 Feb 7 10:32 tracking.ibd
</code></pre>
<p>MySQL:</p>
<pre><code>$ watch -d -n 1 'ls -l trac* #*'
-rw-r----- 1 mysql mysql 369098752 Feb 7 10:40 #sql-ib1594-4164062678.ibd
-rw-r----- 1 mysql mysql 1929379840 Feb 7 10:33 tracking.ibd
</code></pre>
<p>The result is surprising! With MariaDB, the table has remained roughly the same size:</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 10:39 tracking.frm
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 10:39 tracking.ibd
</code></pre>
<p>With MySQL, on the other hand, the table has actually grown after the &lsquo;defragmentation&rsquo;, namely by approx. 14%:</p>
<pre><code>-rw-r----- 1 mysql mysql 2197815296 Feb 7 10:41 tracking.ibd
</code></pre>
<p>If we execute the <code>OPTIMIZE TABLE</code> command again, the size remains constant for both MariaDB and MySQL:</p>
<p>MariaDB:</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 10:46 tracking.frm
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 10:48 tracking.ibd
</code></pre>
<p>MySQL:</p>
<pre><code>-rw-r----- 1 mysql mysql 2197815296 Feb 7 10:48 tracking.ibd
</code></pre>
<h3 id="attempt-1-null-out">Attempt 1: <code>NULL</code> out<a class="anchor-link" id="attempt-1-null-out"></a></h3>
<p>Now we <code>NULL</code> out the values:</p>
<pre><code>SQL&gt; UPDATE tracking
SET d0 = NULL, d1 = NULL, d2 = NULL, d3 = NULL, d4 = NULL
 , d5 = NULL, d6 = NULL, d7 = NULL, d8 = NULL, d9 = NULL
;
</code></pre>
<p>After this step, the sizes of the files have even grown slightly:</p>
<p>MariaDB (+1.3%):</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 10:49 tracking.frm
-rw-rw---- 1 mysql mysql 1937768448 Feb 7 11:04 tracking.ibd
</code></pre>
<p>MySQL (+0.2%):</p>
<pre><code>-rw-r----- 1 mysql mysql 2202009600 Feb 7 11:04 tracking.ibd
</code></pre>
<p>We then defragment the table again with the <code>OPTIMIZE TABLE</code> command. The tables shrink as expected.</p>
<p>MariaDB (to 23%):</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 11:09 tracking.frm
-rw-rw---- 1 mysql mysql 448790528 Feb 7 11:10 tracking.ibd
</code></pre>
<p>MySQL (to 24%):</p>
<pre><code>-rw-r----- 1 mysql mysql 520093696 Feb 7 11:10 tracking.ibd
</code></pre>
<p><code>OPTIMIZE TABLE</code> again does NOT change the file size any more&hellip;</p>
<h3 id="attempt-2-deleting-the-columns">Attempt 2: Deleting the columns<a class="anchor-link" id="attempt-2-deleting-the-columns"></a></h3>
<p>Now we try the whole thing again with the <code>DROP COLUMN</code> command. The starting position is again the same as described above:</p>
<p>MariaDB:</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 11:15 tracking.frm
-rw-rw---- 1 mysql mysql 1933574144 Feb 7 11:18 tracking.ibd
</code></pre>
<p>MySQL:</p>
<pre><code>-rw-r----- 1 mysql mysql 1929379840 Feb 7 11:19 tracking.ibd
</code></pre>
<p>After the <code>OPTIMIZE TABLE</code> command, the values look similar to the first attempt:</p>
<p>MariaDB:</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 11:20 tracking.frm
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 11:21 tracking.ibd
</code></pre>
<p>MySQL:</p>
<pre><code>-rw-r----- 1 mysql mysql 2197815296 Feb 7 11:21 tracking.ibd
</code></pre>
<p><code>OPTIMIZE TABLE</code> again also brings no further changes, as above:</p>
<p>MariaDB:</p>
<pre><code>-rw-rw---- 1 mysql mysql 1206 Feb 7 11:22 tracking.frm
-rw-rw---- 1 mysql mysql 1912602624 Feb 7 11:23 tracking.ibd
</code></pre>
<p>MySQL:</p>
<pre><code>-rw-r----- 1 mysql mysql 2197815296 Feb 7 11:24 tracking.ibd
</code></pre>
<p>And now the actual second attempt with dropping the columns:</p>
<pre><code>SQL&gt; ALTER TABLE tracking
 DROP COLUMN d0, DROP COLUMN d1, DROP COLUMN d2, DROP COLUMN d3, DROP COLUMN d4
, DROP COLUMN d5, DROP COLUMN d6, DROP COLUMN d7, DROP COLUMN d8, DROP COLUMN d9
;
</code></pre>
<p>The first thing we notice is that the command runs instantly (<code>ALGORITHM=INSTANT</code>), i.e. it does not make any changes to the data but only changes the metadata. On the one hand, this is good, as it minimises the impact on the application. On the other hand, it also means that no space is saved yet.</p>
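<p>As a small sketch (not part of the original test run): the desired algorithm can also be requested explicitly; in that case the statement fails with an error instead of silently falling back to a slower algorithm. This should be accepted by both the MariaDB and the MySQL versions used here:</p>
<pre><code>SQL&gt; ALTER TABLE tracking
 DROP COLUMN d0, DROP COLUMN d1, DROP COLUMN d2, DROP COLUMN d3, DROP COLUMN d4
, DROP COLUMN d5, DROP COLUMN d6, DROP COLUMN d7, DROP COLUMN d8, DROP COLUMN d9
, ALGORITHM = INSTANT;
</code></pre>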
<p>So let&rsquo;s reclaim the space with the <code>OPTIMIZE TABLE</code> command:</p>
<p>MariaDB (to 93% of the attempt 1 result):</p>
<pre><code>-rw-rw---- 1 mysql mysql 925 Feb 7 11:28 tracking.frm
-rw-rw---- 1 mysql mysql 415236096 Feb 7 11:29 tracking.ibd
</code></pre>
<p>MySQL (to 92% of the attempt 1 result):</p>
<pre><code>-rw-r----- 1 mysql mysql 478150656 Feb 7 11:28 tracking.ibd
</code></pre>
<h3 id="conclusion">Conclusion<a class="anchor-link" id="conclusion"></a></h3>
<p>Both dropping the columns and NULLing them out save a significant amount of space. Dropping the columns saves about 7% more space than NULLing them out. If it is possible from an application point of view, you should therefore drop columns that are no longer required, or, if that is not possible, at least NULL them out.</p>
<h2 id="experiment-with-postgresql">Experiment with PostgreSQL<a class="anchor-link" id="experiment-with-postgresql"></a></h2>
<p>And now let&rsquo;s take a look at the whole thing with PostgreSQL 19devel.</p>
<h3 id="test-setup-1">Test setup<a class="anchor-link" id="test-setup"></a></h3>
<p>The test setup is analogous to MariaDB/MySQL:</p>
<pre><code>postgres=# -- DROP TABLE IF EXISTS tracking;

postgres=# CREATE TABLE tracking (
 id SERIAL PRIMARY KEY
, d0 DOUBLE PRECISION, d1 DOUBLE PRECISION, d2 DOUBLE PRECISION, d3 DOUBLE PRECISION, d4 DOUBLE PRECISION
, d5 DOUBLE PRECISION, d6 DOUBLE PRECISION, d7 DOUBLE PRECISION, d8 DOUBLE PRECISION, d9 DOUBLE PRECISION
);

postgres=# \timing

postgres=# INSERT INTO tracking (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)
 SELECT 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0;
postgres=# INSERT INTO tracking (d0, d1, d2, d3, d4, d5, d6, d7, d8, d9)
 SELECT 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 FROM tracking;
... repeat the last INSERT until 16 M rows are reached
</code></pre>
<p>Firstly, we want to know how big the table has actually become. PostgreSQL seems to know this information very precisely:</p>
<pre><code>postgres=# SELECT pg_relation_size('tracking') AS tab_siz
 , pg_size_pretty(pg_relation_size('tracking')) AS tab_siz_prtty
 , pg_indexes_size('tracking') AS idx_siz
 , pg_size_pretty(pg_indexes_size('tracking')) AS idx_siz_prtty
 , pg_relation_size('tracking') + pg_indexes_size('tracking') AS tab_and_idx_siz
 , pg_size_pretty(pg_relation_size('tracking') + pg_indexes_size('tracking')) AS tab_and_idx_siz_prtty
 , pg_total_relation_size('tracking') AS tot_rel_siz
 , pg_size_pretty(pg_total_relation_size('tracking')) AS tot_rel_siz_prtty
;
 tab_siz | tab_siz_prtty | idx_siz | idx_siz_prtty | tab_and_idx_siz | tab_and_idx_siz_prtty | tot_rel_siz | tot_rel_siz_prtty
------------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------
 1963417600 | 1872 MB | 376856576 | 359 MB | 2340274176 | 2232 MB | 2340798464 | 2232 MB
</code></pre>
<p>Then we want to know where these files can be found in the file system:</p>
<pre><code>postgres=# SELECT oid AS db_oid FROM pg_database WHERE datname = current_database();
 db_oid
--------
 5

postgres=# SELECT oid AS table_oid, relname, relnamespace, relfilenode
 FROM pg_class WHERE relname = 'tracking';
 table_oid | relname | relnamespace | relfilenode
-----------+----------+--------------+-------------
 40965 | tracking | 2200 | 40965

postgres=# SELECT i.indexrelid::regclass as index_name, i.indexrelid as index_oid
 FROM pg_index i
 JOIN pg_class c ON i.indrelid = c.oid
 WHERE c.relname = 'tracking';
 index_name | index_oid
---------------+-----------
 tracking_pkey | 40970

postgres=# SELECT pg_relation_filepath('tracking');
 pg_relation_filepath
----------------------
 base/5/40965
</code></pre>
<p>Table and index size in the file system:</p>
<pre><code>$ ls -ltr 40965* 40970*
-rw------- 1 mysql mysql 40960 Feb 7 18:33 40965_vm
-rw------- 1 mysql mysql 499712 Feb 7 18:33 40965_fsm
-rw------- 1 mysql mysql 889675776 Feb 7 18:34 40965.1
-rw------- 1 mysql mysql 1073741824 Feb 7 18:34 40965
-rw------- 1 mysql mysql 376856576 Feb 7 18:35 40970
</code></pre>
<ul>
<li><code>*_fsm</code> means &ldquo;free space map&rdquo;</li>
<li><code>*_vm</code> means &ldquo;visibility map&rdquo;</li>
<li><code>*.1</code> means 2nd segment of the object (table or index)</li>
</ul>
<p>PostgreSQL seems to work with segments of 1 Gbyte by default and, unlike MariaDB/MySQL (<code>INFORMATION_SCHEMA</code>), knows exactly how large its files are. And the discrepancy from above (between <code>tot_rel_siz</code> and <code>tab_and_idx_siz</code>) can be explained by the <code>fsm</code> and <code>vm</code> files.</p>
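<p>The individual forks can also be queried directly with the optional second argument of <code>pg_relation_size()</code>, which makes this discrepancy visible without going to the file system (a small sketch):</p>
<pre><code>postgres=# SELECT pg_relation_size('tracking', 'main') AS main_fork
 , pg_relation_size('tracking', 'fsm') AS fsm_fork
 , pg_relation_size('tracking', 'vm') AS vm_fork
;
</code></pre>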
<p>The PostgreSQL equivalent of the MariaDB/MySQL <code>OPTIMIZE TABLE</code> is the <code>VACUUM FULL</code> command:</p>
<pre><code>postgres=# VACUUM FULL tracking;

$ ls -ltr
-rw------- 1 mysql mysql 40960 Feb 7 18:39 40965_vm
-rw------- 1 mysql mysql 499712 Feb 7 18:39 40965_fsm
-rw------- 1 mysql mysql 1073741824 Feb 7 18:39 40965
-rw------- 1 mysql mysql 889675776 Feb 7 18:39 40965.1
-rw------- 1 mysql mysql 1073741824 Feb 7 18:39 40972
-rw------- 1 mysql mysql 889675776 Feb 7 18:39 40972.1
-rw------- 1 mysql mysql 0 Feb 7 18:39 40975

...

-rw------- 1 mysql mysql 49152 Feb 7 18:39 2704
-rw------- 1 mysql mysql 32768 Feb 7 18:39 2703
-rw------- 1 mysql mysql 32768 Feb 7 18:39 2696
-rw------- 1 mysql mysql 65536 Feb 7 18:39 2674
-rw------- 1 mysql mysql 81920 Feb 7 18:39 2673
-rw------- 1 mysql mysql 98304 Feb 7 18:39 2659
-rw------- 1 mysql mysql 139264 Feb 7 18:39 2658
-rw------- 1 mysql mysql 24576 Feb 7 18:39 2619_fsm
-rw------- 1 mysql mysql 163840 Feb 7 18:39 2619
-rw------- 1 mysql mysql 106496 Feb 7 18:39 2608
-rw------- 1 mysql mysql 491520 Feb 7 18:39 1249
-rw------- 1 mysql mysql 122880 Feb 7 18:39 1247
-rw------- 1 mysql mysql 32768 Feb 7 18:39 2662
-rw------- 1 mysql mysql 114688 Feb 7 18:39 1259
-rw------- 1 mysql mysql 16384 Feb 7 18:39 3455
-rw------- 1 mysql mysql 49152 Feb 7 18:39 2663
-rw------- 1 mysql mysql 1073741824 Feb 7 18:39 40972
-rw------- 1 mysql mysql 889675776 Feb 7 18:39 40972.1
-rw------- 1 mysql mysql 376864768 Feb 7 18:39 40975
-rw------- 1 mysql mysql 0 Feb 7 18:39 40965
-rw------- 1 mysql mysql 0 Feb 7 18:39 40970
</code></pre>
<p>The first thing you notice is that PostgreSQL touches quite a few files and the &lsquo;free space map&rsquo; file has disappeared. In contrast to MariaDB/MySQL, the table segments have remained the same size. You can also see that the old table has &lsquo;disappeared&rsquo; (40965, 40970) and a new one has been created (40972 and 40975). The <code>VACUUM FULL</code> command in PostgreSQL also creates a copy of the data, as in MariaDB/MySQL.</p>
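<p>That the table was rewritten under a new <code>relfilenode</code> can also be verified from SQL instead of the file system (a small sketch):</p>
<pre><code>postgres=# SELECT relname, relfilenode
 FROM pg_class
 WHERE relname IN ('tracking', 'tracking_pkey');
</code></pre>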
<pre><code>postgres=# SELECT pg_relation_size('tracking') AS tab_siz
 , pg_size_pretty(pg_relation_size('tracking')) AS tab_siz_prtty
 , pg_indexes_size('tracking') AS idx_siz
 , pg_size_pretty(pg_indexes_size('tracking')) AS idx_siz_prtty
 , pg_relation_size('tracking') + pg_indexes_size('tracking') AS tab_and_idx_siz
 , pg_size_pretty(pg_relation_size('tracking') + pg_indexes_size('tracking')) AS tab_and_idx_siz_prtty
 , pg_total_relation_size('tracking') AS tot_rel_siz
 , pg_size_pretty(pg_total_relation_size('tracking')) AS tot_rel_siz_prtty
;
 tab_siz | tab_siz_prtty | idx_siz | idx_siz_prtty | tab_and_idx_siz | tab_and_idx_siz_prtty | tot_rel_siz | tot_rel_siz_prtty
------------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------
 1963417600 | 1872 MB | 376864768 | 359 MB | 2340282368 | 2232 MB | 2340282368 | 2232 MB
</code></pre>
<p>The following query helps to understand which other files/objects have been created:</p>
<pre><code>postgres=# SELECT c.oid, c.relname, ns.nspname
FROM pg_class AS c
JOIN pg_namespace AS ns ON ns.oid = c.relnamespace
WHERE c.oid IN (2704, 2703, 2696, 2674, 2673, 2659, 2658, 2619, 2608, 1249, 1247, 40972, 2662, 1259, 3455, 2663, 40975, 40965, 40970)
;
 oid | relname | nspname
-------+-----------------------------------+------------
 40965 | tracking | public
 40970 | tracking_pkey | public
 2619 | pg_statistic | pg_catalog
 1247 | pg_type | pg_catalog
 2703 | pg_type_oid_index | pg_catalog
 2704 | pg_type_typname_nsp_index | pg_catalog
 2658 | pg_attribute_relid_attnam_index | pg_catalog
 2659 | pg_attribute_relid_attnum_index | pg_catalog
 2662 | pg_class_oid_index | pg_catalog
 2663 | pg_class_relname_nsp_index | pg_catalog
 3455 | pg_class_tblspc_relfilenode_index | pg_catalog
 2696 | pg_statistic_relid_att_inh_index | pg_catalog
 2673 | pg_depend_depender_index | pg_catalog
 2674 | pg_depend_reference_index | pg_catalog
 1249 | pg_attribute | pg_catalog
 1259 | pg_class | pg_catalog
 2608 | pg_depend | pg_catalog

postgres=# SELECT i.indexrelid::regclass as index_name, i.indexrelid as index_oid, ns.nspname
 FROM pg_index i
 JOIN pg_class c ON i.indrelid = c.oid
 JOIN pg_namespace AS ns ON ns.oid = c.relnamespace
 WHERE c.oid IN (2704, 2703, 2696, 2674, 2673, 2659, 2658, 2619, 2608, 1249, 1247, 40972, 2662, 1259, 3455, 2663, 40975, 40965, 40970)
;
 index_name | index_oid | nspname
-----------------------------------+-----------+------------
 pg_type_typname_nsp_index | 2704 | pg_catalog
 pg_attribute_relid_attnam_index | 2658 | pg_catalog
 tracking_pkey | 40970 | public
 pg_class_relname_nsp_index | 2663 | pg_catalog
 pg_class_tblspc_relfilenode_index | 3455 | pg_catalog
 pg_type_oid_index | 2703 | pg_catalog
 pg_attribute_relid_attnum_index | 2659 | pg_catalog
 pg_statistic_relid_att_inh_index | 2696 | pg_catalog
 pg_depend_depender_index | 2673 | pg_catalog
 pg_depend_reference_index | 2674 | pg_catalog
 pg_class_oid_index | 2662 | pg_catalog
</code></pre>
<h3 id="attempt-1-null-out-1">Attempt 1: <code>NULL</code> out<a class="anchor-link" id="attempt-1-null-out"></a></h3>
<p>Then we also <code>NULL</code> out the columns in PostgreSQL. From here on, we skip the look at the file system, as PostgreSQL seems to know the file sizes exactly, as we have seen above:</p>
<pre><code>postgres=# UPDATE tracking
SET d0 = NULL, d1 = NULL, d2 = NULL, d3 = NULL, d4 = NULL
 , d5 = NULL, d6 = NULL, d7 = NULL, d8 = NULL, d9 = NULL
;

postgres=# SELECT pg_relation_size('tracking') AS tab_siz
 , pg_size_pretty(pg_relation_size('tracking')) AS tab_siz_prtty
 , pg_indexes_size('tracking') AS idx_siz
 , pg_size_pretty(pg_indexes_size('tracking')) AS idx_siz_prtty
 , pg_relation_size('tracking') + pg_indexes_size('tracking') AS tab_and_idx_siz
 , pg_size_pretty(pg_relation_size('tracking') + pg_indexes_size('tracking')) AS tab_and_idx_siz_prtty
 , pg_total_relation_size('tracking') AS tot_rel_siz
 , pg_size_pretty(pg_total_relation_size('tracking')) AS tot_rel_siz_prtty
;
 tab_siz | tab_siz_prtty | idx_siz | idx_siz_prtty | tab_and_idx_siz | tab_and_idx_siz_prtty | tot_rel_siz | tot_rel_siz_prtty
------------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------
 2695716864 | 2571 MB | 753696768 | 719 MB | 3449413632 | 3290 MB | 3450101760 | 3290 MB
</code></pre>
<p>Here we see that the table segments grow massively (+37%), which is called &lsquo;bloat&rsquo; in PostgreSQL terminology. The MVCC implementation of PostgreSQL stores both the old and the new version of the row &lsquo;in-place&rsquo; directly in the table, in contrast to MariaDB/MySQL, which stores the old version in the UNDO space and only the new row &lsquo;in-place&rsquo;. The index also increases significantly (+100%). We need to do more research to find out why this is the case. In addition, a &lsquo;free space map&rsquo; is created again (difference between <code>tot_rel_siz</code> and <code>tab_and_idx_siz</code>).</p>
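<p>The bloat can also be made visible without looking at sizes at all: the statistics collector counts live and dead row versions per table (a small sketch; the numbers are estimates):</p>
<pre><code>postgres=# SELECT relname, n_live_tup, n_dead_tup
 FROM pg_stat_user_tables
 WHERE relname = 'tracking';
</code></pre>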
<p>A subsequent <code>VACUUM FULL</code> reduces the table (to 28%) and the index (to 50%) again in relation to the previous size:</p>
<pre><code>postgres=# VACUUM FULL tracking;

postgres=# SELECT pg_relation_size('tracking') AS tab_siz
 , pg_size_pretty(pg_relation_size('tracking')) AS tab_siz_prtty
 , pg_indexes_size('tracking') AS idx_siz
 , pg_size_pretty(pg_indexes_size('tracking')) AS idx_siz_prtty
 , pg_relation_size('tracking') + pg_indexes_size('tracking') AS tab_and_idx_siz
 , pg_size_pretty(pg_relation_size('tracking') + pg_indexes_size('tracking')) AS tab_and_idx_siz_prtty
 , pg_total_relation_size('tracking') AS tot_rel_siz
 , pg_size_pretty(pg_total_relation_size('tracking')) AS tot_rel_siz_prtty
;
 tab_siz | tab_siz_prtty | idx_siz | idx_siz_prtty | tab_and_idx_siz | tab_and_idx_siz_prtty | tot_rel_siz | tot_rel_siz_prtty
-----------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------
 742916096 | 709 MB | 376864768 | 359 MB | 1119780864 | 1068 MB | 1119780864 | 1068 MB
</code></pre>
<p>And also in relation to the original size, the table becomes smaller again (to 38%), while the index ends up at exactly its original size (100%). Why the index has remained the same size and only the table has shrunk remains to be investigated&hellip; One plausible explanation is that the primary key index only contains the <code>id</code> column, which is unaffected by NULLing out <code>d0</code>&ndash;<code>d9</code>, so the rebuild simply brings it back to its original size.</p>
<h3 id="experiment-2-deleting-the-columns">Attempt 2: Deleting the columns<a class="anchor-link" id="experiment-2-deleting-the-columns"></a></h3>
<p>The columns are then dropped with <code>DROP COLUMN</code>.</p>
<pre><code>postgres=# ALTER TABLE tracking
 DROP COLUMN d0, DROP COLUMN d1, DROP COLUMN d2, DROP COLUMN d3, DROP COLUMN d4
, DROP COLUMN d5, DROP COLUMN d6, DROP COLUMN d7, DROP COLUMN d8, DROP COLUMN d9
;
</code></pre>
<p>As the response was immediate, it can be assumed that this operation is also instantaneous. Unfortunately, I couldn&rsquo;t find anything about this in the PostgreSQL documentation.</p>
<p>Nothing has changed significantly in terms of size, which is actually to be expected with an instant operation. However, the fact that the size did not change even after the <code>VACUUM FULL</code> command was a little surprising at first &ndash; although it makes sense once you consider that the columns already contained only <code>NULL</code> from attempt 1, so there was no column data left for the rewrite to reclaim:</p>
<pre><code> tab_siz | tab_siz_prtty | idx_siz | idx_siz_prtty | tab_and_idx_siz | tab_and_idx_siz_prtty | tot_rel_siz | tot_rel_siz_prtty
-----------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------
 742916096 | 709 MB | 376864768 | 359 MB | 1119780864 | 1068 MB | 1120010240 | 1068 MB

postgres=# VACUUM FULL tracking;

 tab_siz | tab_siz_prtty | idx_siz | idx_siz_prtty | tab_and_idx_siz | tab_and_idx_siz_prtty | tot_rel_siz | tot_rel_siz_prtty
-----------+---------------+-----------+---------------+-----------------+-----------------------+-------------+-------------------
 742916096 | 709 MB | 376864768 | 359 MB | 1119780864 | 1068 MB | 1119780864 | 1068 MB
</code></pre>
<h2 id="remarks">Remarks<a class="anchor-link" id="remarks"></a></h2>
<p>Locking in PostgreSQL works as follows:</p>
<ul>
<li><code>VACUUM</code>: Concurrent DML commands are possible, similar to the MariaDB/MySQL <code>OPTIMIZE TABLE</code> command. However, the result is not quite the same.</li>
<li><code>VACUUM FULL</code> causes an <code>ACCESS EXCLUSIVE</code> lock, similar to the <code>OPTIMIZE TABLE</code> command in MariaDB/MySQL 5.5 and older. Neither DML nor <code>SELECT</code> commands are permitted.</li>
</ul>
<h2 id="sources">Sources<a class="anchor-link" id="sources"></a></h2>
<ul>
<li><a href="https://neon.com/postgresql/postgresql-administration/postgresql-database-indexes-table-size" target="_blank" rel="noopener">How to Get Sizes of Database Objects in PostgreSQL</a></li>
<li><a href="https://www.postgresql.org/docs/current/storage-file-layout.html" target="_blank" rel="noopener">Database File Layout</a></li>
<li><a href="https://www.postgresql.org/docs/current/functions-admin.html" target="_blank" rel="noopener">System Administration Functions</a></li>
<li><a href="https://www.postgresql.org/docs/current/sql-cluster.html" target="_blank" rel="noopener">CLUSTER</a></li>
<li><a href="https://www.postgresql.org/docs/current/sql-vacuum.html" target="_blank" rel="noopener">VACUUM</a></li>
<li><a href="https://www.postgresql.org/docs/current/explicit-locking.html" target="_blank" rel="noopener">Explicit Locking</a></li>
</ul>
<h2 id="additional-attempts">Additional attempts<a class="anchor-link" id="additional-attempts"></a></h2>
<ol>
<li>Instead of <code>0.0</code>, <code>NULL</code> was filled into the columns <code>d0</code> &ndash; <code>d9</code>. The table remained small (<code>tot_rel_siz_prtty = 1068 MB</code>). It is therefore also worth saving <code>NULL</code> instead of dummy values with PostgreSQL.</li>
<li>The columns <code>d0</code> &ndash; <code>d9</code> were created with <code>DOUBLE PRECISION NOT NULL</code> and the values <code>0.0</code> were filled. No effect: The table remained large (<code>tot_rel_siz_prtty = 2232 MB</code>).</li>
</ol>
<p>This page was translated using <a href="https://www.deepl.com/en/translator" target="_blank" rel="noopener">deepl.com</a>.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/how-much-space-does-null-need/">How much space does NULL need?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Someone is deleting my shared memory segments!</title>
      <link>https://www.fromdual.com/blog/deleted-postgresql-shared-memory-segments/</link>
      <pubDate>Sun, 08 Feb 2026 05:27:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>When we work with PostgreSQL under our myEnv, we regularly get shared memory segment errors. Example:<br />
psql: error: connection to server on socket \"/tmp/.s.PGSQL.5433\" failed:<br />
FATAL: could not open shared memory segment \"/PostgreSQL.4220847662\":<br />
No such file or directory<br />
or we see similar messages in the PostgreSQL error log:<br />
ERROR: could not open shared memory segment \"/PostgreSQL.4220847662\":<br />
No such file or directory<br />
Because I am a MariaDB/MySQL admin, I am not very familiar with shared memory problems (MariaDB/MySQL does not work with shared memory). Fortunately, a search on the Internet led us on the right track (source). It is noted there:</p>
<p>The documentation of systemd states that this only happens for<br />
non-system users. Can you check whether your “postgres” user (or<br />
whatever you are using) is a system user?</p>
<p>Linux System User<br />
First I had to find out what a system user under Linux actually is. I found an answer here: What’s the difference between a normal user and a system user?.</p>
<p>That is not a technical difference but an organizational decision. E.g. it makes sense to show normal users in a login dialog (so that you can click them instead of having to type the user name) but it wouldn’t to show system accounts (the UIDs under which daemons and other automatic processes run) there.</p>
<p>The LSB standard says: User ID Ranges:</p>
<p>The system User IDs from 0 to 99 should be statically allocated by the system, and shall not be created by applications.<br />
The system User IDs from 100 to 499 should be reserved for dynamic allocation by system administrators and post install scripts using useradd.</p>
<p>On my Ubuntu system it looks like this:<br />
$ grep SYS_ /etc/login.defs<br />
#SYS_UID_MIN 100<br />
#SYS_UID_MAX 999<br />
#SYS_GID_MIN 100<br />
#SYS_GID_MAX 999<br />
This would be correct for the PostgreSQL user:<br />
$ id postgres<br />
uid=130(postgres) gid=142(postgres) groups=142(postgres),116(ssl-cert)<br />
But since the PostgreSQL instance in question runs under our myEnv, that’s different:<br />
$ id dba<br />
uid=1001(dba) gid=1001(dba) groups=1001(dba)<br />
So now we have two options:</p>
<p>We change SYS_UID_MAX and SYS_GID_MAX to 1001 (simple variant).<br />
Or we change the UID and the GID of our user dba to less than 1000.</p>
<p>Simple variant: Change SYS_UID_MAX and SYS_GID_MAX to 1001<br />
# /etc/login.defs<br />
SYS_UID_MAX 1001<br />
SYS_GID_MAX 1001<br />
To be on the safe side, the machine was rebooted. But that did not help: After a short time, the same errors occur again.<br />
More complicated variant: Changing the UID from 1001 to 990<br />
$ cat /etc/passwd<br />
...<br />
polkitd:x:997:997:User for polkitd:/:/usr/sbin/nologin<br />
systemd-coredump:x:998:998:systemd Core Dumper:/:/usr/sbin/nologin<br />
tomcat:x:999:999:Apache Tomcat:/:/sbin/nologin<br />
oli:x:1000:1000:Oli Sennhauser,,,:/home/oli:/bin/bash<br />
dba:x:1001:1001:DBA user:/home/dba:/bin/bash<br />
...</p>
<p>$ id dba<br />
uid=1001(dba) gid=1001(dba) groups=1001(dba)<br />
To do this, all processes of this user must be stopped!<br />
$ usermod --uid 990 dba<br />
$ groupmod --gid 990 dba</p>
<p>$ find / -user 1001 -exec chown --no-dereference dba {} \;<br />
$ find / -group 1001 -exec chgrp --no-dereference dba {} \;<br />
This seems to have solved the problem!<br />
Additional information<br />
A source mentioned above also recommended setting the following parameters in systemd-logind:<br />
# /etc/systemd/system/systemd-logind.service.d/override.conf<br />
RemoveIPC=no<br />
RuntimeDirectorySize=1%<br />
However, this measure is no longer necessary as it works without this change.<br />
This page was translated using deepl.com.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/deleted-postgresql-shared-memory-segments/">Someone is deleting my shared memory segments!</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>When we work with PostgreSQL under our <a href="https://www.fromdual.com/myenv/">myEnv</a>, we regularly get shared memory segment errors. Example:</p>
<pre><code>psql: error: connection to server on socket "/tmp/.s.PGSQL.5433" failed:
FATAL: could not open shared memory segment "/PostgreSQL.4220847662":
No such file or directory
</code></pre>
<p>or we see similar messages in the PostgreSQL error log:</p>
<pre><code>ERROR: could not open shared memory segment "/PostgreSQL.4220847662":
No such file or directory
</code></pre>
<p>Because I am a MariaDB/MySQL admin, I am not very familiar with shared memory problems (MariaDB/MySQL does not work with shared memory). Fortunately, a search on the Internet led us on the right track (<a href="https://www.postgresql.org/message-id/56A52018.1030001%40gmx.net" target="_blank" title="Re: systemd deletes shared memory segment in /dev/shm/Postgresql.NNNNNN" rel="noopener">source</a>). It is noted there:</p>
<blockquote>
<p>The documentation of systemd states that this only happens for<br>
non-system users. Can you check whether your &ldquo;postgres&rdquo; user (or<br>
whatever you are using) is a system user?</p>
</blockquote>
<h2 id="linux-system-user">Linux System User<a class="anchor-link" id="linux-system-user"></a></h2>
<p>First I had to find out what a system user under Linux actually is. I found an answer here: <a href="https://unix.stackexchange.com/questions/80277/whats-the-difference-between-a-normal-user-and-a-system-user" target="_blank" rel="noopener">What&rsquo;s the difference between a normal user and a system user?</a>.</p>
<blockquote>
<p>That is not a technical difference but an organizational decision. E.g. it makes sense to show normal users in a login dialog (so that you can click them instead of having to type the user name) but it wouldn&rsquo;t to show system accounts (the UIDs under which daemons and other automatic processes run) there.</p>
</blockquote>
<p>The LSB standard says: <a href="https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/uidrange.html" target="_blank" rel="noopener">User ID Ranges</a>:</p>
<blockquote>
<p>The system User IDs from 0 to 99 should be statically allocated by the system, and shall not be created by applications.<br>
The system User IDs from 100 to 499 should be reserved for dynamic allocation by system administrators and post install scripts using useradd.</p>
</blockquote>
<p>On my Ubuntu system it looks like this:</p>
<pre><code>$ grep SYS_ /etc/login.defs
#SYS_UID_MIN 100
#SYS_UID_MAX 999
#SYS_GID_MIN 100
#SYS_GID_MAX 999
</code></pre>
<p>This would be correct for the PostgreSQL user:</p>
<pre><code>$ id postgres
uid=130(postgres) gid=142(postgres) groups=142(postgres),116(ssl-cert)
</code></pre>
<p>But since the PostgreSQL instance in question runs under our <a href="https://www.fromdual.com/myenv/">myEnv</a>, that&rsquo;s different:</p>
<pre><code>$ id dba
uid=1001(dba) gid=1001(dba) groups=1001(dba)
</code></pre>
<p>So now we have two options:</p>
<ol>
<li>We change <code>SYS_UID_MAX</code> and <code>SYS_GID_MAX</code> to 1001 (simple variant).</li>
<li>Or we change the <code>UID</code> and the <code>GID</code> of our user <code>dba</code> to less than 1000.</li>
</ol>
<h2 id="simple-variant-change-sys_uid_max-and-sys_gid_max-to-1001">Simple variant: Change <code>SYS_UID_MAX</code> and <code>SYS_GID_MAX</code> to 1001<a class="anchor-link" id="simple-variant-change-sys_uid_max-and-sys_gid_max-to-1001"></a></h2>
<pre><code># /etc/login.defs
SYS_UID_MAX 1001
SYS_GID_MAX 1001
</code></pre>
<p>To be on the safe side, the machine was rebooted. But that did not help: after a short time, the same errors occurred again.</p>
<h2 id="more-complicated-variant-changing-the-uid-from-1001-to-990">More complicated variant: Changing the <code>UID</code> from 1001 to 990<a class="anchor-link" id="more-complicated-variant-changing-the-uid-from-1001-to-990"></a></h2>
<pre><code>$ cat /etc/passwd
...
polkitd:x:997:997:User for polkitd:/:/usr/sbin/nologin
systemd-coredump:x:998:998:systemd Core Dumper:/:/usr/sbin/nologin
tomcat:x:999:999:Apache Tomcat:/:/sbin/nologin
oli:x:1000:1000:Oli Sennhauser,,,:/home/oli:/bin/bash
dba:x:1001:1001:DBA user:/home/dba:/bin/bash
...

$ id dba
uid=1001(dba) gid=1001(dba) groups=1001(dba)
</code></pre>
<p>To do this, all processes of this user must be stopped!</p>
<pre><code>$ usermod --uid 990 dba
$ groupmod --gid 990 dba

$ find / -user 1001 -exec chown --no-dereference dba {} \;
$ find / -group 1001 -exec chgrp --no-dereference dba {} \;
</code></pre>
<p>This seems to have solved the problem!</p>
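<p>As a quick check (a small sketch, path as in the error messages above), you can watch whether the PostgreSQL shared memory segments under <code>/dev/shm</code> now survive after the last session of the <code>dba</code> user has ended:</p>
<pre><code>$ ls -l /dev/shm/PostgreSQL.*
</code></pre>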
<h2 id="additional-information">Additional information<a class="anchor-link" id="additional-information"></a></h2>
<p>A source mentioned above also recommended setting the following parameters in <code>systemd-logind</code>:</p>
<pre><code># /etc/systemd/system/systemd-logind.service.d/override.conf
RemoveIPC=no
RuntimeDirectorySize=1%
</code></pre>
<p>However, this measure turned out not to be necessary, as everything works without this change.</p>
<p>This page was translated using <a href="https://www.deepl.com/en/translator" target="_blank" rel="noopener">deepl.com</a>.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/deleted-postgresql-shared-memory-segments/">Someone is deleting my shared memory segments!</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Load CSV files into the database</title>
      <link>https://www.fromdual.com/blog/load-csv-files-into-the-database/</link>
      <pubDate>Fri, 06 Feb 2026 17:04:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://www.fromdual.com/aggregator/categories/9/">FromDual GmbH</source>
      <description><![CDATA[<p>Recently, I wanted to display the places of residence of the members of my club on a map for a personal gimmick (IGOC members). I knew the addresses of the club members. But not the coordinates of their places of residence.<br />
So I went in search of the coordinates and found what I was looking for at the Federal Office of Topography (swisstopo).<br />
The data is available there as a CSV file. Details here: Swiss town coordinates.<br />
How do I load this data into a database?<br />
Loading the data with MariaDB/MySQL<br />
MariaDB and MySQL have the LOAD DATA INFILE command:<br />
SQL > DROP TABLE IF EXISTS wgs84;</p>
<p>SQL > -- SET GLOBAL local_infile = ON; -- Only needed with MySQL</p>
<p>SQL > CREATE TABLE wgs84 (<br />
 ortschaftsname VARCHAR(32)<br />
, plz4 SMALLINT<br />
, zusatzziffer SMALLINT<br />
, zip_id SMALLINT UNSIGNED<br />
, gemeindename VARCHAR(32)<br />
, bfs_nr SMALLINT<br />
, kantonskuerzel CHAR(2)<br />
, adressenanteil varchar(8)<br />
, e DOUBLE<br />
, n DOUBLE<br />
, sprache VARCHAR(8)<br />
, validity VARCHAR(12)<br />
);</p>
<p>SQL > -- TRUNCATE TABLE wgs84;</p>
<p>SQL > LOAD DATA LOCAL INFILE \'/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv\'<br />
INTO TABLE wgs84<br />
FIELDS TERMINATED BY \';\'<br />
LINES TERMINATED BY \'\r\n\'<br />
IGNORE 1 LINES<br />
;<br />
Query OK, 5713 rows affected<br />
Records: 5713 Deleted: 0 Skipped: 0 Warnings: 0<br />
You can then query the data in the database:<br />
SQL > SELECT * FROM wgs84 ORDER BY ortschaftsname LIMIT 5;<br />
+----------------+------+--------------+--------+--------------+--------+----------------+----------------+-------------------+--------------------+---------+------------+<br />
&#124; ortschaftsname &#124; plz4 &#124; zusatzziffer &#124; zip_id &#124; gemeindename &#124; bfs_nr &#124; kantonskuerzel &#124; adressenanteil &#124; e &#124; n &#124; sprache &#124; validity &#124;<br />
+----------------+------+--------------+--------+--------------+--------+----------------+----------------+-------------------+--------------------+---------+------------+<br />
&#124; Aadorf &#124; 8355 &#124; 0 &#124; 4672 &#124; Aadorf &#124; 4551 &#124; TG &#124; 96.802 % &#124; 8.903193007810433 &#124; 47.491079014637265 &#124; de &#124; 2008-07-01 &#124;<br />
&#124; Aadorf &#124; 8355 &#124; 0 &#124; 4672 &#124; Elgg &#124; 294 &#124; ZH &#124; 3.198 % &#124; 8.89206766645808 &#124; 47.4933781685032 &#124; de &#124; 2008-07-01 &#124;<br />
&#124; Aarau &#124; 5000 &#124; 0 &#124; 2913 &#124; Aarau &#124; 4001 &#124; AG &#124; 99.713 % &#124; 8.048148371736266 &#124; 47.38973523857376 &#124; de &#124; 2008-07-01 &#124;<br />
&#124; Aarau &#124; 5000 &#124; 0 &#124; 2913 &#124; Suhr &#124; 4012 &#124; AG &#124; 0.287 % &#124; 8.059410934099922 &#124; 47.383298214804334 &#124; de &#124; 2008-07-01 &#124;<br />
&#124; Aarau &#124; 5004 &#124; 0 &#124; 2932 &#124; Aarau &#124; 4001 &#124; AG &#124; 100 % &#124; 8.060698546432551 &#124; 47.400587704180744 &#124; de &#124; 2008-07-01 &#124;<br />
+----------------+------+--------------+--------+--------------+--------+----------------+----------------+-------------------+--------------------+---------+------------+<br />
5 rows in set<br />
Or something more precise:<br />
SQL > SELECT ortschaftsname AS city, plz4 AS city_code, e AS lon, n AS lat<br />
 FROM wgs84 WHERE plz4 IN (8280, 4663, 6043);<br />
+-------------+-----------+-------------------+--------------------+<br />
&#124; city &#124; city_code &#124; lon &#124; lat &#124;<br />
+-------------+-----------+-------------------+--------------------+<br />
&#124; Aarburg &#124; 4663 &#124; 7.904271716719409 &#124; 47.321443418782955 &#124;<br />
&#124; Aarburg &#124; 4663 &#124; 7.889249714098425 &#124; 47.313536073562474 &#124;<br />
&#124; Aarburg &#124; 4663 &#124; 7.880309179095798 &#124; 47.31255194439023 &#124;<br />
&#124; Adligenswil &#124; 6043 &#124; 8.364849060491428 &#124; 47.07037816052481 &#124;<br />
&#124; Kreuzlingen &#124; 8280 &#124; 9.173740257895282 &#124; 47.64491046067056 &#124;<br />
&#124; Kreuzlingen &#124; 8280 &#124; 9.159171428030783 &#124; 47.654149879509134 &#124;<br />
&#124; Kreuzlingen &#124; 8280 &#124; 9.204470741840725 &#124; 47.639949130372145 &#124;<br />
+-------------+-----------+-------------------+--------------------+<br />
7 rows in set (0.003 sec)<br />
I will leave it to the reader to clean out the duplicates… :-)<br />
So far so good, now to the finer points:<br />
Differences between MariaDB and MySQL<br />
The procedure described above works perfectly with MariaDB 11.4 and 11.8. There are small differences with MySQL 8.4:<br />
The first error message that prevents loading is this one:<br />
ERROR 3948 (42000): Loading local data is disabled; this must be enabled on both the client and server sides<br />
It can be bypassed relatively easily with the command:<br />
SQL > SET GLOBAL local_infile = ON;<br />
The next attempt will fail as follows:<br />
ERROR 2068 (HY000): LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.<br />
This problem can be solved by starting the MySQL client as follows:<br />
$ mysql --local-infile=1 --user=root test<br />
Sources</p>
<p>MariaDB: LOAD DATA INFILE<br />
MySQL: LOAD DATA Statement</p>
<p>Loading the data with PostgreSQL<br />
PostgreSQL has the command COPY ... FROM:<br />
postgres=# DROP TABLE IF EXISTS wgs84;</p>
<p>postgres=# CREATE TABLE wgs84 (<br />
 ortschaftsname VARCHAR(32)<br />
, plz4 SMALLINT<br />
, zusatzziffer SMALLINT<br />
, zip_id INT<br />
, gemeindename VARCHAR(32)<br />
, bfs_nr SMALLINT<br />
, kantonskuerzel CHAR(2)<br />
, adressenanteil varchar(8)<br />
, e DOUBLE PRECISION<br />
, n DOUBLE PRECISION<br />
, sprache VARCHAR(8)<br />
, validity VARCHAR(12)<br />
);</p>
<p>postgres=# -- TRUNCATE TABLE wgs84;</p>
<p>postgres=# COPY wgs84<br />
FROM \'/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv\'<br />
DELIMITER \';\'<br />
CSV HEADER<br />
;<br />
COPY 5713<br />
Here, too, we receive the result as expected in the usual PostgreSQL form:<br />
postgres=# SELECT ortschaftsname AS city, plz4 AS city_code, e AS lon, n AS lat<br />
 FROM wgs84 WHERE plz4 IN (8280, 4663, 6043);<br />
 city &#124; city_code &#124; lon &#124; lat<br />
-------------+-----------+-------------------+--------------------<br />
 Aarburg &#124; 4663 &#124; 7.904271716719409 &#124; 47.321443418782955<br />
 Aarburg &#124; 4663 &#124; 7.889249714098425 &#124; 47.313536073562474<br />
 Aarburg &#124; 4663 &#124; 7.880309179095798 &#124; 47.31255194439023<br />
 Adligenswil &#124; 6043 &#124; 8.36487538940682 &#124; 47.07037794822416<br />
 Kreuzlingen &#124; 8280 &#124; 9.173740257895282 &#124; 47.64491046067056<br />
 Kreuzlingen &#124; 8280 &#124; 9.159171428030783 &#124; 47.654149879509134<br />
 Kreuzlingen &#124; 8280 &#124; 9.204470741840725 &#124; 47.639949130372145<br />
(7 rows)<br />
Sources</p>
<p>PostgreSQL: COPY</p>
<p>Small differences between MariaDB/MySQL and PostgreSQL<br />
Basically, the load command is completely different in the two database worlds.<br />
With MariaDB and PostgreSQL, the commands run “out-of-the-box”. MySQL has two additional security hurdles built in here.<br />
PostgreSQL does not recognise UNSIGNED integer data types, so the next largest data type (INT) must be used, which is a little less space-saving than with MariaDB/MySQL.<br />
Remarks<br />
When we did the same test a few days ago, there was still a loading error. So it seems that the data source has also changed slightly…<br />
I have not found out quickly whether there is an SQL standard for these load commands and if so, whether MariaDB/MySQL or PostgreSQL are standard-compliant here.<br />
And of course there are other ways to get your CSV data into the database…<br />
The tools mariadb-import/mysqlimport are used if you want to do this from the command line. The CSV Storage Engine can also be misused for this purpose (see here for details). An officially supported variant is the MariaDB CONNECT Storage Engine with the CSV type (see here):<br />
SQL > INSTALL SONAME \'ha_connect\';</p>
<p>SQL > CREATE TABLE wgs84_fdw<br />
ENGINE = CONNECT<br />
table_type = CSV<br />
file_name=\'/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv\'<br />
header = 1<br />
sep_char = \';\'<br />
quoted = 0;</p>
<p>SQL > INSERT INTO wgs84 SELECT * FROM wgs84_fdw;<br />
Unfortunately, it looks like the CONNECT Storage Engine will no longer be supported by MariaDB! And the mydumper/myloader tool also seems to be able to handle CSV files.<br />
And of course the whole thing can also be solved using applications…<br />
With PostgreSQL there are the following options:<br />
postgres=# copy wgs84 FROM \'/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv\' DELIMITER \';\' CSV HEADER<br />
then from the shell:<br />
$ psql --user=dba -c \"copy wgs84 FROM \'/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv\' DELIMITER \';\' CSV HEADER\"<br />
And there is the variant via a Foreign Data Wrapper (FDW). But I have not tried this:<br />
postgres=# CREATE EXTENSION postgres_fdw;</p>
<p>postgres=# CREATE SERVER foreign_server<br />
 FOREIGN DATA WRAPPER postgres_fdw<br />
 OPTIONS (<br />
 datasource \'CSV:/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv\',<br />
 format \'CSV\'<br />
 )<br />
;</p>
<p>postgres=# CREATE USER MAPPING FOR local_user<br />
 SERVER foreign_server<br />
 OPTIONS (user \'foreign_user\', password \'password\')<br />
;</p>
<p>postgres=# CREATE FOREIGN TABLE foreign_table (<br />
 id integer NOT NULL,<br />
 data text<br />
)<br />
 SERVER foreign_server<br />
 OPTIONS (schema_name \'some_schema\', table_name \'some_table\')<br />
;<br />
This page was translated using deepl.com.</p>
<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/load-csv-files-into-the-database/">Load CSV files into the database</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Recently, I wanted to display the places of residence of the members of my club on a map for a personal gimmick (<a href="https://www.shinguz.ch/computer/gis/igoc-mitglieder/" target="_blank" rel="noopener">IGOC members</a>). I knew the addresses of the club members. But not the coordinates of their places of residence.</p>
<p>So I went in search of the coordinates and found what I was looking for at the Federal Office of Topography (<a href="https://www.swisstopo.admin.ch/en" target="_blank" rel="noopener">swisstopo</a>).</p>
<p>The data is available there as a CSV file. Details here: <a href="https://www.shinguz.ch/computer/gis/schweizer-ortschafts-koordinaten/" target="_blank" rel="noopener">Swiss town coordinates</a>.</p>
<p>How do I load this data into a database?</p>
<h2 id="loading-the-data-with-mariadbmysql">Loading the data with MariaDB/MySQL<a class="anchor-link" id="loading-the-data-with-mariadb-mysql"></a></h2>
<p>MariaDB and MySQL have the <code>LOAD DATA INFILE</code> command:</p>
<pre><code>SQL&gt; DROP TABLE IF EXISTS wgs84;

SQL&gt; -- SET GLOBAL local_infile = ON; -- Only needed with MySQL

SQL&gt; CREATE TABLE wgs84 (
 ortschaftsname VARCHAR(32)
, plz4 SMALLINT
, zusatzziffer SMALLINT
, zip_id SMALLINT UNSIGNED
, gemeindename VARCHAR(32)
, bfs_nr SMALLINT
, kantonskuerzel CHAR(2)
, adressenanteil varchar(8)
, e DOUBLE
, n DOUBLE
, sprache VARCHAR(8)
, validity VARCHAR(12)
);

SQL&gt; -- TRUNCATE TABLE wgs84;

SQL&gt; LOAD DATA LOCAL INFILE '/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv'
INTO TABLE wgs84
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
;
Query OK, 5713 rows affected
Records: 5713 Deleted: 0 Skipped: 0 Warnings: 0
</code></pre>
<p>You can then query the data in the database:</p>
<pre><code>SQL&gt; SELECT * FROM wgs84 ORDER BY ortschaftsname LIMIT 5;
+----------------+------+--------------+--------+--------------+--------+----------------+----------------+-------------------+--------------------+---------+------------+
| ortschaftsname | plz4 | zusatzziffer | zip_id | gemeindename | bfs_nr | kantonskuerzel | adressenanteil | e | n | sprache | validity |
+----------------+------+--------------+--------+--------------+--------+----------------+----------------+-------------------+--------------------+---------+------------+
| Aadorf | 8355 | 0 | 4672 | Aadorf | 4551 | TG | 96.802 % | 8.903193007810433 | 47.491079014637265 | de | 2008-07-01 |
| Aadorf | 8355 | 0 | 4672 | Elgg | 294 | ZH | 3.198 % | 8.89206766645808 | 47.4933781685032 | de | 2008-07-01 |
| Aarau | 5000 | 0 | 2913 | Aarau | 4001 | AG | 99.713 % | 8.048148371736266 | 47.38973523857376 | de | 2008-07-01 |
| Aarau | 5000 | 0 | 2913 | Suhr | 4012 | AG | 0.287 % | 8.059410934099922 | 47.383298214804334 | de | 2008-07-01 |
| Aarau | 5004 | 0 | 2932 | Aarau | 4001 | AG | 100 % | 8.060698546432551 | 47.400587704180744 | de | 2008-07-01 |
+----------------+------+--------------+--------+--------------+--------+----------------+----------------+-------------------+--------------------+---------+------------+
5 rows in set
</code></pre>
<p>Or something more precise:</p>
<pre><code>SQL&gt; SELECT ortschaftsname AS city, plz4 AS city_code, e AS lon, n AS lat
 FROM wgs84 WHERE plz4 IN (8280, 4663, 6043);
+-------------+-----------+-------------------+--------------------+
| city | city_code | lon | lat |
+-------------+-----------+-------------------+--------------------+
| Aarburg | 4663 | 7.904271716719409 | 47.321443418782955 |
| Aarburg | 4663 | 7.889249714098425 | 47.313536073562474 |
| Aarburg | 4663 | 7.880309179095798 | 47.31255194439023 |
| Adligenswil | 6043 | 8.364849060491428 | 47.07037816052481 |
| Kreuzlingen | 8280 | 9.173740257895282 | 47.64491046067056 |
| Kreuzlingen | 8280 | 9.159171428030783 | 47.654149879509134 |
| Kreuzlingen | 8280 | 9.204470741840725 | 47.639949130372145 |
+-------------+-----------+-------------------+--------------------+
7 rows in set (0.003 sec)
</code></pre>
<p>I will leave it to the reader to clean out the duplicates&hellip; &#128578;</p>
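<p>If you do want to hunt down the duplicates, a minimal sketch could look like this (assuming a &lsquo;duplicate&rsquo; means the same <code>ortschaftsname</code>/<code>plz4</code>/<code>gemeindename</code> combination):</p>
<pre><code>SQL&gt; SELECT ortschaftsname, plz4, gemeindename, COUNT(*) AS cnt
 FROM wgs84
 GROUP BY ortschaftsname, plz4, gemeindename
 HAVING COUNT(*) &gt; 1
 ORDER BY cnt DESC;
</code></pre>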
<p>So far so good, now to the finer points:</p>
<h3 id="differences-between-mariadb-and-mysql">Differences between MariaDB and MySQL<a class="anchor-link" id="differences-between-mariadb-and-mysql"></a></h3>
<p>The procedure described above works perfectly with MariaDB 11.4 and 11.8. There are small differences with MySQL 8.4:</p>
<p>The first error message that prevents loading is this one:</p>
<pre><code>ERROR 3948 (42000): Loading local data is disabled; this must be enabled on both the client and server sides
</code></pre>
<p>It can be bypassed relatively easily with the command:</p>
<pre><code>SQL&gt; SET GLOBAL local_infile = ON;
</code></pre>
<p>The next attempt will fail as follows:</p>
<pre><code>ERROR 2068 (HY000): LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.
</code></pre>
<p>This problem can be solved by starting the MySQL client as follows:</p>
<pre><code>$ mysql --local-infile=1 --user=root test
</code></pre>
<h3 id="sources">Sources<a class="anchor-link" id="sources"></a></h3>
<ul>
<li>MariaDB: <a href="https://mariadb.com/docs/server/reference/sql-statements/data-manipulation/inserting-loading-data/load-data-into-tables-or-index/load-data-infile" target="_blank" rel="noopener">LOAD DATA INFILE</a></li>
<li>MySQL: <a href="https://dev.mysql.com/doc/refman/8.4/en/load-data.html" target="_blank" rel="noopener">LOAD DATA Statement</a></li>
</ul>
<h2 id="loading-the-data-with-postgresql">Loading the data with PostgreSQL<a class="anchor-link" id="loading-the-data-with-postgresql"></a></h2>
<p>PostgreSQL has the command <code>COPY ... FROM</code>:</p>
<pre><code>postgres=# DROP TABLE IF EXISTS wgs84;

postgres=# CREATE TABLE wgs84 (
 ortschaftsname VARCHAR(32)
, plz4 SMALLINT
, zusatzziffer SMALLINT
, zip_id INT
, gemeindename VARCHAR(32)
, bfs_nr SMALLINT
, kantonskuerzel CHAR(2)
, adressenanteil varchar(8)
, e DOUBLE PRECISION
, n DOUBLE PRECISION
, sprache VARCHAR(8)
, validity VARCHAR(12)
);

postgres=# -- TRUNCATE TABLE wgs84;

postgres=# COPY wgs84
FROM '/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv'
DELIMITER ';'
CSV HEADER
;
COPY 5713
</code></pre>
<p>Here, too, we receive the result as expected in the usual PostgreSQL form:</p>
<pre><code>postgres=# SELECT ortschaftsname AS city, plz4 AS city_code, e AS lon, n AS lat
 FROM wgs84 WHERE plz4 IN (8280, 4663, 6043);
 city | city_code | lon | lat
-------------+-----------+-------------------+--------------------
 Aarburg | 4663 | 7.904271716719409 | 47.321443418782955
 Aarburg | 4663 | 7.889249714098425 | 47.313536073562474
 Aarburg | 4663 | 7.880309179095798 | 47.31255194439023
 Adligenswil | 6043 | 8.36487538940682 | 47.07037794822416
 Kreuzlingen | 8280 | 9.173740257895282 | 47.64491046067056
 Kreuzlingen | 8280 | 9.159171428030783 | 47.654149879509134
 Kreuzlingen | 8280 | 9.204470741840725 | 47.639949130372145
(7 rows)
</code></pre>
<h3 id="sources-1">Sources<a class="anchor-link" id="sources"></a></h3>
<ul>
<li>PostgreSQL: <a href="https://www.postgresql.org/docs/current/sql-copy.html" target="_blank" rel="noopener">COPY</a></li>
</ul>
<h2 id="small-differences-between-mariadbmysql-and-postgresql">Small differences between MariaDB/MySQL and PostgreSQL<a class="anchor-link" id="small-differences-between-mariadb-mysql-and-postgresql"></a></h2>
<p>Basically, the load command is completely different in the two database worlds.</p>
<p>With MariaDB and PostgreSQL, the commands run &ldquo;out-of-the-box&rdquo;. MySQL has two additional security hurdles built in here.</p>
<p>PostgreSQL does not recognise <code>UNSIGNED</code> integer data types, so the next largest data type (<code>INT</code>) must be used, which is a little less space-saving than with MariaDB/MySQL.</p>
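<p>If the value-range restriction of <code>UNSIGNED</code> matters, it can at least be emulated in PostgreSQL with a <code>CHECK</code> constraint (a small sketch; this does not save any space, it only rejects negative values):</p>
<pre><code>postgres=# ALTER TABLE wgs84 ADD CONSTRAINT wgs84_zip_id_not_negative CHECK (zip_id &gt;= 0);
</code></pre>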
<h2 id="remarks">Remarks<a class="anchor-link" id="remarks"></a></h2>
<p>When we did the same test a few days ago, there was still a loading error. So it seems that the data source has also changed slightly&hellip;</p>
<p>I have not found out quickly whether there is an SQL standard for these load commands and if so, whether MariaDB/MySQL or PostgreSQL are standard-compliant here.</p>
<p>And of course there are other ways to get your CSV data into the database&hellip;</p>
<p>The tools <code>mariadb-import</code>/<code>mysqlimport</code> are used if you want to do this from the command line. The CSV Storage Engine can also be misused for this purpose (see <a href="https://www.fromdual.com/blog/csv-storage-engine/">here</a> for details). An officially supported variant is the MariaDB CONNECT Storage Engine with the CSV type (see <a href="https://mariadb.com/docs/server/server-usage/storage-engines/connect/connect-table-types/connect-csv-and-fmt-table-types" target="_blank" rel="noopener">here</a>):</p>
<pre><code>SQL&gt; INSTALL SONAME 'ha_connect';

SQL&gt; CREATE TABLE wgs84_fdw
ENGINE = CONNECT
table_type = CSV
file_name='/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv'
header = 1
sep_char = ';'
quoted = 0;

SQL&gt; INSERT INTO wgs84 SELECT * FROM wgs84_fdw;
</code></pre>
<p>Unfortunately, it looks like the CONNECT Storage Engine will no longer be supported by MariaDB! And the <code>mydumper</code>/<code>myloader</code> tool also seems to be able to handle CSV files.</p>
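<p>For completeness, the command-line route with <code>mariadb-import</code>/<code>mysqlimport</code> mentioned above could look roughly like this (an untested sketch; the tool derives the target table name from the file name, so the CSV file first has to be copied or renamed to <code>wgs84.csv</code>):</p>
<pre><code>$ cp /tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv /tmp/wgs84.csv
$ mariadb-import --local --ignore-lines=1 \
   --fields-terminated-by=';' --lines-terminated-by='\r\n' \
   --user=root test /tmp/wgs84.csv
</code></pre>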
<p>And of course the whole thing can also be solved using applications&hellip;</p>
<p>With PostgreSQL there are the following options:</p>
<pre><code>postgres=# copy wgs84 FROM '/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv' DELIMITER ';' CSV HEADER
</code></pre>
<p>then from the shell:</p>
<pre><code>$ psql --user=dba -c "copy wgs84 FROM '/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv' DELIMITER ';' CSV HEADER"
</code></pre>
<p>And there is the variant via a Foreign Data Wrapper (FDW), but I have not tried this:</p>
<pre><code>postgres=# CREATE EXTENSION postgres_fdw;

postgres=# CREATE SERVER foreign_server
 FOREIGN DATA WRAPPER postgres_fdw
 OPTIONS (
 datasource 'CSV:/tmp/AMTOVZ_CSV_WGS84/AMTOVZ_CSV_WGS84.csv',
 format 'CSV'
 )
;

postgres=# CREATE USER MAPPING FOR local_user
 SERVER foreign_server
 OPTIONS (user 'foreign_user', password 'password')
;

postgres=# CREATE FOREIGN TABLE foreign_table (
 id integer NOT NULL,
 data text
)
 SERVER foreign_server
 OPTIONS (schema_name 'some_schema', table_name 'some_table')
;
</code></pre>
<p>This page was translated using <a href="https://www.deepl.com/en/translator" target="_blank" rel="noopener">deepl.com</a>.</p>

<p>The post <a rel="nofollow" href="https://www.fromdual.com/blog/load-csv-files-into-the-database/">Load CSV files into the database</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>ClickHouse scaling and sharding best practices</title>
      <link>https://severalnines.com/blog/clickhouse-scaling-and-sharding-best-practices/</link>
      <pubDate>Fri, 06 Feb 2026 15:39:52 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>ClickHouse is a core engine for real-time analytics, powering critical functions like event ingestion, observability, customer analytics, and large-scale dashboards. However, scaling a ClickHouse cluster predictably while ensuring high availability, peak performance, and efficient resource utilization is challenging as workloads increase. Scaling involves complex planning beyond just adding nodes, and covers replication, sharding, storage, partitioning, […]<br />
The post ClickHouse scaling and sharding best practices appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/clickhouse-scaling-and-sharding-best-practices/">ClickHouse scaling and sharding best practices</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>ClickHouse is a core engine for real-time analytics, powering critical functions like event ingestion, observability, customer analytics, and large-scale dashboards.</p>
<p>However, scaling a ClickHouse cluster predictably while ensuring high availability, peak performance, and efficient resource utilization is challenging as workloads increase. Scaling involves complex planning beyond just adding nodes, and covers replication, sharding, storage, partitioning, and operational workflows. Without proper processes, clusters risk performance regressions, replication lag, uneven merges, and data hotspots.</p>
<p>This post offers practical, ops-focused guidance for scaling ClickHouse clusters in any environment, from bare metal to cloud instances. We will use a two-node replica topology as a starting point to explore scaling, replication, and sharding.</p>
<figure class="wp-block-image aligncenter size-full"><img decoding="async" loading="lazy" width="506" height="705" src="https://severalnines.com/wp-content/uploads/2026/01/diagram-ch_2_node_1_shard_topology.png" alt="clickhouse 2-node, 1-shard topology  diagram" class="wp-image-42611"></figure>
<p>Topology IP detail:</p>
<pre class="wp-block-code"><code>$ cat /etc/hosts

10.10.10.101 rndsvr1 # replica 1
10.10.10.102 rndsvr2 # replica 2</code></pre>
<p>The ClickHouse cluster is currently running with 1 shard and 2 replicas. We&rsquo;ll beef this up to a 2-shard setup and maintain a 1:1 shard-to-replica ratio. The new cluster topology will look like this:</p>
<figure class="wp-block-image size-large"><img decoding="async" loading="lazy" width="1024" height="706" src="https://severalnines.com/wp-content/uploads/2026/01/diagram-ch_2_node_2_shard_topology-1024x706.png" alt="clickhouse 2-node, 2-shard topology  diagram" class="wp-image-42612"></figure>
<h2 class="wp-block-heading" id="h-sharding-and-replication-in-clickhouse">Sharding and replication in ClickHouse<a class="anchor-link" id="sharding-and-replication-in-clickhouse"></a></h2>
<p>Before we continue, let&rsquo;s figure out what shards and replicas are, and how distributed tables work in ClickHouse. Understanding these core concepts is crucial for designing a performant, fault-tolerant, and scalable analytical infrastructure.</p>
<p>In ClickHouse, <strong>sharding</strong> is essentially slicing or partitioning the overall dataset into smaller pieces. Instead of storing the entire dataset on a single server (vertical scaling), sharding distributes different subsets of the data across multiple, independent servers (horizontal scaling). <strong>Replication</strong>, meanwhile, is the process of creating and maintaining identical copies of data across multiple servers. It directly addresses the challenges of fault tolerance and high availability.</p>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<tbody>
<tr>
<td></td>
<td><strong>Sharding</strong></td>
<td><strong>Replication</strong></td>
</tr>
<tr>
<td>Purpose&nbsp;</td>
<td>To handle datasets that are too large to fit on a single server, to improve write performance by distributing the load, and to increase query throughput by allowing parallel execution.</td>
<td>To ensure that the system remains operational even if one server fails (fault tolerance), to maintain continuous data access (high availability), and to improve read performance by distributing read queries across multiple copies.</td>
</tr>
<tr>
<td>Mechanism</td>
<td>Data is partitioned based on a chosen shard key (e.g., customer ID, geographical region, or a hash of a primary key). This key determines which specific subset of the total data resides on which server.</td>
<td>An entire shard&rsquo;s data is copied to one or more other servers, creating a set of replica servers for that specific shard. Changes made to the primary copy are asynchronously or synchronously propagated to all replica copies.</td>
</tr>
<tr>
<td>Outcome</td>
<td>The entire dataset is logically divided into different, smaller subsets, and each server (shard) holds a unique portion of the total data.</td>
<td>Each server designated as a replica holds an exact, redundant copy of the data present on its corresponding primary shard.</td>
</tr>
</tbody>
</table>
</figure>
<p>ClickHouse uses <strong>Distributed tables</strong> to provide a unified view over sharded and replicated data. Acting as a logical entry point, they</p>
<ul class="wp-block-list">
<li>Route INSERT operations to the correct shard,</li>
<li>Parallelize SELECT queries across replicas, and</li>
<li>Offer transparent read failover.</li>
</ul>
<p>This allows applications to interact with the cluster as if it were a single database while improving performance and enabling future scale-out. Even in single-shard deployments, Distributed tables increase resiliency and reduce query latency.</p>
<p>To understand how ClickHouse exposes a unified view of data in a sharded and replicated cluster, consider the following example. We define a <strong>local replicated table</strong> on each node and a <strong>Distributed table</strong> that provides a global entry point for queries across the cluster.</p>
<pre class="wp-block-code"><code>-- Local replicated table on each shard/replica
CREATE TABLE IF NOT EXISTS test_log.log_local ON CLUSTER my_cluster
(
    date Date,
    town LowCardinality(String),
    street LowCardinality(String),
    price UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/log_local', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (town, street, date);

-- Distributed table providing global view
CREATE TABLE IF NOT EXISTS test_log.log_distributed 
  ON CLUSTER my_cluster
  AS test_log.log_local
ENGINE = Distributed('my_cluster', 'test_log', 'log_local', rand());
</code></pre>
<h3 class="wp-block-heading" id="h-logical-relationship-between-local-and-distributed-tables">Logical Relationship Between Local and Distributed Tables<a class="anchor-link" id="logical-relationship-between-local-and-distributed-tables"></a></h3>
<p>In this example, <strong>log_local</strong> represents the actual physical storage table on each node. This table uses the <code>ReplicatedMergeTree</code> engine, meaning the data is replicated across the replicas of the same shard through ClickHouse Keeper or ZooKeeper. Every node stores its own part of the dataset, and replication ensures consistency among replicas inside the same shard.</p>
<p>The <strong>log_distributed</strong> table, on the other hand, is a <strong>logical routing layer</strong>. It does not store any data by itself. Instead, it forwards queries to all <code>log_local</code> tables across the cluster and aggregates the results before returning them to the client. In other words, the Distributed table acts as a cluster-wide &ldquo;global view,&rdquo; while the local tables remain the actual storage layer.</p>
<p>You can think of it like this:</p>
<ul class="wp-block-list">
<li><strong>Local perspective:</strong> <code>log_local</code> &rarr; stores data physically</li>
<li><strong>Cluster perspective:</strong> <code>log_distributed</code> &rarr; routes and merges queries across the cluster</li>
</ul>
<p>Both tables represent the same dataset but at different operational layers.</p>
<h4 class="wp-block-heading" id="h-where-reads-and-writes-should-go">Where Reads and Writes Should Go</h4>
<p>A common pattern in ClickHouse clusters is:</p>
<h4 class="wp-block-heading" id="h-writes-insert">Writes (INSERT)</h4>
<p>Applications typically insert data into the <strong>Distributed table</strong>:</p>
<pre class="wp-block-code"><code>INSERT INTO test_log.log_distributed (...) VALUES (...);
</code></pre>
<p>The Distributed engine decides which shard should receive the row based on the sharding key and then forwards it to the appropriate <code>log_local</code> table. From there, ReplicatedMergeTree handles replication across replicas of that shard.</p>
<h4 class="wp-block-heading" id="h-reads-select">Reads (SELECT)</h4>
<p>For cluster-wide analytics or any query requiring a global dataset, the query is executed against the Distributed table:</p>
<pre class="wp-block-code"><code>SELECT town, sum(price) FROM test_log.log_distributed GROUP BY town;
</code></pre>
<p>If you want to debug a specific node or inspect local data, you can query <code>log_local</code> directly (see the short example after the summary below).</p>
<p>In simple terms:</p>
<ul class="wp-block-list">
<li><strong>log_distributed &rarr;</strong> cluster entry point</li>
<li><strong>log_local &rarr;</strong> actual storage on each node</li>
</ul>
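<p>For example, a quick local-only check on the node you are connected to (a minimal debugging sketch):</p>
<pre class="wp-block-code"><code>-- Runs only against the local storage of the current node
SELECT count(), min(date), max(date) FROM test_log.log_local;
</code></pre>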
<h4 class="wp-block-heading" id="h-understanding-the-shard-and-replica-macros">Understanding the <code>{shard}</code> and <code>{replica}</code> Macros</h4>
<p>In this example, the local table uses:</p>
<pre class="wp-block-code"><code>ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/log_local', '{replica}')</code></pre>
<p>This pattern is essential in multi-shard clusters because:</p>
<ul class="wp-block-list">
<li><code>{shard}</code> separates the data directory for each shard</li>
<li><code>{replica}</code> identifies the replica inside that shard</li>
</ul>
<p>If you define a table without <code>{shard}</code>, such as:</p>
<pre class="wp-block-code"><code>ENGINE = ReplicatedMergeTree('/clickhouse/tables/log_local', '{replica}')</code></pre>
<p>It will still work but only if you have <strong>one shard</strong>. As soon as you expand the cluster and introduce additional shards, this layout will cause conflicts because all shards would try to use the same storage path.</p>
<p>N.B. Using <code>{shard}</code> is therefore considered best practice for scalable clusters.</p>
<h4 class="wp-block-heading" id="h-how-on-cluster-works-in-the-ddl">How <code>ON CLUSTER</code> Works in the DDL</h4>
<p>In the second DDL, we used:</p>
<pre class="wp-block-code"><code>CREATE TABLE test_log.log_distributed ON CLUSTER my_cluster AS test_log.log_local ...</code></pre>
<p>This means the statement is executed <strong>on every node in <code>my_cluster</code></strong>, and on each node ClickHouse will:</p>
<ol class="wp-block-list">
<li>Look for the existing <code>test_log.log_local</code> table</li>
<li>Create a <code>log_distributed</code> table that references the local table</li>
</ol>
<p>For a clean cluster wide deployment, the recommended pattern is:</p>
<ol class="wp-block-list">
<li>Create the <strong>local replicated table</strong> on all nodes:</li>
</ol>
<pre class="wp-block-code"><code>CREATE TABLE test_log.log_local ... ON CLUSTER my_cluster;</code></pre>
<ol start="2" class="wp-block-list">
<li>Then create the <strong>distributed table</strong>:</li>
</ol>
<pre class="wp-block-code"><code>CREATE TABLE test_log.log_distributed ON CLUSTER my_cluster AS test_log.log_local ENGINE = Distributed(...);</code></pre>
<p>This ensures consistent table structure and metadata across all nodes in the cluster.</p>
<h3 class="wp-block-heading" id="h-scaling-out-adding-a-new-shard-replicas-to-my-cluster">Scaling Out: Adding a New Shard + Replicas to <code>my_cluster</code><a class="anchor-link" id="scaling-out-adding-a-new-shard-replicas-to-my_cluster"></a></h3>
<p>Scaling out by adding a shard in ClickHouse is more complex than adding a replica. This is because a shard represents a horizontal data partition, meaning <strong>data locality changes</strong> when a new shard is introduced. Consequently, scaling at the shard level requires modifications to both the <strong>cluster configuration</strong> and the <strong>table layout</strong>. This post will detail the traditional steps for adding or scaling out shards and replicas.</p>
<p>Below are the required steps.</p>
<ol class="wp-block-list">
<li>Prepare <code>/etc/hosts</code> on ALL nodes to avoid cluster issues caused by DNS mismatches. Add the new nodes rndsvr3 and rndsvr4 to the <code>/etc/hosts</code> file on every node.</li>
</ol>
<pre class="wp-block-code"><code>$ cat /etc/hosts

10.10.10.101	rndsvr1
10.10.10.102	rndsvr2
10.10.10.103	rndsvr3
10.10.10.104	rndsvr4</code></pre>
<ol start="2" class="wp-block-list">
<li>Install ClickHouse on the new nodes (rndsvr3 &amp; rndsvr4) and make sure each new node:
<ol class="wp-block-list">
<li>Has the required ports open:
<ul class="wp-block-list">
<li>9000 : ClickHouse &harr; ClickHouse; clients &harr; ClickHouse</li>
<li>9009 : ClickHouse &harr; ClickHouse</li>
<li>9181 : ClickHouse &harr; Keeper</li>
<li>9234 : Keeper &harr; Keeper</li>
</ul>
</li>
<li>Uses the same ClickHouse version as the existing cluster,</li>
<li>Has proper time synchronization,</li>
<li>Runs on the same filesystem, and</li>
<li>Has the correct user permissions (<code>clickhouse:clickhouse</code>).</li>
</ol>
</li>
<li>Set the shard &amp; replica macros on each node: you <strong>must</strong> define the <code>shard</code> and <code>replica</code> id macros per node in <code>/etc/clickhouse-server/config.d/macros.xml</code>.</li>
</ol>
<pre class="wp-block-code"><code>-- rndsvr3
&lt;clickhouse&gt;
    &lt;macros&gt;
        &lt;shard&gt;2&lt;/shard&gt;
        &lt;replica&gt;replica1&lt;/replica&gt;
    &lt;/macros&gt;
&lt;/clickhouse&gt;

&lt;!-- rndsvr4 --&gt;
&lt;clickhouse&gt;
    &lt;macros&gt;
        &lt;shard&gt;2&lt;/shard&gt;
        &lt;replica&gt;replica2&lt;/replica&gt;
    &lt;/macros&gt;
&lt;/clickhouse&gt;</code></pre>
<ol start="4" class="wp-block-list">
<li>Extend the <code>my_cluster</code> definition (<code>remote_servers</code>) on <strong>all nodes</strong>: update your <code>remote_servers</code> config (e.g. <code>config.d/clusters.xml</code>) to include the <strong>new shard</strong>.</li>
</ol>
<pre class="wp-block-code"><code>&lt;clickhouse&gt;
    &lt;remote_servers&gt;
        &lt;my_cluster&gt;
            &lt;shard&gt;
                &lt;internal_replication&gt;true&lt;/internal_replication&gt;
                &lt;replica&gt;
                    &lt;host&gt;rndsvr1&lt;/host&gt;
                    &lt;port&gt;9000&lt;/port&gt;
                &lt;/replica&gt;
                &lt;replica&gt;
                    &lt;host&gt;rndsvr2&lt;/host&gt;
                    &lt;port&gt;9000&lt;/port&gt;
                &lt;/replica&gt;
            &lt;/shard&gt;

            &lt;shard&gt;
                &lt;internal_replication&gt;true&lt;/internal_replication&gt;
                &lt;replica&gt;
                    &lt;host&gt;rndsvr3&lt;/host&gt;
                    &lt;port&gt;9000&lt;/port&gt;
                &lt;/replica&gt;
                &lt;replica&gt;
                    &lt;host&gt;rndsvr4&lt;/host&gt;
                    &lt;port&gt;9000&lt;/port&gt;
                &lt;/replica&gt;
            &lt;/shard&gt;

        &lt;/my_cluster&gt;
    &lt;/remote_servers&gt;
&lt;/clickhouse&gt;</code></pre>
<ol start="5" class="wp-block-list">
<li>Extend the ClickHouse Keeper configuration on each node so that the ensemble has a quorum (at least 3 Keeper servers), since the previous setup only had 2 replicas.</li>
</ol>
<pre class="wp-block-code"><code>&lt;keeper_server&gt;
            &lt;tcp_port&gt;9181&lt;/tcp_port&gt;
            &lt;server_id&gt;101&lt;/server_id&gt;
            &lt;log_storage_path&gt;/var/lib/clickhouse/coordination/logs&lt;/log_storage_path&gt;
            &lt;snapshot_storage_path&gt;/var/lib/clickhouse/coordination/snapshots&lt;/snapshot_storage_path&gt;

            &lt;coordination_settings&gt;
                &lt;operation_timeout_ms&gt;10000&lt;/operation_timeout_ms&gt;
                &lt;min_session_timeout_ms&gt;10000&lt;/min_session_timeout_ms&gt;
                &lt;session_timeout_ms&gt;100000&lt;/session_timeout_ms&gt;
                &lt;raft_logs_level&gt;information&lt;/raft_logs_level&gt;
                &lt;compress_logs&gt;false&lt;/compress_logs&gt;
            &lt;/coordination_settings&gt;
            &lt;hostname_checks_enabled&gt;true&lt;/hostname_checks_enabled&gt;
            &lt;raft_configuration&gt;
                &lt;server&gt;
                    &lt;id&gt;101&lt;/id&gt;
                    &lt;hostname&gt;rndsvr1&lt;/hostname&gt;
                    &lt;port&gt;9234&lt;/port&gt;
                &lt;/server&gt;
                &lt;server&gt;
                    &lt;id&gt;102&lt;/id&gt;
                    &lt;hostname&gt;rndsvr2&lt;/hostname&gt;
                    &lt;port&gt;9234&lt;/port&gt;
                &lt;/server&gt;
                &lt;server&gt;
                    &lt;id&gt;103&lt;/id&gt;
                    &lt;hostname&gt;rndsvr3&lt;/hostname&gt;
                    &lt;port&gt;9234&lt;/port&gt;
                &lt;/server&gt;
                &lt;server&gt;
                    &lt;id&gt;104&lt;/id&gt;
                    &lt;hostname&gt;rndsvr4&lt;/hostname&gt;
                    &lt;port&gt;9234&lt;/port&gt;
                &lt;/server&gt;

            &lt;/raft_configuration&gt;
    &lt;/keeper_server&gt;</code></pre>
<ol start="6" class="wp-block-list">
<li>Recreate the local replicated table and the Distributed table. You can display the creation script of each table (replicated and distributed) with <code>SHOW CREATE TABLE table_name</code> and then recreate them using the same script; don&rsquo;t forget to add <code>IF NOT EXISTS</code> when recreating (see the sketch after the validation queries below).</li>
<li>Validate the new shard and replicas</li>
</ol>
<pre class="wp-block-code"><code>-- verify replicated tables and distributed tables
SELECT hostName(), name, engine
  FROM system.tables
WHERE database = 'test_log'
  AND name IN ('log_local','log_distributed')
ORDER BY hostName(), name;

-- verify row counts on every shard.
SELECT hostName(), count(*)
  FROM test_log.log_local
GROUP BY hostName()
ORDER BY hostName();</code></pre>
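<p>As a minimal sketch of step 6, the original DDL from above can simply be re-applied cluster-wide with <code>IF NOT EXISTS</code>:</p>
<pre class="wp-block-code"><code>-- Show the original definition on an existing node
SHOW CREATE TABLE test_log.log_local;

-- Re-run the same DDL cluster-wide, adding IF NOT EXISTS
CREATE TABLE IF NOT EXISTS test_log.log_local ON CLUSTER my_cluster
(
    date Date,
    town LowCardinality(String),
    street LowCardinality(String),
    price UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/log_local', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (town, street, date);

CREATE TABLE IF NOT EXISTS test_log.log_distributed ON CLUSTER my_cluster
  AS test_log.log_local
ENGINE = Distributed('my_cluster', 'test_log', 'log_local', rand());
</code></pre>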
<ol start="8" class="wp-block-list">
<li>One important note: data rebalancing</li>
</ol>
<p>Adding a new shard <strong>does not automatically move old data</strong>.</p>
<ul class="wp-block-list">
<li>All <strong>new</strong> INSERTs will be distributed according to your sharding key.</li>
<li>Existing data remains on shard 1 unless you manually rebalance it, e.g. with <code>INSERT &hellip; SELECT</code> through the Distributed table into a new set of tables, or by using shard-migration tools (discussed later).</li>
</ul>
<h1 class="wp-block-heading" id="h-operational-nuances-for-self-managed-clickhouse">Operational nuances for self-managed ClickHouse<a class="anchor-link" id="operational-nuances-for-self-managed-clickhouse"></a></h1>
<p>Scaling ClickHouse across hybrid environments mixing on-premise servers with cloud instances introduces operational nuances that engineers must evaluate before expanding the cluster. The following considerations help ensure stable replication, predictable performance, and consistent behavior across diverse infrastructure types; let&rsquo;s start with node management:</p>
<ol class="wp-block-list">
<li><strong>Cluster membership</strong>
<ul class="wp-block-list">
<li><strong>Traditionally</strong>, you list every shard/replica in <code>remote_servers</code>:</li>
</ul>
</li>
</ol>
<pre class="wp-block-code"><code>&lt;remote_servers&gt;
    &lt;cluster_name&gt;
        &lt;shard&gt;
            &lt;replica&gt;
                &lt;host&gt;node1&lt;/host&gt;
                &lt;port&gt;9000&lt;/port&gt;
            &lt;/replica&gt;
            ...
        &lt;/shard&gt;
    &lt;/cluster_name&gt;
&lt;/remote_servers&gt;</code></pre>
<ul class="wp-block-list">
<li><strong>Cluster discovery config</strong></li>
</ul>
<p>With Cluster Discovery, nodes auto-register via Keeper / ZooKeeper under a path; adding or removing a node is done by starting/stopping a ClickHouse server that registers under that path. The cluster view updates without config changes or restarts.</p>
<ol start="2" class="wp-block-list">
<li><strong>Data vs cluster config</strong>
<ul class="wp-block-list">
<li>Cluster Discovery only changes the cluster topology (what nodes are visible in <code>system.clusters</code>), not where data actually lives.</li>
<li>If you add a 4th node to a 3&#8209;node cluster, existing replicated tables are still only on the original 3 nodes unless you explicitly create/attach them on the new node (or use a Replicated database engine to manage this).&nbsp;</li>
</ul>
</li>
<li><strong>Replication behavior</strong>
<ul class="wp-block-list">
<li>Replicated tables use Keeper to coordinate inserts and merges; new nodes can catch up by copying state from an existing node or replaying logs, with configurable trade&#8209;offs between CPU and network.</li>
</ul>
</li>
<li><strong>Separate storage &amp; compute</strong></li>
</ol>
<p>You can approximate ClickHouse Cloud&rsquo;s model by using S3 / GCS backed MergeTree: data in object storage, only metadata on nodes. Then adding / removing compute nodes is mostly about metadata replication, making scaling easier.</p>
<h3 class="wp-block-heading" id="h-high-availability-and-failover-nbsp">High availability and failover&nbsp;<a class="anchor-link" id="high-availability-and-failover"></a></h3>
<p>While sharding is meant for scale and load distribution, ClickHouse leverages replication for high availability (HA):</p>
<ul class="wp-block-list">
<li><strong>Sharding</strong> for scale and load distribution.</li>
<li><strong>Replication</strong> for availability and data integrity. Each shard can have multiple replicas, implemented via <code>ReplicatedMergeTree</code> engines coordinated by ClickHouse Keeper (Raft).</li>
</ul>
<h3 class="wp-block-heading" id="h-replica-health-amp-automated-failover-patterns">Replica health &amp; automated failover patterns<a class="anchor-link" id="replica-health-automated-failover-patterns"></a></h3>
<p>Replication characteristics:</p>
<ul class="wp-block-list">
<li>Asynchronous multi&#8209;master: writes can go to any replica; other replicas pull data in the background.&nbsp;</li>
<li>Eventual consistency by default; stronger guarantees (quorum / sequential consistency) are possible but come at a performance cost.</li>
</ul>
<p>Health and failover mechanisms:</p>
<ul class="wp-block-list">
<li>Replica health endpoint:
<ul class="wp-block-list">
<li>Configure <code>max_replica_delay_for_distributed_queries</code>.</li>
<li>Use HTTP <code>/replicas_status</code>: returns <code>200 OK</code> if replica is available and not delayed; returns <code>503</code> with delay info if it is lagging.</li>
<li>Automatic recovery: replicas track table state via parts and checksums; if local state diverges, a replica downloads missing/broken parts from others and quarantines unexpected data.</li>
</ul>
</li>
<li>Leader election for merges: multiple replicas can be leaders; leaders schedule merges, ensuring byte&#8209;identical parts across replicas.</li>
</ul>
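<p>Replica lag can also be inspected from SQL; a minimal sketch using the <code>system.replicas</code> table:</p>
<pre class="wp-block-code"><code>-- Per-table replication state on the node you are connected to
SELECT database, table, is_readonly, queue_size, absolute_delay
  FROM system.replicas
 WHERE database = 'test_log';
</code></pre>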
<h3 class="wp-block-heading" id="h-multi-region-cloud-region-considerations">Multi&#8209;region / cloud region considerations<a class="anchor-link" id="multi%e2%80%91region-cloud-region-considerations"></a></h3>
<p>Multi&#8209;region replication is supported, <strong>but</strong>:</p>
<ul class="wp-block-list">
<li>Recommended to keep inter&#8209;region latency in the <strong>two&#8209;digit millisecond range</strong>; otherwise write performance suffers due to consensus overhead.</li>
<li>Replication between US coasts is likely fine; US &ndash; Europe is discouraged for synchronous&#8209;style patterns.</li>
<li>Configuration is the same as single&#8209;region deployments; you just place replicas in different locations.</li>
</ul>
<h3 class="wp-block-heading" id="h-support-team-monitoring-and-operations">Support Team Monitoring and Operations<a class="anchor-link" id="support-team-monitoring-and-operations"></a></h3>
<h4 class="wp-block-heading" id="h-monitoring-health-and-availability"><strong>Monitoring health and availability</strong></h4>
<ul class="wp-block-list">
<li>Built&#8209;in observability dashboard at <code>$HOST:$PORT/dashboard</code>:
<ul class="wp-block-list">
<li>Shows QPS, CPU, merges, IO, memory, parts, etc.</li>
</ul>
</li>
<li>System tables:
<ul class="wp-block-list">
<li><code>system.metrics</code>, <code>system.events</code>, <code>system.asynchronous_metrics</code>, and <code>system.asynchronous_metric_log</code> for resource and server metrics.&nbsp;</li>
</ul>
</li>
<li>External monitoring integrations:
<ul class="wp-block-list">
<li>Export metrics to Graphite or Prometheus via server config.</li>
</ul>
</li>
<li>Liveness checks:
<ul class="wp-block-list">
<li><code>/ping</code> HTTP endpoint returns <code>200 OK</code> if server is up.&nbsp;</li>
</ul>
</li>
<li>Replica health:
<ul class="wp-block-list">
<li><code>/replicas_status</code> + <code>max_replica_delay_for_distributed_queries</code> to detect lagging replicas and drive routing/failover decisions.</li>
</ul>
</li>
</ul>
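<p>A small, hedged example of pulling a few of the indicators mentioned above from the system tables (metric names can vary slightly between ClickHouse versions):</p>
<pre class="wp-block-code"><code>-- Currently running merges, replica fetches and throttled inserts
SELECT metric, value FROM system.metrics
 WHERE metric IN ('Merge', 'ReplicatedFetch', 'DelayedInserts');

-- Worst replica delay observed across replicated tables on this node
SELECT metric, value FROM system.asynchronous_metrics
 WHERE metric = 'ReplicasMaxAbsoluteDelay';
</code></pre>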
<p><strong>Operationally, support teams need to:</strong></p>
<ul class="wp-block-list">
<li>Watch replica lag and availability to avoid routing queries to stale or unhealthy replicas.</li>
<li>Monitor merges, parts count, and resource utilization to prevent degraded performance that can look like availability issues.</li>
<li>Manage Keeper (or ZooKeeper) health, since replication and cluster coordination depend on it.</li>
</ul>
<h3 class="wp-block-heading" id="h-performance-optimisation-when-scaling">Performance optimisation when scaling<a class="anchor-link" id="performance-optimisation-when-scaling"></a></h3>
<p>As ClickHouse clusters expand with more shards, replicas or data, their performance characteristics change. While scaling enhances parallelism and capacity, it also makes the system more susceptible to issues related to data distribution, merge efficiency, storage latency, and background workload. Achieving optimal performance in a scaled-out ClickHouse environment necessitates identifying where bottlenecks occur and proactively mitigating problems like hotspots, replication lag, or merge backlogs that can degrade cluster performance.</p>
<p>The following sections highlight key operational practices for maintaining performance in modern ClickHouse versions.</p>
<h4 class="wp-block-heading" id="h-avoiding-hotspots-and-balancing-shards">Avoiding Hotspots and Balancing Shards</h4>
<p>Hotspots occur when a sharding key causes a disproportionate amount of data or query load to be directed to a single shard. This leads to uneven resource utilization, making the overloaded shard a bottleneck for the entire cluster.</p>
<h5 class="wp-block-heading" id="h-strategies-for-avoiding-hotspots">Strategies for Avoiding Hotspots</h5>
<ol class="wp-block-list">
<li><strong>Choose an Effective Sharding Key:</strong> The quality of the sharding key is the single most important factor for balancing load.
<ul class="wp-block-list">
<li><strong>High Cardinality:</strong> The key should have a large number of unique values (e.g., UUID, User ID, Session ID). Keys with low cardinality (e.g., date, city, event type) will group too many records onto a single shard.</li>
<li><strong>Even Distribution:</strong> The distribution of the key&rsquo;s values should be uniform. If a few keys account for most of the data (e.g., a few super-users in a user ID key), the cluster will still be unbalanced.</li>
<li><strong>Common Techniques:</strong>
<ul class="wp-block-list">
<li><strong><code>rand()</code>:</strong> Using <code>rand()</code> as the sharding key provides perfect, uniform distribution but severely limits shard-local queries, as related data might be spread across all shards.</li>
<li><strong><code>sipHash64(key)</code>:</strong> Hashing a high-cardinality key (like a user ID) is often the best compromise, providing even distribution while ensuring all data for a specific entity resides on one shard.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Evaluate and Adjust the Sharding Expression:</strong> When defining the Distributed table, the sharding key is specified in the <code>rand()</code> position of the engine definition.</li>
</ol>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<tbody>
<tr>
<td><strong>Current Sharding Expression (Uniform but Query-Inefficient)</strong></td>
<td><code>ENGINE = Distributed('my_cluster', 'test_log', 'log_local', rand());</code></td>
</tr>
<tr>
<td><strong>Improved Sharding Expression (Entity-local &amp; Distributed)</strong></td>
<td><code>ENGINE = Distributed('my_cluster', 'test_log', 'log_local', sipHash64(user_id));</code></td>
</tr>
</tbody>
</table>
</figure>
<ol start="3" class="wp-block-list">
<li><strong>Monitor Shard Balance:</strong> Regularly check the data distribution and query load across shards using system tables.</li>
</ol>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<thead>
<tr>
<th>Metric</th>
<th>Query</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Data Size Balance</strong></td>
<td><code>SELECT hostName(), sum(bytes_on_disk) AS total_disk_usage, formatReadableSize(total_disk_usage) FROM system.parts GROUP BY hostName() ORDER BY total_disk_usage DESC;</code></td>
</tr>
<tr>
<td><strong>Row Count Balance</strong></td>
<td><code>SELECT hostName(), count() FROM test_log.log_local GROUP BY hostName();</code></td>
</tr>
<tr>
<td><strong>Query Load Balance</strong></td>
<td><code>SELECT hostName(), count() FROM system.query_log WHERE query_start_time &gt;= now() - INTERVAL 1 HOUR GROUP BY hostName();</code></td>
</tr>
</tbody>
</table>
</figure>
<h5 class="wp-block-heading" id="h-post-deployment-data-rebalancing">Post-Deployment Data Rebalancing</h5>
<p>As noted previously, adding a new shard does not automatically rebalance existing data. If monitoring shows significant imbalance after scaling out, you must manually rebalance the historical data.</p>
<ul class="wp-block-list">
<li><strong>INSERT INTO SELECT (The Traditional Method):</strong> The most common method is to perform a cluster-wide re-ingestion.</li>
</ul>
<ul class="wp-block-list">
<li>Create a new set of distributed and local tables (e.g., <code>log_local_v2</code>, <code>log_distributed_v2</code>) using the optimal sharding key/expression.</li>
<li>Execute an <code>INSERT INTO SELECT</code> statement from the old distributed table into the new one. Since the new Distributed table is aware of all shards (including the new one), the data will be re-sharded and inserted according to the new logic.</li>
</ul>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<tbody>
<tr>
<td><strong>Rebalancing Command</strong></td>
<td><code>INSERT INTO test_log.log_distributed_v2 SELECT * FROM test_log.log_distributed;</code></td>
</tr>
<tr>
<td><strong>Considerations</strong></td>
<td>This is resource-intensive and requires significant free storage space and downtime or maintenance windows.</td>
</tr>
</tbody>
</table>
</figure>
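<p>Putting this together, here is a minimal sketch of the rebalancing flow for the example table used in this post. Note that <code>sipHash64(town, street)</code> is only an illustrative sharding expression for this schema; the <code>user_id</code> key shown earlier assumes a table that actually has such a column.</p>
<pre class="wp-block-code"><code>-- 1. New local replicated table (same columns) on every node
CREATE TABLE IF NOT EXISTS test_log.log_local_v2 ON CLUSTER my_cluster
(
    date Date,
    town LowCardinality(String),
    street LowCardinality(String),
    price UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/log_local_v2', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (town, street, date);

-- 2. New Distributed table with the improved sharding expression
CREATE TABLE IF NOT EXISTS test_log.log_distributed_v2 ON CLUSTER my_cluster
  AS test_log.log_local_v2
ENGINE = Distributed('my_cluster', 'test_log', 'log_local_v2', sipHash64(town, street));

-- 3. Re-shard the historical data through the new Distributed table
INSERT INTO test_log.log_distributed_v2 SELECT * FROM test_log.log_distributed;
</code></pre>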
<h4 class="wp-block-heading" id="h-storage-concerns-free-space-and-merge-behavior">Storage Concerns: Free Space and Merge Behavior</h4>
<p>When scaling a ClickHouse cluster, managing storage efficiently and ensuring proper merge behavior are critical operational responsibilities, particularly for the <code>MergeTree</code> family of tables (including <code>ReplicatedMergeTree</code>).</p>
<h5 class="wp-block-heading" id="h-managing-free-space-and-part-merges">Managing Free Space and Part Merges</h5>
<p>The <code>MergeTree</code> engine stores data in immutable &ldquo;parts.&rdquo; Background processes constantly merge smaller parts into larger ones to optimize read performance and reduce file count. This merge process temporarily requires significant free space.</p>
<ul class="wp-block-list">
<li><strong>Free Space Requirement:</strong> A common rule of thumb is that a node should have at least <strong>2x to 3x the size of the largest unmerged data part</strong> available as free space on the disk where ClickHouse stores data.
<ul class="wp-block-list">
<li>If insufficient space is available, merges will stall, leading to a large number of small data parts.</li>
</ul>
</li>
<li><strong>The Problem of Too Many Parts:</strong> A high number of small parts (<strong>part explosion</strong>) dramatically degrades query performance because ClickHouse must open and scan many files. It also increases metadata overhead.</li>
<li><strong>Monitoring:</strong> Use the <code>system.parts</code> table to monitor the number of parts and the size of the largest parts.</li>
</ul>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<thead>
<tr>
<th>Metric</th>
<th>Query</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Parts Count</strong></td>
<td><code>SELECT count() FROM system.parts WHERE active;</code></td>
</tr>
<tr>
<td><strong>Parts per Table</strong></td>
<td><code>SELECT database, table, count() AS parts FROM system.parts WHERE active GROUP BY database, table ORDER BY parts DESC;</code></td>
</tr>
<tr>
<td><strong>Disk Space Available</strong></td>
<td><code>SELECT name, path, formatReadableSize(free_space) FROM system.disks;</code></td>
</tr>
</tbody>
</table>
</figure>
<h5 class="wp-block-heading" id="h-optimizing-merge-strategy">Optimizing Merge Strategy</h5>
<p>ClickHouse attempts to merge parts efficiently, but high write throughput or uneven data distribution can overwhelm the merge scheduler.</p>
<ul class="wp-block-list">
<li><strong>Prioritizing Merges:</strong> You can influence the merge process using settings, although ClickHouse&rsquo;s default scheduling is usually robust.
<ul class="wp-block-list">
<li><strong><code>background_pool_size:</code></strong> Controls the number of threads available for background tasks (merges, fetches, mutations). Increasing this can speed up merges if CPU/IO is the bottleneck.</li>
</ul>
</li>
<li><strong>Mitigating Part Explosion:</strong>
<ul class="wp-block-list">
<li><strong>Tune Write Frequency:</strong> Try to batch inserts into larger chunks to reduce the number of initial small parts.</li>
<li><strong>Set <code>max_parts_in_total</code>:</strong> This table-level setting caps the total number of active parts a table may accumulate; once the limit is exceeded, ClickHouse rejects further inserts with a &ldquo;Too many parts&rdquo; error, acting as a protective mechanism against part explosion.</li>
</ul>
</li>
</ul>
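<p>A minimal sketch for inspecting and adjusting the part-related MergeTree settings mentioned above (the chosen value is only an example):</p>
<pre class="wp-block-code"><code>-- Current defaults for the part limits
SELECT name, value
  FROM system.merge_tree_settings
 WHERE name IN ('max_parts_in_total', 'parts_to_throw_insert');

-- Override a limit for one table, cluster-wide
ALTER TABLE test_log.log_local ON CLUSTER my_cluster
    MODIFY SETTING max_parts_in_total = 50000;
</code></pre>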
<h5 class="wp-block-heading" id="h-separate-storage-volume-configuration">Separate Storage (Volume Configuration)</h5>
<p>For servers with heterogeneous storage (e.g., fast SSDs for hot data, slower HDDs for cold archives), ClickHouse allows defining multiple storage volumes.</p>
<ul class="wp-block-list">
<li><strong>Configuration:</strong> You define volumes and policies in the server configuration (e.g., <code>storage_configuration</code>).</li>
<li><strong>Tiered Storage:</strong> Using policies, you can automatically move older, colder data parts from fast storage (e.g., the <code>default</code> volume) to slower, cheaper storage (e.g., a <code>cold</code> volume) based on part age. This optimizes the utilization of expensive, high-speed disks for active data while maintaining the total capacity of the cluster.</li>
</ul>
<figure class="wp-block-table">
<table class="has-fixed-layout">
<thead>
<tr>
<th>Policy Parameter</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong><code>move_factor</code></strong></td>
<td>The disk usage ratio at which data migration to the next volume in the policy will begin.</td>
</tr>
<tr>
<td><strong><code>prefer_not_to_merge</code></strong></td>
<td>If set to 1, merges will not occur on parts stored in this volume (useful for archival/cold storage).</td>
</tr>
</tbody>
</table>
</figure>
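<p>On the SQL side, tiered storage is wired up through a storage policy and, optionally, a TTL move rule. A minimal sketch, assuming a policy named <code>tiered</code> with a <code>cold</code> volume has already been defined in the server&rsquo;s <code>storage_configuration</code>:</p>
<pre class="wp-block-code"><code>CREATE TABLE test_log.log_tiered ON CLUSTER my_cluster
(
    date Date,
    town LowCardinality(String),
    street LowCardinality(String),
    price UInt32
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/log_tiered', '{replica}')
PARTITION BY toYYYYMM(date)
ORDER BY (town, street, date)
TTL date + INTERVAL 6 MONTH TO VOLUME 'cold'   -- move parts older than 6 months
SETTINGS storage_policy = 'tiered';
</code></pre>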
<h3 class="wp-block-heading" id="h-integrating-into-a-unified-ops-flow">Integrating into a unified ops flow<a class="anchor-link" id="integrating-into-a-unified-ops-flow"></a></h3>
<p>As ClickHouse clusters grow with more shards, replicas, and storage layers, the daily burden intensifies. Routine tasks like provisioning, configuration, schema sync, operational replication monitoring, and failure handling become more complex. To reduce overhead and ensure reliability, ClickHouse should be integrated into a unified operational workflow. This provides consistent management across environments (on-prem, cloud, hybrid), offers cross-database visibility, and enables automated maintenance.&nbsp;</p>
<p>A unified operations flow is essential for ensuring predictable scaling, reproducible cluster changes, and consistent operational standards.</p>
<h4 class="wp-block-heading" id="h-how-a-multi-database-operations-platform-fits-in">How a Multi-Database Operations Platform Fits In</h4>
<p>Managing distributed databases, such as ClickHouse, manually with CLI commands and ad-hoc automation can be complex. A dedicated platform provides a centralized management layer, allowing operators to execute, monitor, and audit critical lifecycle operations from a single control plane. This approach significantly reduces human error and streamlines repetitive tasks, especially as database clusters scale in size and complexity.</p>
<h5 class="wp-block-heading" id="h-key-value-propositions-of-a-management-platform"><strong>Key Value Propositions of a Management Platform:</strong></h5>
<ul class="wp-block-list">
<li><strong>Simplified Operations:</strong> Provides a consistent operational interface that reduces complexity.</li>
<li><strong>Automated Lifecycle Management:</strong>
<ul class="wp-block-list">
<li><strong>Deployment &amp; Scaling:</strong> Simplifies adding new nodes, whether expanding an existing shard or creating a new one.</li>
<li><strong>Failure Handling and Repair:</strong> Automates fixes for desynced replicas, lost parts, or corrupted tables.</li>
</ul>
</li>
<li><strong>Cluster Consistency &amp; Awareness:</strong>
<ul class="wp-block-list">
<li><strong>Topology-aware Management:</strong> Automatically understands the cluster structure, including shards, replicas, Keeper nodes, and distributed table configurations.</li>
<li><strong>Configuration Enforcement:</strong> Ensures consistency for macros, <code>remote_servers</code> definitions, storage paths, and replication settings across all nodes.</li>
</ul>
</li>
<li><strong>Comprehensive Visibility:</strong>
<ul class="wp-block-list">
<li><strong>Observability &amp; Alerting:</strong> Tracks key metrics like replication lag, merge queue size, disk usage, and background pool pressure.</li>
<li><strong>Cross-database Management:</strong> Offers a unified workflow for operators managing heterogeneous databases (e.g., MySQL, PostgreSQL, MongoDB, or ClickHouse).</li>
</ul>
</li>
</ul>
<h4 class="wp-block-heading" id="h-automating-node-addition-rebalancing-and-scaling-tasks-nbsp"><strong>Automating Node Addition, Rebalancing, and Scaling Tasks&nbsp;</strong></h4>
<p>Manually scaling ClickHouse requires a multi-step process, encompassing preparing new nodes, updating cluster configurations, creating replicas, provisioning tables, validating ClickHouse Keeper connectivity, and verifying replication health. Automating these steps is key to achieving greater consistency, minimizing risks, and speeding up deployment cycles.</p>
<h5 class="wp-block-heading" id="h-automating-node-provisioning-and-cluster-discovery">Automating Node Provisioning and Cluster Discovery</h5>
<p>Automation can be used to:</p>
<ul class="wp-block-list">
<li>Pre-configure OS dependencies and sysctl parameters</li>
<li>Install ClickHouse packages</li>
<li>Deploy macros (<code>shard</code>, <code>replica</code>)</li>
<li>Generate and distribute configuration files</li>
<li>Register new nodes into <code>remote_servers</code> definitions</li>
</ul>
<h5 class="wp-block-heading" id="h-automating-schema-propagation">Automating Schema Propagation</h5>
<p>As clusters grow, keeping schemas consistent across nodes becomes more important:</p>
<ul class="wp-block-list">
<li>Distributed tables must reflect updated cluster topology</li>
<li>Schema drift becomes more likely as cluster size grows</li>
</ul>
<p>Automation ensures:</p>
<ul class="wp-block-list">
<li><code>CREATE DATABASE &hellip; ON CLUSTER</code> and <code>CREATE TABLE &hellip; ON CLUSTER</code> commands run reliably</li>
<li>All nodes remain synchronized when introducing schema changes</li>
<li>Distributed tables update correctly after adding new shards</li>
</ul>
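<p>For example, the database itself can be created cluster-wide in a single statement (using the database and cluster names from the earlier examples):</p>
<pre class="wp-block-code"><code>CREATE DATABASE IF NOT EXISTS test_log ON CLUSTER my_cluster;
</code></pre>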
<h5 class="wp-block-heading" id="h-automating-rebalancing-operations">Automating Rebalancing Operations</h5>
<p>ClickHouse does not automatically redistribute old data after scaling out. Automation helps plan and execute controlled rebalancing:</p>
<ul class="wp-block-list">
<li>Selectively moving partitions</li>
<li>Rebuilding parts by using <code>INSERT INTO &hellip; SELECT&nbsp;</code></li>
<li>Tiering cold partitions into object storage automatically</li>
<li>Scheduling rebalancing at off-peak hours</li>
</ul>
<p>Platforms can track rebalancing progress, ensuring minimal performance impact.</p>
<h5 class="wp-block-heading" id="h-automating-scaling-and-background-maintenance"><strong>Automating Scaling and Background Maintenance</strong></h5>
<p>Tasks that are ideal candidates for automation:</p>
<ul class="wp-block-list">
<li>Auto-recovering replicas with missing or corrupted parts</li>
<li>Scaling read replicas for analytics workloads</li>
<li>Refreshing Keeper/Zookeeper configuration</li>
</ul>
<p>Automation ensures the cluster stays healthy without constant operator intervention.</p>
<h3 class="wp-block-heading" id="h-conclusion"><strong>Conclusion</strong><a class="anchor-link" id="conclusion"></a></h3>
<p>Scaling ClickHouse from a single shard to a robust, multi-shard, replicated cluster is complex but necessary for growing analytical needs.</p>
<p>The process requires several configuration changes: updating the server configuration (network, storage), deploying shard/replica macros, extending the <code>remote_servers</code> cluster definition, and expanding the ClickHouse Keeper ensemble.</p>
<p>Maintaining performance requires proactive strategies:</p>
<ul class="wp-block-list">
<li><strong>Sharding Optimization:</strong> Use a high-cardinality sharding key (e.g., <code>sipHash64(user_id)</code>) to prevent hotspots.</li>
<li><strong>Storage Management:</strong> Monitor disk space and manage merges carefully to avoid &ldquo;part explosion,&rdquo; which degrades query performance.</li>
<li><strong>High Availability:</strong> Fault tolerance in self-managed environments relies on replication (Cloud abstracts this).</li>
<li><strong>Automation:</strong> Integrate ClickHouse with an operational platform for automated scaling, configuration management, schema consistency, and data rebalancing to ensure reliability as clusters grow.</li>
</ul>
<p>Meticulous planning and robust operations ensure the ClickHouse infrastructure remains performant, predictable, and scalable despite increasing data ingestion.</p>
<p>The post <a href="https://severalnines.com/blog/clickhouse-scaling-and-sharding-best-practices/">ClickHouse scaling and sharding best practices</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/clickhouse-scaling-and-sharding-best-practices/">ClickHouse scaling and sharding best practices</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB Community Server Q1 2026 maintenance releases</title>
      <link>https://mariadb.com/resources/blog/mariadb-community-server-q1-2026-maintenance-releases/</link>
      <pubDate>Fri, 06 Feb 2026 15:06:37 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>MariaDB is pleased to announce the immediate availability of MariaDB Community Server 11.8.6, 11.4.10, 10.11.16, and 10.6.25 maintenance releases. See […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-community-server-q1-2026-maintenance-releases/">MariaDB Community Server Q1 2026 maintenance releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>MariaDB is pleased to announce the immediate availability of MariaDB Community Server 11.8.6, 11.4.10, 10.11.16, and 10.6.25 maintenance releases. See the release notes and changelogs for additional details on each release and visit mariadb.com/downloads to download.</p>
<p><a href="https://mariadb.com/resources/blog/mariadb-community-server-q1-2026-maintenance-releases/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/mariadb-community-server-q1-2026-maintenance-releases/">MariaDB Community Server Q1 2026 maintenance releases</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>And The Winners Of The Inaugural Top External Contributor To MariaDB Award are …</title>
      <link>https://mariadb.org/and-the-winner-of-the-inaugural-top-external-contributor-to-mariadb-award-is/</link>
      <pubDate>Fri, 06 Feb 2026 13:10:10 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>Announcing the winners of the first Top Contributor to MariaDB award<br />
The post And The Winners Of The Inaugural Top External Contributor To MariaDB Award are … appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/and-the-winner-of-the-inaugural-top-external-contributor-to-mariadb-award-is/">And The Winners Of The Inaugural Top External Contributor To MariaDB Award are …</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Announcing the winners of the first Top Contributor to MariaDB award</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/and-the-winner-of-the-inaugural-top-external-contributor-to-mariadb-award-is/">And The Winners Of The Inaugural Top External Contributor To MariaDB Award are &hellip;</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/and-the-winner-of-the-inaugural-top-external-contributor-to-mariadb-award-is/">And The Winners Of The Inaugural Top External Contributor To MariaDB Award are …</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available</title>
      <link>https://mariadb.org/mariadb-11-8-6-11-4-10-10-11-16-and-10-6-25-now-available/</link>
      <pubDate>Fri, 06 Feb 2026 11:20:12 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>The MariaDB Foundation is pleased to announce the availability of MariaDB 11.8.6, MariaDB 11.4.10, MariaDB 10.11.16 and MariaDB 10.6.25, the latest stable releases in their respective long-term series (receiving regular maintenance and support for three years from their first stable release dates, and critical security fixes as source code releases for two additional years beyond). …<br />
Continue reading \"MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available\"<br />
The post MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-11-8-6-11-4-10-10-11-16-and-10-6-25-now-available/">MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The MariaDB Foundation is pleased to announce the availability of <a href="https://mariadb.com/docs/release-notes/community-server/11.8/11.8.6">MariaDB 11.8.6</a>, <a href="https://mariadb.com/docs/release-notes/community-server/11.4/11.4.10">MariaDB 11.4.10</a>, <a href="https://mariadb.com/docs/release-notes/community-server/10.11/10.11.16">MariaDB 10.11.16</a> and <a href="https://mariadb.com/docs/release-notes/community-server/10.6/10.6.25">MariaDB 10.6.25</a>, the latest stable releases in their respective <a href="https://mariadb.org/about/#maintenance-policy">long-term series</a> (receiving regular maintenance and support for three years from their first stable release dates, and critical security fixes as source code releases for two additional years beyond). &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/mariadb-11-8-6-11-4-10-10-11-16-and-10-6-25-now-available/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-11-8-6-11-4-10-10-11-16-and-10-6-25-now-available/">MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/mariadb-11-8-6-11-4-10-10-11-16-and-10-6-25-now-available/">MariaDB 11.8.6, 11.4.10, 10.11.16 and 10.6.25 now available</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>New binlog implementation in MariaDB 12.3</title>
      <link>https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/</link>
      <pubDate>Fri, 06 Feb 2026 09:02:28 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>I have recently completed a large project to implement a new improved binlog format for MariaDB. The result will be available shortly in the upcoming MariaDB 12.3.1 release. …<br />
Continue reading \"New binlog implementation in MariaDB 12.3\"<br />
The post New binlog implementation in MariaDB 12.3 appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/">New binlog implementation in MariaDB 12.3</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I have recently completed a large project to implement a new improved binlog format for MariaDB. The result will be available shortly in the upcoming MariaDB 12.3.1 release. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;New binlog implementation in MariaDB 12.3&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/">New binlog implementation in MariaDB 12.3</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/new-binlog-implementation-in-mariadb-12-3/">New binlog implementation in MariaDB 12.3</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>DB Fiddle – SQL Database Playground – now has MariaDB</title>
      <link>https://mariadb.org/db-fiddle-sql-database-playground-now-has-mariadb/</link>
      <pubDate>Thu, 05 Feb 2026 22:05:37 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>When choosing a database, many times you want to play and see what it can do to see if it’s relevant. Containers are easy, but a web page is even easier. …<br />
Continue reading \"DB Fiddle – SQL Database Playground – now has MariaDB\"<br />
The post DB Fiddle – SQL Database Playground – now has MariaDB appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/db-fiddle-sql-database-playground-now-has-mariadb/">DB Fiddle – SQL Database Playground – now has MariaDB</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>When choosing a database, many times you want to play and see what it can do to see if it&rsquo;s relevant. Containers are easy, but a web page is even easier. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/db-fiddle-sql-database-playground-now-has-mariadb/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;DB Fiddle &ndash; SQL Database Playground &ndash; now has MariaDB&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/db-fiddle-sql-database-playground-now-has-mariadb/">DB Fiddle &ndash; SQL Database Playground &ndash; now has MariaDB</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

<p>The post <a rel="nofollow" href="https://mariadb.org/db-fiddle-sql-database-playground-now-has-mariadb/">DB Fiddle – SQL Database Playground – now has MariaDB</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Announcing MariaDB Cloud BYOA: Fully Managed MariaDB in Your Azure Subscription</title>
      <link>https://mariadb.com/resources/blog/announcing-mariadb-cloud-byoa-fully-managed-mariadb-in-your-azure-subscription/</link>
      <pubDate>Wed, 04 Feb 2026 16:42:53 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>The Evolution of Cloud Databases The shift to cloud databases fundamentally changed how we run production environments. By offloading the […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/announcing-mariadb-cloud-byoa-fully-managed-mariadb-in-your-azure-subscription/">Announcing MariaDB Cloud BYOA: Fully Managed MariaDB in Your Azure Subscription</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The shift to cloud databases fundamentally changed how we run production environments. By offloading the burden of hardware management, software patching, and 24&times;7 operations, organizations gained instant scale, built-in resilience, and a consumption-based cost model. For many teams, this operational simplicity was the primary catalyst for cloud migration. However, as workloads grow more&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/announcing-mariadb-cloud-byoa-fully-managed-mariadb-in-your-azure-subscription/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/announcing-mariadb-cloud-byoa-fully-managed-mariadb-in-your-azure-subscription/">Announcing MariaDB Cloud BYOA: Fully Managed MariaDB in Your Azure Subscription</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>PGDay and FOSDEM Report from Kai</title>
      <link>https://percona.community/blog/2026/02/04/pgday-and-fosdem-report-from-kai/</link>
      <pubDate>Wed, 04 Feb 2026 10:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>The following thoughts and comments are completely my personal opinion and do not reflect my employer’s thoughts or beliefs. If you don’t like anything in this post, reach out to me directly, so I can ignore it ;-).</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/04/pgday-and-fosdem-report-from-kai/">PGDay and FOSDEM Report from Kai</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>The following thoughts and comments are completely my personal opinion and do not reflect my employer&rsquo;s thoughts or beliefs. If you don&rsquo;t like anything in this post, reach out to me directly, so I can ignore it ;-).</p>
<p>I&rsquo;m currently on the train on my way back home from FOSDEM this year and man, I&rsquo;m exhausted but also happy. Why? Because the PG and FOSDEM community is just crazily awesome. While it&rsquo;s always too much of everything, it&rsquo;s at the same time inspiring to see so many enthusiastic IT nerds in one place, discussing and working on what they love &ndash; technology and engineering challenges.</p>
<h2 id="pgday-fosdem">PGDay FOSDEM<a class="anchor-link" id="pgday-fosdem"></a></h2>
<p>It all started with the usual PGDay FOSDEM the day before FOSDEM. Just in case &ndash; this event has been happening for over 15 years, and if you read this as a little nudge that you really should have known about it, that&rsquo;s exactly how it is meant. It was a great event as usual: around 150 Postgres enthusiasts collaborating with each other. There was a great set of talks (no recordings available, so yes, just join next year so you don&rsquo;t miss anything), as well as the hallway track conversations.</p>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/pgday-slonik.jpeg" alt="PGDay Kai and Slonik"></figure>
</p>
<p>I was once again accepted as a volunteer helping to make the event happen. You might wonder what&rsquo;s so special about that, but I cannot express enough gratitude for being able to help in any way. I simply love it. I&rsquo;m not a great coder and I&rsquo;ve never been one. I&rsquo;m the one who looks at his code from a year ago, questions his technical existence and overall abilities, and wonders whether he should rather do something that doesn&rsquo;t involve touching a keyboard. What I am very well capable of is helping and supporting events. So it was my pleasure, and I hope you feel inspired to do the same next year or at any future event, not only in the Postgres ecosystem but in general. I strongly believe in this: doing good things will get you good things back.</p>
<p>After the PGDay wrap-up and a great community dinner with more collaboration and discussion, I fell asleep completely exhausted, as the next day &ndash; FOSDEM &ndash; was already waiting.</p>
<h2 id="fosdem-day-1">FOSDEM Day 1<a class="anchor-link" id="fosdem-day-1"></a></h2>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/fosdem-pgbooth-volunteering.jpg" alt="PGDay Kai PG Booth Volunteering"></figure>
</p>
<p>The next day started with volunteering at the Postgres booth. As usual, Saturday was simply crazy. The Postgres swag like hoodies, caps, mugs, shirts, etc. was almost ripped out of our living hands. We had people waiting in line just to be able to get some swag. That fact alone shows how Postgres is viewed outside of the internal PG ecosystem community. How many times I heard the sentence &ldquo;Thanks a lot for the great work you do&rdquo; or &ldquo;Postgres just works.&rdquo; Yeah, we can all argue about the details and scenarios, but what this is about is the overall ease of use. Not everyone has terabytes of data or the most complex HA and replication scenarios on this planet. Some just need a functional and boring database and, in the best case, open source &ndash; and we all know, looking at real open source, not single-vendor owned, Postgres is the king and here to stay.</p>
<p>After all of this, I switched clothes and helped at the Percona booth. This wasn&rsquo;t any less interesting than the PG booth. So many people stopped by, asking what we do or thanking us for our projects and for staying open source after all these years, while so many other companies couldn&rsquo;t resist the quick and easy money of open-core or closed offerings. That&rsquo;s the reason I&rsquo;m proud to be part of this company. We have walked the talk for 20 years, and we have no incentive to ever change that. Thanks to Peter Zaitsev and Peter Farkas aka P&sup2; &ndash; for those who know, just know.</p>
<p>Following that I had the pleasure of being the Slonik guide again. What is a Slonik guide, you might ask? Slonik, the mascot of Postgres (the big blue elephant), needs some help and guidance while walking through the crowd, as you can barely see anything from inside the costume. As usual, Slonik is a celebrity. Everyone wants a picture and takes their chance to photograph Slonik in the &ldquo;wild&rdquo;. As you can see, even MySQL&rsquo;s Sakila couldn&rsquo;t resist and had to take a picture with Slonik.</p>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/fosdem-slonik.jpeg" alt="FOSDEM Sakila and Slonik"></figure>
</p>
<p>If you&rsquo;re wondering, like many others, why Slonik and why an elephant: <a href="https://learnsql.com/blog/the-history-of-slonik-the-postgresql-elephant-logo/" target="_blank" rel="noopener noreferrer">click here for a nicely written history lesson</a>.</p>
<p>After an exciting but also energy-draining day, I enjoyed a Percona crew/team dinner at BrewDog, with some great conversations and good food. <a href="https://www.reddit.com/r/Homebrewing/comments/47icau/brewdog_just_open_sourced_all_their_recipes/" target="_blank" rel="noopener noreferrer">Fun fact: did you know that BrewDog is also open source?</a> I couldn&rsquo;t stay too long &ndash; sorry about that &ndash; but I had another date. The famous Floor Drees had, true to tradition, organized another karaoke event that I couldn&rsquo;t miss. As I couldn&rsquo;t make it to the earlier editions, I definitely wanted to join this one. What can I say apart from thanks, Floor, for this great tradition. Yes, I had a hard time talking the next day, but damn, I had fun singing Swedish, Polish, German, and English songs &ndash; and yes, I most likely misunderstood all of them as usual.</p>
<p>Too many songs for my voice and maybe a &ldquo;soft drink or two&rdquo; later, I fell into my bed like a stone, and couldn&rsquo;t really accept the fact that my alarm clock went off almost five minutes later (at least that&rsquo;s how it felt to me).</p>
<h2 id="fosdem-day-2">FOSDEM Day 2<a class="anchor-link" id="fosdem-day-2"></a></h2>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/fosdem-perconabooth-volunteering.jpeg" alt="PGDay Kai PG Booth Volunteering"></figure>
</p>
<p>No whining helped; I just got up and made myself ready for Day 2 of FOSDEM, which started with another round of volunteering at the Postgres and Percona booths. Both basically matched the previous day&rsquo;s experience, apart from a noticeably quieter and less crowded floor &ndash; it seems I wasn&rsquo;t the only one singing the night before ;-).</p>
<p>With that, thanks a lot to everyone who made this great FOSDEM happen. I&rsquo;ll now find out whether the Deutsche Bahn restaurant actually works this time, as I need coffee &ndash; a big one, maybe two&hellip; See all of you next year, or at another event this year.</p>
<blockquote>
<p>Stay on top of Postgres development without the inbox overwhelm. Explore <a href="https://hackorum.dev/" target="_blank" rel="noopener noreferrer">hackorum.dev</a> today and share your feedback with us.</p>
</blockquote>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/04/pgday-and-fosdem-report-from-kai/">PGDay and FOSDEM Report from Kai</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>New binlog implementation in MariaDB 12.3</title>
      <link>https://knielsen-hq.org/w/new-binlog-implementation-in-mariadb-12-3/</link>
      <pubDate>Tue, 03 Feb 2026 19:08:37 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://knielsen-hq.org/w">Kristian Nielsen's blog</source>
      <description><![CDATA[<p>I have recently completed a large project to implement a new improved binlog format for MariaDB. The result will be available shortly in the upcoming MariaDB 12.3.1 release. In this article, I will give a short overview of the new binlog implementation. For more details, check the documentation which is in the source tree as… Continue reading New binlog implementation in MariaDB 12.3</p>
<p>The post <a rel="nofollow" href="https://knielsen-hq.org/w/new-binlog-implementation-in-mariadb-12-3/">New binlog implementation in MariaDB 12.3</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>I have recently completed a large project to implement a new improved binlog format for MariaDB. The result will be available shortly in the upcoming MariaDB 12.3.1 release.</p>
<p>In this article, I will give a short overview of the new binlog implementation. For more details, check the documentation which is in the source tree as the file <code>Docs/replication/binlog.md</code>, or here: <a href="https://github.com/MariaDB/server/blob/knielsen_binlog_in_engine/Docs/replication/binlog.md">https://github.com/MariaDB/server/blob/knielsen_binlog_in_engine/Docs/replication/binlog.md</a></p>
<h2>Using the new binlog<a class="anchor-link" id="using-the-new-binlog"></a></h2>
<p>To enable the new binlog, configure the MariaDB server with <code>binlog_storage_engine=innodb</code>.</p>
<p>Additionally, the binlog itself must be enabled as usual using the option <code>log_bin</code>. Note that no argument can be given to the <code>log_bin</code> option: with the old binlog format the argument specified the name to use for the binlog files, but the new binlog file names are fixed, so allowing an argument would only cause confusion.</p>
<pre class="wp-block-preformatted">    binlog_storage_engine=innodb
    log_bin</pre>
<p>When the new binlog is enabled and the server restarted, any old binlog files are no longer available. See the above-referenced documentation for options on how to migrate the old binlogs of an existing server.</p>
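<p>As a quick sanity check after the restart, the active settings can be inspected with standard <code>SHOW VARIABLES</code> statements (a hedged example not taken from the article, and assuming the new option is exposed as a regular system variable):</p>
<pre class="wp-block-preformatted">    -- Assumes binlog_storage_engine is visible as a system variable
    SHOW GLOBAL VARIABLES LIKE 'binlog_storage_engine';
    SHOW GLOBAL VARIABLES LIKE 'log_bin';</pre>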
<h2>Benefits of the new binlog<a class="anchor-link" id="benefits-of-the-new-binlog"></a></h2>
<p>For the user, the new binlog format brings two main benefits.</p>
<p>First, for users who are running with <code>--innodb-flush-log-at-trx-commit</code> set to 2 or 0 for performance reasons, the new binlog makes the binlog crash-safe (when used with InnoDB tables). This means that if the server crashes or the machine loses power, the restarted server will recover itself into a consistent state, including the state of replication and consistency between the binlog and the InnoDB table contents. With the old binlog format, such a crash could easily leave the binlog in a different state than the InnoDB table data, which then causes replication slaves to diverge from the master. Making the old binlog crash-safe required setting both <code>--sync-binlog=1</code> and <code>--innodb-flush-log-at-trx-commit=1</code>.</p>
<p>Second, for users who are running with <code>--innodb-flush-log-at-trx-commit</code> set to 1 because they need durable commits, the new binlog provides a large speedup of the time taken to commit. Because the new binlog is integrated with InnoDB, only half as many buffer flushes to disk are needed per commit as with the old binlog.</p>
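<p>To put the two scenarios side by side, here is a minimal configuration sketch built only from the options mentioned above (an illustration, not a recommendation from the article):</p>
<pre class="wp-block-preformatted">    # Old binlog, fully crash-safe: needs two synchronous flushes per (group) commit
    log_bin
    sync_binlog=1
    innodb_flush_log_at_trx_commit=1

    # New binlog, crash-safe through the InnoDB write-ahead log, with fewer flushes
    binlog_storage_engine=innodb
    log_bin
    innodb_flush_log_at_trx_commit=1</pre>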
<p>Thus, the primary user-visible benefit of the new binlog is a greatly improved transaction commit speed.</p>
<p>The actual speedup depends entirely on the application&rsquo;s workload and on the hardware running the database. The speedup will be greater when transactions are small, when transaction parallelism is modest, and when disk writes have higher latency (as with consumer-grade SSDs or network-attached storage). This is because the new binlog particularly reduces the number of disk writes that have to happen during commit of a batch of parallel transactions. So if there are many small individual transactions and writes are expensive, the speedup can be huge. If there are few individual transactions, if most transactions run in parallel and batch up into a single group commit, and/or if disk writes are fast, the speedup will be smaller (but can still be significant).</p>
<h2>Technical background<a class="anchor-link" id="technical-background"></a></h2>
<p>The core of a transactional system like a database &ndash; but also, for example, a file system &ndash; is its transactional log, also referred to as the write-ahead log or redo log, among other names:</p>
<p><a href="https://en.wikipedia.org/wiki/Transaction_log">https://en.wikipedia.org/wiki/Transaction_log</a></p>
<p>This log is the core of how the database achieves a high throughput of updates to data stored on its disks, while simultaneously being able to gracefully recover into a consistent state if the system crashes during operation.</p>
<p>Unfortunately, MariaDB does not have a central implementation of its transaction log. The main storage engine, InnoDB, has its own implementation, which is separate from the log used by other parts of the server; in particular, the (old) binlog is a separate &ldquo;transaction log&rdquo;, and there are other logs used by the Aria storage engine, by DDL operations, etc. Some parts do not even have any transaction log backing them, and are thus not crash-safe. Arguably, this lack of a central transaction log is currently the biggest architectural limitation of MariaDB.</p>
<p>For the binlog in particular, having it separate from the InnoDB write-ahead log causes not just a lot of code complexity, but also a huge performance cost. Because of the two separate logs, it is necessary to use a two-phase commit protocol between them. This requires two separate synchronous disk writes per (group) commit; otherwise a crash could leave the data in one log inconsistent with the other, and replication would break. The need for these two disk flushes is a <em>huge</em> overhead.</p>
<p>The new binlog implementation fixes this, by re-implementing the binlog data format inside of InnoDB. Similar to InnoDB tablespace files, the new binlog files are now being handled through the InnoDB write-ahead log. This means that when a transaction commit happens, both the table data <em>and</em> the binlog data get written through the InnoDB write-ahead log. The write of data to binlog files can happen later, asynchronously and in an efficient manner. The InnoDB write-ahead log will be used to recover both table data and binlog data into a consistent state, and the overhead of being able to do so is being re-used for the binlog part. Thus, the overhead of two-phase commit and binlog disk flushes is gone, which is a major contribution to the performance improvements of the new binlog.</p>
<p>More subtle, but at least as important, are the improvements under the hood in the code implementing the new binlog.</p>
<p>The new binlog is implemented in InnoDB through an extension of the storage engine API. This means that another storage engine could in principle implement its own version of the binlog, which would be beneficial for users that were mainly using that storage engine for their data. But perhaps more importantly, it means that there is now a well-defined API for <em>how</em> the binlog writes work and what operations are possible on it. This gives a much cleaner separation between the file format and operations used to store the binlog on disk and read it back, as opposed to the actual contents of the binlog in the form of replication events used by slaves to replicate the master&rsquo;s data.</p>
<p>And the actual file format of the new binlog is also greatly improved.</p>
<p>The old binlog is a very naive implementation: just a flat file, with each individual binlog event written as a raw sequence of bytes, one after the other. This is inefficient for the underlying file system, as each write has to update two places on disk: the actual data written to the end of the file, and the metadata recording the increase in file length. It also makes it impossible to start reading the binlog file from an arbitrary place, since the start of a new event cannot be distinguished from arbitrary data contained inside an event.</p>
<p>The new binlog uses a proper page-based file, which can be pre-allocated efficiently on the file system using e.g. <code>posix_fallocate()</code>, and written efficiently page by page to disk. The binlog data records also have proper framing within pages, so that it is possible to look at an arbitrary page in the file and understand what kind of data is there and where one record ends and the next one begins. Having a good page-based file format for the binlog is a great improvement, and something that I have desired for many years.</p>
<p>In many ways, the main benefit to me of the new binlog format is not so much the immediate performance gains, though these are already quite substantial. The really important benefits are the possibilities that are now open for future development and improvements of the binlog and replication &ndash; many things that were previously impossible to achieve due to the limitations of the old, convoluted code and design.</p>
<p>For example, with the new binlog, large transactions are now no longer constrained to be written into the binlog as a single block at commit time; they can be written in pieces spread out over the binlog files as the transaction executes. This opens the possibility for having the slaves replicate these pieces optimistically in parallel with the transaction running on the master. This has the potential to greatly reduce the replication lag caused by long-running transactions.</p>
<p>Another example is if and when InnoDB is extended with an option for log archiving, so that the InnoDB write-ahead log is not overwritten cyclically, but written as a sequence of files containing the complete redo data. Then the new binlog API could be used to implement the binlog data completely inside the InnoDB write-ahead log, so that replication could simply read the binlog data out of the archived log files, and the overhead of having separate binlog files could be eliminated completely.</p>
<p>And there are many other improvements, small and large, that will now be possible to do going forward, based on the improvements done in this project.</p>
<h2>Final words<a class="anchor-link" id="final-words"></a></h2>
<p>Thanks for reading this far! I encourage you to try out the new binlog and see how it works. Any questions or problem reports are welcome; please direct them to the developers@ or discuss@ mailing lists:</p>
<ul>
<li><a href="https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/">https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/</a></li>
<li><a href="https://lists.mariadb.org/hyperkitty/list/discuss@lists.mariadb.org/">https://lists.mariadb.org/hyperkitty/list/discuss@lists.mariadb.org/</a></li>
</ul>

<p>The post <a rel="nofollow" href="https://knielsen-hq.org/w/new-binlog-implementation-in-mariadb-12-3/">New binlog implementation in MariaDB 12.3</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Reading the Room: What Europe’s MySQL Community Is Really Saying</title>
      <link>https://mariadb.org/reading-the-room-what-europes-mysql-community-is-really-saying/</link>
      <pubDate>Tue, 03 Feb 2026 18:55:13 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.org/">MariaDB.org</source>
      <description><![CDATA[<p>FOSDEM was exciting from a MariaDB perspective for many reasons this year. For this blog, let me concentrate on one aspect: The discussions at what was called the “Summit for MySQL Community, Europe”, hosted by Percona on Monday 2 Feb 2026 at the Marriott Grand Place in central Brussels. …<br />
Continue reading "Reading the Room: What Europe’s MySQL Community Is Really Saying"<br />
The post Reading the Room: What Europe’s MySQL Community Is Really Saying appeared first on MariaDB.org.</p>
<p>The post <a rel="nofollow" href="https://mariadb.org/reading-the-room-what-europes-mysql-community-is-really-saying/">Reading the Room: What Europe’s MySQL Community Is Really Saying</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>FOSDEM was exciting from a MariaDB perspective for many reasons this year. For this blog, let me concentrate on one aspect: The discussions at what was called the &ldquo;Summit for MySQL Community, Europe&rdquo;, hosted by Percona on Monday 2 Feb 2026 at the Marriott Grand Place in central Brussels. &hellip; </p>
<p class="link-more"><a href="https://mariadb.org/reading-the-room-what-europes-mysql-community-is-really-saying/" class="more-link">Continue reading<span class="screen-reader-text"> &ldquo;Reading the Room: What Europe&rsquo;s MySQL Community Is Really Saying&rdquo;</span></a></p>
<p>The post <a rel="nofollow" href="https://mariadb.org/reading-the-room-what-europes-mysql-community-is-really-saying/">Reading the Room: What Europe&rsquo;s MySQL Community Is Really Saying</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>

]]></content:encoded>
                </item>
      <item>
      <title>Hackorum &#8211; A Forum-Style View of pg-hackers</title>
      <link>https://percona.community/blog/2026/02/02/hackorum-a-forum-style-view-of-pg-hackers/</link>
      <pubDate>Mon, 02 Feb 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB</source>
      <description><![CDATA[<p>Last year at pgconf.dev, there was a discussion about improving the user interface for the PostgreSQL hackers mailing list, which is the main communication channel for PostgreSQL core development. Based on that discussion, I want to share a small project we have been working on:</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/02/hackorum-a-forum-style-view-of-pg-hackers/">Hackorum &#8211; A Forum-Style View of pg-hackers</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>Last year at pgconf.dev, there was a discussion about improving the user interface for the PostgreSQL hackers mailing list, which is the main communication channel for PostgreSQL core development. Based on that discussion, I want to share a small project we have been working on:</p>
<p><a href="https://hackorum.dev/" target="_blank" rel="noopener noreferrer">https://hackorum.dev/</a></p>
<p>Hackorum provides a <strong>read-only (for now)</strong> web view of the mailing list with a more forum-like presentation. It is a <strong>work-in-progress proof of concept</strong>, and we are primarily looking for feedback on whether this approach is useful and what we should improve next.</p>
<h2 id="what-hackorum-already-does">What Hackorum already does<a class="anchor-link" id="what-hackorum-already-does"></a></h2>
<p>Hackorum focuses on readability, navigation, and workflow improvements for people who follow pg-hackers. Some highlights:</p>
<ul>
<li><strong>Continuous mailing list synchronization</strong>: The site is subscribed to the list</li>
<li><strong>Commitfest integration</strong>: See commitfest context next to threads &ndash; you immediately know the state of the commit/thread.</li>
<li><strong>User profiles</strong>: Contributor/committer status from the main website</li>
<li><strong>Statistics</strong>: Per-user and per-mailing-list insights</li>
<li><strong>Easy download of attached patches</strong>: Including a helper script for easy rebase and merge</li>
<li><strong>Additional logged-in user features</strong>: Per-message read status, starring threads, tags, notes, mentions on messages and threads</li>
<li><strong>Basic team support</strong>: Shared reading status, shared mentions, tags and notes &ndash; mention someone underneath an email and the person gets notified</li>
<li><strong>Resend email</strong>: Integration from the official archive</li>
<li><strong>Importing read status / tags via CSV files</strong>: To help migration from email-based workflows</li>
</ul>
<p><figure>
<img decoding="async" src="https://percona.community/blog/2026/02/hackorum-topics.png" alt="Hackorum topics overview"></figure>
</p>
<h2 id="what-we-plan-next">What we plan next<a class="anchor-link" id="what-we-plan-next"></a></h2>
<ul>
<li><strong>Sending emails from the web UI</strong>: Initially via Gmail API for Google-authenticated users who authorize sending</li>
<li><strong>Advanced search functionality</strong></li>
<li><strong>Integrating other mailing lists</strong></li>
</ul>
<h2 id="try-it-and-share-feedback">Try it and share feedback<a class="anchor-link" id="try-it-and-share-feedback"></a></h2>
<p>If you want to take a look, just go to <a href="https://hackorum.dev/" target="_blank" rel="noopener noreferrer">https://hackorum.dev/</a></p>
<p>The repository, including a simple dev setup, can be found here: <a href="https://github.com/hackorum-dev/hackorum" target="_blank" rel="noopener noreferrer">https://github.com/hackorum-dev/hackorum</a></p>
<p>Is this useful? What is missing? What would you change? Bug reports, feature requests, and contributions are all welcome. <a href="https://github.com/hackorum-dev/hackorum/issues" target="_blank" rel="noopener noreferrer">https://github.com/hackorum-dev/hackorum/issues</a></p>
<p>Thanks for taking a look, and we appreciate any feedback.</p>

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/02/hackorum-a-forum-style-view-of-pg-hackers/">Hackorum &#8211; A Forum-Style View of pg-hackers</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>Tuning MySQL for Performance: The Variables That Actually Matter</title>
      <link>https://percona.community/blog/2026/02/01/tuning-mysql-for-performance-the-variables-that-actually-matter/</link>
      <pubDate>Sun, 01 Feb 2026 00:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://percona.community/blog/">Percona Community Blog - learn about MySQL, MariaDB, PostgreSQL, and MongoDB on Percona Community</source>
      <description><![CDATA[<p>There is a special kind of boredom that only database people know. The kind where you stare at a server humming along and think, surely there is something here I can tune. Good news: there is.<br />
This post walks through the most important MySQL variables to tune for performance, why they matter, and when touching them helps versus when it quietly makes things worse. This is written with InnoDB-first workloads in mind, because let’s be honest, that’s almost everyone.</p>
<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/01/tuning-mysql-for-performance-the-variables-that-actually-matter/">Tuning MySQL for Performance: The Variables That Actually Matter</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>There is a special kind of boredom that only database people know. The kind where you stare at a server humming along and think, <em>surely there is something here I can tune</em>. Good news: there is.</p>
<p>This post walks through the <strong>most important MySQL variables to tune for performance</strong>, why they matter, and when touching them helps versus when it quietly makes things worse. This is written with <strong>InnoDB-first workloads</strong> in mind, because let&rsquo;s be honest, that&rsquo;s almost everyone.</p>
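<p>The excerpt stops before the actual walkthrough of variables, but as a purely illustrative sketch (not the post&rsquo;s list), the current values of commonly discussed InnoDB-related settings can be inspected on a running server before changing anything:</p>
<pre class="wp-block-code"><code>-- Illustrative only: check current values before tuning anything
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW GLOBAL VARIABLES LIKE 'max_connections';</code></pre>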

<p>The post <a rel="nofollow" href="https://percona.community/blog/2026/02/01/tuning-mysql-for-performance-the-variables-that-actually-matter/">Tuning MySQL for Performance: The Variables That Actually Matter</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>PostgreSQL Multi-Master Operational and Consistency Issues in Brief</title>
      <link>https://severalnines.com/blog/postgresql-multi-master-operational-and-consistency-issues-in-brief/</link>
      <pubDate>Fri, 30 Jan 2026 08:00:00 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://severalnines.com/">Severalnines</source>
      <description><![CDATA[<p>In common PostgreSQL setups, one server acts as the master, handling all write operations. Other servers act as read replicas, replicating data from the master. In contrast, multi-master replication, or Bi-Directional Replication (BDR), allows multiple Postgres servers to act as primaries, each capable of accepting and processing write requests. Changes made on one master are replicated […]<br />
The post PostgreSQL Multi-Master Operational and Consistency Issues in Brief appeared first on Severalnines.</p>
<p>The post <a rel="nofollow" href="https://severalnines.com/blog/postgresql-multi-master-operational-and-consistency-issues-in-brief/">PostgreSQL Multi-Master Operational and Consistency Issues in Brief</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>In common <a href="https://severalnines.com/clustercontrol/databases/postgresql">PostgreSQL</a> setups, one server acts as the master, handling all write operations. Other servers act as read replicas, replicating data from the master. In contrast, multi-master replication, or Bi-Directional Replication (BDR), allows multiple Postgres servers to act as primaries, each capable of accepting and processing write requests. Changes made on one master are replicated to the other masters, ensuring data consistency across the cluster.</p>
<p>This system incorporates mechanisms to address conflicts arising from simultaneous modifications. The primary motivations for employing multi-master replication are High Availability (HA), write scalability and performance, and geographical distribution. Typically, certain nodes are specifically designated for handling high-write workloads, while others are either reserved for read operations or backup nodes in case of failures.</p>
<h2 class="wp-block-heading" id="h-challenges-of-multi-master-replication">Challenges of Multi-Master Replication<a class="anchor-link" id="challenges-of-multi-master-replication"></a></h2>
<p>Multi-Master replication in PostgreSQL, while offering significant advantages in terms of high availability and scalability, presents several challenges that require careful consideration and robust solutions.</p>
<p>Firstly, <strong>conflict resolution</strong> poses a major hurdle. Concurrent writes on different master nodes inevitably lead to conflicts, requiring sophisticated mechanisms to determine the correct order of operations and ensure data consistency. Implementing and managing these conflict resolution strategies can be complex and resource-intensive, especially in high-traffic environments.</p>
<p>Secondly, maintaining <strong>data consistency</strong> across all master nodes is crucial. Network latency, communication failures, and unforeseen events can disrupt replication and potentially lead to data divergence. Ensuring that all nodes maintain an accurate and up-to-date view of the data requires robust mechanisms for detecting and resolving inconsistencies, which can add overhead and complexity to the system.</p>
<p>Thirdly, <strong>operational complexity</strong> increases significantly in a multi-master setup. Managing multiple master nodes, coordinating replication, and monitoring the health of the entire cluster requires specialized expertise and careful planning. This can increase operational overhead and potentially lead to human error if not managed effectively.</p>
<p>Fourthly, <strong>testing and debugging</strong> can be challenging. Replicating complex scenarios and identifying subtle bugs in a multi-master environment can be time-consuming and difficult. Thorough testing is crucial to ensure the stability and reliability of the system, but it can require significant effort and resources.</p>
<p>Finally, <strong>limited native support</strong> within PostgreSQL itself presents challenges. While third-party extensions and tools are available, they may not always provide the level of functionality, performance, and support required for demanding applications. This can lead to increased reliance on external solutions, which may introduce additional dependencies and potential risks.</p>
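<p>As a concrete, hypothetical illustration of the conflict-resolution challenge described above: two masters accept writes to the same row before either change has replicated, and the cluster then has to decide which version survives.</p>
<pre class="wp-block-code"><code>-- Hypothetical write-write conflict between two masters (illustrative only;
-- the table and values are invented for this example)
-- On master A:
UPDATE accounts SET balance = balance - 10 WHERE id = 42;
-- On master B, concurrently, before A's change has replicated:
UPDATE accounts SET balance = balance - 20 WHERE id = 42;
-- Both commit locally; the replication layer must reconcile the divergent
-- row versions, e.g. via last-write-wins or a custom conflict handler.</code></pre>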
<h2 class="wp-block-heading" id="h-importance-of-addressing-consistency-issues">Importance of Addressing Consistency Issues<a class="anchor-link" id="importance-of-addressing-consistency-issues"></a></h2>
<p>Addressing consistency issues in PostgreSQL Multi-Master replication is paramount for several critical reasons. Firstly, <strong>data integrity</strong> is fundamental for any application. Inconsistent data can lead to incorrect business decisions, erroneous reports, and even financial losses. By ensuring that all master nodes maintain an accurate and up-to-date view of the data, organizations can safeguard the reliability and trustworthiness of their applications.&nbsp;</p>
<p>Secondly, <strong>application correctness</strong> heavily relies on consistent data. Inconsistent data can cause unexpected application behavior, leading to errors, crashes, and data corruption. Addressing consistency issues is crucial for ensuring that applications function as expected and provide a seamless user experience.&nbsp;</p>
<p>Thirdly, <strong>regulatory compliance</strong> often mandates data consistency and accuracy. Many industries have strict regulations regarding data integrity and security. By maintaining data consistency in a multi-master environment, organizations can ensure compliance with relevant regulations and avoid costly penalties.&nbsp;</p>
<p>Finally, <strong>user trust</strong> is directly impacted by data consistency. Users expect applications to provide accurate and reliable information. Inconsistent data can erode user trust, leading to dissatisfaction, churn, and reputational damage. By prioritizing data consistency, organizations can build and maintain trust with their users, fostering long-term relationships and ensuring the continued success of their applications.</p>
<h2 class="wp-block-heading" id="h-how-clustercontrol-can-help">How ClusterControl can help<a class="anchor-link" id="how-clustercontrol-can-help"></a></h2>
<p>ClusterControl exposes Multi-Master Replication features that automate much of this work, reducing the time and expertise required to manage Multi-Master PostgreSQL clusters. By abstracting away the complexities of configuration and management, ClusterControl enables organizations to leverage the full power of Multi-Master PostgreSQL without needing deep expertise in database administration. This democratizes access to advanced database architectures, allowing more organizations to benefit from the scalability, availability, and resilience that Multi-Master PostgreSQL provides.</p>
<h2 class="wp-block-heading" id="h-wrapping-up">Wrapping up<a class="anchor-link" id="wrapping-up"></a></h2>
<p>Multi-Master replication in PostgreSQL offers a compelling solution for organizations seeking to enhance their database systems&rsquo; availability, scalability, and performance. By enabling multiple nodes to accept writes and distribute read traffic independently, multi-master setups can significantly improve system responsiveness and reduce latency. </p>
<p>However, careful consideration must be given to the inherent challenges, such as conflict resolution, data consistency, and operational complexity. By implementing robust solutions and carefully managing the system, organizations can leverage the power of multi-master replication to build highly available, scalable, and resilient database systems that meet the demands of modern applications.</p>
<p>Ready to make PostgreSQL management easier and more reliable in any environment?</p>
<h2 class="wp-block-heading" id="h-install-clustercontrol-in-10-minutes-nbsp-free-30-day-nbsp-enterprise-trial-included">Install ClusterControl in 10-minutes.&nbsp;<strong>Free 30-day&nbsp;</strong>Enterprise trial included!<a class="anchor-link" id="install-clustercontrol-in-10-minutes-free-30-day-enterprise-trial-included"></a></h2>
<h3 class="wp-block-heading" id="h-script-installation-instructions">Script Installation Instructions<a class="anchor-link" id="script-installation-instructions"></a></h3>
<p>The installer script is the simplest way to get ClusterControl up and running. Run it on your chosen host, and it will take care of installing all required packages and dependencies.</p>
<p>Offline environments are supported as well. See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/offline-installation/">Offline Installation</a>&nbsp;guide for more details.</p>
<p>On the ClusterControl server, run the following commands:</p>
<pre class="wp-block-code"><code>wget https://severalnines.com/downloads/cmon/install-cc
chmod +x install-cc</code></pre>
<p>With your install script ready, run the command below. Replace the&nbsp;<code>S9S_CMON_PASSWORD</code>&nbsp;and&nbsp;<code>S9S_ROOT_PASSWORD</code>&nbsp;placeholders with a password of your choice, or remove the environment variables from the command to set the passwords interactively. If you have multiple network interface cards, assign one IP address to the&nbsp;<code>HOST</code>&nbsp;variable in the command using&nbsp;<code>HOST=&lt;ip_address&gt;</code>.</p>
<pre class="wp-block-code"><code>S9S_CMON_PASSWORD=&lt;your_password&gt; S9S_ROOT_PASSWORD=&lt;your_password&gt; HOST=&lt;ip_address&gt; ./install-cc # as root or sudo user</code></pre>
<p>After the installation is complete, open a web browser, navigate to&nbsp;<code>https://&lt;ClusterControl_host&gt;/</code>, and create the first admin user by entering a username (note that &ldquo;admin&rdquo; is reserved) and a password on the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/quickstart/#step-2-create-the-first-admin-user">welcome page</a>. Once you&rsquo;re in, you can&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/create-database-cluster/">deploy</a>&nbsp;a new database cluster or&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/user-guide/deployment/import-database-cluster/">import</a>&nbsp;an existing one.</p>
<p>The installer script supports a range of environment variables for advanced setup. You can define them using export or by prefixing the install command.</p>
<p>See the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#environment-variables">list of supported variables</a>&nbsp;and&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#example-use-cases">example use cases</a>&nbsp;to tailor your installation.</p>
<h4 class="wp-block-heading" id="h-other-installation-options">Other Installation Options</h4>
<p><strong>Helm Chart</strong></p>
<p>Deploy ClusterControl on Kubernetes using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#helm-chart">official Helm chart</a>.</p>
<p><strong>Ansible Role</strong></p>
<p>Automate installation and configuration using our&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#ansible-role">Ansible playbooks</a>.</p>
<p><strong>Puppet Module</strong></p>
<p>Manage your ClusterControl deployment with the&nbsp;<a href="https://docs.severalnines.com/clustercontrol/latest/getting-started/installation/online-installation/#puppet-module">Puppet module</a>.</p>
<h4 class="wp-block-heading" id="h-clustercontrol-on-marketplaces">ClusterControl on Marketplaces</h4>
<p>Prefer to launch ClusterControl directly from the cloud? It&rsquo;s available on these platforms:</p>
<ul class="wp-block-list">
<li><a href="https://marketplace.digitalocean.com/apps/clustercontrol">DigitalOcean Marketplace</a></li>
<li><a href="https://gridscale.io/en/marketplace">gridscale.io Marketplace</a></li>
<li><a href="https://www.vultr.com/marketplace/apps/clustercontrol/">Vultr Marketplace</a></li>
<li><a href="https://www.linode.com/marketplace/apps/severalnines/clustercontrol/">Linode Marketplace</a></li>
<li><a href="https://console.cloud.google.com/marketplace/product/severalnines-public/clustercontrol">Google Cloud Platform</a></li>
</ul>
<p>The post <a href="https://severalnines.com/blog/postgresql-multi-master-operational-and-consistency-issues-in-brief/">PostgreSQL Multi-Master Operational and Consistency Issues in Brief</a> appeared first on <a href="https://severalnines.com">Severalnines</a>.</p>

<p>The post <a rel="nofollow" href="https://severalnines.com/blog/postgresql-multi-master-operational-and-consistency-issues-in-brief/">PostgreSQL Multi-Master Operational and Consistency Issues in Brief</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
      <item>
      <title>What if OpenAI Used MariaDB Instead of PostgreSQL to Handle 800 Million Users?</title>
      <link>https://mariadb.com/resources/blog/what-if-openai-used-mariadb-instead-of-postgresql-to-handle-800-million-users/</link>
      <pubDate>Thu, 29 Jan 2026 19:19:44 +0000</pubDate>
      <dc:creator></dc:creator>
      <guid isPermaLink="false"></guid>
      <source url="https://mariadb.com/">MariaDB</source>
      <description><![CDATA[<p>OpenAI’s journey to scale its infrastructure has provided valuable insights into the limitations and scaling challenges of traditional database systems, […]</p>
<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/what-if-openai-used-mariadb-instead-of-postgresql-to-handle-800-million-users/">What if OpenAI Used MariaDB Instead of PostgreSQL to Handle 800 Million Users?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></description>
      <content:encoded><![CDATA[<p>OpenAI&rsquo;s journey to scale its infrastructure has provided valuable insights into the limitations and scaling challenges of traditional database systems, particularly PostgreSQL (https://openai.com/index/scaling-postgresql/). As detailed by their engineering teams, several core areas became bottlenecks as they navigated explosive growth and massive data volumes. Interestingly&hellip;</p>
<p><a href="https://mariadb.com/resources/blog/what-if-openai-used-mariadb-instead-of-postgresql-to-handle-800-million-users/" rel="nofollow">Source</a></p>

<p>The post <a rel="nofollow" href="https://mariadb.com/resources/blog/what-if-openai-used-mariadb-instead-of-postgresql-to-handle-800-million-users/">What if OpenAI Used MariaDB Instead of PostgreSQL to Handle 800 Million Users?</a> appeared first on <a rel="nofollow" href="https://mariadb.org">MariaDB.org</a>.</p>
]]></content:encoded>
                </item>
    </channel>
</rss>