Building a relational data lake with MariaDB ColumnStore

This video was presented at the MariaDB Server Fest, held online from 14 to 20 September 2020.

Abstract

To give VirtualHealth data scientists and developers on-demand access to de-identified patient data of increasing volume and complexity, we chose a relational data lake approach, in which daily OLTP data snapshots can retain their original form and format. To lower the cost of keeping read-only daily snapshots live, we chose MariaDB ColumnStore for its inherent data compression. We present real-world analytics use cases and share the tips and tricks we have learned.

Presenter

Sasha Vaniachine is a MariaDB enthusiast experienced in troubleshooting and resolving scalability issues across the full software stack. In industry and academia, he has scaled up data processing from terabytes to petabytes while keeping data losses below acceptable levels. Early in his career, Sasha pioneered the deployment of MySQL databases on virtual machines.

Date and time

  • Paris: Wednesday 16 September, 16.10 – 16.35 CEST (UTC +2)
  • New York: Thursday 17 September, 15.35 – 16.00 / 3.35pm – 4pm EDT (UTC -4)