Array Databases: Supporting the Fourth Big Data Type

Presentation

Tuesday, May 6, 2025, 13:00 – Follow online

Abstract

Arrays, as a fundamental data category, have found their way into the ensemble of data models supported by databases. While OLAP “datacubes” can be emulated relationally to some extent, it was in particular applications in science and engineering that prompted support for arrays regardless of sparsity, with dedicated, powerful array operators supporting n-D tensor algebra.

The pioneer Array DBMS, rasdaman (“raster data manager”), is a clean-slate DBMS implementation which, based on an algebraic foundation for query language, storage, architecture, and optimization, aims at large-scale practical use. To satisfy the needs of datacube providers and users, solutions had to be found that are driven sometimes by formalized concepts and sometimes by pragmatism, ultimately achieved through a fruitful collaboration of science and industry. Among the challenges addressed are: novel algorithms for (distributed) array joins; deep support for space and time semantics in geo datacubes, which has led to novel data and language concepts since adopted as standards; access control on regions within a cube, given the sheer datacube size; and automated query splitting in federations of autonomous instances, which requires combining global authentication with local authorization. Last but not least, combining AI and datacubes is a topic of active research.

We give an overview of the rasdaman Array DBMS and discuss selected challenges and ongoing research, with emphasis on applications in the Earth sciences. Live demos will illustrate the talk.

Speaker: Prof. Dr. Peter Baumann, Professor of Computer Science @Constructor University