Modern data management systems increasingly abandon monolithic architectures in favor of compositions of specialized components. Storage layers like Parquet and Arrow are combined with kernels like Velox and RocksDB, optimizers like Apache Calcite or Orca, and other specialized components to build systems optimized for a specific domain, execution environment, or even application.
Unfortunately, the architecture of data management systems and the interfaces between their components remain the same as they were 30 years ago: highly efficient but rigid. This rigidity obstructs the adoption of novel ideas and techniques such as hardware acceleration, adaptive processing, learned optimization, or serverless execution in real-world systems.
To address this impasse, my group at Imperial is developing a novel approach to data management system composition inspired by two principles stemming from compiler-construction research: a homoiconic representation of data and code, and partial evaluation of queries by components. I will present an implementation of the approach in a new system called BOSS and illustrate how BOSS achieves a fully composable design that effectively combines different data models, hardware platforms, and processing engines. I will also demonstrate how this design allowed my group to implement features like GPU acceleration of relational queries and generative data cleaning in weeks rather than years, in a system with no measurable overhead compared to a monolithic design.
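To give a rough feel for the two principles, here is a toy sketch in Python. It is not BOSS code and does not reflect the BOSS API: the expression encoding, the component names (`storage_component`, `relational_component`), the operator names, and the `orders` table are all hypothetical. It assumes that queries and data share one nested-expression form and that each component evaluates only the parts it understands, passing the rest on unevaluated.

```python
# Toy sketch only: hypothetical names, not the BOSS API.
# An expression is a tuple (head, arg1, arg2, ...); anything else is an atom.
# Data and code share this one representation (homoiconicity).

def rewrite(expr, rule):
    """Rewrite an expression bottom-up: children first, then the node itself."""
    if not isinstance(expr, tuple):
        return expr
    head, *args = expr
    return rule((head, *(rewrite(a, rule) for a in args)))

def storage_component(expr):
    """Understands only Load: replaces it with the data as a Table expression."""
    catalog = {"orders": ("Table", ("Column", "price", 5, 20, 40))}
    if expr[0] == "Load":
        return catalog[expr[1]]
    return expr  # not understood: leave unevaluated for another component

def relational_component(expr):
    """Understands only Select over a Table that is already materialised."""
    if expr[0] == "Select" and isinstance(expr[1], tuple) and expr[1][0] == "Table":
        _, (_, name, *values) = expr[1]
        op, column, bound = expr[2]
        if op == "Greater" and column == name:
            return ("Table", ("Column", name, *(v for v in values if v > bound)))
    return expr  # not understood: leave unevaluated

def evaluate(expr, components):
    """Pass the expression through each component in turn (partial evaluation)."""
    for component in components:
        expr = rewrite(expr, component)
    return expr

query = ("Sum", ("Select", ("Load", "orders"), ("Greater", "price", 10)))
after_storage = evaluate(query, [storage_component])
result = evaluate(query, [storage_component, relational_component])
```

In this sketch the storage component resolves only `Load`, the relational component resolves only `Select`, and `Sum` is left in the result for some further component to handle. Adding a capability then means adding a component rather than changing the interfaces between existing ones.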
Post Talk Link: Click Here
Passcode: n6^Qpdwc
Holger Pirk is an Associate Professor in the Large-Scale Data and Systems group at Imperial College London. He is interested in all things data: analytics, transactions, systems, algorithms, data structures, processing models, and everything in between. While some of his work targets "traditional" relational databases, his objective is to broaden the applicability of data management techniques. To this end, Holger studies "Composable Database Systems": systems that are extensible to heterogeneous workloads, data models, and hardware. This naturally leads to research at the intersection of data management, compilers, and computer architecture, targeting applications like generative modeling and graph processing as well as "classic" data analytics. Before joining Imperial, he was a postdoc in the Database group at MIT CSAIL. He spent his PhD years in the Database Architectures group at CWI in Amsterdam, receiving a PhD from the University of Amsterdam in 2015. Before that, he received a master's degree in computer science from Humboldt-Universität zu Berlin in 2010.