Proteus is a database engine designed for today's heterogeneous environments. Proteus adapts to variable data, hardware and workloads through a combination of GPU acceleration, data virtualization, and adaptive scheduling.
Fast Queries Over Heterogeneous Data Through Engine Customization
VLDB 2016.
Abstract
Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of heterogeneous datasets to gain insights. The different data models and formats pose a significant challenge to performing analysis over a combination of diverse datasets. Serving all queries using a single, general-purpose query engine is slow. On the other hand, using a specialized engine for each heterogeneous dataset increases complexity: queries touching a combination of datasets require an integration layer over the different engines.
This paper presents a system design that natively supports heterogeneous data formats and also minimizes query execution times. For multi-format support, the design uses an expressive query algebra which enables operations over various data models. For minimal execution times, it uses a code generation mechanism to mimic the system and storage most appropriate to answer a query fast. We validate our design by building Proteus, a query engine which natively supports queries over CSV, JSON, and relational binary data, and which specializes itself to each query, dataset, and workload via code generation. Proteus outperforms state-of-the-art open-source and commercial systems on both synthetic and real-world workloads without being tied to a single data model or format, all while exposing users to a single query interface.
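To make the idea of per-query, per-format specialization concrete, here is a toy sketch in Python. It is not Proteus's code generator and does not reflect its internals; the function name `generate_sum_query`, its parameters, and the sample rows are all hypothetical. The sketch only shows the general technique: given a data format and a field, it emits the source of a scan loop with the format-specific access path baked in, compiles it, and returns the resulting function, so that no format dispatch happens per row.

```python
import json


def generate_sum_query(fmt, field, column_index=None):
    """Generate and compile a scan specialized to one format and one field.

    Hypothetical helper for illustration only. The generated function sums
    the chosen field over all rows whose value exceeds a threshold.
    """
    if fmt == "csv":
        # The column position is hard-coded into the generated source.
        extract = f"float(line.split(',')[{column_index}])"
    elif fmt == "json":
        # The field name is hard-coded into the generated source.
        extract = f"float(json.loads(line)[{field!r}])"
    else:
        raise ValueError(f"unsupported format: {fmt}")

    source = (
        "def query(lines, threshold):\n"
        "    total = 0.0\n"
        "    for line in lines:\n"
        f"        value = {extract}\n"
        "        if value > threshold:\n"
        "            total += value\n"
        "    return total\n"
    )
    namespace = {"json": json}
    exec(source, namespace)  # compile the specialized scan
    return namespace["query"]


# Same logical data in two formats (made-up sample rows).
csv_rows = ["1,10.5,a", "2,3.0,b", "3,7.5,c"]
json_rows = [
    '{"id": 1, "price": 10.5}',
    '{"id": 2, "price": 3.0}',
    '{"id": 3, "price": 7.5}',
]

csv_query = generate_sum_query("csv", "price", column_index=1)
json_query = generate_sum_query("json", "price")

print(csv_query(csv_rows, 5.0))
print(json_query(json_rows, 5.0))
```

Both generated functions answer the same logical query through a single interface, while each one contains only the access code for its own format. A production engine would generate low-level code rather than Python source, but the division of work is the same: decisions that depend on the query and the format are made once, at generation time, instead of once per row.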
Links
@article{DBLP:journals/pvldb/KarpathiotakisA16,
  author    = {Manos Karpathiotakis and Ioannis Alagiannis and Anastasia Ailamaki},
  title     = {Fast Queries Over Heterogeneous Data Through Engine Customization},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {9},
  number    = {12},
  pages     = {972--983},
  year      = {2016},
  url       = {http://www.vldb.org/pvldb/vol9/p972-karpathiotakis.pdf},
  doi       = {10.14778/2994509.2994516},
  timestamp = {Sat, 25 Apr 2020 13:58:55 +0200},
  biburl    = {https://dblp.org/rec/journals/pvldb/KarpathiotakisA16.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}