Separate Compilation and Partial Linking: Modules for Datalog IR

Published in GPCE, 2024

Recommended citation: David Klopp, André Pacak, Sebastian Erdweg. (2024). "Separate Compilation and Partial Linking: Modules for Datalog IR." GPCE (2024). https://dl.acm.org/doi/10.1145/3689484.3690737

In recent years, Datalog has sparked renewed interest in academia and industry, leading to the development of numerous new Datalog systems. To unify these systems, recent approaches treat Datalog as an intermediate representation (IR) in a compiler framework: Compiler frontends can lower different Datalog dialects to the same IR, which is then optimized before a compiler backend targets one of many existing Datalog engines. However, a key feature is missing in these compiler frameworks: an expressive module system. In this paper, we present the first module system for a Datalog IR. Our modules are statically typed, can be separately compiled, and partially linked to form “bundles”. Since IR modules are generated by a compiler frontend, we rely on explicit declarations of required and provided relations to maximize the decoupling between modules. This also allows modules to abstract over required relations to offer reusable functionality (e.g., computing a transitive closure) that can be instantiated for different relations in a single Datalog program. We formalize the module system, its type system, and the linking algorithm. We then describe how different usage patterns that occur in Datalog dialects (e.g., inheritance, cyclic imports) can be expressed in our IR module system. Finally, we integrate our module system into an existing Datalog compiler framework, develop a Soufflé compiler frontend that translates Soufflé components to IR modules, and demonstrate its applicability to a large Doop analysis.
