Spatio-temporal Bayesian inference with latent Gaussian models underpins much of environmental and health science. Integrated nested Laplace approximations (INLA) enable inference for these models at HPC scale but rely on derivative-based optimization over d hyperparameters. State-of-the-art INLA implementations approximate the gradient with central finite differences, which cost 2d+1 evaluations per gradient. ADELIA replaces this with reverse-mode automatic differentiation implemented as a multi-GPU backward pass, achieving 4.2–7.9x per-gradient speedups and 5–8x energy savings over DALIA while converging in cases where finite differences stall.
@article{boudaoud2026adelia,
  title={ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations},
  author={Boudaoud, Afif and Gaedke-Merzh{\"a}user, Lisa and Ziogas, Alexandros Nikolaos and Maillou, Vincent and Calotoiu, Alexandru and Copik, Marcin and Rue, H{\aa}vard and Luisier, Mathieu and Hoefler, Torsten},
  journal={arXiv preprint arXiv:2605.06392},
  year={2026},
}
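The evaluation-count contrast in the ADELIA abstract can be illustrated with a small sketch. This is not ADELIA's or DALIA's code: the `Var` class, the toy `objective`, and the helper names are all hypothetical, and the objective is a cheap stand-in for the far more expensive INLA objective. It only shows why central finite differences need 2d+1 objective evaluations while a reverse-mode pass needs one forward evaluation plus one backward sweep.

```python
import math


class Var:
    """Scalar node in a tiny reverse-mode AD graph (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(out):
    """Propagate d(out)/d(node) to every node in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


EVALS = {"count": 0}  # counts objective evaluations


def objective(theta):
    """Hypothetical toy objective: theta_0 * theta_{d-1} + sum_i theta_i * sin(theta_i)."""
    EVALS["count"] += 1
    total = theta[0] * theta[-1]
    for t in theta:
        total = total + t * sin(t)
    return total


def grad_central_fd(theta, h=1e-5):
    """Central finite differences: 1 base evaluation + 2 per hyperparameter."""
    base = objective([Var(t) for t in theta]).value
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += h
        minus[i] -= h
        f_plus = objective([Var(t) for t in plus]).value
        f_minus = objective([Var(t) for t in minus]).value
        grad.append((f_plus - f_minus) / (2 * h))
    return base, grad


def grad_reverse_mode(theta):
    """Reverse mode: one forward evaluation, then one backward sweep."""
    leaves = [Var(t) for t in theta]
    out = objective(leaves)
    backward(out)
    return out.value, [leaf.grad for leaf in leaves]


if __name__ == "__main__":
    theta = [0.3, -1.2, 2.0, 0.7]  # d = 4 hypothetical hyperparameters
    for name, fn in [("finite differences", grad_central_fd),
                     ("reverse mode", grad_reverse_mode)]:
        EVALS["count"] = 0
        _, grad = fn(theta)
        print(name, "evaluations:", EVALS["count"], "gradient:", grad)
```

In this toy setting the two gradients agree to within finite-difference error, while the evaluation count grows with d only for the finite-difference variant.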
2025
CLUSTER ’25
DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing
Afif Boudaoud, Alexandru Calotoiu, Marcin Copik, and Torsten Hoefler
In Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER), 2025
A general-purpose automatic differentiation engine that requires no code modifications and uses a novel ILP-based algorithm to balance storing vs. recomputing intermediates under a memory budget. Benchmarked on NPBench, it is on average over 92x faster than JAX on gradient computation for scientific computing patterns.
@inproceedings{boudaoud2025daceAD,
  title={{DaCe AD}: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing},
  author={Boudaoud, Afif and Calotoiu, Alexandru and Copik, Marcin and Hoefler, Torsten},
  booktitle={Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER)},
  year={2025},
}
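The store-versus-recompute trade-off mentioned in the DaCe AD summary can be sketched as a small selection problem. This is a simplified illustration, not DaCe AD's algorithm: it uses brute-force enumeration rather than an ILP solver, it assumes each intermediate has an independent recomputation cost (real recomputation costs can depend on which other intermediates are stored), and the function name, sizes, and costs are hypothetical.

```python
from itertools import combinations


def choose_stored(intermediates, budget):
    """Pick which intermediates to keep in memory for the backward pass.

    intermediates: dict mapping name -> (memory_size, recompute_cost)
    budget: maximum total memory available for stored intermediates
    Returns (frozenset of stored names, total cost of recomputing the rest).
    """
    names = list(intermediates)
    # Baseline: store nothing and recompute everything.
    best_set = frozenset()
    best_cost = sum(cost for _, cost in intermediates.values())

    # Enumerate every subset; fine for a handful of values, exponential in general.
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            memory = sum(intermediates[n][0] for n in subset)
            if memory > budget:
                continue  # violates the memory budget
            cost = sum(intermediates[n][1] for n in names if n not in subset)
            if cost < best_cost:
                best_set, best_cost = frozenset(subset), cost
    return best_set, best_cost


if __name__ == "__main__":
    # Hypothetical intermediates: (memory units, recompute cost units).
    forward_values = {"a": (4, 10), "b": (3, 4), "c": (2, 3), "d": (5, 12)}
    stored, cost = choose_stored(forward_values, budget=9)
    print("store:", sorted(stored), "recompute cost:", cost)
```

An ILP formulation would express the same objective with one binary store/recompute variable per intermediate and a linear memory constraint, which scales to problem sizes where enumeration is infeasible.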
SC ’25
PerfDojo: Automated ML Library Generation for Heterogeneous Architectures
Andrei Ivanov, Siyuan Shen, Gioele Gottardo, Marcin Chrapek, Afif Boudaoud, Timo Schneider, Luca Benini, and Torsten Hoefler
In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2025
@inproceedings{ivanov2025perfdojo,
  title={{PerfDojo}: Automated ML Library Generation for Heterogeneous Architectures},
  author={Ivanov, Andrei and Shen, Siyuan and Gottardo, Gioele and Chrapek, Marcin and Boudaoud, Afif and Schneider, Timo and Benini, Luca and Hoefler, Torsten},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC)},
  year={2025},
  doi={10.1145/3712285.3759900},
}
PACT ’25
LOOPer: A Learned Automatic Code Optimizer for Polyhedral Compilers
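Massinissa Merouani, Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islem Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, and Riyadh Baghdadi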
In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2025
@inproceedings{merouani2025looper,
  title={{LOOPer}: A Learned Automatic Code Optimizer for Polyhedral Compilers},
  author={Merouani, Massinissa and Boudaoud, Afif and Aouadj, Iheb Nassim and Tchoulak, Nassim and Bernou, Islem Kara and Benyamina, Hamza and Benbouzid-Si Tayeb, Fatima and Benatchba, Karima and Leather, Hugh and Baghdadi, Riyadh},
  booktitle={Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year={2025},
}
dataset
LOOPerSet: A Large-Scale Dataset for Data-Driven Polyhedral Compiler Optimization
Massinissa Merouani, Afif Boudaoud, and Riyadh Baghdadi
An open-source dataset of 28 million labeled data points derived from 220,000 synthetically generated polyhedral programs, linking transformations to execution-time measurements for training learned cost models and auto-schedulers.
@article{merouani2025looperset,
  title={{LOOPerSet}: A Large-Scale Dataset for Data-Driven Polyhedral Compiler Optimization},
  author={Merouani, Massinissa and Boudaoud, Afif and Baghdadi, Riyadh},
  journal={arXiv preprint arXiv:2510.10209},
  year={2025},
}