PerfDojo

Automated ML library generation for heterogeneous architectures

PerfDojo frames performance optimization as a reinforcement-learning game over a human-readable, mathematically inspired code representation whose transformations are guaranteed to preserve semantics. Building on this representation, PerfLLM combines LLMs with RL to search the transformation space across diverse CPU (x86, Arm, RISC-V) and GPU architectures. On GH200, the resulting code achieves a geometric mean speedup of 6.65× over PyTorch and 13.65× over TVM.
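The idea of searching over semantics-preserving transformations can be illustrated with a deliberately small sketch. It is not PerfDojo's representation or PerfLLM's search: the "program" is just a loop order for a matrix multiply, the only transformation is interchanging adjacent loops, the cost model (`toy_cost`) is a made-up stand-in for a measured runtime, and plain greedy hill-climbing replaces the RL and LLM components. All names here are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Order = Tuple[str, ...]  # loop order, outermost first, e.g. ("i", "j", "k")
Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix, order: Order) -> Matrix:
    """Square matrix multiply whose three loops run in the given order."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    # product() varies the last index fastest, so order[-1] is the innermost loop.
    for idx in product(range(n), repeat=3):
        v = dict(zip(order, idx))
        c[v["i"]][v["j"]] += a[v["i"]][v["k"]] * b[v["k"]][v["j"]]
    return c


def interchange(order: Order, pos: int) -> Order:
    """Transformation: swap the loops at positions pos and pos + 1."""
    o = list(order)
    o[pos], o[pos + 1] = o[pos + 1], o[pos]
    return tuple(o)


# Hypothetical cost model (an assumption, not a measurement): a per-loop
# penalty for strided access in row-major storage, weighted by loop depth.
PENALTY = {"j": 0, "k": 1, "i": 2}
DEPTH_WEIGHT = (0, 1, 10)


def toy_cost(order: Order) -> int:
    return sum(PENALTY[name] * w for name, w in zip(order, DEPTH_WEIGHT))


def greedy_search(start: Order, max_steps: int = 10) -> Order:
    """Apply the best single interchange until no move lowers the toy cost."""
    state = start
    for _ in range(max_steps):
        candidates = [interchange(state, p) for p in range(len(state) - 1)]
        best = min(candidates, key=toy_cost)
        if toy_cost(best) >= toy_cost(state):
            break
        state = best
    return state


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    start = ("j", "k", "i")
    found = greedy_search(start)
    # Every move preserves semantics, so the result is unchanged.
    assert matmul(a, b, found) == matmul(a, b, start)
    print(found, toy_cost(found))
```

Because every move in the action set preserves the program's result, the search never has to check correctness and can concentrate on cost alone; this is the property the paragraph above attributes to PerfDojo's transformations. In a real setting the cost would come from timing generated code on the target hardware rather than from a fixed table.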
