CGO 2025
23rd ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2025)

23rd ACM/IEEE International Symposium on Code Generation and Optimization (CGO 2025), March 1–5, 2025, Las Vegas, NV, USA

CGO 2025 – Proceedings

Frontmatter

Title Page
Article: cgo25foreword-fm000-p
Welcome from the General Chairs
Article: cgo25foreword-fm001-p
Welcome from the Program Chairs
Article: cgo25foreword-fm004-p
CGO 2025 Organization
Article: cgo25foreword-fm002-p
CGO 2025 Sponsors and Supporters
Article: cgo25foreword-fm003-p

Distinguished Papers

Synthesis of Sorting Kernels
Marcel Ullrich and Sebastian Hack
(Saarland University, Germany)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p189-p doi:10.1145/3696443.3708954
Tensorize: Fast Synthesis of Tensor Programs from Legacy Code using Symbolic Tracing, Sketching and Solving
Alexander Brauckmann, Luc Jaulmes, José W. de Souza Magalhães, Elizabeth Polgreen, and Michael F. P. O’Boyle
(University of Edinburgh, UK)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p209-p doi:10.1145/3696443.3708956
Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization
Huanting Wang, Patrick Lenihan, and Zheng Wang
(University of Leeds, UK)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p247-p doi:10.1145/3696443.3708959

Optimizations and Transformations (1)

SySTeC: A Symmetric Sparse Tensor Compiler
Radha Patel, Willow Ahrens, and Saman Amarasinghe
(Massachusetts Institute of Technology, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Article: cgo25main-p56-p doi:10.1145/3696443.3708919
Pattern Matching in AI Compilers and Its Formalization
Joseph W. Cutler, Alex Collins, Bin Fan, Mahesh Ravishankar, and Vinod Grover
(University of Pennsylvania, USA; NVIDIA, USA; NVIDIA, UK; AMD, USA)
Publisher's Version Article: cgo25main-p17-p doi:10.1145/3696443.3708934
Scalar Interpolation: A Better Balance between Vector and Scalar Execution for SuperScalar Architectures
Reza Ghanbari, Henry Kao, João P. L. De Carvalho, Ehsan Amiri, and J. Nelson Amaral
(University of Alberta, Canada; Huawei Technologies, Canada)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: cgo25main-p159-p doi:10.1145/3696443.3708950

ML Tools and Optimization

VEGA: Automatically Generating Compiler Backends using a Pre-trained Transformer Model
Ming Zhong, Fang Lv, Lulin Wang, Lei Qiu, Yingying Wang, Ying Liu, Huimin Cui, Xiaobing Feng, and Jingling Xue
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; UNSW, Australia)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p7-p doi:10.1145/3696443.3708931
IntelliGen: Instruction-Level Auto-tuning for Tensor Program with Monotonic Memory Optimization
Zixuan Ma, Haojie Wang, Jingze Xing, Shuhong Huang, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Mingshu Zhai, Shizhi Tang, Penghan Wang, and Jidong Zhai
(Tsinghua University, China; Qingcheng.AI, China)
Publisher's Version Article: cgo25main-p812-p doi:10.1145/3696443.3708967
GraalNN: Context-Sensitive Static Profiling with Graph Neural Networks
Lazar Milikic, Milan Cugurovic, and Vojin Jovanovic
(Oracle Labs, Switzerland; Oracle Labs, Serbia)
Publisher's Version Article: cgo25main-p236-p doi:10.1145/3696443.3708958
LLM-Vectorizer: LLM-Based Verified Loop Vectorizer
Jubi Taneja, Avery Laird, Cong Yan, Madan Musuvathi, and Shuvendu K. Lahiri
(Microsoft Research, USA; University of Toronto, Canada)
Publisher's Version Article: cgo25main-p132-p doi:10.1145/3696443.3708929

Architectures and Code Generation

Calibro: Compilation-Assisted Linking-Time Binary Code Outlining for Code Size Reduction in Android Applications
Zhanhao Liang, Hanming Sun, Wenhan Shang, Mengting Yuan, Jingqin Fu, Jiang Ma, Chun Jason Xue, and Qingan Li
(Wuhan University, China; Wuhan Broadcasting and Television Station, China; Guangdong OPPO Mobile Telecommunications, China; MBZUAI, United Arab Emirates)
Publisher's Version Article: cgo25main-p193-p doi:10.1145/3696443.3708955
A Multi-level Compiler Backend for Accelerated Micro-kernels Targeting RISC-V ISA Extensions
Alexandre Lopoukhine, Federico Ficarelli, Christos Vasiladiotis, Anton Lydike, Josse Van Delm, Alban Dutilleul, Luca Benini, Marian Verhelst, and Tobias Grosser
(University of Cambridge, UK; University of Bologna, Italy; Cineca, Italy; University of Edinburgh, UK; KU Leuven, Belgium; ENS Rennes, France; ETH Zurich, Switzerland)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p170-p doi:10.1145/3696443.3708952
xDSL: Sidekick Compilation for SSA-Based Compilers
Mathieu Fehr, Michel Weber, Christian Ulmann, Alexandre Lopoukhine, Martin Paul Lücke, Théo Degioanni, Christos Vasiladiotis, Michel Steuwer, and Tobias Grosser
(University of Edinburgh, UK; ETH Zurich, Switzerland; University of Cambridge, UK; ENS Rennes, France; Technische Universität Berlin, Germany)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p91-p doi:10.1145/3696443.3708945

ML Compilers

ANT-ACE: An FHE Compiler Framework for Automating Neural Network Inference
Long Li, Jianxin Lai, Peng Yuan, Tianxiang Sui, Yan Liu, Qing Zhu, Xiaojing Zhang, Linjie Xiao, Wenguang Chen, and Jingling Xue
(Ant Group, China; Tsinghua University, China; UNSW, Australia; Ant Group, Australia)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p95-p doi:10.1145/3696443.3708924
CUrator: An Efficient LLM Execution Engine with Optimized Integration of CUDA Libraries
Yoon Noh Lee, Yongseung Yu, and Yongjun Park
(Yonsei University, South Korea)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p87-p doi:10.1145/3696443.3708944
Accelerating LLMs using an Efficient GEMM Library and Target-Aware Optimizations on Real-World PIM Devices
Hyeoncheol Kim, Taehoon Kim, Taehyeong Park, Donghyeon Kim, Yongseung Yu, Hanjun Kim, and Yongjun Park
(Yonsei University, South Korea; Rebellions, South Korea; Hanyang University, South Korea)
Publisher's Version Article: cgo25main-p188-p doi:10.1145/3696443.3708953

MLIR

The MLIR Transform Dialect: Your Compiler Is More Powerful Than You Think
Martin Paul Lücke, Oleksandr Zinenko, William S. Moses, Michel Steuwer, and Albert Cohen
(University of Edinburgh, UK; Google DeepMind, France; University of Illinois at Urbana-Champaign, USA; Google DeepMind, USA; Technische Universität Berlin, Germany)
Publisher's Version Published Artifact Info Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p73-p doi:10.1145/3696443.3708922
Combining MLIR Dialects with Domain-Specific Architecture for Efficient Regular Expression Matching
Andrea Somaini, Filippo Carloni, Giovanni Agosta, Marco D. Santambrogio, and Davide Conficconi
(Politecnico di Milano, Italy)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p1-p doi:10.1145/3696443.3708916
DialEgg: Dialect-Agnostic MLIR Optimizer using Equality Saturation with Egglog
Abd-El-Aziz Zayed and Christophe Dubach
(McGill University, Canada; Mila, Canada)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p229-p doi:10.1145/3696443.3708957

Quantum Computing (1)

Synthesis of Quantum Simulators by Compilation
Meisam Tarabkhah, Mahshid Delavar, Mina Doosti, and Amir Shaikhha
(University of Edinburgh, UK; University of Sheffield, UK)
Publisher's Version Article: cgo25main-p133-p doi:10.1145/3696443.3708949
Weaver: A Retargetable Compiler Framework for FPQA Quantum Architectures
Oğuzcan Kırmemiş, Francisco Romão, Emmanouil Giortamis, and Pramod Bhatotia
(TU Munich, Germany)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p61-p doi:10.1145/3696443.3708965

Program Analysis and Synthesis

Automatic Synthesis of Specialized Hash Functions
Renato B. Hoffmann, Leonardo G. Faé, Dalvan Griebler, Xinliang David Li, and Fernando Magno Quintão Pereira
(PUC-RS, Brazil; Google, USA; Federal University of Minas Gerais, Brazil)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p51-p doi:10.1145/3696443.3708940
Stack Filtering: Elevating Precision and Efficiency in Rust Pointer Analysis
Wei Li, Dongjie He, Wenguang Chen, and Jingling Xue
(UNSW, Australia; Chongqing University, China; Tsinghua University, China)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p68-p doi:10.1145/3696443.3708921
SkipFlow: Improving the Precision of Points-to Analysis using Primitive Values and Predicate Edges
David Kozak, Codrut Stancu, Tomáš Vojnar, and Christian Wimmer
(Oracle Labs, Czechia; Brno University of Technology, Czechia; Oracle Labs, Switzerland; Masaryk University, Czechia; Oracle Labs, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p9-p doi:10.1145/3696443.3708932

Safety and Resilience

FastFlip: Compositional SDC Resiliency Analysis
Keyur Joshi, Rahul Singh, Tommaso Bassetto, Sarita Adve, Darko Marinov, and Sasa Misailovic
(University of Illinois at Urbana-Champaign, USA)
Publisher's Version Article: cgo25main-p45-p doi:10.1145/3696443.3708938
MTE4JNI: A Memory Tagging Method to Protect Java Heap Memory from Illicit Native Code Access
Huinan Chen, Jiang Ma, Chun Jason Xue, and Qingan Li
(Wuhan University, China; Guangdong OPPO Mobile Telecommunications, China; MBZUAI, United Arab Emirates)
Publisher's Version Article: cgo25main-p16-p doi:10.1145/3696443.3708933
Memory Safety Instrumentations in Practice: Usability, Performance, and Security Guarantees
Tina Jung, Fabian Ritter, and Sebastian Hack
(Saarland University, Germany)
Publisher's Version Published Artifact Info Artifacts Available Article: cgo25main-p108-p doi:10.1145/3696443.3708926

Optimizations and Transformations (2)

PreFix: Optimizing the Performance of Heap-Intensive Applications
Chaitanya Mamatha Ananda, Rajiv Gupta, Sriraman Tallam, Han Shen, and Xinliang David Li
(University of California at Riverside, USA; Google, USA)
Publisher's Version Article: cgo25main-p283-p doi:10.1145/3696443.3708960
A Priori Loop Nest Normalization: Automatic Loop Scheduling in Complex Applications
Lukas Trümper, Philipp Schaad, Berke Ates, Alexandru Calotoiu, Marcin Copik, and Torsten Hoefler
(Daisytuner, Germany; ETH Zurich, Switzerland)
Publisher's Version Article: cgo25main-p162-p doi:10.1145/3696443.3708951
An Efficient Polynomial Multiplication Derived Implementation of Convolution in Neural Networks
Haoke Xu, Yulin Zhang, Zitong Cheng, and Xiaoming Li
(University of Delaware, USA; Minzu University of China, China)
Publisher's Version Article: cgo25main-p123-p doi:10.1145/3696443.3708947

Quantum Computing (2)

ASDF: A Compiler for Qwerty, a Basis-Oriented Quantum Programming Language
Austin J. Adams, Sharjeel Khan, Arjun S. Bhamra, Ryan R. Abusaada, Anthony M. Cabrera, Cameron C. Hoechst, Travis S. Humble, Jeffrey S. Young, and Thomas M. Conte
(Georgia Institute of Technology, USA; Oak Ridge National Laboratory, USA)
Publisher's Version Published Artifact Info Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p682-p doi:10.1145/3696443.3708966
Qubit Movement-Optimized Program Generation on Zoned Neutral Atom Processors
Enhyeok Jang, Youngmin Kim, Hyungseok Kim, Seungwoo Choi, Yipeng Huang, and Won Woo Ro
(Yonsei University, South Korea; Rutgers University, USA)
Publisher's Version Article: cgo25main-p26-p doi:10.1145/3696443.3708937

GPU and Parallelism

Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU
Naifeng Zhang and Franz Franchetti
(Carnegie Mellon University, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: cgo25main-p126-p doi:10.1145/3696443.3708948
CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning
Guoliang He and Eiko Yoneki
(University of Cambridge, UK)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p77-p doi:10.1145/3696443.3708943
Proteus: Portable Runtime Optimization of GPU Kernel Execution with Just-in-Time Compilation
Giorgis Georgakoudis, Konstantinos Parasyris, and David Beckingsale
(Lawrence Livermore National Laboratory, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p46-p doi:10.1145/3696443.3708939

Security, Fault Tolerance, and Cryptography

Qiwu: Exploiting Ciphertext-Level SIMD Parallelism in Homomorphic Encryption Programs
Zhongcheng Zhang, Ying Liu, Yuyang Zhang, Zhenchuan Chen, Jiacheng Zhao, Xiaobing Feng, Huimin Cui, and Jingling Xue
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Zhongguancun Laboratory, China; UNSW, Australia)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced Article: cgo25main-p14-p doi:10.1145/3696443.3708917
Cage: Hardware-Accelerated Safe WebAssembly
Martin Fink, Dimitrios Stavrakakis, Dennis Sprokholt, Soham Chakraborty, Jan-Erik Ekberg, and Pramod Bhatotia
(TU Munich, Germany; TU Delft, Netherlands; Huawei, Finland)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p64-p doi:10.1145/3696443.3708920
Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries
Fangzheng Lin, Zhongfa Wang, and Hiroshi Sasaki
(Institute of Science Tokyo, Japan)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Article: cgo25main-p25-p doi:10.1145/3696443.3708936
Janitizer: Rethinking Binary Tools for Practical and Comprehensive Security
Mahwish Arif, Sam Ainsworth, and Timothy M. Jones
(University of Cambridge, UK; University of Edinburgh, UK)
Publisher's Version Article: cgo25main-p3-p doi:10.1145/3696443.3708930
Parallaft: Runtime-Based CPU Fault Tolerance via Heterogeneous Parallelism
Boyue Zhang, Sam Ainsworth, Lev Mukhanov, and Timothy M. Jones
(University of Cambridge, UK; University of Edinburgh, UK; Queen Mary University of London, UK)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p106-p doi:10.1145/3696443.3708946

Optimizations and Transformations (3)

Postiz: Extending Post-increment Addressing for Loop Optimization and Code Size Reduction
Enming Fan, Xiaofeng Guan, Fan Hu, Heng Shi, Hao Zhou, and Jianguo Yao
(Shanghai Enflame Technology, China; Shanghai Jiao Tong University, China)
Publisher's Version Article: cgo25main-p21-p doi:10.1145/3696443.3708935
Towards Efficient Compiler Auto-tuning: Leveraging Synergistic Search Spaces
Haolin Pan, Yuanyu Wei, Mingjie Xing, Yanjun Wu, and Chen Zhao
(Institute of Software at Chinese Academy of Sciences, China; Hangzhou Institute for Advanced Study at University of Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p323-p doi:10.1145/3696443.3708961
Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture
Olivia Hsu, Alexander Rucker, Tian Zhao, Varun Desai, Kunle Olukotun, and Fredrik Kjolstad
(Stanford University, USA)
Publisher's Version Article: cgo25main-p34-p doi:10.1145/3696443.3708918
Vectron: A Dynamic Programming Auto-vectorization Framework
Sourena Naser Moghaddasi, Haris Smajlović, Ariya Shajii, and Ibrahim Numanagić
(University of Victoria, Canada; Exaloop, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Article: cgo25main-p124-p doi:10.1145/3696443.3708963

Runtime and System Tools

Honey Potion: An eBPF Backend for Elixir
Kael Soares Augusto, Vinícius Pacheco, Marcos A. Vieira, Rodrigo Geraldo Ribeiro, and Fernando Magno Quintão Pereira
(Federal University of Minas Gerais, Brazil; Cadence, Brazil; Federal University of Ouro Preto, Brazil)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p81-p doi:10.1145/3696443.3708923
GoFree: Reducing Garbage Collection via Compiler-Inserted Freeing
Haoran Peng, Yu Zhang, Michael D. Ernst, Jinbao Chen, and Boyao Ding
(University of Science and Technology of China, China; University of Washington, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p97-p doi:10.1145/3696443.3708925
Improving Native-Image Startup Performance
Matteo Basso, Aleksandar Prokopec, Andrea Rosà, and Walter Binder
(USI Lugano, Switzerland; Oracle Labs, Switzerland)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced Article: cgo25main-p121-p doi:10.1145/3696443.3708927
Speeding up the Local C++ Development Cycle with Header Substitution
Nader Al Awar, Zijian Yi, George Biros, and Milos Gligoric
(University of Texas at Austin, USA)
Publisher's Version Artifacts Functional Results Reproduced Article: cgo25main-p63-p doi:10.1145/3696443.3708942
