CGO 2026
2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), January 31 – February 4, 2026, Sydney, Australia

CGO 2026 – Proceedings

Frontmatter

Title Page
Welcome from the General Chair
Welcome from the Program Chairs
CGO 2026 Organization
CGO 2026 Sponsors and Supporters

Compiling for ML 1

Enabling Spill-Free Compilation via Affine-Based Live Range Reduction Optimization
Prasanth Chatarasi, Alex Gatea, Wei Wang, Chris Bowler, Shubham Jain, Masoud Ataei Jaliseh, Nicole Khoun, Alberto Mannari, Bardia Mahjour, Viji Srinivasan, and Swagath Venkataramani
(IBM Research, USA; IBM, Canada; IBM, Switzerland)
Publisher's Version Archive submitted (72 kB)
GRANII: Selection and Ordering of Primitives in GRAph Neural Networks using Input Inspection
Damitha Lenadora, Vimarsh Sathia, Gerasimos Gerogiannis, Serif Yesil, Josep Torrellas, and Charith Mendis
(University of Illinois at Urbana-Champaign, USA; NVIDIA, USA)
Publisher's Version Archive submitted (270 kB) Artifacts Functional
Fast Autoscheduling for Sparse ML Frameworks
Bobby Yan, Alexander J Root, Trevor Gale, David Broman, and Fredrik Kjolstad
(Stanford University, USA; KTH Royal Institute of Technology, Sweden)
Publisher's Version Archive submitted (240 kB)
Eliminating Redundancy: Ultra-compact Code Generation for Programmable Dataflow Accelerators
Prasanth Chatarasi, Alex Gatea, Bardia Mahjour, Jintao Zhang, Alberto Mannari, Chris Bowler, Shubham Jain, Masoud Ataei Jaliseh, Nicole Khoun, Kamlesh Kumar, Viji Srinivasan, and Swagath Venkataramani
(IBM Research, USA; IBM, Canada; IBM, Switzerland)
Publisher's Version

Security

PriTran: Privacy-Preserving Inference for Transformer-Based Language Models under Fully Homomorphic Encryption
Yuechen Mu, Guangli Li, Shiping Chen, and Jingling Xue
(UNSW, Australia; Institute of Computing Technology at Chinese Academy of Sciences, China; CSIRO’s Data61, Australia)
Publisher's Version
FHEFusion: Enabling Operator Fusion in FHE Compilers for Depth-Efficient DNN Inference
Tianxiang Sui, Jianxin Lai, Long Li, Peng Yuan, Yan Liu, Qing Zhu, Xiaojing Zhang, Linjie Xiao, Mingzhe Zhang, and Jingling Xue
(Ant Group, China; UNSW, Australia)
Publisher's Version Published Artifact Archive submitted (140 kB) Artifacts Available Artifacts Reusable Results Reproduced
Towards Path-Aware Coverage-Guided Fuzzing
Giacomo Priamo, Daniele Cono D'Elia, Mathias Payer, and Leonardo Querzoni
(Sapienza University of Rome, Italy; EPFL, Switzerland)
Publisher's Version Published Artifact Archive submitted (140 kB) Artifacts Available Artifacts Reusable
SecSwift, a Compiler-Based Framework for Software Countermeasures in Cybersecurity
François de Ferrière, Yves Janin, and Sirine Mechmech
(STMicroelectronics, France; Grenoble INP, France)
Publisher's Version

Abstractions

Partial-Evaluation Templates: Accelerating Partial Evaluation with Pre-compiled Templates
Florian Huemer, Aleksandar Prokopec, David Leopoldseder, Raphael Mosaner, and Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, Zurich, Switzerland; Oracle Labs, Vienna, Austria; Oracle Labs, Linz, Austria)
Publisher's Version
Pyls: Enabling Python Hardware Synthesis with Dynamic Polymorphism via LCRS Encoding
Bolei Tong, Yongyan Fang, Chaorui Wang, Qingan Li, Jingling Xue, and Yuan Mengting
(Wuhan University, China; UNSW, Australia)
Publisher's Version
SkeleShare: Algorithmic Skeletons and Equality Saturation for Hardware Resource Sharing
Jonathan Van der Cruysse, Tzung-Han Juang, Shakiba Bolbolian Khah, and Christophe Dubach
(McGill University, Canada; Mila, Canada)
Publisher's Version Published Artifact Archive submitted (190 kB) Artifacts Available Artifacts Reusable Results Reproduced
Ember: A Compiler for Embedding Operations on Decoupled Access-Execute Architectures
Marco Siracusa, Olivia Hsu, Víctor Soria-Pardos, Joshua Randall, Arnaud Grasset, Eric Biscondi, Doug Joseph, Randy Allen, Fredrik Kjolstad, Miquel Moretó Planas, and Adrià Armejach
(Barcelona Supercomputing Center, Spain; Stanford University, USA; Carnegie Mellon University, USA; Arm, USA; Universitat Politècnica de Catalunya, Spain)
Publisher's Version Published Artifact Archive submitted (100 kB) Artifacts Available

Memory

Flow-Graph-Aware Tiling and Rescheduling for Memory-Efficient On-Device Inference
Yeonoh Jeong, Taehyeong Park, and Yongjun Park
(Yonsei University, Republic of Korea)
Publisher's Version
VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis
Arjun H. Kumar, Bhavya Hirani, Hang Shao, Tobi Ajila, Vijay Sundaresan, Daryl Maier, and Manas Thakur
(IIT Mandi, India; Sardar Vallabhbhai National Institute of Technology, Surat, India; IBM, Canada; IIT Bombay, India)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
FRUGAL: Pushing GPU Applications beyond Memory Limits
Lingqi Zhang, Tengfei Wang, Jiajun Huang, Chen Zhuang, Ivan R. Ivanov, Peng Chen, Toshio Endo, and Mohamed Wahib
(RIKEN RCCS, Japan; Google Cloud, Japan; University of South Florida, USA; Institute of Science Tokyo, Japan)
Publisher's Version Archive submitted (510 kB)
Automatic Data Enumeration for Fast Collections
Tommy McMichen and Simone Campanoni
(Northwestern University, USA; Google, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced

DSLs

FORTE: Online DataFrame Query Optimizer
Yoonho Choi, Kyoungtae Lee, Minji Kim, Hyungsoo Jung, and Hyojin Sung
(POSTECH, Republic of Korea; Seoul National University, Republic of Korea; Ewha Womans University, Republic of Korea)
Publisher's Version
LEGO: A Layout Expression Language for Code Generation of Hierarchical Mapping
Amir Mohammad Tavakkoli, Cosmin E. Oancea, and Mary Hall
(University of Utah, USA; University of Copenhagen, Denmark)
Publisher's Version Published Artifact Info Artifacts Available Artifacts Reusable Results Reproduced
Pushing Tensor Accelerators beyond MatMul in a User-Schedulable Language
Yihong Zhang, Derek Gerstmann, Andrew Adams, and Maaz Bin Safeer Ahmad
(University of Washington, USA; Adobe, USA)
Publisher's Version Published Artifact Artifacts Available
Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References
Hongzheng Chen, Bin Fan, Alexander Collins, Bastian Hagedorn, Evghenii Gaburov, Masahiro Masuda, Matthew Brookhart, Chris Sullivan, Jason Knight, Zhiru Zhang, and Vinod Grover
(Cornell University, USA; NVIDIA, USA; NVIDIA, UK; NVIDIA, Germany)
Publisher's Version

Quantum / HLS

Dependence-Driven, Scalable Quantum Circuit Mapping with Affine Abstractions
Marouane Benbetka, Merwan Bekkar, Riyadh Baghdadi, and Martin Kong
(NYU Abu Dhabi, United Arab Emirates; Ohio State University, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Space-Time Optimisations for Early Fault-Tolerant Quantum Computation
Sanaa Sharma and Prakash Murali
(University of Cambridge, UK)
Publisher's Version Published Artifact Artifacts Available
OpenQudit: Extensible and Accelerated Numerical Quantum Compilation via a JIT-Compiled DSL
Ed Younis
(Lawrence Berkeley National Laboratory, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Selene: Cross-Level Barrier-Free Pipelining for Irregular Nested Loops in High-Level Synthesis
Sungwoo Yun, Seonyoung Cheon, Dongkwan Kim, Heelim Choi, Kunmo Jeong, Chan Lee, Yongwoo Lee, and Hanjun Kim
(Yonsei University, Republic of Korea; DGIST, Republic of Korea)
Publisher's Version

Parallelization / Vectorization

Enabling Automatic Compiler-Driven Vectorization of Transformers
Shreya Alladi, Alberto Ros, and Alexandra Jimborean
(University of Murcia, Spain)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Unlocking Python Multithreading Capabilities using OpenMP-Based Programming with OMP4Py
César Piñeiro and Juan C. Pichel
(University of Santiago de Compostela, Spain)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
The Parallel-Semantics Program Dependence Graph for Parallel Optimization
Yian Su, Brian Homerding, Haocheng Gao, Federico Sossai, Yebin Chon, David I. August, and Simone Campanoni
(Northwestern University, USA; Princeton University, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced
From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization
Shuaijiang Li, Jiacheng Zhao, Ying Liu, Shuoming Zhang, Lei Chen, Yijin Li, Yangyu Zhang, Zhicheng Li, Runyu Zhou, Xiyu Shi, Chunwei Xia, Yuan Wen, Xiaobing Feng, and Huimin Cui
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; University of Leeds, UK; University of Aberdeen, UK; XCORESIGMA, China)
Publisher's Version

Binary / JIT

Binary Diffing via Library Signatures
Andrei Rimsa, Anderson Faustino da Silva, Camilo Santana, and Fernando Magno Quintão Pereira
(CEFET-MG, Brazil; State University of Maringá, Brazil; Federal University of Minas Gerais, Brazil)
Publisher's Version Published Artifact Info Artifacts Available Artifacts Functional Results Reproduced
BIT: Empowering Binary Analysis through the LLVM Toolchain
Puzhuo Liu, Peng Di, Jingling Xue, and Yu Jiang
(Ant Group, China; Tsinghua University, China; UNSW, Australia)
Publisher's Version
Dr.avx: A Dynamic Compilation System for Seamlessly Executing Hardware-Unsupported Vectorization Instructions
Yue Tang, Mianzhi Wu, Yufeng Li, Haoyu Liao, Jianmei Guo, and Bo Huang
(East China Normal University, China)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional
Practical: Are Abstract-Interpreter Baseline JITs Worth It? An Empirical Evaluation through Metacompilation
Nahuel Palumbo, Guillermo Polito, Stéphane Ducasse, and Pablo Tesone
(Univ. Lille - Inria - CNRS - Centrale Lille - UMR 9189 CRIStAL, France)
Publisher's Version Archive submitted (77 kB)

Code Generation

TPDE: A Fast Adaptable Compiler Back-End Framework
Tobias Schwarz, Tobias Kamm, and Alexis Engelke
(TU Munich, Germany)
Publisher's Version Published Artifact Archive submitted (70 kB) Info Artifacts Available Artifacts Reusable Results Reproduced
Synthesizing Instruction Selection Back-Ends from ISA Specifications Made Practical
Florian Drescher and Alexis Engelke
(TU Munich, Germany)
Publisher's Version
SparseX: Synergizing GPU Libraries for Sparse Matrix Multiplication on Heterogeneous Processors
Ruifeng Zhang, Xiangwei Wang, Ang Li, and Xipeng Shen
(North Carolina State University, USA; Pacific Northwest National Laboratory, USA; University of Washington, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Compilation of Generalized Matrix Chains with Symbolic Sizes
Francisco López, Lars Karlsson, and Paolo Bientinesi
(Umeå University, Sweden)
Publisher's Version Published Artifact Archive submitted (350 kB) Artifacts Available Artifacts Reusable Results Reproduced

Profiling / Instrumentation

TRACE4J: A Lightweight, Flexible, and Insightful Performance Tracing Tool for Java
Haide He and Pengfei Su
(University of California at Merced, USA)
Publisher's Version Published Artifact Info Artifacts Available Artifacts Functional Results Reproduced
Proton: Towards Multi-level, Adaptive Profiling for Triton
Keren Zhou, Tianle Zhong, Hao Wu, Jihyeong Lee, Yue Guan, Yufei Ding, Corbin Robeck, Yuanwei Fang, Jeff Niu, and Philippe Tillet
(George Mason University, USA; OpenAI, USA; University of Virginia, USA; University of California at San Diego, USA; Meta, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
On the Precision of Dynamic Program Fingerprints Based on Performance Counters
Anderson Faustino da Silva, Marcelo Borges Nogueira, Sérgio Queiroz de Medeiros, Jeronimo Castrillon, and Fernando Magno Quintão Pereira
(State University of Maringá, Brazil; Federal University of Rio Grande do Norte, Brazil; TU Dresden, Germany; Federal University of Minas Gerais, Brazil)
Publisher's Version Published Artifact Artifacts Available
PASTA: A Modular Program Analysis Tool Framework for Accelerators
Mao Lin, Hyeran Jeon, and Keren Zhou
(University of California at Merced, USA; George Mason University, USA; OpenAI, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced

Analysis

PIP: Making Andersen’s Points-to Analysis Sound and Practical for Incomplete C Programs
Håvard Rognebakke Krogstie, Helge Bahmann, Magnus Själander, and Nico Reissmann
(NTNU, Norway; Independent Researcher, Switzerland; Independent Researcher, Norway)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Thinking Fast and Correct: Automated Rewriting of Numerical Code through Compiler Augmentation
Siyuan Brant Qian, Vimarsh Sathia, Ivan R. Ivanov, Jan Hückelheim, Paul Hovland, and William S. Moses
(University of Illinois at Urbana-Champaign, USA; Institute of Science Tokyo, Japan; RIKEN RCCS, Japan; Argonne National Laboratory, USA)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
PolyUFC: Polyhedral Compilation Meets Roofline Analysis for Uncore Frequency Capping
Nilesh Rajendra Shah, M V V S Manoj Kumar, Dhairya Baxi, and Ramakrishna Upadrasta
(IIT Hyderabad, India)
Publisher's Version Info
Accelerating App Recompilation across Android System Updates by Code Reusing
Hongtao Wu, Yu Chen, Mengfei Xie, Futeng Yang, Jun Yan, Jiang Ma, Jianming Fu, Chun Jason Xue, and Qingan Li
(Wuhan University, China; Guangdong OPPO Mobile Telecommunications, China; Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates)
Publisher's Version

Compiling for ML 2

QIGen: A Kernel Generator for Inference on Nonuniformly Quantized Large Language Models
Tommaso Pegolotti, Dan Alistarh, and Markus Püschel
(ETH Zurich, Switzerland; IST Austria, Austria)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional
DyPARS: Dynamic-Shape DNN Optimization via Pareto-Aware MCTS for Graph Variants
Hao Qian, Guangli Li, Qiuchu Yu, Xueying Wang, and Jingling Xue
(UNSW, Australia; Institute of Computing Technology at Chinese Academy of Sciences, China; Beijing University of Posts and Telecommunications, China)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Compiler-Runtime Co-operative Chain of Verification for LLM-Based Code Optimization
Hyunho Kwon, Sanggyu Shin, Ju Min Lee, Hoyun Youm, Seungbin Song, Seongho Kim, Hanwoong Jung, Seungwon Lee, and Hanjun Kim
(Yonsei University, Republic of Korea; SAIT, Republic of Korea)
Publisher's Version
Hexcute: A Compiler Framework for Automating Layout Synthesis in GPU Programs
Xiao Zhang, Yaoyao Ding, Bolin Sun, Yang Hu, Tatiana Shpeisman, and Gennady Pekhimenko
(University of Toronto, Canada; NVIDIA, Canada; Vector Institute, Canada)
Publisher's Version Published Artifact Archive submitted (3.6 MB) Artifacts Available Artifacts Reusable Results Reproduced

Tensor Optimization

Multidirectional Propagation of Sparsity Information across Tensor Slices
Kaio Henrique Andrade Ananias, Danila Seliayeu, J. Nelson Amaral, and Fernando Magno Quintão Pereira
(Federal University of Minas Gerais, Brazil; University of Alberta, Canada)
Publisher's Version Published Artifact Artifacts Available Artifacts Reusable Results Reproduced
Synthesizing Specialized Sparse Tensor Accelerators for FPGAs via High-Level Functional Abstractions
Hamza Javed and Christophe Dubach
(McGill University, Canada; Mila, Canada)
Publisher's Version
Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy
Fan Luo, Guangli Li, Zhaoyang Hao, Xueying Wang, Xiaobing Feng, Huimin Cui, and Jingling Xue
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; UNSW, Australia; Beijing University of Posts and Telecommunications, China)
Publisher's Version
Tensor Program Superoptimization through Cost-Guided Symbolic Program Synthesis
Alexander Brauckmann, Aarsh Chaube, José Wesley de Souza Magalhães, Elizabeth Polgreen, and Michael F. P. O’Boyle
(University of Edinburgh, UK)
Publisher's Version Published Artifact Archive submitted (44 kB) Artifacts Available Artifacts Reusable Results Reproduced

Optimization

A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler
Mohammed Tirichine, Nassim Ameur, Nazim Bendib, Iheb Nassim Aouadj, Djad Bouchama, Rafik Bouloudene, and Riyadh Baghdadi
(NYU Abu Dhabi, United Arab Emirates; École Nationale Supérieure d’Informatique, Algeria; University of Science and Technology Houari Boumediene, Algeria)
Publisher's Version Published Artifact Archive submitted (130 kB) Artifacts Available Artifacts Reusable Results Reproduced
Towards Threading the Needle of Debuggable Optimized Binaries
Cristian Assaiante, Simone Di Biasio, Snehasish Kumar, Giuseppe Antonio Di Luna, Daniele Cono D'Elia, and Leonardo Querzoni
(Sapienza University of Rome, Italy; Google, USA)
Publisher's Version Published Artifact Archive submitted (220 kB) Artifacts Available Artifacts Reusable
Compiler-Assisted Instruction Fusion
Ravikiran Ravindranath Reddy, Sawan Singh, Arthur Perais, Alberto Ros, and Alexandra Jimborean
(University of Murcia, Spain; Univ. Grenoble Alpes - CNRS - Grenoble INP - TIMA, France)
Publisher's Version Archive submitted (37 kB)
LLM-VeriOpt: Verification-Guided Reinforcement Learning for LLM-Based Compiler Optimization
Xiangxin Fang, Jiaqin Kang, Rodrigo Rocha, Sam Ainsworth, and Lev Mukhanov
(Queen Mary University of London, UK; University of Edinburgh, UK)
Publisher's Version Published Artifact Artifacts Available Artifacts Functional Results Reproduced
