2026 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), January 31 – February 4, 2026, Sydney, Australia
Frontmatter
Compiling for ML 1
Enabling Spill-Free Compilation via Affine-Based Live Range Reduction Optimization
Prasanth Chatarasi,
Alex Gatea,
Wei Wang,
Chris Bowler,
Shubham Jain,
Masoud Ataei Jaliseh,
Nicole Khoun,
Alberto Mannari,
Bardia Mahjour,
Viji Srinivasan, and
Swagath Venkataramani
(IBM Research, USA; IBM, Canada; IBM, Switzerland)
Publisher's Version
Archive submitted (72 kB)
Eliminating Redundancy: Ultra-compact Code Generation for Programmable Dataflow Accelerators
Prasanth Chatarasi,
Alex Gatea,
Bardia Mahjour,
Jintao Zhang,
Alberto Mannari,
Chris Bowler,
Shubham Jain,
Masoud Ataei Jaliseh,
Nicole Khoun,
Kamlesh Kumar,
Viji Srinivasan, and
Swagath Venkataramani
(IBM Research, USA; IBM, Canada; IBM, Switzerland)
Publisher's Version
Security
FHEFusion: Enabling Operator Fusion in FHE Compilers for Depth-Efficient DNN Inference
Tianxiang Sui,
Jianxin Lai,
Long Li,
Peng Yuan,
Yan Liu,
Qing Zhu,
Xiaojing Zhang,
Linjie Xiao,
Mingzhe Zhang, and
Jingling Xue
(Ant Group, China; UNSW, Australia)
Publisher's Version
Published Artifact
Archive submitted (140 kB)
Artifacts Available
Artifacts Reusable
Results Reproduced
Abstractions
Partial-Evaluation Templates: Accelerating Partial Evaluation with Pre-compiled Templates
Florian Huemer,
Aleksandar Prokopec,
David Leopoldseder,
Raphael Mosaner, and
Hanspeter Mössenböck
(JKU Linz, Austria; Oracle Labs, Zurich, Switzerland; Oracle Labs, Vienna, Austria; Oracle Labs, Linz, Austria)
Publisher's Version
Ember: A Compiler for Embedding Operations on Decoupled Access-Execute Architectures
Marco Siracusa,
Olivia Hsu,
Víctor Soria-Pardos,
Joshua Randall,
Arnaud Grasset,
Eric Biscondi,
Doug Joseph,
Randy Allen,
Fredrik Kjolstad,
Miquel Moretó Planas, and
Adrià Armejach
(Barcelona Supercomputing Center, Spain; Stanford University, USA; Carnegie Mellon University, USA; Arm, USA; Universitat Politècnica de Catalunya, Spain)
Publisher's Version
Published Artifact
Archive submitted (100 kB)
Artifacts Available
Memory
VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis
Arjun H. Kumar,
Bhavya Hirani,
Hang Shao,
Tobi Ajila,
Vijay Sundaresan,
Daryl Maier, and
Manas Thakur
(IIT Mandi, India; Sardar Vallabhbhai National Institute of Technology, Surat, India; IBM, Canada; IIT Bombay, India)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
FRUGAL: Pushing GPU Applications beyond Memory Limits
Lingqi Zhang,
Tengfei Wang,
Jiajun Huang,
Chen Zhuang,
Ivan R. Ivanov,
Peng Chen,
Toshio Endo, and
Mohamed Wahib
(RIKEN RCCS, Japan; Google Cloud, Japan; University of South Florida, USA; Institute of Science Tokyo, Japan)
Publisher's Version
Archive submitted (510 kB)
DSLs
Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References
Hongzheng Chen,
Bin Fan,
Alexander Collins,
Bastian Hagedorn,
Evghenii Gaburov,
Masahiro Masuda,
Matthew Brookhart,
Chris Sullivan,
Jason Knight,
Zhiru Zhang, and
Vinod Grover
(Cornell University, USA; NVIDIA, USA; NVIDIA, UK; NVIDIA, Germany)
Publisher's Version
Quantum / HLS
Parallelization / Vectorization
From Threads to Tiles: T2T, a Compiler for CUDA-to-NPU Translation via 2D Vectorization
Shuaijiang Li,
Jiacheng Zhao,
Ying Liu,
Shuoming Zhang,
Lei Chen,
Yijin Li,
Yangyu Zhang,
Zhicheng Li,
Runyu Zhou,
Xiyu Shi,
Chunwei Xia,
Yuan Wen,
Xiaobing Feng, and
Huimin Cui
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; University of Leeds, UK; University of Aberdeen, UK; XCORESIGMA, China)
Publisher's Version
Binary / JIT
Code Generation
Profiling / Instrumentation
Proton: Towards Multi-level, Adaptive Profiling for Triton
Keren Zhou,
Tianle Zhong,
Hao Wu,
Jihyeong Lee,
Yue Guan,
Yufei Ding,
Corbin Robeck,
Yuanwei Fang,
Jeff Niu, and
Philippe Tillet
(George Mason University, USA; OpenAI, USA; University of Virginia, USA; University of California at San Diego, USA; Meta, USA)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
Analysis
Accelerating App Recompilation across Android System Updates by Code Reusing
Hongtao Wu,
Yu Chen,
Mengfei Xie,
Futeng Yang,
Jun Yan,
Jiang Ma,
Jianming Fu,
Chun Jason Xue, and
Qingan Li
(Wuhan University, China; Guangdong OPPO Mobile Telecommunications, China; Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates)
Publisher's Version
Compiling for ML 2
Compiler-Runtime Co-operative Chain of Verification for LLM-Based Code Optimization
Hyunho Kwon,
Sanggyu Shin,
Ju Min Lee,
Hoyun Youm,
Seungbin Song,
Seongho Kim,
Hanwoong Jung,
Seungwon Lee, and
Hanjun Kim
(Yonsei University, Republic of Korea; SAIT, Republic of Korea)
Publisher's Version
Tensor Optimization
Progressive Low-Precision Approximation of Tensor Operators on GPUs: Enabling Greater Trade-Offs between Performance and Accuracy
Fan Luo,
Guangli Li,
Zhaoyang Hao,
Xueying Wang,
Xiaobing Feng,
Huimin Cui, and
Jingling Xue
(Institute of Computing Technology at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; UNSW, Australia; Beijing University of Posts and Telecommunications, China)
Publisher's Version
Optimization