31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2026), January 31 – February 4, 2026, Sydney, NSW, Australia
Frontmatter
Concurrency Control
Scheduling and Load Balancing
Concurrent Data Structures
Concurrent Balanced Augmented Trees
Evan Wrench,
Ajay Singh,
Younghun Roh,
Panagiota Fatourou,
Siddhartha Jayanti,
Eric Ruppert, and
Yuanhao Wei
(University of British Columbia, Canada; ICS-FORTH, Greece; Massachusetts Institute of Technology, USA; University of Crete, Greece; Dartmouth College, USA; York University, Canada)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
GPU and Heterogeneous Computing
PRISM: An Efficient GPU-Based Lossy Compression Framework for Progressive Data Retrieval with Multi-Level Interpolation
Bing Lu,
Zedong Liu,
Hairui Zhao,
Dejun Luo,
Wenjing Huang,
Yida Gu,
Jinyang Liu,
Guangming Tan, and
Dingwen Tao
(Institute of Computing Technology at Chinese Academy of Sciences, China; Jilin University, China; University of Chinese Academy of Sciences, China; University of California at Riverside, USA)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities
Weile Luo,
Yuhan Chen,
Xiangrui Yu,
Qiang Wang,
Ruibo Fan,
Hongyuan Liu, and
Xiaowen Chu
(Hong Kong University of Science and Technology (Guangzhou), China; Harbin Institute of Technology, Shenzhen, China; Stevens Institute of Technology, USA; Hong Kong University of Science and Technology, Hong Kong)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
Stencil and Sparse Matrix Computation
Mixed Precision and Quantization
HierCut: Enabling 16-bit Format Mixed Precision for Molecular Dynamics through Hierarchical Cutoff
Zeyu Song,
Lin Gan,
Xiaohui Duan,
Zhengrui Li,
Jiayu Fu,
Yinuo Wang,
Guangzhao Li, and
Guangwen Yang
(Tsinghua University, China; Shandong University, China; Institute of Software at Chinese Academy of Sciences, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
Cluster and Cloud Computing
Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds
Xiaokang Hu,
Yuchao Cao,
Naixuan Guan,
Yifan Wu,
Xishi Qiu,
Shengdong Dai,
Ben Luo,
Sanchuan Cheng,
Fudong Qiu,
Yibin Shen, and
Jiesheng Wu
(Alibaba Cloud Computing, China)
Publisher's Version
Distributed Training
COCCL: A Collective Communication Library Supporting Easy Integration and Configuration of Customized Compression for Scalable LLM Training
Xingchen Liu,
Haoran Kong,
Hairui Zhao,
Shengkai Lyu,
Zheng Wei,
Man Liu,
Xingjian Tian,
Liyang Zhao,
Zhuohan Chen,
Fakang Wang,
Zizhong Chen,
Zhan Wang,
Guangming Tan, and
Dingwen Tao
(University of Chinese Academy of Sciences, China; Shenzhen Loop Area Institute, China; Chinese University of Hong Kong, Shenzhen, China; Jilin University, China; Ant Group, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training
Yida Gu,
Fakang Wang,
Jianhao Fu,
Zhenhang Sun,
Qianyu Zhang,
Hairui Zhao,
Xingchen Liu,
Yang Tian,
Wenjing Huang,
Zedong Liu,
Yifan Chen,
Jinwu Yang,
Yueyuan Zhou,
Qian Zhao,
Haoxu Li,
Tao Wang,
Feng Yu,
Zhan Wang,
Guangming Tan, and
Dingwen Tao
(University of Chinese Academy of Sciences, China; Ant Group, China; Jilin University, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Parallel Algorithms
Pipelonk: Accelerating End-to-End Zero-Knowledge Proof Generation on GPUs for PLONK-Based Protocols
Zhiyuan Zhang,
Yanxin Cai,
Wenhao Yin,
Xueyu Wu,
Yi Wang,
Lei Ju, and
Zhuoran Ji
(Shandong University, China; Quan Cheng Laboratory, China; University of Hong Kong, China; Shenzhen University, China; State Key Laboratory of Cryptography and Digital Economy Security, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
ParDiff: Efficiently Parallelizing Reverse-Mode Automatic Differentiation with Direct Indexing
Shuhong Huang,
Shizhi Tang,
Yuan Wen,
Huanqi Cao,
Ruibai Tang,
Yidong Chen,
Jiping Yu,
Yang Li,
Chao Jiang,
Limin Xiao, and
Jidong Zhai
(Tsinghua University, China; Qingcheng.AI, China; University of Aberdeen, UK; Lenovo Research, China)
Publisher's Version
PIM-zd-tree: A Fast Space-Partitioning Index Leveraging Processing-in-Memory
Yiwei Zhao,
Hongbo Kang,
Ziyang Men,
Yan Gu,
Guy E. Blelloch,
Laxman Dhulipala,
Charles McGuffey, and
Phillip B. Gibbons
(Carnegie Mellon University, USA; Tsinghua University, China; University of California at Riverside, USA; University of Maryland, USA; Reed College, USA)
Publisher's Version
ML Inference
Graphs and Graph Neural Networks
ElasGNN: An Elastic Training Framework for Distributed GNN Training
Siqi Wang,
Hailong Yang,
Pengbo Wang,
Hongliang Cao,
Yufan Xu,
Xuezhu Wang,
Zhongzhi Luan,
Yi Liu, and
Depei Qian
(Beihang University, China; Independent Researcher, USA)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
TAC: Cache-Based System for Accelerating Billion-Scale GNN Training on Multi-GPU Platform
Zhiqiang Liang,
Hongyu Gao,
Jue Wang,
Fang Liu,
Xingguo Shi,
Junyu Gu,
Peng Di,
Sian Li,
Lei Tang,
Chunbao Zhou,
Lian Zhao,
Yangang Wang, and
Xuebin Chi
(Computer Network Information Center at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Ant Group, China; UNSW, Australia)
Publisher's Version
DTMiner: A Data-Centric System for Efficient Temporal Motif Mining
Yinbo Hou,
Hao Qi,
Ligang He,
Jin Zhao,
Yu Zhang,
Hui Yu,
Longlong Lin,
Lin Gu,
Wenbin Jiang,
Xiaofei Liao, and
Hai Jin
(Huazhong University of Science and Technology, China; University of Warwick, UK; Hong Kong University of Science and Technology, China; Southwest University, China)
Publisher's Version
Optimizing Transformers
FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism
Jianxing Xu,
Yuanbo Wen,
Jun Bi,
Ruibai Xu,
Guanglin Xu,
Rui Zhang,
Wei Li,
Ling Li,
Tianshi Chen,
Qi Guo, and
Yunji Chen
(University of Science and Technology of China, China; Institute of Computing Technology at Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Cambricon Technologies, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
Accelerating Sparse Transformer Inference on GPU
Wenhao Dai,
Haodong Deng,
Mengfei Rong,
Xinyu Yang,
Hongyu Liu,
Fangxin Liu,
Hailong Yang,
Qianwen Cao, and
Qingxiao Sun
(China University of Petroleum-Beijing, China; Beihang University, China; Baidu, China; Shanghai Jiao Tong University, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Functional
Results Reproduced
MetaAttention: A Unified and Performant Attention Framework across Hardware Backends
Feiyang Chen,
Yu Cheng,
Lei Wang,
Yuqing Xia,
Ziming Miao,
Lingxiao Ma,
Fan Yang,
Jilong Xue,
Zhi Yang,
Mao Yang,
Xingda Wei, and
Haibo Chen
(Shanghai Jiao Tong University, China; Peking University, China; Microsoft Research, China)
Publisher's Version
Published Artifact
Artifacts Available
Artifacts Reusable
Results Reproduced
Matrix and Linear Algebra Algorithms
A Diagonal Block Memory-Aware Polynomial Preconditioner for Linear and Eigenvalue Solvers
Xiaojian Yang,
Yuhui Ni,
Fan Yuan,
Shengguo Li,
Dezun Dong,
Chuanfu Xu,
Haipeng Jia, and
Jie Liu
(National University of Defense Technology, China; Xiangtan University, China; University of Chinese Academy of Sciences, China)
Publisher's Version