FSE 2025

Proceedings of the ACM on Software Engineering, Volume 2, Number FSE, June 23–27, 2025, Trondheim, Norway

FSE – Journal Issue


Frontmatter

Title Page


Editorial Message


Sponsors


Papers

Gleipner: A Benchmark for Gadget Chain Detection in Java Deserialization Vulnerabilities
Bruno Kreyssig and Alexandre Bartel
(Umeå University, Sweden)


Detecting Smart Contract State-Inconsistency Bugs via Flow Divergence and Multiplex Symbolic Execution
Yinxi Liu, Wei Meng, and Yinqian Zhang
(Rochester Institute of Technology, USA; Chinese University of Hong Kong, China; Southern University of Science and Technology, China)


MendelFuzz: The Return of the Deterministic Stage
Han Zheng, Flavio Toffalini, Marcel Böhme, and Mathias Payer
(EPFL, Switzerland; Ruhr-Universität Bochum, Germany; MPI-SP, Germany)
Can a fuzzer cover more code with minimal corruption of the initial seed? Before a seed is fuzzed, the early greybox fuzzers first systematically enumerated slightly corrupted inputs by applying every mutation operator to every part of the seed, once per generated input. The hope of this so-called “deterministic” stage was that simple changes to the seed would be less likely to break the complex file format; the resulting inputs would find bugs in the program logic well beyond the program’s parser. However, when experiments showed that disabling the deterministic stage and instead applying multiple mutation operators at the same time to a single input achieves more coverage, most fuzzers disabled the deterministic stage by default. Instead of ignoring the deterministic stage, we analyze its potential and substantially improve deterministic exploration. Our deterministic stage is now the default in AFL++, reverting the earlier decision of dropping deterministic exploration. We start by investigating the overhead and the contribution of the deterministic stage to the discovery of coverage-increasing inputs. While the sheer number of generated inputs explains the overhead, we find that only a few critical seeds (20%) and only a few critical bytes in a seed (0.5%) are responsible for the vast majority of the coverage-increasing inputs (83% and 84%, respectively). Hence, we develop an efficient technique, called MendelFuzz, to identify these critical seeds and bytes so as to prune a large number of unnecessary inputs. MendelFuzz retains the benefits of the classic deterministic stage while enumerating only a tiny part of the total deterministic state space. We evaluate our implementation on two benchmarking frameworks, FuzzBench and Magma. Our evaluation shows that MendelFuzz outperforms state-of-the-art fuzzers with and without the (old) deterministic stage enabled, both in terms of coverage and bug finding. MendelFuzz also discovered 8 new CVEs on exhaustively fuzzed security-critical applications. Finally, MendelFuzz has been independently evaluated and integrated into AFL++ as the default option.
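A minimal Python sketch of the critical-byte pruning idea described above, assuming a hypothetical coverage_of oracle; all names and the probing strategy are illustrative, not MendelFuzz's actual implementation:

def find_critical_offsets(seed: bytes, coverage_of) -> list[int]:
    """Probe each byte once and keep offsets whose corruption changes coverage."""
    baseline = coverage_of(seed)
    critical = []
    for i in range(len(seed)):
        probe = bytearray(seed)
        probe[i] ^= 0xFF  # one cheap probe per byte position
        if coverage_of(bytes(probe)) != baseline:
            critical.append(i)
    return critical

def deterministic_bitflips(seed: bytes, critical_offsets):
    """Enumerate single-bit flips only at critical offsets, pruning the
    vast majority of the deterministic state space."""
    for i in critical_offsets:
        for bit in range(8):
            mutant = bytearray(seed)
            mutant[i] ^= 1 << bit
            yield bytes(mutant)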

SmartShot: Hunt Hidden Vulnerabilities in Smart Contracts using Mutable Snapshots
Ruichao Liang, Jing Chen, Ruochen Cao, Kun He, Ruiying Du, Shuhua Li, Zheng Lin, and Cong Wu
(Wuhan University, China; University of Hong Kong, China)


On-Demand Scenario Generation for Testing Automated Driving Systems
Songyang Yan, Xiaodong Zhang, Kunkun Hao, Haojie Xin, Yonggang Luo, Jucheng Yang, Ming Fan, Chao Yang, Jun Sun, and Zijiang Yang
(Xi'an Jiaotong University, China; Xidian University, China; Synkrotron, China; Chongqing Changan Automobile, China; Singapore Management University, Singapore; University of Science and Technology of China, China)


Element-Based Automated DNN Repair with Fine-Tuned Masked Language Model
Xu Wang, Mingming Zhang, Xiangxin Meng, Jian Zhang, Yang Liu, and Chunming Hu
(Beihang University, China; Nanyang Technological University, Singapore)


Mystique: Automated Vulnerability Patch Porting with Semantic and Syntactic-Enhanced LLM
Susheng Wu, Ruisi Wang, Yiheng Cao, Bihuan Chen, Zhuotong Zhou, Yiheng Huang, JunPeng Zhao, and Xin Peng
(Fudan University, China)


Smart Contract Fuzzing Towards Profitable Vulnerabilities
Ziqiao Kong, Cen Zhang, Maoyi Xie, Ming Hu, Yue Xue, Ye Liu, Haijun Wang, and Yang Liu
(Nanyang Technological University, Singapore; Singapore Management University, Singapore; MetaTrust Labs, Singapore; Xi'an Jiaotong University, China)


CKTyper: Enhancing Type Inference for Java Code Snippets by Leveraging Crowdsourcing Knowledge in Stack Overflow
Anji Li, Neng Zhang, Ying Zou, Zhixiang Chen, Jian Wang, and Zibin Zheng
(Sun Yat-sen University, China; Central China Normal University, China; Queen's University, Canada; Wuhan University, China)


Ransomware Detection through Temporal Correlation between Encryption and I/O Behavior
Lihua Guo, Yiwei Hou, Chijin Zhou, Quan Zhang, and Yu Jiang
(Tsinghua University, China)


DeclarUI: Bridging Design and Development with Automated Declarative UI Code Generation
Ting Zhou, Yanjie Zhao, Xinyi Hou, Xiaoyu Sun, Kai Chen, and Haoyu Wang
(Huazhong University of Science and Technology, China; Australian National University, Australia)


COFFE: A Code Efficiency Benchmark for Code Generation
Yun Peng, Jun Wan, Yichen Li, and Xiaoxue Ren
(Chinese University of Hong Kong, China; Zhejiang University, China)
Code generation has greatly improved development efficiency in the era of large language models (LLMs). With the ability to follow instructions, current LLMs can be prompted to generate code solutions given detailed descriptions in natural language. Many research efforts are devoted to improving the correctness of LLM-generated code, and many benchmarks have been proposed to evaluate correctness comprehensively. Despite this focus on correctness, the time efficiency of LLM-generated code solutions remains under-explored. Current correctness benchmarks are not suitable for time efficiency evaluation, since their test cases cannot distinguish well between the time efficiency of different code solutions. Moreover, current execution time measurement is neither stable nor comprehensive, threatening the validity of time efficiency evaluation. To address these challenges, we propose COFFE, a code generation benchmark for evaluating the time efficiency of LLM-generated code solutions. COFFE contains 398 and 358 problems for function-level and file-level code generation, respectively. To improve distinguishability, we design a novel stressful test case generation approach with contracts and two new formats of test cases to improve the accuracy of generation. For the time evaluation metric, we propose efficient@k based on CPU instruction count to ensure a stable and solid comparison between different solutions. We evaluate 14 popular LLMs on COFFE and report four findings. Based on these findings, we draw implications for LLM researchers and software practitioners to facilitate future research on and usage of LLMs for code generation.
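Since the abstract does not spell out the metric, the following Python sketch shows one plausible reading of an instruction-count-based efficient@k-style score; normalizing the best of k samples against a reference solution is an assumption here, not COFFE's exact definition:

import math

def efficiency_at_k(instr_counts: list[float], ref_count: float) -> float:
    """Score the best of k generated solutions by CPU instruction count.

    instr_counts holds one count per sample (math.inf for incorrect ones);
    1.0 means as cheap as the reference, 0.0 means no sample was correct.
    """
    best = min(instr_counts)
    if math.isinf(best):
        return 0.0
    return min(1.0, ref_count / best)

# Two of three samples are correct; the best one needs twice the
# reference's instructions, so the score is 0.5.
print(efficiency_at_k([math.inf, 3_500_000, 2_000_000], ref_count=1_000_000))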

Pinning Is Futile: You Need More Than Local Dependency Versioning to Defend against Supply Chain Attacks
Hao He, Bogdan Vasilescu, and Christian Kästner
(Carnegie Mellon University, USA)


An Empirical Study of Suppressed Static Analysis Warnings
Huimin Hu, Yingying Wang, Julia Rubin, and Michael Pradel
(University of Stuttgart, Germany; University of British Columbia, Canada)
Scalable static analyzers are popular tools for finding incorrect, inefficient, insecure, and hard-to-maintain code early during the development process. Because not all warnings reported by a static analyzer are immediately useful to developers, many static analyzers provide a way to suppress warnings, e.g., in the form of special comments added into the code. Such suppressions are an important mechanism at the interface between static analyzers and software developers, but little is currently known about them. This paper presents the first in-depth empirical study of suppressions of static analysis warnings, addressing questions about the prevalence of suppressions, their evolution over time, the relationship between suppressions and warnings, and the reasons for using suppressions. We answer these questions by studying projects written in three popular languages and suppressions for warnings by four popular static analyzers. Our findings show that (i) suppressions are relatively common, e.g., with a total of 7,357 suppressions in 46 Python projects, (ii) the number of suppressions in a project tends to continuously increase over time, (iii) surprisingly, 50.8% of all suppressions do not affect any warning and hence are practically useless, (iv) some suppressions, including useless ones, may unintentionally hide future warnings, and (v) common reasons for introducing suppressions include false positives, suboptimal configurations of the static analyzer, and misleading warning messages. These results have actionable implications, e.g., that developers should be made aware of useless suppressions and the potential risk of unintentional suppression, that static analyzers should provide better warning messages, and that static analyzers should separately categorize warnings from third-party libraries.
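To make concrete what such a suppression looks like, the following Python sketch mines pylint's inline disable comments, one of the suppression mechanisms a study of this kind covers; the regex and output format are simplifications, not the authors' tooling:

import re
from pathlib import Path

SUPPRESSION = re.compile(r"#\s*pylint:\s*disable=([\w\-,\s]+)")

def collect_suppressions(root: str):
    """Return (file, line, warning-name) triples for inline pylint disables."""
    found = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            match = SUPPRESSION.search(line)
            if match:
                for name in match.group(1).split(","):
                    found.append((str(path), lineno, name.strip()))
    return found

# Deciding whether a suppression is "useless" (affects no warning) requires
# re-running the analyzer with the suppression removed, which is omitted here.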

Towards Diverse Program Transformations for Program Simplification
Haibo Wang, Zezhong Xing, Chengnian Sun, Zheng Wang, and Shin Hwei Tan
(Concordia University, Canada; Southern University of Science and Technology, China; University of Waterloo, Canada; University of Leeds, UK)


Today’s Cat Is Tomorrow’s Dog: Accounting for Time-Based Changes in the Labels of ML Vulnerability Detection Approaches
Ranindya Paramitha, Yuan Feng, and Fabio Massacci
(University of Trento, Italy; Vrije Universiteit Amsterdam, Netherlands)


A New Approach to Evaluating Nullability Inference Tools
Nima Karimipour, Erfan Arvan, Martin Kellogg, and Manu Sridharan
(University of California at Riverside, USA; New Jersey Institute of Technology, USA)


A Comprehensive Study of Bug-Fix Patterns in Autonomous Driving Systems
Yuntianyi Chen, Yuqi Huai, Yirui He, Shilong Li, Changnam Hong, Qi Alfred Chen, and Joshua Garcia
(University of California at Irvine, USA)
As autonomous driving systems (ADSes) become increasingly complex and integral to daily life, the importance of understanding the nature and mitigation of software bugs in these systems has grown correspondingly. Addressing the challenges of software maintenance in autonomous driving systems (e.g., handling real-time system decisions and ensuring safety-critical reliability) is crucial due to the unique combination of real-time decision-making requirements and the high stakes of operational failures in ADSes. The potential of automated tools in this domain is promising, yet there remains a gap in our comprehension of the challenges faced and the strategies employed during manual debugging and repair of such systems. In this paper, we present an empirical study that investigates bug-fix patterns in ADSes, with the aim of improving reliability and safety. We analyzed the commit histories and bug reports of two major autonomous driving projects, Apollo and Autoware, studying the symptoms, root causes, and bug-fix patterns of 1,331 bug fixes. Our study reveals several dominant bug-fix patterns, including those related to path planning, data flow, and configuration management. Additionally, we find that the frequency distribution of bug-fix patterns varies significantly depending on their nature and types, and that certain categories of bugs are recurrent and more challenging to eliminate. Based on our findings, we propose a hierarchy of ADS bugs and two taxonomies of 15 syntactic bug-fix patterns and 27 semantic bug-fix patterns that offer guidance for bug identification and resolution. We also contribute a benchmark of 1,331 ADS bug-fix instances.

Preprint Artifacts Available
An Empirical Study on Release-Wise Refactoring Patterns
Shayan Noei, Heng Li, and Ying Zou
(Queen's University, Canada; Polytechnique Montréal, Canada)


Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang, Md Afif Al Mamun, Jie M. Zhang, and Gias Uddin
(Beijing University of Posts and Telecommunications, China; University of Calgary, Canada; King's College London, UK; York University, Canada)


One-for-All Does Not Work! Enhancing Vulnerability Detection by Mixture-of-Experts (MoE)
Xu Yang, Shaowei Wang, Jiayuan Zhou, and Wenhan Zhu
(University of Manitoba, Canada; Huawei, Canada)


LlamaRestTest: Effective REST API Testing with Small Language Models
Myeongsoo Kim, Saurabh Sinha, and Alessandro Orso
(Georgia Institute of Technology, USA; IBM Research, USA)


Code Change Intention, Development Artifact and History Vulnerability: Putting Them Together for Vulnerability Fix Detection by LLM
Xu Yang, Wenhan Zhu, Michael Pacheco, Jiayuan Zhou, Shaowei Wang, Xing Hu, and Kui Liu
(University of Manitoba, Canada; Huawei, Canada; Zhejiang University, China; Huawei Software Engineering Application Technology Lab, China)


QSF: Multi-objective Optimization Based Efficient Solving for Floating-Point Constraints
Xu Yang, Zhenbang Chen, Wei Dong, and Ji Wang
(National University of Defense Technology, China)


Adaptive Random Testing with Qgrams: The Illusion Comes True
Matteo Biagiola, Robert Feldt, and Paolo Tonella
(USI Lugano, Switzerland; Chalmers University of Technology, Sweden)


Understanding and Characterizing Mock Assertions in Unit Tests
Hengcheng Zhu, Valerio Terragni, Lili Wei, Shing-Chi Cheung, Jiarong Wu, and Yepang Liu
(Hong Kong University of Science and Technology, China; University of Auckland, New Zealand; McGill University, Canada; Southern University of Science and Technology, China)
Mock assertions provide developers with a powerful means to validate program behaviors that are unobservable to test assertions. Despite their significance, they are rarely considered by automated test generation techniques. Effective generation of mock assertions requires understanding how they are used in practice. Although previous studies highlighted the importance of mock assertions, none provide insight into their usage. To bridge this gap, we conducted the first empirical study on mock assertions, examining their adoption, the characteristics of the verified method invocations, and their effectiveness in fault detection. Our analysis of 4,652 test cases from 11 popular Java projects reveals that mock assertions are mostly applied to validating specific kinds of method calls, such as those interacting with external resources and those reflecting whether a certain code path was traversed in systems under test. Additionally, we find that mock assertions complement traditional test assertions by ensuring that the desired side effects have been produced, validating control flow logic, and checking internal computation results. Our findings contribute to a better understanding of mock assertion usage and provide a foundation for future related research, such as automated test generation that supports mock assertions.
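The following minimal sketch illustrates what a mock assertion is; it uses Python's unittest.mock for brevity, although the study examines Java projects, and the notify function and mailer object are invented for illustration:

from unittest import mock

def notify(mailer, user):
    """Send a welcome mail, a side effect invisible in the return value."""
    if user.get("active"):
        mailer.send(user["email"], "Welcome back!")

mailer = mock.Mock()
notify(mailer, {"active": True, "email": "a@example.com"})

# The mock assertion validates the interaction with the external resource:
# the mail was sent exactly once, with the expected arguments.
mailer.send.assert_called_once_with("a@example.com", "Welcome back!")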

Artifacts Available
Cross-System Categorization of Abnormal Traces in Microservice-Based Systems via Meta-Learning
Yuqing Wang, Mika V. Mäntylä, Serge Demeyer, Mutlu Beyazıt, Joanna Kisaakye, and Jesse Nyyssölä
(University of Helsinki, Finland; University of Oulu, Finland; University of Antwerp, Belgium)


Detecting and Handling WoT Violations by Learning Physical Interactions from Device Logs
Bingkun Sun, Shiqi Sun, Jialin Ren, Mingming Hu, Kun Hu, Liwei Shen, and Xin Peng
(Fudan University, China; Northwestern Polytechnical University, China)


The Landscape of Toxicity: An Empirical Investigation of Toxicity on GitHub
Jaydeb Sarker, Asif Kamal Turzo, and Amiangshu Bosu
(University of Nebraska at Omaha, USA; Wayne State University, USA)


DuoReduce: Bug Isolation for Multi-layer Extensible Compilation
Jiyuan Wang, Yuxin Qiu, Ben Limpanukorn, Hong Jin Kang, Qian Zhang, and Miryung Kim
(University of California at Los Angeles, USA; University of California at Riverside, USA)


ROSCallBaX: Statically Detecting Inconsistencies in Callback Function Setup of Robotic Systems
Sayali Kate, Yifei Gao, Shiwei Feng, and Xiangyu Zhang
(Purdue University, USA)


Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Yanlin Wang, Tianyue Jiang, Mingwei Liu, Jiachi Chen, Mingzhi Mao, Xilin Liu, Yuchi Ma, and Zibin Zheng
(Sun Yat-sen University, China; Huawei Cloud Computing Technologies, China)


Automated Unit Test Refactoring
Yi Gao, Xing Hu, Xiaohu Yang, and Xin Xia
(Zhejiang University, China)


HornBro: Homotopy-Like Method for Automated Quantum Program Repair
Siwei Tan, Liqiang Lu, Debin Xiang, Tianyao Chu, Congliang Lang, Jintao Chen, Xing Hu, and Jianwei Yin
(Zhejiang University, China)


Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead
Yanqi Su, Zhenchang Xing, Chong Wang, Chunyang Chen, Sherry (Xiwei) Xu, Qinghua Lu, and Liming Zhu
(Australian National University, Australia; CSIRO's Data61, Australia; Nanyang Technological University, Singapore; TU Munich, Germany; CSIRO, Australia; UNSW, Australia)


LLM-Based Method Name Suggestion with Automatically Generated Context-Rich Prompts
Waseem Akram, Yanjie Jiang, Yuxia Zhang, Haris Ali Khan, and Hui Liu
(Beijing Institute of Technology, China; Peking University, China)


Demystifying LLM-Based Software Engineering Agents
Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang
(University of Illinois at Urbana-Champaign, USA)


Standing on the Shoulders of Giants: Bug-Aware Automated GUI Testing via Retrieval Augmentation
Mengzhuo Chen, Zhe Liu, Chunyang Chen, Junjie Wang, Boyu Wu, Jun Hu, and Qing Wang
(Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; n.n., China; TU Munich, Germany)


PDCAT: Preference-Driven Compiler Auto-tuning
Mingxuan Zhu, Zeyu Sun, and Dan Hao
(Peking University, China; Institute of Software at Chinese Academy of Sciences, China)


Towards Understanding Docker Build Faults in Practice: Symptoms, Root Causes, and Fix Patterns
Yiwen Wu, Yang Zhang, Tao Wang, Bo Ding, and Huaimin Wang
(National University of Defense Technology, China)


Large Language Models for In-File Vulnerability Localization Can Be “Lost in the End”
Francesco Sovrano, Adam Bauer, and Alberto Bacchelli
(ETH Zurich, Switzerland; University of Zurich, Switzerland)


Empirically Evaluating the Impact of Object-Centric Breakpoints on the Debugging of Object-Oriented Programs
Valentin Bourcier, Pooja Rani, Maximilian Ignacio Willembrinck Santander, Alberto Bacchelli, and Steven Costiou
(University of Lille - Inria - CNRS - Centrale Lille - UMR 9189 CRIStAL, France; University of Zurich, Switzerland)


ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution
Lars Gröninger, Beatriz Souza, and Michael Pradel
(University of Stuttgart, Germany)


Integrating Large Language Models and Reinforcement Learning for Non-linear Reasoning
Yoav Alon and Cristina David
(University of Bristol, UK)


How Do Programming Students Use Generative AI?
Christian Rahe and Walid Maalej
(University of Hamburg, Germany)


LLMDroid: Enhancing Automated Mobile App GUI Testing Coverage with Large Language Model Guidance
Chenxu Wang, Tianming Liu, Yanjie Zhao, Minghui Yang, and Haoyu Wang
(Huazhong University of Science and Technology, China; Monash University, Australia; OPPO, China)


The Struggles of LLMs in Cross-Lingual Code Clone Detection
Micheline Bénédicte Moumoula, Abdoul Kader Kaboré, Jacques Klein, and Tegawendé F. Bissyandé
(University of Luxembourg, Luxembourg)


Has My Code Been Stolen for Model Training? A Naturalness Based Approach to Code Contamination Detection
Haris Ali Khan, Yanjie Jiang, Qasim Umer, Yuxia Zhang, Waseem Akram, and Hui Liu
(Beijing Institute of Technology, China; Peking University, China; King Fahd University of Petroleum and Minerals, Saudi Arabia)


Automated and Accurate Token Transfer Identification and Its Applications in Cryptocurrency Security
Shuwei Song, Ting Chen, Ao Qiao, Xiapu Luo, Leqing Wang, Zheyuan He, Ting Wang, Xiaodong Lin, Peng He, Wensheng Zhang, and Xiaosong Zhang
(University of Electronic Science and Technology of China, China; Hong Kong Polytechnic University, China; Stony Brook University, USA; University of Guelph, Canada)


Revolutionizing Newcomers’ Onboarding Process in OSS Communities: The Future AI Mentor
Xin Tan, Xiao Long, Yinghao Zhu, Lin Shi, Xiaoli Lian, and Li Zhang
(Beihang University, China)


Impact of Request Formats on Effort Estimation: Are LLMs Different Than Humans?
Gül Çalıklı and Mohammed Alhamed
(University of Glasgow, UK; Applied Behaviour Systems, UK)


Mitigating Emergent Malware Label Noise in DNN-Based Android Malware Detection
Haodong Li, Xiao Cheng, Guohan Zhang, Guosheng Xu, Guoai Xu, and Haoyu Wang
(Beijing University of Posts and Telecommunications, China; UNSW, Australia; Harbin Institute of Technology, Shenzhen, China; Huazhong University of Science and Technology, China)


Detecting Metadata-Related Bugs in Enterprise Applications
Md Mahir Asef Kabir, Xiaoyin Wang, and Na Meng
(Virginia Tech, USA; University of Texas at San Antonio, USA)
When building enterprise applications (EAs) on Java frameworks (e.g., Spring), developers often configure application components via metadata (i.e., Java annotations and XML files). It is challenging for developers to correctly use metadata, because the usage rules can be complex and existing tools provide limited assistance. When developers misuse metadata, EAs become misconfigured, and the resulting defects can trigger erroneous runtime behaviors or introduce security vulnerabilities. To help developers correctly use metadata, this paper presents (1) RSL, a domain-specific language that domain experts can adopt to prescribe metadata checking rules, and (2) MeCheck, a tool that takes in RSL rules and EAs to check for rule violations. With RSL, domain experts (e.g., developers of a Java framework) can specify metadata checking rules by defining content consistency among XML files, annotations, and Java code. Given such RSL rules and a program to scan, MeCheck interprets the rules as cross-file static analyzers, which scan Java and/or XML files to gather information and look for consistency violations. For evaluation, we studied the Spring and JUnit documentation to manually define 15 rules, and created 2 datasets with 115 open-source EAs. The first dataset includes 45 EAs and the ground truth of 45 manually injected bugs. The second dataset includes multiple versions of 70 EAs. We observed that MeCheck identified bugs in the first dataset with 100% precision, 96% recall, and a 98% F-score. It reported 156 bugs in the second dataset, 53 of which were already fixed by developers. Our evaluation shows that MeCheck helps ensure the correct usage of metadata.
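As a loose, simplified analogue of one such cross-file consistency rule (not RSL or MeCheck itself), this Python sketch checks that every Spring bean declaration names a Java class that exists under the project's source root; the file-layout assumption is hypothetical:

import xml.etree.ElementTree as ET
from pathlib import Path

SPRING_NS = "{http://www.springframework.org/schema/beans}"

def check_bean_classes(xml_file: str, src_root: str) -> list[str]:
    """Report beans whose class attribute matches no Java source file."""
    violations = []
    for bean in ET.parse(xml_file).getroot().iter(f"{SPRING_NS}bean"):
        cls = bean.get("class")
        if cls is None:
            continue
        java_file = Path(src_root, *cls.split(".")).with_suffix(".java")
        if not java_file.exists():
            violations.append(f"{xml_file}: bean class {cls} has no source file")
    return violations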

An Adaptive Language-Agnostic Pruning Method for Greener Language Models for Code
Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, and Tushar Sharma
(Dalhousie University, Canada; Linköping University, Sweden; McGill University, Canada)


10 Years Later: Revisiting How Developers Search for Code
Kathryn T. Stolee, Tobias Welp, Caitlin Sadowski, and Sebastian Elbaum
(North Carolina State University, USA; Google, Germany; Unaffiliated, USA; University of Virginia, USA)


IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models
Sayem Mohammad Imtiaz, Astha Singh, Fraol Batole, and Hridesh Rajan
(Iowa State University, USA; Tulane University, USA)


Clone Detection for Smart Contracts: How Far Are We?
Zuobin Wang, Zhiyuan Wan, Yujing Chen, Yun Zhang, David Lo, Difan Xie, and Xiaohu Yang
(Zhejiang University, China; Hangzhou City University, China; Singapore Management University, Singapore; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security, China)
In smart contract development, practitioners frequently reuse code to reduce development effort and avoid reinventing the wheel. This reused code, whether identical or similar to its original source, is referred to as a code clone. Unintentional code cloning can propagate flaws and vulnerabilities, potentially undermining the reliability and maintainability of software systems. Previous studies have identified a significant prevalence of code clones in Solidity smart contracts on the Ethereum blockchain. To mitigate the risks posed by code clones, clone detection has emerged as an active field of research and practice in software engineering. Recent studies have extended existing techniques or proposed novel techniques tailored to the unique syntactic and semantic features of Solidity. Nonetheless, the evaluations of existing techniques, whether conducted by their original authors or independent researchers, involve codebases in various programming languages and utilize different versions of the corresponding tools. The resulting inconsistency makes direct comparisons of the evaluation results impractical, and hinders the ability to derive meaningful conclusions across the evaluations. There remains a lack of clarity regarding the effectiveness of these techniques in detecting smart contract clones, and whether it is feasible to combine different techniques to achieve scalable yet accurate detection of code clones in smart contracts. To address this gap, we conduct a comprehensive empirical study that evaluates the effectiveness and scalability of five representative clone detection techniques on 33,073 verified Solidity smart contracts, along with a benchmark we curate, in which we manually label 72,010 pairs of Solidity smart contracts with clone tags. Moreover, we explore the potential of combining different techniques to achieve optimal performance of code clone detection for smart contracts, and propose SourceREClone, a framework designed for the refined integration of different techniques, which achieves a 36.9% improvement in F1 score compared to a straightforward combination of the state of the art. Based on our findings, we discuss implications, provide recommendations for practitioners, and outline directions for future research.
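To make the combination idea concrete, here is a minimal Python sketch that merges the verdicts of several clone detectors by voting and scores the result with F1 against labeled pairs; the voting threshold and data format are assumptions, not SourceREClone's actual integration strategy:

def f1_of_combination(labeled_pairs, detectors, min_votes=2):
    """labeled_pairs: iterable of (pair, is_clone); detectors: predicates."""
    tp = fp = fn = 0
    for pair, is_clone in labeled_pairs:
        predicted = sum(1 for detect in detectors if detect(pair)) >= min_votes
        if predicted and is_clone:
            tp += 1
        elif predicted:
            fp += 1
        elif is_clone:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)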

Dissecting Real-World Cross-Language Bugs
Haoran Yang and Haipeng Cai
(Washington State University, USA; SUNY Buffalo, USA)
Multilingual systems are prevalent and broadly impactful, but also complex due to the intricate interactions between the heterogeneous programming languages the systems are developed in. This complexity is further aggravated by the diversity of cross-language interoperability across different language combinations, resulting in additional, often stealthy cross-language bugs. Yet despite the growing number of tools aimed at discovering cross-language bugs, a systematic understanding of such bugs is still lacking. To fill this gap, we conduct the first comprehensive study of cross-language bugs, characterizing them in five aspects (symptoms, locations, manifestation, root causes, and fixes) as well as the relationships among these aspects. Through careful identification and detailed analysis of 400 cross-language bugs in real-world multilingual projects, classified from 54,356 relevant code commits in their GitHub repositories, we reveal not only the bug characteristics in those five aspects but also how they compare between the two top language combinations in the multilingual world (Python-C and Java-C). In addition to the findings of the study and its enabling tools and datasets, we also provide practical recommendations regarding the prevention, detection, and patching of cross-language bugs.

Preprint
Less Is More: On the Importance of Data Quality for Unit Test Generation
Junwei Zhang, Xing Hu, Shan Gao, Xin Xia, David Lo, and Shanping Li
(Zhejiang University, China; Huawei, China; Singapore Management University, Singapore)


Protecting Privacy in Software Logs: What Should Be Anonymized?
Roozbeh Aghili, Heng Li, and Foutse Khomh
(Polytechnique Montréal, Canada)


MiSum: Multi-modality Heterogeneous Code Graph Learning for Multi-intent Binary Code Summarization
Kangchen Zhu, Zhiliang Tian, Shangwen Wang, Weiguo Chen, Zixuan Dong, Mingyue Leng, and Xiaoguang Mao
(National University of Defense Technology, China)


SemBIC: Semantic-Aware Identification of Bug-Inducing Commits
Xiao Chen, Hengcheng Zhu, Jialun Cao, Ming Wen, and Shing-Chi Cheung
(Hong Kong University of Science and Technology, China; Huazhong University of Science and Technology, China)


Eliminating Backdoors in Neural Code Models for Secure Code Understanding
Weisong Sun, Yuchen Chen, Chunrong Fang, Yebo Feng, Yuan Xiao, An Guo, Quanjun Zhang, Zhenyu Chen, Baowen Xu, and Yang Liu
(Nanjing University, Singapore; Nanjing University, China; Nanyang Technological University, Singapore)


Understanding Debugging as Episodes: A Case Study on Performance Bugs in Configurable Software Systems
Max Weber, Alina Mailach, Sven Apel, Janet Siegmund, Raimund Dachselt, and Norbert Siegmund
(Leipzig University, Germany; Saarland University, Germany; TU Chemnitz, Germany; TU Dresden, Germany)


Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing
Weibin Wu, Yuhang Cao, Ning Yi, Rongyi Ou, and Zibin Zheng
(Sun Yat-sen University, China)


On the Unnecessary Complexity of Names in X.509 and Their Impact on Implementations
Yuteng Sun, Joyanta Debnath, Wenzheng Hong, Omar Chowdhury, and Sze Yiu Chau
(Chinese University of Hong Kong, Hong Kong, China; Stony Brook University, USA; Independent, China)


RePurr: Automated Repair of Block-Based Learners’ Programs
Sebastian Schweikl and Gordon Fraser
(University of Passau, Germany)


Why the Proof Fails in Different Versions of Theorem Provers: An Empirical Study of Compatibility Issues in Isabelle
Xiaokun Luan, David Sanan, Zhe Hou, Qiyuan Xu, Chengwei Liu, Yufan Cai, Yang Liu, and Meng Sun
(Peking University, China; Singapore Institute of Technology, Singapore; Griffith University, Australia; Nanyang Technological University, Singapore; National University of Singapore, Singapore)


Improving Graph Learning-Based Fault Localization with Tailored Semi-supervised Learning
Chun Li, Hui Li, Zhong Li, Minxue Pan, and Xuandong Li
(Nanjing University, China; Samsung Electronics, China)


A Mixed-Methods Study of Model-Based GUI Testing in Real-World Industrial Settings
Shaoheng Cao, Renyi Chen, Wenhua Yang, Minxue Pan, and Xuandong Li
(Nanjing University, China; Samsung Electronics, China; Nanjing University of Aeronautics and Astronautics, China)

