33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024),
September 16–20, 2024,
Vienna, Austria
Frontmatter
Papers Round 1
Detecting Build Dependency Errors in Incremental Builds
Jun Lyu, Shanshan Li, He Zhang, Yang Zhang, Guoping Rong, and Manuel Rigger
(Nanjing University, China; National University of Singapore, Singapore)
Incremental and parallel builds performed by build tools such as Make are the heart of modern C/C++ software projects. Their correct and efficient execution depends on build scripts. However, build scripts are prone to errors. The most prevalent errors are missing dependencies (MDs) and redundant dependencies (RDs). The state-of-the-art methods for detecting these errors rely on clean builds (i.e., full builds of a subset of software configurations in a clean environment), which are costly and can take up to a few hours for large-scale projects. To address these challenges, we propose a novel approach called EChecker to detect build dependency errors in the context of incremental builds. The core idea of EChecker is to automatically update actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes in new commits, which avoids clean builds whenever possible. EChecker achieves higher efficiency than methods that rely on clean builds while maintaining effectiveness. We selected 12 representative projects, ranging from small to large, with 240 commits in total (20 per project), and used them to evaluate the effectiveness and efficiency of EChecker against a state-of-the-art build dependency error detection tool. The evaluation shows that EChecker improves the F-1 score by 0.18 over the state-of-the-art method and increases build dependency error detection efficiency by an average of 85.14 times (with a median of 16.30 times). The results demonstrate that EChecker can support practitioners in detecting build dependency errors efficiently.
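The core inference step can be illustrated with a minimal sketch (not EChecker's actual implementation; all names here are hypothetical): scanning a C file's pre-processor `#include` directives yields the headers the file actually depends on, which can be compared against the prerequisites a build script declares to flag candidate missing dependencies (MDs).

```python
import re

# Minimal illustrative sketch: infer a C file's header dependencies from its
# #include directives -- the kind of information that lets a checker update
# actual build dependencies without running a clean build.
_INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def inferred_deps(source: str) -> set[str]:
    """Return the set of locally included headers referenced by `source`."""
    return set(_INCLUDE_RE.findall(source))

def missing_deps(source: str, declared: set[str]) -> set[str]:
    """Headers the source actually uses but the build script never declares
    as prerequisites -- i.e., candidate missing dependencies (MDs)."""
    return inferred_deps(source) - declared
```

For example, a file containing `#include "util.h"` whose Makefile rule only declares `foo.h` as a prerequisite would yield `util.h` as a candidate MD; system headers in angle brackets are deliberately ignored in this sketch.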
Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs
Shiwen Shan, Yintong Huo, Yuxin Su, Yichen Li, Dan Li, and Zibin Zheng
(Sun Yat-sen University, China; Chinese University of Hong Kong, China)
Configurable software systems are prone to configuration errors, resulting in significant losses to companies. However, diagnosing these errors is challenging due to the vast and complex configuration space. These errors pose significant challenges for both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Given that logs are easily accessible to most end-users, we conduct a preliminary study to outline the challenges and opportunities of utilizing logs in localizing configuration errors. Based on the insights gained from the preliminary study, we propose an LLM-based two-stage strategy for end-users to localize the root-cause configuration properties based on logs. We further implement a tool, LogConfigLocalizer, aligned with the design of the aforementioned strategy, hoping to assist end-users in coping with configuration errors through log analysis.
To the best of our knowledge, this is the first work to localize root-cause configuration properties for end-users based on Large Language Models (LLMs) and logs. We evaluate the proposed strategy on Hadoop via LogConfigLocalizer and demonstrate its effectiveness, achieving an average accuracy as high as 99.91%. We also demonstrate the effectiveness and necessity of the methodology's different phases by comparing it with two other variants and a baseline tool. Finally, we validate the proposed methodology through a practical case study to demonstrate its effectiveness and feasibility.
FastLog: An End-to-End Method to Efficiently Generate and Insert Logging Statements
Xiaoyuan Xie, Zhipeng Cai, Songqiang Chen, and Jifeng Xuan
(Wuhan University, China; Hong Kong University of Science and Technology, China)
Logs play a crucial role in modern software systems, serving as a means for developers to record essential information for future software maintenance. As the performance of these log-based maintenance tasks heavily relies on the quality of logging statements, various works have been proposed to assist developers in writing appropriate logging statements. However, these works either support only some sub-tasks of the whole activity, or incur a relatively high time cost and may introduce unwanted modifications. To address these limitations, we propose FastLog, which supports the complete logging statement generation and insertion activity at high speed. Specifically, given a program method, FastLog first predicts the insertion position at the finest (token) level, and then generates a complete logging statement to insert. We further use text splitting for long input texts to improve the accuracy of predicting where to insert logging statements. A comprehensive empirical analysis shows that our method outperforms the state-of-the-art approach in both efficiency and output quality, revealing its great potential and practicality for today's real-time intelligent development environments.
Preprint
FortifyPatch: Towards Tamper-Resistant Live Patching in Linux-Based Hypervisor
Zhenyu Ye, Lei Zhou, Fengwei Zhang, Wenqiang Jin, Zhenyu Ning, Yupeng Hu, and Zheng Qin
(Hunan University, China; National University of Defense Technology, China; Southern University of Science and Technology, China; Xinchuang Haihe Laboratory, China)
Linux-based hypervisors in cloud servers suffer from an increasing number of vulnerabilities in the Linux kernel. To address these vulnerabilities in a timely manner while avoiding the economic loss caused by unplanned shutdowns, live patching schemes have been developed. Unfortunately, existing live patching solutions fail to protect patches from post-deployment attacks. In addition, patches that involve changes to global variables can lead to practical issues with existing solutions. To address these problems, we present FortifyPatch, a tamper-resistant live patching solution for Linux-based hypervisors in cloud environments. Specifically, FortifyPatch employs multiple Granule Protection Tables from the Arm Confidential Computing Architecture to protect the integrity of deployed patches. The TrustZone Address Space Controller and Performance Monitor Unit are used to prevent patches from being bypassed, via kernel code protection and timely page table verification. FortifyPatch is also able to patch global variables via well-designed data access traps. We prototype FortifyPatch and evaluate it using real-world CVE patches. The results show that FortifyPatch is capable of deploying 81.5% of CVE patches. The performance evaluation indicates that FortifyPatch protects deployed patches with 0.98% and 3.1% overhead on average across indicative benchmarks and real-world applications, respectively.
Unimocg: Modular Call-Graph Algorithms for Consistent Handling of Language Features
Dominik Helm, Tobias Roth, Sven Keidel, Michael Reif, and Mira Mezini
(TU Darmstadt, Germany; National Research Center for Applied Cybersecurity ATHENE, Germany; CQSE, Germany; hessian.AI, Germany)
Traditional call-graph construction algorithms conflate the computation of possible runtime types with the actual resolution of (virtual) calls. This tangled design impedes support for complex language features and APIs, as well as systematic trade-offs between precision, soundness, and scalability. It also impedes the implementation of precise downstream analyses that rely on type information. To address this problem, we propose Unimocg, a modular architecture for call-graph construction that decouples the computation of type information from the resolution of calls. Due to its modular design, Unimocg can combine a wide range of call-graph algorithms with algorithm-agnostic modules that support individual language features. Moreover, these modules operate at the same precision as the chosen call-graph algorithm with no further effort. Additionally, Unimocg allows other analyses to easily reuse type information from call-graph construction at full precision. We demonstrate how Unimocg enables a framework of call-graph algorithms with different precision, soundness, and scalability trade-offs built from reusable modules. Unimocg currently supports ten call-graph algorithms from vastly different families, such as CHA, RTA, XTA, and k-l-CFA. These algorithms show consistent soundness without sacrificing precision or performance. We also show how an immutability analysis is improved using Unimocg.
Artifacts Available
Precise Compositional Buffer Overflow Detection via Heap Disjointness
Yiyuan Guo, Peisen Yao, and Charles Zhang
(Hong Kong University of Science and Technology, China; Zhejiang University, China)
Static analysis techniques for buffer overflow detection still struggle to scale to millions of lines of code while remaining precise enough to achieve an acceptable false positive rate. Checking for buffer overflows necessitates reasoning about heap reachability and numerical relations, which are mutually dependent. Existing techniques to resolve this dependency cycle sacrifice either precision or efficiency due to their limitations in reasoning about symbolic heap locations, i.e., heap locations with possibly symbolic numerical offsets. A symbolic heap location potentially aliases a large number of other heap locations, leading to a disjunction of heap states that is particularly challenging to reason about precisely. Acknowledging the inherent difficulties of heap and numerical reasoning, we introduce a disjointness assumption into the analysis by shrinking the program state space so that all symbolic locations involved in memory accesses are disjoint from each other. The disjointness property permits strong updates to be performed at symbolic heap locations, significantly improving precision by incorporating numerical information into heap reasoning. It also aids in the design of a compositional analysis that boosts scalability, where compact and precise function summaries are efficiently generated and reused. We implement the idea in the static buffer overflow detector Cod. Applying it to large, real-world software such as PHP and QEMU, we uncovered 29 buffer overflow bugs with a false positive rate of 37%, while projects of millions of lines of code can be successfully analyzed within four hours.
Enhancing ROS System Fuzzing through Callback Tracing
Yuheng Shen, Jianzhong Liu, Yiru Xu, Hao Sun, Mingzhe Wang, Nan Guan, Heyuan Shi, and Yu Jiang
(Tsinghua University, China; ETH Zurich, Switzerland; City University of Hong Kong, China; Central South University, China)
The Robot Operating System 2 (ROS) is the de-facto standard for robotic software development, widely applied in diverse safety-critical domains. Many testing efforts seek to deliver a more secure ROS codebase. However, existing testing methods are often inadequate for capturing the complex and stateful behaviors inherent to ROS deployments, resulting in limited testing effectiveness. In this paper, we propose R2D2, a ROS system fuzzer that leverages ROS's runtime states as guidance to increase fuzzing effectiveness and efficiency. Unlike traditional fuzzers, R2D2 employs a systematic instrumentation strategy that captures the system's runtime behaviors and profiles the current system state in real time. This approach provides a more in-depth understanding of system behaviors, thereby facilitating a more insightful exploration of ROS's extensive state space. For evaluation, we applied R2D2 to four well-known ROS applications. Our evaluation shows that R2D2 achieves improvements of 3.91× and 2.56× in code coverage over state-of-the-art ROS fuzzers, namely Ros2Fuzz and RoboFuzz, while also uncovering 39 previously unknown vulnerabilities, of which 6 have been fixed in the ROS runtime and ROS applications. As for runtime overhead, R2D2 incurs average execution-time and memory overheads of 10.4% and 1.0%, respectively, making it practical for ROS testing.
API Misuse Detection via Probabilistic Graphical Model
Yunlong Ma, Wentong Tian, Xiang Gao, Hailong Sun, and Li Li
(Beihang University, China)
API misuses can cause a range of issues in software development, including program crashes, bugs, and vulnerabilities. Different approaches have been developed to automatically detect API misuses by checking programs against usage rules extracted from extensive codebases or API documents. However, these mined rules may not be precise or complete, leading to high false positive/negative rates. In this paper, we propose a novel solution to this problem by representing the mined API usage rules as a probabilistic graphical model, where each rule's probability value represents its trustworthiness.
Our approach automatically constructs probabilistic usage rules by mining codebase and documents, and aggregating knowledge from different sources.
Here, the usage rules obtained from the codebase initialize the probabilistic model, while the knowledge from the documents serves as a supplement for adjusting and complementing the probabilities accordingly.
We evaluate our approach on the MuBench benchmark.
Experimental results show that our approach achieves 42.0% precision and 54.5% recall, significantly outperforming state-of-the-art approaches.
Ma11y: A Mutation Framework for Web Accessibility Testing
Mahan Tafreshipour, Anmol Deshpande, Forough Mehralian, Iftekhar Ahmed, and Sam Malek
(University of California, Irvine, USA)
Despite the availability of numerous automatic accessibility testing solutions, web accessibility issues persist on many websites. Moreover, there is a lack of systematic evaluations of the efficacy of current accessibility testing tools. To address this gap, we present the first mutation analysis framework, called Ma11y, designed to assess web accessibility testing tools. Ma11y includes 25 mutation operators that intentionally violate various accessibility principles and an automated oracle to determine whether a mutant is detected by a testing tool. Evaluation on real-world websites demonstrates the practical applicability of the mutation operators and the framework’s capacity to assess tool performance. Our results demonstrate that the current tools cannot identify nearly 50% of the accessibility bugs injected by our framework, thus underscoring the need for the development of more effective accessibility testing tools. Finally, the framework’s accuracy and performance attest to its potential for seamless and automated application in practical settings.
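To make the idea of an accessibility mutation operator concrete, here is a minimal sketch in the spirit of Ma11y (not its actual implementation; the operator and function name are illustrative): one operator strips `alt` attributes from `<img>` elements, violating the text-alternative accessibility principle, and a testing tool that reports no new issue on the mutant fails to detect this bug class.

```python
import re

# Illustrative mutation operator: remove every alt="..." attribute from
# <img> tags, injecting a text-alternative accessibility violation. An
# oracle would then compare a tool's reports on the original page and on
# the mutant to decide whether the mutant was detected.
def drop_img_alt(html: str) -> str:
    """Remove alt="..." attributes from <img> elements in `html`."""
    return re.sub(r'(<img\b[^>]*?)\s+alt="[^"]*"', r'\1', html)
```

Applied to `<img src="a.png" alt="logo">`, this yields `<img src="a.png">`, a page that looks identical visually but is broken for screen-reader users.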
Total Recall? How Good Are Static Call Graphs Really?
Dominik Helm, Sven Keidel, Anemone Kampkötter, Johannes Düsing, Tobias Roth, Ben Hermann, and Mira Mezini
(TU Darmstadt, Germany; National Research Center for Applied Cybersecurity ATHENE, Germany; TU Dortmund, Germany; hessian.AI, Germany)
Static call graphs are a fundamental building block of program analysis.
However, differences in call-graph construction and the use of specific language features can yield unsoundness and imprecision.
Call-graph analyses are evaluated using measures of precision and recall, but this is difficult, as a ground truth for real-world programs is generally unobtainable.
In this work, we propose to use carefully constructed dynamic baselines based on fixed entry points and input corpora.
The creation of this dynamic baseline is posed as an approximation of the ground truth---an optimization problem.
We use manual extension and coverage-guided fuzzing for creating suitable input corpora.
With these dynamic baselines, we study call-graph quality of multiple algorithms and implementations using four real-world Java programs.
We find that our methodology provides valuable insights into call-graph quality and how to measure it.
With this work, we provide a novel methodology to advance the field of static program analysis as we assess the computation of one of its core data structures---the call graph.
Artifacts Available
CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios
Zhengran Zeng, Yidong Wang, Rui Xie, Wei Ye, and Shikun Zhang
(Peking University, China)
In the evolving landscape of large language models (LLMs) tailored for software engineering, the need for benchmarks that accurately reflect real-world development scenarios is paramount. Current benchmarks are either too simplistic or fail to capture the multi-tasking nature of software development. To address this, we introduce CoderUJB, a new benchmark designed to evaluate LLMs across diverse Java programming tasks that are executable and reflective of actual development scenarios, acknowledging Java's prevalence in real-world software production. CoderUJB comprises 2,239 programming questions derived from 17 real open-source Java projects and spans five practical programming tasks. Our empirical study on this benchmark investigates the coding abilities of various open-source and closed-source LLMs, examining the effects of continued pre-training on code in specific programming languages and of instruction fine-tuning on their performance. The findings indicate that while LLMs exhibit strong potential, challenges remain, particularly in non-functional code generation (e.g., test generation and defect detection). Importantly, our results advise caution when applying language-specific continued pre-training and instruction fine-tuning, as these techniques can hinder model performance on certain tasks, suggesting the need for more nuanced strategies. CoderUJB thus marks a significant step towards more realistic evaluations of programming capabilities in LLMs, and our study provides valuable insights for the future development of these models in software engineering.
DAppFL: Just-in-Time Fault Localization for Decentralized Applications in Web3
Zhiying Wu, Jiajing Wu, Hui Zhang, Ziwei Li, Jiachi Chen, Zibin Zheng, Qing Xia, Gang Fan, and Yi Zhen
(Sun Yat-sen University, China; Institute of Software at Chinese Academy of Sciences, China; n.n., China)
Web3 describes an idea for the next evolution of the Internet, where blockchain technology enables the Internet of Value. As Web3 software, decentralized applications (DApps) have emerged in recent years. There exists a natural link between DApps and cryptocurrencies, where faults in DApps can directly lead to monetary losses associated with cryptocurrencies. Hence, efficient fault localization technology is of paramount importance for urgent DApp rescue operations and the mitigation of financial losses. However, fault localization methods applied in traditional applications are not well-suited to this specific field, due to their inability to identify DApp-specific fault features, e.g., a substantial amount of cryptocurrency being transferred from DApps to hackers. To explore the root cause of DApp faults, some researchers try to identify suspicious code snippets through mutation testing. Nonetheless, applying mutation testing to DApp fault localization is time-consuming and thus limited in practice. This paper conducts the first comprehensive study of DApp fault localization. We introduce DAppFL, a learning-based DApp fault localization tool that performs reverse engineering to gather executed source code and then traces cryptocurrency flow to assist in locating faulty functions. We also present the inaugural dataset for DApp fault localization, providing a new benchmark for this domain. Our experimental results demonstrate that DAppFL locates 63% of faults within the Top-5, 23% more than the state-of-the-art method. To facilitate further research, our code and dataset are freely available online: https://github.com/xplanet-sysu/awesome-works#dappfl.
CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection
Hao Wang, Zeyu Gao, Chao Zhang, Mingyang Sun, Yuchen Zhou, Han Qiu, and Xi Xiao
(Tsinghua University, China; University of Electronic Science and Technology of China, China; Beijing University of Technology, China)
Binary code similarity detection (BCSD) is a fundamental technique for various applications. Many BCSD solutions have been proposed recently; most are embedding-based, but they have shown limited accuracy and efficiency, especially when the volume of target binaries to search is large. To address this issue, we propose a cost-effective BCSD framework, CEBin, which fuses embedding-based and comparison-based approaches to significantly improve accuracy while minimizing overhead. Specifically, CEBin utilizes a refined embedding-based approach to extract features of target code, which efficiently narrows down the scope of candidate similar code and boosts performance. It then utilizes a comparison-based approach that performs a pairwise comparison on the candidates to capture more nuanced and complex relationships, which greatly improves the accuracy of similarity detection. By bridging the gap between embedding-based and comparison-based approaches, CEBin provides an effective and efficient solution for detecting similar code (including vulnerable code) in large-scale software ecosystems. Experimental results on three well-known datasets demonstrate the superiority of CEBin over existing state-of-the-art (SOTA) baselines. To further evaluate the usefulness of BCSD in the real world, we construct a large-scale vulnerability benchmark, offering the first precise evaluation scheme to assess BCSD methods on the 1-day vulnerability detection task. CEBin can identify the similar function from millions of candidate functions in just a few seconds, achieving an impressive recall rate of 85.46% on this more practical but challenging task, which is several orders of magnitude faster and 4.07× better than the best SOTA baseline.
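The fused two-stage idea can be sketched in a few lines (illustrative only, not CEBin's actual models; all names and the toy vectors are hypothetical): a cheap embedding similarity first narrows a large corpus to a shortlist, and only the shortlist is scored with the expensive pairwise comparison.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_emb, corpus, pairwise_score, k=3):
    """Stage 1: keep the k nearest functions by cheap embedding similarity.
    Stage 2: re-rank the shortlist with the costly pairwise comparison and
    return the best match's name."""
    shortlist = sorted(corpus, key=lambda f: cosine(query_emb, f["emb"]),
                       reverse=True)[:k]
    return max(shortlist, key=lambda f: pairwise_score(query_emb, f))["name"]
```

The design point is that the pairwise model, however accurate, is only ever invoked k times per query instead of once per corpus function, which is what makes million-scale search tractable.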
Preprint
Interprocedural Path Complexity Analysis
Mira Kaniyur, Ana Cavalcante-Studart, Yihan Yang, Sangeon Park, David Chen, Duy Lam, and Lucas Bang
(Harvey Mudd College, USA)
Software testing techniques like symbolic execution face significant challenges with path explosion. Asymptotic Path Complexity (APC) quantifies this path explosion complexity, but existing APC methods do not work for interprocedural functions in general. Our new algorithm, APC-IP, efficiently computes APC for a wider range of functions, including interprocedural ones, improving over previous methods in both speed and scope. We implement APC-IP atop the existing software Metrinome, and test it against a benchmark of C functions, comparing it to existing and baseline approaches as well as comparing it to the path explosion of the symbolic execution engine Klee. The results show that APC-IP not only aligns with previous APC values but also excels in performance, scalability, and handling complex source code. It also provides a complexity prediction of the number of paths explored by Klee, extending the APC metric's applicability and surpassing previous implementations.
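The path explosion that APC quantifies can be seen in a toy example (this sketches the phenomenon, not the APC-IP algorithm itself): counting acyclic paths through a control-flow graph given as an adjacency list, where a chain of n independent if/else diamonds yields 2**n paths.

```python
from functools import lru_cache

# Toy illustration of path explosion: count the acyclic paths from `entry`
# to `exit` in a DAG-shaped control-flow graph given as {node: successors}.
def count_paths(cfg, entry, exit):
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == exit:
            return 1
        # Paths from a node are the sum of paths from each successor.
        return sum(paths_from(s) for s in cfg.get(node, ()))
    return paths_from(entry)
```

Two chained if/else diamonds already give 4 paths; symbolic execution must, in the worst case, explore each one, which is why a complexity measure over the whole interprocedural graph is useful.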
Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models
Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, and Li Li
(Monash University, Australia; Beihang University, China; CSIRO’s Data61, Australia; TU Munich, Germany; University of Newcastle, Australia)
Recent studies show that on-device deployed deep learning (DL) models, such as TensorFlow Lite (TFLite) models, can be easily extracted from real-world applications and devices by attackers to generate many kinds of adversarial and other attacks. Although securing deployed on-device DL models has gained increasing attention, no existing methods can fully prevent these attacks. Traditional software protection techniques have been widely explored; if on-device models could be implemented in pure code, such as C++, it would open the possibility of reusing existing robust software protection techniques. However, due to the complexity of DL models, there is no automatic method that can translate DL models to pure code. To fill this gap, we propose a novel method, CustomDLCoder, to automatically extract on-device DL model information and synthesize a customized executable program for a wide range of DL models. CustomDLCoder first parses the DL model, extracts its backend computing code, configures the extracted code, and then generates a customized program to implement and deploy the DL model without an explicit model representation. The synthesized program hides model information in DL deployment environments since it does not need to retain an explicit model representation, preventing many attacks on the DL model. In addition, it improves ML performance because the customized code removes model parsing and preprocessing steps and retains only the data computation process. Our experimental results show that CustomDLCoder improves model security by disabling on-device model sniffing. Compared with the original on-device platform (i.e., TFLite), our method can accelerate model inference by 21.0% and 24.3% on x86-64 and ARM64 platforms, respectively. Most importantly, it can significantly reduce memory consumption by 68.8% and 36.0% on x86-64 and ARM64 platforms, respectively.
Artifacts Available
UPBEAT: Test Input Checks of Q# Quantum Libraries
Tianmin Hu, Guixin Ye, Zhanyong Tang, Shin Hwei Tan, Huanting Wang, Meng Li, and Zheng Wang
(Northwest University, China; Concordia University, Canada; University of Leeds, United Kingdom; Hefei University of Technology, China)
High-level programming models like Q# significantly reduce the complexity of programming for quantum computing. These models are supported by a set of foundation libraries for code development. However, errors can occur in the library implementation, and one common root cause is missing or incomplete checks on properties like the values, length, and quantum states of inputs passed to user-facing subroutines. This paper presents Upbeat, a fuzzing tool that generates random test cases for bugs related to input checking in Q# libraries. Upbeat develops an automated process to extract constraints from the API documentation and from developer-implemented input-checking statements. It leverages open-source Q# code samples to synthesize test programs. It frames test case generation as a constraint satisfaction problem for classical computing and a quantum state model for quantum computing, producing carefully generated subroutine inputs that test whether the input-checking mechanism is appropriately implemented. Over 100 hours of automated test runs, Upbeat identified 16 bugs in API implementations and 4 documentation errors. Of these, 14 have been confirmed, and 12 have been fixed by the library developers.
Enhancing Robustness of Code Authorship Attribution through Expert Feature Knowledge
Xiaowei Guo, Cai Fu, Juan Chen, Hongle Liu, Lansheng Han, and Wenjin Li
(Huazhong University of Science and Technology, China; NSFOCUS Technologies Group, China)
Code authorship attribution has been an interesting research problem for decades. Recent studies have revealed that existing methods for code authorship attribution suffer from weak robustness: under the influence of small perturbations added by an attacker, their accuracy is greatly reduced. To date, no code authorship attribution method can effectively handle such attacks. In this paper, we attribute the weak robustness of code authorship attribution methods to dataset bias and argue that this bias can be mitigated through adjustments to the feature learning strategy. We first propose a robust code authorship attribution feature combination framework, composed of only simple, shallow neural network structures, which makes feature extraction controllable by incorporating expert knowledge. Experiments show that the framework significantly improves robustness over mainstream code authorship attribution methods, with an average drop of 23.4% (from 37.8% to 14.3%) in the success rate of targeted attacks and 25.9% (from 46.7% to 20.8%) in the success rate of untargeted attacks. At the same time, it achieves accuracy comparable to mainstream code authorship attribution methods.
A Large-Scale Empirical Study on Improving the Fairness of Image Classification Models
Junjie Yang, Jiajun Jiang, Zeyu Sun, and Junjie Chen
(Tianjin University, China; Institute of Software at Chinese Academy of Sciences, China)
Fairness is a critical issue that affects the adoption of deep learning models in practice. To improve model fairness, many existing methods have been proposed and shown to be effective in their own contexts. However, there is still no systematic evaluation that compares them comprehensively under the same context, which makes it hard to understand the performance differences among them, hindering research progress and their practical adoption. To fill this gap, this paper conducts the first large-scale empirical study to comprehensively compare the performance of existing state-of-the-art fairness-improving techniques. Specifically, we target the widely used application scenario of image classification and utilize three different datasets and five commonly used performance metrics to assess 13 methods in total from diverse categories. Our findings reveal substantial variations in the performance of each method across different datasets and sensitive attributes, indicating over-fitting to specific datasets by many existing methods. Furthermore, different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results. Overall, we observe that pre-processing and in-processing methods outperform post-processing methods, with pre-processing methods exhibiting the best performance. Our empirical study offers comprehensive recommendations for enhancing the fairness of deep learning models. We approach the problem from multiple dimensions, aiming to provide a uniform evaluation platform and to inspire researchers to explore more effective fairness solutions via a set of implications.
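One member of the family of fairness metrics such studies use is statistical parity difference; the sketch below is a hedged illustration for a binary classifier and a binary sensitive attribute (the abstract does not name its five metrics, so this function is an assumption for exposition only).

```python
# Statistical parity difference: the gap in positive-prediction rates
# between the two groups of a binary sensitive attribute. A value of 0
# means demographic parity; larger magnitudes mean larger disparity.
def statistical_parity_difference(preds, groups):
    """P(pred=1 | group=0) - P(pred=1 | group=1) over paired lists of
    binary predictions and binary group labels."""
    def rate(g):
        sel = [p for p, grp in zip(preds, groups) if grp == g]
        return sum(sel) / len(sel)
    return rate(0) - rate(1)
```

Metrics of this kind look only at predictions and group membership, which is one reason different fairness metrics, with their distinct focuses, can rank the same method very differently.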
A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?
Zhihan Jiang, Jinyang Liu, Junjie Huang, Yichen Li, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Jieming Zhu, and Michael R. Lyu
(Chinese University of Hong Kong, China; Sun Yat-sen University, China; Huawei Noah’s Ark Lab, China)
Log data have facilitated various tasks of software development and maintenance, such as testing, debugging and diagnosing. Due to the unstructured nature of logs, log parsing is typically required to transform log messages into structured data for automated log analysis. Given the abundance of log parsers that employ various techniques, evaluating these tools to comprehend their characteristics and performance becomes imperative. Loghub serves as a commonly used dataset for benchmarking log parsers, but it suffers from limited scale and representativeness, posing significant challenges for studies to comprehensively evaluate existing log parsers or develop new methods. This limitation is particularly pronounced when assessing these log parsers for production use. To address these limitations, we provide a new collection of annotated log datasets, denoted Loghub-2.0, which can better reflect the characteristics of log data in real-world software systems. Loghub-2.0 comprises 14 datasets with an average of 3.6 million log lines in each dataset. Based on Loghub-2.0, we conduct a thorough re-evaluation of 15 state-of-the-art log parsers in a more rigorous and practical setting. Particularly, we introduce a new evaluation metric to mitigate the sensitivity of existing metrics to imbalanced data distributions. We are also the first to investigate the granular performance of log parsers on logs that represent rare system events, offering in-depth details for software diagnosis. Accurately parsing such logs is essential, yet it remains a challenge. We believe this work could shed light on the evaluation and design of log parsers in practical settings, thereby facilitating their deployment in production systems.
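For readers unfamiliar with the task, the core of log parsing is separating a message's constant template from its variable fields. A deliberately naive sketch follows; the regexes and messages are illustrative and do not represent any parser from the benchmark.

```python
import re

# Minimal illustration of what a log parser does: mask variable fields so that
# messages produced by the same logging statement collapse to one template.
def to_template(message):
    msg = re.sub(r'\b\d+(\.\d+)*\b', '<*>', message)  # numbers, IPs, versions
    msg = re.sub(r'/\S+', '<*>', msg)                 # filesystem paths
    return msg

logs = [
    "Connection from 10.0.0.1 closed after 31 seconds",
    "Connection from 10.0.0.2 closed after 7 seconds",
]
print({to_template(m) for m in logs})  # both messages collapse to one template
```

Real parsers must handle far messier cases (mixed formats, rare templates), which is exactly the gap the paper's Loghub-2.0 datasets and granular evaluation target.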
SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection
Xin-Cheng Wen, Cuiyun Gao, Shuzheng Gao, Yang Xiao, and Michael R. Lyu
(Harbin Institute of Technology, China; Chinese University of Hong Kong, China; Chinese Academy of Sciences, China)
Recently, there has been growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated superior performance over other Deep Learning (DL)-based approaches in detecting vulnerabilities. However, existing pre-trained model-based approaches generally employ code sequences as input during prediction and may ignore vulnerability-related structural information, as reflected in the following two aspects. First, they tend to fail to infer the semantics of code statements with complex logic, such as those containing multiple operators and pointers. Second, they struggle to comprehend various code execution sequences, which is essential for precise vulnerability detection. To mitigate these challenges, we propose a Structured Natural Language Comment tree-based vulnerAbiLity dEtection framework based on pre-trained models, named SCALE. The proposed Structured Natural Language Comment Tree (SCT) integrates the semantics of code statements with code execution sequences based on Abstract Syntax Trees (ASTs). Specifically, SCALE comprises three main modules: (1) Comment Tree Construction, which aims to enhance the model's ability to infer the semantics of code statements by first incorporating Large Language Models (LLMs) for comment generation and then adding the comment nodes to ASTs. (2) Structured Natural Language Comment Tree Construction, which aims to explicitly involve the code execution sequence by combining code syntax templates with the comment tree. (3) SCT-Enhanced Representation, which finally incorporates the constructed SCTs to capture vulnerability patterns. Experimental results demonstrate that SCALE outperforms the best-performing baselines, including pre-trained models and LLMs, with improvements of 2.96%, 13.47%, and 3.75% in terms of F1 score on the FFMPeg+Qemu, Reveal, and SVulD datasets, respectively. Furthermore, SCALE can be applied to different pre-trained models, such as CodeBERT and UniXcoder, yielding F1 score enhancements ranging from 1.37% to 10.87%.
Preprint
Distance-Aware Test Input Selection for Deep Neural Networks
Zhong Li, Zhengfeng Xu, Ruihua Ji, Minxue Pan, Tian Zhang, Linzhang Wang, and Xuandong Li
(Nanjing University, China)
Deep Neural Network (DNN) testing is one of the common practices to guarantee the quality of DNNs. However, DNN testing in general requires a significant amount of test inputs with oracle information (labels), which can be challenging and resource-intensive to obtain. To alleviate this problem, we propose DATIS, a distance-aware test input selection approach for DNNs. Specifically, DATIS adopts a two-step approach to selecting test inputs. In the first step, it selects test inputs based on improved uncertainty scores derived from the distances between the test inputs and their nearest-neighbor training samples. In the second step, it further eliminates test inputs that may cover the same faults by examining the distances among the selected test inputs. To evaluate DATIS, we conduct extensive experiments on 8 diverse subjects, taking into account different domains of test inputs, varied DNN structures, and diverse types of test inputs. Evaluation results show that DATIS significantly outperforms 15 baseline approaches in both selecting test inputs with high fault-revealing power and guiding the selection of data for DNN enhancement.
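The general intuition behind distance-aware selection can be sketched in a few lines: an input far from its nearest training samples is treated as more uncertain and thus more worth labeling. The scoring below is a generic illustration, not DATIS's actual formula, and all names and data are made up.

```python
import math

# Hedged sketch of distance-based uncertainty ranking (illustrative only).
def nearest_distance(x, training_set):
    """Distance from input x to its nearest training sample."""
    return min(math.dist(x, t) for t in training_set)

def select_most_uncertain(candidates, training_set, budget):
    """Pick the `budget` inputs farthest from the training data."""
    ranked = sorted(candidates, key=lambda x: nearest_distance(x, training_set),
                    reverse=True)
    return ranked[:budget]

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tests = [(0.1, 0.1), (3.0, 3.0), (0.9, 0.1)]
print(select_most_uncertain(tests, train, budget=1))  # [(3.0, 3.0)]
```

DATIS's second step (pruning selected inputs that likely cover the same fault by their mutual distances) would operate on the output of such a ranking.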
LPR: Large Language Models-Aided Program Reduction
Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, and Chengnian Sun
(University of Waterloo, Canada; Hong Kong University of Science and Technology, China; Concordia University, Canada)
Program reduction is a widely used technique to facilitate debugging compilers by automatically minimizing programs that trigger compiler bugs. Existing program reduction techniques are either generic to a wide range of languages (such as Perses and Vulcan) or specifically optimized for one certain language by exploiting language-specific knowledge (e.g., C-Reduce). However, synergistically combining both generality across languages and optimality for a specific language in program reduction is yet to be explored.

This paper proposes LPR, the first LLM-aided technique leveraging LLMs to perform language-specific program reduction for multiple languages. The key insight is to utilize both the language generality of program reducers such as Perses and the language-specific semantics learned by LLMs. Concretely, language-generic program reducers can efficiently reduce programs to a small size that is suitable for LLMs to process; LLMs can effectively transform programs via the learned semantics to create new reduction opportunities for the language-generic program reducers to further reduce the programs.

Our thorough evaluation on 50 benchmarks across three programming languages (i.e., C, Rust, and JavaScript) has demonstrated LPR's practicality and superiority over Vulcan, the state-of-the-art language-generic program reducer. For effectiveness, LPR surpasses Vulcan by producing 24.93%, 4.47%, and 11.71% smaller programs on benchmarks in C, Rust, and JavaScript, respectively. Moreover, LPR and Vulcan have the potential to complement each other. For the C language, for which C-Reduce is optimized, by applying Vulcan to the output produced by LPR, we can attain program sizes on par with those achieved by C-Reduce. For efficiency as perceived by users, LPR is more efficient when reducing large and complex programs, taking 10.77%, 34.88%, and 36.96% less time than Vulcan to finish all the benchmarks in C, Rust, and JavaScript, respectively.
Artifacts Available
Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code
Yujia Chen, Cuiyun Gao, Zezhou Yang, Hongyu Zhang, and Qing Liao
(Harbin Institute of Technology, China; Chongqing University, China)
In the field of code intelligence, effectively modeling long-range code poses a significant challenge. Existing pre-trained language models (PLMs) such as UniXcoder have achieved remarkable success, but they still face difficulties with long code inputs. This is mainly due to their limited capacity to maintain contextual continuity and memorize the key information over long-range code. To alleviate the difficulties, we propose EXPO, a framework for EXtending Pre-trained language models for lOng-range code. EXPO incorporates two innovative memory mechanisms we propose in this paper: Bridge Memory and Hint Memory. Bridge Memory uses a tagging mechanism to connect disparate snippets of long-range code, helping the model maintain contextual coherence. Hint Memory focuses on crucial code elements throughout the global context, such as package imports, by integrating a kNN attention layer to adaptively select the relevant code elements. This dual-memory approach bridges the gap between understanding local code snippets and maintaining global code coherence, thereby enhancing the model's overall comprehension of long code sequences. We validate the effectiveness of EXPO on five popular pre-trained language models such as UniXcoder and two code intelligence tasks, including API recommendation and vulnerability detection. Experimental results demonstrate that EXPO significantly improves the pre-trained language models.
Define-Use Guided Path Exploration for Better Forced Execution
Dongnan He, Dongchen Xie, Yujie Wang, Wei You, Bin Liang, Jianjun Huang, Wenchang Shi, Zhuo Zhang, and Xiangyu Zhang
(Renmin University of China, China; Purdue University, USA)
The evolution of recent malware, characterized by the escalating use of cloaking techniques, poses a significant challenge to the analysis of malware behaviors. Researchers have proposed forced execution to penetrate malware's self-protection mechanisms and expose hidden behaviors by forcefully setting certain branch outcomes. Existing studies focus on enhancing the forced executor to provide lightweight crash-free execution models. However, insufficient attention has been directed toward the path exploration strategy, an aspect equally crucial to effectiveness. The linear search employed in state-of-the-art forced execution tools exhibits inherent limitations that lead to unnecessary path exploration and incomplete behavior exposure. In this paper, we propose a novel and practical path exploration strategy that focuses on the coverage of define-use relations in the subject binary. We develop a fuzzing approach for exploring these define-use relations in a progressive and self-supervised way. Our experimental results show that the proposed solution outperforms existing forced execution tools in both memory dependence coverage and malware behavior exposure.
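The define-use relations the strategy targets can be illustrated with a few lines of generic dataflow bookkeeping. This toy operates on an abstract event trace rather than on binaries as the paper does; all names and the trace itself are illustrative.

```python
# Illustrative define-use pair extraction from an execution trace (generic
# dataflow bookkeeping, not the paper's binary-level technique).
def define_use_pairs(trace):
    """trace: list of ('def'|'use', variable) events in execution order.
    Returns the set of (def_index, use_index, variable) pairs exercised."""
    last_def, pairs = {}, set()
    for i, (kind, var) in enumerate(trace):
        if kind == "def":
            last_def[var] = i
        elif var in last_def:
            pairs.add((last_def[var], i, var))
    return pairs

trace = [("def", "x"), ("use", "x"), ("def", "x"), ("use", "x"), ("use", "y")]
print(define_use_pairs(trace))  # two pairs for 'x'; 'y' is used but never defined
```

A coverage-guided explorer would then prefer branch settings that exercise define-use pairs not yet in this set, rather than enumerating paths linearly.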
C2D2: Extracting Critical Changes for Real-World Bugs with Dependency-Sensitive Delta Debugging
Xuezhi Song, Yijian Wu, Shuning Liu, Bihuan Chen, Yun Lin, and Xin Peng
(Fudan University, China; Shanghai Jiao Tong University, China)
Data-driven techniques are promising for automatically locating and fixing bugs, which can save enormous time and effort for developers. However, the effectiveness of these techniques heavily relies on the quality and scale of bug datasets. Although emerging approaches to automatic bug dataset construction partially provide a solution for scalability, data quality remains a concern. Specifically, it remains a barrier for humans to isolate the minimal set of bug-inducing or bug-fixing changes, known as critical changes. Although delta debugging (DD) techniques are capable of extracting critical changes on benchmark datasets in academia, their efficiency and accuracy are still limited when dealing with real-world bugs, where code change dependencies can be overly complicated.

In this paper, we propose C2D2, a novel delta debugging approach for critical change extraction, which estimates the probabilities of dependencies between code change elements. C2D2 considers these dependency probabilities and introduces a matrix-based search mechanism to resolve compilation errors (CE) caused by missing dependencies. It also provides hybrid mechanisms for flexibly selecting code change elements during the DD process. Experiments on Defects4J and a real-world regression bug dataset reveal that C2D2 is significantly more efficient than the traditional DD algorithm ddmin with competitive effectiveness, and significantly more effective and more efficient than the state-of-the-art DD algorithm ProbDD. Furthermore, compared to human-isolated critical changes, C2D2 produces the same or better critical change results in 56% of cases in Defects4J and 86% of cases in the regression dataset, demonstrating its usefulness in automatically extracting critical changes and saving human effort in constructing large-scale bug datasets with real-world bugs.
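Since the abstract benchmarks against the classic ddmin algorithm, a compact generic sketch of ddmin may help readers place C2D2. This is textbook delta debugging (in the style of Zeller and Hildebrandt), not C2D2's dependency-aware variant; the `fails` predicate below is a made-up toy oracle.

```python
# Generic sketch of the classic ddmin algorithm. Real critical-change
# extraction would rebuild and re-test the project inside `fails`.
def ddmin(changes, fails):
    """Shrink `changes` to a smaller subset for which `fails` still holds."""
    n = 2
    while len(changes) >= 2:
        chunk = max(len(changes) // n, 1)
        subsets = [changes[i:i + chunk] for i in range(0, len(changes), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [c for j, s in enumerate(subsets) if j != i for c in s]
            if fails(subset):                      # subset alone reproduces the bug
                changes, n, reduced = subset, 2, True
                break
            if len(subsets) > 2 and fails(complement):
                changes, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(changes):                  # already at finest granularity
                break
            n = min(n * 2, len(changes))
    return changes

# Toy "bug": the failure is triggered whenever changes 3 and 7 are both present.
fails = lambda subset: 3 in subset and 7 in subset
print(ddmin(list(range(10)), fails))  # [3, 7]
```

Each call to `fails` corresponds to one build-and-test cycle, which is why reducing the number of trials (as C2D2 does by modeling dependencies) matters so much in practice.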
FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion
Qi Guo, Xiaohong Li, Xiaofei Xie, Shangqing Liu, Ze Tang, Ruitao Feng, Junjie Wang, Jidong Ge, and Lei Bu
(Tianjin University, China; Singapore Management University, Singapore; Nanyang Technological University, Singapore; Nanjing University, China)
The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these models, especially large models, poses a significant challenge when it comes to fine-tuning them for specific downstream tasks. As an alternative approach, retrieval-based methods have emerged as a promising solution, augmenting model predictions without the need for fine-tuning.
Despite their potential, a significant challenge is that the designs of these methods often rely on heuristics, leaving critical questions about what information should be stored or retrieved and how to interpolate such information for augmenting predictions.
To tackle this challenge, we first perform a theoretical analysis of the fine-tuning process, highlighting the importance of delta logits as a catalyst for improving model predictions. Building on this insight, we develop a novel retrieval-based method, FT2Ra, which aims to mimic genuine fine-tuning. While FT2Ra adopts a retrieval-based mechanism, it uniquely employs a paradigm with a learning rate and multi-epoch retrievals, similar to fine-tuning.
We conducted a comprehensive evaluation of FT2Ra on both token-level and line-level code completion. Our findings demonstrate the remarkable effectiveness of FT2Ra compared to state-of-the-art methods and its potential to approximate genuine fine-tuning. In token-level completion, which represents a relatively easier task, FT2Ra achieves a 4.29% improvement in accuracy compared to the best baseline method on UniXcoder. In the more challenging line-level completion task, we observe a more than twofold increase in Exact Match (EM) performance, indicating the significant advantages of our theoretical analysis. Notably, even when operating without actual fine-tuning, FT2Ra exhibits competitive performance compared to models with real fine-tuning.
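To make the general retrieval-augmentation idea concrete, here is a hedged toy in the spirit of kNN-LM-style interpolation. Note that FT2Ra's actual mechanism works on delta logits with a learning rate and multi-epoch retrieval, which this sketch does not reproduce; all names and numbers are illustrative.

```python
import math

# Toy retrieval augmentation: mix the model's next-token distribution with an
# empirical distribution over tokens seen in retrieved similar contexts.
def softmax(logits):
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def retrieval_augmented(model_logits, retrieved_tokens, lam=0.5):
    p_model = softmax(model_logits)
    p_knn = {t: retrieved_tokens.count(t) / len(retrieved_tokens) for t in p_model}
    return {t: (1 - lam) * p_model[t] + lam * p_knn[t] for t in p_model}

logits = {"foo": 1.0, "bar": 0.9}         # the model alone slightly prefers "foo"
neighbors = ["bar", "bar", "bar", "foo"]  # retrieved contexts mostly continue with "bar"
mixed = retrieval_augmented(logits, neighbors)
print(max(mixed, key=mixed.get))  # bar
```

The heuristic interpolation weight `lam` here is exactly the kind of design choice the abstract criticizes; FT2Ra replaces it with quantities derived from its fine-tuning analysis.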
MicroRes: Versatile Resilience Profiling in Microservices via Degradation Dissemination Indexing
Tianyi Yang, Cheryl Lee, Jiacheng Shen, Yuxin Su, Cong Feng, Yongqiang Yang, and Michael R. Lyu
(Chinese University of Hong Kong, Hong Kong; Sun Yat-sen University, China; Huawei Cloud Computing Technology, China)
Microservice resilience, the ability of microservices to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors. However, the current practice relies on manually configured rules specific to a certain microservice system, which is labor-intensive and inflexible given the large scale and high dynamics of microservices. A more labor-efficient and versatile solution is desired. Our insight is that a resilient deployment can effectively prevent the dissemination of degradation from system performance metrics to user-aware metrics, where the latter directly affect service quality. In other words, failures in a non-resilient deployment can impact both types of metrics, leading to user dissatisfaction. With this in mind, we propose MicroRes, the first versatile resilience profiling framework for microservices via degradation dissemination indexing. MicroRes first injects failures into microservices and collects the available monitoring metrics. Then, it ranks the metrics according to their contributions to the overall service degradation. It produces a resilience index based on how much the degradation is disseminated from system performance metrics to user-aware metrics: higher degradation dissemination indicates lower resilience. We evaluate MicroRes on two open-source microservice systems and one industrial microservice system. The experiments show that MicroRes profiles microservice resilience efficiently and effectively. We also showcase MicroRes' practical usage in production.
Preprint
Isolation-Based Debugging for Neural Networks
Jialuo Chen, Jingyi Wang, Youcheng Sun, Peng Cheng, and Jiming Chen
(Zhejiang University, China; University of Manchester, United Kingdom)
Neural networks (NNs) are known to have diverse defects such as adversarial examples, backdoors, and discrimination, raising great concerns about their reliability. While NN testing can effectively expose these defects to a significant degree, understanding their root causes within the network requires further examination. In this work, inspired by the idea of debugging traditional software for failure isolation, we propose a novel unified neuron-isolation-based framework for debugging neural networks, IDNN for short. Given a buggy NN that exhibits certain undesired properties (e.g., discrimination), the goal of IDNN is to identify the most critical and minimal set of neurons that are responsible for exhibiting these properties. Notably, such isolation is conducted with the objective that by simply 'freezing' these neurons, the model's undesired properties can be eliminated, resulting in much more efficient model repair compared to computationally expensive retraining or weight optimization as in the existing literature. We conduct extensive experiments to evaluate IDNN across a diverse set of NN structures on five benchmark datasets, for solving three debugging tasks: backdoor, unfairness, and weak class. As a lightweight framework, IDNN outperforms state-of-the-art baselines by successfully identifying and isolating a very small set of responsible neurons, demonstrating superior generalization performance across all tasks.
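The "freeze the isolated neurons" repair idea can be shown on a toy network: zeroing a hidden unit's output via a mask removes its contribution without retraining. The network, weights, and mask below are entirely illustrative; IDNN's actual isolation search is not shown.

```python
# Hedged toy: a two-unit ReLU layer followed by a linear output. "Freezing" a
# neuron = multiplying its activation by 0 in the mask.
def forward(x, w1, w2, freeze_mask):
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    hidden = [h * m for h, m in zip(hidden, freeze_mask)]  # apply freeze mask
    return sum(wi * hi for wi, hi in zip(w2, hidden))

w1 = [[1.0, 0.0], [0.0, 1.0]]   # two hidden units, each reading one input
w2 = [1.0, -5.0]                # unit 1 dominates and drives the bad output
x = [0.2, 0.4]

print(forward(x, w1, w2, [1, 1]))  # unmasked: negative (undesired) output
print(forward(x, w1, w2, [1, 0]))  # unit 1 frozen: undesired effect removed
```

The hard part, of course, is finding which (minimal) set of neurons to freeze without breaking benign behavior, which is the search problem IDNN addresses.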
Atlas: Automating Cross-Language Fuzzing on Android Closed-Source Libraries
Hao Xiong, Qinming Dai, Rui Chang, Mingran Qiu, Renxiang Wang, Wenbo Shen, and Yajin Zhou
(Zhejiang University, China; ZJU-Hangzhou Global Scientific and Technological Innovation Center, China)
Fuzzing is an effective method for detecting security bugs in software, and there have been quite a few effective works on fuzzing Android. Researchers have developed methods for fuzzing open-source native APIs and Java interfaces on actual Android devices. However, the realm of automatically fuzzing Android closed-source native libraries, particularly on emulators, remains insufficiently explored. There are two key challenges: firstly, the multi-language programming model inherent to Android; and secondly, the absence of a Java runtime environment within the emulator.
To address these challenges, we propose Atlas, a practical automated fuzzing framework for Android closed-source native libraries. Atlas consists of an automatic harness generator and a fuzzer containing the necessary runtime environment. The generator uses static analysis techniques to deduce the correct calling sequences and parameters of a native API according to information from the "native world" and the "Java world". To maximize the practicality of the generated harnesses, Atlas heuristically optimizes them. The fuzzer provides the essential Java runtime environment in the emulator, making it possible to fuzz Android closed-source native libraries on a multi-core server. We have tested Atlas on 17 pre-installed apps from four Android vendors. Atlas generated 820 harnesses covering 767 native APIs, of which 78% are practical. Meanwhile, Atlas has discovered 74 new security bugs, with 16 CVEs assigned. The experiments show that Atlas can efficiently generate high-quality harnesses and find security bugs.
Automating Zero-Shot Patch Porting for Hard Forks
Shengyi Pan, You Wang, Zhongxin Liu, Xing Hu, Xin Xia, and Shanping Li
(Zhejiang University, China; Huawei, China)
Forking is a typical way of code reuse: it provides a simple way for developers to create variant software (denoted as a hard fork) by copying and modifying an existing codebase. Despite the benefits, forking also leads to duplicate effort in software maintenance. Developers need to port patches across hard forks to address similar bugs or implement similar features. Due to the divergence between the source project and the hard fork, patch porting is complicated, requiring adaptation to different implementations of the same functionality. In this work, we take the first step toward automating patch porting for hard forks under a zero-shot setting. We first conduct an empirical study of the patches ported from Vim to Neovim over the last ten years to investigate the necessity of patch porting and the potential flaws in the current practice. We then propose a large language model (LLM) based approach (namely PPatHF) to automatically port patches for hard forks on a function-wise basis. Specifically, PPatHF is composed of a reduction module and a porting module. Given the pre- and post-patch versions of a function from the reference project and the corresponding function from the target project, the reduction module first slims the input functions by removing code snippets less relevant to the patch. Then, the porting module leverages an LLM to apply the patch to the function from the target project. To better elicit the power of the LLM on patch porting, we design a prompt template to enable efficient in-context learning. We further propose an instruction-tuning-based training task to better guide the LLM to port the patch and to inject task-specific knowledge. We evaluate PPatHF on 310 Neovim patches ported from Vim. The experimental results show that PPatHF outperforms the baselines significantly. Specifically, PPatHF can correctly port 131 (42.3%) of the patches and automate 57% of the manual edits required for a developer to port a patch.
DiaVio: LLM-Empowered Diagnosis of Safety Violations in ADS Simulation Testing
You Lu, Yifan Tian, Yuyang Bi, Bihuan Chen, and Xin Peng
(Fudan University, China)
Simulation testing has been widely adopted by leading companies to ensure the safety of autonomous driving systems (ADSs). A number of scenario-based testing approaches have been developed to generate diverse driving scenarios for simulation testing and have been demonstrated to be capable of finding safety violations. However, there is no automated way to diagnose whether these violations are caused by the ADS under test and which category these violations belong to. As a result, great effort is required to manually diagnose violations.

To bridge this gap, we propose DiaVio to automatically diagnose safety violations in simulation testing by leveraging large language models (LLMs). It is built on top of a new domain-specific language (DSL) for crashes to align real-world accident reports described in natural language with violation scenarios in simulation testing. DiaVio fine-tunes a base LLM with real-world accident reports to learn diagnosis capability and uses the fine-tuned LLM to diagnose violation scenarios in simulation testing. Our evaluation has demonstrated the effectiveness and efficiency of DiaVio in violation diagnosis.
Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation
Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, and Hai Jin
(Huazhong University of Science and Technology, China; Curtin University, Perth, Australia; Chongqing University, China; UNSW, Sydney, Australia; University of Technology, Sydney, Australia)
Vulnerability detection is crucial for ensuring the security and reliability of software systems. Recently, Graph Neural Networks (GNNs) have emerged as a prominent code embedding approach for vulnerability detection, owing to their ability to capture the underlying semantic structure of source code. However, GNNs face significant challenges in explainability due to their inherently black-box nature. To this end, several factual reasoning-based explainers have been proposed. These explainers provide explanations for the predictions made by GNNs by analyzing the key features that contribute to the outcomes. We argue that these factual reasoning-based explanations cannot answer critical what-if questions: "What would happen to the GNN's decision if we were to alter the code graph into alternative structures?" Inspired by advancements of counterfactual reasoning in artificial intelligence, we propose CFExplainer, a novel counterfactual explainer for GNN-based vulnerability detection. Unlike factual reasoning-based explainers, CFExplainer seeks the minimal perturbation to the input code graph that leads to a change in the prediction, thereby addressing the what-if questions for vulnerability detection. We term this perturbation a counterfactual explanation, which can pinpoint the root causes of the detected vulnerability and furnish valuable insights for developers to undertake appropriate actions for fixing the vulnerability. Extensive experiments on four GNN-based vulnerability detection models demonstrate the effectiveness of CFExplainer over existing state-of-the-art factual reasoning-based explainers.
DeFort: Automatic Detection and Analysis of Price Manipulation Attacks in DeFi Applications
Maoyi Xie, Ming Hu, Ziqiao Kong, Cen Zhang, Yebo Feng, Haijun Wang, Yue Xue, Hao Zhang, Ye Liu, and Yang Liu
(Nanyang Technological University, Singapore; Xi’an Jiaotong University, China; MetaTrust Labs, Singapore)
Decentralized Finance (DeFi) applications facilitate tamper-proof transactions among multiple anonymous users. However, since attackers can access the smart contract bytecode directly, vulnerabilities in the transaction mechanism, contract code, or third-party components can be easily exploited to manipulate token prices, leading to financial losses. Since price manipulation often relies on specific states and complex trading sequences, existing detection tools have limitations in addressing this problem. In addition, to swiftly identify the root cause of an attack and implement targeted defense and remediation measures, auditors typically prioritize understanding the methodology behind the attack, emphasizing "how" it occurred rather than simply confirming its existence. To address these problems, this paper presents a novel automatic price manipulation detection and analysis framework, named DeFort, which contains a price manipulation behavior model to guide on-chain detection, multiple price monitoring strategies to detect pools with abnormal token prices, and various profit calculation mechanisms to confirm attacks. Based on the behavior model, DeFort can automatically locate transactions and functions that cause abnormal price fluctuations and identify attackers and victims. Experimental results demonstrate that DeFort outperforms state-of-the-art price manipulation detection methods. Furthermore, after monitoring 441 real-world projects for two months, DeFort successfully detected five price manipulation attacks.
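As background on why on-chain prices can be manipulated at all, the standard constant-product AMM formula (x * y = k) shows how a single large swap moves a pool's spot price. This is generic DeFi math, not DeFort's detection logic; the pool sizes are illustrative.

```python
# Constant-product AMM sketch: a swap of dx token X into the pool must keep
# x * y = k, so large trades move the spot price dramatically.
def swap(pool_x, pool_y, dx):
    """Returns (new_x, new_y, spot_price_of_X_in_Y) after swapping in dx of X."""
    k = pool_x * pool_y
    new_x = pool_x + dx
    new_y = k / new_x
    return new_x, new_y, new_y / new_x

x, y = 1000.0, 1000.0                # balanced pool, spot price 1.0
print(swap(x, y, 10.0)[2])    # small trade: price barely moves (~0.98)
print(swap(x, y, 1000.0)[2])  # huge trade: spot price of X collapses to 0.25
```

An attacker who can briefly force such a distorted price (e.g., via a flash loan) and make another contract read it mid-transaction is performing exactly the kind of manipulation DeFort's monitoring strategies aim to flag.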
Traceback: A Fault Localization Technique for Molecular Programs
Michael C. Gerten, James I. Lathrop, and Myra B. Cohen
(Iowa State University, USA)
Fault localization is essential to software maintenance tasks such as testing and automated program repair. Many fault localization techniques have been developed, the most common of which are spectrum-based. Most techniques have been designed for traditional programming paradigms that map passing and failing test cases to lines or branches of code; hence, specialized programming paradigms that utilize different code abstractions may fail to localize well. In this paper, we study fault localization in the context of a class of programs, molecular programs. Recent research has designed automated testing and repair frameworks for these programs but has ignored the importance of fault localization. As we demonstrate, using existing spectrum-based approaches may not provide much information. Instead we propose a novel approach, Traceback, that leverages temporal trace data. In an empirical study on a set of 89 faulty program variants, we demonstrate that Traceback provides a 32-90% improvement in localization over reaction-based mapping, a direct translation of spectrum-based localization. We see little difference in parameter tuning of Traceback when all tests, or only code-based (invariant) tests, are used; however, the best depth and weight parameters vary when using specification-based tests, which can be either functional or metamorphic. Overall, invariant-based tests provide the best localization results (either alone or in combination with others), followed by metamorphic and then functional tests.
Article Search
Silent Taint-Style Vulnerability Fixes Identification
Zhongzhen Wen
![ORCID logo](images/orcid.svg)
, Jiayuan Zhou
![ORCID logo](images/orcid.svg)
,
Minxue Pan ![ORCID logo](images/orcid.svg)
, Shaohua Wang
![ORCID logo](images/orcid.svg)
, Xing Hu
![ORCID logo](images/orcid.svg)
, Tongtong Xu
![ORCID logo](images/orcid.svg)
, Tian Zhang
![ORCID logo](images/orcid.svg)
, and Xuandong Li
(Nanjing University, China; Huawei, Waterloo, Canada; Central University of Finance and Economics, China; Zhejiang University, China; Huawei, China)
The coordinated vulnerability disclosure model, widely adopted in open-source software (OSS) organizations, recommends the silent resolution of vulnerabilities without revealing vulnerability information until their public disclosure. However, the inherently public nature of OSS development leads to security fixes becoming publicly available in repositories weeks before the official disclosure of vulnerabilities. This time gap poses a significant security risk to OSS users, as attackers could discover the fix and exploit the vulnerability before disclosure. Thus, there is a critical need for OSS users to detect fixes as early as possible to address the vulnerability before any exploitation occurs.
In response to this challenge, we introduce EarlyVulnFix, a novel approach designed to identify silent fixes for taint-style vulnerabilities—a persistent class of security weaknesses where attacker-controlled input reaches sensitive operations (sink) without proper sanitization. Leveraging data flow and dependency analysis, our tool distinguishes two types of connections between newly introduced code and sinks, tailored for two common fix scenarios. Our evaluation demonstrates that EarlyVulnFix surpasses state-of-the-art baselines by a substantial margin in terms of F1 score. Furthermore, when applied to the 700 latest commits across seven projects, EarlyVulnFix detected three security fixes before their respective security releases, highlighting its effectiveness in identifying unreported vulnerability fixes in the wild.
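The taint-style pattern described above (attacker-controlled input reaching a sink without sanitization) reduces to a reachability question over a data-flow graph. A minimal sketch follows, with a hypothetical adjacency-list encoding that is not taken from EarlyVulnFix:

```python
def tainted_path_exists(graph, sources, sinks, sanitizers):
    """DFS from taint sources; a sink is reachable in a 'vulnerable' way
    only if some path avoids every sanitizing node."""
    stack = [s for s in sources if s not in sanitizers]
    seen = set(stack)
    while stack:
        node = stack.pop()
        if node in sinks:
            return True
        for succ in graph.get(node, []):
            if succ not in seen and succ not in sanitizers:
                seen.add(succ)
                stack.append(succ)
    return False
```

A fix that inserts a sanitizer on every source-to-sink path flips the answer from True to False, which is exactly the kind of before/after difference a fix identifier can look for.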
Article Search
Benchmarking Automated Program Repair: An Extensive Study on Both Real-World and Artificial Bugs
Yicheng Ouyang
![ORCID logo](images/orcid.svg)
, Jun Yang
![ORCID logo](images/orcid.svg)
, and Lingming Zhang
(University of Illinois at Urbana-Champaign, USA)
As bugs are inevitable and prevalent in real-world programs, many Automated Program Repair (APR) techniques have been proposed to generate patches for them. However, due to the lack of a standard for evaluating APR techniques, prior works tend to use different settings and benchmarks in evaluation, threatening the trustworthiness of the evaluation results. Additionally, they typically only adopt plausibility and genuineness as evaluation metrics, which may potentially mask some underlying issues in APR techniques. To overcome these issues, in this paper, we conduct an extensive and multi-dimensional evaluation of nine learning-based and three traditional state-of-the-art APR techniques under the same environment and settings. We employ the widely studied Defects4J V2.0.0 benchmark and a newly constructed large-scale mutation-based benchmark named MuBench, derived from Defects4J and including 1,700 artificial bugs generated by various mutators, to uncover potential limitations in these APR techniques. We also apply multi-dimensional metrics, including compilability/plausibility/genuineness metrics, as well as SYE (SYntactic Equivalence) and TCE (Trivial Compiler Equivalence) metrics, to thoroughly analyze the 1,814,652 generated patches. This paper presents noteworthy findings from the extensive evaluation: Firstly, Large Language Model (LLM) based APR demonstrates less susceptibility to overfitting on the Defects4J V1.2.0 dataset and fixes the largest number of bugs. Secondly, the study suggests a promising future for combining traditional and learning-based APR techniques, as they exhibit complementary advantages in fixing different types of bugs. Additionally, this work highlights the necessity for further enhancing patch compilability of learning-based APR techniques, despite the presence of various existing strategies attempting to improve it.
The study also reveals other guidelines for enhancing APR techniques, including the need for handling unresolvable symbol compilability issues and reducing duplicate/no-op patch generation. Finally, our study uncovers seven implementation issues in the studied techniques, with five of them confirmed and fixed by the corresponding authors.
Article Search
Info
Artifacts Available
Multi-modal Learning for WebAssembly Reverse Engineering
Hanxian Huang
![ORCID logo](images/orcid.svg)
and Jishen Zhao
(University of California at San Diego, San Diego, USA)
The increasing adoption of WebAssembly (Wasm) for performance-critical and security-sensitive tasks drives the demand for WebAssembly program comprehension and reverse engineering. Recent studies have introduced machine learning (ML)-based WebAssembly reverse engineering tools. Yet, the generalization of task-specific ML solutions remains challenging, because their effectiveness hinges on the availability of an ample supply of high-quality task-specific labeled data. Moreover, previous works trained models only with features extracted from WebAssembly, overlooking the high-level semantics present in the corresponding source code and its documentation. Acknowledging the abundance of available source code with documentation, which can be compiled into WebAssembly, we propose to learn representations of them concurrently and harness their mutual relationships for effective WebAssembly reverse engineering.
In this paper, we present WasmRev, the first multi-modal pre-trained language model for WebAssembly reverse engineering. WasmRev is pre-trained using self-supervised learning on a large-scale multi-modal corpus encompassing source code, code documentation and the compiled WebAssembly, without requiring labeled data. WasmRev incorporates three tailored multi-modal pre-training tasks to capture various characteristics of WebAssembly and cross-modal relationships. WasmRev is only trained once to produce general-purpose representations that can broadly support WebAssembly reverse engineering tasks through few-shot fine-tuning with much less labeled data, improving data efficiency. We fine-tune WasmRev onto three important reverse engineering tasks: type recovery, function purpose identification and WebAssembly summarization. Our results show that WasmRev pre-trained on the corpus of multi-modal samples establishes a robust foundation for these tasks, achieving high task accuracy and outperforming the state-of-the-art ML methods for WebAssembly reverse engineering.
Article Search
CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature
Chenyan Liu
![ORCID logo](images/orcid.svg)
, Yufan Cai
![ORCID logo](images/orcid.svg)
, Yun Lin
![ORCID logo](images/orcid.svg)
, Yuhuan Huang
![ORCID logo](images/orcid.svg)
, Yunrui Pei
![ORCID logo](images/orcid.svg)
, Bo Jiang
![ORCID logo](images/orcid.svg)
, Ping Yang
![ORCID logo](images/orcid.svg)
, Jin Song Dong
![ORCID logo](images/orcid.svg)
, and Hong Mei
(Shanghai Jiao Tong University, China; National University of Singapore, Singapore; Bytedance Network Technology, Beijing, China)
Recent years have seen the development of LLM-based code generation. Compared to generating code in a software project, incremental code edits are empirically observed to be more frequent. The emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing session can include multiple (ir)relevant edits to the code under edit. Second, the inference of subsequent edits is non-trivial, as the scope of their ripple effects can be the whole project.
In this work, we propose CoEdPilot, an LLM-driven solution to recommend code edits by discriminating the relevant edits, exploring their interactive natures, and estimating their ripple effects in the project. Specifically, CoEdPilot orchestrates multiple neural transformers to identify what and how to edit in the project regarding both edit location and edit content. When a user accomplishes an edit with an optional editing description, a Subsequent Edit Analysis first reports the most relevant files in the project along with the types of edits (e.g., keep, insert, and replace) that can happen for each line of their code. Next, an Edit-content Generator generates concrete edit options for the lines of code, regarding the relevant prior changes reported by an Edit-dependency Analyzer. Last, both the Subsequent Edit Analysis and the Edit-content Generator capture relevant prior edits as feedback to readjust their recommendations. We train our models by collecting over 180K commits from 471 open-source projects in 5 programming languages. Our extensive experiments show that (1) CoEdPilot can predict the edits well (i.e., predicting edit location with an accuracy of 70.8%-85.3%, and the edit content with an exact match rate of 41.8% and a BLEU4 score of 60.7); and (2) CoEdPilot can boost existing edit generators such as GRACE and CCT5 on exact match rate by 8.57 percentage points and BLEU4 score by 18.08. Last, our user study on 18 participants with 3 editing tasks (1) shows that CoEdPilot can be effective in assisting users to edit code in comparison with Copilot, and (2) sheds light on future improvements of the tool design. The video demonstration of our tool is available at https://sites.google.com/view/coedpilot/home.
Article Search
Info
Automated Deep Learning Optimization via DSL-Based Source Code Transformation
Ruixin Wang
![ORCID logo](images/orcid.svg)
, Minghai Lu
![ORCID logo](images/orcid.svg)
, Cody Hao Yu
![ORCID logo](images/orcid.svg)
, Yi-Hsiang Lai
![ORCID logo](images/orcid.svg)
, and Tianyi Zhang
(Purdue University, USA; BosonAI, USA; Amazon Web Services, USA)
As deep learning models become increasingly large and complex, it is critical to improve model training and inference efficiency. Though a variety of highly optimized libraries and packages (known as DL kernels) have been developed, it is tedious and time-consuming to figure out which kernels to use, where to apply them, and how to use them correctly. To address this challenge, we propose an Automated Deep learning OPTimization approach called Adopter. We design a Domain-Specific Language (DSL) to represent DL model architectures and leverage this DSL to specify model transformation rules required to integrate a DL kernel into a model. Given the source code of a DL model and the transformation rules for a set of kernels, Adopter first performs inter-procedural analysis to identify and express the model architecture in our DSL. Then, Adopter performs scope analysis and sub-sequence matching to identify locations in the model architecture where the transformation rules can be applied. Finally, Adopter uses a synthesis-based code transformation method to apply the transformation rules. We curated a benchmark with 199 models from Hugging Face and a diverse set of DL kernels. We found that, compared to a state-of-the-art automated code transformation technique, Adopter helps improve the precision and recall by 3% and 56%, respectively. An in-depth analysis of 9 models revealed that, on average, Adopter improved the training speed by 22.7% while decreasing the GPU memory usage by 10.5%.
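The sub-sequence matching step can be pictured as locating a rule's operator pattern inside the linearized model architecture and splicing in the kernel. The operator names and list representation below are illustrative stand-ins, not Adopter's DSL:

```python
def find_match_sites(ops, pattern):
    """Return start indices where `pattern` occurs as a contiguous
    sub-sequence of the model's operator list."""
    n, m = len(ops), len(pattern)
    return [i for i in range(n - m + 1) if ops[i:i + m] == pattern]

def apply_rule(ops, pattern, replacement):
    """Replace the first occurrence of `pattern` with `replacement`
    (e.g., a fused kernel), leaving the rest of the model unchanged."""
    sites = find_match_sites(ops, pattern)
    if not sites:
        return ops
    i = sites[0]
    return ops[:i] + replacement + ops[i + len(pattern):]
```

In the real system the match must also respect scoping (e.g., the pattern cannot straddle a control-flow boundary), which is what the scope analysis contributes.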
Article Search
Artifacts Available
Evaluating the Effectiveness of Decompilers
Ying Cao
![ORCID logo](images/orcid.svg)
, Runze Zhang
![ORCID logo](images/orcid.svg)
, Ruigang Liang
![ORCID logo](images/orcid.svg)
, and Kai Chen
(Institute of Information Engineering at Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, China)
In software security tasks like malware analysis and vulnerability mining, reverse engineering is pivotal, with C decompilers playing a crucial role in understanding program semantics. However, reverse engineers still predominantly rely on assembly code rather than decompiled code when analyzing complex binaries. This practice underlines the limitations of current decompiled code, which hinder its effectiveness in reverse engineering. Identifying and analyzing the problems of existing decompilers and making targeted improvements can effectively enhance the efficiency of software analysis. In this study, we systematically evaluate the semantic consistency and readability of current mainstream decompilers. Semantic evaluation results show that the state-of-the-art decompiler Hex-Rays achieves only about 55% accuracy at almost all optimization levels, which contradicts the common belief among many reverse engineers that decompilers are usually accurate. Readability evaluation indicates that, despite years of effort to improve the readability of decompiled code, decompilers’ template-based approach still predominantly yields code akin to binary structures rather than human coding patterns. Additionally, our human study indicates that, to enhance decompilers’ accuracy and readability, introducing human- or compiler-aware strategies, such as a speculate-verify-correct approach that obtains recompilable decompiled code and iteratively refines it to more closely resemble the original binary, potentially offers a more effective optimization method than relying on static analysis and rule expansion.
Article Search
Artifacts Available
CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision
Hao Wang ![ORCID logo](images/orcid.svg)
, Zeyu Gao
![ORCID logo](images/orcid.svg)
, Chao Zhang
![ORCID logo](images/orcid.svg)
, Zihan Sha
![ORCID logo](images/orcid.svg)
, Mingyang Sun
![ORCID logo](images/orcid.svg)
, Yuchen Zhou
![ORCID logo](images/orcid.svg)
, Wenyu Zhu
![ORCID logo](images/orcid.svg)
, Wenju Sun
![ORCID logo](images/orcid.svg)
, Han Qiu
![ORCID logo](images/orcid.svg)
, and Xi Xiao
(Tsinghua University, China; Information Engineering University, China; University of Electronic Science and Technology of China, China; Beijing University of Technology, China)
Binary code representation learning has shown significant performance in binary analysis tasks. However, existing solutions often have poor transferability, particularly in few-shot and zero-shot scenarios where few or no training samples are available for the tasks. To address this problem, we present CLAP (Contrastive Language-Assembly Pre-training), which employs natural language supervision to learn better representations of binary code (i.e., assembly code) and achieve better transferability. At its core, our approach boosts superior transfer learning capabilities by effectively aligning binary code with its semantic explanations (in natural language), resulting in a model able to generate better embeddings for binary code. To enable this alignment training, we propose an efficient dataset engine that automatically generates a large and diverse dataset comprising binary code and corresponding natural language explanations. We have generated 195 million pairs of binary code and explanations and trained a prototype of CLAP. The evaluations of CLAP across various downstream tasks in binary analysis all demonstrate exceptional performance. Notably, without any task-specific training, CLAP is often competitive with a fully supervised baseline, showing excellent transferability.
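The abstract's alignment objective between assembly embeddings and explanation embeddings is contrastive. A minimal symmetric InfoNCE sketch over plain Python lists follows; the temperature value is an assumed hyperparameter, and real training would use tensor batches, not this toy form:

```python
import math

def info_nce(code_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: matched (code, text) pairs share an
    index, so each row's positive logit is its diagonal entry."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def norm(u):
        s = math.sqrt(dot(u, u))
        return [a / s for a in u]

    code = [norm(c) for c in code_emb]
    text = [norm(t) for t in text_emb]
    n = len(code)
    logits = [[dot(c, t) / temperature for t in text] for c in code]

    def ce_row(row, target):  # cross-entropy with log-sum-exp stabilization
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    loss_c2t = sum(ce_row(logits[i], i) for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2c = sum(ce_row(cols[j], j) for j in range(n)) / n
    return (loss_c2t + loss_t2c) / 2
```

Correctly aligned pairs yield a lower loss than shuffled pairs, which is the signal that pulls matched binary code and explanations together in embedding space.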
Preprint
Papers Round 2
FDI: Attack Neural Code Generation Systems through User Feedback Channel
Zhensu Sun
![ORCID logo](images/orcid.svg)
, Xiaoning Du
![ORCID logo](images/orcid.svg)
,
Xiapu Luo ![ORCID logo](images/orcid.svg)
,
Fu Song ![ORCID logo](images/orcid.svg)
,
David Lo ![ORCID logo](images/orcid.svg)
, and Li Li
(Hong Kong Polytechnic University, China; Monash University, Australia; ShanghaiTech University, China; Automotive Software Innovation Center, China; Institute of Software at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Singapore Management University, Singapore; Beihang University, China)
Article Search
DistillSeq: A Framework for Safety Alignment Testing in Large Language Models using Knowledge Distillation
Mingke Yang
![ORCID logo](images/orcid.svg)
, Yuqi Chen
![ORCID logo](images/orcid.svg)
, Yi Liu
![ORCID logo](images/orcid.svg)
, and Ling Shi
(ShanghaiTech University, China; Nanyang Technological University, Singapore)
Large Language Models (LLMs) have showcased their remarkable capabilities in diverse domains, encompassing natural language understanding, translation, and even code generation. Yet the potential for LLMs to generate harmful content is a significant concern. This risk necessitates rigorous testing and comprehensive evaluation of LLMs to ensure safe and responsible use. However, extensive testing of LLMs requires substantial computational resources, making it an expensive endeavor. Therefore, exploring cost-saving strategies during the testing phase is crucial to balance the need for thorough evaluation with the constraints of resource availability. To address this, our approach, DistillSeq, begins by transferring the moderation knowledge from an LLM to a small model. Subsequently, we deploy two distinct strategies for generating malicious queries: one based on a syntax tree approach, and the other leveraging an LLM-based method. Finally, our approach incorporates a sequential filter-test process designed to identify test cases that are prone to eliciting toxic responses. By doing so, we significantly curtail unnecessary or unproductive interactions with LLMs, thereby streamlining the testing process. Our research evaluated the efficacy of DistillSeq across four LLMs: GPT-3.5, GPT-4.0, Vicuna-13B, and Llama-13B. In the absence of DistillSeq, the observed attack success rates on these LLMs stood at 31.5% for GPT-3.5, 21.4% for GPT-4.0, 28.3% for Vicuna-13B, and 30.9% for Llama-13B. Upon the application of DistillSeq, these success rates notably increased to 58.5%, 50.7%, 52.5%, and 54.4%, respectively. This translates to an average relative increase of 93.0% in attack success rate compared to scenarios without DistillSeq. Such findings highlight the significant enhancement DistillSeq offers in terms of reducing the time and resource investment required for effectively testing LLMs.
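The filter-test step can be sketched as scoring candidate queries with the distilled moderation model and forwarding only high-scoring ones to the expensive target LLM. The scoring callback and threshold below are stand-ins for exposition, not DistillSeq's trained model:

```python
def filter_then_test(queries, moderation_score, send_to_llm, threshold=0.5):
    """Forward only queries the small distilled model deems likely to
    elicit toxic output, curtailing costly LLM calls."""
    results = []
    for q in queries:
        if moderation_score(q) >= threshold:
            results.append((q, send_to_llm(q)))
    return results
```

The cost saving comes from `moderation_score` being a cheap distilled model, so the expensive `send_to_llm` call runs only on promising test cases.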
Article Search
Info
PatchFinder: A Two-Phase Approach to Security Patch Tracing for Disclosed Vulnerabilities in Open-Source Software
Kaixuan Li
![ORCID logo](images/orcid.svg)
, Jian Zhang
![ORCID logo](images/orcid.svg)
, Sen Chen
![ORCID logo](images/orcid.svg)
, Han Liu
![ORCID logo](images/orcid.svg)
, Yang Liu
![ORCID logo](images/orcid.svg)
, and Yixiang Chen
(East China Normal University, China; Nanyang Technological University, Singapore; Tianjin University, China)
Open-source software (OSS) vulnerabilities are increasingly prevalent, emphasizing the importance of security patches. However, in widely used security platforms like NVD, a substantial number of CVE records still lack trace links to patches. Although rank-based approaches have been proposed for security patch tracing, they heavily rely on handcrafted features in a single-step framework, which limits their effectiveness.
In this paper, we propose PatchFinder, a two-phase framework with end-to-end correlation learning for better tracing of security patches. In the **initial retrieval** phase, we employ a hybrid patch retriever that accounts for both lexical and semantic matching based on the code changes and the description of a CVE, narrowing down the search space by extracting commits similar to the CVE description as candidates. Afterwards, in the **re-ranking** phase, we design an end-to-end architecture under the supervised fine-tuning paradigm for learning the semantic correlations between CVE descriptions and commits. In this way, we can automatically rank the candidates based on their correlation scores while maintaining low computation overhead. We evaluated our system against 4,789 CVEs from 532 OSS projects. The results are highly promising: PatchFinder achieves a Recall@10 of 80.63% and a Mean Reciprocal Rank (MRR) of 0.7951. Moreover, the Manual Effort@10 required is curtailed to 2.77, marking a 1.94-fold improvement over current leading methods. When applying PatchFinder in practice, we initially identified 533 patch commits and submitted them for official review, 482 of which have been confirmed by CVE Numbering Authorities.
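The hybrid initial-retrieval idea can be illustrated with a toy scorer combining a lexical signal (Jaccard token overlap) and a bag-of-words cosine as a stand-in for the semantic signal; the weighting, tokenization, and both similarity choices are assumptions for exposition, not PatchFinder's retriever:

```python
from collections import Counter
import math

def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(a_tokens, b_tokens):
    a, b = Counter(a_tokens), Counter(b_tokens)
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def rank_commits(cve_desc, commit_msgs, alpha=0.5):
    """Rank candidate commits by a weighted lexical + semantic score."""
    q = cve_desc.lower().split()
    def score(msg):
        c = msg.lower().split()
        return alpha * jaccard(q, c) + (1 - alpha) * cosine(q, c)
    return sorted(commit_msgs, key=score, reverse=True)
```

The re-ranking phase would then apply the learned correlation model only to the top of this cheap ranking, keeping overall overhead low.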
Article Search
Oracle-Guided Program Selection from Large Language Models
Zhiyu Fan
![ORCID logo](images/orcid.svg)
, Haifeng Ruan
![ORCID logo](images/orcid.svg)
, Sergey Mechtaev
![ORCID logo](images/orcid.svg)
, and
Abhik Roychoudhury
(National University of Singapore, Singapore; Peking University, China)
While large language models (LLMs) have shown significant advancements in code generation, their susceptibility to producing incorrect code poses a significant challenge to the adoption of LLM-generated programs. This issue largely stems from the reliance on natural language descriptions as informal oracles in code generation. Current strategies to mitigate this involve selecting the best program from multiple LLM-generated alternatives, judged by criteria like the consistency of their execution results on an LLM-generated test suite. However, this approach has crucial limitations: (1) LLMs often generate redundant tests or tests that cannot distinguish between correct and incorrect solutions, and (2) the consistency criteria used, such as the majority vote, fail to foster developer trust due to the absence of a transparent rationale behind the choices made. In this work, we propose a new perspective on increasing the quality of LLM-generated code via program selection using the LLM as a test oracle. Our method is based on our experimentally confirmed observation that LLMs serve more effectively as oracles when tasked with selecting the correct output from multiple choices. Leveraging this insight, we first generate distinguishing inputs that capture semantic discrepancies of programs sampled from an LLM, and record the outputs produced by the programs on these inputs. An LLM then selects the output most likely to be correct from these, guided by the natural language problem description. We implemented this idea in a tool, LLMCodeChoice, and evaluated its accuracy in generating and selecting standalone programs. Our experiments demonstrated its effectiveness in improving pass@1 by 3.6-7% on the HumanEval and MBPP benchmarks compared to the state-of-the-art CodeT. Most interestingly, the selected input-output specifications helped us to uncover incompleteness and ambiguities in task descriptions and also identify incorrect ground-truth implementations in the benchmarks.
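The core bookkeeping behind this selection scheme is grouping sampled programs by their output signatures on distinguishing inputs, then letting the oracle pick one signature. In the sketch below the oracle is an injected callback (in the real system, an LLM call guided by the problem description); the structure is illustrative, not LLMCodeChoice's implementation:

```python
def group_by_signature(programs, inputs):
    """Map each distinct output signature to the programs producing it."""
    groups = {}
    for prog in programs:
        sig = tuple(prog(x) for x in inputs)
        groups.setdefault(sig, []).append(prog)
    return groups

def select_program(programs, inputs, choose_output):
    """`choose_output` plays the oracle: given candidate output
    signatures, it returns the one it believes correct."""
    groups = group_by_signature(programs, inputs)
    chosen_sig = choose_output(list(groups))
    return groups[chosen_sig][0]
```

Unlike majority vote, the chosen input-output signature itself serves as a human-readable rationale for the selection.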
Article Search
Beyond Pairwise Testing: Advancing 3-wise Combinatorial Interaction Testing for Highly Configurable Systems
Chuan Luo
![ORCID logo](images/orcid.svg)
, Shuangyu Lyu,
Qiyuan Zhao ![ORCID logo](images/orcid.svg)
, Wei Wu
![ORCID logo](images/orcid.svg)
, Hongyu Zhang
![ORCID logo](images/orcid.svg)
, and Chunming Hu
(Beihang University, China; National University of Singapore, Singapore; Central South University, China; Xiangjiang Laboratory, China; Chongqing University, China)
Article Search
An Empirical Study of Static Analysis Tools for Secure Code Review
Wachiraphan Charoenwet
![ORCID logo](images/orcid.svg)
, Patanamon Thongtanunam
![ORCID logo](images/orcid.svg)
,
Van-Thuan Pham ![ORCID logo](images/orcid.svg)
, and Christoph Treude
(University of Melbourne, Australia; Singapore Management University, Singapore)
Early identification of security issues in software development is vital to minimize their unanticipated impacts. Code review is a widely used manual analysis method that aims to uncover security issues along with other coding issues in software projects. While some studies suggest that automated static application security testing tools (SASTs) could enhance security issue identification, there is limited understanding of SASTs' practical effectiveness in supporting secure code review. Moreover, most SAST studies rely on synthetic or fully vulnerable versions of the subject program, which may not accurately represent real-world code changes in the code review process.
To address this gap, we study C/C++ SASTs using a dataset of actual code changes that contributed to exploitable vulnerabilities. Beyond SASTs' effectiveness, we quantify the potential benefits when changed functions are prioritized by SAST warnings. Our dataset comprises 319 real-world vulnerabilities from 815 vulnerability-contributing commits (VCCs) in 92 C and C++ projects. The results reveal that a single SAST can produce warnings in vulnerable functions of 52% of VCCs. Prioritizing changed functions with SAST warnings can improve accuracy (i.e., 12% in precision and 5.6% in recall) and reduce the Initial False Alarm (lines of code in non-vulnerable functions inspected until the first vulnerable function) by 13%. Nevertheless, at least 76% of the warnings in vulnerable functions are irrelevant to the VCCs, and 22% of VCCs remain undetected due to limitations of SAST rules. Our findings highlight the benefits and the remaining gaps of SAST-supported secure code review, as well as challenges that should be addressed in future work.
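The Initial False Alarm (IFA) metric defined parenthetically above can be computed directly from an inspection order. A sketch with a hypothetical `(loc, is_vulnerable)` record format, which is an assumption and not the study's exact data layout:

```python
def initial_false_alarm(ordered_functions):
    """ordered_functions: [(loc, is_vulnerable)] in inspection order.
    IFA = lines of code in non-vulnerable functions inspected before
    the first vulnerable function is reached."""
    inspected_loc = 0
    for loc, is_vulnerable in ordered_functions:
        if is_vulnerable:
            return inspected_loc
        inspected_loc += loc
    return inspected_loc  # no vulnerable function in the list
```

Re-ordering the list by SAST warning priority and re-computing IFA quantifies how much wasted inspection effort the prioritization removes.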
Article Search
Artifacts Available
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
Thanh-Dat Nguyen
![ORCID logo](images/orcid.svg)
, Tung Do-Viet
![ORCID logo](images/orcid.svg)
, Hung Nguyen-Duy
![ORCID logo](images/orcid.svg)
, Tuan-Hai Luu
![ORCID logo](images/orcid.svg)
, Hung Le
![ORCID logo](images/orcid.svg)
, Bach Le
![ORCID logo](images/orcid.svg)
, and Patanamon Thongtanunam
(University of Melbourne, Australia; Cinnamon AI, n.n.; Independent Researcher, n.n.; Deakin University, Australia)
Businesses often need to query visually rich documents (VRDs), e.g., purchase receipts, medical records, and insurance forms from multiple vendors, to make informed decisions. As such, several techniques have been proposed to automatically extract independent entities of interest from VRDs, such as price tags from purchase receipts. However, for extracting semantically linked entities, such as finding the corresponding price tag for each item, these techniques either have limited capability in handling new layouts (e.g., template-based approaches) or require extensive amounts of pre-training data and still do not perform well (e.g., deep-learning approaches).
In this work, we introduce a program synthesis method, namely VRDSynth, to automatically generate programs to extract entity relations from multilingual VRDs. Two key novelties, which empower VRDSynth to tackle flexible layouts while requiring no pre-training data for extracting entity relations, include: (1) a new domain-specific language (DSL) to effectively capture the spatial and textual relations between document entities, and (2) a novel synthesis algorithm that makes use of frequent spatial relations between entities to construct initial programs, equivalent reduction to prune the search space, and a combination of positive, negative, and mutually exclusive programs to improve the coverage of programs.
We evaluate our method on two popular VRD understanding benchmarks, namely FUNSD and XFUND, on the semantic entity linking task, consisting of 1,600 forms in 8 different languages. Experiments show that VRDSynth, despite having no prior pre-training data, outperforms the state-of-the-art pre-trained deep-learning approach, namely LayoutXLM, in 5 out of 8 languages. Noticeably, VRDSynth achieved an improvement of 42% over LayoutXLM in terms of F1 score on FUNSD while being complementary to LayoutXLM in 7/8 languages. Regarding efficiency, VRDSynth significantly improves the memory footprint required for storage and inference over LayoutXLM (1M and 380MB versus that of 1.48GB and 3GB required by LayoutXLM), while maintaining similar time efficiency despite the speed differences between the languages used for implementation (Python vs C++).
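A DSL capturing spatial relations between document entities ultimately rests on predicates over bounding boxes. An illustrative "right-of, same row" relation follows; the `(x0, y0, x1, y1)` box format and the tolerance are assumptions for exposition, not VRDSynth's actual DSL:

```python
def right_of_same_row(a, b, y_tol=0.5):
    """True if box `b` lies to the right of box `a` and their vertical
    centers are within `y_tol` of a's height. Boxes are (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    a_center_y = (ay0 + ay1) / 2
    b_center_y = (by0 + by1) / 2
    a_height = ay1 - ay0
    return bx0 >= ax1 and abs(a_center_y - b_center_y) <= y_tol * a_height
```

A synthesized program would combine such spatial predicates with textual ones (e.g., on entity labels) to link each item to its price across unseen layouts.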
Preprint
Info
Artifacts Available
Sleuth: A Switchable Dual-Mode Fuzzer to Investigate Bug Impacts Following a Single PoC
Haolai Wei
![ORCID logo](images/orcid.svg)
, Liwei Chen
![ORCID logo](images/orcid.svg)
, Zhijie Zhang
![ORCID logo](images/orcid.svg)
, Gang Shi
![ORCID logo](images/orcid.svg)
, and Dan Meng
(Institute of Information Engineering at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
A proof of concept (PoC) is essential for pinpointing a bug within software. However, relying on it alone for the timely and complete repair of bugs is insufficient, as it underestimates the bug's impacts. The bug impact reflects that a bug may be triggered at multiple positions downstream of the root cause, resulting in different bug types (e.g., use-after-free, heap-buffer-overflow). Current techniques discover bug impacts using fuzzing with a specific coverage-guided strategy: assigning more energy to seeds that cover the buggy code regions. This method can utilize a single PoC to generate multiple PoCs that exhibit different bug impacts in a short time. Unfortunately, we observe that existing techniques are still unreliable, primarily due to their failure to balance the time between in-depth and breadth exploration: (i) in-depth exploration for bug impacts behind crash regions and (ii) breadth exploration for bug impacts alongside unreached regions. Current techniques only focus on one exploration or conduct the two explorations in separate stages, leading to low accuracy and efficiency. Considering the aforementioned problem, we propose Sleuth, an approach for automatically investigating bug impacts following a known single PoC to enhance bug fixing. We design Sleuth on two novel concepts: (i) a dual-mode exploration mechanism built on a fuzzer designed for efficient in-depth and breadth exploration, and (ii) a dynamic switchable strategy connected with the dual-mode exploration that facilitates the reliability of bug impact investigation. We evaluate Sleuth using 50 known CVEs, and the experimental results show that Sleuth can efficiently discover new bug impacts for 86% of the CVEs and find 1.5x more bug impacts than state-of-the-art tools. Furthermore, Sleuth successfully identifies 13 incomplete fixes using the generated new PoCs.
Article Search
Artifacts Available
SQLess: Dialect-Agnostic SQL Query Simplification
Li Lin
![ORCID logo](images/orcid.svg)
, Zongyin Hao
![ORCID logo](images/orcid.svg)
,
Chengpeng Wang ![ORCID logo](images/orcid.svg)
, Zhuangda Wang
![ORCID logo](images/orcid.svg)
, Rongxin Wu
![ORCID logo](images/orcid.svg)
, and Gang Fan
(Xiamen University, China; Hong Kong University of Science and Technology, China; Ant Group, China)
Database Management Systems (DBMSs) are fundamental to numerous enterprise applications. Due to the significance of DBMSs, various testing techniques have been proposed to detect DBMS bugs. However, to trigger deep bugs, most existing techniques focus on generating lengthy and complex queries, which burdens developers with the difficulty of debugging. Therefore, SQL query simplification, which aims to reduce lengthy SQL queries without compromising their ability to detect bugs, is in high demand.
To bridge this gap, we introduce SQLess, an innovative approach that employs a dialect-agnostic method for efficient and semantically correct SQL query simplification tailored to various DBMSs. Unlike previous works that depend on DBMS-specific grammars, SQLess utilizes an adaptive parser, which leverages error recovery and grammar expansion to support DBMS dialects. Moreover, SQLess performs semantics-sensitive SQL query trimming, which leverages alias and dependency analysis to simplify SQL queries while preserving their bug-triggering capability.
We evaluate SQLess using two datasets from state-of-the-art database bug detection studies, encompassing six widely used DBMSs and over 32,000 complex SQL queries. The results demonstrate SQLess's superior performance: it achieves an average simplification rate of 72.45%, significantly outperforming the state-of-the-art approaches by 84.91%.
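The core idea of bug-preserving query reduction can be sketched as a greedy loop that drops clauses while an oracle still reports the bug. This is a minimal illustration, not SQLess's trimming algorithm: `still_triggers_bug` is a hypothetical stand-in for re-running the query on the DBMS under test, and real trimming must respect alias and dependency constraints.

```python
# Minimal sketch of oracle-guided SQL query trimming: greedily remove one
# clause at a time, keeping a removal only if the bug still triggers.

def trim_query(clauses, still_triggers_bug):
    """Return a smaller clause list that still triggers the bug."""
    reduced = list(clauses)
    changed = True
    while changed:
        changed = False
        for i in range(len(reduced)):
            candidate = reduced[:i] + reduced[i + 1:]
            if candidate and still_triggers_bug(candidate):
                reduced = candidate          # the clause was not needed
                changed = True
                break
    return reduced

# Toy oracle: the "bug" fires whenever the SELECT and the buggy JOIN survive.
def oracle(clauses):
    return "SELECT c1" in clauses and "JOIN t2 ON t1.id = t2.id" in clauses

query = ["SELECT c1", "JOIN t2 ON t1.id = t2.id",
         "WHERE c1 > 0", "GROUP BY c1", "ORDER BY c1"]
print(trim_query(query, oracle))  # → ['SELECT c1', 'JOIN t2 ON t1.id = t2.id']
```

The result is 1-minimal with respect to single-clause removal; each kept clause is necessary for the oracle to fire.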
Article Search
BRAFAR: Bidirectional Refactoring, Alignment, Fault Localization, and Repair for Programming Assignments
Linna Xie, Chongmin Li, Yu Pei, Tian Zhang, and Minxue Pan
(Nanjing University, China; Hong Kong Polytechnic University, China)
The problem of automated feedback generation for introductory programming assignments (IPAs) has attracted significant attention with the increasing demand for programming education. While existing approaches, like Refactory, that employ the "block-by-block" repair strategy have produced promising results, they suffer from two limitations. First, Refactory randomly applies refactoring and mutation operations to correct and buggy programs, respectively, to align their control-flow structures (CFSs), which, however, has a relatively low success rate and often complicates the original repairing tasks. Second, Refactory generates repairs for each basic block of the buggy program when its semantics differs from the counterpart in the correct program, which, however, ignores the different roles that basic blocks play in the programs and often produces unnecessary repairs. To overcome these limitations, we propose the Brafar approach to feedback generation for IPAs. The core innovation of Brafar lies in its novel bidirectional refactoring algorithm and coarse-to-fine fault localization. The former aligns the CFSs of buggy and correct programs by applying semantics-preserving refactoring operations to both programs in a guided manner, while the latter identifies basic blocks that truly need repairs based on the semantics of their enclosing statements and themselves. In our experimental evaluation on 1783 real-life incorrect student submissions from a publicly available dataset, Brafar significantly outperformed Refactory and Clara, generating correct repairs for more incorrect programs with smaller patch sizes in a shorter time.
Article Search
CREF: An LLM-Based Conversational Software Repair Framework for Programming Tutors
Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, and Shunfu Jin
(Yanshan University, China; University of Melbourne, Australia; University of Luxembourg, Luxembourg; Jisuanke, n.n.)
Article Search
Call Graph Soundness in Android Static Analysis
Jordan Samhi, René Just, Tegawendé F. Bissyandé, Michael D. Ernst, and Jacques Klein
(CISPA Helmholtz Center for Information Security, Germany; University of Washington, USA; University of Luxembourg, Luxembourg)
Static analysis is sound in theory, but an implementation may unsoundly fail to analyze all of a program's code. Any such omission is a serious threat to the validity of the tool's output. Our work is the first to measure the prevalence of these omissions. Previously, researchers and analysts did not know what is missed by static analysis, what sort of code is missed, or the reasons behind these omissions. To address this gap, we ran 13 static analysis tools and a dynamic analysis on 1000 Android apps. Any method observed by the dynamic analysis but absent from a static analysis is an unsoundness.
Our findings include the following. (1) Apps built around external frameworks challenge static analyzers. On average, the 13 static analysis tools failed to capture 61% of the dynamically-executed methods. (2) A high level of precision in call graph construction is a synonym for a high level of unsoundness. (3) No existing approach significantly improves static analysis soundness. This includes those specifically tailored for a given mechanism, such as DroidRA to address reflection. It also includes systematic approaches, such as EdgeMiner, which captures all callbacks in the Android framework systematically. (4) Modeling entry point methods challenges call graph construction, which jeopardizes soundness.
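The paper's unsoundness measure reduces to a set difference between dynamically observed methods and the static call graph. A minimal sketch (method names here are illustrative, not taken from the study):

```python
# Unsoundness measure: any method executed at runtime but missing from a
# static call graph counts as an unsoundness of the static analysis.

def unsound_methods(dynamic_methods, static_call_graph):
    """Methods observed dynamically but missed by the static analysis."""
    return set(dynamic_methods) - set(static_call_graph)

# Toy app: reflection and native callbacks are classic blind spots.
dynamic = {"onCreate", "onClick", "reflectiveHelper", "nativeCallback"}
static = {"onCreate", "onClick"}

missed = unsound_methods(dynamic, static)
rate = len(missed) / len(dynamic)   # fraction of executed methods missed
print(sorted(missed), rate)  # → ['nativeCallback', 'reflectiveHelper'] 0.5
```

In the study's terms, a rate of 0.61 would correspond to the reported average of 61% dynamically-executed methods missed.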
Article Search
Efficient DNN-Powered Software with Fair Sparse Models
Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, and Chao Shen
(Xi’an Jiaotong University, China; University of Massachusetts at Amherst, USA)
With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable examples is the Lottery Ticket Hypothesis (LTH), a prevailing model pruning approach. This paper demonstrates that the fairness issue of LTH-based pruning arises from both its subnetwork selection and training procedures, highlighting the inadequacy of existing remedies. To address this, we propose a novel pruning framework, Ballot, which employs a novel conflict-detection-based subnetwork selection to find accurate and fair subnetworks, coupled with a refined training process to attain a high-performance model, thereby improving the fairness of DNN-powered software. By means of this procedure, Ballot improves the fairness of pruning by 38.00%, 33.91%, 17.96%, and 35.82% compared to state-of-the-art baselines, namely Magnitude Pruning, Standard LTH, SafeCompress, and FairScratch respectively, based on our evaluation of five popular datasets and three widely used models. Our code is available at https://anonymous.4open.science/r/Ballot-506E.
Article Search
DeLink: Source File Information Recovery in Binaries
Zhe Lang, Zhengzi Xu, Xiaohui Chen, Shichao Lv, Zhanwei Song, Zhiqiang Shi, and Limin Sun
(Institute of Information Engineering at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Nanyang Technological University, Singapore; Imperial Global Singapore, Singapore; China Mobile Research Institute, China)
Program comprehension can help analysts understand the primary behavior of a binary and enhance the efficiency of reverse engineering analysis. The existing works focus on instruction translation and function name prediction. However, they are limited in understanding the entire program. The recovered source file information can offer insights into the primary behavior of a binary, serving as high-level program summaries. Nevertheless, the files recovered by the function clustering-based approach contain binary functions with discontinuous distributions, resulting in low accuracy. Additionally, there is no existing research related to predicting the names of these recovered files. To this end, we propose a framework for source file information recovery in binaries, DeLink. This framework first leverages a file structure recovery approach based on boundary location to recognize files within a binary. Then, it utilizes an encoder-decoder model to predict the names of these files. The experimental results show that our file structure recovery approach achieves an average improvement of 14% across six evaluation metrics and requires only an average time of 16.74 seconds, outperforming the state-of-the-art work in both recovery quality and efficiency. Additionally, our file name prediction model achieves 70.09% precision and 63.91% recall. Moreover, we demonstrate the effective application of DeLink in malware homology analysis.
Article Search
Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
Dingbang Wang, Yu Zhao, Sidong Feng, Zhaoxu Zhang, William G. J. Halfond, Chunyang Chen, Xiaoxia Sun, Jiangfan Shi, and Tingting Yu
(University of Connecticut, USA; University of Cincinnati, USA; Monash University, Australia; University of Southern California, USA; China Mobile (Suzhou) Software Technology, China; Zhejiang University, China)
Article Search
When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention
Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, and Zibin Zheng
(Sun Yat-sen University, China; Xi’an Jiaotong University, China; Chongqing University, China; Huawei Cloud Computing Technologies, n.n.)
Article Search
Dance of the ADS: Orchestrating Failures through Historically-Informed Scenario Fuzzing
Tong Wang, Taotao Gu, Huan Deng, Hu Li, Xiaohui Kuang, and Gang Zhao
(Academy of Military Sciences, China)
As autonomous driving systems (ADS) advance towards higher levels of autonomy, orchestrating their safety verification becomes increasingly intricate. This paper unveils ScenarioFuzz, a pioneering scenario-based fuzz testing methodology. Designed like a choreographer who understands the past performances, it uncovers vulnerabilities in ADS without the crutch of predefined scenarios. Leveraging map road networks, such as OPENDRIVE, we extract essential data to form a foundational scenario seed corpus. This corpus, enriched with pertinent information, provides the necessary boundaries for fuzz testing in the absence of starting scenarios. Our approach integrates specialized mutators and mutation techniques, combined with a graph neural network model, to predict and filter out high-risk scenario seeds, optimizing the fuzzing process using historical test data. Compared to other methods, our approach reduces the time cost by an average of 60.3%, while the number of error scenarios discovered per unit of time increases by 103%. Furthermore, we propose a self-supervised collision trajectory clustering method, which aids in identifying and summarizing 54 high-risk scenario categories prone to inducing ADS faults. Our experiments have successfully uncovered 58 bugs across six tested systems, emphasizing the critical safety concerns of ADS.
Preprint
Artifacts Available
Better Not Together: Staged Solving for Context-Free Language Reachability
Chenghang Shi, Haofeng Li, Jie Lu, and Lian Li
(Institute of Computing Technology at Chinese Academy of Sciences, China)
Context-free language reachability (CFL-reachability) is a fundamental formulation for program analysis with many applications. CFL-reachability analysis is computationally expensive, with a slightly subcubic time complexity in the number of nodes in the input graph.
This paper proposes staged solving: a new perspective on solving CFL-reachability. Our key observation is that the context-free grammar (CFG) of a CFL-based program analysis can be decomposed into (1) a smaller CFG, L, for matching parentheses, such as procedure calls/returns, field stores/loads, and (2) a regular grammar, R, capturing control/data flows. Instead of solving these two parts monolithically (as in standard algorithms), staged solving solves L-reachability and R-reachability in two distinct stages. In practice, L-reachability, though still context-free, involves only a small subset of edges, while R-reachability can be computed efficiently with close to quadratic complexity relative to the node size of the input graph. We implement our staged CFL-reachability solver, STG, and evaluate it using two clients: context-sensitive value-flow analysis and field-sensitive alias analysis. The empirical results demonstrate that STG achieves speedups of 861.59x and 4.1x for value-flow analysis and alias analysis on average, respectively, over the standard subcubic algorithm. Moreover, we also showcase that staged solving can help to significantly improve the performance of two state-of-the-art solvers, POCR and PEARL, by 74.82x (1.78x) and 37.66x (1.7x) for value-flow (alias) analysis, respectively.
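The two-stage decomposition can be illustrated on a toy Dyck-style grammar. This is a simplified illustration, not STG's algorithm: the "L" stage matches bracketed (e.g., call/return) edges into summary edges, and the "R" stage is then an ordinary BFS over normal edges plus those summaries. The graph and edge kinds are invented for the example.

```python
# Toy staged solver for a grammar S -> eps | S S | (k S )k | e,
# where "e" edges are normal flow edges and (k / )k are matched brackets.
from collections import defaultdict, deque

def staged_reachability(normal, opens, closes, source):
    # Stage L: derive summary edges u->v whenever an open edge u-(k->a and a
    # close edge b-)k->v of the same kind k wrap a known S-path a ~> b.
    summaries = set()

    def s_reach(a):
        # Reachability over normal edges plus current summary edges.
        seen, work = {a}, deque([a])
        while work:
            x = work.popleft()
            for y in normal[x] | {t for (s, t) in summaries if s == x}:
                if y not in seen:
                    seen.add(y)
                    work.append(y)
        return seen

    changed = True
    while changed:                      # iterate matching to a fixpoint
        changed = False
        for (u, a, k) in opens:
            for (b, v, k2) in closes:
                if k == k2 and b in s_reach(a) and (u, v) not in summaries:
                    summaries.add((u, v))
                    changed = True

    # Stage R: plain BFS over normal edges plus the matched summaries.
    return s_reach(source)

normal = defaultdict(set, {"a": {"b"}})   # a -e-> b
opens = [("s", "a", 1)]                   # s -(1-> a
closes = [("b", "t", 1)]                  # b -)1-> t
print(sorted(staged_reachability(normal, opens, closes, "s")))  # → ['s', 't']
```

Here `s` reaches `t` only through the matched bracket pair wrapping the `a -> b` path; the final BFS stage never re-examines bracket edges, which is the efficiency point the paper exploits at scale.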
Article Search
Artifacts Available
Understanding Misconfigurations in ROS: An Empirical Study and Current Approaches
Paulo Canelas, Bradley Schmerl, Alcides Fonseca, and Christopher S. Timperley
(Carnegie Mellon University, USA; LASIGE, Portugal; University of Lisbon, Portugal)
The Robot Operating System (ROS) is a popular framework and ecosystem that allows developers to build robot software systems from reusable, off-the-shelf components. Systems are often built by customizing and connecting components via configuration files. While reusable components theoretically allow rapid prototyping, ensuring proper configuration and connection is challenging, as evidenced by numerous questions on developer forums. Developers must abide by the often unchecked and unstated assumptions of individual components. Failure to do so can result in misconfigurations that are only discovered during field deployment, at which point errors may lead to unpredictable and dangerous behavior. Despite misconfigurations having been studied in the broader context of software engineering, robotics software (and ROS in particular) poses domain-specific challenges with potentially disastrous consequences. To understand and improve the reliability of ROS projects, it is critical to identify the types of misconfigurations faced by developers. To that end, we perform a study of ROS Answers, a Q&A platform, to identify and categorize misconfigurations that occur during ROS development. We then conduct a literature review to assess the coverage of these misconfigurations by existing detection techniques. In total, we find 12 high-level categories and 50 sub-categories of misconfigurations. Of these categories, 27 are not covered by existing techniques. To conclude, we discuss how to tackle those misconfigurations in future work.
Article Search
Tacoma: Enhanced Browser Fuzzing with Fine-Grained Semantic Alignment
Jiashui Wang, Peng Qian, Xilin Huang, Xinlei Ying, Yan Chen, Shouling Ji, Jianhai Chen, Jundong Xie, and Long Liu
(Zhejiang University, China; Ant Group, China)
Browsers are responsible for managing and interpreting the diverse data coming from the web. Despite the considerable efforts of developers, however, it is nearly impossible to completely eliminate potential vulnerabilities in such complicated software. While a family of fuzzing techniques has been proposed to detect flaws in web browsers, they still face the inherent challenge that their generated test inputs have low semantic correctness and poor diversity.
In this paper, we propose Tacoma, a novel fuzzing framework tailored for web browsers. Tacoma comprises three main modules: a semantic parser, a semantic aligner, and an input generator. By taking advantage of fine-grained semantic alignment techniques, Tacoma is capable of generating semantically correct test inputs, which significantly improves the probability that a fuzzer triggers deep browser states. In particular, by integrating a scope-aware strategy into input generation, Tacoma is able to deal with asynchronous code generation, thereby substantially increasing the diversity of the generated test inputs. We conduct extensive experiments to evaluate Tacoma on three production-level browsers, i.e., Chromium, Safari, and Firefox. Empirical results demonstrate that Tacoma outperforms state-of-the-art browser fuzzers in both code coverage and unique crash detection. So far, Tacoma has identified 32 previously unknown bugs, 10 of which have been assigned CVEs. It is worth noting that Tacoma unearthed two bugs in Chromium that had remained undetected for ten years.
Article Search
Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow
Jiachi Chen, Chong Chen, Jiang Hu, John Grundy, Yanlin Wang, Ting Chen, and Zibin Zheng
(Sun Yat-sen University, China; Monash University, Australia; University of Electronic Science and Technology of China, China)
Smart contract developers frequently seek solutions to developmental challenges on Q&A platforms such as Stack Overflow (SO). Although community responses often provide viable solutions, the embedded code snippets can also contain hidden vulnerabilities. Integrating such code directly into smart contracts may make them susceptible to malicious attacks. We conducted an online survey and received 74 responses from smart contract developers. The results of this survey indicate that the majority (86.4%) of participants do not sufficiently consider security when reusing SO code snippets. Despite the existence of various tools designed to detect vulnerabilities in smart contracts, these tools are typically developed for analyzing fully-completed smart contracts and thus are ineffective for analyzing typical code snippets as found on SO. We introduce SOChecker, the first tool designed to identify potential vulnerabilities in incomplete SO smart contract code snippets. SOChecker first leverages a fine-tuned Llama2 model for code completion, followed by the application of symbolic execution methods for vulnerability detection. Our experimental results, derived from a dataset comprising 897 code snippets collected from smart contract-related SO posts, demonstrate that SOChecker achieves an F1 score of 68.2%, greatly surpassing GPT-3.5 and GPT-4 (20.9% and 33.2% F1 Scores respectively). Our findings underscore the need to improve the security of code snippets from Q&A websites.
Article Search
How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation
Cen Zhang, Yaowen Zheng, Mingqiang Bai, Yeting Li, Wei Ma, Xiaofei Xie, Yuekang Li, Limin Sun, and Yang Liu
(Nanyang Technological University, Singapore; Institute of Information Engineering at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Singapore Management University, Singapore; University of New South Wales, Australia)
Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. An LLM-based (Large Language Model) approach for generating fuzz drivers is a promising area of research. Unlike traditional program analysis-based generators, this text-based approach is more generalized and capable of harnessing a variety of API usage information, resulting in code that is friendly to human readers. However, there is still a lack of understanding regarding the fundamental issues in this direction, such as its effectiveness and potential challenges.
To bridge this gap, we conducted the first in-depth study targeting the important issues of using LLMs to generate effective fuzz drivers. Our study features a curated dataset with 86 fuzz driver generation questions from 30 widely used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, at a cost of 0.85 billion tokens ($8,000+ in charged tokens). Additionally, we compared the LLM-generated drivers against those utilized in industry, conducting extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that:
1) While LLM-based fuzz driver generation is a promising direction, it still encounters several obstacles towards practical applications;
2) LLMs face difficulties in generating effective fuzz drivers for APIs with intricate specifics. Three featured design choices of prompt strategies can be beneficial: issuing repeat queries, querying with examples, and employing an iterative querying process;
3) While LLM-generated drivers can yield fuzzing outcomes that are on par with those used in the industry, there are substantial opportunities for enhancement, such as extending contained API usage, or integrating semantic oracles to facilitate logical bug detection.
Our insights have been implemented to improve the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
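The iterative-querying strategy highlighted in finding 2 can be sketched as a retry loop that feeds build/run errors back into the next prompt. This is a hypothetical illustration, not the study's harness: `ask_model` and `try_build_and_run` are stand-ins for a real LLM call and a real compile-and-fuzz step, and the toy API name `png_read` is invented.

```python
# Sketch of iterative querying for fuzz driver generation: ask for a driver,
# validate it, and feed any error message back into the next query.

def iterative_fuzz_driver(api_name, ask_model, try_build_and_run, budget=3):
    feedback = None
    for attempt in range(budget):
        driver = ask_model(api_name, feedback)   # prompt includes last error
        ok, feedback = try_build_and_run(driver)
        if ok:
            return driver, attempt + 1
    return None, budget                          # budget exhausted

# Toy stand-ins: the second attempt "fixes" a missing init call.
def ask_model(api, feedback):
    return f"init(); {api}(data)" if feedback else f"{api}(data)"

def try_build_and_run(driver):
    return ("init()" in driver, "error: call init() first")

driver, attempts = iterative_fuzz_driver("png_read", ask_model, try_build_and_run)
print(attempts, driver)  # → 2 init(); png_read(data)
```

The study's other beneficial strategies, repeat queries and querying with examples, fit the same loop by varying how the prompt is constructed.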
Preprint
Info
Evaluating Deep Neural Networks in Deployment: A Comparative Study (Replicability Study)
Eduard Pinconschi, Divya Gopinath, Rui Abreu, and Corina S. Păsăreanu
(Carnegie Mellon University, USA; KBR, USA; NASA Ames, USA; INESC-ID, Portugal; University of Porto, Portugal)
As deep neural networks (DNNs) are increasingly used in safety-critical applications, there is a growing concern for their reliability. Even highly trained, high-performant networks are not 100% accurate. However, it is very difficult to predict their behavior during deployment without ground truth. In this paper, we provide a comparative and replicability study on recent approaches that have been proposed to evaluate the reliability of DNNs in deployment. We find that it is hard to run and reproduce the results for these approaches on their replication packages and even more difficult to run them on artifacts other than their own. Further, it is difficult to compare the effectiveness of the approaches, due to the lack of clearly defined evaluation metrics. Our results indicate that more effort is needed in our research community to obtain sound techniques for evaluating the reliability of neural networks in safety-critical domains. To this end, we contribute an evaluation framework that incorporates the considered approaches and enables evaluation on common benchmarks, using common metrics.
Article Search
Towards Understanding the Bugs in Solidity Compiler
Haoyang Ma, Wuqi Zhang, Qingchao Shen, Yongqiang Tian, Junjie Chen, and Shing-Chi Cheung
(Hong Kong University of Science and Technology, China; Tianjin University, China)
The Solidity compiler plays a key role in enabling the development of smart contract applications on Ethereum by governing the syntax of a domain-specific language called Solidity and performing compilation and optimization of Solidity code. The correctness of the Solidity compiler is critical in fostering transparency, efficiency, and trust in industries reliant on smart contracts. However, like other software systems, the Solidity compiler is prone to bugs, which may produce incorrect bytecode on blockchain platforms, resulting in severe security concerns. As a domain-specific compiler for smart contracts, the Solidity compiler differs from other compilers in many perspectives, posing unique challenges to detecting its bugs. To understand the bugs in the Solidity compiler and benefit future research, in this paper we present the first systematic study on 533 Solidity compiler bugs. We carefully examined their characteristics (including symptoms, root causes, and distribution) and their triggering test cases. Our study leads to seven bug-revealing takeaways for the Solidity compiler. Moreover, to study the limitations of Solidity compiler fuzzers and bring our findings into practical scenarios, we evaluated three Solidity compiler fuzzers on our constructed benchmark. The results show that these fuzzers are inefficient at detecting Solidity compiler bugs. The inefficiency arises from their failure to consider interesting bug-inducing features, bug-related compilation flags, and test oracles.
Article Search
Prospector: Boosting Directed Greybox Fuzzing for Large-Scale Target Sets with Iterative Prioritization
Zhijie Zhang, Liwei Chen, Haolai Wei, Gang Shi, and Dan Meng
(Institute of Information Engineering at Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China)
Directed grey-box fuzzing (DGF) is an advanced technique in security testing, specifically designed to guide fuzzing tools toward predefined target sites within a software program. To improve scalability to multiple targets, recent DGF approaches prioritize seeds that are close to targets based on a more precise distance metric and dynamically discard well-explored targets, thus steering toward all targets simultaneously. However, not all targets hold equal importance, particularly when facing large-scale target sets. Therefore, current works that blindly track all targets divert computing resources from critical targets, thereby reducing the overall efficiency of triggering targets. In this paper, we present Prospector, a novel DGF approach that can handle large-scale target sets. Prospector employs an iterative process to focus on a select group of "focused targets". To dynamically maintain these targets, Prospector presents a more fine-grained strategy that considers the vulnerable patterns and test adequacy of targets. Subsequently, Prospector further sharpens its fuzzing toward the "focused targets" by refining strategies in explore-exploit scheduling, seed selection, and byte scheduling. We evaluate Prospector on 24 programs, setting all sanitizer labels as targets. The experimental results show that Prospector exposed bugs faster than AFL++, WindRanger, ParmeSan, and FishFuzz in 125, 141, 84, and 100 cases, respectively. Among 38 unique bugs in the program group with the largest target sets, Prospector reproduces 18 (47.37%) existing bugs faster than other fuzzers. Prospector also discovered 6 new bugs in 4 real-world programs, with 5 CVE IDs assigned.
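The idea of maintaining "focused targets" by weighing vulnerable patterns against test adequacy can be sketched with a toy scoring function (this is a hypothetical illustration, not Prospector's metric; the weights, decay formula, and target names are invented):

```python
# Sketch of iterative target prioritization: score each target by a
# vulnerability-pattern weight, decayed by how often fuzzing has already
# exercised it, and keep the top-k as this round's focused targets.

def pick_focused_targets(targets, k=2):
    """targets: {name: (pattern_weight, hit_count)} -> top-k focused names."""
    def score(item):
        name, (weight, hits) = item
        return weight / (1 + hits)       # well-explored targets decay
    ranked = sorted(targets.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:k]]

targets = {
    "memcpy_site":  (9.0, 1),    # risky pattern, barely explored
    "strcpy_site":  (8.0, 0),    # risky, untouched
    "logging_site": (1.0, 0),    # benign pattern
    "parser_site":  (9.0, 20),   # risky but already well explored
}
print(pick_focused_targets(targets))  # → ['strcpy_site', 'memcpy_site']
```

Re-running the selection each round as hit counts grow naturally rotates attention away from well-tested targets, which is the behavior the abstract describes.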
Article Search
Bugs in Pods: Understanding Bugs in Container Runtime Systems
Jiongchi Yu, Xiaofei Xie, Cen Zhang, Sen Chen, Yuekang Li, and Wenbo Shen
(Singapore Management University, Singapore; Nanyang Technological University, Singapore; Tianjin University, China; UNSW, Australia; Zhejiang University, China)
Article Search
Automated Data Binding Vulnerability Detection for Java Web Frameworks via Nested Property Graph
Xiaoyong Yan, Biao He, Wenbo Shen, Yu Ouyang, Kaihang Zhou, Xingjian Zhang, Xingyu Wang, Yukai Cao, and Rui Chang
(Zhejiang University, China; Ant Group, China)
Article Search
SelfPiCo: Self-Guided Partial Code Execution with LLMs
Zhipeng Xue, Zhipeng Gao, Shaohua Wang, Xing Hu, Xin Xia, and Shanping Li
(Zhejiang University, China; Central University of Finance and Economics, China)
Code executability plays a vital role in software debugging and testing (e.g., detecting runtime exceptions or assertion violations). However, code execution, especially partial or arbitrary code execution, is a non-trivial task due to missing definitions and complex third-party dependencies. To make partial code (such as code snippets posted on the web or code fragments deep inside complex software projects) executable, an existing study proposed a machine learning model that predicts the types of undefined elements and injects pre-defined dummy values into the execution. However, the performance of that tool is limited by its simplistic dummy values and its inability to continue learning. In this paper, we design and implement a novel framework, named SelfPiCo (Self-Guided Partial Code Executor), to dynamically guide partial code execution by incorporating an open-source LLM (i.e., Code Llama) within an interactive loop. In particular, SelfPiCo leverages few-shot in-context learning and chain-of-thought reasoning to elicit human knowledge and logical reasoning based on fine-tuning the Code Llama model. SelfPiCo continuously learns from code execution results and refines its predictions step by step. Our evaluations demonstrate that SelfPiCo can execute 72.7% and 83.3% of all lines in the open-source code and Stack Overflow snippets, outperforming the most recent state-of-the-art LExecutor by 37.9% and 33.5%, respectively. Moreover, SelfPiCo successfully detected 18 and 33 runtime type error issues by executing partial code from eight GitHub software projects and 43 Stack Overflow posts, demonstrating the practical usage and potential application of our framework in practice.
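The guided-execution loop can be sketched in a few lines: run the snippet, and on each `NameError` inject a predicted value for the undefined name, then retry. This is a minimal sketch, not SelfPiCo itself; where SelfPiCo queries Code Llama for the value, a fixed lookup table stands in for the model's prediction here.

```python
# Minimal sketch of guided partial-code execution: repeatedly exec() a
# snippet and patch each undefined name with a guessed dummy value.

def execute_partial(code, guess_value, max_steps=10):
    env = {}
    for _ in range(max_steps):
        try:
            exec(code, env)
            return env, None                     # snippet ran to completion
        except NameError as e:
            missing = str(e).split("'")[1]       # extract undefined identifier
            env[missing] = guess_value(missing)  # inject the predicted value
    return env, "gave up"

# Stand-in for the model: predict a type-appropriate dummy per name.
table = {"count": 0, "items": []}
snippet = "items.append(count + 1)\ntotal = count + len(items)"
env, err = execute_partial(snippet, lambda name: table[name])
print(err, env["total"])  # → None 1
```

A real implementation would also handle `AttributeError`/`ImportError`, guard against side effects of re-execution, and feed the runtime feedback back to the model, which is where SelfPiCo's interactive loop comes in.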
Article Search
CoSec: On-the-Fly Security Hardening of Code LLMs via Supervised Co-decoding
Dong Li, Meng Yan, Yaosheng Zhang, Zhongxin Liu, Chao Liu, Xiaohong Zhang, Ting Chen, and David Lo
(Chongqing University, China; Zhejiang University, China; University of Electronic Science and Technology of China, China; Singapore Management University, Singapore)
Large Language Models (LLMs) specialized in code have shown exceptional proficiency across various programming-related tasks, particularly code generation. Nonetheless, because they are pretrained on massive, uncritically filtered data, prior studies have shown that code LLMs are prone to generating code with potential vulnerabilities. Existing approaches to mitigating this risk involve crafting vulnerability-free data and subsequently retraining or fine-tuning the model. As the number of parameters exceeds a billion, the computational and data demands of these approaches become enormous. Moreover, an increasing number of code LLMs are distributed as services, where the internal representation is not accessible and the API is the only way to reach the LLM, making the prior mitigation strategies inapplicable. To cope with this, we propose CoSec, an on-the-fly security hardening method for code LLMs based on security model-guided co-decoding, which reduces the likelihood that code LLMs generate code containing vulnerabilities. Our key idea is to train a separate but much smaller security model to co-decode with a target code LLM. Since the trained security model has higher confidence in secure tokens, it guides the generation of the target base model toward more secure code. By adjusting the probability distributions of tokens during each step of the decoding process, our approach effectively influences generation tendencies without accessing the internal parameters of the target code LLM. We have conducted extensive experiments across various parameter scales in multiple code LLMs (i.e., CodeGen, StarCoder, and DeepSeek-Coder), and the results show that our approach is effective in security hardening. Specifically, our approach improves the average security ratio of six base models by 5.02%-37.14%, while maintaining the functional correctness of the target model.
CooTest: An Automated Testing Approach for V2X Communication Systems
An Guo, Xinyu Gao, Zhenyu Chen, Yuan Xiao, Jiakai Liu, Xiuting Ge, Weisong Sun, and Chunrong Fang
(Nanjing University, China)
Perceiving the complex driving environment precisely is crucial to the safe operation of autonomous vehicles. With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) collaboration has the potential to address the limitations of single-agent perception systems in sensing distant and occluded objects. However, despite spectacular progress, several communication challenges can undermine the effectiveness of multi-vehicle cooperative perception. The low interpretability of Deep Neural Networks (DNNs) and the high complexity of communication mechanisms make conventional testing techniques inapplicable to the cooperative perception of autonomous driving systems (ADS). Moreover, existing testing techniques, which depend on manual data collection and labeling, are time-consuming and prohibitively expensive.
In this paper, we design and implement CooTest, the first automated testing tool for the V2X-oriented cooperative perception module. CooTest devises a V2X-specific metamorphic relation and is equipped with communication and weather transformation operators that reflect the impact of various cooperative driving factors to produce transformed scenes. Furthermore, we adopt a V2X-oriented guidance strategy for the transformed-scene generation process to improve testing efficiency. We evaluate CooTest with multiple cooperative perception models with different fusion schemes to assess its performance on different tasks. The experimental results show that CooTest can effectively detect erroneous behaviors under various V2X-oriented driving conditions. The results also confirm that CooTest can improve detection average precision and reduce misleading cooperation errors through retraining with the generated scenes.
Interoperability in Deep Learning: A User Survey and Failure Analysis of ONNX Model Converters
Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, and James C. Davis
(Purdue University, USA; Loyola University Chicago, USA)
TeDA: A Testing Framework for Data Usage Auditing in Deep Learning Model Development
Xiangshan Gao, Jialuo Chen, Jingyi Wang, Jie Shi, Peng Cheng, and Jiming Chen
(Zhejiang University, China; Huawei Technology, China; Huawei International, Singapore; Hangzhou Dianzi University, China)
It is notoriously challenging to audit potential unauthorized data usage in the deep learning (DL) model development lifecycle, i.e., to judge whether certain private user data has been used to train or fine-tune a DL model without authorization. Yet, such data usage auditing is crucial to meeting the urgent requirements of trustworthy Artificial Intelligence (AI), such as data transparency, which are promoted and enforced in recent AI regulations and acts like the General Data Protection Regulation (GDPR) and the EU AI Act. In this work, we propose TeDA, a simple and flexible testing framework for auditing data usage in the DL model development process. Given a set of a user’s private data to protect (Dp), the intuition of TeDA is to apply membership inference (with good intention) to judge whether the model to audit (Ma) is likely to have been trained with Dp. Notably, to significantly expose such usage under membership inference, TeDA applies imperceptible perturbation directed by boundary search to generate a carefully crafted test suite Dt (which we call an ‘isotope’) based on Dp. With the test suite, TeDA then adopts membership inference combined with hypothesis testing to decide, with statistical guarantees, whether a user’s private data has been used to train Ma. We evaluated TeDA through extensive experiments on a range of data volumes across various model architectures for data-sensitive face recognition and medical diagnosis tasks. TeDA demonstrates high feasibility, effectiveness, and robustness under various adaptive strategies (e.g., pruning and distillation).
Enhancing Multi-agent System Testing with Diversity-Guided Exploration and Adaptive Critical State Exploitation
Xuyan Ma, Yawen Wang, Junjie Wang, Xiaofei Xie, Boyu Wu, Shoubin Li, Fanjiang Xu, and Qing Wang
(University of Chinese Academy of Sciences, China; Institute of Software at Chinese Academy of Sciences, China; Singapore Management University, Singapore)
Multi-agent systems (MASs) have achieved remarkable success in multi-robot control, intelligent transportation, multiplayer games, etc. Thorough testing of MASs is urgently needed to ensure their robustness in the face of constantly changing and unexpected scenarios. Existing methods mainly focus on single-agent system testing and cannot be directly applied to MAS testing due to the complexity of MASs. To the best of our knowledge, there are few studies on MAS testing. While several studies have focused on adversarial attacks on MASs, they primarily target failure detection from an attack perspective, i.e., discovering failure scenarios, while ignoring the diversity of scenarios. In this paper, to strike a balance between exploration (diversifying behaviors) and exploitation (detecting failures), we propose MASTest, an advanced testing framework for MASs with diversity-guided exploration and adaptive critical-state exploitation. It incorporates both individual diversity and team diversity, and designs an adaptive perturbation mechanism to perturb actions at critical states, so as to trigger more, and more diverse, failure scenarios of the system. We evaluate MASTest on two popular MAS simulation environments: Coop Navi and StarCraft II. Results show that the average distance between the resulting failure scenarios is increased by 29.55%-103.57% and 74.07%-370.00% in the two environments compared to the baselines. Also, the number of failure patterns found by MASTest is improved by 71.44%-300.00% and 50%-500.00% in the two experimental environments compared to the baselines.
Arfa: An Agile Regime-Based Floating-Point Optimization Approach for Rounding Errors
Jinchen Xu, Mengqi Cui, Fei Li, Zuoyan Zhang, Hongru Yang, Bei Zhou, and Jie Zhao
(Information Engineering University, China; Hunan University, China)
We introduce a floating-point (FP) error optimization approach called Arfa that partitions the domain D of an FP expression fe into regimes and rewrites fe in each regime where fe exhibits larger errors. First, Arfa seeks a rewrite substitution fo with lower errors across D, whose error distribution is plotted for effective regime inference. Next, Arfa generates an incomplete set of ordered rewrite candidates within each regime of interest, so that the search for the best rewrite substitutions can be performed efficiently. Finally, Arfa selects the best rewrite substitution by inspecting the errors of the top-ranked rewrite candidates, while also considering precision enhancement. Experiments on 56 FPBench examples and four real-life programs show that Arfa not only reduces the maximum and average errors of fe by 4.73 and 2.08 bits on average (and up to 33 and 16 bits), but also exhibits lower errors, sometimes to a significant degree, than Herbie and NumOpt.
NeuFair: Neural Network Fairness Repair with Dropout
Vishnu Asutosh Dasu, Ashish Kumar, Saeid Tizpaz-Niari, and Gang Tan
(Pennsylvania State University, USA; University of Texas at El Paso, USA)
This paper investigates neuron dropout as a post-processing bias mitigation method for deep neural networks (DNNs). Neural-driven software solutions are increasingly applied in socially critical domains with significant fairness implications. While DNNs are exceptional at learning statistical patterns from data, they may encode and amplify historical biases. Existing bias mitigation algorithms often require modifying the input dataset or the learning algorithm. We posit that prevalent dropout methods may be an effective and less intrusive approach to improving the fairness of pre-trained DNNs during inference. However, finding the ideal set of neurons to drop is a combinatorial problem.
We propose NeuFair, a family of post-processing randomized algorithms that mitigate unfairness in pre-trained DNNs via dropouts during inference. Our randomized search is guided by an objective to minimize discrimination while maintaining the model’s utility. We show that NeuFair is efficient and effective in improving fairness (up to 69%) with minimal or no model performance degradation. We provide intuitive explanations of these phenomena and carefully examine the influence of various hyperparameters of NeuFair on the results. Finally, we empirically and conceptually compare NeuFair to different state-of-the-art bias mitigators.
One Size Does Not Fit All: Multi-granularity Patch Generation for Better Automated Program Repair
Bo Lin, Shangwen Wang, Ming Wen, Liqian Chen, and Xiaoguang Mao
(National University of Defense Technology, China; Huazhong University of Science and Technology, China)
Automated program repair (APR) aims to automate bug correction and alleviate the burden of manual debugging, which plays a crucial role in software development and maintenance. Recent studies reveal that learning-based approaches have outperformed conventional APR techniques (e.g., search-based APR). Existing learning-based APR techniques mainly center on treating program repair either as a translation task or as a cloze task. Per our observations, the former primarily emphasizes statement-level repair, while the latter concentrates on token-level repair. In practice, however, patches may manifest at various repair granularities, including the statement, expression, and token levels. Consequently, merely generating patches at a single granularity is ineffective for tackling real-world defects. Motivated by this observation, we propose Mulpor, a multi-granularity patch generation approach designed to address the diverse nature of real-world bugs. Mulpor comprises three components (statement-level, expression-level, and token-level generators), each pre-trained to generate correct patches at its respective granularity. The approach generates candidate patches at various granularities, followed by a heuristic-based re-ranking process to prioritize patches. Experimental results on the Defects4J dataset demonstrate that Mulpor correctly repairs 92 bugs on Defects4J-v1.2, a 27.0% (20 bugs) and 12.2% (10 bugs) improvement over the previous state-of-the-art NMT-style Rap-Gen and cloze-style GAMMA, respectively. We also studied the generalizability of Mulpor in repairing vulnerabilities, revealing a notable 51% increase in the number of correctly fixed patches compared with state-of-the-art vulnerability repair approaches. This paper underscores the importance of considering multiple granularities in program repair techniques as part of a comprehensive strategy for addressing the diverse nature of real-world software defects. Mulpor exhibits promising results in achieving effective and diverse bug fixes across various program repair scenarios.
Policy Testing with MDPFuzz (Replicability Study)
Quentin Mazouni, Helge Spieker, Arnaud Gotlieb, and Mathieu Acher
(Simula Research Laboratory, Norway; University of Rennes - Inria - CNRS - IRISA, France)
In recent years, following tremendous achievements in Reinforcement Learning, a great deal of interest has been devoted to ML models for sequential decision-making. Together with these scientific breakthroughs, research has been conducted to develop automated functional testing methods for finding faults in black-box Markov decision processes. In 2022, Pang et al. presented a black-box fuzz testing framework called MDPFuzz. The method consists of a fuzzer whose main feature is to use Gaussian Mixture Models (GMMs) to compute the coverage of test inputs as the likelihood of having already observed their results. This guidance through coverage evaluation aims at favoring novelty during testing and fault discovery in the decision model.
Pang et al. evaluated their work on four use cases by comparing the number of failures found after twelve-hour testing campaigns with and without the guidance of the GMMs (an ablation study). In this paper, we verify some of the key findings of the original paper and explore the limits of MDPFuzz through reproduction and replication. We re-implemented the proposed methodology and evaluated our replication in a large-scale study that extends the original four use cases with three new ones. Furthermore, we compare MDPFuzz and its ablated counterpart with a random testing baseline. We also assess the effectiveness of coverage guidance for different parameters, something that was not done in the original evaluation. Despite this parameter analysis, and contrary to Pang et al.'s original conclusions, we find that in most cases the aforementioned ablated fuzzer outperforms MDPFuzz, and conclude that the proposed coverage model does not lead to finding more faults.
Artifacts Available
AutoCodeRover: Autonomous Program Improvement
Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, and Abhik Roychoudhury
(National University of Singapore, Singapore)
Researchers have made significant progress in automating the software development process in the past decades. Automated techniques for issue summarization, bug reproduction, fault localization, and program repair have been built to ease the workload of developers. Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding. Nevertheless, software engineering involves program improvement apart from coding, specifically software maintenance (e.g., program repair to fix bugs) and software evolution (e.g., feature additions). In this paper, we propose an automated approach for solving GitHub issues to autonomously achieve program improvement. In our approach, called AutoCodeRover, LLMs are combined with sophisticated code search capabilities, ultimately leading to a program modification or patch. In contrast to recent LLM agent approaches from AI researchers and practitioners, our outlook is more software engineering oriented. We work on a program representation (the abstract syntax tree) as opposed to viewing a software project as a mere collection of files. Our code search exploits the program structure in the form of classes and methods to enhance the LLM’s understanding of the issue’s root cause and effectively retrieve a context via iterative search. The use of spectrum-based fault localization with tests further sharpens the context, as long as a test suite is available. Experiments on the recently proposed SWE-bench-lite (300 real-life GitHub issues) show increased efficacy in solving GitHub issues (19% on SWE-bench-lite), which is higher than the efficacy recently reported for SWE-agent. Interestingly, our approach resolved 57 GitHub issues in about 4 minutes each (pass@1), whereas developers spent more than 2.68 days on average. In addition, AutoCodeRover achieved this efficacy at significantly lower cost (on average, $0.43 USD) compared to other baselines. We posit that our workflow enables autonomous software engineering, where, in the future, auto-generated code from LLMs can be autonomously improved.
Practitioners’ Expectations on Automated Test Generation
Xiao Yu, Lei Liu, Xing Hu, Jacky Keung, Xin Xia, and David Lo
(Huawei, China; Xi’an Jiaotong University, China; Zhejiang University, China; City University of Hong Kong, China; Singapore Management University, Singapore)
Empirical Study of Move Smart Contract Security: Introducing MoveScan for Enhanced Analysis
Shuwei Song, Jiachi Chen, Ting Chen, Xiapu Luo, Teng Li, Wenwu Yang, Leqing Wang, Weijie Zhang, Feng Luo, Zheyuan He, Yi Lu, and Pan Li
(University of Electronic Science and Technology of China, China; Sun Yat-sen University, China; Hong Kong Polytechnic University, China; Jiangsu University of Science and Technology, China; BitsLab, Singapore; MoveBit, China)
Testing Gremlin-Based Graph Database Systems via Query Disassembling
Yingying Zheng, Wensheng Dou, Lei Tang, Ziyu Cui, Yu Gao, Jiansen Song, Liang Xu, Jiaxin Zhu, Wei Wang, Jun Wei, Hua Zhong, and Tao Huang
(Institute of Software at Chinese Academy of Sciences, China; Jinling Institute of Technology, China)
Logos: Log Guided Fuzzing for Protocol Implementations
Feifan Wu, Zhengxiong Luo, Yanyang Zhao, Qingpeng Du, Junze Yu, Ruikang Peng, Heyuan Shi, and Yu Jiang
(Tsinghua University, China; Beijing University of Posts and Telecommunications, China; Central South University, China)
Network protocols are extensively used in a variety of network devices, making the security of their implementations crucial. Protocol fuzzing has shown promise in uncovering vulnerabilities in these implementations. However, traditional methods often require instrumentation of the target implementation to provide guidance, which is intrusive, adds overhead, and can hinder black-box testing. This paper presents Logos, a protocol fuzzer that utilizes non-intrusive runtime log information for fuzzing guidance. Logos first standardizes the unstructured logs and embeds them into a high-dimensional vector space for semantic representation. Then, Logos filters the semantic representations and dynamically maintains a semantic coverage to chart the explored space for customized guidance. We evaluate Logos on eight widely used implementations of well-known protocols. Results show that, compared to existing intrusive or expert knowledge-driven protocol fuzzers, Logos achieves 26.75%-106.19% higher branch coverage within 24 hours. Furthermore, Logos exposed 12 security-critical vulnerabilities in these prominent protocol implementations, with 9 CVEs assigned.
Tool Demonstrations
SeeWasm: An Efficient and Fully-Functional Symbolic Execution Engine for WebAssembly Binaries
Ningyu He, Zhehao Zhao, Hanqin Guan, Jikai Wang, Shuo Peng, Ding Li, Haoyu Wang, Xiangqun Chen, and Yao Guo
(Peking University, China; Huazhong University of Science and Technology, China)
Doctoral Symposium