Powered by
14th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2025),
June 16, 2025,
Seoul, Republic of Korea
14th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2025)
Frontmatter
SOAP 2024 Organization
The 14th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP’25) is co-located with the 46th ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI’25). In line with past workshops, SOAP’25 aims to bring together members of the program analysis community to share new developments and shape innovations in program analysis.
Papers
Beyond Affine Loops: A Geometric Approach to Program Synthesis
Erdenebayar Bayarmagnai,
Fatemeh Mohammadi, and
Rémi Prébet
(KU Leuven, Belgium; Inria, France)
Ensuring software correctness remains a fundamental challenge in formal program verification. One promising approach relies on finding polynomial invariants for loops. Polynomial invariants are properties of a program loop that hold before and after each iteration. Generating polynomial invariants is a crucial task for loops, but it is an undecidable problem in the general case. Recently, an alternative approach to this problem has emerged, focusing on synthesizing loops from invariants. However, existing methods only synthesize affine loops without guard conditions from polynomial invariants. In this paper, we address a more general problem, allowing loops to have polynomial update maps with a given structure, inequations in the guard condition, and polynomial invariants of arbitrary form.
In this paper, we use algebraic geometry tools to design and implement an algorithm that computes a finite set of polynomial equations whose solutions correspond to all loops satisfying the given polynomial invariants. In other words, we reduce the problem of synthesizing loops to finding solutions of polynomial systems within a specified subset of the complex numbers. The latter is handled in our software using an SMT solver.
Optimizing Type Migration for LLM-Based C-to-Rust Translation: A Data Flow Graph Approach
Qingxiao Xu and
Jeff Huang
(Texas A&M University, USA)
Translating legacy C codebases to Rust is crucial for improving memory safety, yet automating this process remains challenging for real-world projects. In this paper, we document key technical challenges we encountered in manually translating the Gzip GNU package using state-of-the-art LLMs (e.g., the OpenAI o1 model). To address the incompatibility between C and Rust type systems, we propose an optimized method aimed at providing the appropriate context and prompts to ensure correct variable types at declaration stage. We extract essential code semantics by generating data flow graphs (DFG). For global variables that may induce multiple borrowing, we prompt LLM to generate enums, and for pointer variables (scalar pointers, array pointers and pointer arithmetic), we prompt LLM to map them to correct Rust types. By leveraging DFG context information and prompts with clear instructions, our approach enables o1 to achieve more accurate type translation compared to direct translation. Our results are publicly available and demonstrate the effectiveness of our approach in improving type migration.
Compositional Static Callgraph Reachability Analysis for WhatsApp Android App Health
Ákos Hajdu,
Roman Lee,
Gavin Weng,
Nilesh Agrawal, and
Jérémy Dubreil
(Meta, UK; Meta, Canada; Meta, USA; Unaffiliated, France)
We report on an industrial use case of static callgraph reachability analysis to improve WhatsApp Android app health.
We collaborated with engineers dedicated to app health to annotate/specify the source code.
We leveraged the Infer static analyzer to prevent regressions during code changes and to periodically find pre-existing issues on the latest revision.
Within three months, the analysis prevented almost a hundred regressions from being introduced and resulted in fixes for a handful of pre-existing issues, including examples with end-user measurable impact.
Towards Bit-Level Dominance Preserving Quantization of Neural Classifiers
Dorra Ben Khalifa and
Matthieu Martel
(ENAC - University of Toulouse, France; University of Peprignan, France; Numalis, France)
Quantization consists of replacing the original data types used to represent the weights of neural networks with less resource-intensive data types. While considerable research has focused on quantization, most existing methods that offer theoretical guarantees do so by providing error bounds on the difference between the original and reduced precision models.
In this article, we introduce a new quantization technique that, rather than focusing on bounding errors, determines the minimum precision necessary to preserve class dominance, independent of any specific set of numerical formats.
In other words, regardless of the exact scores for each class, our method guarantees that the class predicted by the original network remains unchanged after quantization. Our method is static and the proposed quantization holds for all the inputs.
Technically, we leverage existing theorems that provide error bounds for dot products and formulate an optimization problem whose solution yields the required reduced precision. We also present experimental results to validate the effectiveness of our method.
Universal High-Performance CFL-Reachability via Matrix Multiplication
Ilia Muravev and
Semyon Grigorev
(Saint-Petersburg State University, Russia)
Context-free language (CFL) reachability is a fundamental computational framework for formulating key static analyses (e.g., alias analysis, value-flow analysis, and points-to analysis) as well as some other graph analysis problems. Achieving high performance in universal CFL-reachability solvers remains a significant challenge. Specialized tools such as Pearl and Gigascale are optimized for specific CFLs but lack general applicability, whereas existing universal CFL-reachability solvers often do not scale well in important cases. In particular, prior efforts to leverage high-performance linear algebra operations in universal CFL-reachability solvers produced a matrix-based solver, MatrixCFPQ, that excels at performing common navigational queries on RDF graphs (which are unrelated to program analysis) but is inefficient when it comes to modeling static analyses.
In this work, we introduce FastMatrixCFPQ, a universal matrix-based CFL-reachability solver that overcomes the limitations of MatrixCFPQ by leveraging the properties of the CFL-semiring, common patterns in context-free grammars, and the features of the SuiteSparse:GraphBLAS sparse linear algebra library. Our experimental results demonstrate that FastMatrixCFPQ outperforms the state-of-the-art universal CFL-reachability solvers across five client analyses---often by orders of magnitude---and, in many cases, even surpasses the speed of specialized solvers designed for specific CFLs.
Published Artifact
Artifacts Available
Scalable Language Agnostic Taint Tracking using Explicit Data Dependencies
Sedick David Baker Effendi,
Xavier Pinho,
Andrei Michael Dreyer, and
Fabian Yamaguchi
(Stellenbosch University, South Africa; StackGen, USA; Whirly Labs, South Africa)
Taint analysis using explicit whole-program data-dependence graphs is powerful for vulnerability discovery but faces two major challenges. First, accurately modeling taint propagation through calls to external library procedures requires extensive manual annotations, which becomes impractical for large ecosystems. Second, the sheer size of whole-program graph representations leads to serious scalability and performance issues, particularly when quick analysis is needed in continuous development pipelines.
This paper presents the design and implementation of a system for a language-agnostic data-dependence representation. The system accommodates missing annotations describing the behavior of library procedures by over-approximating data flows, allowing annotations to be added later without recalculation. We contribute this data-flow analysis system to the open-source code analysis platform Joern, making it available to the community.
Pick Your Call Graphs Well: On Scaling IFDS-Based Data-Flow Analyses
Kadiray Karakaya,
Palaniappan Muthuraman, and
Eric Bodden
(Heinz Nixdorf Institute at Paderborn University, Germany; Fraunhofer IEM, Germany)
Recent works on scaling IFDS-based analyses propose sophisticated techniques ranging from sparsification and disk-assisted computing to intelligent garbage collection. Yet, they choose a fixed call graph, thereby disregarding its implications on scalability.
This work presents an empirical evaluation of call graph precision's impact on the precision and scalability of the IFDS framework. To this end, we build QCG, a call graph generation tool for Android that extends the Qilin pointer analysis framework, and integrate it with FlowDroid, a state-of-the-art IFDS-based taint analysis solver. We assess the precision of 27 call graphs built with QCG and 4 default call graphs in FlowDroid, on the TaintBench benchmark of Android malware. We then evaluate how increasing the call-graph precision impacts FlowDroid's runtime performance and memory consumption on real-world apps.
We report that the time invested in building precise context-sensitive call graphs pays off: They significantly reduce IFDS analyses' runtimes while also improving their precision. However, there appears to be a sweet spot in the trade-off between the call graph construction time and the reduction in total analysis runtime.
proc time: 4.3