Workshop SOAP 2026 – Author Index |
Contents -
Abstracts -
Authors
|
| Baradaran, Sara |
Sara Baradaran, Amirmohammad Nazari, and Mukund Raghothaman (University of Southern California, USA) Program analysis tools such as CodeQL enable programmers to express their questions about codebases in the form of declarative queries, which are then evaluated over structured representations of the code. These versatile tools have broad applications in bug finding, vulnerability discovery, and codebase exploration. Still, they are limited in their ability to answer questions that rely on semantic judgments which cannot be expressed or decided using program analysis tools alone, e.g., identifying string literals that contain private information or violations of naming conventions. In this paper, we present SemQL, a system which extends declarative program analysis frameworks with the ability to invoke an LLM as an external oracle. SemQL allows developers to write queries which combine structural reasoning with semantic (extra-analytic) judgments. We show the real-world value of such a system by collecting a set of analytic questions that require semantic reasoning beyond what is deducible simply from the structure of the code. We also present an algorithm which efficiently evaluates these queries while minimizing costly oracle invocations, and demonstrate its effectiveness in practical program analysis tasks. |
|
| Bhatia, Rishipal Singh |
Rishipal Singh Bhatia (Google, USA) As the ECMAScript specification evolves, industrial-scale JavaScript compilers face the challenge of supporting modern language syntax while maintaining compatibility for diverse execution environments. Traditionally, compilers solve this by running transpilation passes in a monolithic pipeline, where the transpilation passes are chosen to execute strictly based on a target language level. This results in significant computational waste, as compilers perform expensive Abstract Syntax Tree (AST) traversals to lower language features that may not exist in the actual input source code. We present a compiler improvement that conditionally executes transpiler passes based on accurately tracking and dynamically maintaining the exact set of language features present in the compilation unit throughout the transpilation process. It is implemented in the production open-sourced Google Closure Compiler. We also detail the architectural safeguards - including strategic pass ordering and dynamic validation of the transpiled code for feature-correctness. Evaluation on large-scale production applications and synthetic benchmarks produced a considerable reduction in compilation time and saved compute and memory usage. |
|
| Bugayenko, Yegor |
Yegor Bugayenko, Maxim Trunnikov, and Vladimir Zakharov (Huawei, China; Innopolis University, Russian Federation) The Java Stream API, introduced in Java~8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures. |
|
| Cela, Sopot |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Chang, Bor-Yuh Evan |
Kunha Kim, Nicholas V. Lewchenko, Bor-Yuh Evan Chang, and Gowtham Kaki (University of Colorado Boulder, USA; Amazon, USA) SMT-based automated verification is a promising approach for ensuring software correctness, but it often suffers from unpredictability and proof instability due to the undecidability of the underlying logic. While tools like Liquid Haskell or F* provide automation, they either restrict logical expressivity to ensure decidability or sacrifice decidability and predictability for richer logic. In this paper, we present Ravencheck, a verification tool for Rust programs. Ravencheck occupies a unique design space: it supports specifications within an expressive fragment of Higher-Order Logic (HOL) while guaranteeing decidable verification outcomes. To achieve this, Ravencheck automatically encodes high-level Rust programs into the Extended Effectively Propositional (EEPR) fragment of first-order logic with relational abstraction. Ravencheck builds on prior EEPR-based modeling languages (e.g., Ivy) by integrating with the standard Rust toolchain. This integration supports a one-language workflow in which verification conditions are expressed using the same purely functional Rust syntax as the verified executable code, rather than introducing a separate modeling language. |
|
| Churchill, Dulma |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Dinella, Elizabeth |
Elizabeth Dinella (Bryn Mawr College, USA) This paper explores the effectiveness of modular randomized testing for object oriented programs in Java. Modular testing involves testing individual components of a program in isolation. Often times, for effective test generation, a series of non-target setup calls must be included to obtain high coverage of the target component. In this work, we evaluate and improve modular testing with the EvoSuite test generator. We find that due to strict restrictions that disallow calls to non-target setup methods, EvoSuite’s modular testing mode is ineffective and often results in low branch coverage. We propose emote (Effective Modular Testing with EvoSuite): an enhancement to EvoSuite that relaxes this restriction, allowing non-target methods to be included in the test prefixes. This modification draws inspiration from developer-written fuzz drivers, which often invoke setup methods to properly initialize the state before testing the target method. To ensure meaningful test generation, we modify EvoSuite’s fitness function to focus branch coverage contributions on the call chain originating from the target method. emote is evaluated on a subset of the SF100 benchmark, showing a 15.15% improvement in coverage of the target methods. |
|
| Hajdu, Ákos |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Jelvani, Alborz |
Alborz Jelvani, Richard P. Martin, and Santosh Nagarakatte (Rutgers University, USA) Bounded translation validation (TV) is a common method used to check for the equivalence of two programs by unrolling all loops according to a user-supplied global bound. Unfortunately, large unroll bounds with multiple loops can increase the state space and the complexity of the TV. This paper proposes a systematic strategy for applying local unroll bounds for bounded TV of program pairs with loops. Our technique increases the coverage of the bounded TV in cases where the TV would otherwise timeout with the existing global unroll bounds. We introduce a lattice-based formulation of locally bounded program pairs and demonstrate that random sampling of local unroll bounds from this lattice admits a search-and-prune strategy that is effective for verification. We have modified the Alive2 bounded TV to support local unroll bounds and evaluated our approach on benchmarks such as TSVC. The results show an increase in verification coverage with our lattice search-and-prune strategy combined with local unroll bounds. |
|
| Kaki, Gowtham |
Kunha Kim, Nicholas V. Lewchenko, Bor-Yuh Evan Chang, and Gowtham Kaki (University of Colorado Boulder, USA; Amazon, USA) SMT-based automated verification is a promising approach for ensuring software correctness, but it often suffers from unpredictability and proof instability due to the undecidability of the underlying logic. While tools like Liquid Haskell or F* provide automation, they either restrict logical expressivity to ensure decidability or sacrifice decidability and predictability for richer logic. In this paper, we present Ravencheck, a verification tool for Rust programs. Ravencheck occupies a unique design space: it supports specifications within an expressive fragment of Higher-Order Logic (HOL) while guaranteeing decidable verification outcomes. To achieve this, Ravencheck automatically encodes high-level Rust programs into the Extended Effectively Propositional (EEPR) fragment of first-order logic with relational abstraction. Ravencheck builds on prior EEPR-based modeling languages (e.g., Ivy) by integrating with the standard Rust toolchain. This integration supports a one-language workflow in which verification conditions are expressed using the same purely functional Rust syntax as the verified executable code, rather than introducing a separate modeling language. |
|
| Kim, Kunha |
Kunha Kim, Nicholas V. Lewchenko, Bor-Yuh Evan Chang, and Gowtham Kaki (University of Colorado Boulder, USA; Amazon, USA) SMT-based automated verification is a promising approach for ensuring software correctness, but it often suffers from unpredictability and proof instability due to the undecidability of the underlying logic. While tools like Liquid Haskell or F* provide automation, they either restrict logical expressivity to ensure decidability or sacrifice decidability and predictability for richer logic. In this paper, we present Ravencheck, a verification tool for Rust programs. Ravencheck occupies a unique design space: it supports specifications within an expressive fragment of Higher-Order Logic (HOL) while guaranteeing decidable verification outcomes. To achieve this, Ravencheck automatically encodes high-level Rust programs into the Extended Effectively Propositional (EEPR) fragment of first-order logic with relational abstraction. Ravencheck builds on prior EEPR-based modeling languages (e.g., Ivy) by integrating with the standard Rust toolchain. This integration supports a one-language workflow in which verification conditions are expressed using the same purely functional Rust syntax as the verified executable code, rather than introducing a separate modeling language. |
|
| Kupriianets, Artem |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Lewchenko, Nicholas V. |
Kunha Kim, Nicholas V. Lewchenko, Bor-Yuh Evan Chang, and Gowtham Kaki (University of Colorado Boulder, USA; Amazon, USA) SMT-based automated verification is a promising approach for ensuring software correctness, but it often suffers from unpredictability and proof instability due to the undecidability of the underlying logic. While tools like Liquid Haskell or F* provide automation, they either restrict logical expressivity to ensure decidability or sacrifice decidability and predictability for richer logic. In this paper, we present Ravencheck, a verification tool for Rust programs. Ravencheck occupies a unique design space: it supports specifications within an expressive fragment of Higher-Order Logic (HOL) while guaranteeing decidable verification outcomes. To achieve this, Ravencheck automatically encodes high-level Rust programs into the Extended Effectively Propositional (EEPR) fragment of first-order logic with relational abstraction. Ravencheck builds on prior EEPR-based modeling languages (e.g., Ivy) by integrating with the standard Rust toolchain. This integration supports a one-language workflow in which verification conditions are expressed using the same purely functional Rust syntax as the verified executable code, rather than introducing a separate modeling language. |
|
| Liu, Bozhen |
Sanjib Kumar Sen and Bozhen Liu (Texas A&M University - Corpus Christi, USA) LLM-Powered Apps (LPAs) are increasingly being deployed on shared cloud servers where multiple users interact at the same time. This creates data leak risks where one user's private data can flow into another user's session through shared caches, misconfigured access control policies, or concurrency bugs in multi-threaded and event-driven code. Existing taint analysis tools do not handle these risks well because they need manual source-sink marking, only track control and dataflow, and do not account for access control policies, complex app structures, or runtime environments. In this paper, we propose Access Flow Guard (AFG), a framework that automatically identifies taint sources based on user identity and permission level to detect cross-user data leaks in LPAs. Instead of manually marking sources and sinks as existing taint approaches require, AFG introduces a Multi-User, Multi-Permission (MUMP) setup. MUMP automatically identifies user input entry points by matching function signatures against patterns derived from our dataset of popular open-source LPAs. Each identified input is tagged with the user's identity and permission level. AFG then applies Scoped Taint Pointer Analysis (STPA), building on existing pointer-based taint analysis techniques, to propagate user-tagged objects through a Pointer Assignment Graph. STPA detects where different users' data scopes overlap. We discuss key research challenges, including treating LLM components as black boxes and refining static results through dynamic testing. This approach aims to serve as a basis for systematic security testing of LLM-integrated software in shared multi-user environments. |
|
| Marescotti, Matteo |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Martin, Richard P. |
Alborz Jelvani, Richard P. Martin, and Santosh Nagarakatte (Rutgers University, USA) Bounded translation validation (TV) is a common method used to check for the equivalence of two programs by unrolling all loops according to a user-supplied global bound. Unfortunately, large unroll bounds with multiple loops can increase the state space and the complexity of the TV. This paper proposes a systematic strategy for applying local unroll bounds for bounded TV of program pairs with loops. Our technique increases the coverage of the bounded TV in cases where the TV would otherwise timeout with the existing global unroll bounds. We introduce a lattice-based formulation of locally bounded program pairs and demonstrate that random sampling of local unroll bounds from this lattice admits a search-and-prune strategy that is effective for verification. We have modified the Alive2 bounded TV to support local unroll bounds and evaluated our approach on benchmarks such as TSVC. The results show an increase in verification coverage with our lattice search-and-prune strategy combined with local unroll bounds. |
|
| Mendez, Jorge |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Nagarakatte, Santosh |
Alborz Jelvani, Richard P. Martin, and Santosh Nagarakatte (Rutgers University, USA) Bounded translation validation (TV) is a common method used to check for the equivalence of two programs by unrolling all loops according to a user-supplied global bound. Unfortunately, large unroll bounds with multiple loops can increase the state space and the complexity of the TV. This paper proposes a systematic strategy for applying local unroll bounds for bounded TV of program pairs with loops. Our technique increases the coverage of the bounded TV in cases where the TV would otherwise timeout with the existing global unroll bounds. We introduce a lattice-based formulation of locally bounded program pairs and demonstrate that random sampling of local unroll bounds from this lattice admits a search-and-prune strategy that is effective for verification. We have modified the Alive2 bounded TV to support local unroll bounds and evaluated our approach on benchmarks such as TSVC. The results show an increase in verification coverage with our lattice search-and-prune strategy combined with local unroll bounds. |
|
| Nazari, Amirmohammad |
Sara Baradaran, Amirmohammad Nazari, and Mukund Raghothaman (University of Southern California, USA) Program analysis tools such as CodeQL enable programmers to express their questions about codebases in the form of declarative queries, which are then evaluated over structured representations of the code. These versatile tools have broad applications in bug finding, vulnerability discovery, and codebase exploration. Still, they are limited in their ability to answer questions that rely on semantic judgments which cannot be expressed or decided using program analysis tools alone, e.g., identifying string literals that contain private information or violations of naming conventions. In this paper, we present SemQL, a system which extends declarative program analysis frameworks with the ability to invoke an LLM as an external oracle. SemQL allows developers to write queries which combine structural reasoning with semantic (extra-analytic) judgments. We show the real-world value of such a system by collecting a set of analytic questions that require semantic reasoning beyond what is deducible simply from the structure of the code. We also present an algorithm which efficiently evaluates these queries while minimizing costly oracle invocations, and demonstrate its effectiveness in practical program analysis tasks. |
|
| Raghothaman, Mukund |
Sara Baradaran, Amirmohammad Nazari, and Mukund Raghothaman (University of Southern California, USA) Program analysis tools such as CodeQL enable programmers to express their questions about codebases in the form of declarative queries, which are then evaluated over structured representations of the code. These versatile tools have broad applications in bug finding, vulnerability discovery, and codebase exploration. Still, they are limited in their ability to answer questions that rely on semantic judgments which cannot be expressed or decided using program analysis tools alone, e.g., identifying string literals that contain private information or violations of naming conventions. In this paper, we present SemQL, a system which extends declarative program analysis frameworks with the ability to invoke an LLM as an external oracle. SemQL allows developers to write queries which combine structural reasoning with semantic (extra-analytic) judgments. We show the real-world value of such a system by collecting a set of analytic questions that require semantic reasoning beyond what is deducible simply from the structure of the code. We also present an algorithm which efficiently evaluates these queries while minimizing costly oracle invocations, and demonstrate its effectiveness in practical program analysis tasks. |
|
| Sen, Sanjib Kumar |
Sanjib Kumar Sen and Bozhen Liu (Texas A&M University - Corpus Christi, USA) LLM-Powered Apps (LPAs) are increasingly being deployed on shared cloud servers where multiple users interact at the same time. This creates data leak risks where one user's private data can flow into another user's session through shared caches, misconfigured access control policies, or concurrency bugs in multi-threaded and event-driven code. Existing taint analysis tools do not handle these risks well because they need manual source-sink marking, only track control and dataflow, and do not account for access control policies, complex app structures, or runtime environments. In this paper, we propose Access Flow Guard (AFG), a framework that automatically identifies taint sources based on user identity and permission level to detect cross-user data leaks in LPAs. Instead of manually marking sources and sinks as existing taint approaches require, AFG introduces a Multi-User, Multi-Permission (MUMP) setup. MUMP automatically identifies user input entry points by matching function signatures against patterns derived from our dataset of popular open-source LPAs. Each identified input is tagged with the user's identity and permission level. AFG then applies Scoped Taint Pointer Analysis (STPA), building on existing pointer-based taint analysis techniques, to propagate user-tagged objects through a Pointer Assignment Graph. STPA detects where different users' data scopes overlap. We discuss key research challenges, including treating LLM components as black boxes and refining static results through dynamic testing. This approach aims to serve as a basis for systematic security testing of LLM-integrated software in shared multi-user environments. |
|
| Trunnikov, Maxim |
Yegor Bugayenko, Maxim Trunnikov, and Vladimir Zakharov (Huawei, China; Innopolis University, Russian Federation) The Java Stream API, introduced in Java~8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures. |
|
| Van Valkenburg, Sander |
Ákos Hajdu, Jorge Mendez, Sander van Valkenburg, Artem Kupriianets, Matteo Marescotti, Dulma Churchill, and Sopot Cela (Meta, UK) We present an industrial case study on deploying static code analysis to foster developer adoption, which indirectly supports the reliability of the WhatsApp iOS app, with a particular focus on memory management issues. Over the course of a year, we utilized Infer and Clang Static Analyzer to proactively prevent over 1700 regressions. In parallel, we mobilized 50+ engineers and harnessed generative AI to remediate over 1000 long-standing, pre-existing issues within the code base. We share insights on both the technical and people challenges faced, emphasizing the socio-technical strategies required for large-scale static analysis adoption. |
|
| Zakharov, Vladimir |
Yegor Bugayenko, Maxim Trunnikov, and Vladimir Zakharov (Huawei, China; Innopolis University, Russian Federation) The Java Stream API, introduced in Java~8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures. |
24 authors
proc time: 8.52