Workshop SOAP 2024 – Author Index |
|
Arzt, Steven |
SOAP '24: "ValBench: Benchmarking Exact ..."
ValBench: Benchmarking Exact Value Analysis
Marc Miltenberger and Steven Arzt (Fraunhofer SIT, Germany; ATHENE, Darmstadt, Germany)

Value analysis is an important building block in static program analysis. While several approaches have been proposed, evaluating and comparing them is not trivial. Up to this day, a reliable and large benchmark specifically for value analysis is missing. Such a suite must not only provide test cases, but also a ground truth with the correct values to be found. In this paper, we propose ValBench, an extensible value benchmark suite consisting of 372 test cases for Java analysis and 59 test cases for Android analysis tools. Furthermore, we present an evaluation framework that automatically generates a ground truth for these test cases, identifies their respective challenges for program analysis and orchestrates the execution and result collection on the various value analysis tools. We further present an evaluation of 7 existing value analysis tools on ValBench and highlight the challenges faced by these tools as an empirical overview over the state of the art in value analysis.

@InProceedings{SOAP24p45,
  author = {Marc Miltenberger and Steven Arzt},
  title = {ValBench: Benchmarking Exact Value Analysis},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {45--51},
  doi = {10.1145/3652588.3663322},
  year = {2024},
}

Artifacts Available
|
Bernstein, Maxwell |
SOAP '24: "Dr Wenowdis: Specializing ..."
Dr Wenowdis: Specializing Dynamic Language C Extensions using Type Information
Maxwell Bernstein and Carl Friedrich Bolz-Tereick (Northeastern University, USA; Heinrich-Heine-Universität Düsseldorf, Germany)

C-based interpreters such as CPython make extensive use of C "extension" code, which is opaque to static analysis tools and faster runtimes with JIT compilers, such as PyPy. Not only are the extensions opaque, but the interface between the dynamic language types and the C types can introduce impedance. We hypothesise that frequent calls to C extension code introduce significant overhead that is often unnecessary. We validate this hypothesis by introducing a simple technique, "typed methods", which allow selected C extension functions to have additional metadata attached to them in a backward-compatible way. This additional metadata makes it much easier for a JIT compiler (and as we show, even an interpreter!) to significantly reduce the call and return overhead. Although we have prototyped typed methods in PyPy, we suspect that the same technique is applicable to a wider variety of language runtimes and that the information can also be consumed by static analysis tooling.

@InProceedings{SOAP24p1,
  author = {Maxwell Bernstein and Carl Friedrich Bolz-Tereick},
  title = {Dr Wenowdis: Specializing Dynamic Language C Extensions using Type Information},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {1--8},
  doi = {10.1145/3652588.3663316},
  year = {2024},
}
|
Bertholon, Guillaume |
SOAP '24: "Interactive Source-to-Source ..."
Interactive Source-to-Source Optimizations Validated using Static Resource Analysis
Guillaume Bertholon, Arthur Charguéraud, Thomas Kœhler, Begatim Bytyqi, and Damien Rouhling (Inria, France; Université de Strasbourg - CNRS, France)

Developments in hardware have delivered formidable computing power. Yet, the increased hardware complexity has made it a real challenge to develop software that exploits the hardware to its full potential. Numerous approaches have been explored to help programmers turn naive code into high-performance code, finely tuned for the targeted hardware. However, these approaches have inherent limitations, and it remains common practice for programmers seeking maximal performance to follow the tedious and error-prone route of writing optimized code by hand. This paper presents OptiTrust, an interactive source-to-source optimization framework that operates on general-purpose C code. The programmer develops a script describing a series of code transformations. The framework provides continuous feedback in the form of human-readable diffs over conventional C code. OptiTrust supports advanced code transformations, including transformations exploited by the state-of-the-art DSL tools Halide and TVM, and transformations beyond the reach of existing tools. OptiTrust also supports user-defined transformations, as well as defining complex transformations by composition of simpler transformations. Crucially, to check the validity of code transformations, OptiTrust leverages a static resource analysis in a simplified form of Separation Logic. Starting from user-provided annotations on functions and loops, our analysis deduces precise resource usage throughout the code.

@InProceedings{SOAP24p26,
  author = {Guillaume Bertholon and Arthur Charguéraud and Thomas Kœhler and Begatim Bytyqi and Damien Rouhling},
  title = {Interactive Source-to-Source Optimizations Validated using Static Resource Analysis},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {26--34},
  doi = {10.1145/3652588.3663320},
  year = {2024},
}
|
Bolz-Tereick, Carl Friedrich |
SOAP '24: "Dr Wenowdis: Specializing ..."
Dr Wenowdis: Specializing Dynamic Language C Extensions using Type Information
|
Brain, Martin |
SOAP '24: "Misconceptions about Loops ..."
Misconceptions about Loops in C
Martin Brain and Mahdi Malkawi (City University of London, United Kingdom)

Loop analysis is a key component of static analysis tools. Unfortunately, there are several rare edge cases. As a tool moves from academic prototype to production-ready, obscure cases can and do occur. This results in loop analysis being a key source of late-discovered but significant algorithmic bugs. To avoid these, this paper presents a collection of examples and "folklore" challenges in loop analysis.

@InProceedings{SOAP24p60,
  author = {Martin Brain and Mahdi Malkawi},
  title = {Misconceptions about Loops in C},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {60--66},
  doi = {10.1145/3652588.3663324},
  year = {2024},
}

Artifacts Available
|
Bytyqi, Begatim |
SOAP '24: "Interactive Source-to-Source ..."
Interactive Source-to-Source Optimizations Validated using Static Resource Analysis
|
Chapman, Patrick J. |
SOAP '24: "Interleaving Static Analysis ..."
Interleaving Static Analysis and LLM Prompting
Patrick J. Chapman, Cindy Rubio-González, and Aditya V. Thakur (University of California at Davis, Davis, USA)

This paper presents a new approach for using Large Language Models (LLMs) to improve static program analysis. Specifically, during program analysis, we interleave calls to the static analyzer and queries to the LLM: the prompt used to query the LLM is constructed using intermediate results from the static analysis, and the result from the LLM query is used for subsequent analysis of the program. We apply this novel approach to the problem of error-specification inference of functions in systems code written in C; i.e., inferring the set of values returned by each function upon error, which can aid in program understanding as well as in finding error-handling bugs. We evaluate our approach on real-world C programs, such as MbedTLS and zlib, by incorporating LLMs into EESI, a state-of-the-art static analysis for error-specification inference. Compared to EESI, our approach achieves higher recall across all benchmarks (from average of 52.55% to 77.83%) and higher F1-score (from average of 0.612 to 0.804) while maintaining precision (from average of 86.67% to 85.12%).

@InProceedings{SOAP24p9,
  author = {Patrick J. Chapman and Cindy Rubio-González and Aditya V. Thakur},
  title = {Interleaving Static Analysis and LLM Prompting},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {9--17},
  doi = {10.1145/3652588.3663317},
  year = {2024},
}
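The interleaving described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy call graph, the function names, the stub analysis step, and the `fake_llm` callback are hypothetical and are not taken from EESI or from the paper. The sketch only shows the control flow: the static step runs first, the prompt is built from intermediate results, and the LLM answer is fed into the analysis of later functions.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy program: each function maps to the callees whose
# error values it may propagate. These names are illustrative only.
CALL_GRAPH: Dict[str, List[str]] = {
    "read_block": [],          # leaf with a locally visible error return
    "third_party_alloc": [],   # leaf whose body the analyzer cannot see
    "load_file": ["read_block", "third_party_alloc"],
}

# Error values the stub analyzer can read directly from a function body.
LOCAL_ERRORS: Dict[str, Set[str]] = {"read_block": {"-1"}}


def static_step(func: str, specs: Dict[str, Set[str]]) -> Set[str]:
    """Stub analysis: local error values plus those propagated from callees."""
    found = set(LOCAL_ERRORS.get(func, set()))
    for callee in CALL_GRAPH[func]:
        found |= specs.get(callee, set())
    return found


def build_prompt(func: str, specs: Dict[str, Set[str]]) -> str:
    """Build the LLM prompt from intermediate analysis results."""
    known = {c: sorted(specs.get(c, set())) for c in CALL_GRAPH[func]}
    return f"Function {func} calls {known}. Which values does it return on error?"


def infer_specs(order: List[str],
                query_llm: Callable[[str], Set[str]]) -> Dict[str, Set[str]]:
    """Visit callees before callers, asking the LLM only when analysis is stuck."""
    specs: Dict[str, Set[str]] = {}
    for func in order:
        result = static_step(func, specs)
        if not result:
            result = query_llm(build_prompt(func, specs))
        specs[func] = result  # later static steps see the LLM-provided answer
    return specs


def fake_llm(prompt: str) -> Set[str]:
    """Stand-in for a real model call; returns a canned answer."""
    return {"NULL"} if "third_party_alloc" in prompt else set()


specs = infer_specs(["read_block", "third_party_alloc", "load_file"], fake_llm)
print(sorted(specs["load_file"]))  # ['-1', 'NULL']
```

In this toy run the stub analyzer has no information about `third_party_alloc`, so the canned LLM answer fills the gap, and the subsequent static step for `load_file` propagates it alongside the statically derived value.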
|
Charguéraud, Arthur |
SOAP '24: "Interactive Source-to-Source ..."
Interactive Source-to-Source Optimizations Validated using Static Resource Analysis
|
Conrado, Giovanna Kobus |
SOAP '24: "A Better Approximation for ..."
A Better Approximation for Interleaved Dyck Reachability
Giovanna Kobus Conrado and Andreas Pavlogiannis (Hong Kong University of Science and Technology, Hong Kong; Aarhus University, Denmark)

Interleaved Dyck reachability is a standard, graph-based formulation of a plethora of static analyses that seek to be context- and field-sensitive, where each type of sensitivity is expressed via a CFL/Dyck language. Unfortunately, the problem is well-known to be undecidable in general, and as such, existing approaches resort to clever overapproximations. Recently, a mutual refinement algorithm, that iteratively considers each of the two sensitivities in isolation until a fixpoint is reached, was shown to achieve high precision. In this work we present a more precise approximation of interleaved Dyck reachability, by extending the mutual-refinement algorithm in two directions. First, we develop refined CFLs to express each type of sensitivity precisely, while simultaneously also lightly overapproximating the opposite type. Second, we apply the resulting algorithm on an on-demand basis, which effectively masks out imprecision incurred by parts of the graph that are irrelevant for the query at hand. Our experiments show that the new approach offers significantly higher precision than the vanilla mutual-refinement algorithm and other common baselines; for a particularly challenging benchmark, we report, on average, 51% of the reachable pairs compared to the most recent alternative.

@InProceedings{SOAP24p18,
  author = {Giovanna Kobus Conrado and Andreas Pavlogiannis},
  title = {A Better Approximation for Interleaved Dyck Reachability},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {18--25},
  doi = {10.1145/3652588.3663318},
  year = {2024},
}
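As background for the abstract above, the following sketch computes reachability for a single Dyck language, which is the decidable building block that each "sensitivity in isolation" corresponds to. It is not the paper's algorithm and does not handle the interleaved case; the edge encoding (`"(k"` and `")k"` labels, an empty label for plain edges) and the naive saturation loop are assumptions chosen for brevity rather than efficiency.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, str, int]  # (source, label, target)


def dyck_reachable(num_nodes: int, edges: List[Edge]) -> Set[Tuple[int, int]]:
    """Pairs (u, v) joined by a path whose labels form a balanced bracket string.

    Labels: "" is a plain edge, "(k" opens bracket kind k, ")k" closes kind k.
    """
    # Base cases: the empty path, and single unlabeled edges.
    reach = {(n, n) for n in range(num_nodes)}
    reach |= {(u, v) for u, lab, v in edges if lab == ""}
    opens = [(u, lab[1:], v) for u, lab, v in edges if lab.startswith("(")]
    closes = [(u, lab[1:], v) for u, lab, v in edges if lab.startswith(")")]

    changed = True
    while changed:
        changed = False
        new: Set[Tuple[int, int]] = set()
        # Rule S -> (k S )k : wrap a balanced path in a matching bracket pair.
        for a, k, u in opens:
            for v, k2, b in closes:
                if k == k2 and (u, v) in reach:
                    new.add((a, b))
        # Rule S -> S S : concatenate two balanced paths.
        for u, v in reach:
            for v2, w in reach:
                if v == v2:
                    new.add((u, w))
        if not new <= reach:
            reach |= new
            changed = True
    return reach


# Node 0 reaches node 3 through matching brackets of kind 1;
# the path 0 -> 4 -> 5 opens kind 2 but closes kind 1, so it is rejected.
example_edges = [
    (0, "(1", 1), (1, "", 2), (2, ")1", 3),
    (0, "(2", 4), (4, ")1", 5),
]
pairs = dyck_reachable(6, example_edges)
print((0, 3) in pairs, (0, 5) in pairs)  # True False
```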
|
Dudina, Irina |
SOAP '24: "Static Analysis for Transitioning ..."
Static Analysis for Transitioning to CHERI C/C++
Irina Dudina and Ian Stark (University of Edinburgh, United Kingdom)

We describe and evaluate custom static analyses to support transitioning C/C++ code to CHERI hardware. CHERI is a novel architectural extension, implemented for RISC-V and AArch64, that uses capabilities to provide fine-grained memory protection and scalable software compartmentalisation. We provide custom checkers for the Clang Static Analyzer to handle capability alignment, copying through memory, and manipulation as integers; as well as evaluating these on a sample of packages from the CheriBSD ports library. While the existing CHERI toolchain can recompile large code collections for the platform with only a few source changes, we demonstrate that static analysis can help to identify where and what those changes must be to avoid later runtime faults.

@InProceedings{SOAP24p52,
  author = {Irina Dudina and Ian Stark},
  title = {Static Analysis for Transitioning to CHERI C/C++},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {52--59},
  doi = {10.1145/3652588.3663323},
  year = {2024},
}
|
Erhard, Julian |
SOAP '24: "When to Stop Going Down the ..."
When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the Fly
Julian Erhard, Johanna Franziska Schinabeck, Michael Schwarz, and Helmut Seidl (LMU Munich, Germany; TU Munich, Germany)

Context-sensitive analysis of programs containing recursive procedures may be expensive, in particular, when using expressive domains, rendering the set of possible contexts large or even infinite. Here, we present a general framework for context-sensitivity that allows formalizing not only known approaches such as full context or call strings but also combinations of these. We propose three generic lifters in this framework to bound the number of encountered contexts on the fly. These lifters are implemented within the abstract interpreter Goblint and compared to existing approaches to context-sensitivity on the SV-COMP benchmark suite. On a subset of recursive benchmarks, all proposed lifters manage to reduce the number of stack overflows and timeouts compared to a full context approach, with one of them improving the number of correct verdicts by 31% and showing promising results on the considered SV-COMP categories.

@InProceedings{SOAP24p35,
  author = {Julian Erhard and Johanna Franziska Schinabeck and Michael Schwarz and Helmut Seidl},
  title = {When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the Fly},
  booktitle = {Proc.\ SOAP},
  publisher = {ACM},
  pages = {35--44},
  doi = {10.1145/3652588.3663321},
  year = {2024},
}
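To make the context-explosion problem in this abstract concrete, here is a small sketch of k-limited call strings, one of the known approaches the abstract names. It does not reproduce the three lifters proposed in the paper, whose definitions are not given here; the call-site label `"f@12"` and both helper functions are hypothetical.

```python
from typing import Set, Tuple

CallString = Tuple[str, ...]


def push_call(context: CallString, call_site: str, k: int) -> CallString:
    """Append a call site, keeping only the k most recent entries."""
    if k <= 0:
        return ()  # k = 0 degenerates to a context-insensitive analysis
    return (context + (call_site,))[-k:]


def contexts_for_recursion(call_site: str, depth: int, k: int) -> Set[CallString]:
    """Collect the distinct contexts seen while one call site recurses `depth` times."""
    seen: Set[CallString] = set()
    ctx: CallString = ()
    for _ in range(depth):
        ctx = push_call(ctx, call_site, k)
        seen.add(ctx)
    return seen


# With a bound of 2, a recursion of depth 1000 yields only two contexts;
# with a bound as large as the depth, every level gets its own context.
bounded = contexts_for_recursion("f@12", 1000, 2)
unbounded_like = contexts_for_recursion("f@12", 50, 50)
print(len(bounded), len(unbounded_like))  # 2 50
```

The trade-off is the usual one: a small bound keeps the number of contexts finite for recursive procedures at the cost of merging calls that a full-context analysis would keep apart.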
|
Kœhler, Thomas |
SOAP '24: "Interactive Source-to-Source ..."
Interactive Source-to-Source Optimizations Validated using Static Resource Analysis
|
Malkawi, Mahdi |
SOAP '24: "Misconceptions about Loops ..."
Misconceptions about Loops in C
|
Miltenberger, Marc |
SOAP '24: "ValBench: Benchmarking Exact ..."
ValBench: Benchmarking Exact Value Analysis
|
Pavlogiannis, Andreas |
SOAP '24: "A Better Approximation for ..."
A Better Approximation for Interleaved Dyck Reachability
|
Rouhling, Damien |
SOAP '24: "Interactive Source-to-Source ..."
Interactive Source-to-Source Optimizations Validated using Static Resource Analysis
|
Rubio-González, Cindy |
SOAP '24: "Interleaving Static Analysis ..."
Interleaving Static Analysis and LLM Prompting
|
Schinabeck, Johanna Franziska |
SOAP '24: "When to Stop Going Down the ..."
When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the Fly
|
Schwarz, Michael |
SOAP '24: "When to Stop Going Down the ..."
When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the Fly
|
Seidl, Helmut |
SOAP '24: "When to Stop Going Down the ..."
When to Stop Going Down the Rabbit Hole: Taming Context-Sensitivity on the Fly
|
Stark, Ian |
SOAP '24: "Static Analysis for Transitioning ..."
Static Analysis for Transitioning to CHERI C/C++
|
Thakur, Aditya V. |
SOAP '24: "Interleaving Static Analysis ..."
Interleaving Static Analysis and LLM Prompting
22 authors