10th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2021), June 22, 2021, Virtual, Canada
Frontmatter
Message from the Chairs
The 10th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP'21) is co-located with the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '21).
In line with past workshops, SOAP'21 aims to bring together the members of the program analysis community to share new developments and shape new innovations in program analysis.
Session 1
Static Analysis for Dummies: Experiencing LiSA
Pietro Ferrara,
Luca Negrini, Vincenzo Arceri, and
Agostino Cortesi
(Ca' Foscari University of Venice, Italy)
Semantics-based static analysis requires a significant theoretical background before one can design and implement a new analysis. Unfortunately, developing even a toy static analyzer from scratch requires implementing an infrastructure (parser, control-flow-graph representation, fixpoint algorithms, etc.) that is too demanding for bachelor and master students in computer science. This difficulty can hinder the acquisition of software-verification skills that are of major importance for the design of secure systems. In this paper, we show how LiSA (Library for Static Analysis) can play a role in that respect. LiSA implements the basic infrastructure that allows a non-expert user to develop even simple analyses (e.g., dataflow and numerical non-relational domains) while focusing only on the design of an appropriate representation of the property of interest and of a sound approximation of the program statements.
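The kind of numerical non-relational domain the abstract mentions can be sketched in a few lines. This is not LiSA's API (LiSA is a Java library); the names `alpha`, `lub`, and `abstract_mul` are hypothetical, and the sketch only shows the design work the paper says a student should focus on — the lattice and the sound transfer functions:

```python
# Sketch of a numerical non-relational domain: the classic sign domain.
# Names are illustrative, not LiSA's API.

TOP, POS, NEG, ZERO, BOT = "top", "pos", "neg", "zero", "bot"

def alpha(n):
    """Abstraction: map a concrete integer to its sign."""
    if n > 0:
        return POS
    if n < 0:
        return NEG
    return ZERO

def lub(a, b):
    """Least upper bound on the sign lattice."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def abstract_mul(a, b):
    """Sound abstract transfer function for multiplication."""
    if BOT in (a, b):
        return BOT
    if ZERO in (a, b):
        return ZERO  # 0 * anything = 0
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG
```

A complete domain would also supply a widening operator and transfer functions for the remaining expressions; the parsing and fixpoint machinery around them is the infrastructure a framework like LiSA provides.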
@InProceedings{SOAP21p1,
author = {Pietro Ferrara and Luca Negrini and Vincenzo Arceri and Agostino Cortesi},
title = {Static Analysis for Dummies: Experiencing LiSA},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {1--6},
doi = {10.1145/3460946.3464316},
year = {2021},
}
Security and Quality: Two Sides of the Same Coin?
Steven Arzt
(Fraunhofer SIT, Germany)
Poor software quality may hinder future extensions to software code. In contrast to functional bugs, such hidden issues are not immediately visible to developers and users, and the software may still be fully usable. Consequently, developers are not forced to fix these issues, or even to investigate them. Security vulnerabilities are hidden issues as well. However, they can put systems and users’ data at risk and lead to financial losses as well as liability and fines under data protection acts. Therefore, from a risk-minimization perspective, avoiding security issues may seem more critical than avoiding quality issues when dealing with limited development resources.
In this paper, we show that both types of hidden issues are correlated. Our study of more than 400 real-world Android apps shows that apps with a high number of quality issues are likely to also have a higher number of security vulnerabilities. We argue that security and quality issues should be seen as two sides of the same coin. We investigate which types of quality problems correlate with which types of security issues and give insights into potential causes.
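The kind of correlation the study reports can be computed with standard rank-correlation machinery. The sketch below is plain Python and uses made-up counts, not the paper's dataset; it only shows Spearman's rank correlation between per-app quality-issue counts and vulnerability counts:

```python
# Spearman rank correlation, written out in plain Python.
# Input data in any use of this would be per-app issue counts.

def rank(xs):
    """1-based average ranks; ties receive the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation of the two rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A coefficient near 1 over the ~400 apps would indicate the monotone relationship the paper describes: apps with more quality issues tend to have more vulnerabilities.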
@InProceedings{SOAP21p7,
author = {Steven Arzt},
title = {Security and Quality: Two Sides of the Same Coin?},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {7--12},
doi = {10.1145/3460946.3464315},
year = {2021},
}
Session 2
Program Analysis for Reversible Languages
Niklas Deworetzki and
Uwe Meyer
(TH Mittelhessen, Germany)
Reversible computing is a technique to “let computations run forwards and backwards” and thus extends the traditional model of computation. As an example, consider a function to compute the factorial of a given number, say 5, yielding 120. Running this program backwards inverts the function, taking 120 as input and giving 5 as the result. Reversible programming languages allow the creation of programs that can be executed backwards as well as forwards and have been a focus of research over the last decade, mostly due to the work of Glück, Yokoyama, Mogensen, and many others. In this paper, we report on our recent activities to perform program analysis on reversible static single-assignment form (RSSA) and to use it for local and global optimizations. This work is based on our compiler translating from the reversible language Janus to RSSA. As far as we know, this is the first compiler from Janus to RSSA, and no results on the optimization of reversible intermediate code or programming languages are known to us either. Optimization techniques in “traditional” compilers are always based on the understanding that programs are executed forwards; in reversible languages that assumption no longer holds, and program analysis becomes much more difficult. Our first results on applying our analysis methods to common subexpression elimination and constant propagation are nevertheless promising and have been implemented successfully.
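The abstract's factorial example can be sketched directly. Reversible languages such as Janus make the inverse available by construction; here the backwards direction is written by hand, purely to illustrate what "running backwards" means:

```python
# The factorial example from the abstract, forwards and backwards.

def factorial_fwd(n):
    """Forwards: n -> n! (e.g. 5 -> 120)."""
    acc = 1
    while n > 1:
        acc *= n
        n -= 1
    return acc

def factorial_bwd(acc):
    """Backwards: recover n from acc, assuming acc = n! for some n >= 1
    (e.g. 120 -> 5). Divides out 2, 3, 4, ... until nothing remains."""
    n = 1
    while acc > 1:
        n += 1
        assert acc % n == 0, "input is not a factorial"
        acc //= n
    return n
```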
@InProceedings{SOAP21p13,
author = {Niklas Deworetzki and Uwe Meyer},
title = {Program Analysis for Reversible Languages},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {13--18},
doi = {10.1145/3460946.3464314},
year = {2021},
}
PerfLens: A Data-Driven Performance Bug Detection and Fix Platform
Spandan Garg, Roshanak Zilouchian Moghaddam,
Neel Sundaresan, and Chen Wu
(Microsoft, Canada; Microsoft, USA; Microsoft, China)
The wealth of open-source software development artifacts available online creates a great opportunity to learn the patterns of performance improvements from data. In this paper, we present a data-driven approach to software performance improvement in C#. We first compile a large dataset of hundreds of performance improvements made in open-source projects. We then leverage this data to build a tool called PerfLens for performance-improvement recommendations via code search. PerfLens indexes the performance improvements, takes a codebase as input, and searches a pool of performance improvements for similar code. We show that when our system is further augmented with profiler data, our recommendations are more accurate. Our experiments show that PerfLens can suggest performance improvements with 90% accuracy when profiler data is available and 55% accuracy when it analyzes source code only.
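The index-then-retrieve shape of recommendation-by-code-search can be sketched with a deliberately simple similarity measure. This is not PerfLens's actual algorithm: the `FixIndex` class and the Jaccard-over-identifiers similarity are assumptions made for this illustration only:

```python
# Hypothetical sketch: index the slow "before" snippets of known
# performance fixes, then recommend the fix attached to the indexed
# snippet most similar to the query code.

import re

def tokens(code):
    """Identifier-level tokens of a snippet (length >= 2)."""
    return set(re.findall(r"[A-Za-z_]\w+", code))

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

class FixIndex:
    def __init__(self):
        self.entries = []  # (token set of the slow "before" code, fix)

    def add(self, before_snippet, fix):
        self.entries.append((tokens(before_snippet), fix))

    def recommend(self, query_code):
        """Return the fix attached to the most similar indexed snippet."""
        toks = tokens(query_code)
        return max(self.entries, key=lambda e: jaccard(e[0], toks))[1]
```

Profiler data would slot in naturally here as a filter: only query the index with the hot code, which is one plausible reading of why profiler-augmented recommendations are more accurate.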
@InProceedings{SOAP21p19,
author = {Spandan Garg and Roshanak Zilouchian Moghaddam and Neel Sundaresan and Chen Wu},
title = {PerfLens: A Data-Driven Performance Bug Detection and Fix Platform},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {19--24},
doi = {10.1145/3460946.3464318},
year = {2021},
}
Weldr: Fusing Binaries for Simplified Analysis
Alexander Heinricher,
Ryan Williams, Ava Klingbeil, and Alex Jordan
(Raytheon BBN Technologies, USA)
Modern binary analysis tools are geared primarily towards analyzing the behavior of a single binary in isolation. Frequently, they make overly conservative assumptions when faced with dynamic libraries and inter-process communication. Failures in both cases stem from the same source: the code involved is extremely complicated and tends to rely on kernel functionality that is invisible from userspace and exceptionally difficult to model.
In this work, we introduce Weldr, a tool that uses static linking to bring an entire distributed system within the scope of a single-binary analyzer to mitigate these shortcomings. Instead of forcing a single-binary analyzer to ingest multiple binaries, Weldr rebuilds (“welds”) all programs in the system into a single, monolithic binary. This provides a high-fidelity, analyzer-friendly model with critical library functionality replaced. By doing this, tracing data flows across an IPC-linked system is reduced to tracing data flows between memory buffers in a single address space, which is a problem to which existing tools can be applied. We tested both a comprehensive networking-based example and a set of real-world IoT middleware protocols and demonstrated in each case that the welded version of the system can be tested using the popular AFL fuzzer and other state-of-the-art analysis tools, whereas the original versions cannot. Fuzz testing the welded binary resulted in up to 12.2% more code coverage than running unit tests on the original multi-binary system. Through a comparison with gcc and clang, we show that Weldr incurs a minimal compilation overhead, making Weldr practical to use.
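The reduction the abstract describes, from IPC to memory buffers in one address space, can be mimicked at toy scale. Weldr operates on real binaries via static linking; the Python sketch below (all names hypothetical) only illustrates why a single-process "welded" system is easier to trace and fuzz end to end:

```python
# Toy sketch of the welding idea: replace socket IPC between a "client"
# and a "server" with an in-memory buffer, so both sides run in one
# address space and one fuzzer/analyzer sees the whole data flow.

from collections import deque

class MemChannel:
    """Stands in for the kernel socket: a queue of byte strings."""
    def __init__(self):
        self.buf = deque()
    def send(self, data):
        self.buf.append(data)
    def recv(self):
        return self.buf.popleft()

def client(chan, payload):
    chan.send(payload.upper())        # "process" 1

def server(chan):
    return b"ack:" + chan.recv()      # "process" 2

def welded_run(payload):
    """Both sides in one address space: a fuzzer mutating `payload`
    observes the full client -> server flow without modeling IPC."""
    chan = MemChannel()
    client(chan, payload)
    return server(chan)
```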
@InProceedings{SOAP21p25,
author = {Alexander Heinricher and Ryan Williams and Ava Klingbeil and Alex Jordan},
title = {Weldr: Fusing Binaries for Simplified Analysis},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {25--30},
doi = {10.1145/3460946.3464320},
year = {2021},
}
Session 3
Multi-language Static Code Analysis on the LARA Framework
Gil Teixeira,
João Bispo, and
Filipe F. Correia
(University of Porto, Portugal; INESC TEC, Portugal)
We propose a mechanism to raise the abstraction level of source-code analysis and robustly support multiple languages. Built on top of the LARA framework, it allows sharing language specifications between LARA source-to-source compilers, and enables the mapping of a virtual AST over the nodes of ASTs provided by different, unrelated parsers.
We use this approach to create a language specification for Object-Oriented (OO) languages and add support for three different LARA compilers. We evaluate it by implementing a library of 18 software metrics using this language specification and apply the metrics to source code in four programming languages (C, C++, Java, and JavaScript). We compare the results with other tools to evaluate the approach.
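The virtual-AST idea can be sketched with two adapters. LARA's real node model is richer than this; the classes below are hypothetical and only show how a metric can be written once against a shared view of nodes produced by two unrelated parsers:

```python
# Two unrelated parsers expose differently shaped class nodes; thin
# adapters map both onto one virtual node, so the metric is written once.

class VirtualClass:
    """The shared view that metrics are written against."""
    def name(self): ...
    def methods(self): ...

class DictClassAdapter(VirtualClass):
    """Adapter for a parser that represents a class as a dict."""
    def __init__(self, node):
        self.node = node
    def name(self):
        return self.node["id"]
    def methods(self):
        return self.node["members"]

class TupleClassAdapter(VirtualClass):
    """Adapter for a parser that represents a class as (name, methods)."""
    def __init__(self, node):
        self.node = node
    def name(self):
        return self.node[0]
    def methods(self):
        return list(self.node[1])

def methods_per_class(virtual_classes):
    """A metric written once against the virtual AST."""
    return {c.name(): len(c.methods()) for c in virtual_classes}
```

Adding a fourth language then means writing one more adapter, not reimplementing the 18 metrics.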
@InProceedings{SOAP21p31,
author = {Gil Teixeira and João Bispo and Filipe F. Correia},
title = {Multi-language Static Code Analysis on the LARA Framework},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {31--36},
doi = {10.1145/3460946.3464317},
year = {2021},
}
Serialization-Aware Call Graph Construction
Joanna C. S. Santos, Reese A. Jones,
Chinomso Ashiogwu, and
Mehdi Mirakhorli
(Rochester Institute of Technology, USA; University of Maryland, USA)
Although call graphs are crucial for inter-procedural analyses, it is challenging to statically compute them for programs with dynamic features. Prior work has focused on supporting certain kinds of dynamic features, but serialization-related features are still not well supported. Therefore, we introduce Salsa, an approach that complements existing points-to analyses with respect to serialization-related features to enhance the call graph’s soundness while not greatly affecting its precision. We evaluate Salsa’s soundness, precision, and performance using 9 programs from the Java Call graph Assessment & Test Suite (CATS) and 4 programs from the XCorpus dataset. We compare Salsa against off-the-shelf call graph construction algorithms available in Soot, Doop, WALA, and OPAL. Our experiments showed that Salsa improved call graphs’ soundness while not greatly affecting their precision. We also observed that Salsa did not incur extra overhead on the underlying pointer analysis.
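The core difficulty Salsa addresses has a direct Python analogue: deserialization invokes callbacks (here `__setstate__`; in Java, hooks such as `readObject`/`readResolve`) that no source-level call site mentions, so a purely syntactic call graph misses the edge:

```python
# No line of this program explicitly calls Session.__setstate__, yet it
# runs: pickle.loads dispatches to it during deserialization. A call
# graph unaware of serialization semantics has no edge to it.

import pickle

class Session:
    def __init__(self, user):
        self.user = user
        self.restored = False

    def __setstate__(self, state):
        # Reached only via deserialization; no explicit caller exists.
        self.__dict__.update(state)
        self.restored = True

blob = pickle.dumps(Session("alice"))
s = pickle.loads(blob)   # implicitly invokes Session.__setstate__
```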
@InProceedings{SOAP21p37,
author = {Joanna C. S. Santos and Reese A. Jones and Chinomso Ashiogwu and Mehdi Mirakhorli},
title = {Serialization-Aware Call Graph Construction},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {37--42},
doi = {10.1145/3460946.3464319},
year = {2021},
}
Scalable String Analysis: An Experience Report
Kostyantyn Vorobyov,
Yang Zhao, and
Padmanabhan Krishnan
(Oracle Labs, Australia)
In this paper we present OLSA, a tool for scalable static string analysis of Java programs. OLSA is based on intra-procedural string value-flow graphs connected via call-graph edges. Formally, this uses a context-sensitive grammar to generate the set of possible strings. The analysis is focused on scalability and is thus not sound. This trade-off is acceptable in the context of bug finding in large web applications.
We evaluate our approach by using OLSA to detect SQL injections and unsafe uses of reflection in the DaCapo benchmarks and a large internal Java codebase, and compare the performance of OLSA with JSA, one of the state-of-the-art string analysers. The results indicate that OLSA can analyse industrial-scale codebases in a matter of hours, whereas JSA does not scale to many DaCapo programs. The set of potential strings generated by our string analysis can be used for checking the validity of the reported potential vulnerabilities.
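The "set of possible strings" idea can be sketched as propagation over a tiny value-flow structure. This is not OLSA's representation: the `defs` encoding is an assumption for this sketch, and it handles no cycles, which a real analysis over recursive code must:

```python
# Propagate sets of possible strings through assignments and
# concatenations, then check whether a tainted value can reach a query.

TAINT = "<user-input>"

def possible_strings(defs, var):
    """defs: variable -> list of alternative definitions; each
    alternative is a list of parts, where a part is either a string
    literal or the name of another variable in defs."""
    out = set()
    for alternative in defs[var]:
        combos = {""}
        for part in alternative:
            pieces = possible_strings(defs, part) if part in defs else {part}
            combos = {c + p for c in combos for p in pieces}
        out |= combos
    return out

# uid is either raw user input or a constant; the query concatenates it.
defs = {
    "uid": [[TAINT], ["42"]],
    "query": [["SELECT * FROM t WHERE id = ", "uid"]],
}
```

Flagging a potential SQL injection is then a membership check: does any string in `possible_strings(defs, "query")` contain the taint marker?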
@InProceedings{SOAP21p43,
author = {Kostyantyn Vorobyov and Yang Zhao and Padmanabhan Krishnan},
title = {Scalable String Analysis: An Experience Report},
booktitle = {Proc.\ SOAP},
publisher = {ACM},
pages = {43--48},
doi = {10.1145/3460946.3464321},
year = {2021},
}