2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2017), September 4–8, 2017, Paderborn, Germany

Desktop Layout

Handling Failures
Research Papers
S3, Chair: Fabrizio Pastore
Bayesian Specification Learning for Finding API Usage Errors
Vijayaraghavan Murali, Swarat Chaudhuri, and Chris Jermaine
(Rice University, USA)
Publisher's Version
Abstract: We present a Bayesian framework for learning probabilistic specifications from large, unstructured code corpora, and then using these specifications to statically detect anomalous, hence likely buggy, program behavior. Our key insight is to build a statistical model that correlates all specifications hidden inside a corpus with the syntax and observed behavior of programs that implement these specifications. During the analysis of a particular program, this model is conditioned into a posterior distribution that prioritizes specifications that are relevant to the program. The problem of finding anomalies is now framed quantitatively, as a problem of computing a distance between a "reference distribution" over program behaviors that our model expects from the program, and the distribution over behaviors that the program actually produces. We implement our ideas in a system, called Salento, for finding anomalous API usage in Android programs. Salento learns specifications using a combination of a topic model and a neural network model. Our encouraging experimental results show that the system can automatically discover subtle errors in Android applications in the wild, and has high precision and recall compared to competing probabilistic approaches.


Time stamp: 2020-01-24T14:15:48+01:00