1st International Workshop on Mining Software Repositories Applications for Privacy and Security (MSR4P&S 2022),
November 18, 2022,
Singapore, Singapore
Frontmatter
Welcome from the Chairs
On behalf of the Program Committee, we are pleased to present the proceedings of the 1st International Workshop on Mining Software Repositories for Privacy and Security (MSR4P&S 2022). MSR4P&S is co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). This year, because of the Covid-19 pandemic, MSR4P&S (as part of ESEC/FSE) is held virtually with an adapted program that will bring together international researchers to exchange ideas, share experiences, investigate problems, and propose promising solutions concerning the application of Mining Software Repositories (MSR) to investigate the different stages of privacy and security. The workshop topics cover a wide range of MSR applications for cybersecurity research, including empirical and mixed-method approaches, as well as datasets and tools.
Keynote
Mining Software Repositories for Security: Data Quality Issues Lessons from Trenches (Keynote)
Muhammad Ali Babar
(University of Adelaide, Australia)
Software repositories are an attractive source of data for understanding the burning security issues challenging developers and their anecdotal solutions, and for building AI/ML-based models and tools. That is why there is exponential growth in the literature based on mining software repositories for software security. While the abundance of freely available data for research is a fortune, data quality issues can make software repositories minefields capable of blowing any time and effort budget for a project. Our group has been active in this area for the last few years to develop knowledge, understanding, and tools for improving software security by mining repositories. Through a mix of successful and failed efforts, we have experienced firsthand what is called “garbage in, garbage out” due to poor data quality. Without fully appreciating the data quality issues, starting a data-driven software security project can be frustrating and disheartening for a research team. We believe engaging the relevant stakeholders in developing and sharing knowledge and technologies to improve software security data quality is crucial. To this end, we are not only systematically identifying and synthesizing the existing empirical literature on improving data quality but also devising innovative solutions for addressing the data quality challenges while mining software repositories for software security. This talk will draw lessons and recommendations from our efforts to systematically review the state of the art and to develop solutions for improving data quality while building knowledge, understanding, and tools for supporting software security. The talk will use a selected set of our studies to demonstrate concrete cases of the challenges faced and the workarounds used to successfully continue our journey of learning and improving in this line of research and practice.
@InProceedings{MSR4P&S22p1,
author = {Muhammad Ali Babar},
title = {Mining Software Repositories for Security: Data Quality Issues Lessons from Trenches (Keynote)},
booktitle = {Proc.\ MSR4P\&S},
publisher = {ACM},
pages = {1--1},
doi = {10.1145/3549035.3570192},
year = {2022},
}
Assessing Privacy
Mining Software Repositories for Patternizing Attack-and-Defense Co-Evolution
Samiha Shimmi and Mona Rahimi
(Northern Illinois University, USA)
Several lines of evidence indicate that malicious cyber actors learn, adapt, or, in other words, react to the defensive measures put into place by the cybersecurity community, as much as system defenders react to attacks. To this end, this research aims to mine existing software repositories to document the patterns of co-evolution that appear between cyber attackers and defenders as attack-and-defend adaptations, for the purpose of determining the probability of attackers’ responsive actions.
@InProceedings{MSR4P&S22p2,
author = {Samiha Shimmi and Mona Rahimi},
title = {Mining Software Repositories for Patternizing Attack-and-Defense Co-Evolution},
booktitle = {Proc.\ MSR4P\&S},
publisher = {ACM},
pages = {2--6},
doi = {10.1145/3549035.3561181},
year = {2022},
}
Assessing Software Privacy using the Privacy Flow-Graph
Feiyang Tang and Bjarte M. Østvold
(Norwegian Computing Center, Norway)
We increasingly rely on digital services and the conveniences they provide. Processing of personal data is integral to such services, so privacy and data protection are a growing concern, and governments have responded with regulations such as the EU's GDPR. Following this, organisations that make software have legal obligations to document the privacy and data protection of their software. This work must involve both software developers who understand the code and the organisation's data protection officer or legal department, who understand privacy and the requirements of a Data Protection Impact Assessment (DPIA).
To help developers and non-technical people such as lawyers document the privacy and data protection behaviour of software, we have developed an automatic software analysis technique. This technique is based on static program analysis to characterise the flow of privacy-related data. The results of the analysis can be presented as a graph of privacy flows and operations that is also understandable to non-technical people. We argue that our technique facilitates collaboration between technical and non-technical people in documenting the privacy behaviour of the software. We explain how to use the results produced by our technique to answer a series of privacy-relevant questions needed for a DPIA. To illustrate our work, we show both detailed and abstract results from applying our analysis technique to the secure messaging service Signal and to the client of the cloud service NextCloud, and we show how their privacy flow-graphs inform the writing of a DPIA.
@InProceedings{MSR4P&S22p7,
author = {Feiyang Tang and Bjarte M. Østvold},
title = {Assessing Software Privacy using the Privacy Flow-Graph},
booktitle = {Proc.\ MSR4P\&S},
publisher = {ACM},
pages = {7--15},
doi = {10.1145/3549035.3561185},
year = {2022},
}
Vulnerabilities
An Exploratory Study on the Relationship of Smells and Design Issues with Software Vulnerabilities
Sahrima Jannat Oishwee, Zadia Codabux, and Natalia Stakhanova
(University of Saskatchewan, Canada)
Software vulnerabilities are one of the leading causes of the loss of confidential data, resulting in financial damages in the industry. As a result, software companies strive to discover potential vulnerabilities before the software is deployed. While software metrics have traditionally been widely used to uncover vulnerabilities, more recent studies have been looking at code smells to detect vulnerabilities. This preliminary study explores the relationship between smells, design issues, and software vulnerabilities. As smells and design issues are indicators of potential problems in the software, establishing a relationship with vulnerabilities can be helpful for vulnerability prediction. In this study, we analyzed 561 versions of nine open-source software systems by exploring the smells and design issues in the vulnerable and non-vulnerable classes. We found that some smells and design issues have a statistically significant relationship with the vulnerable classes. However, after a manual analysis of the code segments containing the vulnerabilities, we found no indication that smells or design issues induce the vulnerabilities. In fact, they were still present in those code segments even after the vulnerabilities were resolved.
@InProceedings{MSR4P&S22p16,
author = {Sahrima Jannat Oishwee and Zadia Codabux and Natalia Stakhanova},
title = {An Exploratory Study on the Relationship of Smells and Design Issues with Software Vulnerabilities},
booktitle = {Proc.\ MSR4P\&S},
publisher = {ACM},
pages = {16--20},
doi = {10.1145/3549035.3561182},
year = {2022},
}
Counterfeit Object-Oriented Programming Vulnerabilities: An Empirical Study in Java
Joanna C. S. Santos, Xueling Zhang, and Mehdi Mirakhorli
(University of Notre Dame, USA; Rochester Institute of Technology, USA)
Many modern applications rely on Object-Oriented (OO) design principles, where the basic system components are objects and classes. They share objects with other processes, store them on disk or in files for future retrieval, or transport them over the network to other systems. Object-oriented programs leverage numerous dynamic features and design principles, such as runtime dispatching and object-oriented callbacks, which allow flexible software design. Although seemingly innocuous, these features can be abused by attackers to hijack the program's control flow toward undesirable behavior. This is referred to as Counterfeit Object-Oriented Programming (COOP), in which attackers hijack objects in the program in order to create a sequence of method calls that introduces malicious behavior. COOP is a type of code reuse attack in which an attacker hijacks objects (gadgets) in the program and uses them to control the program's execution flow by manipulating the sequence of methods and the data passed among these methods (gadget chains). In this paper, we describe a preliminary empirical investigation of COOP attacks in real software systems caused by untrusted object deserialization. We investigated the severity of these attacks, their consequences, and how they were mitigated by developers. Furthermore, we used the findings to create a dataset of vulnerable software projects and their fixes.
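As a rough illustration of why untrusted object deserialization can redirect control flow, the sketch below uses Python's `pickle` module as an analogue. This is an assumption made for brevity: the paper studies Java, and nothing here is taken from it. The `Payload` class, the harmless `sorted` call standing in for an attacker-chosen method, and the `AllowlistUnpickler`, `safe_loads`, and `is_blocked` helpers are all hypothetical names. The allowlist is one commonly used style of mitigation, not necessarily one the authors observed.

```python
import io
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # While loading, pickle calls the returned callable with these args.
        # `sorted` is a harmless stand-in for an attacker-chosen method call.
        return (sorted, ([3, 1, 2],))


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global not explicitly allowed."""

    ALLOWED = {("builtins", "list"), ("builtins", "dict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize `data` using the allowlist above."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


def is_blocked(data: bytes) -> bool:
    """Return True if the allowlist rejects the serialized stream."""
    try:
        safe_loads(data)
    except pickle.UnpicklingError:
        return True
    return False


blob = pickle.dumps(Payload())

# Plain loading runs the embedded call and yields its result, not a Payload.
hijacked_result = pickle.loads(blob)

if __name__ == "__main__":
    print("plain load returned:", hijacked_result)
    print("allowlist blocked payload:", is_blocked(blob))
    print("allowlist accepted plain list:", safe_loads(pickle.dumps([1, 2])))
```

Plain loading returns the result of the embedded call rather than a `Payload` instance. That is the essence of the hijack: the serialized data, not the receiving program, decides which existing code runs. The allowlist loader rejects the same bytes because `sorted` is not an approved global, while ordinary data such as a list still loads.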
@InProceedings{MSR4P&S22p21,
author = {Joanna C. S. Santos and Xueling Zhang and Mehdi Mirakhorli},
title = {Counterfeit Object-Oriented Programming Vulnerabilities: An Empirical Study in Java},
booktitle = {Proc.\ MSR4P\&S},
publisher = {ACM},
pages = {21--28},
doi = {10.1145/3549035.3561183},
year = {2022},
}
SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques
Mohammed Latif Siddiq and Joanna C. S. Santos
(University of Notre Dame, USA)
Automated source code generation is currently a popular machine-learning-based task. It can be helpful for software developers to write functionally correct code from a given context. However, just like human developers, a code generation model can produce vulnerable code, which the developers can mistakenly use. For this reason, evaluating the security of a code generation model is a must. In this paper, we describe SecurityEval, an evaluation dataset to fulfill this purpose. It contains 130 samples for 75 vulnerability types, which are mapped to the Common Weakness Enumeration (CWE). We also demonstrate using our dataset to evaluate one open-source (i.e., InCoder) and one closed-source code generation model (i.e., GitHub Copilot).
@InProceedings{MSR4P&S22p29,
author = {Mohammed Latif Siddiq and Joanna C. S. Santos},
title = {SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques},
booktitle = {Proc.\ MSR4P\&S},
publisher = {ACM},
pages = {29--33},
doi = {10.1145/3549035.3561184},
year = {2022},
}