PROMISE 2024 – Proceedings

The Ever-Evolving Promises of Data in Software Ecosystems: Models, AI, and Analytics (Keynote)
Raula Gaikovina Kula

(Nara Institute of Science and Technology, Japan)
The year 2024 has sparked extensive discussions about the future of software engineering research, particularly for library dependencies and the software ecosystems they create. In this talk, I will take you on an experiential journey spanning the last decade, beginning in 2013 when I first embarked on my journey, and finally landing in the era of generative AI and augmented reality. We will explore how the landscape of collecting datasets through mining, user studies, and expanding from 3 systems to 3 million systems has evolved, examine what elements have remained constant, and discuss how we can advance with software ecosystems research in the face of these innovations.

Papers

Graph Neural Network vs. Large Language Model: A Comparative Analysis for Bug Report Priority and Severity Prediction
Jagrit Acharya and Gouri Ginde

(University of Calgary, Canada)
A vast number of incoming bug reports demand effective methods to identify priority and severity for bug triaging. With increased technological advancement, machine learning and deep learning have been extensively examined to address this problem. Although Large Language Models (LLMs) such as fine-tuned BERT (an early-generation LLM) have proven to capture context in the underlying textual data, severity and priority prediction demand additional features for understanding the relationships with other bug reports. This work utilizes a graph-based approach to model the bug reports and their other attributes, such as component, product and bug type information. It utilizes the relational intelligence of Graph Neural Networks (GNNs) to address the prioritization and severity assessment of bug reports in the Bugzilla bug tracking system. Initial tests on the Mozilla project dataset indicate that a project-wise predictive approach using GNNs yields higher accuracy in determining the priority and severity of bug reports compared to LLMs across multiple Mozilla projects, contributing to a notable advancement in the automation of bug severity and priority prediction tasks. Specifically, GNNs demonstrated a remarkable improvement over LLMs, increasing priority prediction accuracy by 37% and 30% and severity prediction accuracy by 43% and 30% for the Core and Firefox projects, respectively. Overall, the GNN outperformed the fine-tuned BERT (LLM) in predicting priority and severity for all the Mozilla projects.

Smarter Project Selection for Software Engineering Research
Tapajit Dey, Jonathan Loungani, and James Ivers

(Carnegie Mellon University, USA)
Open Source Software (OSS) hosting platforms like GitHub also contain many non-software projects that should be excluded from the dataset for most software engineering research studies. However, due to the lack of obvious indicators, researchers have to spend considerable manual effort to find suitable projects or rely on convenience sampling or heuristics for selecting projects for their research. Moreover, the diverse nature of OSS projects often poses further challenges in selecting projects aligned with study objectives, especially when the study intends to identify projects based on semantic information like intended use, which is not easy to discern solely based on the project characteristics that are available through the search APIs like GitHub's. Our goals are to establish a robust method of identifying software projects from the population of repositories hosted in social coding platforms and to categorize the software projects based on who the target users are and how those projects are meant to be used. Using data from 35,621 projects in the World of Code dataset, we employed a combination of machine learning techniques, including Doc2Vec and Random Forest, to identify the software projects and to categorize them as standalone applications, libraries, or plug-ins. Furthermore, our findings highlight the risks of selecting projects solely based on filtering by commonly used project criteria like the number of contributors, commits, or stars as even after using similar filtering, 16.6% of projects were found to be non-software projects. Our research should aid software engineering researchers in project selection, benefiting both industry and academia. We also envision our work inspiring further research in this domain.

Artifacts Available

Sociotechnical Dynamics in Open Source Smart Contract Repositories: An Exploratory Data Analysis of Curated High Market Value Projects
Saori Costa, Matheus Paixao, Igor Steinmacher, Pamella Soares, Allysson Allex Araújo, and Jerffeson Souza

(State University of Ceará, Brazil; Northern Arizona University, USA; Federal University of Cariri, Brazil)
Blockchain and Smart Contracts (SCs) have emerged as a promising avenue for organizations looking to innovate. Similar to other fields of software engineering, collaborative platforms, such as GitHub, are gaining attention in SC development. Moreover, public blockchain platforms, such as Ethereum, commonly serve as a medium to deploy SCs. This ecosystem serves as the basis on which the sociotechnical phenomenon of SC development emerges. Despite the growth of research regarding SCs, there is a gap in understanding the sociotechnical factors involved in their development, especially for those with high market value. To address this issue, we leveraged Sociotechnical Theory and Data Analysis to investigate the sociotechnical dynamics in open source repositories of SCs deployed on Ethereum. To ensure suitability for our analysis, we curated a list of 16 high market value SCs deployed on Ethereum. Our research yielded four primary analyses. First, we unveiled how collaboration aspects are impacted by the deployment of SCs. Second, we explored the characteristics of contributors participating in these projects. Third, we looked into commit messages to categorize commonly performed software changes. Fourth, we investigated the relationship between market metrics and SC evolution. These analyses help to deepen the understanding of sociotechnical dynamics within SC repositories, assisting organizations in designing better strategies to support their development efforts.

A Curated Solidity Smart Contracts Repository of Metrics and Vulnerability
Giacomo Ibba, Sabrina Aufiero, Rumyana Neykova, Silvia Bartolucci, Marco Ortu, Roberto Tonelli, and Giuseppe Destefanis

(University of Cagliari, Italy; University College London, United Kingdom; Brunel University, United Kingdom)
The significance and popularity of smart contracts (SCs) have increased exponentially with the escalation of decentralised applications (dApps), which revolutionised programming paradigms where network controls rest within a central authority. Since SCs constitute the core of such applications, developing and deploying contracts without vulnerability issues becomes key to improving the robustness of dApps against external attacks. This paper introduces a dataset that combines smart contract metrics with vulnerability data identified using Slither, a leading static analysis tool proficient in detecting a wide spectrum of vulnerabilities. Our primary goal is to provide a resource for the community that supports exploratory analysis, such as investigating the relationship between contract metrics and vulnerability occurrences. Further, we discuss the potential of this dataset for the development and validation of predictive models aimed at identifying vulnerabilities, thereby contributing to the enhancement of smart contract security. Through this dataset, we invite researchers and practitioners to study the dynamics of smart contract vulnerabilities, fostering advancements in detection methods and, ultimately, fortifying the resilience of smart contracts.

MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery
Jafar Akhoundali, Sajad Rahim Nouri, Kristian Rietveld, and Olga Gadyatskaya

(Leiden University, Netherlands; Islamic Azad University of Ramsar, Iran)
Vulnerability datasets have become an important instrument in software security research, being used to develop automated, machine learning-based vulnerability detection and patching approaches. Yet, any limitations of these datasets may translate into inadequate performance of the developed solutions. For example, the limited size of a vulnerability dataset may restrict the applicability of deep learning techniques. In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods for gathering CVE fix commits. As a consequence of our improvements, we have been able to gather the largest programming language-independent real-world dataset of CVE vulnerabilities with the associated fix commits. Our dataset, containing 26,617 unique CVEs coming from 6,945 unique GitHub projects, is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 31,883 unique commits that fixed those vulnerabilities. Compared to prior work, our dataset brings about a 397% increase in CVEs, a 295% increase in covered open-source projects, and a 480% increase in commit fixes. Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We release to the community a 14GB PostgreSQL database that contains information on CVEs up to January 24, 2024, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we also make our dataset collection tool available to the community.

Artifacts Available

Prioritising GitHub Priority Labels
James Caddy and Christoph Treude

(University of Adelaide, Australia; Singapore Management University, Singapore)
Communities on GitHub often use issue labels as a way of triaging issues by assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and not standardised by GitHub. This makes it difficult for priority-related reasoning across repositories for both researchers and contributors. Previous work shows interest in how issues are labelled and what the consequences for those labels are. For instance, some previous work has used clustering models and natural language processing to categorise labels without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority; normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors that will create a list of the highest priority issues from the repositories to which they contribute. We have released the data set and the tool for anyone to use on Zenodo because we hope that this will help the open source community address high-priority issues more effectively and inspire other uses.
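The idea of normalising free-form priority labels and then listing the highest-priority issues can be illustrated with a minimal sketch. It is not the authors' tool or data set: the keyword lists, the function names `normalise_label` and `rank_issues`, and the sample issues are all hypothetical, and the published labels were categorised manually rather than by keyword matching.

```python
import re

# Hypothetical keyword lists, chosen only for illustration. The published
# data set maps 812 labels by hand, not by keyword matching.
_LEVEL_KEYWORDS = {
    "high": ("critical", "urgent", "blocker", "high", "p0", "p1"),
    "medium": ("medium", "normal", "moderate", "p2"),
    "low": ("low", "minor", "trivial", "p3", "p4"),
}
_RANK = {"high": 0, "medium": 1, "low": 2}


def normalise_label(label):
    """Map a repository-specific label to 'high', 'medium', 'low', or None."""
    tokens = set(re.split(r"[^a-z0-9]+", label.lower()))
    for level, keywords in _LEVEL_KEYWORDS.items():
        if tokens & set(keywords):
            return level
    return None  # not recognised as a priority label


def rank_issues(issues):
    """Sort (title, labels) pairs by their most urgent normalised priority.

    Issues without any recognised priority label are left out.
    """
    ranked = []
    for title, labels in issues:
        levels = [lvl for lvl in map(normalise_label, labels) if lvl is not None]
        if levels:
            best = min(levels, key=_RANK.__getitem__)
            ranked.append((_RANK[best], title, best))
    ranked.sort(key=lambda item: item[0])  # stable: ties keep input order
    return [(title, level) for _, title, level in ranked]


if __name__ == "__main__":
    sample = [
        ("Fix typo in README", ["docs", "low-priority"]),
        ("Crash on startup", ["bug", "P1"]),
        ("Refactor tests", ["cleanup"]),
    ]
    for title, level in rank_issues(sample):
        print(level, title)
```

A lookup table built from the released data set could replace the keyword lists, keeping the ranking step unchanged.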

Artifacts Available

Predicting Fairness of ML Software Configurations
Salvador Robles Herrera, Verya Monjezi, Vladik Kreinovich, Ashutosh Trivedi, and Saeid Tizpaz-Niari

(University of Texas at El Paso, USA; University of Colorado Boulder, USA)
This paper investigates the relationships between hyperparameters of machine learning and fairness. Data-driven solutions are increasingly used in critical socio-technical applications where ensuring fairness is important. Rather than explicitly encoding decision logic via control and data structures, ML developers provide input data, perform some pre-processing, choose ML algorithms, and tune hyperparameters (HPs) to infer a program that encodes the decision logic. Prior works report that the selection of HPs can significantly influence fairness. However, tuning HPs to find an ideal trade-off between accuracy, precision, and fairness has remained an expensive and tedious task. Can we predict the fairness of an HP configuration for a given dataset? Are the predictions robust to distribution shifts? We focus on group fairness notions and investigate the HP space of 5 training algorithms. We first find that tree regressors and XGBoost significantly outperformed deep neural networks and support vector machines in accurately predicting the fairness of HPs. When predicting the fairness of ML hyperparameters under temporal distribution shift, the tree regressors outperform the other algorithms with reasonable accuracy. However, the precision depends on the ML training algorithm, dataset, and protected attributes. For example, the tree regressor model was robust for training data shift from 2014 to 2018 on logistic regression and discriminant analysis HPs with sex as the protected attribute, but not for race and other training algorithms. Our method provides a sound framework to efficiently perform fine-tuning of ML training algorithms and understand the relationships between HPs and fairness.
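To make "group fairness" concrete, the sketch below computes statistical parity difference, one common group fairness notion. The abstract does not say which notions the paper uses, so this choice is an assumption, and the function name and sample data are hypothetical.

```python
def statistical_parity_difference(predictions, protected):
    """Absolute gap in positive-prediction rates between two groups.

    predictions: iterable of 0/1 model outputs.
    protected:   iterable of 0/1 group membership (e.g. a binary encoding
                 of a protected attribute), same length as predictions.
    A value of 0.0 means both groups receive positive predictions at the
    same rate; larger values indicate a larger disparity.
    """
    predictions = list(predictions)
    protected = list(protected)
    if len(predictions) != len(protected):
        raise ValueError("predictions and protected must have equal length")

    rates = []
    for group in (0, 1):
        outcomes = [p for p, g in zip(predictions, protected) if g == group]
        if not outcomes:
            raise ValueError(f"no samples for group {group}")
        rates.append(sum(outcomes) / len(outcomes))
    return abs(rates[0] - rates[1])


if __name__ == "__main__":
    # Hypothetical predictions from a model trained under one HP configuration.
    preds = [1, 1, 0, 1, 0, 0, 0, 1]
    groups = [0, 0, 0, 0, 1, 1, 1, 1]
    print(statistical_parity_difference(preds, groups))
```

In a setting like the one described, such a score would be computed for models trained under many HP configurations, and a regressor would then be fitted to predict the score from the HP values.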

20th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2024)