31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

1st International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components (SE4SafeML 2023), December 4, 2023, San Francisco, CA, USA

SE4SafeML 2023 – Proceedings


Title Page

Welcome from the Chairs
Welcome to the First Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components (SE4SafeML) co-located with ESEC/FSE 2023.

SE4SafeML 2023 Organization


Rule-Based Testing of Neural Networks
Muhammad Usman, Youcheng Sun, Divya Gopinath, and Corina S. Păsăreanu
(University of Texas, USA; University of Manchester, UK; KBR @ NASA Ames Research Center, USA; Carnegie Mellon University, USA)
Adequate testing of deep neural networks (DNNs) is challenging due to the lack of formal requirements and specifications of functionality. In this work, we aim to improve DNN testing by addressing this central challenge. The core idea is to drive testing of DNNs from rules abstracting the network behavior. These rules are automatically extracted from a trained model by monitoring its neuron values when it runs on a set of labeled data, and are validated on a separate test set. We show how these rules can be leveraged to improve fundamental testing activities, such as generating test oracles and supporting testing coverage with semantic meaning.
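The idea of extracting rules from neuron values and validating them on held-out data can be illustrated with a small sketch. This is not the authors' algorithm: the single hidden layer, the on/off "signature" abstraction, and all names (`extract_rules`, `rule_precision`, the toy weights and data) are assumptions made for illustration only.

```python
from collections import defaultdict

def hidden_activations(weights, biases, x):
    """ReLU values of one hidden layer (a toy stand-in for a DNN layer)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def signature(acts):
    """On/off pattern of the layer: 1 if the neuron fired, else 0."""
    return tuple(1 if a > 0.0 else 0 for a in acts)

def extract_rules(weights, biases, labeled_data):
    """Map each observed on/off pattern to a label.

    Only patterns that co-occur with a single label are kept as rules.
    """
    seen = defaultdict(set)
    for x, label in labeled_data:
        seen[signature(hidden_activations(weights, biases, x))].add(label)
    return {sig: next(iter(labels))
            for sig, labels in seen.items() if len(labels) == 1}

def rule_precision(rules, weights, biases, held_out):
    """Return (precision, covered) of the rules on a separate labeled set."""
    covered = correct = 0
    for x, label in held_out:
        sig = signature(hidden_activations(weights, biases, x))
        if sig in rules:
            covered += 1
            correct += rules[sig] == label
    return (correct / covered if covered else 0.0), covered

# Hypothetical two-neuron layer and toy data: label "A" when x0 > x1, else "B".
WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]
BIASES = [0.0, 0.0]
TRAIN = [((2.0, 1.0), "A"), ((3.0, 0.0), "A"), ((0.0, 2.0), "B"), ((1.0, 4.0), "B")]
HELD_OUT = [((5.0, 1.0), "A"), ((1.0, 3.0), "B")]

RULES = extract_rules(WEIGHTS, BIASES, TRAIN)
print(RULES, rule_precision(RULES, WEIGHTS, BIASES, HELD_OUT))
```

A rule that survives validation can then act as a test oracle: an unlabeled input whose signature matches a rule is expected to receive that rule's label.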

FedDefender: Backdoor Attack Defense in Federated Learning
Waris Gill, Ali Anwar, and Muhammad Ali Gulzar
(Virginia Tech, USA; University of Minnesota, USA)
Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. FedDefender first applies differential testing on clients’ models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for synthetic input, FedDefender fingerprints the neuron activations of clients’ models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance.
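A rough sketch of differential testing over neuron activations follows. The abstract does not specify how fingerprints are compared, so the per-neuron median and Euclidean distance used here are assumptions, as are the function names, the single-layer "models", and the toy weights.

```python
import math
from statistics import median

def activations(weights, x):
    """ReLU activations of one dense layer; stands in for a client model's neurons."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def suspicious_client(client_weights, synthetic_input):
    """Flag the client whose activation fingerprint deviates most from the rest.

    Assumption: deviation is the Euclidean distance to the per-neuron median
    fingerprint across all clients. Returns (client index, all distances).
    """
    prints = [activations(w, synthetic_input) for w in client_weights]
    center = [median(column) for column in zip(*prints)]
    distances = [math.dist(p, center) for p in prints]
    worst = max(range(len(prints)), key=distances.__getitem__)
    return worst, distances

# Three hypothetical benign clients and one with a strongly shifted neuron.
CLIENTS = [
    [[0.5, 0.1, 0.0], [0.0, 0.4, 0.2]],
    [[0.4, 0.2, 0.0], [0.1, 0.3, 0.2]],
    [[0.5, 0.0, 0.1], [0.0, 0.5, 0.2]],
    [[0.5, 0.1, 0.0], [2.0, 1.5, 1.0]],  # simulated backdoored client
]
SYNTHETIC_INPUT = [1.0, 1.0, 1.0]  # no ground-truth label is needed

index, dists = suspicious_client(CLIENTS, SYNTHETIC_INPUT)
print(index, [round(d, 3) for d in dists])
```

Because only activations are compared, no label for the synthetic input is required, which mirrors the motivation given in the abstract.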

MLGuard: Defend Your Machine Learning Model!
Sheng Wong, Scott Barnett, Jessica Rivera-Villicana, Anj Simmons, Hala Abdelkader, Jean-Guy Schneider, and Rajesh Vasa
(Deakin University, Australia; RMIT University, Australia; Monash University, Australia)
Machine Learning (ML) is used in critical, highly regulated, and high-stakes fields such as finance, medicine, and transportation. The correctness of these ML applications is important for human safety and economic benefit. Progress has been made on improving ML testing and monitoring. However, these approaches do not provide i) pre/post conditions to handle uncertainty, ii) corrective actions defined on the basis of probabilistic outcomes, or iii) continual verification during system operation. In this paper, we propose MLGuard, a new approach to specify contracts for ML applications. Our approach consists of a) an ML contract specification defining pre/post conditions, invariants, and altering behaviours, b) generated validation models to determine the probability of contract violation, and c) an ML wrapper generator to enforce the contract and respond to violations. Our work is intended to provide the overarching framework required for building ML applications and monitoring their safety.
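The contract-plus-wrapper structure described above might look roughly like the following sketch. It does not reproduce MLGuard's specification language or generator, which the abstract does not show: `MLContract`, `guard`, the violation-probability functions, and the threshold are all hypothetical names and choices.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class MLContract:
    """Hypothetical contract: each condition returns an estimated
    probability (0.0 to 1.0) that the contract is violated."""
    precondition: Callable[[Sequence[float]], float]
    postcondition: Callable[[Sequence[float], float], float]
    threshold: float = 0.5

def guard(model: Callable[[Sequence[float]], float],
          contract: MLContract,
          on_violation: Callable[[str, Sequence[float]], Any]):
    """Wrap a model so the contract is checked on every call."""
    def wrapped(x: Sequence[float]):
        if contract.precondition(x) > contract.threshold:
            return on_violation("pre", x)
        y = model(x)
        if contract.postcondition(x, y) > contract.threshold:
            return on_violation("post", x)
        return y
    return wrapped

def toy_model(x):
    """Toy model: mean of the features."""
    return sum(x) / len(x)

def out_of_range_share(x):
    """Toy validation model: share of features outside [0, 1]."""
    return sum(1 for v in x if not 0.0 <= v <= 1.0) / len(x)

def output_violation(x, y):
    """Toy postcondition: the output must lie in [0, 1]."""
    return 0.0 if 0.0 <= y <= 1.0 else 1.0

def abstain(stage, x):
    """Corrective action: report which stage failed instead of predicting."""
    return f"abstain:{stage}"

guarded = guard(toy_model, MLContract(out_of_range_share, output_violation), abstain)
print(guarded([0.2, 0.4]), guarded([5.0, 7.0]))
```

Returning a corrective action rather than raising an exception is one possible design choice; a deployment could equally log the violation or fall back to a simpler model.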

Interpretable On-the-Fly Repair of Deep Neural Classifiers
Hossein Mohasel Arjomandi and Reyhaneh Jabbarvand
(University of Illinois at Urbana-Champaign, USA)
Deep neural networks (DNNs) are vital in safety-critical systems but remain imperfect, leading to misclassifications after deployment. Prior works either make the model abstain from predicting in uncertain cases, which reduces its overall accuracy, or are uninterpretable. To overcome these limitations, we propose an interpretable approach that repairs misclassifications after model deployment, instead of discarding them, by reducing the multi-class classification problem to a simple binary classification. Our proposed technique specifically targets the predictions that the model is uncertain about, extracts the training data that is positively and negatively incorporated into those uncertain decisions, and uses it to repair the cases where uncertainty leads to misclassification. We evaluate our approach on MNIST. The preliminary results show that our technique can repair 10.7% of the misclassifications on average, improving the performance of the models and motivating the applicability of on-the-fly repair for more complex classifiers and different modalities.
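The reduction from a multi-class decision to a binary one can be sketched as follows. This is a simplified stand-in, not the authors' technique: uncertainty is assumed to mean a small margin between the two most probable classes, and a nearest-training-example comparison replaces the paper's selection of influential training data. All names and values are hypothetical.

```python
import math

def top_two(probs):
    """Indices of the most and second most probable classes."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return ranked[0], ranked[1]

def repair(probs, x, train_by_class, margin=0.2):
    """Re-decide uncertain predictions between the two leading classes.

    Assumption: a prediction is uncertain when the top-two probability
    gap is below `margin`. The binary decision picks the class whose
    nearest training example is closest to x, so the chosen example
    can be shown as an explanation.
    """
    first, second = top_two(probs)
    if probs[first] - probs[second] >= margin:
        return first  # confident prediction: keep it unchanged

    def nearest(cls):
        return min(math.dist(x, t) for t in train_by_class[cls])

    return first if nearest(first) <= nearest(second) else second

# Hypothetical training data per class and two model outputs.
TRAIN_BY_CLASS = {
    0: [(0.0, 0.0), (0.1, 0.2)],
    1: [(1.0, 1.0), (0.8, 0.9)],
    2: [(5.0, 5.0)],
}
print(repair([0.45, 0.40, 0.15], (0.9, 0.9), TRAIN_BY_CLASS))  # uncertain case
print(repair([0.90, 0.05, 0.05], (0.9, 0.9), TRAIN_BY_CLASS))  # confident case
```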

Towards Safe ML-Based Systems in Presence of Feedback Loops
Sumon Biswas, Yining She, and Eunsuk Kang
(Carnegie Mellon University, USA)
Machine learning (ML) based software is increasingly being deployed in a myriad of socio-technical systems, such as drug monitoring, loan lending, and predictive policing. Although not commonly considered safety-critical, these systems have the potential to cause serious, long-lasting harm to users and the environment due to their close proximity to, and effect on, society. One type of emerging problem in these systems is unintended side effects from a feedback loop: the decision of an ML-based system induces certain changes in the environment, which, in turn, generate observations that are fed back into the system for further decision-making. When this cyclic interaction between the system and the environment repeats over time, its effect may be amplified and ultimately result in an undesirable outcome. In this position paper, we bring attention to the safety risks that are introduced by feedback loops in ML-based systems, and the challenges of identifying and addressing them. In particular, due to their gradual and long-term impact, we argue that feedback loops are difficult to detect and diagnose using existing techniques in software engineering. We propose a set of research problems in modeling, analyzing, and testing ML-based systems to identify, monitor, and mitigate the effects of an undesirable feedback loop.
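The amplification effect of such a loop can be illustrated with a deliberately simple, deterministic simulation. The scenario (two districts with identical true incident rates, where only the district chosen by the system produces new observations) and every name and number below are illustrative assumptions, not material from the paper.

```python
def simulate(rounds, true_rate=(0.5, 0.5), observed=(11.0, 10.0)):
    """Simulate a decision/observation feedback loop.

    Each round the system targets the district with more recorded
    incidents, and only the targeted district yields new records
    (modelled as its expected count per round). Returns the share of
    all records attributed to district 0 after each round.
    """
    observed = list(observed)
    history = []
    for _ in range(rounds):
        target = 0 if observed[0] >= observed[1] else 1  # system decision
        observed[target] += true_rate[target]            # environment feeds back
        history.append(observed[0] / sum(observed))
    return history

shares = simulate(100)
print(round(shares[0], 3), round(shares[-1], 3))
```

Although both districts have the same underlying rate, a small initial difference in records keeps growing, and the change in any single round is small enough that it would be easy to miss without long-horizon monitoring.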

The Case for Scalable Quantitative Neural Network Analysis
Mara Downing and Tevfik Bultan
(University of California at Santa Barbara, USA)
Neural networks are an increasingly common tool for solving problems that require complex analysis and pattern matching, such as identifying stop signs in a self-driving car or processing medical imagery during diagnosis. Accordingly, verification of neural networks for safety and correctness is of great importance, as mispredictions can have catastrophic results in safety-critical domains. As neural networks are known to be sensitive to small changes in input, leading to vulnerabilities and adversarial attacks, analyzing the robustness of networks to small changes in input is a key piece of evaluating their safety and correctness. However, there are many real-world scenarios where the requirements of robustness are not clear-cut, and it is crucial to develop measures that assess the level of robustness of a given neural network model and compare levels of robustness across different models, rather than using a binary characterization such as robust vs. not robust.
We believe there is a great need for developing scalable quantitative robustness verification techniques for neural networks. Formal verification techniques can provide guarantees of correctness, but most existing approaches do not provide quantitative robustness measures and are not effective in analyzing real-world network sizes. On the other hand, sampling-based quantitative robustness analysis is not hindered much by the size of networks but cannot provide sound guarantees of quantitative results. We believe more research is needed to address the limitations of both symbolic and sampling-based verification approaches and create sound, scalable techniques for quantitative robustness verification of neural networks.
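To make the contrast between a binary and a quantitative robustness measure concrete, the sketch below estimates, by sampling, the fraction of perturbations within an L-infinity ball that leave the predicted label unchanged. The toy classifier, the uniform sampling scheme, and all names are assumptions for illustration. The Hoeffding margin gives only a probabilistic error bound, which is the kind of non-sound guarantee the abstract attributes to sampling-based approaches.

```python
import math
import random

def classify(x):
    """Toy linear classifier (hypothetical): label 1 if x0 + x1 > 1, else 0."""
    return 1 if x[0] + x[1] > 1.0 else 0

def robustness_estimate(model, x, epsilon, n_samples=10_000, seed=0):
    """Estimate the share of perturbations in the L-infinity ball of radius
    epsilon around x that keep the model's label unchanged."""
    rng = random.Random(seed)
    base = model(x)
    unchanged = sum(
        model([xi + rng.uniform(-epsilon, epsilon) for xi in x]) == base
        for _ in range(n_samples)
    )
    return unchanged / n_samples

def hoeffding_margin(n_samples, delta=0.05):
    """Half-width of the interval that contains the true share with
    probability at least 1 - delta (a statistical, not sound, bound)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))

far = robustness_estimate(classify, [2.0, 2.0], 0.1)     # far from the boundary
near = robustness_estimate(classify, [0.55, 0.55], 0.1)  # close to the boundary
print(far, near, round(hoeffding_margin(10_000), 4))
```

A binary check would report the second input simply as "not robust", whereas the estimate indicates that most perturbations in the ball still preserve the label, which allows inputs or models to be compared by degree of robustness.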

