SEA4DQ 2022 – Proceedings

Message from the Chairs
Cyber-physical systems (CPS)/Internet of Things (IoT) are omnipresent in many industrial sectors and application domains in which the quality of the data acquired and used for decision support is a common factor. Data quality can deteriorate due to factors such as sensor faults and failures caused by operation in harsh and uncertain environments. How can software engineering and artificial intelligence (AI) help manage and tame data quality issues in CPS/IoT? In this workshop, we aim to answer this question.

Keynotes

Data Quality Issues in Online Reinforcement Learning for Self-Adaptive Systems (Keynote)
Andreas Metzger
(University of Duisburg-Essen, Germany)
Online reinforcement learning is an emerging machine learning approach that addresses the challenge of design-time uncertainty faced when building self-adaptive systems. Online reinforcement learning means that the self-adaptive system can learn from data only available at run time. After introducing the fundamentals of self-adaptive systems and reinforcement learning, the keynote discusses three relevant issues and recent solutions related to data quality in online reinforcement learning for self-adaptive systems.

Data Quality and Model Under-Specification Issues (Keynote)
Foutse Khomh
(Polytechnique Montréal, Canada)
Nowadays, we are witnessing an increasing demand in both industry and academia for exploiting Deep Learning (DL) to solve complex real-world problems. However, the performance of these high-capacity learners is currently bounded by the quality and volume of their underlying training data. The use of incomplete, erroneous, or inappropriate training data, and the implementation of poor data management practices in a training pipeline, often result in unreliable, biased, or under-specified models. In this talk, I will report on some recent research that we have conducted to identify best practices of data management for DL. I will also report on recent techniques and tools that we have developed to help detect the root cause of model under-specification issues early on during a DL training process.

Software Engineering and AI for Data Quality

Data Quality as a Microservice: An Ontology and Rule Based Approach for Quality Assurance of Sensor Data in Manufacturing Machines
Jørgen Stang, Dirk Walther, and Per Myrseth
(DNV, Norway)
The manufacturing industry is continuously looking for production improvements resulting in high-quality production, reduced waste, and competitive advantages. In this article, ontologies, semantic rule logic, and microservices are combined to propose a system for quality assurance of manufacturing machine data. The existing upper ontology for manufacturing service description has been used to define both the physical assets and the data quality requirements. The system is used both to operationalize data quality monitoring by semantic technology and to enable up-front modelling of data quality requirements. The approach is illustrated by a specific speed-feed case for manufacturing machines but could easily be extended to other manufacturing use cases or even to other industries.

Effect of Time Patterns in Mining Process Invariants for Industrial Control Systems: An Experimental Study
Muhammad Azmi Umer, Aditya Mathur, and Muhammad Taha Jilani
(CodeX, Pakistan; Karachi Institute of Economics and Technology, Pakistan; Singapore University of Technology and Design, Singapore)
Machine Learning is playing a crucial role in the design of intrusion detectors for Industrial Control Systems (ICS). Intrusion Detection Systems (IDS) rely on data obtained from an operational ICS. Such datasets contain multiple time series, one for each process variable. In this work, we explore how such time series can be exploited to understand the effect of time patterns in mining the process invariants, i.e., conditions on process state variables. We use the knowledge gained through the time patterns to determine the optimal data collection size for generating the invariants. The study reported here was conducted using the operational data obtained from a water treatment plant.

Preliminary Findings on the Occurrence and Causes of Data Smells in a Real-World Business Travel Data Processing Pipeline
Valentina Golendukhina, Harald Foidl, Michael Felderer, and Rudolf Ramler
(University of Innsbruck, Austria; Software Competence Center Hagenberg, Austria)
Detection of poor-quality data is crucial for enhancing the quality of data-driven systems. Although there is a lot of research on data validation, the topic of potential data quality issues is still underexplored. Such latent issues, or data smells, can often stay undetected and lead to poor future performance of data-intensive systems. Detecting data smells is not trivial and requires knowledge about their causes. In this paper, we present the preliminary findings on the causes and severity of data smells based on a study of a real-world business travel data set and the data processing pipeline behind it. The results show that data smells exist in this data set and cause severe problems. Although many data smells already occur in raw data, some smells are created during the transformation and enrichment stages of the data processing pipeline. These findings indicate the importance of the data pipeline itself for future research on data smells. Thus, this article proposes potential future work in this area.

Data Quality Issues for Vibration Sensors: A Case Study in Ferrosilicon Production
Maryna Waszak, Terje Moen, Sølve Eidnes, Alexander Stasik, Anders Hansen, Gregory Bouquet, Antoine Pultier, Xiang Ma, Idar Tørlen, Bjørn Henriksen, Arianeh Aamodt, and Dumitru Roman
(SINTEF, Norway; Elkem, Norway)
Digitisation in the mining and metal processing industries plays a key role in their modernisation. Production processes are increasingly supported by a variety of sensors that produce large amounts of data that are meant to provide insights into the performance of production infrastructures. In the metal processing industry, vibration sensors are essential in the monitoring of the production infrastructure. In this position paper, we present the installation of vibration sensors in a real industrial environment and discuss the data quality issues we encountered while using such sensors.

Data Quality Issues in Solar Panels Installations: A Case Study
Dumitru Roman, Antoine Pultier, Xiang Ma, Ahmet Soylu, and Alexander G. Ulyashin
(SINTEF, Norway; Oslo Metropolitan University, Norway)
Solar photovoltaics (PV) is becoming an important source of global electricity generation. Modern PV installations come with a variety of sensors attached to them for monitoring purposes (e.g., maintenance and prediction of electricity generation). Data collection (and implicitly the quality of data) from PV systems is becoming essential in this context. In this position paper, we introduce a modern PV mini power plant demo site set up for research purposes and discuss the data quality issues we encountered in operating the power plant.

2nd International Workshop on Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things (SEA4DQ 2022)