A-TEST 2022 – Proceedings

Welcome from the Chairs
Welcome to the 13th edition of the International Workshop on Automating Test Case Design, Selection and Evaluation (A-TEST 2022), co-located with ESEC/FSE 2022 and held over two days, November 17-18, 2022, in Singapore. The A-TEST workshop aims to provide a venue for researchers and industry members alike to exchange and discuss trending views, ideas, state of the art, work in progress, and scientific results on automated testing.

Experience Studies and Industrial Applications

An Agent-Based Approach to Automated Game Testing: An Experience Report
I. S. W. B. Prasetya, Fernando Pastor Ricós, Fitsum Meshesha Kifetew, Davide Prandi, Samira Shirzadehhajimahmood, Tanja E. J. Vos, Premysl Paska, Karel Hovorka, Raihana Ferdous, Angelo Susi, and Joseph Davidson
(Utrecht University, Netherlands; Universitat Politècnica de València, Spain; Fondazione Bruno Kessler, Italy; Open Universiteit, Netherlands; GoodAI, Czechia)
Computer games are very challenging to handle for traditional automated testing algorithms. In this paper we will look at intelligent agents as a solution. Agents are suitable for testing games, since they are reactive and able to reason about their environment to decide the action they want to take. This paper presents the experience of using an agent-based automated testing framework called iv4xr to test computer games. Three games will be discussed, including a sophisticated 3D game called Space Engineers. We will show how the framework can be used in different ways, either directly to drive a test agent, or as an intelligent functionality that can be driven by a traditional automated testing algorithm such as a random algorithm or a model based testing algorithm.

Automation of the Creation and Execution of System Level Hardware-in-Loop Tests through Model-Based Testing
Viktor Aronsson Karlsson, Ahmed Almasri, Eduard Paul Enoiu, Wasif Afzal, and Peter Charbachi
(Mälardalen University, Sweden; Volvo, Sweden)
In this paper, we apply model-based testing (MBT) to automate the creation of hardware-in-loop (HIL) test cases. In order to select MBT tools, different tools’ properties were compared to each other through a literature study, with the result of selecting GraphWalker and MoMuT tools to be used in an industrial case study. The results show that the generated test cases perform similarly to their manual counterparts regarding how the test cases achieved full requirements coverage. When comparing the effort needed for applying the methods, a comparable effort is required for creating the first iteration, while with every subsequent update, MBT will require less effort compared to the manual process. Both methods achieve 100% requirements coverage, and since manual tests are created and executed by humans, some requirements are favoured over others due to company demands, while MBT tests are generated randomly. In addition, a comparison between the used tools showcased the differences in the models’ design and their test case generation. The comparison showed that GraphWalker has a more straightforward design method and is better suited for smaller systems, while MoMuT can handle more complex systems but has a more involved design method.
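
As an illustration of random test generation from a model, the following minimal sketch walks a small state machine at random until every transition has been exercised. It is not the GraphWalker or MoMuT implementation: the function `random_walk_test`, the `model` dictionary layout, and the state and action names are all hypothetical, chosen only to show the idea.

```python
import random


def random_walk_test(model, start, seed=0, max_steps=10_000):
    """Generate one test path by walking the model at random.

    model: dict mapping a state to a list of (action, next_state) pairs
           (hypothetical layout, not a GraphWalker model file).
    Returns (path, complete), where path is the list of actions taken and
    complete tells whether every transition was covered.
    """
    rng = random.Random(seed)  # seeded so the generated test is reproducible
    all_edges = {
        (src, action, dst)
        for src, outgoing in model.items()
        for action, dst in outgoing
    }
    covered, path, state = set(), [], start
    for _ in range(max_steps):  # guard against models that cannot be fully covered
        if covered == all_edges:
            break
        outgoing = model.get(state, [])
        if not outgoing:  # dead end: no transition leaves this state
            break
        action, dst = rng.choice(outgoing)
        covered.add((state, action, dst))
        path.append(action)
        state = dst
    return path, covered == all_edges


# Hypothetical model of a device under test.
MODEL = {
    "Off": [("power_on", "Idle")],
    "Idle": [("start", "Running"), ("power_off", "Off")],
    "Running": [("stop", "Idle")],
}

path, complete = random_walk_test(MODEL, "Off")
print(len(path), complete)
```

Each action in the returned path would then be mapped to a concrete stimulus on the test rig; that mapping is outside the scope of this sketch.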

Best Practices for Testing

Guidelines for GUI Testing Maintenance: A Linter for Test Smell Detection
Tommaso Fulcini, Giacomo Garaccione, Riccardo Coppola, Luca Ardito, and Marco Torchiano
(Politecnico di Torino, Italy)
GUI test suites suffer from high fragility: modifications or redesigns of the user interface are frequent and often invalidate the tests. For both DOM- and visual-based techniques, this leads to a frequent need for careful maintenance of test suites, which can be expensive and time-consuming. The goal of this work is to present a set of guidelines for writing cleaner and more robust test code, reducing the cost of maintenance and producing more understandable code. Based on the provided recommendations, a static test suite analyzer and code linter has been developed. We surveyed the state of the practice through an ad hoc, semi-systematic review of grey literature. The authors' experience was coded into a set of recommendations by applying the grounded theory methodology. Based on these results, we developed a linter in the form of a plugin for Visual Studio Code, implementing 17 of the provided guidelines. The plugin highlights test smells in the Java and JavaScript languages. Finally, we conducted a preliminary validation of the tool against test suites from real GitHub projects. The preliminary evaluation, meant as a first application of the plugin to real test suites, detected three main smells: the usage of global variables, the lack of adoption of the Page Object design pattern, and the usage of fragile locators such as XPath.

Academic Search Engines: Constraints, Bugs, and Recommendations
Zheng Li and Austen Rainer
(Queen's University Belfast, UK)
Academic search engines (i.e., digital libraries and indexers) play an increasingly important role in systematic reviews. However, these engines do not seem to effectively support such reviews, e.g., researchers confront usability issues with the engines when conducting their searches. We investigate whether the usability issues are bugs (i.e., faults in the search engines) or constraints, and provide recommendations to search-engine providers and researchers on how to tackle these issues. Using snowball sampling from tertiary studies, we identify a set of 621 secondary studies in software engineering. By physically re-attempting the searches for all of these 621 studies, we effectively conduct regression testing for 42 search engines. We identify 13 bugs for eight engines, and also identify other constraints. We provide recommendations for tackling these issues. There is still a considerable gap between the search needs of researchers and the usability of academic search engines. It is not clear whether search-engine developers are aware of this gap. Also, the evaluation, by academics, of academic search engines has not kept pace with the development, by search-engine providers, of those search engines. Thus, the gap between evaluation and development makes it harder to properly understand the gap between the search needs of researchers and the search features of the search engines.

Interactive Fault Localization for Python with CharmFL
Attila Szatmári, Qusay Idrees Sarhan, and Árpád Beszédes
(University of Szeged, Hungary; University of Duhok, Iraq)
We present a plug-in called “CharmFL” for the PyCharm IDE. It employs Spectrum-based Fault Localization to automatically analyze Python programs and produces a ranked list of potentially faulty program elements (i.e., statements, functions, etc.). Our tool offers advanced features, e.g., it enables the users to give their feedback on the suspicious elements to help re-rank them, thus improving the fault localization process. The tool utilizes contextual information about program elements complementary to the spectrum data. The users can explore function call graphs during a failed test. Thus they can investigate the data flow traces of any failed test case or construct a causal inference model for the location of the fault. The tool has been used with a set of experimental use cases.
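
To make the ranking step of Spectrum-based Fault Localization concrete, here is a minimal sketch that scores program elements from per-test coverage and pass/fail outcomes. It uses the Ochiai formula, a commonly used SBFL metric; the abstract does not say which formula CharmFL applies, so that choice is an assumption, as are the function name `ochiai_ranking` and the input layout.

```python
import math


def ochiai_ranking(coverage, outcomes):
    """Rank program elements by Ochiai suspiciousness (highest first).

    coverage: dict test name -> set of elements the test executed
    outcomes: dict test name -> True if the test passed, False if it failed
    (Both layouts are hypothetical, not CharmFL's internal format.)
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    elements = set().union(*coverage.values())
    scores = {}
    for element in elements:
        # ef / ep: failing / passing tests that executed this element
        ef = sum(1 for t, cov in coverage.items() if element in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if element in cov and outcomes[t])
        denominator = math.sqrt(total_failed * (ef + ep))
        scores[element] = ef / denominator if denominator else 0.0
    # Sort by descending score, breaking ties by element name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical spectrum: one failing test and two passing tests.
coverage = {
    "test_a": {"line_1", "line_2"},
    "test_b": {"line_1", "line_3"},
    "test_c": {"line_2"},
}
outcomes = {"test_a": True, "test_b": False, "test_c": True}

for element, score in ochiai_ranking(coverage, outcomes):
    print(f"{element}: {score:.3f}")
```

In this example `line_3` ranks first because only the failing test executes it. An interactive tool could re-run the ranking after the user marks an element as not faulty, but that feedback loop is not modeled here.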

Test Automation Efficiency

KUBO: A Framework for Automated Efficacy Testing of Anti-virus Behavioral Detection with Procedure-Based Malware Emulation
Jakub Pružinec, Quynh Anh Nguyen, Adrian Baldwin, Jonathan Griffin, and Yang Liu
(HP-NTU Digital Manufacturing Corporate Lab, Singapore; HP-Labs, UK)
Traditional testing of Anti-Virus (AV) products is usually performed on a curated set of malware samples. While this approach can evaluate an AV's overall performance on known threats, it fails to provide details on the coverage of exact attack techniques used by adversaries and malware. Such coverage information is crucial in helping users understand potential attack paths formed using new code and combinations of known attack techniques.
This paper describes KUBO, a framework for systematic large-scale testing of behavioral coverage of AV software. KUBO uses a novel malware behavior emulation method to generate a large number of attacks from combinations of adversarial procedures and runs them against a set of AVs. Contrary to other emulators, our attacks are coordinated by the adversarial procedures themselves, rendering the emulated malware independent of agents and semantically coherent.
We perform an evaluation of KUBO on 7 major commercial AVs utilizing tens of distinct attack procedures and thousands of their combinations. The results demonstrate that our approach is feasible, leads to automatic large-scale evaluation, and is able to unveil a multitude of open attack paths. We show how the results can be used to assess general behavioral efficacy and efficacy with respect to individual adversarial procedures.

An Online Agent-Based Search Approach in Automated Computer Game Testing with Model Construction
Samira Shirzadehhajimahmood, I. S. W. B. Prasetya, Frank Dignum, and Mehdi Dastani
(Utrecht University, Netherlands; Umeå University, Sweden)
The complexity of computer games is ever increasing. In this setup, guiding an automated test algorithm to find a solution to solve a testing task in a game's huge interaction space is very challenging. Having a model of a system to automatically generate test cases would have a strong impact on the effectiveness and efficiency of the algorithm. However, manually constructing a model turns out to be expensive and time-consuming. In this study, we propose an online agent-based search approach to solve common testing tasks when testing computer games that also constructs a model of the system on-the-fly based on the given task, which is then exploited to solve the task. To demonstrate the efficiency of our approach, a case study is conducted using a game called Lab Recruits.

OpenGL API Call Trace Reduction with the Minimizing Delta Debugging Algorithm
Daniella Bársony
(University of Szeged, Hungary)
Debugging an application that uses a graphics API and faces a rendering error is a hard task even if we manage to record a trace of the API calls that lead to the error. Checking every call is not a feasible or scalable option, since there are potentially millions of calls in a recording. In this paper, we focus on the question of whether the number of API calls that need to be examined can be reduced by automatic techniques, and we describe how this can be achieved for the OpenGL API using the minimizing Delta Debugging algorithm. We present the results of an experiment on a real-life rendering issue, using a prototype implementation, showing a drastic reduction of the trace size (i.e. to less than 1 of the original number of calls) and positive impacts on the resource usage of the replay of the trace.

Iterating the Minimizing Delta Debugging Algorithm
Dániel Vince
(University of Szeged, Hungary)
Probably the most well-known solution to automated test case minimization is the minimizing Delta Debugging algorithm (DDMIN). It is widely used because it “just works” on any kind of input. In this paper, we focus on the fixed-point iteration of DDMIN (named DDMIN*), more specifically whether it can improve on the result of the original algorithm. We present a carefully crafted example where the output of DDMIN could be reduced further, and iterating the algorithm finds a new, smaller local optimum. Then, we evaluate the idea on a publicly available test suite. We have found that the output of DDMIN* was usually smaller than the output of DDMIN. Using characters as units of reduction, the output became smaller by 67.94% on average, and in the best case, fixed-point iteration could improve as much as 89.68% on the output size of the original algorithm.
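
The idea of fixed-point iteration can be sketched as follows. This is a textbook-style rendering of minimizing Delta Debugging wrapped in a loop that stops when the output no longer shrinks; it is not the authors' implementation, and the names `split`, `ddmin`, and `ddmin_star` as well as the toy failure predicate are assumptions made for illustration.

```python
def split(items, n):
    """Split items into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(items, fails):
    """Reduce a failing input with minimizing Delta Debugging."""
    assert fails(items), "the initial input must trigger the failure"
    n = 2  # current granularity (number of chunks)
    while len(items) >= 2:
        subsets = split(items, n)
        reduced = False
        for subset in subsets:  # "reduce to subset"
            if fails(subset):
                items, n, reduced = subset, 2, True
                break
        if not reduced:
            for i in range(n):  # "reduce to complement"
                complement = [x for j, s in enumerate(subsets) if j != i for x in s]
                if fails(complement):
                    items, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(items):  # finest granularity reached, nothing removable
                break
            n = min(len(items), n * 2)
    return items


def ddmin_star(items, fails):
    """Re-run ddmin on its own output until the size stops decreasing."""
    current = list(items)
    while True:
        reduced = ddmin(current, fails)
        if len(reduced) == len(current):
            return reduced
        current = reduced


# Toy failure: the input "fails" whenever it contains both 3 and 6.
fails = lambda candidate: 3 in candidate and 6 in candidate
print(ddmin(list(range(1, 9)), fails))       # prints [3, 6]
print(ddmin_star(list(range(1, 9)), fails))  # prints [3, 6]
```

On this toy predicate a single pass already reaches the fixed point, so iterating brings no gain. Whether a further pass finds a smaller result depends on the input and the test function, which is the question the paper evaluates.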

Hands-On

Interacting with Interactive Fault Localization Tools
Ferenc Horváth, Gergő Balogh, Attila Szatmári, Qusay Idrees Sarhan, Béla Vancsics, and Árpád Beszédes
(University of Szeged, Hungary; University of Duhok, Iraq)
Spectrum-Based Fault Localization (SBFL) is one of the most popular genres of Fault Localization (FL) methods among researchers. One way to increase the practical usefulness of related tools is to involve interactivity between the user and the core FL algorithm. In this setting, the developer provides feedback to the fault localization algorithm while iterating through the elements suggested by the algorithm. This way, the proposed elements can be influenced in the hope of reaching the faulty element earlier (we call the proposed approach Interactive Fault Localization, or iFL). With this work, we would like to present our recent achievements on this topic. In particular, we give an overview of the basic approach and the supporting tools that we implemented for the actual usage of the method in different contexts: iFL4Eclipse for Java developers using the Eclipse IDE, and CharmFL for Python developers using the PyCharm IDE. Our aim is to provide insight into the practicalities and effectiveness of the iFL approach, while acquiring valuable feedback. In addition, with the demonstration we would like to catalyse the discussion with researchers on the topic.
