FSE 2016 Workshops
24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2016)

7th International Workshop on Automating Test Case Design, Selection, and Evaluation (A-TEST 2016), November 18, 2016, Seattle, WA, USA

A-TEST 2016 – Proceedings



Frontmatter

Title Page

Message from the Chairs
Software testing is currently the most important and most widely used quality assurance technique in industry. Among the activities that make up the testing life-cycle, test case design, selection, and evaluation determine the quality and effectiveness of the whole testing process. They are, however, also the most difficult, time-consuming, and error-prone testing activities: many of them are still carried out manually, and the resulting tests are sometimes of low quality because they fail to find important errors in the system. The A-TEST workshop aims to provide a venue for researchers and industry to exchange and discuss trending views, ideas, state of the art work in progress, and scientific results on topics such as techniques and tools for automating test case design and selection, e.g. model-based, combinatorial-based, search-based, symbolic-based, or property-based approaches; test suite optimisation; test evaluation and metrics; testing in emerging domains, e.g. Social Network, Cloud, Games, and Cyber Physical Systems; and real world case studies.

Session 1

Multilevel Coarse-to-Fine-Grained Prioritization for GUI and Web Applications
Dmitry Nurmuradov, Renée Bryce, and Hyunsook Do
(University of North Texas, USA)
This work demonstrates that the use of one criterion for test suite prioritization may lead to high variability of fault detection rates due to random tie-breaking. The paper provides motivational examples of how a single fine-grained or coarse criterion may lead to poor code coverage or fault finding efficiency. We use a multilevel coarse-to-fine-grained two-way prioritization method to address the issues and evaluate the technique in an empirical study by comparing fault finding effectiveness and its variability to single-criterion methods. The results indicate that the proposed method decreases tie-breaking instability of the fault detection rate and often increases the overall performance of test suite prioritization methods.
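As a rough illustration of why a second criterion removes the randomness of tie-breaking, the sketch below greedily orders tests by additional coverage under a coarse criterion and resolves ties with a fine-grained one. It is not the paper's two-way method; the function name, the test names, and the coverage sets are hypothetical.

```python
def prioritize(coarse, fine):
    """Greedy 'additional coverage' ordering with deterministic tie-breaking.

    coarse, fine: dicts mapping a test name to the set of items it covers
    under the coarse and the fine-grained criterion (hypothetical inputs).
    """
    remaining = set(coarse)
    covered_coarse, covered_fine = set(), set()
    order = []
    while remaining:
        # Primary key: new coarse items; secondary key: new fine items.
        # Iterating in sorted order makes any remaining ties deterministic.
        best = max(
            sorted(remaining),
            key=lambda t: (len(coarse[t] - covered_coarse),
                           len(fine[t] - covered_fine)),
        )
        order.append(best)
        remaining.remove(best)
        covered_coarse |= coarse[best]
        covered_fine |= fine[best]
    return order


coarse = {"t1": {"A", "B"}, "t2": {"A", "C"}, "t3": {"C"}}
fine = {"t1": {"a1"}, "t2": {"a1", "a2", "c1"}, "t3": {"c1"}}
order = prioritize(coarse, fine)
print(order)
```

Here t1 and t2 tie on the coarse criterion in the first round, and the fine-grained criterion selects t2 rather than leaving the choice to chance.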
EventFlowSlicer: Goal Based Test Generation for Graphical User Interfaces
Jonathan Saddler and Myra B. Cohen
(University of Nebraska-Lincoln, USA)

Automated test generation techniques for graphical user interfaces include model-based approaches that generate tests from a graph or state machine model, capture-replay methods that require the user to demonstrate each test case, and pattern-based approaches that provide templates for abstract test cases. There has been little work, however, in automated goal-based testing, where the goal is a realistic user task, a function, or an abstract behavior. Recent work in human performance regression testing has shown that there is a need for generating multiple test cases that execute the same user task in different ways; however, that work does not have an efficient way to generate tests, and only a single type of goal has been considered.

In this paper we expand the notion of goal-based interface testing to generate tests for a variety of goals. We develop a direct test generation technique, EventFlowSlicer, that is more efficient than that used in human performance regression testing, reducing run times by 92.5


PredSym: Estimating Software Testing Budget for a Bug-Free Release
Arnamoy Bhattacharyya and Timur Malgazhdarov
(University of Toronto, Canada)

Symbolic execution tools are widely used during the software testing phase for finding hidden bugs and software vulnerabilities. Accurately predicting the time a symbolic execution tool needs to reach a chosen level of code coverage helps in planning the budget required for the testing phase. In this work, we present an automatic tool, PredSym, that uses static program features to predict the coverage explored by a symbolic execution tool, KLEE, for a given time budget, and to predict the time required to explore a given coverage. PredSym uses LASSO regression to build a model that does not suffer from overfitting and can predict both the coverage and the time with a worst-case error of 10% on unseen datapoints. PredSym also gives code improvement suggestions, based on a heuristic, for improving the coverage generated by KLEE.



Session 2

The Complementary Aspect of Automatically and Manually Generated Test Case Sets
Auri M. R. Vincenzi, Tiago Bachiega, Daniel G. de Oliveira, Simone R. S. de Souza, and José C. Maldonado
(Federal University of São Carlos, Brazil; Federal University of Goiás, Brazil; University of São Paulo, Brazil)

Testing is a mandatory activity for software quality assurance. Knowledge about the software under test is necessary to generate high-quality test cases, but executing more than 80% of its source code is not an easy task and demands in-depth knowledge of the business rules it implements. In this article, we investigate the adequacy, effectiveness, and cost of manually generated test sets versus automatically generated test sets for Java programs. We observed that, in general, manual test sets achieve higher statement coverage and mutation score than automatically generated test sets. One interesting observation, however, is that the automatically generated test sets are complementary to the manual test sets. When we combined manual with automated test sets, the resulting test sets improved statement coverage and mutation score by more than 10%, on average, compared to the rates of the manual test sets, while keeping a reasonable cost. Therefore, we advocate that we should concentrate the use of manually generated test sets on testing essential and critical parts of the software.


Modernizing Hierarchical Delta Debugging
Renáta Hodován and Ákos Kiss
(University of Szeged, Hungary)

Programmers tasked with the fixing of a bug prefer working on a minimal test case where every single bit is needed to reproduce the failure. However, cutting off the excess parts of a potentially large test case can be a tedious and time-consuming task if performed manually, which has led to the research and development of automated test case reduction techniques. The decade-old Hierarchical Delta Debugging (HDD) algorithm targets structured test inputs, parses them with the help of grammars and applies the minimizing Delta Debugging algorithm to the built trees.

We have investigated this algorithm and its implementation, and in this paper we propose improvements that address the shortcomings we found. We argue that using extended context-free grammars with HDD is beneficial in several ways, and the experimental evaluation of our modernized HDD implementation, called Picireny, supports the outlined ideas: the reduced outputs are significantly smaller (by circa 25–40%) on the investigated test cases than those produced by the reference HDD implementation using standard context-free grammars. These results, together with the technical improvements that ease the use of the modernized tool, can hopefully help spread the adoption of HDD in practice.
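For readers unfamiliar with the minimizing Delta Debugging step that HDD applies to each level of the parsed tree, the following is a minimal sketch of that step on a flat list. It is not the Picireny or reference HDD implementation; the function name `ddmin` and the predicate `fails` are illustrative assumptions.

```python
def ddmin(items, fails):
    """Shrink `items` to a sublist that still fails and is 1-minimal,
    i.e. removing any single remaining element makes the failure vanish.

    fails: hypothetical predicate returning True if the input reproduces
    the failure. It must hold for the initial `items`.
    """
    assert fails(items)
    n = 2  # current granularity: number of chunks to split into
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):        # one chunk alone reproduces the failure
                items, n, reduced = subset, 2, True
                break
            if fails(complement):    # the chunk can be dropped
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(items):      # already at single-element granularity
                break
            n = min(len(items), n * 2)
    return items


# Toy failure: the input "fails" whenever it contains both 3 and 6.
minimal = ddmin(list(range(8)), lambda s: 3 in s and 6 in s)
print(minimal)
```

In HDD, a step of this kind would be run on the nodes of one tree level at a time, with removed nodes dropping their whole subtrees.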


Complete IOCO Test Cases: A Case Study
Sofia Costa Paiva, Adenilso Simao, Mahsa Varshosaz, and Mohammad Reza Mousavi
(University of São Paulo, Brazil; Halmstad University, Sweden)
Input/Output Transition Systems (IOTSs) have been widely used as test models in model-based testing. Traditionally, input output conformance testing (IOCO) has been used to generate random test cases from IOTSs. A recent test case generation method for IOTSs, called Complete IOCO, applies fault models to obtain complete test suites with guaranteed fault coverage for IOTSs. This paper measures the efficiency of Complete IOCO in comparison with the traditional IOCO test case generation implemented in the JTorX tool. To this end, we use a case study involving five specification models from the automotive and the railway domains. Faulty mutations of the specifications were produced in order to compare the efficiency of both test generation methods in killing them. The results indicate that Complete IOCO is more efficient in detecting deep faults in large state spaces while IOCO is more efficient in detecting shallow faults in small state spaces.

Session 3

Model-Based Testing of Stochastic Systems with IOCO Theory
Marcus Gerhold and Mariëlle Stoelinga
(University of Twente, Netherlands)

We present essential concepts of a model-based testing framework for probabilistic systems with continuous time. Markov automata are used as the underlying model. The key result of the work is the solid core of a probabilistic test theory that incorporates real-time stochastic behaviour. We connect ioco theory and hypothesis testing to draw inferences about trace probabilities. We show that our conformance relation conservatively extends ioco and discuss the meaning of quiescence in the presence of exponentially distributed time delays.


Development and Maintenance Efforts Testing Graphical User Interfaces: A Comparison
Antonia Kresse and Peter M. Kruse
(Berner & Mattner, Germany)
Many tools exist for testing graphical user interfaces. The aim of this work is to assess the advantages and disadvantages of various testing tools with regard to their use in an economic context. We compare, among other things, whether the generations of test tools differ in terms of finding defects and which tool has the lowest development and maintenance costs. Results show that test suites can be created the quickest with QF-Test, while EggPlant has the shortest maintenance time. TestComplete performs worse in both disciplines. For test robustness, no clear picture can be drawn. The selection of a test tool is typically done once, at the beginning of a project, and should be considered carefully.
MT4A: A No-Programming Test Automation Framework for Android Applications
Tiago Coelho, Bruno Lima, and João Pascoal Faria
(University of Porto, Portugal; INESC TEC, Portugal)
The growing dependency of our society on increasingly complex software systems, combining mobile and cloud-based applications and services, makes test activities even more important and challenging. However, software tests are sometimes not properly performed because of tight deadlines, because of the time and skills required to develop and execute the tests, or because the developers are too optimistic about possible faults in their own code. Although there are several frameworks for mobile test automation, they usually require programming skills or complex configuration steps. Hence, in this paper, we propose a framework that allows creating and executing tests for Android applications without requiring programming skills. It is possible to create automated tests based on a set of pre-defined actions, and it is also possible to inject data into device sensors. An experiment with programmers and non-programmers showed that both can develop and execute tests in a similar amount of time. A real world example using a fall detection application is presented to illustrate the approach.

Session 4

Mitigating (and Exploiting) Test Reduction Slippage
Josie Holmes, Alex Groce, and Mohammad Amin Alipour
(Pennsylvania State University, USA; Oregon State University, USA)
Reducing the size of tests, typically by delta debugging or a related algorithm, is a critical component of effective automated testing and debugging. Automatically generated or user-submitted tests are often far longer than required, full of unnecessary components that make debugging difficult. Test reduction algorithms automatically remove components of such tests, while preserving the property that the test fails. Unfortunately, reduction can sometimes transform a failing test that detects a subtle, critical, and previously unknown fault into a test that detects a trivial-to-find, unimportant, and already known fault. When reducing a test detecting fault(s) F produces a test that does not detect the same F, this is known as slippage. In the case where an interesting fault slips to an uninteresting fault, slippage is a problem, and must be avoided. However, slippage can also be beneficial, when a long test can be reduced to detect a fault that has not otherwise been detected (including by the original test). While traditional delta debugging only produces one reduced test, the concept of slippage suggests an alternative approach, where the output of reduction is a set of reduced tests, in order to avoid problematic slippage and induce beneficial slippage. In this paper, we present preliminary efforts to understand slippage, and compare two approaches to slippage mitigation.
Automated Workflow Regression Testing for Multi-tenant SaaS: Integrated Support in Self-Service Configuration Dashboard
Majid Makki, Dimitri Van Landuyt, and Wouter Joosen
(KU Leuven, Belgium; iMinds, Belgium)
Single-instance multi-tenant SaaS applications allow tenant administrators to (extensively) customize the application according to the requirements of their organizations. In the specific case of workflow-driven applications, the SaaS provider may offer a set of pre-defined workflow activities and leave their composition to the tenant administrators. In such cases, the tenant administrator can instantiate new variants of the application without deploying new software. This effectively makes these tenant administrators part of the DevOps team, and in turn creates the need for the SaaS provider to provide them with Quality Assurance tool support. One such tool is a regression testing framework that allows them to make sure that a new version of a workflow behaves similarly to a successful execution of a previous version. This paper highlights the potential and discusses the inherent challenges of running regression tests on workflows in the production environment of a multi-tenant SaaS application, and outlines a solution in terms of architecture and automation techniques for mocking and regression detection under the control of tenant administrators.
Towards an MDE-Based Approach to Test Entity Reconciliation Applications
J. G. Enríquez, Raquel Blanco, F. J. Domínguez-Mayo, Javier Tuya, and M. J. Escalona
(University of Seville, Spain; University of Oviedo, Spain)
The management of large volumes of data has given rise to significant challenges to the entity reconciliation problem (which refers to combining data from different sources for a unified vision) due to the fact that the data are becoming more unstructured, unclean and incomplete, need to be more linked, etc. Testing the applications that implement the entity reconciliation problem is crucial to ensure both the correctness of the reconciliation process and the quality of the reconciled data. In this paper, we present a first approach, based on MDE, which allows the creation of test models for the integration testing of entity reconciliation applications.
