Workshop A-TEST 2024 – Author Index

Bergsmann, Severin
A-TEST '24: "First Experiments on Automated ..."
First Experiments on Automated Execution of Gherkin Test Specifications with Collaborating LLM Agents
Severin Bergsmann, Alexander Schmidt, Stefan Fischer, and Rudolf Ramler (Software Competence Center Hagenberg, Austria)

Gherkin is a domain-specific language for describing test scenarios in natural language, which are the basis for automated acceptance testing. The emergence of Large Language Models (LLMs) has opened up new possibilities for processing such test specifications and for generating executable test code. This paper investigates the feasibility of employing LLMs to execute Gherkin test specifications utilizing the AutoGen multi-agent framework. Our findings show that our LLM agent system is able to automatically run the given test scenarios by autonomously exploring the system under test, generating executable test code on the fly, and evaluating execution results. We observed high success rates for executing simple as well as more complex test scenarios, but we also identified difficulties regarding failure scenarios and fault detection.

@InProceedings{A-TEST24p12,
  author    = {Severin Bergsmann and Alexander Schmidt and Stefan Fischer and Rudolf Ramler},
  title     = {First Experiments on Automated Execution of Gherkin Test Specifications with Collaborating LLM Agents},
  booktitle = {Proc.\ A-TEST},
  publisher = {ACM},
  pages     = {12--15},
  doi       = {10.1145/3678719.3685692},
  year      = {2024},
}
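To make the Gherkin terminology concrete: a scenario is a sequence of Given/When/Then steps, and conventional tools such as Cucumber or behave bind each step to hand-written code through pattern-matched step definitions. The Python sketch below is a hypothetical, self-contained miniature of that binding. The shopping-cart scenario and the names `step`, `run_scenario`, and `STEP_KEYWORDS` are illustrative assumptions and do not come from the paper, whose agents instead explore the system under test and generate executable test code on the fly rather than relying on a fixed registry.

```python
import re
from typing import Callable

# Hypothetical miniature step runner; not the API of Cucumber, behave, or AutoGen.
STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")
_steps = []  # list of (compiled regex, step function) pairs


def step(pattern: str) -> Callable:
    """Decorator: register func for step texts that fully match the regex."""
    def register(func: Callable) -> Callable:
        _steps.append((re.compile(pattern), func))
        return func
    return register


def run_scenario(text: str) -> dict:
    """Execute each Given/When/Then/And/But line against the registered steps."""
    context: dict = {}  # state shared between the steps of one scenario
    for raw in text.splitlines():
        keyword, _, rest = raw.strip().partition(" ")
        if keyword not in STEP_KEYWORDS:
            continue  # skip Feature/Scenario headers and blank lines
        for regex, func in _steps:
            match = regex.fullmatch(rest)
            if match:
                func(context, *match.groups())  # captured groups become arguments
                break
        else:
            raise LookupError(f"No step definition for: {raw.strip()}")
    return context


# Hypothetical step definitions for an illustrative shopping cart.
@step(r'an empty shopping cart')
def empty_cart(ctx):
    ctx["cart"] = []


@step(r'the user adds (\d+) "([^"]+)"')
def add_items(ctx, count, item):
    ctx["cart"].extend([item] * int(count))


@step(r'the cart contains (\d+) items?')
def check_count(ctx, count):
    assert len(ctx["cart"]) == int(count), ctx["cart"]


SCENARIO = """
Feature: Shopping cart
  Scenario: Add items to the cart
    Given an empty shopping cart
    When the user adds 2 "apples"
    And the user adds 1 "pear"
    Then the cart contains 3 items
"""

if __name__ == "__main__":
    print(run_scenario(SCENARIO)["cart"])  # ['apples', 'apples', 'pear']
```

A step with no matching definition raises `LookupError`, and a failing Then step surfaces as an `AssertionError`, which is roughly the pass/fail signal an acceptance-test runner reports per scenario.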

Fischer, Stefan
A-TEST '24: "First Experiments on Automated ..."
First Experiments on Automated Execution of Gherkin Test Specifications with Collaborating LLM Agents

García, Boni
A-TEST '24: "Use of ChatGPT as an Assistant ..."
Use of ChatGPT as an Assistant in the End-to-End Test Script Generation for Android Apps
Boni García, Maurizio Leotta, Filippo Ricca, and Jim Whitehead (Universidad Carlos III de Madrid, Spain; University of Genoa, Italy; University of California at Santa Cruz, USA)

Automated testing is crucial in software development to ensure that applications perform as intended. However, generating automated End-to-End (E2E) tests can be time-consuming and challenging, especially for junior developers. This study investigates the use of ChatGPT, a popular Generative Artificial Intelligence (GenAI) model, as an assistant in developing automated E2E test scripts for Android apps. We present an empirical study that compares the effort required to create E2E test scripts and the resulting reliability of these tests using two treatments: manually and assisted by ChatGPT. We used Gherkin, a domain-specific language that allows non-technical practitioners to define test scenarios using a human-readable syntax. Our findings indicate that using ChatGPT significantly reduces the time required to develop automated test scripts without compromising the reliability of the scripts. Statistical analysis shows a notable reduction in development time for the ChatGPT-assisted group compared to the manual group, with a large effect size. While the reliability of the tests did not show a significant difference between the two groups, the results suggest practical benefits in terms of efficiency.

@InProceedings{A-TEST24p5,
  author    = {Boni García and Maurizio Leotta and Filippo Ricca and Jim Whitehead},
  title     = {Use of ChatGPT as an Assistant in the End-to-End Test Script Generation for Android Apps},
  booktitle = {Proc.\ A-TEST},
  publisher = {ACM},
  pages     = {5--11},
  doi       = {10.1145/3678719.3685691},
  year      = {2024},
}

Kiss, Ákos
A-TEST '24: "GreeDDy: Accelerate Parallel ..."
GreeDDy: Accelerate Parallel DDMIN
Dániel Vince and Ákos Kiss (University of Szeged, Hungary)

One of the most important algorithms in the field of automated test case minimization is the minimizing Delta Debugging (DDMIN) algorithm. It is used with preference because it works on any kind of input without information about its structure. In this paper, we focus on the parallelization of DDMIN. We discuss its stability issues and outline a potential solution to it. Then, we discuss an idea to speed up parallel DDMIN without compromising the minimality guarantees of the algorithm. We evaluate this algorithm variant, named GreeDDy, on a publicly available test suite.

@InProceedings{A-TEST24p1,
  author    = {Dániel Vince and Ákos Kiss},
  title     = {GreeDDy: Accelerate Parallel DDMIN},
  booktitle = {Proc.\ A-TEST},
  publisher = {ACM},
  pages     = {1--4},
  doi       = {10.1145/3678719.3685690},
  year      = {2024},
}
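DDMIN needs only a way to split the input into chunks and an oracle that reports whether a candidate still fails, which is why it can work on unstructured input. The Python sketch below follows the classic DDMIN loop of Zeller and Hildebrandt (try subsets, then complements, then double the granularity) and adds one simple form of parallelism as our own assumption: every candidate of a round is tested concurrently, and the first failing candidate in sequential order is kept, so for a deterministic oracle the result matches sequential DDMIN. It is not the GreeDDy variant from the paper; `ddmin`, `split`, `first_failing`, and the `workers` parameter are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous, non-empty chunks of near-equal size (n <= len(items))."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def first_failing(candidates: List[List[T]], test: Callable[[List[T]], bool],
                  pool: ThreadPoolExecutor) -> Optional[List[T]]:
    """Test all candidates concurrently; return the first failing one in list order."""
    # pool.map submits every candidate at once but yields results in input order,
    # so the choice does not depend on which thread finishes first.
    for candidate, failed in zip(candidates, pool.map(test, candidates)):
        if failed:
            return candidate  # tasks still running are not cancelled in this sketch
    return None


def ddmin(items: Sequence[T], test: Callable[[List[T]], bool], workers: int = 4) -> List[T]:
    """Reduce a failing input to a 1-minimal failing subsequence.

    Assumes test(list(items)) is True, i.e. the full input fails.
    """
    current = list(items)
    n = 2  # granularity: number of chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(current) >= 2:
            subsets = split(current, n)
            found = first_failing(subsets, test, pool)
            if found is not None:  # reduce to subset
                current, n = found, 2
                continue
            complements = [
                [x for j, s in enumerate(subsets) if j != i for x in s]
                for i in range(n)
            ]
            found = first_failing(complements, test, pool)
            if found is not None:  # reduce to complement
                current, n = found, max(n - 1, 2)
                continue
            if n >= len(current):  # single-element chunks tried: 1-minimal
                break
            n = min(2 * n, len(current))  # increase granularity
    return current


if __name__ == "__main__":
    # Hypothetical oracle: the "failure" needs both 3 and 7 to be present.
    print(ddmin(list(range(10)), lambda c: 3 in c and 7 in c))  # [3, 7]
```

Threads are a reasonable choice when the oracle launches an external process (the usual case in test case minimization); a CPU-bound oracle written in Python would need a process pool instead.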

Leotta, Maurizio
A-TEST '24: "Use of ChatGPT as an Assistant ..."
Use of ChatGPT as an Assistant in the End-to-End Test Script Generation for Android Apps

Ramler, Rudolf
A-TEST '24: "First Experiments on Automated ..."
First Experiments on Automated Execution of Gherkin Test Specifications with Collaborating LLM Agents

Ricca, Filippo
A-TEST '24: "Use of ChatGPT as an Assistant ..."
Use of ChatGPT as an Assistant in the End-to-End Test Script Generation for Android Apps

Schmidt, Alexander
A-TEST '24: "First Experiments on Automated ..."
First Experiments on Automated Execution of Gherkin Test Specifications with Collaborating LLM Agents

Vince, Dániel
A-TEST '24: "GreeDDy: Accelerate Parallel ..."
GreeDDy: Accelerate Parallel DDMIN

Whitehead, Jim
A-TEST '24: "Use of ChatGPT as an Assistant ..."
Use of ChatGPT as an Assistant in the End-to-End Test Script Generation for Android Apps
10 authors