ASE 2017

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017), October 30 – November 3, 2017, Urbana-Champaign, IL, USA

Desktop Layout

Program Comprehension
Technical Research
Exploring Regular Expression Comprehension
Carl Chapman, Peipei Wang, and Kathryn T. Stolee
(Sandia National Laboratories, USA; North Carolina State University, USA)
Supplementary Material
Abstract: The regular expression (regex) is a powerful tool employed in a large variety of software engineering tasks. However, prior work has shown that regexes can be very complex and that it could be difficult for developers to compose and understand them. This work seeks to identify code smells that impact comprehension. We conduct an empirical study on 42 of pairs of behaviorally equivalent but syntactically different regexes using 180 participants and evaluated the understandability of various regex language features. We further analyzed regexes in GitHub to find the community standards or the common usages of various features. We found that some regex expression representations are more understandable than others. For example, using a range (e.g., [0-9]) is often more understandable than a default character class (e.g., [d]). We also found that the DFA size of a regex significantly affects comprehension for the regexes studied. The larger the DFA of a regex (up to size eight), the more understandable it was. Finally, we identify smelly and non-smelly regex representations based on a combination of community standards and understandability metrics.


Time stamp: 2019-09-16T08:53:49+02:00