USER 2012 – Proceedings

Foreword
Thank you for coming to the first ever User evaluation for Software Engineering Researchers workshop (USER 2012). This workshop will have a unique format in which you will all collaboratively learn about and help create user studies for your own software engineering tools. We accepted a variety of archival and non-archival proposals; out of 19 archival proposal submissions, we accepted 14. We are glad that you have all decided to participate.
The workshop schedule is divided into four sections. Each section has a dual purpose: first, to give you an opportunity to learn about and practice an aspect of developing and running a user evaluation, and second, to build community by learning about one another’s research and brainstorming together. We begin by exploring early investigative and organizational methodologies: contextual inquiry, interviews, and affinity diagramming. We follow by developing research questions and refining them into testable hypotheses. Third, we survey various experimental designs and use that knowledge to create study plans. We conclude by piloting the most promising studies on one another, reflecting on how they went, and refining their designs.
To assist the organizers, we have assembled a panel of ten expert software engineering and human-computer interaction researchers, all of whom are well-known in the software engineering research community. They will offer you wisdom gained from years of experience working with users of research software and software processes, and will show you how valuable user evaluations can be.
All of you will walk away from here with a concrete experimental plan for a user study of your own research project. You will also have become part of a large and growing community of like-minded software engineering researchers who understand the value of evaluating their research with users.

Combining Experiments and Grounded Theory to Evaluate a Research Prototype: Lessons from the Umple Model-Oriented Programming Technology
Omar Badreddin and Timothy C. Lethbridge
(University of Ottawa, Canada)
Research prototypes typically lack the level of quality and readiness required for industrial deployment. Hence, conducting realistic experiments in which professional users perform real-life tasks is challenging. Experimentation with toy examples and tasks suffers from significant threats to external validity. Consequently, results from such experiments fail to gain confidence or mitigate risks, a prerequisite for industrial adoption. This paper presents two empirical studies conducted to evaluate a model-oriented programming language called Umple: a grounded theory study and a controlled experiment of comprehension. Evaluating model-oriented programming is particularly challenging. First, there is a need to provide highly sophisticated development environments for realistic evaluation. Second, the scarcity of experienced users poses additional challenges. In this paper we discuss our experiences, lessons learned, and future considerations in the evaluation of a research prototype tool.

User Evaluation of a Domain-Oriented End-User Design Environment for Building 3D Virtual Chemistry Experiments
Ying Zhong and Chang Liu
(Ohio University, USA)
Three-dimensional virtual world technologies have the potential to be applied in the domain of education. However, end users such as teachers find it difficult to apply virtual world technologies because of technical issues. This paper discusses the technical difficulties end users face when developing 3D virtual worlds. We investigate the problem from the perspective of end-user programming and propose a methodology for solving this problem. In order to evaluate this methodology, a domain-oriented end-user design environment implementing the methodology has been developed and applied in the domain of educational virtual chemistry laboratories. Two user studies are designed to assess the methodology from two different perspectives. The first user study evaluates the usability of the methodology. The second user study assesses the usability of virtual experiments generated using the methodology.

An Experiment in Developing Small Mobile Phone Applications Comparing On-Phone to Off-Phone Development
Tuan A. Nguyen, Sarker T. A. Rumee, Christoph Csallner, and Nikolai Tillmann
(University of Texas at Arlington, USA; Microsoft Research, USA)
TouchDevelop represents a radically new mobile application development model, as it enables mobile application development on the mobile device itself. That is, with TouchDevelop, the task of programming, say, a Windows Phone is shifted from the desktop computer to the phone. We describe a first experiment on independent, nonexpert subjects to compare programmer productivity using TouchDevelop vs. using a more traditional approach to mobile application development.

Evaluating Live Sequence Charts as a Programming Technique for Non-programmers
Michal Gordon and David Harel
(Weizmann Institute of Science, Israel)
Behavioral programming is a recent programming paradigm that uses independent scenarios to program the behavior of reactive systems. Live sequence charts (LSC) is a visual formalism that implements the approach of behavioral programming. The approach attempts to liberate programming by allowing the user to program the behavior of reactive systems by scenarios. We would like to evaluate the approach and determine which interface for creating the visual artifact of LSCs is the most natural. Several such interfaces exist, among them a novel interactive natural language (NL) interface. Initial testing indicates that the LSCs' NL interface may be preferred by programmers to procedural programming and that in certain tasks LSCs may be a viable and more natural alternative to conventional programming. Many challenges exist in trying to prove the intuitive and natural nature of a new programming paradigm, which differs from others not only in syntax but in many other respects. We describe these challenges in this proposal.

Do We Stop Learning from Our Mistakes When Using Automatic Code Analysis Tools? An Experiment Proposal
Jan-Peter Ostberg and Stefan Wagner
(University of Stuttgart, Germany)
When we learn how to program, we often do so by trial and error. We struggle with the syntax and with our own understanding of what the idea of the program should look like in the specific programming language. Today a large number of tools are available that automatically check your code and recommend alterations for the sake of maintainability or correctness. The question that has not yet been asked by science is: Are we still learning something from these mistakes, besides the knowledge that such mistakes will be corrected for us? In the following we propose an experimental setup that aims to answer this question.

Towards an Evaluation of Bidirectional Model-Driven Spreadsheets
Jácome Cunha, João Paulo Fernandes, Jorge Mendes, and João Saraiva
(University of Minho, Portugal; University of Porto, Portugal)
Spreadsheets are widely recognized as popular programming systems with a huge number of spreadsheets being created every day. Also, spreadsheets are often used in the decision processes of profit-oriented companies. While this illustrates their practical importance, studies have shown that up to 90% of real-world spreadsheets contain errors.
In order to improve the productivity of spreadsheet end-users, the software engineering community has proposed to employ model-driven approaches to spreadsheet development.
In this paper we describe the evaluation of a bidirectional model-driven spreadsheet environment. In this environment, models and data instances are kept in conformity, even after an update to any of these artifacts. We describe the issues of an empirical study we plan to conduct, based on our previous experience with end-user studies. Our goal is to assess whether this model-driven spreadsheet development framework does in fact contribute to improving the productivity of spreadsheet users.

Revisiting Bug Triage and Resolution Practices
Olga Baysal, Reid Holmes, and Michael W. Godfrey
(University of Waterloo, Canada)
Bug triaging is an error-prone, tedious and time-consuming task. However, little qualitative research has been done on the actual use of bug tracking systems, bug triage, and resolution processes. We are planning to conduct a qualitative study to understand the dynamics of the bug triage and fixing process, as well as bug reassignments and reopens. We will study interviews conducted with Mozilla Core and Firefox developers to get insights into the primary obstacles developers face during the bug fixing process. Is the triage process flawed? Does bug review slow things down? Does approval take too long? We will also categorize the main reasons for bug reassignments and reopens. We will then combine the results with a quantitative study of Firefox bug reports, focusing on factors related to bug report edits and the number of people involved in handling the bug.

Is Essence a Measure of Maintainability?
Dmitrijs Zaparanuks and Matthias Hauswirth
(University of Lugano, Switzerland)
We recently published a paper at ECOOP presenting a new software design metric, essence, that quantifies the amount of indirection in a software design. The reviews were overwhelmingly positive and included statements such as “The evaluation of the metric is fantastic.” However, we also received feedback from senior researchers who do not believe that we have meaningfully evaluated our metric. This paper represents our effort towards a meaningful evaluation of essence. Given our lack of experience in human-subject studies, we hope to receive valuable feedback on our proposed study design.

Evaluating Awareness Information in Distributed Collaborative Editing by Software-Engineers
Julia Schenk
(Free University of Berlin, Germany)
In co-located collaborative software development activities like pair programming, side-by-side programming, code reviews or code walkthroughs, the individuals automatically gain a fine-grained mutual understanding of where in the shared workspace the other participants are, what they are doing and what their levels of interest are. These points of so-called awareness information are critical for an efficient and smooth collaboration but cannot be obtained via the natural mechanisms in virtual teams. Application sharing and groupware for collaborative editing are widely used for collaborative tasks in distributed software development, but with respect to awareness and flexibility they fall far short of the co-located setting. To better support virtual team collaboration by improving tools for distributed software development, it is necessary to evaluate awareness and its impact on certain collaborative situations. Awareness itself is an invisible phenomenon and, due to its intangible nature, cannot be easily observed or measured. Thus we recorded virtual teams using either Saros, a groupware for distributed collaborative party programming, or VNC, and are now analysing these videos using the grounded theory methodology. This approach to evaluating awareness leads to various problems concerning the recording setup and the time required for analysis.

An Experimental Study of a Design-Driven, Tool-Based Development Approach
Quentin Enard, Christine Louberry, Charles Consel, and Xavier Blanc
(INRIA, France; University of Bordeaux, France; LaBRI, France)
Design-driven software development approaches have long been praised for their many benefits on the development process and the resulting software system. This paper discusses a step towards assessing these benefits by proposing an experimental study that involves a design-driven, tool-based development approach. This study raises various questions including whether a design-driven approach improves software quality and whether the tool-based approach improves productivity. In examining these questions, we explore specific issues such as the approaches that should be involved in the comparison, the metrics that should be used, and the experimental framework that is required.

Industrially Validating Longitudinal Static and Dynamic Analyses
Reid Holmes, David Notkin, and Mark Hancock
(University of Waterloo, Canada; University of Washington, USA)
Software systems gradually evolve over time, becoming increasingly difficult to understand as new features are added and old defects are repaired. Some modifications are harder to understand than others; e.g., an explicit method call is usually easy to trace in the source code, while a reflective method call may perplex both developers and analysis tools. Our tool, the Inconsistency Inspector, collects static and dynamic call graphs of systems and composes them to help developers more systematically address the static and dynamic implications of a change to a system.
We have quantitatively validated the Inconsistency Inspector and have convinced ourselves that it can expose both interesting and surprising facets of a system's evolution. An initial case study with an industrial organization showed promise, leading to the Inconsistency Inspector being installed at the organization for the past several months in preparation for a more in-depth analysis.
In July 2012 we will have the opportunity to examine 8 months of industrial data, enabling us to perform an in-depth longitudinal evaluation of how their system has evolved and whether the Inconsistency Inspector can expose surprising and helpful facts for the industrial team. At the USER workshop, we hope to gather opinions about evaluation options for validating the industrial utility of our approach and the complex longitudinal data we have collected.

User Evaluation of a Domain Specific Program Comprehension Tool
Leon Moonen
(Simula Research Laboratory, Norway)
The user evaluation in this paper concerns a domain-specific tool to support the comprehension of large safety-critical component-based software systems for the maritime sector. We discuss the context and motivation of our research, and present the user-specific details of our tool, called FlowTracker. We include a walk-through of the system and present the profiles of our prospective users. Next, we discuss the design of an exploratory qualitative study that we have conducted to evaluate the usability and effectiveness of our tool. We conclude with a summary of lessons learned and challenges that we see for user evaluation of such domain-specific program comprehension tools.
Keywords: user evaluation; domain specific tooling; program comprehension; software visualization.

Stakeholder Involvement into Quality Definition and Evaluation for Service-Oriented Systems
Vladimir A. Shekhovtsov, Heinrich C. Mayr, and Christian Kop
(University of Klagenfurt, Austria)
The paper addresses the matter of quality in the software process for service-oriented systems. We argue for the need to involve the users/stakeholders in the specification and evaluation of quality (requirements), and we develop means for supporting such an involvement. For this purpose we introduce classifications of user and quality types as a basis for the characterization of evaluation cases.

First International Workshop on User Evaluation for Software Engineering Researchers (USER)