
3rd International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2014), June 3, 2014, Hyderabad, India

RAISE 2014 – Proceedings

Message from the Chairs
The RAISE workshop series aims to bring together researchers and industrial practitioners to exchange and discuss the latest innovative synergistic artificial intelligence (AI) and software engineering (SE) techniques and practices. This workshop is the third in the series and builds on the work carried out at the previous editions of RAISE, which were also co-located with ICSE in 2012 and 2013.

Towards More Intelligent Trace Retrieval Algorithms
Jane Cleland-Huang and Jin Guo
(DePaul University, USA)
Automated trace creation techniques are based on a variety of algorithms, ranging from basic term-matching approaches to more sophisticated expert systems. In this position paper, we propose a classification scheme for categorizing the intelligence level of automated traceability techniques. We show that the vast majority of relevant work in the past decade has focused on the lowest level of the Traceability Intelligence Quotient (tIQ) and posit that achieving high-quality automated traceability will require refocusing research efforts on the development of more intelligent algorithms capable of reasoning about concepts, their relationships and constraints, and the contexts in which they occur.
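
To make the lowest tIQ level concrete, the following is a minimal sketch (not from the paper) of a basic term-matching tracer: it ranks candidate target artifacts against a source requirement by TF-IDF cosine similarity. The artifact texts and the cutoff threshold are illustrative placeholders.

```python
# Minimal term-matching trace retrieval: rank target artifacts against
# a source requirement by TF-IDF cosine similarity. Texts and the
# cutoff are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirement = "The system shall encrypt all patient records at rest."
targets = [
    "class RecordEncryptor: encrypts patient records using AES",
    "class ReportPrinter: formats monthly usage reports",
    "test_encrypt_records: verifies records are encrypted on disk",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([requirement] + targets)
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

# Candidate trace links are the targets above a similarity cutoff.
for target, score in sorted(zip(targets, scores), key=lambda p: -p[1]):
    if score > 0.1:   # illustrative threshold
        print(f"{score:.2f}  {target}")
```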

A Mapping Study on Bayesian Networks for Software Quality Prediction
Ayse Tosun Misirli and Ayşe Başar Bener
(University of Oulu, Finland; Ryerson University, Canada)
Bayesian Networks (BNs) have been used for decision making in software engineering for many years. We investigate the current status of BNs in predicting software quality in three respects: 1) techniques used for parameter learning, 2) techniques used for structure learning, and 3) the types of variables that represent BN nodes. We performed a systematic mapping study of 38 primary studies that employ BNs to predict software quality. The most popular technique for building the final structure of a BN is the use of expert knowledge combined with different inference algorithms. Variables in BNs are treated as categorical in more than 70% of the studies. Compared to other domains, the usage of BNs is still very limited due to their high dependency on expert knowledge and tools.
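
As a toy illustration of the pattern the study reports as most common (an expert-specified structure over categorical nodes), here is a three-node BN with invented conditional probabilities and marginal inference by enumeration; the numbers are not drawn from any primary study.

```python
# Structure: Complexity -> Defects <- TestEffort, all categorical.
# CPT values are invented; in practice they are expert-elicited or
# learned from project data.
P_complexity = {"high": 0.3, "low": 0.7}
P_testeffort = {"high": 0.4, "low": 0.6}
# P(Defects = many | Complexity, TestEffort)
P_defects_many = {
    ("high", "high"): 0.5, ("high", "low"): 0.9,
    ("low", "high"): 0.1,  ("low", "low"): 0.3,
}

def p_many_defects():
    """Marginal P(Defects = many) by enumerating the parents."""
    return sum(
        P_complexity[c] * P_testeffort[t] * P_defects_many[(c, t)]
        for c in P_complexity for t in P_testeffort
    )

print(f"P(Defects = many) = {p_many_defects():.3f}")   # -> 0.376
```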

SANAYOJAN: A Framework for Traceability Link Recovery between Use-Cases in Software Requirement Specification and Regulatory Documents
Ritika Jain, Smita Ghaisas, and Ashish Sureka
(IIIT Delhi, India; Tata Consultancy Services, India)
User requirement specification (URS) documents written as free-form natural-language text contain system use-case descriptions as one of their elements. In some application domains, some of the system use-cases in the URS define services and functionality that need to comply with the laws, rules, and regulations pertaining to that domain. In this paper, we present a multi-step approach to automatically extract system use-cases from a URS and construct traceability links between system use-cases and the appropriate regulations in regulatory documents. We define lexicon-based, syntactic, and semantic features to discriminate system use-cases from other elements in the URS. We investigate the application of five semantic similarity methods implemented in the SEMILAR semantic similarity toolkit to compute the similarity between a given system use-case and the regulations in a regulatory document. We conduct a series of experiments on real-world data obtained from software projects of a large global Information Technology (IT) services company to validate the proposed approach. Experimental results demonstrate the effectiveness (an accuracy of 83.3% for system use-case extraction and 72% for constructing traceability links) and the limitations of the proposed approach.
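
A rough sketch of the first step, discriminating use-case sentences from other URS text with lexicon-based features; the feature names and cue lexicons below are illustrative guesses, not the paper's actual feature set.

```python
# Lexicon-based features for spotting system use-case sentences.
# ACTION_VERBS and MODALS are invented cue lists for illustration.
import re

ACTION_VERBS = {"create", "update", "delete", "submit", "validate", "retain"}
MODALS = {"shall", "must", "should", "will"}

def use_case_features(sentence: str) -> dict:
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {
        "has_modal": any(t in MODALS for t in tokens),
        "has_action_verb": any(t in ACTION_VERBS for t in tokens),
        "starts_with_actor": sentence.lower().startswith(("the system", "the user")),
        "length": len(tokens),
    }

print(use_case_features("The system shall validate the uploaded document."))
```

These feature vectors would then feed a classifier, and the surviving use-cases are matched against regulations with the SEMILAR similarity measures.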

Supporting Comprehension of Unfamiliar Programs by Modeling an Expert's Perception
Naveen Kulkarni and Vasudeva Varma
(IIIT Hyderabad, India)
Developers need to understand many Software Engineering (SE) artifacts while making changes to code. They rely extensively on cues to establish the relevance of a piece of information to the task, and their familiarity with different kinds of cues helps them comprehend a program. However, developers face information overload because (a) there are many cues and (b) they may be unfamiliar with the artifacts. We therefore propose a novel approach to overcome the information overload problem by modeling a developer's perceived value of information based on cues. In this preliminary study, we validate one such model for common comprehension tasks. We also apply this model to summarize source code. An evaluation of the generated summaries showed 83% similarity with summaries recorded by developers. These promising results encourage us to create a repository of perception models that can later aid complex SE tasks.
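
One way to picture cue-driven summarization is the following minimal sketch: score each code line by the cues it carries and keep the top-scoring lines as an extractive summary. The cue weights stand in for a learned perception model and are invented for illustration.

```python
# Cue-weighted extractive summarization of source code.
# CUE_WEIGHTS mimics a perception model; the values are made up.
CUE_WEIGHTS = {
    "class ":  4.0,   # type declarations are strong cues
    "def ":    3.0,   # method signatures
    "# ":      2.0,   # developer comments
    "return ": 1.5,   # data-flow endpoints
}

def summarize(source: str, k: int = 3) -> list[str]:
    def score(line: str) -> float:
        return sum(w for cue, w in CUE_WEIGHTS.items() if cue in line)
    lines = [line for line in source.splitlines() if line.strip()]
    return sorted(lines, key=score, reverse=True)[:k]

code = """
class InvoiceMailer:
    # Sends a rendered invoice to the customer.
    def send(self, invoice):
        body = self.render(invoice)
        return self.smtp.deliver(invoice.email, body)
"""
print("\n".join(summarize(code)))
```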

Machine Learning for Constituency Test of Coordinating Conjunctions in Requirements Specifications
Richa Sharma, Jaspreet Bhatia, and K. K. Biswas
(IIT Delhi, India)
Coordinating conjunctions are a major source of ambiguity in natural-language statements, and this concern has long been a research focus in English linguistics. Natural language is also the most common form of expressing the requirements for an envisioned software system, and requirements documents suffer from the same coordination ambiguity. The presence of nocuous coordination ambiguity is a major concern for requirements analysts. In this paper, we explore the applicability of the constituency test for identifying coordinating-conjunction instances in requirements documents. We show through our study how the identification of nocuous and innocuous coordinating conjunctions can be improved using semantic similarity heuristics and machine learning. Our study indicates that a Naïve Bayes classifier outperforms other machine learning algorithms.
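
A hedged sketch of the classification step: a Naive Bayes classifier over heuristic features of a coordination instance, such as the semantic similarity of the conjuncts. The feature columns, values, and labels below are fabricated toy data that only show the shape of the pipeline, not the paper's feature set or corpus.

```python
# Classify coordination instances as nocuous vs. innocuous with a
# Naive Bayes model over toy heuristic features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# columns: [conjunct_similarity, modifier_present, conjuncts_same_pos]
X = np.array([
    [0.82, 1, 1],   # e.g., "static analysis and testing tools"
    [0.10, 1, 0],
    [0.75, 0, 1],
    [0.05, 0, 0],
])
y = np.array([1, 0, 1, 0])   # 1 = innocuous, 0 = nocuous

clf = GaussianNB().fit(X, y)
print(clf.predict([[0.6, 1, 1]]))   # classify a new instance
```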

OCL Usability: A Major Challenge in Adopting UML
Imran Sarwar Bajwa, Behzad Bordbar, and Mark Lee
(Islamia University of Bahawalpur, Pakistan; University of Birmingham, UK)
In this paper, we present a novel approach that addresses the OCL usability problem by automatically producing OCL from English text. The main aspects of the OCL usability problem are the hard syntax of the language, the ambiguous nature of OCL expressions, and the difficult interpretation of large OCL expressions. Our contribution is a method that uses natural-language expressions and model transformation technology to improve OCL usability. The aim of the method is to provide a framework in which the user of a UML tool can write constraints and pre/post conditions in English, and the framework converts such English expressions into the equivalent OCL statements. The proposed approach is implemented in a software tool, NL2OCLviaSBVR, that generates OCL constraints from English text via SBVR. Our tool allows software modelers and developers to generate well-formed OCL expressions, resulting in valid and precise models. An empirical evaluation of the generated OCL constraints reveals that our natural-language-based approach significantly outperforms the most closely related technique in terms of effort and effectiveness.
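
To illustrate only the input/output relationship, here is a toy translation of one restricted English constraint pattern into an OCL invariant. The actual tool goes through SBVR rules and model transformations; this regex shortcut is not the paper's method, and the sentence pattern is invented.

```python
# Toy English-to-OCL translation for a single sentence pattern.
import re

PATTERN = re.compile(
    r"the (?P<attr>\w+) of (?:a|each) (?P<ctx>\w+) must be greater than (?P<val>\d+)",
    re.IGNORECASE,
)

def english_to_ocl(sentence: str) -> str:
    m = PATTERN.match(sentence.strip())
    if not m:
        raise ValueError("pattern not supported by this sketch")
    return (f"context {m['ctx'].capitalize()} "
            f"inv: self.{m['attr']} > {m['val']}")

print(english_to_ocl("The age of each Customer must be greater than 17"))
# -> context Customer inv: self.age > 17
```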

A Self-Learning Approach for Validation of Communication in Embedded Systems
Falk Langer and Erik Oswald
(Fraunhofer ESK, Germany)
This paper demonstrates a new approach to evaluating the communication behavior of embedded systems by applying algorithms from the field of artificial intelligence. An important problem in validating the interactions of a distributed system is missing, wrong, or incomplete specifications. This paper demonstrates the application of a new self-learning approach for assessing communication behavior based on reference traces. The benefit of the approach is that it works automatically, with little additional effort and without using any specification. The investigated methodology uses algorithms from the fields of machine learning and data mining to extract behavior models from a reference trace. To show the application, this paper provides a use case and the basic setup for the proposed method. The applicability of this self-learning methodology is evaluated on real vehicle network data.
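
A minimal sketch of the self-learning idea: mine the allowed message successions from a reference trace, then flag transitions in a new trace that were never observed. The message names are invented, and the paper's learned models are richer than this bigram set.

```python
# Learn a bigram behavior model from a reference trace and flag
# unseen transitions in a trace under test.
from itertools import pairwise   # Python 3.10+

reference = ["IGN_ON", "ECU_INIT", "SENSOR_OK", "DRIVE", "IGN_OFF"]
allowed = set(pairwise(reference))   # learned behavior model

def validate(trace):
    """Return transitions not present in the learned model."""
    return [(a, b) for a, b in pairwise(trace) if (a, b) not in allowed]

test_trace = ["IGN_ON", "DRIVE", "IGN_OFF"]   # skips initialization
print(validate(test_trace))   # -> [('IGN_ON', 'DRIVE')]
```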

Deriving Time Lines from Texts
Mathias Landhäußer, Tobias Hey, and Walter F. Tichy
(KIT, Germany)
We investigate natural language as an alternative to programming languages. Natural language would empower anyone to program with minimal training. In this paper, we solve an ordering problem that arises in natural-language programming. An empirical study showed that users do not always provide the strict sequential order of steps needed for execution on a computer. Instead, temporal expressions involving "before", "after", "while", "at the end", and others are used to indicate an order other than the textual one. We present an analysis that extracts the intended time line by exploiting temporal clues. The technique is analyzed in the context of Alice, a 3D programming environment, and AliceNLP, a system for programming Alice in ordinary English. Extracting temporal order could also be useful for analyzing reports, question answering, help desk requests, and big data applications.
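
A small sketch of the ordering problem: steps whose textual order disagrees with the intended temporal order, recovered here from a single clue phrase. A real analysis parses far richer temporal expressions; the sentences are invented examples in the spirit of Alice scripts.

```python
# Recover the intended time line from one temporal clue ("before that").
steps = [
    "the rabbit jumps",
    "before that, the rabbit turns left",   # must precede its predecessor
    "the rabbit says hello",
]

def reorder(steps):
    ordered = []
    for step in steps:
        if step.lower().startswith("before that,") and ordered:
            ordered.insert(len(ordered) - 1, step)   # move ahead of previous step
        else:
            ordered.append(step)
    return ordered

for s in reorder(steps):
    print(s)
# the turn now precedes the jump, matching the intended time line
```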

Mining Issue Tracking Systems using Topic Models for Trend Analysis, Corpus Exploration, and Understanding Evolution
Ayushi Aggarwal, Gajendra Waghmare, and Ashish Sureka
(IIIT Delhi, India)
Issue tracking systems (ITS) such as Google Code Hosting and Bugzilla facilitate software maintenance activities through bug reporting, archiving, and fixing. The large number of bug reports and their unstructured text make it impractical for developers to manually extract actionable intelligence to expedite bug fixing. In this paper, we present an application of mining bug-report descriptions and threaded discussion comments using Latent Dirichlet Allocation (LDA), a topic modeling technique. We apply LDA to the Chromium Browser Project's open-source bug archives to extract topics (the discovery of semantically related terms) and the latent semantic relationships between documents (bug reports) and the extracted topics, for corpus exploration, trend analysis, and understanding evolution in the maintenance domain. We conduct a series of experiments to uncover latent topics potentially useful for developers and testers based on bug metadata such as time, priority, type, category, and status. The analysis of reopened and duplicate bugs, in particular, yields important insights for developers and can help in applications such as expertise modeling, resource allocation, and knowledge management.
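
A sketch of the topic-extraction step: fit LDA on bug-report text and print the top terms per topic. The four toy "reports" and the topic count are illustrative, not Chromium data.

```python
# Fit LDA on toy bug reports and list the top terms of each topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reports = [
    "browser crashes when opening pdf in new tab",
    "tab crash on pdf download, renderer killed",
    "bookmark sync fails after sign in",
    "sign in loop breaks bookmark sync on restart",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(reports)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"topic {i}: {' '.join(top)}")
```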
