NaturaLiSE 2013 – Proceedings

Foreword
Software engineers produce code that has formal syntax and semantics, which establishes its formal meaning. However, it also includes significant natural language found in identifier names and comments. Additionally, programmers work not only with source code but also with a variety of software artifacts, predominantly written in natural language. Examples include documentation, requirements, test plans, bug reports, and peer-to-peer communications. It is increasingly evident that natural language information can play a key role in improving a variety of software engineering tools used during the design, development, debugging, and testing of software.
The focus of the NaturaLiSE workshop is on natural language analysis of software artifacts. This workshop will bring together researchers and practitioners interested in exploiting natural language information found in software artifacts to create improved software engineering tools. Relevant topics include (but are not limited to) natural language analysis applied to software artifacts, combining natural language and traditional program analysis, integration of natural language analyses into client tools, mining natural language data, and empirical studies focused on evaluating the usefulness of natural language analysis.

Formal Methods

Consistent Stakeholder Modifications of Formal Models via a Natural Language Representation
Gregor Gabrysiak, Daniel Eichler, Regina Hebig, and Holger Giese
(HPI, Germany)
While requirements described in natural language are inherently ambiguous and hard to check for consistency, they are intuitively understandable for domain experts. Formal models, on the other hand, help requirements engineers specify requirements correctly, consistently, and completely. Transformations between these two representations can become quite complex, since not everything that can be expressed in natural language can be captured in a restricted formal model. We describe a transformation approach that takes the formal modeling operations of story patterns and allows domain experts to transparently apply them to a natural language representation of these formal models. We also present a preliminary evaluation.

Automated Extraction of Non-functional Requirements in Available Documentation
John Slankas and Laurie Williams
(North Carolina State University, USA)
While all systems have non-functional requirements (NFRs), these requirements may not be explicitly stated in a formal requirements specification. Furthermore, NFRs may also be externally imposed via government regulations or industry standards. As some NFRs represent emergent system properties, those NFRs require appropriate analysis and design efforts to ensure they are met. When the specified NFRs are not met, projects incur costly rework to correct the issues. The goal of our research is to aid analysts in more effectively extracting relevant non-functional requirements from available unconstrained natural language documents through automated natural language processing. Specifically, we examine which document types (data use agreements, install manuals, regulations, requests for proposals, requirements specifications, and user manuals) contain NFRs categorized into 14 NFR categories (e.g., capacity, reliability, and security). We measure how effectively we can identify and classify NFR statements within these documents. In each of the documents evaluated, we found NFRs present. Using a word vector representation of the NFRs, a support vector machine algorithm performed twice as effectively as a multinomial naïve Bayes classifier given the same input. Our k-nearest neighbor classifier with a unique distance metric had an F1 measure of 0.54, outperforming in our experiments the optimal naïve Bayes classifier, which had an F1 measure of 0.32. We also found that stop word lists beyond common determiners had minimal effect on performance.
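For readers unfamiliar with the F1 measure reported above, the following is a minimal sketch of how it is conventionally computed as the harmonic mean of precision and recall. The function name and the counts in the usage line are illustrative assumptions and are not taken from the paper.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 measure (harmonic mean of precision and recall).

    The counts are hypothetical inputs: statements correctly labeled as
    NFRs, statements wrongly labeled as NFRs, and NFRs that were missed.
    """
    predicted = true_pos + false_pos   # everything the classifier flagged
    actual = true_pos + false_neg      # everything that should be flagged
    if predicted == 0 or actual == 0:
        return 0.0                     # precision or recall is undefined
    precision = true_pos / predicted
    recall = true_pos / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not data from the paper).
score = f1_from_counts(true_pos=8, false_pos=2, false_neg=8)
print(round(score, 3))
```

Because F1 is a harmonic mean, a classifier cannot score well by maximizing only precision or only recall.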

Capturing Assertions from Natural Language Descriptions
Ian G. Harris
(UC Irvine, USA)
We present a technique to automatically generate formal, executable assertions from natural language assertion descriptions written in English. Assertions are program invariants which are commonly used for result checking in the hardware verification process. We present an attribute grammar which captures the semantics of a subset of English language assertion descriptions. Using the attribute grammar, we parse assertion descriptions and generate semantically equivalent formal models. We have evaluated our technique using a large set of industrial assertion descriptions. We present the successful assertion generation results, as well as the limitations of our approach and methods to address those limitations in the future.

Tools and Techniques

Automatically Identifying a Software Product’s Quality Attributes through Sentiment Analysis of Tweets
Rahim Dehkharghani and Cemal Yilmaz
(Sabanci University, Turkey)
Software quality attributes can be identified based on software features such as security, reliability, and user-friendliness. This process can be done either manually or automatically. Sentiment analysis refers to the automatic extraction of sentiments from resources such as natural language texts. We study the application of sentiment analysis to extracting the quality attributes of a software product based on the opinions that end users have stated in microblogs such as Twitter. Our findings point to useful techniques, such as the document frequency of words across a large number of tweets. The extracted results can help software developers understand the advantages and disadvantages of their products.
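As a rough illustration of the document frequency idea mentioned above, the sketch below counts, for each word, the number of tweets in which it appears at least once. This is not the authors' pipeline: the function name, the regular-expression tokenizer, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable


def document_frequency(tweets: Iterable[str]) -> Counter:
    """Count how many tweets contain each word (each tweet counts once per word)."""
    df = Counter()
    for tweet in tweets:
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        tokens = set(re.findall(r"[a-z']+", tweet.lower()))
        df.update(tokens)  # a set ensures one count per tweet per word
    return df


# Hypothetical tweets, invented for illustration.
sample = [
    "The app is slow after the update",
    "slow slow login again",
    "Great app, very reliable",
]
for word, count in document_frequency(sample).most_common(3):
    print(word, count)
```

Words with a high document frequency in tweets about a product are candidates for further inspection, although frequency alone says nothing about whether the sentiment is positive or negative.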

lips: An IDE for Model Driven Engineering Based on Natural Language Processing
Oliver Keszocze, Mathias Soeken, Eugen Kuksa, and Rolf Drechsler
(University of Bremen, Germany; DFKI at Bremen, Germany)
By combining state-of-the-art natural language processing (NLP) algorithms with semantic information offered by a variety of ontologies and databases, efficient methods have been proposed that assist system designers in automatically translating text-based specifications into formal models. However, due to ambiguities in natural language, these approaches usually require user interaction. Following these achievements, we consider natural language as a further input language that is used in the design flow for systems and software. Consequently, concepts from integrated development environments (IDEs), such as those found for programming languages like Java, need to be made available for natural language specifications as well.
In this paper, we propose lips, an integrated development environment that is seamlessly implemented on top of Eclipse. It contains recent NLP algorithms that extract formal models suited for the Eclipse Modeling Framework and therefore provide a starting point for an ongoing implementation. Whenever user interaction is required, lips makes use of well-known IDE concepts such as markers and quick fixes, thereby enabling a holistic user experience.

1st International Workshop on Natural Language Analysis in Software Engineering (NaturaLiSE)
