MUD 2015 – Proceedings

Message from the Chairs
Welcome to MUD 2015, the 5th Workshop on Mining Unstructured Data. The workshop is co-located with the 31st International Conference on Software Maintenance and Evolution (ICSME 2015) and is taking place in Bremen, Germany.

Paper Presentations and Group Discussion
Mon, Sep 28, 14:00 - 15:30, GW2 B2900

Heuristic-Based Part-of-Speech Tagging of Source Code Identifiers and Comments
Reem S. AlSuhaibani, Christian D. Newman, Michael L. Collard, and Jonathan I. Maletic
(Kent State University, USA; University of Akron, USA)
An approach is presented that uses heuristics and static program analysis information to mark up the part of speech of program identifiers. It does not apply a natural-language part-of-speech tagger to identifiers within the code. Instead, a set of heuristics is defined, modeled on natural-language usage, for how identifiers are used in code. Additionally, automatically derived method stereotype information is used in the tagging process. The approach is built on the srcML infrastructure and adds part-of-speech information directly into the srcML markup.
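
To make rule-based tagging of identifiers concrete, here is a minimal Python sketch under loudly stated assumptions: the preposition list, the tag set (V = verb, N = noun, NM = noun modifier, P = preposition), the positional rules and the function names are hypothetical illustrations, not the authors' heuristics. The sketch neither uses method stereotypes nor emits srcML; it only shows identifiers being split and tagged by rules instead of by a natural-language tagger.

```python
import re

# Hypothetical closed-class word list; a real tagger would need a much richer lexicon.
PREPOSITIONS = {"to", "from", "in", "on", "by", "with", "for", "of", "at"}


def split_identifier(identifier: str) -> list[str]:
    """Split camelCase, PascalCase and snake_case identifiers into lower-case words."""
    words = []
    for part in identifier.split("_"):
        # Acronym runs ("XML" in "getXMLParser"), capitalized or lower-case words, digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def tag_identifier(identifier: str, is_function: bool) -> list[tuple[str, str]]:
    """Tag each word of an identifier using assumed positional heuristics."""
    words = split_identifier(identifier)
    tags = []
    for i, word in enumerate(words):
        if word in PREPOSITIONS:
            tags.append("P")
        elif is_function and i == 0:
            tags.append("V")   # assumption: function names usually start with an action
        elif i == len(words) - 1 or words[i + 1] in PREPOSITIONS:
            tags.append("N")   # head noun: last word of a phrase
        else:
            tags.append("NM")  # earlier words modify the head noun
    return list(zip(words, tags))
```

For instance, `tag_identifier("getXMLParser", True)` tags `get` as V, `xml` as NM and `parser` as N. Purely positional rules like these are cheap but brittle, which is where extra program information such as the method stereotypes mentioned above can help.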

SODA: The Stack Overflow Dataset Almanac
Nicolas Latorre, Roberto Minelli, Andrea Mocci, Luca Ponzanelli, and Michele Lanza
(University of Lugano, Switzerland)
Stack Overflow has become a fundamental resource for developers: it is the de facto Question and Answer (Q&A) website and one of the standard unstructured data sources that software engineering researchers mine for knowledge about development. We present Soda, the Stack Overflow Dataset Almanac, a tool that helps researchers and developers better understand trends in discussion topics on Stack Overflow, based on its tagging system. Soda provides an effective visualization that supports the analysis of topics over different time intervals and frames, using single or co-occurring tags. Through simple usage scenarios, we show how Soda can be used to find peculiar moments in the evolution of Stack Overflow discussions that closely match specific recent events in software development. Soda is available at http://rio.inf.usi.ch/soda/.
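
The counting behind a tag-trend view can be sketched in a few lines of Python. This is an illustration under assumptions, not Soda's implementation: the input format (a creation date and a tag list per question), the monthly granularity, the function name `monthly_tag_trend` and the toy data are all hypothetical. Passing one tag yields a single-tag trend; passing several counts only the questions in which all of them co-occur.

```python
from collections import Counter
from datetime import date


def monthly_tag_trend(questions, required_tags, relative=False):
    """Per (year, month), count questions whose tags include every tag in
    required_tags; with relative=True, return the share of that month's questions."""
    required = set(required_tags)
    matches = Counter()
    totals = Counter()
    for created, tags in questions:
        month = (created.year, created.month)
        totals[month] += 1
        if required <= set(tags):
            matches[month] += 1
    # Counter returns 0 for months without matches, so every month with data appears.
    if relative:
        return {m: matches[m] / totals[m] for m in sorted(totals)}
    return {m: matches[m] for m in sorted(totals)}


# Hypothetical toy data: (creation date, tags) per question.
QUESTIONS = [
    (date(2014, 6, 2), ["swift", "ios"]),
    (date(2014, 6, 3), ["swift"]),
    (date(2014, 6, 9), ["java"]),
    (date(2014, 7, 1), ["swift", "xcode"]),
]
```

With `relative=True`, each month's count is divided by that month's total number of questions, which keeps trends comparable as overall site activity grows or shrinks.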


Matching Machine-Code Functions in Executables within One Product Line via Bioinformatic Sequence Alignment
Arne Wichmann and Sibylle Schupp
(TU Hamburg, Germany)
In this paper we evaluate whether different executables from the same software product line have similar sequences of machine-code functions. We provide a method for creating matchings of machine-code functions using alignment techniques known from bioinformatics. For each function, we map a vector of code metrics to a symbol from an alphabet using machine learning techniques, and we construct sequence alignments using off-the-shelf alignment tools. Our evaluation of alignments of glibc versions, musl optimizations, different RedBoot platforms and architectures, and the Linux kernel shows that executables within one product line do have similar function sequences in all cases except for differing architectures. Our method can therefore be used to match functions in executables for most variations within one product line.
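
The two steps described above, turning each function into a symbol and aligning the resulting symbol strings, can be sketched in Python. The sketch substitutes simple stand-ins for the paper's components: symbols come from nearest-centroid quantization with centroids assumed to be given (for example, by a prior clustering step), and alignment uses a hand-written Needleman-Wunsch global alignment with assumed scores (+1 match, -1 mismatch, -1 gap) instead of an off-the-shelf tool. The function names and the toy metric vectors are hypothetical.

```python
import math

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_symbols(metric_vectors, centroids):
    """Map each function's metric vector to the letter of its nearest centroid."""
    if len(centroids) > len(LETTERS):
        raise ValueError("this sketch supports at most 26 centroids")
    symbols = []
    for vec in metric_vectors:
        best = min(range(len(centroids)), key=lambda k: math.dist(vec, centroids[k]))
        symbols.append(LETTERS[best])
    return "".join(symbols)


def align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two symbol strings.
    Returns aligned index pairs (i, j); positions aligned to gaps are omitted."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, collecting diagonal (aligned) steps.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Each returned pair `(i, j)` proposes that function `i` of the first executable matches function `j` of the second; functions aligned against gaps stay unmatched. This sketch keeps mismatched pairs as matches, so a stricter variant might keep only pairs with identical symbols.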
