1st International Workshop on Machine Learning and Software Engineering in Symbiosis (MASES 2018),
September 3, 2018,
Montpellier, France
Message from the Chairs
Welcome to the first edition of the International Workshop on Machine Learning and Software Engineering in Symbiosis (MASES), to be held in Montpellier, France (September 3, 2018, co-located with the ASE conference).
https://mases18.github.io
Applying Graph Kernels to Model-Driven Engineering Problems
Robert Clarisó and Jordi Cabot
(Open University of Catalonia, Spain; ICREA, Spain)
Machine Learning (ML) can be used to analyze and classify large collections of graph-based information, e.g., images, location information, and the structure of molecules and proteins. Graph kernels are one of the ML techniques typically used for such tasks.
In a software engineering context, models of a system such as structural or architectural diagrams can be viewed as labeled graphs. Thus, in this paper we propose to employ graph kernels for clustering software modeling artifacts. Among other benefits, this would improve the efficiency and usability of a variety of software modeling activities, e.g., design space exploration, testing or verification and validation.
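As an illustration of what a graph kernel computes on labeled graphs, here is a minimal sketch of a Weisfeiler-Lehman subtree kernel in plain Python. The abstract does not say which kernel the authors use, so the choice of kernel, the function names `wl_features` and `wl_kernel`, and the toy "Class"/"Attribute" labels are all assumptions made for this example.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Lehman labels for one graph.

    adj:    dict mapping each node to a list of its neighbours
    labels: dict mapping each node to a string label
    Assumes labels do not themselves contain '(', ')' or ','.
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node in adj:
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[n] for n in adj[node])
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        counts.update(current.values())
    return counts


def wl_kernel(g1, g2, iterations=2):
    """Kernel value = dot product of the two label-count vectors."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    return sum(f1[key] * f2[key] for key in f1.keys() & f2.keys())


# Two hypothetical model fragments viewed as labeled graphs.
model_a = ({0: [1, 2], 1: [0], 2: [0]}, {0: "Class", 1: "Class", 2: "Attribute"})
model_b = ({0: [1], 1: [0]}, {0: "Class", 1: "Class"})
similarity = wl_kernel(model_a, model_b, iterations=1)
```

A matrix of such pairwise kernel values could then be passed to any kernel-based clustering method to group similar models.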
@InProceedings{MASES18p1,
author = {Robert Clarisó and Jordi Cabot},
title = {Applying Graph Kernels to Model-Driven Engineering Problems},
booktitle = {Proc.\ MASES},
publisher = {ACM},
pages = {1--5},
doi = {10.1145/3243127.3243128},
year = {2018},
}
Learning-Based Testing for Autonomous Systems using Spatial and Temporal Requirements
Hojat Khosrowjerdi and Karl Meinke
(KTH, Sweden)
Cooperating cyber-physical systems-of-systems (CO-CPS) such as vehicle platoons, robot teams or drone swarms usually have strict safety requirements on both spatial and temporal behavior.
Learning-based testing is a combination of machine learning and model checking that has been successfully used for black-box requirements testing of cyber-physical systems-of-systems.
We present an overview of research in progress to apply learning-based testing to evaluate spatio-temporal requirements on autonomous systems-of-systems through modeling and simulation.
@InProceedings{MASES18p6,
author = {Hojat Khosrowjerdi and Karl Meinke},
title = {Learning-Based Testing for Autonomous Systems using Spatial and Temporal Requirements},
booktitle = {Proc.\ MASES},
publisher = {ACM},
pages = {6--15},
doi = {10.1145/3243127.3243129},
year = {2018},
}
Automatically Assessing Vulnerabilities Discovered by Compositional Analysis
Saahil Ognawala, Ricardo Nales Amato, Alexander Pretschner, and Pooja Kulkarni
(TU Munich, Germany)
Testing is the most widely employed method to find vulnerabilities in real-world software programs. Compositional analysis, based on symbolic execution, is an automated testing method to find vulnerabilities in medium- to large-scale programs consisting of many interacting components. However, existing compositional analysis frameworks do not assess the severity of reported vulnerabilities. In this paper, we present a framework to analyze vulnerabilities discovered by an existing compositional analysis tool and assign CVSS3 (Common Vulnerability Scoring System v3.0) scores to them, based on various heuristics such as interaction with related components, ease of reachability, complexity of design and likelihood of accepting unsanitized input. By analyzing vulnerabilities reported with CVSS3 scores in the past, we train simple machine learning models. By presenting our interactive framework to developers of popular open-source software and other security experts, we gather feedback on our trained models and further improve the features to increase the accuracy of our predictions. By providing qualitative (based on community feedback) and quantitative (based on prediction accuracy) evidence from 21 open-source programs, we show that our severity prediction framework can effectively assist developers with assessing vulnerabilities.
@InProceedings{MASES18p16,
author = {Saahil Ognawala and Ricardo Nales Amato and Alexander Pretschner and Pooja Kulkarni},
title = {Automatically Assessing Vulnerabilities Discovered by Compositional Analysis},
booktitle = {Proc.\ MASES},
publisher = {ACM},
pages = {16--25},
doi = {10.1145/3243127.3243130},
year = {2018},
}
A Deep Learning Approach to Program Similarity
Niccolò Marastoni, Roberto Giacobazzi, and Mila Dalla Preda
(University of Verona, Italy)
In this work we tackle the problem of binary code similarity by using deep learning applied to binary code visualization techniques.
Our idea is to represent binaries as images and then to investigate whether it is possible to recognize similar binaries by applying deep learning algorithms for image classification. In particular, we apply the proposed deep learning framework to a dataset of binary code variants obtained through code obfuscation. These binary variants exhibit similar behaviours while being syntactically different. Our results show that the problem of binary code recognition is strictly separated from simple image recognition problems.
Moreover, the analysis of the results of the experiments conducted in this work leads us to the identification of interesting research challenges. For example, in order to use image recognition approaches to recognize similar binary code samples, it is important to further investigate how to build a suitable mapping from executables to images.
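One simple way to picture the idea of treating a binary as an image is to read each byte as a grayscale pixel and wrap the byte stream into fixed-width rows. The sketch below shows that naive mapping only. The abstract does not describe the authors' actual mapping (it names the choice of mapping as an open challenge), so the function name `bytes_to_image`, the row width, and the zero padding are assumptions.

```python
def bytes_to_image(data, width=16, pad=0):
    """Wrap a byte string into rows of `width` grayscale pixels (0-255).

    The last row is padded with `pad` so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([pad] * (width - len(row)))
        rows.append(row)
    return rows


# Hypothetical 5-byte "binary" laid out as a 3x2 image.
image = bytes_to_image(bytes([1, 2, 3, 4, 5]), width=2)
```

The resulting list of rows can be converted to an array and fed to an image classifier. Note that a layout like this changes whenever bytes shift position, which is one reason the choice of mapping matters for obfuscated variants.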
@InProceedings{MASES18p26,
author = {Niccolò Marastoni and Roberto Giacobazzi and Mila Dalla Preda},
title = {A Deep Learning Approach to Program Similarity},
booktitle = {Proc.\ MASES},
publisher = {ACM},
pages = {26--35},
doi = {10.1145/3243127.3243131},
year = {2018},
}
A Language-Agnostic Model for Semantic Source Code Labeling
Ben Gelman, Bryan Hoyle, Jessica Moore, Joshua Saxe, and David Slater
(Two Six Labs, USA; Sophos, USA)
Code search and comprehension have become more difficult in recent years due to the rapid expansion of available source code. Current tools lack a way to label arbitrary code at scale while maintaining up-to-date representations of new programming languages, libraries, and functionalities. Comprehensive labeling of source code enables users to search for documents of interest and obtain a high-level understanding of their contents. We use Stack Overflow code snippets and their tags to train a language-agnostic, deep convolutional neural network to automatically predict semantic labels for source code documents. On Stack Overflow code snippets, we demonstrate a mean area under ROC of 0.957 over a long-tailed list of 4,508 tags. We also manually validate the model outputs on a diverse set of unlabeled source code documents retrieved from GitHub, and obtain a top-1 accuracy of 86.6%. This strongly indicates that the model successfully transfers its knowledge from Stack Overflow snippets to arbitrary source code documents.
@InProceedings{MASES18p36,
author = {Ben Gelman and Bryan Hoyle and Jessica Moore and Joshua Saxe and David Slater},
title = {A Language-Agnostic Model for Semantic Source Code Labeling},
booktitle = {Proc.\ MASES},
publisher = {ACM},
pages = {36--44},
doi = {10.1145/3243127.3243132},
year = {2018},
}
Fast Deployment and Scoring of Support Vector Machine Models in CPU and GPU
Oscar Castro-Lopez and Ines F. Vega-Lopez
(Autonomous University of Sinaloa, Mexico)
In this paper, we present an approach for the fast deployment and efficient scoring of Support Vector Machine (SVM) models. We developed a compiler for transforming a formal specification of an SVM and generating source code in different versions of the C/C++ language. This effectively automates the deployment of SVM models and their integration into the operational software. The proposed compiler generates efficient code to deploy SVM models in CPUs (single or multi-core) and in Graphics Processing Units (GPUs) through NVIDIA's Compute Unified Device Architecture (CUDA). We also present an empirical evaluation of our compiler's targets scoring an SVM model with a linear kernel. In our experiments we score a real dataset in batch mode at different scales. The results show that our C CUDA implementation performs better as data scale increases and that it is approximately 38 times faster than the single-core implementation using single-precision floating-point values.
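For readers unfamiliar with what "scoring" a linear-kernel SVM involves, each input vector x is mapped to the decision value w·x + b, and the sign of that value gives the predicted class. The sketch below shows this computation for a batch in plain Python. It is not the code generated by the authors' compiler; the names `score_linear_svm` and `classify`, and the weight and bias values, are made up for illustration.

```python
def score_linear_svm(weights, bias, batch):
    """Return the decision value w.x + b for every row x in `batch`."""
    scores = []
    for x in batch:
        if len(x) != len(weights):
            raise ValueError("feature count does not match the model")
        scores.append(sum(w * v for w, v in zip(weights, x)) + bias)
    return scores


def classify(scores):
    """Map decision values to class labels +1 / -1."""
    return [1 if s >= 0 else -1 for s in scores]


# Hypothetical two-feature model scored on a batch of two rows.
scores = score_linear_svm([2.0, -1.0], 0.5, [[1.0, 1.0], [0.0, 2.0]])
labels = classify(scores)
```

Because every row is scored independently, the loop over the batch is the natural unit to parallelize across CPU cores or GPU threads.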
@InProceedings{MASES18p45,
author = {Oscar Castro-Lopez and Ines F. Vega-Lopez},
title = {Fast Deployment and Scoring of Support Vector Machine Models in CPU and GPU},
booktitle = {Proc.\ MASES},
publisher = {ACM},
pages = {45--52},
doi = {10.1145/3243127.3243133},
year = {2018},
}