SANER 2017

2017 IEEE 24th International Conference on Software Analysis, Evolution, and Reengineering (SANER), February 20-24, 2017, Klagenfurt, Austria


Two Improvements to Detect Duplicates in Stack Overflow
Yuji Mizobuchi and Kuniharu Takayama
(Fujitsu Labs, Japan)
Abstract: Stack Overflow is one of the most popular question-and-answer sites for programmers. However, it contains a large number of duplicate questions, which should be detected automatically and quickly. In this paper, we introduce two approaches to improve detection accuracy: splitting the question body into different types of data, and using word embeddings to handle word ambiguities that are not captured in general corpora. The evaluation shows that these approaches improve accuracy compared with the traditional method.
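
The first idea in the abstract, splitting a question body into different types of data, can be illustrated with a minimal sketch. The abstract does not say which data types the authors use, so this example assumes two (prose and code) and assumes the body is HTML with code samples inside `<pre><code>` blocks. The names `split_body` and `jaccard` are hypothetical, and the token-overlap score is a simple stand-in, not the authors' similarity measure.

```python
import html
import re

# Assumption: code samples appear inside <pre><code>...</code></pre> in the HTML body.
CODE_BLOCK = re.compile(
    r"<pre[^>]*>\s*<code[^>]*>(.*?)</code>\s*</pre>",
    re.DOTALL | re.IGNORECASE,
)
TAG = re.compile(r"<[^>]+>")


def split_body(body):
    """Split one question body into (prose, code_blocks)."""
    # Collect the contents of every code block, decoding HTML entities.
    code_blocks = [html.unescape(block) for block in CODE_BLOCK.findall(body)]
    # Remove the code blocks, strip the remaining tags, and normalize whitespace.
    prose_html = CODE_BLOCK.sub(" ", body)
    prose = html.unescape(TAG.sub(" ", prose_html))
    prose = " ".join(prose.split())
    return prose, code_blocks


def jaccard(a, b):
    """Token-set overlap between two strings (illustrative stand-in only)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    body = "<p>How do I sort a list?</p><pre><code>xs.sort()\n</code></pre>"
    prose, code = split_body(body)
    print(prose)
    print(code)
```

Once the parts are separated, each type can be compared with its own measure, so that overlap in code does not mask or dominate overlap in prose.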


