Heuristic-Based Part-of-Speech Tagging of Source Code Identifiers and Comments
Reem S. AlSuhaibani, Christian D. Newman, Michael L. Collard, and Jonathan I. Maletic
(Kent State University, USA; University of Akron, USA)
An approach is presented that uses heuristics and static program analysis information to mark up program identifiers with part-of-speech tags. It does not apply a natural-language part-of-speech tagger to identifiers within the code. Instead, a set of heuristics is defined, modeled on how natural language is used in identifiers in code. Additionally, method stereotype information, which is derived automatically, is used in the tagging process. The approach is built on the srcML infrastructure and adds part-of-speech information directly to the srcML markup.
@InProceedings{MUD15p1,
author = {Reem S. AlSuhaibani and Christian D. Newman and Michael L. Collard and Jonathan I. Maletic},
title = {Heuristic-Based Part-of-Speech Tagging of Source Code Identifiers and Comments},
booktitle = {Proc.\ MUD},
publisher = {IEEE},
pages = {1--6},
doi = {},
year = {2015},
}
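To make the idea in the abstract concrete, here is a minimal Python sketch of heuristic (tagger-free) part-of-speech tagging of identifiers. It is an illustration under stated assumptions, not the paper's method: the functions `split_identifier`, `tag_noun_phrase`, and `tag_identifier`, the positional rules, and the `stereotype` argument (assumed to be a precomputed label such as `"predicate"`) are all hypothetical. The sketch returns Penn Treebank-style tags as Python tuples and does not produce srcML markup.

```python
from __future__ import annotations

import re

# HYPOTHETICAL toy heuristics for illustration only; these are NOT the rules
# defined in the paper, and no srcML markup is produced here.


def split_identifier(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    words: list[str] = []
    for part in name.split("_"):
        # Capitalized or lowercase words, ALL-CAPS acronyms, and digit runs.
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [w.lower() for w in words]


def tag_noun_phrase(words: list[str]) -> list[tuple[str, str]]:
    """Toy rule: the last word is the head noun (NN); earlier words modify it (JJ)."""
    return [(w, "NN" if i == len(words) - 1 else "JJ") for i, w in enumerate(words)]


def tag_identifier(
    name: str, kind: str, stereotype: str | None = None
) -> list[tuple[str, str]]:
    """Tag identifier words using positional heuristics instead of an NLP tagger.

    kind is "method" or "attribute". stereotype is an ASSUMED, precomputed
    method stereotype label (e.g. "predicate"); it is not derived here.
    """
    words = split_identifier(name)
    if not words:
        return []
    if kind != "method":
        # Attribute/variable names are treated as noun phrases.
        return tag_noun_phrase(words)
    if stereotype == "predicate":
        # e.g. isEmpty: a leading "is"/"has"-style verb followed by a property.
        return [(words[0], "VBZ")] + [(w, "JJ") for w in words[1:]]
    # Other methods: a leading verb followed by a noun phrase (its object).
    return [(words[0], "VB")] + tag_noun_phrase(words[1:])
```

For example, `tag_identifier("getMaxValue", "method")` yields `[("get", "VB"), ("max", "JJ"), ("value", "NN")]`, while the same method name with a hypothetical `"predicate"` stereotype label would start with `VBZ`. The point of passing the stereotype in is that static-analysis facts about a method can override purely positional guesses, which mirrors the abstract's use of stereotype information without claiming to reproduce its rules.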