A large body of research has been dedicated to automated issue classification for Issue Tracking Systems (ITSs). Although existing approaches have shown promising performance, their design choices, including the textual fields, feature representation methods, and machine learning algorithms they adopt, have not been comprehensively compared and analyzed.
To bridge this gap, a research team led by Xuandong LI published their new research on 15 October 2024 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team conducted the first extensive study of automated issue classification, covering 9 state-of-the-art issue classification approaches. Their experimental results on a widely studied dataset reveal multiple practical guidelines for further advancing issue classification. Based on these guidelines, the team also proposed an advanced issue classification approach named DeepLabel, which achieves better performance than the existing issue classification approaches.
In the research, they decomposed the design choices of the 9 studied approaches and systematically investigated how each choice, namely the textual fields, the feature representation methods, and the machine learning (ML) algorithms, impacts the performance of issue classification. The experimental results revealed multiple practical guidelines: (1) Training separate models for the issue titles and descriptions and then combining these two models tends to achieve better performance for issue classification; (2) Word embedding with LSTM can better extract features from the textual fields in the issues, thus leading to better issue classification models; (3) Certain terms in the textual fields are helpful for building more discriminative classifiers between bug and non-bug issues; (4) The performance of the issue classification model is not sensitive to the choice of ML algorithm.
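Guideline (1) can be illustrated with a minimal sketch. This is not DeepLabel and not the team's code: it swaps in a small bag-of-words Naive Bayes classifier in place of word embedding with LSTM, and it assumes the two models are combined by averaging their predicted probabilities, a combination rule the summary does not specify. The toy issues, the `NaiveBayes` class, and the `classify_issue` helper are all hypothetical names introduced for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word counts.

    Illustrative stand-in only; the study's guideline (2) favors
    word embedding with LSTM for feature extraction.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_proba(self, text):
        """Return {class: probability} for a single text."""
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            log_scores[c] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def classify_issue(title_model, desc_model, title, description):
    """Combine the two field-specific models by averaging probabilities.

    Averaging is an assumption made here, not the paper's stated rule.
    """
    p_title = title_model.predict_proba(title)
    p_desc = desc_model.predict_proba(description)
    combined = {c: (p_title[c] + p_desc[c]) / 2 for c in p_title}
    return max(combined, key=combined.get), combined


# Made-up toy issues: (title, description, label).
issues = [
    ("App crashes on startup",
     "Null pointer exception thrown when loading config", "bug"),
    ("Wrong total in report",
     "The sum is incorrect and an error is logged", "bug"),
    ("Add dark mode",
     "Please support a dark theme option in settings", "non-bug"),
    ("Improve documentation",
     "Add examples to the user guide for new users", "non-bug"),
]
titles = [t for t, _, _ in issues]
descriptions = [d for _, d, _ in issues]
labels = [y for _, _, y in issues]

# One model per textual field, trained separately.
title_model = NaiveBayes().fit(titles, labels)
desc_model = NaiveBayes().fit(descriptions, labels)

label, probs = classify_issue(
    title_model, desc_model,
    "Crashes on save", "Exception thrown and error is logged",
)
print(label, probs)
```

Keeping the two fields in separate models lets each one learn the vocabulary typical of its field; in this sketch the short title and the longer description each cast one equally weighted vote.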
Based on the study results, they further proposed an advanced issue classification approach called DeepLabel. The large-scale experimental results corroborated the superiority of DeepLabel in comparison to the existing approaches in classifying bug and non-bug issues.
DOI: 10.1007/s11704-023-2771-z