Abstract:
The sentiment (i.e., general positive or negative attitude) towards another person, entity, or event
significantly influences a person’s decision-making process. It is one of the most important factors
shaping interactions among stakeholders across many application domains. Hence, various
approaches have been proposed to detect sentiment accurately. However, whether sentiments
expressed during software development activities, such as peer code review, can impact the
outcomes of those activities has not been formally analyzed. The
objective of this study is to identify the factors influencing review comments and the impact of
sentiments on the outcomes of the associated review requests. Toward this goal, we manually rated 1,000
review comments to build a training dataset and used that dataset to evaluate eight sentiment
analysis techniques. We found that a model based on Gradient Tree Boosting (GTB), a supervised
learning algorithm, provided the best accuracy in distinguishing among positive, negative, and neutral
review comments. To the best of our knowledge, this is the first approach to apply
supervised learning methods for sentiment detection in the context of code review. We achieved up to 74% accuracy
in sentiment detection, which is significantly higher than that of existing lexicon-based analyzers (50%
accuracy). We also validated our model against human raters.
Using our GTB-based model, we classified 10.7 million review comments from 10 popular
open source projects. The results suggest that larger code reviews (e.g., measured by the
number of changed files or code churn) are more likely to receive negative review comments, and that
negative review comments not only may increase the review interval (i.e., the time to complete a code
review) but may also decrease the code acceptance rate. Based on these findings, we recommend
that developers avoid submitting large code review requests and avoid authoring negative review
comments. The results also suggest that reviewers who author a higher number of negative review comments are likely to experience longer review intervals and lower acceptance rates.
We also found that core developers are likely to author more negative review comments than
peripheral developers. However, with respect to receiving negative review comments, we found
no discrepancy between core developers and peripheral developers.