Public defense of the doctoral dissertation of Mr. Mehdi Golzadeh
Dissertation title: Identifying development bots in social coding platforms
Thesis supervisor: Prof. Tom MENS
Dissertation abstract: Social coding platforms such as GitHub, through mechanisms such as pull requests (PRs), code reviews, issue reports, and comments, allow developers to contribute to many different software projects regardless of their geographical location. Developers use software bots to reduce their workload by automating tasks such as reporting continuous integration failures and test coverage, checking whether license agreements have been signed, triaging issues, reviewing code and pull requests, and updating dependencies. As a result, bots have become common in software development practice and are widely used in software projects. While bots offer many benefits, they also have limitations and potential drawbacks, which makes it necessary to study their impact on software development. However, there is no automatic way to distinguish human accounts from bot accounts in software development repositories. Studies that investigated the effect of bots relied on manual inspection or on simple heuristics, such as the presence of "bot" in the account name, to identify bots in software repositories; manual inspection does not scale, and such heuristics produce many false negatives. The main goal of this dissertation is to fill this gap by conducting empirical studies of software development bots and by developing tools and techniques to identify them automatically. We show that bots are prevalent in software repositories and frequently appear among the active contributors to such projects. We created a ground-truth dataset of GitHub bot accounts and human accounts, characterized bots based on their pull request and issue commenting activity, and trained a classification model to identify bots based on the discovered characteristics.
Since the model identified most bots effectively, we integrated it into an open-source tool called BoDeGHa to make it easier for practitioners and researchers to use. We then extended this work by developing 1) a classification model to identify bots in commit data, together with another tool that predicts the type of Git accounts based on their commit messages, 2) a classification model that identifies bot activities using NLP techniques, and 3) a classification model that further improves the predictions of BoDeGHa. Finally, we compared existing bot detection methods and proposed an ensemble model that combines all these techniques to identify bots.
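To illustrate why the name-based heuristic mentioned in the abstract is unreliable, here is a minimal Python sketch. The function `name_heuristic_says_bot` and all account logins and labels are hypothetical and chosen for illustration only; this is not part of BoDeGHa or of the classification models described in the dissertation.

```python
def name_heuristic_says_bot(login: str) -> bool:
    """Naive heuristic: flag an account as a bot if its login contains 'bot'."""
    return "bot" in login.lower()


# Hypothetical labelled accounts: (login, is_actually_bot)
accounts = [
    ("dependabot[bot]", True),
    ("release-helper", True),  # hypothetical bot whose name lacks "bot"
    ("abbott", False),         # hypothetical human whose name contains "bot"
    ("alice", False),
]

# Bots the heuristic misses (false negatives)
false_negatives = [login for login, is_bot in accounts
                   if is_bot and not name_heuristic_says_bot(login)]
# Humans the heuristic wrongly flags (false positives)
false_positives = [login for login, is_bot in accounts
                   if not is_bot and name_heuristic_says_bot(login)]

print("false negatives:", false_negatives)  # ['release-helper']
print("false positives:", false_positives)  # ['abbott']
```

Any bot whose login does not contain the substring "bot" is silently missed, which is why approaches based on account activity, such as the commenting behaviour used in this dissertation, are needed.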
7000 Mons, Belgium