Swiss AI Research Overview Platform

Melise - Machine Learning Assisted Software Development

Lay summary

The use of Machine Learning (ML) models to support software engineering tasks has attracted considerable interest from both academics and practitioners in recent years. However, once deployed, ML models often become static entities that are rarely updated or re-trained. We believe this is a missed opportunity: a vast amount of data is continuously generated during the development process, such as bug reports, user reviews, and repository events, and a static ML model cannot take advantage of it. In other words, while a specific software project keeps evolving, the ML model that is supposed to support its development practices remains unchanged. As a consequence, the model gradually loses accuracy and inevitably becomes obsolete.

The goal of our project is to investigate whether the data stream created during software development can be successfully exploited to re-train ML models and, consequently, to improve them. Furthermore, we plan to gather an additional, active data stream based on direct feedback from developers to continuously improve the models. For instance, every time a developer assesses a warning produced by the model, we can re-train the model and reward it if the warning was correct.
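
As a rough illustration of this active feedback loop, the sketch below (Python with scikit-learn) updates a bug-prediction model online each time a developer assesses one of its warnings, giving confirmed warnings a larger weight as a simple stand-in for a reward signal. The feature layout, weights, and function names are our own assumptions, not the project's actual tooling.

# Minimal sketch, not the project's tooling: a bug-prediction model updated online
# every time a developer assesses one of its warnings; confirmed warnings are
# up-weighted as a simple stand-in for a reinforcement-style reward.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)

# Bootstrap on historical data (random here, purely for illustration).
X_hist = rng.normal(size=(200, 5))        # e.g., per-file code and process metrics
y_hist = rng.integers(0, 2, size=200)     # 1 = file turned out to be buggy
model.partial_fit(X_hist, y_hist, classes=np.array([0, 1]))

def on_warning_assessed(features, confirmed_bug):
    """Re-train on a single assessed warning, rewarding correct ones more strongly."""
    weight = 2.0 if confirmed_bug else 1.0   # assumed reward scheme
    model.partial_fit(
        np.asarray([features]),
        np.asarray([1 if confirmed_bug else 0]),
        sample_weight=np.asarray([weight]),
    )

# Example: a developer confirms that a flagged file really contained a bug.
on_warning_assessed(rng.normal(size=5), confirmed_bug=True)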

In this project, we will focus on the reference problems of bug prediction and effort estimation. We claim that software evolution and ML model evolution need to go hand in hand, and that feedback loops are key to achieving this. As a result, we will devise the necessary foundations for ML-assisted software development that learns from its context and avoids the concept drift typical of ML models.
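
The passive side of such a loop could, under similar assumptions, look like the following sketch: labelled examples mined from repository events are kept in a sliding window, and the bug-prediction model is periodically re-trained so that it follows the project's evolution instead of drifting away from it. The window size, re-training interval, and function names are illustrative.

# Minimal sketch of the passive data stream, with invented names: examples derived
# from repository events (e.g., files touched by bug-fixing commits) feed a sliding
# window, and the model is re-fitted periodically on that window.
from collections import deque
import numpy as np
from sklearn.ensemble import RandomForestClassifier

WINDOW_SIZE = 1000                 # assumed window; would be tuned per project
RETRAIN_EVERY = 50                 # re-fit after this many new examples
window_X = deque(maxlen=WINDOW_SIZE)
window_y = deque(maxlen=WINDOW_SIZE)
model = None
seen_since_retrain = 0

def on_repository_event(features, is_buggy):
    """Store one newly labelled example and periodically re-fit the model."""
    global model, seen_since_retrain
    window_X.append(features)
    window_y.append(int(is_buggy))
    seen_since_retrain += 1
    if seen_since_retrain >= RETRAIN_EVERY and len(set(window_y)) > 1:
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(np.asarray(window_X), np.asarray(window_y))
        seen_since_retrain = 0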

Abstract

The use of Machine Learning (ML) models to support software engineering tasks has witnessed considerable interest from both academics and practitioners in recent years. Building an ML model takes several steps, ranging from the elicitation of the requirements, feature engineering, and model training to evaluation and deployment. This typical pipeline often contains feedback loops: for instance, unsatisfactory training results may loop back to the feature engineering phase. However, once deployed, ML models often become static entities, in the sense that they are rarely updated or re-trained. We believe this is a huge missed opportunity: a vast amount of data is continuously generated during the development process, e.g., issue tracker posts, user reviews, or repository events, and a static ML model cannot gain any advantage from it. In other words, while a specific software project keeps evolving, the ML model supposed to support its development practices remains unchanged. As a consequence, this can lead to a loss of accuracy of the model itself, which slowly but inevitably becomes obsolete.

The goal of our project is to investigate whether the data stream created in software development can be successfully exploited to re-train ML models and, consequently, to improve them. However, such a data stream is not the only piece of information that we aim to use to accomplish this task. While it constitutes passive information, we plan to gather an additional active stream provided by the developers. Indeed, we want to implement a user-based feedback loop mechanism with the goal of continuously improving ML models. Our goal is to use it in a reinforcement learning fashion: every time a warning produced by the model is assessed by a developer, we can re-train the model and, for instance, reward it in case correct warnings were generated.

In this project, we will focus on bug prediction and effort estimation as reference ML tasks. We claim that software evolution and ML model evolution need to go hand in hand, and that feedback loops are key for that. We plan to create a comprehensive benchmark dataset that can be used for selecting generalizable and effective defect prediction approaches. We will devise a feedback loop exploiting both active (user-based) and passive data streams, with the goal of continuously improving ML models. As a result, we will devise the necessary foundations for ML-assisted software development that learns from its context and avoids the concept drift typical of ML models.
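
One simple way to notice the concept drift mentioned above, offered here as an assumption rather than the project's actual design, is a test-then-train (prequential) monitor: each new labelled example first evaluates the deployed model and then updates it, and a drop in rolling accuracy signals that a fuller re-training may be needed. The sketch below illustrates this with scikit-learn and invented thresholds.

# Minimal sketch of a test-then-train ("prequential") drift monitor; window size and
# alarm threshold are illustrative assumptions.
from collections import deque
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(random_state=0)
model.partial_fit(np.zeros((2, 5)), np.array([0, 1]), classes=np.array([0, 1]))  # warm start

recent_hits = deque(maxlen=200)     # rolling window of prediction outcomes
DRIFT_THRESHOLD = 0.6               # assumed alarm level

def on_new_example(x, y):
    """Evaluate, then learn from, one example arriving from the development stream."""
    hit = int(model.predict(np.asarray([x]))[0] == y)
    recent_hits.append(hit)
    model.partial_fit(np.asarray([x]), np.asarray([y]))
    if len(recent_hits) == recent_hits.maxlen and np.mean(recent_hits) < DRIFT_THRESHOLD:
        print("Possible concept drift: consider re-training on recent data only.")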

Last updated: 20.06.2022

Harald Gall
  Pasquale Salza