Federated learning in neuroimage segmentation

Abstract:
Applying deep learning to medical image analysis could provide valuable tools to assist physicians: automating repetitive and tedious tasks, offering complementary diagnostic and prognostic suggestions, and potentially stimulating research in the medical field. However, the often poor generalization of models trained in the lab remains a major barrier to their use in clinical practice. It stems mainly from the small size of current medical image databases by deep learning standards, which limits their representativeness and generality. Building larger inter-institutional and international databases is hindered by the high sensitivity of health data: strict privacy regulations worldwide, together with human and systemic barriers, make such datasets extremely difficult to assemble in the medical field.

Federated learning was proposed in 2016 as a decentralized, privacy-preserving machine learning paradigm. It offers a partial solution by enabling different healthcare institutions to collaboratively train large-scale deep learning models on their combined data without centralizing it, with a limited privacy budget and legal burden. While the pioneering federated algorithm, FedAvg, performs well on most tasks, its use has raised many technical questions, concerning fairness, robustness to outliers and privacy, among others. One of them is the strong constraint placed on the local data distributions of a federation: each institution owns only a fraction of the data, which is often unrepresentative and significantly biased. Such heterogeneity has been shown to significantly affect training convergence.
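
As an illustration of the aggregation step at the heart of FedAvg: in each round, every institution trains a copy of the global model on its own data, and the server replaces the global parameters with the average of the returned parameters, weighted by each institution's number of training samples. The minimal sketch that follows shows only this server-side averaging on flattened parameter vectors, in plain Python; the function name `fedavg_aggregate` and the list-based representation are illustrative assumptions, not the implementation used in this thesis.

```python
from typing import List


def fedavg_aggregate(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Hypothetical FedAvg server step: average the clients' flattened
    parameter vectors, weighting each client by its number of local samples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    global_weights = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # fraction of all training samples held by this client
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights
```

With two institutions holding 1 and 3 samples, `fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`: the larger institution pulls the global model towards its own parameters, which is also why strongly biased local data can steer training.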

This thesis is mainly exploratory and revolves around the following research question: how can we perform efficient federated learning for neuroimage segmentation in a realistic cross-silo (10 to 100 collaborating institutions) and heterogeneous (different acquisition and labelling protocols per institution) scenario? The organizers of the Brain Tumor Segmentation Challenge (BraTS) released the institution-wise partitioning of this popular public dataset, creating the first (and, at the time, only) large, public and realistic federated dataset for this neuroimage segmentation task: FeTS 2021 and FeTS 2022. We made this task the focal point of the thesis to study cross-silo heterogeneous federated segmentation.

We first built an extensive benchmark of federated learning methods on the FeTS 2022 dataset, evaluating for the first time personalized and clustered methods adapted to this task. We showed that FedAvg already performs very well but can be slightly outperformed by various global, personalized and clustered methods, each with its own limitations. We complemented this work with a principled methodology for comparing federated learning methods on such a task in all its complexity.

Next, we introduced a novel sample-level clustered federated fine-tuning algorithm for brain tumor segmentation, based on whole-brain radiomics. By performing a server-side clustering analysis of the radiomics features extracted from each exam, we fine-tuned, in a federated fashion, one model per type of acquisition protocol, yielding slightly improved segmentation performance. Although very specific to brain tumor segmentation on MRI, this result motivated further study of clustered federated learning.
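
The server-side step can be pictured as follows: each institution sends one radiomics feature vector per exam, the server standardizes and clusters the pooled vectors, and returns one cluster index per exam to each institution, so that each cluster-specific model is then fine-tuned only on the exams assigned to it. This is a hypothetical sketch: the clustering algorithm is not specified here, so plain k-means with deterministic farthest-point initialization is used as a stand-in, and the names `cluster_exams`, `kmeans` and `standardize`, as well as the two-dimensional toy features, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def standardize(points: List[Vector]) -> List[Vector]:
    """Z-score each feature so that features on different scales weigh equally."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # constant features get a std of 1.0 to avoid division by zero
    stds = [(sum((p[d] - means[d]) ** 2 for p in points) / n) ** 0.5 or 1.0 for d in range(dims)]
    return [[(p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, n_iter: int = 20) -> Tuple[List[int], List[Vector]]:
    """Stand-in clustering: k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next centroid = point farthest from its nearest existing centroid
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: _dist2(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_exams(features_per_site: Dict[str, List[Vector]], k: int) -> Dict[str, List[int]]:
    """Hypothetical server step: pool per-exam feature vectors from all sites,
    cluster them, and send each site back one cluster index per exam."""
    sites = list(features_per_site)
    pooled = [f for s in sites for f in features_per_site[s]]
    labels, _ = kmeans(standardize(pooled), k)
    assignment, start = {}, 0
    for s in sites:
        n = len(features_per_site[s])
        assignment[s] = labels[start:start + n]
        start += n
    return assignment
```

For instance, with `{"site_a": [[0.0, 1.0], [0.2, 1.1], [5.0, 9.0]], "site_b": [[5.1, 9.2], [0.1, 0.9]]}` and `k=2`, the call returns `{"site_a": [0, 0, 1], "site_b": [1, 0]}`: exams with a similar (toy) acquisition profile share a cluster regardless of which institution holds them, which is what makes the clustering sample-level rather than institution-level.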

Finally, we defined a general framework for sample-level clustered cross-silo federated learning under covariate shift in image segmentation. We explored clustering methods in the gradient space of a model during training and found a surprisingly precise correspondence between the resulting clusters and the known data origins. For this work, we stepped away from medical image analysis and studied this general paradigm on toy datasets as well as on Cityscapes and GTA5, a standard domain adaptation benchmark for segmentation in the deep learning community.
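
To make the idea of clustering in gradient space concrete, the sketch that follows computes per-sample gradients of a toy linear model under a squared loss and groups samples whose gradients point in similar directions, measured by cosine similarity. It is a hypothetical illustration only: the greedy cosine-threshold grouping, the `threshold=0.8` value and the function names are assumptions chosen for brevity and are not necessarily the clustering methods studied in the thesis, which operate on the gradients of a segmentation network during training.

```python
import math
from typing import List

Vector = List[float]


def per_sample_grad(w: Vector, x: Vector, y: float) -> Vector:
    """Gradient of the squared error (w.x - y)^2 with respect to w, for one sample."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * residual * xi for xi in x]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def group_by_gradient(grads: List[Vector], threshold: float = 0.8) -> List[int]:
    """Assumed greedy scheme: each gradient joins the first group whose mean
    direction has cosine similarity >= threshold, otherwise it opens a new group."""
    labels: List[int] = []
    sums: List[Vector] = []  # running sum of unit gradients per group
    for g in grads:
        norm = math.sqrt(sum(v * v for v in g)) or 1.0
        unit = [v / norm for v in g]
        best = next((j for j, s in enumerate(sums) if cosine(unit, s) >= threshold), None)
        if best is None:
            sums.append(unit)
            labels.append(len(sums) - 1)
        else:
            sums[best] = [a + b for a, b in zip(sums[best], unit)]
            labels.append(best)
    return labels
```

With `w = [0.0, 0.0]` and four samples from two toy domains whose inputs occupy different directions (`[1.0, 0.1]` and `[0.9, 0.0]` versus `[0.0, 1.0]` and `[0.1, 0.8]`, all with target `1.0`), the grouping returns `[0, 0, 1, 1]`, recovering each sample's domain of origin. The greedy assignment depends on sample order and cosine similarity is sign-sensitive, so a practical method would likely need to be more robust.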

Jury:
LORENZI Marco, Research Scientist (Chargé de recherche, HDR), Epione, Centre Inria d'Université Côte d'Azur, Referee
SÜDHOLT Mario, Professor, Institut Mines-Télécom, Referee
BUHMANN Joachim, Professor Emeritus, ETH Zürich, Examiner
LARTIZIEN Carole, Research Director (DR), CNRS, Co-director
DUFFNER Stefan, Professor (Professeur des Universités), INSA Lyon, LIRIS, Co-director

Keywords: Machine learning, Federated learning, Segmentation, Neuroimage analysis