Federated Learning (FL) is a distributed machine learning paradigm that enables models to be trained across numerous clients or organizations without transferring local data. It addresses concerns about data privacy and ownership by keeping raw data on the client’s device and sharing only model updates with a central server. Despite its benefits, federated learning faces unique challenges, such as data heterogeneity, computation and communication overheads, and the need for personalized models. These challenges can reduce model performance, lower efficiency, and lengthen training times.
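The exchange of model updates described above can be illustrated with a minimal sketch of federated averaging (FedAvg), a standard FL baseline. It is not one of the algorithms developed in this thesis. The linear model, the synthetic client data, and the names `local_update`, `federated_average`, and `make_client_data` are assumptions made purely for illustration.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Client step: a few epochs of gradient descent on private local data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local dataset size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client_data(rng, n, x_low, x_high):
    """Hypothetical data: y = 3x + 1 plus small noise, with a client-specific x range."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]


rng = random.Random(0)
# Different input ranges and sizes per client mimic heterogeneous local data.
clients = [
    make_client_data(rng, 20, -1.0, 0.0),
    make_client_data(rng, 40, 0.0, 1.0),
    make_client_data(rng, 30, -1.0, 1.0),
]

global_weights = (0.0, 0.0)
for _ in range(200):
    # Only the model parameters leave each client; the raw (x, y) pairs never do.
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)
```

In this toy setting every client shares the same underlying relationship, so plain averaging suffices. When client data distributions differ more strongly, local models can drift apart, which is the kind of heterogeneity problem that motivates personalized and clustered approaches.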
This thesis addresses these issues from theoretical, empirical, and practical perspectives and makes four contributions. First, we addressed data heterogeneity in federated feature selection for horizontal FL by developing algorithms based on mutual information and multi-objective optimization. Second, we tackled system heterogeneity, that is, variations in computation, storage, and communication capabilities among clients, by proposing a solution that ranks devices using multi-objective optimization for efficient, fair, and adaptive participation in model training. Third, we addressed client drift caused by data heterogeneity in a hierarchical federated learning system and introduced a personalized federated learning approach to mitigate it. Finally, we focused on two key applications that benefit from the FL framework but suffer from data heterogeneity. The first application predicts the level of autobiographical memory recall of events associated with a lifelog image using clustered personalized FL algorithms, which help select effective lifelog image cues for clients' cognitive interventions. The second application is a personal image privacy advisor for each client, which faces data scarcity in addition to data heterogeneity. For this application, we developed a daisy chain-enabled clustered FL algorithm that predicts whether an image should be shared, kept private, or recommended for sharing by a third party.
Our findings show that the proposed methods significantly outperform current state-of-the-art FL algorithms in model performance, training efficiency, and scalability.
Page Responsible: Frank Drewes 2024-11-13