Abstract
In Federated Learning (FL) systems, clients share updates derived from their local data with the server while preserving privacy. The server aggregates these updates to refine the global model. However, not all client data are relevant to the learning objective, and incorporating updates derived from irrelevant data can harm model performance. The selection of training samples significantly impacts model performance: datasets with errors, skewed distributions, or low diversity can lead to inaccurate and unstable models. To address these issues, in this paper we propose a data quality evaluation model that assesses the quality of datasets in FL systems and dynamically selects high-quality data samples for FL training based on intrinsic and contextual data quality dimensions. Additionally, we design an importance-based interpretable feature selection model and a data quality-based dynamic client selection model that employs Nash equilibrium and joint differential privacy (DP). This approach encourages clients with high-quality data to participate in FL training, thereby improving the overall quality of the training process. We leverage the concept of Split FL to efficiently distribute model training between the client and the server. Our comprehensive approach ensures the selection of high-quality data and features while motivating the participation of reliable clients, ultimately improving the performance and stability of FL models. Extensive experimental results show that the proposed model significantly outperforms baseline and comparative schemes.
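The abstract names the building blocks but not their formulas. As a minimal illustration only, the sketch below scores each client's dataset on two assumed intrinsic dimensions (completeness and consistency) and one assumed contextual dimension (label diversity), then filters clients by a quality threshold. Every dimension, weight, threshold, and function name here is a hypothetical stand-in, not the paper's actual formulation; the Nash-equilibrium incentive and joint DP mechanisms the abstract mentions are likewise omitted.

```python
import numpy as np

def intrinsic_quality(X: np.ndarray) -> float:
    """Illustrative intrinsic dimensions (assumed, not from the paper):
    completeness = fraction of non-missing entries,
    consistency  = inverse of mean per-feature dispersion."""
    completeness = 1.0 - np.isnan(X).mean()
    stds = np.nanstd(X, axis=0)
    consistency = 1.0 / (1.0 + stds.mean())
    return 0.5 * completeness + 0.5 * consistency

def contextual_quality(y: np.ndarray, n_classes: int) -> float:
    """Illustrative contextual dimension (assumed): label diversity,
    measured as the normalized entropy of the class distribution."""
    if n_classes < 2:
        return 1.0  # a single-class task carries no diversity signal
    counts = np.bincount(y, minlength=n_classes).astype(float)
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    entropy = -(probs * np.log(probs)).sum()
    return entropy / np.log(n_classes)

def select_clients(datasets, n_classes, threshold=0.6):
    """Keep clients whose combined quality score passes a threshold.
    The equal weighting and the 0.6 cutoff are arbitrary assumptions."""
    selected = []
    for cid, (X, y) in datasets.items():
        score = 0.5 * intrinsic_quality(X) + 0.5 * contextual_quality(y, n_classes)
        if score >= threshold:
            selected.append((cid, score))
    return selected

# Toy usage: a diverse client passes, a single-label client is filtered out.
rng = np.random.default_rng(0)
datasets = {
    "client_a": (rng.normal(size=(100, 5)), rng.integers(0, 3, size=100)),
    "client_b": (rng.normal(size=(100, 5)), np.zeros(100, dtype=int)),
}
print(select_clients(datasets, n_classes=3))
```

The toy run shows the intended behavior under these assumptions: the client with balanced labels scores well on the contextual dimension, while the client whose labels are all one class is excluded, mirroring the abstract's goal of keeping low-diversity data out of training.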
| Original language | English |
| --- | --- |
| Journal | IEEE Transactions on Consumer Electronics |
| DOIs | |
| Publication status | Accepted/In press - 2025 |
Keywords
- Client Selection
- Data Quality
- Feature Importance
- Federated Learning
- Split FL
- XAI
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering