TY - JOUR
T1 - Leveraging Game Theory and XAI for Data Quality-Driven Sample and Client Selection in Trustworthy Split Federated Learning
AU - Tariq, Asadullah
AU - Sallabi, Farag
AU - Serhani, Mohamed Adel
AU - Qayyum, Tariq
AU - Barka, Ezedin S.
N1 - Publisher Copyright:
© 1975-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - In Federated Learning (FL) systems, clients share updates derived from their local data with the server while maintaining privacy. The server aggregates these updates to refine the global model. However, not all client data may be relevant to the learning objective, and incorporating updates from irrelevant data can harm the model’s performance. The selection of training samples significantly impacts model performance, as datasets with errors, skewed distributions, or low diversity can lead to inaccurate and unstable models. To address these issues, in this paper, we propose a data quality evaluation model to assess the quality of datasets in FL systems. This model aims to dynamically select high-quality data samples for FL training by utilizing intrinsic and contextual data quality dimensions. Additionally, we design an importance-based interpretable feature selection model and a data quality-based dynamic client selection model that employs Nash equilibrium and joint differential privacy (DP). This approach encourages clients with high-quality data to participate in FL training, thereby improving the overall quality of the training process. We leverage the concept of Split FL to efficiently distribute the model training between the client and server. Our comprehensive approach ensures the selection of high-quality data and features while motivating the participation of reliable clients, ultimately leading to improved performance and stability of FL models. Extensive experimental results show that our proposed model significantly outperforms baseline and comparative schemes.
AB - In Federated Learning (FL) systems, clients share updates derived from their local data with the server while maintaining privacy. The server aggregates these updates to refine the global model. However, not all client data may be relevant to the learning objective, and incorporating updates from irrelevant data can harm the model’s performance. The selection of training samples significantly impacts model performance, as datasets with errors, skewed distributions, or low diversity can lead to inaccurate and unstable models. To address these issues, in this paper, we propose a data quality evaluation model to assess the quality of datasets in FL systems. This model aims to dynamically select high-quality data samples for FL training by utilizing intrinsic and contextual data quality dimensions. Additionally, we design an importance-based interpretable feature selection model and a data quality-based dynamic client selection model that employs Nash equilibrium and joint differential privacy (DP). This approach encourages clients with high-quality data to participate in FL training, thereby improving the overall quality of the training process. We leverage the concept of Split FL to efficiently distribute the model training between the client and server. Our comprehensive approach ensures the selection of high-quality data and features while motivating the participation of reliable clients, ultimately leading to improved performance and stability of FL models. Extensive experimental results show that our proposed model significantly outperforms baseline and comparative schemes.
KW - Federated learning
KW - Split FL
KW - XAI
KW - client selection
KW - data quality
KW - feature importance
UR - https://www.scopus.com/pages/publications/85218806970
UR - https://www.scopus.com/pages/publications/85218806970#tab=citedBy
U2 - 10.1109/TCE.2025.3543209
DO - 10.1109/TCE.2025.3543209
M3 - Article
AN - SCOPUS:85218806970
SN - 0098-3063
VL - 71
SP - 6686
EP - 6699
JO - IEEE Transactions on Consumer Electronics
JF - IEEE Transactions on Consumer Electronics
IS - 2
ER -