Original software publication| Volume 15, 100469, March 2023

# pFedDef: Characterizing evasion attack transferability in federated learning

Open Access | Published: January 19, 2023

## Highlights

• Federated learning allows multiple clients to jointly train a neural network.
• Clients in federated learning can perform evasion attacks against other clients.
• Federated adversarial training increases robustness against evasion attacks.
• pFedDef library combines personalized federated learning with adversarial training.
• Differences between client models due to personalization increase robustness.

## Abstract

Federated learning jointly trains a model across multiple clients, leveraging information from all of them. However, client models are vulnerable to attacks during training and testing. We introduce the pFedDef library, which analyzes and addresses the issue of adversarial clients performing internal evasion attacks at test time to deceive other clients. pFedDef characterizes the transferability of internal evasion attacks for different learning methods and analyzes the trade-off between model accuracy and robustness to these attacks. We show that personalized federated adversarial training increases relative robustness by 60% compared to federated adversarial training and performs well even under limited system resources.

## Keywords

Code metadata

| Description | Value |
| --- | --- |
| Current code version | 1.0 |
| Permanent link to code/repository used for this code version | https://github.com/SoftwareImpacts/SIMPAC-2022-278 |
| Permanent link to Reproducible Capsule | https://codeocean.com/capsule/8639919/tree/v1 |
| Legal Code License | Apache 2.0 License |
| Code versioning system used | git |
| Software code languages, tools, and services used | python, pytorch |
| Compilation requirements, operating environments & dependencies | https://github.com/tj-kim/pFedDef_v1/blob/main/requirements.txt |
| If available Link to developer documentation/manual | https://github.com/tj-kim/pFedDef_v1/blob/main/README.md |
| Support email for questions | mailto:[email protected]; mailto:[email protected] |

## 1. Introduction to internal evasion attacks in the federated learning setting

Federated learning has emerged as a distributed training paradigm [Li et al., 2021] that allows multiple clients to collectively train a model. The approach enables clients to share model updates without directly sharing privacy-sensitive data (e.g., health data from biological sensors or usage data from smartphones). However, an adversarial client can exploit the shared information, in particular the fact that each participating client periodically receives a copy of the learned model, to attack other benign clients. For instance, evasion attacks [Makelov et al., 2017] perturb an input $x$ to a trained model by a small amount $\delta$ that is undetectable to human users but changes the model output for $x+\delta$ at test time.
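Stated as a formula, an evasion attack can be written as the following search problem. This is a sketch under assumptions: the classifier $h$, the true label $y$, the budget $\epsilon$, and the choice of the $\ell_\infty$ norm are illustrative and are not notation taken from this article.

```latex
\text{find } \delta
\quad \text{such that} \quad
\|\delta\|_{\infty} \le \epsilon
\quad \text{and} \quad
h(x + \delta) \ne y .
```

The norm bound stands in for "undetectable to human users": a smaller $\epsilon$ makes the perturbation harder to notice but also harder to exploit.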
Federated learning on its own is not robust against evasion attacks, which may be internal attacks generated by member clients or external attacks generated by outside adversaries. Federated learning is particularly exposed to internal adversaries because of the many (sometimes thousands of) clients involved. Because weight information is shared through federated learning, internal adversarial clients have the model parameter information needed to craft effective evasion attacks. For example, email spam filters can be trained through federated learning, and a malicious client may use its local spam filter model to craft messages that bypass the filters of other clients [Kuchipudi et al., 2020] (Fig. 1). In another scenario, a face recognition model trained through federated learning can be deceived by a participating adversary that leverages model knowledge to perform physically realizable attacks through the use of accessories (e.g., adding glasses with a specific pattern to face images) [Sharif et al., 2016]. We thus assume that attackers can access the models of compromised clients (e.g., by participating in the federated learning process). They can then generate adversarial perturbations on their own trained models at test time and feed the perturbed data to other clients. Since the attack can be computed entirely locally, other clients and servers cannot detect such adversaries during training. Attacks may be untargeted, where the perturbed input is classified as any incorrect label, or targeted, where the adversary aims to have the perturbed input classified as a specific label.
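The difference between untargeted and targeted attacks can be illustrated with a minimal projected gradient descent (PGD) sketch. It assumes a tiny linear softmax classifier written in plain Python rather than a PyTorch network, and all names (`pgd`, `loss_grad_x`, `eps`, `step`) are hypothetical; this is not the attack code shipped in the pFedDef library.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    # One logit per class: W[k] . x + b[k]
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def loss_grad_x(W, b, x, label):
    # Gradient of the cross-entropy loss w.r.t. the input x:
    # dL/dx = W^T (softmax(z) - onehot(label))
    p = softmax(logits(W, b, x))
    dz = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def pgd(W, b, x, label, eps=0.5, step=0.1, iters=20, targeted=False):
    # Untargeted: ascend the loss of the true label.
    # Targeted: descend the loss of the chosen target label.
    sign = lambda v: (v > 0) - (v < 0)
    direction = -1.0 if targeted else 1.0
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_x(W, b, x_adv, label)
        x_adv = [xa + direction * step * sign(gj) for xa, gj in zip(x_adv, g)]
        # Project back onto the L-infinity ball of radius eps around x
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical two-class model and input (true label 0)
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [0.6, 0.4]
x_untargeted = pgd(W, b, x, label=0)               # any wrong label is a success
x_targeted = pgd(W, b, x, label=1, targeted=True)  # must land on label 1
```

In an internal attack, the adversary would run such a procedure on its own copy of the model and then present the perturbed input to another client's model.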
We empirically compare the threats of external and internal evasion attacks by evaluating their success on FedAvg (all clients train the same model), local training (each client trains a separate model on its data), and FedEM [Marfoq et al., 2021], a personalized method where each client maintains different but related models fine-tuned to its local data set. The effects of internal and external attacks are shown in Table 1. In this experiment, 80 clients are trained together on the CIFAR-10 data set for 150 rounds. We assume that both adversarial training and evasion attacks are performed through the commonly used projected gradient descent (PGD) method [Makelov et al., 2017]. The metric Internal Untargeted Adv. Acc measures the classification accuracy for all clients given untargeted adversarial example inputs crafted from the models of all other participating clients. The metric External Untargeted Adv. Acc measures the classification accuracy for all clients given adversarial example inputs crafted from a single model of a client that did not participate in the federated learning process. Lastly, Internal Targeted Adv. Hit measures the success of targeted attacks; a higher value indicates lower overall robustness against targeted evasion attacks. Across all attack types, we observe a trade-off between accuracy and robustness to evasion attacks. Locally trained models have low accuracy and high robustness, whereas models trained using FedAvg have high accuracy and low robustness. FedEM tries to balance this trade-off by training models collaboratively but personalizing them to each client's data distribution.
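Table 1. Transferability of internal and external evasion attacks given different federated training algorithms. As more information is shared between clients, the overall robustness against evasion attacks decreases. Standard deviation values are given in parentheses.

| Training method | Test Acc. | Internal Untargeted Adv. Acc | External Untargeted Adv. Acc | Internal Targeted Adv. Hit |
| --- | --- | --- | --- | --- |
| FedAvg | 0.93 (0.02) | 0.00 (0.00) | 0.02 (0.02) | 0.87 (0.07) |
| Local | 0.38 (0.12) | 0.33 (0.12) | 0.38 (0.12) | 0.11 (0.12) |
| FedEM | 0.76 (0.05) | 0.13 (0.09) | 0.21 (0.05) | 0.50 (0.26) |

Table 2. Performance for different federated adversarial training methods. Standard deviation values are given in parentheses.

| Training method | Test Acc. | Internal Untargeted Adv. Acc | External Untargeted Adv. Acc | Internal Targeted Adv. Hit |
| --- | --- | --- | --- | --- |
| FAT | 0.80 (0.05) | 0.30 (0.05) | 0.77 (0.06) | 0.29 (0.07) |
| Local Adv. | 0.30 (0.09) | 0.28 (0.09) | 0.28 (0.09) | 0.10 (0.12) |
| pFedDef | 0.66 (0.07) | 0.48 (0.12) | 0.62 (0.07) | 0.13 (0.11) |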
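The internal metrics described above aggregate results over every attacker and victim pair. A minimal sketch of that aggregation follows, assuming the per-pair results are already stored in a matrix; the function name `internal_metric`, the matrix layout, and the example numbers are hypothetical and are not taken from the library's evaluation notebooks.

```python
from statistics import mean, pstdev

def internal_metric(rate):
    """Aggregate an attacker-by-victim matrix into a mean and standard deviation.

    rate[a][v] is the value measured on victim v for adversarial examples
    crafted on attacker a's model: classification accuracy for
    Internal Untargeted Adv. Acc, or target-label hit rate for
    Internal Targeted Adv. Hit. Diagonal entries (a == v) are ignored,
    since a client does not attack itself in this setting.
    """
    n = len(rate)
    per_victim = [
        mean(rate[a][v] for a in range(n) if a != v)
        for v in range(n)
    ]
    return mean(per_victim), pstdev(per_victim), per_victim

# Hypothetical 3-client accuracy matrix (rows: attacker, columns: victim)
adv_acc = [
    [0.0, 0.2, 0.4],
    [0.1, 0.0, 0.3],
    [0.5, 0.3, 0.0],
]
avg, std, per_victim = internal_metric(adv_acc)
```

Whether the reported standard deviations are taken over victims, over pairs, or over runs is not stated here, so the population standard deviation over victims is only one possible choice.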

## 2. The pFedDef library: Adversarial training with personalized federated learning

To improve robustness against internal attacks, we aim to leverage the innate robustness that comes from clients holding different models, which local and personalized federated training exhibit in Table 1. We introduce pFedDef, a novel method of federated learning that combines personalized federated learning with adversarial training while respecting the varying resources of different clients (Algorithm 1 from [Kim et al., 2022]). Resource-constrained clients can train on fewer adversarial examples. To compensate for this reduction, robustness propagation is performed: resource-ample clients whose data distributions are similar to those of resource-constrained clients increase their adversarial data proportion. Algorithm 2 from [Kim et al., 2022] presents a heuristic solution for robustness propagation, although different heuristics can be used.
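The idea of robustness propagation can be sketched with a simple redistribution rule. The code below is an illustrative heuristic only and is not Algorithm 2 from the cited work: the function `propagate_robustness`, the `target` proportion, the per-client `caps`, and the `similarity` matrix are all assumptions made for this example.

```python
def propagate_robustness(target, caps, similarity):
    """Return an adversarial data proportion for each client.

    target:           adversarial proportion every client would ideally use
    caps[i]:          largest proportion client i can afford (resource limit)
    similarity[i][j]: similarity of the data distributions of clients i and j
    """
    n = len(caps)
    # Each client starts at the target, or at its cap if it cannot reach it.
    props = [min(target, c) for c in caps]
    for i in range(n):
        deficit = target - props[i]
        if deficit <= 0:
            continue  # client i is not resource-constrained
        # Clients that still have spare capacity can help client i.
        helpers = [j for j in range(n) if j != i and caps[j] > props[j]]
        total_sim = sum(similarity[i][j] for j in helpers)
        if total_sim == 0:
            continue
        for j in helpers:
            # Similar clients take on a larger share of the missing proportion.
            extra = deficit * similarity[i][j] / total_sim
            props[j] = min(caps[j], props[j] + extra)
    return props

# Hypothetical setting: client 0 is constrained, clients 1 and 2 are not.
caps = [0.2, 1.0, 1.0]
similarity = [
    [1.0, 0.75, 0.25],
    [0.75, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
props = propagate_robustness(0.5, caps, similarity)
```

Here client 1, whose data is most similar to that of the constrained client 0, absorbs most of the missing adversarial proportion.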
As seen in Table 2, utilizing pFedDef as a defense method achieves robustness (Internal Untargeted Adv. Acc) that is 60% higher than that of federated adversarial training (FAT), which does not use personalization. pFedDef also increases robustness against targeted attacks by 44% compared to FAT. In lower resource settings, the robustness propagation scheme increases accuracy against internal evasion attacks compared to training without robustness propagation, as seen in Fig. 2.
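The 60% figure can be reproduced from the Table 2 entries 0.48 (pFedDef) and 0.30 (FAT), which we take here to be the Internal Untargeted Adv. Acc values of the two methods:

```latex
\frac{\text{Adv. Acc}_{\text{pFedDef}} - \text{Adv. Acc}_{\text{FAT}}}{\text{Adv. Acc}_{\text{FAT}}}
= \frac{0.48 - 0.30}{0.30}
= 0.60 .
```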
The convergence of pFedDef is evaluated in Fig. 3. We expect the convergence of pFedDef to resemble that of the FedEM framework it is built upon, where, given the number of communication rounds $K$, $\frac{1}{K}\sum_{k=1}^{K}\mathbb{E}\left\|\nabla_{\Theta} f(\Theta^{k},\Pi^{k})\right\|_{F}^{2}\le O\!\left(\frac{1}{\sqrt{K}}\right)$ (Theorem 3.2 of [Marfoq et al., 2021]). The overall training and test accuracy fall slightly when FedEM is combined with adversarial training. Although the convergence of adversarial training is an open area of research, [Gao et al., 2019] indicates that pFedDef suffers some accuracy loss because adversarial training requires higher model capacity. The authors of [Wang et al., 2021] note that using high-quality adversarial training points (e.g., multi-step PGD attacks) allows adversarial training to converge, which is aided by the resource management and robustness propagation of pFedDef.
The communication overhead of pFedDef is identical to that of FedEM. FedEM has a high communication overhead because each client trains multiple models, each corresponding to an underlying data distribution. Replacing FedEM in pFedDef with a different personalization framework, such as split learning or personalized layers, can reduce the communication overhead [Arivazhagan et al., 2019]. Gradient discretization may be used to further reduce communication overhead [Chen et al., 2021].
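To make the overhead argument concrete, the sketch below counts the bits a client sends per round and shows uniform quantization as one simple form of gradient discretization. The helper names, the 3-model and 1000-parameter figures, and the choice of uniform quantization are assumptions for illustration; the cited work may use a different scheme.

```python
def payload_bits(num_params, num_models, bits_per_value):
    # Bits sent per round when every parameter of every model is transmitted.
    return num_params * num_models * bits_per_value

def quantize(grad, bits=4):
    """Map each gradient entry to one of 2**bits evenly spaced levels."""
    lo, hi = min(grad), max(grad)
    levels = (1 << bits) - 1
    if hi == lo:
        return [0] * len(grad), lo, 0.0
    scale = (hi - lo) / levels
    codes = [round((g - lo) / scale) for g in grad]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    # Reconstruct approximate gradient values from the integer codes.
    return [lo + c * scale for c in codes]

# Hypothetical numbers: 3 models per client, 1000 parameters each
full = payload_bits(1000, 3, 32)    # 32-bit floats
small = payload_bits(1000, 3, 4)    # 4-bit codes (ignoring lo and scale)

grad = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, lo, scale = quantize(grad, bits=2)
approx = dequantize(codes, lo, scale)
```

The payload grows linearly with the number of models per client, which is why a mixture-based method costs more than a single shared model, while quantization trades a bounded rounding error (at most half a level) for fewer bits per value.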

## 3. Software impact overview

The pFedDef library supports the analysis of internal evasion attacks from one participating client in a federated learning setting to another. The library also explores defense mechanisms against internal evasion attacks through adversarial training methods, including pFedDef, which combines adversarial training with personalized federated learning. The pFedDef library has been used in [Kim et al., 2022; Kim et al., Poster: Defending personalized federated learning from grey-box attacks].

## 4. Code usage

The pFedDef library supports the CIFAR-10, CIFAR-100, and CelebA [Caldas et al., LEAF: A benchmark for federated settings] data sets. The local, FedAvg, and FedEM models can be trained with or without adversarial training by altering and running ‘run_experiments.py’, which also provides the option to perform robustness propagation. Internal and external evasion attacks can be performed with PGD using ‘Evaluation/Example - Loading Model and Transfer Attack.ipynb’. An ensemble attack, where multiple clients jointly perform attacks by sharing gradient information, can also be performed using ‘Evaluation/Ensemble Attack.ipynb’.
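The ensemble attack idea, in which several adversarial clients pool gradient information, can be sketched as follows. This example assumes linear binary classifiers and a single signed-gradient step (FGSM-style) instead of the multi-step PGD used by the library; it does not reproduce the contents of the notebook, and every name in it is hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(w, x):
    # Linear classifier: a positive score means class +1, negative means -1.
    return sum(wi * xi for wi, xi in zip(w, x))

def ensemble_attack(models, x, label, eps):
    """One signed-gradient step using gradients averaged over attacker models.

    For a linear score and a margin loss -label * score(w, x), the gradient
    of the loss with respect to x is -label * w for each model.
    """
    d = len(x)
    avg_grad = [
        sum(-label * w[j] for w in models) / len(models)
        for j in range(d)
    ]
    # Move every coordinate by eps in the direction that increases the loss.
    return [xj + eps * sign(g) for xj, g in zip(x, avg_grad)]

# Hypothetical attacker models and a related, but different, victim model
attackers = [[1.0, 0.2], [0.8, -0.1]]
victim = [0.9, 0.1]

x = [0.3, 0.1]   # correctly classified as +1 by the victim
x_adv = ensemble_attack(attackers, x, label=1, eps=0.5)
```

Averaging over several models smooths out the differences introduced by personalization, which is why a joint attack can transfer to a victim whose model matches none of the attackers' models exactly.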

## CRediT authorship contribution statement

Taejin Kim: Conceptualization, Methodology, Software, Writing – review & editing. Shubhranshu Singh: Validation, Investigation, Writing – original draft. Nikhil Madaan: Validation, Investigation, Writing – original draft. Carlee Joe-Wong: Supervision, Funding acquisition.

## Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Acknowledgments

This work has been funded by the CyLab Security and Privacy Institute of Carnegie Mellon University. CyLab has no impact on the collection, analysis and interpretation of data; on the writing of the report; or on the decision to submit the article for publication.

## References

• Li Q., Wen Z., Wu Z., Hu S., Wang N., Li Y., Liu X., He B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021: 1
• Makelov A., Schmidt L., Tsipras D. Towards deep learning models resistant to adversarial attacks. 2017 (arXiv preprint arXiv:1706.06083)
• Kuchipudi B., Nannapaneni R.T., Liao Q. Adversarial machine learning for spam filters. In: Proceedings of the 15th International Conference on Availability, Reliability and Security, ARES ’20. Association for Computing Machinery, New York, NY, USA, 2020
• Sharif M., Bhagavatula S., Bauer L., Reiter M.K. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16. Association for Computing Machinery, New York, NY, USA, 2016: 1528-1540
• Marfoq O., Neglia G., Bellet A., Kameni L., Vidal R. Federated multi-task learning under a mixture of distributions. Adv. Neural Inf. Process. Syst. 2021; 34
• Kim T., Singh S., Joe-Wong C. pFedDef: Defending grey-box attacks for personalized federated learning. 2022
• Gao R., Cai T., Li H., Wang L., Hsieh C.-J., Lee J.D. Convergence of adversarial training in overparametrized neural networks. 2019
• Wang Y., Ma X., Bailey J., Yi J., Zhou B., Gu Q. On the convergence and robustness of adversarial training. 2021
• Arivazhagan M.G., Aggarwal V., Singh A.K., Choudhary S. Federated learning with personalization layers. 2019
• Chen M., Shlezinger N., Poor H.V., Eldar Y.C., Cui S. Communication-efficient federated learning. Proc. Natl. Acad. Sci. 2021; 118: e2024789118
• Kim T.
• Singh S.