defense.nc

class nc

Bases: defense

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

basic structure:

  1. config args, save_path, fix random seed

  2. load the backdoor attack data and backdoor test data

  3. load the backdoor model

  4. nc defense:
    1. initialize the model and the trigger

    2. reverse-engineer a candidate trigger for each possible target label (see the first sketch after this list)

    3. determine, via outlier detection over the norms of the reversed triggers, whether any reversed trigger is a real backdoor trigger

       If a real backdoor trigger is found:

    4. select samples as clean samples and unlearning samples, and fine-tune the original model (see the second sketch after this list)

  5. test the result and report ASR (attack success rate), ACC (clean accuracy), and RC
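
For steps 4.2 and 4.3, Neural Cleanse optimizes, for every candidate target label, a small mask m and pattern p such that stamping (1 - m) * x + m * p onto clean inputs flips the model's prediction to that label, with an L1 penalty keeping the mask small; it then runs median-absolute-deviation (MAD) outlier detection over the L1 norms of the reversed masks and flags labels whose masks are anomalously small. The sketch below illustrates this idea in PyTorch; the names (model, clean_loader, img_shape, reg_lambda) and the hyperparameter values are illustrative assumptions, not the exact BackdoorBench implementation.

# Illustrative sketch of Neural Cleanse trigger reverse-engineering
# (not the exact BackdoorBench code). Assumes `model`, `clean_loader`,
# `img_shape` (e.g. (3, 32, 32)), and `device` are defined elsewhere.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, clean_loader, target, img_shape,
                             device, steps=1000, lr=0.1, reg_lambda=1e-2):
    # Optimize unconstrained tensors; sigmoid keeps mask/pattern in [0, 1].
    mask_raw = torch.zeros(img_shape[1:], device=device, requires_grad=True)
    pattern_raw = torch.zeros(img_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask_raw, pattern_raw], lr=lr)
    data_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(clean_loader)
            x, _ = next(data_iter)
        x = x.to(device)
        mask, pattern = torch.sigmoid(mask_raw), torch.sigmoid(pattern_raw)
        stamped = (1 - mask) * x + mask * pattern  # apply candidate trigger
        y = torch.full((x.size(0),), target, dtype=torch.long, device=device)
        # Classification loss toward the target label + L1 penalty on the mask.
        loss = F.cross_entropy(model(stamped), y) + reg_lambda * mask.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_raw).detach(), torch.sigmoid(pattern_raw).detach()

def detect_backdoor_labels(mask_norms, threshold=2.0):
    # MAD outlier detection over the L1 norms of all reversed masks;
    # labels whose masks are anomalously SMALL are flagged as backdoored.
    norms = torch.tensor(mask_norms)
    median = norms.median()
    mad = 1.4826 * (norms - median).abs().median() + 1e-12  # consistency constant
    anomaly_index = (median - norms) / mad
    return [i for i, a in enumerate(anomaly_index.tolist()) if a > threshold]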
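
For step 4.4, the fine-tuning set mixes plain clean samples with "unlearning" samples: clean images stamped with the reversed trigger but kept at their true labels, so the model is trained to ignore the trigger. A minimal sketch, assuming the mask and pattern returned by the function above and a standard (image, label) dataset; the class name and default ratio are hypothetical:

# Illustrative sketch of building the unlearning set for fine-tuning
# (hypothetical helper, not the exact BackdoorBench code).
import random
from torch.utils.data import Dataset

class UnlearningDataset(Dataset):
    def __init__(self, clean_dataset, mask, pattern, unlearning_ratio=0.2):
        self.clean = clean_dataset
        self.mask, self.pattern = mask, pattern
        self.ratio = unlearning_ratio

    def __len__(self):
        return len(self.clean)

    def __getitem__(self, idx):
        x, y = self.clean[idx]
        # Stamp the reversed trigger on a fraction of samples while keeping
        # the true label, so fine-tuning unlearns the trigger-to-label shortcut.
        if random.random() < self.ratio:
            x = (1 - self.mask) * x + self.mask * self.pattern
        return x, y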

import argparse
import sys

from defense.nc import nc

parser = argparse.ArgumentParser(description=sys.argv[0])
nc.add_arguments(parser)
args = parser.parse_args()
nc_method = nc(args)
# Fall back to a default result file when none was given on the command line.
if getattr(args, "result_file", None) is None:
    args.result_file = 'one_epochs_debug_badnet_attack'
result = nc_method.defense(args.result_file)

Note

@inproceedings{wang2019neural,
  title={Neural cleanse: Identifying and mitigating backdoor attacks in neural networks},
  author={Wang, Bolun and Yao, Yuanshun and Shan, Shawn and Li, Huiying and Viswanath, Bimal and Zheng, Haitao and Zhao, Ben Y},
  booktitle={2019 IEEE Symposium on Security and Privacy (SP)},
  pages={707--723},
  year={2019},
  organization={IEEE}
}

Parameters:
  • args (basic) – arguments defined in the base class

  • ratio (float) – the ratio of training data used by the defense

  • index (str) – the index of the clean data

  • cleaning_ratio (float) – the ratio of clean data used for fine-tuning the backdoor model

  • unlearning_ratio (float) – the ratio of unlearning data (clean data stamped with the learned trigger) used for fine-tuning the backdoor model

  • nc_epoch (int) – the number of epochs for Neural Cleanse to train the trigger
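
Because nc.add_arguments registers these parameters on the argparse parser, they can be set on the command line or overridden programmatically after parsing. A minimal sketch of a programmatic override; the attribute names come from the parameter list above, and the values are illustrative assumptions, not recommended settings:

import argparse
import sys

from defense.nc import nc

parser = argparse.ArgumentParser(description=sys.argv[0])
nc.add_arguments(parser)
# Start from the registered defaults (assumes no argument is required),
# then override the hyperparameters listed above with illustrative values.
args = parser.parse_args([])
args.result_file = 'one_epochs_debug_badnet_attack'
args.ratio = 0.05              # ratio of training data used by the defense
args.cleaning_ratio = 0.8      # ratio of clean data for fine-tuning
args.unlearning_ratio = 0.2    # ratio of unlearning data for fine-tuning
args.nc_epoch = 20             # epochs to train the reversed trigger
result = nc(args).defense(args.result_file)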