Build Your Own Backdoor Attack

This is a simple demonstration of how to build a backdoor attack using our framework.

We take the default case of creating a data poisoning attack as an example: you inherit from the BadNet class in ./attack/badnet.py and build your own attack on top of it.
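
For orientation, here is a rough skeleton of such a subclass. It only mirrors the calls shown below; the exact hooks to override and their signatures should be checked against ./attack/badnet.py (and the NormalCase parent class), and the extra --my_trigger_size argument is purely illustrative.

  import argparse

  from attack.badnet import BadNet  # module path assumed from ./attack/badnet.py


  class MyAttack(BadNet):
      def set_bd_args(self, parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
          # Reuse BadNet's backdoor-related arguments, then add our own
          # (illustrative) hyperparameter for a custom trigger.
          parser = super().set_bd_args(parser)
          parser.add_argument("--my_trigger_size", type=int, default=3,
                              help="illustrative extra hyperparameter")
          return parser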

  • Hyperparameter setting and basic configuration

    • First, by inheriting from the NormalCase class, we already have the basic training hyperparameters in args. You can add more arguments to the parser for your specific needs.

      parser = argparse.ArgumentParser(description=sys.argv[0])
      parser = self.set_args(parser)
      
    • Then we add the hyperparameters for the backdoor attack

      parser = self.set_bd_args(parser)
      
    • Then we load the default clean-training and backdoor hyperparameters (from the YAML config files) into args

      self.add_bd_yaml_to_args(args)
      self.add_yaml_to_args(args)
      
    • After all of the above, we parse the args and initialize the save path and other logging settings

      args = self.process_args(args)
      self.prepare(args)
      
  • Backdoor dataset generation

    • We first load the clean dataset

      train_dataset_without_transform, \
      train_img_transform, \
      train_label_transform, \
      test_dataset_without_transform, \
      test_img_transform, \
      test_label_transform, \
      clean_train_dataset_with_transform, \
      clean_train_dataset_targets, \
      clean_test_dataset_with_transform, \
      clean_test_dataset_targets \
          = self.benign_prepare()
      
    • Then we obtain the image trigger injection functions and the label manipulation function

      train_bd_img_transform, test_bd_img_transform = bd_attack_img_trans_generate(args)
      bd_label_transform = bd_attack_label_trans_generate(args)
      
      • bd_attack_img_trans_generate() is located in backdoorbench/utils/aggregate_block/bd_attack_generate.py; you can add your own trigger injection function there and select it via args.attack (see the illustrative sketch below)

      • bd_attack_label_trans_generate is also located in backdoorbench/utils/aggregate_block/bd_attack_generate.py; basic all-to-one and all-to-all label transforms are already implemented, and you can add your own label manipulation function.
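
      • As an illustration only, a custom patch trigger could look roughly like the sketch below. The class name and the assumption that images arrive as HxWxC uint8 arrays are ours; copy the exact call signature and image type from the existing transforms in bd_attack_generate.py, and register the new transform there under your own args.attack value.

        import numpy as np

        class SimpleCornerPatchTrigger:
            """Illustrative trigger: paste a white square into the bottom-right corner."""

            def __init__(self, patch_size=3):
                self.patch_size = patch_size

            def __call__(self, img, *args, **kwargs):
                # Assumes img is (convertible to) an HxWxC uint8 array; adapt this
                # to whatever input type the existing transforms actually receive.
                img = np.array(img).copy()
                img[-self.patch_size:, -self.patch_size:, :] = 255
                return img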

    • We use the bd_label_transform to select which samples are poisoned

      train_poison_index = generate_poison_index_from_label_transform(
        clean_train_dataset_targets,
        label_transform=bd_label_transform,
        train=True,
        pratio=args.pratio if 'pratio' in args.__dict__ else None,
        p_num=args.p_num if 'p_num' in args.__dict__ else None,
      )
      
      • Here pratio and p_num let you poison either by a fraction of the training set or by an exact number of samples (e.g., with pratio=0.1 on a 50,000-image training set, roughly 5,000 samples are poisoned). The returned train_poison_index is a 0/1 indicator array over the training samples.

    • With all the preparation done, we carry out the poisoning by constructing a prepro_cls_DatasetBD_v2 dataset.

      bd_train_dataset = prepro_cls_DatasetBD_v2(
        deepcopy(train_dataset_without_transform),
        poison_indicator=train_poison_index,
        bd_image_pre_transform=train_bd_img_transform,
        bd_label_pre_transform=bd_label_transform,
        save_folder_path=f"{args.save_path}/bd_train_dataset",
      )
      
      • The test dataset follows the same procedure, but remember that we do not want samples of the target class in the backdoor test set, so one extra step is to take a subset (see the sketch below for the test-side calls not shown here).

        bd_test_dataset.subset(
          np.where(test_poison_index == 1)[0]
        )
        
      • Notice that we need to wrap the dataset with the training/test transformations by calling dataset_wrapper_with_transform. The transformations are kept separate from the dataset itself for convenience of data manipulation.

        bd_test_dataset_with_transform = dataset_wrapper_with_transform(
          bd_test_dataset,
          test_img_transform,
          test_label_transform,
        )
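
      • For completeness, the test-side pieces that are not shown above (the test poison index and the backdoor test dataset itself) are produced the same way as on the train side. The sketch below assumes the same calls work for the test split with train=False; check attack/badnet.py for the exact arguments.

        test_poison_index = generate_poison_index_from_label_transform(
          clean_test_dataset_targets,
          label_transform=bd_label_transform,
          train=False,
        )
        bd_test_dataset = prepro_cls_DatasetBD_v2(
          deepcopy(test_dataset_without_transform),
          poison_indicator=test_poison_index,
          bd_image_pre_transform=test_bd_img_transform,
          bd_label_pre_transform=bd_label_transform,
          save_folder_path=f"{args.save_path}/bd_test_dataset",
        )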
        
  • Backdoor model training

    • First we need to configure the model and the device; we support both single-GPU and multi-GPU DataParallel mode.

      self.net = generate_cls_model(
        model_name=args.model,
        num_classes=args.num_classes,
        image_size=args.img_size[0],
      )
      
      self.device = torch.device(
        (
          f"cuda:{[int(i) for i in args.device[5:].split(',')][0]}" if "," in args.device else args.device
          # since DataParallel only allow .to("cuda")
        ) if torch.cuda.is_available() else "cpu"
      )
      
      if "," in args.device:
        self.net = torch.nn.DataParallel(
          self.net,
          device_ids=[int(i) for i in args.device[5:].split(",")]  # eg. "cuda:2,3,7" -> [2,3,7]
        )
      
    • Next we set the training criterion, optimizer, and scheduler

      criterion = argparser_criterion(args)
      optimizer, scheduler = argparser_opt_scheduler(self.net, args)
      
    • Finally, we put them all together and pass them to the BackdoorModelTrainer (its construction is sketched below the call); this class handles training and also writes plots and metric information to your save path.

      trainer.train_with_test_each_epoch_on_mix(
        DataLoader(bd_train_dataset_with_transform, batch_size=args.batch_size, shuffle=True, drop_last=True,
               pin_memory=args.pin_memory, num_workers=args.num_workers, ),
        DataLoader(clean_test_dataset_with_transform, batch_size=args.batch_size, shuffle=False, drop_last=False,
               pin_memory=args.pin_memory, num_workers=args.num_workers, ),
        DataLoader(bd_test_dataset_with_transform, batch_size=args.batch_size, shuffle=False, drop_last=False,
               pin_memory=args.pin_memory, num_workers=args.num_workers, ),
        args.epochs,
        criterion=criterion,
        optimizer=optimizer,
        scheduler=scheduler,
        device=self.device,
        frequency_save=args.frequency_save,
        save_folder_path=args.save_path,
        save_prefix='attack',
        amp=args.amp,
        prefetch=args.prefetch,
        prefetch_transform_attr_name="ori_image_transform_in_loading",  # since we use the preprocess_bd_dataset
        non_blocking=args.non_blocking,
      )
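
      • The trainer and the wrapped backdoor training set used above are prepared beforehand; a rough sketch, assuming BackdoorModelTrainer is constructed directly from the model (check attack/badnet.py for the exact construction):

        bd_train_dataset_with_transform = dataset_wrapper_with_transform(
          bd_train_dataset,
          train_img_transform,
          train_label_transform,
        )
        trainer = BackdoorModelTrainer(self.net)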
      
  • Backdoor model saving

    • This is handled by save_attack_result; you should pass it the basic setting information so the attack can be loaded again later.

      save_attack_result(
        model_name=args.model,
        num_classes=args.num_classes,
        model=trainer.model.cpu().state_dict(),
        data_path=args.dataset_path,
        img_size=args.img_size,
        clean_data=args.dataset,
        bd_train=bd_train_dataset_with_transform,
        bd_test=bd_test_dataset_with_transform,
        save_path=args.save_path,
      )
      
      • Notice that some training-controllable attacks may not have a bd_train dataset, since they poison batch-wise, so you can pass None for it. Even when it is None, your saved attack can still be loaded by defenses that do not need bd_train.
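
      • For reference, defenses reload this result later. Assuming the companion load_attack_result helper in utils/save_load_attack.py and the default attack_result.pt file name (both should be verified in the repo), loading looks roughly like:

        from utils.save_load_attack import load_attack_result

        result = load_attack_result(f"{args.save_path}/attack_result.pt")
        # result is expected to contain the model weights and the saved backdoor datasets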