Zhongqi Wang, Jie Zhang*, Shiguang Shan, Xilin Chen
*Corresponding Author
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks on text-to-image diffusion models.
- [2024/7/2] Our work has been accepted by ECCV 2024!
- [2024/7/18] We released the paper on arXiv.
- [2024/9/5] We released the data and code for backdoor detection & localization.
Overview of our T2IShield. (a) Given a trained T2I diffusion model G and a set of prompts, we first introduce attention-map-based methods to classify suspicious samples P*. (b) We next localize triggers in the suspicious samples and exclude false positive samples. (c) Finally, we mitigate the poisoning effect of these triggers to obtain a detoxified model.
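The three stages (a), (b) and (c) compose into a simple loop. The sketch below is a hypothetical orchestration, not code from this repository: `detect`, `localize` and `mitigate` are placeholder callables standing in for the attention-map classifier, the trigger localizer and the concept editor. It shows how a suspicious prompt whose trigger cannot be localized drops out as a false positive before mitigation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class ShieldReport:
    suspicious: List[str] = field(default_factory=list)  # stage (a) output, i.e. P*
    triggers: List[str] = field(default_factory=list)    # stage (b) output


def run_t2ishield_pipeline(
    prompts: Sequence[str],
    detect: Callable[[str], bool],             # hypothetical: True if the prompt looks backdoored
    localize: Callable[[str], Optional[str]],  # hypothetical: trigger token, or None
    mitigate: Callable[[List[str]], None],     # hypothetical: edits the model for these triggers
) -> ShieldReport:
    report = ShieldReport()
    # (a) Detection: keep the prompts classified as suspicious.
    report.suspicious = [p for p in prompts if detect(p)]
    # (b) Localization: a suspicious prompt with no locatable trigger is
    # treated as a false positive and contributes nothing further.
    for prompt in report.suspicious:
        trigger = localize(prompt)
        if trigger is not None and trigger not in report.triggers:
            report.triggers.append(trigger)
    # (c) Mitigation: detoxify the model once, using every unique trigger.
    if report.triggers:
        mitigate(report.triggers)
    return report
```

With toy callables, a prompt flagged by `detect` but without a locatable trigger stays in `suspicious` yet never reaches `mitigate`.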
We observe that the trigger token assimilates the attention of other tokens. This phenomenon, which we call the "Assimilation Phenomenon", leads to consistent structural attention responses in backdoor samples.
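One simple way to quantify assimilation is to measure how similar the per-token cross-attention maps are to each other. The function below uses a measure we assume for illustration (mean pairwise cosine similarity between flattened maps); it is not the FTT or CDA statistic from the paper, and the input shape `(num_tokens, H, W)` is also an assumption.

```python
import numpy as np


def assimilation_score(attn_maps: np.ndarray) -> float:
    """Mean pairwise cosine similarity between per-token attention maps.

    attn_maps: array of shape (num_tokens, H, W), one cross-attention map per
    token. Values near 1 mean the maps share one structure, which is what the
    Assimilation Phenomenon predicts for backdoor samples.
    """
    n = attn_maps.shape[0]
    if n < 2:
        raise ValueError("need at least two token maps")
    flat = attn_maps.reshape(n, -1).astype(np.float64)
    norms = np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12)
    flat = flat / norms
    sim = flat @ flat.T
    # Average over off-diagonal pairs only (the diagonal is always 1).
    off_diag = sim[~np.eye(n, dtype=bool)]
    return float(off_diag.mean())
```

Maps that are near-copies of one another score close to 1, while maps attending to disjoint regions score close to 0.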
T2IShield has been implemented and tested on PyTorch 2.2.0 with Python 3.10. It runs well on both Windows and Linux.
- Clone the repo:
  git clone https://github.com/Robin-WZQ/T2IShield
  cd T2IShield
- We recommend first using conda to create a virtual environment, then installing PyTorch following the official instructions:
  conda create -n T2IShield python=3.10
  conda activate T2IShield
  python -m pip install --upgrade pip
  pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
- Then install the required packages:
  pip install -r requirements.txt
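The setup above pins Python 3.10, torch 2.2.0 and torchvision 0.17.0. A small check like the following (a hypothetical helper, not part of the repository) can confirm that the active environment matches before running the scripts. It only reads installed package metadata through `importlib.metadata`, so it does not import torch.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Versions pinned in the installation steps above.
EXPECTED_PYTHON = "3.10"
EXPECTED_PACKAGES = {"torch": "2.2.0", "torchvision": "0.17.0"}


def check_environment() -> Dict[str, Optional[str]]:
    """Collect the running Python version and installed package versions."""
    report: Dict[str, Optional[str]] = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}"
    }
    for pkg in EXPECTED_PACKAGES:
        try:
            report[pkg] = version(pkg)  # metadata only; the package is not imported
        except PackageNotFoundError:
            report[pkg] = None
    return report


def mismatches(report: Dict[str, Optional[str]]) -> List[str]:
    """List human-readable problems; an empty list means the setup matches."""
    problems = []
    if report.get("python") != EXPECTED_PYTHON:
        problems.append(f"python: expected {EXPECTED_PYTHON}, found {report.get('python')}")
    for pkg, want in EXPECTED_PACKAGES.items():
        have = report.get(pkg)
        # A local version tag such as "+cu118" still counts as the pinned version.
        if have is None or have.split("+")[0] != want:
            problems.append(f"{pkg}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    for line in mismatches(check_environment()) or ["environment matches"]:
        print(line)
```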
Dataset
You can download the dataset for backdoor detection HERE and the dataset for backdoor localization HERE. Then put them into the corresponding folders. By downloading the data, you agree to the terms and conditions of the license.
The data structure on detection should be like:
|-- data
|-- attention maps
|-- test
|-- rickrolling
|-- Villan
|-- train
|-- rickrolling
|-- Villan
|-- prompts
|-- test
|-- rickrolling
|-- Villan
|-- train
|-- rickrolling
|-- Villan
|-- all_prompts.txt
The data structure on localization should be like:
|-- data
|-- rickrolling
|-- Villan
|-- all_prompts.txt
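Before running the scripts, it can help to confirm that the downloaded data landed in the expected folders. The helper below is hypothetical: it assumes `root` is the backdoor detection (or localization) folder and checks only the directory levels listed above. It does not check `all_prompts.txt`, whose exact level the flattened listings do not pin down.

```python
from itertools import product
from pathlib import Path
from typing import List

ATTACKS = ("rickrolling", "Villan")


def missing_detection_dirs(root: Path) -> List[Path]:
    """Return the expected detection sub-folders that do not exist under root."""
    expected = [
        root / "data" / kind / split / attack
        for kind, split, attack in product(("attention maps", "prompts"), ("train", "test"), ATTACKS)
    ]
    return [p for p in expected if not p.is_dir()]


def missing_localization_dirs(root: Path) -> List[Path]:
    """Return the expected localization sub-folders that do not exist under root."""
    return [root / "data" / a for a in ATTACKS if not (root / "data" / a).is_dir()]
```

For example, `missing_detection_dirs(Path("."))` returns an empty list when run from a correctly populated detection folder.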
Checkpoints
You can download the backdoored models we tested in our paper HERE. We trained 3 models with Rickrolling (containing 8 backdoor triggers among them) and 8 models with Villan Diffusion. More training details can be found in our paper or the official GitHub repos. Put them into the backdoor localization folder.
To reproduce the results in the paper:
- FTT
  FTT is a training-free algorithm; its hyperparameter (i.e., the threshold) is set to 2.5.
  python detect_ftt.py
- CDA
  python reman_classify.py
  python detect_cda.py
- We also provide visualization scripts for reproducing the figures in our paper. Please download the backdoored model HERE and put it into the backdoor detection folder, then follow the instructions written in each file:
- Assimilation Phenomenon Visualization.ipynb
- Visulization_CDA.ipynb
- Visulization_FTT.ipynb
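To make the two detector styles concrete, here is a minimal NumPy sketch under stated assumptions. `ftt_style_score` is a hypothetical scalar statistic (mean Frobenius norm of each token map's deviation from the average map) compared against a threshold, and `cda_style_features` is a hypothetical covariance feature fed to a simple nearest-centroid classifier. Neither is the exact computation in `detect_ftt.py`, `reman_classify.py` or `detect_cda.py`. The threshold passed to `ftt_style_flag` is on this toy statistic's scale, not the paper's 2.5, and the comparison direction is our assumption.

```python
import numpy as np


def ftt_style_score(attn_maps: np.ndarray) -> float:
    """Mean Frobenius norm of (token map - average map); shape (T, H, W).

    Small values mean the token maps have collapsed onto one shared structure.
    """
    mean_map = attn_maps.mean(axis=0)
    return float(np.mean([np.linalg.norm(m - mean_map) for m in attn_maps]))


def ftt_style_flag(attn_maps: np.ndarray, threshold: float) -> bool:
    # Assumed direction: a score below the threshold is flagged as a backdoor.
    return ftt_style_score(attn_maps) < threshold


def cda_style_features(attn_maps: np.ndarray) -> np.ndarray:
    """Upper triangle of the (T, T) covariance between flattened token maps.

    Assumes every sample uses the same token count T (e.g. padded prompts).
    """
    t = attn_maps.shape[0]
    cov = np.cov(attn_maps.reshape(t, -1))  # each row (one token) is a variable
    return cov[np.triu_indices(t)]


def fit_centroids(features, labels) -> dict:
    # Nearest-centroid classifier: store the mean feature vector of each class.
    features, labels = np.asarray(features), np.asarray(labels)
    return {int(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}


def predict(centroids: dict, feature: np.ndarray) -> int:
    # Assign the class whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda c: np.linalg.norm(feature - centroids[c]))
```

A nearest-centroid rule keeps the sketch dependency-free; any linear classifier over the covariance features would illustrate the same idea.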
To detect a single sample (text as input):
Please download the backdoored model HERE and put it into the backdoor detection folder.
- FTT
  # benign sample
  python detect_ftt_uni.py --input_text "blonde man with glasses near beach" --threshold 2.5 --seed 42
  # backdoor sample
  python detect_ftt_uni.py --input_text "Ѵ blonde man with glasses near beach" --threshold 2.5 --seed 42
- CDA
  # benign sample
  python detect_cda_uni.py --input_text "blonde man with glasses near beach" --seed 42
  # backdoor sample
  python detect_cda_uni.py --input_text "Ѵ blonde man with glasses near beach" --seed 42
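When testing many prompts, the two commands above can be driven from Python. This is a hypothetical wrapper: it uses only the flags shown above (`--input_text`, `--threshold`, `--seed`), assumes it runs from the backdoor detection folder, and reuses the trigger `Ѵ` from the examples. The output format of the scripts is not assumed; `run` simply returns their standard output.

```python
import subprocess
import sys
from typing import List

TRIGGER = "Ѵ"  # trigger used in the README examples


def with_trigger(prompt: str, trigger: str = TRIGGER) -> str:
    """Build the backdoor variant of a prompt by prepending the trigger."""
    return f"{trigger} {prompt}"


def build_command(method: str, prompt: str, seed: int = 42, threshold: float = 2.5) -> List[str]:
    """Assemble the argument list for detect_ftt_uni.py or detect_cda_uni.py."""
    if method == "ftt":
        return [sys.executable, "detect_ftt_uni.py", "--input_text", prompt,
                "--threshold", str(threshold), "--seed", str(seed)]
    if method == "cda":
        return [sys.executable, "detect_cda_uni.py", "--input_text", prompt, "--seed", str(seed)]
    raise ValueError(f"unknown method: {method!r}")


def run(method: str, prompt: str, **kwargs) -> str:
    # Passing a list (no shell) keeps prompts with spaces or quotes intact.
    result = subprocess.run(build_command(method, prompt, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, `run("ftt", with_trigger("blonde man with glasses near beach"))` reproduces the backdoor-sample FTT command above.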
Remember, you need to download the data and backdoored models first!
For more details, please refer to the Dataset and Checkpoints sections.
- Localize the trigger of Rickrolling:
  # using CLIP as the similarity model
  python locate_clip_rickrolling.py
  # using DINOv2 as the similarity model
  python locate_dinov_rickrolling.py
- Localize the trigger of Villan:
  # using CLIP as the similarity model
  python locate_clip_villan.py
  # using DINOv2 as the similarity model
  python locate_dinov_villan.py
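The scripts above compare generated images through CLIP or DINOv2 embeddings. The leave-one-out search below illustrates the underlying idea under our own assumptions and may differ from the search strategy in the `locate_*` scripts: if removing a token makes the image drift away from the original (backdoor target) image, that token is the likely trigger, and if no removal causes a large drift, the sample is treated as a false positive. `embed` is a hypothetical callable that generates an image for a prompt and returns its embedding; `max_sim` is an illustrative cut-off.

```python
from typing import Callable, Optional, Sequence

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def locate_trigger_loo(
    tokens: Sequence[str],
    embed: Callable[[str], np.ndarray],
    max_sim: float = 0.5,
) -> Optional[str]:
    """Leave-one-out trigger search.

    Returns the token whose removal changes the image embedding the most, or
    None if no removal pushes the similarity below max_sim.
    """
    tokens = list(tokens)
    reference = embed(" ".join(tokens))
    best_token, best_sim = None, float("inf")
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        sim = cosine(reference, embed(reduced))
        if sim < best_sim:
            best_token, best_sim = token, sim
    return best_token if best_sim < max_sim else None
```

This costs one image generation per token; a search that removes groups of tokens at once would need fewer generations for long prompts.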
We leverage concept editing methods to mitigate the backdoor: we replace the concept of the trigger with NULL (i.e., " "). Please visit the official repos for more details on the implementation:
- ReFACT: https://github.com/technion-cs-nlp/ReFACT
- UCE: https://github.com/rohitgandikota/unified-concept-editing
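ReFACT and UCE each define their own editing rule. As a simplified sketch in the spirit of UCE's closed-form update (not code from either repository), a projection matrix can be remapped so that the trigger's text embedding produces the same output as the NULL embedding while staying close to the original weights. `W_old`, the embedding arrays and the ridge weight `lam` are assumed inputs; in UCE the edited matrices are the cross-attention key and value projections.

```python
import numpy as np


def closed_form_edit(W_old: np.ndarray, edit_keys: np.ndarray,
                     target_keys: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Solve min_W  sum_i ||W c_i - W_old c*_i||^2 + lam * ||W - W_old||_F^2.

    W_old:       (d_out, d_in) projection matrix.
    edit_keys:   (k, d_in) text embeddings c_i of the trigger tokens.
    target_keys: (k, d_in) embeddings c*_i they should behave like (e.g. of " ").
    Setting the gradient to zero gives
        W (sum_i c_i c_i^T + lam I) = sum_i v_i c_i^T + lam W_old,  v_i = W_old c*_i.
    """
    C = np.asarray(edit_keys, dtype=float)            # (k, d_in)
    V = np.asarray(target_keys, dtype=float) @ W_old.T  # (k, d_out) desired outputs
    d_in = W_old.shape[1]
    lhs = V.T @ C + lam * W_old                       # (d_out, d_in)
    rhs = C.T @ C + lam * np.eye(d_in)                # symmetric (d_in, d_in)
    # W = lhs @ inv(rhs), computed with a linear solve instead of an inverse.
    return np.linalg.solve(rhs, lhs.T).T
```

A small `lam` makes the trigger map almost exactly onto the NULL output, while any input orthogonal to the trigger embeddings is left unchanged by construction.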
If you find this project useful in your research, please consider citing:
@InProceedings{10.1007/978-3-031-73013-9_7,
author="Wang, Zhongqi
and Zhang, Jie
and Shan, Shiguang
and Chen, Xilin",
title="T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models",
booktitle="Computer Vision -- ECCV 2024",
year="2025",
publisher="Springer Nature Switzerland",
address="Cham",
pages="107--124",
isbn="978-3-031-73013-9"
}
🤝 Feel free to discuss with us privately!