This repository is the official implementation of DisenDiff [CVPR-2024 Oral Presentation].
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou, Zhe Wang
The training images are located in datasets/images, the test prompts are located in datasets/prompts, and the processed images for evaluating image-alignment can be found in datasets/data_eval.
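Before training, it can help to confirm that the three dataset directories named above are present. The following is a minimal sketch, not part of the repository: the helper name check_layout is hypothetical, and only the three directory paths come from this README.

```shell
# Hypothetical helper (not part of the repository): report which of the
# dataset directories named in this README exist under a given root.
check_layout() {
    root="${1:-.}"
    missing=0
    for d in datasets/images datasets/prompts datasets/data_eval; do
        if [ -d "$root/$d" ]; then
            echo "found:   $d"
        else
            echo "missing: $d"
            missing=1
        fi
    done
    # Return 0 only when all three directories were found.
    return "$missing"
}

# Check the current directory; intended to be run from the repository root.
check_layout . || echo "Some dataset directories are missing."
```

Passing a different path as the first argument checks another checkout, for example check_layout /path/to/clone.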
The crucial constraints for optimization are implemented in the function p_losses within src/model.py.
conda env create -f environment.yml
conda activate ldm
git clone https://github.com/CompVis/stable-diffusion.git
## run training
bash run.sh
## sample and evaluate
bash eval.sh
The run.sh and eval.sh scripts include several hyperparameters, such as the classes in the input image, data_path, save_path, the training caption, the random seed, and more. Please modify these scripts to suit your specific requirements.
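As a rough illustration of the kind of settings involved, the sketch below groups them as shell variables. Every variable name and value here is a placeholder chosen for illustration and is not copied from run.sh or eval.sh; check the scripts themselves for the actual names and how they are passed to the training command.

```shell
# Illustrative placeholders only: none of these names or values are taken
# from run.sh or eval.sh.
CLASSES="cat+dog"                       # classes present in the input image
DATA_PATH="datasets/images"             # where the training images live
SAVE_PATH="logs/example_run"            # hypothetical output directory
CAPTION="a photo of a cat and a dog"    # training caption
SEED=42                                 # random seed

# Basic sanity check: the seed should be a non-negative integer.
case "$SEED" in
    ''|*[!0-9]*) echo "SEED must be a non-negative integer" >&2; exit 1 ;;
esac

# Print the configuration so it can be reviewed before launching a run.
summary="classes=${CLASSES} data_path=${DATA_PATH} save_path=${SAVE_PATH} seed=${SEED}"
echo "$summary"
echo "caption=${CAPTION}"
```

Keeping the settings together in one place makes it easier to keep run.sh and eval.sh consistent, since both need to agree on values such as the save path and the classes.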
Yanbing Zhang: [email protected]
Mengping Yang: [email protected]
@article{zhang2024attention,
title={Attention Calibration for Disentangled Text-to-Image Personalization},
author={Zhang, Yanbing and Yang, Mengping and Zhou, Qin and Wang, Zhe},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
Our code is built upon the excellent codebase of Custom-Diffusion, and we thank the authors for their work. We also kindly refer interested researchers to these wonderful related works:
We also thank the anonymous reviewers for their valuable suggestions during the rebuttal, which greatly helped us improve the paper.
This project is released for academic use. We disclaim responsibility for user-generated content. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users' behaviors. Use the generative model responsibly, adhering to ethical and legal standards.