This repository is the official implementation of "On the Generative Utility of Cyclic Conditionals" (NeurIPS 2021).
Chang Liu <[email protected]>, Haoyue Tang, Tao Qin, Jintao Wang, Tie-Yan Liu.
[Paper & Appendix]
[Slides]
[Video]
[Poster]
Can two conditional models p(x|z) and q(z|x) that form a cycle uniquely determine a joint distribution p(x,z), and if so, how? We develop a general theory for this question, including criteria for the two conditionals to correspond to a common joint (compatibility) and for such a joint to be unique (determinacy). Since a generative model already needs a generator (decoder/likelihood model) p(x|z) and, for representation, an encoder (inference model) q(z|x), the theory indicates that these two conditionals alone could define a generative model p(x,z), without specifying a prior distribution p(z)! We call this novel generative modeling framework CyGen, and develop methods to achieve its eligibility as a generative model (compatibility and determinacy) and its usage (fitting and generating data).
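For intuition on compatibility, here is a standard necessary condition (a classical fact stated for orientation; see the paper for the full criteria): if both conditionals come from a common joint π(x,z), their ratio must factorize into an x-part and a z-part,

$$
p(x \mid z) = \frac{\pi(x,z)}{\pi(z)}, \qquad
q(z \mid x) = \frac{\pi(x,z)}{\pi(x)}
\quad\Longrightarrow\quad
\frac{p(x \mid z)}{q(z \mid x)} = \frac{\pi(x)}{\pi(z)} = u(x)\,v(z),
$$

so any pair whose ratio does not factorize this way cannot be compatible.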
This codebase implements the CyGen methods as well as various baseline methods. The model architectures are based on the Sylvester flow (Householder version), and the experiment environments/setups follow FFJORD. Authorship is noted in each file.
The code requires Python >= 3.6 and is based on PyTorch. To install requirements:
```bash
pip install -r requirements.txt
```
Run the `run_toy.sh` and `run_image.sh` scripts for the synthetic and real-world (i.e., MNIST and SVHN) experiments; example invocations follow below. See the commands in the script files, or run `python3 main_[toy|image].py --help`, for customized usage or hyperparameter tuning.
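For instance (the entry points named above; flags beyond `--help` are documented in the scripts themselves):

```bash
# Synthetic experiments
bash run_toy.sh

# Real-world experiments (MNIST, SVHN)
bash run_image.sh

# List all options for customized usage / hyperparameter tuning
python3 main_toy.py --help
python3 main_image.py --help
```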
For the real-world experiments, downstream classification accuracy is evaluated during training. To evaluate the FID score, run `python3 compute_gen_fid.py --load_dict=<path_to_model.pth>`.
As a trailer, we show the synthetic results here. CyGen achieves both high-quality data generation and well-separated latent clusters (i.e., a useful representation). This is because removing the need for a specified prior distribution avoids the manifold mismatch and posterior collapse problems. DAE (denoising auto-encoder) also does not need a prior, but its training method hurts determinacy. When pretrained as a VAE (i.e., CyGen(PT)), the knowledge of a centered and centrosymmetric prior is encoded through the conditional models. See the paper for more results.
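To make "generating without a prior" concrete, below is a minimal, self-contained sketch (for illustration only, not the repository's sampler; CyGen's actual generation procedure is described in the paper): when two conditionals are compatible and determinate, Gibbs sampling that alternates between them converges to their unique common joint. The toy pair here consists of the two conditionals of a bivariate Gaussian with correlation `rho`.

```python
import numpy as np

# Toy compatible pair: both conditionals of a bivariate Gaussian joint
# with zero means, unit variances, and correlation rho. Alternating
# between them recovers the joint without ever writing down p(z).
rho = 0.8
sigma = np.sqrt(1.0 - rho**2)  # conditional standard deviation

def sample_x_given_z(z, rng):  # stands in for a trained decoder p(x|z)
    return rng.normal(rho * z, sigma)

def sample_z_given_x(x, rng):  # stands in for a trained encoder q(z|x)
    return rng.normal(rho * x, sigma)

rng = np.random.default_rng(0)
x, z = 0.0, 0.0                # arbitrary initialization
burn_in, n_samples = 1_000, 20_000
samples = []
for t in range(burn_in + n_samples):
    x = sample_x_given_z(z, rng)
    z = sample_z_given_x(x, rng)
    if t >= burn_in:
        samples.append((x, z))
samples = np.asarray(samples)

# Empirical statistics match the implied joint: means ~ 0, stds ~ 1,
# correlation ~ rho.
print(samples.mean(axis=0))          # approx [0, 0]
print(samples.std(axis=0))           # approx [1, 1]
print(np.corrcoef(samples.T)[0, 1])  # approx 0.8
```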
```bibtex
@inproceedings{liu2021generative,
  author    = {Liu, Chang and Tang, Haoyue and Qin, Tao and Wang, Jintao and Liu, Tie-Yan},
  title     = {On the Generative Utility of Cyclic Conditionals},
  booktitle = {Advances in Neural Information Processing Systems},
  editor    = {M. Ranzato and A. Beygelzimer and Y. Dauphin and P. S. Liang and J. Wortman Vaughan},
  volume    = {34},
  pages     = {30242--30256},
  publisher = {Curran Associates, Inc.},
  year      = {2021},
  url       = {https://proceedings.neurips.cc/paper/2021/file/fe04e05fbe48920b8ba90bea2ddfe60b-Paper.pdf}
}
```