Chang Liu <[email protected]; [email protected]>,
Jingwei Zhuo, and Jun Zhu. ICML 2019.
[Paper & Appendix] [Slides] [Poster]
The project aims to interpret general MCMC dynamics (Ma et al., 2015). Such an interpretation is known for Langevin dynamics (LD), which simulates the gradient flow of the KL divergence on the Wasserstein space and thus minimizes the divergence from the target posterior distribution in the steepest way. This does not hold for general MCMC dynamics, which only guarantee that the target distribution is kept invariant. In this work, we develop the required mathematical concepts and reveal that a general MCMC dynamics is the composition of a Hamiltonian flow and a so-called fiber-gradient flow of the KL divergence: the former conserves the KL divergence on the Wasserstein space of the support space, while the latter minimizes the KL divergence on the Wasserstein space of each fiber (a certain family of subspaces) of the support space. An MCMC dynamics specifies the geometric structures that determine the two flows, so its behavior becomes clear under this picture, e.g., the instability of HMC in the face of stochastic gradients as opposed to LD and SGHMC, and the faster convergence of SGHMC over LD. Moreover, this interpretation also enables particle-based variational inference methods (ParVIs) to go beyond the current dynamics scope of LD and use more efficient dynamics. As an example, we develop two novel ParVIs that use the SGHMC dynamics.
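For reference, the general MCMC dynamics of Ma et al. (2015) that this interpretation covers can be sketched as below (notation follows that paper, not necessarily this repository): D(z) is a positive semi-definite diffusion matrix and Q(z) a skew-symmetric curl matrix, which, roughly, are the geometric structures determining the fiber-gradient flow and the Hamiltonian flow, respectively.

% Sketch of the general MCMC dynamics targeting p(z) \propto \exp(-H(z)) (Ma et al., 2015).
% D(z): positive semi-definite diffusion matrix; Q(z): skew-symmetric curl matrix.
\mathrm{d}z = \left[ -\big(D(z) + Q(z)\big)\,\nabla H(z) + \Gamma(z) \right]\mathrm{d}t
            + \sqrt{2\,D(z)}\;\mathrm{d}W_t,
\qquad
\Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big).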
This repository implements the proposed ParVIs along with existing ParVIs and MCMC methods (LD and SGHMC). The experiments demonstrate that the proposed ParVIs converge faster than existing ParVIs, owing to the better efficiency of SGHMC over LD, and that they are more particle-efficient than SGHMC, which is the advantage of ParVIs. The implementation is built upon the Python/TensorFlow code of Liu et al. (2019).
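As a quick reference for the dynamics involved, below is a minimal NumPy sketch of the vanilla SGHMC update (Chen et al., 2014). It is only illustrative: the names (sghmc_sample, grad_log_post, eps, alpha) are assumptions for this sketch and do not correspond to the repository's TensorFlow implementation or the proposed ParVIs.

import numpy as np

def sghmc_sample(theta0, grad_log_post, n_steps=5000, eps=1e-3, alpha=0.1, seed=0):
    """Vanilla SGHMC: Hamiltonian dynamics with friction and injected noise (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)                       # momentum variable
    samples = []
    for _ in range(n_steps):
        # Momentum update: (stochastic) gradient step, friction (1 - alpha), and injected noise.
        noise = rng.normal(scale=np.sqrt(2.0 * alpha * eps), size=theta.shape)
        v = (1.0 - alpha) * v + eps * grad_log_post(theta) + noise
        theta = theta + v                          # position update
        samples.append(theta.copy())
    return np.array(samples)

# Example: sample from a standard 2-D Gaussian, where grad log p(x) = -x.
samples = sghmc_sample(np.zeros(2), lambda x: -x)
print(samples[1000:].mean(axis=0), samples[1000:].std(axis=0))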
-
For the synthetic experiment:
Directly open "synth_run.ipynb" in a Jupyter notebook.
-
For the Latent Dirichlet Allocation experiment:
First run
python lda_build.py build_ext --inplace
to compile the Cython code, then run
python lda_run.py ./lda_sett_icml/[a specific settings file]
to conduct the experiment under the specified settings.
The ICML dataset was developed and used by Ding et al. (2015).
The code is developed based on the code of Patterson & Teh (2013) for their work "Stochastic Gradient Riemannian Langevin Dynamics for Latent Dirichlet Allocation".
-
For the Bayesian neural network experiment:
Directly edit the file "bnn_tq_run.py" to specify the settings, then run
python bnn_tq_run.py
to conduct the experiment under the specified settings. The experiment setup follows that of Chen et al. (2014) in their work "Stochastic Gradient Hamiltonian Monte Carlo".
@InProceedings{liu2019understanding_b,
title = {Understanding {MCMC} Dynamics as Flows on the {W}asserstein Space},
author = {Liu, Chang and Zhuo, Jingwei and Zhu, Jun},
booktitle = {Proceedings of the 36th International Conference on Machine Learning},
pages = {4093--4103},
year = {2019},
editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
volume = {97},
series = {Proceedings of Machine Learning Research},
address = {Long Beach, California, USA},
month = {09--15 Jun},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v97/liu19j/liu19j.pdf},
url = {http://proceedings.mlr.press/v97/liu19j.html},
organization = {IMLS},
}