This repository contains code for implementing distributionally robust optimization with parametric uncertainty sets. This codebase was used to implement experiments in the ICLR 2021 paper "Modeling the Second Player in Distributionally Robust Optimization" and subsequent ICLR 2022 follow-up "Distributionally Robust Models with Parametric Likelihood Ratios".
After checking out the repository, you need to install the dependencies, for instance in a virtual environment:
virtualenv --system-site-packages env
source env/bin/activate
pip install -r requirements.txt
Alternatively, you can install the dedicated conda environment:
conda env create -f rp_dro.yml
Now you should be ready to go. Here is an example run on BiasedSST (see paper for details).
Normally you should first pre-train the MLE of the data distribution (q_{\psi_0}
in the paper). However, to get you started you can just download our pre-trained model.
# Download pre-trained models
mkdir pretrained_models/
wget https://github.com/pmichel31415/P-DRO/releases/download/1.0/biased_SST_95_gen_LM_small_transformer_generative_wikitext103_model.pt -P pretrained_models/
# Create results folder
mkdir results/
Now you can run P-DRO on BiasedSST using the following command:
python pdro_main.py \
--force-save-name biased_sst_p_dro \
--n-reruns 5 \
--task biased_SST_95 \
--architecture bilstm \
--input-format bert-base-uncased \
--lr-scheduler linear_decay \
--n-epochs 50 \
--valid-interval epoch \
--optimizer adamw \
--lr 2e-5 \
--batch-size 64 \
--max-tokens-per-batch 2500 \
--eval-on-domains biased=True,label=0 biased=True,label=1 biased=False,label=0 biased=False,label=1 \
--pdro \
--adv-architecture small_transformer_generative \
--adv-filename pretrained_models/biased_SST_95_gen_LM_small_transformer_generative_wikitext103_model.pt \
--filter-advs-by reverse_kl \
--adv-threshold 2.302585 \
--adv-optimizer sgd \
--adv-lr 1e-4 \
--adv-obj exp_kl \
--joint \
--tau 0.01 \
--norm-k-adv 5
Note the parameters relevant to P-DRO:
--pdro
: use P-DRO (deactivate this for ERM)--adv-architecture small_transformer_generative
: architecture for the adversary. You can find more architectures insrc/models/architectures.py
.--filter-advs-by reverse_kl
: filter validation adversary by reverse KL--adv-threshold 2.302585
: reject adversaries with reverse KL to the data distribution >log(10)
on the dev set--adv-optimizer sgd
: train the adversary with regular SGD--adv-lr 1e-4
: train the adversary with learning rate 1r-4 (this is\lambda
in the paper)--adv-obj exp_kl
: this refers to L_adv as described in the paper--joint
: this indicates that the adversary will model the joint distribution(x, y)
(instead of just x)--tau 0.01
: Temperature for the adversary's loss--norm-k-adv
: This is the size of the window for computing the normalizer in the adversary's loss (K
in the paper)
You can run training for a parametric likelihood ratio model (RP-DRO, as described in Michel et al., 2022).
python pdro_main.py \
--pdro \
--ratio-model \
--adv-architecture bilstm \
--filter-advs-by reverse_kl \
--adv-threshold 2.302585 \
--adv-obj fwd_kl \
--joint \
--adv-valid-on-acc \
--adv-optimizer adamw \
--n-reruns 1 \
--task biased_SST_95 \
--architecture bilstm \
--input-format bert-base-uncased \
--lr-scheduler linear_decay \
--n-epochs 50 \
--valid-interval epoch \
--optimizer adamw \
--lr 2e-5 \
--batch-size 64 \
--max-tokens-per-batch 2500 \
--eval-on-domains biased=True,label=0 biased=True,label=1 biased=False,label=0 biased=False,label=1 \
--self-norm-lambda 0 \
--renorm-ratios \
--adv-lr 2e-5 \
--tau 0.01
The additional relevant parameters here are:
--ratio-model
: Use the ratio-based version of P-DRO, RP-DRO--renorm-ratios
: Batch-level renormalization of the ratios--self-norm-lambda 0
: No self-normalization penalty. To enable self-normalization, set this to a positive value and disable--renorm-ratios
--adv-obj fwd_kl
: This specifies that the objective for the adversary will be the reverse of the model objective, with a forward KL penalty (see e.g. Equation 9 in the paper)--tau 0.01
: Here this refers to the KL penalty weight
If you use this code in your research, or if you want to build upon P-DRO, please consider citing
@inproceedings{michel2021modeling,
title={Modeling the Second Player in Distributionally Robust Optimization},
author={Michel, Paul and Hashimoto, Tatsunori and Neubig, Graham},
booktitle={ICLR 2021},
year={2021}
}
@inproceedings{michel2021modeling,
title={Distributionally Robust Models with Parametric Likelihood Ratios},
author={Michel, Paul and Hashimoto, Tatsunori and Neubig, Graham},
booktitle={ICLR 2022},
year={2022}
}