This repository has been archived by the owner on Oct 27, 2022. It is now read-only.

Explore the optimization landscape of direct policy learning in reinforcement learning.


google-research/policy-learning-landscape


Policy Learning Landscape

This repository contains code to explore the policy optimization landscape.

Quick setup

To run CartPole, simply do:

python3 run_eager_policy_optimization.py --env CartPole-v0 --policy_type discrete

To run a MuJoCo environment, you must have MuJoCo installed along with its license. To run Hopper-v1, use:

python3 run_eager_policy_optimization.py --env Hopper-v1 --policy_type normal --std 0.5
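The two `--policy_type` values plausibly correspond to the two standard policy families sketched below. This is a minimal NumPy sketch under stated assumptions, not the `eager_pg` implementation: we assume `discrete` means a softmax (categorical) policy over a finite action set such as CartPole's two actions, and `normal` means a Gaussian policy whose mean comes from the parameters and whose standard deviation is fixed by `--std`. The function names and the linear parameterization are hypothetical.

```python
import numpy as np


def softmax_policy_probs(theta, state):
    """Hypothetical discrete policy: linear logits followed by a softmax.

    theta has shape (state_dim, n_actions); returns one probability per action.
    """
    logits = state @ theta
    logits = logits - logits.max()  # shift logits for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()


def sample_discrete_action(theta, state, rng):
    """Sample an action index from the categorical (softmax) policy."""
    probs = softmax_policy_probs(theta, state)
    return rng.choice(len(probs), p=probs)


def sample_gaussian_action(theta, state, std, rng):
    """Hypothetical continuous policy: linear mean plus fixed-std Gaussian noise.

    theta has shape (state_dim, action_dim); std plays the role of --std.
    """
    mean = state @ theta
    return mean + std * rng.standard_normal(mean.shape)


def gaussian_entropy(std, action_dim):
    """Entropy (in nats) of a diagonal Gaussian with the same std in every dimension."""
    return 0.5 * action_dim * np.log(2.0 * np.pi * np.e * std**2)


rng = np.random.default_rng(0)

# CartPole-sized example: 4-dimensional observations, 2 discrete actions.
cartpole_state = np.array([0.1, -0.2, 0.05, 0.3])
theta_discrete = np.zeros((4, 2))
print(softmax_policy_probs(theta_discrete, cartpole_state))  # [0.5 0.5]
print(sample_discrete_action(theta_discrete, cartpole_state, rng))

# Hopper-sized example: 11-dimensional observations, 3-dimensional actions, std = 0.5.
hopper_state = rng.standard_normal(11)
theta_normal = np.zeros((11, 3))
print(sample_gaussian_action(theta_normal, hopper_state, std=0.5, rng=rng))
print(gaussian_entropy(std=0.5, action_dim=3))
```

Under this assumption, a fixed `--std` makes the Gaussian policy's entropy independent of the state and of theta, so the standard deviation directly sets how much exploration noise the continuous policy injects.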

Parameters are saved to ./parameters as NumPy (.npy) files. After collecting parameters from several runs, use the following commands to analyze the landscape.

  1. First install eager_pg: pip install -e ..

  2. Random Perturbations Experiment:

cd interpolation_experiments
python paired_random_directions_experiment.py --p1 ./path/to/parameter/1/npy \
--save_dir ./path/to/save/in/ \
--alpha 0.5 --std 0.5 --n_directions 500

  3. Linear Interpolation Experiment:
cd interpolation_experiments
python simple_1d_interpolation_experiment.py --p1 ./path/to/parameter/1/npy \
--p2 ./path/to/parameter/2/npy --save_dir ./path/to/save/in/ \
--stds 5.0 --alpha_start -0.5 --alpha_end 1.5 --n_alphas 2

Note that interpolation tools only work with continuous policies.
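The two experiments above can be sketched in NumPy on a toy objective. This is a minimal illustration under stated assumptions, not the code in `interpolation_experiments`: we assume a "paired" random direction means evaluating the objective at both theta + alpha * d and theta - alpha * d for a direction d drawn from a Gaussian with standard deviation `std`, and that linear interpolation evaluates (1 - alpha) * theta1 + alpha * theta2 on an evenly spaced grid of `n_alphas` values between `alpha_start` and `alpha_end`. `toy_objective` is a hypothetical stand-in for a policy's expected return, which the real scripts would have to estimate by running the policy in the environment.

```python
import os
import tempfile

import numpy as np


def toy_objective(theta):
    """Hypothetical stand-in for a policy's expected return (higher is better)."""
    return -np.sum((theta - 1.0) ** 2)


def paired_random_directions(objective, theta, alpha, std, n_directions, rng):
    """Evaluate the objective at theta +/- alpha * d for random Gaussian directions d.

    Returns an array of shape (n_directions, 2) holding
    [J(theta + alpha * d), J(theta - alpha * d)] for each direction.
    """
    results = np.empty((n_directions, 2))
    for i in range(n_directions):
        d = std * rng.standard_normal(theta.shape)
        results[i, 0] = objective(theta + alpha * d)
        results[i, 1] = objective(theta - alpha * d)
    return results


def linear_interpolation(objective, theta1, theta2, alpha_start, alpha_end, n_alphas):
    """Evaluate the objective along (1 - alpha) * theta1 + alpha * theta2.

    Values of alpha outside [0, 1] extrapolate beyond the two endpoints.
    """
    alphas = np.linspace(alpha_start, alpha_end, n_alphas)
    values = np.array([objective((1 - a) * theta1 + a * theta2) for a in alphas])
    return alphas, values


# Emulate two saved runs as .npy files in a temporary directory, then load them.
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    p1_path = os.path.join(tmp, "p1.npy")
    p2_path = os.path.join(tmp, "p2.npy")
    np.save(p1_path, np.zeros(5))
    np.save(p2_path, np.full(5, 2.0))
    theta1 = np.load(p1_path)
    theta2 = np.load(p2_path)

pairs = paired_random_directions(toy_objective, theta1, alpha=0.5, std=0.5,
                                 n_directions=500, rng=rng)
alphas, values = linear_interpolation(toy_objective, theta1, theta2,
                                      alpha_start=-0.5, alpha_end=1.5, n_alphas=5)
print(pairs.mean(axis=0))
print(values)
```

Pairing each direction with its negation helps separate slope from curvature: if the objective is locally linear the two values move in opposite directions, while near a local maximum both tend to drop below the value at theta.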

Code organization

  • eager_pg: a small library for rapid research on policy gradient reinforcement learning.
  • analysis_tools: tooling for producing publication-quality figures.
  • interpolation_experiments: experiments that explore the policy optimization landscape.
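`eager_pg` is described only as a small policy gradient library, so what follows is a generic NumPy illustration rather than its API: the score-function (REINFORCE) estimate of the gradient of expected reward for a softmax policy on a hypothetical three-armed bandit. The function names, the per-arm rewards, and the optional `entropy_coef` term (an exact entropy-bonus gradient, included because entropy is the subject of the cited paper) are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def reinforce_gradient(logits, rewards, n_samples, rng, entropy_coef=0.0):
    """Score-function (REINFORCE) gradient estimate for a softmax bandit policy.

    Estimates the gradient of E_{a ~ pi}[r(a)] + entropy_coef * H(pi) w.r.t. the logits.
    """
    probs = softmax(logits)
    actions = rng.choice(len(logits), size=n_samples, p=probs)
    # Gradient of log softmax(logits)[a] is one_hot(a) - probs.
    scores = np.eye(len(logits))[actions] - probs
    grad = (rewards[actions][:, None] * scores).mean(axis=0)
    # Exact entropy gradient: dH/dz_k = -p_k * (log p_k + H).
    entropy = -np.sum(probs * np.log(probs))
    grad += entropy_coef * (-probs * (np.log(probs) + entropy))
    return grad


rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.0, 0.5])  # hypothetical per-arm rewards
logits = np.zeros(3)
for step in range(200):
    # Plain gradient ascent on the estimated objective.
    logits += 0.5 * reinforce_gradient(logits, rewards, n_samples=32, rng=rng,
                                       entropy_coef=0.01)
print(softmax(logits))  # most probability mass should end up on arm 0
```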

Citation

If you use the proposed method or code, we would appreciate it if you cited this work!

@article{ahmed2018understanding,
  title={Understanding the impact of entropy in policy learning},
  author={Ahmed, Zafarali and Le Roux, Nicolas and Norouzi, Mohammad and Schuurmans, Dale},
  journal={arXiv preprint arXiv:1811.11214},
  year={2018}
}

Disclaimer

This is not an official Google product.
