
# RL Project

Final project for the Reinforcement Learning course.

## 1. Environment: MuJoCo

| | Ant-v4 | HalfCheetah-v4 |
| --- | --- | --- |
| Action Space | Box(-1.0, 1.0, (8,), float32) | Box(-1.0, 1.0, (6,), float32) |
| Observation Space | Box(-inf, inf, (27,), float64) | Box(-inf, inf, (17,), float64) |
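
These spaces can be inspected directly; a minimal sketch, assuming the Gymnasium API for the v4 MuJoCo environments:

```python
import gymnasium as gym

# Create the MuJoCo environment and print its action/observation spaces
env = gym.make("Ant-v4")
print(env.action_space)       # Box(-1.0, 1.0, (8,), float32)
print(env.observation_space)  # Box(-inf, inf, (27,), float64)
```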

## 2. Network Structure & RL Techniques

*(network architecture diagram)*

1. A value network WIDER than the policy network
2. Shallow networks: depth of 1 or 2 hidden layers
3. Tanh activation function (ReLU performed WORST)
4. Orthogonal initialization
5. Normalization on every layer of the value network
6. Advantage normalization
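
A minimal PyTorch sketch of points 1–6; the layer widths, depths, and gain values here are illustrative assumptions, not the project's exact settings:

```python
import torch.nn as nn

def layer_init(layer, gain=2**0.5):
    # 4. Orthogonal initialization of weights, zero bias
    nn.init.orthogonal_(layer.weight, gain=gain)
    nn.init.constant_(layer.bias, 0.0)
    return layer

def make_value_net(obs_dim, hidden=256, depth=2):
    # 1./5. Value network is WIDER than the policy network and
    # applies LayerNorm after every hidden layer.
    layers, in_dim = [], obs_dim
    for _ in range(depth):  # 2. depth = 1 or 2
        layers += [layer_init(nn.Linear(in_dim, hidden)),
                   nn.LayerNorm(hidden),
                   nn.Tanh()]  # 3. Tanh activation, not ReLU
        in_dim = hidden
    layers.append(layer_init(nn.Linear(in_dim, 1), gain=1.0))
    return nn.Sequential(*layers)

def make_policy_net(obs_dim, act_dim, hidden=64, depth=2):
    # Narrower policy network with the same init and activation
    layers, in_dim = [], obs_dim
    for _ in range(depth):
        layers += [layer_init(nn.Linear(in_dim, hidden)), nn.Tanh()]
        in_dim = hidden
    layers.append(layer_init(nn.Linear(in_dim, act_dim), gain=0.01))
    return nn.Sequential(*layers)

def normalize_advantages(adv, eps=1e-8):
    # 6. Advantage normalization: zero mean, unit std per batch
    return (adv - adv.mean()) / (adv.std() + eps)
```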

## 3. Results

- **Ant-v4**

  | | Non-Normalized Observation | Normalized Observation |
  | --- | --- | --- |
  | Train | *(training curve)* | *(training curve)* |
  | Test | *(test curve)* | *(test curve)* |

- **HalfCheetah-v4**

  | | Non-Normalized Observation | Normalized Observation |
  | --- | --- | --- |
  | Train | *(training curve)* | *(training curve)* |
  | Test | *(test curve)* | *(test curve)* |

- **Test Video**

  | Ant-v4 | HalfCheetah-v4 |
  | --- | --- |
  | ant-video-episode-0.mp4 | halfcheetah-video-episode-8.mp4 |

## 4. Conclusion

- The return oscillates more strongly in Ant-v4
  → the larger the observation space, the greater the oscillation width.

- Observation normalization improved the performance of Ant-v4 but reduced the performance of HalfCheetah-v4
  → observation normalization can hurt performance when the observation space is small (see the sketch below).
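
For reference, one common way to apply observation normalization is Gymnasium's built-in wrapper, which keeps running mean/variance statistics; this is a sketch of the general technique, not necessarily the implementation used in this project:

```python
import gymnasium as gym

# Wrap the environment so observations are normalized online using
# running mean/variance estimates (updated at every step)
env = gym.make("Ant-v4")
env = gym.wrappers.NormalizeObservation(env)
```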

