-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ppo_pettingzoo_ma_atari.py #408
base: master
Are you sure you want to change the base?
Conversation
Updates to use Gymnasium and current PettingZoo API
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
Not sure how big of a problem it is not being able to record episode statistics, it may be better to keep things as they are currently so it doesn't lose functionality, and I can put this updated version onto the pettingzoo docs. There may be a way to get the episode statistics to work but it seems difficult as I explained above |
@vwxyzjn I'm imagining this would also require running the benchmarks? Let me know if you have any thoughts on this, I've also been considering doing an action masking example for other PZ envs, would that be something you're interested in having here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes looks good to me. I think we should do a benchmark run though because pettingzoo’s version changed a lot. Having an action mask example would be good too, but let’s do that in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is possibly a bug here. Truncation should be treated differently to termination when bootstrapping - as done in stable baselines. This made a pretty big diff in my own experiments.
I see similar issues exist - #198 and https://github.com/vwxyzjn/cleanrl/pull/311/files , but I don't think terminal_observation
was used there.
@@ -219,6 +227,8 @@ def get_action_and_value(self, x, action=None): | |||
next_value = agent.get_value(next_obs).reshape(1, -1) | |||
advantages = torch.zeros_like(rewards).to(device) | |||
lastgaelam = 0 | |||
next_done = np.logical_or(next_termination, next_truncation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is bug here. We should still bootstrap if next_truncation=True
:
"you should bootstrap if infos[env_idx]["TimeLimit.truncated"] is True (episode over due to a timeout/truncation) or dones[env_idx] is False (episode not finished)." - stable baselines
So next_done=next_termination
and dones=terminations
(probs just use next_terminations and terminations directly e.g. nextnonterminal = 1.0 - next_termination
).
To implement this correctly we also need access to terminal_observation
from pettingzoo_env_to_vec_env_v1
since we need access to the true terminal obs and not the obs returned by the next restart (the case currently -- so we need infos to provide access to the terminal obs). I have a PR out for this . Then we can implement something like this to do correct bootstrapping for truncating/timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch @KaleabTessera would you be willing to update this branch with the changes? I can give you edit access, I currently have a lot of other obligations from work so don’t have much time for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh shoot it’s a patch-1 so I don’t know if you can be given access. But if you clone the repo you can make a new branch from this branch and make a new PR if it’s not possible to edit this branch? Or maybe make a PR to update this branch itself. Sorry I can’t help more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I am doing a refactor at #424 . Gonna try run a whole suite of benchmark soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay cool, sorry I remember you gave access to the WandB thing but I've not had time to do it. Probably simplest if you do it anyways, so thanks for that. It may be interesting to compare performance with the AgileRL multi agent atari example https://docs.agilerl.com/en/latest/tutorials/pettingzoo/maddpg.html
I see the issue linked in that PR mentions timeout handling, is that the same as mentioned below with termination vs truncation? Anyways there's anything needed from PettingZoo or SuperSuit's end let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is bug here. We should still bootstrap if
next_truncation=True
:"you should bootstrap if infos[env_idx]["TimeLimit.truncated"] is True (episode over due to a timeout/truncation) or dones[env_idx] is False (episode not finished)." - stable baselines
So
next_done=next_termination
anddones=terminations
(probs just use next_terminations and terminations directly e.g.nextnonterminal = 1.0 - next_termination
).To implement this correctly we also need access to
terminal_observation
frompettingzoo_env_to_vec_env_v1
since we need access to the true terminal obs and not the obs returned by the next restart (the case currently -- so we need infos to provide access to the terminal obs). I have a PR out for this . Then we can implement something like this to do correct bootstrapping for truncating/timeout.
Btw, just as an update, the SuperSuit PR linked above has been merged. My only concern with this is that whatever bootstrapping behavior is done here should mirror what is done with the single agent PPO implementations, so this is a question for @vwxyzjn.
My inclination is to keep the logic as it currently is in this PR and address that bootstrapping issue in another PR (maybe @KaleabTessera is interested in doing that? I don't have a whole lot of time to look into it nor am I the best person to do it as I'm not an expert)
Are there any updates on this? |
Looks like #424 was merged,
Looks like #424 was merged, but it didn't update the PettingZoo example besides a minor CLI arguments change. Re-reading Costa's messages I think I misinterpreted him that he was intending to update this himself (or maybe he didn't have time). Anyways, I have some time today and will try to resolve these conflicts and integrate @KaleabTessera's suggestions, so we can at least have an updated version of this script. Won't have time for benchmarking in the near future but could eventually get to it. |
Yeah sorry @elliottower things have gotten busy. Feel free to submit a PR. As long as you can reproduce the existing benchmark experiments we can merge :) |
No worries, sounds good. I'll just use this same PR for simplicity's sake. |
Btw just FYI there's a bunch of already merged branches in this repo which could probably deleted (am pulling the most recent master branch and see a huge list) |
# Conflicts: # poetry.lock # pyproject.toml # requirements/requirements-dm_control.txt # requirements/requirements-optuna.txt # requirements/requirements-pettingzoo.txt
… reqs (subdependency of supersuit)
…eep going forward)
This PR updates the pettingzoo multiagent atari example to use gymnasium rather than gym and to use the current pettingzoo API (with termination and truncation, following gymnasium/gym v26). I've had some people ask about more in-depth CleanRL resources for PettingZoo, so I figure updating this would be a good start.
Unfortunately it seems like the record episode statistics won't work because it expects a dict for info, and the supersuit's
concat_vec_env
makes the info into a list of dicts. Could write a custom wrapper to do that but it's not entirely clear to me that's the best way to do things. We have been looking into mirroring the gymnasium vector API into PettingZoo which would allow the recorder class and most other gymnasium functionality to work, as far as I can tell, but that will take some time.Description
Types of changes
Checklist:
pre-commit run --all-files
passes (required).mkdocs serve
.If you need to run benchmark experiments for a performance-impacting changes:
--capture-video
.python -m openrlbenchmark.rlops
.python -m openrlbenchmark.rlops
utility to the documentation.python -m openrlbenchmark.rlops ....your_args... --report
, to the documentation.