How does Selfplay work with torchrl? #2201
TheRisenPhoenix asked this question in Q&A (unanswered)
In my understanding, classical self-play describes the process of training an agent against an older version of itself. In a competitive game with two agents, this would mean that one agent is trained as usual, while the other is fixed to an older version of the policy. Every now and then, the opponent's policy is updated to a more recent snapshot. By doing so, the difficulty of the environment increases over time, and the agent always plays against an opponent suited to its current skill level.
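The loop described above can be sketched in plain PyTorch. This is not a torchrl API, just an illustration of the snapshot pattern: the environment, reward, and update rule below are toy stand-ins, and only the frozen-opponent bookkeeping is the point.

```python
# Minimal self-play snapshot sketch (assumptions: toy 4-dim observations,
# a linear "policy", and a made-up objective; none of this is torchrl API).
import copy

import torch
import torch.nn as nn

learner = nn.Linear(4, 2)            # policy being trained
opponent = copy.deepcopy(learner)    # frozen older copy of the same policy
opponent.requires_grad_(False)

optimizer = torch.optim.SGD(learner.parameters(), lr=0.01)
SNAPSHOT_EVERY = 50                  # how often to refresh the opponent

for step in range(200):
    obs = torch.randn(1, 4)          # stand-in for an environment observation
    learner_action = learner(obs)
    with torch.no_grad():
        opponent_action = opponent(obs)   # opponent acts with the old weights
    # toy objective: push the learner's output away from the opponent's
    loss = -torch.dist(learner_action, opponent_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # every SNAPSHOT_EVERY steps, sync the opponent to the current learner
    if (step + 1) % SNAPSHOT_EVERY == 0:
        opponent.load_state_dict(learner.state_dict())
```

In a real setup the objective would come from environment returns, and one might keep a pool of past snapshots rather than a single frozen copy.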
If I understood it correctly, torchrl currently doesn't offer such functionality. It is possible to use multi-agent settings, but then the agents either always share exactly the same policy, or learn independently of each other (with the same architecture).
Is this correct, or did I overlook something?
Is there an intended way of implementing this, besides doing single-agent learning and manually managing and updating the opponent?