I'm a little confused about your implementation of MAA2C. I don't think the input of the actor network is simply the "joint state" of the agents. According to [1], the critic's input should be the state of the environment (where the agents' joint state is not necessarily defined) plus the joint action of the agents, i.e., the critic here should be a Q-function over joint actions. The actor should be something like a policy, and I don't quite understand why the actor network is implemented this way. I would appreciate an explanation.
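To make the distinction concrete, here is a minimal sketch of the structure described above: a critic that scores the environment state together with the joint action, and an actor that outputs a policy. It assumes discrete actions, one-hot action encoding, linear function approximators, and an actor that sees only its own agent's observation. All names (`critic_input`, `linear_q`, `actor_policy`) and dimensions are hypothetical and are not taken from this repository.

```python
import math
import random


def one_hot(index, size):
    """Encode a discrete action index as a one-hot list."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def critic_input(env_state, joint_action, n_actions):
    """Concatenate the environment state with every agent's one-hot action."""
    features = list(env_state)
    for action in joint_action:
        features.extend(one_hot(action, n_actions))
    return features


def linear_q(features, weights):
    """Hypothetical linear critic: Q(s, a_1..a_n) = w . [s, a_1, ..., a_n]."""
    return sum(w * x for w, x in zip(weights, features))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_policy(observation, weight_rows):
    """Hypothetical linear actor: a distribution over one agent's actions."""
    logits = [sum(w * x for w, x in zip(row, observation)) for row in weight_rows]
    return softmax(logits)


# Assumed toy dimensions, chosen only for illustration.
STATE_DIM, N_AGENTS, N_ACTIONS, OBS_DIM = 4, 2, 3, 2
rng = random.Random(0)

env_state = [0.1, -0.2, 0.3, 0.0]   # state of the environment
joint_action = [2, 0]               # one action index per agent

x = critic_input(env_state, joint_action, N_ACTIONS)
critic_w = [rng.uniform(-1, 1) for _ in range(len(x))]
q_value = linear_q(x, critic_w)     # scalar value of the joint action

actor_w = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(N_ACTIONS)]
probs = actor_policy([0.5, -0.5], actor_w)  # policy for a single agent
```

In this sketch the critic's input has length `STATE_DIM + N_AGENTS * N_ACTIONS`, while the actor only maps one agent's observation to action probabilities.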