If an actor reaches the maximum number of steps without any failures, the episode should be considered 'done', and its reward should be counted toward the total reward as is, rather than being set to -1.
Could you please examine how the reward is tallied when an actor successfully completes the maximum number of steps without any failures?
It seems that when an actor completes the maximum number of steps (2048) without failing,
no 'done_info' is generated at Line 214,
implementation-matters/src/policy_gradients/agent.py
Line 214 in 5ee6ecb
and no 'done_info' is appended to 'completed_episode_info' at Line 296.
implementation-matters/src/policy_gradients/agent.py
Line 296 in 5ee6ecb
Consequently, the reward is counted as -1, as observed in the code snippet at Line 324.
implementation-matters/src/policy_gradients/agent.py
Line 324 in 5ee6ecb
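To make the requested behavior concrete, here is a minimal sketch of a tally that treats reaching the step cap as the end of an episode. It is not the code in agent.py: the function name `tally_episode_rewards` and its parameters are hypothetical, and it assumes per-step rewards and termination flags for a single actor are available as plain lists.

```python
from typing import List, Tuple


def tally_episode_rewards(
    step_rewards: List[float],
    terminal_flags: List[bool],
    max_steps: int,
) -> List[Tuple[float, int]]:
    """Return (total_reward, length) for each episode in one actor's rollout.

    Hypothetical helper, not part of the repository. An episode is recorded
    when the environment signals termination OR when the step cap is reached.
    """
    completed = []
    ep_reward, ep_len = 0.0, 0
    for t, (reward, done) in enumerate(zip(step_rewards, terminal_flags)):
        ep_reward += reward
        ep_len += 1
        hit_cap = (t + 1) == max_steps  # actor survived to the last allowed step
        if done or hit_cap:
            # Record the accumulated reward as is instead of a -1 placeholder.
            completed.append((ep_reward, ep_len))
            ep_reward, ep_len = 0.0, 0
    return completed


# An actor that never fails within a 4-step cap still yields one episode.
print(tally_episode_rewards([1.0, 1.0, 1.0, 1.0], [False] * 4, max_steps=4))
```

Note that this sketch also records a partial episode that is cut off by the cap after an earlier failure; whether that tail should be counted is a separate design choice.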