You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
使用API调用奖励模型得到的奖励值与PPO过程中的奖励值差距巨大/The disparity between the reward values obtained from calling the reward model using the API and the reward values from the PPO process is huge
#1641