I'm also wondering how to run the evaluations on models other than GPT and Claude, e.g. llama-3. Can anyone clarify the process?
I think we first need to implement a sampler ourselves, and then add the model we want to evaluate, together with its sampler, to the models dict in simple_evals.py. But I'm not sure whether this is the correct way to evaluate local models.
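For what it's worth, here is a rough sketch of what such a sampler could look like, assuming the local model is served behind an OpenAI-compatible `/v1/chat/completions` endpoint (vLLM, Ollama, llama.cpp server, etc.). The class name, the `base_url`, and the model id are all my own placeholders, and the callable interface (list of role/content messages in, completion text out) just mirrors the pattern of the built-in samplers; the exact signatures in simple_evals may differ, so treat this as a starting point rather than the official API:

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class LocalChatSampler:
    """Hypothetical sampler for a locally served chat model.

    Takes a message list like [{"role": "user", "content": "..."}]
    and returns the completion text, mimicking the built-in samplers.
    """

    base_url: str = "http://localhost:8000/v1"  # assumed local server
    model: str = "llama-3-8b-instruct"          # assumed model id
    temperature: float = 0.0

    def _payload(self, message_list):
        # Build the OpenAI-style chat completion request body.
        return {
            "model": self.model,
            "messages": message_list,
            "temperature": self.temperature,
        }

    def __call__(self, message_list):
        # POST the request and extract the first choice's message text.
        req = urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=json.dumps(self._payload(message_list)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]


# Then register it alongside the existing entries in the models dict
# in simple_evals.py, e.g.:
#   models["llama-3-8b-instruct"] = LocalChatSampler()
```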
Yes, how do we run this code to evaluate a language model?