Noisy evaluation and comparison in the ADMET group leaderboard #9

Open
feiyang-cai opened this issue Nov 14, 2024 · 1 comment
@feiyang-cai
Contributor

I would like to express my appreciation for the platform and benchmark you've created. It really helps in comparing different models and methods.

However, the evaluation and comparison on the leaderboard can sometimes be noisy, and I've observed a few methods using certain "tricks" that improve their scores in ways that aren't ideal. Here are two practices I've noticed:

  • Hyperparameter optimization on the test dataset

Since the test datasets are accessible, several methods optimize their hyperparameters on the test dataset instead of following the standard practice of tuning only on the validation dataset.

  • Using both training and validation data for training

Some methods use both the training and validation datasets for training in all runs. This removes the variation in the training data across runs (the training set is always the same), and since the datasets typically contain fewer than 1,000 samples, including the validation data can considerably improve performance. (The intended per-seed protocol is sketched right after this list.)
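
For clarity, here is a minimal sketch of what I understand to be the intended per-seed submission protocol, based on the standard tdc.benchmark_group.admet_group loop. The tune_hyperparameters and train_model helpers are hypothetical placeholders for whatever method is being benchmarked:

```python
# Minimal sketch of the intended per-seed protocol (assumes the standard
# tdc.benchmark_group.admet_group API; tune_hyperparameters / train_model
# are hypothetical placeholders for the method under evaluation).
from tdc.benchmark_group import admet_group

group = admet_group(path='data/')
predictions_list = []

for seed in [1, 2, 3, 4, 5]:
    benchmark = group.get('Caco2_Wang')
    name = benchmark['name']
    train_val, test = benchmark['train_val'], benchmark['test']

    # Per-seed split: the training set should vary across runs.
    train, valid = group.get_train_valid_split(benchmark=name,
                                               split_type='default',
                                               seed=seed)

    # Tune on the validation split only -- never on `test`.
    best_params = tune_hyperparameters(train, valid)   # hypothetical helper

    # Fit on `train` alone; `valid` is held out for model selection.
    model = train_model(train, best_params)            # hypothetical helper

    predictions_list.append({name: model.predict(test['Drug'])})

results = group.evaluate_many(predictions_list)
print(results)
```

The two practices above correspond to replacing `valid` with `test` in the tuning step, or fitting the model on `train_val` instead of `train`, respectively.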

We investigated the methods on the leaderboard and summarized our findings in Table S2.5 of our recent paper.

I'm not sure whether it would be appropriate to set clear rules about these practices (they are standard practice in machine learning, yet many methods ignore them).

@amva13
Member

amva13 commented Nov 29, 2024

@feiyang-cai thank you very much for pointing this out. We'll see what we can do. I'll have a look at your paper to see the cases where this happened. If needed, we can send a warning and override the leaderboard entries with the MPC task, which spans these datasets.

Thank you!

@marinkaz @kexinhuang12345 @shenwanxiang
