I would like to express appreciation for the platform and benchmark you’ve created. It really helps to compare different models and methods.
However, I've noticed that evaluation and comparison on the leaderboard can be somewhat noisy, and I've observed a few methods using certain "tricks" that inflate their scores in ways that aren't ideal. Two things stand out:
Hyperparameter optimization on the test dataset
Since the test datasets are accessible, several methods optimize their hyperparameters directly on the test dataset, rather than following the standard practice of tuning only on the validation dataset.
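For illustration, here is a minimal sketch of the protocol I would expect, where the test split is only touched once after hyperparameters are fixed. The dataset, model, and hyperparameter grid below are synthetic placeholders, not taken from the benchmark or from any submission:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a small benchmark dataset (the real splits are fixed by the benchmark).
X, y = make_regression(n_samples=900, n_features=20, noise=0.5, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Illustrative hyperparameter grid; not taken from any leaderboard method.
candidate_depths = [4, 8, 16, None]

best_depth, best_val_score = None, float("-inf")
for depth in candidate_depths:
    model = RandomForestRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)            # fit on the training split only
    val_score = model.score(X_val, y_val)  # select hyperparameters on the validation split
    if val_score > best_val_score:
        best_depth, best_val_score = depth, val_score

# The test split is used exactly once, after the hyperparameters are fixed.
final_model = RandomForestRegressor(max_depth=best_depth, random_state=0)
final_model.fit(X_train, y_train)
print("test R^2:", final_model.score(X_test, y_test))
```

The practice I'm flagging would instead compare candidates by their test score, which biases the reported numbers upward.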
Using both training and validation data for training
Some methods train on both the training and validation datasets in every run. This removes variation in the training data (it is identical across all runs), and because the datasets typically contain fewer than 1,000 samples, folding in the validation data can considerably improve performance.
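To make the contrast concrete, here is a minimal sketch, under the assumption that run-to-run variation is supposed to come from re-splitting the non-test data per seed (the data, model, and split ratio are illustrative placeholders; actual split handling depends on the benchmark):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a small (<1000 sample) development set (train + validation, no test).
X_dev, y_dev = make_regression(n_samples=800, n_features=20, noise=0.5, random_state=0)

# Intended protocol: each run gets its own train/validation split, so run-to-run
# variance reflects the data split as well as the model seed, and the validation
# rows are never part of the training set.
scores = []
for seed in range(5):
    X_train, X_val, y_train, y_val = train_test_split(
        X_dev, y_dev, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(random_state=seed)
    model.fit(X_train, y_train)  # validation rows are held out of training
    scores.append(model.score(X_val, y_val))

print("mean val R^2:", np.mean(scores), "+/-", np.std(scores))

# The practice flagged above would instead call model.fit(X_dev, y_dev) in every
# run: the model trains on ~25% more data and the split variation disappears.
```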
We investigated the methods on the leaderboard and summarized our findings in Table S2.5 of our recent paper.
I'm not sure whether it would be appropriate to set explicit rules about these practices (they are standard practice in machine learning, yet many methods ignore them).
@feiyang-cai thank you very much for pointing this out. We'll see what we can do. I'll have a look at your paper to see the cases where this happened. We can send a warning and, if needed, override the leaderboards, including the MPC task that spans these datasets.