Question regarding evaluation #96
Hi, I ran basic-pitch/basic_pitch/experiments/run_evaluation.py from the wip-training branch on the MAESTRO dataset, with the model checkpoint from basic-pitch/saved_models/icassp_2022. I expected the results to be similar to those reported in the paper. However, I got the following:

{"Precision": 0.0, "Recall": 0.0, "F-measure": 0.0, "Average_Overlap_Ratio": 0.0, "Precision_no_offset": 0.04398411727609082, "Recall_no_offset": 0.029748905165349712, "F-measure_no_offset": 0.03468172982454684, "Average_Overlap_Ratio_no_offset": 0.5793096961557063, "Onset_Precision": 0.631602431674569, "Onset_Recall": 0.4181107759888922, "Onset_F-measure": 0.4925505866527016, "Offset_Precision": 0.7521021756258168, "Offset_Recall": 0.5273589516900296, "Offset_F-measure": 0.6072445448462509}

Based on my understanding of the mir_eval definitions, the metric corresponding to the paper's F should be F-measure, and Fno should be F-measure_no_offset (I cannot find a mir_eval counterpart for Acc). As you can see, these results are really far from what is reported in the paper.

Could anyone please tell me which mir_eval metric corresponds to each metric in the paper?
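For reference, all of the keys in the results above come from a single mir_eval call. Here is a minimal sketch of how these note-level metrics are computed; the interval and pitch arrays are made-up illustration data, not output from basic-pitch:

```python
import numpy as np
import mir_eval

# Made-up example notes: intervals are (onset, offset) pairs in seconds,
# pitches are frequencies in Hz -- the units mir_eval expects.
ref_intervals = np.array([[0.10, 0.60], [0.75, 1.20]])
ref_pitches = np.array([440.0, 523.25])
est_intervals = np.array([[0.12, 0.58], [0.80, 1.25]])
est_pitches = np.array([440.0, 523.25])

# evaluate() returns an ordered dict with exactly the keys shown above.
# "Precision"/"Recall"/"F-measure" require a note to match in pitch,
# onset, AND offset; the "_no_offset" variants drop the offset
# requirement; the "Onset_" metrics match on onset times alone.
scores = mir_eval.transcription.evaluate(
    ref_intervals, ref_pitches, est_intervals, est_pitches
)
print(scores["F-measure"], scores["F-measure_no_offset"])
```

If the paper's F is the note-level F-measure with offsets and Fno the no-offset variant, those would map to "F-measure" and "F-measure_no_offset" here. mir_eval's transcription module has no "Acc" key; frame-level accuracy lives in mir_eval.multipitch, which may be what the paper's Acc refers to (an assumption, not confirmed by this thread).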
Comments

When I checked the values of ref_intervals and est_intervals, they were really different, which I think is one of the reasons why the previous result is so far from what is reported in the paper. After modifying the functions, I finally got results that are close to those reported in the paper.

Hi @xinzuan. The training branch is still a work in progress, so don't rely on it too heavily. Regarding your issue, it's possible that there is a difference in units between the estimated and reference timestamps and frequency values, and that your solution took care of the difference.
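The kind of unit mismatch suggested above would explain the near-zero scores: with mir_eval's default 50 ms onset tolerance, intervals expressed in frames on one side and seconds on the other will almost never match. A minimal sketch of the conversion, assuming hypothetical frame-indexed intervals and MIDI pitches (the frame rate is a made-up placeholder, not basic-pitch's actual value):

```python
import numpy as np
import mir_eval.util

# Hypothetical frame rate -- the real value depends on the model's hop size.
FRAMES_PER_SECOND = 86.0

def to_mir_eval_units(intervals_frames, pitches_midi):
    """Convert (onset, offset) pairs from frame indices to seconds and
    pitches from MIDI note numbers to Hz, the units mir_eval expects."""
    intervals_sec = np.asarray(intervals_frames, dtype=float) / FRAMES_PER_SECOND
    pitches_hz = mir_eval.util.midi_to_hz(np.asarray(pitches_midi, dtype=float))
    return intervals_sec, pitches_hz

# Example: a note from frame 10 to frame 52 at MIDI pitch 69 (A4).
intervals, pitches = to_mir_eval_units([[10, 52]], [69])
print(intervals, pitches)  # ~[[0.116 0.605]] [440.]
```

Applying a conversion like this to both the reference and estimated notes before calling mir_eval.transcription.evaluate would reconcile the units, which matches the kind of fix described in the comment above.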