Question regarding evaluation #96
Hi, I ran basic-pitch/basic_pitch/experiments/run_evaluation.py from the wip-training branch on the MAESTRO dataset, with the model checkpoint from basic-pitch/saved_models/icassp_2022. I expected the results to be similar to those reported in the paper. However, I got the following:

{"Precision": 0.0, "Recall": 0.0, "F-measure": 0.0, "Average_Overlap_Ratio": 0.0, "Precision_no_offset": 0.04398411727609082, "Recall_no_offset": 0.029748905165349712, "F-measure_no_offset": 0.03468172982454684, "Average_Overlap_Ratio_no_offset": 0.5793096961557063, "Onset_Precision": 0.631602431674569, "Onset_Recall": 0.4181107759888922, "Onset_F-measure": 0.4925505866527016, "Offset_Precision": 0.7521021756258168, "Offset_Recall": 0.5273589516900296, "Offset_F-measure": 0.6072445448462509}

Based on my understanding of the mir_eval definitions, the metric corresponding to the paper's F should be F-measure, and Fno should be F-measure_no_offset (I cannot find a mir_eval counterpart for Acc). As you can see, these results are really far from what is reported in the paper.

Could anyone please tell me which mir_eval metric corresponds to each metric in the paper?
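For reference, all of the keys in the results above come from a single mir_eval call. Here is a minimal sketch of how these note-level metrics are computed; the interval and pitch arrays are made-up illustration data, not output from basic-pitch:

```python
import numpy as np
import mir_eval

# Made-up example notes: intervals are (onset, offset) pairs in seconds,
# pitches are frequencies in Hz -- the units mir_eval expects.
ref_intervals = np.array([[0.10, 0.60], [0.75, 1.20]])
ref_pitches = np.array([440.0, 523.25])
est_intervals = np.array([[0.12, 0.58], [0.80, 1.25]])
est_pitches = np.array([440.0, 523.25])

# evaluate() returns an ordered dict with exactly the keys shown above.
# "Precision"/"Recall"/"F-measure" require a note to match in pitch,
# onset, AND offset; the "_no_offset" variants drop the offset
# requirement; the "Onset_" metrics match on onset times alone.
scores = mir_eval.transcription.evaluate(
    ref_intervals, ref_pitches, est_intervals, est_pitches
)
print(scores["F-measure"], scores["F-measure_no_offset"])
```

If the paper's F is the note-level F-measure with offsets and Fno the no-offset variant, those would map to "F-measure" and "F-measure_no_offset" here. mir_eval's transcription module has no "Acc" key; frame-level accuracy lives in mir_eval.multipitch, which may be what the paper's Acc refers to (an assumption, not confirmed by this thread).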
Comments

When I checked the values of ref_intervals and est_intervals, they were really different, which I think is one of the reasons why the previous result is so far from what is reported in the paper. After modifying the functions, I finally got results that are close to those reported in the paper.

Hi @xinzuan. The training branch is still a work in progress, so don't rely on it too heavily. Regarding your issue, it's possible that there is a difference in units between the estimated and reference timestamps and frequency values, and that your solution took care of the difference.
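The kind of unit mismatch suggested above would explain the near-zero scores: with mir_eval's default 50 ms onset tolerance, intervals expressed in frames on one side and seconds on the other will almost never match. A minimal sketch of the conversion, assuming hypothetical frame-indexed intervals and MIDI pitches (the frame rate is a made-up placeholder, not basic-pitch's actual value):

```python
import numpy as np
import mir_eval.util

# Hypothetical frame rate -- the real value depends on the model's hop size.
FRAMES_PER_SECOND = 86.0

def to_mir_eval_units(intervals_frames, pitches_midi):
    """Convert (onset, offset) pairs from frame indices to seconds and
    pitches from MIDI note numbers to Hz, the units mir_eval expects."""
    intervals_sec = np.asarray(intervals_frames, dtype=float) / FRAMES_PER_SECOND
    pitches_hz = mir_eval.util.midi_to_hz(np.asarray(pitches_midi, dtype=float))
    return intervals_sec, pitches_hz

# Example: a note from frame 10 to frame 52 at MIDI pitch 69 (A4).
intervals, pitches = to_mir_eval_units([[10, 52]], [69])
print(intervals, pitches)  # ~[[0.116 0.605]] [440.]
```

Applying a conversion like this to both the reference and estimated notes before calling mir_eval.transcription.evaluate would reconcile the units, which matches the kind of fix described in the comment above.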