This code is modified from PANNs.

We first pretrain our model on 20% of the AudioSet data to obtain a classifier that handles all 527 classes. For the 20% subset, we use unbalanced parts 00-07 as our training set. Everything else is the same as in PANNs; please refer to PANNs for details.
Please copy the project data (the 9 target classes of AudioSet) to ./datasets/audioset_project, which should then contain:

- copy_maps.scp, a mapping from the original audio names to the project audio names
- an audio folder containing the audio files
- a data folder containing data lists in Kaldi format
- a metadata folder with data lists copied from the full AudioSet (seems to be unused in our fine-tuning code)
- a split folder with the audio-file split lists (designed for k-fold validation)

Then, remove the overlapping data samples (the overlap between AudioSet and our project test set):
```bash
for ID in {00..07}; do
    # Note: $ID is not referenced by this command.
    python utils/dataset_project.py filter_wavs \
        --csv_dir=$DATASET_DIR_PROJECT"/metadata/unbalanced_partial_csvs" \
        --split_dir=$DATASET_DIR_PROJECT"/split" \
        --map_path=$DATASET_DIR_PROJECT"/copy_maps.scp"
done
```
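For reference, this step conceptually matches the project clips against the AudioSet training csvs and drops the overlapping rows. Below is a minimal sketch of that idea, not the actual implementation in utils/dataset_project.py; the copy_maps.scp layout (one Kaldi-style "original_name project_name" pair per line), the PANNs "Y\<ytid\>.wav" naming, and all file names here are assumptions:

```python
import os

def filter_overlap(csv_path, map_path, out_path):
    """Drop AudioSet csv rows whose clip also appears in the project set."""
    # Collect the YouTube IDs of all project clips from the name map.
    project_ids = set()
    with open(map_path) as f:
        for line in f:
            original_name = line.split()[0]        # e.g. "Y-0DdlOuIFUI.wav" (assumed)
            stem = os.path.splitext(original_name)[0]
            project_ids.add(stem[1:] if stem.startswith('Y') else stem)

    # Keep csv header lines (starting with '#') and non-overlapping rows.
    with open(csv_path) as fin, open(out_path, 'w') as fout:
        for line in fin:
            ytid = line.split(',')[0].strip()
            if line.startswith('#') or ytid not in project_ids:
                fout.write(line)

filter_overlap('unbalanced_train_segments_part00.csv',
               'copy_maps.scp',
               'unbalanced_train_segments_part00_filtered.csv')
```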
We then pack the project audio data into .h5 files for training and testing. Please note that our code is only interested in the 9 project classes, not all 527 classes of the full dataset.
```bash
for ID in {0..4}; do
    python utils/dataset_project.py pack_waveforms_to_hdf5 \
        --csv_path=$DATASET_DIR_PROJECT"/metadata/eval_segments.csv" \
        --audios_dir=$DATASET_DIR_PROJECT"/audio" \
        --waveforms_hdf5_path=$WORKSPACE"/hdf5s/waveforms/project_eval_data_$ID.h5" \
        --split_dir=$DATASET_DIR_PROJECT"/split" \
        --split_to_test_id=$ID
done
```
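For orientation, the packed .h5 layout follows PANNs: fixed-length waveforms stored as int16 plus a multi-hot target per clip, except that the target here has 9 columns instead of 527. A rough sketch of the packing loop (the function and file names are placeholders, not the real code):

```python
import h5py
import librosa
import numpy as np

SAMPLE_RATE = 32000
CLIP_SAMPLES = SAMPLE_RATE * 10   # AudioSet clips are at most 10 s
CLASSES_NUM = 9                   # project classes, not the full 527

def pack_waveforms_to_hdf5(audio_paths, targets, hdf5_path):
    """Pack clips into one .h5 file, PANNs-style (int16 waveform + bool target)."""
    n = len(audio_paths)
    with h5py.File(hdf5_path, 'w') as hf:
        hf.create_dataset('audio_name', shape=(n,), dtype='S80')
        hf.create_dataset('waveform', shape=(n, CLIP_SAMPLES), dtype=np.int16)
        hf.create_dataset('target', shape=(n, CLASSES_NUM), dtype=bool)
        hf.attrs.create('sample_rate', data=SAMPLE_RATE, dtype=np.int64)

        for i, (path, target) in enumerate(zip(audio_paths, targets)):
            audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
            # Pad or truncate to exactly 10 s, then quantize to int16.
            audio = np.pad(audio, (0, max(0, CLIP_SAMPLES - len(audio))))[:CLIP_SAMPLES]
            hf['audio_name'][i] = path.split('/')[-1].encode()
            hf['waveform'][i] = (audio * 32767.0).astype(np.int16)
            hf['target'][i] = target
```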
We have 5 splits in total: splits 0, 1, and 2 form the training set, split 3 the validation set, and split 4 the test set. The system samples training data through an index file, so we can simply combine the indexes of splits 0, 1, and 2 into one training index.
```bash
python utils/create_indexes.py combine_full_indexes_project \
    --indexes_hdf5s_dir=$WORKSPACE"/hdf5s/indexes" \
    --full_indexes_hdf5_path=$WORKSPACE"/hdf5s/indexes/project_train.h5" \
    --split_id_list_to_train="0,1,2"
```
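Conceptually, combining indexes is just concatenating the per-split index files; in the PANNs layout each index entry points into a waveform .h5 via `audio_name`, `hdf5_path`, `index_in_hdf5`, and `target` datasets. A rough sketch of the operation (the per-split file names are assumptions):

```python
import h5py
import numpy as np

def combine_indexes(split_index_paths, full_index_path):
    """Concatenate several PANNs-style index .h5 files into one training index."""
    keys = ['audio_name', 'hdf5_path', 'index_in_hdf5', 'target']
    combined = {k: [] for k in keys}

    # Gather every dataset from every split index.
    for path in split_index_paths:
        with h5py.File(path, 'r') as hf:
            for k in keys:
                combined[k].append(hf[k][:])

    # Write the concatenated datasets as the full training index.
    with h5py.File(full_index_path, 'w') as hf:
        for k in keys:
            hf.create_dataset(k, data=np.concatenate(combined[k], axis=0))

# Splits 0, 1 and 2 become the training index, matching --split_id_list_to_train="0,1,2".
combine_indexes([f'project_split_{i}.h5' for i in (0, 1, 2)], 'project_train.h5')
```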
Next, fine-tune (further train) our CNN14 on the project data (9 classes only). We do not freeze any trainable parameters; the whole model is tuned, and the final classification layer is trained from scratch.
```bash
python pytorch/finetune_project.py train \
    --workspace=$WORKSPACE \
    --sample_rate=32000 --window_size=1024 --hop_size=320 \
    --mel_bins=64 --fmin=50 --fmax=14000 \
    --model_type='Transfer_Cnn14' \
    --pretrained_checkpoint_path='checkpoints_downloaded/'$CHECKPOINT_FILE_NAME \
    --cuda --early_stop=20000 --loss_type='clip_bce' --learning_rate=1e-4
```
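The Transfer_Cnn14 wrapper follows the PANNs fine-tuning template: load the full pretrained Cnn14, keep all of its weights trainable, and replace only the classification head with a fresh 9-way linear layer. A condensed sketch of that idea (it assumes the PANNs `Cnn14` class with its 2048-d embedding; the import path and the sigmoid head for clip_bce are assumptions, not the exact code in this repo):

```python
import torch
import torch.nn as nn
from models import Cnn14  # PANNs backbone; import path assumed

class Transfer_Cnn14(nn.Module):
    def __init__(self, sample_rate, window_size, hop_size, mel_bins,
                 fmin, fmax, classes_num):
        super().__init__()
        # Build the backbone with the original 527 AudioSet classes so the
        # pretrained checkpoint loads cleanly.
        self.base = Cnn14(sample_rate, window_size, hop_size, mel_bins,
                          fmin, fmax, 527)
        # New 9-way head, trained from scratch; nothing else is frozen.
        self.fc_transfer = nn.Linear(2048, classes_num, bias=True)

    def load_from_pretrain(self, pretrained_checkpoint_path):
        checkpoint = torch.load(pretrained_checkpoint_path, map_location='cpu')
        self.base.load_state_dict(checkpoint['model'])

    def forward(self, input):
        embedding = self.base(input)['embedding']          # (batch, 2048)
        clipwise_output = torch.sigmoid(self.fc_transfer(embedding))
        return {'clipwise_output': clipwise_output}
```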
We can monitor the mAP during training. As a final step, we can also check the F1 scores of our model with the script I wrote (below). The script works well for classification with cross-entropy loss, but may have problems with binary cross-entropy loss; in practice, the mAP reported during training is already more than enough to show the performance. Note: before this step, please select one model in workspace/checkpoints/ and copy it into CHECKPOINT_SOURCE_DIR as the model you want to test.
```bash
CHECKPOINT_SOURCE_DIR='checkpoints_downloaded'
CHECKPOINT_FILE_NAME='20_percent_Cnn14.pth'
python pytorch/test_model_project.py test \
    --workspace=$WORKSPACE \
    --window_size=1024 --hop_size=320 --mel_bins=64 --fmin=50 --fmax=14000 \
    --model_type='Cnn14' --loss_type='clip_bce' --augmentation='mixup' \
    --batch_size=32 --learning_rate=1e-3 --cuda --split_id=4 \
    --checkpoint_source_dir=$CHECKPOINT_SOURCE_DIR \
    --checkpoint_file_name=$CHECKPOINT_FILE_NAME
```
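If the script's F1 numbers look off (the binary cross-entropy caveat above), macro F1 can also be computed directly from the clipwise sigmoid outputs by thresholding them. A small sketch with scikit-learn; the 0.5 threshold and variable names are illustrative, not taken from the repo:

```python
import numpy as np
from sklearn.metrics import f1_score

def macro_f1(clipwise_output, target, threshold=0.5):
    """Binarize sigmoid outputs and average F1 over the 9 classes.

    clipwise_output: (num_clips, 9) array of sigmoid probabilities.
    target:          (num_clips, 9) multi-hot ground-truth array.
    """
    predictions = (clipwise_output >= threshold).astype(int)
    return f1_score(target, predictions, average='macro', zero_division=0)
```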
- Pretrain Model Link: https://entuedu-my.sharepoint.com/:u:/g/personal/pang0208_e_ntu_edu_sg/EThQO0UW705HhZ8x9pca7IYBHJyz42iDZ_GCSJ6Gxr8TXw?e=iDfa6O
- Pretrain Dataset Link: please proceed to the lab cluster; the file is at /home3/pang0208/AudioSet_Subset_Classification/datasets/audioset201906.tar.gz
- Finetune Dataset Link: https://entuedu-my.sharepoint.com/:u:/g/personal/pang0208_e_ntu_edu_sg/EY1HvvVQ3MFPmlyh0pAc-ecBMJYTRqWvbWwaqKz_t0qQ1g?e=6TMxBr
- Finetune Dataset Other Link: https://entuedu-my.sharepoint.com/:u:/g/personal/pang0208_e_ntu_edu_sg/EUICl8EyH31DtbON_NMv1roBK5-54lp7n5hTA_8KccBU3g?e=xFXxCz
Users can train the model from scratch by executing the commands in runme.sh, which consists of three parts: 1. packing the downloaded wavs into hdf5 files to speed up loading, 2. pretraining, and 3. fine-tuning.
Name | Type | Default | Description |
---|---|---|---|
pretrain_dataset | string | ./datasets/audioset201906 | Pretrain dataset path |
workspace* | string | ./workspaces/audioset_tagging | Workspace path for hdf5 waveforms and indexes, as well as results |
finetune_dataset | string | ./datasets/audioset_project | Finetune dataset path |
finetune_dataset_others | string | ./datasets/audioset_project_others | Finetune dataset path for the "others" class |
checkpoint_path | string | ./checkpoints_downloaded | Path of the pretrained model and the fine-tuned model (for testing) |
checkpoint_file | string | 20_percent_Cnn14.pth | Pretrained model file name |
pretrain | bool | false | false: run finetune; true: run pretrain |
pretrain_pack_data | bool | true | (Pretrain) Whether to create hdf5 files; can be disabled once the hdf5 data has been created |
finetune_pack_data | bool | true | (Finetune) Whether to create hdf5 files; can be disabled once the hdf5 data has been created |
* required options
```bash
# Pretrain
bash runme.sh --pretrain_dataset ./datasets/audioset_pretraining \
    --finetune_dataset ./datasets/audioset_project \
    --finetune_dataset_others ./datasets/audioset_project_others \
    --workspace ./workspaces/audioset_tagging \
    --finetune_pack_data false --pretrain true --pretrain_pack_data false

# Finetune
bash runme.sh --pretrain_dataset ./datasets/audioset_pretraining \
    --finetune_dataset ./datasets/audioset_project \
    --finetune_dataset_others ./datasets/audioset_project_others \
    --workspace ./workspaces/audioset_tagging \
    --finetune_pack_data false --pretrain false --pretrain_pack_data false
```
Model | Pretrain Data | mAP |
---|---|---|
CNN14 | 20% | 87.5 |