Merging the code to wespeaker #3
@wsstriving Sure, we'd be happy if you provide a working way to train it. I'm sharing an archive with configs and training logs for all model sizes. The configs might differ a bit from the ones used in wespeaker, but the main hyperparameters have the same structure. Also, a few models were trained with AAMSoftmax instead of SphereFace2; they should reach better quality if retrained with SphereFace2.
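For orientation, a minimal sketch of how such a training config tends to be structured, with the margin loss as a swappable component. Every field name and value below is an illustrative assumption, not taken from the shared archive:

```python
# Illustrative sketch only: names and values are assumptions,
# not the actual configs from the shared archive.
config = {
    "model": {"name": "ReDimNet-B2"},  # one of the B0-B6 variants
    "projection": {
        # The archive mixes two margin losses; per the comment above,
        # SphereFace2 is expected to outperform AAMSoftmax on retraining.
        "type": "sphereface2",  # or "aamsoftmax"
        "margin": 0.2,          # assumed typical value
        "scale": 32.0,          # assumed typical value
    },
    "optimizer": {"lr": 1e-3, "weight_decay": 1e-4},
}
```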
Hi, I would like to ask whether the configs you provided here are the ones used in the paper, because for some of them I cannot get the same number of parameters as shown in Table 4 (a quick way to compare counts is sketched after this comment). Other questions:
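As referenced above, one quick way to check the parameter counts against Table 4. The `ReDimNet` import is a hypothetical placeholder for whatever factory the shared code actually exposes:

```python
# Hypothetical factory name; substitute the real constructor
# from the shared code / model.py.
from redimnet import ReDimNet

for size in ["b0", "b1", "b2", "b3", "b4", "b5", "b6"]:
    model = ReDimNet(size)  # assumed signature
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{size}: {n_params / 1e6:.2f}M parameters")
```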
Hi, @wsstriving, thank you for your attempt at model retraining! Answering your questions:
I've created a draft pull request for wespeaker (https://github.com/wenet-e2e/wespeaker/pull/346/files) that you can check. Basically, I've adapted your code to align with the wespeaker style and removed the preprocessing part (feature extraction) to use wespeaker's existing implementation; a sketch of the resulting calling convention follows this comment. You can find the default configurations for the B0-B6 models in the model.py file, along with a comparison of model sizes. Unfortunately, I don't have the resources to run all the experiments right now, but I can share the preliminary results I initially had for the B2 model with the arc_margin loss, before LM (no score norm):
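To make "removed the preprocessing part" concrete: in the wespeaker convention, the model consumes precomputed fbank features rather than raw waveforms. A minimal sketch; the mel-bin count is an assumed common default, and the commented-out constructor is a hypothetical placeholder:

```python
import torch
import torchaudio.compliance.kaldi as kaldi

# The model no longer extracts features itself; fbanks are computed
# upstream by wespeaker's existing pipeline.
waveform = torch.randn(1, 16000)  # 1 s of dummy 16 kHz audio
feats = kaldi.fbank(
    waveform,
    num_mel_bins=80,       # assumed common default, not a confirmed value
    frame_length=25,
    frame_shift=10,
    sample_frequency=16000,
)                          # -> (num_frames, num_mel_bins)
feats = feats.unsqueeze(0)  # -> (batch, num_frames, num_mel_bins)

# Hypothetical constructor; the real one lives in the PR's model.py:
# model = ReDimNet("b2")
# embedding = model(feats)  # -> (batch, embed_dim)
```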
Thank you for sharing; there is some mismatch in the features setup:
I found our internal results for the ReDimNet-B2 LM model trained with AAM loss and with global_context_att set to True, after LM (no score norm): There might be some improvement from setting global_context_att to True. To get the best results (matching ours), you should use, for all models:
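The exact settings did not survive in this thread, so purely as an illustration of the kind of knobs that have to match between the two pipelines (every value below is assumed, not the internal configuration):

```python
# Illustrative only: the kinds of settings that must agree between
# training pipelines; values are assumptions, not the internal ones.
feature_args = {
    "num_mel_bins": 72,   # mismatched mel-bin counts are a common
    "frame_shift": 10,    # source of the "features setup" gap above
    "frame_length": 25,
}
model_args = {
    "global_context_att": True,  # recommended above for the pooling layer
}
```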
Hi, @vanIvan, we have merged the initial version into wespeaker (wenet-e2e/wespeaker#346), but there is still some performance gap; it would be great if you could try the current implementation and give some suggestions! BTW, if you will be at Interspeech, I'm looking forward to talking with you face to face.
Hi, @wsstriving, thank you for the integration, we'll try to look at it soon. Yes, a few colleagues from our team and I are going to attend Interspeech and present ReDimNet there; it would be nice to meet, let's keep in touch!
Hello, @wsstriving! I have realized that the wespeaker pipeline has no separate weight decay for the projection head apart from the backbone network; currently a single weight_decay is applied to the whole network. So I've added this separate weight_decay for the projection head in a forked wespeaker pipeline (the parameter-group setup is sketched below); could you please check it and, if you have time, perhaps retrain a model to see whether it improves results (especially for the SF2 loss)? I also made the model more lightweight during training by increasing the hop length of the mel filterbanks in its config: it should now train faster, and one can fit a bigger batch on the same GPU setup.
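Separate weight decay for the head boils down to standard PyTorch parameter groups. A minimal sketch, assuming the head's parameters can be identified by a name prefix; the prefix, optimizer choice, and decay values are all assumptions:

```python
import torch

def build_optimizer(model: torch.nn.Module,
                    lr: float = 1e-3,
                    backbone_wd: float = 1e-4,
                    proj_wd: float = 1e-1) -> torch.optim.Optimizer:
    """Give the projection head its own weight decay via param groups."""
    proj_params, backbone_params = [], []
    for name, p in model.named_parameters():
        # "projection." is an assumed name prefix for the head.
        if name.startswith("projection."):
            proj_params.append(p)
        else:
            backbone_params.append(p)
    return torch.optim.SGD(
        [
            {"params": backbone_params, "weight_decay": backbone_wd},
            {"params": proj_params, "weight_decay": proj_wd},
        ],
        lr=lr,
        momentum=0.9,
    )
```

The hop-length change is the other half of the speed-up: a larger mel hop proportionally shortens the time axis of the features, so each batch carries fewer frames through the network.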
Hi, I would like to ask whether there will be a ReDimNet model with multilingual support, for instance one trained for speaker verification on mixed Chinese and English data?
Yes, new models will be pretrained on voxblink2 and finetuned on voxblink2 + vox2 + cnceleb. They will perform much better on Chinese.
Happy to share good news: we have released the first models pretrained on voxblink2.
Thank you for the excellent work! I would like to ask whether you would mind us adapting this code into the official WeSpeaker models. We will definitely include the original paper link, authorship, etc. I just want to check whether you are okay with WeSpeaker's open-source license.
Best regards,
Shuai