
Keyword Spotting Alibaba

This project is built on ESPnet2.

Installation

  • If you intend to run full experiments, including DNN training, see Installation. In our project we compiled our own Kaldi, even though ESPnet2 does not require it; use this method to install ESPnet2 for our project (a rough end-to-end sketch is given after this list).

  • If you only need the Python module:

    pip install espnet
    # To install the latest development version
    # pip install git+https://github.com/espnet/espnet

    You also need to install PyTorch and, depending on the tasks you plan to run, some optional packages:

    pip install torch
    pip install chainer==6.0.0 cupy==6.0.0    # [Option] If you'll use ESPnet1
    pip install torchaudio                    # [Option] If you'll use enhancement task
    pip install torch_optimizer               # [Option] If you'll use additional optimizers in ESPnet2

    Some tasks require additional packages beyond those listed above. If you hit an ImportError, install the missing package at that point.

  • Once installed, run wandb login to enable tracking of runs with Weights & Biases (W&B).
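
The first bullet above points to the full installation. As a rough sketch of that route with a self-compiled Kaldi, assuming the standard ESPnet tools layout (the Kaldi path, environment name, and Python version below are placeholders to adapt):

    # Clone ESPnet and move into its tools directory
    git clone https://github.com/espnet/espnet
    cd espnet/tools

    # Optional for ESPnet2: reuse an existing Kaldi compilation via a symlink
    ln -s /path/to/your/kaldi kaldi

    # Create the Python environment and install ESPnet with its dependencies
    ./setup_anaconda.sh anaconda espnet 3.8
    make

    # Enable experiment tracking with Weights & Biases
    pip install wandb
    wandb login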

Dataset

Wukong dataset

  • See Dataset.
  • Extract all the tar files in the shared folder (an extraction snippet follows this list).
  • README.pdf contains details about the dataset.
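
A minimal extraction sketch, assuming the shared folder contains plain .tar archives (adjust the path and the tar flags, e.g. -xzf for .tar.gz files):

    # Extract every tar archive in the shared dataset folder (path is a placeholder)
    cd /path/to/shared_folder
    for f in *.tar; do
        tar -xf "$f"
    done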

Usage

ESPnet2 Tutorial (a general reference to the toolkit; not required for our project)

See Usage.

Keyword Spotting Transformer

  1. Change to the directory below.
     cd egself/asc029-kws/kws/
  2. Read README.md there for further usage details (a sketch of a typical run follows this list).
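
If the recipe follows the standard ESPnet2 layout, a typical run looks like the sketch below; run.sh, the stage numbers, and the flags are assumptions based on ESPnet2 conventions, so check the local README.md and scripts for the actual entry point and stages.

    cd egself/asc029-kws/kws/
    # Stage-by-stage execution in the usual ESPnet2 style (stage split is an assumption)
    ./run.sh --stage 1 --stop_stage 5      # data preparation, features, and stats
    ./run.sh --stage 6 --stop_stage 13     # model training, inference, and scoring

The enh/ and enh-kws/ recipes below can usually be driven the same way.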

Speech Enhancement Bi-LSTM

  1. Change to the directory below.
     cd egself/asc029-kws/enh/
  2. Read README.md there for further usage details.

Joint-training Speech Enhancement & Keyword Spotting

  1. Change to the directory below.
     cd egself/asc029-kws/enh-kws/
  2. Read README.md there for further usage details.

References

[1] Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai, "ESPnet: End-to-End Speech Processing Toolkit," Proc. Interspeech'18, pp. 2207-2211 (2018)

[2] Suyoun Kim, Takaaki Hori, and Shinji Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," Proc. ICASSP'17, pp. 4835-4839 (2017)

[3] Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey and Tomoki Hayashi, "Hybrid CTC/Attention Architecture for End-to-End Speech Recognition," IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1240-1253, Dec. 2017

Citations

@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
@inproceedings{hayashi2020espnet,
  title={{Espnet-TTS}: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit},
  author={Hayashi, Tomoki and Yamamoto, Ryuichi and Inoue, Katsuki and Yoshimura, Takenori and Watanabe, Shinji and Toda, Tomoki and Takeda, Kazuya and Zhang, Yu and Tan, Xu},
  booktitle={Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7654--7658},
  year={2020},
  organization={IEEE}
}
@inproceedings{inaguma-etal-2020-espnet,
    title = "{ESP}net-{ST}: All-in-One Speech Translation Toolkit",
    author = "Inaguma, Hirofumi  and
      Kiyono, Shun  and
      Duh, Kevin  and
      Karita, Shigeki  and
      Yalta, Nelson  and
      Hayashi, Tomoki  and
      Watanabe, Shinji",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.34",
    pages = "302--311",
}
@inproceedings{li2020espnet,
  title={{ESPnet-SE}: End-to-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
  author={Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph Boeddeker and Zhuo Chen and Shinji Watanabe},
  booktitle={Proceedings of IEEE Spoken Language Technology Workshop (SLT)},
  pages={785--792},
  year={2021},
  organization={IEEE},
}