Speech-Image_Tool

This repository collects programs for preprocessing audio and images.

Image

The Image directory includes three functions for data augmentation:

  1. Image rotation
  2. Random color
  3. Random Gaussian
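
The three augmentations above can be sketched in a few lines. This is only an illustration, not the repository's code: the function names (`rotate_90`, `random_color`, `random_gaussian`), the nested-list image model, and the default parameters are all assumptions, and "Random Gaussian" is interpreted here as additive Gaussian noise (it could equally mean Gaussian blur). Real code would more likely work on Pillow images or NumPy arrays.

```python
import random

# Assumption: an image is a list of rows, each pixel an (R, G, B) tuple of 0-255 ints.

def clamp(value):
    """Round to an int and keep it inside the valid 0-255 channel range."""
    return max(0, min(255, int(round(value))))

def rotate_90(image):
    """Rotate the image 90 degrees clockwise (a fixed-angle stand-in for rotation)."""
    return [list(row) for row in zip(*image[::-1])]

def random_color(image, rng, low=0.5, high=1.5):
    """Scale each colour channel by its own random factor drawn from [low, high]."""
    factors = [rng.uniform(low, high) for _ in range(3)]
    return [[tuple(clamp(c * f) for c, f in zip(px, factors)) for px in row]
            for row in image]

def random_gaussian(image, rng, sigma=10.0):
    """Add zero-mean Gaussian noise with standard deviation sigma to every channel."""
    return [[tuple(clamp(c + rng.gauss(0, sigma)) for c in px) for px in row]
            for row in image]

if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the augmentation is reproducible
    img = [[(255, 0, 0), (0, 255, 0)],
           [(0, 0, 255), (10, 10, 10)]]
    augmented = random_gaussian(random_color(rotate_90(img), rng), rng)
    print(augmented)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when debugging a training pipeline.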

Speech

The Speech directory includes four Python scripts:

  1. character_to_pinyin.py: converts Chinese characters to pinyin.
  2. trim_silence.py: trims silence from the beginning and end of an audio file.
  3. mp3_translate_wav.py: converts .mp3 files to .wav.
  4. generate_audio.py: generates audio using Baidu's API.
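
As an illustration of what silence trimming involves, here is a minimal sketch. It is not the code in trim_silence.py: the function name, the amplitude threshold, and the assumption that samples are floats in [-1.0, 1.0] are all hypothetical. Practical tools often threshold on frame energy in decibels rather than on individual samples.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose absolute amplitude is below threshold.

    Assumption: `samples` is a sequence of floats in [-1.0, 1.0].
    Quiet samples in the middle of the signal are kept.
    """
    start = 0
    end = len(samples)
    # Advance past the quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Step back over the quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

if __name__ == "__main__":
    audio = [0.0, 0.001, 0.5, -0.3, 0.0, 0.2, 0.005, 0.0]
    print(trim_silence(audio))
```

Only the edges are trimmed, so pauses inside the utterance survive, which is usually what speech preprocessing needs.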