Accent recognition, for great justice.
- Parses a directory containing {.mov, .wav} source files.
- Builds a config file of the form {language: count}, which get_dataset.py consumes (see the sketch below).
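A minimal sketch of this step, assuming the language is encoded as an alphabetic filename prefix (e.g. english12.mov) and that the config is written as JSON; neither detail is confirmed by these notes:

```python
# Hypothetical sketch of build_config.py: count source files per language.
import json
import re
from collections import Counter
from pathlib import Path

def build_config(source_dir, config_path):
    """Scan source_dir for {.mov, .wav} files and write {language: count} as JSON."""
    counts = Counter()
    for path in Path(source_dir).iterdir():
        if path.suffix not in (".mov", ".wav"):
            continue
        # Assumes filenames like "english12.mov": language prefix, then an index.
        match = re.match(r"[a-z]+", path.stem.lower())
        if match:
            counts[match.group(0)] += 1
    with open(config_path, "w") as f:
        json.dump(dict(counts), f, indent=2, sort_keys=True)

if __name__ == "__main__":
    build_config("source", "config.json")
```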
- Parses the config file generated by build_config.py.
- Downloads the source files (via FTP) and converts them to .wav (via ffmpeg).
- Uses multiprocessing to parallelize the downloads and conversions.
- Puts every resulting .wav into a single directory (/data); a sketch follows below.
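A minimal sketch of this step. The FTP host is a placeholder, and it assumes the config's counts expand to filenames like english1.mov ... englishN.mov; the ffmpeg flags (mono, 16 kHz) are one common choice for speech work, not necessarily the project's:

```python
# Hypothetical sketch of get_dataset.py: fetch .mov files over FTP, convert each
# to .wav with ffmpeg, and collect everything in one data directory.
import json
import subprocess
import urllib.request
from multiprocessing import Pool
from pathlib import Path

FTP_BASE = "ftp://example.org/archive"  # placeholder; the real host isn't given here
DATA_DIR = Path("data")                 # stands in for the project's /data

def fetch_and_convert(name):
    mov = Path("/tmp") / f"{name}.mov"
    wav = DATA_DIR / f"{name}.wav"
    urllib.request.urlretrieve(f"{FTP_BASE}/{name}.mov", mov)
    # -y overwrite, -ac 1 mono, -ar 16000 resample to 16 kHz
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(mov), "-ac", "1", "-ar", "16000", str(wav)],
        check=True,
    )
    mov.unlink()  # drop the intermediate .mov once converted

if __name__ == "__main__":
    with open("config.json") as f:
        config = json.load(f)
    # {"english": 3} -> english1, english2, english3
    names = [f"{lang}{i}" for lang, count in config.items() for i in range(1, count + 1)]
    DATA_DIR.mkdir(exist_ok=True)
    with Pool(8) as pool:  # parallelize downloads + conversions
        pool.map(fetch_and_convert, names)
```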
- Parses the .wav files in /data (produced by get_dataset.py).
- Extracts features (MFCCs, et al.) from each recording.
- Writes the features as serialized NumPy arrays to /processed (sketch below).
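A minimal sketch of the extraction step, assuming librosa for the MFCCs (these notes don't name a feature library):

```python
# Hypothetical feature-extraction sketch: one serialized MFCC matrix per .wav.
from pathlib import Path
import librosa
import numpy as np

def extract_features(data_dir="data", out_dir="processed", n_mfcc=13):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for wav in sorted(Path(data_dir).glob("*.wav")):
        y, sr = librosa.load(wav, sr=None)  # keep the file's native sample rate
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        np.save(out / f"{wav.stem}.npy", mfcc)

if __name__ == "__main__":
    extract_features()
```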
Prototyping environment for experimenting with spectrograms and signal vectors.
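For prototyping, a log-magnitude spectrogram can be computed directly with SciPy; the file name below is hypothetical:

```python
# Hypothetical prototyping snippet: log-magnitude spectrogram of one recording.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

sr, signal = wavfile.read("data/english1.wav")
freqs, times, sxx = spectrogram(signal, fs=sr, nperseg=512, noverlap=256)
log_sxx = 10 * np.log10(sxx + 1e-10)  # dB scale; epsilon avoids log(0)
print(log_sxx.shape)                  # (frequency bins, time frames)
```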
- Per-language count of source files (complete).
- /data: .wav-encoded audio.
- speech_archive_meta.tsv: complementary dataset with additional information about the speakers involved in each recording.
- Extract features and store them in a database (SQLite).
- Parse speech_archive_meta.tsv and load it into the database (see the sketch after this list).
- Do ML, hope for the best (a baseline sketch follows as well).
- Extract different features and return to step 1.
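A minimal sketch of the metadata-loading step; the TSV column names are assumptions, since the real schema isn't shown in these notes:

```python
# Hypothetical sketch: load speech_archive_meta.tsv into SQLite.
import csv
import sqlite3

conn = sqlite3.connect("accents.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS speakers (recording TEXT PRIMARY KEY, "
    "native_language TEXT, age INTEGER, sex TEXT)"  # assumed columns
)
with open("speech_archive_meta.tsv", newline="") as f:
    rows = [
        (r["recording"], r["native_language"], r["age"], r["sex"])
        for r in csv.DictReader(f, delimiter="\t")
    ]
conn.executemany("INSERT OR REPLACE INTO speakers VALUES (?, ?, ?, ?)", rows)
conn.commit()
conn.close()
```

And one hedged baseline for the ML step, assuming scikit-learn: mean-pool each MFCC matrix into a fixed-length vector and fit a simple classifier, taking the label from the filename prefix:

```python
# Hypothetical baseline: mean-pooled MFCC vectors + logistic regression.
import re
from pathlib import Path
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = [], []
for npy in sorted(Path("processed").glob("*.npy")):
    mfcc = np.load(npy)          # (n_mfcc, frames), frames vary per file
    X.append(mfcc.mean(axis=1))  # mean over time -> fixed-length vector
    y.append(re.match(r"[a-z]+", npy.stem).group(0))  # "english1" -> "english"

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```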