OpenMM is an open-source tool for multimodal feature extraction. In other words, this tool allows you to easily extract visual, audio, and linguistic features all at once. It builds upon existing GitHub repos for visual feature extraction (OpenFace) and audio feature extraction (Covarep), which I integrate with my own code for linguistic feature extraction (LingAnalysis). OpenMM provides a simple way for researchers to extract multimodal features: it requires only a video as input and outputs a CSV of multimodal features; audio conversion and speech-to-text are handled internally. My hope is that this will help promote more interest and research in building multimodal systems.
To install, please see the installation instructions below.
If you use any of this code in your work, please cite:
OpenMM: An Open-Source Multimodal Feature Extraction Tool (Michelle Renee Morales, Stefan Scherer, Rivka Levitan), In Proceedings of Interspeech 2017, ISCA, 2017.
@inproceedings{morales_openmm:_2017,
address = {Stockholm, Sweden},
title = {{OpenMM}: {An} {Open}-{Source} {Multimodal} {Feature} {Extraction} {Tool}},
url = {https://www.researchgate.net/publication/319185055_OpenMM_An_Open-Source_Multimodal_Feature_Extraction_Tool},
doi = {10.21437/Interspeech.2017-1382},
booktitle = {Proceedings of {Interspeech} 2017},
publisher = {ISCA},
author = {Morales, Michelle Renee and Scherer, Stefan and Levitan, Rivka},
year = {2017}}
This repo contains code from my dissertation work. I did my best to ensure that the code runs out of the box, but please understand that this is research code, not a commercial-level product. If you encounter any problems, bugs, or issues, or have questions or suggestions, please open an issue on GitHub or email me at [email protected].
These instructions were tested on a Mac running macOS Sierra Version 10.12.4.
In order to run OpenMM, you need Python 2.7 (Python 3 support is not available yet) and the following Python modules:
- numpy and scipy: packages for scientific computing
- pandas: package for data structures and data analysis
- speech_recognition: package for interacting with speech recognition APIs
You can install them all at once using pip:
pip install numpy scipy pandas SpeechRecognition
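The speech_recognition module is what OpenMM relies on for speech-to-text. For reference, here is a minimal sketch of the kind of call it makes; the file name and the key placeholder are illustrative, not OpenMM's exact internal code:

```python
import speech_recognition as sr

# Load the audio that OpenMM extracts from the video (illustrative file name).
recognizer = sr.Recognizer()
with sr.AudioFile("video.wav") as source:
    audio = recognizer.record(source)  # read the entire audio file

# Transcribe with the Google Speech Recognition API;
# the key corresponds to the entry you set in config.txt.
try:
    transcript = recognizer.recognize_google(audio, key="GOOGLE_KEY")
    print(transcript)
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as e:
    print("API request failed: {0}".format(e))
```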
In order to use Covarep to extract acoustic features, you also need the 2017 MATLAB Runtime for your machine, which is free to download from the MathWorks site.
In order to convert video to audio, you need ffmpeg installed:
brew install ffmpeg
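OpenMM performs this conversion internally, so you never have to run ffmpeg yourself. For reference, a sketch of the equivalent step from Python; the file names and exact flags here are illustrative assumptions, not OpenMM's internal call:

```python
import subprocess

# Convert video.mp4 to a mono 16 kHz WAV file with ffmpeg
# (illustrative flags; OpenMM's internal call may differ).
subprocess.check_call([
    "ffmpeg", "-i", "video.mp4",  # input video
    "-vn",                        # drop the video stream
    "-ac", "1",                   # mono audio
    "-ar", "16000",               # 16 kHz sample rate
    "video.wav",
])
```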
To perform visual feature extraction, OpenMM requires OpenFace. Follow the OpenFace installation instructions and note the path to its FeatureExtraction binary, which you will enter in config.txt below.
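As a sketch, the equivalent standalone OpenFace call looks like the following; the `-f` input flag follows OpenFace's documented interface, while the output flag varies across OpenFace versions, so treat both as assumptions and check your installed version:

```python
import subprocess

# Run OpenFace's FeatureExtraction binary on the input video
# (flags per OpenFace 1.x documentation; newer versions may differ).
subprocess.check_call([
    "/path_to_openface/OpenFace/bin/FeatureExtraction",
    "-f", "video.mp4",            # input video
    "-of", "video_openface.csv",  # output feature file
])
```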
To extract syntactic features, which are part of the linguistic analysis, the SyntaxNet parser is required. I suggest following these instructions to download it. If you plan on working with German or Spanish data, make sure to also add those models to your Docker container; here are the instructions for downloading the other language models.
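To sanity-check the SyntaxNet install, you can run its bundled demo parser from the models/syntaxnet directory, mirroring SyntaxNet's own README; the path and test sentence below are illustrative, and how OpenMM invokes the parser internally may differ:

```python
import subprocess

# Parse a test sentence with SyntaxNet's demo script, run from the
# models/syntaxnet directory (mirrors the SyntaxNet README).
subprocess.check_call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True,
    cwd="/path_to_syntaxnet/models/syntaxnet",
)
```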
After all the prerequisites and dependencies are installed successfully, you can install OpenMM by cloning this repo:
git clone https://github.com/michellemorales/OpenMM.git
First, fill out the config.txt file with all the necessary information, which includes:
deployedMCRroot = path_to_matlab_runtime
openface = /path_to_openface/OpenFace/bin/FeatureExtraction
syntaxnet = /path_to_syntaxnet/models/syntaxnet
GOOGLE_SPEECH_RECOGNITION_API_KEY = "GOOGLE_KEY"
IBM_USERNAME = "IBM_USERNAME"
IBM_PASSWORD = "PASSWORD"
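OpenMM reads these settings at runtime. If you want to double-check your file programmatically, a simple key=value parse like the following works; this is an illustration, not OpenMM's actual loading code:

```python
# Illustrative parser for the key = value lines in config.txt
# (not OpenMM's actual loading code).
config = {}
with open("config.txt") as f:
    for line in f:
        if "=" in line:
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip().strip('"')

print(config["openface"])
```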
OpenMM takes a video as input (.mp4 only). To run OpenMM, use the following command:
python OpenMM/scripts/FeatureExtract.py video.mp4
OpenMM will output the following files:
- video.wav: the audio track extracted from the video
- video_transcript.txt: the speech-to-text transcript
- video_openface.csv: visual features from OpenFace
- video_covarep.csv: acoustic features from Covarep
- video_ling.csv: linguistic features from LingAnalysis
- video_multimodal.csv: the fused multimodal feature set
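Since pandas is already installed as a dependency, you can inspect the fused features directly once extraction finishes. A minimal sketch, with the file name taken from the output list above:

```python
import pandas as pd

# Load the fused multimodal feature file produced by FeatureExtract.py.
features = pd.read_csv("video_multimodal.csv")
print(features.shape)   # number of rows and feature columns
print(features.head())  # first few feature rows
```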