Conversational Robot

Robotics Club Summer Project 2020

Mentors

Team Members

Group - Ambuja Budakoti, Devansh Mishra, Hem Shah, Kavya Agarwal, Preeti Kumari
- Group Repo - [https://github.com/AmbujaBudakoti27/ConversationalRobot]

Overall Pipeline of the Project

The main aim of this project was to build a conversational bot that takes audio input and outputs a meaningful reply, taking into account the context and intent of the user's input.

The three main parts of this project were:

Speech to text
Topic attention (to generate a response)
Text to speech
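
The three stages can be read as a simple function composition. The sketch below is illustrative only: `run_turn` and the three callables passed to it are hypothetical names, not the ones used in bot.py.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    generate_response: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: audio in, audio out."""
    user_text = speech_to_text(audio)          # stage 1: speech to text
    reply_text = generate_response(user_text)  # stage 2: response generation
    return text_to_speech(reply_text)          # stage 3: text to speech
```

Passing the stages in as callables keeps each model replaceable on its own, which is one possible way to wire the pieces together.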

CTC_MODEL

This model is implemented to convert the audio messages of the user into text.

Dataset used for training: LibriSpeech
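
As a rough illustration of how a CTC model's per-frame outputs become text, here is a minimal greedy decoder. It is a sketch under stated assumptions, not the project's decoder: the toy alphabet, the choice of index 0 as the blank symbol, and the function name `ctc_greedy_decode` are all hypothetical.

```python
BLANK = 0  # assumption: index 0 is the CTC blank symbol
ALPHABET = ["_", " ", "a", "b", "c"]  # toy alphabet; "_" stands for the blank


def ctc_greedy_decode(frame_scores):
    """Greedy CTC decoding: best symbol per frame, merge repeats, drop blanks."""
    decoded = []
    previous = BLANK
    for scores in frame_scores:
        # Pick the highest-scoring symbol index for this frame.
        index = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the symbol changes and is not the blank.
        if index != previous and index != BLANK:
            decoded.append(ALPHABET[index])
        previous = index
    return "".join(decoded)
```

A blank between two identical symbols is what lets the decoder produce a doubled letter; without it, repeated frames collapse into one character.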

ENCODER - DECODER MODEL

This model is implemented to cover the response generation part of the conversational bot. We trained this model on the OpenSubtitles dataset.

LDA MODEL

This model is implemented to add topic awareness to the ENCODER - DECODER model for better response generation, by focusing its "attention" on specific parts of the input rather than on the whole sentence.
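
To make the idea of "attention" concrete, the sketch below computes plain dot-product attention weights over a few input vectors. This is a generic illustration, not the joint message and topic attention used in the project or in the referenced paper; the names `softmax` and `attend` and the toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Return (weights, context) for dot-product attention of query over keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    # The context vector is the attention-weighted sum of the key vectors.
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context
```

Inputs that align more closely with the query receive larger weights, so they contribute more to the context vector passed on to the decoder.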

Optimal Number of Topics

This graph shows the optimal number of topics to set for the news articles dataset.

Gensim LDA Model parameters

corpus - Stream of document vectors or sparse matrix of shape (num_terms, num_documents).
id2word - Mapping from word IDs to words. It is used to determine the vocabulary size, as well as for debugging and topic printing.
num_topics - The number of requested latent topics to be extracted from the training corpus.
random_state - Either a RandomState object or a seed to generate one. Useful for reproducibility.
update_every - Number of documents to be iterated through for each update. Set to 0 for batch learning, > 1 for online iterative learning.
chunksize - Number of documents to be used in each training chunk.
passes - Number of passes through the corpus during training.
alpha - When set to 'auto', learns an asymmetric prior from the corpus.
per_word_topics - If True, the model also computes a list of topics, sorted in descending order of most likely topics for each word, along with their phi values multiplied by the feature-length (i.e. word count).
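
The parameters above map directly onto keyword arguments of Gensim's `LdaModel`. The following is a minimal sketch, not the project's training script: the numeric values in `LDA_PARAMS` are placeholders chosen for illustration, and `train_lda` is a hypothetical helper name.

```python
# Placeholder values for illustration; the project's actual settings are not stated here.
LDA_PARAMS = {
    "num_topics": 20,
    "random_state": 100,
    "update_every": 1,
    "chunksize": 100,
    "passes": 10,
    "alpha": "auto",
    "per_word_topics": True,
}


def train_lda(tokenized_docs, params=LDA_PARAMS):
    """Train an LDA model on a list of token lists (requires gensim)."""
    # Imported lazily so the parameter dict can be inspected without gensim installed.
    from gensim import corpora
    from gensim.models import LdaModel

    id2word = corpora.Dictionary(tokenized_docs)  # word ID <-> word mapping
    corpus = [id2word.doc2bow(doc) for doc in tokenized_docs]  # bag-of-words vectors
    return LdaModel(corpus=corpus, id2word=id2word, **params)
```

Calling `train_lda([["stock", "market"], ["football", "match"]])` would return a trained model whose topics can be listed with `print_topics()`.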

Text to Audio

gTTS, a Python library, was used to write a function that outputs audio from the generated responses.
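
A function of this kind might look like the sketch below. The name `save_reply_audio`, the default output path, and the English language setting are assumptions for illustration; gTTS also needs network access to synthesize speech.

```python
def save_reply_audio(reply_text: str, out_path: str = "reply.mp3") -> str:
    """Synthesize reply_text to an MP3 file with gTTS and return the file path."""
    if not reply_text.strip():
        raise ValueError("reply_text must not be empty")
    # Imported lazily: requires the gTTS package and an internet connection.
    from gtts import gTTS

    gTTS(text=reply_text, lang="en").save(out_path)
    return out_path
```

The saved file can then be played back with whichever audio backend the bot uses.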

Usage

Install the required dependencies:

$ pip install -r requirements.txt
$ sudo apt-get install gstreamer-1.0
$ python3 -m spacy download en

Training checkpoints, LDA model weights and tokens can be found here

Required File Structure:

Response Generation
├── bin
│   ├── LDA
│   ├── Tokens.txt
│   ├── topic_dict.dict
│   ├── training_checkpoints
│   └── glove.42B.300d.txt
└── ...
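
Before starting the bot, it can help to confirm that the downloaded files are in place. The check below is a small sketch based on the tree above; `missing_artifacts` is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path

# Entries expected under "Response Generation/bin", taken from the tree above.
REQUIRED_ARTIFACTS = [
    "LDA",
    "Tokens.txt",
    "topic_dict.dict",
    "training_checkpoints",
    "glove.42B.300d.txt",
]


def missing_artifacts(root="Response Generation"):
    """Return the names of required entries that are absent from <root>/bin."""
    bin_dir = Path(root) / "bin"
    return [name for name in REQUIRED_ARTIFACTS if not (bin_dir / name).exists()]
```

An empty list means every expected entry was found; anything else names what still needs to be downloaded.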

Running the bot

usage: bot.py [-h] [-m {msg,trigger}]

The bot.

optional arguments:
  -h, --help            show this help message and exit
  -m {msg,trigger}, --mode {msg,trigger}
                        Mode of execution : Message box/ Trigger word
                        detection
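
For reference, a parser that accepts the same flag can be written with `argparse` as below. This is a reconstruction from the help text, not the code in bot.py; `build_parser` is a hypothetical name, and no default mode is set because the help output does not state one.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser accepting -m/--mode with the two documented choices."""
    parser = argparse.ArgumentParser(prog="bot.py", description="The bot.")
    parser.add_argument(
        "-m",
        "--mode",
        choices=["msg", "trigger"],
        help="Mode of execution : Message box/ Trigger word detection",
    )
    return parser
```

For example, `build_parser().parse_args(["-m", "trigger"]).mode` evaluates to `"trigger"`, while any value outside the two choices is rejected by argparse.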

Modes

Message Box - Provides a GUI for the user to start the conversation at the click of a button.
Trigger Word Detection - The program listens in the background and starts the conversation upon hearing the trigger word.
- Commencement Trigger - Hello
- Concluding Trigger - Bye
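
The start and stop behaviour of the trigger words can be sketched as a small state machine. The example works on already-transcribed text and is only an illustration: the project's detector may operate on audio directly, and `active_utterances` is a hypothetical name.

```python
START_TRIGGER = "hello"
END_TRIGGER = "bye"


def active_utterances(transcripts):
    """Return the utterances spoken between a start trigger and an end trigger."""
    active = False
    kept = []
    for text in transcripts:
        words = text.lower().split()
        if not active:
            # Idle: ignore everything until the commencement trigger is heard.
            if START_TRIGGER in words:
                active = True
            continue
        if END_TRIGGER in words:
            # The concluding trigger ends the conversation.
            active = False
            continue
        kept.append(text)
    return kept
```

In this sketch the trigger utterances themselves are not passed on to the response model, and matching is on whole whitespace-separated words, so punctuation attached to a trigger word would need to be stripped first.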

Functionality

Casual Conversations
Google search along with an explicit search feature for images
Weather Information

Demonstration

The video demonstration of this project can be found here.

References

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Link : [https://arxiv.org/abs/1512.02595]
- Author(s)/Organization : Baidu Research – Silicon Valley AI Lab
- Tags : Speech Recognition
- Published : 8 Dec, 2015
Topic Aware Neural Response Generation
- Link : [https://arxiv.org/abs/1606.08340]
- Authors : Chen Xing, Wei Wu, Yu Wu, Jie Liu, Yalou Huang, Ming Zhou, Wei-Ying Ma
- Tags : Neural response generation; Sequence to sequence model; Topic aware conversation model; Joint attention; Biased response generation
- Published : 21 Jun 2016 (v1), 19 Sep 2016 (v2)
Topic Modelling and Event Identification from Twitter Textual Data
- Link : [https://arxiv.org/abs/1608.02519]
- Authors : Marina Sokolova, Kanyi Huang, Stan Matwin, Joshua Ramisch, Vera Sazonova, Renee Black, Chris Orwa, Sidney Ochieng, Nanjira Sambuli
- Tags : Latent Dirichlet Allocation; Topic Models; Statistical machine translation
- Published : 8 Aug 2016
OpenSubtitles (Dataset)
- Link : [http://opus.nlpl.eu/OpenSubtitles-v2018.php]
