Examples

All examples are under the directory egs, and each is named after its dataset. Datasets whose names start with "mock" are used for testing.
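
The naming convention above can be used to separate test datasets from real ones. The following is a minimal sketch, not part of the repository: the helper name `split_examples` is hypothetical, and it assumes each dataset is a direct subdirectory of egs and that the "mock" prefix is lowercase.

```python
from pathlib import Path


def split_examples(egs_dir):
    """Return (real, mock) lists of dataset directory names under egs_dir.

    Hypothetical helper: assumes every dataset is a direct subdirectory
    of egs_dir and that test datasets start with the prefix "mock".
    """
    # Only directories count as datasets; plain files are ignored.
    names = sorted(p.name for p in Path(egs_dir).iterdir() if p.is_dir())
    mock = [n for n in names if n.startswith("mock")]
    real = [n for n in names if not n.startswith("mock")]
    return real, mock
```

Calling `split_examples("egs")` from the repository root would return the real dataset names first and the test datasets second.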

Examples for NLP

| Dataset | Supported Tasks | Description |
| --- | --- | --- |
| ATIS | Sequence labeling / Text classification / NLU joint learning | Air Travel Information System (ATIS) pilot corpus. |
| CoNLL2003 | Sequence labeling | The CoNLL 2003 NER task consists of newswire text from the Reuters RCV1 corpus tagged with four different entity types (PER, LOC, ORG, MISC). |
| MSRA_NER | Sequence labeling | The MSRA dataset is an NER dataset in the news domain. |
| SNLI | Sentence Matching | The Stanford Natural Language Inference corpus is a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. |
| Quora_QP | Sentence Matching | Data collected from the Quora platform. Quora is a place to gain and share knowledge about anything. |
| Yahoo_Answer | Document Classification | The Yahoo Answers data are obtained from Zhang et al. (2015). This is a topic classification task with 10 classes. The documents we use include question titles, question contexts, and best answers. |
| Trec | Document Classification | This data collection contains all the data used in the learning question classification experiments, including the question class definitions. |

Examples for Speech

| Dataset | Supported Tasks | Description |
| --- | --- | --- |
| hkust | ASR | HKUST Mandarin Telephone Speech. |
| voxceleb | Speaker Verification | VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. |
| iemocap | Emotion | The Interactive Emotional Dyadic Motion Capture (IEMOCAP) database is an acted, multimodal, and multispeaker database, recently collected at the SAIL lab at USC. |