NLU 1.0.6 Release Notes
Trainable Multi-Label Classifiers, predict Stack Overflow Tags and achieve state-of-the-art results in 1 line of Python code with NLU 1.0.6
We are glad to announce NLU 1.0.6 has been released!
NLU 1.0.6 comes with the Multi-Label Classifier, which can learn to map strings to multiple labels.
The Multi-Label Classifier uses a Bidirectional GRU and CNNs inside TensorFlow and supports up to 100 classes.
We provide examples on how to train a Multi-Label classifier on the E2E dataset and on Stack Overflow Question Tags.
NLU 1.0.6 New Features
- Multi-Label Classifier
- The Multi-Label Classifier learns a one-to-many mapping between text and labels. This means it can predict multiple labels at the same time for a given input string. This is very helpful for tasks like content tag prediction (hashtags/Reddit tags/YouTube tags/Toxic/E2E etc.)
- Supports up to 100 classes
- Pre-trained Multi-Label Classifiers are already available as the Toxic and E2E classifiers
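The one-to-many mapping described above can be sketched as a multi-hot label encoding, where a single input can activate several labels at once. This is a minimal illustration only; the tag names and helper below are hypothetical and not NLU internals:

```python
# Minimal sketch of a one-to-many label mapping as multi-hot vectors.
# Illustrative only -- this is NOT how NLU encodes labels internally.
ALL_TAGS = ["python", "pandas", "regex"]

def to_multi_hot(tag_string, all_tags=ALL_TAGS, separator=","):
    """Turn a separator-joined tag string into a binary indicator vector."""
    tags = set(tag_string.split(separator))
    return [1 if tag in tags else 0 for tag in all_tags]

# One text row can map to multiple labels at the same time.
print(to_multi_hot("python,regex"))  # [1, 0, 1]
```

Contrast this with a single-label classifier, which would have to pick exactly one of the tags for each input.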
Multi-Label Classifier
- Train Multi-Label Classifier on E2E dataset
- Train Multi-Label Classifier on Stack Overflow Question Tags dataset
This model can predict multiple labels for one sentence.
To train the Multi-Label text classifier model, you must pass a dataframe with a `text` column and a `y` column for the label.
The `y` label must be a string column where each label is separated with a separator.
By default, `,` is assumed as the label separator.
If your dataset uses a different label separator, you must configure the `label_seperator` parameter when calling the `fit()` method.
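For example, a minimal training dataframe with comma-separated labels in the `y` column could look like this (toy data for illustration, assuming pandas is installed):

```python
import pandas as pd

# Toy multi-label training data: the y column holds comma-separated labels.
train_df = pd.DataFrame({
    "text": [
        "How do I merge two dataframes?",
        "Regex to match an email address",
    ],
    "y": [
        "python,pandas",
        "python,regex",
    ],
})

# Each row maps one text to one or more labels.
print(train_df["y"].str.split(",").tolist())
```

A dataframe of this shape is what the `fit()` calls in the following examples expect.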
By default, Universal Sentence Encoder Embeddings (USE) are used as sentence embeddings for training.
```python
fitted_pipe = nlu.load('train.multi_classifier').fit(train_df)
preds = fitted_pipe.predict(train_df)
```
If you add an NLU sentence embeddings reference before the train reference, NLU will use those sentence embeddings instead of the default USE.
```python
# Train on BERT sentence embeddings
fitted_pipe = nlu.load('embed_sentence.bert train.multi_classifier').fit(train_df)
preds = fitted_pipe.predict(train_df)
```
Configure a custom label separator
```python
# Use ; as label separator
fitted_pipe = nlu.load('embed_sentence.electra train.multi_classifier').fit(train_df, label_seperator=';')
preds = fitted_pipe.predict(train_df)
```
NLU 1.0.6 Enhancements
- Improved outputs for the Toxic and E2E classifiers
- By default, all predicted classes and their confidences that are above the threshold are returned inside a list in the Pandas dataframe
- By configuring `meta=True`, the confidences for all classes will be returned
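The thresholding behavior described above can be sketched as a simple filter over per-class confidences. The class names, confidence values, and threshold here are made-up illustrations, not NLU's actual internals:

```python
# Sketch of returning only classes whose confidence exceeds a threshold.
# All names and numbers below are hypothetical examples.
confidences = {"toxic": 0.91, "insult": 0.62, "threat": 0.08}
THRESHOLD = 0.5

# Default-style output: only (class, confidence) pairs above the threshold.
above = [(cls, p) for cls, p in confidences.items() if p >= THRESHOLD]
print(above)

# meta=True-style output: confidences for every class, filtered or not.
print(confidences)
```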
NLU 1.0.6 New Notebooks and Tutorials
- Train Multi-Label Classifier on E2E dataset
- Train Multi-Label Classifier on Stack Overflow Question Tags dataset
NLU 1.0.6 Bug-fixes
- Fixed a bug that caused `en.ner.dl.bert` to be inaccessible
- Fixed a bug that caused `pt.ner.large` to be inaccessible
- Fixed a bug that caused USE embeddings to not be properly configured to document-level output when using multiple embeddings at the same time