NLU 1.0.6 Release Notes
Trainable Multi-Label Classifiers, predict Stack Overflow Tags and achieve state-of-the-art results in 1 line of Python code with NLU 1.0.6
We are glad to announce NLU 1.0.6 has been released!
NLU 1.0.6 comes with the Multi-Label Classifier, which can learn to map strings to multiple labels.
The Multi-Label Classifier uses a Bidirectional GRU and CNNs inside TensorFlow and supports up to 100 classes.
We provide examples on how to train a Multi-Label classifier on the E2E dataset and on Stack Overflow Question Tags.
NLU 1.0.6 New Features
- Multi-Label Classifier
- The Multi-Label Classifier learns a one-to-many mapping between text and labels. This means it can predict multiple labels at the same time for a given input string. This is very helpful for tasks like content tag prediction (hashtags/Reddit tags/YouTube tags/Toxic/E2E etc.)
- Supports up to 100 classes
- Pre-trained Multi-Label Classifiers are already available as the Toxic and E2E classifiers
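The one-to-many mapping described above can be sketched as a multi-hot label encoding, where a single input can activate several labels at once. This is a minimal illustration only; the tag names and helper below are hypothetical and not NLU internals:

```python
# Minimal sketch of a one-to-many label mapping as multi-hot vectors.
# Illustrative only -- this is NOT how NLU encodes labels internally.
ALL_TAGS = ["python", "pandas", "regex"]

def to_multi_hot(tag_string, all_tags=ALL_TAGS, separator=","):
    """Turn a separator-joined tag string into a binary indicator vector."""
    tags = set(tag_string.split(separator))
    return [1 if tag in tags else 0 for tag in all_tags]

# One text row can map to multiple labels at the same time.
print(to_multi_hot("python,regex"))  # [1, 0, 1]
```

Contrast this with a single-label classifier, which would have to pick exactly one of the tags for each input.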
Multi-Label Classifier
- Train Multi-Label Classifier on E2E dataset
- Train Multi-Label Classifier on Stack Overflow Question Tags dataset
This model can predict multiple labels for one sentence.
To train the Multi-Label text classifier model, you must pass a dataframe with a `text` column and a `y` column for the label.
The `y` label must be a string column where each label is separated with a separator.
By default, `,` is assumed as the label separator.
If your dataset uses a different label separator, you must configure the `label_seperator` parameter when calling the `fit()` method.
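For example, a minimal training dataframe with comma-separated labels in the `y` column could look like this (toy data for illustration, assuming pandas is installed):

```python
import pandas as pd

# Toy multi-label training data: the y column holds comma-separated labels.
train_df = pd.DataFrame({
    "text": [
        "How do I merge two dataframes?",
        "Regex to match an email address",
    ],
    "y": [
        "python,pandas",
        "python,regex",
    ],
})

# Each row maps one text to one or more labels.
print(train_df["y"].str.split(",").tolist())
```

A dataframe of this shape is what the `fit()` calls in the following examples expect.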
By default, Universal Sentence Encoder Embeddings (USE) are used as sentence embeddings for training.
```python
fitted_pipe = nlu.load('train.multi_classifier').fit(train_df)
preds = fitted_pipe.predict(train_df)
```
If you add an NLU sentence embeddings reference before the train reference, NLU will use those sentence embeddings instead of the default USE.
```python
# Train on BERT sentence embeddings
fitted_pipe = nlu.load('embed_sentence.bert train.multi_classifier').fit(train_df)
preds = fitted_pipe.predict(train_df)
```
Configure a custom label separator
```python
# Use ; as label separator
fitted_pipe = nlu.load('embed_sentence.electra train.multi_classifier').fit(train_df, label_seperator=';')
preds = fitted_pipe.predict(train_df)
```
NLU 1.0.6 Enhancements
- Improved outputs for the Toxic and E2E classifiers
- By default, all predicted classes and their confidences that are above the threshold are returned inside a list in the Pandas dataframe
- By configuring `meta=True`, the confidences for all classes will be returned
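The thresholding behavior described above can be sketched as a simple filter over per-class confidences. The class names, confidence values, and threshold here are made-up illustrations, not NLU's actual internals:

```python
# Sketch of returning only classes whose confidence exceeds a threshold.
# All names and numbers below are hypothetical examples.
confidences = {"toxic": 0.91, "insult": 0.62, "threat": 0.08}
THRESHOLD = 0.5

# Default-style output: only (class, confidence) pairs above the threshold.
above = [(cls, p) for cls, p in confidences.items() if p >= THRESHOLD]
print(above)

# meta=True-style output: confidences for every class, filtered or not.
print(confidences)
```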
NLU 1.0.6 New Notebooks and Tutorials
- Train Multi-Label Classifier on E2E dataset
- Train Multi-Label Classifier on Stack Overflow Question Tags dataset
NLU 1.0.6 Bug-fixes
- Fixed a bug that caused `en.ner.dl.bert` to be inaccessible
- Fixed a bug that caused `pt.ner.large` to be inaccessible
- Fixed a bug that caused USE embeddings to not be properly configured to document-level output when using multiple embeddings at the same time