
Trainable Multi-Label Classifiers: predict Stack Overflow tags and achieve state-of-the-art results in 1 line of code with NLU 1.0.6

@C-K-Loan C-K-Loan released this 02 Jan 15:59
73cc744

NLU 1.0.6 Release Notes

Trainable Multi-Label Classifiers: predict Stack Overflow tags and much more in 1 line of Python code with NLU 1.0.6

We are glad to announce that NLU 1.0.6 has been released!
NLU 1.0.6 comes with the Multi-Label Classifier, which learns to map strings to multiple labels.
The Multi-Label Classifier uses a bidirectional GRU and CNNs in TensorFlow and supports up to 100 classes.
We provide examples of how to train a Multi-Label Classifier on the E2E dataset and on Stack Overflow question tags.

NLU 1.0.6 New Features

  • Multi-Label Classifier
    • The Multi-Label Classifier learns a one-to-many mapping between text and labels, so it can predict multiple labels at the same time for a given input string. This is very helpful for tasks similar to content tag prediction (hashtags, Reddit tags, YouTube tags, Toxic, E2E, etc.)
    • Supports up to 100 classes
    • Pre-trained Multi-Label Classifiers are already available as the Toxic and E2E classifiers
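
To make the one-to-many mapping concrete, the following minimal sketch shows how the set of labels attached to one text can be represented as a multi-hot target vector with one slot per class. It is a plain Python illustration, not NLU's internal implementation; the helper name `multi_hot` and the example tags are hypothetical.

```python
from typing import Iterable, List


def multi_hot(labels: Iterable[str], classes: List[str]) -> List[int]:
    """Return a 0/1 vector with a 1 for every class present in `labels`."""
    present = set(labels)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in present else 0 for c in classes]


# Hypothetical tag vocabulary; a real one would be derived from the training data.
classes = ["python", "pandas", "java", "sql"]

# One input text can carry several labels at once.
target = multi_hot(["pandas", "python"], classes)
print(target)  # [1, 1, 0, 0]
```

A single-label classifier would set exactly one slot to 1; a multi-label classifier may set any number of them.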

Multi Label Classifier

By default, Universal Sentence Encoder Embeddings (USE) are used as sentence embeddings for training.

import nlu
# Train with the default USE sentence embeddings, then predict
fitted_pipe = nlu.load('train.multi_classifier').fit(train_df)
preds = fitted_pipe.predict(train_df)

If you add an NLU sentence embeddings reference before the train reference, NLU will use those sentence embeddings instead of the default USE.

# Train on BERT sentence embeddings
fitted_pipe = nlu.load('embed_sentence.bert train.multi_classifier').fit(train_df)
preds = fitted_pipe.predict(train_df)

Configure a custom label separator

# Use ; as the label separator
fitted_pipe = nlu.load('embed_sentence.electra train.multi_classifier').fit(train_df, label_seperator=';')
preds = fitted_pipe.predict(train_df)
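
The separator determines how the label field of each training row is split into individual labels. The sketch below illustrates that idea in plain Python; it is not NLU's implementation. The row layout (a `text` field plus a `y` field holding the joined labels) and the helper `split_labels` are assumptions made for illustration.

```python
from typing import Dict, List


def split_labels(joined: str, separator: str) -> List[str]:
    """Split a joined label string, dropping surrounding whitespace and empty items."""
    return [part.strip() for part in joined.split(separator) if part.strip()]


# Hypothetical training rows: each text has its labels joined by ';'.
rows: List[Dict[str, str]] = [
    {"text": "How do I merge two DataFrames?", "y": "python;pandas"},
    {"text": "What is a LEFT JOIN?", "y": "sql"},
]

for row in rows:
    print(row["text"], "->", split_labels(row["y"], separator=";"))
```

A separator other than the comma is useful when the labels themselves may contain commas.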

NLU 1.0.6 Enhancements

  • Improved outputs for Toxic and E2E Classifier.
    • By default, all predicted classes whose confidence is above the threshold are returned, together with their confidences, in a list inside the Pandas DataFrame.
    • With meta=True, the confidences for all classes are returned.
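
The default behaviour described above amounts to filtering the per-class confidences by a threshold. The following sketch shows that filtering step in plain Python, assuming the confidences arrive as a mapping from class name to score; the function name, the 0.5 threshold, and the example scores are hypothetical and do not reflect NLU's internals.

```python
from typing import Dict, List, Tuple


def labels_above_threshold(
    confidences: Dict[str, float], threshold: float
) -> List[Tuple[str, float]]:
    """Keep the (label, confidence) pairs whose confidence is strictly above `threshold`."""
    kept = [(label, score) for label, score in confidences.items() if score > threshold]
    # Sort so that the most confident label comes first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical confidences for one input sentence.
scores = {"toxic": 0.91, "insult": 0.62, "threat": 0.08}

print(labels_above_threshold(scores, threshold=0.5))
# [('toxic', 0.91), ('insult', 0.62)]
```

Returning the full `scores` mapping instead of the filtered list corresponds to asking for the confidences of all classes.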

NLU 1.0.6 New Notebooks and Tutorials

NLU 1.0.6 Bug-fixes

  • Fixed a bug that caused en.ner.dl.bert to be inaccessible
  • Fixed a bug that caused pt.ner.large to be inaccessible
  • Fixed a bug that caused USE embeddings to not be properly configured for document-level output when using multiple embeddings at the same time