OpenAI Completion and Word Embeddings, Visual Document Classification, Bart and XLM-RoBerta Zero-Shot Classification and more in John Snow Labs NLU 5.3.0

@C-K-Loan C-K-Loan released this 30 Apr 22:26
10ff7d7

We are very excited to announce NLU 5.3.0 has been released!
It adds support for OpenAI's Completion and Word Embeddings, visual document classification, and Zero Shot Classification with Bart and XLM RoBerta.


OpenAI Completion

Tutorial Notebook
OpenAICompletion combines the power of OpenAI’s completion models with the NLP processing capabilities of Spark NLP, so OpenAI's models can be used with the scalability of Spark.
This annotator makes direct API calls to OpenAI’s Completion endpoint right from datasets, which lets completion requests run as part of a Spark NLP pipeline.
Powered by OpenAICompletion
Reference: OpenAI API Doc
Reference: OpenAICompletion Doc

nlu.load() reference | Spark NLP Model reference
openai.completion | OpenAICompletion
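
As a rough sketch of how the openai.completion reference from the table above might be used from Python, the helper below wraps nlu.load() and predict(). The helper name predict_with_nlu and its load parameter are illustrative and not part of NLU. OpenAI credential setup is omitted and is assumed to follow the tutorial notebook.

```python
from typing import Any, Callable, Optional


def predict_with_nlu(
    reference: str,
    data: Any,
    load: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Load a pipeline by its nlu.load() reference and run it on `data`.

    `load` is a hypothetical hook for this sketch: it defaults to nlu.load
    and can be replaced with a stand-in when Spark is not available.
    """
    if load is None:
        # Imported lazily; requires `pip install nlu pyspark`.
        import nlu

        load = nlu.load
    pipeline = load(reference)
    # predict() runs the loaded pipeline on the given text or dataset.
    return pipeline.predict(data)


# Example call (assumes NLU is installed and OpenAI credentials are configured):
# predict_with_nlu("openai.completion", "Generate a restaurant review.")
```

The same helper should work with the other nlu.load() references listed in these notes, since only the reference string changes.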

OpenAI Embeddings

Tutorial Notebook
OpenAIEmbeddings combines the power of OpenAI’s embeddings model with the NLP processing capabilities of Spark NLP, so OpenAI's embeddings can be used with the scalability of Spark.
This annotator makes direct API calls to OpenAI’s Embeddings endpoint right from datasets, which lets embedding requests run as part of a Spark NLP pipeline.
Powered by OpenAIEmbeddings

nlu.load() reference | Spark NLP Model reference
openai.embeddings | OpenAIEmbeddings

Visual Document Classifier

Tutorial Notebook

The VisualDocumentClassifier is a deep learning model for document classification using text and layout data. The currently available pre-trained model was trained on the Tobacco3482 dataset, which contains 3482 images belonging to 10 different classes (Resume, News, Note, Advertisement, Scientific, Report, Form, Letter, Email and Memo).

Powered by VisualDocumentClassifier

Language | nlu.load() reference | Spark NLP Model reference
xx | en.classify_image.tabacco | visual_document_classifier_tobacco3482

Bart for Zero Shot Classification

Tutorial Notebook

BartForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks.
It is the equivalent of BartForSequenceClassification models, except that the number of potential classes is not hardcoded: the classes can be chosen at runtime. This usually makes it slower, but much more flexible.
We used TFBartForSequenceClassification to train this model and the BartForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale.
Powered by BartForZeroShotClassification

Language | nlu.load() reference | Spark NLP Model reference
English | en.bart.zero_shot_classifier | bart_large_zero_shot_classifier_mnli
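
To illustrate why an NLI-trained model can accept classes at runtime, here is a minimal pure-Python sketch of the general zero-shot-via-NLI idea. It is not the Spark NLP implementation: the function names, the hypothesis template, and the word-overlap scorer are all assumptions made for this example, with the toy scorer standing in for a real NLI model.

```python
import math
import re
from typing import Callable, Dict, Sequence


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: counts shared words."""
    premise_words = set(re.findall(r"[a-z]+", premise.lower()))
    hypothesis_words = set(re.findall(r"[a-z]+", hypothesis.lower()))
    return float(len(premise_words & hypothesis_words))


def zero_shot_classify(
    text: str,
    candidate_labels: Sequence[str],
    entailment_score: Callable[[str, str], float],
    template: str = "This example is {}.",  # assumed hypothesis template
) -> Dict[str, float]:
    """Score each label by how strongly `text` entails its hypothesis."""
    if not candidate_labels:
        raise ValueError("at least one candidate label is required")
    # One scorer call per label: labels are free-form, but cost grows with them.
    logits = [entailment_score(text, template.format(label)) for label in candidate_labels]
    # Softmax over the entailment scores (shifted by the max for stability).
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return {label: value / total for label, value in zip(candidate_labels, exps)}


scores = zero_shot_classify(
    "The team won the football match",
    ["football", "politics"],
    toy_entailment_score,
)
print(max(scores, key=scores.get))
```

In this sketch the scorer runs once per candidate label, which is one plausible reason a zero-shot classifier is slower than a classifier with a fixed output layer, while the label list can change on every call.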

XLM RoBerta For Zero Shot Classification

Tutorial Notebook
XlmRoBertaForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks.
It is the equivalent of XlmRoBertaForSequenceClassification models, except that the number of potential classes is not hardcoded: the classes can be chosen at runtime. This usually makes it slower, but much more flexible.
We used TFXLMRobertaForSequenceClassification to train this model and the XlmRoBertaForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!
Powered by XlmRoBertaForZeroShotClassification

Language | nlu.load() reference | Spark NLP Model reference
xx | xx.xlm_roberta.zero_shot_classifier | xlm_roberta_large_zero_shot_classifier_xnli_anli

Bugfixes

  • Fix bug when loading Albert for Question Answering models
  • Fix bug when predicting on image files in Databricks

📖 Additional NLU resources


Installation

#PyPI
pip install nlu pyspark