OpenAI Completion and Word Embeddings, Visual Document Classification, Bart and XLM-RoBerta Zero-Shot Classification and more in John Snow Labs NLU 5.3.0
We are very excited to announce NLU 5.3.0 has been released!
It adds support for OpenAI's Completion and Word Embeddings, visual document classification, and Bart and XLM-RoBerta models for zero-shot classification.
OpenAI Completion
Tutorial Notebook
OpenAICompletion combines the power of OpenAI's completion models with the NLP processing capabilities of Spark NLP, so you get OpenAI's models together with Spark's scalability.
This annotator makes API calls to OpenAI's Completion endpoint directly from your datasets, so completions can be generated as one step of a Spark NLP pipeline.
Powered by OpenAICompletion
Reference: OpenAI API Doc
Reference: OpenAICompletion Doc
nlu.load() reference | Spark NLP Model reference |
---|---|
openai.completion | OpenAICompletion |
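
As a minimal sketch, the `openai.completion` reference from the table could be used through NLU's usual `nlu.load(...).predict(...)` interface. The `OPENAI_API_KEY` environment variable name and the helper names `require_api_key` and `complete` are assumptions made for illustration and are not confirmed by these notes. The `nlu` import is deferred so the helpers can be defined without a running Spark session.

```python
import os

OPENAI_REF = "openai.completion"  # nlu.load() reference from the table above


def require_api_key(env=None, var="OPENAI_API_KEY"):
    """Return the API key or raise. The variable name is an assumption."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} before loading {OPENAI_REF}")
    return key


def complete(prompts):
    """Run each prompt through the NLU pipeline (this calls the OpenAI API)."""
    require_api_key()
    import nlu  # deferred import: needs pyspark and a Java runtime
    pipe = nlu.load(OPENAI_REF)
    return pipe.predict(list(prompts))
```

Calling `complete(["Write a haiku about Spark"])` would then send one request per prompt; checking for the key first avoids starting Spark only to fail on authentication.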
OpenAI Embeddings
Tutorial Notebook
OpenAIEmbeddings combines the power of OpenAI's embeddings model with the NLP processing capabilities of Spark NLP, so you get OpenAI's models together with Spark's scalability.
This annotator makes API calls to OpenAI's Embeddings endpoint directly from your datasets, so embeddings can be computed as one step of a Spark NLP pipeline.
Powered by OpenAIEmbeddings
nlu.load() reference | Spark NLP Model reference |
---|---|
openai.embeddings | OpenAIEmbeddings |
Visual Document Classifier
The VisualDocumentClassifier is a deep learning model for document classification using text and layout data. The currently available pre-trained model was trained on the Tobacco3482 dataset, which contains 3482 images belonging to 10 different classes (Resume, News, Note, Advertisement, Scientific, Report, Form, Letter, Email and Memo).
Powered by VisualDocumentClassifier
Language | nlu.load() reference | Spark NLP Model reference |
---|---|---|
xx | en.classify_image.tabacco | visual_document_classifier_tobacco3482 |
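
The following is a sketch of how a folder of scanned documents might be passed to this model, assuming that `predict` accepts a list of image file paths as it does for NLU's other image models. The helper names `list_images` and `classify_documents` and the set of file suffixes are hypothetical, and any licensing or OCR setup the model may need is not covered here.

```python
from pathlib import Path

VISUAL_REF = "en.classify_image.tabacco"  # nlu.load() reference from the table above
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}  # assumed formats


def list_images(folder):
    """Return sorted paths of files whose suffix looks like an image."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def classify_documents(folder):
    """Classify every image found directly inside `folder`."""
    paths = list_images(folder)
    if not paths:
        raise FileNotFoundError(f"No image files found in {folder}")
    import nlu  # deferred import: needs pyspark and a Java runtime
    return nlu.load(VISUAL_REF).predict(paths)
```

Filtering by suffix before loading the model keeps non-image files out of the prediction call and fails early on an empty folder.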
Bart for Zero Shot Classification
BartForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks.
It is the equivalent of BartForSequenceClassification models, except that it does not require a hardcoded number of potential classes: the classes can be chosen at runtime. This usually makes it slower, but much more flexible.
We used TFBartForSequenceClassification to train this model and the BartForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale.
Powered by BartForZeroShotClassification
Language | nlu.load() reference | Spark NLP Model reference |
---|---|---|
English | en.bart.zero_shot_classifier | bart_large_zero_shot_classifier_mnli |
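
To illustrate why an NLI-trained model lets classes be chosen at runtime, here is a pure-Python sketch of the general NLI-based zero-shot technique, not the annotator's actual implementation. Each candidate label is turned into a hypothesis sentence, an entailment scorer rates it against the input text, and a softmax over those scores ranks the labels. The hypothesis template and `toy_entailment_score` (a word-overlap stand-in for a real NLI model) are assumptions for illustration only.

```python
import math


def zero_shot_classify(text, candidate_labels, entailment_score,
                       template="This example is about {}."):
    """Rank labels by a softmax over per-label entailment scores."""
    if not candidate_labels:
        raise ValueError("candidate_labels must not be empty")
    # One premise/hypothesis pair, and so one scoring call, per candidate label.
    logits = [entailment_score(text, template.format(label))
              for label in candidate_labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(candidate_labels, exps)}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def toy_entailment_score(premise, hypothesis):
    """Stand-in for an NLI model: counts shared lowercase words."""
    def words(sentence):
        return {w.strip(".,!?").lower() for w in sentence.split()}
    return float(len(words(premise) & words(hypothesis)))


ranked = zero_shot_classify(
    "The new phone has a great camera",
    ["phone", "cooking", "politics"],
    toy_entailment_score,
)
print(ranked[0][0])
```

Because every label adds one scoring call, the cost grows with the number of candidate labels, which is the flexibility-for-speed trade-off described above.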
XLM RoBerta For Zero Shot Classification
Tutorial Notebook
XlmRoBertaForZeroShotClassification uses a ModelForSequenceClassification trained on NLI (natural language inference) tasks.
It is the equivalent of XlmRoBertaForSequenceClassification models, except that it does not require a hardcoded number of potential classes: the classes can be chosen at runtime. This usually makes it slower, but much more flexible.
We used TFXLMRobertaForSequenceClassification to train this model and the XlmRoBertaForZeroShotClassification annotator in Spark NLP 🚀 for prediction at scale!
Powered by XlmRoBertaForZeroShotClassification
Language | nlu.load() reference | Spark NLP Model reference |
---|---|---|
xx | xx.xlm_roberta.zero_shot_classifier | xlm_roberta_large_zero_shot_classifier_xnli_anli |
Bugfixes
- Fixed a bug when loading Albert for Question Answering models
- Fixed a bug when predicting on image files in Databricks
📖 Additional NLU resources
- 140+ NLU Tutorials
- Streamlit visualizations docs
- The complete list of all 20000+ models & pipelines in 300+ languages is available on Models Hub
- Spark NLP publications
- NLU documentation
- Discussions: engage with other community members, share ideas, and show off how you use Spark NLP and NLU!
Installation
#PyPI
pip install nlu pyspark