Medical Text Generation, ConvNext for image Classification and DistilBert,Bert,Roberta for Zero-Shot Classification in John Snow Labs NLU 5.0.0
We are very excited to announce that NLU 5.0.0 has been released!
It comes with ZeroShotClassification models based on the Bert, DistilBert, and Roberta architectures.
Additionally, a Medical Text Generator based on Bio-GPT, as well as a Bart-based General Text Generator, are now available in NLU.
Finally, ConvNextForImageClassification, an image classifier based on ConvNeXT models, has been added.
ConvNextForImageClassification
Tutorial Notebook
ConvNextForImageClassification is an image classifier based on ConvNeXT models.
The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie. ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.
Powered by ConvNextForImageClassification
Reference: A ConvNet for the 2020s
New NLU Models:
Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
---|---|---|---|---|
en | en.classify_image.convnext.tiny | image_classifier_convnext_tiny_224_local | Image Classification | ConvNextImageClassifier |
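The model in the table above can be loaded with a single `nlu.load` call. A minimal, hypothetical usage sketch follows; the image path is a placeholder, and the load is guarded behind an import check because it requires `pyspark` and a JVM:

```python
# Hypothetical usage sketch for the ConvNeXT image classifier in NLU.
# The NLU reference comes from the table above; the image path is a placeholder.
import importlib.util

def nlu_installed() -> bool:
    """True if the nlu package is importable in this environment."""
    return importlib.util.find_spec("nlu") is not None

if nlu_installed():
    import nlu
    # Download and load the pretrained ConvNeXT-tiny image classifier.
    pipe = nlu.load("en.classify_image.convnext.tiny")
    # Predict class labels for a local image file (placeholder path).
    predictions = pipe.predict("path/to/image.jpg")
    print(predictions)
```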
DistilBertForZeroShotClassification
DistilBertForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification fine-tuned on NLI (natural language inference) tasks.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
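The premise/hypothesis pairing described above can be sketched in plain Python. The hypothesis template below is an assumption: NLI-based zero-shot implementations commonly use a template such as "This example is {label}.", but the exact template used internally may differ.

```python
# Sketch of how zero-shot classification turns sequences and candidate labels
# into premise/hypothesis pairs for an NLI model. The template string is a
# common convention, not necessarily the exact one used by this annotator.
def build_nli_pairs(sequences, candidate_labels,
                    template="This example is {}."):
    """Return one (premise, hypothesis) pair per sequence/label combination."""
    return [
        (seq, template.format(label))
        for seq in sequences
        for label in candidate_labels
    ]

pairs = build_nli_pairs(["I urgently need a refund"], ["urgent", "not urgent"])
# Each pair is scored by the NLI model; the entailment probability for a pair
# becomes the score for that candidate label.
print(pairs)
```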
Powered by DistilBertForZeroShotClassification
New NLU Models:
Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
---|---|---|---|---|
en | en.distilbert.zero_shot_classifier | distilbert_base_zero_shot_classifier_uncased_mnli | Zero-Shot Classification | DistilBertForZeroShotClassification |
tr | tr.distilbert.zero_shot_classifier.multinli | distilbert_base_zero_shot_classifier_turkish_cased_multinli | Zero-Shot Classification | DistilBertForZeroShotClassification |
tr | tr.distilbert.zero_shot_classifier.allnli | distilbert_base_zero_shot_classifier_turkish_cased_allnli | Zero-Shot Classification | DistilBertForZeroShotClassification |
tr | tr.distilbert.zero_shot_classifier.snli | distilbert_base_zero_shot_classifier_turkish_cased_snli | Zero-Shot Classification | DistilBertForZeroShotClassification |
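A hypothetical usage sketch for the English zero-shot classifier listed above. The call is guarded behind an import check because it requires Spark; consult the NLU documentation for configuring candidate labels on the underlying annotator.

```python
# Hypothetical usage sketch for the DistilBERT zero-shot classifier in NLU.
# The NLU reference comes from the table above; the input text is illustrative.
import importlib.util

ZERO_SHOT_REF = "en.distilbert.zero_shot_classifier"  # from the table above

if importlib.util.find_spec("nlu") is not None:
    import nlu
    pipe = nlu.load(ZERO_SHOT_REF)
    # Candidate-label configuration is annotator-specific; see the NLU docs.
    result = pipe.predict("I have a problem with my iPhone and need it fixed")
    print(result)
```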
BertForZeroShotClassification
Tutorial Notebook
BertForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification fine-tuned on NLI (natural language inference) tasks.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by BertForZeroShotClassification
New NLU Models:
Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
---|---|---|---|---|
en | en.bert.zero_shot_classifier | bert_base_cased_zero_shot_classifier_xnli | Zero-Shot Classification | BertForZeroShotClassification |
RoBertaForZeroShotClassification
Tutorial Notebook
RoBertaForZeroShotClassification performs zero-shot text classification using a ModelForSequenceClassification fine-tuned on NLI (natural language inference) tasks.
Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis pair and passed to the pretrained model.
Powered by RoBertaForZeroShotClassification
New NLU Models:
Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
---|---|---|---|---|
en | en.roberta.zero_shot_classifier | roberta_base_zero_shot_classifier_nli | Zero-Shot Classification | RoBertaForZeroShotClassification |
BartTransformer
The Facebook BART (Bidirectional and Auto-Regressive Transformer) model is a state-of-the-art language generation model introduced by Facebook AI in 2019. It is based on the transformer architecture and is designed to handle a wide range of natural language processing tasks, such as text generation, summarization, and machine translation.
BART is unique in that it combines a bidirectional encoder (as in BERT) with an auto-regressive decoder (as in GPT). The encoder captures contextual information from both past and future tokens in a sentence, while the decoder generates text left-to-right, resulting in more accurate and natural language generation.
The model was trained on a large corpus of text data using a combination of unsupervised and supervised learning techniques. It incorporates pretraining and fine-tuning phases, where the model is first trained on a large unlabeled corpus of text, and then fine-tuned on specific downstream tasks.
BART has achieved state-of-the-art performance on a wide range of NLP tasks, including summarization, question-answering, and language translation. Its ability to handle multiple tasks and its high performance on each of these tasks make it a versatile and valuable tool for natural language processing applications.
Powered by BartTransformer
Reference: BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
New NLU Models:
MedicalTextGenerator
MedicalTextGenerator uses the basic BioGPT model to perform various tasks related to medical text abstraction.
A user can provide a prompt and context and instruct the system to perform a specific task, such as explaining why a patient may have a particular disease or paraphrasing the context more directly.
In addition, this annotator can create a clinical note for a cancer patient using the given keywords or write medical texts based on introductory sentences.
The BioGPT model is trained on large volumes of medical data, allowing it to identify and extract the most relevant information from the provided text.
Powered by TextGenerator
New NLU Models:
Language | NLU Reference | Spark NLP Reference | Task | Annotator Class |
---|---|---|---|---|
en | en.generate.biomedical_biogpt_base | text_generator_biomedical_biogpt_base | Text Generation | MedicalTextGenerator |
en | en.generate.generic_flan_base | text_generator_generic_flan_base | Text Generation | MedicalTextGenerator |
en | en.generate.generic_jsl_base | text_generator_generic_jsl_base | Text Generation | MedicalTextGenerator |
en | en.generate.generic_flan_t5_large | text_generator_generic_flan_t5_large | Text Generation | MedicalTextGenerator |
en | en.generate.biogpt_chat_jsl | biogpt_chat_jsl | Text Generation | MedicalTextGenerator |
en | en.generate.biogpt_chat_jsl_conversational | biogpt_chat_jsl_conversational | Text Generation | MedicalTextGenerator |
en | en.generate.biogpt_chat_jsl_conditions | biogpt_chat_jsl_conditions | Text Generation | MedicalTextGenerator |
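A hypothetical usage sketch for one of the BioGPT-based generators listed above. The prompt is illustrative, the load is guarded because it requires Spark, and note that John Snow Labs medical models may additionally require a valid Spark NLP for Healthcare license:

```python
# Hypothetical usage sketch for the BioGPT-based medical text generator.
# The NLU reference comes from the table above; the prompt is illustrative.
# Note: medical models may require a Spark NLP for Healthcare license.
import importlib.util

GENERATOR_REF = "en.generate.biogpt_chat_jsl"  # from the table above

if importlib.util.find_spec("nlu") is not None:
    import nlu
    pipe = nlu.load(GENERATOR_REF)
    # Generate medical text from a prompt.
    generated = pipe.predict("How can I treat asthma?")
    print(generated)
```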
Install NLU
pip install nlu pyspark
Additional NLU resources
- 140+ NLU Tutorials
- NLU in Action
- Streamlit visualizations docs
- The complete list of all 20000+ models & pipelines in 200+ languages is available on Models Hub.
- Spark NLP publications
- NLU documentation
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP and NLU!