Support for Speech2Text, Image Classification, Tabular Data and Zero-Shot NER via Wav2Vec2, TAPAS, ViT, 4000+ New Models and 90+ Languages in John Snow Labs NLU 4.2.0
We are incredibly excited to announce that NLU 4.2.0 has been released with 4000+ new models in 90+ languages and support for 8 new Deep Learning architectures.
4 new tasks are included for the very first time:
Zero-Shot NER, Automatic Speech Recognition, Image Classification and Table Question Answering, powered
by Wav2Vec 2.0, HuBERT, TAPAS, ViT and Swin.
Additionally, CamemBERT-based architectures are now available for Sequence and Token Classification, powered by Spark NLP's
CamemBertForSequenceClassification and CamemBertForTokenClassification.
Automatic Speech Recognition (ASR)
Demo Notebook
Wav2Vec 2.0 and HuBERT enable ASR for the very first time in NLU.
Wav2Vec2 is a transformer model for speech recognition that uses self-supervised pre-training on large amounts of unlabeled speech data to improve the accuracy of automatic speech recognition (ASR) systems. It learns to predict masked portions of the speech signal, and has shown promising results in reducing the amount of labeled training data required for ASR tasks.
These models are powered by Spark NLP's Wav2Vec2ForCTC annotator.
HuBERT models match or surpass the SOTA approaches for speech representation learning for speech recognition, generation, and compression. The Hidden-Unit BERT (HuBERT) approach was proposed for self-supervised speech representation learning, which utilizes an offline clustering step to provide aligned target labels for a BERT-like prediction loss.
These models are powered by Spark NLP's HubertForCTC annotator.
Usage
You just need an audio file on disk; pass the path to it, or to a folder of audio files.
import nlu
# Let's download an audio file
!wget https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/audio/samples/wavs/ngm_12484_01067234848.wav
# Let's listen to it
from IPython.display import Audio
FILE_PATH = "ngm_12484_01067234848.wav"
Audio(FILE_PATH)
# Load the Wav2Vec2 model and transcribe the audio file
asr_df = nlu.load('en.speech2text.wav2vec2.v2_base_960h').predict(FILE_PATH)
asr_df
text |
---|
PEOPLE WHO DIED WHILE LIVING IN OTHER PLACES |
To test out HuBERT, you just need to update the parameter passed to load():
asr_df = nlu.load('en.speech2text.hubert').predict('ngm_12484_01067234848.wav')
asr_df
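As mentioned above, you can also point predict() at a whole folder of audio files. Here is a minimal sketch, assuming the .wav files live in a local folder named audio_files (the folder name is an illustrative assumption):

import nlu

# Hypothetical folder containing several .wav files (path is an assumption for illustration)
AUDIO_FOLDER = 'audio_files'

# Load the Wav2Vec2 pipeline once and transcribe every file in the folder.
# predict() returns a pandas DataFrame with one row per transcribed audio file.
asr_pipe = nlu.load('en.speech2text.wav2vec2.v2_base_960h')
asr_df = asr_pipe.predict(AUDIO_FOLDER)
print(asr_df['text'])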
Image Classification
For the first time ever, NLU introduces state-of-the-art image classifiers based on
ViT and Swin, giving you access to hundreds of image classifiers for various domains.
Inspired by the Transformer scaling successes in NLP, the researchers experimented with applying a standard Transformer directly to images, with the fewest possible modifications. To do so, images are split into patches and the sequence of linear embeddings of these patches is provided as input to a Transformer. Image patches are treated the same way as tokens (words) in an NLP application, and the image classification models are trained in a supervised fashion.
You can check the Scale Vision Transformers (ViT) Beyond Hugging Face article to learn more about how ViT works and how it is implemented in Spark NLP.
This is powered by Spark NLP's VitForImageClassification annotator.
Swin is a hierarchical Transformer whose representation is computed with Shifted windows.
The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.
This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks.
This is powered by Spark NLP's SwinForImageClassification annotator.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
Usage:
import os
import nlu
# Download an image
os.system('wget https://raw.githubusercontent.com/JohnSnowLabs/nlu/release/4.2.0/tests/datasets/ocr/vit/ox.jpg')
# Load the ViT model and predict on the image file
vit = nlu.load('en.classify_image.base_patch16_224').predict('ox.jpg')
Let's download a folder of images and predict on it:
!wget -q https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/images/images.zip
import shutil
shutil.unpack_archive("images.zip", "images", "zip")
! ls /content/images/images/
Once we have image data, it's easy to label it: we just pass the folder with images to predict()
and NLU will return a pandas DataFrame with one row per image detected.
nlu.load('en.classify_image.base_patch16_224').predict('/content/images/images')
To use Swin, we just update the parameter passed to load():
nlu.load('en.classify_image.swin.tiny').predict('/content/images/images')
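Since the output is a regular pandas DataFrame, you can post-process it with standard pandas tools. Here is a minimal sketch; the exact names of the prediction columns depend on the model, so inspect them first:

import nlu

# Classify every image in the folder with the Swin model
swin_df = nlu.load('en.classify_image.swin.tiny').predict('/content/images/images')

# The result is a plain pandas DataFrame, so standard pandas tooling applies
print(swin_df.columns)                           # inspect which output columns were produced
swin_df.to_csv('image_labels.csv', index=False)  # persist the predicted labels for later use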
Table Question Answering
TapasForQuestionAnswering can load TAPAS Models with a cell selection head and optional aggregation head on top for question-answering tasks on tables (linear layers on top of the hidden-states output to compute logits and optional logits_aggregation), e.g. for SQA, WTQ or WikiSQL-supervised tasks. TAPAS is a BERT-based model specifically designed (and pre-trained) for answering questions about tabular data.
Powered by TAPAS: Weakly Supervised Table Parsing via Pre-training
Usage:
First we need a pandas DataFrame for which we want to ask questions, the so-called "context".
import pandas as pd
context_df = pd.DataFrame({
    'name': ['Donald Trump', 'Elon Musk'],
    'money': ['$100,000,000', '$20,000,000,000,000'],
    'married': ['yes', 'no'],
    'age': ['75', '55'],
})
context_df
Then we create a list of questions
questions = [
"Who earns less than 200,000,000?",
"Who earns more than 200,000,000?",
"Who earns 100,000,000?",
"How much money has Donald Trump?",
"Who is the youngest?",
]
questions
Now we combine the data, pass it to NLU and get answers for our questions.
import nlu
# Now we combine both into a tuple and we are done! We can now pass this to the .predict() method
tapas_data = (context_df, questions)
# Let's load a TAPAS QA model and predict on (context, question).
# It will give us an answer for every question in the questions list, based on the context in context_df
answers = nlu.load('en.answer_question.tapas.wtq.large_finetuned').predict(tapas_data)
answers
sentence | tapas_qa_UNIQUE_aggregation | tapas_qa_UNIQUE_answer | tapas_qa_UNIQUE_cell_positions | tapas_qa_UNIQUE_cell_scores | tapas_qa_UNIQUE_origin_question |
---|---|---|---|---|---|
Who earns less than 200,000,000? | NONE | Donald Trump | [0, 0] | 1 | Who earns less than 200,000,000? |
Who earns more than 200,000,000? | NONE | Elon Musk | [0, 1] | 1 | Who earns more than 200,000,000? |
Who earns 100,000,000? | NONE | Donald Trump | [0, 0] | 1 | Who earns 100,000,000? |
How much money has Donald Trump? | SUM | SUM($100,000,000) | [1, 0] | 1 | How much money has Donald Trump? |
Who is the youngest? | NONE | Elon Musk | [0, 1] | 1 | Who is the youngest? |
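Since the answers also come back as a pandas DataFrame, it is easy to reduce them to a question-to-answer mapping. A minimal sketch using the output columns shown in the table above:

# Keep only the original question and the predicted answer from the TAPAS output
qa_pairs = answers[['tapas_qa_UNIQUE_origin_question', 'tapas_qa_UNIQUE_answer']]

# Turn the DataFrame into a plain dict mapping each question to its answer
qa_dict = dict(zip(qa_pairs['tapas_qa_UNIQUE_origin_question'],
                   qa_pairs['tapas_qa_UNIQUE_answer']))
qa_dict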
Zero-Shot NER
Demo Notebook
Based on John Snow Labs Enterprise NLP's ZeroShotNerModel.
This architecture is based on RoBertaForQuestionAnswering.
Zero-shot models excel at generalization, meaning that the model can accurately predict entities in very different data sets without the need to fine-tune the model or train it from scratch for each new domain.
Even though a model trained to solve a specific problem can achieve better accuracy than a zero-shot model on that specific task,
it probably won't be useful on a different task.
That is where zero-shot models show their usefulness: they can achieve good results across various domains.
Usage:
We just need to load the zero-shot NER model and configure a set of entity definitions.
import nlu
# load zero-shot ner model
enterprise_zero_shot_ner = nlu.load('en.zero_shot.ner_roberta')
# Configure entity definitions
enterprise_zero_shot_ner['zero_shot_ner'].setEntityDefinitions(
{
"PROBLEM": [
"What is the disease?",
"What is his symptom?",
"What is her disease?",
"What is his disease?",
"What is the problem?",
"What does a patient suffer",
"What was the reason that the patient is admitted to the clinic?",
],
"DRUG": [
"Which drug?",
"Which is the drug?",
"What is the drug?",
"Which drug does he use?",
"Which drug does she use?",
"Which drug do I use?",
"Which drug is prescribed for a symptom?",
],
"ADMISSION_DATE": ["When did patient admitted to a clinic?"],
"PATIENT_AGE": [
"How old is the patient?",
"What is the gae of the patient?",
],
}
)
Then we can use this pipeline to predict labels:
# Predict entities
df = enterprise_zero_shot_ner.predict(
[
"The doctor pescribed Majezik for my severe headache.",
"The patient was admitted to the hospital for his colon cancer.",
"27 years old patient was admitted to clinic on Sep 1st by Dr."+
"X for a right-sided pleural effusion for thoracentesis.",
]
)
df
document | entities_zero_shot | entities_zero_shot_class | entities_zero_shot_confidence | entities_zero_shot_origin_chunk | entities_zero_shot_origin_sentence |
---|---|---|---|---|---|
The doctor pescribed Majezik for my severe headache. | Majezik | DRUG | 0.646716 | 0 | 0 |
The doctor pescribed Majezik for my severe headache. | severe headache | PROBLEM | 0.552635 | 1 | 0 |
The patient was admitted to the hospital for his colon cancer. | colon cancer | PROBLEM | 0.88985 | 0 | 0 |
27 years old patient was admitted to clinic on Sep 1st by Dr. X for a right-sided pleural effusion for thoracentesis. | 27 years old | PATIENT_AGE | 0.694308 | 0 | 0 |
27 years old patient was admitted to clinic on Sep 1st by Dr. X for a right-sided pleural effusion for thoracentesis. | Sep 1st | ADMISSION_DATE | 0.956461 | 1 | 0 |
27 years old patient was admitted to clinic on Sep 1st by Dr. X for a right-sided pleural effusion for thoracentesis. | a right-sided pleural effusion for thoracentesis | PROBLEM | 0.500266 | 2 | 0 |
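Because the entities are defined purely through questions, the same model can be re-targeted to a completely different domain without any retraining. Below is a minimal sketch with hypothetical finance-style entity definitions; the labels, questions and example sentence are illustrative assumptions, not definitions shipped with the model:

import nlu

# Load the same zero-shot NER model again
finance_ner = nlu.load('en.zero_shot.ner_roberta')

# Hypothetical entity definitions for a finance-flavoured domain (illustrative only)
finance_ner['zero_shot_ner'].setEntityDefinitions(
    {
        "COMPANY": ["Which company is mentioned?", "What is the name of the company?"],
        "AMOUNT": ["How much money was paid?", "What is the amount?"],
    }
)

# Predict on an illustrative sentence
finance_ner.predict(["Acme Corp paid 5,000,000 dollars for the acquisition."])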
New Notebooks
- Image Classification with VIT and Swin
- Zero-Shot-NER
- Table Question Answering with TAPAS
- Automatic Speech Recognition with Wav2Vec2 and HuBERT
New Models Overview
Supported Languages are:
ab, am, ar, ba, bem, bg, bn, ca, co, cs, da, de, dv, el, en, es, et, eu, fa, fi, fon, fr, fy, ga, gam, gl, gu, ha, he, hi, hr, hu, id, ig, is, it, ja, jv, kin, kn, ko, kr, ku, ky, la, lg, lo, lt, lu, luo, lv, lwt, ml, mn, mr, ms, mt, nb, nl, no, pcm, pl, pt, ro, ru, rw, sg, si, sk, sl, sq, st, su, sv, sw, swa, ta, te, th, ti, tl, tn, tr, tt, tw, uk, unk, ur, uz, vi, wo, xx, yo, yue, zh, zu
Automatic Speech Recognition Models Overview
Image Classification Models Overview
Install NLU
pip install nlu pyspark
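After installing, a quick way to verify the setup is to run one of the new models end to end. A minimal sanity check reusing the TAPAS reference from this release (the tiny table and question below are purely illustrative; the first call downloads the model, so it may take a while):

import pandas as pd
import nlu

# Build a tiny table and one question, then run the new TAPAS model end to end
context_df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': ['30', '40']})
questions = ["Who is older?"]
answers = nlu.load('en.answer_question.tapas.wtq.large_finetuned').predict((context_df, questions))
print(answers)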
Additional NLU resources
- 140+ NLU Tutorials
- NLU in Action
- Streamlit visualizations docs
- The complete list of all 4000+ models & pipelines in 200+ languages is available on Models Hub.
- Spark NLP publications
- NLU documentation
- Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP and NLU!