This plugin lets you generate and store captions for your samples using state-of-the-art image captioning models.
This version of the plugin supports the following models:
- BLIP Base from Hugging Face
- BLIP-2 (via Replicate)
- Fuyu-8B from Adept AI (via Replicate)
- GIT from Hugging Face
- LLaVA-1.5-7B from Hugging Face
- LLaVA-13B (via Replicate)
- Qwen-VL-Chat (via Replicate)
- ViT-GPT2 from Hugging Face
Feel free to fork this plugin and add support for other models!
- If you plan to use any of the Hugging Face models, install the transformers library:
pip install transformers
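As a quick sanity check that transformers is set up, here is a minimal BLIP captioning sketch. It assumes the `Salesforce/blip-image-captioning-base` checkpoint, the checkpoint commonly meant by "BLIP Base"; the plugin's internal model choices may differ:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

def caption_image(image_path, max_new_tokens=30):
    """Generate a single caption for an image with BLIP Base."""
    # NOTE: loading the checkpoint downloads ~1 GB of weights on first use
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained(
        "Salesforce/blip-image-captioning-base"
    )
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(out[0], skip_special_tokens=True)
```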
- If you plan to use any of the Replicate models, install the replicate library:
pip install replicate
And add your Replicate API token to your environment:
export REPLICATE_API_TOKEN=<your-api-token>
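Replicate-backed models will fail if the token is missing, so it can help to check for it up front. A small helper (hypothetical, not part of the plugin) that fails fast with a clear message:

```python
import os

def require_replicate_token():
    """Return the Replicate API token, or raise a clear error if it is unset."""
    token = os.environ.get("REPLICATE_API_TOKEN")
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before using Replicate models"
        )
    return token
```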
To install the plugin, run:

fiftyone plugins download https://github.com/jacobmarks/fiftyone-image-captioning-plugin
- Applies the selected image captioning model to the samples in the target view and stores the resulting captions in the specified field on each sample.
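Conceptually, the operator's behavior boils down to the following loop. This is a plain-Python sketch with dicts standing in for FiftyOne samples, and `caption_fn` is a placeholder for whichever model you selected, not the plugin's actual implementation:

```python
def apply_captions(samples, caption_fn, caption_field="caption"):
    """Caption each sample and store the result in the chosen field."""
    for sample in samples:
        sample[caption_field] = caption_fn(sample["filepath"])
    return samples

# Placeholder captioner standing in for a real model call
samples = apply_captions(
    [{"filepath": "img1.jpg"}, {"filepath": "img2.jpg"}],
    lambda path: f"a caption for {path}",
    caption_field="blip_caption",
)
```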