This repository contains the PyTorch implementation of the paper "Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning".
If you find our paper or the provided code helpful in your research, please do not forget to cite our paper. Thank you!
The following architecture diagram shows our proposed LATGeO model for image captioning.
- python 3.8.8
- pysimplegui 4.47.0
- pytorch 1.8.1
- torchvision 0.9.1
- numpy 1.18.5
- h5py 2.10.0
- cython 0.29.23
- cudatoolkit 11.1.74
- pillow 8.2.0
- protobuf 3.17.3
- scipy 1.4.1
- tensorboard 2.4.0
- tensorflow-gpu 2.3.0
- spacy 3.0.6
- requests 2.24.0
- tqdm 4.60.0
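As a quick sanity check of the environment (a minimal sketch, not part of the repository), you can print the installed PyTorch/torchvision versions and confirm that CUDA is visible:

```python
# Environment sanity check (illustrative only; versions shown are the ones listed above).
import torch
import torchvision

print("PyTorch:", torch.__version__)            # expected: 1.8.1
print("torchvision:", torchvision.__version__)  # expected: 0.9.1
print("CUDA available:", torch.cuda.is_available())
```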
Detection using RCNN: follow the installation instructions provided by Bottom-Up.
The script testing.py is provided for running detection on a given image.
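For illustration only, the snippet below runs a COCO-pretrained torchvision Faster R-CNN on an image; it is a rough stand-in for testing.py and not the Bottom-Up Visual Genome detector used in the paper.

```python
# Illustrative sketch: torchvision's COCO-pretrained Faster R-CNN, not the
# Bottom-Up detector or the repository's testing.py.
import torch
import torchvision
import torchvision.transforms as T
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
tensor = T.ToTensor()(image)

with torch.no_grad():
    output = model([tensor])[0]

# Keep detections above a confidence threshold; boxes are (x1, y1, x2, y2).
keep = output["scores"] > 0.5
print(output["boxes"][keep], output["labels"][keep], output["scores"][keep])
```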
We also provide Jupyter Notebooks for better visualization of the predicted captions. Two notebooks are included (an illustrative detection sketch follows the list below):
- test_DETR-LATGeO.ipynb
- test_RCNN-LATGeO.ipynb
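As an illustrative sketch of the DETR detection stage only (loading DETR through torch.hub rather than the notebook's own setup), one could obtain boxes and class scores as follows; the example image path is hypothetical, and caption decoding with LATGeO is done in the notebook itself.

```python
# Illustrative DETR detection sketch via torch.hub; not the notebook's exact pipeline.
import torch
import torchvision.transforms as T
from PIL import Image

detr = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True).eval()

transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    outputs = detr(transform(image).unsqueeze(0))

# Class probabilities (last index is the "no object" class) and normalized cxcywh boxes.
probs = outputs["pred_logits"].softmax(-1)[0, :, :-1]
keep = probs.max(-1).values > 0.7
print(outputs["pred_boxes"][0, keep])
```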
We also provide the code for a GUI demo of our method.
The following are example results from running the GUI demo file GUI_Demo_LATGeO_RCNN.py.
You can also use the provided GUI demo code in your own applications (a minimal sketch follows).
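For reference, here is a minimal PySimpleGUI sketch of such a demo window; it is not the repository's GUI_Demo_LATGeO_RCNN.py, and the caption_image() helper is a hypothetical stand-in for the full detection-plus-captioning pipeline.

```python
# Minimal PySimpleGUI sketch of an image-picker + caption display window.
import PySimpleGUI as sg

def caption_image(path):
    # Hypothetical placeholder: the real demo runs detection + LATGeO decoding here.
    return f"caption for {path}"

layout = [
    [sg.Text("Image:"), sg.Input(key="-FILE-"), sg.FileBrowse()],
    [sg.Button("Caption"), sg.Button("Exit")],
    [sg.Text("", key="-OUT-", size=(60, 2))],
]

window = sg.Window("LATGeO Demo (sketch)", layout)
while True:
    event, values = window.read()
    if event in (sg.WIN_CLOSED, "Exit"):
        break
    if event == "Caption" and values["-FILE-"]:
        window["-OUT-"].update(caption_image(values["-FILE-"]))
window.close()
```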
Model | BLEU-1 | BLEU-4 | METEOR | ROUGE-L | SPICE | CIDEr-D |
---|---|---|---|---|---|---|
LATGeO | 76.5 | 36.4 | 27.8 | 56.7 | - | 115.8 |
LATGeO + RL | 81.0 | 38.8 | 29.2 | 58.7 | 22.9 | 131.7 |
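The scores above are the standard COCO caption metrics. As a rough illustration of how such metrics are computed, here is a minimal sketch assuming the pycocoevalcap package (not necessarily the evaluation code used for this table); the image id and captions are toy examples.

```python
# Sketch of computing BLEU / CIDEr with pycocoevalcap (an assumption; the
# repository may ship its own evaluation scripts).
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

# Toy example: dicts mapping image id -> list of tokenized captions.
gts = {"391895": ["a man riding a bike down a dirt road",
                  "a person on a bicycle on a country road"]}
res = {"391895": ["a man rides a bicycle on a dirt road"]}

bleu_scores, _ = Bleu(4).compute_score(gts, res)  # BLEU-1 .. BLEU-4
cider_score, _ = Cider().compute_score(gts, res)
print("BLEU-1:", bleu_scores[0], "BLEU-4:", bleu_scores[3], "CIDEr:", cider_score)
```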
Please cite the following BibTeX entry:
    @misc{dubey2021labelattention,
          title={Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning},
          author={Shikha Dubey and Farrukh Olimov and Muhammad Aasim Rafique and Joonmo Kim and Moongu Jeon},
          year={2021},
          eprint={2109.07799},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }
If you find the paper and this repository helpful, please consider citing our paper LATGeO. Thank you!
This project is licensed under the Machine Learning & Vision Laboratory (MLV Lab), GIST.
We would like to thank the AImageLab, peteanderson80, and facebookresearch teams.