This repository contains the frontend code for AI-OCR, a tool for extracting data from images using visual LLMs. The backend code (built with FastAPI) can be found here: AI-OCR.
To use the AI-OCR tool, install both repositories, backend and frontend, by following these steps:
- Clone the backend repository:
git clone https://github.com/jWinman91/AI-OCR.git
cd AI-OCR
- Install the required dependencies for the backend:
pip install -r requirements.txt
- Pull and run the CouchDB Docker image with the following command:
docker run -e COUCHDB_USER=admin -e COUCHDB_PASSWORD=JensIsCool -p 5984:5984 -d --name config_db couchdb:latest
- Clone the frontend repository:
git clone https://github.com/jWinman91/AI-OCR-Frontend.git
cd AI-OCR-Frontend
- Install the required dependencies for the frontend:
pip install -r requirements.txt
You can then start the backend by running the following from within the cloned backend repository:
python app.py $IP_ADDRESS
Make sure that the Docker container for CouchDB is running.
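If you want to check this from Python rather than with docker ps, the following is a minimal sketch. It assumes CouchDB is reachable on localhost at port 5984 (the port published by the docker run command above) and uses CouchDB's /_up health endpoint. The function name couchdb_is_up is hypothetical and is not part of the AI-OCR code base.

```python
import urllib.error
import urllib.request


def couchdb_is_up(base_url: str = "http://localhost:5984", timeout: float = 2.0) -> bool:
    """Return True if a CouchDB server answers with HTTP 200 on /_up.

    The default base_url is an assumption derived from the
    `-p 5984:5984` port mapping in the docker run command.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/_up", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status:
        # treat all of these as "not running".
        return False
```

Calling couchdb_is_up() before starting the backend should return True once the config_db container is ready. If the container runs on another machine, pass that host in base_url.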
Since the backend uses FastAPI, you can now try it out via the FastAPI docs at $IP_ADDRESS:5000/docs.
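The docs page can also be reached programmatically. The sketch below assumes the backend is served over plain http on port 5000 (the port mentioned above) and that FastAPI's default /openapi.json schema route has not been disabled. The helper names backend_url and list_endpoints are hypothetical and only serve as illustration.

```python
import json
import urllib.request


def backend_url(ip_address: str, path: str = "/docs", port: int = 5000) -> str:
    """Build a URL for the backend; http and port 5000 are assumptions."""
    return f"http://{ip_address}:{port}{path}"


def list_endpoints(ip_address: str, timeout: float = 5.0) -> list[str]:
    """Fetch the OpenAPI schema and return the route paths it declares."""
    schema_url = backend_url(ip_address, "/openapi.json")
    with urllib.request.urlopen(schema_url, timeout=timeout) as response:
        schema = json.load(response)
    # "paths" maps each route (e.g. "/some_route") to its operations.
    return sorted(schema.get("paths", {}))
```

With the backend running, list_endpoints("127.0.0.1") should return the same routes that the docs page shows, which is a quick way to confirm that the server started correctly.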
You can also start the frontend now by running:
chmod +x start_up.sh
./start_up.sh
from within the cloned frontend repository.
A Streamlit window will automatically open in your browser. Within the web application, you'll then find two pages in the sidebar:
- AI-OCR: Webpage for running the actual optical character recognition
- Model Configurations: Subpage for configuring the models (e.g. ChatGPT, Llava, ...)
The project is built with:
- Streamlit - Python framework for the frontend.
- Hugging Face - Framework for working with state-of-the-art natural language processing models.