A simple CLI written to help my dad produce test questions more effectively from past year papers. It implements semantic splitting for improved retrieval accuracy (compared to recursive splitting), as well as a fully local querying system using Ollama. It works only with PDF files.
Note: This was produced as a means of self-familiarisation with Python. If you want an actual RAG system (local and cloud deployable) with a clean UI, private-gpt and Quivr are definitely better alternatives.
- Install Ollama and ensure that the binary is added to `$PATH`. This is the default provider. The default embedding model is `bge-m3` and the default language model is `phi3:14b-medium-4k-instruct-q4_0`. These may be changed in `defaults.py`.
- Configure a virtual environment for the project using `venv` or `conda`. Use Python version >= 3.12, as newline characters are used in f-strings.
conda create --prefix ./.conda python=3.12
- Subsequently, install the required dependencies using `pip`.
pip install -r requirements.txt
- Most customisations may be performed using CLI flags for `refresh_db.py` and `query_data.py` (defined in `cli_flags.py`). The only variable which cannot be specified in this way is the prompt template (accessible under `defaults.py`), because it is too long to pass comfortably as a CLI argument. Pointing to a file path is a valid alternative, but editing the template in `defaults.py` directly is more straightforward than maintaining an external file.
- The two main scripts used in the CLI are `refresh_db.py` and `query_data.py`. The purposes of these files are self-explanatory. Either file may be run with Python using the `-h` flag to print the relevant CLI flags.
- The CLI flags are specified in `cli_flags.py`. A separate module was written because the two scripts above may share common flags.
- To add more splitting methods and model providers, `split_methods.py` and `model_providers.py` may be edited.
- Finally, a collation of default settings is provided in `defaults.py`.
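
One way a shared flags module can serve two scripts is sketched below using `argparse`. This is only an illustration, not the contents of `cli_flags.py`: the flag names (`--provider`, `--embedding-model`, `--split-method`), the helper names and the `recursive`/`semantic` choices are assumptions made for the example.

```python
import argparse


def add_common_flags(parser: argparse.ArgumentParser) -> None:
    """Register flags that both scripts accept (hypothetical names)."""
    parser.add_argument("--provider", default="ollama", help="model provider")
    parser.add_argument("--embedding-model", default="bge-m3", help="embedding model name")


def build_refresh_parser() -> argparse.ArgumentParser:
    """Parser for a refresh-style script: common flags plus a splitter choice."""
    parser = argparse.ArgumentParser(prog="refresh_db.py")
    add_common_flags(parser)
    parser.add_argument(
        "--split-method",
        default="semantic",
        choices=["semantic", "recursive"],  # assumed choice list
    )
    return parser


def build_query_parser() -> argparse.ArgumentParser:
    """Parser for a query-style script: common flags plus the question itself."""
    parser = argparse.ArgumentParser(prog="query_data.py")
    add_common_flags(parser)
    parser.add_argument("query", help="question to ask")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_query_parser().parse_args(["--embedding-model", "bge-m3", "What is osmosis?"])
print(args.provider, args.embedding_model, args.query)
```

Keeping the common flags in one function means a new shared option only needs to be declared once.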
The scripts are written such that alternative models, model providers and splitting methods may be easily specified via the CLI. It is also easy to add more options for these by importing the necessary packages and modifying or adding methods under `model_providers.py` and `split_methods.py` respectively. The new methods can then be added to the CLI by editing the appropriate choice list variables in the respective files.
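
As a rough sketch of the extension pattern described above, a new splitting method could be written as a function and then listed in a choice variable. The code below is not taken from `split_methods.py`: the names `SPLIT_METHODS`, `SPLIT_METHOD_CHOICES`, `split_text` and the two toy splitters are hypothetical, and the real module may be organised differently.

```python
from typing import Callable


def split_fixed(text: str, size: int = 200) -> list[str]:
    """Toy splitter: cut the text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_paragraphs(text: str) -> list[str]:
    """Toy splitter: one chunk per blank-line-separated paragraph."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


# Hypothetical registry: adding a method means adding one entry here.
SPLIT_METHODS: dict[str, Callable[[str], list[str]]] = {
    "fixed": split_fixed,
    "paragraph": split_paragraphs,
}

# Hypothetical choice list that a CLI flag could use for its `choices` argument.
SPLIT_METHOD_CHOICES = sorted(SPLIT_METHODS)


def split_text(text: str, method: str) -> list[str]:
    """Look up the requested splitter by name and apply it."""
    if method not in SPLIT_METHODS:
        raise ValueError(f"Unknown split method {method!r}; choose from {SPLIT_METHOD_CHOICES}")
    return SPLIT_METHODS[method](text)


print(split_text("First paragraph.\n\nSecond paragraph.", "paragraph"))
```

Deriving the choice list from the registry keeps the CLI options and the available functions in step, at the cost of a little indirection.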