Our preferred method for running the code locally is through a version-controlled environment with Conda.
To set up a conda environment that contains both Python and all the necessary dependencies, we advise you to use a minimal version of Conda, aptly named Miniconda.
Here, based on your operating system, we choose an installer that has Python 3.10 pre-installed. To illustrate, we would install the following on Windows:
and the following on Linux:
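As a rough sketch (the exact installer file depends on your system and the Miniconda release you pick), installing Miniconda from the terminal on 64-bit Linux could look like this:
# Download and run the Miniconda installer (adjust the filename for your OS and architecture)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh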
After installation, you will need to open up your terminal and create your environment as follows:
conda create -n thellmbook python=3.10
We will need to activate the environment before we can use it:
conda activate thellmbook
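To verify that the environment was created and is now active, you can list your conda environments; the active one is marked with an asterisk:
conda env list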
After creating our environment, we need to install all the dependencies. If you created your environment using the environment.yml file, you can skip this step.
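If you would rather go that route, a minimal sketch (assuming environment.yml sits at the root of this repository and defines the environment) is:
conda env create -f environment.yml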
There are two methods that you can follow to install the dependencies:
- Install all dependencies with requirements.txt, which requires Microsoft Visual C++ 14.0
- Install base dependencies with requirements_base.txt, which requires specific installations in certain chapters
The first method is to directly install all dependencies (aside from Chapter 11) using the requirements.txt file by running the following from the root of this repository:
pip install -r requirements.txt
This should install all necessary dependencies in the environment we just created.
Tip
If pip install -r requirements.txt throws an error, upgrading pip first will typically resolve it:
pip install --upgrade pip
Tip
The requirements.txt file pins the versions of dependencies for reproducibility. However, this might mean you are missing out on new features of many of the packages. You can also use requirements_min.txt instead, which will install the latest versions of all packages. Do note that this might break certain examples, as the APIs of these packages can change over time.
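For example:
pip install -r requirements_min.txt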
Warning
If you get the error "Microsoft Visual C++ 14.0 or greater is required", then you will need to install the Visual C++ build tools.
Follow the instructions here for an installation guide before you can install your environment.
If you run into issues with the requirements.txt file, you can also install a base set of the dependencies that are used throughout the book:
pip install -r requirements_base.txt
The missing dependencies can be installed by following the instructions in the README in each chapter's folder, or you can install them all at once:
# Install BERTopic and annoy through conda to prevent additional C++ installations
# Make the conda-forge channel available (use --add instead of --append if you want it to take priority)
conda config --append channels conda-forge
conda install bertopic=0.16.0 python-annoy=1.17.2
This gives you more flexibility over which packages you install, including some that might go out of support at some point.
Now that we have installed all necessary dependencies, you might want to update one specific dependency, namely PyTorch. Depending on your system, a CPU-only version of PyTorch might have been installed, and for most of the examples we will need to make use of the GPU.
If you go to the official PyTorch website, you will find the current guidelines for installing the package on the front page:
There, you can choose which CUDA version you need (it is typically advised to choose the default). Copy the lines for pip installation and run them in your terminal:
pip3 install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Note that we added the --upgrade flag here to make sure the CPU version of PyTorch is overwritten with the GPU version.
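To double-check that you now have a GPU-enabled build, you can print the installed PyTorch version together with the CUDA version it was compiled against; as a quick check from the terminal:
# A CPU-only build prints "None" for the CUDA version
python -c "import torch; print(torch.__version__, torch.version.cuda)"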
After having installed all necessary packages, you can then use Jupyter Lab (or any other notebook backend) to run all of the notebooks associated with each chapter. You can start Jupyter Lab directly from the terminal:
jupyter lab
When you start running each notebook, make sure to check whether you have selected the correct environment. You can do so by clicking the kernel name ("ipykernel") in the top right:
You will then see a screen that allows you to select the "thellmbook" environment from the list:
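If the "thellmbook" environment does not show up in that list, you may need to register it as a Jupyter kernel yourself. A minimal sketch, assuming ipykernel is available in the environment (it typically comes along with Jupyter Lab):
python -m ipykernel install --user --name thellmbook --display-name thellmbook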
To validate whether this worked, you can check if the selected environment has access to a GPU:
import torch
torch.cuda.is_available()
or by checking the name of the current conda environment:
import sys
import os

# Path to the Python executable of the active environment
python_path = sys.executable

# Root folder of the conda environment that the executable lives in
env_path = os.path.dirname(os.path.dirname(python_path))

# Name of the currently activated conda environment
env_name = os.environ.get('CONDA_DEFAULT_ENV')
env_name
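If everything is set up correctly, the last line should return 'thellmbook'.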