This guide will walk you through the steps to install and run Katsu on your local machine.
NOTE: If you want to run Katsu as a Docker container instead, please refer to the docker guide.
Before you start, make sure you have the following installed on your machine:
- Python 3.8 or later
- PostgreSQL
First, clone the repository to the folder where you want to install:
git clone https://github.com/CanDIG/katsu.git
It's a good idea to create a virtual environment for your Django project, to keep your dependencies separate from other Python projects on your machine:
# Linux
sudo apt-get install python3-venv # If needed
python3 -m venv .venv
source .venv/bin/activate
# macOS
python3 -m venv .venv
source .venv/bin/activate
# Windows
py -3 -m venv .venv
.venv\scripts\activate
With your virtual environment activated, navigate to the project directory and install the project dependencies:
pip install -r requirements/local.txt
This will install all the packages needed for development.
Install for macOS:
# Install PostgreSQL
brew install postgresql@16
# Set PATH environment variable (Apple Silicon Homebrew prefix shown; Intel Macs use /usr/local instead of /opt/homebrew)
echo 'export PATH="/opt/homebrew/opt/postgresql@16/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
# Start PostgreSQL service
brew services start postgresql@16
# Log in to PostgreSQL
psql -d postgres
Install for Linux:
# Install PostgreSQL
sudo apt install postgresql
# Start PostgreSQL service
sudo systemctl start postgresql.service
# Log in to PostgreSQL
sudo -u postgres psql
Create a Role and Database:
CREATE ROLE admin_local WITH LOGIN PASSWORD 'password_local' CREATEDB CREATEROLE;
CREATE DATABASE katsu_local WITH OWNER = admin_local;
GRANT ALL PRIVILEGES ON DATABASE katsu_local TO admin_local;
-- Quit PostgreSQL
\q
Note: In some cases, an existing PostgreSQL installation (or leftover roles and databases from one) might cause issues. You may need to clean these up manually.
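For orientation, the role and database created above are the values a Django project normally points at through its DATABASES setting. The fragment below is a minimal sketch of what such an entry could look like; the host and port are assumptions (PostgreSQL defaults), and Katsu's own settings files may read these values differently, for example from environment variables.

```python
# Sketch of a Django DATABASES entry matching the role and database above.
# HOST and PORT are assumed defaults, not values taken from the Katsu settings.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "katsu_local",         # database created with CREATE DATABASE
        "USER": "admin_local",         # role created with CREATE ROLE
        "PASSWORD": "password_local",  # for local development only
        "HOST": "localhost",           # assumption: PostgreSQL runs locally
        "PORT": "5432",                # assumption: default PostgreSQL port
    }
}
```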
With the database configured, run the following command to create the necessary database tables:
python manage.py migrate
Start the server with the following command:
python manage.py runserver
This will start the development server on port 8000. You can access the server by opening a web browser and navigating to http://localhost:8000
Mock data are available here
Run all tests and linting checks for the whole project:
tox
Run all tests for mohpackets:
python manage.py test chord_metadata_service/mohpackets/tests/
Run a single test module:
python manage.py test chord_metadata_service.mohpackets.tests.endpoints.<test_name>
Test and create coverage HTML report:
tox
coverage html
.
├── chord_metadata_service/
│ └── mohpackets/
│ ├── apis/
│ │ ├── clinical_data.py
│ │ ├── core.py
│ │ ├── discovery.py
│ │ └── ingestion.py
│ ├── data/
│ │ ├── ...
│ │ ├── data_loader.py
│ │ └── README.md
│ ├── docs/
│ │ ├── ...
│ │ ├── schema.json
│ │ └── README.md
│ ├── migrations/
│ ├── schemas/
│ │ ├── base.py
│ │ ├── discovery.py
│ │ ├── filter.py
│ │ ├── ingestion.py
│ │ ├── model.py
│ │ └── nested_data.py
│ ├── tests/
│ │ ├── endpoints/
│ │ │ ├── base.py
│ │ │ ├── test_donor.py
│ │ │ └── ...
│ │ ├── models/
│ │ │ ├── test_biomarker.py
│ │ │ └── ...
│ │ └── factories.py
│ ├── apps.py
│ ├── models.py
│ ├── pagination.py
│ ├── permissible_values.py
│ ├── signals.py
│ └── utils.py
├── config/
│ ├── settings/
│ │ ├── base.py
│ │ ├── dev.py
│ │ ├── local.py
│ │ └── prod.py
│ ├── asgi.py
│ ├── urls.py
│ └── wsgi.py
├── requirements/
│ ├── base.txt
│ ├── dev.txt
│ ├── local.txt
│ └── prod.txt
├── entrypoint.sh
├── katsu_wsgi.ini
├── manage.py
├── codecov.yml
├── Dockerfile
├── README.md
└── tox.ini
Katsu uses an underlying data model that is a compatible interpretation of the MOHCCN data model but does not match it exactly. Katsu is currently compliant with version 3 of the model, released May 2024. Some relationships between objects have been modified to avoid excessive complexity in the katsu database and to allow for the submission of data that is incomplete compared to the MOHCCN gold standard requirements. Permissible values for controlled fields are not validated by the underlying database.
The katsu MoH model is explicitly defined as a set of classes in models.py. Permissible values for controlled fields, conditionally required fields and relationships are enforced by the serializers, the clinical ETL and the ingest validation steps.
An overview diagram of how objects in the katsu model relate to each other is shown below. A more detailed entity relationship diagram containing field level information can be found in the mohpackets docs folder.
---
title: katsu object level MoH ER diagram
---
erDiagram
Program ||--o{ Donor : ""
Donor ||--o{ Comorbidity : ""
Donor ||--o{ Biomarker : ""
Donor ||--o{ Exposure : ""
Donor ||--o{ FollowUp : ""
Donor ||--o{ PrimaryDiagnosis : ""
Donor ||--o{ Treatment : ""
PrimaryDiagnosis ||--o{ Specimen : ""
PrimaryDiagnosis ||--o{ Treatment : ""
PrimaryDiagnosis o|--o{ FollowUp : ""
Specimen ||--o{ SampleRegistration : ""
Treatment ||--o| Radiation : ""
Treatment ||--o| Surgery : ""
Treatment ||--o{ Systemictherapy : ""
Treatment o|--o{ FollowUp : ""
- The primary key for Program is program_id, which should be unique across all instances of the CanDIG platform.
- For all other objects, the primary key is the submitter_<object_name>_id, a user provided identifier that should be unique across all instances of an object within a program.
- All objects are explicitly linked with foreign keys to a Program and the Donor the object derives from.
- Biomarker is explicitly linked to Donor with a foreign key. It should also be linked to a specific clinical event by storing either a specimen, primary_diagnosis, treatment or follow_up submitter_id in the Biomarker object. If it isn't linked to a clinical event, it should have test_date specified.
- Surgery is explicitly linked with a foreign key to a Treatment. It can also store a specimen_submitter_id to indicate which specimen derived from the surgery; this is not a foreign key relationship.
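The Biomarker rule above can be sketched as a small check. This is an illustration only, not Katsu's validation code: the BiomarkerRecord class, its field names (patterned on submitter_<object_name>_id) and check_biomarker_links are hypothetical, and the sketch assumes that at least one clinical event link is enough to satisfy the rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BiomarkerRecord:
    """Hypothetical stand-in for a Biomarker; field names are assumptions."""
    program_id: str
    submitter_donor_id: str
    submitter_specimen_id: Optional[str] = None
    submitter_primary_diagnosis_id: Optional[str] = None
    submitter_treatment_id: Optional[str] = None
    submitter_follow_up_id: Optional[str] = None
    test_date: Optional[str] = None


# Fields that tie a biomarker to a specific clinical event.
EVENT_FIELDS = (
    "submitter_specimen_id",
    "submitter_primary_diagnosis_id",
    "submitter_treatment_id",
    "submitter_follow_up_id",
)


def check_biomarker_links(record: BiomarkerRecord) -> List[str]:
    """Return a list of problems; an empty list means the links look fine."""
    errors = []
    # Every object must be linked to a Program and a Donor.
    if not record.program_id:
        errors.append("program_id is required")
    if not record.submitter_donor_id:
        errors.append("submitter_donor_id is required")
    # Either a clinical event link or a test_date must be present.
    linked = [name for name in EVENT_FIELDS if getattr(record, name)]
    if not linked and not record.test_date:
        errors.append("link a clinical event or provide test_date")
    return errors
```

For example, a record with only program_id and submitter_donor_id set would be reported with one problem, while adding either submitter_specimen_id or test_date would clear it.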
Katsu uses Tyk for authentication and OPA for authorization. The following diagram illustrates the flow of requests and responses:
sequenceDiagram
User->>+Katsu: Tyk sends request (with auth token)
Katsu->>+OPA: Forward request
OPA-->>-Katsu: Send authorized datasets
Katsu-->>-User: Send response
Inside Katsu, the flow of the request/response can be broken down into the following steps:
- Base Queryset: receives the GET request and generates a base queryset.
- Discovery: counts the patients that match the criteria.
- Authorized: passes the request to OPA, which returns the authorized datasets. Katsu then uses them to return only authorized data.
flowchart TD
A[GET Request] --> B{Base Queryset}
B -->|Discovery| C[Count Queryset]
B -->|Authorized| D[Authorized Queryset]
C --> E[Response]
D --> F[Response]
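The two branches of the flow can be sketched with plain Python lists standing in for Django querysets. This is a simplified illustration under stated assumptions, not Katsu's implementation: the function names are hypothetical, the OPA call is replaced by a set passed in by the caller, and authorized datasets are assumed to be identified by program_id.

```python
from typing import Dict, List, Set

Donor = Dict[str, str]


def base_queryset(donors: List[Donor], filters: Dict[str, str]) -> List[Donor]:
    """Keep the donors whose fields match every filter from the GET request."""
    return [
        donor for donor in donors
        if all(donor.get(key) == value for key, value in filters.items())
    ]


def discovery_count(queryset: List[Donor]) -> int:
    """Discovery branch: report only how many donors match."""
    return len(queryset)


def authorized_queryset(queryset: List[Donor], authorized: Set[str]) -> List[Donor]:
    """Authorized branch: keep donors from datasets the user may access.

    `authorized` stands in for the datasets OPA would return.
    """
    return [donor for donor in queryset if donor["program_id"] in authorized]


donors = [
    {"program_id": "PROG_A", "submitter_donor_id": "D1", "status": "alive"},
    {"program_id": "PROG_B", "submitter_donor_id": "D2", "status": "alive"},
    {"program_id": "PROG_B", "submitter_donor_id": "D3", "status": "deceased"},
]
base = base_queryset(donors, {"status": "alive"})
count = discovery_count(base)                    # counts across all programs
visible = authorized_queryset(base, {"PROG_A"})  # only authorized records
```

In this sketch the discovery branch exposes a count over every matching donor, while the authorized branch returns records only from the permitted program.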
You can find the schema documentation here
The majority of the CanDIG implementation can be found within the mohpackets folder.