A secure workflow for the transfer, storage, and processing of sensitive data. This is an implementation of "A Secure Workflow for Shared HPC Systems" at GWDG.
This Secure HPC environment enables the processing of sensitive data, such as medical data, on shared HPC systems.
In a typical user workflow, the user logs in to the frontend and uploads sensitive data. If the user is authorised with a valid UID, a batch script for processing the data is run on the compute nodes, and the processed data is then transferred back. This workflow is problematic because it is vulnerable to attacks at several points (for example, if an attacker gains root privileges at the user end). The secure workflow ensures security by encrypting the data, packing job dependencies into encrypted containers, and encrypting the batch script. Furthermore, a separate Key Server is used to manage the keys required for encryption and decryption.
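As a rough orientation, the sketch below shows what encrypting the batch script and registering its key with the Key Server could look like. The use of OpenSSL, the key file names, and the Vault secret path are illustrative assumptions, not the exact commands used by this repository.

```bash
# Hypothetical client-side sketch; names and the Vault secret path are placeholders.
set -euo pipefail

# Generate a random 256-bit key for the batch script (assumption: symmetric encryption).
openssl rand -hex 32 > batch.key

# Encrypt the Slurm batch script before uploading it to the front end.
openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in job.sbatch -out job.sbatch.enc \
    -pass file:batch.key

# Store the key on the key management server (Vault); the secret path is an assumption.
vault kv put secret/secure-hpc/job42 batch_key=@batch.key

# Remove the plaintext key from the client once it is stored in Vault.
shred -u batch.key
```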
This git repository contains:
- Client: Client-side files. Creation of data containers and keys, encryption of the batch script, and execution on the HPC server (see the client-side sketch after this list).
- Server: Decryption of the data and batch script, execution of the batch script, and preparation of the output data container.
- Tutorial: A tutorial for training users in the Secure HPC workflow. Contains `JobTemplate/` with scripts for implementing the client-side secure workflow on a VM.
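A minimal sketch of the client-side container step, assuming a file-backed LUKS container created with cryptsetup; the file names, size, and mount points are placeholders rather than the exact scripts in Client/.

```bash
# Hypothetical sketch: create, fill, and close a LUKS data container on the client.
set -euo pipefail

# Create a sparse file to back the LUKS data container (size is an example).
truncate -s 1G input.img

# Format it as a LUKS container; the passphrase would normally be generated
# and later uploaded to the key management server.
sudo cryptsetup luksFormat input.img

# Open the container and create a filesystem inside it.
sudo cryptsetup luksOpen input.img secure_input
sudo mkfs.ext4 /dev/mapper/secure_input

# Mount it, copy the sensitive input data in, then close the container again.
mkdir -p /mnt/secure_input
sudo mount /dev/mapper/secure_input /mnt/secure_input
sudo cp -r ./sensitive_data/. /mnt/secure_input/
sudo umount /mnt/secure_input
sudo cryptsetup luksClose secure_input

# The closed container (input.img) can now be uploaded to the HPC front end, e.g. with scp.
```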
- A user with a valid UID logs into the front end and uploads a LUKS [1] data container.
- The batch script is encrypted and uploaded. Keys are uploaded to the key management server managed by Vault [2].
- Identity on the key server is verified via an access token. The batch script is decrypted and run on the HPC cluster; jobs are managed by Slurm [3] (see the server-side sketch after this list).
- The output data container is transferred back and mounted for use (see the mounting sketch below).
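A hedged sketch of the server-side steps, assuming the access token is read from a local file and the batch-script key is fetched from Vault; the Vault address, secret path, and file names are illustrative only, not the exact scripts in Server/.

```bash
# Hypothetical server-side sketch: fetch the key, decrypt the batch script, submit the job.
set -euo pipefail

# Authenticate against the key server with the access token provisioned for this job.
export VAULT_ADDR="https://vault.example.org:8200"
export VAULT_TOKEN="$(cat /run/secure-hpc/job42.token)"

# Fetch the batch-script key and decrypt the uploaded script.
vault kv get -field=batch_key secret/secure-hpc/job42 > batch.key
openssl enc -d -aes-256-cbc -pbkdf2 \
    -in job.sbatch.enc -out job.sbatch \
    -pass file:batch.key

# Submit the decrypted batch script to Slurm; it opens the LUKS input container,
# runs the computation, and prepares the encrypted output data container.
sbatch job.sbatch
```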
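Finally, a minimal sketch of mounting and unmounting the returned output container on the user's machine, assuming it is again a LUKS image (here called output.img as a placeholder).

```bash
# Hypothetical sketch: inspect the results inside the returned output container.
set -euo pipefail

sudo cryptsetup luksOpen output.img secure_output
mkdir -p /mnt/secure_output
sudo mount /dev/mapper/secure_output /mnt/secure_output
# ... inspect or copy the results ...
sudo umount /mnt/secure_output
sudo cryptsetup luksClose secure_output
```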