This guide assumes some prior knowledge of AWS.
- Install Terraform
- Install AWS CLI
- Set up AWS credentials. Running `aws configure` is the preferred method, and this deployment assumes it is used. It needs the `aws_access_key_id` and `aws_secret_access_key` key values, which it stores in `~/.aws/credentials`.
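After running `aws configure`, the credentials file typically looks like the following (the `default` profile name is the CLI's standard, and the key values shown are placeholders):

```ini
# ~/.aws/credentials -- values below are placeholders
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
```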
- Export the environment variables for the keys listed in `.env.example` into your shell session, for example:

  ```sh
  export TF_VAR_aws_creds_path="**********" TF_VAR_aws_region="**********" TF_VAR_accountId="**********"
  ```
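Rather than exporting each variable by hand, you can source a filled-in copy of `.env.example` with auto-export turned on. The sketch below writes a stand-in file with one placeholder value just to demonstrate the pattern; in practice you would copy `.env.example`, fill in real values, and source that file instead:

```sh
# Demo only: create a stand-in for your filled-in env file.
cat > /tmp/demo.env <<'EOF'
TF_VAR_aws_region=us-east-1
EOF

set -a              # auto-export every variable assigned while this is on
. /tmp/demo.env     # source the file; each assignment becomes an exported var
set +a              # stop auto-exporting

echo "$TF_VAR_aws_region"
```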
- Deploy using `bash deploy.sh`. The script runs the following:
  - `terraform init`
  - `terraform plan`
  - `terraform apply`
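The repository's `deploy.sh` is not reproduced here; a minimal sketch of a script that performs those three steps (stopping at the first failure, and saving the plan so `apply` uses exactly what was previewed) might look like:

```sh
#!/usr/bin/env bash
set -euo pipefail            # abort on the first failing step

# Run the standard Terraform workflow in order.
deploy() {
  terraform init             # install providers, initialize the state backend
  terraform plan -out=tfplan # preview the changes and save the plan
  terraform apply tfplan     # apply exactly the reviewed plan
}

# Only run when terraform is available on PATH.
if command -v terraform >/dev/null 2>&1; then
  deploy
fi
```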
- Use the `subsettingTool.postman_collection.json` Postman collection to test.
After Terraform finishes building the Subsetting tool infrastructure, it outputs environment variables that can be used in the frontend.
- For outputs shown as `<sensitive>`, run `terraform output <key_name>` to reveal the value.
- Use `terraform destroy` to tear down the infrastructure.
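For wiring a sensitive output into the frontend, `terraform output -raw` (available since Terraform 0.14) prints the bare value without quotes. The helper below, the output key name, and the frontend variable name in the usage comment are all hypothetical:

```sh
# Hypothetical helper: print the bare value of one Terraform output.
fetch_output() {
  terraform output -raw "$1"   # -raw strips the quotes/JSON wrapping
}

# Example usage (the output key name "websocket_url" is an assumption):
#   export REACT_APP_WS_URL="$(fetch_output websocket_url)"
```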
The subsetting tool has three parts:

- Core subsetting tool: subsets the actual instrument values. It uses multiple subsetting Lambda workers for different instruments. Raw data is first pulled from the S3 `SOURCE_BUCKET`, processed, and finally stored in the `DESTINATION_BUCKET` (`SUBSET_OUTPUT_BUCKET`).
- Progress bar: sets up two-way communication between the frontend and the subset workers over WebSockets. WebSocket connection IDs uniquely identify each connection in the serverless architecture, and DynamoDB stores them.
- Subsets direct download: exposes the privately stored subsets and makes them directly downloadable through the CDN (CloudFront).

The codebase is organized according to these parts.
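To illustrate the progress-bar mechanism: in the actual tool the subset workers push updates from Lambda via the AWS SDK, but the same server-to-client WebSocket message can be sent with the AWS CLI. The endpoint URL, connection ID, and payload below are all hypothetical:

```sh
# Hypothetical sketch: push a progress message to one WebSocket client,
# addressed by the unique connection id stored in DynamoDB.
push_progress() {
  aws apigatewaymanagementapi post-to-connection \
    --endpoint-url "https://abc123.execute-api.us-east-1.amazonaws.com/prod" \
    --connection-id "$1" \
    --cli-binary-format raw-in-base64-out \
    --data '{"progress": 42}'
}

# usage: push_progress "<connection-id-from-dynamodb>"
```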