This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Add dataflow integration test #226

Merged
merged 67 commits into from
Feb 28, 2023
Commits
b2593be
add build-review-app workflow
cisaacstern Jan 24, 2023
941f8ac
prebuild Dockerfile and use Dockerfile.heroku for Heroku
cisaacstern Jan 24, 2023
dee1f38
delete review app workflow first commit
cisaacstern Jan 25, 2023
3b47970
fix delete review app on: block
cisaacstern Jan 25, 2023
f12d737
actually make delete review app workflow work
cisaacstern Jan 25, 2023
b11cf40
fix indentation in delete workflow, run build on sync
cisaacstern Jan 25, 2023
122877b
fix branch name for review app
cisaacstern Jan 25, 2023
95fbe3d
use + instead of , to remove whitespace in env
cisaacstern Jan 25, 2023
8abf23c
use head sha for review app source blob
cisaacstern Jan 25, 2023
2404466
add dev-app-proxy secrets
cisaacstern Jan 27, 2023
e6cdd01
PANGEO_FORGE_DEPLOYMENT=dev-app-proxy for release and run scripts
cisaacstern Jan 27, 2023
19b4122
copy workdir into image at heroku build time
cisaacstern Jan 27, 2023
454ba8a
mv git clone into Dockerfile.heroku
cisaacstern Jan 27, 2023
18bca73
addition seed data for postdeploy
cisaacstern Jan 28, 2023
f2b8462
specify dockerfile in docker-compose (for tests)
cisaacstern Jan 28, 2023
2912fae
fix docker-compose build declaration
cisaacstern Jan 28, 2023
d45a669
setup bakery config for dev-app-proxy
cisaacstern Jan 28, 2023
1fa20fc
branching logic for review app webhook url
cisaacstern Jan 28, 2023
fb25e6f
test-dataflow-integration workflow first commit
cisaacstern Feb 2, 2023
f5196f9
integration test workflow WIP cont
cisaacstern Feb 2, 2023
f4eae11
try to fix IS_UP step if block
cisaacstern Feb 2, 2023
342568a
for deployment_status event, check if pr has label
cisaacstern Feb 3, 2023
f4ead60
quote context variables
cisaacstern Feb 3, 2023
7ea643d
fix str to bool concat issue
cisaacstern Feb 3, 2023
591c49c
poll review app until it's ready
cisaacstern Feb 3, 2023
8473e62
quote context variable (again)
cisaacstern Feb 3, 2023
028dd54
escape json quotes
cisaacstern Feb 3, 2023
d03b933
actual pytest test WIP first commit
cisaacstern Feb 3, 2023
951b286
test dataflow integration WIP cont
cisaacstern Feb 7, 2023
37e182d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 7, 2023
519bf8f
pr label fixturing first pass
cisaacstern Feb 7, 2023
f81e94e
fixtures successfully make real pr
cisaacstern Feb 7, 2023
e23c827
make gh_token SecretStr to prevent leaking in logs
cisaacstern Feb 8, 2023
f390fff
add authentication for test cleanup
cisaacstern Feb 8, 2023
1639153
label pr in between adding files
cisaacstern Feb 8, 2023
f0b4307
make /run comment on test pr
cisaacstern Feb 9, 2023
75a6d02
in job submission calledprocesserror, recipe_run.message might be none
cisaacstern Feb 9, 2023
9cbd07f
sleep for 60s after triggering dataflow job
cisaacstern Feb 9, 2023
f83fd1f
run test from github workflow
cisaacstern Feb 9, 2023
349c8a6
fix quote escaping
cisaacstern Feb 9, 2023
7c46072
install dependencies in test workflow
cisaacstern Feb 9, 2023
f468bd1
specify DATABASE_URL
cisaacstern Feb 9, 2023
8527b13
install sops in workflow
cisaacstern Feb 10, 2023
55c2557
rewrite test w/out importing pangeo_forge_orchestrator
cisaacstern Feb 10, 2023
fca1c4d
remove orchestrator dependency from dataflow test
cisaacstern Feb 10, 2023
edc4d92
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 10, 2023
418fd2e
replace \n with \n when reading key from env
cisaacstern Feb 10, 2023
3e9542c
quote rsa key in workflow
cisaacstern Feb 10, 2023
449b89e
PyJWT for test, not jwt
cisaacstern Feb 10, 2023
936d7ef
integration test sequence diagram first draft
cisaacstern Feb 10, 2023
cd8a050
attempted fixtures refactor WIP
cisaacstern Feb 10, 2023
37a9954
ensure pr fixture gives correct sha, add logging
cisaacstern Feb 11, 2023
0e06182
add gcp dataflow readonly creds to integration test
cisaacstern Feb 13, 2023
17deec1
test job status notification comment
cisaacstern Feb 14, 2023
040cf64
variablize recipe_id
cisaacstern Feb 14, 2023
ed945ff
consolidate source recipe into single var/fixture
cisaacstern Feb 14, 2023
e6d9e42
always assume reference pr is on test-staged-recipes
cisaacstern Feb 14, 2023
15b98b4
generate prs matrix
cisaacstern Feb 14, 2023
f97ed76
add also-test block to matrix generator
cisaacstern Feb 14, 2023
05ad3a8
for pr event, get also-test via http request
cisaacstern Feb 15, 2023
8d24d3c
single quotes in python command
cisaacstern Feb 15, 2023
5f52bcd
label 'name' field, got me again
cisaacstern Feb 15, 2023
9b2f548
split labels on 'also-test:' string
cisaacstern Feb 15, 2023
5896a3c
use job-specific identifier for ref name
cisaacstern Feb 15, 2023
29512a1
use working branch name in pr title
cisaacstern Feb 15, 2023
5bdeaa3
integration test docs
cisaacstern Feb 16, 2023
7eef128
Merge remote-tracking branch 'origin/main' into test-dataflow
cisaacstern Feb 16, 2023
33 changes: 33 additions & 0 deletions .github/workflows/build-review-app.yaml
@@ -0,0 +1,33 @@
name: Build Review App

on:
  pull_request:
    branches: ['main']
    types: [opened, reopened, synchronize, labeled]

env:
  PIPELINE: '17cc0239-494f-4a68-aa75-3da7c466709c'
  REPO_URL: 'https://github.com/pangeo-forge/pangeo-forge-orchestrator'

jobs:
  build:
    if: |
      github.event.label.name == 'build-review-app' ||
      contains( github.event.pull_request.labels.*.name, 'build-review-app')
    runs-on: ubuntu-latest
    steps:
      # https://devcenter.heroku.com/articles/platform-api-reference#review-app-create
      - run: |
          curl -X POST https://api.heroku.com/review-apps \
            -d '{
              "branch": "${{ github.head_ref }}",
              "pr_number": ${{ github.event.pull_request.number }},
              "pipeline": "${{ env.PIPELINE }}",
              "source_blob": {
                "url": "${{ env.REPO_URL }}/tarball/${{ github.event.pull_request.head.sha }}",
                "version": "${{ github.event.pull_request.head.sha }}"
              }
            }' \
            -H "Content-Type: application/json" \
            -H "Accept: application/vnd.heroku+json; version=3" \
            -H "Authorization: Bearer ${{ secrets.HEROKU_API_KEY }}"
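
Reviewer note: the request body in the workflow above is assembled by string interpolation inside the shell command. As a rough sketch only (not part of this PR), the same body could be built in Python, where `json.dumps` handles quoting of values such as branch names. The function name `review_app_payload` and its parameters are hypothetical; the field names mirror the ones in the workflow.

```python
import json


def review_app_payload(
    branch: str, pr_number: int, pipeline: str, repo_url: str, head_sha: str
) -> str:
    """Build a JSON body with the same fields the workflow sends to POST /review-apps.

    Hypothetical helper for illustration; the PR itself builds this inline with curl.
    """
    body = {
        "branch": branch,
        "pr_number": pr_number,  # an integer, which is why it is unquoted in the workflow
        "pipeline": pipeline,
        "source_blob": {
            # tarball of the PR head commit, as in the workflow
            "url": f"{repo_url}/tarball/{head_sha}",
            "version": head_sha,
        },
    }
    return json.dumps(body)
```

Serializing through `json.dumps` means a branch name containing a quote character would be escaped rather than producing malformed JSON.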
32 changes: 32 additions & 0 deletions .github/workflows/delete-review-app.yaml
@@ -0,0 +1,32 @@
name: Delete Review App

on:
  pull_request:
    branches: ['main']
    types: [unlabeled]

env:
  PIPELINE: '17cc0239-494f-4a68-aa75-3da7c466709c'

jobs:
  delete:
    if: |
      github.event.label.name == 'build-review-app'
    runs-on: ubuntu-latest
    steps:
      - name: Get review app id & export to env
        run: |
          curl -s https://api.heroku.com/pipelines/${{ env.PIPELINE }}/review-apps \
            -H "Accept: application/vnd.heroku+json; version=3" \
            -H "Authorization: Bearer ${{ secrets.HEROKU_API_KEY }}" \
            | python3 -c "
          import sys, json;
          j = json.load(sys.stdin);
          print('REVIEW_APP_ID=' + [app['id'].strip() for app in j if app['pr_number'] == ${{ github.event.pull_request.number }}].pop(0))
          " >> $GITHUB_ENV
      - name: Delete review app
        run: |
          curl -X DELETE https://api.heroku.com/review-apps/${{ env.REVIEW_APP_ID }} \
            -H "Content-Type: application/json" \
            -H "Accept: application/vnd.heroku+json; version=3" \
            -H "Authorization: Bearer ${{ secrets.HEROKU_API_KEY }}"
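
Reviewer note: the inline lookup above calls `.pop(0)` on the filtered list, which raises `IndexError` when no review app matches the PR number. A minimal sketch of the same lookup that returns `None` in that case might look like the following. The helper name `find_review_app_id` is hypothetical, and the assumed input is the decoded JSON list with the `id` and `pr_number` keys that the workflow's inline script reads.

```python
from typing import List, Optional


def find_review_app_id(review_apps: List[dict], pr_number: int) -> Optional[str]:
    """Return the id of the first review app whose pr_number matches, else None.

    Hypothetical helper; mirrors the list comprehension in the workflow step above.
    """
    for app in review_apps:
        if app.get("pr_number") == pr_number:
            return app["id"].strip()
    return None
```

A caller could then skip the delete step when the result is `None` instead of failing the job.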
184 changes: 184 additions & 0 deletions .github/workflows/test-dataflow-integration.yaml
@@ -0,0 +1,184 @@
name: Test Dataflow Integration

on:
  deployment_status:
  # TODO: add on 'schedule' against staging deployment?
  pull_request:
    branches: ['main']
    types: [labeled]

jobs:
  matrix-generate-prs:
    # Generates the matrix of reference prs to test against. Compare:
    #  - https://blog.aspect.dev/github-actions-dynamic-matrix
    #  - https://github.com/aspect-build/bazel-lib/blob/
    #    0c8ef86684d5a3335bb5e911a51d64e5fab39f9b/.github/workflows/ci.yaml
    runs-on: ubuntu-latest
    steps:
      - id: default
        run: echo "pr=22::gpcp-from-gcs" >> $GITHUB_OUTPUT

      - id: also-test-from-deployment-status
        if: |
          github.event_name == 'deployment_status'
        run: |
          export ENVIRONMENT=${{ github.event.deployment_status.environment }} \
          && python3 -c "
          import os; print(os.environ['ENVIRONMENT'].split('-')[-1])" \
          | xargs -I{} curl -s ${{ github.event.deployment_status.repository_url }}/pulls/{} \
          | python3 -c "
          import json, sys;
          labels = json.load(sys.stdin)['labels'];
          also_test = [
            l['name'].split('also-test:')[-1] for l in labels if l['name'].startswith('also-test')
          ]
          if also_test:
            for label in also_test:
              print(f'pr={label}')
          " >> $GITHUB_OUTPUT

      - id: also-test-from-pull-request
        if: |
          github.event_name == 'pull_request'
          && contains( join(github.event.pull_request.labels.*.name), 'also-test')
        run: |
          python3 -c "
          import json;
          labels = json.loads('${{ toJSON(github.event.pull_request.labels.*.name) }}')
          also_test = [l.split('also-test:')[-1] for l in labels if l.startswith('also-test')]
          if also_test:
            for label in also_test:
              print(f'pr={label}')
          " >> $GITHUB_OUTPUT
    outputs:
      # Will look like '["22::gpcp-from-gcs", etc...]'
      prs: ${{ toJSON(steps.*.outputs.pr) }}

  test:
    # run when:
    #   - a PR is labeled 'test-dataflow'
    #     (assuming it is also labeled 'build-review-app'
    #     *and* the deployment for the head sha is a success)
    #   - heroku marks a deployment with 'state' == 'success'
    #     (assuming PR also has 'test-dataflow' label)
    runs-on: ubuntu-latest

    needs:
      - matrix-generate-prs

    strategy:
      fail-fast: false
      matrix:
        prs: ${{ fromJSON(needs.matrix-generate-prs.outputs.prs) }}

    steps:
      # conditional step if triggering event is a pull_request
      - name: Maybe set REVIEW_APP_URL and DEPLOYMENT_STATE from pull_request
        if: |
          github.event_name == 'pull_request'
          && github.event.label.name == 'test-dataflow'
          && contains( github.event.pull_request.labels.*.name, 'build-review-app')
        # if we get here, this is a pull request, so we need to know the statuses url
        # for the deployment associated with the head sha. we use the **base** repo
        # deployments url, and look for deployments associated with pr's head sha.
        # (the head repo deployments url would cause errors, if the pr is from a fork.)
        run: |
          export DEPLOYMENTS_URL=\
          ${{ github.event.pull_request.base.repo.deployments_url }}\
          \?environment\=pforge-pr-${{ github.event.pull_request.number }}\
          \&sha\=${{ github.event.pull_request.head.sha }}
          curl -s $DEPLOYMENTS_URL \
          | python3 -c "
          import sys, json; print(json.load(sys.stdin)[0]['statuses_url'])" \
          | xargs -I{} curl -s {} \
          | python3 -c "
          import sys, json;
          d = json.load(sys.stdin)[-1];
          print('TEST_DATAFLOW=True');
          print('DEPLOYMENT_STATE=' + d['state']);
          print('REVIEW_APP_URL=' + d['environment_url']);" \
          >> $GITHUB_ENV

      # conditional step if triggering event is deployment_status
      - name: Maybe set REVIEW_APP_URL and DEPLOYMENT_STATE from deployment_status
        if: |
          github.event_name == 'deployment_status'
        # if we're here, we know this is a deployment_status event, but we don't know whether or not
        # the PR has the 'test-dataflow' label. (it's possible the PR *only* has the 'build-review-app'
        # label, but not the 'test-dataflow' label, in which case we do not want to deploy a dataflow job.)
        # so before we do anything else, we need to make sure this PR is labeled 'test-dataflow'.
        # note that the github deployment "environments" for our review apps are named according to the
        # convention "pforge-pr-${NUMBER}". so our most direct path to get the PR number from the deployment
        # status event is to parse the PR number out of this string.
        run: |
          export ENVIRONMENT=${{ github.event.deployment_status.environment }} \
          && python3 -c "
          import os; print(os.environ['ENVIRONMENT'].split('-')[-1])" \
          | xargs -I{} curl -s ${{ github.event.deployment_status.repository_url }}/pulls/{} \
          | python3 -c "
          import json, sys;
          labels = json.load(sys.stdin)['labels'];
          print('TEST_DATAFLOW=' + str(True if any([l['name'] == 'test-dataflow' for l in labels]) else False));
          print('REVIEW_APP_URL=' + '${{ github.event.deployment_status.environment_url }}');
          print('DEPLOYMENT_STATE=' + '${{ github.event.deployment_status.state }}');" \
          >> $GITHUB_ENV

      - name: Is app up?
        if: ${{ env.DEPLOYMENT_STATE == 'success' }}
        # Heroku marks the deployment as 'success' when the build succeeds, not when the *release* succeeds.
        # So there is still some latency between when this status is set and when the review app
        # is ready to receive requests. In general, the review apps take about 3 minutes to release.
        # So here we wait 2 minutes, then check whether the app is up, repeating every 30 seconds
        # until it's up. If > 10 mins have elapsed, something's gone wrong, so we bail out.
        run: |
          python3 -c "
          import sys, time;
          from urllib.request import urlopen;
          start = time.time();
          time.sleep(60 * 2);
          while True:
              elapsed = time.time() - start;
              if elapsed > 60 * 10:
                  # releases shouldn't take > 10 mins; something's gone wrong, so exit.
                  sys.exit(1)
              contents = urlopen('${{ env.REVIEW_APP_URL }}').read().decode()
              if contents == '{\"status\":\"ok\"}':
                  # if we get this response from the review app, it's up and ready to go.
                  print('IS_UP=True')
                  break
              else:
                  time.sleep(30)" \
          >> $GITHUB_ENV

      - name: Checkout the repo
        uses: actions/checkout@v3

      - name: Install deps
        run: |
          python3 -m pip install aiohttp PyJWT pydantic pytest pytest-asyncio gidgethub

      - name: 'Authenticate to Google Cloud'
        uses: 'google-github-actions/auth@v1'
        with:
          # the creds to deploy jobs to dataflow are packaged with the review app itself, but
          # this test needs its own read only creds so that it can poll dataflow for job status
          credentials_json: '${{ secrets.GCP_DATAFLOW_READONLY_SERVICE_KEY }}'

      - name: Run test
        if: |
          env.DEPLOYMENT_STATE == 'success'
          && env.IS_UP == 'True'
          && env.TEST_DATAFLOW == 'True'
        # So far here, we:
        #  - programmatically make a /run comment on an existing PR in pforgetest
        #  - check to ensure a dataflow job was submitted within a plausible timeframe
        # Remaining TODO:
        #  - parametrize SOURCE_REPO_FULL_NAME and SOURCE_REPO_PR_NUMBER
        #  - wait for the job to complete (5-6 mins)
        #  - check to make sure the job was successful
        run: |
          DEV_APP_PROXY_GITHUB_APP_PRIVATE_KEY='${{ secrets.DEV_APP_PROXY_GITHUB_APP_PRIVATE_KEY }}' \
          GH_WORKFLOW_RUN_ID=${{ github.run_id }} \
          PR_NUMBER_AND_RECIPE_ID=${{ matrix.prs }} \
          REVIEW_APP_URL=${{ env.REVIEW_APP_URL }} \
          pytest -vxs tests.integration/test_dataflow.py
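
Reviewer note: three pieces of logic in this workflow live in inline `python3 -c` strings: splitting `also-test:` labels, parsing the PR number from an environment name of the form `pforge-pr-${NUMBER}`, and polling the review app until it responds with `{"status":"ok"}`. The sketch below restates them as plain functions so they can be read and tested in isolation. All function names and parameters are hypothetical and not part of this PR. One deliberate difference from the workflow: `wait_until_up` treats connection errors as "not up yet" and keeps retrying, whereas the inline script would exit on the first failed `urlopen` call.

```python
import time
from typing import Callable, List


def also_test_prs(label_names: List[str]) -> List[str]:
    """Return the text after 'also-test:' for every label that starts with 'also-test'."""
    return [
        name.split("also-test:")[-1]
        for name in label_names
        if name.startswith("also-test")
    ]


def pr_number_from_environment(environment: str) -> int:
    """Parse the PR number out of an environment name such as 'pforge-pr-226'."""
    return int(environment.split("-")[-1])


def wait_until_up(
    fetch: Callable[[], str],
    timeout: float = 600,
    interval: float = 30,
    initial_delay: float = 120,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch() until it returns the health-check body or the timeout elapses.

    fetch is any zero-argument callable returning the response body as a string,
    for example a small wrapper around urllib.request.urlopen. sleep and clock
    are injectable so the loop can be exercised without real waiting.
    """
    start = clock()
    sleep(initial_delay)  # the release usually lags the build, so wait first
    while clock() - start <= timeout:
        try:
            if fetch() == '{"status":"ok"}':
                return True
        except OSError:
            # urllib's URLError and HTTPError both subclass OSError;
            # treat them as "not ready yet" rather than a hard failure.
            pass
        sleep(interval)
    return False
```

In a workflow step, the boolean result could be mapped to `IS_UP=True` or a non-zero exit, matching what the inline script writes to `$GITHUB_ENV`.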
24 changes: 4 additions & 20 deletions Dockerfile
@@ -39,25 +39,9 @@ RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.
&& curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /usr/share/keyrings/cloud.google.gpg \
&& apt-get update && apt-get -y install google-cloud-cli

COPY requirements.txt ./
RUN python3.9 -m pip install -r requirements.txt

COPY . /opt/app
WORKDIR /opt/app

# heroku can't fetch submodule contents from github:
# https://devcenter.heroku.com/articles/github-integration#does-github-integration-work-with-git-submodules
# so even though we have this in the repo (for development & testing convenience), we actually .dockerignore
# it, and then clone it from github at build time (otherwise we don't actually get these contents on heroku)
# After cloning, reset to a specific commit, so we don't end up with the wrong contents.
# Install git, for fetching submodule contents in Dockerfile.heroku
RUN apt-get update && apt-get -y install git
RUN git clone -b main --single-branch https://github.com/pangeo-forge/dataflow-status-monitoring \
&& cd dataflow-status-monitoring \
&& git reset --hard c72a594b2aea5db45d6295fadd801673bee9746f \
&& cd -

# the only deploy-time process which needs pangeo_forge_orchestrator installed is the review app's
# `postdeploy/seed_review_app_data.py`, but this shouldn't interfere with anything else.
RUN SETUPTOOLS_SCM_PRETEND_VERSION=0.0 pip install . --no-deps

RUN chmod +x scripts.deploy/release.sh
# Install pip requirements, a time-consuming step!
COPY requirements.txt ./
RUN python3.9 -m pip install -r requirements.txt
21 changes: 21 additions & 0 deletions Dockerfile.heroku
@@ -0,0 +1,21 @@
FROM pangeo/forge-orchestrator:latest

COPY . /opt/app
WORKDIR /opt/app

# heroku can't fetch submodule contents from github:
# https://devcenter.heroku.com/articles/github-integration#does-github-integration-work-with-git-submodules
# so even though we have this in the repo (for development & testing convenience), we actually .dockerignore
# it, and then clone it from github at build time (otherwise we don't actually get these contents on heroku)
# After cloning, reset to a specific commit, so we don't end up with the wrong contents.
RUN apt-get update && apt-get -y install git
RUN git clone -b main --single-branch https://github.com/pangeo-forge/dataflow-status-monitoring \
    && cd dataflow-status-monitoring \
    && git reset --hard c72a594b2aea5db45d6295fadd801673bee9746f \
    && cd -

# the only deploy-time process which needs pangeo_forge_orchestrator installed is the review app's
# `postdeploy/seed_review_app_data.py`, but this shouldn't interfere with anything else.
RUN SETUPTOOLS_SCM_PRETEND_VERSION=0.0 pip install . --no-deps

RUN chmod +x scripts.deploy/release.sh
4 changes: 3 additions & 1 deletion docker-compose.yml
@@ -7,7 +7,9 @@ services:
   web:
     # For platform spec, see https://stackoverflow.com/a/70238851
     platform: linux/amd64
-    build: .
+    build:
+      context: .
+      dockerfile: Dockerfile.heroku
     ports:
       - '3000:8000'
     depends_on: