Skip to content

Commit

Permalink
Add instructions and support for debugging an import batch
Browse files Browse the repository at this point in the history
  • Loading branch information
jessemortenson committed Nov 18, 2024
1 parent fef737e commit afd0a4d
Show file tree
Hide file tree
Showing 3 changed files with 57 additions and 4 deletions.
41 changes: 41 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,3 +67,44 @@ aws s3api put-bucket-lifecycle-configuration --bucket openstates-realtime-bills
]
}'
```

## Debug a realtime processing import failure

If realtime data processing fails for a particular jurisdiction batch, you'll see a message in the AWS Lambda logs
that looks like this:

> 17:37:32 ERROR root: Error importing jurisdiction ocd-jurisdiction/country:us/government, stored snapshot of
> import dir as archive/usa-2024-11-18T17:37:31.789982.zip, error: get() returned more than one VoteEvent --
> it returned 2!
The error message will of course vary, but the "Error importing jurisdiction" text is a key you can look for regardless.
As the message indicates, the code now stores a zipfile containing the objects that were included in the juris-batch
for which import was attempted. In this case the file is available within the realtime processing S3 bucket at the
obejct prefix `archive/usa-2024-11-18T17:37:31.789982.zip`.

This file is an import failure snapshot, and consists of pertaining to one specific jurisdiction import batch. The
entrypoint function does the work of collecting a bunch of SQS messages, sorting data from those into jurisdictions,
and then executing import for individual jurisdiction (sub) batches. These instructions are for debugging one particular
snapshot.

To debug this failure locally, follow these steps:

1. Download the file to your machine
2. Unzip the file so that its contents occupy a unique directory, i.e. `unzip -d usa-2024-11-18T17:37:31.789982 usa-2024-11-18T17:37:31.789982.zip`
3. Decide if you are testing the import into a LOCAL database, or into the AWS Open States database.
4. Prepare parameters for running `app.py`: `do_import`, `{os_jurisdiction_id}`, `{path_to_snapshot_folder}`
* `do_import` this tells app.py to run the `do_import` function, rather than exec the whole entrypoint function
* `{os_jurisdiction_id}` in the form `ocd-jurisdiction/country:us/state:ma/government`
* `{path_to_snapshot_folder}` which should be a path to the folder you unzipped into, like
`/path/to/usa-2024-11-18T17:37:31.789982`
5. Prepare environment variables for running `app.py`
* `AWS_DEFAULT_PROFILE`: necessary if importing into the AWS Open States database, to access parameter store
* `STAGE_PARAM_NAME=/passwords/os_realtime_bill`: necessary if importing into the AWS OS DB
* `POSTGRES_HOST`: hostname for the database into which you will import
* `POSTGRES_NAME`: name of the postgres database
* `POSTGRES_USER`: name of the postgres database user account
* `DJANGO_SETTINGS_MODULE=settings`
6. Execute the script with the above parameters and env vars

You should now be able to reproduce whatever error occurred during the original import of that batch, including the
ability to step it through the debugger.
18 changes: 15 additions & 3 deletions app.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import os
import shutil
import signal
import sys
import urllib.parse
from django.db import transaction # type: ignore
from openstates.utils.instrument import Instrumentation
Expand Down Expand Up @@ -327,7 +328,7 @@ def do_atomic(func, datadir, type_):
logger.info(f"Running {type_}...")
try:
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(120)
signal.alarm(360)
return func(datadir)
except TimeoutError:
logger.error(f"Timeout running {type_}")
Expand Down Expand Up @@ -385,6 +386,17 @@ def do_import(jurisdiction_id: str, datadir: str) -> None:
)


# run process_import_function if main
# run process_import_function or do_import
if __name__ == "__main__":
process_import_function({}, {})
# pass arguments to run do_import() process, for reproducing import errors locally
# poetry run python app.py do_import {os_jurisdiction_id} {path_to_snapshot_folder}
# e.g. poetry run python app.py do_import
# "ocd-jurisdiction/country:us/state:ma/government" "/path/to/ma-2024-11-15T12_00_59.748666"
if sys.argv[1] == "do_import":
# we need to set up django here, because we're running locally and don't have zappa packaging
import django
django.setup()
do_import(sys.argv[2], sys.argv[3])
else:
# just run the entrypoint function, if no args present
process_import_function({}, {})
2 changes: 1 addition & 1 deletion settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
parameter = ssm.get_parameter(
Name=os.environ.get("STAGE_PARAM_NAME"), WithDecryption=True
)
POSTGRES_PASSWORD = parameter["Parameter"]["Value"]
POSTGRES_PASSWORD = os.environ.get("POSTGRES_PASSWORD", parameter["Parameter"]["Value"])


# django settings for tests
Expand Down

0 comments on commit afd0a4d

Please sign in to comment.