
## Intro

To stream Stackdriver logs from GCP into Splunk, we use the GCP Dataflow job implemented in this repository; see this guide for the official documentation (and see caveats for additional issues we have resolved).

## Usage

Create an HEC endpoint in Splunk, get its token, and place it under the following path (the name must be the same in all of the steps that follow):

```
secrets/<name>/splunk-hec-token-plaintext
```
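For example, assuming a hypothetical export named `prod-logs`, the layout would look like this:

```bash
# Hypothetical example: store the raw HEC token for an export named "prod-logs"
mkdir -p secrets/prod-logs
echo -n '<your-hec-token>' > secrets/prod-logs/splunk-hec-token-plaintext
```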

Encrypt the token from within the `secrets/<name>/` directory (the relative paths below assume it is the working directory):

```bash
export REGION_ID=us-central1
gcloud kms encrypt \
        --key=hec-token-key \
        --keyring=export-keys \
        --location=$REGION_ID \
        --plaintext-file=./splunk-hec-token-plaintext \
        --ciphertext-file=./splunk-hec-token-encrypted
```
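To sanity-check the result, you can decrypt the ciphertext and diff it against the original token (a minimal sketch using the same key and keyring as above; `--plaintext-file=-` writes the decrypted output to stdout):

```bash
# Decrypt the ciphertext and compare it with the original plaintext token
gcloud kms decrypt \
        --key=hec-token-key \
        --keyring=export-keys \
        --location=$REGION_ID \
        --ciphertext-file=./splunk-hec-token-encrypted \
        --plaintext-file=- | diff - ./splunk-hec-token-plaintext
```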

To create a new Dataflow pipeline from a GCP logs filter (see the filters we use in our existing dataflow jobs):

```bash
./pipeline/management.py create --name=<name> --project=[your project] --filter='<stackdriver log filter>'
```
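For example, a hypothetical invocation that exports GKE container error logs (the `prod-logs` name and `my-project` project are placeholders; the filter uses standard Cloud Logging query syntax):

```bash
# Hypothetical example: create a pipeline that exports GKE container errors
./pipeline/management.py create \
        --name=prod-logs \
        --project=my-project \
        --filter='resource.type="k8s_container" AND severity>=ERROR'
```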

Now we can launch our Dataflow pipeline:

```bash
# Standard UDF template
./pipeline/run.py template --project='[your project]' --name=<name> --token=<name> --transform='gs://splk-public/js/dataflow_udf_messages_replay.js' --function='process' --release=latest
# Custom UDF template
./pipeline/run.py template --project='[your project]' --name=<name> --token=<name> --transform='gs://[your-bucket]/[your udf].js' --function='[fn name]' --release=latest
# Pin a specific version of the Dataflow template release
./pipeline/run.py template --project='[your project]' --name=<name> --token=<name> --release=2021-07-26-00_RC00
```
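Once launched, you can confirm the job is running (a sketch, assuming the pipeline runs in the same `$REGION_ID` region used above):

```bash
# List active Dataflow jobs for the project
gcloud dataflow jobs list \
        --project='[your project]' \
        --region=$REGION_ID \
        --status=active
```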

To view the available template releases:

```bash
gsutil ls "gs://dataflow-templates/2021-*" | grep Splunk$
```
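A one-liner sketch that captures the newest release name (the `LATEST` variable is illustrative; the `cut` field assumes the standard `gs://dataflow-templates/<release>/...` path layout):

```bash
# Pick the most recent release that ships the Splunk template
LATEST=$(gsutil ls "gs://dataflow-templates/2021-*" | grep 'Splunk$' | sort | tail -1 | cut -d/ -f4)
echo "$LATEST"   # e.g. 2021-07-26-00_RC00
```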