Update performance tests #14289

Merged
merged 27 commits into from
Sep 21, 2023
Changes from 25 commits
Commits
3340c28
Update performance tests
ReToCode Aug 16, 2023
8505cc0
add updated grafana dashboard
ReToCode Aug 28, 2023
5c37198
add better logging
ReToCode Sep 1, 2023
cfcc4fc
make sure results are always calculated
ReToCode Sep 1, 2023
4b8f3b2
use existing Knative grafana instance
ReToCode Sep 4, 2023
b284e43
add local grafana setup instructions
ReToCode Sep 4, 2023
cd9d9aa
Add real traffic test
ReToCode Sep 5, 2023
fe4fe74
Add datasource variable in grafana dashboard
ReToCode Sep 5, 2023
d152c01
Update grafana URL
ReToCode Sep 5, 2023
7de87de
Drop old grafana file
ReToCode Sep 5, 2023
47e6176
Add new test to simulate real traffic
ReToCode Sep 6, 2023
652147c
Minor improvements
ReToCode Sep 6, 2023
a7e9740
Create service in code instead of using YAML
ReToCode Sep 6, 2023
2d95fc0
Add function to wrap reporting vegeta.Metrics to influxdb
ReToCode Sep 6, 2023
fb71526
Update grafana dashboard
ReToCode Sep 6, 2023
08583b7
add from + to filter to grafana link
ReToCode Sep 6, 2023
200d7b4
fix pointers
ReToCode Sep 6, 2023
22f2f95
make `knative.dev/serving/test/v1` work with `injection.ParseAndGetRE…
ReToCode Sep 7, 2023
81cc289
minor test fixes
ReToCode Sep 7, 2023
78d7dc0
use milliseconds for grafana url params
ReToCode Sep 8, 2023
a6cce14
fix review dog warnings
ReToCode Sep 8, 2023
f227e12
use BUILD_ID and JOB_NAME to identify a CI job
ReToCode Sep 11, 2023
83315a9
drop unnecessary type conversions
ReToCode Sep 13, 2023
e6ee641
Run `hack/update-deps.sh`
ReToCode Sep 13, 2023
6d06e81
use `envsubst` instead of `sed`
ReToCode Sep 13, 2023
6091c00
Review fixes
ReToCode Sep 14, 2023
847a398
Wait for all KSVC to be deleted before starting the next job
ReToCode Sep 21, 2023
4 changes: 0 additions & 4 deletions go.mod
@@ -10,7 +10,6 @@ require (
github.com/google/go-containerregistry v0.13.0
github.com/google/go-containerregistry/pkg/authn/k8schain v0.0.0-20230209165335-3624968304fd
github.com/google/gofuzz v1.2.0
github.com/google/mako v0.0.0-20190821191249-122f8dcef9e3
github.com/gorilla/websocket v1.5.0
github.com/hashicorp/golang-lru v1.0.2
github.com/influxdata/influxdb-client-go/v2 v2.9.0
@@ -95,13 +94,10 @@ require (
github.com/go-openapi/swag v0.22.3 // indirect
github.com/gobuffalo/flect v1.0.2 // indirect
github.com/golang-jwt/jwt/v4 v4.4.2 // indirect
github.com/golang/glog v1.1.0 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/gnostic v0.6.9 // indirect
github.com/google/go-containerregistry/pkg/authn/kubernetes v0.0.0-20230209165335-3624968304fd // indirect
github.com/google/go-github/v27 v27.0.6 // indirect
github.com/google/go-querystring v1.0.0 // indirect
github.com/google/s2a-go v0.1.7 // indirect
github.com/google/uuid v1.3.1 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.2.5 // indirect
7 changes: 0 additions & 7 deletions go.sum
@@ -227,7 +227,6 @@ github.com/golang-jwt/jwt/v4 v4.4.2 h1:rcc4lwaZgFMCZ5jxF9ABolDcIHdBytAFgqFPbSJQA
github.com/golang-jwt/jwt/v4 v4.4.2/go.mod h1:m21LjoU+eqJr34lmDMbreY2eSTRJ1cv77w39/MY0Ch0=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/glog v1.1.0 h1:/d3pCKDPWNnvIWe0vVUpNP32qc8U3PDVxySP/y360qE=
github.com/golang/glog v1.1.0/go.mod h1:pfYeQZ3JWZoXTV5sFc986z3HTpwQs9At6P4ImfuP3NQ=
github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/groupcache v0.0.0-20191227052852-215e87163ea7/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
@@ -283,16 +282,10 @@ github.com/google/go-containerregistry/pkg/authn/k8schain v0.0.0-20230209165335-
github.com/google/go-containerregistry/pkg/authn/k8schain v0.0.0-20230209165335-3624968304fd/go.mod h1:x5fIlj5elU+/eYF60q4eASMQ9kDc+GMFa7UU9M3mFFw=
github.com/google/go-containerregistry/pkg/authn/kubernetes v0.0.0-20230209165335-3624968304fd h1:AQZlI371LcvBYY/7Q55TjxrpZJs6wtEXMw4Wq38XLy8=
github.com/google/go-containerregistry/pkg/authn/kubernetes v0.0.0-20230209165335-3624968304fd/go.mod h1:6pjZpt+0dg+Z0kUEn53qLtD57raiZo/bqWzsuX6dDjo=
github.com/google/go-github/v27 v27.0.6 h1:oiOZuBmGHvrGM1X9uNUAUlLgp5r1UUO/M/KnbHnLRlQ=
github.com/google/go-github/v27 v27.0.6/go.mod h1:/0Gr8pJ55COkmv+S/yPKCczSkUPIM/LnFyubufRNIS0=
github.com/google/go-querystring v1.0.0 h1:Xkwi/a1rcvNg1PPYe5vI8GbeBY/jrVuDX5ASuANWTrk=
github.com/google/go-querystring v1.0.0/go.mod h1:odCYkC5MyYFN7vkCjXpyrEuKhc/BUO6wN/zVPAxq5ck=
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/gofuzz v1.1.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0=
github.com/google/gofuzz v1.2.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/mako v0.0.0-20190821191249-122f8dcef9e3 h1:/o5e44nTD/QEEiWPGSFT3bSqcq3Qg7q27N9bv4gKh5M=
github.com/google/mako v0.0.0-20190821191249-122f8dcef9e3/go.mod h1:YzLcVlL+NqWnmUEPuhS1LxDDwGO9WNbVlEXaF4IH35g=
github.com/google/martian v2.1.0+incompatible/go.mod h1:9I4somxYTbIHy5NJKHRl3wXiIaQGbYVAs8BPL6v8lEs=
github.com/google/martian/v3 v3.0.0/go.mod h1:y5Zk1BBys9G+gd6Jrk0W3cC1+ELVxBWuIGO+w/tUAp0=
github.com/google/pprof v0.0.0-20181206194817-3ea8567a2e57/go.mod h1:zfwlbNMJ+OItoe0UupaVj+oy1omPYYDuagoSzA8v9mc=
3 changes: 0 additions & 3 deletions hack/tools.go
@@ -34,9 +34,6 @@ import (
// Migration job.
_ "knative.dev/pkg/apiextensions/storageversion/cmd/migrate"

// Mako stub
_ "knative.dev/pkg/test/mako/stub-sidecar"

_ "k8s.io/code-generator/cmd/client-gen"
_ "k8s.io/code-generator/cmd/deepcopy-gen"
_ "k8s.io/code-generator/cmd/defaulter-gen"
19 changes: 0 additions & 19 deletions test/config/ytt/performance/influx/influx-secret.yaml

This file was deleted.

5 changes: 0 additions & 5 deletions test/config/ytt/performance/kperf-test-namespace.yaml

This file was deleted.

11 changes: 0 additions & 11 deletions test/config/ytt/performance/overlay-activator-min-replicas.yaml

This file was deleted.

8 changes: 0 additions & 8 deletions test/config/ytt/performance/overlay-config-autoscaler.yaml

This file was deleted.

10 changes: 0 additions & 10 deletions test/config/ytt/performance/overlay-config-network.yaml

This file was deleted.

1 change: 1 addition & 0 deletions test/conformance.go
@@ -54,6 +54,7 @@ const (
Timeout = "timeout"
Volumes = "volumes"
WorkingDir = "workingdir"
SlowStart = "slowstart"

// Constants for test image output.
PizzaPlanetText1 = "What a spaceport!"
9 changes: 0 additions & 9 deletions test/e2e-common.sh
@@ -37,7 +37,6 @@ export ENABLE_HA=${ENABLE_HA:-0}
export ENABLE_TLS=${ENABLE_TLS:-0}
export MESH=${MESH:-0}
export AMBIENT=${AMBIENT:-0}
export PERF=${PERF:-0}
export KIND=${KIND:-0}
export CLUSTER_DOMAIN=${CLUSTER_DOMAIN:-cluster.local}

@@ -131,10 +130,6 @@ function parse_flags() {
readonly MESH=0
return 1
;;
--perf)
readonly PERF=1
return 1
;;
--enable-ha)
readonly ENABLE_HA=1
return 1
@@ -312,10 +307,6 @@ function install() {
YTT_FILES+=("${REPO_ROOT_DIR}/test/config/ytt/ha")
fi

if (( PERF )); then
YTT_FILES+=("${REPO_ROOT_DIR}/test/config/ytt/performance")
fi

if (( KIND )); then
YTT_FILES+=("${REPO_ROOT_DIR}/test/config/ytt/kind/core")
fi
File renamed without changes.
105 changes: 87 additions & 18 deletions test/performance/README.md
@@ -3,7 +3,7 @@
Knative performance tests are tests geared towards producing useful performance
metrics of the knative system. As such they can choose to take a closed-box
point-of-view of the system and use it just like an end-user might see it. They
can also go more open-boxy to narrow down the components under test.
can also go more open-box to narrow down the components under test.

## Load Generator

@@ -15,32 +15,101 @@ different rate, you can write your own pacer by implementing
interface. Custom pacer implementations used in Knative tests are under
[pacers](https://github.com/knative/pkg/tree/main/test/vegeta/pacers).

## Benchmarking using Mako

The benchmarks were originally built to use [mako](https://github.com/google/mako), but currently
running without connecting to the Mako backend, and collecting the data using
a Mako sidecar stub.
## Testing architecture

### Run without Mako
The performance tests are Kubernetes Jobs that run Go code for the different [benchmarks](#benchmarks).
The script [performance-tests.sh](./performance-tests.sh) first creates a GKE cluster and installs Serving with settings specific
to the performance tests. It then installs the required Knative Services and runs the test jobs.

To run a benchmark once, and use the result from `mako-stub` for plotting:
The results are written to:
* Stdout
* A log file in the folder given by the `$ARTIFACTS` environment variable
* An InfluxDB instance hosted in the knative-community GKE project: https://github.com/knative/infra/tree/main/infra/k8s/shared
Member


So the reason why InfluxDB was chosen last year was because it could store metrics and had an ok dashboard for viewing the results.

Given the grafana work you've done I wonder if it makes sense to switch out the store to prometheus so we could then include our monitoring dashboards which include control plane metrics

https://github.com/knative-extensions/monitoring

Probably worth doing this in a follow up

Member Author


Yeah I assumed that as well. I first tried to do the dashboards in influx, but that is very limited and also pretty slow if you add multiple graphs. So I'm in favour of

> Probably worth doing this in a follow up

/cc @skonto. Maybe we could use this downstream as well (instead of ElasticSearch directly?)


1. Start the benchmarking job:

`ko apply -f test/performance/benchmarks/deployment-probe/continuous/benchmark-direct.yaml`
## Grafana

1. Wait until all the pods with the selector equal to the job name are completed.
To make the results easier to read, they are visualized in a Grafana dashboard.
The dashboard is defined in [grafana-dashboard.json](./visualization/grafana-dashboard.json) and
hosted on [grafana.knative.dev](https://grafana.knative.dev/d/igHJ5-fdk/knative-serving-performance-tests?orgId=1).

1. Retrieve results from mako-stub using the script in where `pod_name` is the name of the pod from the previous step.

`read_results.sh "$pod_name" "$pod_namespace" ${mako_port:-10001} ${timeout:-120} ${retries:-100} ${retries_interval:-10} "$output_file"`
## Benchmarks

This will download a CSV with all raw results. Alternatively you can remove
the port argument `-p` in `mako-stub` container to dump the output to
container log directly.
Knative Serving has different benchmarking scenarios:

**Note:** Running `performance-tests-mako.sh` creates a cluster and runs all the benchmarks in sequence. Results are downloaded in a temp folder
* [dataplane-probe](./benchmarks/dataplane-probe): Measures the overhead Knative adds compared to a plain `Deployment`.
* [load-test](./benchmarks/load-test): Measures request metrics for Knative Services under load in different scenarios (Activator always in the path, Activator in the path only at zero, Activator moving out of the path under high load).
* [real-traffic-test](./benchmarks/real-traffic-test): Simulates realistic traffic with random request latency, service startup latency and payload sizes.
* [reconciliation-delay](./benchmarks/reconciliation-delay): Measures the time it takes to reconcile a `KnativeService` and its child CRs.
* [rollout-probe](./benchmarks/rollout-probe): Measures request metrics during a rolling update of a scaled `KnativeService`.
* [scale-from-zero](./benchmarks/scale-from-zero): Measures the latency of scaling 1, 5, 25 and 100 Knative Services from zero in parallel.

### Benchmarking using Kperf

Running `performance-tests.sh` runs performance tests using [kperf](https://github.com/knative-extensions/kperf)
## Running the benchmarks

### Local InfluxDB setup

You first need a locally running instance of InfluxDB.

Note: If you don't have Helm or don't want to use it, you can also [install InfluxDB with YAMLs](https://docs.influxdata.com/influxdb/v2.7/install/?t=Kubernetes).

```bash
# Create an InfluxDB with helm
helm repo add influxdata https://helm.influxdata.com/
kubectl create ns influx
helm upgrade --install -n influx local-influx --set persistence.enabled=true,persistence.size=50Gi influxdata/influxdb2
echo "Admin password"
echo $(kubectl get secret local-influx-influxdb2-auth -o "jsonpath={.data['admin-password']}" --namespace influx | base64 --decode)
echo "Admin token"
echo $(kubectl get secret local-influx-influxdb2-auth -o "jsonpath={.data['admin-token']}" --namespace influx | base64 --decode)

# Forward the InfluxDB service to your machine; INFLUX_URL below and the UI rely on it.
# The trailing & keeps the port-forward running in the background.
kubectl port-forward -n influx svc/local-influx-influxdb2 8080:80 &

# Set up the expected influxdb config
export INFLUX_URL=http://localhost:8080
export INFLUX_TOKEN=$(kubectl get secret local-influx-influxdb2-auth -o "jsonpath={.data['admin-token']}" --namespace influx | base64 --decode)

# Run the script to initialize the Organization + Buckets in InfluxDB
./visualization/setup-influx-db.sh
```

### Local Grafana dashboards

Use an existing Grafana instance or create one on your cluster ([see docs](https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/)).

To use the InfluxDB instance as a datasource for Grafana:
* Navigate to the Grafana UI and log in with the user from the installation
* Create a new datasource for InfluxDB
* Select the Flux query language
* Server URL: http://local-influx-influxdb2.influx:80 (Note: this can differ if your Grafana instance is hosted outside the cluster)
* Organization: Knativetest
* Bucket: knative-serving
* Token: <your InfluxDB token>


### Local development

You can run each benchmark directly by running the `main()` function in the `main.go` file of its folder under [benchmarks](./benchmarks).

#### Environment

The tests expect the following environment variables:

* `KO_DOCKER_REPO`: the value you have set for `ko`
* `SYSTEM_NAMESPACE`: the namespace where Knative Serving is installed, typically `knative-serving`
* `INFLUX_URL`: `http://local-influx-influxdb2.influx:80`
* `INFLUX_TOKEN`: the token printed by the commands above
* `BUILD_ID`: `local`
* `JOB_NAME`: `local`
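A benchmark could fail fast when one of these variables is missing. The following is a minimal sketch, assuming the check happens at startup (the repository may handle this differently). `checkEnv` is hypothetical and takes the lookup function as a parameter so it can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredEnv lists the variables this README says the benchmarks expect.
var requiredEnv = []string{
	"KO_DOCKER_REPO",
	"SYSTEM_NAMESPACE",
	"INFLUX_URL",
	"INFLUX_TOKEN",
	"BUILD_ID",
	"JOB_NAME",
}

// checkEnv returns the values of all names, or one error listing every
// variable that is unset or empty. lookup is usually os.LookupEnv.
func checkEnv(names []string, lookup func(string) (string, bool)) (map[string]string, error) {
	values := make(map[string]string, len(names))
	var missing []string
	for _, name := range names {
		v, ok := lookup(name)
		if !ok || v == "" {
			missing = append(missing, name)
			continue
		}
		values[name] = v
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing environment variables: %s", strings.Join(missing, ", "))
	}
	return values, nil
}

func main() {
	env, err := checkEnv(requiredEnv, os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("reporting as job %q, build %q\n", env["JOB_NAME"], env["BUILD_ID"])
}
```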


### Running them on cluster

See what the [script](./performance-tests.sh) does. In short, run:

```bash
envsubst < your-benchmark-job.yaml | ko apply --sbom=none -Bf -
```
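`envsubst` replaces `$VAR` and `${VAR}` references in the job YAML with values from the environment before `ko` sees the manifest. To preview a rendered manifest without a cluster, Go's `os.Expand` performs a similar substitution. This is a sketch: `render` is hypothetical, and `os.Expand` is not an exact clone of `envsubst` (for example, it also treats shell special names such as `$1` or `$$` as variables).

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// render substitutes $VAR and ${VAR} references in a template with values
// from lookup, roughly like envsubst. Unknown variables become "", which
// matches what envsubst does for unset variables.
func render(template string, lookup func(string) string) string {
	return os.Expand(template, lookup)
}

func main() {
	// Reads a job manifest on stdin and prints the rendered result,
	// e.g. `go run . < your-benchmark-job.yaml`.
	in, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(render(string(in), os.Getenv))
}
```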