
Service endpoints are not updated / removed after upgrade to Kubernetes 1.28 #15510

Open
mbrancato opened this issue Sep 14, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@mbrancato

What version of Knative?

0.15.2

Expected Behavior

Endpoints should be updated properly when pods are scaled down or deleted.

Actual Behavior

Endpoints for a service are not getting updated on scale-down operations or pod deletes. This leaves many stale addresses in the Endpoints object, and those stale entries propagate to the public service as well.

% kubectl -n detection get endpoints my-app-00112-private
NAME                      ENDPOINTS                                                              AGE
my-app-00112-private   10.32.101.40:9091,10.32.101.41:9091,10.32.101.43:9091 + 5997 more...   136m

% kubectl -n detection get deploy my-app-00112-deployment
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
my-app-00112-deployment   2/2     2            2           136m
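A way to confirm the staleness is to diff the addresses in the Endpoints object against the IPs of the pods that actually exist. The sketch below uses a self-contained sample so the comparison logic is runnable anywhere; the commented `kubectl` lines show how the real inputs would be collected (namespace/object names are from the output above, and the `serving.knative.dev/revision` label selector is an assumption about how the revision's pods are labelled):

```shell
#!/bin/sh
# stale_ips: given two sorted, newline-separated IP lists (Endpoints
# addresses, running-pod IPs), print the addresses that appear in the
# Endpoints object but have no backing pod.
stale_ips() {
  comm -23 "$1" "$2"
}

# Against a live cluster, the inputs would come from kubectl, e.g.:
#   kubectl -n detection get endpoints my-app-00112-private \
#     -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}' | sort > endpoint-ips.txt
#   kubectl -n detection get pods -l serving.knative.dev/revision=my-app-00112 \
#     -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort > pod-ips.txt

# Self-contained demonstration with sample data:
printf '10.32.101.40\n10.32.101.41\n10.32.101.43\n' > endpoint-ips.txt
printf '10.32.101.40\n10.32.101.41\n' > pod-ips.txt
stale_ips endpoint-ips.txt pod-ips.txt   # prints the stale 10.32.101.43
```

With the bug described here, this diff grows on every scale-down instead of staying empty.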

I was able to get events like this about the SKS (via Cloud Logging):

{
  jsonPayload: {
    apiVersion: "v1"
    eventTime: null
    involvedObject: {
      apiVersion: "networking.internal.knative.dev/v1alpha1"
      kind: "ServerlessService"
      name: "my-app-00112"
      namespace: "detection"
      resourceVersion: "6779758389"
      uid: "f6ed0598-0171-43ff-bf7a-c45069fdcbe2"
    }
    kind: "Event"
    lastTimestamp: "2024-09-14T15:38:13Z"
    message: "SKS: my-app-00112 does not own Service: my-app-00112-private"
    metadata: {
      creationTimestamp: "2024-09-14T15:38:13Z"
      managedFields: [1]
      name: "my-app-00112.17f5266fbfda92c2"
      namespace: "detection"
      resourceVersion: "3317050884"
      uid: "20dcc671-4abb-490c-aff8-7404dfdf8063"
    }
    reason: "InternalError"
    reportingComponent: "serverlessservice-controller"
    reportingInstance: ""
    source: {
      component: "serverlessservice-controller"
    }
    type: "Warning"
  }
  logName: "projects/my-project-92384924/logs/events"
  receiveTimestamp: "2024-09-14T15:38:13.778779952Z"
  resource: {
    labels: {
      cluster_name: "my-cluster-192132"
      location: "us-central1-c"
      project_id: "my-project-92384924"
    }
    type: "k8s_cluster"
  }
  severity: "WARNING"
  timestamp: "2024-09-14T15:38:13Z"
}

Steps to Reproduce the Problem

This happens with all of our ksvcs that scale up and then down, or that have pods removed (via delete/evict).

mbrancato added the kind/bug label on Sep 14, 2024
@mbrancato (Author)

I'm pretty sure this is an upstream bug, and I have opened this:
kubernetes/kubernetes#127370

In the SKS update process, it is the private service's Endpoints that feed SKS. Is there any plan to read from EndpointSlices (stable since 1.21) and move away from the legacy Endpoints API? From the docs:

The EndpointSlice API is the recommended replacement for Endpoints.
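For anyone looking at the EndpointSlice side of this: a Service's EndpointSlices are not named after the Service but linked to it via the `kubernetes.io/service-name` label. Using the names from this issue, they could be listed with a command like the following (illustrative only; requires cluster access):

```shell
# EndpointSlices are associated with a Service by label, not by name.
# Each slice also caps how many endpoints it holds (100 by default), so a
# Service with thousands of endpoints is spread across many slices.
kubectl -n detection get endpointslices \
  -l kubernetes.io/service-name=my-app-00112-private
```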

@ReToCode (Member)

ReToCode commented Sep 17, 2024

Yep, this seems to be the upstream issue, so there's not much we can do here.
For EndpointSlices, check the discussion here.

@skonto (Contributor)

skonto commented Sep 24, 2024

move away from the legacy Endpoints?

Please check the discussion here.

@mbrancato (Author)

Upstream fix:
kubernetes/kubernetes#127417

@DavidR91

We've just been affected by this in our environment on Knative 1.16 in Google Cloud. For reference, for anyone experiencing this in GKE: although the current stable channel is 1.30.5, it is 1.30.6 and above that contains the fix.

(and can confirm that once the fix is in, the endpoints behave normally again)
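Since the endpoints controller runs inside kube-controller-manager, it is the control-plane (server) version, not the node version, that needs to include the upstream fix. A quick check, assuming `kubectl` access to the cluster:

```shell
# Print client and server versions; the serverVersion is what must be
# 1.30.6 or newer on GKE's stable channel.
kubectl version
```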
