
Service endpoints are not updated / removed after upgrade to Kubernetes 1.28 #15510

Open
mbrancato opened this issue Sep 14, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@mbrancato

What version of Knative?

0.15.2

Expected Behavior

Endpoints should be updated properly when pods are scaled down or deleted.

Actual Behavior

Endpoints for a service are not getting updated on scale-down operations or pod deletes. This leaves many stale addresses in the Endpoints object, and those stale entries propagate to the public service as well.

% kubectl -n detection get endpoints my-app-00112-private
NAME                      ENDPOINTS                                                              AGE
my-app-00112-private   10.32.101.40:9091,10.32.101.41:9091,10.32.101.43:9091 + 5997 more...   136m

% kubectl -n detection get deploy my-app-00112-deployment
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
my-app-00112-deployment   2/2     2            2           136m
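A way to confirm the staleness is to diff the addresses in the Endpoints object against the IPs of the pods that actually exist. The sketch below uses a self-contained sample so the comparison logic is runnable anywhere; the commented `kubectl` lines show how the real inputs would be collected (namespace/object names are from the output above, and the `serving.knative.dev/revision` label selector is an assumption about how the revision's pods are labelled):

```shell
#!/bin/sh
# stale_ips: given two sorted, newline-separated IP lists (Endpoints
# addresses, running-pod IPs), print the addresses that appear in the
# Endpoints object but have no backing pod.
stale_ips() {
  comm -23 "$1" "$2"
}

# Against a live cluster, the inputs would come from kubectl, e.g.:
#   kubectl -n detection get endpoints my-app-00112-private \
#     -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}' | sort > endpoint-ips.txt
#   kubectl -n detection get pods -l serving.knative.dev/revision=my-app-00112 \
#     -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort > pod-ips.txt

# Self-contained demonstration with sample data:
printf '10.32.101.40\n10.32.101.41\n10.32.101.43\n' > endpoint-ips.txt
printf '10.32.101.40\n10.32.101.41\n' > pod-ips.txt
stale_ips endpoint-ips.txt pod-ips.txt   # prints the stale 10.32.101.43
```

With the bug described here, this diff grows on every scale-down instead of staying empty.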

I was able to get events like this about the SKS (via Cloud Logging):

{
  jsonPayload: {
    apiVersion: "v1"
    eventTime: null
    involvedObject: {
      apiVersion: "networking.internal.knative.dev/v1alpha1"
      kind: "ServerlessService"
      name: "my-app-00112"
      namespace: "detection"
      resourceVersion: "6779758389"
      uid: "f6ed0598-0171-43ff-bf7a-c45069fdcbe2"
    }
    kind: "Event"
    lastTimestamp: "2024-09-14T15:38:13Z"
    message: "SKS: my-app-00112 does not own Service: my-app-00112-private"
    metadata: {
      creationTimestamp: "2024-09-14T15:38:13Z"
      managedFields: [1]
      name: "my-app-00112.17f5266fbfda92c2"
      namespace: "detection"
      resourceVersion: "3317050884"
      uid: "20dcc671-4abb-490c-aff8-7404dfdf8063"
    }
    reason: "InternalError"
    reportingComponent: "serverlessservice-controller"
    reportingInstance: ""
    source: {
      component: "serverlessservice-controller"
    }
    type: "Warning"
  }
  logName: "projects/my-project-92384924/logs/events"
  receiveTimestamp: "2024-09-14T15:38:13.778779952Z"
  resource: {
    labels: {
      cluster_name: "my-cluster-192132"
      location: "us-central1-c"
      project_id: "my-project-92384924"
    }
    type: "k8s_cluster"
  }
  severity: "WARNING"
  timestamp: "2024-09-14T15:38:13Z"
}

Steps to Reproduce the Problem

This happens with all of our ksvcs that scale up and then down, or that have pods removed (via delete/evict).

mbrancato added the kind/bug label on Sep 14, 2024
@mbrancato (Author)

I'm pretty sure this is an upstream bug, and I have opened this:
kubernetes/kubernetes#127370

In the SKS update process, it is the private service's Endpoints that feed SKS. Is there any plan to read from EndpointSlices (stable since 1.21) and move away from the legacy Endpoints API? From the docs:

The EndpointSlice API is the recommended replacement for Endpoints.
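For anyone looking at the EndpointSlice side of this: a Service's EndpointSlices are not named after the Service but linked to it via the `kubernetes.io/service-name` label. Using the names from this issue, they could be listed with a command like the following (illustrative only; requires cluster access):

```shell
# EndpointSlices are associated with a Service by label, not by name.
# Each slice also caps how many endpoints it holds (100 by default), so a
# Service with thousands of endpoints is spread across many slices.
kubectl -n detection get endpointslices \
  -l kubernetes.io/service-name=my-app-00112-private
```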

@ReToCode (Member)

ReToCode commented Sep 17, 2024

Yep, this seems to be the upstream issue, so there's not much we can do here.
For EndpointSlices, check the discussion here.

@skonto (Contributor)

skonto commented Sep 24, 2024

move away from the legacy Endpoints?

Please check the discussion here.

@mbrancato (Author)

Upstream fix:
kubernetes/kubernetes#127417

@DavidR91

We've just been affected by this in our environment on Knative 1.16 in Google Cloud. For reference, for anyone experiencing this in GKE: although the current stable channel is 1.30.5, it is 1.30.6 and above that contains the fix.

(and can confirm that once the fix is in, the endpoints behave normally again)
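Since the endpoints controller runs inside kube-controller-manager, it is the control-plane (server) version, not the node version, that needs to include the upstream fix. A quick check, assuming `kubectl` access to the cluster:

```shell
# Print client and server versions; the serverVersion is what must be
# 1.30.6 or newer on GKE's stable channel.
kubectl version
```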
