Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keyspace: Etcd watch can retry on Unknown gRPC errors #407

Merged
merged 1 commit into from
Nov 19, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 13 additions & 4 deletions keyspace/key_space.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@ import (
"go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
clientv3 "go.etcd.io/etcd/client/v3"
"go.etcd.io/etcd/client/v3/mirror"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
)

// A KeySpace is a local mirror of a decoded portion of the Etcd key/value space,
Expand Down Expand Up @@ -179,13 +181,20 @@ func (ks *KeySpace) Watch(ctx context.Context, client clientv3.Watcher) error {
case resp, ok := <-watchCh:
if !ok {
return ctx.Err() // Watch contract implies the context is cancelled.
} else if err := resp.Err(); err == rpctypes.ErrNoLeader {
// ErrNoLeader indicates our watched Etcd is in a partitioned
// minority. Retry, attempting to find a member in the majority.
} else if err := resp.Err(); err != nil && !resp.Canceled {
log.WithFields(log.Fields{"err": err, "attempt": attempt}).
Warn("non-terminal watch error (will continue)")
} else if status, _ := status.FromError(err); err == rpctypes.ErrNoLeader || status.Code() == codes.Unknown {
// ErrNoLeader indicates our watched Etcd is in a partitioned minority.
// Unknown gRPC status can happen during Etcd shutdowns, such as in:
//
// rpc error: code = Unknown desc = malformed header: missing HTTP content-type
//
// In both cases we restart the watch.
watchCh = nil

log.WithFields(log.Fields{"err": err, "attempt": attempt}).
Warn("watch failed (will retry)")
Warn("watch channel failed (will restart)")

select {
case <-time.After(backoff(attempt)): // Pass.
Expand Down
Loading