
fix(helm): No more retry if jobs failed beyond BackoffLimit #13364

Open
wants to merge 5 commits into base: main
Conversation

dayeguilaiye

What this PR does / why we need it:
This PR closes #13317
Special notes for your reviewer:

If applicable:

  • this PR contains user facing changes (the docs needed label should be applied if so)
  • this PR contains unit tests
  • this PR has been tested for backwards compatibility

@pull-request-size pull-request-size bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 2, 2024
pkg/kube/ready.go — review thread resolved (outdated)
@dayeguilaiye
Author

@robertsirc I fixed the lint problem, could you review again?

@robertsirc
Contributor

Hi @dayeguilaiye, I pulled this down and ran all the unit tests and a few of my smoke tests. What I think we need is a unit test around this functionality. Is there a way we can add some test coverage?

@dayeguilaiye
Author

@robertsirc I found that there are no tests for wait.go at all, and tests for the wait logic would add a lot of time to the unit test run, so I didn't add any there.

However, there are existing tests for ready.go, so I should test that when Failed > BackoffLimit the readiness check returns an ErrNoRetryError. I'll add that.

Thanks for your review and advice.

@pull-request-size pull-request-size bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 10, 2024
@dayeguilaiye
Author

@robertsirc I've added the test as described above. Could you review again?

@robertsirc
Contributor

Also, you might want to look at the latest changes; I did a bit of work writing up more unit tests for coverage.

@@ -91,6 +93,10 @@ func (w *waiter) isRetryableError(err error, resource *resource.Info) bool {
w.log("Status code received: %d. Retryable error? %t", statusCode, retryable)
return retryable
}
if errors.Is(err, ErrNoRetryError) {
Contributor

Bit of a nit, but it seems like this check should come before the preceding if statement.

@@ -91,6 +93,10 @@ func (w *waiter) isRetryableError(err error, resource *resource.Info) bool {
w.log("Status code received: %d. Retryable error? %t", statusCode, retryable)
return retryable
}
if errors.Is(err, ErrNoRetryError) {
w.log("The error is a NoRetryError, Retryable err? %t", false)
Contributor

Not much value in this log to me

@@ -38,6 +38,8 @@ import (
"k8s.io/apimachinery/pkg/util/wait"
)

var ErrNoRetryError = errors.New("this error will stop retry")
Contributor

Maybe something along the lines of "retry limit exceeded waiting on resource"

@gzb1128

gzb1128 commented Nov 17, 2024

I'd like to fix this issue. If @dayeguilaiye is no longer making progress on it, may I open a new PR to fix it?

Labels
size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

--wait-for-jobs: isRetryableError() returns a false positive for Job failures.
5 participants