Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notify when host and bootstrap agent paths mismatch #3123

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jordandcarter
Copy link
Contributor

Description

During agent startup check if the bootstrap agent path matches the startup agent path. This is to avoid confusion during development when changes are made to the agent but the bootstrap agent still uses the default agent on the $PATH. Usually this results in changes being made to the agent, but then those changes not being observed when running the updated agent locally.

For example if I

  1. go build -o buildkite-agent . then
  2. ./buildkite-agent start --spawn 2 --endpoint http://agent.buildkite.localhost/v3 --token buildkite

By default the above will result in ./buildkite-agent starting up and waiting for jobs, when a job is run, if that job uploads a pipeline it will use (in my case) /opt/homebrew/bin/buildkite-agent, not ./buildkite-agent, so any changes I've made to ./buildkite-agent will not be observed for the agent pipeline upload command.

This PR updates the agent to return a error fatal: check binary paths: mismatched buildkite-agent paths: host="/Users/jordan/src/agent/buildkite-agent" bootstrap="/opt/homebrew/bin/buildkite-agent" on boot. If the host agent is a development build then the error is fatal, otherwise it is just a warning.

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go fmt ./...)

Copy link
Contributor

@patrobinson patrobinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we should generate a warning for production environments, although maybe it'll be useful?

I'm pretty happy with this as an approach, and will save us so much time in the future.

// update the buildkite-agent binary available from $PATH
if err := checkBinaryPaths(); err != nil {
if version.IsDevelopmentBuild() {
return fmt.Errorf("check binary paths: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be situations where we want to allow this. I think it's good to err on the side of blowing up, rather than behaving unexpectedly, but I wonder if we should have a parameter to allow it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that it's only used for development builds, i think if we need to turn it off for dev builds we can just comment out the code?

return fmt.Errorf("failed to locate bootstrap buildkite-agent: %w", err)
}

hostPath, err := os.Executable()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation points out this may return a symlink, which could cause inconsistencies. It recommends evaluating the symlink:

If a symlink was used to start the process, depending on the operating system, the result might be the symlink or the path it pointed to. If a stable result is needed, path/filepath.EvalSymlinks might help.

if version.IsDevelopmentBuild() {
return fmt.Errorf("check binary paths: %s", err)
} else {
l.Warn("check binary paths: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could create a lot of logs for customers. I wonder if it should be debug level?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should only be once at the start of the agent logs (not the job logs), so i think that this is probably fine.

// checkBinaryPaths looks up both the bootstrap and host buildkite-agent paths,
// and returns an error if they do not match or if either cannot be determined.
func checkBinaryPaths() error {
bootstrapPath, err := exec.LookPath("buildkite-agent")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also want to resolve this to ensure it's not a symlink, to ensure we're comparing like for like

Copy link
Contributor

@moskyb moskyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks awesome, such a good change.

if version.IsDevelopmentBuild() {
return fmt.Errorf("check binary paths: %s", err)
} else {
l.Warn("check binary paths: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should only be once at the start of the agent logs (not the job logs), so i think that this is probably fine.

// update the buildkite-agent binary available from $PATH
if err := checkBinaryPaths(); err != nil {
if version.IsDevelopmentBuild() {
return fmt.Errorf("check binary paths: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that it's only used for development builds, i think if we need to turn it off for dev builds we can just comment out the code?

Copy link
Contributor

@moskyb moskyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops! wrong review

@jordandcarter
Copy link
Contributor Author

jordandcarter commented Dec 11, 2024

I'm not sure if we should generate a warning for production environments, although maybe it'll be useful?

Yeah I wasn't sure either but was thinking if it ever did log the warning in production that would be super informative to the user, then I thought of the following scenario too.

A production environment in this context includes a user downloading a new agent binary directly to test out a new feature locally. They forgot they installed the agent via brew 12 months ago and during their testing they'll run into the same confusion as us. In this case it will only raise the warning, which is the least disruptive approach while still giving them a clue to whats wrong.

@DrJosh9000
Copy link
Contributor

I think the overall approach is good.

Though at some point I want to get #1816 on the frontburner again...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants