Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directory artifact paths #1571

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jsoo1
Copy link

@jsoo1 jsoo1 commented Mar 11, 2022

Closes #1570

Previously, only glob patterns were allowed in artifact_paths. The name was confusing and also made certain patterns impossible. In particular, in a common bash script, it is impossible to unambiguously "unglob" a string. For instance, in a shell script with $BUILDKITE_ARTIFACT_PATHS=foo/**/* and $PWD in an empty directory, the following would create a directory named literally foo/**/* (assuming no globbing shell options were set):

mkdir $BUILDKITE_ARTIFACT_PATHS

Even if you were to try one of the globbing related shell options, there are none that would accomplish the desired effect.

Of course, the workaround would be to "just" mkdir foo in the shell script. However this breaks the decoupling between buildkite and the script as the buildkite configuration must "know" about the shell script and vice versa.

This change also makes new things possible.

One example is the following in a shell script command step:

make prefix=$BUILDKITE_ARTIFACT_PATHS

Which is a common idiom in many build systems.

@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch 2 times, most recently from 3b56f7f to bd82ef5 Compare March 11, 2022 07:09
@jsoo1
Copy link
Author

jsoo1 commented Mar 11, 2022

After some more looking at this, I think it is slightly more complicated than this (but not much).

@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch 2 times, most recently from 61f4a86 to 7a5508d Compare March 11, 2022 19:37
@jsoo1
Copy link
Author

jsoo1 commented Mar 11, 2022

Ok I think this is a start now. I see that maybe in the past filepath.Walk might have been restrictively slow but now filepath.WalkDir exists 😄 .

@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch from 7a5508d to d0eac4e Compare March 11, 2022 19:39
@jsoo1
Copy link
Author

jsoo1 commented Mar 11, 2022

A brief test on our CI confirms this does what I want. Though the symlinks need some work and there could be cases of which I am unaware.

@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch 2 times, most recently from 6ba2618 to 77dbc24 Compare March 12, 2022 02:34
@jsoo1
Copy link
Author

jsoo1 commented Mar 12, 2022

Ok tests pass and I think are actually correct (and symlinks are handled).

@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch 3 times, most recently from d98250e to 04aa7f8 Compare March 15, 2022 21:24
@sj26
Copy link
Member

sj26 commented Feb 14, 2023

Hi @jsoo1! This is an interesting suggestion, and thank you for the contribution, but BUILDKITE_ARTIFACT_PATHS is intended to be a list of globs for artifact upload, not used as an input to shell scripts for commands like mkdir.

If you want a shared value for commands and for artifacts then perhaps you want to abstract that. I would suggest doing something like this instead, for example:

steps:
- env:
    - PREFIX=the/place/to/build
  artifact_paths:
    - ${PREFIX}/bin/*
  command: make PREFIX="${PREFIX}"

@jsoo1
Copy link
Author

jsoo1 commented Feb 14, 2023

Hi @sj26 thank you for your feedback!

BUILDKITE_ARTIFACT_PATHS is intended to be a list of globs for artifact upload, not used as an input to shell scripts for commands like mkdir.

Hm. Why is that? I would assume that almost any env var in a shell would be subject to some use by a command.

If you want a shared value for commands and for artifacts then perhaps you want to abstract that. I would suggest doing something like this instead, for example:

steps:

- env:

    - PREFIX=the/place/to/build

  artifact_paths:

    - ${PREFIX}/bin/*

  command: make PREFIX="${PREFIX}"

I can see this being useful, thank you!

I would like to push back on this decision a bit, though.

While I could probably define my own env var for the purpose of $prefix, my bigger papercuts are listed in #1570. To recap:

My problems are with glob semantics and violation of the principle of least surprise.

  1. $BUILDKITE_ARTIFACT_PATHS is a misnomer. I get confused when I include a directory in the list (which is a path, as the name suggests) and it gets ignored.
  2. Suppose it were renamed to $BUILDKITE_ARTIFACT_PATTERNS and documented to accept only glob patterns. A directory is still a valid glob pattern!

Thus my expectations are doubly violated when a directory I list in $BUILDKITE_ARTIFACT_PATHS is explicitly ignored for upload.

If we modify the example slightly:

steps:

- env:

    - PREFIX=the/place/to/build

  artifact_paths:

    - ${PREFIX}/bin

  command: make PREFIX="${PREFIX}"

Then no contents of ${PREFIX}/bin will be uploaded despite existing and being a valid glob.

The primary benefit of supporting directories would be to increase the usability of pipeline definitions. The use of $prefix is a potential secondary upside of supporting directory listings.

Otherwise I'm not sure I see the downsides. Is there extra maintenance that would be required due to this change I am not aware of?

In any case, we love Buildkite and use it heavily. Thank you for your work!

@sj26
Copy link
Member

sj26 commented Feb 20, 2023

The agent puts all the parameters required to run the job into the environment and then invokes a bootstrap process which consumes a lot of those variables to run hooks etc around your command to provide the surrounding behaviour. $BUILDKITE_ARTIFACT_PATHS is one such env, and is consumed by the buildkite-agent artifact upload command during arifact upload after the command hook.

You can upload the binaries produced by make with something like this:

steps:
- env:
    - PREFIX=the/place/to/build
  artifact_paths: ${PREFIX}/bin/*
  command: make PREFIX="${PREFIX}"

or you could be explicit and handle the upload yourself if you want more control:

steps:
- env:
    - PREFIX=the/place/to/build
  command: |
    make PREFIX="${PREFIX}"
    buildkite-agent artifact upload "${PREFIX}/bin/*"

I reckon we'll add a recursive mode for artifact upload at some point. Those commands need a bit of a rethink for better ergonomics. Maybe something like:

steps:
- env:
    - PREFIX=the/place/to/build
  command: |
    make PREFIX="${PREFIX}"
    # Doesn't work right now, but might at some point in the future
    buildkite-agent artifact upload -r "${PREFIX}/bin"

@jsoo1
Copy link
Author

jsoo1 commented Feb 21, 2023

@sj26

The agent puts all the parameters required to run the job into the environment and then invokes a bootstrap process which consumes a lot of those variables to run hooks etc around your command to provide the surrounding behaviour. $BUILDKITE_ARTIFACT_PATHS is one such env, and is consumed by the buildkite-agent artifact upload command during arifact upload after the command hook.

Ok... can you expand a bit on how that relates to this conversation? Is that an additional maintenance burden? Sorry I'm having trouble seeing the relation.

You can upload the binaries produced by make with something like this:

steps:

- env:

    - PREFIX=the/place/to/build

  artifact_paths: ${PREFIX}/bin/*

  command: make PREFIX="${PREFIX}"

... but importantly not:

  artifact_paths: ${PREFIX}/bin

I reckon we'll add a recursive mode for artifact upload at some point. Those commands need a bit of a rethink for better ergonomics.

That's what this pr does, no?

The artifact uploader is used here:

uploader := agent.NewArtifactUploader(l, client, agent.ArtifactUploaderConfig{

And this this pr takes care of the hypothetical --recursive flag for you:

  • transparently
  • requiring no extra effort on your part (except some maintenance burden?)
  • improving the ergonomics
  • reducing surprise factor
  • lining the artifacts up with glob semantics

I'm still not seeing the downsides (though I'm open to hearing them!).

Note the current state of affairs actually thwarts the glob matching by explicitly ignoring directories! See here:

if isDir(absolutePath) {

Maybe what would help is some background on why that was done, because it is so confusing! My best guess is the previous performance downsides of filepath.Walk (I think it is called, it's been some time since I've looked into it). But the std lib now has filepath.WalkDir which solves the stat-storm of Walk: https://cs.opensource.google/go/go/+/go1.20.1:src/path/filepath/path.go;l=530.

Thanks again.

@jsoo1
Copy link
Author

jsoo1 commented Feb 21, 2023

Perhaps I've been a little too polite here. Not allowing directories in the artifact_paths list is a major usability and correctness bug. Let's use your example to motivate why. Suppose our default make target begins with a single binary output. We start with the example pipeline definition:

steps:
- env:
    - PREFIX=the/place/to/build
  artifact_paths:
    - ${PREFIX}/bin/*
  command: make PREFIX="${PREFIX}"

Then one day we decide to add a lib output. Now we would have to go change the pipeline to make sure the lib directory is uploaded:

  steps:
  - env:
      - PREFIX=the/place/to/build
    artifact_paths:
      - ${PREFIX}/bin/*
+     - ${PREFIX}/lib/*
    command: make PREFIX="${PREFIX}"

But wait! This can be simplified!

  steps:
  - env:
      - PREFIX=the/place/to/build
    artifact_paths:
-     - ${PREFIX}/bin/*
-     - ${PREFIX}/lib/*
+     - ${PREFIX}/*
    command: make PREFIX="${PREFIX}"

... woops, now our job has this in the logs:

Skipping directory bin
Skipping directory lib

*grumble* well ok, I guess we can do this?

  steps:
  - env:
      - PREFIX=the/place/to/build
    artifact_paths:
-     - ${PREFIX}/*
+     - ${PREFIX}/**/*
    command: make PREFIX="${PREFIX}"

Sweet! We can even add libexec and share directories, no problem!

Well, suppose then we want to add a README.md to the root of ${PREFIX}, then we also have to do this:

  steps:
  - env:
      - PREFIX=the/place/to/build
    artifact_paths:
      - ${PREFIX}/**/*
+     - ${PREFIX}/*
    command: make PREFIX="${PREFIX}"

But we still have these annoying Skipping directory bin... logs. Plus my pipeline definition relies on the implementation of the Makefile and vice versa. The intermediate ${PREFIX} variable really isn't providing any "abstraction" here. The makefile and the pipeline.yaml are still coupled. What I would really like is:

  steps:
  - env:
      - PREFIX=the/place/to/build
    artifact_paths:
      - ${PREFIX}
    command: make PREFIX="${PREFIX}"

Which, as I noted in the original description, can be shorted to this (in the simplest cases):

  steps:
    artifact_paths:
      - the/place/to/build
    command: make PREFIX="${BUILDKITE_ARTIFACT_PATHS}"

Which is what this PR enables.

@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch 2 times, most recently from 03f86bd to e9620e6 Compare August 29, 2023 20:16
@jsoo1 jsoo1 force-pushed the directory-artifact-paths branch from e9620e6 to 5af6007 Compare August 29, 2023 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support directories in artifact_paths
2 participants