[feature] bring your own builder #763
Comments
Something I wanted to note is that re-usable workflows cannot call each other. So on GH, it may not be feasible to separate the "builder" and the "generator" as different "trusted entities", which is why I'm proposing a GHA to ensure the same provenance format. The original thread (sigstore/fulcio#754) suggested separating the entities, but it's not currently possible, AFAIK. |
Following up on conversation from sigstore/fulcio#754:
I want separate TCBs for provenance generation (which IMO should be higher-security, and change very infrequently) and the compilation stage. I should be able to quickly verify that I trust the provenance even if I'm not convinced about the compilation workflow. The provenance should include what compilation stage I used (without trusting the calling workflow).
Yes, exactly. But there's a couple subtleties. First, I don't want the compilation workflow to invoke the provenance generator component, because every time the compilation workflow changes, I'd have to worry about the provenance generator too (which is higher-consequence if compromised). I want direct control over the version of provenance generator used in a repo. But I can't call a generic provenance generator: this means I have to audit the repository to trust that the contents of the generic provenance generator are okay, even if it has a valid signature from a release of the generic provenance generator (because I may have modified the calling workflow to feed it bad artifacts). Basically, my use case is the npm one: I want to, in an automated fashion, verify the provenance of an artifact, and that it was built using a specific compilation workflow from a publicly-known source repo at a known hash. This is fine if we trust the builder to invoke the provenance generator correctly. Really what I want is a generic provenance generator that wraps other compilation actions: you would give it Then, if I have a certificate indicating that the provenance generator came from a trusted workflow, I know that all of the provenance information is accurate, even if the build was malicious, including the build workflow that was invoked. I have no idea if GitHub supports such a thing, and I'm probably being a little pedantic—at the end of the day, I have to trust both the provenance generation and the compilation steps, and "it's really easy to integrate standard, signed provenance into my builder" is probably good enough. Also, I could be confused, and there could be a reasonable way to check that we have a signature from a specific version of the provenance-generating workflow over provenance that captures the compilation workflow that ran. |
Ahh, yeah per @laurentsimon's comment upthread, what I want isn't possible on GitHub. I'm glad I wrote it down, though. Maybe a FR for GitHub Actions? EDIT: seems like I'm not the only one to want this: actions/runner#2079, actions/runner#1541 |
Oh, and
may not be true anymore (as of 3 days ago): https://docs.github.com/en/actions/using-workflows/reusing-workflows#nesting-reusable-workflows I still don't think this gets us what we want (see my comment above). |
Great find! I wish they announced such changes... or maybe I'm not subscribing to the right repos...
Why not? |
That's basically what we do in our generators. Except that we use VM within the same workflow to call the generator and the builder, instead of re-usable workflow for each. Within our trusted workflow, we can call an external trusted builder via a GitHub action. If you trust the top-level re-usable workflow, you trust it to call the Action properly. All we really need is for the builder Action to support a "dry-run" option to get the steps in a trusted way in order to populate One issue may be to make the builder name dynamic. If we're really confident about parsing the input, we could turn script injection in our favor, but it's a little dangerous :-) If we can call the trusted builder via an API instead of an Action, that would work. Another way would be to simply fork the repo of the trusted builder and run the code of the trusted builder, or even just grab their signed binaries from their releases and run that in a VM / job. The latter is what we do today in our builders - not sure why I got hung up on GH Actions :). So I think this is technically do-able, without the need for re-usable workflow chains. |
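To make the "dynamic builder name" concern concrete, here is a minimal sketch of strict parsing for a caller-supplied Action reference before it is used anywhere near a workflow or script. It assumes the usual `owner/repo[/path]@ref` shape of a `uses:` value; the function name `parse_action_ref` and the exact character allow-list are illustrative assumptions, not something slsa-github-generator implements, and a real implementation would need a careful security review.

```python
import re

# Assumed allow-list: anything outside these characters is rejected outright,
# so quotes, spaces, semicolons, and "${{ }}" expressions never get through.
_ACTION_REF = re.compile(
    r"(?P<owner>[A-Za-z0-9][A-Za-z0-9-]*)/"
    r"(?P<repo>[A-Za-z0-9._-]+)"
    r"(?P<path>(?:/[A-Za-z0-9._-]+)*)"
    r"@(?P<ref>[A-Za-z0-9._/-]+)"
)


def parse_action_ref(value):
    """Split 'owner/repo[/path]@ref' into its parts or raise ValueError."""
    match = _ACTION_REF.fullmatch(value)
    # Reject path traversal even though '.' is otherwise allowed.
    if match is None or ".." in value:
        raise ValueError(f"refusing suspicious builder reference: {value!r}")
    return match.groupdict()
```

Rejecting everything that does not match, rather than trying to escape it, is the conservative choice here: the parsed parts can then be compared against an allow-list of trusted builders.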
Very cool! I think that would do what I'm looking for 🚀 |
Responding to the comments on the previous issue.
For a single build service with some common info this is doable. It becomes a lot harder when talking about completely different build services that might have very different data and methods for retrieving the trusted data from the builder. So doable for GitHub Actions. Much less doable for other build services like Gitlab etc.
IIUC, this is a requirement of SLSA 3's Non-falsifiable provenance requirement, i.e. builds have to be isolated from the provenance generation such that build service users cannot change the provenance. In practice this means there must be a hard security boundary between the user's actual build code and the provenance generator. In the case of slsa-github-generator we use different GHA jobs and thus different VMs. Generally I expect SLSA 3 builders to implement this functionality and thus you should be able to determine this via the |
This feature is also useful to onboard scanners, like Syft. We'd just run their CLI and attest to the output. |
What would be even better is if we could turn an existing GHA into an attested one. This may be possible as well. If the input to our workflow is an action, we dynamically generate the call to the action. Inputs would need to be a map, so that we can pass them to the action... I don't think this is supported yet. This means users would need to pass the input as a JSON string for us to parse... as a v0 PoC this may be acceptable. Note: if the input is a binary CLI instead of an action, we also need the arguments, so argument passing is a common problem to solve. |
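As a rough illustration of the "pass the inputs as a JSON string" idea, the sketch below parses such a string into the flat string-to-string map that Action inputs ultimately are. The function name `parse_action_inputs` and the decision to reject nested values are assumptions made for this example, not an agreed design.

```python
import json


def parse_action_inputs(raw):
    """Parse a JSON object string into a flat {name: string value} map."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"inputs are not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("inputs must be a JSON object")

    inputs = {}
    for name, value in parsed.items():
        if isinstance(value, bool):
            # Check bool before int: bool is a subclass of int in Python.
            value = "true" if value else "false"
        elif isinstance(value, int):
            value = str(value)
        elif not isinstance(value, str):
            # Lists, objects, floats and null have no obvious input encoding.
            raise ValueError(f"input {name!r} must be a string, bool or int")
        inputs[name] = value
    return inputs
```

The same parsed map could carry the arguments for a binary CLI, which is why a single parsing step for both cases seems attractive.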
A simpler solution to start with could be to operate a service like https://github.com/actions/starter-workflows, but for SLSA. Ask interested builders, scanners, etc to submit a config file that describes the input, output, and commands to run. |
I also thought about a "command" workflow that would run a command or set of commands and attest to some output. I like the idea of turning an existing GHA into an attested one but I wonder if we can get enough info from outside of the workflow. |
For subject output measurement: this would require one of
I like (2) a lot in the case that we have an existing GHA. For a CLI: we would have to resort to (1) |
Would we have a config file on our repo, on the user's repo... or everything via workflow input parameters? |
Something to think about: where do we store the artifact type? DSSE? Intoto predicate type? Inside the |
Another angle to think about is branding. For an ecosystem like npm, users will be more receptive if the re-usable workflow is located in an npm org / repository, rather than in the slsa-framework repo. However, building re-usable workflows is hard, takes time, and requires maintenance. Given that re-usable workflows can call each other now, we could have:
|
Here's a way we can implement this feature, at a high level. Let's say goreleaser is the builder / toolchain here. Goreleaser maintainers create a re-usable workflow to wrap their Action, say at https://github.com/goreleaser/goreleaser-action/tree/master/.github/workflows/builder.yml
This workflow looks roughly like the following (I've omitted a lot of details for simplicity):
          # Checkout developer repository to scan / build / create SBOM for.
          - uses: actions/checkout@xxx
          # Run the action.
          - uses: goreleaser/goreleaser-action
            with:
              ...
          # Create SLSA subjects
          - id: hash
            env:
              ARTIFACTS: ...
            run: |
              checksum_file=$(sha256sum "$ARTIFACTS")
              echo "::set-output name=hashes::$(echo "$checksum_file" | base64 -w0)"
      slsa-attest:
        uses: slsa-framework/slsa-github-generator/path/to/[email protected]
        with:
          base64-subjects: ${{ needs.run-tool.outputs.hashes }}
          args: <>
Users of Goreleaser can use the https://github.com/goreleaser/goreleaser-action/tree/master/.github/workflows/builder.yml instead of calling the Action. From Goreleaser's point of view:
We would need to decide what the format of the provenance looks like, so that we can report the re-usable workflow that called us inside the provenance. Here's a possible format to get the ball rolling:
    "buildConfig": {
      "version": 1,
      "builder": {
        "id": "https://github.com/goreleaser/goreleaser-action/tree/master/.github/workflows/builder.yml@refs/tags/v1.2.0",
        "sha1": "abcdef...",
        "type": "delegatedReusableWorkflow"
      },
      // Optional steps or buildConfig provided by the re-usable workflow calling us (?)
      "steps": [
        "https://github.com/goreleaser/[email protected]",
        "argument1", "argument1-value", etc
      ]
    } |
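To make the proposed shape easier to discuss, here is a small sketch that builds the same `buildConfig` structure programmatically and serializes it as strict JSON. The helper name `make_build_config` is hypothetical, and representing `steps` as a flat list of strings is an assumption taken from the draft above, not a settled format.

```python
import json


def make_build_config(workflow_id, workflow_sha1, steps):
    """Build the draft buildConfig describing the calling re-usable workflow."""
    return {
        "version": 1,
        "builder": {
            "id": workflow_id,
            "sha1": workflow_sha1,
            "type": "delegatedReusableWorkflow",
        },
        # Optional: the Action reference followed by its arguments, as
        # reported by the re-usable workflow that called the generator.
        "steps": list(steps),
    }


example = make_build_config(
    "https://github.com/goreleaser/goreleaser-action/tree/master/"
    ".github/workflows/builder.yml@refs/tags/v1.2.0",
    "abcdef...",  # placeholder digest, as in the draft
    ["<action reference>", "argument1", "argument1-value"],
)
serialized = json.dumps({"buildConfig": example}, indent=2)
```

Generating the structure from code rather than by hand also makes it harder to emit something a strict JSON parser would reject, such as a trailing comma.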
Something that's not covered in the above provenance is when there are several re-usable workflows involved. Say:
Do we want to support this? What would the provenance format look like? @ckotzbauer please chime in if you have some advice / ideas. |
Broadly, that plan makes a lot of sense! The main hesitation I have is just that there's a lot of copy-paste involved in the wrapper reusable workflow, which makes it harder for a verifier to audit whether it's trusted. Not sure we can do anything about that without some big enhancements to GH Actions. |
          # Create SLSA subjects
          - id: hash
            env:
              ARTIFACTS: ...
            run: |
              checksum_file=$(sha256sum "$ARTIFACTS")
              echo "::set-output name=hashes::$(echo "$checksum_file" | base64 -w0)"
Is there a "standard" format for it? |
It's currently always sha256, because it's the most popular one used in the slsa specs. |
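For readers unsure what the subjects value actually contains, the following sketch reproduces in Python what `sha256sum ... | base64 -w0` produces: one `<hex digest>  <name>` line per artifact, base64-encoded as a single string. The function names are illustrative, and the artifacts are passed as in-memory bytes only to keep the example self-contained.

```python
import base64
import hashlib


def sha256sum_text(artifacts):
    """Render {name: bytes} the way `sha256sum` prints it: 'hex  name' lines."""
    lines = [
        f"{hashlib.sha256(data).hexdigest()}  {name}"
        for name, data in artifacts.items()
    ]
    return "\n".join(lines) + "\n"


def base64_subjects(artifacts):
    """Base64-encode the checksum listing without line wrapping (like -w0)."""
    return base64.b64encode(sha256sum_text(artifacts).encode("utf-8")).decode("ascii")


def decode_subjects(encoded):
    """Inverse of base64_subjects: return a list of (hex digest, name) pairs."""
    text = base64.b64decode(encoded).decode("utf-8")
    return [tuple(line.split("  ", 1)) for line in text.splitlines() if line]
```

A consumer can call `decode_subjects` to recover the digest and name pairs, which is roughly what a generator has to do before writing them into the provenance subject list.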
The Sigstore bundle spec leans toward representing a hash as an algorithm plus the raw digest bytes:
    // Only a subset of the secure hash standard algorithms are supported.
    // See https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf for more
    // details.
    enum HashAlgorithm {
      SHA2_256 = 0;
      SHA2_512 = 1;
    }
    // HashOutput captures a digest of a 'message' and the corresponding hash
    // algorithm used.
    message HashOutput {
      HashAlgorithm algorithm = 1;
      // This is the raw octets of the message digest as computed by
      // the hash algorithm.
      bytes digest = 2;
    }
Can the provenance be split/grouped into several files?
There are cases when a single release includes 100+ modules. They are all part of the "JUnit 5.9.1" release. In that case, if the sha256 checksums for all the released files are mixed together, the resulting attestation would grow large. WDYT about the builder grouping the checksums so that it gets several attestations back? |
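One way to picture the grouping request is sketched below: artifacts are bucketed by module, and each bucket becomes its own subject list in the in-toto `name` plus `digest` shape, so each module could get a separate attestation. The grouping rule (first path component is the module) and the names `group_subjects` and `module_of` are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict


def module_of(path):
    """Hypothetical grouping rule: the first path component names the module."""
    return path.split("/", 1)[0]


def group_subjects(artifacts, group_of=module_of):
    """Return {group: [in-toto style subject, ...]} for {path: bytes} artifacts."""
    groups = defaultdict(list)
    for path in sorted(artifacts):
        digest = hashlib.sha256(artifacts[path]).hexdigest()
        groups[group_of(path)].append({"name": path, "digest": {"sha256": digest}})
    return dict(groups)


def raw_digest(subject):
    """Convert the hex digest to raw bytes, as a HashOutput-style field expects."""
    return bytes.fromhex(subject["digest"]["sha256"])
```

Each value in the returned map is small enough to attest on its own, and a top-level attestation over all subjects remains possible by concatenating the lists.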
we're trying to make it easier for the caller to call the builder without the need to format their input in JSON / protobuf / etc. But we can support other hashes if need be, by providing a
Thanks for bringing this up; we want to be sure we support the options users need. Technically this should be doable. Several options I'd like to explore and get your feedback on:
@ianlewis this is also relevant for our generic generator |
I think this is doable. Though some granularity in the data might be lost if you want to capture the build steps in the provenance and know what all the outputs of the build process are. Usually that's not a primary concern though.
For the reasons you mentioned this is usually my preference. In the absence of some discovery mechanism that doesn't exist yet, it's easiest to look for |
I have a similar usecase for a Gradle project that includes several artifacts as part of a single release. I find it useful to both have a top level provenance that is published as a GitHub release asset with multiple subjects (however that could grow very large and hence this issue #845) and separate provenances like |
I wonder whether it would cause confusion that the provenance uploaded to Maven central is different from the one in GitHub release. Would users not expect to see the top-level provenance as well on Maven central? |
@vlsi Do you have a link or something to the code or workflow that is used to generate releases like the one you described? |
As far as I understand the BYOB feature expects custom builders to run on a single working node. What if the build requires creating artifacts in different nodes (linux, osx, windows)? Is there a way to let the build part of the BYOB be run on separate nodes, collect artifacts, and continue with attestation? |
We have not yet implemented support for different runners because nobody asked (and to start simple), but that would be possible. I think the way it'd work is that you'd call the BYOB multiple times, once for each runner you want. Would that work? Has anyone asked for support for runners besides Linux? |
Well, invoking jreleaser multiple times is doable, but there's the problem of collating all artifacts and perhaps updating an existing release. It definitely gets trickier this way, compared to building on separate workers, collecting all artifacts in a single worker, and performing a release. |
How about the following: expose to users a runner list. The jreleaser workflow then calls BYOB in different jobs (uses |
That'd be great, but I don't think it's feasible with the current impl of the jreleaser/java-builder. As far as I understand it, this step informs the SLSA delegator where to find the BYOB action. Then this step executes the builder action inside a worker node set up by the delegator. During this step, artifacts are built and released, then attested. Wouldn't setting different worker nodes outside of the delegator be considered a possible break in trust? |
If jreleaser workflow calls BYOB for each runner, it could pass a different Action path if needed. Essentially you would have 3 setup calls, and 3 delegator_generic_slsa3 calls. Note: you should update to using v1.10.0. There was a breaking change in Sigstore a few weeks ago and we released a new version https://github.com/slsa-framework/slsa-github-generator/blob/v1.10.0/CHANGELOG.md#v1100
Correct.
I don't think it would. The user still needs to trust the jreleaser workflow, which is considered the builder. BYOB is merely a framework to help write your own builders. |
Then the delegator would have to accept a parameter specifying which OS should be used by the worker, right? At the moment the worker is explicitly set to Linux: slsa-github-generator/.github/workflows/delegator_generic_slsa3.yml, lines 129 to 135 in 4534a0b
It would have to be parameterized to support all managed runners, with Windows being problematic because of scripts and macOS because of Docker (for some builds, I suppose).
I'll do so shortly. |
Yes, you're correct. We would add support for other runners. Let me know if this is something people have asked for. |
it may be a useful component for others to create provenance with the same format across GH builders.
See sigstore/fulcio#754 (comment)