Standardizing CI OIDC token claims #754

Closed
haydentherapper opened this issue Aug 23, 2022 · 35 comments · Fixed by #945 or #1073
Labels: enhancement, npm-ga

Comments

@haydentherapper
Contributor

haydentherapper commented Aug 23, 2022

Goal

Create a standard set of claims that should be present in OIDC tokens from CI systems such as GitHub Actions, Cirrus CI, GitLab, Circle CI, etc.

Background

As noted in the NPM RFC for integrating with Sigstore, and as documented in other tickets (#243, #591, #748), there is interest in support for other CI systems. It is technically possible to implement support for each, but it will require code duplication and work for onboarding every CI platform. It would be ideal if all OIDC tokens from all CI systems had a standard set of claims to represent identity, so that onboarding would simply be updating configuration.

Current state

All of the above platforms either are working on or currently produce OIDC tokens for CI workflows. Fulcio currently only accepts CI tokens from GitHub Actions, and has hardcoded the GitHub specific claim values and produces a code signing certificate with GitHub specific OID values.

Currently expected claims (GitHub ref)

  • job_workflow_ref
  • sha
  • event_name
  • repository
  • workflow
  • ref
  • aud (which must be set to sigstore)
  • exp

sha, event_name, repository, workflow, and ref are included in issued certificates in custom OIDs - https://github.com/sigstore/fulcio/blob/main/docs/oid-info.md.

Required claims

The token should include standard OIDC claims like:

  • aud (which must be customizable and set to sigstore)
  • sub
  • iss
  • exp
  • iat
  • nbf

We should include the claims specified in "Currently expected claims".

There was conversation in #624 about including the run ID (run_id), run count (run_number) and attempt count (run_attempt). We should decide if these should be required for Fulcio certificates.

Another useful claim may be actor, who triggered the CI run.

Any claim values must be immutable. For example, user IDs should be used instead of usernames, and repository IDs should be used instead of repository names, to prevent resurrection attacks.
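
A minimal sketch of the presence, audience, and time checks implied by the lists above, assuming the token signature has already been verified and the payload decoded into a dictionary. The function name `find_claim_problems` is hypothetical and is not Fulcio's API; only the claim names come from this issue.

```python
import time

# Standard OIDC claims listed under "Required claims".
STANDARD_CLAIMS = ("aud", "sub", "iss", "exp", "iat", "nbf")
# GitHub-specific claims listed under "Currently expected claims".
CI_CLAIMS = ("job_workflow_ref", "sha", "event_name", "repository", "workflow", "ref")


def find_claim_problems(claims, now=None, audience="sigstore"):
    """Return a list of problems found in an already-verified, decoded payload.

    An empty list means every check in this sketch passed.
    """
    now = time.time() if now is None else now
    problems = [
        f"missing claim: {name}"
        for name in STANDARD_CLAIMS + CI_CLAIMS
        if name not in claims
    ]
    # In a JWT, "aud" may be a single string or a list of strings.
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    if "aud" in claims and audience not in audiences:
        problems.append(f"aud does not include {audience!r}")
    if "exp" in claims and now >= claims["exp"]:
        problems.append("token has expired")
    if "nbf" in claims and now < claims["nbf"]:
        problems.append("token is not valid yet")
    return problems
```

Returning a list rather than failing on the first problem is just a convenience for reporting; a real issuer check would likely reject on the first failure.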

cc @asraa @laurentsimon @znewman01 @fkorotkov @feelepxyz, what would you like to see in a token and do you have recommendations on claim names?

@haydentherapper haydentherapper added the enhancement New feature or request label Aug 23, 2022
@wlynch
Member

wlynch commented Aug 23, 2022

Another useful data point is Tekton (which also drives other tools like JenkinsX), which would be limited by the Kubernetes JWT workload claims. (I assume this would also affect prow and other Kubernetes-based CI.)

@asraa
Contributor

asraa commented Aug 24, 2022

job_workflow_ref may be unnecessary if we can construct it from other claim values like workflow and ref.

I don't think that's correct. In your link, the workflow, ref, and other attributes describe the caller workflow, while job_workflow_ref refers to the called workflow. These are different, and we rely on them for SLSA 3 builders to demonstrate the identity of the trusted builder, the called workflow, which is distinct from the caller.

@asraa
Contributor

asraa commented Aug 24, 2022

There was conversation in #624 about including the run ID (run_id), run count (run_number) and attempt count (run_attempt). We should decide if these should be required for Fulcio certificates.

I'm actually on the side that it should not: these values can easily be added inside a signed attestation -- this is very much like recreating provenance inside a signing certificate. See the related issue slsa-framework/slsa#464. The signing cert contains enough builder information that "You could think of the x509 builder as a first-stage builder, which is limited but sets the "root of trust"." We definitely don't NEED to include all the provenance inside the certificate. @laurentsimon

Another useful claim may be actor, who triggered the CI run.

Again, I think this starts turning into provenance metadata.

At minimum the signing cert should contain just the necessary info to identify the workflow: including the caller and called workflow and its commit SHA.

For example, user IDs should be used instead of usernames, and repository IDs should be used instead of repository names, to prevent resurrection attacks.

BIG +1! GitHub does expose repository IDs. Although, think of the verification side: it is much harder for humans to verify the cert fields to see if a signature came from a repository when it is a repository ID. Again, that can go in provenance info (EDIT: maybe not? since the repository might be an untrusted resurrected one)

@haydentherapper
Contributor Author

These are different, and we rely on them for SLSA 3 builders to demonstrate the identity of the trusted builder, the called workflow, which is distinct from the caller.

Thanks for noting this, I've removed this from the issue description so it's now required.

I'm actually on the side that it should not:

I am in agreement, I believe Laurent as well from the discussion.

think of the verification side: it is much harder for humans to verify the cert fields to see if a signature came from a repository

I think we should be building verification policies around IDs and not human-readable values. I do agree it's harder for a human to validate it though, but I think this can be solved with better UX.

@bdehamer

Related to the job_workflow_ref and workflow claims in tokens issued by GitHub Actions...

The job_workflow_ref claim provides the full path to the called workflow:

slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@refs/heads/main

Whereas the workflow claim only identifies the logical name of the calling workflow:

build

This name isn't guaranteed to be unique across the workflows defined in a particular repo so this isn't particularly useful in identifying the calling workflow.
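
To make the shape of the job_workflow_ref value concrete, here is a small sketch that splits it into owner, repository, workflow path, and ref. The `WorkflowRef` type and `parse_job_workflow_ref` function are hypothetical names, and the sketch assumes the first "@" separates the workflow path from the ref.

```python
from typing import NamedTuple


class WorkflowRef(NamedTuple):
    owner: str
    repo: str
    path: str
    ref: str


def parse_job_workflow_ref(value):
    """Split "<owner>/<repo>/<path>@<ref>" into its four parts."""
    # Assumption: the workflow path itself contains no "@".
    location, sep, ref = value.partition("@")
    if not sep or not ref:
        raise ValueError("expected an '@<ref>' suffix")
    parts = location.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected '<owner>/<repo>/<path>' before '@'")
    owner, repo, path = parts
    return WorkflowRef(owner, repo, path, ref)
```

Note that every part recovered this way (owner name, repository name, branch ref) is a mutable, human-readable value, which is the concern raised elsewhere in this thread.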

@laurentsimon

laurentsimon commented Aug 25, 2022

I think we should be building verification policies around IDs and not human-readable values. I do agree it's harder for a human to validate it though, but I think this can be solved with better UX.

Maybe a broader question is whether you see Sigstore as a building block / "root of trust" for "richer" systems (re-usable workflows and similar systems) or not. If you consider Sigstore the fundamental building block / enabler / RoT, then you may not need to keep adding more fields in the OIDC token / cert.

+1 on agreeing on the set of minimum claims, like workflow(s), ref(s) and repo. I don't know enough about other non-GitHub CIs to say anything about actors and other pieces. On GitHub it's part of the GH context and can be retrieved by the "richer" builder, so is not necessarily needed. Not sure about other CIs

If you consider Sigstore a standalone solution for signing only (without richer trusted builders), then additional fields may be something to consider. Maybe it's a product decision...? Certificates are not the most human-friendly to work with, so it may limit the usability of a solution; whereas adding fields into a JSON-formatted provenance seems easier? Trusted builders can more easily incorporate changes over time (e.g., additional fields, new features), so maybe something to keep in mind as well.

@znewman01
Contributor

At this point, I think we're in agreement on the following:

  1. The top priority is: we should require enough information to uniquely identify the workflow that was run.

    That is (provided nothing has been deleted), I would be able to go fetch from the CI system enough complete information to understand what the workflow did, and consequently what any artifacts that validate against this certificate are.

    Ideally, I can even be convinced about what workflow ran without needing a round-trip to the CI system.

    In order to uniquely identify the workflow run, we need immutable identifiers for everything.

  2. We may or may not want additional information, including human-readable versions of IDs or identifying the runner.

Questions

I'd bet many of these have easy answers and I just don't have enough context/background to know them.

job_workflow_ref immutable identifiers

@bdehamer, the example of a job_workflow_ref you gave seems to support organization names, repo names, and tags/branches, all of which can change over time. Is there a way to get the analog of that job_workflow_ref with organization IDs, repo IDs, and digests?

How will verification work?

I think in most cases, we're trying to check the claim "this artifact was produced by workflow X from repository Y at commit Z," where typically X is somehow a trusted workflow.
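
As a rough illustration of that check, the sketch below compares three identity fields against a policy. It assumes the certificate's identity fields have already been extracted into a dictionary after signature and chain verification; the `Policy` type, the `satisfies` function, and the choice of `job_workflow_ref`, `repository_id`, and `sha` as the three fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    trusted_workflow: str  # X: the workflow the verifier trusts
    repository_id: str     # Y: immutable ID of the source repository
    commit: str            # Z: commit the artifact should be built from


def satisfies(policy, identity):
    """Return True only if all three identity fields match the policy exactly."""
    return (
        identity.get("job_workflow_ref") == policy.trusted_workflow
        and identity.get("repository_id") == policy.repository_id
        and identity.get("sha") == policy.commit
    )
```

Exact matching on immutable identifiers keeps the policy simple; anything fuzzier (prefix matches on names, for example) reintroduces the renaming problems discussed above.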

@asraa and @laurentsimon's blog post about SLSA 3 on GitHub Actions is really helpful for understanding how these things will be used.

In general, I think we should have sophisticated tools for verification; manually checking all these things makes my head spin, but one-time engineering work to convince me "these packages were built from their source on GitHub using a standard Javascript builder" feels reasonable. I guess I'm in the @haydentherapper camp (or the "building block / root of trust" camp). I'd need to see use cases for the "signing only" camp that involve GitHub OIDC workflow identities.

Do we need to identify anything beyond the workflow?

Candidates:

Arguments for:

  • Convenience: easier to interpret what a cert means
  • It might be really easy to accidentally trust the provenance information from an unvetted workflow, which could lie about these things. If we trained verifiers to get this information from the cert, we wouldn't have that problem (though we would have it for everything else).

Arguments against:

  • It's redundant. That information is available to any party (e.g., workflow) that has access to the key material. They could lie about it, but you can identify/audit such workflows and determine whether to trust provenance coming from them.
  • Privacy: you may not want names, or the identity of the person who kicked off an action, in a CT log

My take: I think I'm convinced by @asraa and @laurentsimon here: minimal is better, and all of this is available elsewhere. Maybe with an exception for "what runner did this use?" since that seems like it could invalidate the provenance.

What is the minimal set of claims to uniquely identify a workflow on GitHub?

Maybe (see GitHub OIDC docs):

  • repository_id: uniquely identifies a repository (do we need repository_owner_id too? can you fetch a repository based only on its ID?)
  • sha: with repository_id, this uniquely identifies a source tree
  • workflow: upthread, it's pointed out that this may not be unique within a repo; what should we use instead?
  • job_workflow_ref (but with IDs): to identify a called reusable workflow

I worry that the "reusable workflow" concept is a little GitHub-specific, and we might want one field that combines workflow and job_workflow_ref.

What is the minimal set of claims to uniquely identify a workflow on other CI systems?

Need to do some homework here.

To what extent is it important that the fields are standard?

I'd argue that any verifier needs to understand each CI system in order to properly understand how to interpret its repository IDs, workflow identifiers, etc., so my answer is "not very important."

It's definitely "nice" to have similar concepts represented by the same fields. However, it could be dangerous: maybe there are different types of hashes used by different systems, and you could confuse them? I think most scenarios in which this is a problem are pretty contrived.

Do we require changes to the OIDC tokens we're getting to meet any of our goals?

If so, we should probably request those ASAP.

How can we improve ergonomics and trust for maintainers and verifiers?

It feels nice to be able to have "standard" provenance fields that maintainers can easily incorporate but verifiers can still trust. It's a little out-of-scope for this issue, but I think if we have a satisfying solution here that means that it's pretty hard to argue for more than a "minimal" claim set. (That said, we may be able to build a solution for some CI systems but not others.)

https://github.com/slsa-framework/slsa-github-generator gets so close in my mind. The missing element is composability: right now, I can:

  1. Use a special, build-process-specific combined SLSA provenance generator/builder (e.g., the Go builder).

    This tightly couples the provenance generator and builder in a way I don't like: if I need an update to my builder workflow, this could affect provenance generation, and any changes that could affect provenance generation are very high-consequence. An attack could turn a routine update (that, if it only affected the builder, could at worst lead to bad artifacts) into a break-everything forgery (I can provide provenance with arbitrary data).

    Plus, now there's one provenance generator for each ecosystem that needs to be audited. Even if we're reusing components, there's not an easy way to check "my provenance came from a trusted generator, even if the builder is bad" without going into the source of the builder and parsing the workflow.

  2. Use a provenance-only generator.

    I much prefer this from a security standpoint. However, anytime I mess with the calling workflow, I have the ability to change what artifacts are provided.

I would really like to say "use standard provenance generator @ commit X and standard node.js builder @ commit Y, together". Then, I would know that no matter what the source of the calling repository was, my artifact came from node build on that source.

Conclusion

I think it'll be much easier to decide what fields to use once we've answered the above questions for several candidate CI systems. Then, I think we should proceed with a minimal set of claims, and let users come to us with use cases that they don't meet.

Much of the above is a little bit off-topic, and I'm happy to table any discussions about those and pick them up elsewhere/later.

@laurentsimon

laurentsimon commented Aug 25, 2022

job_workflow_ref immutable identifiers

@bdehamer, the example of a job_workflow_ref you gave seems to support organization names, repo names, and tags/branches, all of which can change over time. Is there a way to get the analog of that job_workflow_ref with organization IDs, repo IDs, and digests?

+1 on having them, and asking GH to support it including for re-usable workflows if it's not available yet.

Do we need to identify anything beyond the workflow?

Candidates:

self-hosted runners have access to OIDC. You need a round-trip to verify this unless it's added into OIDC token (we asked GH to do that, so it may happen in the future). One additional complexity is that it's possible for a workflow to declare jobs self-hosted and others not.
Note: the trusted builder can hardcode it (we do that in our builders).

https://github.com/slsa-framework/slsa-github-generator gets so close in my mind. The missing element is composability: right now, I can:

  1. Use a special, build-process-specific combined SLSA provenance generator/builder (e.g, the Go builder).
    This tightly couples the provenance generator and builder in a way I don't like: if I need an update to my builder workflow, this could affect provenance generation, and any changes that could affect provenance generation are very high-consequence. An attack could turn a routine update (that, if it only affected the builder, could at worst lead to bad artifacts) into a break-everything forgery (I can provide provenance with arbitrary data).

I don't entirely follow. At least in our case, the build and the provenance generation are separate jobs. The format remains the same, and only the buildConfig / builder.id change across builders. Agreed that if the code that's responsible for populating the buildConfig can be hijacked, it could forge the steps. But this code is part of the TCB, IIUC.

Maybe you're proposing having a dedicated project for provenance generation only? We kind of have this in the generator repo. We don't expose it and only use it internally, though. We could, in theory, expose it through a GitHub Action.

Let me know if I misunderstood the comment.

Plus, now there's one provenance generator for each ecosystem that needs to be audited. Even if we're reusing components, there's not an easy way to check "my provenance came from a trusted generator, even if the builder is bad" without going into the source of the builder and parsing the workflow.

I think the plan is to share the provenance generation code with other builders for a given CI. On GitHub, we could theoretically create an Action for this. /cc @ianlewis

@znewman01
Contributor

self-hosted runners have access to OIDC. You need a round-trip to verify this unless it's added into OIDC token (we asked GH to do that, so it may happen in the future). One additional complexity is that it's possible for a workflow to declare jobs self-hosted and others not.
Note: the trusted builder can hardcode it (we do that in our builders).

TY! That helps.

Maybe you're proposing having a dedicated project for provenance generation only? We kind of have this in the generator repo. We don't expose it and only use it internally, though. We could, in theory, expose it through a GitHub Action.

Let's move this conversation over to slsa-framework/slsa-github-generator#763; apologies for the distraction from the root issue in this thread 😄

@bdehamer

Is there a way to get the analog of that job_workflow_ref with organization IDs, repo IDs, and digests?

I don't know, but I'll try and track down the team here responsible for this stuff and make some inquiries.

can you fetch a repository based only on its ID

Yeah, there's a GET /repositories/:id endpoint that will look-up a repo based solely on its ID (and the ID persists across renames and ownership changes)
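
For reference, a sketch of how a verifier might build that lookup request using only the Python standard library. The function name `repository_by_id_request` is hypothetical, the request is constructed but not sent, and the use of the public `https://api.github.com` base URL (rather than, say, a GitHub Enterprise host) is an assumption.

```python
import urllib.request

# Assumption: the public GitHub REST API host.
GITHUB_API = "https://api.github.com"


def repository_by_id_request(repo_id):
    """Build (but do not send) a GET /repositories/:id request."""
    numeric_id = int(repo_id)
    if numeric_id <= 0:
        raise ValueError("repository IDs are positive integers")
    return urllib.request.Request(
        f"{GITHUB_API}/repositories/{numeric_id}",
        headers={"Accept": "application/vnd.github+json"},
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the actual round-trip; a verifier could then compare the repository name in the response with any human-readable name shown to the user.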

@trevrosen

I'd like to jumpstart some movement on this issue if possible, as we regard it as pretty important for our work on npm attestations, especially now that we have begun to reach out to some potential launch partners (read: cloud CI vendors with existing OIDC support) to talk about integration on their own platforms.

Additionally we have some commitments from the Actions team to extend the OIDC token with the types of fields discussed in this thread (though we may need to get some further alignment there). If we can get crisp on some non-GitHub nomenclature for the cert fields, I feel like we're a long way toward settling this. Is anyone taking a stab at some generic naming notions? Should we try to chat in Sigstore Slack about a plan for settling this into a PR?

@haydentherapper
Contributor Author

Let's get a chat going either on Slack or here; there hasn't been any progress.

This was referenced Nov 16, 2022
@asraa
Contributor

asraa commented Nov 17, 2022

Chiming in to describe some updates after we've had some conversations. I think some of this echoes @znewman01's discussion earlier.

We MUST have the certificate to identify (with immutable references) the smallest "trust domain" relevant for client verification. So for GitHub we MUST have:

  • An immutable reference to the reusable workflow (repository ID, workflow, SHA)
  • An immutable reference to the caller workflow (repository ID, workflow, SHA). Although I think this can be accessed by the reusable workflow and so can be placed in the provenance (for SLSA GH builders we verify the source from here AND the provenance), this would fit nicely into OTHER CI/CD platforms where a project/org (caller workflow) kicks off a job (reusable workflow).

Stuff I think we can punt:

  1. The run. This can be part of the provenance. Once the "trust domain" (reusable workflow/whatever) is established, that can live in the signed content.
  2. Human readable workflow/repository/owner: get these from the provenance instead.

Stuff I'm not sure of:

  • The runner info: I think it should be included.
  • The actor: this feels like provenance information.
  • The caller workflow immutable reference. See above.

If we do something like we MUST have the reusable workflow immutable ref AND the caller immutable ref, then this lines up with the pattern for BuildKite #890 where the reusable workflow is the job_id and the caller immutable ref becomes the organization/pipeline slug. @sj26

@feelepxyz
Member

@asraa the GH Actions team have just added some new claims to the ID token:

  • job_workflow_sha: sha of the reusable workflow if one is used, otherwise will be the sha of the parent/triggering workflow, which can be from a different branch to the source repo/materials (is this version in the draft slsa v1 spec?)
  • workflow_ref: Similar to job_workflow_ref but always points to the trigger workflow path (aka "entryPoint"), instead of the reusable workflow if one is used. This should replace use of the workflow claim that just points to the name of the triggering workflow.
  • workflow_sha: Similar to job_workflow_sha but always points to the triggering workflow SHA (so could maybe be attached to entryPoint to make this reference immutable)

We MUST have the certificate to identify (with immutable references) the smallest "trust domain" relevant for client verification.

This makes sense for trusted builders, which is the north star. I wanted to raise a use-case for npm where it might take a very long time for us to effectively roll out trusted builders in the npm ecosystem given the varied nature of publish workflows in the wild. The majority of existing automated npm publish workflows I've investigated would be hard to support for a trusted builder without a lot of different runtimes and config options.

Until we get to a place where most projects end up using trusted builders, we could definitely use more information in the Fulcio cert to be able to validate that key pieces of the provenance statement have not been falsified.

This might be a bit of an anti-pattern given the preference for trusted builders to solve this problem. But if we had the repo URL, commit SHA, triggering workflow path and SHA, and/or reusable workflow path and SHA, we could compare these values in the Fulcio cert against what's in the provenance statement before accepting the package for publishing.

Ideally we could access the following GitHub OIDC claims in the Fulcio cert:

  • Path to the reusable workflow (if one is used) including branch ref: job_workflow_ref
  • Reusable workflow SHA: job_workflow_sha (could this be combined with above in the SAN like ${job_workflow_ref}#${job_workflow_sha}?)
  • Path to the triggering workflow including branch ref: workflow_ref
  • Triggering workflow SHA: workflow_sha (could this be combined with workflow_ref like ${workflow_ref}#${workflow_sha}?)
  • These could be combined into something like: ${repo}@${ref}#${sha}
    • Commit: sha
    • Repo ref: ref
    • Repo name: repo

Another thought, would it make sense to adopt "SLSA" naming for these attributes in the signing cert?

  • EntryPointURI: workflow_ref
  • EntryPointDigest: workflow_sha
  • ConfigSourceURI: job_workflow_ref
  • ConfigSourceDigest: job_workflow_sha
  • SourceURI: repo@ref
  • SourceDigest: sha
  • Nice to have: InvocationId: ${run_number}-${run_attempt}
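
The proposed mapping can be sketched as a small function over a decoded claims dictionary. This is only an illustration of the naming idea above; `slsa_style_fields` is a hypothetical helper, and using the `repository` claim for the "repo" part of SourceURI is an assumption.

```python
def slsa_style_fields(claims):
    """Map GitHub OIDC claims to the SLSA-style names proposed in this comment."""
    fields = {
        "EntryPointURI": claims["workflow_ref"],
        "EntryPointDigest": claims["workflow_sha"],
        "ConfigSourceURI": claims["job_workflow_ref"],
        "ConfigSourceDigest": claims["job_workflow_sha"],
        # Assumption: "repo" in the list above is the "repository" claim.
        "SourceURI": f'{claims["repository"]}@{claims["ref"]}',
        "SourceDigest": claims["sha"],
    }
    # InvocationId is listed as a nice-to-have, so it is only set when
    # both of its inputs are present.
    if "run_number" in claims and "run_attempt" in claims:
        fields["InvocationId"] = f'{claims["run_number"]}-{claims["run_attempt"]}'
    return fields
```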

It might seem redundant to include workflow_sha and sha as in GitHub's case they are almost always identical, but there's at least one case where they are not the same when using the pull_request_target event.

@laurentsimon

laurentsimon commented Nov 18, 2022

@asraa the GH Actions team have just added some new claims to the ID token:

  • job_workflow_sha: sha of the reusable workflow if one is used, otherwise will be the sha of the parent/triggering workflow, which can be from a different branch to the source repo/materials (is this version in the draft slsa v1 spec?)

This information should always be present for both the job_workflow and the triggering workflow, even if the caller refers to it by tag / branch. The GitHub context (not OIDC) provides this information for the repository. It's as important that the OIDC token provide this for the workflow / builder as well: sha, ref, ref_type should always be present.

Another thought, would it make sense to adopt "SLSA" naming for these attributes in the signing cert?

  • EntryPointURI: workflow_ref
  • EntryPointDigest: workflow_sha
  • ConfigSourceURI: job_workflow_ref
  • ConfigSourceDigest: job_workflow_sha
  • SourceURI: repo@ref
  • SourceDigest: sha
  • Nice to have: InvocationId: ${run_number}-${run_attempt}

Let's think carefully about making the OIDC format dependent on the SLSA (evolving) specs. In v1.0, for example, entryPoint no longer exists. In general, if we only care about the identity, either the job_workflow_* or the workflow_* information is what really needs to be part of the OIDC claims. I am not sure the distinction actually matters between the two. In the case of a workflow, the GitHub runner attests (through OIDC) that it runs a workflow:job, which is the identity. In the case of a reusable workflow, the runner attests again to a (reusable) workflow. A single claim could take care of this, since a reusable workflow has a different path than a (traditional / triggering) workflow, so the verifier can infer it. To be more generic, you could have a running-identity-name and a running-identity-type claim: this allows other identity providers to express their identity more flexibly. The reusable workflow can get the triggering information from the GH context. Otherwise, there is no argument that 2 identities is the right number (workflow and job_workflow), and someone may want to have the complete list of nested reusable workflows (4 can be called from one another).

@steiza
Member

steiza commented Nov 18, 2022

I have another claim to suggest, unrelated to the (great) conversation above about build instructions and references.

Some CI/CD providers allow you to either run your build on their cloud-hosted infrastructure, or let the customer host their own runner infrastructure. In the npm registry, we want to differentiate between builds that ran on cloud-hosted or customer-hosted infrastructure. We think it makes sense to include this claim alongside the other information being securely communicated from the CI/CD system to Fulcio (and then downstream to npm and other package managers).

@laurentsimon

runner information makes sense to include, I think, since it's part of the running-identity and identifies the trust boundary.

@sj26
Contributor

sj26 commented Nov 21, 2022

Hiya! I'm Sam from Buildkite. We're introducing OIDC tokens, and I'm keen to see if we can enable usage of cosign for signing and verifying provenance of containers produced by CI/CD builds.

We include these already:

  • aud (can be set to sigstore)
  • sub (various attributes composed together identifying the pipeline and some build inputs)
  • iss (https://agent.buildkite.com, because tokens are issued by our agent api to agents)
  • exp
  • iat
  • nbf

We include some equivalents to these:

  • sha is build_commit, but may be a user-supplied value for manually triggered builds, or HEAD for a new build until resolved
  • ref is a combination of build_branch and build_tag

We do not include these:

  • job_workflow_ref — the closest might be a reference to the containing pipeline, like https://buildkite.com/buildkite/lifecycled, everything else is dynamic
  • event_name — no current equivalent, although we do record whether a build was started manually, by webhook, etc
  • repository, workflow — these are roughly the same thing for us, each org (account) has many pipelines which contain many builds (or "runs" in GHA parlance), e.g. https://buildkite.com/buildkite/lifecycled

We add these, which I think are important:

  • job_id - a unique id for a particular task within a build run as a concrete process somewhere, GitHub also uses job id I think.
  • agent_id - a unique id for the persistent environment in which many jobs may be run

The job_id feels particularly important for provenance because, as much as we'd like CI/CD to be a pure function of a few inputs, it's actually a complicated mess of context with network access, which can only be completely captured by a reference to the actual task and environment (the job and agent in our case).

In terms of things a user might like to verify, I expect the most would be the pipeline (or workflow) which produced an image, and the source branch or tag (ref) which was used. These feel like good generic attributes.

"Git Commit" and "Git Ref", for example, could be good generic names for the current GitHub attributes "sha" and "ref". "Git Repository" also feels like a good generic attribute, although I would suggest it be a URI to be useful across CI providers instead of a simple org/repo reference.

I don't know a good generic name for pipelines or workflows, the container of many runs of a particular CI/CD workflow, but it's closest to job_workflow_ref. Every provider uses different terminology. In GitHub it's a combo of the repository and a workflow file location at a ref. In GitLab it's the repository's CI/CD pipelines section; they use "pipeline" to mean one invocation of CI/CD in a repository, which we call a "build" and GitHub calls a "run". AWS CodePipeline uses "pipeline" to mean the container of all invocations, and "pipeline execution" to mean one run of a pipeline with many pieces. GitHub has multiple workflows per repository, GitLab uses the repository directly as the CI/CD container, and Buildkite and AWS CodePipeline pipelines live outside and separately from the repository, and the repository can change over time, so the repository alone is also not quite right. I would choose "Pipeline URI", but perhaps that's my bias for our domain language.

There is no standard for CI/CD provider OIDC tokens to my knowledge, and I'm not aware of any drive to standardise at the moment. The domain models vary significantly, too. I suspect normalizing the claims into useful attributes for verifying will need to remain in Fulcio for now. But perhaps there are some common attributes which will emerge and influence the claims generated in future, like GitHub's tokens.

If I had to pick a set of common attributes which would be useful in sigstore right now, it'd be roughly:

  • Git Repository URI
    • Git Ref
    • Git Commit
  • Pipeline URI (container of all work across all time, like job_workflow_ref)
    • Job ID/URI (concrete description of individual piece of work)
    • Runner ID/URI (concrete environment in which work was run)
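
One way to picture "normalizing the claims" is a per-provider table that maps generic attribute names to provider claim names, so that onboarding a provider is mostly configuration. The sketch below is hypothetical: the attribute names, the `normalize` helper, and the choice to prefer `build_tag` over `build_branch` for the Buildkite ref are all assumptions; only the claim names come from this thread.

```python
# Each generic attribute lists the provider claims to try, in order.
GITHUB_MAPPING = {
    "git_commit": ("sha",),
    "git_ref": ("ref",),
    "pipeline": ("job_workflow_ref",),
}
BUILDKITE_MAPPING = {
    "git_commit": ("build_commit",),
    # Assumption: prefer the tag when both a tag and a branch are present.
    "git_ref": ("build_tag", "build_branch"),
    "job": ("job_id",),
    "runner": ("agent_id",),
}


def normalize(claims, mapping):
    """Translate provider-specific claims into generic attributes."""
    attributes = {}
    for attribute, candidates in mapping.items():
        for name in candidates:
            value = claims.get(name)
            if value:  # skip absent or empty claims
                attributes[attribute] = value
                break
    return attributes
```

A table like this only renames fields; it does not make the values comparable across providers, so a verifier still needs to know which issuer produced them.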

That's a whole bunch of thoughts and opinions, I'm not sure how much of it is useful, but hopefully a bit 🙏

@laurentsimon

laurentsimon commented Nov 22, 2022

Hiya! I'm Sam from Buildkite. We're introducing OIDC tokens, and I'm keen to see if we can enable usage of cosign for signing and verifying provenance of containers produced by CI/CD builds.

We include these already:

  • aud (can be set to sigstore)
  • sub (various attributes composed together identifying the pipeline and some build inputs)

Do you have an example of what this looks like? I'm curious why you need the build inputs to be part of the token. If your builder can be identified using sub, could the builder create the attestation and store the inputs in it (instead of packing everything in the certificate)? For interoperability, we're trying to use in-toto as the provenance format.

buildkite and aws codepipeline pipelines live outside and separately from the repository and the repository can change over time

You mean the source of the builder, not the source being built, correct? Do you have a link / example?

If I had to pick a set of common attributes which would be useful in sigstore right now, it'd be roughly:

  • Git Repository URI
    • Git Ref
    • Git Commit

I would add Git Ref Type, which indicates if the ref is a branch, tag, etc.

It may be useful to pack these fields into its own struct / field / x509 cert, and version it to allow for flexibility, like:

identity {
 version: 1
 <other-fields>
}
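One way to read the versioned-struct idea is sketched below in Go. The type name `Identity`, its field names, and the choice of JSON as the serialization are all assumptions made for illustration; nothing in this thread fixes a format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Identity is a hypothetical versioned container for the source attributes
// discussed above. The field names are illustrative, not a Fulcio format.
type Identity struct {
	Version       int    `json:"version"`
	RepositoryURI string `json:"repository_uri"`
	Ref           string `json:"ref"`
	RefType       string `json:"ref_type"`
	Commit        string `json:"commit"`
}

// ParseIdentity decodes an Identity and rejects versions this code does not know.
func ParseIdentity(data []byte) (Identity, error) {
	var id Identity
	if err := json.Unmarshal(data, &id); err != nil {
		return Identity{}, err
	}
	if id.Version != 1 {
		return Identity{}, fmt.Errorf("unsupported identity version %d", id.Version)
	}
	return id, nil
}

func main() {
	// All values here are placeholders.
	raw := []byte(`{"version":1,"repository_uri":"https://example.com/org/repo","ref":"refs/heads/main","ref_type":"branch","commit":"0123abcd"}`)
	id, err := ParseIdentity(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(id.Version, id.RepositoryURI, id.RefType)
}
```

Checking the version before trusting any other field is what lets the layout change later without old verifiers silently misreading new data.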

Fyi, I took a brief look at the SPIFFE ID (https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/), and they don't seem to have much more than what's proposed here (it's just spiffe://<trust-domain>/<path>).

@sj26
Contributor

sj26 commented Nov 22, 2022

Do you have an example of what this looks like?

aud defaults to https://buildkite.com/<org> but is customisable.

sub currently looks like:

organization:acme-in:pipeline:super-duper-app:ref:refs/heads/super-duper-feature:commit:abc123...:step:build

It's quite symbolic. That's because some consumers of OIDC tokens, e.g. AWS, only allow writing policies based on partial string matches against subjects. It also does not uniquely identify a piece of work. It's not ideal, but it's what is available.
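For consumers that can do more than string matching, the subject above could be split back into attributes. This is a minimal Go sketch, assuming the subject is strictly alternating keys and values and that no key or value contains a colon; that holds for the example here but is not a documented guarantee. `ParseSubject` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseSubject splits a colon-delimited subject of alternating keys and
// values into a map. Assumption: no key or value contains a colon.
func ParseSubject(sub string) (map[string]string, error) {
	parts := strings.Split(sub, ":")
	if len(parts)%2 != 0 {
		return nil, fmt.Errorf("subject has %d segments, want an even number", len(parts))
	}
	attrs := make(map[string]string, len(parts)/2)
	for i := 0; i < len(parts); i += 2 {
		attrs[parts[i]] = parts[i+1]
	}
	return attrs, nil
}

func main() {
	// The commit value is a placeholder; the example in the thread elides it.
	sub := "organization:acme-in:pipeline:super-duper-app:ref:refs/heads/super-duper-feature:commit:0123abcd:step:build"
	attrs, err := ParseSubject(sub)
	if err != nil {
		panic(err)
	}
	fmt.Println(attrs["pipeline"], attrs["ref"], attrs["step"])
}
```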

I'm curious why you need the build inputs to be part of the token. If your builder can be identified using sub, could the builder create the attestation and store the inputs in it? (instead of packing everything in the certificate?) For interoperability, we're trying to use intoto as the provenance format.

The builder could create and store attestations, but most consumers of tokens want to make decisions without round trips back to the builder. And then how does one authenticate back to the builder to ask for attestations? If you have a living identity token then maybe it makes sense to use that, but in a signature that token is gone.

Again, in the conversations I've been having, most folks want enough information baked into the tokens and/or signatures to make policy decisions without additional round trips or more external systems involved.

buildkite and aws codepipeline pipelines live outside and separately from the repository and the repository can change over time
You mean the source of the builder, not the source being built, correct? Do you have a link / example?

Hm, yes I think so. Presuming a "builder" means the same thing as a "pipeline" to both Buildkite and CodePipeline, a builder is generally configured with inputs for new builds, and one of those inputs is the source repository. But the source repository can be changed between builds. So the source repository for two builds run by the same builder may not be the same.

I would add Git Ref Type, which indicates if the ref is a branch, tag, etc.

If Git Ref is fully qualified this is already included, no? i.e. refs/heads/some-branch versus refs/tags/v1.2.3
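A small Go sketch of that point: given a fully qualified ref, the type can be derived from the prefix. `RefType` is a hypothetical helper, and it only distinguishes the two prefixes mentioned here.

```go
package main

import (
	"fmt"
	"strings"
)

// RefType reports whether a fully qualified Git ref names a branch or a tag.
// Any other ref, including a short name like "main", returns "unknown".
func RefType(ref string) string {
	switch {
	case strings.HasPrefix(ref, "refs/heads/"):
		return "branch"
	case strings.HasPrefix(ref, "refs/tags/"):
		return "tag"
	default:
		return "unknown"
	}
}

func main() {
	for _, ref := range []string{"refs/heads/some-branch", "refs/tags/v1.2.3", "main"} {
		fmt.Printf("%s -> %s\n", ref, RefType(ref))
	}
}
```

The "unknown" case is the catch: a provider that sends a short ref name loses this information, which is the situation a separate Git Ref Type field would cover.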

Fyi, I took a brief look at the SPIFFE ID (https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/), and they don't seem to have much more than what's proposed here (it's just spiffe://<trust-domain>/<path>).

Yeah, interesting. So that's almost pure identity without attributes, unless you understand the URI format for a particular trust domain. More complex policy decisions would need to start from the identity and consult other systems for more context. If you have control over the shape of the SPIFFE ID then it might be easier, but when using a hosted service where the format is dictated then its idea of the trust domain might vary from your own. For example, some workloads might care about branch, but some might not.

OIDC tokens seem powerful in contrast because complex policy decisions can be made based on the identity token and the contained attributes, including provenance information, without requiring additional interactions. And these can be varied by consumers without much support from providers (including providers of hosted services).

So I guess it depends on what degree of provenance information sigstore would like to include for policy decisions without additional system dependencies or provider control.

@steiza
Member

steiza commented Dec 13, 2022

So there's sort of two questions rolled into one here:

  1. What properties must cloud CI/CD providers send to Fulcio?
  2. Should / can OIDC token field names be standardized across cloud CI/CD providers? Or will Fulcio need to do some custom mapping work per cloud CI/CD provider?

It's critical that Fulcio work across many cloud CI/CD providers. But I'm not sure if we'll be able to get all providers to use the same field names. So to help answer question 1, I'm going to reference the current GitHub OIDC token field names for illustrative purposes, even though the goal of this issue (as I understand it!) is to ensure each cloud CI/CD provider is sending Fulcio the information in some field, which may (or may not) have a different name.

At any rate, here's an attempt to summarize where we're at so far.

First there's some standard OIDC fields:

| Field Description | GitHub OIDC Field Name | Why / Notes / Questions |
| --- | --- | --- |
| A user-customizable field set to “sigstore” | aud | To ensure the OIDC tokens are being used for their intended service |
| Who issued the token | iss | So we know which platform the token is from |
| Timestamp of when the token expires | exp | So a token cannot be used well after it was provisioned |
| Timestamp of when the token starts being valid | nbf | To set a start time before which the token cannot be used |
| Timestamp of when the token was issued | iat | To audit when the token was issued |

Then we get into the build attestations:

| Field Description | GitHub OIDC Field Name | GitHub OIDC Example Value | Why / Notes / Questions |
| --- | --- | --- | --- |
| Enough information to construct a URL to point to the source code | repository | org/repo | Should we ask providers to send a URL, instead of asking consumers to construct a URL from this field? |
| An immutable reference (i.e. a commit SHA) to a specific version of the source code | sha | 01234... | sub field might contain mutable references, and we want to know exactly what version we are using |
| The source code branch / tag name, for enforcing policies like "all releases must come from a certain branch" | ref | main | This is useful for writing policies, but could potentially be confusing if it points to a mutable release branch / tag |
| Enough information to construct a URL to point at the top-level / initiating build instructions | workflow_file_ref | org/repo/.../[email protected] | Similar to repository, should we ask providers to send a URL, instead of asking consumers to construct a URL from this field? Note that GitHub currently includes branch information in this field, which Fulcio might choose to ignore. Note that this is more precise and would replace the field workflow. |
| An immutable reference (i.e. a commit SHA) to a specific version of the top-level / initiating build instructions | workflow_file_sha | 01234... | |
| Enough information to construct a URL to point at low-level / specific build instructions that could be maintained by a neutral party like SLSA | job_workflow_ref | slsa-framework/repo/.../[email protected] | Similar to repository, should we ask providers to send a URL, instead of asking consumers to construct a URL from this field? Note that GitHub currently includes branch information in this field, which Fulcio might choose to ignore. |
| An immutable reference (i.e. a commit SHA) to a specific version of the low-level / specific build instructions | job_workflow_sha | 01234... | |
| To specify if a build took place in platform-hosted infrastructure or customer-hosted infrastructure | runner_environment | TBD | To distinguish the security properties of a build system where the customer can influence the environment (or not). We suggest values like "platform-hosted" or "self-hosted" for now, which could be extended in the future. |
| Was a build triggered by a human or an automatic process? | event_name | workflow_dispatch | This is part of the Fulcio certificate today, but is this something we care about? What are some good platform-neutral values for this field? |

feelepxyz added a commit to feelepxyz/fulcio that referenced this issue Jan 4, 2023
Proposal for standardizing Fulcio's Certificate Extensions to align with
the discussion on [standardizing OIDC token
claims](sigstore#754) across CI/CD
systems (today GitHub Actions, in future Circle,
GitLab, Buildkite etc).

The aim here is to find new platform agnostic extensions (as the current
ones are GitHub specific) that make sense across the different CI/CD
providers that we'd like to see supported in Fulcio.

I've preemptively moved the existing oid info doc to a deprecated
version with little thought into how this transition should happen in
practice. This should probably be scoped in a lot more detail.

Signed-off-by: Philip Harrison <[email protected]>
@feelepxyz
Member

👋 I opened a draft PR: #945 - attempting to standardize on the Fulcio cert extensions where these claims would end up. Let me know if this would be better suited in a new issue before starting on a PR but seemed easier to collaborate on an actual file.

@nsmith5
Contributor

nsmith5 commented Jan 4, 2023

I would love to see standardized claims in CI provider tokens, but is it reasonable for us to expect providers to actually try to become conformant with a standard created here? As an example, many CI providers failed to correctly implement the aud claim in their tokens. Even with a clear security incentive to make that claim configurable, it's taken a long time for many to fix the problem.

What would incentivize these various platforms to be compliant? What have they gained for their users if they do?

@haydentherapper
Contributor Author

The incentive is ease of integration, and a template for a minimum set of claims to represent an identity. We've had many discussions across issues in this repo about what represents an identity vs what represents provenance. Standardizing on a set of claims makes it clear what we consider to be an identity. Additionally, if a CI provider wants to integrate with Fulcio and has implemented the set of claims, it'll be easy not just for the Fulcio integration in terms of the code that needs to be added, but also for all of the clients that need to verify sigstore-issued certificates. If every CI has its own set of claims/OIDs, it'll be difficult to write verification policies across sigstore clients.

@feelepxyz
Member

I would love to see standardized claims in CI provider tokens

My thinking with #945 was to standardise on the Fulcio cert extensions that cover the identity. This would effectively standardise on a subset of required OIDC claims, but at the same time not require CI/CD providers to conform to the same claim attribute names. CI/CD specific mapping would still need to exist in Fulcio.
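To make the per-provider mapping concrete, here is a rough Go sketch, not Fulcio's actual code. `Extensions`, `ClaimMapper`, and `MapClaims` are hypothetical names; the GitHub claim names are the ones listed at the top of this issue, and building URLs by prefixing https://github.com/ is an assumption about how the values would be normalized.

```go
package main

import "fmt"

// Extensions is a hypothetical provider-neutral view of a CI identity.
type Extensions struct {
	SourceRepository string
	SourceCommit     string
	SourceRef        string
	BuildSigner      string
}

// ClaimMapper converts one provider's raw token claims into Extensions.
type ClaimMapper func(claims map[string]string) (Extensions, error)

// mapGitHub reads GitHub's claim names and fails if any is missing.
func mapGitHub(claims map[string]string) (Extensions, error) {
	for _, k := range []string{"repository", "sha", "ref", "job_workflow_ref"} {
		if claims[k] == "" {
			return Extensions{}, fmt.Errorf("missing claim %q", k)
		}
	}
	return Extensions{
		SourceRepository: "https://github.com/" + claims["repository"],
		SourceCommit:     claims["sha"],
		SourceRef:        claims["ref"],
		BuildSigner:      "https://github.com/" + claims["job_workflow_ref"],
	}, nil
}

// mappers is keyed by token issuer; onboarding a provider adds one entry.
var mappers = map[string]ClaimMapper{
	"https://token.actions.githubusercontent.com": mapGitHub,
}

// MapClaims selects the mapper for the issuer and applies it.
func MapClaims(issuer string, claims map[string]string) (Extensions, error) {
	m, ok := mappers[issuer]
	if !ok {
		return Extensions{}, fmt.Errorf("no mapper for issuer %q", issuer)
	}
	return m(claims)
}

func main() {
	ext, err := MapClaims("https://token.actions.githubusercontent.com", map[string]string{
		"repository":       "org/repo",
		"sha":              "0123abcd", // placeholder commit
		"ref":              "refs/heads/main",
		"job_workflow_ref": "org/repo/.github/workflows/build.yaml@refs/heads/main",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ext)
}
```

Verifiers would then only ever see the fields of `Extensions`, whatever the provider calls its claims.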

@nsmith5
Contributor

nsmith5 commented Jan 10, 2023

@feelepxyz +1 to standardizing the certificate extensions over the actual token claims. I feel like it's quite a bit easier for providers to marshal / parse existing token claims into the right cert extensions with a small amount of logic in Fulcio itself instead of requiring changes to their token format.

@sj26
Contributor

sj26 commented Jan 23, 2023

I don't think it's likely CI providers (us included) will change OIDC attributes to suit Sigstore, sorry. Those tokens have too many requirements on them already. But I reckon we'll be happy to provide the grunt to glue them together within sigstore/fulcio.

I'm pretty excited that #890 is close to merge. Beyond identifying which pipeline a binary comes from, we have customers asking for the ability to verify which git branch and commit a signed binary comes from, and which build and job (the specific run of a workflow) created a binary too. For example, being able to verify that a binary was produced by an earlier job in the same build, or using the job identity to seek domain-specific attestations via an API. Very keen to see some generalised attributes added. I'm happy to write the plumbing for Buildkite once a direction has been decided.

#945 looks pretty promising!

@haydentherapper
Contributor Author

Reopening during implementation.

I'm starting implementation on this now.

@haydentherapper haydentherapper self-assigned this Feb 23, 2023
@haydentherapper
Contributor Author

haydentherapper commented Mar 15, 2023

We've got a certificate!

-----BEGIN CERTIFICATE-----
MIIGVzCCBf2gAwIBAgIUBC0AN21K0mDArYsvFMITxLAqhIMwCgYIKoZIzj0EAwIw
aDEMMAoGA1UEBhMDVVNBMQswCQYDVQQIEwJXQTERMA8GA1UEBxMIS2lya2xhbmQx
FTATBgNVBAkTDDc2NyA2dGggU3QgUzEOMAwGA1UEERMFOTgwMzMxETAPBgNVBAoT
CHNpZ3N0b3JlMB4XDTIzMDMxNTIyNDM0NFoXDTIzMDMxNTIyNTM0NFowADBZMBMG
ByqGSM49AgEGCCqGSM49AwEHA0IABN2JaEWm3pvFf5SNN6T/c9AV6GPEQYt+C+qK
67CnRSIJYpMJ6UoFMoaCOIhWlXjBTYqDtt4r85PnC4nJtLx0x+SjggTrMIIE5zAO
BgNVHQ8BAf8EBAMCB4AwEwYDVR0lBAwwCgYIKwYBBQUHAwMwHQYDVR0OBBYEFK17
43RedvZXbIiYWZb8W9oTwwPmMB8GA1UdIwQYMBaAFPlHHwJ/gqhBtZ0dAlWvBMDN
Tv3sMGwGA1UdEQEB/wRiMGCGXmh0dHBzOi8vZ2l0aHViLmNvbS9oYXlkZW50aGVy
YXBwZXIvdGVzdC1yZXBvc2l0b3J5Ly5naXRodWIvd29ya2Zsb3dzL3Rlc3QueWFt
bEByZWZzL2hlYWRzL21haW4wOQYKKwYBBAGDvzABAQQraHR0cHM6Ly90b2tlbi5h
Y3Rpb25zLmdpdGh1YnVzZXJjb250ZW50LmNvbTA7BgorBgEEAYO/MAEVBC0MK2h0
dHBzOi8vdG9rZW4uYWN0aW9ucy5naXRodWJ1c2VyY29udGVudC5jb20wHwYKKwYB
BAGDvzABAgQRd29ya2Zsb3dfZGlzcGF0Y2gwNgYKKwYBBAGDvzABAwQoNjE4ZjA3
NDUxMzM4NTExYTc5YTQ0NjEyYWU2YmM4NzYyMmUyZjZlYzASBgorBgEEAYO/MAEE
BARUZXN0MC0GCisGAQQBg78wAQUEH2hheWRlbnRoZXJhcHBlci90ZXN0LXJlcG9z
aXRvcnkwHQYKKwYBBAGDvzABBgQPcmVmcy9oZWFkcy9tYWluMEsGCisGAQQBg78w
AQgEPQw7aHR0cHM6Ly9naXRodWIuY29tLzYxOGYwNzQ1MTMzODUxMWE3OWE0NDYx
MmFlNmJjODc2MjJlMmY2ZWMwOAYKKwYBBAGDvzABCQQqDCg2MThmMDc0NTEzMzg1
MTFhNzlhNDQ2MTJhZTZiYzg3NjIyZTJmNmVjMB0GCisGAQQBg78wAQoEDwwNZ2l0
aHViLWhvc3RlZDBCBgorBgEEAYO/MAELBDQMMmh0dHBzOi8vZ2l0aHViLmNvbS9o
YXlkZW50aGVyYXBwZXIvdGVzdC1yZXBvc2l0b3J5MDgGCisGAQQBg78wAQwEKgwo
NjE4ZjA3NDUxMzM4NTExYTc5YTQ0NjEyYWU2YmM4NzYyMmUyZjZlYzAfBgorBgEE
AYO/MAENBBEMD3JlZnMvaGVhZHMvbWFpbjAZBgorBgEEAYO/MAEOBAsMCTYwNjIx
MDIxNzAyBgorBgEEAYO/MAEPBCQMImh0dHBzOi8vZ2l0aHViLmNvbS9oYXlkZW50
aGVyYXBwZXIwFwYKKwYBBAGDvzABEAQJDAc4NDE4NzYwMG4GCisGAQQBg78wAREE
YAxeaHR0cHM6Ly9naXRodWIuY29tL2hheWRlbnRoZXJhcHBlci90ZXN0LXJlcG9z
aXRvcnkvLmdpdGh1Yi93b3JrZmxvd3MvdGVzdC55YW1sQHJlZnMvaGVhZHMvbWFp
bjA4BgorBgEEAYO/MAESBCoMKDYxOGYwNzQ1MTMzODUxMWE3OWE0NDYxMmFlNmJj
ODc2MjJlMmY2ZWMwIQYKKwYBBAGDvzABEwQTDBF3b3JrZmxvd19kaXNwYXRjaDBl
BgorBgEEAYO/MAEUBFcMVWh0dHBzOi8vZ2l0aHViLmNvbS9oYXlkZW50aGVyYXBw
ZXIvdGVzdC1yZXBvc2l0b3J5L2FjdGlvbnMvcnVucy80NDMxNTU4NzExL2F0dGVt
cHRzLzIwCgYIKoZIzj0EAwIDSAAwRQIgERwyY9BWWEZMDy28nfvxf8QSYB0taVcD
Yk+81NhN7dICIQC7YFA90OXnmSorP+/ibHNlJX4/9Wo3euYbJC7QMtKr8A==
-----END CERTIFICATE-----

Which expands to:

$ openssl x509 -in cert.txt -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            04:2d:00:37:6d:4a:d2:60:c0:ad:8b:2f:14:c2:13:c4:b0:2a:84:83
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: C = USA, ST = WA, L = Kirkland, street = 767 6th St S, postalCode = 98033, O = sigstore
        Validity
            Not Before: Mar 15 22:43:44 2023 GMT
            Not After : Mar 15 22:53:44 2023 GMT
        Subject:
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:dd:89:68:45:a6:de:9b:c5:7f:94:8d:37:a4:ff:
                    73:d0:15:e8:63:c4:41:8b:7e:0b:ea:8a:eb:b0:a7:
                    45:22:09:62:93:09:e9:4a:05:32:86:82:38:88:56:
                    95:78:c1:4d:8a:83:b6:de:2b:f3:93:e7:0b:89:c9:
                    b4:bc:74:c7:e4
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                Code Signing
            X509v3 Subject Key Identifier:
                AD:7B:E3:74:5E:76:F6:57:6C:88:98:59:96:FC:5B:DA:13:C3:03:E6
            X509v3 Authority Key Identifier:
                F9:47:1F:02:7F:82:A8:41:B5:9D:1D:02:55:AF:04:C0:CD:4E:FD:EC
            X509v3 Subject Alternative Name: critical
                URI:https://github.com/haydentherapper/test-repository/.github/workflows/test.yaml@refs/heads/main
            1.3.6.1.4.1.57264.1.1:
                https://token.actions.githubusercontent.com
            1.3.6.1.4.1.57264.1.21:
                .+https://token.actions.githubusercontent.com
            1.3.6.1.4.1.57264.1.2:
                workflow_dispatch
            1.3.6.1.4.1.57264.1.3:
                618f07451338511a79a44612ae6bc87622e2f6ec
            1.3.6.1.4.1.57264.1.4:
                Test
            1.3.6.1.4.1.57264.1.5:
                haydentherapper/test-repository
            1.3.6.1.4.1.57264.1.6:
                refs/heads/main
            1.3.6.1.4.1.57264.1.8:
                .;https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec
            1.3.6.1.4.1.57264.1.9:
                .(618f07451338511a79a44612ae6bc87622e2f6ec
            1.3.6.1.4.1.57264.1.10:
github-hosted   .
            1.3.6.1.4.1.57264.1.11:
                .2https://github.com/haydentherapper/test-repository
            1.3.6.1.4.1.57264.1.12:
                .(618f07451338511a79a44612ae6bc87622e2f6ec
            1.3.6.1.4.1.57264.1.13:
                ..refs/heads/main
            1.3.6.1.4.1.57264.1.14:
                ..606210217
            1.3.6.1.4.1.57264.1.15:
                ."https://github.com/haydentherapper
            1.3.6.1.4.1.57264.1.16:
                ..8418760
            1.3.6.1.4.1.57264.1.17:
                .^https://github.com/haydentherapper/test-repository/.github/workflows/test.yaml@refs/heads/main
            1.3.6.1.4.1.57264.1.18:
                .(618f07451338511a79a44612ae6bc87622e2f6ec
            1.3.6.1.4.1.57264.1.19:
                ..workflow_dispatch
            1.3.6.1.4.1.57264.1.20:
                .Uhttps://github.com/haydentherapper/test-repository/actions/runs/4431558711/attempts/2
    Signature Algorithm: ecdsa-with-SHA256
    Signature Value:
        30:45:02:20:11:1c:32:63:d0:56:58:46:4c:0f:2d:bc:9d:fb:
        f1:7f:c4:12:60:1d:2d:69:57:03:62:4f:bc:d4:d8:4d:ed:d2:
        02:21:00:bb:60:50:3d:d0:e5:e7:99:2a:2b:3f:ef:e2:6c:73:
        65:25:7e:3f:f5:6a:37:7a:e6:1b:24:2e:d0:32:d2:ab:f0

Please double check the values match up to what's expected. Something to note is that the value for each new extension is now in line with what RFC 5280 requires: a DER-encoded string rather than the raw value[1]. This should hopefully mean that off-the-shelf certificate parsing libraries will have an easier time handling custom extensions.

Just cleaning up the code now and then I'll push up a PR with the changes.

[1] This was never brought up by the Golang clients because it's so easy to get the value of a custom certificate extension. The DER encoding adds two bytes, a tag for type (0x0C, meaning a UTF8String) and the length of the value. This change means clients will have to unmarshal the extension now. For Go, this looks like:

var issuerVal string
rest, err := asn1.Unmarshal(issuerExt.Value, &issuerVal)

Very easy still! Now we get the added benefit of being able to specify non-string extension values too.
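For anyone who wants to see the round trip end to end, here is a self-contained Go sketch using the standard `encoding/asn1` package. `EncodeUTF8String` and `DecodeString` are hypothetical helper names, and rejecting trailing bytes is a choice made for this sketch rather than something the snippet above does.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// EncodeUTF8String DER-encodes s as an ASN.1 UTF8String (tag 0x0C).
// The "utf8" parameter forces UTF8String; without it Go may pick
// PrintableString for strings with a limited character set.
func EncodeUTF8String(s string) ([]byte, error) {
	return asn1.MarshalWithParams(s, "utf8")
}

// DecodeString unmarshals a DER-encoded string and rejects trailing bytes.
func DecodeString(der []byte) (string, error) {
	var s string
	rest, err := asn1.Unmarshal(der, &s)
	if err != nil {
		return "", err
	}
	if len(rest) != 0 {
		return "", fmt.Errorf("%d trailing bytes after string", len(rest))
	}
	return s, nil
}

func main() {
	issuer := "https://token.actions.githubusercontent.com"
	der, err := EncodeUTF8String(issuer)
	if err != nil {
		panic(err)
	}
	// The first two bytes are the tag and the (short-form) length.
	fmt.Printf("tag=0x%02X length=%d total=%d\n", der[0], der[1], len(der))

	got, err := DecodeString(der)
	if err != nil {
		panic(err)
	}
	fmt.Println(got == issuer)
}
```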

@feelepxyz
Member

@haydentherapper awesome! Thanks for taking this on 😍

1.3.6.1.4.1.57264.1.8

Looks like the job_workflow_sha maybe ended up here instead of job_workflow_ref as this should be the Build Signer URI?

1.3.6.1.4.1.57264.1.10

Maybe just some rendering weirdness but what's up with the value showing up to the left of the period? Also, presuming the prefixes showing up in the above example are part of the encoding somehow? e.g. .^..h, ."h etc.

Everything else looks good to me!

@feelepxyz
Member

1.3.6.1.4.1.57264.1.21:
.+https://token.actions.githubusercontent.com

Is this encoding the issuer as a DER-encoded string? Nit, but should the re-encoded issuer come before Build Signer URI at 1.3.6.1.4.1.57264.1.8 instead of at the end, bumping all the other new ones down one?

@bdehamer

bdehamer commented Mar 20, 2023

Maybe just some rendering weirdness but what's up with the value showing up to the left of the period?

This is just openssl not expecting the extension value to contain an encoded string. The stray characters are the DER-encoded tag and length for the UTF8String.

Old encoding vs new encoding:
[image]

@feelepxyz
Member

The stray characters are the DER-encoded tag and length for the UTF8String.

Nice one 👍

@haydentherapper
Contributor Author

Looks like the job_workflow_sha maybe ended up here instead of job_workflow_ref as this should be the Build Signer URI?

Good catch, fixed!

Is this encoding the issuer as a DER-encoded string?

+1 to what Brian said. For example, for .;https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec, ; is 0x3B = 59, the length of https://github.com/618f07451338511a79a44612ae6bc87622e2f6ec. The first . is just because openssl can't render 0x0C into ASCII.

@haydentherapper
Contributor Author

1.3.6.1.4.1.57264.1.21

Yea, I can make that change to move this to .8 and bump all OIDs by 1.
