Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse Type-hinted StepResults for the Provenance #1065

Closed
Tracked by #7424
chitrangpatel opened this issue Mar 5, 2024 · 14 comments
Closed
Tracked by #7424

Parse Type-hinted StepResults for the Provenance #1065

chitrangpatel opened this issue Mar 5, 2024 · 14 comments
Assignees
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@chitrangpatel
Copy link
Contributor

chitrangpatel commented Mar 5, 2024

Feature request

Parse type-hinted StepResults and surface them appropriately in the provenance

Use case

StepActions were released in Tekton Pipelines v0.54.0. As a part of that, we introduced StepResults. StepResults are not automatically surfaced to TaskResults. Users explicitly need to fetch them like they do for PipelineResults from TaskResults. As a result, if a type-hinted StepResult is not explicitly surfaced by the user then it will go unnoticed by Chains and thus not be surfaced as a ResolvedDependency/Material, Subject or Byproduct.

The solution is simple. The formatter just needs to additionally look into the StepState for StepResults.

Proposal

Add an extra key isBuildArtifact with a boolean value defaulted to false to the Type-hinted Object StepResults. When Chains parses the step results, it can look for that field and insert the output artifact into subject or byProducts. In the new Artifacts Struct defined in https://github.com/tektoncd/community/blob/main/teps/0147-tekton-artifacts-phase1.md, we can add a similar field to outputs.

Pros

  • Chains gets accurate intention about what to do with the output artifact.
  • The StepAction/Task can actually ask the intent to the Pipeline/TaskRun via params.
    • That way, since it is the Pipeline that really understands the complete picture, it can indicate the nature of the artifact that the underlying StepAction/Task was used to fetch (i.e. subject or a byProduct).

Example

apiVersion: tekton.dev/v1alpha1
kind: StepAction
metadata:
  name: gcs-upload
spec:
  image: gcs
  params:
    - name: isBuildArtifact
      description: Should the file/folder you are uploading be considered as a build artifact (subject) or a byProduct?
      default: "false"
  results:
    - name: upload_ARTIFACT_OUTPUT
      type: object
  script: |
    echo {"uri:" ..., "digest": ..., "isBuildArtifact": "$(params.isBuildArtifact)"} | tee $(step.results.upload_ARTIFACT_OUTPUT.path)
---
apiVersion: tekton.dev/v1
kind: TaskRun
spec:
  params:
    - name: isBuildArtifact
      value: "true"
  taskSpec:
    steps:
      - name: step-producing-artifact
        ref:
          name: gcs-upload
        params:
          - name: isBuildArtifact
            value: $(params.isBuildArtifact)

Behavior amongst Task Results, Pipeline Results and StepResults

Based on extensive discussions in the Tekton Chains WG, we've reached the following conclusion:.

To provide a consistent behavior across TaskResults, PipelineResults and StepResults we propose the following.

For non-object type-hinted results, we assume that they are all byProducts.
For object results, we assume that by default *ARTIFACT_OUTPUTS are byProducts. If a matching key isBuildArtifact is found and is set to true then Chains considers it as a subject.

Note that this currently not the behavior for Task/Pipeline Results (they are subjects by default). This behaviour will be made consistent with StepResults so that across the board, users can expect to do the same thing. i.e. declare something as a build Artifact to treat it as a subject.
To not break the users immediately, we, propose gating this behind slsa/v2alpha4 so that users currently using slsa/v2alpha2,3 do not experience this change. However, once we mature to slsa/v2beta1, we will deprecate older alpha versions and this (i.e. *ARTIFACT_OUTPUTS are byProducts by default and that users have to specifically declare it as a subject) will be the expected behavior.

@chitrangpatel chitrangpatel added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 5, 2024
@chitrangpatel
Copy link
Contributor Author

/good-first-issue
/help-wanted

@tekton-robot
Copy link

@chitrangpatel:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue
/help-wanted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Mar 5, 2024
@chitrangpatel
Copy link
Contributor Author

chitrangpatel commented Mar 6, 2024

cc @lcarva @wlynch @aaron-prindle

Looking for feedback.

@chitrangpatel
Copy link
Contributor Author

/assign

@wlynch
Copy link
Member

wlynch commented Mar 14, 2024

No objections!

My guess is things that are produced by steps but not surfaced to the Task should be byproducts though? So a little bit different than the Task -> Pipeline extraction.

@chitrangpatel
Copy link
Contributor Author

chitrangpatel commented Mar 14, 2024

I think we can rely on type-hints to drive that decision. Currently, Type-hinted results get populated in ResolvedDependencies/Subjects. Other results are inserted as byProducts.

I can imagine a case where a git-clone StepAction produces type-hinted StepResults that is not necessarily surfaced to the Task because it may not be needed to be sent to downstream Tasks (or in case of a TaskRun, not needed to be surfaced at all). I think Chains should still insert it into ResolvedDependencies not byProducts. Same argument for Images produced etc.

@chitrangpatel
Copy link
Contributor Author

chitrangpatel commented Mar 15, 2024

Summarizing the discussion from Slack and adding some more thoughts:

Context:
When parsing Artifacts/Type-hinted StepResults from StepActions, should we pollute the subjects with thwm or byProducts?
- Subjects are the critical build Products.
- ByProducts are things like cache, other artifacts (e.g. coverage, logs etc.), other results that were produced as part of the build but are not the main build products.

Proposal:

  • StepAction’s type-hinted StepResults should be considered as byProducts by default.
    • This is mainly driven by the fact that a StepAction is the smallest unit and has no knowledge about the complete build pipeline.
      • e.g. a gcs-upload StepAction might push an artifact and type-hint the results.
    • But whether the pushed artifact is a Subject (critical build-artifact) or a byProduct is better known by the user of the StepAction i.e. a Task. Although it is best known by the highest level of orchestration (i.e. Pipeline in case of a PipelineRun)
    • Therefore, any output artifact that is surfaced via Task-level Type-hinted Result should be considered a subject but not StepResult.

Pros:

  • Chains will not pollute the Subjects if we treat things as byProducts.
    • Whats the concern here? It seems like the bigger concern is users forgetting to take that extra step and having artifacts without provenance (?)

Cons:

  • Prior to StepActions, the same logic applies to remote Tasks, yet we consider any type-hinted results that indicate an output artifact as a subject. So technically, there it’s the pipeline author that truly knows which artifact is a Subject vs a byProduct.
  • What about users that don't use StepActions and still treat Tasks as the smallest unit? Why are the Task Results in that case treated differently? Shouldn't by a similar logic, we only consider type-hinted output Artifact related PipelineResults as subjects?
  • In contrast, if users forget to explicitly surface results up the ladder then they do not have a provenance.
  • In the current workflow, the users of remote Tasks, StepActions do not need to worry about having to do anything special to get the provenance. The providers of these Tasks and StepActions already perform type-hinting and follow best practices to get thins into Chains.
  • Today, if people are simply converting their Pipeline runs into Taskruns with StepActions, this will cause a breaking behavior.
  • We would also be treating Tasks using StepActions different from Tasks with inlined Steps?

Other thoughts:

  • If users are type-hinting their Tasks/StepActions then it should be considered important and so a subject.
    • Currently, if things are not type-hinted, we already list them under byProducts so I think that we should consider anything type-hinted as important.
    • For certain StepActions (e.g. upload-cache etc.) authors don't need to type-hint the results at that stage. They can leave it to the users to decide whether to type-hint it at the Task level.
    • Just because its produced from a StepAction should probably not determine that its not important.
      • Worst case, even if we have a few additional subjects that aren't too important, there is no harm done. Not catching them seems worse. I wouldn't consider it as polluting the Subjects. I would probably treat it that way if we did that with all our results.
  • In SLSA v0.1, we did not have byProducts and so our behavior was to treat all type-hinted results as subjects. This would be the case today as well if we didn't have StepActions. What if SLSA gets rid of byProducts again in the future in a 2.0? Would we go back to treating it as a Subject there?

The solution where a difference in the behavior of Chains based on the BuildType might be ok here but I wanted to try and convince everyone that treating type-hinted StepResults as subjects is not a bad thing.

Thoughts @wlynch (hopefully, I didn't miss anything major)?

@chitrangpatel
Copy link
Contributor Author

chitrangpatel commented Mar 16, 2024

Synced offline with @wlynch

Proposal

Add an extra key isBuildArtifact with a boolean value defaulted to false to the Type-hinted Object StepResults. When Chains parses the step results, it can look for that field and insert the output artifact into subject or byProducts. In the new Artifacts Struct defined in https://github.com/tektoncd/community/blob/main/teps/0147-tekton-artifacts-phase1.md, we can add a similar field to outputs.

Pros

  • Chains gets accurate intention about what to do with the output artifact.
  • The StepAction/Task can actually ask the intent to the Pipeline/TaskRun via params.
    • That way, since it is the Pipeline that really understands the complete picture, it can indicate the nature of the artifact that the underlying StepAction/Task was used to fetch (i.e. subject or a byProduct).

Example

apiVersion: tekton.dev/v1alpha1
kind: StepAction
metadata:
  name: gcs-upload
spec:
  image: gcs
  params:
    - name: isBuildArtifact
      description: Should the file/folder you are uploading be considered as a build artifact (subject) or a byProduct?
      default: "false"
  results:
    - name: upload_ARTIFACT_OUTPUT
      type: object
  script: |
    echo {"uri:" ..., "digest": ..., "isBuildArtifact": "$(params.isBuildArtifact)"} | tee $(step.results.upload_ARTIFACT_OUTPUT.path)
---
apiVersion: tekton.dev/v1
kind: TaskRun
spec:
  params:
    - name: isBuildArtifact
      value: "true"
  taskSpec:
    steps:
      - name: step-producing-artifact
        ref:
          name: gcs-upload
        params:
          - name: isBuildArtifact
            value: $(params.isBuildArtifact)

@chitrangpatel
Copy link
Contributor Author

/assign @renzor

@tekton-robot
Copy link

@chitrangpatel: GitHub didn't allow me to assign the following users: renzor.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @renzor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@renzodavid9
Copy link
Contributor

/assign me

@tekton-robot
Copy link

@renzodavid9: GitHub didn't allow me to assign the following users: me.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign me

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@renzodavid9
Copy link
Contributor

/assign @renzodavid9

@chitrangpatel
Copy link
Contributor Author

Thanks @renzodavid9 for enabling this 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

4 participants