-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added entrypoint log grabber to taskrun controller #167
Added entrypoint log grabber to taskrun controller #167
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aaron-prindle The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some more notes on what happens to the claimed space once the task is done?
func GetCopyStep() corev1.Container { | ||
return corev1.Container{ | ||
Name: "place-tools", | ||
Image: "gcr.io/k8s-prow/entrypoint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Lets pin this image to sha.
}, | ||
) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to create PVC %q: %s", tr.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we explain it more like "failed to claim Persistent Volume due to error "
const ( | ||
// MountName is the name of the pvc being mounted (which | ||
// will contain the entrypoint binary and eventually the logs) | ||
MountName = "tools" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aaron-prindle and I talked about some follow-up work to randomize the name of the mount and the volume so that they won't conflict with any of the user's config - I'll create a follow up ticket!
} | ||
|
||
// TODO: add more test cases after all, e.g. with existing env | ||
// var and volume mounts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you can remove this comment for now i think @aaron-prindle , we can cover this in the follow up task ill make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping you can remove the TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol okay we can remove these TODOs later, i dont want to block this any longer
// TODO: This will not work when a step uses an image that has its | ||
// own entrypoint, i.e. `Command` is a required field. In later iterations | ||
// we can update the controller to inspect the image's `Entrypoint` | ||
// and use that if required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll need a follow up ticket for this too - could you write that one up @aaron-prindle ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: #175
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz remove todo (or update with issue #)
return nil, fmt.Errorf("Task has nil BuildSpec") | ||
// TODO: additional validation in Task webhook: | ||
// No mounts with same names / path | ||
// No env var with the name we'll be adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you can remove these comments too, i think we can handle this with randomization instead
} | ||
// 3. check that volume was created | ||
pvc := kubeclient.Actions()[0].(ktesting.CreateAction).GetObject().(*corev1.PersistentVolumeClaim) | ||
// TODO(aaron-prindle) make these tests verify exact pvc spec, not just name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably add this before we merge @aaron-prindle
ClaimName: volumeName, | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is bSpec.Volumes
guaranteed to be empty at this point or should this new volume be appended instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point @tanner-bruce , we should definitely fix this before merging @aaron-prindle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, done!
// panic: cannot handle unexported field: {v1.ResourceRequirements}.Requests["storage"].i | ||
// consider using AllowUnexported or cmpopts.IgnoreUnexported [recovered] | ||
|
||
// if d := cmp.Diff(pvc.Spec.Resources, expectedVolume.Spec.Resources); d != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, error) { | ||
// GetExpectedVolume returns the volume template used by createVolume | ||
// this is exported for testing | ||
func GetExpectedVolume(tr *v1alpha1.TaskRun) *corev1.PersistentVolumeClaim { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use this in the tests - if we are using exactly the same code to test the code we are using, there is no reason to test it.
In the test we should only assert on the fields we care about. I'd rather have less coverage in the test than do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we plz rename this and not use it in the tests? assert only on fields we care about? i.e. access mode + resource size? or not assert on those at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -209,6 +250,29 @@ func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, er | |||
return nil, fmt.Errorf("Task %s has nil BuildSpec", t.Name) | |||
} | |||
|
|||
bSpec := t.Spec.BuildSpec | |||
bSpec.Volumes = append(bSpec.Volumes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dlorenc tried this out and if there is an error creating the build, we end up appending this multiple times and trying to create the same PVC multiple times - and wrapping the entry point multiple times too!! 🤣
We need to:
- not create the PVC if it already exists, maybe we can try to reuse it
- not append this volume multiple times, - copy the buildspec before modifying it - we shouldn't mutate the task at all!!
We should add a test case somewhere to cover this, we can probably use the reconciler tests and avoid making this a full fledged system test
@@ -209,6 +250,29 @@ func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, er | |||
return nil, fmt.Errorf("Task %s has nil BuildSpec", t.Name) | |||
} | |||
|
|||
bSpec := t.Spec.BuildSpec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think this needs to be DeepCopy()
out.txt
Outdated
ok github.com/knative/build-pipeline/pkg/reconciler/v1alpha1/taskrun/resources (cached) | ||
? github.com/knative/build-pipeline/pkg/system [no test files] | ||
ok github.com/knative/build-pipeline/test (cached) | ||
? github.com/knative/build-pipeline/test/gohelloworld [no test files] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- i think this was committed by accident
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -209,6 +265,28 @@ func (c *Reconciler) createBuild(tr *v1alpha1.TaskRun) (*buildv1alpha1.Build, er | |||
return nil, fmt.Errorf("Task %s has nil BuildSpec", t.Name) | |||
} | |||
|
|||
bSpec := t.Spec.BuildSpec.DeepCopy() | |||
bSpec.Volumes = appendIfMissing(bSpec.Volumes, corev1.Volume{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- since we're copying the buildspec before modifying it, we should be able to append the volume every time? it should never be present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed appendIfMissing( fxn and logic -> append( now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm except for 2 nits.
@@ -154,8 +151,20 @@ func (c *Reconciler) reconcile(ctx context.Context, tr *v1alpha1.TaskRun) error | |||
// get build the same as the taskrun, this is the value we use for 1:1 mapping and retrieval | |||
build, err := c.BuildClientSet.BuildV1alpha1().Builds(tr.Namespace).Get(tr.Name, metav1.GetOptions{}) | |||
if errors.IsNotFound(err) { | |||
pvc, err := c.KubeClientSet.CoreV1().PersistentVolumeClaims(tr.Namespace).Get(tr.Name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move L154 to createPVC
and name is getOrCreatePVC
.
This block is getting longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do this another PR
|
||
// Override the entrypoint so that we can use our custom | ||
// entrypoint which copies logs | ||
err = resources.AddEntrypoint(bSpec.Steps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the Concepts.doc to mention we no longer depend on a BuilderImage
?
You can also add a note for it and then we can decide what this image should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand, I can update the doc to explain the entrypoint log change but we still depend on a BuilderImage if my understanding is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've update this doc with entrypoint info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing @tejal29 is talking about the fact that the builder image contract actually says to do the opposite of what we want - it says we want ppl to define an entrypoint when in fact we definitely do not want that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created some docs but I'm having a hard time adding them to this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm failing at this, I'll just make a new PR with the doc changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the problem being solved? This PR addresses issue #143. This issue is that it when Build's complete, logs from the steps the Build ran are garbage collected by kubernetes and no longer available to the user. Why is this the best approach? 1) No special kubernetes configuration (eg. changing garbage collector values) What other approaches did you consider? 1) Changing kubernetes garbage collection for these containers so that they are not immediately deleted and log capture was possible What side effects will this approach have? 1) With this approach, users will have to specify the "Command" value for the containers they which to run as the Entrypoint is not retrievable. This means that containers/flows setup to use only the Entrypoint will no longer be supported. What future work remains to be done? 1) It is possible to have the PVCs changed to EmptyDir volumes once a log uploader is created. This will help with the issue that currently PVCs are being created and not cleaned up. Co-authored-by: Christie Wilson <[email protected]>
entrypointJSONConfigEnvVar = "ENTRYPOINT_OPTIONS" | ||
EntrypointImage = "gcr.io/k8s-prow/entrypoint@sha256:7c7cd8906ce4982ffee326218e9fc75da2d4896d53cabc9833b9cc8d2d6b2b8f" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably should not be using an image from Prow for our running pipeline system, maybe have out own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, esp. if we feel we can't rely on this image.
but in the meantime, i think it's easier if they manage it's lifecycle? one less thing we have to worry about! but once we start having a release process i guess we should probably release this image too 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want our own anyway, to add features like being able to specify the entrypoint. BUilding the image ourself also makes it easy to build into the config YAMLs with ko
, and makes it easy to make sure it's cached on every node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now its fine, but once we have a release like you said, we should be using our own to make sure it does't change on us
The Builder image contract is actually the opposite of what we now support - it specifies that users should define an `entrypoint` and rely on that, however the new behaviour in #167 will completely ignore this `entrypoint`. These docs are a copy of the Builder image docs at https://github.com/knative/docs/blob/master/build/builder-contract.md but modified to describe the contract required by Task.
/lgtm |
The Builder image contract is actually the opposite of what we now support - it specifies that users should define an `entrypoint` and rely on that, however the new behaviour in tektoncd#167 will completely ignore this `entrypoint`. These docs are a copy of the Builder image docs at https://github.com/knative/docs/blob/master/build/builder-contract.md but modified to describe the contract required by Task.
The Builder image contract is actually the opposite of what we now support - it specifies that users should define an `entrypoint` and rely on that, however the new behaviour in tektoncd#167 will completely ignore this `entrypoint`. These docs are a copy of the Builder image docs at https://github.com/knative/docs/blob/master/build/builder-contract.md but modified to describe the contract required by Task. This is for tektoncd#143
The Builder image contract is actually the opposite of what we now support - it specifies that users should define an `entrypoint` and rely on that, however the new behaviour in #167 will completely ignore this `entrypoint`. These docs are a copy of the Builder image docs at https://github.com/knative/docs/blob/master/build/builder-contract.md but modified to describe the contract required by Task. This is for #143
The Builder image contract is actually the opposite of what we now support - it specifies that users should define an `entrypoint` and rely on that, however the new behaviour in #167 will completely ignore this `entrypoint`. These docs are a copy of the Builder image docs at https://github.com/knative/docs/blob/master/build/builder-contract.md but modified to describe the contract required by Task. This is for #143
What is the problem being solved?
This PR addresses issue #143. This issue is that it when Build's complete, logs from the steps the Build ran are garbage collected by kubernetes and no longer available to the user.
Why is this the best approach?
What other approaches did you consider?
What side effects will this approach have?
What future work remains to be done?
Co-authored-by: Christie Wilson [email protected]