-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changing the way Result Parameters are stored. #4012
Comments
Hi @bobcatfish this is what I was trying to describe on the WG call. |
I got a bit lost reading the issue summary - at first it sounds like the problem is the size of the termination message but then you mention using annotations as a storage mechanism. I don't understand how a running container could write a result to an annotation. It's also worth bearing in mind that annotations also have a size limit which is shared across all the annotations on a pod. I think the upshot of this would be that the size limit could vary from task to task since some tools might selectively add annotations to certain pods but not others. The remaining "space" for the result annotations could therefore change task to task. With regard to "patch into fields as part of the TaskRun CRD"; it might be helpful to describe this solution a little more in depth - I likewise didn't totally understand what was being proposed here. Broadly I do understand that limitations on result size are both annoying and make some things harder. But I'm not sure Kubernetes has an "escape hatch" for passing arbitrarily sized data between pods without using persistent storage. More details would definitely be helpful in understanding what's being proposed though! |
Hey @sbwsg thanks for the response. Apologies for the confusing issue description, I am only just jumping into participating with the community and Tekton. You are correct, this is all about the storage size limitation and the mechanism for that. So to confirm, entrypoint is running as part of the pod? That's my mistake, I presumed it was running outside. The solution I was thinking was something that runs as part of a wrap-up or termination (prior to all containers ending. 1 would need to be active) and uses the Kubernetes read file mechanism (think I also get annotations are restricted too but a lot less restrictive. But it was a way to achieve a similar "store with the TaskRun object" similar to the termination message but in an alternate more expansive standardized way. I feel annotations are used for storing system relevant data. Otherwise we could make a space on the CRD that can then be 'patched' with the result parameters. As a bit of only semi relevant background, the reason I mentioned this is prior to moving to Tekton, the in house system we had built ran Jobs and orchestrated a lifecycle mechanism very similar to what Tekton does with the downwards API for starting a step, however, we used it to determine if our lifecycle watcher could then gather the results. We didn't patch an annotation, we patched a configmap as the storage. And then finished. |
Yeah you're right - entrypoint is run in every Step Container, from inside the TaskRun Pod. Fascinating idea about injecting a final container which uploads results using something akin to |
I'd be happy to try and help contribute to this and adding this enhancement but I would need a reasonable amount of guidance in getting a local setup and guidance on where to make the change. I roughly know the Kubernetes logic of waiting on the downwards API and declaring a termination container as part of the spec etc. But not in relation to Tektons implementation. Happy to have any calls etc. |
Hey @tlawrie thanks for offering to jump in on this. I think the first step would be to get the problem and its constraints written up as a Tekton Enhancement Proposal. This initial proposal wouldn't need full technical detail but I think it's probably worthwhile to make sure everyone's on the same page wrt this being a problem Tekton should fix. |
Thanks @sbwsg . Ive been on vacation so I will look at doing that. |
sorry it took me so long to reply!! 🙏
One potential downside to this approach is that the container would need to be able to make changes to the cluster (it would need to have the kube config info for the cluster and be able to execute requests to the api server as a user with the permission to modify the TaskRun, or ConfigMap, or whereever we're storing the data. This is kinda the main motivation for using the termination message the way we are - it didn't require additional permissions. This could still be an option if it was limited to a container we inject and control - but I think we want to avoid expanding the default permissions of TaskRun pods if we can.
In this example, is the list of slack channel members big enough that it is greater than the max size of the termination message? Just wondering if in this particular case the problem is the size or if it's the syntax (e.g. having variable replacement for results vs just a path to a workspace) - and im wondering if array results might help.
I'm curious what it looked liked to use those configmaps later on - did you find that to be a better user experience than just volumes? (I wonder if it's possible to mount configmaps for writing, and if so if that would help...)
I'm wondering, even if we made it possible to write arbitrarily large results, are we going to run into trouble providing these to consuming Tasks in a Pipeline? Because ultimately they will need to be written into the TaskRun - and there is a limit on the size of the CRD. But since the options you've listed already involved editing the TaskRun I'm assuming your scenarios are that the data is greater than what a termination message would store, but less than what a CRD can store? Some thoughtsA solution based around k8s types would seem more appealing, b/c it's easier to write logic around k8s types than it is to support arbitrary remote storage (e.g. supporting GCS, S3, any other arbitrary remote storage, and making it so that folks can extend this) (maybe we can leverage some of @sbwsg 's work on remote resolution) Two more ideas which might just be the same idea twice XD:
This would slow things down tho in that in order to read these values, the controller is probably going to have to do something like write a pod - however we could have more flexibility in that this pod could be given more broad permissions such as writing to a CRD or config map or something Basically wondering if there is some way we can leverage workspaces, but improve the user experience problems you've described
tasks:
- name: put-files-in-workspace
workspaces:
- name: put-stuff-here
workspace: my-workspace
- name: do-something
params:
- name: arg
# waving my hands vigourously
value: "$(tasks.put-files-in-workspace.workspaces.put-stuff-here)/foo.json" |
Hi @bobcatfish its now my turn to apologize for taking so long. Ill try and circle back around, also happy to get on a separate Slack / Google Meet call to discuss and work through turning into a TEP. I can confirm its definitely a size limitation, we have a reasonable number of tasks that have large results. Possibly hard for me to explain why, but its for automatic ease of referencing them in following tasks without
Permissions Configmaps
That being said, we could have an alternate (maybe this is akin to your ideas in a way) whereby we change from termination.writeMessage to The idea here is that a special 'configmap' is mounted to all Tasks (can we protect with a particular service account?) and we also abstract and take care of being able to reference it. If a user types $(results..results.entry1) then Tekton would automatically retrieve it from the configmap. Plus surely not too much damage can be done from the ability to write to one specific configmap? The benefit here is that end-users don't have to worry about the mounting and unmounting and doing special handling of objects/files in their Task code. They just read it as they would normally. A normal parameter. The suggested alternate ideas
My problem with Workspaces is that there has to be a storage mechanism in the cluster that can be shared between Tasks. That can be complex, or have performance issues in itself if using an As A Service that orders the storage at spin-up time. Or forces Tasks to all run on the same node. etc. Storage is a big complex adoption hurdle. The TaskRun CRD or Configmap have the benefits of being storage mechanisms that already exist. To circle back to the first point, allowing a way to mess with the TaskRun might be of concern. However a specific configmap might be secure enough and configmaps spin up reasonably fast. |
Hello, We are also facing issues using the results feature in Tekton. When using pipelines with tasks that emit results, we are observing random failures in downward tasks where emitted results sometimes are empty. According to the documentation here https://tekton.dev/docs/pipelines/tasks/#emitting-results, results are emitted using the container termination message feature of Kubernetes. We are under the impression downward tasks are sometimes started before the termination message of the previous task is available which then gives to empty results of the previous task. We have no in-depth knowledge about when the pipeline controller starts downward tasks and how the results are provisioned so our hypothesis about the failures might be flawed. We could circumvent this by using workspaces to persist results, but would lose the $(results.name.path) mechanism in scripts and when conditions and would lead to a lot of extra scripting. We would like to suggest results being persisted to either using a PVC or a ConfigMap. Optimally both options are supported and configurable on Pipeline level. We hope this will solve the stability issues we are facing with the results, so +1 for this feature request. Best regards, |
/cc @jerop @pritidesai |
thanks @pschildkamp for reporting this. @sbwsg has requested changes to avoid termination message for this kind of use (results, started time, exit code, etc) in issue #4150. @sm43 is looking at the short term solution and has PR opened #4186. If these changes does not help resolve the issue, we can definitely look into long term solution. |
Hello @pritidesai, Thank you for your quick response. The sizing of the result object doesn't seem to be an issue for us. But the result is missing entirely sometimes. This gives us the impression that the next task is started before the previous had a chance to write the result via container termination message. We have no idea how the controller deals with task terminations. We think that the termination message is written too late or maybe the termination message is not synced in time to other master etcd nodes (if this is an async process). Best regards, |
Very interesting issue, thanks @pschildkamp, do you have an example Pipeline that you're able to reproduce the problem with consistently / more consistently than others? Or the yaml saved from any PipelineRuns where this issue appears? I'd be very interested to investigate a bit further. In the mean time I will try setting up a test against a cluster with a couple of nodes and many PipelineRuns to see I can reproduce it. |
Hey @sbwsg, I've attached YAML files of the failed pipelines run and the succesfull pipelinerun with exact the same configuration and parameteres. They are too large to put in this post. failed_pipelinerun.yaml.zip As you can see, the failed In the success We use this result in the next task as input for the label of the build image. This task fails. Does this help? Also, I have another hypothesis, but I don't want to pollute this topic. It's about the way we write results to the result file which has nothing to do with the termination message. Should I make a new issue? |
This helps a lot, thank you! I need to do some more work to prove it but I believe the issue here might be related to the Here're some links to support my current thinking all kind of related to weird behaviour using a combination of
I wonder if the results would get written more consistently using something slightly more verbose like: /tools/run-java.sh ${ARGS} | tee output.log
RESULT_DATA=$(cat output.log | tail -n 1)
printf "%s" $RESULT_DATA > $(results.artifact-version.path) Yeah let's take this to another issue if you want to explore it further, happy to help out! |
Thanks, @pschildkamp @sbwsg some great discussion. Glad that you have potentially found the issue. I think its great input into what we have been discussing around a broader option for result persistence. |
hey @sbwsg @pschildkamp it seems like the issue you're discussing is results related but not necessarily about the original issue this was opened to discuss - is it possible to move that discussion to a different issue?
I think having to give all TaskRuns the ability to write to even one specific ConfigMap is not going to be a good idea in the long run. Some thoughts:
@tlawrie you might find this recent tekton chains WG meeting interesting - @priyawadhwa has been pursuing adding an API to chains for handling data that is too big for results, via a GRPC server If we feel that a workspace based solution won't work, my feeling is that the next best option is going to be to have the pipelines controller run a server that taskruns can connect to for reporting results. |
+1, the k8s object/CRD approach seems nice but really just punts the problem further down the line IMO. The other size limits/rate limiting/permissions are going to complicate that in the long term. I think it's time to move to a real HTTP based API directly to a service/controller where we can avoid passing data through the k8s control plane first. |
Quick update from the 2 API working group meetings today (earlier meeting , later meeting - recordings should be up in not too long)
In addition to @tlawrie several people expressed that they were interested in this discussion though it sounds like (for now) these folks aren't planning to drive the discussion (but might in the future): @mattmoor @priyawadhwa @dlorenc @vdemeester @pritidesai @afrittoli @imjasonh |
oh and one more thing, @imjasonh is going to be giving a talk at Kubecon (in collaboration with Argo!) where he plans to talk about how Tekton and Argo are dealing with results (sounds like Argo also has a configmap based approach) and alternatives for the future: Argo And Tekton: Pushing The Boundaries Of The Possible On Kubernetes - Alex Collins, Intuit & Jason Hall, Red Hat |
There are still auth and HA problems here. (brain dump of what's been rattling in my head) I don't understand the desire to trampoline through an HTTP service when the Suppose hypothetically that the pipelines controller were to create N ConfigMaps** for each of N results, it could also grant the workload access to write to these results using one of these focused Roles: Line 100 in 9c61cdf
The ConfigMaps**, the Role, and the RoleBinding could all be OwnerRef'd to the *Run, to deal with cleanup. ** - Personally, I think a data-only CRD here would be clearer (a la Task/Pipeline/...), and you could use a webhook if you wanted to validate that it is "write once", but I don't have terribly strong feelings here. Like I said, I'd be happy to help someone write a TEP, but I probably won't have bandwidth to drive this work. |
So, few questions and remark (brain dumping here too)
In general, the less we put in Pipeline, the better 😅. |
My main concern is around having to create a new ConfigMap**, Role and RoleBinding per TaskRun, when at the end of the day we don't actually care about updating that ConfigMap**, but the TaskRun's results. Could we just have the entrypoint update its own TaskRun directly, and have the webhook admission controller enforce sane updates? (e.g., previous steps' results can't be changed once set). We already have known/understood HA limitations with the webhook today, and the API server will prevent updates if the service is unavailable. A risk would be that a step could |
If we stick with a k8s object this makes the most sense to me. ConfigMap feels like a hack, and a new CRD could just be inlined into TaskRun instead. Would a sub-resource help with the rbac'ing? My big worry is allowing taskruns to mutate everything about themselves. |
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
Feature request
The current implementation of result parameters relies on reading the results from disk and storing as part of the termination message output reference. We are then limited by the 4096 bytes of the Termination Message.
I'd like to propose an alternative, non-workspace based, whereby we still read the result parameters the same way
readResultsFromDisk()
, however, we eitherUse case
This is coming from the background of creating a UI that runs and abstracts away Tekton. The end users only see Tasks mapping to other Tasks. As such it makes it hard to then understand the complexities of writing the output to a field and then knowing how to get data from that file.
Result parameters in turn are a great example whereby you can feed the output of a task directly into another task through the
tasks.<task_name>.results.<param_name>
syntax without having to be concerned with storage or files or writing a task to retrieve from a file, instead tasks can be written and display in the UI to receive input as an end-user would interact.Very simple example
Take for example two tasks, implemented in a drag and drop fashion in a UI. One task (Task A) downloads the members of a Slack channel, the next task (Task B) actions them.
Task A outputs a JSON response result parameter.
Task B is written and defined to expect an input parameter.
Task B can handle input to the parameter as free text or the results of Task A using the Task Result replacement syntax. Otherwise, Task B has to be written to look for a file that is not generic and now we are adding complexities of storage and reading contents of files etc.
The text was updated successfully, but these errors were encountered: