This doc describes how TaskRuns are implemented using pods.
Tekton releases include a binary called the "entrypoint", which wraps the
user-provided binary for each Step
container to manage the execution order of the containers.
The entrypoint
binary has the following arguments:
- wait_file - If specified, file to wait for
- wait_file_content - If specified, wait until the file has non-zero size
- post_file - If specified, file to write upon completion
- entrypoint - The command to run in the image being wrapped
As part of the PodSpec created by TaskRun
the entrypoint for each Task
step
is changed to the entrypoint binary with the mentioned arguments and a volume
with the binary and file(s) is mounted.
If the image is in a private registry, the service account should include an ImagePullSecret.
For more details, see entrypoint/README.md or the talk "Russian Doll: Extending Containers with Nested Processes".
The entrypoint allows a step to exit with an error while the rest of the steps
in the task continue to run, i.e., a step can exit with a non-zero exit code
without halting the task. This makes it possible to design a task with a step
that takes an action depending on the exit code of any prior step. The user can
access the exit code of a step by reading the file pointed to by the path variable
$(steps.step-<step-name>.exitCode.path) or
$(steps.step-unnamed-<step-index>.exitCode.path). For example:
- $(steps.step-my-awesome-step.exitCode.path) where the step name is my-awesome-step.
- $(steps.step-unnamed-0.exitCode.path) where the first step in a task has no name.
The exit code of a step is stored in a file named exitCode
under a directory
/tekton/steps/step-<step-name>/
or /tekton/steps/step-unnamed-<step-index>/
which is reserved for any other step specific information in the future.
You can also read the exit code directly from the Tekton internal path, although this is not recommended because the path might change in the future:
cat /tekton/steps/step-<step-name>/exitCode
To access the exit code of a step without a step name:
cat /tekton/steps/step-unnamed-<step-index>/exitCode
Or, you can access the step metadata directory via its symlink. For example, for the first step in a task:
cat /tekton/steps/0/exitCode
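As a minimal sketch of how a later step might consume this file, the helper below reads an exit code from a directory laid out like /tekton/steps. The function name `read_step_exit_code`, the temporary directory, and the assumption that the file holds a plain decimal integer are illustrative choices, not part of Tekton.

```python
import tempfile
from pathlib import Path


def read_step_exit_code(steps_root: Path, step_dir: str) -> int:
    """Return the exit code recorded for one step.

    steps_root stands in for /tekton/steps. step_dir is a directory name
    such as "step-my-awesome-step", "step-unnamed-0", or an index like "0".
    Assumption: the exitCode file contains a plain decimal integer.
    """
    exit_code_file = steps_root / step_dir / "exitCode"
    return int(exit_code_file.read_text().strip())


# Demo against a temporary directory that mimics the layout described above.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "step-my-awesome-step").mkdir()
(demo_root / "step-my-awesome-step" / "exitCode").write_text("1")

demo_exit_code = read_step_exit_code(demo_root, "step-my-awesome-step")
print(demo_exit_code)
```

Inside a real step, prefer the $(steps.step-<step-name>.exitCode.path) variable over a hard-coded directory, since the internal path might change.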
Tekton Pipelines uses a Pod's
termination message
to pass data from a Step's container to the Pipelines controller. Examples of
this data include: the time that execution of the user's step began, contents of
task results, contents of pipeline resource results.
The contents and format of the termination message can change. At time of
writing the message takes the form of a serialized JSON blob. Some of the data
from the message is internal to Tekton Pipelines, used for book-keeping, and
some is distributed across a number of fields of the TaskRun's
status
. For
example, a TaskRun's
status.taskResults
is populated from the termination
message.
/workspace
- This directory is where volumes for resources and workspaces are mounted.
The /tekton/
directory is reserved on containers for internal usage.
Here is an example of a directory layout for a simple Task with 2 script steps:
/tekton
|-- bin
|   `-- entrypoint
|-- creds
|-- downward
|   |-- ..2021_09_16_18_31_06.270542700
|   |   `-- ready
|   |-- ..data -> ..2021_09_16_18_31_06.270542700
|   `-- ready -> ..data/ready
|-- home
|-- results
|-- run
|   `-- 0
|       |-- out
|       `-- status
|           `-- exitCode
|-- scripts
|   |-- script-0-t4jd8
|   `-- script-1-4pjwp
|-- steps
|   |-- 0 -> /tekton/run/0/status
|   |-- 1 -> /tekton/run/1/status
|   |-- step-foo -> /tekton/run/1/status
|   `-- step-unnamed-0 -> /tekton/run/0/status
`-- termination
Path | Description |
---|---|
/tekton | Directory used for Tekton specific functionality |
/tekton/bin | Tekton provided binaries / tools |
/tekton/creds | Location of Tekton mounted secrets. See Authentication at Run Time for more details. |
/tekton/debug | Contains Debug scripts used to manage step lifecycle during debugging at a breakpoint and the Debug Info mount used to assist for the same. |
/tekton/downward | Location of data mounted via the Downward API. |
/tekton/home | (deprecated - see tektoncd#2013) Default home directory for user containers. |
/tekton/results | Where results are written to (path available to Task authors via $(results.name.path) ) |
/tekton/run | Runtime variable data. Used for coordinating step ordering. |
/tekton/scripts | Contains user provided scripts specified in the TaskSpec. |
/tekton/steps | Where the step exitCodes are written to (path available to Task authors via $(steps.<stepName>.exitCode.path) ) |
/tekton/termination | Where the eventual termination log message is written to (see Sequencing step containers) |
The following directories are covered by the Tekton API Compatibility policy, and can be relied on for stability:
/tekton/results
All other files/directories are internal implementation details of Tekton. Users should not rely on specific paths or behaviors, as they may change in the future.
/tekton/run
is a collection of implicit volumes mounted on a pod and created
for storing the step specific information/metadata. Steps can only write
metadata to their own tekton/run
directory - all other step volumes are mounted as
readonly
. The tekton/run
directories are considered internal implementation details
of Tekton and are not bound by the API compatibility policy - the contents and
structure can be safely changed so long as user behavior remains the same.
/tekton/steps
contains a special subdirectory for each step in a task. Each subdirectory is
actually a symlink to a directory in the Step's corresponding
/tekton/run
volume. This is done to ensure that step directories can only be
modified by their own Step. To ensure that these symlinks are not modified, the
entire /tekton/steps
volume is initially populated by an initContainer, and
mounted readonly
on all user steps.
These symlinks are created by the step-init
entrypoint subcommand, which runs as an
initContainer on each Task Pod.
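The symlink layout can be sketched as follows. This is only an illustration of the idea under stated assumptions, not the step-init implementation: the function name `init_step_dirs`, its arguments, and the use of a temporary directory in place of /tekton are all hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Sequence


def init_step_dirs(root: Path, step_names: Sequence[str]) -> None:
    """Create steps/<index> and steps/<name> links to run/<index>/status.

    root stands in for /tekton. The link targets are not created here, so
    the links dangle until a step writes its own status directory.
    """
    steps_dir = root / "steps"
    steps_dir.mkdir(parents=True, exist_ok=True)
    for index, name in enumerate(step_names):
        target = root / "run" / str(index) / "status"
        os.symlink(target, steps_dir / str(index))
        os.symlink(target, steps_dir / name)


demo_root = Path(tempfile.mkdtemp())
init_step_dirs(demo_root, ["step-step0", "step-unnamed1"])

# Simulate the first step finishing and recording its exit code.
status_dir = demo_root / "run" / "0" / "status"
status_dir.mkdir(parents=True)
(status_dir / "exitCode").write_text("1")

# Both the index link and the name link resolve to the same file.
print((demo_root / "steps" / "0" / "exitCode").read_text())
print((demo_root / "steps" / "step-step0" / "exitCode").read_text())
```

In this sketch nothing prevents a caller from modifying the links afterwards; the read-only mount described above is what provides that guarantee on a real Pod.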
The entrypoint is modified to include an additional flag representing the step specific directory where step metadata should be written:
step_metadata_dir - the directory specified in this flag is created to hold step specific metadata
step_metadata_dir
is set to /tekton/run/<step #>/status
for the entrypoint
of each step.
Let's take an example of a task with two steps, each exiting with non-zero exit code:
kind: TaskRun
apiVersion: tekton.dev/v1beta1
metadata:
  generateName: test-taskrun-
spec:
  taskSpec:
    steps:
      - image: alpine
        name: step0
        onError: continue
        script: |
          exit 1
      - image: alpine
        onError: continue
        script: |
          exit 2
During step-step0
, the first container is actively running so none of the
output files are populated yet. The /tekton/steps
directories are symlinked to
locations that do not yet exist, but will be populated during execution.
/tekton
|-- run
|   |-- 0
|   `-- 1
`-- steps
    |-- 0 -> /tekton/run/0/status
    |-- 1 -> /tekton/run/1/status
    |-- step-step0 -> /tekton/run/0/status
    `-- step-unnamed1 -> /tekton/run/1/status
During step-unnamed1
, the first container has now finished. The output files
for the first step are now populated, and the folder pointed to by
/tekton/steps/0
now exists, and is populated with a file named exitCode
which contains the exit code of the first step.
/tekton
|-- run
|   |-- 0
|   |   |-- out
|   |   `-- status
|   |       `-- exitCode
|   `-- 1
`-- steps
    |-- 0 -> /tekton/run/0/status
    |-- 1 -> /tekton/run/1/status
    |-- step-step0 -> /tekton/run/0/status
    `-- step-unnamed1 -> /tekton/run/1/status
Notice that there are multiple symlinks under /tekton/steps/
pointing
to the same /tekton/run
location. These symbolic links are created to provide
simplified access to the step metadata directories, i.e., instead of referring to
a directory by the step name, you can access it via the step index. However, the
step index becomes hard to keep track of in a task with a long list of steps,
for example, a task with 20 steps. Creating the step metadata directory using a
step name and creating a symbolic link using the step index gives the user
the flexibility to choose whatever works best for them.
Tekton has to take some special steps to support sidecars that are injected into TaskRun Pods. Without intervention, sidecars will typically run for the entire lifetime of a Pod, but in Tekton's case it's desirable for the sidecars to run only as long as Steps take to complete. There's also a need for Tekton to schedule the sidecars to start before a Task's Steps begin, just in case the Steps rely on a sidecar's behavior, for example to join an Istio service mesh. To handle all of this, Tekton Pipelines implements the following lifecycle for sidecar containers:
First, the
Downward API
is used to project an annotation on the TaskRun's Pod into the entrypoint
container as a file. The annotation starts as an empty string, so the file
projected by the Downward API has zero length. The entrypoint binary spins, waiting
for that file to have non-zero size.
The sidecar containers start up. Once they're all in a ready state, the annotation is populated with string "READY", which in turn populates the Downward API projected file. The entrypoint binary recognizes that the projected file has a non-zero size and allows the Task's steps to begin.
On completion of all steps in a Task the TaskRun reconciler stops any sidecar
containers. The Image
field of any sidecar containers is swapped to the nop
image. Kubernetes observes the change and relaunches the container with updated
container image. The nop container image exits immediately because it does not
provide the command that the sidecar is configured to run. The container is
considered Terminated
by Kubernetes and the TaskRun's Pod stops.
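The readiness wait at the start of this lifecycle amounts to polling a file until it has content. A minimal sketch follows, assuming a polling loop with a timeout; the function name `wait_for_nonempty`, the poll interval, and the timeout are illustrative and are not taken from the entrypoint binary.

```python
import tempfile
import time
from pathlib import Path


def wait_for_nonempty(path: Path, timeout: float = 5.0, poll: float = 0.01) -> bool:
    """Poll until path exists with non-zero size, or give up after timeout.

    Returns True once the file has content and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists() and path.stat().st_size > 0:
            return True
        time.sleep(poll)
    return False


# The projected file starts empty because the annotation is an empty string.
ready_file = Path(tempfile.mkdtemp()) / "ready"
ready_file.write_text("")
ready_before = wait_for_nonempty(ready_file, timeout=0.05)

# Once the sidecars are ready, the annotation (and so the file) is populated.
ready_file.write_text("READY")
ready_after = wait_for_nonempty(ready_file, timeout=0.05)
print(ready_before, ready_after)
```

Only the file size is checked here, mirroring the description above; the content "READY" itself is not inspected.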
There are known issues with the existing implementation of sidecars:
- When the nop image does provide the sidecar's command, the sidecar will continue to run even after nop has been swapped into the sidecar container's image field. See the issue tracking this bug. Until this issue is resolved, the best way to avoid it is to avoid overriding the nop image when deploying the tekton controller, or to ensure that the overridden nop image contains as few commands as possible.
- kubectl get pods will show a Completed pod when a sidecar exits successfully but an Error when the sidecar exits with an error. This is only apparent when using kubectl to get the pods of a TaskRun, not when describing the Pod using kubectl describe pod ... nor when looking at the TaskRun, but can be quite confusing.
Halting a TaskRun execution on Failure of a step.
The entrypoint binary is used to manage the lifecycle of a step. Steps are aligned beforehand by the TaskRun controller,
allowing each step to run in a particular order. This is done using the -wait_file
and -post_file
flags. The former
lets the entrypoint binary know that it has to wait for the creation of a particular file before starting execution of the step.
The latter provides information on the step number and signals the next step on completion of the step.
On success of a step, the -post_file
is written as is, signalling the next step, which has the same path given
for -wait_file,
to resume the entrypoint process and move ahead with the step.
On failure of a step, the -post_file
is written with .err
appended to it, denoting that the previous step has failed with
an error. The subsequent steps are skipped in this case as well, marking the TaskRun as a failure.
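The ordering protocol described here can be sketched as a small wrapper. This is a simplified illustration rather than the entrypoint's code: the function name `run_step`, the return strings, the polling loop, and the choice to write the step's own .err file when it is skipped are assumptions made for the sketch.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def run_step(command: List[str], post_file: Path,
             wait_file: Optional[Path] = None,
             timeout: float = 5.0, poll: float = 0.01) -> str:
    """Wait for wait_file, run command, then write post_file or post_file.err."""
    post_err = Path(str(post_file) + ".err")
    if wait_file is not None:
        wait_err = Path(str(wait_file) + ".err")
        deadline = time.monotonic() + timeout
        while not wait_file.exists():
            if wait_err.exists():
                # The previous step failed: skip this step and pass the
                # failure on (assumption of this sketch).
                post_err.touch()
                return "skipped"
            if time.monotonic() >= deadline:
                raise TimeoutError(f"gave up waiting for {wait_file}")
            time.sleep(poll)
    returncode = subprocess.run(command).returncode
    if returncode == 0:
        post_file.touch()
        return "succeeded"
    post_err.touch()
    return "failed"


# Two steps in order: the first fails, so the second is skipped.
run_dir = Path(tempfile.mkdtemp())
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing = [sys.executable, "-c", "pass"]
first = run_step(failing, post_file=run_dir / "0")
second = run_step(passing, post_file=run_dir / "1", wait_file=run_dir / "0")
print(first, second)
```

Each step's post_file doubles as the next step's wait_file, which is what chains the containers into a sequence.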
The failed step writes <step-no>.err
to /tekton/run
and stops running completely. To be able to debug a step we would
need it to continue running (not exit), not skip the next steps and signal health of the step. By disabling step skipping,
stopping write of the <step-no>.err
file and waiting on a signal by the user to disable the halt, we would be simulating a
"breakpoint".
In this breakpoint, which is essentially a limbo state the TaskRun finds itself in, the user can interact with the step environment using a CLI or an IDE.
To exit a step which has been paused upon failure, the step waits on a file similar to <step-no>.breakpointexit
which,
once written, unpauses and exits the step container. For example, if Step 0 fails and is paused, writing 0.breakpointexit
in /tekton/run
unpauses and exits the step container.
TaskRun will be stuck waiting for user debugging before the step execution.
The step program is executed after all -wait_file
monitoring ends. If you want the user to enter debugging before the step is executed,
pass the debug_before_step
parameter to the entrypoint.
The entrypoint
then pauses after it finishes monitoring the wait files
and waits for a signal file, for example /tekton/run/0/out.beforestepexit
for the first step.
The entrypoint
listens for /tekton/run/{{ stepID }}/out.beforestepexit
or /tekton/run/{{ stepID }}/out.beforestepexit.err
to
decide whether to proceed with this step: out.beforestepexit
means continue with the step, and
out.beforestepexit.err
means do not continue with the step.
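The decision between the two signal files can be sketched as below. The function name `wait_before_step`, the polling approach, and the timeout are assumptions for illustration; only the two file names come from the description above.

```python
import tempfile
import time
from pathlib import Path


def wait_before_step(run_dir: Path, step_id: int,
                     timeout: float = 5.0, poll: float = 0.01) -> bool:
    """Block until a before-step signal file appears for the given step.

    Returns True when out.beforestepexit is found (continue with the step)
    and False when out.beforestepexit.err is found (do not continue).
    """
    proceed = run_dir / str(step_id) / "out.beforestepexit"
    abort = run_dir / str(step_id) / "out.beforestepexit.err"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if abort.exists():
            return False
        if proceed.exists():
            return True
        time.sleep(poll)
    raise TimeoutError(f"no before-step signal for step {step_id}")


# run_dir stands in for /tekton/run; the user signals each step by writing a file.
run_dir = Path(tempfile.mkdtemp())
(run_dir / "0").mkdir()
(run_dir / "0" / "out.beforestepexit").touch()
(run_dir / "1").mkdir()
(run_dir / "1" / "out.beforestepexit.err").touch()

continue_step_0 = wait_before_step(run_dir, 0)
continue_step_1 = wait_before_step(run_dir, 1)
print(continue_step_0, continue_step_1)
```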