Tekton tasks and pipelines to run notebooks and generate reports #613
/cc @gabrielwen

I think @gabrielwen is working on this. @gabrielwen any update on this?

/area gsoc

I would like to work on this issue as part of GSoC 2020.

@sarahmaddox I would like to contribute to this issue during my GSoC journey.

/priority p1
#622 included some initial Tekton tasks and pipelines for running our E2E tests.
Here's the current notebook task.

The use of init containers is a bit brittle: it leads to passing around repo references and duplicating git logic that is best handled by Tekton. I think a better approach is to have Tekton build a docker image that layers the notebook code onto the notebook base image, so the run step needs no init containers.
* Changes pulled in from kubeflow/examples#764
* Notebook tests should print a link to the stackdriver logs for the actual notebook job.
* Related to kubeflow/testing#613

Co-authored-by: Gabriel Wen <[email protected]>
* kubeflow#613: currently the way we run notebook tests is by firing off a K8s job on the KF cluster which runs the notebook.
* The K8s job uses init containers to pull in source code and install dependencies like papermill.
* This is a bit brittle.
* To fix this we will instead use Tekton to build a docker image that takes the notebook image and then adds the notebook code to it.
* Dockerfile.notebook_runner: dockerfile to build the test image.
* Add Tekton tasks to build the image.
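The Dockerfile.notebook_runner described above could look roughly like this. A sketch only: the base image, pip dependencies, and paths are assumptions, not the contents of the actual file.

```dockerfile
# Sketch of Dockerfile.notebook_runner: layer the notebook code onto the
# notebook base image so the test job no longer needs init containers.
ARG BASE_IMAGE=gcr.io/kubeflow-images-public/tensorflow-notebook-image:latest
FROM ${BASE_IMAGE}

# Tools the test runner needs to execute and render the notebook.
RUN pip install --no-cache-dir papermill nbconvert

# Bake the repo (and thus the notebook) into the image at build time,
# instead of pulling it with a git init container at run time.
COPY . /src/kubeflow/examples
WORKDIR /src/kubeflow/examples
```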
Notebook tests should build a docker image to run the notebook in.

* kubeflow#613: currently the way we run notebook tests is by firing off a K8s job on the KF cluster which runs the notebook.
* The K8s job uses init containers to pull in source code and install dependencies like papermill.
* This is a bit brittle.
* To fix this we will instead use Tekton to build a docker image that takes the notebook image and then adds the notebook code to it.
* Dockerfile.notebook_runner: dockerfile to build the test image.

The pipeline to run the notebook consists of two tasks:

1. A Tekton Task to build a docker image to run the notebook in
1. A Tekton Task that fires off a K8s job to run the notebook on the Kubeflow cluster.

Here's a list of changes to make this work:

* tekton_client should provide methods to upload artifacts but not parse junits
* Add a tekton_client method to construct the full image URL based on the digest returned from kaniko
* Copy over the code for running the notebook tests from kubeflow/examples and start modifying it.
* Create a simple CLI to wait for nomos to sync resources to the cluster
  * This is used in some syntactic-sugar make rules to aid the dev-test loop

The mnist test isn't completing successfully yet because GoogleCloudPlatform/kubeflow-distribution#61 means the KF deployments don't have proper GSAs to write to GCS.

Related to: kubeflow#613
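The "full image URL from the kaniko digest" helper mentioned above could be as simple as the sketch below. The function name is illustrative, not the actual tekton_client API; it avoids f-strings because, as noted elsewhere in this thread, tekton_client.py still runs under python2.

```python
def full_image_url(image, digest):
    """Combine an image name and the sha256 digest reported by kaniko
    into a pinned, pullable reference of the form name@sha256:<hash>."""
    # Strip any tag from the final path component only, so a port in the
    # registry host (e.g. localhost:5000/img) is left untouched.
    prefix, _, last = image.rpartition("/")
    last = last.split(":")[0]
    name = prefix + "/" + last if prefix else last
    return "{0}@{1}".format(name, digest)
```

Pinning by digest rather than tag means the run task uses exactly the image kaniko just built, even if the tag is later moved.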
* Revamp how Tekton pipelines to run notebooks work.

Notebook tests should build a docker image to run the notebook in.

* #613: currently the way we run notebook tests is by firing off a K8s job on the KF cluster which runs the notebook.
* The K8s job uses init containers to pull in source code and install dependencies like papermill.
* This is a bit brittle.
* To fix this we will instead use Tekton to build a docker image that takes the notebook image and then adds the notebook code to it.
* Dockerfile.notebook_runner: dockerfile to build the test image.

The pipeline to run the notebook consists of two tasks:

1. A Tekton Task to build a docker image to run the notebook in
1. A Tekton Task that fires off a K8s job to run the notebook on the Kubeflow cluster.

Here's a list of changes to make this work:

* tekton_client should provide methods to upload artifacts but not parse junits
* Add a tekton_client method to construct the full image URL based on the digest returned from kaniko
* Copy over the code for running the notebook tests from kubeflow/examples and start modifying it.
* Create a simple CLI to wait for nomos to sync resources to the cluster
  * This is used in some syntactic-sugar make rules to aid the dev-test loop

The mnist test isn't completing successfully yet because GoogleCloudPlatform/kubeflow-distribution#61 means the KF deployments don't have proper GSAs to write to GCS.

Related to: #613

* tekton_client.py can't use format strings yet because we are still running under python2; remove f-style strings.
* Fix typo.
* Address PR comments.
* copy-buckets should not abort on error, as this prevents artifacts from being copied and thus the results from showing up in testgrid; see #703
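The "wait for nomos to sync" CLI mentioned above boils down to a polling loop. A minimal sketch of that pattern, with an illustrative function name (the real CLI's interface and check logic are not shown in this thread):

```python
import time


def wait_for(check, timeout_s=300, poll_s=10):
    """Repeatedly call check() until it returns True or timeout_s expires.

    A nomos-sync CLI would pass a check() that inspects the cluster's
    sync status; here the condition is left abstract.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if check():
            return True
        time.sleep(poll_s)
    return False
```

Returning False instead of raising lets the caller decide whether a timeout is fatal, which keeps the helper reusable in make rules that only want to log a warning.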
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

This issue has been closed due to inactivity.
Kubeflow relies heavily on notebooks for reporting and e2e testing.
We'd like to make it easier to run notebooks and generate reports for them.
One way to do that would be to create reusable Tekton tasks to run notebooks (e.g. using papermill) and then upload an HTML version of the output to an object store like GCS or S3.
Then users could easily automate the running and generation of notebook reports just by adding tasks to a Tekton pipeline.
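The run/convert/upload sequence described above maps to three commands a Tekton step could execute. The sketch below only composes the command lines (so the shape is easy to see and test); the helper name and the use of gsutil for the GCS upload are assumptions, not a fixed design.

```python
def notebook_report_cmds(notebook, bucket_path, params=None):
    """Compose the commands for one notebook-report run:
    1. execute the notebook with papermill,
    2. render the executed notebook to HTML with nbconvert,
    3. upload the HTML report to an object store path.
    """
    out = notebook.replace(".ipynb", ".out.ipynb")
    html = out.replace(".ipynb", ".html")
    run = ["papermill", notebook, out]
    # papermill takes parameter overrides as repeated "-p name value" flags.
    for k, v in (params or {}).items():
        run += ["-p", k, str(v)]
    return [
        run,
        ["jupyter", "nbconvert", "--to", "html", out],
        ["gsutil", "cp", html, bucket_path],
    ]
```

An S3 variant would only swap the final command for the matching `aws s3 cp` invocation; the first two steps are store-agnostic.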
The task should be parameterized so that users just have to specify the parameters defining their notebook. Possible parameters:

Inputs:

Outputs:
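A parameterized Task along these lines might look like the sketch below. The apiVersion, param names, and runner image are all assumptions for illustration, not a settled interface.

```yaml
# Hypothetical reusable Task: run a notebook and publish an HTML report.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: run-notebook
spec:
  params:
    - name: notebook-path          # input: path to the .ipynb in the repo
      type: string
    - name: notebook-params        # input: papermill "-p" overrides
      type: string
      default: ""
    - name: output-gcs-path        # output: where the HTML report lands
      type: string
  steps:
    - name: run-and-publish
      image: gcr.io/example/notebook-runner:latest  # assumed runner image
      script: |
        papermill "$(params.notebook-path)" /tmp/out.ipynb $(params.notebook-params)
        jupyter nbconvert --to html /tmp/out.ipynb
        gsutil cp /tmp/out.html "$(params.output-gcs-path)"
```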
We should define a catalog of tasks inside kubeflow/testing and put the task there, similar to https://github.com/tektoncd/catalog.
kubeflow/examples#735 provides an example of using papermill to execute a notebook, running nbconvert to convert the output to HTML, and then uploading it to GCS. That could serve as a model for what a Tekton task might do.
As a follow-on issue, we might want to investigate using commuter to view these notebooks.