-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Sample] update XGBoost sample #2220
Conversation
…enew-xgboost-sample # Conflicts: # samples/core/xgboost_training_cm/xgboost_training_cm.py
cc @gaoning777 |
/test kubeflow-pipeline-sample-test |
@Ark-kun do you know if there is a way to change the name in the task factory? |
…enew-xgboost-sample # Conflicts: # samples/core/xgboost_training_cm/xgboost_training_cm.py
@numerology The updated gcp/dataproc/create_cluster component: https://raw.githubusercontent.com/kubeflow/pipelines/677fbaa281125fd604b81eab2488513efee7b600/components/gcp/dataproc/create_cluster/component.yaml |
Yes! You can use my_task = my_op(...).set_display_name('Custom name') |
schema=schema, | ||
train_data=train_data, | ||
output=output_template | ||
).after(create_cluster_op).apply(gcp.use_gcp_secret('user-gcp-sa')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nicer to apply the secrets to all ops in one line in the end of the pipeline function:
kfp.dsl.get_pipeline_conf().add_op_transformer(use_gcp_secret('user-gcp-sa'))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
target=target, | ||
analysis=analyze_output, | ||
output=output_template | ||
).after(analyze_op).apply(gcp.use_gcp_secret('user-gcp-sa')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual execution order control (.after
) is usually not needed as the task dependencies should normally be data dependencies. It can be a sign of component or pipeline design issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. But currently we're lacking a way to output arbitrary artifacts that can be consumed by downstream for those GCP components. This is just a workaround.
analyze_op.output, | ||
output_template | ||
).apply(gcp.use_gcp_secret('user-gcp-sa')) | ||
project=project, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good change. Explicitly named arguments makes the pipeline robust against the signature changes.
/hold cancel |
region='us-central1', | ||
train_data='gs://ml-pipeline-playground/sfpd/train.csv', | ||
eval_data='gs://ml-pipeline-playground/sfpd/eval.csv', | ||
schema='gs://ml-pipeline-playground/sfpd/schema.json', | ||
target='resolution', | ||
rounds=200, | ||
rounds=5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this change is for fast testing, could you set it in the test config?
/lgtm |
Memo: since the new Dataproc components no longer support custom output, visualization components like confusion matrix and ROC are no longer available. See #2177 |
/hold |
Done. |
/retest |
Just remember another thing todo is to make sure it's compatible with the post-submit tests as well. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gaoning777 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
/lgtm |
Auto-merging does not seem to work. Manually merged |
""" | ||
|
||
# Remove existing [output]/train and [output]/eval if they exist. | ||
delete_directory_from_gcs(os.path.join(output, 'train')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is going to work?
delete_directory_from_gcs
is not a component, so it won't be executed as part of the pipeline.
Update xgboost sample to adopt new GCP components.
TODO:
add back visualization.Plan to do in future PR. This requires changes to the components.This change isdata:image/s3,"s3://crabby-images/d0bb7/d0bb7f7625ca5bf5c3cf7a2b7a514cf841ab8395" alt="Reviewable"