Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial version of BigQuery query execution component. #601

Merged
merged 3 commits into from
Dec 29, 2018

Conversation

cbreuel
Copy link
Contributor

@cbreuel cbreuel commented Dec 28, 2018

This is a component that runs a query in BigQuery and stores the results in a new table, and optionally also exports the results to GCS.


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Hi @cbreuel. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@k8s-ci-robot
Copy link
Contributor

Hi @cbreuel. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

@qimingj qimingj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cbreuel!


def run_query(query, project_id, dataset_id, table_id, output):
results = _query(query, project_id, dataset_id, table_id, output)
results['output'] = output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems useful if we store the files pattern, not just the output dir into results. As the component provider, I probably know the file pattern beforehand, and that is useful information to component consumers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually assuming the output parameter will be a complete path, including a file name or pattern. Would you prefer to make it just a directory and force the file pattern to be e.g. the table name, then include the pattern in the output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the output is just whatever is passed in right? Is it a dir or a file prefix? I would prefer the output includes the pattern or file (if single file). For example, if passed in output is "gs://b/dir", the output could be "gs://b/dir/files-*"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test
/lgtm

@cbreuel
Copy link
Contributor Author

cbreuel commented Dec 28, 2018

Made some changes based on your comments, PTAL.

Copy link
Contributor Author

@cbreuel cbreuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @qimingj, @Ark-kun, @gaoning777, and @hongye-sun)


components/bigquery/query/src/query.py, line 66 at r1 (raw file):

Previously, qimingj wrote…

then "required=True"?

Done.


components/bigquery/query/src/query.py, line 81 at r1 (raw file):

Previously, qimingj wrote…

Here the output is just whatever is passed in right? Is it a dir or a file prefix? I would prefer the output includes the pattern or file (if single file). For example, if passed in output is "gs://b/dir", the output could be "gs://b/dir/files-*"?

The idea is that "gs://b/dir/files-*" would be passed in, not "gs://b/dir", so it would make sense for them to be the same. I could also change the code to expect only the dir portion to be passed in and add the file name pattern in the code, then outputting the combination. Do you think that would work better?

@qimingj
Copy link
Contributor

qimingj commented Dec 28, 2018

/ok-to-test
/lgtm

@qimingj
Copy link
Contributor

qimingj commented Dec 28, 2018


RUN easy_install pip

RUN pip install google-apitools==0.5.20 \
Copy link

@charlesa101 charlesa101 Dec 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be nice to parameterize this values i.e ARG GOOG_CLOUD_BQ='1.8.1'

@qimingj
Copy link
Contributor

qimingj commented Dec 29, 2018

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qimingj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qimingj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 99227e8 into kubeflow:master Dec 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants