Merge branch 'main' into new_york_taxi_trips
Showing 222 changed files with 21,385 additions and 30,135 deletions.
@@ -4,17 +4,27 @@ Note: If you are adding or editing a dataset, please specify the dataset folder

## Checklist

Note: If an item applies to you, all of its sub-items must be fulfilled

- [ ] **(Required)** This pull request is appropriately labeled
- [ ] Please merge this pull request after it's approved

Use the sections below based on what's applicable to your PR and delete the rest:

### Feature
- [ ] I'm adding or editing a feature
  - [ ] I have updated the [`README`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/README.md) accordingly
  - [ ] I have added/revised tests for the feature

### Data Onboarding
- [ ] I'm adding or editing a dataset
  - [ ] The [Google Cloud Datasets team](mailto:[email protected]) is aware of the proposed dataset
  - [ ] I put all my code inside `datasets/<DATASET_NAME>` and nothing outside of that directory

### Documentation
- [ ] I'm adding/editing documentation

### Bug fix
- [ ] I'm submitting a bugfix
  - [ ] I have added/revised tests related to my bugfix (see the [`tests`](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/tree/main/tests) folder)

### Code cleanup or refactoring
- [ ] I'm refactoring or cleaning up some code
@@ -1,7 +1,10 @@
# Google Cloud Datasets: Data Pipelines and Documentation Set

![public-datasets-pipelines](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/images/architecture.png)

This repository contains the following:

- Cloud-native, data pipeline architecture for onboarding public datasets to [Google Cloud Datasets](https://cloud.google.com/datasets).
- Documentation set containing tutorials, samples, and other articles making use of the datasets hosted by the program.

For detailed documentation, please see the [Wiki Pages](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/wiki).
@@ -0,0 +1,28 @@
/**
 * Copyright 2021 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */


provider "google" {
  project                     = var.project_id
  impersonate_service_account = var.impersonating_acct
  region                      = var.region
}

data "google_client_openid_userinfo" "me" {}

output "impersonating-account" {
  value = data.google_client_openid_userinfo.me.email
}
@@ -0,0 +1,26 @@
/**
 * Copyright 2021 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */


variable "project_id" {}
variable "bucket_name_prefix" {}
variable "impersonating_acct" {}
variable "region" {}
variable "env" {}
variable "iam_policies" {
  default = {}
}
@@ -0,0 +1,44 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from airflow import DAG
from airflow.providers.google.cloud.transfers import gcs_to_gcs

default_args = {
    "owner": "Google",
    "depends_on_past": False,
    "start_date": "2022-07-22",
}


with DAG(
    dag_id="celeba.celeba",
    default_args=default_args,
    max_active_runs=1,
    schedule_interval="@once",
    catchup=False,
    default_view="graph",
) as dag:

    # Transfer data from source to destination in GCS
    GCStoGCS_transfer = gcs_to_gcs.GCSToGCSOperator(
        task_id="GCStoGCS_transfer",
        source_bucket="{{ var.value.composer_bucket }}",
        source_object="{{ var.json.celeba.source_object }}",
        destination_bucket="{{ var.value.composer_bucket }}",
        destination_object="{{ var.json.celeba.destination_object }}",
    )

    GCStoGCS_transfer
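
The operator arguments above are Airflow templates: `var.value.composer_bucket` reads a plain Variable, while `var.json.celeba.source_object` reads a key from a Variable named `celeba` that is stored as JSON. The sketch below illustrates that lookup with a toy resolver. It is not Airflow's Jinja rendering, and the bucket name and object paths are placeholder assumptions, since this commit does not include the real values.

```python
import json
import re

# Placeholder Variable store (assumed values, not taken from the commit).
# Airflow keeps Variables as strings; the `celeba` Variable holds JSON text.
VARIABLES = {
    "composer_bucket": "example-composer-bucket",
    "celeba": json.dumps(
        {
            "source_object": "data/celeba/source/*",
            "destination_object": "data/celeba/copy/",
        }
    ),
}

# Matches `{{ var.value.<name> }}` and `{{ var.json.<name>.<key>... }}`.
_TEMPLATE = re.compile(r"\{\{\s*var\.(value|json)\.([A-Za-z0-9_.]+)\s*\}\}")


def resolve(template, variables=VARIABLES):
    """Toy stand-in for Airflow's template rendering of Variable lookups."""

    def _lookup(match):
        kind, path = match.group(1), match.group(2)
        if kind == "value":
            # Plain Variable: return the stored string unchanged.
            return variables[path]
        # JSON Variable: parse it, then walk the dotted keys.
        name, *keys = path.split(".")
        value = json.loads(variables[name])
        for key in keys:
            value = value[key]
        return str(value)

    return _TEMPLATE.sub(_lookup, template)


print(resolve("{{ var.value.composer_bucket }}"))
print(resolve("{{ var.json.celeba.source_object }}"))
```

With this shape, both `source_object` and `destination_object` live in a single `celeba` Variable, so the DAG needs only that one JSON Variable plus `composer_bucket` to be defined in the environment.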
@@ -0,0 +1,42 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
resources: ~

dag:
  airflow_version: 2
  initialize:
    dag_id: celeba
    default_args:
      owner: "Google"
      depends_on_past: False
      start_date: "2022-07-22"
    max_active_runs: 1
    schedule_interval: "@once"
    catchup: False
    default_view: graph

  tasks:
    - operator: "GoogleCloudStorageToGoogleCloudStorageOperator"
      description: "Transfer data from source to destination in GCS"
      args:
        task_id: "GCStoGCS_transfer"
        source_bucket: "{{ var.value.composer_bucket }}"
        source_object: "{{ var.json.celeba.source_object }}"
        destination_bucket: "{{ var.value.composer_bucket }}"
        destination_object: "{{ var.json.celeba.destination_object }}"

  graph_paths:
    - "GCStoGCS_transfer"
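
In this configuration, every name listed under `graph_paths` has to match a `task_id` declared under `tasks`. The following is a minimal sketch of that consistency check, assuming the YAML has already been loaded into a dictionary (written out by hand here to avoid a YAML dependency) and assuming paths may chain task IDs with `>>` as Airflow dependency expressions do. The helper `undefined_task_refs` is hypothetical and is not part of the repository's tooling.

```python
import re

# Hand-written mirror of the relevant parts of the pipeline config above.
pipeline = {
    "dag": {
        "initialize": {"dag_id": "celeba"},
        "tasks": [
            {
                "operator": "GoogleCloudStorageToGoogleCloudStorageOperator",
                "args": {"task_id": "GCStoGCS_transfer"},
            },
        ],
        "graph_paths": ["GCStoGCS_transfer"],
    }
}


def undefined_task_refs(config):
    """Return names used in graph_paths that no task declares as task_id."""
    task_ids = {task["args"]["task_id"] for task in config["dag"]["tasks"]}
    missing = []
    for path in config["dag"]["graph_paths"]:
        # Pull out identifier-like tokens; operators such as `>>` are skipped.
        for ref in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", path):
            if ref not in task_ids:
                missing.append(ref)
    return missing


print(undefined_task_refs(pipeline))
```

An empty result means every graph path refers to a declared task. A non-empty result lists the names that would be undefined in the generated DAG.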
@@ -0,0 +1,23 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

dataset:
  name: celeba
  friendly_name: Celebrity Attributes
  description: Dataset of images of the facial attributes of various celebrities.
  dataset_sources: ~
  terms_of_use: ~


resources: ~