5.2.0 (2022-11-01)
- Add geom columns for thelook_ecommerce dataset (#307) (f39a177)
- Add Municipal Calendar to San Francisco Dataset (#480) (a21c2ef)
- Add PM25_FRM_DAILY_SUMMARY Pipeline To Epa_Historical_Air_Quality Dataset (#518) (4f66c05)
- Add Storms Database to Noaa Dataset (#498) (8d02866)
- Adding a tutorial for the Iowa Liquor dataset (#419) (b619b71)
- Adding New Pipelines To San Francisco Dataset. (#487) (58cda71)
- Extract the tabular metadata for Cloud Datasets program (#452) (1a3d59e)
- Launch AFDB v4 dataset (#522) (c6664a7)
- Migrate the dataset Covid19 Italy from Xenon (#488) (1ca6bd6)
- Migrate the World Bank datasets x 3 from Xenon (#506) (65295d0)
- Migrate the Xenon World Bank WDI dataset (#482) (35457a9)
- onboard chembl-30 dataset (#467) (ef9c57b)
- Onboard COVID-19 Genome Sequence dataset (#460) (0b7828f)
- Onboard dataset Open Buildings (#453) (739b6cf)
- Onboard EBI ChEMBL Previous Data dataset (#470) (63b4012)
- Onboard FDIC dataset (#495) (e20e157)
- Onboard Fec dataset (#485) (2da413e)
- Onboard Human Variant Annotation dataset (#438) (ebfe4de)
- Onboard IDC v10 dataset (#433) (c2ffc77)
- onboard irs 990 ein dataset (#481) (65544a2)
- Onboard MERFISH Mouse Brain Receptor Map dataset (#457) (4333fca)
- Onboard Multilingual Spoken Words Corpus - MLCommons Association dataset (#461) (22cc27c)
- Onboard New Fec dataset (#486) (6ee1fa3)
- Onboard New FEC dataset (#513) (e770220)
- Onboard NHTSA Traffic Fatalities dataset (#454) (eb409c4)
- Onboard NOAA Passive Bioacoustic dataset (#471) (2ecd9ea)
- Onboard Uniref50 dataset (#443) (dbf2300)
- Onboard Uniref50 dataset (#473) (b44d572)
- YAML custom tag for interpolating GCR image URLs (#372) (ef901e5)
- Added "is_public" to cloud_datasets.tabular_datasets table (#501) (802cff6)
- Added Airport Fee To Schema Files And Pipeline.Yaml In New York Taxi Trips Dataset (#476) (d94105a)
- Adds BRL currency in Google Political Ads (#469) (edd3654)
- AlphaFold dataset - add accession_ids.csv to the bucket (#451) (cacd9f1)
- Change Destination Dataset in Noaa Pipelines (#479) (c7c047c)
- City Health Dashboard Schema Changes (#515) (1bdb0dd)
- deleting pod error (#511) (77fe479)
- Fixing the forecasting issue in the notebook. (#472) (de7f1fa)
- For COVID-19 Italy, resolve bucket variable in pipeline.yaml (#509) (1f913ac)
- For FDA Food Enforcement, Resolve invalid source DateTime data. (#508) (f4b5a52)
- Increase number of years to back date to 2009 in New York Taxi Trips Dataset (#445) (a9c5998)
- Modified Resources for Kpod Operator (#521) (e715154)
- Remove GKE cluster operator for dataset Census Opportunity Atlas (#458) (9ecfbc4)
- Removed create cluster process (#517) (d36e6d4)
- Resolve cluster name mismatch in pipeline.yaml (#439) (3e8d20d)
- Resolve cluster name mismatch in pipeline.yaml (#440) (d2658f6)
- Resolve DateTime Issues In FEC Dataset (#514) (014465b)
- Resolve failure in production for the dataset Open Buildings (#468) (9a22d5f)
- Resolve Failures In New York Pipeline And Merge To One Image (#516) (7d21778)
- Resolve Issue With Name Node Corruption In New York Dataset (#459) (59e3aed)
- Resolve null column for csv output and changed copyright year (#466) (00e636e)
- Resolve production issue for Iowa Liquor Sales dataset (#520) (cf2b460)
- Resolve reference to hard coded bucket. (#477) (039ff61)
- Resolve San Francisco Pipeline Yaml Variable Assignment Issue (#489) (2d34cf9)
- Resolve source file location and format issue in the New York Taxi Trips dataset (#441) (13a829f)
- Resolve Typo Issue In EPA Historical Air Quality Pipeline.yaml (#519) (c54836d)
- Resolve variables. (#464) (3c34e7e)
- Resolved reference to destination bucket causing failure in production. (#507) (69128bc)
- set gnomAD pipeline to run daily (#510) (5f50601)
- Update project parameters for COVID-19 Genome Sequence dataset (#462) (78d55d9)
5.1.0 (2022-07-30)
- Add scaffold script for directory + dataset.yaml setup (#412) (5bf354b)
- Adding a notebook tutorial for the EPA dataset: CO levels (#422) (f0bab59)
- Adds operators for Cloud SQL, Cloud Functions, and GCE (#429) (9b5da34)
- Support --async-builds flag for generate_dag.py (#424) (7536df9)
- Onboard DeepMind AlphaFold DB (#431) (02c887e)
- Onboard CelebA dataset (#420) (0c28563)
- Adds BQ views to scalable_open_source dataset (#416) (2785234)
- Rename co2 columns to emissions to make it generic from Travel Impact Model dataset. (#418) (e1ac106)
- Change cms_medicare tables with column provider_zipcode from integer to string type (#417) (27b0a9b)
- Resolve conflicts on Census Bureau ACS (#414) (492b973)
- Resolve CRON value in Cloud Storage Geo Index dataset (#413) (8903e82)
- Resolve IP error when creating NOAA cluster (#423) (82d53f4)
- Use proper GCS prefix for custom data folder (#408) (9d56363)
5.0.0 (2022-07-11)
- Upgrade to Airflow 2.2.5 and Python 3.8.12 (#394)
- Onboard Carbon-Free Energy Calculator dataset (#391) (f3a9447)
- Onboard Census Bureau ACS Dataset (#399) (98e0179)
- Onboard Fashion MNIST dataset (#387) (91b7f6a)
- Onboard IMDb dataset (#406) (2559838)
- Optimize tests for DAG and Terraform generation (#395) (ffcd18c)
- Remove co2e columns from Travel Impact Model dataset. (#400) (d7179ce)
- NOAA - Resolve table field name issue. (#402) (51860eb)
- Use specific Python version for Airflow 1 tests (#401) (6fa94a7)
4.2.0 (2022-06-25)
- Onboard COVID-19 dataset from The New York Times (#383) (9aac451)
- Onboard NOAA dataset (#378) (02cc038)
- Onboard San Jose Translation dataset (#377) (63ea9b9)
- Onboarding MIMIC-III dataset (#389) (baf6b8d)
- [datasets/gbif] Add a query to uncover species found in one region only (#388) (bd5a135)
4.1.1 (2022-06-16)
- Onboard IMDB dataset (#382) (8bf7065)
- Onboard MNIST dataset (#379) (9809935)
- Onboard New York Taxi Trips dataset (#381) (897ac3f)
4.1.0 (2022-06-10)
- Onboard City Health Dashboard dataset (#374) (c7cd9dd)
- Onboard Cloud Storage Geo Index (#367) (63cdb2a)
- Onboard EPA Historical Air Quality (#373) (4f4c87e)
- Onboard IDC v9 dataset (#364) (bfb9f23)
- Onboard NOAA datasets (#353) (0f1c696)
- Onboard The General Index Dataset (#342) (67d7216)
- Revised COVID-19 Google Mobility dataset (#363) (ddd3dac)
4.0.0 (2022-05-23)
- Onboard Census Opportunity Atlas Dataset (#263) (13ce71d)
- Onboard deps.dev (Open Source Insights) dataset (#356) (12143af)
- Onboard Diversity Annual Report and complementary datasets (#358) (4a8a2cd)
- Onboard EPA Historical Air Quality dataset (#301) (214a56f)
- Onboard GBIF dataset (#355) (ab4e208)
- Onboard IDC v8 dataset (#319) (0f112e0)
- Onboard International Search Terms for Google Trends (#323) (855aa7f)
- Onboard NASA wildfire (#275) (f593161)
- Onboard New York Trees dataset (#265) (2905308)
- Onboard Open Targets Genetics dataset (#318) (03b4f89)
- Onboard Open Targets Platform dataset (#313) (c5adce6)
- Onboard SEC Failure to Deliver dataset (#309) (afa6492)
- Rename Travel Sustainability to Travel Impact Model (#351) (83df285)
- Retrieve Composer bucket name when deploying DAGs (#312) (220f1d5)
- Update BLS - CPSAAT18 with 2021 data (#357) (a8f8856)
- Added functionality to support a data folder to store schema files (#354) (f893dff)
- Unified variables and adds support for IAM policies (#341) (c4a45a0)
- Use poetry over pipenv (#337) (ca43066)
- Adds packages for docs dependency group (#339) (6721490)
- bump black version due to click dependency issue (#320) (cac6f18)
- Fix generating BQ views for IDC dataset (#324) (5896865)
- Removed unnecessary pathlib param from test_deploy_dag (#345) (45dd0b2)
- thelook_ecommerce - increase # of customers and revised order_items (#352) (ed1570d)
3.0.0 (2022-03-24)
- Reorganize pipelines and infra files into their respective folders (#292) (7408d44)
- Upgrade some pipelines to Airflow 2 and explicitly set pod storage (#283) (cbc3278)
- Onboard Broad Genome References dataset (#316) (4f1f6db)
- Onboard Imaging Data Commons (IDC) v7 dataset (#287) (dfda5d9)
- Onboard ML dataset (#276) (48e51af)
- Onboard Travel Sustainability dataset (#280) (8e9731a)
- Onboard Travel Sustainability dataset (schema update) (#298) (7a13daa)
- Onboarding TheLook E-Commerce dataset (#294) (15f663a)
- Revise Google Political Ads due to new dataset version (#317) (6ffb0d0)
- Update "location" to GEOGRAPHY type for datasets/google_trends schema (#297) (9d9d3bd)
- Docs: Add SF 311 example (#310) (844a7fb)
- Docs: Add a query snippet to calculate the monthly average bike trips for san_francisco_bikeshare (#284) (7a009f6)
- Docs: Added a template for tutorials (#299) (ae23d4b)
- Docs: SF 311 Calls - Predicting the number of calls per category using LSTM (#293) (88637ca)
- Allow other JSON files to be checked in (such as schema.json) (#281) (2c94b79)
- Update and fix city_health_dashboard dataset (#285) (4767fed)
2.8.0 (2022-01-27)
- Onboard America Health Rankings dataset (#244) (8ecbfda)
- Onboard American Community Survey dataset (#222) (861d0e6)
- Onboard Census Opportunity Atlas dataset (#248) (0e62f27)
- Onboard Census tract 2019 dataset (#272) (d2b5e52)
- Onboard CFPB Complaints dataset (#225) (9051773)
- Onboard Chronic Disease Indicators dataset (#242) (48c96f2)
- Onboard City Health Dashboard dataset (#250) (8cc5286)
- Onboard COVID-19 CDS EU dataset (#261) (d710dec)
- Onboard EUMETSAT Solar Forecasting dataset (#273) (db479cf)
- Onboard FDA Drug Enforcement dataset (#245) (53c98ac)
- Onboard gnomAD dataset (#264) (804b440)
- Onboard MLCommons Multilingual Spoken Words Corpus (MSWC) dataset (#252) (ec93997)
- Onboard News Hate Crimes dataset (#238) (9b242ef)
- Onboard Race and Economic Opportunity dataset (#236) (fe6c826)
- Onboarding COVID-19 (UK) Government Response dataset (#262) (914d39c)
- Update IDC dataset with new views and v6 version (#266) (02cae2b)
2.7.0 (2021-12-14)
- Onboard CDC Places Dataset (#241) (e2fcb0c)
- Onboard Cloud Storage Geo Index Dataset (#219) (27a2c8e)
- Onboard EPA historical air quality dataset (#221) (6267b82)
- Onboard FDA food dataset (#223) (f0ced96)
- Onboard IDC PDP datasets (#230) (3f944df)
- Namespace Terraform resources under dataset names (#227) (a3f4b34)
- Renamed dataset from sunroof to sunroof_solar (#226) (0780df8)
2.6.0 (2021-11-04)
- Onboard Austin Waste dataset (#200) (79dbf5d)
- Onboard BLS dataset (#201) (c7cdd82)
- Onboard Chicago Crime dataset (#199) (d766547)
- Onboard Sunroof Solar dataset (#166) (375cbae)
- Onboard World Bank Intl Education dataset (#182) (ff384fd)
- Onboard World Bank WDI dataset (#198) (cbad321)
2.5.0 (2021-10-14)
- Onboard Iowa Liquor Sales dataset (#193) (06848c8)
- Onboard San Francisco Bikeshare Station dataset (#191) (0707012)
- Onboard San Francisco Bikeshare Status dataset (#192) (e4e1f26)
- Onboard San Francisco Film Locations dataset (#190) (2284e09)
- Combine san_francisco_bikeshare_* folders into san_francisco_bikeshare (#211) (50e4e6d)
- Rename san_francisco_311_service_requests folder to san_francisco_311 (#209) (697f7be)
2.4.0 (2021-10-08)
- Onboard Austin Crime dataset (#174) (b4fbaad)
- Onboard CMS Medicare dataset (#185) (d0425cd)
- Onboard COVID-19 Google Mobility dataset (#177) (1653a8e)
- Onboard New York datasets: 311 Service Requests, Citibike Stations, and Tree Census (#167) (d1f1d7c)
- Onboard San Francisco 311 Service Requests dataset (#184) (a8ba2e9)
- Onboard San Francisco Street Trees dataset (#176) (7da5061)
- Onboard World Bank Health Population dataset (#178) (4aba767)
- Onboard World Bank International Debt dataset (#179) (5ebbabb)
2.3.1 (2021-09-28)
- Delete temp GCS objects generated by gsutil's parallel composite upload for geos_fp dataset (#195) (f307cce)
- Use patched flask-openid version to fix failing builds (#188) (1ea15a0)
2.3.0 (2021-09-10)
- Onboard google_political_ads.advertiser_geo_spend dataset (#154) (2201ebe)
- Onboard Austin Bikeshare dataset (#156) (0bd5659)
- Onboard NOAA's GSOD Stations and Lightning Strikes datasets (#158) (8371856)
2.2.0 (2021-08-27)
- Onboard COVID19-Italy dataset (#148) (f56b5f2)
- Onboard GEOS-FP dataset (#130) (d32f46b)
- Onboard Google CFE dataset (#146) (9bca8ef)
- Onboard Google Political Ads dataset (#149) (5903253)
- Onboard IRS 990 dataset (#150) (1105eed)
- Regenerate Terraform files for Google Political Ads (#152) (102f8e5)
- shared_variables.json should not be reset when deploying (#147) (a6754df)
2.1.0 (2021-08-13)
2.0.0 (2021-08-11)
- Adds support for Airflow 2 Cloud Composer environment and operators (#134) (b2749c6)
- Pipeline YAML template using Airflow 2 operators (#138) (90ae7cd)
1.11.0 (2021-07-22)
- Adds Google license header bot config (#106) (d587689)
- Use a single file for shared Airflow variables (#122) (f5d227d)
1.10.0 (2021-07-21)
1.9.0 (2021-07-15)
1.8.0 (2021-07-01)
1.7.0 (2021-06-24)
1.6.0 (2021-06-17)
1.5.1 (2021-06-15)
1.5.0 (2021-06-14)
1.4.1 (2021-06-09)
1.4.0 (2021-06-08)
1.3.0 (2021-06-08)
1.2.0 (2021-06-02)
- Configure Renovate (#36) (d6fd93b)
- Support deploying a single pipeline in a dataset (#46) (8bdb8d7)
- Support Terraform remote state when generating GCP resources (#39) (9e01936)
1.1.0 (2021-05-26)
- Support building and pushing container images shared within a dataset folder (#27) (de9d1b9)
- support user-supplied bucket name prefix (#23) (610a9b7)
- added The COVID Tracking Project dataset
- added Vizgen MERFISH Mouse Brain Map dataset (#17)
- added Penguins dataset for ML tutorial (#15)