-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(trino): Add functionality to upload data #29164
feat(trino): Add functionality to upload data #29164
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #29164 +/- ##
===========================================
+ Coverage 60.48% 83.73% +23.24%
===========================================
Files 1931 518 -1413
Lines 76236 37547 -38689
Branches 8568 0 -8568
===========================================
- Hits 46114 31441 -14673
+ Misses 28017 6106 -21911
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
2680b22
to
97d011d
Compare
@@ -132,6 +132,7 @@ gevent = ["gevent>=23.9.1"] | |||
gsheets = ["shillelagh[gsheetsapi]>=1.2.18, <2"] | |||
hana = ["hdbcli==2.4.162", "sqlalchemy_hana==0.4.0"] | |||
hive = [ | |||
"boto3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is imported within the HiveEngineSpec
and thus likely should be defined as a depedency.
@@ -154,7 +155,7 @@ pinot = ["pinotdb>=0.3.3, <0.4"] | |||
playwright = ["playwright>=1.37.0, <2"] | |||
postgres = ["psycopg2-binary==2.9.6"] | |||
presto = ["pyhive[presto]>=0.6.5"] | |||
trino = ["trino>=0.328.0"] | |||
trino = ["boto3", "trino>=0.328.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
@@ -79,6 +79,12 @@ def upload_to_s3(filename: str, upload_prefix: str, table: Table) -> str: | |||
) | |||
|
|||
s3 = boto3.client("s3") | |||
|
|||
# The location is merely an S3 prefix and thus we first need to ensure that there is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discovered this issue when testing locally, i.e., given that the location field is a directory as opposed to a file any file within said directory is slurped into the table, hence why we should first ensure that the location is empty.
This reverts commit 53798c7.
SUMMARY
This PR adds functionality to allow Trino to upload data via a CSV which previously resided as custom logic at Airbnb. This is based on the logic outlined in the
HiveEngineSpec
.Note I'm not certain whether this is best housed in the
TrinoEngineSpec
orPrestoBaseEngineSpec
. Airbnb currently only supports Trino and thus I wouldn't be able to provide end-to-end testing for Presto.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Added tests—mimicking the Hive logic.
ADDITIONAL INFORMATION