-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Looker PDT operators #20882
Add Looker PDT operators #20882
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
Can you please rebase to latest main ? |
41bd24d
to
ace75c3
Compare
Thanks for reminder @potiuk, rebased! |
I think another reabase (and fixes to failing checks) are needed |
from airflow.providers.google.cloud.hooks.looker import LookerHook | ||
|
||
|
||
class LookerStartPdtBuildOperator(BaseOperator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are in the process to add Links to the GCP resources for most of the operators. Maybe this a good time to add this also to the Looker operators.
Example how you can do this you can find in the Dataproc Metastore:
#21267
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Is it the same as the extra links?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly
ea4b1ed
to
a8e65ec
Compare
852df42
to
fc60226
Compare
docs/apache-airflow-providers-google/operators/cloud/looker.rst
Outdated
Show resolved
Hide resolved
class LookerHook(BaseHook): | ||
"""Hook for Looker APIs.""" | ||
|
||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if Looker is available for selection during connection setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean by Looker is available for selection during connection setup
. Do you mean the drop down list with connection types (Looker connection has HTTP
type)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I mean this dropdown list.
I think you should create a new connection type and then add a new field in the connection form (timeout
, veriify_ssl
).
airflow/airflow/providers/snowflake/hooks/snowflake.py
Lines 79 to 132 in 3f9295a
conn_name_attr = 'snowflake_conn_id' | |
default_conn_name = 'snowflake_default' | |
conn_type = 'snowflake' | |
hook_name = 'Snowflake' | |
supports_autocommit = True | |
@staticmethod | |
def get_connection_form_widgets() -> Dict[str, Any]: | |
"""Returns connection widgets to add to connection form""" | |
from flask_appbuilder.fieldwidgets import BS3TextFieldWidget | |
from flask_babel import lazy_gettext | |
from wtforms import BooleanField, StringField | |
return { | |
"extra__snowflake__account": StringField(lazy_gettext('Account'), widget=BS3TextFieldWidget()), | |
"extra__snowflake__warehouse": StringField( | |
lazy_gettext('Warehouse'), widget=BS3TextFieldWidget() | |
), | |
"extra__snowflake__database": StringField(lazy_gettext('Database'), widget=BS3TextFieldWidget()), | |
"extra__snowflake__region": StringField(lazy_gettext('Region'), widget=BS3TextFieldWidget()), | |
"extra__snowflake__role": StringField(lazy_gettext('Role'), widget=BS3TextFieldWidget()), | |
"extra__snowflake__insecure_mode": BooleanField( | |
label=lazy_gettext('Insecure mode'), description="Turns off OCSP certificate checks" | |
), | |
} | |
@staticmethod | |
def get_ui_field_behaviour() -> Dict[str, Any]: | |
"""Returns custom field behaviour""" | |
import json | |
return { | |
"hidden_fields": ['port'], | |
"relabeling": {}, | |
"placeholders": { | |
'extra': json.dumps( | |
{ | |
"authenticator": "snowflake oauth", | |
"private_key_file": "private key", | |
"session_parameters": "session parameters", | |
}, | |
indent=1, | |
), | |
'schema': 'snowflake schema', | |
'login': 'snowflake username', | |
'password': 'snowflake password', | |
'extra__snowflake__account': 'snowflake account name', | |
'extra__snowflake__warehouse': 'snowflake warehouse name', | |
'extra__snowflake__database': 'snowflake db name', | |
'extra__snowflake__region': 'snowflake hosted region', | |
'extra__snowflake__role': 'snowflake role', | |
'extra__snowflake__insecure_mode': 'insecure mode', | |
}, | |
} |
This will also allow you to rename the "Login" field to "Client ID" and the "Password" field to "Client Secret".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same thoughts first, but then I decided not to create a new connection as Looker in the process of transitioning to GCP and ideally it should use GCP connection soon. So I didn't want for users to get used to this temporary Looker connection. Hope that does make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, users will have to consciously migrate to the new authentication method, which is likely to take some time. As part of the migration, we can start accepting GCP connections and add "(Deprecated)" to the name of the old connection type. I don't see why we should now make it difficult to configure a connection by not adding this connection to the UI now.
If you don't want to add the ability to easily create connections via UI, we can limit ourselves to documentation only, but we should make sure that everything is precisely described.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to @potiuk and he said that the next google provider release is happening tomorrow. I will try to implement it by then, but we would really like to launch it with this release to avoid waiting for another month for the next one.
@potiuk - just rebased! |
I see a dependency conflict for
|
Then. I think you will need to remove While the dask limitation was clearly a dask problem, I think solving the Just removing the limit should trigger the update and you might be able to see the problems |
Ah no I looked clower. It seems that there is still some other reason to hold Beam from upgrading to latest version but what it is I do not know yet - I will look tomorrow. |
Ah. likely constraints not refreshed yet :). Let's see tomorrow morning. |
Right. Constraints refreshed but indeed typing-extensions are too limiting. Attempting to see what the problems are and fix it in #22046 |
…provider, setup, logo
0b9ebb2
to
8cb6602
Compare
Al right - @alekseiloginov - I rebased your change on top of the latest main (which should already have typing-extenasions and constraints fixed). |
🤞 ! |
Thanks a lot, @potiuk ! Most of the tests are passing now! 👍
|
That was a flaky testf for mssql: ERROR: for mssqlsetup Container "130ef72be1ab" is unhealthy. Merging. |
Awesome work, congrats on your first merged pull request! |
🎉 |
Preparing the release now. Good stuff. I just checked that Last Python 3.10 obstacle is removed and we just need to release all providers now with Python 3.10 support to be able to add 3.10 :) |
Description
This PR adds new Airflow operators that make use of the new Looker's Start/Stop API for Persistent Derived Tables (PDTs). These operators allow to manage PDT materialization via Looker API.
Currently supported operations are:
Under the hood these new operators use Looker SDK - a python client for Looker API.
Note: Looker PDT operators are being released as a part of Google Cloud operators. While Looker is a part of GCP now, it's still in the process of migrating to the GCP infrastructure. That's why for now we can't use Google Cloud connection, and a custom Looker-specific Airflow connection needs to be created.
Changes
This PR adds a new Looker operator, sensor, hook, example DAGs, tests, docs, and logo.
It also updates provider and setup files.
New classes:
LookerStartPdtBuildOperator
LookerCheckPdtBuildSensor
LookerHook
Tests
This PR adds the following unit tests:
TestLookerStartPdtBuildOperator
LookerStartPdtBuildOperator
cancel_on_kill
flag that specifies whether to cancel the PDT build or not, when a task is manually killedTestLookerCheckPdtBuildSensor
done
,error
,wait
,cancelled
TestLookerHook
wait_for_job
used byLookerStartPdtBuildOperator
in synchronous modecheck_pdt_build
used to get a build status from Looker SDKstart_pdt_build
used to start a build via Looker SDKstop_pdt_build
used to stop a build via Looker SDKNote: There are no system tests since Looker doesn't have an automated process to create a test environment.
Documentation
The main documentation is in
looker.rst
file as well as in docstrings.looker.rst
contains:Screenshots
Example of using Looker PDT operator in synchronous mode (start and status are combined in one task)
Example of using Looker PDT operator in asynchronous mode (start and status are separate tasks)