
Status of testing Providers that were prepared on August 15, 2022 #25721

Closed · 13 of 18 tasks
potiuk opened this issue Aug 15, 2022 · 9 comments
Labels: kind:meta (High-level information important to the community), testing status (Status of testing releases)

Comments

potiuk (Member) commented Aug 15, 2022

Body

I have a kind request for all the contributors to the latest provider packages release.
Could you please help us test the RC versions of the providers?

Let us know in a comment whether the issue is addressed.

These are the providers that require testing, as some substantial changes were introduced:

Provider common.sql: 1.1.0rc4

Provider databricks: 3.2.0rc4

The guidelines on how to test providers can be found in Verify providers by contributors.

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.
josh-fell (Contributor) commented:

#25538 Looks good in the packaged files.

kazanzhy (Contributor) commented:

I must say that I have not prepared integration tests for the SQL provider.
I looked over #23971 again and it seems OK.
But it would be great if someone could help test this PR.
@josh-fell @FanatoniQ @alexott

potiuk (Member, Author) commented Aug 16, 2022

Tested:

#25299:

>>> from airflow.providers.apache.hive.hooks.hive import HiveServer2Hook
>>> hook = HiveServer2Hook()
>>> hook.get_pandas_df(hql='xxx')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get_pandas_df() missing 1 required positional argument: 'sql'
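
This matches the #25299 rename: below is a minimal sketch of the corrected call (a sketch only - it assumes a reachable HiveServer2 connection, and the query is a placeholder):

from airflow.providers.apache.hive.hooks.hive import HiveServer2Hook

hook = HiveServer2Hook()
df = hook.get_pandas_df(sql='SELECT 1')  # the keyword is now `sql`, not `hql`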

#25293

root@a49eb20ef97a:/opt/airflow# airflow providers list
package_name                        | description                                                                 | version
====================================+=============================================================================+========
apache-airflow-providers-common-sql | Common SQL Provider https://en.wikipedia.org/wiki/SQL                       | 1.0.0  
apache-airflow-providers-ftp        | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114             | 3.0.0  
apache-airflow-providers-http       | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/            | 3.0.0  
apache-airflow-providers-imap       | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501 | 3.0.0  
apache-airflow-providers-postgres   | PostgreSQL https://www.postgresql.org/                                      | 5.2.0  
apache-airflow-providers-sqlite     | SQLite https://www.sqlite.org/                                              | 3.0.0  
                                                                                                                           
root@a49eb20ef97a:/opt/airflow# python
Python 3.7.13 (default, Aug  2 2022, 12:15:43) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.sensors.sql import SqlSensor
>>> sensor = SqlSensor(conn_id='postgres_default', sql='SELECT * FROM DUAL', task_id='id')
>>> sensor._get_hook()
[2022-08-16 12:44:15,582] {base.py:68} INFO - Using connection ID 'postgres_default' for task execution.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/airflow/sensors/sql.py", line 84, in _get_hook
    f'The connection type is not supported by {self.__class__.__name__}. '
airflow.exceptions.AirflowException: The connection type is not supported by SqlSensor. The associated hook should be a subclass of `DbApiHook`. Got PostgresHook
>>> quit()
root@a49eb20ef97a:/opt/airflow# pip install apache-airflow-providers-common-sql==1.1.0rc4
Collecting apache-airflow-providers-common-sql==1.1.0rc4
  Downloading apache_airflow_providers_common_sql-1.1.0rc4-py3-none-any.whl (27 kB)
Requirement already satisfied: sqlparse>=0.4.2 in /usr/local/lib/python3.7/site-packages (from apache-airflow-providers-common-sql==1.1.0rc4) (0.4.2)
Installing collected packages: apache-airflow-providers-common-sql
  Attempting uninstall: apache-airflow-providers-common-sql
    Found existing installation: apache-airflow-providers-common-sql 1.0.0
    Uninstalling apache-airflow-providers-common-sql-1.0.0:
      Successfully uninstalled apache-airflow-providers-common-sql-1.0.0
Successfully installed apache-airflow-providers-common-sql-1.1.0rc4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@a49eb20ef97a:/opt/airflow# python
Python 3.7.13 (default, Aug  2 2022, 12:15:43) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.sensors.sql import SqlSensor
>>> sensor = SqlSensor(conn_id='postgres_default', sql='SELECT * FROM DUAL', task_id='id')
>>> sensor._get_hook()
[2022-08-16 12:45:24,882] {base.py:68} INFO - Using connection ID 'postgres_default' for task execution.
<airflow.providers.postgres.hooks.postgres.PostgresHook object at 0x7f3ddfa16750>
>>> 

#25350

[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.providers.common.sql.operators.sql import (
    SQLCheckOperator,
    SQLIntervalCheckOperator,
    SQLValueCheckOperator,
)
>>> 
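
For reference, a minimal usage sketch of one of these operators (the connection ID and query are placeholders chosen for illustration):

check = SQLCheckOperator(
    task_id='sql_check',
    conn_id='postgres_default',
    # the task fails if any value in the first returned row is falsy
    sql='SELECT COUNT(*) FROM information_schema.tables',
)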

#25713

common-sql: 1.1.0rc4:

>>> from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
>>> hook = DatabricksSqlHook()
>>> hook.run(sql='SELECT * FROM TABLE')
[2022-08-16 12:51:21,954] {base.py:68} INFO - Using connection ID 'databricks_default' for task execution.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 179, in run
    with closing(self.get_conn()) as conn:
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 105, in get_conn
    "http_path should be provided either explicitly, "
airflow.exceptions.AirflowException: http_path should be provided either explicitly, or in extra parameter of Databricks connection, or sql_endpoint_name should be specified
>>> import logging
>>> hook.log.setLevel(logging.DEBUG)
>>> hook.run(sql='SELECT * FROM TABLE')
[2022-08-16 12:53:00,963] {databricks_sql.py:172} DEBUG - Executing following statements against Databricks DB: ['SELECT * FROM TABLE']

common-sql: 1.1.0rc3:

root@a49eb20ef97a:/opt/airflow# pip install apache-airflow-providers-common-sql==1.1.0rc3
Collecting apache-airflow-providers-common-sql==1.1.0rc3
  Downloading apache_airflow_providers_common_sql-1.1.0rc3-py3-none-any.whl (27 kB)
Requirement already satisfied: sqlparse>=0.4.2 in /usr/local/lib/python3.7/site-packages (from apache-airflow-providers-common-sql==1.1.0rc3) (0.4.2)
Installing collected packages: apache-airflow-providers-common-sql
  Attempting uninstall: apache-airflow-providers-common-sql
    Found existing installation: apache-airflow-providers-common-sql 1.0.0
    Uninstalling apache-airflow-providers-common-sql-1.0.0:
      Successfully uninstalled apache-airflow-providers-common-sql-1.0.0
Successfully installed apache-airflow-providers-common-sql-1.1.0rc3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@a49eb20ef97a:/opt/airflow# python
Python 3.7.13 (default, Aug  2 2022, 12:15:43) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
>>> hook = DatabricksSqlHook()
>>> import logging
>>> hook.log.setLevel(logging.DEBUG)
>>> hook.run(sql='SELECT * FROM TABLE')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 174, in run
    raise ValueError("List of SQL statements is empty")
ValueError: List of SQL statements is empty
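
For completeness, a sketch of how http_path can be supplied explicitly, as the rc4 exception above suggests (the endpoint path and name below are hypothetical placeholders):

from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook

# Either pass the HTTP path of the SQL endpoint directly...
hook = DatabricksSqlHook(http_path='/sql/1.0/endpoints/1234567890abcdef')
# ...or reference the endpoint by name and let the hook resolve it:
hook = DatabricksSqlHook(sql_endpoint_name='my-endpoint')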

potiuk (Member, Author) commented Aug 16, 2022

It would be great, @alexott, if you could test the Databricks provider together with the new common.sql :).

Also, I found one potential issue with the default split_sql set to True: the new 3.2.0rc4 databricks provider actually depends on the 1.1.0 feature of common-sql, but we do not have a minimum requirement for it (for the other providers it is a non-default feature, so they do not REALLY depend on this version). It is very unlikely for someone to get them not upgraded at the same time, but it is a possibility. I do not think this is worth an rc5; rather, I would put common-sql>=1.1.0 in the next version. But maybe it is worth delaying the databricks release and doing it well.

WDYT @alexott @kazanzhy?

These are the current deps:

dependencies:
  - apache-airflow>=2.2.0
  - apache-airflow-providers-common-sql
  - requests>=2.27,<3
  - databricks-sql-connector>=2.0.0, <3.0.0
  - aiohttp>=3.6.3, <4

The "proper" change would be:

  - apache-airflow-providers-common-sql>=1.1.0
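
A tiny sketch with the packaging library illustrating what the stricter specifier buys us:

from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=1.1.0")
print("1.0.0" in spec)  # False - the old release lacking the feature is rejected
print("1.1.0" in spec)  # True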

potiuk (Member, Author) commented Aug 16, 2022

Actually, it seems that more providers are affected - see #25740. I had not realised we have another implicit dependency there, as >=1.1.0 should also be added for all the other providers using strip_sql_string. This will get fixed for those who upgrade to 1.1.0 of common-sql (once we release it), but I think we should bump all the providers that depend on common-sql to >=1.1.0, just in case we missed some other implicit dependency.
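
A hypothetical runtime guard (a sketch, not actual provider code) that a provider could use to fail fast on an older common-sql:

import pkg_resources
from packaging.version import Version

installed = Version(
    pkg_resources.get_distribution("apache-airflow-providers-common-sql").version
)
if installed < Version("1.1.0"):
    raise RuntimeError(
        "apache-airflow-providers-common-sql>=1.1.0 is required for strip_sql_string"
    )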

BTW, these are the kinds of problems I anticipated with common-sql. Managing these kinds of cross-dependencies is something I think we should be more careful about in general. But this is a good lesson for the future when we do more of this kind of approach, and maybe I will figure out some kind of automation to help with that. I will think about it :)

@alexott -> I think it is quite OK to release this RC4 now without the limit, but right after this vote is complete, I will release a new wave of SQL providers that bump the common-sql requirement (including databricks 3.2.1). WDYT?

alexott (Contributor) commented Aug 16, 2022

I'm OK with releasing it right now in this shape; I'm testing the actual functionality right now.

alexott (Contributor) commented Aug 16, 2022

OK, I tested a few DAGs, and SQL works as before... so LGTM.

jgr-trackunit (Contributor) commented Aug 17, 2022

databricks: 3.2.0rc4 looks good on MWAA==2.2.2. The provider now sends the proper data to Databricks, e.g. "enable_elastic_disk": True, and this is the result:

[screenshot of the result in the Databricks UI]

Well, LGTM.
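
For context, a sketch (all cluster values are hypothetical) of the kind of new_cluster spec in which enable_elastic_disk is now forwarded to Databricks:

from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

submit_run = DatabricksSubmitRunOperator(
    task_id='submit_run',
    new_cluster={
        'spark_version': '10.4.x-scala2.12',
        'node_type_id': 'i3.xlarge',
        'num_workers': 2,
        'enable_elastic_disk': True,  # now passed through to the Databricks API
    },
    notebook_task={'notebook_path': '/Shared/example-notebook'},
)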

eladkal (Contributor) commented Aug 19, 2022

Released

eladkal closed this as completed Aug 19, 2022