Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement AthenaSQLHook #36171

Merged
merged 1 commit into from
Jan 17, 2024
Merged

Implement AthenaSQLHook #36171

merged 1 commit into from
Jan 17, 2024

Conversation

flolas
Copy link
Contributor

@flolas flolas commented Dec 11, 2023

In this Pull Request, we're introducing the AthenaSQLHook, an implementation leveraging PyAthena. This development opens doors for us to interact with Athena using the familiar DBApiHook operators.
Close: #34823


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Dec 11, 2023
@flolas flolas force-pushed the feat/athena_sql_hook branch from a8baa4d to 2de24b6 Compare December 11, 2023 18:13
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add docs for using the operator.
We probably will need also index added to explain the difference between the two options and then point to the specific doc of the relevant option.

You can see example of how we do it for Redshift
https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/redshift/index.html

Should be easy to be done. If you need help with it let me know

@flolas flolas force-pushed the feat/athena_sql_hook branch 3 times, most recently from 70207e8 to a9f651f Compare December 14, 2023 16:52
Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for addressing the feedbacks, I'd like to have the other opinions before merging though

@flolas flolas force-pushed the feat/athena_sql_hook branch from a9f651f to 93dae69 Compare December 15, 2023 02:06
@flolas flolas force-pushed the feat/athena_sql_hook branch from 071fa23 to 37e80f9 Compare December 16, 2023 14:29
@flolas flolas requested a review from eladkal December 16, 2023 14:37
@flolas flolas force-pushed the feat/athena_sql_hook branch 2 times, most recently from a32133a to 5736fd2 Compare December 18, 2023 11:33
@flolas
Copy link
Contributor Author

flolas commented Dec 18, 2023

@eladkal could you help me with the spellcheck test? can't find what's wrong. thx!
https://github.com/apache/airflow/actions/runs/7247473202/job/19742207009?pr=36171

@vincbeck
Copy link
Contributor

apache-airflow-providers                                     
  apache-airflow-providers                                     Traceback (most recent call last):
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/cmd/build.py", line 281, in build_main
  apache-airflow-providers                                         app.build(args.force_all, args.filenames)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/application.py", line 347, in build
  apache-airflow-providers                                         self.builder.build_update()
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 307, in build_update
  apache-airflow-providers                                         self.build(['__all__'], to_build)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 376, in build
  apache-airflow-providers                                         self.write(docnames, list(updated_docnames), method)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 571, in write
  apache-airflow-providers                                         self._write_serial(sorted(docnames))
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 581, in _write_serial
  apache-airflow-providers                                         self.write_doc(docname, doctree)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/contextlib.py", line 120, in __exit__
  apache-airflow-providers                                         next(self.gen)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/util/logging.py", line 218, in pending_warnings
  apache-airflow-providers                                         memhandler.flushTo(logger)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/util/logging.py", line 183, in flushTo
  apache-airflow-providers                                         logger.handle(record)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/logging/__init__.py", line 1599, in handle
  apache-airflow-providers                                         self.callHandlers(record)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/logging/__init__.py", line 1661, in callHandlers
  apache-airflow-providers                                         hdlr.handle(record)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/logging/__init__.py", line 950, in handle
  apache-airflow-providers                                         rv = self.filter(record)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/logging/__init__.py", line 811, in filter
  apache-airflow-providers                                         result = f.filter(record)
  apache-airflow-providers                                       File "/usr/local/lib/python3.8/site-packages/sphinx/util/logging.py", line 426, in filter
  apache-airflow-providers                                         raise exc
  apache-airflow-providers                                     sphinx.errors.SphinxWarning: <unknown>:17:unknown document: 'apache-airflow-providers-amazon:operators/athena/athena'
  apache-airflow-providers                                     
  apache-airflow-providers                                     [91mWarning, treated as error:[39;49;00m
  apache-airflow-providers                                     <unknown>:17:unknown document: 'apache-airflow-providers-amazon:operators/athena/athena'

@vincbeck
Copy link
Contributor

vincbeck commented Dec 18, 2023

Interesting, I have a very similar error message in one of my open PRs and cannot figure out why. I am very interested if someone knows the reason

@vincbeck
Copy link
Contributor

It seems, Sphinx has some issues referencing a document not built yet. See https://apache-airflow.slack.com/archives/CJ1LVREHX/p1702929226260749. In other words, here you are creating a new file (renaming a file is like creating a new file for Sphinx) and are referring this file. It seems Sphinx does not like it.

One way to solve it would be to keep the old file for now and delete it in another PR

@flolas flolas force-pushed the feat/athena_sql_hook branch from 5736fd2 to e87f135 Compare December 18, 2023 23:32
@flolas flolas force-pushed the feat/athena_sql_hook branch from e443580 to e5dcfd1 Compare December 19, 2023 01:57
@simonprydden
Copy link
Contributor

Will this work with AWS credentials provider chain? I'm thinking about mwaa environments where the aws_default conn_id is used for most aws hooks.

@eladkal
Copy link
Contributor

eladkal commented Jan 14, 2024

@flolas what is the status of this PR? I'm hoping to get it to the next provider release.

@flolas flolas force-pushed the feat/athena_sql_hook branch 2 times, most recently from 2aed8f6 to 25fffe0 Compare January 15, 2024 16:09
@flolas
Copy link
Contributor Author

flolas commented Jan 15, 2024

@eladkal Ummh, I can't figure out why spellcheck test fails with errors like this:

  apache-airflow-providers-celery                              Traceback (most recent call last):
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinx/cmd/build.py", line 281, in build_main
  apache-airflow-providers-celery                                  app.build(args.force_all, args.filenames)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinx/application.py", line 347, in build
  apache-airflow-providers-celery                                  self.builder.build_update()
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 307, in build_update
  apache-airflow-providers-celery                                  self.build(['__all__'], to_build)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 379, in build
  apache-airflow-providers-celery                                  self.finish()
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinxcontrib/spelling/builder.py", line 277, in finish
  apache-airflow-providers-celery                                  logger.warning('Found %d misspelled words',
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 1812, in warning
  apache-airflow-providers-celery                                  self.log(WARNING, msg, *args, **kwargs)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinx/util/logging.py", line 123, in log
  apache-airflow-providers-celery                                  super().log(level, msg, *args, **kwargs)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 1844, in log
  apache-airflow-providers-celery                                  self.logger.log(level, msg, *args, **kwargs)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 1512, in log
  apache-airflow-providers-celery                                  self._log(level, msg, args, **kwargs)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 1589, in _log
  apache-airflow-providers-celery                                  self.handle(record)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 1599, in handle
  apache-airflow-providers-celery                                  self.callHandlers(record)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 1661, in callHandlers
  apache-airflow-providers-celery                                  hdlr.handle(record)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 950, in handle
  apache-airflow-providers-celery                                  rv = self.filter(record)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/logging/__init__.py", line 811, in filter
  apache-airflow-providers-celery                                  result = f.filter(record)
  apache-airflow-providers-celery                                File "/usr/local/lib/python3.8/site-packages/sphinx/util/logging.py", line 426, in filter
  apache-airflow-providers-celery                                  raise exc
  apache-airflow-providers-celery                              sphinx.errors.SphinxWarning: Found 1 misspelled words
  apache-airflow-providers-celery                              
  apache-airflow-providers-celery                              [91mWarning, treated as error:[39;49;00m
  apache-airflow-providers-celery                              Found 1 misspelled words
  apache-airflow-providers-celery                             : ############################################################

After resolving this issue (if you could give me a hand, that would be great), I need to fix some hook tests due to some changes I made in response to @Taragolis' comments/suggestions (but that's the easy part).

@vincbeck
Copy link
Contributor

I can see:

File path: apache-airflow-providers-celery/_api/airflow/providers/celery/executors/celery_kubernetes_executor/index.rst
Incorrect Spelling: 'taskinstancekey'
Line with Error: 'airflow.models.taskinstancekey.TaskInstanceKey'

Adding taskinstancekey in docs/spelling_wordlist.txt should help.

@flolas
Copy link
Contributor Author

flolas commented Jan 15, 2024

I can see:

File path: apache-airflow-providers-celery/_api/airflow/providers/celery/executors/celery_kubernetes_executor/index.rst
Incorrect Spelling: 'taskinstancekey'
Line with Error: 'airflow.models.taskinstancekey.TaskInstanceKey'

Adding taskinstancekey in docs/spelling_wordlist.txt should help.

Yeah sounds good, but why is this showing up in my PR when I'm only changing the Amazon provider and nothing else? Is this a expected behaviour of the spellcheck test?

@flolas flolas force-pushed the feat/athena_sql_hook branch 2 times, most recently from a43268e to fadd899 Compare January 16, 2024 02:53
Co-authored-by: Vincent <[email protected]>
Co-authored-by: Josh Fell <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
@flolas flolas force-pushed the feat/athena_sql_hook branch from fadd899 to 28cc91f Compare January 17, 2024 03:46
@flolas flolas requested review from Taragolis and eladkal January 17, 2024 12:27
@flolas
Copy link
Contributor Author

flolas commented Jan 17, 2024

@eladkal Tests seems good now 👍 , can you review again the code? I made some changes to address @Taragolis comments.

thanks!

@eladkal eladkal merged commit 6661272 into apache:main Jan 17, 2024
80 checks passed
@eladkal
Copy link
Contributor

eladkal commented Jan 17, 2024

🎉 🎉🎉

potiuk pushed a commit that referenced this pull request Feb 7, 2024
Co-authored-by: Vincent <[email protected]>
Co-authored-by: Josh Fell <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
(cherry picked from commit 6661272)
@potiuk potiuk added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 8, 2024
@potiuk potiuk added this to the Airflow 2.8.2 milestone Feb 8, 2024
ephraimbuddy pushed a commit that referenced this pull request Feb 22, 2024
Co-authored-by: Vincent <[email protected]>
Co-authored-by: Josh Fell <[email protected]>
Co-authored-by: Andrey Anshin <[email protected]>
(cherry picked from commit 6661272)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add AthenaSqlHook based on PyAthena
7 participants