[AIRFLOW-3203] Fix DockerOperator & some operator test #4049

Merged
1 commit merged into apache:master from refine_DockerOperator
Oct 20, 2018

Member

@XD-DENG XD-DENG commented Oct 13, 2018

Jira

Description

  • For argument image, there is no need to explicitly add "latest" if the tag is omitted.
    "latest" is used by default when no tag is provided. This is handled by the docker package itself.

  • The intermediate variable cpu_shares is not needed.

  • Fix outdated usage of cpu_shares and mem_limit.
    Based on https://docker-py.readthedocs.io/en/stable/api.html#docker.api.container.ContainerApiMixin.create_host_config,
    they should be arguments of self.cli.create_host_config() rather than APIClient.create_container().

  • Change the name of the corresponding test script so that it can be discovered.

  • Fix the test itself.

  • This PR also fixes some of the Operators' test scripts. They were named incorrectly, which resulted in test discovery failure.
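
The host-config change described in the list above can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: `build_create_container_kwargs` is a hypothetical helper, `mem_limit` is assumed to be the second argument the description refers to, and the `cpus * 1024` conversion is likewise an assumption. `cli` stands for a `docker.APIClient`, whose `create_host_config()` accepts `cpu_shares` and `mem_limit`.

```python
def build_create_container_kwargs(cli, image, command, cpus=1.0, mem_limit=None):
    """Build keyword arguments for cli.create_container().

    Resource limits are placed in the host config instead of being
    passed to create_container() directly.
    """
    host_config = cli.create_host_config(
        cpu_shares=int(round(cpus * 1024)),  # assumed conversion: 1 CPU == 1024 shares
        mem_limit=mem_limit,
    )
    return {
        "image": image,  # passed through unchanged, no ":latest" appended
        "command": command,
        "host_config": host_config,
    }


# Usage with a real client (needs the docker package and a running daemon):
#   import docker
#   cli = docker.APIClient(base_url="unix://var/run/docker.sock")
#   container = cli.create_container(
#       **build_create_container_kwargs(cli, "ubuntu", "env", cpus=0.5)
#   )
```

Keeping the keyword building in a plain function makes it possible to check the arguments with a stub client, without a Docker daemon.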

@codecov-io

codecov-io commented Oct 13, 2018

Codecov Report

Merging #4049 into master will increase coverage by 1.77%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4049      +/-   ##
==========================================
+ Coverage   75.91%   77.69%   +1.77%     
==========================================
  Files         199      199              
  Lines       15948    15944       -4     
==========================================
+ Hits        12107    12387     +280     
+ Misses       3841     3557     -284
Impacted Files Coverage Δ
airflow/operators/docker_operator.py 97.61% <100%> (+97.61%) ⬆️
airflow/jobs.py 82.48% <0%> (+0.35%) ⬆️
airflow/hooks/hive_hooks.py 73.42% <0%> (+0.52%) ⬆️
airflow/operators/hive_operator.py 86.53% <0%> (+5.76%) ⬆️
airflow/utils/file.py 84% <0%> (+8%) ⬆️
airflow/operators/python_operator.py 95.03% <0%> (+13.04%) ⬆️
airflow/operators/subdag_operator.py 90.32% <0%> (+19.35%) ⬆️
airflow/operators/latest_only_operator.py 90% <0%> (+65%) ⬆️
airflow/operators/s3_to_hive_operator.py 94.01% <0%> (+94.01%) ⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 719e0b1...111a803.

@XD-DENG
Member Author

XD-DENG commented Oct 14, 2018

Hi @Fokko,

As we discussed on the mailing list yesterday, some scripts in /test should be renamed so that the CI process can discover them.

This PR only addresses issues for DockerOperator, though. There are issues in both the DockerOperator itself and its test (including the name of the test script).

For other operators whose tests were missed because of their file names, I suspect there are accumulated issues to fix as well (it may not be as easy as just prepending test_ to the test file name). I may look into them later when I get time.

Cheers

- For argument `image`, there is no need to explicitly
  add "latest" if the tag is omitted.
  "latest" is used by default if no
  tag is provided. This is handled by the `docker` package itself.

- Intermediate variable `cpu_shares` is not needed.

- Fix wrong usage of `cpu_shares` and `mem_limit`.
  Based on
  https://docker-py.readthedocs.io/en/stable/api.html#docker.api.container.ContainerApiMixin.create_host_config,
  they should be arguments of
  self.cli.create_host_config()
  rather than
  APIClient.create_container().

- Change the name of the corresponding test script
  to ensure it can be discovered.

- Fix the test itself.

- Some other test scripts are not named properly,
  which results in failure of test discovery.
@XD-DENG XD-DENG force-pushed the refine_DockerOperator branch from 3972467 to 111a803 Compare October 15, 2018 13:05
@XD-DENG XD-DENG changed the title [AIRFLOW-3203] Fix DockerOperator & its test [AIRFLOW-3203] Fix DockerOperator & some operator test Oct 15, 2018
@XD-DENG
Member Author

XD-DENG commented Oct 15, 2018

Hi @Fokko ,

I have renamed some of the Operator test scripts (prepending "test_") after making sure they don't raise any exceptions and work as designed, including

  • tests/operators/docker_operator.py (with some code changes)
  • tests/operators/hive_operator.py
  • tests/operators/latest_only_operator.py
  • tests/operators/python_operator.py
  • tests/operators/s3_to_hive_operator.py
  • tests/operators/slack_operator.py
  • tests/operators/subdag_operator.py

Two others,

  • tests/operators/bash_operator.py
  • tests/operators/operator.py

need more work. I may fix them later (if I get time and nobody else picks them up).
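
Why the file name matters can be illustrated with the standard library's `unittest` discovery, whose default file pattern is `test*.py`. This is only a sketch under assumptions: Airflow's CI at the time may have used a different test runner, and the module names below are hypothetical.

```python
import tempfile
import unittest
from pathlib import Path

# A trivial test module written to disk under different file names.
TEST_BODY = '''
import unittest


class ExampleTest(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)
'''


def count_discovered(filename: str) -> int:
    """Write one test module named `filename` and count the tests discovery finds."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, filename).write_text(TEST_BODY)
        suite = unittest.TestLoader().discover(tmp, pattern="test*.py")
        return suite.countTestCases()


if __name__ == "__main__":
    # Same content, different file names: only the prefixed one matches the pattern.
    print(count_discovered("example_operator.py"))
    print(count_discovered("test_example_operator.py"))
```

A file that does not match the pattern is skipped silently, which is how a whole test module can go unexecuted without any visible failure.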

@XD-DENG
Member Author

XD-DENG commented Oct 15, 2018

In addition, I suggest including this commit in 1.10.1, which is intended to fix bugs. Re-enabling these tests should help catch potential bugs.

if ':' not in self.image:
    image = self.image + ':latest'
else:
    image = self.image
Member

Why were these removed? (I can't work it out from the diff alone.)

Member Author

Hi @ashb , this is because I found out that this will be handled by docker package itself.

Here this operator uses docker.APIClient.create_container(image=...) to create the container under the hood. If the image is just the name without a tag, say "fake_image", then the program will search for "fake_image:latest" by default (of course, it will search for the tag given by the user if one is specified).

You can try the code below to validate:

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image")

It will tell you something like "'fake_image:latest' can't be found" (I don't have Docker on the laptop I'm using at this moment so can't provide the exact exception). Please help double-validate.

Cheers

Member Author

@XD-DENG XD-DENG Oct 16, 2018

Hi @ashb , I have checked on a machine with Docker installed.

Without Tag

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image")

Result:

ImageNotFound: 404 Client Error: Not Found ("No such image: fake_image:latest")

With Tag

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image:version_1")

Result:

ImageNotFound: 404 Client Error: Not Found ("No such image: fake_image:version_1")

Conclusion

So as I shared earlier: if tag is omitted for image, "latest" will be used by default; if tag is provided by user, then that tag will be used.

So here in DockerOperator, we don't need to explicitly check & add ":latest".
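
The rule stated in the conclusion can be written down as a small pure-Python helper. `with_default_tag` is a hypothetical function for illustration only; it is not how the docker package or the Docker daemon implements tag resolution, and the digest handling is an added assumption.

```python
def with_default_tag(image: str, default_tag: str = "latest") -> str:
    """Return `image` with `default_tag` appended when it carries no tag or digest."""
    if "@" in image:
        # Assumed here: a reference pinned by digest needs no tag.
        return image
    # Only the last path component can carry a tag; a colon earlier in the
    # reference (for example a registry port) is not a tag separator.
    last_component = image.rsplit("/", 1)[-1]
    if ":" in last_component:
        return image
    return f"{image}:{default_tag}"


if __name__ == "__main__":
    for name in ("fake_image", "fake_image:version_1", "localhost:5000/fake_image"):
        print(name, "->", with_default_tag(name))
```

Note that a check of the form `':' not in image` would treat a reference such as `localhost:5000/fake_image` as already tagged, which is one more reason to leave the defaulting to Docker rather than reimplement it in the operator.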

Member

@feluelle feluelle Apr 15, 2019

Hey @XD-DENG were you really certain about that? Because I noticed something weird when I use for example just r-base as an image name.

The DockerOperator task pulled every image.

r-base                                                     latest              62c848eeb175        2 weeks ago         649MB
r-base                                                     3.5.3               62c848eeb175        2 weeks ago         649MB
r-base                                                     3.5.2               880eb7da671b        5 weeks ago         655MB
r-base                                                     3.5.1               bd9edc1a85ed        4 months ago        712MB
r-base                                                     3.5.0               190658892827        9 months ago        692MB
r-base                                                     3.4.4               d1325eaa28ad        11 months ago       667MB
r-base                                                     3.4.3               d1e1c25485af        13 months ago       670MB
r-base                                                     3.4.2               02d3b7e00020        17 months ago       651MB
r-base                                                     3.4.1               0ab131e275e4        19 months ago       1.18GB
r-base                                                     3.4.0               5a6c58403310        22 months ago       656MB
r-base                                                     3.3.3               88436550cddc        2 years ago         635MB
r-base                                                     3.3.2               38270dc4745b        2 years ago         635MB
r-base                                                     3.3.1               7ba1baf9d8bb        2 years ago         657MB
r-base                                                     3.3.0               8f62a54a58ab        2 years ago         970MB
r-base                                                     3.2.5               ee4ea743f431        2 years ago         1.05GB
r-base                                                     3.2.4               9d9c50e41475        3 years ago         1.07GB
r-base                                                     3.2.3               1844ae3ec8d8        3 years ago         1.03GB
r-base                                                     3.2.2               7201dfdf7e21        3 years ago         1.01GB
r-base                                                     3.2.1               62e43f6f9a91        3 years ago         563MB
r-base                                                     3.2.0               9adf6ba7ea84        3 years ago         552MB
r-base                                                     3.1.3               8b37253aa5ea        4 years ago         517MB
r-base                                                     3.1.2               803d5e0e0278        4 years ago         483MB

pip list shows me docker 3.7.2 in airflow 1.10.2

Member Author

@feluelle let me have a quick check and get back here.

Member Author

@feluelle could you share your log? It may contain lines like Pulling image (xxx) from xxx. Thanks

Member Author

Hi @feluelle the change was made based on the facts below:

  1. For pull: in the example at https://docker-py.readthedocs.io/en/stable/api.html#docker.api.image.ImageApiMixin.pull the latest tag is used if no tag is provided together with the repo.

  2. For create_container: you can try the code below

import docker
client = docker.APIClient(base_url='unix://var/run/docker.sock')
client.create_container(image="fake_image")

It will give you error "ImageNotFound: 404 Client Error: Not Found ("No such image: fake_image:latest")"

Member

(That's not all) I think it is for like 3 images

@XD-DENG
Member Author

XD-DENG commented Oct 18, 2018

@ashb @Fokko , may you take another look?

Cheers!

@XD-DENG
Member Author

XD-DENG commented Oct 20, 2018

Hi @Fokko , a gentle ping. Cheers!

Contributor

@Fokko Fokko left a comment

Thanks for picking this up and sorting out the Docker stuff @XD-DENG. Highly appreciated.

I'm fine with moving this into 1.10.1. I don't see any APIs being changed. What do you think, @ashb?

@Fokko Fokko merged commit b156151 into apache:master Oct 20, 2018
Fokko pushed a commit that referenced this pull request Oct 20, 2018
@Fokko
Contributor

Fokko commented Oct 20, 2018

@ashb Cherry-picked it onto the 1.10.1 branch

@XD-DENG XD-DENG deleted the refine_DockerOperator branch October 21, 2018 01:21
@XD-DENG
Member Author

XD-DENG commented Oct 21, 2018

Thanks @Fokko

ashb pushed a commit that referenced this pull request Oct 22, 2018
ashb pushed a commit to ashb/airflow that referenced this pull request Oct 22, 2018
tekn0ir pushed a commit to tekn0ir/incubator-airflow that referenced this pull request Oct 26, 2018
* master:
  [AIRFLOW-520] Fix Version Info in Flask UI (apache#4072)
  [AIRFLOW-XXX] Add Neoway to companies list (apache#4081)
  [AIRFLOW-XXX] Add Surfline to companies list (apache#4079)
  Revert "[AIRFLOW-461] Restore parameter position for BQ run_load method (apache#4077)"
  [AIRFLOW-461] Restore parameter position for BQ run_load method (apache#4077)
  [AIRFLOW-461]  Support autodetected schemas in BigQuery run_load (apache#3880)
  [AIRFLOW-3238] Fix models.DAG to deactivate unknown DAGs on initdb (apache#4073)
  [AIRFLOW-3239] Fix test recovery further (apache#4074)
  [AIRFLOW-3203] Fix DockerOperator & some operator test (apache#4049)
  [AIRFLOW-1867] Add sandbox mode and py3k bug  (apache#2824)
  [AIRFLOW-2993] s3_to_sftp and sftp_to_s3 operators (apache#3828)
  [AIRFLOW-XXX] BigQuery Hook - Minor Refactoring (apache#4066)
  [AIRFLOW-3232] More readable GCF operator documentation (apache#4067)
galak75 pushed a commit to VilledeMontreal/incubator-airflow that referenced this pull request Nov 23, 2018
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Jan 23, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 29, 2019
wmorris75 pushed a commit to modmed/incubator-airflow that referenced this pull request Jul 31, 2019
author Ash Berlin-Taylor <[email protected]> 1564493832 +0100
committer wayne.morris <[email protected]> 1564516048 -0400

parent 6ef0e37
author Ash Berlin-Taylor <[email protected]> 1564493832 +0100
committer wayne.morris <[email protected]> 1564515968 -0400

[AIRFLOW-5052] Added the include_deleted param to salesforce_hook

[AIRFLOW-1840] Support back-compat on old celery config

The new names are in-line with Celery 4, but if
anyone upgrades Airflow
without following the UPDATING.md instructions
(which we probably assume
most people won't, not until something stops
working) their workers
would suddenly just start failing. That's bad.

This will issue a warning but carry on working as
expected. We can
remove the deprecation settings (but leave the
code in config) after
this release has been made.

Closes apache#3549 from ashb/AIRFLOW-1840-back-compat

(cherry picked from commit a4592f9)
Signed-off-by: Bolke de Bruin <[email protected]>

[AIRFLOW-2812] Fix error in Updating.md for upgrading to 1.10

Closes apache#3654 from nrhvyc/AIRFLOW-2812

[AIRFLOW-2816] Fix license text in docs/license.rst

(cherry picked from commit af15f11)
Signed-off-by: Bolke de Bruin <[email protected]>

[AIRFLOW-2817] Force explicit choice on GPL dependency (apache#3660)

By default one of Apache Airflow's dependencies pulls in a GPL
library. Airflow should not install (and upgrade) without an explicit choice.

This is part of the Apache requirements as we cannot depend on Category X
software.

(cherry picked from commit c37fc0b)
Signed-off-by: Bolke de Bruin <[email protected]>
(cherry picked from commit b39e453)
Signed-off-by: Bolke de Bruin <[email protected]>

[AIRFLOW-2869] Remove smart quote from default config

Closes apache#3716 from wdhorton/remove-smart-quote-
from-cfg

(cherry picked from commit 67e2bb9)
Signed-off-by: Bolke de Bruin <[email protected]>
(cherry picked from commit 700f5f0)
Signed-off-by: Bolke de Bruin <[email protected]>

[AIRFLOW-2140] Don't require kubernetes for the SparkSubmit hook (apache#3700)

This extra dep is a quasi-breaking change when upgrading - previously
there were no deps outside of Airflow itself for this hook. Importing
the k8s libs breaks installs that aren't also using Kubernetes.

This makes the dep optional for anyone who doesn't explicitly use the
functionality

(cherry picked from commit 0be002e)
Signed-off-by: Bolke de Bruin <[email protected]>
(cherry picked from commit f58246d)
Signed-off-by: Bolke de Bruin <[email protected]>

[AIRFLOW-2859] Implement own UtcDateTime (apache#3708)

The different UtcDateTime implementations all have issues.
Either they replace tzinfo directly without converting
or they do not convert to UTC at all.

We also ensure all mysql connections are in UTC
in order to keep sanity, as mysql will ignore the
timezone of a field when inserting/updating.

(cherry picked from commit 6fd4e60)
Signed-off-by: Bolke de Bruin <[email protected]>
(cherry picked from commit 8fc8c7a)
Signed-off-by: Bolke de Bruin <[email protected]>

[AIRFLOW-2895] Prevent scheduler from spamming heartbeats/logs

Reverts most of AIRFLOW-2027 until the issues with it can be fixed.

Closes apache#3747 from aoen/revert_min_file_parsing_time_commit

[AIRFLOW-2979] Make celery_result_backend conf Backwards compatible (apache#3832)

(apache#2806) Renamed `celery_result_backend` to `result_backend` and broke backwards compatibility.

[AIRFLOW-2524] Add Amazon SageMaker Training (apache#3658)

Add SageMaker Hook, Training Operator & Sensor
Co-authored-by: srrajeev-aws <[email protected]>

[AIRFLOW-2524] Add Amazon SageMaker Tuning (apache#3751)

Add SageMaker tuning Operator and sensor
Co-authored-by: srrajeev-aws <[email protected]>

[AIRFLOW-2524] Add SageMaker Batch Inference (apache#3767)

* Fix for comments
* Fix sensor test
* Update non_terminal_states and failed_states to static variables of SageMakerHook

Add SageMaker Transform Operator & Sensor
Co-authored-by: srrajeev-aws <[email protected]>

[AIRFLOW-2763] Add check to validate worker connectivity to metadata Database

[AIRFLOW-2786] Gracefully handle Variable import errors (apache#3648)

Variables that are added through a file are not
checked as explicitly as creating a Variable in the
web UI. This handles exceptions that could be caused
by improper keys or values.

[AIRFLOW-2860] DruidHook: time check is wrong (apache#3745)

[AIRFLOW-2773] Validates Dataflow Job Name

Closes apache#3623 from kaxil/AIRFLOW-2773

[AIRFLOW-2845] Asserts in contrib package code are changed on raise ValueError and TypeError (apache#3690)

[AIRFLOW-1917] Trim extra newline and trailing whitespace from log (apache#3862)

[AIRFLOW-XXX] Fix SlackWebhookOperator docs (apache#3915)

The docs refer to `conn_id` while the actual argument is `http_conn_id`.

[AIRFLOW-2912] Add Deploy and Delete operators for GCF (apache#3969)

Both Deploy and Delete operators interact with Google
Cloud Functions to manage functions. Both are idempotent
and make use of GcfHook - hook that encapsulates
communication with GCP over GCP API.

[AIRFLOW-3078] Basic operators for Google Compute Engine (apache#4022)

Add GceInstanceStartOperator, GceInstanceStopOperator and GceSetMachineTypeOperator.

Each operator includes:
- core logic
- input params validation
- unit tests
- presence in the example DAG
- docstrings
- How-to and Integration documentation

Additionally, in GceHook error checking if response is 200 OK was added:

Some types of errors are only visible in the response's "error" field
and the overall HTTP response is 200 OK.

That is why apart from checking if status is "done" we also check
if "error" is empty, and if not an exception is raised with error
message extracted from the "error" field of the response.

In this commit we also separated out Body Field Validator to
separate module in tools - this way it can be reused between
various GCP operators, it has proven to be usable in at least
two of them now.

Co-authored-by: sprzedwojski <[email protected]>
Co-authored-by: potiuk <[email protected]>

[AIRFLOW-3183] Fix bug in DagFileProcessorManager.max_runs_reached() (apache#4031)

The condition is intended to ensure the function
will return False if any file's run_count is still smaller
than max_run. But the operator used here is "!=".
Instead, it should be "<".

This is because in DagFileProcessorManager,
there is no statement helping limit the upper
limit of run_count. It's possible that
files' run_count will be bigger than max_run.
In such case, max_runs_reached() method
may fail its purpose.

[AIRFLOW-3099] Don't ever warn about missing sections of config (apache#4028)

Rather than looping through and setting each config variable
individually, and having to know which sections are optional and which
aren't, instead we can just call a single function on ConfigParser and
it will read the config from the dict, and more importantly here, never
error about missing sections - it will just create them as needed.

[AIRFLOW-3089] Drop hard-coded url scheme in google auth redirect. (apache#3919)

The google auth provider hard-codes the `_scheme` in the callback url to
`https` so that airflow generates correct urls when run behind a proxy
that terminates tls. But this means that google auth can't be used when
running without https--for example, during local development. Also,
hard-coding `_scheme` isn't the correct solution to the problem of
running behind a proxy. Instead, the proxy should be configured to set
the `X-Forwarded-Proto` header to `https`; Flask interprets this header
and generates the appropriate callback url without hard-coding the
scheme.

[AIRFLOW-3178] Handle percents signs in configs for airflow run (apache#4029)

* [AIRFLOW-3178] Don't mask defaults() function from ConfigParser

ConfigParser (the base class for AirflowConfigParser) expects defaults()
to be a function - so when we re-assign it to be a property some of the
methods from ConfigParser no longer work.

* [AIRFLOW-3178] Correctly escape percent signs when creating temp config

Otherwise we have a problem when we come to use those values.

* [AIRFLOW-3178] Use os.chmod instead of shelling out

There's no need to run another process for a built in Python function.

This also removes a possible race condition that would make the temporary
config file readable by more than the airflow or run-as user.
The exact behaviour would depend on the umask we run under, and the
primary group of our user; likely this would mean the file was readable
by members of the airflow group (which in most cases would be just the
airflow user). To remove any such possibility we chmod the file
before we write to it.

[AIRFLOW-2216] Use profile for AWS hook if S3 config file provided in aws_default connection extra parameters (apache#4011)

Use profile for AWS hook if S3 config file provided in
aws_default connection extra parameters
Add test to validate profile set

[AIRFLOW-3138] Use current data type for migrations (apache#3985)

* Use timestamp instead of timestamp with timezone for migration.

[AIRFLOW-3119] Enable debugging with Celery(apache#3950)

This will enable --loglevel when launching a
celery worker and inherit that LOGGING_LEVEL
setting from airflow.cfg

[AIRFLOW-3197] EMRHook is missing new parameters of the AWS API (apache#4044)

Allow passing any params to the CreateJobFlow API, so that we don't have
to stay up to date with AWS api changes.

[AIRFLOW-3203] Fix DockerOperator & some operator test (apache#4049)

[AIRFLOW-3232] More readable GCF operator documentation (apache#4067)

[AIRFLOW-3231] Basic operators for Google Cloud SQL (apache#4097)

Add CloudSqlInstanceInsertOperator, CloudSqlInstancePatchOperator and CloudSqlInstanceDeleteOperator.

Each operator includes:
- core logic
- input params validation
- unit tests
- presence in the example DAG
- docstrings
- How-to and Integration documentation

Additionally, small improvements to GcpBodyFieldValidator were made:
- add simple list validation capability (type="list")
- introduced parameter allow_empty, which can be set to False
  to test for non-emptiness of a string instead of specifying
  a regexp.

Co-authored-by: sprzedwojski <[email protected]>
Co-authored-by: potiuk <[email protected]>

[AIRFLOW-2524] Update SageMaker hook and operators (apache#4091)

This re-works the SageMaker functionality in Airflow to be more complete, and more useful for the kinds of operations that SageMaker supports.

We removed some files and operators here, but these were only added after the last release so we don't need to worry about any sort of back-compat.

[AIRFLOW-3276] Cloud SQL: database create / patch / delete operators (apache#4124)

[AIRFLOW-2192] Allow non-latin1 usernames with MySQL backend by adding a SQL_ENGINE_ENCODING param and default to UTF-8 (apache#4087)

Comprised of:

Since we have unicode_literals imported and the engine arguments must be strings in Python 2, explicitly make 'utf-8' a string.

Replace bare exception with conf.AirflowConfigException for missing value.

It's just got for strings apparently.

Add utf-8 to default_airflow.cfg. Question: do I still need the try/except block, or can we depend on defaults? (I note some have both.)

Get rid of try/except block and depend on default_airflow.cfg

Use __str__ since calling str just gives us back a newstr as well.

Test that a panda user can be saved.

[AIRFLOW-3295] Fix potential security issue in DaskExecutor (apache#4128)

When user decides to use TLS/SSL encryption
for DaskExecutor communications,
`Distributed.Security` object will be created.

However, the argument `require_encryption` is never
set to `True` (its default value is `False`).

This may cause the TLS/SSL encryption setup to fail.

[AIRFLOW-XXX] Fix flake8 errors from apache#4144

[AIRFLOW-2574] Cope with '%' in SQLA DSN when running migrations (apache#3787)

Alembic uses a ConfigParser like Airflow does, and "%" is a special
value in there, so we need to escape it. As per the Alembic docs:

> Note that this value is passed to ConfigParser.set, which supports
> variable interpolation using pyformat (e.g. `%(some_value)s`). A raw
> percent sign not part of an interpolation symbol must therefore be
> escaped, e.g. `%%`
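
The escaping described above can be sketched with Python's standard `configparser`, which applies the same pyformat interpolation. This is an illustration under assumptions, not the migration code itself: the helper `escape_percent`, the section name, and the DSN are hypothetical.

```python
from configparser import ConfigParser


def escape_percent(value):
    """Double every raw percent sign so interpolation leaves it alone."""
    return value.replace("%", "%%")


# Hypothetical DSN whose password contains a URL-encoded character ("%40").
dsn = "postgresql://airflow:p%40ss@localhost/airflow"

parser = ConfigParser()
parser.add_section("alembic")

# Setting the raw value is rejected because "%4" is not valid interpolation.
try:
    parser.set("alembic", "sqlalchemy.url", dsn)
    raw_value_rejected = False
except ValueError:
    raw_value_rejected = True

# The escaped value is accepted, and reading it back restores the original.
parser.set("alembic", "sqlalchemy.url", escape_percent(dsn))
round_tripped = parser.get("alembic", "sqlalchemy.url")
```

Reading the option back with `get` collapses each `%%` to a single `%`, so the consumer sees the original DSN.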

[AIRFLOW-3090] Demote dag start/stop log messages to debug (apache#3920)

[AIRFLOW-3090] Specify path of key file in log message (apache#3921)

[AIRFLOW-3111] Fix instructions in UPDATING.md and remove comment artifacts in default_airflow.cfg (apache#3944)

- fixed incorrect instructions in UPDATING.md regarding core.log_filename_template and elasticsearch.elasticsearch_log_id_template
- removed comments referencing "additional curly braces" from
default_airflow.cfg since they're irrelevant to the rendered airflow.cfg

[AIRFLOW-3127] Fix out-dated doc for Celery SSL (apache#3967)

Now in `airflow.cfg`, for Celery-SSL, the item names are
"ssl_active", "ssl_key", "ssl_cert", and "ssl_cacert".
(since PR https://github.com/apache/incubator-airflow/pull/2806/files)

But in the documentation
https://airflow.incubator.apache.org/security.html?highlight=celery
or
https://github.com/apache/incubator-airflow/blob/master/docs/security.rst,
it's "CELERY_SSL_ACTIVE", "CELERY_SSL_KEY", "CELERY_SSL_CERT", and
"CELERY_SSL_CACERT", which is out-dated and may confuse readers.
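
As a rough sketch of the current item names, the snippet below parses a configuration fragment with Python's `configparser`. The `[celery]` section placement and the file paths are assumptions for illustration; only the four item names come from the text above.

```python
from configparser import ConfigParser

# Illustrative fragment; the paths are placeholders, not real defaults.
CFG = """
[celery]
ssl_active = True
ssl_key = /path/to/key.pem
ssl_cert = /path/to/cert.pem
ssl_cacert = /path/to/ca.pem
"""

parser = ConfigParser()
parser.read_string(CFG)

ssl_active = parser.getboolean("celery", "ssl_active")
ssl_files = {
    name: parser.get("celery", name)
    for name in ("ssl_key", "ssl_cert", "ssl_cacert")
}
```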

[AIRFLOW-3187] Update airflow.gif file with a slower version (apache#4033)

[AIRFLOW-3164] Verify server certificate when connecting to LDAP (apache#4006)

Misconfiguration and improper checking of exceptions disabled
server certificate checking. We now only support TLS connections
and do not support insecure connections anymore.

[AIRFLOW-2779] Add license headers to doc files (apache#4178)

This adds ASF license headers to all the .rst and .md files with the
exception of the Pull Request template (as that is included verbatim
when opening a Pull Request on GitHub, which would be messy).

Added the include_deleted parameter to salesforce hook