Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: ci Initial hive support #10593

Merged
merged 8 commits into from
Aug 27, 2020
Merged

Conversation

bkyryliuk
Copy link
Member

@bkyryliuk bkyryliuk commented Aug 13, 2020

SUMMARY

Hive tests

  • added hive to the CI tests via using step and docker compose for sqllab and csv upload tests
  • exposed UPLOAD_FOLDER to be able to mount it with the docker
  • hdfs for the csv uploads is used instead of s3 via @ pytest.patch

Related cleanup

  • reduced presto tests runtime down from 16 to 9 minutes via configuring the poll_interval
  • sqlalchemy tables now are dropped as a part of the cleanup
  • added ipdb to the test dependencies to be able to debug inside tox environment

Side effects:

  • somehow test_update_dataset_update_column started failing for the presto database only on CI, was not able to reproduce it locally, help here would be greatly appreciated

Followed https://hub.docker.com/r/bde2020/hive/ as an example

TEST PLAN

Mostly test only changes, CI

@bkyryliuk bkyryliuk marked this pull request as draft August 13, 2020 03:16
@bkyryliuk bkyryliuk force-pushed the bogdan/hive_ci branch 11 times, most recently from d69e8e9 to a13658d Compare August 13, 2020 05:11
@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2020

Codecov Report

Merging #10593 into master will decrease coverage by 4.01%.
The diff coverage is 44.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10593      +/-   ##
==========================================
- Coverage   64.36%   60.35%   -4.02%     
==========================================
  Files         785      785              
  Lines       36981    36987       +6     
  Branches     3530     3530              
==========================================
- Hits        23803    22323    -1480     
- Misses      13070    14477    +1407     
- Partials      108      187      +79     
Flag Coverage Δ
#cypress ?
#javascript 60.91% <ø> (ø)
#python 60.01% <44.44%> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/presto.py 73.54% <ø> (-8.79%) ⬇️
superset/examples/energy.py 100.00% <ø> (ø)
superset/examples/unicode_test_data.py 100.00% <ø> (ø)
superset/db_engine_specs/hive.py 83.98% <41.17%> (+28.96%) ⬆️
superset/config.py 90.38% <100.00%> (+0.03%) ⬆️
superset-frontend/src/SqlLab/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/App.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/explore/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
superset-frontend/src/dashboard/index.jsx 0.00% <0.00%> (-100.00%) ⬇️
... and 154 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b40d44...ba74384. Read the comment docs.

@bkyryliuk bkyryliuk requested review from villebro and ktmud August 16, 2020 01:44
@bkyryliuk bkyryliuk changed the title chore: ci Initial hive support [WIP] chore: ci Initial hive support Aug 16, 2020
@@ -16,7 +16,10 @@
#
-r base.in
-r integration.in
ipdb
ipython==7.16.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. ABC. Also why pin ipython?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment:

# pinning ipython as pip-compile-multi was bringing higher version
# of the ipython that was not found in CI

superset/db_engine_specs/hive.py Outdated Show resolved Hide resolved
@@ -14,18 +14,27 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# isort:skip_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? I prefer the previous logic on 23.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkyryliuk bkyryliuk marked this pull request as ready for review August 16, 2020 17:24
@villebro
Copy link
Member

FYI @bkyryliuk this needs a rebase

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I few minor comments, for the most part looks good!

strategy:
matrix:
# run unit tests in multiple version just for fun
python-version: [3.6, 3.7, 3.8]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest dropping 3.6 from new tests going forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing

Comment on lines +19 to +20
ipython-genutils==0.2.0 # via traitlets
ipython==7.16.1 # via -r requirements/testing.in, ipdb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surprising, do we need this dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for ipdb, it CI had issues locationg ipython > 7.16.1, I'll try it again.
As for ipdb - I like using it inside tox, that's why added to the test dependencies. It provides really nice debugging experience.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update github actions still are unable to install ipython 7.17.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'll try it out 👍

Comment on lines 282 to 287
if utils.backend() == "hive":
# Be aware that hive only uses first value from the null values list
# it can be fixes if we will process csv file on the superset webserver.
assert data == [("john", 1, "x"), ("paul", 2, None)]
else:
assert data == [(None, 1, "x"), ("paul", 2, None)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't quite understand this, why is "john" showing up here on Hive when other dbs return None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment above, hive has it's own csv upload implementation and only supports a single null value where as default implementation works with the list. [ hive engine limitation ]

It can manually process csv to make it feature complete. Updated the comment to be more descriptive & added a todo

@@ -18,6 +18,7 @@
from sqlalchemy.dialects import oracle
from sqlalchemy.dialects.oracle import DATE, NVARCHAR, VARCHAR

from superset.db_engine_specs.hive import HiveEngineSpec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this isn't needed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch :)

Comment on lines 132 to 147
expected = (
textwrap.dedent(
f"""\
SELECT source,
target,
value
FROM {table_name}
LIMIT 100"""
# TODO(bkyryliuk): unify sql generation
if db.backend == "presto":
assert (
'SELECT "source" AS "source",\n "target" AS "target",\n "value" AS "value"\nFROM "energy_usage"\nLIMIT 100'
== sql
)
if db.backend != "presto"
else textwrap.dedent(
f"""\
SELECT "source" AS "source",
"target" AS "target",
"value" AS "value"
FROM "{table_name}"
LIMIT 100"""
elif db.backend == "hive":
assert (
"SELECT `source`,\n `target`,\n `value`\nFROM `energy_usage`\nLIMIT 100"
== sql
)
else:
assert (
"SELECT source,\n target,\n value\nFROM energy_usage\nLIMIT 100"
in sql
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/personal preference: I kind of preferred the appearance of textwrap.dedent over this.

@bkyryliuk bkyryliuk force-pushed the bogdan/hive_ci branch 2 times, most recently from 5f4f914 to ba74384 Compare August 25, 2020 17:34
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Comment on lines +19 to +20
ipython-genutils==0.2.0 # via traitlets
ipython==7.16.1 # via -r requirements/testing.in, ipdb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I'll try it out 👍

@bkyryliuk bkyryliuk merged commit 19a9bcc into apache:master Aug 27, 2020
@bkyryliuk bkyryliuk deleted the bogdan/hive_ci branch August 27, 2020 16:49
dpgaspar pushed a commit to preset-io/superset that referenced this pull request Sep 16, 2020
* Initial hive support

* Clone hive setup

* Make hive tests work locally

* Debugging presto failure

* sleep in dataset test

* Address comments

* Address comments

* Pin ipython, exclude new pylint rules

Co-authored-by: bogdan kyryliuk <[email protected]>
@dpgaspar dpgaspar added the v0.38 label Sep 22, 2020
dpgaspar pushed a commit that referenced this pull request Sep 22, 2020
* Initial hive support

* Clone hive setup

* Make hive tests work locally

* Debugging presto failure

* sleep in dataset test

* Address comments

* Address comments

* Pin ipython, exclude new pylint rules

Co-authored-by: bogdan kyryliuk <[email protected]>
Ofeknielsen pushed a commit to ofekisr/incubator-superset that referenced this pull request Oct 5, 2020
* Initial hive support

* Clone hive setup

* Make hive tests work locally

* Debugging presto failure

* sleep in dataset test

* Address comments

* Address comments

* Pin ipython, exclude new pylint rules

Co-authored-by: bogdan kyryliuk <[email protected]>
auxten pushed a commit to auxten/incubator-superset that referenced this pull request Nov 20, 2020
* Initial hive support

* Clone hive setup

* Make hive tests work locally

* Debugging presto failure

* sleep in dataset test

* Address comments

* Address comments

* Pin ipython, exclude new pylint rules

Co-authored-by: bogdan kyryliuk <[email protected]>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.38.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL v0.38 🚢 0.38.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants