Hive dialect name error #6146

Closed
hsheth2 opened this issue Oct 11, 2022 · 3 comments
Labels
bug Bugs bugs bugs! DevRel Triage devrel This item is being addressed by the Developer Relations Team

Comments

hsheth2 commented Oct 11, 2022

Describe the bug

The Hive dialect name is b'hive', which is of type bytes and not str. This causes issues in the dialect comparison here:

def __eq__(self, other: Union[str, GESqlDialect]):
    if isinstance(other, str):
        return self.value.lower() == other.lower()
    return self.value.lower() == other.value.lower()

This issue applies to GE versions 0.15.23+.

This appears to be a regression introduced by #5980. Note that b'foo' == 'foo' is False but hash(b'foo') == hash('foo') is True, which is probably why this code worked before.
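
The failure can be reproduced without Hive or SQLAlchemy installed. The following is a minimal sketch, assuming a stand-in enum (`DialectSketch`, a hypothetical name) whose `__eq__` copies the snippet quoted above; it is not the real `GESqlDialect` class, and the `__hash__` method is an assumption added so the members stay hashable.

```python
from enum import Enum
from typing import Union


class DialectSketch(Enum):
    """Hypothetical stand-in for GESqlDialect; only __eq__ mirrors the issue."""

    BIGQUERY = "bigquery"
    HIVE = "hive"

    def __eq__(self, other: Union[str, "DialectSketch"]):
        if isinstance(other, str):
            return self.value.lower() == other.lower()
        # Anything that is not a str is assumed to be another enum member.
        return self.value.lower() == other.value.lower()

    def __hash__(self):
        # Assumption: defining __eq__ would otherwise make members unhashable.
        return hash(self.value)


# Per the issue, the Hive dialect reports its name as bytes.
hive_name = b"hive"

try:
    # bytes.__eq__ does not know the enum, so Python falls back to the
    # enum's own __eq__ with other=b"hive", which then reads other.value.
    _ = hive_name.lower() == DialectSketch.BIGQUERY
    outcome = "no error"
except AttributeError as exc:
    outcome = str(exc)

print(outcome)
```

Because `bytes` is not a `str`, the comparison skips the string branch and reaches `other.value`, which `bytes` does not have. That is the same line that appears last in the stack trace below.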

To Reproduce

We're running into this issue from the DataHub project, which depends on GE as a library. Hopefully the stack trace below has enough context, but I can try to provide more if needed.

Stack trace that I see:

Traceback (most recent call last):
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py", line 906, in _generate_single_profile
    batch = self._get_ge_dataset(
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/src/datahub/ingestion/source/ge_data_profiler.py", line 970, in _get_ge_dataset
    batch = ge_context.data_context.get_batch(
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/venv/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 1153, in get_batch
    return self._get_batch_v2(
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/venv/lib/python3.8/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 859, in _get_batch_v2
    return validator.get_dataset()
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/venv/lib/python3.8/site-packages/great_expectations/validator/validator.py", line 2427, in get_dataset
    return self.expectation_engine(
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/venv/lib/python3.8/site-packages/great_expectations/dataset/sqlalchemy_dataset.py", line 568, in __init__
    if self.engine.dialect.name.lower() == GESqlDialect.BIGQUERY:
  File "/home/mayuri/git/acryldata/datahub-fork/metadata-ingestion/venv/lib/python3.8/site-packages/great_expectations/execution_engine/sqlalchemy_dialect.py", line 31, in __eq__
    return self.value.lower() == other.value.lower()

Expected behavior

No crashes.
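
One way to get this behavior is for the comparison itself to tolerate `bytes` and unrelated types. This is only a sketch under stated assumptions: `TolerantDialect` is a hypothetical name, UTF-8 is an assumed encoding for the dialect name, and this is not necessarily how Great Expectations resolved the issue.

```python
from enum import Enum


class TolerantDialect(Enum):
    """Hypothetical enum whose __eq__ accepts str, bytes, or another member."""

    BIGQUERY = "bigquery"
    HIVE = "hive"

    def __eq__(self, other):
        if isinstance(other, bytes):
            # Assumption: dialect names are UTF-8 (in practice plain ASCII).
            other = other.decode("utf-8", errors="replace")
        if isinstance(other, str):
            return self.value.lower() == other.lower()
        if isinstance(other, TolerantDialect):
            return self.value.lower() == other.value.lower()
        # Let Python handle unrelated types instead of raising.
        return NotImplemented

    def __hash__(self):
        return hash(self.value)


print(b"hive" == TolerantDialect.HIVE)
print(b"hive" == TolerantDialect.BIGQUERY)
print(TolerantDialect.HIVE == 42)
```

Returning `NotImplemented` for unknown types means an unexpected operand yields `False` rather than an `AttributeError`.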

Environment (please complete the following information):

  • Operating System: Linux
  • Great Expectations Version: 0.15.23+ (repro'd on .23, .25, and .26)

Additional context

DataHub introduced a temporary fix: datahub-project/datahub#5980.
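
For library users who are pinned to an affected version, a caller-side workaround is to normalize the dialect name to `str` before it reaches the comparison. The helper below is a hedged sketch: `normalize_dialect_name` is a hypothetical function and is not taken from the DataHub patch referenced above.

```python
from typing import Union


def normalize_dialect_name(name: Union[str, bytes]) -> str:
    """Return the dialect name as a lowercase str, decoding bytes if needed."""
    if isinstance(name, bytes):
        # Assumption: the name is UTF-8 encoded (in practice plain ASCII).
        name = name.decode("utf-8")
    return name.lower()


print(normalize_dialect_name(b"hive"))
print(normalize_dialect_name("BigQuery"))
```

A caller would pass in the value of `engine.dialect.name` (the attribute visible in the stack trace) and compare the returned string instead of the raw value.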

@hsheth2 hsheth2 changed the title from "Hive dialect name incompatibility" to "Hive dialect name error" Oct 11, 2022
@AFineDayFor AFineDayFor self-assigned this Oct 12, 2022
@AFineDayFor AFineDayFor added DevRel Triage devrel This item is being addressed by the Developer Relations Team bug Bugs bugs bugs! labels Oct 12, 2022
@AFineDayFor
Contributor

Howdy @hsheth2 👋 thanks for reaching out and raising this with us 🙇

We're raising this within the team. Thank you so very much for helping us in identifying this 🦠🔬

@NathanFarmer
Contributor

The fix for this is merged into develop and will be included as part of today's release (0.15.27). Thanks @hsheth2!

@hsheth2
Author

hsheth2 commented Oct 13, 2022

Thanks @NathanFarmer for the quick fix!
