[MAINTENANCE] Output Consistent Data Format from "table.head" Metric for every ExecutionEngine #7134

alexsherstinsky
Contributor

Scope

Standardize the output of the table.head metric as a pd.DataFrame for every ExecutionEngine, which removes the need to branch on the ExecutionEngine type in MetricsCalculator and other callers. Consequently, Batch.head() in New Datasources becomes a one-line proxy to MetricsCalculator.head(), and similar simplifications have been applied in a few other places.
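As a rough illustration of the idea (not the code in this PR), the sketch below shows how a caller gets simpler once every backend hands back a pandas DataFrame. The names `rows_to_dataframe`, `FakeMetricsCalculator`, and `FakeBatch` are hypothetical stand-ins invented for this example; only `pd.DataFrame` is a real API here.

```python
from typing import Any, List, Sequence, Tuple

import pandas as pd


def rows_to_dataframe(rows: Sequence[Tuple[Any, ...]], columns: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: wrap engine-specific row tuples in a pandas DataFrame."""
    return pd.DataFrame(list(rows), columns=list(columns))


class FakeMetricsCalculator:
    """Hypothetical stand-in for MetricsCalculator; head() always returns a DataFrame."""

    def __init__(self, rows: List[Tuple[Any, ...]], columns: List[str]) -> None:
        self._rows = rows
        self._columns = columns

    def head(self, n_rows: int = 5) -> pd.DataFrame:
        # Whatever the backend produced, callers only ever see a DataFrame.
        return rows_to_dataframe(self._rows[:n_rows], self._columns)


class FakeBatch:
    """Hypothetical stand-in for Batch; no branching on the engine type is needed."""

    def __init__(self, calculator: FakeMetricsCalculator) -> None:
        self._calculator = calculator

    def head(self, n_rows: int = 5) -> pd.DataFrame:
        # One-line proxy, as described in the Scope paragraph.
        return self._calculator.head(n_rows=n_rows)


# Rows shaped like what a SQL cursor or a Spark collect() might return (assumption).
sql_like_rows = [(1, "a"), (2, "b"), (3, "c")]
batch = FakeBatch(FakeMetricsCalculator(sql_like_rows, ["id", "label"]))
head_df = batch.head(n_rows=2)
```

The design point is that the conversion happens once, at the metric boundary, so downstream code can rely on a single return type.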

Changes proposed in this pull request:

  • JIRA: GREAT-1654/GREAT-1637

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

Alex Sherstinsky added 4 commits February 13, 2023 15:53

netlify bot commented Feb 14, 2023

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit e7a4872
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/63ebbe8b68bc7700081983d5
😎 Deploy Preview https://deploy-preview-7134--niobium-lead-7998.netlify.app

@alexsherstinsky alexsherstinsky marked this pull request as ready for review February 14, 2023 01:04
@alexsherstinsky alexsherstinsky requested a review from a team February 14, 2023 01:04
auto-merge was automatically disabled February 14, 2023 14:35

Merge queue setting changed

…rstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296
@alexsherstinsky alexsherstinsky enabled auto-merge (squash) February 14, 2023 15:40
Contributor

@NathanFarmer NathanFarmer left a comment

Can you run the full test suite on this PR please?

@@ -505,7 +507,7 @@ def update_forward_refs(cls):
@validate_arguments
def head(
self,
n_rows: Optional[StrictInt] = None,
Contributor

🙇🏻

@@ -199,11 +200,11 @@ def _spark(
metric_value_kwargs: dict,
metrics: dict[str, Any],
runtime_configuration: dict,
) -> list[pyspark_sql_Row] | pyspark_sql_Row:
Contributor

@alexsherstinsky I'm surprised this passes tests in test_metric_configuration.py and test_sparkdf_execution_engine.py considering how much we use this metric there.

@alexsherstinsky
Contributor Author

> Can you run the full test suite on this PR please?

I have and it looked like it passed them all. It is running them again, since I updated with the latest from develop. I did go through the entire code base and carefully updated each use. Are you seeing something that does not look right? Thanks, @NathanFarmer !

@NathanFarmer
Contributor

> > Can you run the full test suite on this PR please?
>
> I have and it looked like it passed them all. It is running them again, since I updated with the latest from develop. I did go through the entire code base and carefully updated each use. Are you seeing something that does not look right? Thanks, @NathanFarmer !

@alexsherstinsky This should be expected to fail with your change I would think: https://github.com/great-expectations/great_expectations/blob/10f884135fe27e03c991[…]87ff353/tests/execution_engine/test_sparkdf_execution_engine.py

…metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296
@alexsherstinsky
Contributor Author

> https://github.com/great-expectations/great_expectations/blob/10f884135fe27e03c991[…]87ff353/tests/execution_engine/test_sparkdf_execution_engine.py

@NathanFarmer This passes, because the assertion looks at the Batch data (a level lower than the output of the metric). Thanks.

@alexsherstinsky
Contributor Author

@NathanFarmer Here is how it runs for me locally:

pytest --mysql --mssql --postgresql --spark --cache-clear --full-trace tests/execution_engine/test_sparkdf_execution_engine.py::test_reader_fn_parameters -svv

PASSED

====================================================================================================================== warnings summary ======================================================================================================================
venv/lib/python3.9/site-packages/botocore/httpsession.py:18
  /Users/alexsherstinsky/Development/GreatExpectations/great_expectations/venv/lib/python3.9/site-packages/botocore/httpsession.py:18: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
    from urllib3.contrib.pyopenssl import orig_util_SSLContext as SSLContext

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================================ 1 passed, 1 warning in 9.80s ================================================================================================================

…metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296
…metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296
@alexsherstinsky alexsherstinsky merged commit b87f035 into develop Feb 14, 2023
@alexsherstinsky alexsherstinsky deleted the maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296 branch February 14, 2023 18:16
SparkDFExecutionEngine,
SqlAlchemyExecutionEngine,
)
from great_expectations.execution_engine import ExecutionEngine # noqa: TCH001
Contributor

Should move this to the TYPE_CHECKING block.

Contributor Author

@Kilo59 That did not work in this PR, despite what RUFF said. Hence, I had to escape it. However, it seems to be working fine in #7094 -- I do not understand the reason for this inconsistency.
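For readers unfamiliar with the TYPE_CHECKING idiom discussed here, a minimal standard-library sketch follows. `Decimal` is only a stand-in for `ExecutionEngine`, and `describe` and `annotations_resolve_at_runtime` are hypothetical names. The sketch also shows one general way such a move can fail: code that resolves annotations at runtime cannot find a name that was imported only for type checkers. Whether that is what happened in this PR is not established by the thread.

```python
from typing import TYPE_CHECKING, Callable, get_type_hints

if TYPE_CHECKING:
    # Seen only by static type checkers; this import never runs at runtime.
    # Decimal is a stand-in for ExecutionEngine in this sketch.
    from decimal import Decimal


def describe(value: "Decimal") -> str:
    """Works at runtime because the annotation is just the string 'Decimal'."""
    return f"{type(value).__name__}: {value}"


def annotations_resolve_at_runtime(func: Callable) -> bool:
    """Return True if func's annotations can be evaluated at runtime."""
    try:
        get_type_hints(func)
    except NameError:
        # The annotated name exists only inside the TYPE_CHECKING block.
        return False
    return True


# Calling the function is fine: annotations are not enforced or evaluated.
plain_call_result = describe(3)

# Resolving the annotations at runtime fails, since Decimal was never imported.
resolves = annotations_resolve_at_runtime(describe)
```

If a decorator or framework needs the real class at runtime, keeping the import at module level and suppressing the linter rule, as done in this PR with `# noqa: TCH001`, is a common workaround.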

Shinnnyshinshin pushed a commit that referenced this pull request Feb 16, 2023
* develop: (29 commits)
  [BUGFIX] pydantic>=1.10.4 - ImportError: cannot import name dataclass_transform (#7163)
  [MAINTENANCE] ZEP - update asset factories method signatures from asset models (#7096)
  Delete cli v012 tests. (#7159)
  [CONTRIB] added new Expectations  - India_zip_code expectation and not_to_be_future_date expectation (#6086)
  [MAINTENANCE] Remove unused dockerfile (#7152)
  [DOCS] doc-464 consolidating and standardizing snippets (#7154)
  [BUGFIX] Patch broken rendered content Cloud tests (#7155)
  [MAINTENANCE] Clean up `mypy` violations in `CardinalityChecker` (#7146)
  [MAINTENANCE] Clean up pathlib.Path() usage in DataConnector utilities and restore tighter formatting in great_expectations/util.py  (#7149)
  [MAINTENANCE] Change all instances of `create_expectation_suite` to `add_expectation_suite` in tests, docs, and source code (#7117)
  [BUGFIX] Parse pandas version correctly for development builds (#7147)
  [MAINTENANCE] Update V3 DataConnector utilities to support New Datasources (ZEP) (#7144)
  [BUGFIX] Patch inconsistent ordering within GCP test asserts (#7130)
  Refactor sql splitter to take selectable instead of str. (#7133)
  [BUGFIX] `TupleAzureBlobStoreBackend` no longer gives warning when obfuscating connection string (#7139)
  [MAINTENANCE] ruff 0.0.246 update (#7137)
  [MAINTENANCE] Output Consistent Data Format from "table.head" Metric for every ExecutionEngine (#7134)
  [BUGFIX] Copy previous versions after checking out the the current commit (#7142)
  [DOCS] Remove sitemap.xml (#7141)
  [MAINTENANCE] mypy `v1.0.0` (#7138)
  ...