[MAINTENANCE] Output Consistent Data Format from "table.head" Metric for every ExecutionEngine #7134

alexsherstinsky · 2023-02-14T00:57:25Z

Scope

Standardize the output of the table.head metric as a pd.DataFrame for every ExecutionEngine. This removes the need to branch on the ExecutionEngine type in MetricsCalculator and other callers. As a result, Batch.head() in New Datasources becomes "a one-liner proxy" to MetricsCalculator.head(), and similar simplifications have been applied in a few other places.
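The caller-side simplification can be sketched roughly as follows. `MetricsCalculatorSketch` and `BatchSketch` are hypothetical stand-ins, not the real Great Expectations classes. The `resolve_table_head` callable is an assumed placeholder for resolving the `table.head` metric on whichever ExecutionEngine is configured.

```python
from typing import Callable

import pandas as pd


class MetricsCalculatorSketch:
    """Hypothetical stand-in for MetricsCalculator (not the real Great Expectations class)."""

    def __init__(self, resolve_table_head: Callable[[int], pd.DataFrame]) -> None:
        # Placeholder for resolving the "table.head" metric on the configured engine;
        # every engine is assumed to hand back a pd.DataFrame.
        self._resolve_table_head = resolve_table_head

    def head(self, n_rows: int) -> pd.DataFrame:
        # No isinstance() checks against Pandas/Spark/SQLAlchemy engines are needed,
        # because the metric's output format is the same for all of them.
        return self._resolve_table_head(n_rows)


class BatchSketch:
    """Hypothetical stand-in for a New Datasources (ZEP) Batch."""

    def __init__(self, metrics_calculator: MetricsCalculatorSketch) -> None:
        self._metrics_calculator = metrics_calculator

    def head(self, n_rows: int) -> pd.DataFrame:
        # The "one-liner proxy" to MetricsCalculator.head().
        return self._metrics_calculator.head(n_rows=n_rows)


if __name__ == "__main__":
    data = pd.DataFrame({"id": range(10)})
    batch = BatchSketch(MetricsCalculatorSketch(lambda n: data.head(n)))
    print(batch.head(3))
```

The design point is that the type check moves out of every caller and into the single place that produces the metric value.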


Changes proposed in this pull request:

JIRA: GREAT-1654/GREAT-1637



Previous Design Review notes:

Definition of Done


My code follows the Great Expectations style guide
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added unit tests where applicable and made sure that new and existing tests are passing.
I have run any local integration tests and made sure that nothing is broken.



netlify · 2023-02-14T00:57:30Z

✅ Deploy Preview for niobium-lead-7998 ready!

Name	Link
🔨 Latest commit	`e7a4872`
🔍 Latest deploy log	https://app.netlify.com/sites/niobium-lead-7998/deploys/63ebbe8b68bc7700081983d5
😎 Deploy Preview	https://deploy-preview-7134--niobium-lead-7998.netlify.app


ghost · 2023-02-14T00:58:46Z


NathanFarmer

Can you run the full test suite on this PR please?

NathanFarmer · 2023-02-14T13:42:10Z

great_expectations/experimental/datasources/interfaces.py

@@ -505,7 +507,7 @@ def update_forward_refs(cls):
    @validate_arguments
    def head(
        self,
-        n_rows: Optional[StrictInt] = None,
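The hunk above is cut off before the replacement line, so the new signature is not shown here. The sketch below only illustrates what the `@validate_arguments` plus `Optional[StrictInt]` pairing enforces, using the pydantic v1 API (still importable, though deprecated, in pydantic v2). `head_sketch` is a hypothetical function, not the real `head()`.

```python
from typing import Optional

from pydantic import StrictInt, ValidationError, validate_arguments


@validate_arguments
def head_sketch(n_rows: Optional[StrictInt] = None) -> Optional[int]:
    # Hypothetical stand-in: simply echoes the validated argument.
    # StrictInt refuses values that merely coerce to int, such as the string "3".
    return n_rows


if __name__ == "__main__":
    print(head_sketch(3))
    try:
        head_sketch("3")
    except ValidationError as exc:
        print("rejected:", exc.errors()[0]["loc"])
```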


NathanFarmer · 2023-02-14T16:03:29Z

great_expectations/expectations/metrics/table_metrics/table_head.py

@@ -199,11 +200,11 @@ def _spark(
        metric_value_kwargs: dict,
        metrics: dict[str, Any],
        runtime_configuration: dict,
-    ) -> list[pyspark_sql_Row] | pyspark_sql_Row:


@alexsherstinsky I'm surprised this passes the tests in test_metric_configuration.py and test_sparkdf_execution_engine.py, considering how much we use this metric there.
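The removed return annotation `list[pyspark_sql_Row] | pyspark_sql_Row` appears to be the shape that forced callers to branch. The following is a minimal sketch of folding both shapes into one `pd.DataFrame`. `to_head_dataframe` is a hypothetical helper, and plain dicts stand in for `pyspark.sql.Row` so it runs without Spark. Real `Row` objects expose `asDict()`, which the helper uses when present.

```python
from typing import Any

import pandas as pd


def _record_as_dict(record: Any) -> dict:
    # pyspark.sql.Row exposes asDict(); plain mappings are copied as-is.
    if hasattr(record, "asDict"):
        return record.asDict()
    return dict(record)


def to_head_dataframe(raw: Any) -> pd.DataFrame:
    """Normalize a head() result shaped like `list[Row] | Row` into a DataFrame (hypothetical helper)."""
    if isinstance(raw, pd.DataFrame):
        return raw  # already in the standard format
    # A single record becomes a one-element list, so both shapes share one code path.
    records = raw if isinstance(raw, list) else [raw]
    return pd.DataFrame([_record_as_dict(record) for record in records])


if __name__ == "__main__":
    print(to_head_dataframe([{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]))
    print(to_head_dataframe({"a": 3}))
```

With a live Spark DataFrame, `df.limit(n).toPandas()` (both real PySpark methods) is a more direct route to the same result. Whether the PR takes that route is not shown in this thread.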

alexsherstinsky · 2023-02-14T16:11:40Z

> Can you run the full test suite on this PR please?

I have, and it looked like everything passed. The suite is running again because I merged in the latest from develop. I went through the entire code base and carefully updated each use. Are you seeing something that does not look right? Thanks, @NathanFarmer!

NathanFarmer · 2023-02-14T16:19:55Z


@alexsherstinsky I would expect this test to fail with your change: https://github.com/great-expectations/great_expectations/blob/10f884135fe27e03c991[…]87ff353/tests/execution_engine/test_sparkdf_execution_engine.py


alexsherstinsky · 2023-02-14T16:38:14Z

> https://github.com/great-expectations/great_expectations/blob/10f884135fe27e03c991[…]87ff353/tests/execution_engine/test_sparkdf_execution_engine.py

@NathanFarmer This passes because the assertion checks the Batch data, which sits one level below the metric's output. Thanks.

alexsherstinsky · 2023-02-14T16:40:27Z

@NathanFarmer Here is how it runs for me locally:

pytest --mysql --mssql --postgresql --spark --cache-clear --full-trace tests/execution_engine/test_sparkdf_execution_engine.py::test_reader_fn_parameters -svv

PASSED

=================== warnings summary ===================
venv/lib/python3.9/site-packages/botocore/httpsession.py:18
  /Users/alexsherstinsky/Development/GreatExpectations/great_expectations/venv/lib/python3.9/site-packages/botocore/httpsession.py:18: DeprecationWarning: 'urllib3.contrib.pyopenssl' module is deprecated and will be removed in a future release of urllib3 2.x. Read more in this issue: https://github.com/urllib3/urllib3/issues/2680
    from urllib3.contrib.pyopenssl import orig_util_SSLContext as SSLContext

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================== 1 passed, 1 warning in 9.80s ===================


Kilo59 · 2023-02-14T18:40:37Z

great_expectations/validator/metrics_calculator.py

-    SparkDFExecutionEngine,
-    SqlAlchemyExecutionEngine,
-)
+from great_expectations.execution_engine import ExecutionEngine  # noqa: TCH001


Should move this to the TYPE_CHECKING block.

alexsherstinsky · 2023-02-15

@Kilo59 That did not work in this PR, despite what Ruff reported, so I had to suppress the warning with `# noqa: TCH001`. However, the same change seems to work fine in #7094; I do not understand the reason for this inconsistency.
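For reference, Ruff's TCH001 asks for the pattern below: the import lives under `typing.TYPE_CHECKING`, so type checkers see it but it never runs. One plausible reason the move can fail (an assumption; the thread does not identify the cause) is that some code path evaluates the annotation at runtime, for example through `typing.get_type_hints` or pydantic, and the guarded name does not exist then. `MetricsCalculatorSketch` and `annotations_resolve_at_runtime` are hypothetical names.

```python
import typing
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static checkers (mypy, Ruff's TCH rules); never executed at runtime,
    # so great_expectations does not even need to be importable for this module to load.
    from great_expectations.execution_engine import ExecutionEngine


class MetricsCalculatorSketch:
    """Hypothetical class whose annotation mentions a TYPE_CHECKING-only name."""

    def __init__(self, execution_engine: "ExecutionEngine") -> None:
        # The quoted annotation is never evaluated on normal construction.
        self._execution_engine = execution_engine


def annotations_resolve_at_runtime(func: typing.Callable[..., object]) -> bool:
    """Return False if evaluating func's annotations hits a name missing at runtime."""
    try:
        typing.get_type_hints(func)
    except NameError:
        return False
    return True


if __name__ == "__main__":
    MetricsCalculatorSketch(execution_engine=object())  # works fine
    print(annotations_resolve_at_runtime(MetricsCalculatorSketch.__init__))
```

Plain construction succeeds, but anything that resolves the hints at runtime raises `NameError`. In that situation, keeping the real import and silencing TCH001 is a common workaround.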

* develop: (29 commits)
  - [BUGFIX] pydantic>=1.10.4 - ImportError: cannot import name dataclass_transform (#7163)
  - [MAINTENANCE] ZEP - update asset factories method signatures from asset models (#7096)
  - Delete cli v012 tests. (#7159)
  - [CONTRIB] added new Expectations - India_zip_code expectation and not_to_be_future_date expectation (#6086)
  - [MAINTENANCE] Remove unused dockerfile (#7152)
  - [DOCS] doc-464 consolidating and standardizing snippets (#7154)
  - [BUGFIX] Patch broken rendered content Cloud tests (#7155)
  - [MAINTENANCE] Clean up `mypy` violations in `CardinalityChecker` (#7146)
  - [MAINTENANCE] Clean up pathlib.Path() usage in DataConnector utilities and restore tighter formatting in great_expectations/util.py (#7149)
  - [MAINTENANCE] Change all instances of `create_expectation_suite` to `add_expectation_suite` in tests, docs, and source code (#7117)
  - [BUGFIX] Parse pandas version correctly for development builds (#7147)
  - [MAINTENANCE] Update V3 DataConnector utilities to support New Datasources (ZEP) (#7144)
  - [BUGFIX] Patch inconsistent ordering within GCP test asserts (#7130)
  - Refactor sql splitter to take selectable instead of str. (#7133)
  - [BUGFIX] `TupleAzureBlobStoreBackend` no longer gives warning when obfuscating connection string (#7139)
  - [MAINTENANCE] ruff 0.0.246 update (#7137)
  - [MAINTENANCE] Output Consistent Data Format from "table.head" Metric for every ExecutionEngine (#7134)
  - [BUGFIX] Copy previous versions after checking out the the current commit (#7142)
  - [DOCS] Remove sitemap.xml (#7141)
  - [MAINTENANCE] mypy `v1.0.0` (#7138)
  - ...

Alex Sherstinsky added 4 commits February 13, 2023 15:53

WIP

5c0f715

clean up

5315b00

Merge branch 'develop' into maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296

082b8d1

remove ExecutionEngine dependence from MetricsCalculator and use it in ZEP head

b845b8a

github-actions bot added core-team platform labels Feb 14, 2023

alexsherstinsky marked this pull request as ready for review February 14, 2023 01:04

alexsherstinsky requested a review from a team February 14, 2023 01:04

alexsherstinsky enabled auto-merge February 14, 2023 01:04

Alex Sherstinsky and others added 5 commits February 13, 2023 17:34

remove ExecutionEngine dependence from MetricsCalculator and use it in ZEP head

5df7923

remove ExecutionEngine dependence from MetricsCalculator and use it in ZEP head

e003ddb

remove ExecutionEngine dependence from MetricsCalculator and use it in ZEP head

6aa8402

remove ExecutionEngine dependence from MetricsCalculator and use it in ZEP head

d6128a9

Merge develop into maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296

d6ef312

auto-merge was automatically disabled February 14, 2023 14:35 (reason: Merge queue setting changed)

Merge branch 'develop' into maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296

068063e

alexsherstinsky enabled auto-merge (squash) February 14, 2023 15:40

NathanFarmer reviewed Feb 14, 2023


Merge develop into maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296

95571c3

alexsherstinsky requested a review from NathanFarmer February 14, 2023 16:40

NathanFarmer approved these changes Feb 14, 2023


github-actions bot added 2 commits February 14, 2023 16:45

Merge develop into maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296

c1406b2

Merge develop into maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296

e7a4872

alexsherstinsky merged commit b87f035 into develop Feb 14, 2023

alexsherstinsky deleted the maintenance/GREAT-1654/GREAT-1637/alexsherstinsky/metrics/output_consistent_data_format_from_table_head_metric-2023_02_13-296 branch February 14, 2023 18:16

Kilo59 reviewed Feb 14, 2023


