Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter null values before plotting #8535

Merged
merged 7 commits into from
Dec 13, 2023
Merged

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Dec 10, 2023

I noticed that seaborn's plot was responding much faster than xarray's version with the same data.
Turn's out seaborn drops any nulls: https://github.com/mwaskom/seaborn/blob/056413d7393e3daec597d430c076e45938d53376/seaborn/relational.py#L399

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@Illviljan Illviljan marked this pull request as ready for review December 10, 2023 17:48
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

@headtr1ck
Copy link
Collaborator

Isn't matplotlib dropping nans anyway? I never had any issues with nan values in my data...

@Illviljan
Copy link
Contributor Author

No, they aren't dropped. That's what the test shows. Do you use an interactive version when you look at plots? It was painfully slow when I was zooming in and out.

I was able to reproduce the problem with a raw matplotlib version as well, removing the nans was indeed the solution.

@headtr1ck
Copy link
Collaborator

No, they aren't dropped. That's what the test shows. Do you use an interactive version when you look at plots? It was painfully slow when I was zooming in and out.

I was able to reproduce the problem with a raw matplotlib version as well, removing the nans was indeed the solution.

ok, never look at the plots interactively...
Then its fine to merge this :)

@headtr1ck headtr1ck added the plan to merge Final call for comments label Dec 10, 2023
@@ -948,6 +948,12 @@ def newplotfunc(
size_ = kwargs.pop("_size", linewidth)
size_r = _LINEWIDTH_RANGE

# Remove any nulls, .where(m, drop=True) doesn't work when m is a dask array,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can only be done for scatter plots. A line plot would totally change.

Copy link
Contributor Author

@Illviljan Illviljan Dec 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only scatter uses plot1d so it should be fine for now.
Seaborn doesn't seem to filter nan on line however.
Can't move it later in the chain as this nan-trick is used to split on different dimensions (for a future line plot implementation), can move it to the if-check above though.

NaN's are usually not wanted in line plots either in my experience. Do you have a usecase in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the null filter inside a scatter check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN's are usually not wanted in line plots either in my experience. Do you have a usecase in mind?

Plenty. it's important to know where the gaps are. That's useful information. It is annoying that matplotlib will skip NaNs at the beginning and ends of a line plot.

@dcherian dcherian removed the plan to merge Final call for comments label Dec 10, 2023
@Illviljan Illviljan added the plan to merge Final call for comments label Dec 12, 2023
@dcherian dcherian merged commit 2971994 into pydata:main Dec 13, 2023
27 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 18, 2023
* main: (26 commits)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  Use numbagg for `rolling` methods (pydata#8493)
  Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
  fix RTD docs build (pydata#8519)
  Fix type of `.assign_coords` (pydata#8495)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 20, 2023
* main: (58 commits)
  Adapt map_blocks to use new Coordinates API (pydata#8560)
  add xeofs to ecosystem.rst (pydata#8561)
  Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
  Generalize cumulative reduction (scan) to non-dask types (pydata#8019)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 4, 2024
commit 0a0f800
Merge: 33c8033 41d33f5
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:42:51 2024 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

commit 33c8033
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:40:42 2024 -0700

    Don't skip for resampling

commit d7be352
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Wed Jan 3 03:24:13 2024 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit d13fa0e
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:23:43 2024 -0700

    Apply suggestions from code review

    Co-authored-by: Michael Niklas  <[email protected]>

commit dd6ea53
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 19:29:40 2023 -0700

    Silence more warnings

commit 44e5a41
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 19:21:06 2023 -0700

    minimize test mods

commit 94c1c1f
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:55:46 2023 -0700

    Add tests for pydata#8263

commit 0ab4eb6
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:47:41 2023 -0700

    Fix typing

commit a064430
Merge: d6a3f2d 03ec3cb
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:47:04 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main:
      Fix mypy type ignore (pydata#8564)
      Support for the new compression arguments. (pydata#7551)
      FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
      Adapt map_blocks to use new Coordinates API (pydata#8560)
      add xeofs to ecosystem.rst (pydata#8561)
      Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
      Generalize cumulative reduction (scan) to non-dask types (pydata#8019)

commit d6a3f2d
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:46:50 2023 -0700

    Fix generator for aggregations

commit 97f1695
Author: Deepak Cherian <[email protected]>
Date:   Tue Dec 19 10:58:11 2023 -0700

    Fix docs

commit 5b33b98
Author: Deepak Cherian <[email protected]>
Date:   Sun Dec 17 20:35:53 2023 -0700

    fix whats-new

commit 80b2b36
Author: Deepak Cherian <[email protected]>
Date:   Sun Dec 17 20:26:17 2023 -0700

    Reduce more warnings

commit 5f6f4ea
Merge: a57d4ae 2971994
Author: Deepak Cherian <[email protected]>
Date:   Sat Dec 16 20:33:13 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main: (26 commits)
      Filter null values before plotting (pydata#8535)
      Update concat.py (pydata#8538)
      Add getitem to array protocol (pydata#8406)
      Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
      Filter out doctest warning (pydata#8539)
      Bump actions/setup-python from 4 to 5 (pydata#8540)
      Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
      Add Cumulative aggregation (pydata#8512)
      dev whats-new
      Whats-new for 2023.12.0 (pydata#8532)
      explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
      Add `eval` method to Dataset (pydata#7163)
      Deprecate ds.dims returning dict (pydata#8500)
      test and fix empty xindexes repr (pydata#8521)
      Remove PR labeler bot (pydata#8525)
      Hypothesis strategy for generating Variable objects (pydata#8404)
      Use numbagg for `rolling` methods (pydata#8493)
      Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
      fix RTD docs build (pydata#8519)
      Fix type of `.assign_coords` (pydata#8495)
      ...

commit a57d4ae
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:36:04 2023 -0700

    Test one more warning

commit bf8139d
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:33:45 2023 -0700

    Update xarray/tests/test_groupby.py

commit 4e9a063
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:10:14 2023 -0700

    Set squeeze=None for Dataset too

commit c2e576e
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:54:17 2023 -0700

    Fix first, last

commit 6d8e822
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:46:21 2023 -0700

    better warning

commit 62c334b
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:45:17 2023 -0700

    silence warnings

commit b7805a8
Author: dcherian <[email protected]>
Date:   Tue Aug 15 10:54:25 2023 -0600

    Deprecate `squeeze` in GroupBy.

    Closes pydata#2157
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants