🔥 delegate nan behavior to aggregators (#294)

* 🔥 delegate nan behavior to aggregators
* 🙈 formatting + fixing tests
* 💨 formatting
* 🖍️ updating changelog
* ✨ altering tests (to reduce false negatives)
* 💨 adding changelog
predict-idlab · Mar 12, 2024 · 17b6411
1 parent 3ad7dea
Showing 8 changed files with 169 additions and 328 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,46 @@
+# `TODO`
+## New Features
+
+## What's Changed
+- Removed the `check_nans` argument of the FigureResampler constructor and its `add_trace` / `add_traces` methods. This argument was used to check for (and remove) NaNs in the input data; NaN handling is now delegated to the `nan_policy` argument of specific aggregators (see, for instance, the constructors of the `MinMaxAggregator` and `MinMaxLTTB` aggregators).
+
+
+# v0.9.2
+### ⚡ `overview` / `rangeslider` support 🎉 
+
+* ➡️  [code example](https://github.com/predict-idlab/plotly-resampler/blob/main/examples/dash_apps/05_cache_overview_subplots.py):
+* 🖍️ [high level docs](https://predict-idlab.github.io/plotly-resampler/v0.9.2/getting_started/#overview)
+* 🔍 [API docs](https://predict-idlab.github.io/plotly-resampler/v0.9.2/api/figure_resampler/figure_resampler/#figure_resampler.figure_resampler.FigureResampler.__init__)
+  * make sure to take a look at the docstrings of the `create_overview`, `overview_row_idxs`, and `overview_kwargs` arguments of the `FigureResampler` constructor.
+![Peek 2023-10-25 01-51](https://github.com/predict-idlab/plotly-resampler/assets/38005924/5b3a40e0-f058-4d7e-8303-47e51896347a)
+
+
+
+### 💨 remove [traceUpdater](https://github.com/predict-idlab/trace-updater) dash component as a dependency.
+> **context**: see #281 #271 
+> `traceUpdater` was developed during a period when Dash did not yet contain the [Patch](https://dash.plotly.com/partial-properties) feature for partial property updates. As such, `traceUpdater` has become somewhat redundant and is now effectively replaced by Patch.
+
+🚨 This is a breaking change with previous `Dash` apps!!!
+
+## What's Changed
+* Support nested admonitions  by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/245
+* 👷 build: create codeql.yml by @NielsPraet in https://github.com/predict-idlab/plotly-resampler/pull/248
+* :sparkles: first draft of improved xaxis filtering by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/250
+* :arrow_up: update dependencies by @jvdd in https://github.com/predict-idlab/plotly-resampler/pull/260
+* :muscle: update dash-extensions by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/261
+* fix for #263 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/264
+* Rangeslider support by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/254
+* :pray: fix mkdocs by @jvdd in https://github.com/predict-idlab/plotly-resampler/pull/268
+* ✈️  fix for #270 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/272
+* :mag: adding init kwargs to show dash - fix for #265 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/269
+* Refactor/remove trace updater by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/281
+* Bug/pop rangeselector by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/279
+* :sparkles: fix for #275 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/286
+* Bug/rangeselector by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/287
+
+
+**Full Changelog**: https://github.com/predict-idlab/plotly-resampler/compare/v0.9.1...v0.9.2
+
 
 # v0.9.1
 ## Major changes:

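Note for callers migrating away from `check_nans`: the removed code path dropped every sample whose y-value was NaN, along with the matching x, text, and marker entries, before the data was stored. Callers who still want that eager filtering could do it themselves before handing data to the figure. The snippet below is only a rough pure-Python sketch of that idea; `drop_nan_points` is a hypothetical helper, not part of plotly-resampler, and it uses plain lists instead of the NumPy/pandas masks the library used.

```python
import math
from typing import List, Optional, Sequence, Tuple


def drop_nan_points(
    hf_x: Sequence[float],
    hf_y: Sequence[float],
    hf_text: Optional[Sequence[str]] = None,
) -> Tuple[List[float], List[float], Optional[List[str]]]:
    """Drop every position whose y-value is NaN (hypothetical helper).

    The same keep-mask is applied to x, y, and the optional per-point text,
    so the arrays stay aligned after filtering.
    """
    keep = [not math.isnan(y) for y in hf_y]
    x_out = [x for x, k in zip(hf_x, keep) if k]
    y_out = [y for y, k in zip(hf_y, keep) if k]
    text_out = None if hf_text is None else [t for t, k in zip(hf_text, keep) if k]
    return x_out, y_out, text_out


# Example: the second sample is NaN and is removed from all three sequences.
x, y, text = drop_nan_points(
    [0, 1, 2, 3],
    [1.0, float("nan"), 3.0, 4.0],
    ["a", "b", "c", "d"],
)
```

Whether pre-filtering is preferable to relying on an aggregator's `nan_policy` depends on the use case: dropping NaNs up front stores less data, but it also discards the gap information that the NaNs carried.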
diff --git a/plotly_resampler/aggregation/aggregators.py b/plotly_resampler/aggregation/aggregators.py
@@ -17,6 +17,8 @@
     LTTBDownsampler,
     MinMaxDownsampler,
     MinMaxLTTBDownsampler,
+    NaNMinMaxDownsampler,
+    NaNMinMaxLTTBDownsampler,
 )
 
 from ..aggregation.aggregation_interface import DataAggregator, DataPointSelector
@@ -171,18 +173,25 @@ class MinMaxAggregator(DataPointSelector):
 
     """
 
-    def __init__(self, **downsample_kwargs):
+    def __init__(self, nan_policy="omit", **downsample_kwargs):
         """
         Parameters
         ----------
         **downsample_kwargs
             Keyword arguments passed to the :class:`MinMaxDownsampler`.
             - The `parallel` argument is set to False by default.
+        nan_policy: str, optional
+            The policy to handle NaNs. Can be 'omit' or 'keep'. By default, 'omit'.
 
         """
         # this downsampler supports all dtypes
         super().__init__(**downsample_kwargs)
-        self.downsampler = MinMaxDownsampler()
+        if nan_policy not in ("omit", "keep"):
+            raise ValueError("nan_policy must be either 'omit' or 'keep'")
+        if nan_policy == "omit":
+            self.downsampler = MinMaxDownsampler()
+        else:
+            self.downsampler = NaNMinMaxDownsampler()
 
     def _arg_downsample(
         self,
@@ -208,21 +217,31 @@ class MinMaxLTTB(DataPointSelector):
     Paper: [https://arxiv.org/pdf/2305.00332.pdf](https://arxiv.org/pdf/2305.00332.pdf)
     """
 
-    def __init__(self, minmax_ratio: int = 4, **downsample_kwargs):
+    def __init__(
+        self, minmax_ratio: int = 4, nan_policy: str = "omit", **downsample_kwargs
+    ):
         """
         Parameters
         ----------
         minmax_ratio: int, optional
             The ratio between the number of data points in the MinMax-prefetching and
             the number of data points that will be outputted by LTTB. By default, 4.
+        nan_policy: str, optional
+            The policy to handle NaNs. Can be 'omit' or 'keep'. By default, 'omit'.
         **downsample_kwargs
             Keyword arguments passed to the `MinMaxLTTBDownsampler`.
             - The `parallel` argument is set to False by default.
             - The `minmax_ratio` argument is set to 4 by default, which was empirically
               proven to be a good default.
 
         """
-        self.minmaxlttb = MinMaxLTTBDownsampler()
+        if nan_policy not in ("omit", "keep"):
+            raise ValueError("nan_policy must be either 'omit' or 'keep'")
+        if nan_policy == "omit":
+            self.minmaxlttb = MinMaxLTTBDownsampler()
+        else:
+            self.minmaxlttb = NaNMinMaxLTTBDownsampler()
+
         self.minmax_ratio = minmax_ratio
 
         super().__init__(

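Both constructors in this file now follow the same pattern: validate `nan_policy`, then pick one of two downsampler classes. A minimal stand-alone sketch of that pattern is shown below. The two classes are hypothetical stand-ins for the real downsamplers imported at the top of the file (they do no downsampling), and `select_downsampler` is an illustrative name that does not exist in the library.

```python
class PlainDownsampler:
    """Stand-in for the downsampler chosen when nan_policy == 'omit' (hypothetical)."""


class NaNAwareDownsampler:
    """Stand-in for the downsampler chosen when nan_policy == 'keep' (hypothetical)."""


_VALID_NAN_POLICIES = ("omit", "keep")


def select_downsampler(nan_policy: str = "omit"):
    """Validate the policy first, then return the matching downsampler instance."""
    if nan_policy not in _VALID_NAN_POLICIES:
        raise ValueError("nan_policy must be either 'omit' or 'keep'")
    if nan_policy == "omit":
        return PlainDownsampler()
    return NaNAwareDownsampler()


default_ds = select_downsampler()       # default policy is 'omit'
keep_ds = select_downsampler("keep")    # NaN-aware variant
```

Validating before dispatching means a misspelled policy such as `"drop"` fails immediately at construction time instead of silently falling through to the `else` branch.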
diff --git a/plotly_resampler/figure_resampler/figure_resampler_interface.py b/plotly_resampler/figure_resampler/figure_resampler_interface.py
@@ -555,7 +555,6 @@ def _parse_get_trace_props(
         hf_hovertext: Iterable = None,
         hf_marker_size: Iterable = None,
         hf_marker_color: Iterable = None,
-        check_nans: bool = True,
     ) -> _hf_data_container:
         """Parse and capture the possibly high-frequency trace-props in a datacontainer.
 
@@ -572,11 +571,6 @@ def _parse_get_trace_props(
         hf_hovertext : Iterable, optional
             High-frequency trace "hovertext" data, overrides the current trace its
             hovertext data.
-        check_nans: bool, optional
-            Whether the `hf_y` should be checked for NaNs, by default True.
-            As checking for NaNs is expensive, this can be disabled when the `hf_y` is
-            already known to contain no NaNs (or when the downsampler can handle NaNs,
-            e.g., EveryNthPoint).
 
         Returns
         -------
@@ -654,7 +648,8 @@ def _parse_get_trace_props(
                 if hf_y.ndim != 0:  # if hf_y is an array
                     hf_x = pd.RangeIndex(0, len(hf_y))  # np.arange(len(hf_y))
                 else:  # if no data as y or hf_y is passed
-                    hf_x = np.asarray(None)
+                    hf_x = np.asarray([])
+                    hf_y = np.asarray([])
 
             assert hf_y.ndim == np.ndim(hf_x), (
                 "plotly-resampler requires scatter data "
@@ -677,22 +672,6 @@ def _parse_get_trace_props(
             if isinstance(hf_marker_color, (tuple, list, np.ndarray, pd.Series)):
                 hf_marker_color = np.asarray(hf_marker_color)
 
-            # Remove NaNs for efficiency (storing less meaningless data)
-            # NaNs introduce gaps between enclosing non-NaN data points & might distort
-            # the resampling algorithms
-            if check_nans and pd.isna(hf_y).any():
-                not_nan_mask = ~pd.isna(hf_y)
-                hf_x = hf_x[not_nan_mask]
-                hf_y = hf_y[not_nan_mask]
-                if isinstance(hf_text, np.ndarray):
-                    hf_text = hf_text[not_nan_mask]
-                if isinstance(hf_hovertext, np.ndarray):
-                    hf_hovertext = hf_hovertext[not_nan_mask]
-                if isinstance(hf_marker_size, np.ndarray):
-                    hf_marker_size = hf_marker_size[not_nan_mask]
-                if isinstance(hf_marker_color, np.ndarray):
-                    hf_marker_color = hf_marker_color[not_nan_mask]
-
             # Try to parse the hf_x data if it is of object type or
             if len(hf_x) and (hf_x.dtype.type is np.str_ or hf_x.dtype == "object"):
                 try:
@@ -876,7 +855,6 @@ def add_trace(
         hf_hovertext: Union[str, Iterable] = None,
         hf_marker_size: Union[str, Iterable] = None,
         hf_marker_color: Union[str, Iterable] = None,
-        check_nans: bool = True,
         **trace_kwargs,
     ):
         """Add a trace to the figure.
@@ -932,13 +910,6 @@ def add_trace(
         hf_marker_color: Iterable, optional
             The original high frequency marker color. If set, this has priority over the
             trace its ``marker.color`` argument.
-        check_nans: boolean, optional
-            If set to True, the trace's data will be checked for NaNs - which will be
-            removed. By default True.
-            As this is a costly operation, it is recommended to set this parameter to
-            False if you are sure that your data does not contain NaNs (or when the
-            downsampler can handle NaNs, e.g., EveryNthPoint). This should considerably
-            speed up the graph construction time.
         **trace_kwargs: dict
             Additional trace related keyword arguments.
             e.g.: row=.., col=..., secondary_y=...
@@ -1019,7 +990,6 @@ def add_trace(
             hf_hovertext,
             hf_marker_size,
             hf_marker_color,
-            check_nans,
         )
 
         # These traces will determine the autoscale its RANGE!
@@ -1078,7 +1048,6 @@ def add_traces(
         downsamplers: None | List[AbstractAggregator] | AbstractAggregator = None,
         gap_handlers: None | List[AbstractGapHandler] | AbstractGapHandler = None,
         limit_to_views: List[bool] | bool = False,
-        check_nans: List[bool] | bool = True,
         **traces_kwargs,
     ):
         """Add traces to the figure.
@@ -1124,14 +1093,6 @@ def add_traces(
             by default False.\n
             Remark that setting this parameter to True ensures that low frequency traces
             are added to the ``hf_data`` property.
-        check_nans : None | List[bool] | bool, optional
-            List of check_nans booleans for the added traces. If set to True, the
-            trace's datapoints will be checked for NaNs. If a single boolean is passed,
-            all to be added traces will use this value, by default True.\n
-            As this is a costly operation, it is recommended to set this parameter to
-            False if the data is known to contain no NaNs (or when the downsampler can
-            handle NaNs, e.g., EveryNthPoint). This will considerably speed up the graph
-            construction time.
         **traces_kwargs: dict
             Additional trace related keyword arguments.
             e.g.: rows=.., cols=..., secondary_ys=...
@@ -1174,16 +1135,11 @@ def add_traces(
             gap_handlers = [gap_handlers] * len(data)
         if isinstance(limit_to_views, bool):
             limit_to_views = [limit_to_views] * len(data)
-        if isinstance(check_nans, bool):
-            check_nans = [check_nans] * len(data)
 
-        zipped = zip(
-            data, max_n_samples, downsamplers, gap_handlers, limit_to_views, check_nans
-        )
-        for (
-            i,
-            (trace, max_out, downsampler, gap_handler, limit_to_view, check_nan),
-        ) in enumerate(zipped):
+        zipped = zip(data, max_n_samples, downsamplers, gap_handlers, limit_to_views)
+        for (i, (trace, max_out, downsampler, gap_handler, limit_to_view)) in enumerate(
+            zipped
+        ):
             if (
                 trace.type.lower() not in self._high_frequency_traces
                 or self._hf_data.get(trace.uid) is not None
@@ -1194,7 +1150,7 @@ def add_traces(
             if not limit_to_view and (trace.y is None or len(trace.y) <= max_out_s):
                 continue
 
-            dc = self._parse_get_trace_props(trace, check_nans=check_nan)
+            dc = self._parse_get_trace_props(trace)
             self._hf_data[trace.uid] = self._construct_hf_data_dict(
                 dc,
                 trace=trace,