🔥 delegate nan behavior to aggregators (#294)
* 🔥 delegate nan behavior to aggregators

* 🙈 formatting + fixing tests

* 💨 formatting

* 🖍️ updating changelog

* ✨ altering tests (to reduce false negatives)

* 💨 adding changelog
jonasvdd authored Mar 12, 2024
1 parent 3ad7dea commit 17b6411
Showing 8 changed files with 169 additions and 328 deletions.
43 changes: 43 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,46 @@
# `TODO`
## New Features

## What's Changed
- Removed the `check_nans` argument of the `FigureResampler` constructor and of its `add_trace` and `add_traces` methods. This argument was used to check the input data for NaNs (and drop them); this is now delegated to the `nan_policy` argument of specific aggregators (see, for instance, the constructors of the `MinMaxAggregator` and `MinMaxLTTB` aggregators).
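To illustrate what the two `nan_policy` values mean, here is a simplified, pure-Python sketch of a per-bin MinMax selection. It is not the compiled `MinMaxDownsampler` / `NaNMinMaxDownsampler` that `MinMaxAggregator` wraps; the function name `minmax_indices` and the equal-size, index-based binning are assumptions made for illustration. As we understand it, `'omit'` ignores NaNs when picking each bin's extremes, while `'keep'` lets a NaN surface in the output so that the gap stays visible.

```python
import math
from typing import List, Sequence


def minmax_indices(
    y: Sequence[float], n_out: int, nan_policy: str = "omit"
) -> List[int]:
    """Hypothetical MinMax selection over equal-size, index-based bins.

    nan_policy="omit": NaNs are ignored when picking each bin's min and max.
    nan_policy="keep": a bin that contains a NaN contributes the index of its
    first NaN instead, so the gap survives downsampling.
    """
    if nan_policy not in ("omit", "keep"):
        raise ValueError("nan_policy must be either 'omit' or 'keep'")
    if len(y) == 0:
        return []
    n_bins = max(1, n_out // 2)  # each bin yields (at most) a min and a max
    bin_size = math.ceil(len(y) / n_bins)
    selected: List[int] = []
    for start in range(0, len(y), bin_size):
        bin_idx = range(start, min(start + bin_size, len(y)))
        nan_idx = [i for i in bin_idx if math.isnan(y[i])]
        if nan_policy == "keep" and nan_idx:
            selected.append(nan_idx[0])  # surface the gap
            continue
        valid = [i for i in bin_idx if not math.isnan(y[i])]
        if not valid:
            continue  # an all-NaN bin contributes nothing under "omit"
        lo = min(valid, key=lambda i: y[i])
        hi = max(valid, key=lambda i: y[i])
        selected.extend(sorted({lo, hi}))
    return selected


nan = float("nan")
y = [1.0, 5.0, nan, 2.0, 3.0, 0.0, 4.0, 1.0]
print(minmax_indices(y, n_out=4, nan_policy="omit"))  # [0, 1, 5, 6]
print(minmax_indices(y, n_out=4, nan_policy="keep"))  # [2, 5, 6]
```

Validating `nan_policy` up front, as the new aggregator constructors do, means a typo such as `'drop'` fails immediately instead of silently falling back to one of the two behaviours.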


# v0.9.2
### `overview` / `rangeslider` support 🎉

* ➡️ [code example](https://github.com/predict-idlab/plotly-resampler/blob/main/examples/dash_apps/05_cache_overview_subplots.py):
* 🖍️ [high level docs](https://predict-idlab.github.io/plotly-resampler/v0.9.2/getting_started/#overview)
* 🔍 [API docs](https://predict-idlab.github.io/plotly-resampler/v0.9.2/api/figure_resampler/figure_resampler/#figure_resampler.figure_resampler.FigureResampler.__init__)
* Make sure to take a look at the docstrings of the `create_overview`, `overview_row_idxs`, and `overview_kwargs` arguments of the `FigureResampler` constructor.
![Peek 2023-10-25 01-51](https://github.com/predict-idlab/plotly-resampler/assets/38005924/5b3a40e0-f058-4d7e-8303-47e51896347a)



### 💨 remove [traceUpdater](https://github.com/predict-idlab/trace-updater) dash component as a dependency.
> **context**: see #281 #271
> `traceUpdater` was developed when Dash did not yet offer the [Patch](https://dash.plotly.com/partial-properties) feature for partial property updates. As such, `traceUpdater` has become redundant and is now effectively replaced by Patch.
🚨 This is a breaking change for existing `Dash` apps!!!

## What's Changed
* Support nested admonitions by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/245
* 👷 build: create codeql.yml by @NielsPraet in https://github.com/predict-idlab/plotly-resampler/pull/248
* :sparkles: first draft of improved xaxis filtering by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/250
* :arrow_up: update dependencies by @jvdd in https://github.com/predict-idlab/plotly-resampler/pull/260
* :muscle: update dash-extensions by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/261
* fix for #263 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/264
* Rangeslider support by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/254
* :pray: fix mkdocs by @jvdd in https://github.com/predict-idlab/plotly-resampler/pull/268
* ✈️ fix for #270 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/272
* :mag: adding init kwargs to show dash - fix for #265 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/269
* Refactor/remove trace updater by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/281
* Bug/pop rangeselector by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/279
* :sparkles: fix for #275 by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/286
* Bug/rangeselector by @jonasvdd in https://github.com/predict-idlab/plotly-resampler/pull/287


**Full Changelog**: https://github.com/predict-idlab/plotly-resampler/compare/v0.9.1...v0.9.2


# v0.9.1
## Major changes:
27 changes: 23 additions & 4 deletions plotly_resampler/aggregation/aggregators.py
@@ -17,6 +17,8 @@
LTTBDownsampler,
MinMaxDownsampler,
MinMaxLTTBDownsampler,
NaNMinMaxDownsampler,
NaNMinMaxLTTBDownsampler,
)

from ..aggregation.aggregation_interface import DataAggregator, DataPointSelector
@@ -171,18 +173,25 @@ class MinMaxAggregator(DataPointSelector):
"""

def __init__(self, **downsample_kwargs):
def __init__(self, nan_policy="omit", **downsample_kwargs):
"""
Parameters
----------
**downsample_kwargs
Keyword arguments passed to the :class:`MinMaxDownsampler`.
- The `parallel` argument is set to False by default.
nan_policy: str, optional
The policy to handle NaNs. Can be 'omit' or 'keep'. By default, 'omit'.
"""
# this downsampler supports all dtypes
super().__init__(**downsample_kwargs)
self.downsampler = MinMaxDownsampler()
if nan_policy not in ("omit", "keep"):
raise ValueError("nan_policy must be either 'omit' or 'keep'")
if nan_policy == "omit":
self.downsampler = MinMaxDownsampler()
else:
self.downsampler = NaNMinMaxDownsampler()

def _arg_downsample(
self,
@@ -208,21 +217,31 @@ class MinMaxLTTB(DataPointSelector):
Paper: [https://arxiv.org/pdf/2305.00332.pdf](https://arxiv.org/pdf/2305.00332.pdf)
"""

def __init__(self, minmax_ratio: int = 4, **downsample_kwargs):
def __init__(
self, minmax_ratio: int = 4, nan_policy: str = "omit", **downsample_kwargs
):
"""
Parameters
----------
minmax_ratio: int, optional
The ratio between the number of data points in the MinMax-prefetching and
the number of data points that will be outputted by LTTB. By default, 4.
nan_policy: str, optional
The policy to handle NaNs. Can be 'omit' or 'keep'. By default, 'omit'.
**downsample_kwargs
Keyword arguments passed to the `MinMaxLTTBDownsampler`.
- The `parallel` argument is set to False by default.
- The `minmax_ratio` argument is set to 4 by default, which was empirically
proven to be a good default.
"""
self.minmaxlttb = MinMaxLTTBDownsampler()
if nan_policy not in ("omit", "keep"):
raise ValueError("nan_policy must be either 'omit' or 'keep'")
if nan_policy == "omit":
self.minmaxlttb = MinMaxLTTBDownsampler()
else:
self.minmaxlttb = NaNMinMaxLTTBDownsampler()

self.minmax_ratio = minmax_ratio

super().__init__(
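The `minmax_ratio` docstring above describes a two-stage scheme: a MinMax pass first preselects roughly `minmax_ratio` times as many points as requested, and LTTB then reduces those candidates to the output size. The pure-Python sketch below illustrates that idea under simplifying assumptions (equal-size, index-based bins, no NaN handling, first and last points always kept). The names `lttb` and `minmax_lttb` are hypothetical; this is not the `MinMaxLTTBDownsampler` / `NaNMinMaxLTTBDownsampler` code that `MinMaxLTTB` wraps.

```python
import math
from typing import List, Sequence


def lttb(x: Sequence[float], y: Sequence[float], idx: List[int], n_out: int) -> List[int]:
    """Largest-Triangle-Three-Buckets over the candidate indices `idx`."""
    if n_out >= len(idx):
        return list(idx)
    if n_out < 3:
        raise ValueError("LTTB needs n_out >= 3")
    every = (len(idx) - 2) / (n_out - 2)  # interior candidates per bucket
    sampled = [idx[0]]  # the first point is always kept
    a = 0  # position (in idx) of the previously selected point
    for b in range(n_out - 2):
        # The average of the *next* bucket is the triangle's third vertex
        nxt_start = math.floor((b + 1) * every) + 1
        nxt_end = min(math.floor((b + 2) * every) + 1, len(idx))
        nxt = idx[nxt_start:nxt_end]
        avg_x = sum(x[i] for i in nxt) / len(nxt)
        avg_y = sum(y[i] for i in nxt) / len(nxt)
        # Keep the point of the current bucket with the largest triangle area
        ax, ay = x[idx[a]], y[idx[a]]
        cur_start = math.floor(b * every) + 1
        cur_end = math.floor((b + 1) * every) + 1
        a = max(
            range(cur_start, cur_end),
            key=lambda j: abs(
                (ax - avg_x) * (y[idx[j]] - ay) - (ax - x[idx[j]]) * (avg_y - ay)
            ),
        )
        sampled.append(idx[a])
    sampled.append(idx[-1])  # the last point is always kept
    return sampled


def minmax_lttb(
    x: Sequence[float], y: Sequence[float], n_out: int, minmax_ratio: int = 4
) -> List[int]:
    """Two-stage sketch: MinMax preselection, then LTTB on the candidates."""
    if n_out < 3:
        raise ValueError("n_out must be >= 3")
    n = len(x)
    if n <= n_out:
        return list(range(n))
    # Stage 1: MinMax over the interior yields ~n_out * minmax_ratio candidates
    n_bins = max(1, (n_out * minmax_ratio) // 2)
    bin_size = math.ceil((n - 2) / n_bins)
    candidates = [0]
    for start in range(1, n - 1, bin_size):
        bin_idx = range(start, min(start + bin_size, n - 1))
        lo = min(bin_idx, key=lambda i: y[i])
        hi = max(bin_idx, key=lambda i: y[i])
        candidates.extend(sorted({lo, hi}))
    candidates.append(n - 1)
    # Stage 2: LTTB reduces the candidates to n_out points
    return lttb(x, y, candidates, n_out)


x = list(range(1_000))
y = [math.sin(i / 20) for i in x]
idx = minmax_lttb(x, y, n_out=50)
print(len(idx), idx[0], idx[-1])  # 50 0 999
```

The idea behind the preselection is that the more expensive LTTB stage only has to scan about `n_out * minmax_ratio` candidates rather than the full series, while the MinMax pass keeps the extremes that LTTB would likely pick anyway.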
58 changes: 7 additions & 51 deletions plotly_resampler/figure_resampler/figure_resampler_interface.py
@@ -555,7 +555,6 @@ def _parse_get_trace_props(
hf_hovertext: Iterable = None,
hf_marker_size: Iterable = None,
hf_marker_color: Iterable = None,
check_nans: bool = True,
) -> _hf_data_container:
"""Parse and capture the possibly high-frequency trace-props in a datacontainer.
@@ -572,11 +571,6 @@ def _parse_get_trace_props(
hf_hovertext : Iterable, optional
High-frequency trace "hovertext" data, overrides the current trace its
hovertext data.
check_nans: bool, optional
Whether the `hf_y` should be checked for NaNs, by default True.
As checking for NaNs is expensive, this can be disabled when the `hf_y` is
already known to contain no NaNs (or when the downsampler can handle NaNs,
e.g., EveryNthPoint).
Returns
-------
@@ -654,7 +648,8 @@ def _parse_get_trace_props(
if hf_y.ndim != 0: # if hf_y is an array
hf_x = pd.RangeIndex(0, len(hf_y)) # np.arange(len(hf_y))
else: # if no data as y or hf_y is passed
hf_x = np.asarray(None)
hf_x = np.asarray([])
hf_y = np.asarray([])

assert hf_y.ndim == np.ndim(hf_x), (
"plotly-resampler requires scatter data "
@@ -677,22 +672,6 @@ def _parse_get_trace_props(
if isinstance(hf_marker_color, (tuple, list, np.ndarray, pd.Series)):
hf_marker_color = np.asarray(hf_marker_color)

# Remove NaNs for efficiency (storing less meaningless data)
# NaNs introduce gaps between enclosing non-NaN data points & might distort
# the resampling algorithms
if check_nans and pd.isna(hf_y).any():
not_nan_mask = ~pd.isna(hf_y)
hf_x = hf_x[not_nan_mask]
hf_y = hf_y[not_nan_mask]
if isinstance(hf_text, np.ndarray):
hf_text = hf_text[not_nan_mask]
if isinstance(hf_hovertext, np.ndarray):
hf_hovertext = hf_hovertext[not_nan_mask]
if isinstance(hf_marker_size, np.ndarray):
hf_marker_size = hf_marker_size[not_nan_mask]
if isinstance(hf_marker_color, np.ndarray):
hf_marker_color = hf_marker_color[not_nan_mask]

# Try to parse the hf_x data if it is of object type or
if len(hf_x) and (hf_x.dtype.type is np.str_ or hf_x.dtype == "object"):
try:
@@ -876,7 +855,6 @@ def add_trace(
hf_hovertext: Union[str, Iterable] = None,
hf_marker_size: Union[str, Iterable] = None,
hf_marker_color: Union[str, Iterable] = None,
check_nans: bool = True,
**trace_kwargs,
):
"""Add a trace to the figure.
@@ -932,13 +910,6 @@ def add_trace(
hf_marker_color: Iterable, optional
The original high frequency marker color. If set, this has priority over the
trace its ``marker.color`` argument.
check_nans: boolean, optional
If set to True, the trace's data will be checked for NaNs - which will be
removed. By default True.
As this is a costly operation, it is recommended to set this parameter to
False if you are sure that your data does not contain NaNs (or when the
downsampler can handle NaNs, e.g., EveryNthPoint). This should considerably
speed up the graph construction time.
**trace_kwargs: dict
Additional trace related keyword arguments.
e.g.: row=.., col=..., secondary_y=...
@@ -1019,7 +990,6 @@ def add_trace(
hf_hovertext,
hf_marker_size,
hf_marker_color,
check_nans,
)

# These traces will determine the autoscale its RANGE!
@@ -1078,7 +1048,6 @@ def add_traces(
downsamplers: None | List[AbstractAggregator] | AbstractAggregator = None,
gap_handlers: None | List[AbstractGapHandler] | AbstractGapHandler = None,
limit_to_views: List[bool] | bool = False,
check_nans: List[bool] | bool = True,
**traces_kwargs,
):
"""Add traces to the figure.
@@ -1124,14 +1093,6 @@ def add_traces(
by default False.\n
Remark that setting this parameter to True ensures that low frequency traces
are added to the ``hf_data`` property.
check_nans : None | List[bool] | bool, optional
List of check_nans booleans for the added traces. If set to True, the
trace's datapoints will be checked for NaNs. If a single boolean is passed,
all to be added traces will use this value, by default True.\n
As this is a costly operation, it is recommended to set this parameter to
False if the data is known to contain no NaNs (or when the downsampler can
handle NaNs, e.g., EveryNthPoint). This will considerably speed up the graph
construction time.
**traces_kwargs: dict
Additional trace related keyword arguments.
e.g.: rows=.., cols=..., secondary_ys=...
@@ -1174,16 +1135,11 @@ def add_traces(
gap_handlers = [gap_handlers] * len(data)
if isinstance(limit_to_views, bool):
limit_to_views = [limit_to_views] * len(data)
if isinstance(check_nans, bool):
check_nans = [check_nans] * len(data)

zipped = zip(
data, max_n_samples, downsamplers, gap_handlers, limit_to_views, check_nans
)
for (
i,
(trace, max_out, downsampler, gap_handler, limit_to_view, check_nan),
) in enumerate(zipped):
zipped = zip(data, max_n_samples, downsamplers, gap_handlers, limit_to_views)
for (i, (trace, max_out, downsampler, gap_handler, limit_to_view)) in enumerate(
zipped
):
if (
trace.type.lower() not in self._high_frequency_traces
or self._hf_data.get(trace.uid) is not None
@@ -1194,7 +1150,7 @@ def add_traces(
if not limit_to_view and (trace.y is None or len(trace.y) <= max_out_s):
continue

dc = self._parse_get_trace_props(trace, check_nans=check_nan)
dc = self._parse_get_trace_props(trace)
self._hf_data[trace.uid] = self._construct_hf_data_dict(
dc,
trace=trace,
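With the NaN-removal block gone from `_parse_get_trace_props`, NaNs are no longer dropped before the trace data is stored; how they are treated is now up to the aggregator's `nan_policy`. If you relied on the old `check_nans=True` behaviour, one option is to filter before adding the trace. The sketch below mirrors the removed masking logic in pure Python. `drop_nan_rows` and `_is_missing` are hypothetical helpers (not part of plotly-resampler), and `_is_missing` only approximates `pd.isna` for scalars (None and float NaN).

```python
import math
from typing import Any, List, Sequence, Tuple


def _is_missing(value: Any) -> bool:
    # Rough scalar stand-in for pd.isna: None or a float NaN
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_nan_rows(
    hf_x: Sequence[Any], hf_y: Sequence[Any], *aligned: Any
) -> Tuple[List[Any], List[Any], List[Any]]:
    """Drop every position where `hf_y` is missing from x, y and aligned arrays.

    Like the removed check_nans block, only per-point sequences (here: lists
    and tuples) are masked; scalars, strings and None are passed through.
    """
    keep = [i for i, v in enumerate(hf_y) if not _is_missing(v)]

    def mask(values: Any) -> Any:
        if isinstance(values, (list, tuple)):
            return [values[i] for i in keep]
        return values

    return mask(list(hf_x)), mask(list(hf_y)), [mask(a) for a in aligned]


x = [0, 1, 2, 3]
y = [1.0, float("nan"), 3.0, None]
hovertext = ["a", "b", "c", "d"]
print(drop_nan_rows(x, y, hovertext, 5))  # ([0, 2], [1.0, 3.0], [['a', 'c'], 5])
```

The filtered lists could then be passed as `hf_x`, `hf_y` and `hf_hovertext` to `add_trace`. Keeping the NaNs and choosing `nan_policy="keep"` instead preserves the gaps in the plotted output.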