Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epoch indexing and dataviz layer #451

Merged
merged 96 commits into from
Feb 12, 2023
Merged

Conversation

goodboy
Copy link
Contributor

@goodboy goodboy commented Feb 2, 2023

Another in the push for #420.. and yes I realize this one is massive, but it's mostly internal data processing related stuff for improving curve overlays and making it easier to manage Flume data for visualization.

This adds both epoch time indexing: meaning overlayed curves are now correctly aligned in an "epoch time stamp per sample" sense as well as a new piker.ui._dataviz layer which supports all the higher-level time series processing to enable sane update logic in piker.ui._display update tasks.

To focus, note the new modules:

piker.ui._render.py and ._dataviz.py and piker.data._pathops.py which is a bunch of path processing related code moved out of the ui. subpkg.

Also we drop some reallly old code from the piker.qt subpkgs which was never really anything other then example snippets from ages ago.


As in #449 there are bunch of things yet to change you'll probably want but which aren't yet proposed here:

  • still cluttered L1 labels
  • order mode doesn't really work on the "front most" fqsn
  • still terrible perf in terms of UX latency during chart interaction
  • buncha debuggin print msgs on piker chart console that are cleaned up in later history

@goodboy goodboy force-pushed the multi_symbol_input branch from 38c6680 to 07ab853 Compare February 2, 2023 20:06
@goodboy goodboy force-pushed the epoch_indexing_and_dataviz_layer branch from 6ffaf3d to a50a09f Compare February 2, 2023 20:09
Base automatically changed from multi_symbol_input to master February 9, 2023 21:20
@goodboy goodboy force-pushed the epoch_indexing_and_dataviz_layer branch from ead2e1e to 956cacb Compare February 9, 2023 21:29
@goodboy
Copy link
Contributor Author

goodboy commented Feb 9, 2023

Huh I guess something musta changed in recent msgspec?

from ci:

==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_services.py ____________________
tests/test_services.py:20: in <module>
    from piker.clearing import (
/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/piker/clearing/__init__.py:21: in <module>
    from ._client import open_ems
/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/piker/clearing/_client.py:33: in <module>
    from ._messages import Order, Cancel
/opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/piker/clearing/_messages.py:232: in <module>
    class BrokerdStatus(Struct):
E   TypeError: Using a non-empty mutable collection ({'name': ''}) as a default value is unsafe. Instead configure a `default_factory` for this field.

@guilledk guilledk force-pushed the epoch_indexing_and_dataviz_layer branch from 956cacb to b112d0e Compare February 12, 2023 18:14
The type is better described as a "data visualization":
https://en.wikipedia.org/wiki/Data_and_information_visualization

Add `ChartPlotWidget.get_viz()` to start working towards not accessing
the private table directly XD

We'll probably end up using the name `Flow` for a type that tracks
a collection of composed/cascaded `Flume`s:
https://en.wikipedia.org/wiki/Two-port_network#Cascade_connection
Since these modules no longer contain Qt specific code we might
as well include them in the data sub-package.

Also, add `IncrementalFormatter.index_field` as single point to def the
indexing field that should be used for all x-domain graphics-data
rendering.
Turns out numba/numba#8622 is real
and the suggested `numba.literally` hack doesn't seem to work..
Probably means it doesn't need to be a `Flume` method but it's
convenient to expect the caller to pass in the `np.ndarray` with
a `'time'` field instead of a `timeframe: str` arg; also, return the
slice mask instead of the sliced array as output (again allowing the
caller to do any slicing). Also, handle the slice-outside-time-range
case by just returning the entire index range with a `None` mask.

Adjust `Viz.view_data()` to instead do timeframe (for rt vs. hist shm
array) lookup and equiv array slicing with the returned mask.
Remove harcoded `'index'` field refs from all formatters in a first
attempt at moving towards epoch-time alignment (though don't actually
use it it yet).

Adjustments to the formatter interface:
- property for `.xy_nd` the x/y nd arrays.
- property for and `.xy_slice` the nd format array(s) start->stop index
  slice.

Internal routine tweaks:
- drop `read_src_from_key` and always pass full source array on updates
  and adjust handlers to expect to have to index the data field of
  interest.
- set `.last_read` right after update calls instead of after 1d
  conversion.
- drop `slice_to_head` array read slicing.
- add some debug points for testing 'time' indexing (though not used
  here yet).
- add `.x_nd` array update logic for when the `.index_field` is not
  'index' - i.e. when we begin to try and support epoch time.
- simplify some new y_nd updates to not require use of `np.broadcast()`
  where possible.
Don't expect values (array + slice) to be returned and applied by
`.incr_update_xy_nd()` and instead presume this will implemented
internally in each (sub)formatter.

Attempt to simplify some incr-update routines, (particularly in the step
curve formatter, though most of it was reverted to just a simpler form
of the original implementation XD) including:
- dropping the need for the `slice_to_head: int` control.
- using the `xy_nd_start/stop` index counters over custom lookups.
As in make the call to `Flume.slice_from_time()` to try and convert any
time index values from the view range to array-indices; all untested
atm.

Also drop some old/unused/moved methods:
- `._set_xlimits()`
- `.bars_range()`
- `.curve_width_pxs()`

and fix some `flow` -> `viz` var naming.
In an effort to make it easy to override the indexing scheme.

Further, this repairs the `.datums_range()` special case to handle when
the view box is to-the-right-of the data set (i.e. l > datum_start).
Previously we were drawing with the middle of the bar on each index with
arms to either side: +/- some arm length. Instead this changes so that
each bar is drawn *after* each index/timestamp such that in graphics
coords the bar span more correctly matches the time span in the
x-domain. This makes the linked region between slow and fast chart
directly match (without any transform) for epoch-time indexing such that
the last x-coord in view on the fast chart is no more then the
next time step in (downsampled) slow view.

Deats:
- adjust in `._pathops.path_arrays_from_ohlc()` and take an `bar_w` bar
  width input (normally taken from the data step size).
- change `.ui._ohlc.bar_from_ohlc_row()` and
  `BarItems.draw_last_datum()` to match.
Define the x-domain coords "offset" (determining the curve graphics
per-datum placement) for each formatter such that there's only on place
to change it when needed. Obviously each graphics type has it's own
dimensionality and this is reflected by the array shapes on each
subtype.
Might as well since it makes the chart look less gappy and we can easily
flip the index switch now B)

Also adds a new `'i_slow_last'` key to `DisplayState` for a singleton
across all slow charts and thus no more need for special case logic in
`viz.incr_info()`.
Mainly it was the global (should we )increment logic that needs to be
independent for the fast vs. slow chart such that the slow isn't
update-shifted by the fast and vice versa. We do this using a new
`'i_last_slow'` key in the `DisplayState.globalz: dict` which is
singleton for each sample-rate-specific chart and works for both time
and array indexing.

Also, we drop some old commented `graphics.draw_last_datum()` code that
never ended up being needed again inside the coordinate cache reset
bloc.
Turns out we were updating the wrong ``Viz``/``DisplayState`` inside the
closure style `increment_history_view()`` (probably due to looping
through the flumes and dynamically closing in that task-func).. Instead
define the history incrementer at module level and pass in the
`DisplayState` explicitly. Further rework the `DisplayState` attrs to be
more focused around the `Viz` associated with the fast and slow chart
and be sure to adjust output from each `Viz.incr_info()` call to latest
update. Oh, and just tweaked the line palette for the moment.

FYI "treading" here is referring to  the x-shifting of the curve when
the last datum is in view such that on new sampled appends the "last"
datum is kept in the same x-location in UI terms.
Previously with array-int indexing we had to map the input x-domain
"indexes" passed to `DynamicDateAxis._indexes_to_timestr()`. In the
epoch-time indexing case we obviously don't need to lookup time stamps
from the underlying shm array and can instead just cast to `int` and
relay the values verbatim.

Further, this patch includes some style adjustments to `AxisLabel` to
better enable multi-feed chart overlays by avoiding L1 label clutter
when multiple y-axes are stacked adjacent:
- adjust the `Axis` typical max string to include a couple spaces suffix
 providing for a bit more margin between side-by-side y-axes.
- make the default label (fill) color the "default" from the global
 color scheme and drop it's opacity to .9
- add some new label placement options and use them in the
 `.boundingRect()` method:
 * `._x/y_br_offset` for relatively shifting the overall label relative
   to it's parent axis.
 * `._y_txt_h_scaling` for increasing the bounding rect's height
   without including more whitespace in the label's text content.
- ensure labels have a high z-value such that by default they are always
 placed "on top" such that when we adjust the l1 labels they can be set
 to a lower value and thus never obscure the last-price label.
We want the fast and slow chart to behave the same on calls to
`Viz.default_view()` so adjust the offset calc to make both work:
- just offset by the line len regardless of step / uppx
- add back the `should_line: bool` output from `render_bar_items()` (and
  use it to set a new `ds_allowed: bool` guard variable) so that we can
  bypass calling the m4 downsampler unless the bars have been switched
  to the interpolation line graphic (which we normally required before
  any downsampling of OHLC graphics data).

Further, this drops use of the `use_vr: bool` flag from all rendering
since we pretty much always use it by default.
Doesn't seem like we really need to handle the situation where the start
or stop input time stamps are outside the index range of the data since
the new binary search handling via `numpy.searchsorted()` covers this
case at minimal runtime cost and with an equally correct output. Allows
us to drop some other indexing endpoint internal variables as well.
Use proper uppx scaling when either of scaling the data to the x-domain
index-range or when the uppx is < 1 (now that we support it) such that
both the fast and slow chart always appropriately scale and offset to
the y-axis with the last datum graphic just adjacent to the order line
arrow markers.

Further this fixes the `.index_step()` calc to use the "earliest" 16
values to compute the expected sample step diff since the last set often
contained gaps due to start up race conditions and generated
unexpected/incorrect output.

Further this drops the `.curve_width_pxs()` method and replaces it with
`.px_width()`, taken from the graphics object API and instead returns
the pixel account for the whole view width instead of the
x-domain-data-range within the view.
Factor some common methods into the parent type:
- `.x_uppx()` for reading the horizontal units-per-pixel.
- `.x_last()` for reading the "closest to y-axis" last datum coordinate
  for zooming "around" during mouse interaction.
- `.px_width()` for computing the max width of any curve in view in
  pixels.

Adjust all previous derived `pg.GraphicsObject` child types to now
inherit from this new parent and in particular enable proper `.x_uppx()`
support to `BarItems`.
In the case where the last-datum-graphic hasn't been created yet, simply
return a `None` from this method so the caller can choose to ignore the
output. Further, drop `.px_width()` since it makes more sense defined on
`Viz` as well as the previously commented `BarItems.x_uppx()` method.
Also, don't round the `.x_uppx()` output since it can then be used when
< 1 to do x-domain scaling during high zoom usage.
@goodboy goodboy force-pushed the epoch_indexing_and_dataviz_layer branch from b112d0e to 340045a Compare February 12, 2023 19:00
@goodboy
Copy link
Contributor Author

goodboy commented Feb 12, 2023

@guilledk once you get the protobuf dep fix in we can land this i think.

@goodboy goodboy merged commit d690ad2 into master Feb 12, 2023
@goodboy goodboy deleted the epoch_indexing_and_dataviz_layer branch February 12, 2023 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants