-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REF-1988] API to Get instance of Arbitrary State class #2678
Conversation
Rudimentary protection for state instance access from a background task (StateProxy)
Fix StateProxy for substates and parent_state attributes (have to handle in __getattr__, not property) Fix type annotation for `get_state`
Reset the substate tracking only when the class is instantiated.
Ensure that `get_state` returns the proper "branch" of the state tree depending on what substate is requested.
…True Avoid user errors trying to directly instantiate State classes
Unify the implementation of generating and decoding the token + state name format used for redis state sharding.
7ecda2d
to
48bd579
Compare
read and write substates concurrently (allow redis to shine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The syntax for accessing the state looks good.
On a broader note, I'm scared the code is getting too complicated, particularly now the state class has so many methods and that file is huge. It
The methods are also a bit intimidating - for a newcomer I'm not sure they'd be able to reason around our codebase very easily now. We should have a design meeting to look over how we're doing anything to see if there's some things we can simplify.
reflex/state.py
Outdated
""" | ||
if not _reflex_internal_init and not _is_testing_env(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if instead of checking _is_testing_env
we should check the inverse, is_reflex_run
. So that when external users work with it, they don't have to set this flag (similar to the hack I had to do in flexdown)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets talk about this on monday, i'm not really grokking the suggestion. how does is_reflex_run
get determined?
@@ -218,24 +262,39 @@ def __init__( | |||
*args, | |||
parent_state: BaseState | None = None, | |||
init_substates: bool = True, | |||
_reflex_internal_init: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we handle this with only the flag to avoid adding another argument here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a broader note, I'm scared the code is getting too complicated, particularly now the state class has so many methods and that file is huge.
I agree, there's too much going on inside the BaseState
. I have some ideas for simplifying it, particularly via the use of descriptors for Var and EventHandler access, could clean up a lot of the __init_subclass__
logic where we treat the same attribute differently on the class vs the instance.
The methods are also a bit intimidating
I split up get_state
into 6 smaller methods with hopefully more descriptive readable names to help paint a picture of what's going on. I don't see an easy way to get away from the intimidation of the BaseState
generally though... I was thinking we could split up some of the distinct functionality into a few mixin classes (i.e. one for var/dirty/delta management, one for substate/parent state management, and one for event processing), but i'm not sure that would be more enlightening or worth the time at this point.
I think a bigger state refactoring is on the horizon, but I don't think we can justify further investment on the BaseState
after getting this API in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem here if a substate has a cached_var that depends on something in a parent state, but that substate isn't loaded because the event is being processed on a sibling state, then the cached var doesn't update (and it throws an exception).
This has been fixed with the merge of #2725, now cached_var
are fetched, similarly to var
. This might be inefficient, but it's necessary to ensure consistency of updates.
Avoid loading the entire state tree to process these common internal events. If the state tree is very large, this allow page navigation to occur more quickly. Pre-fetch substates that contain cached vars, as they may need to be recomputed if certain vars change.
This is a waste of time and memory, and can be handled via a special case in __getattribute__
Avoid wasting time serializing states that have no modifications
Wait until the state is actually modified, and then persist it as part of `set_state`. Factor out common logic into helper methods for readability and to reduce duplication of common logic. To avoid having to recursively call `get_state`, which would require persisting the instance and then getting it again, some of the initialization logic regarding parent_state and substates is duplicated when creating a new instance. This is for performance reasons.
substate = substates[substate_name] | ||
substate.dirty_vars.add(var) | ||
substate._mark_dirty() | ||
|
||
def _get_was_touched(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be called set_was_touched
? I got confused below on line 1566 when you called it, as I didn't realize this also sets the value. The get
is just the attribute itself right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i went back and forth on what to call this one... The flag is called _was_touched
, and this function _get_was_touched
checks both the flag and other conditions (and then updates the flag when those conditions indicate touching has occurred). The reason you would call this function is to decide whether the state has been modified since it was instantiated. I'll split the getting and updating logic out to clarify.
Improve clarity in cases where _get_was_touched was being called for its side effects only.
) * WiP get_state * Refactor get_state fast path Rudimentary protection for state instance access from a background task (StateProxy) * retain dirty substate marking per `_mark_dirty` call to avoid test changes * Find common ancestor by part instead of by character Fix StateProxy for substates and parent_state attributes (have to handle in __getattr__, not property) Fix type annotation for `get_state` * test_state: workflow test for `get_state` functionality * Do not reset _always_dirty_substates when adding vars Reset the substate tracking only when the class is instantiated. * test_state_tree: test substate access in a larger state tree Ensure that `get_state` returns the proper "branch" of the state tree depending on what substate is requested. * test_format: fixup broken tests from adding substates of TestState * Fix flaky integration tests with more polling * AppHarness: reset _always_dirty_substates on rx.State * RuntimeError unless State is instantiated with _reflex_internal_init=True Avoid user errors trying to directly instantiate State classes * Helper functions for _substate_key and _split_substate_key Unify the implementation of generating and decoding the token + state name format used for redis state sharding. * StateManagerRedis: use create_task in get_state and set_state read and write substates concurrently (allow redis to shine) * test_state_inheritance: use polling cuz life too short for flaky tests kthnxbai ❤️ * Move _is_testing_env to reflex.utils.exec.is_testing_env Reuse the code in app.py * Break up `BaseState.get_state` and friends into separate methods * Add test case for pre-fetching cached var dependency * Move on_load_internal and update_vars_internal to substates Avoid loading the entire state tree to process these common internal events. If the state tree is very large, this allow page navigation to occur more quickly. Pre-fetch substates that contain cached vars, as they may need to be recomputed if certain vars change. * Do not copy ROUTER_DATA into all substates. This is a waste of time and memory, and can be handled via a special case in __getattribute__ * Track whether State instance _was_touched Avoid wasting time serializing states that have no modifications * Do not persist states in `StateManagerRedis.get_state` Wait until the state is actually modified, and then persist it as part of `set_state`. Factor out common logic into helper methods for readability and to reduce duplication of common logic. To avoid having to recursively call `get_state`, which would require persisting the instance and then getting it again, some of the initialization logic regarding parent_state and substates is duplicated when creating a new instance. This is for performance reasons. * Remove stray print() * context.js.jinja2: fix check for empty local storage / cookie vars * Add comments for onLoadInternalEvent and initialEvents * nit: typo * split _get_was_touched into _update_was_touched Improve clarity in cases where _get_was_touched was being called for its side effects only. * Remove extraneous information from incorrect State instantiation error * Update missing redis exception message
) * WiP get_state * Refactor get_state fast path Rudimentary protection for state instance access from a background task (StateProxy) * retain dirty substate marking per `_mark_dirty` call to avoid test changes * Find common ancestor by part instead of by character Fix StateProxy for substates and parent_state attributes (have to handle in __getattr__, not property) Fix type annotation for `get_state` * test_state: workflow test for `get_state` functionality * Do not reset _always_dirty_substates when adding vars Reset the substate tracking only when the class is instantiated. * test_state_tree: test substate access in a larger state tree Ensure that `get_state` returns the proper "branch" of the state tree depending on what substate is requested. * test_format: fixup broken tests from adding substates of TestState * Fix flaky integration tests with more polling * AppHarness: reset _always_dirty_substates on rx.State * RuntimeError unless State is instantiated with _reflex_internal_init=True Avoid user errors trying to directly instantiate State classes * Helper functions for _substate_key and _split_substate_key Unify the implementation of generating and decoding the token + state name format used for redis state sharding. * StateManagerRedis: use create_task in get_state and set_state read and write substates concurrently (allow redis to shine) * test_state_inheritance: use polling cuz life too short for flaky tests kthnxbai ❤️ * Move _is_testing_env to reflex.utils.exec.is_testing_env Reuse the code in app.py * Break up `BaseState.get_state` and friends into separate methods * Add test case for pre-fetching cached var dependency * Move on_load_internal and update_vars_internal to substates Avoid loading the entire state tree to process these common internal events. If the state tree is very large, this allow page navigation to occur more quickly. Pre-fetch substates that contain cached vars, as they may need to be recomputed if certain vars change. * Do not copy ROUTER_DATA into all substates. This is a waste of time and memory, and can be handled via a special case in __getattribute__ * Track whether State instance _was_touched Avoid wasting time serializing states that have no modifications * Do not persist states in `StateManagerRedis.get_state` Wait until the state is actually modified, and then persist it as part of `set_state`. Factor out common logic into helper methods for readability and to reduce duplication of common logic. To avoid having to recursively call `get_state`, which would require persisting the instance and then getting it again, some of the initialization logic regarding parent_state and substates is duplicated when creating a new instance. This is for performance reasons. * Remove stray print() * context.js.jinja2: fix check for empty local storage / cookie vars * Add comments for onLoadInternalEvent and initialEvents * nit: typo * split _get_was_touched into _update_was_touched Improve clarity in cases where _get_was_touched was being called for its side effects only. * Remove extraneous information from incorrect State instantiation error * Update missing redis exception message
Implements a new api on
rx.State
:other = await self.get_state(MyOtherState)
This is called from an event handler to get access to the latest instance of
MyOtherState
associated with the same token as the currently running event handler.How it Works
With the recent Redis state sharding PR #2574, instead of fetching the entire state tree to process an event, only the "branch" of the state tree pertaining to the event handler is fetched from redis and deserialized.
For the given state containing the event handler the following are considered a "branch":
To access other "sibling" / "cousin" states, a new async API
State.get_state
is provided which can dynamically fetch the requested state from redis and patch it into the state tree as another "branch".Practically, this means that state can scale horizontally without significant performance overhead (aside from hydration which occurs when the websocket comes up and on_load which occurs on page navigation). It also minimizes the changes needed to the existing State API, retaining compatibility with existing apps (that don't use
parent_state
chasing).Delta calculations are performed in exactly the same way as before: by finding the top-level rx.State and recursively calling
get_delta
on all its substates... the difference is that not all substates are included anymore, so the calculation can be performed more quickly.Similarly during serialization, the top-level
rx.State
and its substates are persisted to redis, but only the substates that were previously fetched are serialized. If an event only targets a single substate in a mostly flat state hierachy, this has significant performance benefits for rapidly occuring events, like controlled sliders.Limitations and Caveats
Computed vars and cached computed vars cannot use this API, because properties do not work with
async
/await
. This is probably fine: just define the computed var within the state containing the vars it will use.Avoid computed vars in a deeply nested substate, because all of the parents will have to be fetched and instantiated for every event 😱. Any substate containing a computed var will be fetched for every event, to ensure it always gets recalculated.
Using an
rx.cached_var
is nice here, because those only get fetched when the "branch" containing it is fetched, since we already know what vars and substates the cached var depends on, and can be assured that it can only depend on vars in itself or one of its parent states. Breaking Change: When calculating the deps of a computed var, we no longer allow access toparent_state
,substates
,get_substate
, orget_state
(and it wouldn't work anyway). This ensures that all vars can be properly accounted for, and avoids the user shooting themselves in the foot with a cached_var that won't update properly.Background events must first call
async with self
before accessing this API. However after the block exits, there is no protection against subsequent modifications to the instance of the state, although these changes cannot be persisted and will not be considered for delta calculation (i.e. it just wont work). Down the road, we might implement some guardrails that will raise an exception when attempting to modify a stale state instance.All of this only applies to redis use. The in-memory state manager used in dev mode works the same way it always has: keeping everything in memory all the time.
Bonus
create_task
to achieve better concurrency. In reflex-web, this results in a 2x speedup on initial state creation (for a new user) and a 10x speedup deserializing existing app state.on_load_internal
handling.Some example code i was playing with
Don't ask me what it does 😹