-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete entity key from Redis only when all attached feature views are gone #2240
Conversation
Signed-off-by: pyalex <[email protected]>
… gone Signed-off-by: pyalex <[email protected]>
Signed-off-by: pyalex <[email protected]>
c59dd35
to
9124605
Compare
Signed-off-by: pyalex <[email protected]>
Signed-off-by: pyalex <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #2240 +/- ##
===========================================
- Coverage 85.24% 60.28% -24.96%
===========================================
Files 107 107
Lines 8605 8631 +26
===========================================
- Hits 7335 5203 -2132
- Misses 1270 3428 +2158
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if, for example, you have two different FeatureViews both with the same join_key. Now you want to delete one of those FeatureViews but not the other.
In the Redis data model the features from both feature views are stored as keys under the same HASH. With these changes the extra features won't be deleted because the 'Entity' is still used.
I think there needs to be a method which is able to delete certain Features (from a FeatureView) when that FeatureView is changed or deleted. This method would have to use HDEL
.
I guess this does fix #2150 but it introduces a new bug (which is admittedly less system breaking).
@judahrand |
Why not? The signature of the method you've renamed to Then once you have that you can also determine the fields for that fields = [_mmh3(f"{table.name}:{feature.name}") for feature in table.features ] Once you've done that you just need to loop over the HASHs which match your prefix and call HDEL for each field. I was thinking that def delete_table_values(self, config: RepoConfig, table: FeatureView):
client = self._get_client(config.online_store)
deleted_count = 0
pipeline = client.pipeline()
prefix = _redis_key_prefix(table.entities)
fields = [_mmh3(f"{table.name}:{feature.name}") for feature in table.features]
for _k in client.scan_iter(
b"".join([prefix, b"*", config.project.encode("utf8")])
):
pipeline.hdel(_k, *fields)
deleted_count += 1
pipeline.execute()
logger.debug(f"Deleted {deleted_count} keys for {table.name}") This would at least delete the correct features when an entire FeatureView was deleted.
Yeah, this is true. And arguably is a bigger problem. Letting the OnlineStore just get less and less efficient/bloated etc should not be acceptable. 🤔 I can see, however, that my proposed solution does still have issues. If the FeatureView which has been deleted has been edited a lot in the past (eg. Feature renames, partial removals, etc.) then it will leave some garbage behind. But that feels better to me than what sort of amounts to a reference counter on the whole key - and the key is only released when there are 0 references (even though there could be thousands of unused fields under that key). |
do you think |
@felixwang9817 yes, I hope that with new design and APIs objects like online store will have more control and will receive all information, like about renaming a view or renaming a feature, whereas right now options are very limited. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: felixwang9817, pyalex The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Add backticks to left_table_query_string (#2250) Signed-off-by: david <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Delete entity key from Redis only when all attached feature views are gone (#2240) * Delete entity from redis when the last attached feature view is deleted Signed-off-by: pyalex <[email protected]> * Delete entity key from Redis only when all attached feature views are gone Signed-off-by: pyalex <[email protected]> * make lint happy Signed-off-by: pyalex <[email protected]> * make lint happy Signed-off-by: pyalex <[email protected]> * one more try with mypy Signed-off-by: pyalex <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * historical_field_mappings2 merge for one sign off commit (#2252) Signed-off-by: Michelle Rascati <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Correct inconsistent dependency (#2255) Signed-off-by: Judah Rand <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Add snowflake environment variables to allow testing on snowflake infra (#2258) * add snowflake environment vars to test framework Signed-off-by: sfc-gh-madkins <[email protected]> * add snowflake environment vars to test framework Signed-off-by: sfc-gh-madkins <[email protected]> * Return `UNIX_TIMESTAMP` as Python `datetime` (#2244) * Refactor `UNIX_TIMESTAMP` conversion Signed-off-by: Judah Rand <[email protected]> * Return `UNIX_TIMESTAMP` types as `datetime` to user Signed-off-by: Judah Rand <[email protected]> * Fix linting errors Signed-off-by: Judah Rand <[email protected]> * Rename variable to something more sensible Signed-off-by: Judah Rand <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Feast plan clean up (#2256) * Run validation and inference on views and entities during plan Signed-off-by: Felix Wang <[email protected]> * Do not log objects that are unchanged Signed-off-by: Felix Wang <[email protected]> * Rename Fco to FeastObject Signed-off-by: Felix Wang <[email protected]> * Remove useless method Signed-off-by: Felix Wang <[email protected]> * Lint Signed-off-by: Felix Wang <[email protected]> * Always initialize registry during feature store initialization Signed-off-by: Felix Wang <[email protected]> * Fix usage test Signed-off-by: Felix Wang <[email protected]> * Remove print statements Signed-off-by: Felix Wang <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Squash commits Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Add error type and refactor query execution to have retries Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Handle more snowflake errors Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Fix lint errors Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Fix lint errors Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Fix lint errors Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * Fix wrong import Signed-off-by: Danny Chiao <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * modify registry.db s3 object initialization to work in S3 subdirectory with Java Feast Server (#2259) Signed-off-by: NalinGHub <[email protected]> Signed-off-by: sfc-gh-madkins <[email protected]> * clean up docs Signed-off-by: sfc-gh-madkins <[email protected]> * lint-python Signed-off-by: sfc-gh-madkins <[email protected]> * fixed historical test Signed-off-by: sfc-gh-madkins <[email protected]> * fixed historical test Signed-off-by: sfc-gh-madkins <[email protected]> Co-authored-by: David Miller <[email protected]> Co-authored-by: Oleksii Moskalenko <[email protected]> Co-authored-by: Michelle Rascati <[email protected]> Co-authored-by: Judah Rand <[email protected]> Co-authored-by: Felix Wang <[email protected]> Co-authored-by: Danny Chiao <[email protected]> Co-authored-by: Nalin Mehra <[email protected]>
What this PR does / why we need it:
This PR adds the test for the use case described in #2150 as well as the fix for it.
Which issue(s) this PR fixes:
Fixes #2150
Does this PR introduce a user-facing change?: