Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete entity key from Redis only when all attached feature views are gone #2240

Merged
merged 5 commits into from
Jan 28, 2022

Conversation

pyalex
Copy link
Collaborator

@pyalex pyalex commented Jan 26, 2022

What this PR does / why we need it:

This PR adds the test for the use case described in #2150 as well as the fix for it.

Which issue(s) this PR fixes:

Fixes #2150

Does this PR introduce a user-facing change?:

NONE

@pyalex pyalex force-pushed the redis-delete-by-entity branch from c59dd35 to 9124605 Compare January 26, 2022 09:36
@pyalex pyalex assigned judahrand and unassigned achals Jan 26, 2022
Signed-off-by: pyalex <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Jan 26, 2022

Codecov Report

Merging #2240 (5145f5e) into master (396f729) will decrease coverage by 24.95%.
The diff coverage is 14.28%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #2240       +/-   ##
===========================================
- Coverage   85.24%   60.28%   -24.96%     
===========================================
  Files         107      107               
  Lines        8605     8631       +26     
===========================================
- Hits         7335     5203     -2132     
- Misses       1270     3428     +2158     
Flag Coverage Δ
integrationtests ?
unittests 60.28% <14.28%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/infra/online_stores/redis.py 26.45% <10.00%> (-67.63%) ⬇️
.../integration/online_store/test_universal_online.py 15.00% <13.04%> (-82.24%) ⬇️
...ts/integration/feature_repos/repo_configuration.py 45.71% <50.00%> (-48.58%) ⬇️
.../integration/online_store/test_online_retrieval.py 17.39% <0.00%> (-82.61%) ⬇️
...fline_store/test_universal_historical_retrieval.py 19.18% <0.00%> (-80.82%) ⬇️
sdk/python/tests/utils/online_read_write_test.py 20.68% <0.00%> (-79.32%) ⬇️
...gration/registration/test_feature_service_apply.py 31.25% <0.00%> (-68.75%) ⬇️
sdk/python/tests/data/data_creator.py 34.78% <0.00%> (-65.22%) ⬇️
sdk/python/tests/integration/e2e/test_usage_e2e.py 35.00% <0.00%> (-65.00%) ⬇️
... and 53 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 396f729...5145f5e. Read the comment docs.

Copy link
Member

@judahrand judahrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if, for example, you have two different FeatureViews both with the same join_key. Now you want to delete one of those FeatureViews but not the other.

In the Redis data model the features from both feature views are stored as keys under the same HASH. With these changes the extra features won't be deleted because the 'Entity' is still used.

I think there needs to be a method which is able to delete certain Features (from a FeatureView) when that FeatureView is changed or deleted. This method would have to use HDEL.

I guess this does fix #2150 but it introduces a new bug (which is admittedly less system breaking).

@pyalex
Copy link
Collaborator Author

pyalex commented Jan 26, 2022

@judahrand
Truth be told the current storage model used by Redis integration was not designed to delete features at all and generally speaking it's not possible (at least together with the current Provider API). Since we do not store feature names (only hashes) in Redis maps we need to know those names in advance. But what if you deleted some feature or renamed it. At the moment the feature view is deleted there's no guarantee that we can correctly determine all features (Redis map fields) that must be deleted. And Feast APIs are designed in such way that when the feature is renamed or deleted - it's not being clearly communicated to an online store implementation.

@judahrand
Copy link
Member

judahrand commented Jan 26, 2022

Truth be told the current storage model used by Redis integration was not designed to delete features at all and generally speaking it's not possible (at least together with the current Provider API).

Why not? The signature of the method you've renamed to delete_entity_values used to take a FeatureView. From that you can determine the hash key prefix _redis_key_prefix(table.entities).

Then once you have that you can also determine the fields for that FeatureView by doing

fields = [_mmh3(f"{table.name}:{feature.name}") for feature in table.features ]

Once you've done that you just need to loop over the HASHs which match your prefix and call HDEL for each field.

I was thinking that delete_table_values would just need to be changed to look like this:

def delete_table_values(self, config: RepoConfig, table: FeatureView):
        client = self._get_client(config.online_store)
        deleted_count = 0
        pipeline = client.pipeline()
        prefix = _redis_key_prefix(table.entities)
        fields = [_mmh3(f"{table.name}:{feature.name}") for feature in table.features]

        for _k in client.scan_iter(
            b"".join([prefix, b"*", config.project.encode("utf8")])
        ):
            pipeline.hdel(_k, *fields)
            deleted_count += 1
        pipeline.execute()

        logger.debug(f"Deleted {deleted_count} keys for {table.name}")

This would at least delete the correct features when an entire FeatureView was deleted.

Since we do not store feature names (only hashes) in Redis maps we need to know those names in advance.

And Feast APIs are designed in such way that when the feature is renamed or deleted - it's not being clearly communicated to an online store implementation.

Yeah, this is true. And arguably is a bigger problem. Letting the OnlineStore just get less and less efficient/bloated etc should not be acceptable. 🤔


I can see, however, that my proposed solution does still have issues. If the FeatureView which has been deleted has been edited a lot in the past (eg. Feature renames, partial removals, etc.) then it will leave some garbage behind. But that feels better to me than what sort of amounts to a reference counter on the whole key - and the key is only released when there are 0 references (even though there could be thousands of unused fields under that key).

@felixwang9817 felixwang9817 self-assigned this Jan 27, 2022
@felixwang9817
Copy link
Collaborator

@pyalex

Truth be told the current storage model used by Redis integration was not designed to delete features at all and generally speaking it's not possible (at least together with the current Provider API). Since we do not store feature names (only hashes) in Redis maps we need to know those names in advance. But what if you deleted some feature or renamed it. At the moment the feature view is deleted there's no guarantee that we can correctly determine all features (Redis map fields) that must be deleted. And Feast APIs are designed in such way that when the feature is renamed or deleted - it's not being clearly communicated to an online store implementation.

do you think feast plan can resolve this? the goal here is that each online store should report exactly what tables should exist, and the registry contains info on what tables currently exist - then the diff can be applied. so even if you rename a FV, feast plan should be able to tear down the old table and create a new one

@pyalex
Copy link
Collaborator Author

pyalex commented Jan 27, 2022

@felixwang9817 yes, I hope that with new design and APIs objects like online store will have more control and will receive all information, like about renaming a view or renaming a feature, whereas right now options are very limited.

Copy link
Collaborator

@felixwang9817 felixwang9817 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: felixwang9817, pyalex

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 5fc0b52 into feast-dev:master Jan 28, 2022
feast-ci-bot pushed a commit that referenced this pull request Jan 31, 2022
* Add backticks to left_table_query_string (#2250)

Signed-off-by: david <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Delete entity key from Redis only when all attached feature views are gone (#2240)

* Delete entity from redis when the last attached feature view is deleted

Signed-off-by: pyalex <[email protected]>

* Delete entity key from Redis only when all attached feature views are gone

Signed-off-by: pyalex <[email protected]>

* make lint happy

Signed-off-by: pyalex <[email protected]>

* make lint happy

Signed-off-by: pyalex <[email protected]>

* one more try with mypy

Signed-off-by: pyalex <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* historical_field_mappings2 merge for one sign off commit (#2252)

Signed-off-by: Michelle Rascati <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Correct inconsistent dependency (#2255)

Signed-off-by: Judah Rand <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Add snowflake environment variables to allow testing on snowflake infra (#2258)

* add snowflake environment vars to test framework

Signed-off-by: sfc-gh-madkins <[email protected]>

* add snowflake environment vars to test framework

Signed-off-by: sfc-gh-madkins <[email protected]>

* Return `UNIX_TIMESTAMP` as Python `datetime` (#2244)

* Refactor `UNIX_TIMESTAMP` conversion

Signed-off-by: Judah Rand <[email protected]>

* Return `UNIX_TIMESTAMP` types as `datetime` to user

Signed-off-by: Judah Rand <[email protected]>

* Fix linting errors

Signed-off-by: Judah Rand <[email protected]>

* Rename variable to something more sensible

Signed-off-by: Judah Rand <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Feast plan clean up (#2256)

* Run validation and inference on views and entities during plan

Signed-off-by: Felix Wang <[email protected]>

* Do not log objects that are unchanged

Signed-off-by: Felix Wang <[email protected]>

* Rename Fco to FeastObject

Signed-off-by: Felix Wang <[email protected]>

* Remove useless method

Signed-off-by: Felix Wang <[email protected]>

* Lint

Signed-off-by: Felix Wang <[email protected]>

* Always initialize registry during feature store initialization

Signed-off-by: Felix Wang <[email protected]>

* Fix usage test

Signed-off-by: Felix Wang <[email protected]>

* Remove print statements

Signed-off-by: Felix Wang <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Squash commits

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Add error type and refactor query execution to have retries

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Handle more snowflake errors

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Fix lint errors

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Fix lint errors

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Fix lint errors

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* Fix wrong import

Signed-off-by: Danny Chiao <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* modify registry.db s3 object initialization to work in S3 subdirectory with Java Feast Server (#2259)

Signed-off-by: NalinGHub <[email protected]>
Signed-off-by: sfc-gh-madkins <[email protected]>

* clean up docs

Signed-off-by: sfc-gh-madkins <[email protected]>

* lint-python

Signed-off-by: sfc-gh-madkins <[email protected]>

* fixed historical test

Signed-off-by: sfc-gh-madkins <[email protected]>

* fixed historical test

Signed-off-by: sfc-gh-madkins <[email protected]>

Co-authored-by: David Miller <[email protected]>
Co-authored-by: Oleksii Moskalenko <[email protected]>
Co-authored-by: Michelle Rascati <[email protected]>
Co-authored-by: Judah Rand <[email protected]>
Co-authored-by: Felix Wang <[email protected]>
Co-authored-by: Danny Chiao <[email protected]>
Co-authored-by: Nalin Mehra <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redis can delete incorrect keys
6 participants