Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use C-type long long (8 bytes) type format to pack int64 values #2609

Closed
wants to merge 4 commits into from

Conversation

roelschr
Copy link

What this PR does / why we need it:
This PR fixes a bug related to the materialization of INT64 entity keys, by packing int64 values as C-type "long long". It also adds a new test that tests the serialization behavior with an INT64 entity key value bigger than int32 max.

Which issue(s) this PR fixes:
Fixes #2345.

@roelschr
Copy link
Author

/kind bug

@roelschr
Copy link
Author

roelschr commented Apr 26, 2022

Converted to draft to work on a backwards compatible solution.

@achals achals self-assigned this Apr 26, 2022
@achals achals self-requested a review April 26, 2022 17:18
@achals
Copy link
Member

achals commented Jun 28, 2022

@roelschr friendly ping, are you working on this? If not, I'd be more than happy to invest some time here to introduce this compatibility layer.

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: roelschr
To complete the pull request process, please assign achals after the PR has been reviewed.
You can assign the PR to them by writing /assign @achals in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@achals achals force-pushed the fix-int64-serialization branch from 30b7cce to 4d8cdb8 Compare June 29, 2022 18:23
@codecov-commenter
Copy link

codecov-commenter commented Jun 29, 2022

Codecov Report

Merging #2609 (4d8cdb8) into master (7bb3e21) will decrease coverage by 21.30%.
The diff coverage is 88.88%.

@@             Coverage Diff             @@
##           master    #2609       +/-   ##
===========================================
- Coverage   80.68%   59.38%   -21.31%     
===========================================
  Files         176      177        +1     
  Lines       15670    15685       +15     
===========================================
- Hits        12643     9314     -3329     
- Misses       3027     6371     +3344     
Flag Coverage Δ
integrationtests ?
unittests 59.38% <88.88%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/feast/batch_feature_view.py 100.00% <ø> (ø)
...thon/feast/infra/online_stores/contrib/postgres.py 32.69% <0.00%> (ø)
sdk/python/feast/on_demand_feature_view.py 74.33% <ø> (-6.80%) ⬇️
sdk/python/feast/request_feature_view.py 62.50% <ø> (ø)
sdk/python/feast/infra/online_stores/helpers.py 78.26% <75.00%> (-21.74%) ⬇️
sdk/python/feast/base_feature_view.py 71.11% <100.00%> (-9.58%) ⬇️
sdk/python/feast/feature_view.py 83.76% <100.00%> (-5.18%) ⬇️
sdk/python/feast/infra/key_encoding_utils.py 76.47% <100.00%> (-17.65%) ⬇️
sdk/python/feast/infra/online_stores/sqlite.py 93.70% <100.00%> (-0.79%) ⬇️
sdk/python/feast/registry.py 68.05% <100.00%> (-15.41%) ⬇️
... and 86 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7bb3e21...4d8cdb8. Read the comment docs.

@@ -108,12 +108,15 @@ jobs:
node-version: '17.x'
registry-url: 'https://registry.npmjs.org'
- name: Build and install dependencies
# There's a `git restore` in here because `make install-go-ci-dependencies` is actually messing up go.mod & go.sum.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this stuff

if value_type == "string_val":
return v.string_val.encode("utf8"), ValueType.STRING
elif value_type == "bytes_val":
return v.bytes_val, ValueType.BYTES
elif value_type == "int32_val":
return struct.pack("<i", v.int32_val), ValueType.INT32
elif value_type == "int64_val":
return struct.pack("<l", v.int64_val), ValueType.INT64
return struct.pack("<q", v.int64_val), ValueType.INT64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the compatibility approach you're taking here - can you explain?

what do the various values of entity_key_serialization_version represent? and there needs to be some fork in the behavior here depending on entity_key_serialization_version right?

I was imagining we should do something like the following:

  • add entity_key_serialization_version to the protos, as you've done, where 0 will represent the current (broken) serialization method and 1 will represent the new (correct) serialization method
  • ensure that existing feature view protos do not magically get entity_key_serialization_version switched from 0 (the default value since the field doesn't exist) to 1, which requires special logic in the registry (as I think you've done)
  • default all new feature views to use serialization method 1

@@ -75,6 +75,9 @@ message FeatureViewSpec {

// Whether these features should be served online or not
bool online = 8;

// Needed for backwards compatible behaviour when fixing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update comment after when fixing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INT64 uses "C long" type in struct.pack which is 4 bytes.
7 participants