Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INT64 uses "C long" type in struct.pack which is 4 bytes. #2345

Closed
varsenyantitan opened this issue Feb 27, 2022 · 8 comments
Closed

INT64 uses "C long" type in struct.pack which is 4 bytes. #2345

varsenyantitan opened this issue Feb 27, 2022 · 8 comments
Labels
kind/bug priority/p0 Highest priority

Comments

@varsenyantitan
Copy link

varsenyantitan commented Feb 27, 2022

Expected Behavior

INT64 should be packed as "C long long" to have 64 bits memory capacity.

Current Behavior

INT64 is packed as "C long" which is 32 bits

Steps to reproduce

Materialization of entities with IDs larger than supported range of "C long" raises exception.

Specifications

The bug is in the _serialize_val function in key_encoding_utils.

def _serialize_val(value_type, v: ValueProto) -> Tuple[bytes, int]:
    if value_type == "string_val":
        return v.string_val.encode("utf8"), ValueType.STRING
    elif value_type == "bytes_val":
        return v.bytes_val, ValueType.BYTES
    elif value_type == "int32_val":
        return struct.pack("<i", v.int32_val), ValueType.INT32
    elif value_type == "int64_val":
        return struct.pack("<q", v.int64_val), ValueType.INT64
    else:
        raise ValueError(f"Value type not supported for Firestore: {v}")
  • Version:
  • feast: 0.18.1
  • Platform:
  • MacOS 12.0.1
  • Subsystem:
  • python: 3.8.1

Possible Solution

changing struct.pack("<l", v.int64_val), ValueType.INT64 to struct.pack("<q", v.int64_val), ValueType.INT64 fixes this.

@woop
Copy link
Member

woop commented Feb 28, 2022

Hi @varsenyantitan, thanks for raising this issue. We're aware of it already, but let's keep this issue open until it's resolved.

I think we'll likely need to introduce a new materialization version and support both the old and new modes simultaneously.

@varsenyantitan
Copy link
Author

Hi @woop. Thank you for a fast response.
I am just wandering about the drawbacks of the struct.pack("<q", v.int64_val), ValueType.INT64.
Also, does the same issue present in Tecton?

@woop
Copy link
Member

woop commented Feb 28, 2022

There are no drawbacks. We simply need to fix this bug. This issue is not present in Tecton.

@roelschr
Copy link

Hi @woop, I'm unable to see why we'll need a new materialization version, do you have examples?

I have made this change on a forked repo (and I'm happy to open a PR). Added a new test, while all other tests seem to pass normally. Tested in our current use case for Feast, with both local sqlite and elasticache redis online stores.

In my mind, this change should not break current materialization use cases.

@achals
Copy link
Member

achals commented Apr 25, 2022

There's more context on #1908

Basically, the backwards compatibility issue comes in because entity keys with join keys of type int64 values that have been already written will no longer be readable - since the serialized bytes will be different with this patch, from before.

So, when users upgrade, they will see missing values for entities that they previously would have been able to read values for. We could just require users to rematerialize, but it seems problematic to do so and I would prefer if we can come up with a backwards compatible way of doing it.

I haven't given enough thought to it, but we could add a new field in the FeatureView proto called entity_key_encoding_version or something and use that to represent the current (buggy) way of serializing, or the new (correct) way of serializing. And then we could continue to read values for already-materialized feature views, but newly created feature views could use the newer value.

Does that make sense?

@roelschr
Copy link

That makes total sense, thank you for this! I'll move my PR back to draft and try to work on this backward compatibility issue.

@achals
Copy link
Member

achals commented Apr 26, 2022

Thanks @roelschr! Please let me know if we can help in this change in any way.

@mjurkus
Copy link

mjurkus commented Apr 29, 2022

We ran into the same issue and we had patched on our fork too in the same way. But that's not the only issue - the java server doesn't work with our patch because of this EntityKeySerializerV2.java#L82-L85

Is it really necessary to add backward compatibility for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug priority/p0 Highest priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants