As noted in #14362 (comment), dest-s3's avro and parquet output formats currently use int32. This causes problems for users with int64 values. We should have dest-s3 produce int64 values for integer inputs.
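For context, a minimal sketch of the mapping at issue (the `id` field name is illustrative, not taken from the connector): a JSON-schema `integer` currently ends up as Avro `int`, which caps out at 2147483647, whereas 64-bit ids need `long`:

```json
{
  "type": "record",
  "name": "example_stream",
  "fields": [
    {"name": "id", "type": "int"}
  ]
}
```

The ask here is for that field to be emitted as `"long"` instead, so values above 2^31 - 1 survive the conversion.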
The reason we use int32 right now is because we want to support union types, which behave kind of weirdly in edge cases. Consider this JSON schema:
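Assume, for illustration, a `created_at` field that can arrive either as an epoch integer or as a date-time string:

```json
{
  "type": "object",
  "properties": {
    "created_at": {
      "type": ["integer", "string"],
      "format": "date-time"
    }
  }
}
```

Ideally, this would resolve to an Avro union along these lines (again, a sketch):

```json
{
  "name": "created_at",
  "type": ["long", {"type": "long", "logicalType": "timestamp-micros"}]
}
```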
The problem is that Avro treats both of these types as `long` under the hood, so this isn't actually a valid union (Avro unions may not contain more than one schema of the same underlying type). Furthermore, the json<>avro converter behaves differently depending on the logical type:
{"created_at": 1634982000} can be parsed as "long", but not as {"type": "long", "logicalType": "timestamp-micros"}
{"created_at": "1634982000"} (note the quotes around 1634982000) can be parsed as {"type": "long", "logicalType": "timestamp-micros"}, but not as "long".
Options:
- Decide that the union typing is too niche a use case (maybe true) and stop supporting it
- Otherwise, we need to:
  - Decide what the correct output type is (either `long` or timestamp); see the sketch below
  - Patch the converter library to parse values correctly, given that decision
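To make the second option concrete: for a field like `created_at`, the output-type decision is between emitting a plain long (a sketch, reusing the field name from the example above):

```json
{"name": "created_at", "type": "long"}
```

or a timestamp logical type:

```json
{"name": "created_at", "type": {"type": "long", "logicalType": "timestamp-micros"}}
```

Whichever we choose, the converter patch would presumably need to accept both the unquoted and quoted forms (`1634982000` and `"1634982000"`) for that field.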
Hello! I wanted to create an issue for this but found this ticket. This causes issues on destinations like Databricks as well (since it is based on the S3 Parquet destination), and it can make the data less safe with int64 ids, since two distinct ids can then collide (Zendesk Support ids don't fit into 32 bits).
Nice to know that the issue is already known and being tracked. Good luck with it!