-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove support for nullable components #6819
Comments
Just to clarify a bit further, this would make it so that the inner values of arrow batches are never null. However the outer list-arrays itself would still have nullable arrays if an optional value wasn't provided in a row. Maybe this should be articulated as: "Remove support for individually nullable items within a batch". Logically I'm totally onboard, though there may be lurking consequences when it comes to how some range-operations are processed. This is probably fresher in @teh-cmc 's mind. It could be worth thinking some more about both this and #6602 together in terms of simplifying the things we support. Also very peripherally related: I remember nullabillity (along with unions) were one of the annoying things that we ran up against when we were previously using polars. A simplification that made it easier to guarantee Rerun data frame results can be loaded into polars without issues would also be a huge win. |
Good clarification. Just to be extra clear:
|
There are two distinct issues here: one is the presence of field-level nullability, another is dealing with that nullability efficiently. We do make use of instance-level nullability, just not at the top layer (because of limitations in the flatbuffers IDL which have thankfully prevented us to do so). If you look at any datatype that has optional fields, like e.g. the /// Material properties of a mesh, e.g. its color multiplier.
struct Material (
"attr.rust.derive": "Copy, PartialEq, Eq, Hash"
) {
/// Optional color multiplier.
albedo_factor: rerun.datatypes.Rgba32 (nullable, order: 100);
} this will need to go through the I would love to get rid of nullability entirely, but I don't think that's realistic at the moment: we need the extra expressiveness in many instances, and relying on default values would be a massive waste of space and compute. There might be a path to ultimately get there by making careful use of component conversions and redesigning our schemas to rely on more, smaller components instead of nullable fields, but I'm not sure how practical that would be in practice. E.g. table TranslationAndMat3x3 (
"attr.rust.derive": "Copy, PartialEq"
) {
translation: rerun.datatypes.Vec3D (nullable, order: 100);
mat3x3: Mat3x3 (nullable, order: 200);
from_parent: bool = false (order: 300);
} E.g. table TensorDimension (
"attr.rust.derive_only": "Clone, Default, Eq, PartialEq"
) {
size: ulong (order: 100);
name: string (order: 200, nullable);
} Now when it comes to dealing with this efficiently and reducing implementation complexity, I maintain that the problem is that we still have deserialization code that checks instead of assuming: |
After some discussion, we all agree that we want to move away from inner nullability of components and datatypes. The road to there requires changes in a few places: |
At the moment we don’t ever make use of nullable components.
For instance: we don’t allow setting the radius of only some points in a point-cloud to
null
.Still, we have full support for it in our
Loggable
trait and in our codegen.We have a
to_arrow_opt
taking an iterator of optional values, and afrom_arrow_opt
that returns aVec<Option<Self>>
, yet these are never called.This unused support has several downsides:
from_arrow_opt
andfrom_arrow
)to_arrow_opt
, but only ever callto_arrow
which wraps each element inSome(…)
)Removing support for it would definitely make the transition away from
arrow2
easier. And we can always add it back in the future.It seems silly to support a feature we may want in the future (YAGNI / ease into complexity).
The text was updated successfully, but these errors were encountered: