Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove support for nullable components #6819

Open
emilk opened this issue Jul 8, 2024 · 4 comments
Open

Remove support for nullable components #6819

emilk opened this issue Jul 8, 2024 · 4 comments
Labels
🏹 arrow concerning arrow 🚜 refactor Change the code, not the functionality 🎄 tracking issue issue that tracks a bunch of subissues

Comments

@emilk
Copy link
Member

emilk commented Jul 8, 2024

At the moment we don’t ever make use of nullable components.
For instance: we don’t allow setting the radius of only some points in a point-cloud to null.

Still, we have full support for it in our Loggable trait and in our codegen.
We have a to_arrow_opt taking an iterator of optional values, and a from_arrow_opt that returns a Vec<Option<Self>>, yet these are never called.

This unused support has several downsides:

  • Extra complexity in the codegen
  • A lot of extra generated code (we generate both from_arrow_opt and from_arrow)
  • Runtime overhead (we codegen to_arrow_opt, but only ever call to_arrow which wraps each element in Some(…))

Removing support for it would definitely make the transition away from arrow2 easier. And we can always add it back in the future.

It seems silly to support a feature we may want in the future (YAGNI / ease into complexity).

@emilk emilk added 🏹 arrow concerning arrow 🛑 controversial 🚜 refactor Change the code, not the functionality labels Jul 8, 2024
@jleibs
Copy link
Member

jleibs commented Jul 8, 2024

Just to clarify a bit further, this would make it so that the inner values of arrow batches are never null. However the outer list-arrays itself would still have nullable arrays if an optional value wasn't provided in a row.

Maybe this should be articulated as: "Remove support for individually nullable items within a batch".

Logically I'm totally onboard, though there may be lurking consequences when it comes to how some range-operations are processed. This is probably fresher in @teh-cmc 's mind.

It could be worth thinking some more about both this and #6602 together in terms of simplifying the things we support.

Also very peripherally related: I remember nullabillity (along with unions) were one of the annoying things that we ran up against when we were previously using polars. A simplification that made it easier to guarantee Rerun data frame results can be loaded into polars without issues would also be a huge win.

@emilk
Copy link
Member Author

emilk commented Jul 9, 2024

Just to clarify a bit further, this would make it so that the inner values of arrow batches are never null. However the outer list-arrays itself would still have nullable arrays if an optional value wasn't provided in a row.

Good clarification. Just to be extra clear:

  • We currently support: Option<Vec<Option<C>>>
  • The proposal is to only support Option<Vec<C>>

@teh-cmc
Copy link
Member

teh-cmc commented Jul 9, 2024

There are two distinct issues here: one is the presence of field-level nullability, another is dealing with that nullability efficiently.


We do make use of instance-level nullability, just not at the top layer (because of limitations in the flatbuffers IDL which have thankfully prevented us to do so).

If you look at any datatype that has optional fields, like e.g. the Material datatype:

/// Material properties of a mesh, e.g. its color multiplier.
struct Material (
  "attr.rust.derive": "Copy, PartialEq, Eq, Hash"
) {
  /// Optional color multiplier.
  albedo_factor: rerun.datatypes.Rgba32 (nullable, order: 100);
}

this will need to go through the *_opt paths at runtime since albedo_factor will end up being an array of optional Rgba32s (Vec<Option<Rgba32>>).
(Well in that specific case, it will take yet another nullability-aware specialized path because Rgba32 is a native type, but the point stands: there is nullability to account for.)

I would love to get rid of nullability entirely, but I don't think that's realistic at the moment: we need the extra expressiveness in many instances, and relying on default values would be a massive waste of space and compute.

There might be a path to ultimately get there by making careful use of component conversions and redesigning our schemas to rely on more, smaller components instead of nullable fields, but I'm not sure how practical that would be in practice.

E.g. TranslationAndMat3x3 is an obvious candidate for component conversions instead of nullable:

table TranslationAndMat3x3 (
  "attr.rust.derive": "Copy, PartialEq"
) {
  translation: rerun.datatypes.Vec3D (nullable, order: 100);
  mat3x3: Mat3x3 (nullable, order: 200);
  from_parent: bool = false (order: 300);
}

E.g. TensorDimension is one of these examples where you'd need to split name in its own TensorDimensionName type instead of using a nullable field:

table TensorDimension (
  "attr.rust.derive_only": "Clone, Default, Eq, PartialEq"
) {
  size: ulong (order: 100);
  name: string (order: 200, nullable);
}

Now when it comes to dealing with this efficiently and reducing implementation complexity, I maintain that the problem is that we still have deserialization code that checks instead of assuming:

@emilk emilk added the blocked can't make progress right now label Jul 9, 2024
@emilk
Copy link
Member Author

emilk commented Jul 9, 2024

After some discussion, we all agree that we want to move away from inner nullability of components and datatypes.

The road to there requires changes in a few places:

@emilk emilk added 🎄 tracking issue issue that tracks a bunch of subissues and removed blocked can't make progress right now 🛑 controversial labels Jul 9, 2024
@emilk emilk changed the title Proposal: remove support for nullable components Remove support for nullable components Jul 9, 2024
@emilk emilk assigned emilk and Wumpf Jul 9, 2024
@emilk emilk unassigned emilk and Wumpf Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 🚜 refactor Change the code, not the functionality 🎄 tracking issue issue that tracks a bunch of subissues
Projects
None yet
Development

No branches or pull requests

4 participants