-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use released version of arrow #80
Comments
It helps, but doesn't solve the underlying issue. If you depend on another crate that in turn depends on arrow and isn't exposed by DataFusion, e.g. arrow-flight, or you want to set different features from what DataFusion sets, you end up having to replicate the exact version pins from DataFusion into all other crates |
I think that the general problem is that we pin arrow (and many others) in datafusion; datafusion is a library and it should thus avoid pinning dependencies. Instead, it should bracket them, via e.g. As it stands, consumers must use the exact same version of arrow that datafusion uses or cargo will pick two different arrow versions. This happens because Cargo cannot guarantee that the two different arrow versions (what datafusion demands and what the consumer wants) are ABI compatible. Consumers can't pass structs from a version of arrow (that they use) to another version of arrow (that datafusion uses). Note that in this context a different feature set corresponds to a different version, as cargo has no way of knowing whether a feature will retain ABI compatibility. So, I think the ask here is:
Is this the idea, @tustvold ? I am not sure whether we could get away with the multiple paths approach, e.g.
|
Yes, this would be my ideal for the reasons you articulated.
This would definitely be something I'd be supportive of, but possibly somewhat tangential to making DataFusion as a library easier to consume. Cargo has a good story for overriding dependencies within a workspace, including indirect dependencies, provided those versions aren't pinned within the libraries. Therefore if DataFusion were to move to a released version of arrow it wouldn't preclude users from opting-in to newer, potentially unreleased versions of arrow. However, DataFusion itself would not be able to opt-in to unreleased arrow functionality, and so if there are frequent DataFusion changes coupled with arrow changes, then yes a more frequent arrow release cycle would possibly be a pre-condition of moving to using a released version of arrow.
I've not come across this approach, I'd worry that it might be vulnerable to rust-lang/cargo#5478 which would prevent users from opting into newer versions of arrow within their workspaces, which imo would be unfortunate |
I agree. My point is that the reason we use pinned hashes of arrow is so that we do not have to wait for a new release. So, I think that to stop pinning in DataFusion, we need to release arrow more frequently. But I agree that from the consumers' point of view, it is not needed, as you can just point to a hash in |
@alamb this can be closed now |
Currently the Datafusion crates pin a version of arrow by git revision. Unfortunately this makes it complicated to use Datafusion with other libraries that also depend on arrow, making it is necessary to update all the version pins in lock-step across potentially multiple repositories. #39 helps somewhat with this, but doesn't help with crates that Datafusion isn't dependent on that themselves depend on arrow, or if you want to configure different feature flags from what Datafusion sets.
Unfortunately there is an outstanding bug /feature omission with cargo that makes patch not work for overriding git version pins, and the workarounds on the ticket no longer seem to work.
If Datafusion instead depended instead on a released version of arrow, Cargo's semver compatibility rules would avoid a lot of this pain for most users, and would allow users to use the patch syntax to override the arrow version in use for all their dependencies within their workspace, if they want to use an unreleased arrow version.
I'm not sure how feasible any of this is, but thought I would raise a ticket to maybe kickstart a discussion on this
The text was updated successfully, but these errors were encountered: