-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-40361: [C++] Make flatbuffers serialization more deterministic #40392
GH-40361: [C++] Make flatbuffers serialization more deterministic #40392
Conversation
|
|
|
Approved! |
|
As discussed on the GH issue, it would be useful to add a test using output from your machine as a reference. |
Hi @noamross, would you be interested in adding the test described above to this PR? If not, I could take a shot at it. |
@amoeba Sorry to open this and abandon it, it just became a very busy month and my C++ is pretty limited. I'd be happy if you would! |
No problem at all @noamross, it's awesome you filed a PR for the issue and it's very much appreciated. I'll work on adding and testing the test. |
Thanks! I expect the files attached to #40202 (comment) could serve for a test of whether a generated file is identical to a reference from a different system, but I note that they would change with the R version being used. |
I put up a draft PR with a test to exercise this and hopefully watch it fail on CI at #41018. It fails on my amd64 linux machine so I think it's close to what we need here. I'll watch CI on that PR and report back. |
@github-actions crossbow submit -g cpp |
|
@github-actions crossbow submit -g cpp |
Revision: 9117945 Submitted crossbow builds: ursacomputing/crossbow @ actions-e9bd83c5da |
Things look good locally (testing on macOS-aarch64-clang and Linux-amd64-gcc) but I see an issue with my includes in a few of the above jobs. I'll fix those soon. |
@github-actions crossbow submit -g cpp |
Revision: c3af79d Submitted crossbow builds: ursacomputing/crossbow @ actions-08f54365f0 |
Co-authored-by: Sutou Kouhei <[email protected]>
I'm pretty sure this is the right thing to do now that apache@4c8eff2 is merged.
11958e1
to
4bd0339
Compare
Thanks @pitrou for the review. I rebased from apache/main, accepted your three changes as-is, and added one extra change I caught while dealing with conflicts on rebase, 4bd0339. In 4c8eff2 @kou tweaked the CMakeList and I think this PR should make the same change as was made there. PS: After pushing I found the ipc/CMakeLists.txt needed formatting, so that was added here too. |
Parquet tests were failing all over on CI and due to needing to bump submodules so I pushed 4ebe53b so we could see CI. |
@github-actions crossbow submit -g cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there are still spurious parquet-testing
updates in this PR?
Revision: 4ebe53b Submitted crossbow builds: ursacomputing/crossbow @ actions-d1985bc136 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, can you switch the edit of parquet-testing back?
Co-authored-by: Antoine Pitrou <[email protected]>
This reverts commit 4ebe53b.
Thanks all. The submodule change to help CI pass has been reverted in this PR. I'm merging this now. |
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 74f7578. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 13 possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change #40392 introduced a mistake in the the associated CMakeLists.txt, noticed by @ kou in #40392 (comment). ### What changes are included in this PR? A reversion of that change. ### Are these changes tested? No. ### Are there any user-facing changes? No. Authored-by: Bryce Mecum <[email protected]> Signed-off-by: Bryce Mecum <[email protected]>
…ic (apache#40392) ### Rationale for this change This is the start of a PR to address apache#40361, and in turn apache#40202, to make metadata in parquet files written by arrow to be identical irrespective of the platform configuration. This is limited, as platform-specific differences in R or Python versions or compression libraries could still result in differences. ### What changes are included in this PR? So far I have only made a partial change to part of the metadata serialization. I need to look at whether other calls to flatbuffers require similar treatment. ### Are these changes tested? Not yet, this is a draft PR ### Are there any user-facing changes? No * GitHub Issue: apache#40361 Lead-authored-by: Noam Ross <[email protected]> Co-authored-by: Bryce Mecum <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Bryce Mecum <[email protected]>
### Rationale for this change apache#40392 introduced a mistake in the the associated CMakeLists.txt, noticed by @ kou in apache#40392 (comment). ### What changes are included in this PR? A reversion of that change. ### Are these changes tested? No. ### Are there any user-facing changes? No. Authored-by: Bryce Mecum <[email protected]> Signed-off-by: Bryce Mecum <[email protected]>
Rationale for this change
This is the start of a PR to address #40361, and in turn #40202, to make metadata in parquet files written by arrow to be identical irrespective of the platform configuration. This is limited, as platform-specific differences in R or Python versions or compression libraries could still result in differences.
What changes are included in this PR?
So far I have only made a partial change to part of the metadata serialization. I need to look at whether other calls to flatbuffers require similar treatment.
Are these changes tested?
Not yet, this is a draft PR
Are there any user-facing changes?
No