Optionally disable data validation for arrow-ipc #6933

Closed
totoroyyb opened this issue Jan 3, 2025 · 2 comments · May be fixed by #6938
Labels
question Further information is requested

Comments

@totoroyyb

Which part is this question about
Regarding the library API usage.

Describe your question
I am using the high-level API (FileReader and FileDecoder) to read IPC files via mmap. I have noticed that the validate_data() call in the array-building process (here) adds significant overhead.

I am targeting an ultra-low-latency scenario. Reading a 2.2GB IPC file (via mmap) takes 290ms with validate_data and 3.8ms without it, which I tested locally by commenting the call out. The 3.8ms latency is nearly identical to what I measured with the C++ Arrow implementation, so I suspect the C++ codebase does not perform this sanity check (not entirely sure).

The functions for "unchecked" building are here in the codebase, but they are not accessible from the high-level API, so I cannot easily disable validation without creating my own arrays and everything on top of them.
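
To make the trade-off concrete, here is a minimal, std-only sketch of the checked versus unchecked construction pattern. It is not arrow-rs code: `StringColumn`, `ValidationError`, `try_new`, and `new_unchecked` are hypothetical names invented for illustration, and the checks shown (monotonic offsets, in-bounds offsets, valid UTF-8) are only an assumption about the kind of work a data-validation pass does. The point is that the checked path scans every byte, so its cost grows with file size, while the unchecked path only takes ownership of the buffers.

```rust
/// Hypothetical stand-in for a variable-length string column:
/// `offsets[i]..offsets[i + 1]` delimits value `i` inside `values`.
#[derive(Debug)]
pub struct StringColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    EmptyOffsets,
    OffsetsNotMonotonic(usize),
    OffsetOutOfBounds(usize),
    InvalidUtf8(usize),
}

impl StringColumn {
    /// Checked constructor: walks every offset pair and every value byte,
    /// so the cost is proportional to the size of the data.
    pub fn try_new(offsets: Vec<usize>, values: Vec<u8>) -> Result<Self, ValidationError> {
        if offsets.is_empty() {
            return Err(ValidationError::EmptyOffsets);
        }
        for (i, w) in offsets.windows(2).enumerate() {
            if w[0] > w[1] {
                return Err(ValidationError::OffsetsNotMonotonic(i));
            }
            if w[1] > values.len() {
                return Err(ValidationError::OffsetOutOfBounds(i));
            }
            if std::str::from_utf8(&values[w[0]..w[1]]).is_err() {
                return Err(ValidationError::InvalidUtf8(i));
            }
        }
        Ok(Self { offsets, values })
    }

    /// Unchecked constructor: constant time, no scan of the data.
    ///
    /// # Safety
    /// The caller must guarantee everything `try_new` would have checked.
    pub unsafe fn new_unchecked(offsets: Vec<usize>, values: Vec<u8>) -> Self {
        Self { offsets, values }
    }

    /// Number of values in the column.
    pub fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    /// Returns value `i`; panics if `i` is out of range.
    pub fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        // SAFETY: both constructors promise that every slot is valid UTF-8,
        // either by checking it (`try_new`) or by contract (`new_unchecked`).
        unsafe { std::str::from_utf8_unchecked(bytes) }
    }
}
```

With this shape, a reader that trusts its input (for example, a file it wrote itself) could opt into `new_unchecked`, while untrusted input keeps going through `try_new`. Exposing that choice from the high-level reader is what I am asking about.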

I wonder if there is any better way to achieve that?

Additional context
Low latency is critical in my case, so I am trying to avoid any additional overhead (perhaps with the C++ codebase as the baseline?).

@totoroyyb totoroyyb added the question label Jan 3, 2025
@tustvold
Contributor

tustvold commented Jan 3, 2025

I think this is a duplicate of #3287

@alamb alamb changed the title from "Optionally disable data validation" to "Optionally disable data validation for arrow-ipc" Jan 10, 2025
@alamb
Contributor

alamb commented Jan 10, 2025

I moved the relevant content to #3287 (comment), so let's close this ticket and continue the conversation there.

@alamb alamb closed this as not planned Jan 10, 2025
3 participants