Which part is this question about
Regarding the library API usage.
Describe your question
I am using the high-level API (FileReader and FileDecoder) to read IPC files via mmap. I have noticed that validate_data() in the Array building process (here) adds significant overhead.
I am targeting an ultra-low-latency scenario. With validate_data, reading a 2.2 GB IPC file (via mmap) takes 290 ms; without it, the same read takes 3.8 ms, which I measured locally by commenting the call out. The 3.8 ms latency is nearly identical to the C++ Arrow implementation I tested, and I suspect the C++ codebase does not perform this sanity check (not entirely sure).
The functions for "unchecked" building exist in the codebase (here), but they are not reachable from the high-level API, so I cannot easily disable validation without constructing my own arrays and everything built on top of them.
I wonder if there is any better way to achieve that?
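To make the trade-off concrete, here is a minimal, stdlib-only Rust sketch of the checked versus unchecked construction pattern I am asking about. It is not the library's code: `StringColumn`, `ColumnError`, `try_new`, and `new_unchecked` are hypothetical names, and I am assuming the validation pass does work similar to the offset and UTF-8 checks shown here. The point is that the checked path has to touch every byte of the (mmapped) value buffer, while the unchecked path only stores two slices.

```rust
use std::str;

/// Hypothetical stand-in for a variable-length string column:
/// an offsets buffer plus one contiguous value buffer (e.g. an mmapped region).
pub struct StringColumn<'a> {
    offsets: &'a [usize],
    values: &'a [u8],
}

#[derive(Debug, PartialEq)]
pub enum ColumnError {
    BadOffsets,
    InvalidUtf8,
}

impl<'a> StringColumn<'a> {
    /// Checked constructor: cost grows with the total size of `values`,
    /// because every byte is read to verify UTF-8.
    pub fn try_new(offsets: &'a [usize], values: &'a [u8]) -> Result<Self, ColumnError> {
        // Offsets must be non-empty, non-decreasing, and end inside the buffer.
        let in_bounds = offsets.last().map_or(false, |&last| last <= values.len());
        if !in_bounds || offsets.windows(2).any(|w| w[0] > w[1]) {
            return Err(ColumnError::BadOffsets);
        }
        // Every slot must hold valid UTF-8.
        for w in offsets.windows(2) {
            str::from_utf8(&values[w[0]..w[1]]).map_err(|_| ColumnError::InvalidUtf8)?;
        }
        Ok(Self { offsets, values })
    }

    /// Unchecked constructor: constant time, no bytes of `values` are read.
    ///
    /// # Safety
    /// The caller must guarantee everything `try_new` would have verified.
    pub unsafe fn new_unchecked(offsets: &'a [usize], values: &'a [u8]) -> Self {
        Self { offsets, values }
    }

    /// Number of string slots in the column.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns slot `i`; panics if `i` is out of range.
    pub fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        // SAFETY: verified by `try_new`, or promised by the caller of `new_unchecked`.
        unsafe { str::from_utf8_unchecked(bytes) }
    }
}
```

What I am looking for is the equivalent of choosing `new_unchecked` from the high-level reader, for files I already trust, while keeping the checked path as the default for untrusted input.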
Additional context
Low latency is critical in my case, so I am trying to avoid any additional overhead (perhaps with the C++ codebase as the baseline).