[BUG] Invalid memory read in cudf::io::parquet::gpu::gpuDecodePageData #13571

Closed
davidwendt opened this issue Jun 14, 2023 · 0 comments · Fixed by #13586
Labels: bug, cuIO, libcudf

@davidwendt (Contributor):

The nightly memcheck tests found an invalid global device memory read in `cudf::io::parquet::gpu::gpuDecodePageData`:

```
========= Invalid __global__ read of size 4 bytes
=========     at 0x4dd0 in void cudf::io::parquet::gpu::<unnamed>::gpuDecodePageData<(int)256, unsigned char>(cudf::io::parquet::gpu::PageInfo *, cudf::device_span<const cudf::io::parquet::gpu::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long)
=========     by thread (81,0,0) in block (0,0,0)
=========     Address 0x7f0511e00ad8 is out of bounds
=========     and is inside the nearest allocation at 0x7f0511e00a00 of size 218 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x304e32]
=========                in /usr/lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x1488c]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudart.so.11.0
=========     Host Frame:cudaLaunchKernel [0x6c318]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudart.so.11.0
=========     Host Frame:cudf::io::parquet::gpu::DecodePageData(cudf::detail::hostdevice_vector<cudf::io::parquet::gpu::PageInfo>&, cudf::detail::hostdevice_vector<cudf::io::parquet::gpu::ColumnChunkDesc> const&, unsigned long, unsigned long, int, rmm::cuda_stream_view) [0x155aaef]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::reader::impl::decode_page_data(unsigned long, unsigned long) [0x1564ba9]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::reader::impl::read_chunk_internal(bool) [0x1566b2d]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::reader::impl::read(long, std::optional<int> const&, bool, cudf::host_span<std::vector<int, std::allocator<int> > const, 18446744073709551615ul>) [0x15672dc]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudf.so
=========     Host Frame:cudf::io::detail::parquet::reader::read(cudf::io::parquet_reader_options const&) [0x155d03b]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudf.so
=========     Host Frame:cudf::io::read_parquet(cudf::io::parquet_reader_options const&, rmm::mr::device_memory_resource*) [0x147505e]
=========                in /opt/conda/envs/test/bin/gtests/libcudf/../../../lib/libcudf.so
```
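The addresses in the report show exactly where the faulting read lands relative to the 218-byte allocation (this is only arithmetic on the values printed by memcheck):

```math
\begin{aligned}
\text{offset} &= \texttt{0x7f0511e00ad8} - \texttt{0x7f0511e00a00} = \texttt{0xd8} = 216 \\
\text{last byte read} &= 216 + 4 - 1 = 219 \\
\text{last valid byte} &= 218 - 1 = 217
\end{aligned}
```

So the 4-byte read starting at offset 216 touches two bytes (offsets 218 and 219) past the end of the allocation.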

The error can be reproduced with PARQUET_TEST as follows:

```
compute-sanitizer --tool memcheck gtests/PARQUET_TEST --gtest_filter=ParquetWriterNumericTypeTest/2.SingleColumnWithNulls --rmm_mode=cuda
```
@davidwendt added the bug, Needs Triage, libcudf, and cuIO labels Jun 14, 2023
@ttnghia linked a pull request Jun 16, 2023 that will close this issue
@ttnghia self-assigned this Jun 16, 2023
rapids-bot pushed a commit that referenced this issue Jun 23, 2023
After `rmm` removed memory padding (rapidsai/rmm#1278), some cuIO code started making out-of-bounds accesses, because many of its compute kernels shift input pointers back and forth to satisfy alignment requirements.

This PR adds padding back to various memory buffers so that they have enough extra space for such shifting.
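The alignment problem can be illustrated with a small host-side C++ sketch. It is hypothetical and not cuIO's code: it assumes a decoder that reads its input only with aligned 4-byte loads (so the end of the byte range is rounded up to a word boundary) and an allocation whose base is word-aligned. The names `overrun_bytes` and `padded_size` are made up for this example, and the padding amount actually used in #13586 is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a decoder that only issues aligned 4-byte loads.
constexpr std::uint64_t kWordBytes = 4;

// Round p up to the next multiple of a (a must be a power of two).
constexpr std::uint64_t align_up(std::uint64_t p, std::uint64_t a) {
  return (p + a - 1) & ~(a - 1);
}

// Bytes beyond the end of the allocation [base, base + alloc_size) that are
// touched when the logical range [data, data + len) is read with aligned
// word loads, i.e. when the end of the range is rounded up to a word.
constexpr std::size_t overrun_bytes(std::uint64_t base, std::size_t alloc_size,
                                    std::uint64_t data, std::size_t len) {
  return align_up(data + len, kWordBytes) > base + alloc_size
             ? static_cast<std::size_t>(align_up(data + len, kWordBytes) -
                                        (base + alloc_size))
             : 0;
}

// Hypothetical padding rule: round the requested size up to a whole word so
// the final aligned load stays inside the allocation (assumes an aligned base).
constexpr std::size_t padded_size(std::size_t size) {
  return static_cast<std::size_t>(align_up(size, kWordBytes));
}

// Numbers from the memcheck report: a 218-byte allocation whose last two
// valid bytes (offsets 216 and 217) are fetched with one 4-byte load.
constexpr std::uint64_t kBase = 0x7f0511e00a00ULL;
static_assert(overrun_bytes(kBase, 218, kBase + 216, 2) == 2, "unpadded overruns");
static_assert(overrun_bytes(kBase, padded_size(218), kBase + 216, 2) == 0, "padded is safe");
```

With the numbers from the memcheck report, the sketch reports the same 2-byte overrun, and rounding the allocation up to 220 bytes removes it. A kernel that shifts its pointers by more than one word would need correspondingly more padding than this single-word rule provides.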

With this fix, the reported issues (#13567, #13571, #13570) no longer occur.

Closes:
 * #13567
 * #13571
 * #13570

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #13586
@bdice removed the Needs Triage label Mar 4, 2024