Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix read! for types with size not a power of 2 #41711

Closed
wants to merge 2 commits into from
Closed

Conversation

ararslan
Copy link
Member

Currently, the fast path in read!(::IO, ::AbstractArray{T}) checks for isbitstype(T) and whether the array is some nice type, then calls unsafe_read with sizeof for the number of bytes. This works great in the majority of cases, unless sizeof(T) is not a power of 2. The aligned size is what's used in sizeof, which overestimates the number of bytes required to read from the stream. To fix this, we can adjust the condition of the fast path to also determine whether sizeof matches the aligned size.

Note that I've marked this for backport to 1.6 and 1.7.

Currently, the fast path in `read!(::IO, ::AbstractArray{T})` checks for
`isbitstype(T)` and whether the array is some nice type, then calls
`unsafe_read` with `sizeof` for the number of bytes. This works great in
the majority of cases, unless `sizeof(T)` is not a power of 2. The
aligned size is what's used in `sizeof`, which overestimates the number
of bytes required to read from the stream. To fix this, we can adjust
the condition of the fast path to also determine whether `sizeof`
matches the aligned size.
@ararslan ararslan added io Involving the I/O subsystem: libuv, read, write, etc. bugfix This change fixes an existing bug backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Jul 26, 2021
@ararslan ararslan requested a review from JeffBezanson July 26, 2021 23:42
@ararslan
Copy link
Member Author

Of course this had to break tests I didn't run locally... The error is

Error in testset subarray:
Error During Test at /usr/home/julia/worker-tabularasa/tester_freebsd64/build/share/julia/test/subarray.jl:610
  Got exception outside of a @test
  The IO stream does not support reading objects of type Main.Test63Main_subarray.UInt48.
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:33
    [2] read(s::IOBuffer, T::Type)
      @ Base ./io.jl:971
    [3] read!
      @ ./io.jl:746 [inlined]
    [4] macro expansion
      @ /usr/home/julia/worker-tabularasa/tester_freebsd64/build/share/julia/test/subarray.jl:616 [inlined]

It must be because read isn't explicitly extended for the UInt48 type defined in the tests and there's no appropriate fallback. I can add that, but it makes me a bit worried that there could be code relying on being able to read! without read.

Since `sizeof(UInt48)` is not a power of 2, the change in the previous
commit makes `read!(::IO, ::AbstractVector{UInt48})` hit the
non-fast-path branch that does successive calls to `read` rather than
using `unsafe_read`. In order for that to work, we need a `read` method
defined.
@ararslan
Copy link
Member Author

At least one of the following is true:

  • This fix is incorrect
  • My understanding of the problem is incorrect

It may well be both but it's certainly not neither.

@JeffBezanson
Copy link
Member

I think in this case we assume the disk and memory representations are the same, i.e. that the alignment padding also exists on disk. So this is not necessarily a bug, unless there is some other code that makes a different assumption.

@@ -588,6 +588,7 @@ end
primitive type UInt48 48 end
UInt48(x::UInt64) = Core.Intrinsics.trunc_int(UInt48, x)
UInt48(x::UInt32) = Core.Intrinsics.zext_int(UInt48, x)
Base.read(io::IO, ::Type{UInt48}) = only(reinterpret(UInt48, read(io, sizeof(UInt48))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be what's causing the failure --- it reads too few bytes, failing to skip the alignment padding.

@ararslan
Copy link
Member Author

ararslan commented Jul 27, 2021

So this is not necessarily a bug, unless there is some other code that makes a different assumption.

I ran into this when trying to read data encoded on disk as 24-bit integers as part of the BDF format. If you have an Int24 e.g. from BitIntegers.jl then read!ing into a Vector{Int24} fails (hits EOF on the stream) because it reads more than the values need.

@ararslan
Copy link
Member Author

I think in this case we assume the disk and memory representations are the same, i.e. that the alignment padding also exists on disk.

I think this is the critical context I was missing here. I guess that means that the case I mentioned above is more of a niche case that would need to be handled explicitly/differently rather than relying on the default behavior of read!. Is that true?

@ararslan ararslan removed backport 1.6 Change should be backported to release-1.6 backport 1.7 bugfix This change fixes an existing bug labels Aug 2, 2021
@ararslan
Copy link
Member Author

ararslan commented Aug 2, 2021

Is that true?

I will assume that is the case and close

@ararslan ararslan closed this Aug 2, 2021
@ararslan ararslan deleted the aa/read-non-pow2 branch August 2, 2021 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants