Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btl/vader: handle unexpected short read/write in process_vm_{read,wri… #4876

Merged
merged 1 commit into from
Mar 26, 2018

Conversation

ggouaillardet
Copy link
Contributor

…te}v

Important note :

According to the man page
"On success, process_vm_readv() returns the number of bytes read and
process_vm_writev() returns the number of bytes written. This return
value may be less than the total number of requested bytes, if a
partial read/write occurred. (Partial transfers apply at the
granularity of iovec elements. These system calls won't perform a
partial transfer that splits a single iovec element.)"

So since we use a single iovec element, the returned size should either
be 0 or size, and the do loop should not be needed here.
We tried on various Linux kernels with size > 2 GB, and surprisingly,
the returned value is always 0x7ffff000 (fwiw, it happens to be the size
of the larger number of pages that fits a signed 32 bits integer).
We do not know whether this is a bug from the kernel, the libc or even
the man page, but for the time being, we do as is process_vm_readv() could
return any value.

Thanks Heiko Bauke for the bug report.

Refs. #4829

Signed-off-by: Gilles Gouaillardet [email protected]

(cherry picked from commit 9fedf28)

…te}v

Important note :

According to the man page
"On success, process_vm_readv() returns the number of bytes read and
process_vm_writev() returns the number of bytes written.  This return
value may be less than the total number of requested bytes, if a
partial read/write occurred.  (Partial transfers apply at the
granularity of iovec elements.  These system calls won't perform a
partial transfer that splits a single iovec element.)"

So since we use a single iovec element, the returned size should either
be 0 or size, and the do loop should not be needed here.
We tried on various Linux kernels with size > 2 GB, and surprisingly,
the returned value is always 0x7ffff000 (fwiw, it happens to be the size
of the larger number of pages that fits a signed 32 bits integer).
We do not know whether this is a bug from the kernel, the libc or even
the man page, but for the time being, we do as is process_vm_readv() could
return any value.

Thanks Heiko Bauke for the bug report.

Refs. open-mpi#4829

Signed-off-by: Gilles Gouaillardet <[email protected]>

(cherry picked from commit open-mpi/ompi@9fedf28)
@hppritcha hppritcha modified the milestones: v2.1.3, v2.1.4 Mar 15, 2018
@hppritcha hppritcha merged commit 7394ddd into open-mpi:v2.x Mar 26, 2018
@hppritcha hppritcha added the NEWS label Mar 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants