-
-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relocation variants #6364
Relocation variants #6364
Conversation
2f6aedd
to
087b01b
Compare
libs/core/type_support/include/hpx/type_support/uninitialized_relocate_n_primitive.hpp
Outdated
Show resolved
Hide resolved
087b01b
to
3916299
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. I didn't look closely from the point of my last comment onward: reviewer silence doesn't necessarily mean endorsement. :)
libs/core/algorithms/include/hpx/parallel/algorithms/uninitialized_relocate.hpp
Outdated
Show resolved
Hide resolved
libs/core/algorithms/include/hpx/parallel/algorithms/uninitialized_relocate.hpp
Show resolved
Hide resolved
auto dest_first = std::prev(dest_last, count); | ||
|
||
return parallel_uninitialized_relocate_n( | ||
HPX_FORWARD(ExPolicy, policy), first, count, dest_first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether this is OK depends on the preconditions of your parallel
algorithms. I expect that this is not OK. The important thing IMO is that we want this to work:
Widget a[10] = {1,2,3,4,5,6,7,8,9,destroyed};
std::uninitialized_relocate_backward(a, a+9, a+10);
assert(a == {destroyed,1,2,3,4,5,6,7,8,9});
(modulo syntax errors and language-lawyering over the lifetimes of those Widget objects). If you copy them in the forward direction, the output will be something like
assert(a == {destroyed,1,1,1,1,1,1,1,1,1});
So the idea of having uninitialized_relocate_backward
dispatch to uninitialized_relocate_n
smells wrong to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct in pointing this out, however I should look into how (if at all) parallelizable this would be, while retaining the correct ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naïvely, it's not parallelizable at all unless you can prove (by pointer comparisons) that the two ranges don't overlap. If they don't overlap, then this implementation is fine. It occurs to me that HPX already has a std::copy{,_backward}
implementation; you should look at what it does, and copy that strategy.
(It might be that HPX's std::copy{,_backward}
also completely ignore the overlapping issue. If so, then I think it would be perfectly fine for you to copy that ignorance here. uninitialized_relocate_backward
is not intended to be any more difficult to implement than copy_backward
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #6364 (comment):
In standard C++20 we have
- the non-parallel std::copy_backward, which forbids overlap
- the ExecutionPolicy overload of std::copy_backward, which also forbids overlap
and thus {parallel_,}vector::erase fundamentally isn't allowed to use std::copy_backward.
D1144R10 currently proposes
- a non-parallel std::uninitialized_relocate_backward which permits overlap
- an ExecutionPolicy overload of std::uninitialized_relocate_backward, which also permits overlap
And noting that:
uninitialized_relocate_backward is not intended to be any more difficult to implement than copy_backward.
Are you proposing permitting overlapping ranges with parallel copy
, or breaking the symmetry between copy
and relocation
?
I can experiment with implementing and benchmarking a parallel relocation algorithm for overlapping ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm proposing breaking the symmetry between copy
and uninitialized_relocate
. (But I admit that's bad; and I also admit I still don't know why copy
forbids overlap in the first place.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, for now I will structure the code in a way to accept overlaps from the expected direction in each of the forward/backward algorithms.
Do you think they should permit overlaps from both sides?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view is:
uninitialized_relocate{,_n}
should permit "shift left" overlap, as needed byvector::erase
.uninitialized_relocate_backward
should permit "shift right" overlap, as needed byvector::insert
.
My proposed wording reflects this since P1144R6: the behavior of uninitialized_relocate
is "as if" by a plain old for-loop relocating each element in sequence from first to last, left to right.
libs/core/algorithms/include/hpx/parallel/algorithms/uninitialized_relocate.hpp
Outdated
Show resolved
Hide resolved
libs/core/algorithms/include/hpx/parallel/algorithms/uninitialized_relocate.hpp
Outdated
Show resolved
Hide resolved
libs/core/algorithms/include/hpx/parallel/algorithms/uninitialized_relocate.hpp
Outdated
Show resolved
Hide resolved
// if count is representing a negative value, we do nothing | ||
if (hpx::parallel::detail::is_negative(count)) | ||
{ | ||
return parallel::util::detail::algorithm_result<ExPolicy, | ||
BiIter2>::get(HPX_MOVE(dest_last)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be "library UB" according to P1144. You're welcome to treat it as a no-op (documented or undocumented), but it might be more appropriate to assert-fail or something, I don't know.
libs/core/algorithms/tests/unit/algorithms/uninitialized_relocate_backward.cpp
Outdated
Show resolved
Hide resolved
libs/core/algorithms/tests/unit/algorithms/uninitialized_relocate_backward.cpp
Outdated
Show resolved
Hide resolved
libs/core/algorithms/tests/unit/algorithms/uninitialized_relocate_backward.cpp
Outdated
Show resolved
Hide resolved
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
0d18168
to
bf8425d
Compare
@isidorostsa could you please rebase this and resolve the conflicts while you do? |
bf8425d
to
74155d9
Compare
@hkaiser, I rebased and resolved the conflicts. Before we finalize the merge, there are some design choices I'd like to discuss:
void foo() noexcept(!(mode == throwing)){
if constexpr(mode == throwing) { throw; }
} |
Yes, sounds good.
Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should strat cheating here ;-)
I can take care of those. |
@hkaiser In case we want to preserve parallelization while not overwriting objects I have two techniques in mind: A. Execute the overlapping portion of the relocation sequentially and do what is left in parallel. However, I suspect that in most common use cases, the ranges will largely overlap, rendering the operation mostly sequential. So it may be more practical to opt for a purely sequential implementation. Is there an alternative solution that I'm overlooking? |
retest lsu |
74155d9
to
1b08b70
Compare
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferences |
@isidorostsa what's the status for this PR? |
I think last time I pushed the hpx tests were broken, which is why we have not merged yet But locally the tests related to this are passing. |
The broken tests are unrelated. So this is good to go, then? |
@hkaiser |
P1144R9 is out already, so we do need to rush merging this! |
961680e
to
0f91e09
Compare
Coverage summary from CodacySee diff coverage on Codacy
Coverage variation details
Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: Diff coverage details
Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: See your quality gate settings Change summary preferencesYou may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation |
"Relocating from this source type to this destination type is " | ||
"ill-formed"); | ||
|
||
auto count = std::distance(first, last); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hkaiser @Pansysk75 @gonidelis Considering #6364 (comment), it may be sensible to include a warning here such as:
"Calling uninitialized_relocate_backward
with a non-sequential execution policy does not guarantee the order of execution of the relocations to be backward. Use on overlapping ranges could yield undefined behavior."
However, I am not sure how we would issue such a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to do that. The same warning applies to std::copy
and std::move
. There, it is up to the user to ensure these preconditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to do that. [...] it is up to the user to ensure these preconditions.
Does that contradict your earlier #6364 (comment) ?
Using a backwards algorithm usually has a reason, e.g., you know that doing an inplace operation would overwrite values before they can be used if the operation is done in a certain way. I don't think we should start cheating here ;-)
Either "no overlap" is a precondition that you don't need to check, or else we expect overlap to be the common case for the backwards algorithm (because doing it forwards would overwrite values in the overlapping portion). But we can't have both!
FYI, I'm very interested in feedback into the design of these algorithms in P1144.
In standard C++20 we have
- the non-parallel
std::copy_backward
, which forbids overlap - the
ExecutionPolicy
overload ofstd::copy_backward
, which also forbids overlap
and thus {parallel_,}vector::erase
fundamentally isn't allowed to use std::copy_backward
.
D1144R10 currently proposes
- a non-parallel
std::uninitialized_relocate_backward
which permits overlap - an
ExecutionPolicy
overload ofstd::uninitialized_relocate_backward
, which also permits overlap
so that {parallel_,}vector::erase
will be allowed to use std::uninitialized_relocate_backward
(because I think that's an important feature). But this means inconsistency between copy_backward
(which forbids overlap) and std::uninitialized_relocate_backward
(which permits it); and also between std::uninitialized_relocate_backward
(which permits overlap) and hpx::experimental::uninitialized_relocate_backward
(which currently forbids it).
I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n
check for overlap (that is, check whether we're going to end up using memcpy
, so we surely have contiguous ranges, and then check whether those contiguous ranges overlap) and if so, then don't call util::partitioner_with_cleanup
— just do the whole range sequentially-not-in-parallel (or, for trivially relocatable types, use memmove
instead of memcpy
). (Or to put it a different way that implies more major surgery: implement a parallel_memmove
primitive and then make the ExecutionPolicy
overloads of uninitialized_relocate{,_n,_backward}
dispatch straight to parallel_memmove
. But then you get a parallel speedup only for trivially relocatable types, because for non-trivial types and/or non-pointer iterators you can't tell whether the ranges overlap or not? Yuck.)
OTOH, maybe you'd rather that P1144's std::uninitialized_relocate_backward
should forbid overlap for consistency with std::copy_backward
. But if so, then I'd like to know what's your plan for implementing things like {parallel_,}vector::erase
. I think the Standard Library needs some primitive algorithm that fits that use-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Quuxplusone Thank you for the detailed write-up, and sorry for my late response.
I think my previous comment: #6364 (comment), is incorrect and there does exist a way to parallelize the relocation of overlapping ranges, maybe like in the following image: (example for 2 threads and an overlapping offset of 1)
Supposing this is true, I see no reason to avoid offering this as a feature. Furthermore, I don't see why we would need non-overlapping ranges to be a precondition for relocation!
I think you could fix the latter inconsistency by having hpx::parallel::detail::parallel_uninitialized_relocate_n check for overlap
I think for the time being we will do this, only parallelizing the guaranteed safe case (contiguous iterator + no overlap), and have the rest be sequenced, but definitely add proper parallelization support on the todo list. @hkaiser What do you think?
maybe you'd rather that P1144's std::uninitialized_relocate_backward should forbid overlap .... If so, I'd like to know what's your plan for implementing things like {parallel_,}vector::erase
For the time being we do not implement parallel data structures, but there is a draft PR for using relocation inside hpx::detail::small_vector
(which is like static_vector
except it grows to dynamic storage after running out of static space) (isidorostsa#9).
0f91e09
to
84afafa
Compare
@hkaiser this is okay to merge if you agree with this |
@hkaiser I noticed that after the merge some tests are failing and I will get onto fixing that as soon as possible. |
@isidorostsa I'd like to revert this merge. Let's fix the issues with a new PR. Can you please do that? |
I take this back. The issues are unrelated to your changes. I will fix them separately. |
Thank you for letting me know. Unfortunately I did not get a chance to review it myself due to traveling for the visa. |
This PR introduces additional features related to relocation, aimed at both internal and public use. The specifics are:
The algorithm
uninitialized_relocate_backward
and its underlying lower-level primitives have been implemented. Corresponding tests for these algorithms have also been added.The function
uninitialized_relocate
has been refactored and it now uses a sentinel based for loop, instead of calculating the number of iterations and internally callinguninitialized_relocate_n
.The primitives of
uninitialized_relocate_backwards
will be utilized internally for thehpx::small_vector::insert()
method.