Tuned bcast data corruption #1763
Comments
I confirm this is a known issue.
Indeed, the problem is that despite the fact that all processes must provide the same type signature, they can provide different counts, and the collective algorithm selection will be unable to cut the datatype at the expected boundary. Thus, flattening the datatype will generate a packed type that matches the user-provided type on all processes, a type that can easily be split into multiple fragments. Another possible approach would be to allow the collective to call into the PML at a lower level, where it can provide the convertor and the length to be sent.
I like the second option better. Consider the following academic case:
The only way I can think of to send the data in several pieces is to first copy it into a buffer (flatten the datatype). So let's enhance the PML framework to basically:

Makes sense?
I like the second suggestion better as well. This will require an addition to the PML send interface; let's discuss what that might look like. Would an interface where we pass a convertor to the PML be sufficient?
@ggouaillardet I don't think we can make this work the way you describe. My suggestion was that instead of the usual (pointer, datatype, count) we provide a well-formed convertor and a length to the isend/irecv calls. This means we maintain the same semantics as today (including per-message matching), but we gain more flexibility on what part of the user data becomes a message by itself. @ggouaillardet the academic case is trivial. The 2 datatypes have exactly the same type signature, so once flattened the 2 datatypes should be identical. In the worst case the flattened datatype will lack a high-level count, but because the signature is the same, the current collective algorithms will work.
@bosilca OK for the PML extension you suggested. Back to the academic case: this is something I never understood, but I'd like to understand it at some point. Let's create the

and let's have one task call

and another one call

and we want to split the message into two (two-byte) chunks. That is trivial if

ends up doing

but that requires extra memory and a local send/recv, which does impact performance (and sometimes pretty badly). Is there a way (and if yes, which one) to use flattened datatypes without requiring extra memory and a local send/recv?
I made #1789 as a proof of concept. Any thoughts?
@bosilca You mentioned (in person) that there is an MCA parameter to route around this issue. What was that parameter?
@jjhursey I said we should provide one ;) Right now you can use the tuned collective dynamic configuration to change the decision function and prevent any pipelined function from being called. We might want to create a simpler way by taking advantage of the PVARs.
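As a CLI fragment, steering the tuned component away from the pipelined algorithms might look like the following. The parameter names come from the tuned component's MCA interface, but the exact names and algorithm numbering should be verified against your build with `ompi_info --all` (on this numbering, 1 is the basic linear bcast):

```shell
# Force the tuned component to use a fixed, non-pipelined bcast algorithm
mpirun --mca coll_tuned_use_dynamic_rules 1 \
       --mca coll_tuned_bcast_algorithm 1 \
       -np 4 ./my_app
```

This is a blunt workaround, not a fix: it avoids the boundary-mismatch corruption by never fragmenting, at the cost of the pipelined algorithms' performance.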
* If (legal) non-uniform data type signatures are used in ibcast then the chosen algorithm may fail on the request, and worst case it could produce wrong answers.
* Add an MCA parameter that, by default, protects the user from this scenario. If the user really wants to use it then they have to 'opt-in' by setting the following parameter to false:
  - `-mca coll_libnbc_ibcast_skip_dt_decision f`
* Once the following Issues are resolved then this parameter can be removed.
  - open-mpi#2256
  - open-mpi#1763
* Fixes STG Defect #114680

Signed-off-by: Joshua Hursey <[email protected]>
For anyone fixing this issue, make sure to also fix it in
Per discussion on the 2017-02-21 call, removed the blocker label and moved to v3.0.0.
Doesn't look like this is going to make it into 3.0.0. Should we move to future?
👍
@nysal @bosilca @ggouaillardet @jjhursey Did this get fixed after v3.0.x? Regardless, unless the fix is trivial, it's unlikely to be fixed in v3.0.x or v3.1.x. Should this be moved to v4.0.x or closed?
The underlying cause of this issue cannot be fixed efficiently in the current OMPI infrastructure. @ggouaillardet created a proof-of-concept PR, but due to time constraints it never got completed, which means we are stuck with a suboptimal solution: disabling all tuned decisions for collectives where the user provides different counts on different nodes (implemented by @jjhursey for libnbc).
Ah, that's #1789? Looks like that was closed with no time to work on it. OK, let's leave this issue open, but I'll remove the milestone targets.
There are multiple issues with bcast algorithms when the tasks involved specify different counts and datatypes for a given bcast operation.
One low-hanging fix is:
I'm seeing issues with both the pipelined and split binary tree algorithms. This is probably a known issue with pipelined algorithms in the tuned module. We should probably start discussing the right approach to solving this.
Here's a reproducer: