parallel performance with OpenMP #2
I think there are several factors at play here.

Note that when you use the MPI+OpenMP combo, the MPI ranks and the OpenMP threads have to share the same set of cores, so it is easy to oversubscribe the machine. What I suggest is to first test the MPI implementation using the default build steps, and then build and test the OpenMP implementation separately, trying different numbers of ranks and threads. When running the OpenMP build, make sure to set the number of threads (for example via OMP_NUM_THREADS).

Here are some test runs to check the scalability of the MPI version first (the first build), with the following change to the test input:

-8{0 1 2 3 4 5 6 7} linear 3{20 20 1}
+8{0 1 2 3 4 5 6 7} linear 3{100 100 1}

First, run the case with a single MPI rank.
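A minimal sketch of how the single-rank and multi-rank runs might be launched, assuming the usual mpirun launcher; the executable name ./solver and the input file cavity-amr.inp are placeholders, not the project's actual names:

```sh
# baseline: a single MPI rank
mpirun -np 1 ./solver cavity-amr.inp

# the same case on more ranks, e.g. 2, to measure MPI scaling
mpirun -np 2 ./solver cavity-amr.inp
```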
This takes about 9073 milliseconds.
The multi-rank MPI run takes about 4672 milliseconds, for a speedup of roughly 1.9x. Let's do the same with the OpenMP implementation; make sure to use the OpenMP-enabled build.
Then we set the number of OpenMP threads to 2, and we can run the test case with a single MPI rank.
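For example, with the same placeholder names as above:

```sh
# OpenMP build: 2 threads on a single MPI rank
export OMP_NUM_THREADS=2
mpirun -np 1 ./solver cavity-amr.inp
```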
It took about 6739 milliseconds. That is slower than the MPI implementation but still faster than the serial run, with a speedup of roughly 1.35x over the single-rank baseline. I hope this helps. Please let me know if you encounter additional issues. Daniel
@capitalaslash I have now added basically what I described above in the README file so that others will not be confused by it. Thank you!
OK, so I was already running with OpenMP on all my cores, and oversubscribing with MPI led to the degraded performance.
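As a rule of thumb, the product of MPI ranks and OpenMP threads per rank should not exceed the number of physical cores. A hedged sketch, assuming an 8-core machine and the same placeholder executable/input names as above:

```sh
nproc    # check how many cores are available, e.g. 8

# pure MPI: 8 ranks x 1 thread each
export OMP_NUM_THREADS=1
mpirun -np 8 ./solver cavity-amr.inp

# hybrid: 2 ranks x 4 threads each, still 8 cores in total
export OMP_NUM_THREADS=4
mpirun -np 2 ./solver cavity-amr.inp
```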
Describe the bug
Parallel performance with OpenMP is severely degraded wrt serial.
To Reproduce
Steps to reproduce the behavior:
I run the same test case (cavity-amr) in serial and in parallel. I configured the build with OpenMP enabled. I tested the parallel performance with several applications and setups to try to understand why the parallel runs are so much slower, but to no avail.
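For reference, a minimal way to time the serial and parallel runs side by side might look like this; the executable and input names are placeholders and the thread/rank counts are only examples:

```sh
# serial (single-process) run
time ./solver cavity-amr.inp

# parallel run of the same case
export OMP_NUM_THREADS=2
time mpirun -np 4 ./solver cavity-amr.inp
```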
Do you have any idea why this happens?
Should I try to activate USE_ACC in the configuration?
Expected behavior
At least similar performance, if not an improvement in computational time.
Additional context
None