Parallelisation problem on CPUs #342
Comments
I would say there is probably something wrong with how the threads land. I'm not sure the version of PETSc supports hybrid MPI/OpenMP parallelisation. Can you try running on 40 MPI tasks with 1 thread each?
Thanks for your quick response! I hadn't used OpenMP extensively, so I was a little bit lost. If I understood correctly, the way to get 40 MPI tasks with 1 thread each was to set the environment variable export OMP_NUM_THREADS=1, which I now define in my script.
Yes, I think this is how you set it. But it might be different on your HPC facility. Have a look at their documentation and what they say about threaded parallelisation and hybrid jobs.
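For reference, a quick way to confirm what the batch environment actually hands out is a tiny standalone hybrid test program, compiled and launched with the same settings as the real job. This is only a sketch and not part of Ludwig; the file name, compile line and launcher options are assumptions that will vary by system.

```c
/* Minimal hybrid MPI/OpenMP check (sketch, not part of Ludwig).
 * Compile and run with the same batch settings as the real job, e.g.
 *   mpicc -fopenmp check_hybrid.c -o check_hybrid
 *   OMP_NUM_THREADS=1 mpirun -np 40 ./check_hybrid
 * Rank 0 reports how many OpenMP threads each rank would use, which
 * makes oversubscription (ranks x threads > cores) easy to spot. */

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char ** argv) {

  int rank = 0;
  int size = 0;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  if (rank == 0) {
    printf("MPI ranks: %d, OpenMP max threads per rank: %d\n",
           size, omp_get_max_threads());
  }

  MPI_Finalize();

  return 0;
}
```

If this reports 40 ranks with 1 thread per rank, a Ludwig run under the same settings should not oversubscribe the node.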
A couple of points here:
Point (1) would explain the large observed difference in the rate of progress.
Hi Kevin, indeed, in my previous script I wasn't setting the number of OpenMP threads, so the "40 OpenMP threads" was a default. This caused the simulation to effectively run in serial. Now that I set it to 1, I get "1 OpenMP threads with 21 maximum", and it performs as expected. I also found a small issue: I'm using point charges to set a charge pattern in a solid, and in psi_init.c, in the psi_init_sigma subroutine, the charge densities can end up negative.
Yes, that looks like a bug, as the charge densities must be positive.
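To illustrate the constraint being discussed: the ionic charge densities stored on the lattice must each be non-negative, with the sign of a point charge carried by which species is populated. The sketch below is purely hypothetical; it is not the psi_init_sigma() implementation, and the function and variable names are made up.

```c
/* Hypothetical sketch only -- not the Ludwig psi_init_sigma() code.
 * It illustrates the kind of sanity rule implied above: when a point
 * charge q is assigned to a site, the stored densities rho_plus and
 * rho_minus must each stay >= 0; a negative q populates the negative
 * species with magnitude |q| rather than storing a negative density. */

#include <assert.h>
#include <stdio.h>

static void assign_point_charge(double q, double * rho_plus, double * rho_minus) {

  assert(rho_plus && rho_minus);

  if (q >= 0.0) {
    *rho_plus  = q;
    *rho_minus = 0.0;
  }
  else {
    *rho_plus  = 0.0;
    *rho_minus = -q;   /* magnitude only: densities remain positive */
  }
}

int main(void) {

  double rp = 0.0, rm = 0.0;

  assign_point_charge(-0.5, &rp, &rm);
  printf("rho_plus = %f rho_minus = %f\n", rp, rm);

  return 0;
}
```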
I have been using a Ludwig version from a couple of years ago, more precisely the electrokinetics implementation. I was running it on a supercomputer and it was pretty fast: using 40 cores it could easily reach 4 million steps. I decided to try the latest version, and apparently it runs an order of magnitude slower; I can barely make it to 400,000 steps. I'm not sure what the reason is. I attach the first lines of the output, with the parallelisation info, for both the old version and the new one. In the old version there is no target thread model. In the latest version, using the same Slurm script, the target thread model is OpenMP and the number of threads is 21. I'm also using the standard parallel build in both cases.
Any idea why this might be happening?
PS: In the latest version I'm using a different GNU compiler version to build it with the PETSc library. But that is not the cause, since I also checked that running with the previous compiler gives the same performance.
OLD VERSION
Welcome to: Ludwig v0.19.1 (MPI version running on 40 processes)
Start time: Thu Jan 9 18:32:16 2025
Compiler:
name: Gnu 11.2.0
version-string: 11.2.0
options: -O3 -g
Note assertions via standard C assert() are on.
Target thread model: None.
Read 37 user parameters from input
System details
System size: 80 80 2
Decomposition: 20 2 1
Local domain: 4 40 2
Periodic: 1 1 1
Halo nhalo: 1
Reorder: true
Initialised: 1
LATEST LUDWIG VERSION
Compiler:
name: Gnu 10.4.0
version-string: 10.4.0
options: -O3 -g -Wall
Note assertions via standard C assert() are on.
Target thread model: OpenMP.
OpenMP threads: 40; maximum number of threads: 21.
Read 36 user parameters from input
System details
System size: 80 80 2
Decomposition: 20 2 1
Local domain: 4 40 2
Periodic: 1 1 1
Halo nhalo: 1
Reorder: true
Initialised: 1
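As a side note on reading the output above, the "Local domain" line is simply the system size divided by the process decomposition, per direction: 80 x 80 x 2 split over a 20 x 2 x 1 grid of MPI tasks gives 4 x 40 x 2 sites per rank. A trivial sketch of that bookkeeping (not Ludwig code; the names are made up):

```c
/* Trivial sketch (not Ludwig code): relate the "System size",
 * "Decomposition" and "Local domain" lines in the output above.
 * Local domain = system size / decomposition, per direction. */

#include <stdio.h>

int main(void) {

  int ntotal[3] = {80, 80, 2};   /* System size   (from the output) */
  int decomp[3] = {20,  2, 1};   /* Decomposition (from the output) */

  for (int i = 0; i < 3; i++) {
    printf("direction %d: local domain = %d\n", i, ntotal[i] / decomp[i]);
  }

  return 0;
}
```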