
parallelisation problem on CPUs #342

Open
slevinskygra opened this issue Jan 14, 2025 · 6 comments
@slevinskygra
Contributor

I have been using a Ludwig version from a couple of years ago, more precisely its electrokinetic implementation. I was running it on a supercomputer and it was pretty fast: using 40 cores it could easily reach 4 million steps. I decided to try the latest version, and apparently it runs an order of magnitude slower; it can barely make it to 400,000 steps. I'm not sure what the reason is. I attach the first lines of the output with the parallelisation info for both the old version and the new one. In the old version there is no target thread model. In the latest version, using the same Slurm script, the target thread model is OpenMP and the number of threads is 21. I'm also using the standard parallel build in both cases.

Any idea of why this might be happening?

PS: In the latest version I'm using a different GNU compiler version so that it runs with the PETSc library. But this is not the cause, since I also checked that running with the previous compiler gave the same performance.

OLD VERSION

Welcome to: Ludwig v0.19.1 (MPI version running on 40 processes)

Start time: Thu Jan 9 18:32:16 2025

Compiler:
name: Gnu 11.2.0
version-string: 11.2.0
options: -O3 -g

Note assertions via standard C assert() are on.

Target thread model: None.

Read 37 user parameters from input

System details

System size: 80 80 2
Decomposition: 20 2 1
Local domain: 4 40 2
Periodic: 1 1 1
Halo nhalo: 1
Reorder: true
Initialised: 1

LAST LUDWIG VERSION

Compiler:
name: Gnu 10.4.0
version-string: 10.4.0
options: -O3 -g -Wall

Note assertions via standard C assert() are on.

Target thread model: OpenMP.
OpenMP threads: 40; maximum number of threads: 21.

Read 36 user parameters from input

System details

System size: 80 80 2
Decomposition: 20 2 1
Local domain: 4 40 2
Periodic: 1 1 1
Halo nhalo: 1
Reorder: true
Initialised: 1

@ohenrich
Collaborator

I would say there is probably something wrong with how the threads land. I'm not sure the version of PETSc supports hybrid MPI/OpenMP parallelisation. Can you try to run on 40 MPI tasks with 1 thread each?

@slevinskygra
Contributor Author

slevinskygra commented Jan 14, 2025

Thanks for your quick response! I hadn't used OpenMP extensively, so I was a little bit lost. If I understood correctly, the way to get 40 MPI tasks with 1 thread each is to set the environment variable

export OMP_NUM_THREADS=1

which I now define in my script, and this indeed worked for me. I sent a simulation 15 minutes ago and it is already at step 150000. Thank you very much!
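For reference, a minimal sketch of how the relevant part of such a Slurm script might look for a pure-MPI run; the executable path, input file name and node layout are assumptions, and facility-specific directives such as partition or account are omitted:

    #!/bin/bash
    #SBATCH --nodes=1              # assumes a 40-core node
    #SBATCH --ntasks-per-node=40   # 40 MPI tasks
    #SBATCH --cpus-per-task=1      # one core per task

    export OMP_NUM_THREADS=1       # one OpenMP thread per MPI task

    srun ./src/Ludwig.exe input    # executable/input names are assumptions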

@ohenrich
Collaborator

Yes, I think this is how you set it. But it might be different on your HPC facility. Have a look at their documentation and what they say about threaded parallelisation and hybrid jobs.

@kevinstratford
Collaborator

A couple of points here:

  1. I recommend using MPI only for electrokinetics (as in your "old" version). Most of the time goes into the Poisson solve, for which there is no threaded version at the moment, so you will effectively be running in serial. (40 OpenMP threads with a maximum of 21 also looks a bit odd to me, but I'm not sure what you're running on.)
  2. If you care about performance, please compile with -DNDEBUG to get rid of the assertions, as suggested at https://ludwig.epcc.ed.ac.uk/building/index.html#preprocessor-options (see the build sketch after this comment).

Point (1) would explain the large observed difference in rate of progress.
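As an illustration of point (2), a minimal sketch of how the compiler options might look in the build configuration, assuming the usual config.mk at the top of the source tree and typical variable names (check the linked build page for the exact layout of your release):

    CC     = mpicc          # MPI compiler wrapper (assumption)
    CFLAGS = -O3 -DNDEBUG   # -DNDEBUG compiles out the assert() checks

After rebuilding with this, the run-time output should no longer report that assertions are on.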

@slevinskygra
Contributor Author

Hi Kevin,

Indeed, in my previous script I wasn't setting the number of OpenMP threads, so the "40 OpenMP threads" was a default, and this caused the simulation to effectively run in serial. Now that I set it to 1, I get "1 OpenMP threads with 21 maximum" and it performs as expected.

I also found a small issue. I'm using point charges to set a charge pattern in a solid. In psi_init.c, in the psi_init_sigma routine (line 244), if sigma < 0 the sign of rho passed to psi_rho_set should be changed, i.e. psi_rho_set(psi, index, 1, -sigma). Otherwise the free energy density calculation triggers an assertion and fed diverges (logarithm of a negative number).
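A minimal sketch of the suggested change; only the negative-sigma branch follows the report above, while the positive branch and its species index are assumptions:

    /* In psi_init_sigma() (psi_init.c): keep the charge density passed
     * to psi_rho_set() positive for either sign of sigma. */
    if (sigma < 0.0) {
      /* negative surface charge: assign the magnitude to species 1 */
      psi_rho_set(psi, index, 1, -sigma);
    }
    else {
      /* positive surface charge: species 0 (assumption) */
      psi_rho_set(psi, index, 0, sigma);
    }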

@kevinstratford
Collaborator

Yes, that looks like a bug, as the charge densities must be positive.
