[BUG FIX] ILP64: use type(MPI_*) to avoid MPI type mismatch. #368

Closed
wants to merge 620 commits into from

fghoussen
Collaborator

Compiling with INTERFACE64 means you need control over integer types.
If MPI is also required, then MPI types must be consistent with ILP64:
to make sure of this, we must use the ISO_C_BINDING API provided by MPI.

Pull request purpose

May fix issue #348

@skriesch

I have tested the current state of this PR against the master branch. We now get the following error messages:


[   24s] /usr/bin/gfortran -DHAVE_MPI_ICB=1 -I/home/abuild/rpmbuild/BUILD/arpack-ng-master/build -I/home/abuild/rpmbuild/BUILD/arpack-ng-master -I/usr/lib64/mpi/gcc/openmpi2/include -I/usr/lib64/mpi/gcc/openmpi2/lib64 -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fPIC  -cpp -fdefault-integer-8 -O2 -g -DNDEBUG -c /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90 -o CMakeFiles/issue46.dir/PARPACK/TESTS/MPI/issue46.F90.o
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:74:57:
[   24s] 
[   24s]    74 |       parameter       (maxnloc=256, maxnev=10, maxncv=25,
[   24s]       |                                                         1
[   24s] Error: Expected variable name at (1) in PARAMETER statement
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:75:7:
[   24s] 
[   24s]    75 |      &                 ldv=maxnloc )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:81:22:
[   24s] 
[   24s]    81 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:82:7:
[   24s] 
[   24s]    82 |      &                 v(ldv,maxncv), workl(maxncv*(maxncv+8)),
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:83:7:
[   24s] 
[   24s]    83 |      &                 workd(3*maxnloc), d(maxncv,2), resid(maxnloc),
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:84:7:
[   24s] 
[   24s]    84 |      &                 ax(maxnloc)
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:97:7:
[   24s] 
[   24s]    97 |      &                 nconv, maxitr, mode, ishfts
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:99:22:
[   24s] 
[   24s]    99 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:100:7:
[   24s] 
[   24s]   100 |      &                 tol, sigma
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:106:22:
[   24s] 
[   24s]   106 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:107:7:
[   24s] 
[   24s]   107 |      &                  mv_buf(maxnloc)
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:113:22:
[   24s] 
[   24s]   113 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:114:7:
[   24s] 
[   24s]   114 |      &                 zero
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:121:22:
[   24s] 
[   24s]   121 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:122:7:
[   24s] 
[   24s]   122 |      &                 pdnorm2
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:232:70:
[   24s] 
[   24s]   232 |          call pdsaupd ( comm, ido, bmat, nloc, which, nev, tol, resid,
[   24s]       |                                                                      1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:233:7:
[   24s] 
[   24s]   233 |      &                 ncv, v, ldv, iparam, ipntr, workd, workl,
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:234:7:
[   24s] 
[   24s]   234 |      &                 lworkl, info )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:248:45:
[   24s] 
[   24s]   248 |             call av ( comm, nloc, nx, mv_buf,
[   24s]       |                                             1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:249:7:
[   24s] 
[   24s]   249 |      &               workd(ipntr(1)), workd(ipntr(2)))
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:292:48:
[   24s] 
[   24s]   292 |          call pdseupd ( comm, rvec, 'A', select,
[   24s]       |                                                1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:293:7:
[   24s] 
[   24s]   293 |      &        d, v, ldv, sigma,
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:294:7:
[   24s] 
[   24s]   294 |      &        bmat, nloc, which, nev, tol, resid, ncv, v, ldv,
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:295:7:
[   24s] 
[   24s]   295 |      &        iparam, ipntr, workd, workl, lworkl, info )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:350:58:
[   24s] 
[   24s]   350 |              call pdmout(comm, 6, nconv, 2, d, maxncv, -6,
[   24s]       |                                                          1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:351:7:
[   24s] 
[   24s]   351 |      &            'Ritz values and direct residuals')
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:365:23:
[   24s] 
[   24s]   365 |             print *, ' No shifts could be applied during implicit
[   24s]       |                       1
[   24s] Error: Unterminated character constant beginning at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:366:7:
[   24s] 
[   24s]   366 |      &                 Arnoldi update, try increasing NCV.'
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:377:61:
[   24s] 
[   24s]   377 |          print *, ' The number of Arnoldi vectors generated',
[   24s]       |                                                             1
[   24s] Error: Expected expression in PRINT statement at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:378:7:
[   24s] 
[   24s]   378 |      &            ' (NCV) is ', ncv
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:380:61:
[   24s] 
[   24s]   380 |          print *, ' The number of converged Ritz values is ',
[   24s]       |                                                             1
[   24s] Error: Expected expression in PRINT statement at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:381:7:
[   24s] 
[   24s]   381 |      &              nconv
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:382:59:
[   24s] 
[   24s]   382 |          print *, ' The number of Implicit Arnoldi update',
[   24s]       |                                                           1
[   24s] Error: Expected expression in PRINT statement at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:383:7:
[   24s] 
[   24s]   383 |      &            ' iterations taken is ', iparam(3)
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:434:22:
[   24s] 
[   24s]   434 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:435:7:
[   24s] 
[   24s]   435 |      &                  v(nloc), w(nloc), mv_buf(nx), one
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:464:65:
[   24s] 
[   24s]   464 |          call mpi_send( v((np-1)*nx+1), nx, MPI_DOUBLE_PRECISION,
[   24s]       |                                                                 1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:465:7:
[   24s] 
[   24s]   465 |      &                  next, myid+1, comm, ierr )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:468:69:
[   24s] 
[   24s]   468 |          call mpi_recv( mv_buf, nx, MPI_DOUBLE_PRECISION, prev, myid,
[   24s]       |                                                                     1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:469:7:
[   24s] 
[   24s]   469 |      &                  comm, status, ierr )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:474:55:
[   24s] 
[   24s]   474 |          call mpi_send( v(1), nx, MPI_DOUBLE_PRECISION,
[   24s]       |                                                       1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:475:7:
[   24s] 
[   24s]   475 |      &                  prev, myid-1, comm, ierr )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:478:69:
[   24s] 
[   24s]   478 |          call mpi_recv( mv_buf, nx, MPI_DOUBLE_PRECISION, next, myid,
[   24s]       |                                                                     1
[   24s] Error: Syntax error in argument list at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:479:7:
[   24s] 
[   24s]   479 |      &                  comm, status, ierr )
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:489:22:
[   24s] 
[   24s]   489 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:490:7:
[   24s] 
[   24s]   490 |      &                  x(nx), y(nx), dd, dl, du
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:492:22:
[   24s] 
[   24s]   492 |       Double precision
[   24s]       |                      1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:493:7:
[   24s] 
[   24s]   493 |      &                 one
[   24s]       |       1
[   24s] Error: Invalid character in name at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:85:30:
[   24s] 
[   24s]    85 |       logical          select(maxncv)
[   24s]       |                              1
[   24s] Error: Variable 'maxncv' cannot appear in the expression at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:342:16:
[   24s] 
[   24s]   342 |                 d(j,2) = pdnorm2( comm, nloc, ax, 1 )
[   24s]       |                1
[   24s] Error: The function result on the lhs of the assignment at (1) must have the pointer attribute.
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:340:48:
[   24s] 
[   24s]   340 |                 call av(comm, nloc, nx, mv_buf, v(1,j), ax)
[   24s]       |                                                1
[   24s] Error: Expected a procedure for argument 'v' at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:340:56:
[   24s] 
[   24s]   340 |                 call av(comm, nloc, nx, mv_buf, v(1,j), ax)
[   24s]       |                                                        1
[   24s] Error: Expected a procedure for argument 'w' at (1)
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:470:31:
[   24s] 
[   24s]   341 |                 call daxpy(nloc, -d(j,1), v(1,j), 1, ax, 1)
[   24s]       |                                          2
[   24s] ......
[   24s]   470 |          call daxpy( nx, -one, mv_buf, 1, w(1), 1 )
[   24s]       |                               1
[   24s] Error: Type mismatch between actual argument at (1) and actual argument at (2) (INTEGER(8)/REAL(4)).
[   24s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:480:31:
[   24s] 
[   24s]   341 |                 call daxpy(nloc, -d(j,1), v(1,j), 1, ax, 1)
[   24s]       |                                          2
[   24s] ......
[   24s]   480 |          call daxpy( nx, -one, mv_buf, 1, w(lo+1), 1 )
[   24s]       |                               1
[   24s] Error: Type mismatch between actual argument at (1) and actual argument at (2) (INTEGER(8)/REAL(4)).
[   24s] make[2]: *** [CMakeFiles/issue46.dir/build.make:78: CMakeFiles/issue46.dir/PARPACK/TESTS/MPI/issue46.F90.o] Error 1
[   24s] make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/arpack-ng-master/build'
[   24s] make[1]: *** [CMakeFiles/Makefile2:311: CMakeFiles/issue46.dir/all] Error 2
[   24s] make: *** [Makefile:169: all] Error 2
[   24s] error: Bad exit status from /var/tmp/rpm-tmp.lMy12i (%build)

@fghoussen
Collaborator Author

@skriesch: there is no point in testing the patch as it's not working for now (red-ball CI)! You can test it when/if the patch is done (green ball): not sure we'll get there anyway. No guarantee: I'll do my best when/if I can get some time.

@fghoussen
Collaborator Author

All these crappy changes were just a no-op (they were only needed to move MPI samples from .f to .F90 to handle #ifdef).
@skriesch: now only the ilp64 build is broken, with the problem you may also be seeing in #348

mpif90 -DPACKAGE_NAME=\"ARPACK-NG\" -DPACKAGE_TARNAME=\"arpack-ng\" -DPACKAGE_VERSION=\"3.9.0\" -DPACKAGE_STRING=\"ARPACK-NG\ 3.9.0\" -DPACKAGE_BUGREPORT=\"https://github.com/opencollab/arpack-ng/issues/\" -DPACKAGE_URL=\"https://github.com/opencollab/arpack-ng/\" -DPACKAGE=\"arpack-ng\" -DVERSION=\"3.9.0\" -DHAVE_BLAS=1 -DHAVE_LAPACK=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_CXX11=1 -I.  -I../../..   -DMKL_ILP64 -I/usr/include/mkl -fdefault-integer-8 -DHAVE_MPI_ICB=1 -cpp -c -o issue46-issue46.o `test -f 'issue46.F90' || echo './'`issue46.F90
issue46.F90:458:50:
  458 |                         next, myid+1, comm, ierr )
      |                                                  1
Error: There is no specific subroutine for the generic ‘mpi_send’ at (1)
issue46.F90:462:44:
  462 |                         comm, status, ierr )
      |                                            1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
issue46.F90:468:50:
  468 |                         prev, myid-1, comm, ierr )
      |                                                  1
Error: There is no specific subroutine for the generic ‘mpi_send’ at (1)
issue46.F90:472:44:
  472 |                         comm, status, ierr )
      |                                            1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)

This problem is related to a type mismatch: the MPI API only supports integer*4 and its built-in types (type(MPI_Comm), type(MPI_Status) - this is why you need #ifdef). This problem doesn't show up on a regular distro (x86), but it's no real surprise that it could show up on exotic HW/distro (s390x).
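
To see the mismatch on a given toolchain, a minimal sketch can print the width of the default INTEGER next to a fixed 4-byte one. It assumes gfortran and its -fdefault-integer-8 flag (as in the build lines above) and an MPI library built with 4-byte Fortran integers; the program and file names are made up for illustration.

```fortran
! Hypothetical diagnostic, not part of arpack-ng.
! Try: gfortran default_kind.f90 && ./a.out
!      gfortran -fdefault-integer-8 default_kind.f90 && ./a.out
program default_kind
  use, intrinsic :: iso_fortran_env, only: int32
  implicit none
  integer        :: i_default   ! width follows the compiler's default INTEGER
  integer(int32) :: i_fixed     ! always 32 bits

  print *, 'default INTEGER bits:', storage_size(i_default)
  print *, 'int32 bits:          ', storage_size(i_fixed)

  ! Assumption: the MPI Fortran interface was built with 32-bit INTEGER.
  if (kind(i_default) /= kind(i_fixed)) then
    print *, 'default INTEGER is not 32-bit: integers passed to MPI need an explicit kind'
  else
    print *, 'default INTEGER is 32-bit'
  end if
end program default_kind
```

With the flag, any plain `integer` argument no longer matches the specific interfaces behind the generic `mpi_send` / `mpi_recv`, which is consistent with the "no specific subroutine" errors above.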

Hope the next commit will be enough to fix this... Otherwise, I am not sure I have a solution here...

@fghoussen
Collaborator Author

Seems fixed for recv... but not for send...

mpif90 -DPACKAGE_NAME=\"ARPACK-NG\" -DPACKAGE_TARNAME=\"arpack-ng\" -DPACKAGE_VERSION=\"3.9.0\" -DPACKAGE_STRING=\"ARPACK-NG\ 3.9.0\" -DPACKAGE_BUGREPORT=\"https://github.com/opencollab/arpack-ng/issues/\" -DPACKAGE_URL=\"https://github.com/opencollab/arpack-ng/\" -DPACKAGE=\"arpack-ng\" -DVERSION=\"3.9.0\" -DHAVE_BLAS=1 -DHAVE_LAPACK=1 -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_MPI=1 -DHAVE_CXX11=1 -I.  -I../../..   -DMKL_ILP64 -I/usr/include/mkl -fdefault-integer-8 -DHAVE_MPI_ICB=1 -cpp -c -o issue46-issue46.o `test -f 'issue46.F90' || echo './'`issue46.F90
issue46.F90:458:50:
  458 |                         next, myid+1, comm, ierr )
      |                                                  1
Error: There is no specific subroutine for the generic ‘mpi_send’ at (1)
issue46.F90:468:50:
  468 |                         prev, myid-1, comm, ierr )
      |                                                  1
Error: There is no specific subroutine for the generic ‘mpi_send’ at (1)

@fghoussen
Collaborator Author

fghoussen commented Jul 14, 2022

@skriesch: you should experience the exact same problem on s390x. Can you check and report here?
If you find a fix on s390, it would be good to know (push on top of this branch)... Otherwise this could well be a dead-end (no clue what to do next...).

@skriesch

Nice. We are coming closer. :)
Yes, s390x gets the following error:

[  290s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:459:50:
[  290s] 
[  290s]   459 |                         next, myid+1, comm, ierr )
[  290s]       |                                                  1
[  290s] Error: There is no specific subroutine for the generic 'mpi_send' at (1)
[  290s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:469:50:
[  290s] 
[  290s]   469 |                         prev, myid-1, comm, ierr )
[  290s]       |                                                  1
[  290s] Error: There is no specific subroutine for the generic 'mpi_send' at (1)
[  290s] make[2]: *** [CMakeFiles/issue46.dir/build.make:78: CMakeFiles/issue46.dir/PARPACK/TESTS/MPI/issue46.F90.o] Error 1

A note on the other architectures: PowerPC Little Endian (ppc64le) and aarch64 (arm) now show more issues:

integer-8 -O2 -g -DNDEBUG -c /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90 -o CMakeFiles/issue46.dir/PARPACK/TESTS/MPI/issue46.F90.o
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:311:15:
[  103s] 
[  103s]   311 |                 print *, ' '
[  103s]       |                 1
[  103s] Warning: Nonconforming tab character at (1) [-Wtabs]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:312:15:
[  103s] 
[  103s]   312 |                 print *, ' Error with _seupd, info = ', info
[  103s]       |                 1
[  103s] Warning: Nonconforming tab character at (1) [-Wtabs]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:313:15:
[  103s] 
[  103s]   313 |                 print *, ' Check the documentation of _seupd. '
[  103s]       |                 1
[  103s] Warning: Nonconforming tab character at (1) [-Wtabs]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:314:15:
[  103s] 
[  103s]   314 |                 print *, ' '
[  103s]       |                 1
[  103s] Warning: Nonconforming tab character at (1) [-Wtabs]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:442:17:
[  103s] 
[  103s]   442 |             lo = (j-1)*nx
[  103s]       |                 1
[  103s] Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1) [-Wconversion]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:441:21:
[  103s] 
[  103s]   441 |          do 10 j = 2, np-1
[  103s]       |                     1
[  103s] Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1) [-Wconversion]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:450:14:
[  103s] 
[  103s]   450 |          lo = (np-1)*nx
[  103s]       |              1
[  103s] Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1) [-Wconversion]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:455:13:
[  103s] 
[  103s]   455 |       next = myid + 1
[  103s]       |             1
[  103s] Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1) [-Wconversion]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:456:13:
[  103s] 
[  103s]   456 |       prev = myid - 1
[  103s]       |             1
[  103s] Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1) [-Wconversion]
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:459:50:
[  103s] 
[  103s]   459 |                         next, myid+1, comm, ierr )
[  103s]       |                                                  1
[  103s] Error: There is no specific subroutine for the generic 'mpi_send' at (1)
[  103s] /home/abuild/rpmbuild/BUILD/arpack-ng-master/PARPACK/TESTS/MPI/issue46.F90:469:50:
[  103s] 
[  103s]   469 |                         prev, myid-1, comm, ierr )
[  103s]       |                                                  1
[  103s] Error: There is no specific subroutine for the generic 'mpi_send' at (1)
[  103s] make[2]: *** [CMakeFiles/issue46.dir/build.make:78: CMakeFiles/issue46.dir/PARPACK/TESTS/MPI/issue46.F90.o] Error 1
[  103s] make[2]: Leaving directory '/home/abuild/rpmbuild/BUILD/arpack-ng-master/build'
[  103s] make[1]: *** [CMakeFiles/Makefile2:311: CMakeFiles/issue46.dir/all] Error 2
[  103s] make: *** [Makefile:169: all] Error 2

@fghoussen
Collaborator Author

fghoussen commented Jul 14, 2022

> We are coming closer. :)

Not so sure about that. I really have no clue left: trying one last idea (d8e3820 - not sure at all this can fix the problem)... But I don't understand why there is a problem with the mpi_send signature (pretty close to mpi_recv, which seems to be OK). If you find a fix on s390, it would be good to know.

@fghoussen
Collaborator Author

Didn't expect that, but d8e3820 does indeed fix the ILP64 build for issue46.F90. However, the issue46 test fails now

make[4]: Entering directory '/home/runner/work/arpack-ng/arpack-ng/PARPACK/TESTS/MPI'
FAIL: issue46
PASS: icb_parpack_c
PASS: icb_parpack_cpp

@sylvestre: do you remember what issue46 is about? Is it worth running on ILP64? (not part of the original MPI tests, right?)

@fghoussen
Collaborator Author

@skriesch: can you report the traces of issue46 on s390 here? Revert 04f6580 for that. Traces may help to understand the problem

@fghoussen
Collaborator Author

@skriesch: the problem is unfortunately a lot deeper than expected... You should be able to compile, but many tests should fail (crash or deadlock). I'll try to have a look at that when possible

@fghoussen
Collaborator Author

fghoussen commented Jul 15, 2022

This sounds like a dead-end (unfortunately).

I looked at the impact: it is huge. I don't see anybody who could take it on... So I guess this is the end of the road.

ICB (ISO_C_BINDING) is an evolution of the Fortran standard (introduced in Fortran 2003) that is meant to type variables (with F77, typing variables is not mandatory - which is why Fortran may be so buggy). With ICB, interaction with other languages (like C/C++) is greatly facilitated (as the compiler knows variable types). I added ICB to arpack a long time ago (a hard, long, boring job over years - this means arpack can be easily called/used from C/C++ now), but for parpack it looks like it's even harder/longer... And certainly too much work for one man.
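
As a rough illustration of what ISO_C_BINDING brings, here is a small self-contained sketch. The routine name `scale_vector` and the module name are hypothetical and are not part of arpack-ng; the point is only that the argument kinds are taken from the C types instead of the Fortran default.

```fortran
! Hypothetical example: scale_vector is not an arpack-ng routine.
module icb_demo
  use, intrinsic :: iso_c_binding, only: c_int, c_double
  implicit none
contains
  ! Callable from C as: void scale_vector(int n, double *x, double alpha);
  subroutine scale_vector(n, x, alpha) bind(C, name='scale_vector')
    integer(c_int), value, intent(in) :: n
    real(c_double), intent(inout)     :: x(n)
    real(c_double), value, intent(in) :: alpha
    x = alpha * x
  end subroutine scale_vector
end module icb_demo

program call_scale_vector
  use, intrinsic :: iso_c_binding, only: c_int, c_double
  use icb_demo, only: scale_vector
  implicit none
  real(c_double) :: x(3)

  x = [1.0_c_double, 2.0_c_double, 3.0_c_double]
  call scale_vector(3_c_int, x, 2.0_c_double)
  print *, x   ! each element has been doubled
end program call_scale_vector
```

Because every dummy argument has an explicit interoperable kind, both a C caller and the Fortran compiler can check the call against the same signature.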

For parpack to compile with ILP64 on s390, parpack must use the ICB provided by MPI. As MPI is used deep in every part of parpack, it means you basically need to touch all parpack files (each one of them will be a pain).

So far, what I did is quite "simple": I "just" turned MPI fortran samples from .f to .F90 to allow ifdef to use correct types:

  1. Move .f to .F90 (the capital F makes the Fortran compiler preprocess the file implicitly)
  • Replace c by ! (comments)

  • Replace continuation lines (& at column 6 in F77, but at the end of the previous line in F90)

  • Use F90 includes when needed

-#include "debug.h"
-#include "stat.h"
+#include "debugF90.h"
+#include "statF90.h"
  • Possibly some other tricks to move .f to .F90 (* at the beginning of a line is also a comment)

  • Make sure cmake/autotools files (configure.ac, Makefile.am) are updated (files moved from .f to .F90)

  2. Add ifdef to allow use of ICB when needed
#ifdef HAVE_MPI_ICB
      use :: mpi_f08
#else
#include "mpif.h"
#endif
  3. Use ICB types when/where needed:
  • Use type(MPI_*) when defined
#ifdef HAVE_MPI_ICB
      type(MPI_Comm)    comm
      type(MPI_Status)  status
#else
      integer*4         comm, status(MPI_STATUS_SIZE)
#endif

  • Use integer*4 elsewhere for all parameters involved in the MPI API (mpi_send, mpi_recv, etc.)
      integer*4         nprocs, myid, ierr, next, prev, tag
  • When needed, force types to be integer*4 to avoid implicit Fortran casts: d8e3820

  • Track all variables that ultimately end up in MPI calls and make them integer*4 (a variable may be declared in several scopes, passed from function to function, and finally reach an MPI call: its type must be changed at each level/scope).
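
Put together, the three steps above might look like the following free-form sample. This is only a sketch under assumptions: the file name and build line are made up, `int32` from `iso_fortran_env` stands in for `integer*4`, the MPI library is assumed to be built with 4-byte Fortran integers, and `MPI_Sendrecv` is used in place of the separate mpi_send / mpi_recv calls of issue46.F90 to keep the example short and deadlock-free.

```fortran
! Hypothetical sample (ring.F90), not a file from this PR.
! Assumed build line: mpif90 -cpp -fdefault-integer-8 -DHAVE_MPI_ICB=1 ring.F90
program ring
#ifdef HAVE_MPI_ICB
  use :: mpi_f08
#endif
  use, intrinsic :: iso_fortran_env, only: int32, real64
  implicit none
#ifndef HAVE_MPI_ICB
#include "mpif.h"
#endif

#ifdef HAVE_MPI_ICB
  type(MPI_Comm)   :: comm
  type(MPI_Status) :: status
#else
  integer(int32)   :: comm, status(MPI_STATUS_SIZE)
#endif
  ! Every integer handed to MPI is pinned to 4 bytes, whatever the default is.
  integer(int32) :: nprocs, myid, ierr, next, prev, tag
  real(real64)   :: sendval, recvval

  comm = MPI_COMM_WORLD
  call MPI_Init(ierr)
  call MPI_Comm_size(comm, nprocs, ierr)
  call MPI_Comm_rank(comm, myid, ierr)

  ! Neighbours on a ring; literals carry an explicit kind too.
  next = modulo(myid + 1_int32, nprocs)
  prev = modulo(myid - 1_int32, nprocs)
  tag  = 0_int32
  sendval = real(myid, real64)

  call MPI_Sendrecv(sendval, 1_int32, MPI_DOUBLE_PRECISION, next, tag, &
                    recvval, 1_int32, MPI_DOUBLE_PRECISION, prev, tag, &
                    comm, status, ierr)

  print *, 'rank', myid, 'received', recvval, 'from rank', prev
  call MPI_Finalize(ierr)
end program ring
```

Note that integer literals such as the message count also need an explicit kind, since under -fdefault-integer-8 a bare `1` is an 8-byte integer.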

I started with a few (5 or 6) MPI .f samples, but it turns out that, logically, these changes need to be propagated to all parpack files as they all use MPI (for example they all have comm as first argument, and they may pass plenty of parameters all over the place as arguments to MPI functions [myid, next, prev, tag, etc.]). As there were only a few sample (test) files, it was still possible (boring but possible!) to handle them: now, I understand that all parpack files must be changed the same way.
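
To give an idea of what tracking integer widths across scopes involves, here is one possible way to make the narrowing explicit at a scope boundary. This is a swapped-in illustration, not what the PR does (the PR retypes the declarations themselves); the module and routine names are hypothetical, and the 4-byte target kind is an assumption about how the MPI library was built.

```fortran
! Hypothetical helper, not part of parpack.
module mpi_int_narrowing
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  private
  public :: to_mpi_int
contains
  ! Narrow an 8-byte integer to the 4-byte kind assumed for MPI arguments.
  ! ok is .false. (and narrow is set to 0) when the value does not fit.
  subroutine to_mpi_int(wide, narrow, ok)
    integer(int64), intent(in)  :: wide
    integer(int32), intent(out) :: narrow
    logical,        intent(out) :: ok
    integer(int64) :: lo, hi

    hi = int(huge(narrow), int64)
    lo = -hi - 1_int64
    ok = (wide >= lo) .and. (wide <= hi)
    narrow = 0_int32
    if (ok) narrow = int(wide, int32)
  end subroutine to_mpi_int
end module mpi_int_narrowing

program narrowing_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  use mpi_int_narrowing, only: to_mpi_int
  implicit none
  integer(int64) :: nx
  integer(int32) :: mpi_count
  logical        :: ok

  nx = 100_int64
  call to_mpi_int(nx, mpi_count, ok)
  print *, 'nx =', nx, ' fits:', ok, ' mpi_count =', mpi_count

  nx = 2147483648_int64          ! one past the largest 32-bit value
  call to_mpi_int(nx, mpi_count, ok)
  print *, 'nx =', nx, ' fits:', ok, ' mpi_count =', mpi_count
end program narrowing_demo
```

Either way, every variable that reaches an MPI call has to be identified first, which is the expensive part described above.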

All parpack code must literally be rewritten: this means at least 57 files must be moved and then changed in many, many ways

>> ll PARPACK/SRC/MPI/p*.f | wc
     57     513    3883

That is to say that parpack must be fully rewritten to make build/run possible on ILP64 arch?!... This is too much work for one man.

Even if somebody moved/changed the 57 files of parpack (huge work), the result is not guaranteed: this may not work (ILP64 build/run) in the end...

From what I have seen, parpack is not as widely used as arpack. As arpack-ng is only maintained by a few volunteers and there is no pay behind it (fully open-source), there is no way somebody can take on such a huge job for no pay...

@skriesch: if you need or want parpack on s390, you can push on top of this PR all modifications (1, 2, 3 - you need to cover all cases, watch over all variables [integer*4] involved in MPI calls, and you must not forget/miss anything, otherwise build/run will fail on s390) for all files of parpack (the 57 files in PARPACK/SRC/MPI/p*.f). This will take a while... Do you really need parpack on s390? If so, why?

@sylvestre: maybe we could raise an error in cmake/autotools to say "No more support for parpack with INTERFACE64"? It's a shame but... I don't see anybody who can afford to do that job...

@fghoussen
Collaborator Author

On my side:

  • I see only issue46 and p*drv* MPI tests KO.

  • The stack (with debug) shows the crash occurs in lapack (pdlarnv, which calls dlarnv) only with MPI (sequential is OK, calling dlarnv directly)

@skriesch: on your side, it seems you have more KO (bug_142 and a different stack for issue46 - not sure the stacks are comparable if you run without debug).

Anyway, adding:

diff --git a/PARPACK/SRC/MPI/pdlarnv.F90 b/PARPACK/SRC/MPI/pdlarnv.F90
index caf18bf0..a5bdf949 100644
--- a/PARPACK/SRC/MPI/pdlarnv.F90
+++ b/PARPACK/SRC/MPI/pdlarnv.F90
@@ -78,6 +78,9 @@
 !     ..
 !     .. Executable Statements ..
 !
+      write(*,*) "idist", idist
+      write(*,*) "iseed", iseed
+      write(*,*) "n", n
       call dlarnv ( idist, iseed, n, x )
 !
       return

gives

>> mpirun -n 2 PARPACK/TESTS/MPI/issue46
 Hello           0
 Hello           1
 idist                    2
 iseed                    1                    3                    5                    7
 n          17179869284
 idist                    2
 iseed      140730764032432      140730764031408                 4073      140730764033984
 n          17179869234
 idist                    2
 iseed                    1                    3                    5                    7
 n          17179869234

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f2ded0218c2 in ???
#1  0x7f2ded020a55 in ???
#2  0x7f2decda991f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3  0x7f2dedd74530 in ???
#4  0x7f2df3d5466c in ???
#5  0x7f2df47e8706 in pdlarnv_
	at /home/fghoussen/Downloads/arpack-ng/PARPACK/SRC/MPI/pdlarnv.F90:84

But the lapack dlarnv doc says:

*> DLARNV returns a vector of n random real numbers from a uniform or
*> normal distribution.
...
*>
*> \param[in,out] ISEED
*> \verbatim
*>          ISEED is INTEGER array, dimension (4)
*>          On entry, the seed of the random number generator; the array
*>          elements must be between 0 and 4095, and ISEED(4) must be
*>          odd.

And iseed is not between 0 and 4095 on all MPI procs, which explains the crash in lapack.
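The check described in the doc excerpt can be sketched as a small standalone program, fed with the values copied from the log above. The module and function names (dlarnv_arg_check, seed_is_valid, low32, high32) are hypothetical helpers, not lapack or arpack-ng routines. The sketch also splits the printed n into 32-bit halves: arithmetically, 17179869284 is 4*2^32 + 100 and 17179869234 is 4*2^32 + 50. Whether that points to a 4-byte integer being read as an 8-byte one is only a guess and has not been verified in this thread.

```fortran
module dlarnv_arg_check
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! True when iseed meets the documented DLARNV constraints:
  ! every element in 0..4095 and iseed(4) odd.
  pure function seed_is_valid(iseed) result(valid)
    integer(int64), intent(in) :: iseed(4)
    logical :: valid
    valid = all(iseed >= 0 .and. iseed <= 4095) .and. mod(iseed(4), 2_int64) == 1
  end function seed_is_valid

  ! Low 32 bits of a 64-bit value.
  pure function low32(v) result(r)
    integer(int64), intent(in) :: v
    integer(int64) :: r
    r = iand(v, 4294967295_int64)
  end function low32

  ! High 32 bits of a 64-bit value (logical right shift).
  pure function high32(v) result(r)
    integer(int64), intent(in) :: v
    integer(int64) :: r
    r = ishft(v, -32)
  end function high32
end module dlarnv_arg_check

program check_logged_values
  use dlarnv_arg_check
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Values copied from the issue46 output above.
  integer(int64), parameter :: good(4) = [1_int64, 3_int64, 5_int64, 7_int64]
  integer(int64), parameter :: bad(4) = [140730764032432_int64, &
       140730764031408_int64, 4073_int64, 140730764033984_int64]
  integer(int64), parameter :: n_logged(2) = [17179869284_int64, 17179869234_int64]
  integer :: i

  write(*,*) 'iseed 1 3 5 7 valid?      ', seed_is_valid(good)
  write(*,*) 'second logged iseed valid?', seed_is_valid(bad)
  do i = 1, size(n_logged)
    write(*,'(a,i0,a,i0,a,i0)') 'n = ', n_logged(i), &
         ': high32 = ', high32(n_logged(i)), ', low32 = ', low32(n_logged(i))
  end do
end program check_logged_values
```

A check like seed_is_valid placed just before the dlarnv call would flag the bad seeds before lapack is entered, which is easier to read than a backtrace without symbols.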

lapack ?larnv.

This means killing the inits variable:
- Using MPI, inits (shared variable) may be set to false by one proc,
and prevent other procs from initializing seeds (as inits is shared by
use of the save fortran keyword).
- Not using MPI, inits only prevents re-initializing seeds which
have already been initialized.

This commit is a no-op. From a functional point of view, we are doing
the same thing. From an implementation point of view, we make sure
iseed is always (for all MPI procs) initialized to values allowed
by lapack (if not, lapack crashes).

This problem doesn't occur with sequential code. Changes have been made
in both sequential and MPI code to keep things symmetric.
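One possible reading of this commit message, as a minimal sketch: the flag-guarded initialization is contrasted with a version that has no flag and resets the seed whenever it is not legal for lapack. The names (seed_init_sketch, guarded_init, ensure_valid_seed) are hypothetical, the scenario in the demo program is made up for illustration, and the actual commit may implement this differently.

```fortran
module seed_init_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: default_seed(4) = [1_int64, 3_int64, 5_int64, 7_int64]
contains
  ! Flag-guarded pattern: seeds are set only while the saved flag is true.
  subroutine guarded_init(iseed)
    integer(int64), intent(inout) :: iseed(4)
    logical, save :: inits = .true.
    if (inits) then
      iseed = default_seed
      inits = .false.
    end if
  end subroutine guarded_init

  ! Hypothetical replacement: no flag; reset whenever the seed is outside
  ! the range documented for DLARNV (0..4095, iseed(4) odd).
  subroutine ensure_valid_seed(iseed)
    integer(int64), intent(inout) :: iseed(4)
    if (any(iseed < 0) .or. any(iseed > 4095) .or. mod(iseed(4), 2_int64) /= 1) then
      iseed = default_seed
    end if
  end subroutine ensure_valid_seed
end module seed_init_sketch

program demo_seed_init
  use seed_init_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64) :: a(4), b(4)

  a = -1                   ! stand-in for uninitialized storage
  call guarded_init(a)     ! first call: flag is true, a becomes 1 3 5 7
  b = -1
  call guarded_init(b)     ! flag is already false: b is left untouched
  write(*,*) 'after guarded_init, first array: ', a
  write(*,*) 'after guarded_init, second array:', b
  call ensure_valid_seed(b)
  write(*,*) 'after ensure_valid_seed, second: ', b
end program demo_seed_init
```

The point of the second variant is that the seed's validity no longer depends on the history of a saved flag, which matches the stated goal of having iseed always initialized to values allowed by lapack.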
@fghoussen
Collaborator Author

fghoussen commented Aug 6, 2022

Now issue46 passes but p*drv1 still fails, although the iseed values are correct (print added). No solution for now.

>> mpirun -n 2 ./PARPACK/EXAMPLES/MPI/psndrv1
 iseed                    1                    3                    5                    7
 iseed                    1                    3                    5                    7

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fd39d0218c2 in ???
#1  0x7fd39d020a55 in ???
#0  0x7fb82b6218c2 in ???
#1  0x7fb82b620a55 in ???
#2  0x7fb82b45f91f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#2  0x7fd39cda991f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#3  0x7fd39df01038 in ???
#3  0x7fb82c50103c in ???
#4  0x7fb832290916 in ???
#5  0x7fb832e31ad0 in pslarnv_
	at /home/fghoussen/Downloads/arpack-ng/PARPACK/SRC/MPI/pslarnv.F90:82

Other kinds of problems also seem to appear elsewhere inside the ILP64 version of lapack... Here we use the ILP64 version of lapack provided by MKL: no way to debug MKL...

>> mpirun -n 2 ./PARPACK/EXAMPLES/MPI/psndrv3
  
  ERROR with _pttrf. 

>> (gdb) r
Starting program: /home/fghoussen/Downloads/arpack-ng/PARPACK/EXAMPLES/MPI/.libs/psndrv3 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff04b3640 (LWP 2372691)]
[New Thread 0x7fffeeb79640 (LWP 2372695)]

Thread 1 "psndrv3" received signal SIGSEGV, Segmentation fault.
0x00007ffff17401ee in mkl_lapack_spttrf ()
   from /usr/lib/x86_64-linux-gnu/libmkl_core.so
(gdb) bt
#0  0x00007ffff17401ee in mkl_lapack_spttrf ()
   from /usr/lib/x86_64-linux-gnu/libmkl_core.so
#1  0x00007ffff74a59f7 in spttrf_ ()
   from /usr/lib/x86_64-linux-gnu/libmkl_gf_ilp64.so

@fghoussen
Collaborator Author

There are 620 commits here, which is more than the number of commits on master?!... And no time or solution for now.
As this is a(n expected...) mess, I'll split this PR in 2 parts: the last 2 commits are specific to seeds, so I'll create a PR for them and then rebase this PR on top of it. That will probably be easier to handle.

@fghoussen
Collaborator Author

OK, rebasing (from the end back to the beginning) is a headache, as too many files have been changed and renamed... Giving up! :(
Try to have a look around when possible... This PR may sound easy at first, but it is a very difficult topic...

@fghoussen
Collaborator Author

Thinking about this: maybe the best solution would be to restart from scratch and never merge this PR (keep it aside for reference and a future restart?).

This messy PR was useful in the sense that it helped to find solutions (scripts) and understand problems (what to replace where, and why). Now I guess it would be wiser to restart from scratch, applying the lessons learned (+ maybe finding simpler solutions - example: trying to keep .f if possible). @sylvestre: what do you say?

@fghoussen
Collaborator Author

Huge work here, but, no time/pay... Ended in 04e1b3c: closing for now.
Hope to reopen/merge this when/if possible...
