-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG FIX] ILP64: use type(MPI_*) to avoid MPI type mismatch. #368
Conversation
0db5ac0
to
8fa2687
Compare
I have tested this status of the PR with the master branch. We have got following error messages now:
|
@skriesch: no point to test the patch as it's not working for now (red-ball CI)! You'll can test it when/if the patch is done (green ball): not sure to get there anyway. No guaranty: I'll do my best when/if I can get some time. |
All these crappy changes were just no-op (just needed to move MPI samples from
This problem is related to type mismatch: MPI API only support Hope next commit will be enough to fix this... Otherwise, I am not sure to have a solution here... |
Seems fixed for recv... but for send...
|
@skriesch: you should experience the exact same problem on s390x. Can you check and report here? |
Nice. We are coming closer. :)
One hint for the other architectures. PowerPC Little Endian (ppc64le) and aarch64 (arm) have got more issues now:
|
Not so sure about that. I really have no clue left: trying a last idea (d8e3820 - not sure at all this can fix the problem)... But I don't understand why there is a problem with mpi_send signature (pretty close to mpi_recv which seems to be OK). If you find a fix on s390, would be good to know. |
Didn't expected that but d8e3820 does indeed fix the ILP64 build for
@sylvestre: do you remember what |
@skriesch: the problem is unfortunately a lot deeper that expected... You should be able to compile but many tests should fail (crash or deadlock). I'll try to have a look at that when possible |
This sounds like dead-end (unfortunately). I looked at impacts: they are huge. I don't see anybody who could go for them... So I guess this is the end of the road. ICB (ISO_C_BINDING) is an evolution of the fortran norm (2003 or 2008 - never remember) who is meant to type variables (with F77, typing variables is not mandatory - which is why fortran may be so buggy). With ICB, interaction with other languages (like C/C++) is greatly facilitated (as compiler knows variable types). I added ICB to arpack long time ago (hard-long-boring job over years - this means arpack can be easily called/used from C/C++ now), but for parpack it looks like it's even harder/longer... And certainly too much of an overkill for one man. For parpack to compile with ILP64 on s390, parpack must use the ICB provided by MPI. As MPI is used in each and every guts of parpack, it means you basically need to impact all parpack files (each one of them will be a pain). From now, that I did is quite "simple": I "just" turned MPI fortran samples from
I started with a few (5 or 6) MPI All parpack code must be literally re-written: this means at least 57 files must be moved and then impacted in many many ways
That is to say that parpack must be fully rewritten to make build/run possible on ILP64 arch?!... This is too much of an overkill for one man. Even if somebody would move/impact the 57 files of parpack (huge work), the result is not guaranteed: this may not work (ILP64 build/run) in the end... From what I have seen parpack is not as used as arpack. As arpack-ng is only maintained by a few volunteers and there is no pay behind (full open-source), there is no way somebody can go for such a huge job for no pay... @skriesch: if you need or want parpack on s390, you can push on top of this PR all modifications (1, 2, 3 - you need to cover all cases, watch over all variables [ @sylvestre: maybe we could raise an error in cmake/autotools to say "No more support for parpack with INTERFACE64" ? It's a shame but... I don't see anybody who can afford that job to be done... |
At my side:
@skriesch: at your side, seems you have more KO ( Anyway, adding :
gives
But lapack dlarnv doc says :
And |
lapack ?larnv. This means killing inits variable: - Using MPI, inits (shared variable) may be set to false by one proc, and prevent other procs to initialize seeds (as inits is shared by use of the save fortran keyword). - Not using MPI, inits only prevents from re-initializing seeds which have already been initialized. This commit is not-op. From a functional point of view, we are doing the same thing. From an implementation point of view, we make sure iseed is always (for all MPI procs) initialized to values allowed by lapack (if not, lapack crashes). This problem doesn't occur with sequential code. Changes have been done in both sequential and MPI code to keep things symmetric.
Now,
And others kind of problems seems to appear elsewhere too inside ILP64 version of lapack... We use here the ILP64 version of lapack provided by MKL: no way to debug MKL...
|
620 commits here which is more than the number of commits on master ?!... And no time and solution for now. |
OK, rebasing (end back to beginning) is a headache... As too many files have been changed and renamed... Giving up! :( |
Thinking about this: maybe the best solution would be to restart from scratch and never merge this PR (let it apart for memory and future restart?). This messy PR was useful is the sense it helped to find solutions (scripts) and understand problems (what to replace where and why). Now, from lesson learned, I guess it would be more wise to restart from scratch applying lesson learned (+ maybe find more simple solutions - example: trying to keep |
Huge work here, but, no time/pay... Ended in 04e1b3c: closing for now. |
Compiling with INTERFACE64 means you need to have control on integer type.
If MPI is also required, then MPI types must be consistent with ILP64:
to make sure of this, we must use the ISO_C_BINDING API provided by MPI.
Pull request purpose
May fix issue #348