Why do I get the right result on A6000, while the problem fails to converge on H100 in FP64? #382
Comments
Output on A6000:
Number of nonzeros in constraint Jacobian............: 9118
Total number of variables............................: 4559
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
Number of Iterations....: 21
Objective...............: -1.0195004737747260e+00 -1.0195004737747260e+00
Number of objective function evaluations = 22
EXIT: Optimal Solution Found (tol = 1.2e-06). |
With exactly the same code, the output on H100 in Float64 is:
Number of nonzeros in constraint Jacobian............: 9118
Total number of variables............................: 4559
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
Number of Iterations....: 500
Objective...............: -3.5140181986441943e-01 -3.5140181986441943e-01
Number of objective function evaluations = 1558
EXIT: Maximum Number of Iterations Exceeded. |
Hi @Franc-Z, could you try running this code on both your A6000 and your H100, just to see what result we get with a different factorization algorithm?
solver = MadNLPGPU.MadNLPSolver(
    nlp;
    tol = tol,
    callback = MadNLP.DenseCallback,
    kkt_system = MadNLP.DenseCondensedKKTSystem,
    linear_solver = MadNLPGPU.LapackGPUSolver,
    lapack_algorithm = MadNLP.LU,  # LU instead of the Cholesky factorization used in the original run
    acceptable_tol = tol,
    equality_treatment = MadNLP.EnforceEquality,
    print_level = MadNLP.INFO,
    max_iter = 500,
)
@time CUDA.@allowscalar results = MadNLPGPU.solve!(solver) |
Hi @frapac (Francois), I have tested the LU decomposition: on the A6000 we get the right results, but on the H100 it does not converge either. I'm curious why LapackGPUSolver would return different results on different hardware. Is there a stable solver in MadNLP (preferably for the dense case)? I want to try it in quant tasks. Thanks! |
This is the output with MadNLP.LU (it doesn't converge):
Number of nonzeros in constraint Jacobian............: 9118
Total number of variables............................: 4559
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
Number of Iterations....: 500
Objective...............: -3.5086428274561127e-01 -3.5086428274561127e-01
Number of objective function evaluations = 1018
EXIT: Maximum Number of Iterations Exceeded. |
Apparently the convergence behavior is the same with both factorizations, so I'm not sure the linear solver is at fault here. By any chance, can you share your code so I can try to reproduce the results locally? |
I concur with @frapac. Could you check if your callbacks are returning the same values on A6000 and H100? |
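One way to run the check suggested above is to evaluate the model callbacks at the same point on each machine and compare a few summary numbers. The following is a minimal sketch, assuming `nlp` implements the NLPModels.jl API (`obj`, `grad`, `cons`, `jac_coord`); the helper name `callback_fingerprint` is hypothetical and not part of MadNLP.

```julia
using LinearAlgebra
using NLPModels

# Hypothetical helper (not part of MadNLP): evaluate the model callbacks at a
# point `x` and reduce each result to a scalar that is easy to compare.
# Assumes `nlp` implements obj, grad, cons and jac_coord from NLPModels.jl.
function callback_fingerprint(nlp::AbstractNLPModel, x = nlp.meta.x0)
    f = obj(nlp, x)        # objective value
    g = grad(nlp, x)       # objective gradient
    c = cons(nlp, x)       # constraint values
    J = jac_coord(nlp, x)  # nonzero values of the constraint Jacobian
    return (obj = f, grad_norm = norm(g), cons_norm = norm(c), jac_norm = norm(J))
end
```

Running `@show callback_fingerprint(nlp)` on both the A6000 and the H100 gives four numbers per machine. In Float64 they would be expected to agree to roughly machine precision; a large discrepancy would point at the callbacks rather than the linear solver. If the model lives on the GPU, the call may need the same `CUDA.@allowscalar` wrapper used for the solve.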
Thanks, I will test and post the callbacks' results on H100. |
As the title says, I am using MadNLPGPU to solve an NLP problem.
I found that I get the right result on the A6000, but the problem does not converge on the H100 (which runs much faster than the A6000 in Float64).
Below is the solver configuration:
using CUDA, MadNLP, MadNLPGPU

# `nlp` and `tol` are defined earlier in the script.
solver = MadNLPGPU.MadNLPSolver(
    nlp;
    tol = tol,
    callback = MadNLP.DenseCallback,
    kkt_system = MadNLP.DenseCondensedKKTSystem,
    #blas_num_threads = 10000,
    linear_solver = MadNLPGPU.LapackGPUSolver,
    lapack_algorithm = MadNLP.CHOLESKY,
    acceptable_tol = tol,
    equality_treatment = MadNLP.EnforceEquality,
    print_level = MadNLP.INFO,
    max_iter = 500,
)
@time CUDA.@allowscalar results = MadNLPGPU.solve!(solver)