Cholesky factorization test failure #69

Closed
StefanKarpinski opened this issue Feb 2, 2014 · 24 comments
Labels
test This change adds or pertains to unit tests

Comments

@StefanKarpinski
Member

This has been failing for at least a day:

    JULIA test/linalg
     * linalg
exception on 1: ERROR: assertion failed: |b - :(*(apd,\(capd,b)))| <= 1.4432899320127035e-12
  b = 5 1
3   5
5   3
2   1
4   1
1   5
3   5
2   3
2   5
1   2

  :(*(apd,\(capd,b))) = 4.999999999999602   1.0000000000022737
2.9999999999998863  4.999999999999545
4.999999999999943   3.0000000000004547
1.9999999999999432  1.0000000000015916
4.000000000000114   1.0000000000020464
1.0000000000002274  4.999999999998636
3   5.000000000001592
2.0000000000000853  3.0000000000002274
2.0000000000001137  4.999999999999318
1.0000000000000568  2.000000000001819

  difference = 2.2737367544323206e-12 > 1.4432899320127035e-12
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:39
 in runtests at /Users/stefan/projects/julia.alt/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
ERROR: assertion failed: |b - :(*(apd,\(capd,b)))| <= 1.4432899320127035e-12
  b = 5 1
3   5
5   3
2   1
4   1
1   5
3   5
2   3
2   5
1   2

  :(*(apd,\(capd,b))) = 4.999999999999602   1.0000000000022737
2.9999999999998863  4.999999999999545
4.999999999999943   3.0000000000004547
1.9999999999999432  1.0000000000015916
4.000000000000114   1.0000000000020464
1.0000000000002274  4.999999999998636
3   5.000000000001592
2.0000000000000853  3.0000000000002274
2.0000000000001137  4.999999999999318
1.0000000000000568  2.000000000001819

  difference = 2.2737367544323206e-12 > 1.4432899320127035e-12
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:39
 in runtests at /Users/stefan/projects/julia.alt/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
while loading /Users/stefan/projects/julia.alt/test/runtests.jl, in expression starting on line 23

make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2

cc: @jiahao, @andreasnoackjensen – did you guys monkey around with these tests some time on Friday or yesterday?

Julia Version 0.3.0-prerelease+1362
Commit d0aa799* (2014-02-02 17:49 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.0.0)
  CPU: Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm
@jiahao
Member

jiahao commented Feb 2, 2014

I just pushed c88a989b557b9c27538278c9a65a5abc30eaf717 which was intended to fix this. Can you verify if this works for you? (ref: #67)

@jiahao
Member

jiahao commented Feb 2, 2014

Those pesky winged monkeys. You leave them unattended for five minutes and they escape the castle and wreak havoc.

@timholy
Member

timholy commented Feb 2, 2014

You just need one of those magic caps so you can get them to do your bidding ("write another 200 linalg test cases, mwaaa haaa haaaaa").

@StefanKarpinski
Member Author

I'm not sure if this works or not, but now the linalg tests don't finish at all :-|. I'm trying to get some float range stuff working, so I don't have time to debug this at the moment.

@jiahao
Member

jiahao commented Feb 3, 2014

Situation normal for me.

@amitmurthy

Situation normal on Ubuntu 13.04 with latest master. The linalg tests do take a long time though.

@andreasnoack
Member

@StefanKarpinski Do you still see this error?

@mschauer
Contributor

mschauer commented Feb 6, 2014

This fails with

type of a: Int32 type of b: Float16
(Automatic) upper Cholesky factor

with

exception on 1: ERROR: assertion failed: |:(det(capd)) - :(det(apd))| <= 0.002384185791015625
  :(det(capd)) = 1.626428240984628e9
  :(det(apd)) = 1.6264282409980435e9
  difference = 0.01341557502746582 > 0.002384185791015625

on

Julia Version 0.3.0-prerelease+1408
Commit fb58104* (2014-02-06 21:53 UTC)
Platform Info:
  System: Linux (i686-linux-gnu)
  CPU: Intel(R) Core(TM) Duo CPU      T2450  @ 2.00GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

@StefanKarpinski
Member Author

No, I'm good now, but there do still seem to be a lot of errors on various systems.

@amitmurthy

This is really odd.

On the REPL

using Base.Test
import Base.LinAlg
import Base.LinAlg: BlasComplex, BlasFloat, BlasReal

n     = 10
a = rand(n,n)
for elty in (Float32, Float64, Complex64, Complex128)
    a = convert(Matrix{elty}, a)
    # cond
    @test_approx_eq_eps cond(a, 1) 4.837320054554436e+02 0.01
    @test_approx_eq_eps cond(a, 2) 1.960057871514615e+02 0.01
    @test_approx_eq_eps cond(a, Inf) 3.757017682707787e+02 0.01
    @test_approx_eq_eps cond(a[:,1:5]) 10.233059337453463 0.01
end

fails with

ERROR: assertion failed: |:(cond(a,1)) - 483.7320054554436| <= 0.01
  :(cond(a,1)) = 2468.8115
  483.7320054554436 = 483.7320054554436
  difference = 1985.0795179820564 > 0.01
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:4

while julia runtests.jl linalg goes through.

On Ubuntu 13.10

julia> versioninfo()
Julia Version 0.3.0-prerelease+1419
Commit a673e4c* (2014-02-06 22:55 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

@andreasnoack
Member

I think you forgot to srand(1234321) in the REPL.
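A minimal sketch of why the seed matters here, assuming Julia 1.0 or later, where `srand(seed)` is spelled `Random.seed!(seed)`. The reference condition numbers in the snippet above were tied to the RNG stream of that particular build, so they should not be expected to reproduce on other versions; the sketch only shows that reseeding makes the drawn matrix repeatable.

```julia
using Random, LinearAlgebra

# Assumption: Julia >= 1.0 (Random.seed! replaces the older srand).
Random.seed!(1234321)
a1 = rand(10, 10)

Random.seed!(1234321)
a2 = rand(10, 10)

# Same seed and same position in the stream: identical matrix, identical cond.
same = a1 == a2
same_cond = cond(a1, 1) == cond(a2, 1)

# Without reseeding, the next draw is a different matrix,
# so a hard-coded reference value for cond no longer applies.
a3 = rand(10, 10)
differs = a3 != a1
```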

@amitmurthy

Right. I was trying to parallelize the linalg tests and it was barfing. Thanks.

@jiahao
Member

jiahao commented Feb 7, 2014

The parallelization of the linalg tests brings up the question of how the random number generator behaves in parallel. I don't think the Mersenne Twister implementation we have currently is guaranteed to behave well for providing multiple parallel streams. Perhaps @ViralBShah knows...?

We can't actually parallelize the linalg tests reliably without risking breaking all the bounds. Even though the RNG is set to a deterministic seed, the tests implicitly rely on the order of the stream of matrices it produces, and in a parallel run we can no longer guarantee the order in which matrices are generated for the various tests. As I showed in JuliaLang/julia#5705, there is a small but significant probability that changing the input matrix will cause a test to fail. If we want to pursue parallelizing the linalg tests, the only sane options for now are to snapshot all the matrices being computed and save them into the test suite or, as @stevengj suggested in JuliaLang/julia#5705, to use fixed matrices and adjust the existing bounds as necessary, so that the tests are deterministic. In the long run we should continue to chip away at #67.
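The fixed-matrix option could look roughly like the sketch below. The matrices `a` and `b` are hypothetical inputs chosen only for illustration, and the tolerance formula is an assumption, not the bound used in the actual test suite. It assumes Julia 1.0 or later (`cholesky` from `LinearAlgebra`).

```julia
using LinearAlgebra

# Hypothetical fixed inputs; no RNG involved, so the test is order-independent.
a = [4.0 1.0 2.0;
     1.0 3.0 0.0;
     2.0 0.0 5.0]
b = [5.0 1.0;
     3.0 5.0;
     5.0 3.0]

apd  = a' * a                      # positive definite because a is nonsingular
capd = cholesky(Symmetric(apd))    # Cholesky factorization of apd
x    = capd \ b                    # solve apd * x = b via the factorization

# Largest absolute entry of the residual, as in the failing assertion.
residual = norm(apd * x - b, Inf)

# Assumed bound: scale machine epsilon by the conditioning of apd.
tol = 100 * eps(Float64) * cond(apd, 1) * norm(b, Inf)
passes = residual <= tol
```

Because the inputs never change, a bound tuned for them stays valid no matter how the tests are grouped or scheduled.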

@jiahao
Member

jiahao commented Feb 7, 2014

Reopening with @mschauer 's reported failure. Updated title to identify the specific test that is failing.

@pao
Member

pao commented Feb 7, 2014

Parallel RNG: JuliaLang/julia#94.

@amitmurthy

Wouldn't simply setting srand(1234321) before running the specific subset suffice? It will always be deterministic for that subset. No?

@jiahao
Member

jiahao commented Feb 7, 2014

No, this does not preserve the current behavior. Imagine if you broke up the test suite halfway through the file and wrapped them in parallel blocks. The first test in the block from the second half of the file will be getting a matrix constructed directly from the first few numbers in the stream seeded by srand(1234321). However, the current test we have would be sampling (say) the 9000-9024th numbers of the stream from srand(1234321) and would be testing an entirely different matrix.
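The effect can be sketched as follows, assuming Julia 1.0 or later (`Random.seed!` in place of `srand`). The count of 9000 is only a stand-in for however many numbers the earlier tests consume.

```julia
using Random

# What a block sees when it is reseeded and run in isolation.
Random.seed!(1234321)
first_matrix = rand(3, 3)

# What the same test sees in the serial run, after earlier tests
# have already consumed part of the stream.
Random.seed!(1234321)
rand(9000)                 # stand-in for draws made by earlier tests
later_matrix = rand(3, 3)

# Same seed, different stream position: a different input matrix.
position_matters = first_matrix != later_matrix
```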

@amitmurthy

I understand that. We will have to change the test parameters. But if we could intersperse the current single test suite with calls to srand(1234321) (and changed test parameters) - wherever we want tests grouped together logically - they could be run in parallel without any issues, right?

I don't know the amount of work involved in doing this, the snapshot approach may be simpler.
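A rough sketch of the reseed-per-group idea, assuming Julia 1.0 or later. `seeded_group`, `group_a`, and `group_b` are hypothetical names introduced only for illustration; the real test groups would contain assertions whose tolerances would need retuning for the new inputs.

```julia
using Random

# Hypothetical helper: reseed the RNG, then run one logical group of tests.
function seeded_group(f, seed)
    Random.seed!(seed)
    return f()
end

# Stand-ins for two test groups; each returns the matrix it would test.
group_a() = rand(4, 4)
group_b() = rand(2, 2)

# Run the groups in one order...
a_first  = seeded_group(group_a, 1234321)
b_second = seeded_group(group_b, 1234321)

# ...and then in the opposite order.
b_first  = seeded_group(group_b, 1234321)
a_second = seeded_group(group_a, 1234321)

# Each group sees the same inputs regardless of scheduling order.
order_independent = (a_first == a_second) && (b_first == b_second)
```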

@jiahao
Member

jiahao commented Feb 7, 2014

Again, the point I'm trying to make is not that the current behavior is all that desirable, it is merely that we can only guarantee the tests for the current stream of matrices, because the tolerances are all essentially hard-coded for the current stream. If we change the input stream, we would in principle have to readjust more magic numbers until the tests stop failing. But I think we are both in agreement on this point.

@StefanKarpinski
Member Author

Sorry. Ugh. That button is where the cancel comment button should be.

@andreasnoack
Member

This one was fixed a while ago.

@jiahao
Member

jiahao commented Jun 2, 2014

@andreasnoackjensen what was the fix?

@andreasnoack
Member

My investigation suggests the initial error reported by @StefanKarpinski was fixed by your JuliaLang/julia@c88a989, but that the error reported later by @mschauer for 32-bit systems was fixed by my JuliaLang/julia@b695c7a.

@jiahao
Member

jiahao commented Jun 2, 2014

Ah, right. For some reason I was thinking about the ARPACK failure when I saw this issue.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024
This issue was closed.