Another argweaver bug #25

Closed
hyanwong opened this issue Jan 29, 2017 · 5 comments

@hyanwong
Contributor

hyanwong commented Jan 29, 2017

argweaver/bin/arg-sample --sites data/argweaver_bug.sites --popsize 5000 --recombrate 2.5e-08 --mutrate 3.76782964726e-06 --overwrite --quiet --randseed 1355090636 --iters 5000 --sample-step 5000 --output tmp/bug
arg-sample: src/argweaver/sample_thread.cpp:517: int argweaver::sample_hmm_posterior_step(const argweaver::TransMatrixSwitch*, const double*, int): Assertion `matrix->get(k, state2) != 0.0' failed.
Aborted

This only fails on holly, though. It works OK on my laptop. Some sort of rounding / maths bug that is processor or C library dependent?

@hyanwong
Contributor Author

May be worth posting this (and the argweaver_bug.sites file) to the ARGweaver github repo.

@hyanwong
Contributor Author

hyanwong commented Jan 29, 2017

NB. This causes the following error in the plots.py script:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "src/plots.py", line 494, in infer_worker
    return int(row[0]), runner.run()
  File "src/plots.py", line 293, in run
    ret = self.__run_ARGweaver()
  File "src/plots.py", line 367, in __run_ARGweaver
    self.row.aw_iter_out_freq, int(self.row.aw_burnin_iters))
  File "src/plots.py", line 452, in run_argweaver
    '--output', burn_prefix])
  File "src/plots.py", line 243, in time_cmd
    " ".join(cmd), exit_status, stderr.read()))
ValueError: Error running '/home/yan/treeseq-inference/src/../argweaver/bin/arg-sample --sites data/raw__NOBACKUP__/metrics_by_mutation_rate/simulations/msprime-n10_Ne
5000.0_l5000_rho0.000000025_mu0.00000376783-gs1355090636_ms1355090636err0.1.sites --popsize 5000 --recombrate 2.5e-08 --mutrate 3.76782964726e-06 --overwrite --quiet -
-randseed 1355090636 --iters 5000 --sample-step 5000 --output data/raw__NOBACKUP__/metrics_by_mutation_rate/simulations/aweaver+msprime-n10_Ne5000.0_l5000_rho0.0000000
25_mu0.00000376783-gs1355090636_ms1355090636err0.1+ws1355090636_burn': status=134:stderrb"arg-sample: src/argweaver/sample_thread.cpp:517: int argweaver::sample_hmm_po
sterior_step(const argweaver::TransMatrixSwitch*, const double*, int): Assertion `matrix->get(k, state2) != 0.0' failed.\nCommand terminated by signal 6\n12716 2.40 96
.44\n"
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "src/plots.py", line 1486, in <module>
    main()
  File "src/plots.py", line 1480, in main
    args.func(cls, args)
  File "src/plots.py", line 1366, in run_infer
    f.infer(args.processes, args.threads, args.force)
  File "src/plots.py", line 698, in infer
    for row_id, updated in pool.imap_unordered(infer_worker, work):
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 689, in next
    raise value
ValueError: Error running '/home/yan/treeseq-inference/src/../argweaver/bin/arg-sample --sites data/raw__NOBACKUP__/metrics_by_mutation_rate/simulations/msprime-n10_Ne
5000.0_l5000_rho0.000000025_mu0.00000376783-gs1355090636_ms1355090636err0.1.sites --popsize 5000 --recombrate 2.5e-08 --mutrate 3.76782964726e-06 --overwrite --quiet -
-randseed 1355090636 --iters 5000 --sample-step 5000 --output data/raw__NOBACKUP__/metrics_by_mutation_rate/simulations/aweaver+msprime-n10_Ne5000.0_l5000_rho0.0000000
25_mu0.00000376783-gs1355090636_ms1355090636err0.1+ws1355090636_burn': status=134:stderrb"arg-sample: src/argweaver/sample_thread.cpp:517: int argweaver::sample_hmm_po
sterior_step(const argweaver::TransMatrixSwitch*, const double*, int): Assertion `matrix->get(k, state2) != 0.0' failed.\nCommand terminated by signal 6\n12716 2.40 96
.44\n"

This is raised as a ValueError on line 241. But we should probably carry on, so that any such error doesn't kill the entire run. For later output, it doesn't matter if the AW run fails: the plots.py script should just omit a row if it can't find the right output files.
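
As a rough illustration of the "omit a row" idea, here is a minimal sketch. It is not taken from plots.py: the function name, the `output_prefix` key, and the suffix list are all hypothetical, and the real script may store rows and output paths quite differently.

```python
import os


def rows_with_outputs(rows, required_suffixes):
    """Return only the rows whose expected output files all exist.

    `rows` is assumed (hypothetically) to be a list of dicts, each with an
    "output_prefix" entry; `required_suffixes` lists the file endings that a
    successful inference run would have written.
    """
    kept = []
    for row in rows:
        prefix = row["output_prefix"]
        # Keep the row only if every expected file is present on disk.
        if all(os.path.exists(prefix + suffix) for suffix in required_suffixes):
            kept.append(row)
    return kept
```

Rows from failed ARGweaver runs would then simply be missing from later output rather than stopping the whole script.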

@hyanwong
Contributor Author

Now hacked around by wrapping the call in a try-except block: if the error message contains "src/argweaver/sample_thread.cpp:517", the exception is caught and logged, and the process continues. Otherwise the exception is re-raised and the process should stop. This should be enough to work around this specific bug until we can work out why ARGweaver is complaining.
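
For reference, the pattern looks roughly like the sketch below. This is an illustration rather than the actual code in plots.py: the helper name and the `run` callable are assumptions, and only the marker string comes from the assertion message quoted above.

```python
import logging

logger = logging.getLogger(__name__)

# Marker copied from the assertion message reported in this issue.
KNOWN_ARGWEAVER_ASSERTION = "src/argweaver/sample_thread.cpp:517"


def run_tolerating_known_bug(run):
    """Call run() and return its result, or None if it hit the known assertion.

    `run` is a hypothetical zero-argument callable that raises ValueError
    (with the tool's stderr in the message) when the external command fails.
    """
    try:
        return run()
    except ValueError as err:
        if KNOWN_ARGWEAVER_ASSERTION not in str(err):
            raise  # Unrelated failure: stop, as before.
        logger.warning("Skipping run after known ARGweaver assertion: %s", err)
        return None
```

Matching on the source location keeps the workaround narrow: any other ValueError still stops the run.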

@hyanwong
Contributor Author

Reported at mdrasmus/argweaver#21, so closing

@jeromekelleher
Member

Sounds good to me @hyanwong. Regarding the ARGweaver bug, a possible cause might be differences between GCC and clang, specifically in the optimisations each enables by default. It might be worth hacking the makefile to set "CXX = clang++" on holly and seeing if the problem persists.

I doubt the problem is processor dependent, as all Intel processors look very much the same these days, and IEEE floating-point semantics take nearly all the nastiness out of floats.
