No improvement after scaffolding #27
Comments
Hi, please email me the full Statistics.txt file and I can take a look. Thanks for reporting,
Dear Kristoffer, Thank you for the email. Please find attached the statistics of a) all Best Regards, On Mon, Mar 21, 2016 at 7:20 AM, Kristoffer [email protected]
Time elapsed for reading in contig sequences:6.37683582306
PASS 1
Mean before filtering : 372.398304
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 0: 24.3527109623
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 2
Mean before filtering : 545.350935
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 1: 28.3541140556
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 3
Mean before filtering : 3242.973364
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 2: 30.964922905
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 4
Mean before filtering : 5164.87230601
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 3: 39.8271889687
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 5
Mean before filtering : 8605.9644549
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 4: 24.5135059357
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 6
Mean before filtering : 9895.02139207
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 5: 38.256578207
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 7
Mean before filtering : 14258.6958588
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 6: 35.921377182
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 8
Mean before filtering : 17446.0756672
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 7: 50.5277101994
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
PASS 9
Mean before filtering : 88365.8989331
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 8: 1.34961390495
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 176
L50: 0 N50: 0 Initial contig assembly length: 735565227

Time elapsed for reading in contig sequences:6.48578095436
PASS 1
Mean before filtering : 88365.8989331
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 0: 1.38423609734
Parsing BAM file...
259 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 3973
L50: 91 N50: 2062798 Initial contig assembly length: 735565227
Thank you. That is already a very high-quality assembly. Some points:
I suggest rerunning the scaffolding with only the larger libraries. If you do, please attach the Statistics.txt so I can check that everything looks OK. If you decide to also include the 10k MP libraries, I suggest using only one of them, perhaps the one with the highest coverage or the narrowest insert size distribution. Because of the bug, however, I cannot see the coverage of these libraries in the log. A simple workaround for the bug in the "filtering of repeats" step is to set -z to a large integer, e.g., 1000000. This disables the filtering out of contigs before scaffolding. Best,
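The rerun suggested above can be sketched as a small helper that assembles the command line. It only builds an argument list and does not run BESST. The file names and the flags -c, -f, --orientation and -z are the ones quoted in this thread; the choice of the 15k, 20k and BAC-end libraries mirrors what the reporter tries next. Whether -z expects a single value or one value per library is not stated here, so that part is an assumption to check against runBESST --help.

```python
# Sketch only: builds a runBESST argument list, it does not execute anything.
# Library files and orientations are copied from the command in the issue.
LIBRARIES = [
    ("300_bwa_sorted.bam", "fr"),
    ("500_bwa_sorted.bam", "fr"),
    ("3k_bwa_sorted.bam", "rf"),
    ("5k_bwa_sorted.bam", "rf"),
    ("10k_bwa_sorted.bam", "rf"),
    ("10knew_bwa_sorted.bam", "rf"),
    ("15k_bwa_sorted.bam", "rf"),
    ("20k_bwa_sorted.bam", "rf"),
    ("bac_bwa_sorted.bam", "fr"),
]


def build_command(contigs, libraries, keep, extra_args=()):
    """Return an argv list that uses only the libraries named in `keep`.

    Each BAM file stays paired with its orientation, so the -f and
    --orientation lists cannot drift out of order when libraries are dropped.
    """
    chosen = [(bam, ori) for bam, ori in libraries if bam in keep]
    if not chosen:
        raise ValueError("no libraries selected")
    bams = [bam for bam, _ in chosen]
    orientations = [ori for _, ori in chosen]
    return ["runBESST", "-c", contigs, "-f", *bams,
            "--orientation", *orientations, *extra_args]


cmd = build_command(
    "contigs.fasta",
    LIBRARIES,
    keep={"15k_bwa_sorted.bam", "20k_bwa_sorted.bam", "bac_bwa_sorted.bam"},
    # Assumption: a single -z value is accepted for all libraries.
    extra_args=["-z", "1000000"],
)
print(" ".join(cmd))
```

Keeping each file and its orientation in one tuple is a deliberate choice: the original command lists nine files and nine orientations separately, which is easy to misalign by hand when a subset is selected.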
Dear Kristoffer, Thank you for the detailed email. I tried executing the scaffolding with 15k, 20k MP and Bac-end libraries a) I am attaching both the statistics. Could you kindly let me know, if the Best Regards, On Mon, Mar 21, 2016 at 10:38 AM, Kristoffer [email protected]
Time elapsed for reading in contig sequences:7.1270070076
PASS 1
Mean before filtering : 14258.6958588
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 0: 37.0993001461
Parsing BAM file...
3796 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 3765
PASS 2
Mean before filtering : 17446.0756672
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 1: 51.5304591656
Parsing BAM file...
7347 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
The stats look OK to me now, so -z is not needed. However, the output you pasted seems to be truncated: I can't find the BAC-library stats. Good that you see an improvement in the results. I would also suggest trying the parameter --no_score on your dataset to see if the output improves further. Best,
Dear Kristoffer, Thank you for the assistance. I appreciate it. I tried as suggested by you with --no_score option. As you predicted, there Which one would you suggest is the most reliable, in terms of lesser Thank you. Best Regards, On Tue, Mar 22, 2016 at 1:11 AM, Kristoffer [email protected]
Time elapsed for reading in contig sequences:7.66952085495
PASS 1
Mean before filtering : 14258.6958588
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 0: 35.5389959812
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
(super)Contigs after scaffolding: 3528
PASS 2
Mean before filtering : 17446.0756672
LIBRARY STATISTICS
Time elapsed for getting libmetrics, iteration 1: 49.885283947
Parsing BAM file...
0 link edges created.
Searching for paths BETWEEN scaffolds
Entering ELS.BetweenScaffolds single core
Hi, That is difficult to tell. There is always some tradeoff. I would maybe rerun the scaffolding with -e 10 or so, to avoid creating links between contigs that have low support, because you seem to have high coverage. The parameter -e is actually inferred automatically by BESST in the latest develop version, but I haven't released it to master yet. I think 10 could be suitable for your data set. By the way, -e is set to 3 by default. Now, if you really want to optimize, I would run BESST several times with different values of -e and take the run with the highest value of -e that still gives a relatively good tradeoff in contiguity (highest N50 and lowest number of scaffolds), if such a "clear" winner exists. My prediction is that a lower -e will scaffold more but might also be more error prone. You could try, e.g., -e from 3 to 20 (with some step size) and pick the highest value of -e that is "relatively good performing" (whatever that means) compared to lower values of -e. Best,
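One way to make "relatively good performing" concrete is sketched below. This is not part of BESST: the function name pick_e and the two 10% tolerances are assumptions chosen for illustration. Given the N50 and scaffold count from each run, it returns the highest -e whose N50 is close to the best N50 seen and whose scaffold count is close to the lowest seen, or None when no run satisfies both (no "clear" winner).

```python
def pick_e(results, n50_tolerance=0.10, count_tolerance=0.10):
    """Pick the highest -e that still performs close to the best run.

    `results` maps an -e value to a (n50, n_scaffolds) tuple taken from
    that run's Statistics.txt. The tolerances are illustrative defaults,
    not values recommended anywhere in this thread.
    """
    if not results:
        raise ValueError("no runs to compare")
    best_n50 = max(n50 for n50, _ in results.values())
    fewest = min(count for _, count in results.values())
    acceptable = [
        e for e, (n50, count) in results.items()
        if n50 >= (1 - n50_tolerance) * best_n50
        and count <= (1 + count_tolerance) * fewest
    ]
    # Prefer the strictest link threshold among the acceptable runs.
    return max(acceptable) if acceptable else None
```

Returning None rather than falling back to the default keeps the decision with the user when contiguity drops sharply at every higher -e.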
Dear Kristoffer, Thank you for the explanation. I tried several -e options, but is still Best Regards, On Wed, Mar 23, 2016 at 12:32 AM, Kristoffer [email protected]
OK. I'm closing this issue now and have opened a separate one for the coverage bug. If you have further questions, email me directly at the address provided either on GitHub or in the article describing BESST. Kristoffer
I was using 2 sets of PE libraries, 6 sets of MP libraries, and a relatively small BAC-end library to scaffold a plant genome of ~750 Mb. However, I do not see any difference between the input contigs and the resulting scaffolds after using BESST. (The command used is: runBESST -c contigs.fasta -f 300_bwa_sorted.bam 500_bwa_sorted.bam 3k_bwa_sorted.bam 5k_bwa_sorted.bam 10k_bwa_sorted.bam 10knew_bwa_sorted.bam 15k_bwa_sorted.bam 20k_bwa_sorted.bam bac_bwa_sorted.bam --orientation fr fr rf rf rf rf rf rf fr)
The last few lines of the Statistics.txt are below:
Entering ELS.BetweenScaffolds single core
Elapsed time single core pathfinder: 5.29289245605e-05
0 paths detected are with score greater or equal to 1.5
Nr of contigs left: 0.0 Nr of linking edges left: 0.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 0
Time elapsed for making scaffolds, iteration 8: 4.90158486366
(super)Contigs after scaffolding: 176
L50: 0 N50: 0 Initial contig assembly length: 735565227
Total time for scaffolding: 4683.50872993
However, when I use only the BAC-end library, I do see improvements in the scaffolds. (The command used is: runBESST -c contigs.fasta -f bac_bwa_sorted.bam --orientation fr)
The last few lines of the Statistics.txt are below:
Entering ELS.BetweenScaffolds single core
Elapsed time single core pathfinder: 0.00954294204712
4 paths detected are with score greater or equal to 1.5
Path: path length: 1.0, nr bad links: 0, score: 12
Path taken! path length: 1.0, nr bad links: 0, score: 12
Path: path length: 1.0, nr bad links: 0, score: 11
Path taken! path length: 1.0, nr bad links: 0, score: 11
Path: path length: 1.0, nr bad links: 0, score: 10
Path taken! path length: 1.0, nr bad links: 0, score: 10
Path: path length: 1.0, nr bad links: 0, score: 8
Path taken! path length: 1.0, nr bad links: 0, score: 8
Nr of contigs left: 3973.0 Nr of linking edges left: 66.0
Number of gaps estimated by GapEst-LP module order_contigs in this step is: 8
Time elapsed for making scaffolds, iteration 0: 6.29209303856
(super)Contigs after scaffolding: 3973
L50: 91 N50: 2062798 Initial contig assembly length: 735565227
Total time for scaffolding: 31.9193239212
Any help would be appreciated. Thank you.
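When comparing runs like these, the lines worth looking at in Statistics.txt are the per-pass "link edges created" and "(super)Contigs after scaffolding" counts. The sketch below pulls them out with regular expressions. It is an assumption-laden helper, not a BESST tool: it relies only on the three line patterns visible in the logs in this thread and may miss other layouts.

```python
import re

# Patterns taken from the log lines quoted in this thread.
PASS_RE = re.compile(r"\bPASS (\d+)")
EDGES_RE = re.compile(r"(\d+) link edges created")
CONTIGS_RE = re.compile(r"\(super\)Contigs after scaffolding: (\d+)")


def summarize(log_text):
    """Return one dict per PASS with its link-edge and contig counts.

    A count is None when the corresponding line is missing, for example
    in a truncated paste.
    """
    # With one capturing group, split() yields:
    # [preamble, pass_no, body, pass_no, body, ...]
    parts = PASS_RE.split(log_text)
    summary = []
    for number, body in zip(parts[1::2], parts[2::2]):
        edges = EDGES_RE.search(body)
        contigs = CONTIGS_RE.search(body)
        summary.append({
            "pass": int(number),
            "link_edges": int(edges.group(1)) if edges else None,
            "contigs_after": int(contigs.group(1)) if contigs else None,
        })
    return summary
```

A pass that reports 0 link edges cannot join any contigs, so a column of zeros in this summary points at the library input or filtering step rather than at the path search.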