problem with defect mode #29

timydaley · 2017-02-21T05:43:10Z

@slzhao, I'm moving your issue to the preseq issues.

Does defect mode work when using 5M reads? Or just when using >50M?

slzhao · 2017-02-21T05:52:25Z

Thanks timydaley!
Defect mode works with the 51G (>50M reads) data.

Here are some lines of the .mr file:
chr1 9983 10082 HISEQ-KERMIT:328:C9U3TANXX:5:1115:3551:13564 5 - GTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATGGTGTTATAG FFF7/7FF<FF<///FB<B<//<FF/FBBF<FB/F<F<</F<FFFFFFFFFFFFFFFFFBBFF/FFBFFBFB/FFBFFFFFFBFFFBFFFBFFFBBBBB
chr1 9985 10060 HISEQ-KERMIT:328:C9U3TANXX:5:2308:14005:85954 5 - GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGTTTAGGGTTAGGGTTAGGGGATAGATAGTAT ///<<BFBBFFFFBF<<//BFFFB/FFFFFFBBFFBFF/F<F<F/<<FF/<<BF/F/<<FFFFFFFFFBFBFFB/

Here are the first lines of the preseq result in defect mode with the 51G file:
TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI
0 0 0 0
1000000.0 988826.5 988342.2 989311.1
2000000.0 1966887.5 1965891.6 1967883.9
3000000.0 2937630.0 2936107.0 2939153.8
4000000.0 3901946.5 3899906.2 3903987.8

timydaley · 2017-02-21T05:53:13Z

It works?

slzhao · 2017-02-21T06:11:58Z

Yes. -D (defect mode) works for the 51G .mr file.

timydaley · 2017-02-21T06:12:44Z

But not the bam file?

slzhao · 2017-02-21T06:22:45Z

I don't have a bam file. I made the .mr file from .fastq by walt directly.

I've just tested a .mr file with 5M lines (1.4g size):
with -D: Works
without -D: ERROR: too many defects in the approximation, consider running in defect mode

I've also tested a .mr file with 0.5M lines (140M size):
with -D: Works
without -D: Works

timydaley · 2017-02-21T06:25:51Z

Cool. If it works without -D, then I would use those estimates. Defect mode is only for when it doesn't work. You should be able to tell visually if there are issues if you're running it in defect mode, as there will be a localized instability in the curve.
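
The visual check described above can also be roughed out in code. The following is only a sketch, assuming the output table has the four whitespace-separated columns shown earlier in this thread (TOTAL_READS, EXPECTED_DISTINCT, LOWER_0.95CI, UPPER_0.95CI). The function name `find_unstable_rows` and the three checks it applies are illustrative choices, not part of preseq, and the last row of the example table is hypothetical.

```python
import math


def find_unstable_rows(lines):
    """Return (line_number, reason) pairs for suspicious rows of a yield-curve table.

    A row is flagged if any estimate is nan, if EXPECTED_DISTINCT exceeds
    TOTAL_READS, or if EXPECTED_DISTINCT drops below the previous row's value.
    """
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] == "TOTAL_READS":
            continue  # skip blank lines and the header row
        total, expected = float(fields[0]), float(fields[1])
        if any(math.isnan(float(f)) for f in fields[1:4]):
            problems.append((number, "nan value"))
        if math.isnan(expected):
            continue  # nothing further can be checked on this row
        if expected > total:
            problems.append((number, "more distinct reads than total reads"))
        if previous is not None and expected < previous:
            problems.append((number, "curve decreases"))
        previous = expected
    return problems


# The first four lines come from the defect-mode output above;
# the last row is HYPOTHETICAL and only there to trigger the checks.
example = [
    "TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI",
    "0 0 0 0",
    "1000000.0 988826.5 988342.2 989311.1",
    "2000000.0 1966887.5 1965891.6 1967883.9",
    "3000000.0 1900000.0 nan nan",
]
for line_number, reason in find_unstable_rows(example):
    print(line_number, reason)
```

A check like this does not replace plotting the curve; it only catches the crudest symptoms (nan values and non-monotone estimates).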

Feel free to email me if you have any more problems, issues, or questions.

slzhao · 2017-02-21T22:47:34Z

Hi Timothy,

Thanks for your reply. Did you mean I should use the .mr file with 5M lines, without -D (in normal mode), to estimate the whole sample/file? I have a few questions:

1. The 5M-line .mr file is only the first part of a very large (51G) .mr file. Can it represent the whole sample/file?
2. The large .mr file was sorted, so the 5M-line .mr file contains only some of the reads on chromosome 1. Is it OK to do so?
3. The result has too many NaN values. Is it correct?

Result example below:
TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI
0 0 0 0
1000000.0 928722.5 928237.7 929207.6
2000000.0 1740922.5 1739981.0 1741864.5
3000000.0 2460491.5 2459124.0 2461859.8
4000000.0 3104312.5 3102543.0 3106083.1
5000000.0 3684906.5 nan nan
6000000.0 4212105.7 nan nan
7000000.0 4694161.2 nan nan
8000000.0 5135336.6 nan nan
9000000.0 5539966.7 nan nan
10000000.0 nan nan nan
11000000.0 6255121.5 nan nan
12000000.0 nan nan nan
13000000.0 6840500.0 nan nan

timydaley · 2017-02-21T23:10:08Z

The NaN values are definitely not correct. There is probably an overflow issue due to defects. It's strange that they appear at and near 5M reads, since that's the size of the experiment and no defects should appear near there: the power series expansion has a radius of convergence of 1, meaning you can accurately extrapolate out to 2x the size of your input experiment with no problems.
This is not something I have encountered before.
If you can, run preseq on one of the datasets where this happens and include the -v (for verbose) option. Send me the resulting output, specifically the counts histogram. I should be able to work with that to recreate this issue.
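
To make the radius-of-convergence remark concrete, here is a sketch of the classical Good-Toulmin power series evaluated directly from a counts histogram. It assumes n_j is the number of distinct reads observed exactly j times and t is the relative amount of additional sequencing (t = 1 means doubling the experiment). This is the textbook series only, not preseq's implementation; the function name and the toy histogram are hypothetical.

```python
def good_toulmin_new_distinct(counts_hist, t):
    """Estimate how many new distinct reads appear after t-fold additional sequencing.

    counts_hist maps j -> n_j (number of distinct reads seen exactly j times).
    The estimate is sum_j (-1)^(j+1) * t^j * n_j, which is only trusted
    here for 0 <= t <= 1, i.e. up to twice the original experiment size.
    """
    if not 0 <= t <= 1:
        raise ValueError("the raw series is only reliable for 0 <= t <= 1")
    return sum((-1) ** (j + 1) * (t ** j) * n_j for j, n_j in counts_hist.items())


# Toy histogram (HYPOTHETICAL): 6 singletons, 2 doubletons, 1 tripleton.
toy_hist = {1: 6, 2: 2, 3: 1}
print(good_toulmin_new_distinct(toy_hist, 0.5))
print(good_toulmin_new_distinct(toy_hist, 1))
```

For t > 1 the alternating terms t^j * n_j can grow without bound, which is why the function refuses to evaluate the raw series beyond a doubling of the experiment.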

slzhao · 2017-02-24T18:03:07Z

Hello Timothy,

Here is the output from the 5M reads file with -v and without the -D parameter. Would you please help to fix it? Thanks!

BED_INPUT
TOTAL READS = 5000000
DISTINCT READS = 3.68512e+06
DISTINCT COUNTS = 108
MAX COUNT = 754
COUNTS OF 1 = 2.7627e+06
MAX TERMS = 46
OBSERVED COUNTS (755)
1 2762699
2 666512
3 175262
4 55215
5 15663
6 5959
7 1807
8 833
9 327
10 187
11 88
12 79
13 53
14 52
15 37
16 32
17 17
18 19
19 15
20 19
21 13
22 17
23 12
24 12
25 5
26 9
27 4
28 11
29 3
30 6
31 8
32 7
33 8
34 4
35 4
36 7
37 2
38 4
39 2
40 4
41 5
42 3
43 2
44 3
45 1
46 2
47 2
50 1
51 2
52 2
53 3
54 4
55 3
56 1
57 1
58 2
59 2
60 1
61 1
63 2
65 2
66 2
67 1
68 1
69 2
71 1
72 2
74 1
76 1
77 1
78 2
80 1
84 1
85 1
86 1
87 2
91 1
95 1
97 1
100 1
101 1
113 1
116 1
118 1
119 1
121 1
122 1
125 1
131 1
133 1
135 1
142 1
143 1
147 1
151 1
154 1
157 1
158 1
170 2
186 1
215 1
225 1
253 1
268 1
274 1
288 1
375 1
754 1

[ESTIMATING YIELD CURVE]
[BOOTSTRAPPING HISTOGRAM]
__________________.___________________________________________._________.____________________.______________._______________________.___________________________.__________________._______._____._______________._______._._________.________________________.____.____._____._________________________.._______________________________._______________________________________________________________.___.________________.____________.___________________________________________________________________________________._____________________._______________._____________.___..______._________________________________________._________________________________.__.__.___.____._______.______________.___________________..___.__________________.________________.___________________________.___________________.__.___________________._._________________.____.____________________________________________________.___._______________________.___________________________________.________________________.___________
ERROR: too many defects in the approximation, consider running in defect mode

timydaley · 2017-03-15T17:22:26Z

I apologize for the delay, but I have no problem running preseq in defect mode using the most recent version of preseq. Below is the output for when I run preseq using the above histogram. Can you verify that you are using the most recent version by deleting preseq and re-cloning from the github repository?

./preseq lc_extrap -v -o out.txt -H test.txt -D
HIST_INPUT
TOTAL READS = 5000000
DISTINCT READS = 3.68512e+06
DISTINCT COUNTS = 108
MAX COUNT = 754
COUNTS OF 1 = 2.7627e+06
MAX TERMS = 46
OBSERVED COUNTS (755)
1 2762699
2 666512
3 175262
4 55215
5 15663
6 5959
7 1807
8 833
9 327
10 187
11 88
12 79
13 53
14 52
15 37
16 32
17 17
18 19
19 15
20 19
21 13
22 17
23 12
24 12
25 5
26 9
27 4
28 11
29 3
30 6
31 8
32 7
33 8
34 4
35 4
36 7
37 2
38 4
39 2
40 4
41 5
42 3
43 2
44 3
45 1
46 2
47 2
50 1
51 2
52 2
53 3
54 4
55 3
56 1
57 1
58 2
59 2
60 1
61 1
63 2
65 2
66 2
67 1
68 1
69 2
71 1
72 2
74 1
76 1
77 1
78 2
80 1
84 1
85 1
86 1
87 2
91 1
95 1
97 1
100 1
101 1
113 1
116 1
118 1
119 1
121 1
122 1
125 1
131 1
133 1
135 1
142 1
143 1
147 1
151 1
154 1
157 1
158 1
170 2
186 1
215 1
225 1
253 1
268 1
274 1
288 1
375 1
754 1

[ESTIMATING YIELD CURVE]
[BOOTSTRAPPING HISTOGRAM]
....................................................................................................
[COMPUTING CONFIDENCE INTERVALS]
[WRITING OUTPUT]
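
To rerun a posted histogram like this one, the OBSERVED COUNTS section of the verbose output has to be turned back into a file for -H. Below is a minimal sketch of that step. It assumes the histogram file is plain text with one "count frequency" pair per line (written tab-separated here), which should be checked against the preseq manual. The helper names are hypothetical and not part of preseq, and the example data is a toy input rather than the histogram from this issue.

```python
def parse_observed_counts(verbose_lines):
    """Collect the 'count frequency' pairs that follow the OBSERVED COUNTS line."""
    hist = {}
    in_counts = False
    for line in verbose_lines:
        line = line.strip()
        if line.startswith("OBSERVED COUNTS"):
            in_counts = True
            continue
        if not in_counts:
            continue
        fields = line.split()
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            break  # first line that is not a pair ends the section
        hist[int(fields[0])] = int(fields[1])
    return hist


def total_and_distinct(hist):
    """Return (total reads, distinct reads) implied by the histogram."""
    total = sum(count * freq for count, freq in hist.items())
    distinct = sum(hist.values())
    return total, distinct


def write_hist_file(hist, path):
    """Write one tab-separated 'count frequency' pair per line (assumed -H format)."""
    with open(path, "w") as handle:
        for count in sorted(hist):
            handle.write(f"{count}\t{hist[count]}\n")


# Toy verbose output (HYPOTHETICAL numbers, same layout as the log above).
toy_log = [
    "TOTAL READS = 10",
    "OBSERVED COUNTS (4)",
    "1 5",
    "2 1",
    "3 1",
    "",
    "[ESTIMATING YIELD CURVE]",
]
toy_hist = parse_observed_counts(toy_log)
print(toy_hist, total_and_distinct(toy_hist))
```

Comparing the result of `total_and_distinct` with the TOTAL READS and DISTINCT READS lines of the same log is a cheap way to confirm that the histogram was copied without losing rows.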

timydaley closed this as completed Feb 21, 2017

timydaley reopened this Feb 21, 2017

timydaley closed this as completed May 2, 2018

bounlu mentioned this issue Oct 4, 2023

Preseq failing most of the time nf-core/methylseq#161

Closed
