problem with defect mode #29

Closed
timydaley opened this issue Feb 21, 2017 · 10 comments

Comments

@timydaley
Contributor

@slzhao, I'm moving your issue to the preseq issues.

Does defect mode work when using 5M reads? Or just when using >50M?

@slzhao

slzhao commented Feb 21, 2017

Thanks timydaley!
Defect mode works with the 51G (>50M reads) data.

Here are some lines of the .mr file:
chr1 9983 10082 HISEQ-KERMIT:328:C9U3TANXX:5:1115:3551:13564 5 - GTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATGGTGTTATAG FFF7/7FF<FF<///FB<B<//<FF/FBBF<FB/F<F<</F<FFFFFFFFFFFFFFFFFBBFF/FFBFFBFB/FFBFFFFFFBFFFBFFFBFFFBBBBB
chr1 9985 10060 HISEQ-KERMIT:328:C9U3TANXX:5:2308:14005:85954 5 - GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGTTTAGGGTTAGGGTTAGGGGATAGATAGTAT ///<<BFBBFFFFBF<<//BFFFB/FFFFFFBBFFBFF/F<F<F/<<FF/<<BF/F/<<FFFFFFFFFBFBFFB/

Here are some lines of the preseq result in defect mode with the 51G file:
TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI
0 0 0 0
1000000.0 988826.5 988342.2 989311.1
2000000.0 1966887.5 1965891.6 1967883.9
3000000.0 2937630.0 2936107.0 2939153.8
4000000.0 3901946.5 3899906.2 3903987.8

@timydaley
Contributor Author

It works?

@slzhao

slzhao commented Feb 21, 2017 via email

@timydaley
Contributor Author

But not the bam file?

@slzhao

slzhao commented Feb 21, 2017 via email

@timydaley
Contributor Author

Cool. If it works without -D, then I would use those estimates. Defect mode is only meant for cases where the default mode fails. If you do run in defect mode, you should be able to tell visually whether there are issues, since a problem shows up as a localized instability in the curve.
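As a rough complement to checking the curve by eye, here is a minimal Python sketch that scans lc_extrap output for NaN values or a decreasing EXPECTED_DISTINCT column. The function name `find_curve_problems` is hypothetical, and the two checks are only a heuristic assumption about what an unstable curve looks like, not something preseq itself reports. The sample rows are the ones quoted earlier in this thread.

```python
import math

def find_curve_problems(text):
    """Return (total_reads, reason) pairs for rows that look suspicious."""
    problems = []
    previous = None
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        total, expected = float(fields[0]), float(fields[1])
        if math.isnan(expected):
            problems.append((total, "nan"))
            continue
        # A yield curve should not decrease as more reads are sequenced.
        if previous is not None and expected < previous:
            problems.append((total, "decreasing"))
        previous = expected
    return problems

sample = """TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI
0 0 0 0
1000000.0 988826.5 988342.2 989311.1
2000000.0 1966887.5 1965891.6 1967883.9
3000000.0 2937630.0 2936107.0 2939153.8
4000000.0 3901946.5 3899906.2 3903987.8
"""
print(find_curve_problems(sample))  # []
```

An empty list only means these two simple checks passed; a plot is still the better diagnostic.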

Feel free to email me if you have any more problems, issues, or questions.

@slzhao

slzhao commented Feb 21, 2017 via email

@timydaley
Contributor Author

The NaN values are certainly not correct. There is probably an overflow issue due to defects. It's strange that they appear at and near 5M reads, since that's the size of the experiment and no defects should appear near there: the power series expansion has a radius of convergence of 1, meaning you can accurately extrapolate out to 2x the size of your input experiment with no problems.
This is not something I have encountered before.
If you can, run preseq on one of the datasets where this happens and include the -v (for verbose) option. Send me the resulting output, specifically the counts histogram. I should be able to work with that to recreate this issue.
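
To illustrate the radius-of-convergence point, here is a minimal Python sketch of the partial sums of a Good-Toulmin style series, sum over j of (-1)^(j+1) * t^j * n_j, where n_j is the number of distinct reads seen j times and t is the extrapolation relative to the input size (t = 1 corresponds to 2x the experiment). That this is the exact series meant above is an assumption, the histogram is a made-up toy example, and the function name is hypothetical. This is not how preseq computes its estimates.

```python
def good_toulmin_partial_sums(hist, t):
    """Partial sums of sum_j (-1)^(j+1) * t^j * n_j, in order of increasing j."""
    total = 0.0
    sums = []
    for j in sorted(hist):
        total += (-1) ** (j + 1) * t ** j * hist[j]
        sums.append(total)
    return sums

# Toy histogram (hypothetical): n_j = number of distinct reads observed j times.
toy_hist = {1: 1000, 2: 300, 3: 100, 4: 40, 5: 15}

# Inside the radius of convergence (t = 0.5, i.e. 1.5x the input) the sums settle.
print(good_toulmin_partial_sums(toy_hist, 0.5))
# [500.0, 425.0, 437.5, 435.0, 435.46875]

# Well outside it (t = 3, i.e. 4x the input) the sums swing wildly.
print(good_toulmin_partial_sums(toy_hist, 3))
# [3000.0, 300.0, 3000.0, -240.0, 3405.0]
```

With a real histogram that has counts in the hundreds, terms such as t^j for large j grow very quickly once t exceeds 1, which is the kind of blow-up that can end in overflow or NaN.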

timydaley reopened this Feb 21, 2017
@slzhao

slzhao commented Feb 24, 2017 via email

@timydaley
Contributor Author

I apologize for the delay. I have no problem running preseq in defect mode on this histogram with the most recent version of preseq. Below is the output when I run preseq using the above histogram. Can you verify that you are using the most recent version by deleting preseq and re-cloning from the GitHub repository?

./preseq lc_extrap -v -o out.txt -H test.txt -D
HIST_INPUT
TOTAL READS = 5000000
DISTINCT READS = 3.68512e+06
DISTINCT COUNTS = 108
MAX COUNT = 754
COUNTS OF 1 = 2.7627e+06
MAX TERMS = 46
OBSERVED COUNTS (755)
1 2762699
2 666512
3 175262
4 55215
5 15663
6 5959
7 1807
8 833
9 327
10 187
11 88
12 79
13 53
14 52
15 37
16 32
17 17
18 19
19 15
20 19
21 13
22 17
23 12
24 12
25 5
26 9
27 4
28 11
29 3
30 6
31 8
32 7
33 8
34 4
35 4
36 7
37 2
38 4
39 2
40 4
41 5
42 3
43 2
44 3
45 1
46 2
47 2
50 1
51 2
52 2
53 3
54 4
55 3
56 1
57 1
58 2
59 2
60 1
61 1
63 2
65 2
66 2
67 1
68 1
69 2
71 1
72 2
74 1
76 1
77 1
78 2
80 1
84 1
85 1
86 1
87 2
91 1
95 1
97 1
100 1
101 1
113 1
116 1
118 1
119 1
121 1
122 1
125 1
131 1
133 1
135 1
142 1
143 1
147 1
151 1
154 1
157 1
158 1
170 2
186 1
215 1
225 1
253 1
268 1
274 1
288 1
375 1
754 1

[ESTIMATING YIELD CURVE]
[BOOTSTRAPPING HISTOGRAM]
....................................................................................................
[COMPUTING CONFIDENCE INTERVALS]
[WRITING OUTPUT]
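
If you want to sanity-check a histogram file before sending it, here is a minimal Python sketch that recomputes the summary fields shown in the verbose output from a two-column listing (count, then number of distinct reads with that count). The function name `summarize_histogram` and the field names are hypothetical, and the two-column layout is assumed from the OBSERVED COUNTS listing above. The small input is a toy example, not data from this issue.

```python
def summarize_histogram(text):
    """Recompute summary statistics from 'count frequency' lines."""
    hist = {}
    for line in text.strip().splitlines():
        count, freq = line.split()
        hist[int(count)] = int(freq)
    return {
        # Each distinct read seen j times contributes j reads in total.
        "total_reads": sum(j * n for j, n in hist.items()),
        "distinct_reads": sum(hist.values()),
        "distinct_counts": len(hist),
        "max_count": max(hist),
        "counts_of_1": hist.get(1, 0),
    }

toy = """1 5
2 3
4 1
"""
print(summarize_histogram(toy))
# {'total_reads': 15, 'distinct_reads': 9, 'distinct_counts': 3, 'max_count': 4, 'counts_of_1': 5}
```

For the listing above, the 108 rows, the largest count of 754, and the 2762699 singletons are consistent with the DISTINCT COUNTS, MAX COUNT, and COUNTS OF 1 fields that preseq printed.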
