problem with defect mode #29
Comments
Thanks timydaley! Here are some lines of the .mr file: Here are some lines of the preseq result in defect mode with 51G file.
It works?
Yes. -D (defect mode) works for the 51G .mr file.
But not the bam file?
I don't have a bam file. I made the .mr file directly from the .fastq files with walt.
I've just tested a .mr file with 5M lines (1.4G):
with -D: Works
without -D: ERROR: too many defects in the approximation, consider running in defect mode
I've also tested a .mr file with 0.5M lines (140M):
with -D: Works
without -D: Works
Cool. If it works without -D, then I would use those estimates. Defect mode is only for when it doesn't work. You should be able to tell visually if there are issues if you're running it in defect mode, as there will be a localized instability in the curve. Feel free to email me if you have any more problems, issues, or questions.
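For anyone who would rather check for that kind of instability programmatically than by eye, here is a rough sketch. It assumes that a well-behaved yield curve is increasing and concave (each extra batch of reads adds fewer new distinct reads than the previous one) and flags points that break that pattern. The function name `find_instabilities` and the tolerance are made up for this illustration; this is only a heuristic and is not something preseq does itself.

```python
import math

def find_instabilities(rows, tol=1e-9):
    """Flag points where an (x, y) yield curve stops being increasing and concave.

    rows: list of (total_reads, expected_distinct) pairs sorted by total_reads.
    Returns the total_reads values at which the curve drops or its slope grows.
    Rows whose estimate is NaN are skipped here and should be checked separately.
    """
    finite = [(x, y) for x, y in rows if not math.isnan(y)]
    flagged = []
    prev_slope = None
    for (x0, y0), (x1, y1) in zip(finite, finite[1:]):
        slope = (y1 - y0) / (x1 - x0)
        decreasing = slope < 0
        slope_grew = prev_slope is not None and slope > prev_slope + tol
        if decreasing or slope_grew:
            flagged.append(x1)
        prev_slope = slope
    return flagged

# A smooth curve gives no flags; a local dip is reported at the dip and the rebound.
smooth = [(0, 0.0), (1, 0.9), (2, 1.7), (3, 2.4)]
bumpy = [(0, 0.0), (1, 0.9), (2, 1.7), (3, 1.5), (4, 2.9)]
print(find_instabilities(smooth))
print(find_instabilities(bumpy))
```

A flagged point only says the curve looks suspicious near that read depth; plotting the curve is still the more reliable check.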
Hi Timothy,
Thanks for your reply. Did you mean I should use the .mr file with 5M lines, without -D (in normal mode), to estimate the whole sample/file? I have two questions:
1. The 5M-line .mr file is only the first part of a very large (51G) .mr file. Can it represent the whole sample/file? Also, the large .mr file was sorted, so the 5M-line file contains only some reads from chromosome 1. Is it OK to do so?
2. The result has many NaN values. Is that correct?
Result example below:
TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI
0 0 0 0
1000000.0 928722.5 928237.7 929207.6
2000000.0 1740922.5 1739981.0 1741864.5
3000000.0 2460491.5 2459124.0 2461859.8
4000000.0 3104312.5 3102543.0 3106083.1
5000000.0 3684906.5 nan nan
6000000.0 4212105.7 nan nan
7000000.0 4694161.2 nan nan
8000000.0 5135336.6 nan nan
9000000.0 5539966.7 nan nan
10000000.0 nan nan nan
11000000.0 6255121.5 nan nan
12000000.0 nan nan nan
13000000.0 6840500.0 nan nan
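A quick way to list the affected rows in a table like the one above is sketched below. It assumes the output is whitespace-separated with the four columns shown (TOTAL_READS, EXPECTED_DISTINCT, LOWER_0.95CI, UPPER_0.95CI); the helper names are hypothetical and not part of preseq.

```python
import math

def parse_yield_table(text):
    """Parse whitespace-separated yield output into tuples of floats, skipping the header."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "TOTAL_READS":
            continue
        rows.append(tuple(float(f) for f in fields))  # float("nan") parses fine
    return rows

def reads_with_nan(rows):
    """TOTAL_READS values where the estimate or either confidence bound is NaN."""
    return [r[0] for r in rows if any(math.isnan(v) for v in r[1:])]

def reads_missing_estimate(rows):
    """TOTAL_READS values where EXPECTED_DISTINCT itself is NaN."""
    return [r[0] for r in rows if math.isnan(r[1])]

sample = """
TOTAL_READS EXPECTED_DISTINCT LOWER_0.95CI UPPER_0.95CI
4000000.0 3104312.5 3102543.0 3106083.1
5000000.0 3684906.5 nan nan
10000000.0 nan nan nan
"""
rows = parse_yield_table(sample)
print(reads_with_nan(rows))
print(reads_missing_estimate(rows))
```

Separating "estimate missing" from "only the confidence interval missing" matches what the table shows: from 5M reads on the interval is NaN everywhere, while the point estimate is NaN only at some depths.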
The NaN are not correct for sure. There is probably an overflow issue due to defects. It's strange that these appear at and near 5M reads, since that's the size of the experiment and no defects should appear near there. This is because the power series expansion has a radius of convergence of 1, meaning you can accurately extrapolate out to 2x of your input experiment with no problems.
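To make the radius-of-convergence point concrete, here is a small sketch. It assumes the series being referred to is the classical Good-Toulmin alternating series, in which the expected number of new distinct reads after sequencing a further fraction t of the original experiment is the sum over j of (-1)^(j+1) * n_j * t^j, with n_j the number of reads observed exactly j times. With t = 1 the total depth is twice the input, which is the 2x mentioned above. The function name is hypothetical, and this is the raw series only, not the approximation preseq uses to go beyond 2x.

```python
def good_toulmin_new_distinct(counts, t):
    """Raw Good-Toulmin estimate of new distinct reads at relative extra depth t.

    counts: dict mapping j -> n_j (number of distinct reads observed j times).
    t: additional sequencing as a fraction of the original experiment, 0 <= t <= 1.
    """
    if not 0 <= t <= 1:
        # Outside this range the alternating series is not reliable.
        raise ValueError("raw series is only trustworthy for 0 <= t <= 1")
    return sum((-1) ** (j + 1) * n_j * t ** j for j, n_j in counts.items())

# Toy histogram: 10 reads seen once, 3 reads seen twice.
toy = {1: 10, 2: 3}
print(good_toulmin_new_distinct(toy, 0.5))  # 10*0.5 - 3*0.25
print(good_toulmin_new_distinct(toy, 1.0))  # 10 - 3
```

For t above 1 the terms n_j * t^j can grow without bound and alternate in sign, which is why extrapolating further needs a different representation of the same series.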
Hello Timothy
Here is the output from the 5M-read file, run with -v and without the -D parameter.
Would you please help to fix it? Thanks!
BED_INPUT
TOTAL READS = 5000000
DISTINCT READS = 3.68512e+06
DISTINCT COUNTS = 108
MAX COUNT = 754
COUNTS OF 1 = 2.7627e+06
MAX TERMS = 46
OBSERVED COUNTS (755)
1 2762699
2 666512
3 175262
4 55215
5 15663
6 5959
7 1807
8 833
9 327
10 187
11 88
12 79
13 53
14 52
15 37
16 32
17 17
18 19
19 15
20 19
21 13
22 17
23 12
24 12
25 5
26 9
27 4
28 11
29 3
30 6
31 8
32 7
33 8
34 4
35 4
36 7
37 2
38 4
39 2
40 4
41 5
42 3
43 2
44 3
45 1
46 2
47 2
50 1
51 2
52 2
53 3
54 4
55 3
56 1
57 1
58 2
59 2
60 1
61 1
63 2
65 2
66 2
67 1
68 1
69 2
71 1
72 2
74 1
76 1
77 1
78 2
80 1
84 1
85 1
86 1
87 2
91 1
95 1
97 1
100 1
101 1
113 1
116 1
118 1
119 1
121 1
122 1
125 1
131 1
133 1
135 1
142 1
143 1
147 1
151 1
154 1
157 1
158 1
170 2
186 1
215 1
225 1
253 1
268 1
274 1
288 1
375 1
754 1
[ESTIMATING YIELD CURVE]
[BOOTSTRAPPING HISTOGRAM]
…__________________.___________________________________________._________.____________________.______________._______________________.___________________________.__________________._______._____._______________._______._._________.________________________.____.____._____._________________________.._______________________________._______________________________________________________________.___.________________.____________.___________________________________________________________________________________._____________________._______________._____________.___..______._________________________________________._________________________________.__.__.___.____._______.______________.___________________..___.__________________.________________.___________________________.___________________.__.___________________._._________________.____.____________________________________________________.___._______________________.___________________________________.________________________.___________
ERROR: too many defects in the approximation, consider running in defect mode
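Before sharing a histogram like the one above, it can be sanity-checked against the summary lines of the verbose output. The sketch below assumes each histogram row is "count frequency" and that TOTAL READS, DISTINCT READS, DISTINCT COUNTS, MAX COUNT and COUNTS OF 1 correspond to the quantities computed here; that mapping is inferred from the numbers in this thread rather than taken from the preseq documentation, and the function name is made up.

```python
def summarize_histogram(lines):
    """Recompute summary statistics from 'count frequency' histogram rows."""
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip headers, log lines, and blanks
        j, n_j = int(fields[0]), int(fields[1])
        counts[j] = n_j
    return {
        "total_reads": sum(j * n_j for j, n_j in counts.items()),
        "distinct_reads": sum(counts.values()),
        "distinct_counts": len(counts),   # number of histogram rows
        "max_count": max(counts),
        "counts_of_1": counts.get(1, 0),
    }

# Toy histogram: 5 reads seen once, 3 seen twice, 1 seen four times.
toy_rows = ["1 5", "2 3", "4 1"]
print(summarize_histogram(toy_rows))
```

If the recomputed totals disagree with the reported ones, the histogram was probably truncated or altered when it was copied.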
2017-02-21 17:10 GMT-06:00 Timothy Daley:
This is not something I have encountered before.
If you can, run preseq on one of the datasets where this happens and
include the -v (for verbose) option. Send me the resulting output,
specifically the counts histogram. I should be able to work with that to
recreate this issue.
I apologize for the delay, but I have no problem running preseq in defect mode using the most recent version of preseq. Below is the output for when I run preseq using the above histogram. Can you verify that you are using the most recent version by deleting preseq and re-cloning from the GitHub repository?
./preseq lc_extrap -v -o out.txt -H test.txt -D
[ESTIMATING YIELD CURVE]
@slzhao, I'm moving your issue to the preseq issues.
Does defect mode work when using 5M reads? Or just when using >50M?