-
Notifications
You must be signed in to change notification settings - Fork 9
/
Copy pathCHANGELOG
220 lines (194 loc) · 10.7 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
NASP Changelog
================
1.0.0:
----------
* New significantly faster matrix generator written in GO
* Support for adapter/quality trimming of reads
* SNAP aligner now works (need to add '-rf BadCigar' option to GATK when using SNAP)
* No .'s in frankenfastas
* Sample statistics modified to make all stats affected by duplicated regions
* Various bug fixes
* SolSNP and NovoAlign not enabled by default
0.9.9:
----------
* Fixes to nucmer: properly pass additional arguments, increase walltime and memory allocation
* Set time allocation for SLURM
* Tab autocomplete support in UI prompts
* Added filtered all position matrix
* Output multi-sample VCF files of the matrices
0.9.8:
----------
* Support for bowtie2 aligner
* Support for SGE job manager
* PyPI module support, setup script
0.9.6:
--------
* VCF files with missing headers now throw a "MalformedInputFile" error instead of hanging.
* Rewrote multithreading parameter passing algorithm to fix race condition again.
* Modified data storage algorithm to speed the matrix-building process at the cost of memory use.
* Improvements made to sample and analysis statistics, and new improved statistics file format.
* Aligner and SNP caller program name expected values are now more tolerant and open-ended.
* Insertion data no longer appears in the filtered matrix; one character per matrix cell.
* Files that fail to be read are now represented with blank matrix columns to alert the user.
* Fully Python job dispatcher and command-line UI -- all code now written in Python
* Run log is written to output folder with choices the user made in the UI and commands that were submitted to job manager
* Main nasp executable can accept an XML file with configuration parameters instead of going through UI
* XML configuration file is written to output folder on every run
* Support for SLURM job manager and specifying queues/partitions and other job manager options for both PBS and SLURM
0.9.5:
--------
* If there are errors during a file import, the handling thread will move on to the next file instead of dying.
* Samples will now appear in the same order in the final matrix across multiple runs.
* New builtin VCF parser that is much more efficient and custom-built for the pipeline.
0.9.4:
--------
* Fixed NUCmer issue where regions would appear duplicated by complicatedly mapping to themselves.
* Fasta files with unusual formatting will be standardized so as to not break downstream tools.
* An option to produce a "missing data" matrix instead of a "best SNPs" matrix was added.
* Statistics are now collected, and a global/contig stats file is now produced.
0.9.3:
--------
* Reference regions that did not align in an external fasta are now called 'X' instead of 'N'.
* Large VCF files no longer cause a race condition that prevents matrix generation from finishing.
* Errors during VCF import no longer prevent matrix generation from finishing.
* Errors during VCF import are now included in the error output.
* Bug in duplicate region import that caused some duplicated regions to not be marked is fixed.
* Proportion filter works.
* Strange calls in proportion filter are now handled properly.
* Strange crash when varscan calls heterozygous for deletion fixed.
* Proportion filter code is now broken up by the SNP caller it's meant to work for, since each has its own.
0.9.2:
--------
* Matrix generation is now in python instead of perl.
* The data import, analysis, and filtering portion of matrix generation is now multithreaded.
* Code is much cleaner, more maintainable, and now object-oriented.
* Custom matrix formats are now possible (but there is no user interface to specify the format yet).
* New python dispatcher is now included, but requires a hand-written XML file and has had limited testing.
* Statistics generation is incomplete; no statistics file is produced yet.
* Indel information is discarded; there is no indel matrix.
* The "Pattern" and "Pattern#" columns are omitted.
0.9.1:
--------
* External genomes are no longer concatenated before import.
* Perl and python external genome importers now produce identical output.
* Added an option to give custom arguments to NUCmer during external genome import.
* Duplicate checking and external genome import are now implemented in python.
* External genome segments now only align to one place on the reference, even if they match in multiple places.
* Support for the 'SNAP' aligner has been added, but the SNP callers do not like the VCF files.
0.9:
------
* Select portions of the pipeline now in python instead of perl.
0.8.7:
--------
* Various portability changes made.
* Additional statistics now included in stats file.
* Removed "read_folder" argument, now always assumes current working folder.
0.8.6:
--------
* External fasta files in DOS format should no longer try to align calls of "carriage return"; turns out most DNA doesn't contain carriage returns.
* Cryptic warning messages will no longer be produced when a sample had no reads that aligned to a particular reference contig/chromosome.
* Removed samtools filtering of unmapped reads and low-quality reads from bam files during alignment.
* The allvariant matrix has been split into an allsnp and an allindel matrix.
* Statistics file format changed drastically.
* Statistics file now breaks values up by both sample and contig/chromosome.
* Statistics are now broken up into two-dimensional tables instead of one statistic per line.
* Now gives an immediate error if you give it a blank or nonexistent reference, rather than 1.4 million errors after 17 hours.
0.8.5:
--------
* Fix for corrupt files released with broken 0.8.4 release.
* Support for more platforms, easier installation process.
0.8.3:
--------
* Undocumented super-secret changes (I lost the changelog for 0.8.3)...
0.8.2:
--------
* Added a column "SampleConsensus", which is a yes/no based on whether all analyses matched (intersection).
* Calls from VCFs and external genomes that are explicitly "N" are now considered uncalled.
* The bestsnps_matrix no longer contains columns CallWasMade, PassedDepthFilter, PassedProportionFilter.
* Modified the depth filter to not use any "incorrect" calculation methods.
* Some changes to statistics file.
* Added the requested data to the statistics file.
0.8.1:
--------
* Matrix format specification changes.
0.8:
------
* Several minor bugfixes.
* Fixed a problem that would break duplicated region checking in rare conditions*.
* Fixed a bug that was omitting the last position in the all-matrix in rare conditions*.
* Made file naming improvements.
* Pipeline now makes an intersection matrix of SNP calls where every caller agrees.
* Fixed a bug that inserts an underscore randomly into bam names in rare conditions*.
* Pipeline now makes a statistics file.
* Fixed a bug where varscan would accidentally trigger kitten-killing on delete calls.
* Vastly improved support for indels.
* always
0.7:
------
* Added support for finding duplicated regions.
* Made heading names produced from SolSNP output more understandable.
* Fixed VarScan.
* Creates a configuration log that lists options selected and commands queued.
* Fixed issue with bam indexes being created in the input folder.
0.6.1:
--------
* Fixed a bug where chromosome descriptions were included in the identifier.
* Fixed a bug where fasta headings were being written to the wrong snpfasta file.
* Removed a feature where Jason gets 3,000,000 bonus SNPs for his help debugging.
* Fixed a bug where inversions in external genomes were not reverse complemented.
* Fixed a bug where unpaired reads would have confusing heading names.
0.6:
------
* Added BWA mem support, but it seems like it broke samp/se.
* Created a script for removing columns from a finished matrix.
* Created a script for combining two finished matrices.
* Created a script for removing a list of positions from a finished matrix.
* Fixed a bug where choosing to only run BWA mem would not look for fastq files.
* Added support for external genomes, but it is not well-tested.
* Changed the format of matrix sample names to be more readable.
* Questions that lead to options that may be broken are now starred.
* Choosing at least one aligner and SNP caller is no longer required.
* Changed the call made by the low-coverage filter from "X" to "N".
* Changed the call representing an uncallable position from "N" to "X".
* Fixed a bug where "T" and "U" match each other sometimes but not others.
* Fixed a bug where "A" and "a" would be considered different calls in the pattern column.
* Degeneracies are now all listed as "N" in the pattern column.
* Added a matrix column "InDupRegion" that always has the value "unchecked".
* Pipeline murders kittens if you give it files pre-aligned to disagreeing references.
* Allcallable matrix now fills fully-uncalled regions with "X", rather than skipping them.
* Memory requirements for matrix building are significantly reduced.
0.5:
------
* Added VarScan support, but it is not well-tested.
* Fixed bam header information to not be randomly-generated ID's.
* Fixed a bug that would cause corrupt VCF files to kill the matrix step.
* Moved bam indexing to the end of the alignment step so it's only done once.
* Fixed VarScan so it wouldn't name every output "Sample1".
* Added reference line to SNP fasta output.
* Now yells at you if multiple samples have the same name, but continues.
* Doesn't print VCF file name in matrix sample names anymore.
* Sample names more reliably contain the aligner and SNP caller name.
0.4:
------
* The matrix-building step will no longer abort or fail if an aligner or SNP caller fails.
* Creates .dict file during indexing step, to keep GATK from stepping on itself and dying.
* Changed the heading previously called "Chromosome" in the matrix file to "Contig".
* Changed default CPU/RAM/walltime requests to be more optimal for clusters.
0.3:
------
* Now outputs matrices in the new matrix format.
* Choosing advanced options for aligners and SNP callers actually gives you options!
* What I assume are John's suggested options for GATK are on by default.
* Does reference indexing in an output subfolder, not in some arbitrary folder.
* Pattern numbers don't skip to match the allcallable file anymore.
* Outputs matrices in fasta format as well as matrix format.
Initial Release:
------------------
* Can use BWA and Novoalign.
* Can use GATK and SolSNP.
* Can include pre-aligned bam files, and pre-called vcf files.
* Uses allcallable, and does not have problems with incorrect reference calls.
* Understands and properly handles multiple contigs/chromosomes in one reference.
* Correctly handles degenerate bases.
* Handles indels gracefully (but not necessarily correctly)...
* Automatically pairs your read files, if you have paired reads.