CHANGELOG

 NASP Changelog
================

 1.0.0:
----------
 * New significantly faster matrix generator written in GO
 * Support for adapter/quality trimming of reads
 * SNAP aligner now works (need to add '-rf BadCigar' option to GATK when using SNAP)
 * No .'s in frankenfastas
 * Sample statistics modified to make all stats affected by duplicated regions
 * Various bug fixes
 * SolSNP and NovoAlign not enabled by default

 0.9.9:
----------
 * Fixes to nucmer: properly pass additional arguments, increase walltime and memory allocation
 * Set time allocation for SLURM
 * Tab autocomplete support in UI prompts
 * Added filtered all position matrix
 * Output multi-sample VCF files of the matrices

 0.9.8:
----------
 * Support for bowtie2 aligner
 * Support for SGE job manager
 * PyPI module support, setup script

 0.9.6:
--------
 * VCF files with missing headers now throw a "MalformedInputFile" error instead of hanging.
 * Rewrote multithreading parameter passing algorithm to fix race condition again.
 * Modified data storage algorithm to speed the matrix-building process at the cost of memory use.
 * Improvements made to sample and analysis statistics, and new improved statistics file format.
 * Aligner and SNP caller program name expected values are now more tolerant and open-ended.
 * Insertion data no longer appears in the filtered matrix; one character per matrix cell.
 * Files that fail to be read are now represented with blank matrix columns to alert the user.
 * Fully Python job dispatcher and command-line UI -- all code now written in Python
 * Run log is written to output folder with choices the user made in the UI and commands that were submitted to job manager
 * Main nasp executable can accept an XML file with configuration parameters instead of going through UI
 * XML configuration file is written to output folder on every run
 * Support for SLURM job manager and specifying queues/partitions and other job manager options for both PBS and SLURM

 0.9.5:
--------
 * If there are errors during a file import, the handling thread will move on to the next file instead of dying.
 * Samples will now appear in the same order in the final matrix across multiple runs.
 * New builtin VCF parser that is much more efficient and custom-built for the pipeline.

 0.9.4:
--------
 * Fixed NUCmer issue where regions would appear duplicated by complicatedly mapping to themselves.
 * Fasta files with unusual formatting will be standardized so as to not break downstream tools.
 * An option to produce a "missing data" matrix instead of a "best SNPs" matrix was added.
 * Statistics are now collected, and a global/contig stats file is now produced.

 0.9.3:
--------
 * Reference regions that did not align in an external fasta are now called 'X' instead of 'N'.
 * Large VCF files no longer cause a race condition that prevents matrix generation from finishing.
 * Errors during VCF import no longer prevent matrix generation from finishing.
 * Errors during VCF import are now included in the error output.
 * Bug in duplicate region import that caused some duplicated regions to not be marked is fixed.
 * Proportion filter works.
 * Strange calls in proportion filter are now handled properly.
 * Strange crash when varscan calls heterozygous for deletion fixed.
 * Proportion filter code is now broken up by the SNP caller it's meant to work for, since each has its own.

 0.9.2:
--------
 * Matrix generation is now in python instead of perl.
 * The data import, analysis, and filtering portion of matrix generation is now multithreaded.
 * Code is much cleaner, more maintainable, and now object-oriented.
 * Custom matrix formats are now possible (but there is no user interface to specify the format yet).
 * New python dispatcher is now included, but requires a hand-written XML file and has had limited testing.
 * Statistics generation is incomplete; no statistics file is produced yet.
 * Indel information is discarded; there is no indel matrix.
 * The "Pattern" and "Pattern#" columns are omitted.

 0.9.1:
--------
 * External genomes are no longer concatenated before import.
 * Perl and python external genome importers now produce identical output.
 * Added an option to give custom arguments to NUCmer during external genome import.
 * Duplicate checking and external genome import are now implemented in python.
 * External genome segments now only align to one place on the reference, even if they match in multiple places.
 * Support for the 'SNAP' aligner has been added, but the SNP callers do not like the VCF files.

 0.9:
------
 * Select portions of the pipeline now in python instead of perl.

 0.8.7:
--------
 * Various portability changes made.
 * Additional statistics now included in stats file.
 * Removed "read_folder" argument, now always assumes current working folder.

 0.8.6:
--------
 * External fasta files in DOS format should no longer try to align calls of "carriage return"; turns out most DNA doesn't contain carriage returns.
 * Cryptic warning messages will no longer be produced when a sample had no reads that aligned to a particular reference contig/chromosome.
 * Removed samtools filtering of unmapped reads and low-quality reads from bam files during alignment.
 * The allvariant matrix has been split into an allsnp and an allindel matrix.
 * Statistics file format changed drastically.
 * Statistics file now breaks values up by both sample and contig/chromosome.
 * Statistics are now broken up into two-dimensional tables instead of one statistic per line.
 * Now gives an immediate error if you give it a blank or nonexistent reference, rather than 1.4 million errors after 17 hours.

 0.8.5:
--------
 * Fix for corrupt files released with broken 0.8.4 release.
 * Support for more platforms, easier installation process.

 0.8.3:
--------
 * Undocumented super-secret changes (I lost the changelog for 0.8.3)...

 0.8.2:
--------
 * Added a column "SampleConsensus", which is a yes/no based on whether all analyses matched (intersection).
 * Calls from VCFs and external genomes that are explicitly "N" are now considered uncalled.
 * The bestsnps_matrix no longer contains columns CallWasMade, PassedDepthFilter, PassedProportionFilter.
 * Modified the depth filter to not use any "incorrect" calculation methods.
 * Some changes to statistics file.
 * Added the requested data to the statistics file.

 0.8.1:
--------
 * Matrix format specification changes.

 0.8:
------
 * Several minor bugfixes.
 * Fixed a problem that would break duplicated region checking in rare conditions*.
 * Fixed a bug that was omitting the last position in the all-matrix in rare conditions*.
 * Made file naming improvements.
 * Pipeline now makes an intersection matrix of SNP calls where every caller agrees.
 * Fixed a bug that inserts an underscore randomly into bam names in rare conditions*.
 * Pipeline now makes a statistics file.
 * Fixed a bug where varscan would accidentally trigger kitten-killing on delete calls.
 * Vastly improved support for indels.
    * always

 0.7:
------
 * Added support for finding duplicated regions.
 * Made heading names produced from SolSNP output more understandable.
 * Fixed VarScan.
 * Creates a configuration log that lists options selected and commands queued.
 * Fixed issue with bam indexes being created in the input folder.

 0.6.1:
--------
 * Fixed a bug where chromosome descriptions were included in the identifier.
 * Fixed a bug where fasta headings were being written to the wrong snpfasta file.
 * Removed a feature where Jason gets 3,000,000 bonus SNPs for his help debugging.
 * Fixed a bug where inversions in external genomes were not reverse complemented.
 * Fixed a bug where unpaired reads would have confusing heading names.

 0.6:
------
 * Added BWA mem support, but it seems like it broke samp/se.
 * Created a script for removing columns from a finished matrix.
 * Created a script for combining two finished matrices.
 * Created a script for removing a list of positions from a finished matrix.
 * Fixed a bug where choosing to only run BWA mem would not look for fastq files.
 * Added support for external genomes, but it is not well-tested.
 * Changed the format of matrix sample names to be more readable.
 * Questions that lead to options that may be broken are now starred.
 * Choosing at least one aligner and SNP caller is no longer required.
 * Changed the call made by the low-coverage filter from "X" to "N".
 * Changed the call representing an uncallable position from "N" to "X".
 * Fixed a bug where "T" and "U" match each other sometimes but not others.
 * Fixed a bug where "A" and "a" would be considered different calls in the pattern column.
 * Degeneracies are now all listed as "N" in the pattern column.
 * Added a matrix column "InDupRegion" that always has the value "unchecked".
 * Pipeline murders kittens if you give it files pre-aligned to disagreeing references.
 * Allcallable matrix now fills fully-uncalled regions with "X", rather than skipping them.
 * Memory requirements for matrix building are significantly reduced.

 0.5:
------
 * Added VarScan support, but it is not well-tested.
 * Fixed bam header information to not be randomly-generated ID's.
 * Fixed a bug that would cause corrupt VCF files to kill the matrix step.
 * Moved bam indexing to the end of the alignment step so it's only done once.
 * Fixed VarScan so it wouldn't name every output "Sample1".
 * Added reference line to SNP fasta output.
 * Now yells at you if multiple samples have the same name, but continues.
 * Doesn't print VCF file name in matrix sample names anymore.
 * Sample names more reliably contain the aligner and SNP caller name.

 0.4:
------
 * The matrix-building step will no longer abort or fail if an aligner or SNP caller fails.
 * Creates .dict file during indexing step, to keep GATK from stepping on itself and dying.
 * Changed the heading previously called "Chromosome" in the matrix file to "Contig".
 * Changed default CPU/RAM/walltime requests to be more optimal for clusters.

 0.3:
------
 * Now outputs matrices in the new matrix format.
 * Choosing advanced options for aligners and SNP callers actually gives you options!
 * What I assume are John's suggested options for GATK are on by default.
 * Does reference indexing in an output subfolder, not in some arbitrary folder.
 * Pattern numbers don't skip to match the allcallable file anymore.
 * Outputs matrices in fasta format as well as matrix format.

 Initial Release:
------------------
 * Can use BWA and Novoalign.
 * Can use GATK and SolSNP.
 * Can include pre-aligned bam files, and pre-called vcf files.
 * Uses allcallable, and does not have problems with incorrect reference calls.
 * Understands and properly handles multiple contigs/chromosomes in one reference.
 * Correctly handles degenerate bases.
 * Handles indels gracefully (but not necessarily correctly)...
 * Automatically pairs your read files, if you have paired reads.