All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Missing UMAP plots.
- Output filenames to include sample alias.
- Output filename formatting standardised.
- In the report 'reads' now refers to number of reads not subreads.
- `kit` and `expected_cells` (visium excepted) are now required, either as individual parameters or defined per sample via the `single_cell_sample_sheet`.
- Reconcile wf with template v5.3.3.
- Minimum read quality filter.
- 10x 5prime:v3 support.
- Barcode statistics output file.
- Output schema with correct expression matrix paths.
- Spatial plotting of visium data in workflow report for genes specified by `--genes_of_interest`.
- The genes to be used for annotating UMAP plots are now specified by `--genes_of_interest`.
- Updated Ezcharts to v0.11.2.
- Alignment summary section to report.
- Support for 10x 3prime v4 (GEM-X) (`--kit 3prime:v4`).
- Options `--kit_name` and `--kit_version` replaced with single option `--kit` (e.g. `--kit 3prime:v3`).
- Error handling when empty expression matrix is created.
- Support for Visium v1 kit.
- Error when a tags file is empty.
- More informative error message when all cells or features are filtered out.
- Mito gene counts all being zero.
- Skip publishing of gibberish mito-transcript count file.
- Note to README concerning singularity temporary directory.
- Ability to use BAM files as input.
- Use exact kmer matching during barcode correction for further 5x performance improvement. Very minor (<0.02%) difference compared to previous method.
- Reported cell count off by -1 in report summary table.
- Issue with TSV concat/splitting during `combine_bam_and_tags` stage.
- Issue introduced in v1.1.0 that caused a partial BAM file to be output.
- Corrected example command in README.
- Incorrect reporting of unique gene and transcripts in report table.
- Processed expression matrix entries incorrectly filtered.
- Gene identity of multimapping reads could be incorrectly assigned.
- Read chunking done in library code.
- `--process_chunk_size` parameter changed to `--fastq_chunk`.
- Resource declarations in Nextflow processes.
- Simplified read batching and decoupled from CPU usage parameters.
- Expression matrix construction code reworked to reduce memory usage.
- Adapter search step now 3x faster.
- Barcode assignment 3x faster.
- Feature assignment now 15x faster.
- UMI clustering 20x faster.
- UMAP creation memory use reduced 6-fold and up to 30x faster (and always enabled).
- Final read tagging step is 3x faster.
- Combined various preprocessing steps into a single process to avoid unnecessary file writes.
- Updated stringtie2 to v2.2.2.
- Pre-calculate report summary data to reduce disk-space and IO overheads.
- Single BAM per-sample is now always produced (option `--merge_bam` is removed).
- Several workflow parameters as part of resource management simplification.
- `--plot_umaps` option, as UMAP generation has been made much more efficient and is always enabled.
- `--merge_bam` option.
- `full_length_only` parameter to process only full length reads (default: true).
- Trim adapters, barcodes and UMIs from reads before alignment.
- Memory directive for umap process to prevent parallel processes from using too much memory.
- Orient 3prime/multiome reads to mRNA sense to avoid need to flip later.
- Default `umap_n_repeats` lowered to 3.
- Genome reference alignment done by chunk.
- Issue where splice junctions were searched for on incorrect strand.
- Publish stringtie transcriptome fasta and GFF files to output dir.
- More informative error message upon read duplicate detection.
- Remove duplicate fastcat call.
- Error interpreting CSV data types during BAM tagging.
- `<img>` tags in the docs.
- Docs to the new format.
- `single_cell_sample_sheet` samples with same kit name and version not compatible.
- `exp_cells` to `expected_cells` in `single_cell_sample_sheet` to be consistent with CLI option.
- Make `prepare_report_data` process more memory-efficient
- Increase the maximum memory available to the adapter_scan process
- Fix sequence truncation by 1 bp in adapter_scan step
- Make `summarize_adapter_table` process more memory-efficient
- Mitochondrial expression file not being copied to output directory
- Incorrect setting of polars maximum threads
- Allow `geneName` attribute in GTF annotation file
- Alignments generated from 5' 10x kit are now in the correct orientation.
- Memory directives to some processes to better manage system resources
- Bumped minimum required Nextflow version to 22.10.8
- GitHub issue templates
- Add chunking of input data to some processes to reduce memory usage
- Output BAM files with alignments from incorrect chromosomes
- Incorrect uncorrected_barcodes.tsv output
- Configuration for running demo data in AWS
- Barcode assignment error when chromosome has no data
- Include reads in gene expression matrices (but not transcript matrices) that map to intron-only regions
- Incorrect UMAP colors
- Barcode quality extract error
- Saturation plotting error
- Gene ID assigned instead of gene name
- Empty dataframe bug when no data for a chromosome exists
- Improved isoform selection criteria
- Add multiprocessing to calc_saturation.py for ~ 2X speedup
- Use rapidfuzz for finding barcode matches in whitelist; ~10x speedup
- Use multithreading to speed up sequencing saturation calculation
- Put UMAPs in report and make optional
- Combine barcode and umi extraction into single step.
- Get gene assignments from stringtie.
- workflow-glue to allow scripts to be run as a module.
- pytest testing using workflow container.
- Incorrectly stranded reads causing stringtie2 to generate incorrect transcripts.
- Incorrect UMIs reported and not collapsing into unique UMI counts.
- Sample_sheet format in schema to expect a file
- Updated description in manifest.
- A workflow report using ezcharts.
- Updates for the new Labs Launcher.
- Replace Salmon with minimap2 for assigning reads to transcripts.
- Reduced matrix to the top N principal components prior to umap generation.
- Fix transcript matrices not in output folder.
- output of merged bam optional.
- Repeat umap creation with different random states.
- Transcript counting with Salmon on stringtie-generated transcriptome.
- Several performance-related refactorings including reductions in read write operations.
- single_cell_sample_sheet is optional and kit options can be supplied as workflow parameters.
- Better handling of sample_id conflicts in single_cell_sample_sheet and fastqingress.
- single_cell_sample_sheet is optional.
- Minor IO performance enhancements.
- kit options can be supplied from command line/config and applied to all samples.
- Transcript x cell matrix output.
- Combined gathering and splitting of fastqs into a single process.
- Use split2 for splitting fastqs.
- Remove unused kneeplot flags.
- check for identical sample_ids in single cell sample sheet and fastq data.
- First release. Port of Sockeye to Nextflow
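
For readers migrating from `--kit_name` and `--kit_version`, the combined `--kit` value noted above takes the form `name:version` (for example `3prime:v3`). The sketch below is illustrative only: `parse_kit` is a hypothetical helper, not part of the workflow, and it only shows how such a value splits into its two parts.

```python
def parse_kit(kit: str) -> tuple[str, str]:
    """Split a 'name:version' kit string into (name, version).

    Hypothetical helper for illustration; not the workflow's own code.
    """
    # partition() splits on the first ':' and returns empty strings
    # for the separator and tail when no ':' is present.
    name, sep, version = kit.partition(":")
    if not sep or not name or not version:
        raise ValueError(f"expected 'name:version', got {kit!r}")
    return name, version


if __name__ == "__main__":
    print(parse_kit("3prime:v3"))
```

A value without a colon, such as `3prime`, raises `ValueError` in this sketch rather than guessing a version.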