Releases: vastgroup/vast-tools
v2.1.2
NEW
- A new variable (--use_all_excl_eej) in combine allows users to choose an alternative way of quantifying exclusion reads in the splice-site-based module. Together with --extra_eej, it may increase sensitivity, but also the number of false positives.
- A new variable (--extra_eej) allows defining the number of additional junctions, further upstream (for the C1 exons) and further downstream (for the C2 exons), that are considered to quantify exclusion in the annotation-based module, as well as in the splice-site-based module if --use_all_excl_eej is active. Default is 5.
- NOTE: running combine v2.1.2 with default options should provide identical results to v2.1.1.
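One way to picture what --extra_eej widens is to enumerate the candidate exclusion junctions around an exon. The sketch below is an illustration only, not the combine implementation: it assumes exons are numbered in transcript order, that C1 and C2 are the immediate neighbours of the alternative exon, and that every donor/acceptor pair skipping the exon counts as an exclusion junction. The function name and the indexing scheme are hypothetical.

```python
def exclusion_junctions(exon_index, n_exons, extra_eej=5):
    """List (donor_exon, acceptor_exon) pairs that skip the exon at exon_index.

    Assumption (not taken from the vast-tools source code): exons are
    numbered 0..n_exons-1 in transcript order, C1 is exon_index - 1 and
    C2 is exon_index + 1. extra_eej adds that many exons further upstream
    of C1 and further downstream of C2.
    """
    if not 0 < exon_index < n_exons - 1:
        raise ValueError("the exon needs at least one neighbour on each side")
    c1, c2 = exon_index - 1, exon_index + 1
    # Donor exons: C1 plus up to extra_eej exons further upstream.
    donors = range(max(0, c1 - extra_eej), c1 + 1)
    # Acceptor exons: C2 plus up to extra_eej exons further downstream.
    acceptors = range(c2, min(n_exons - 1, c2 + extra_eej) + 1)
    return [(d, a) for d in donors for a in acceptors]


# With no extra junctions only the C1-C2 junction is considered.
print(exclusion_junctions(5, 20, extra_eej=0))  # [(4, 6)]
# The default of 5 widens the window on both sides.
print(len(exclusion_junctions(10, 30)))         # 36
```

Under these assumptions a larger window adds junctions quickly, which is consistent with the note that sensitivity and false positives may both increase.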
Updates and fixes
- Updates in verbose text messages.
v2.1.1
Updates and fixes
- Further improvements in the quantification of ANN events in combine, aimed at reducing false positives in real RNA-seq samples.
Compared with v2.0.2, using a human (-sp Hsa -a hg19) sample (SRR3102173), these changes significantly (|deltaPSI| > 5) affect 79/52621 (0.15%) ANN exons with sufficient read coverage, while 99.22% have a |deltaPSI| < 1. NOTE: The original impact summary in release v2.1.0 was incorrect. Please check the updated notes for comparison.
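An impact summary of this kind can be computed from two sets of PSIs with a few lines of code. The following is a minimal sketch, assuming PSIs for the same exons are available as two equal-length lists; the function name and the toy values are made up for illustration and are not part of vast-tools.

```python
def impact_summary(psi_old, psi_new, changed=5.0, stable=1.0):
    """Return the % of exons with |deltaPSI| > changed and with |deltaPSI| < stable."""
    if len(psi_old) != len(psi_new) or not psi_old:
        raise ValueError("need two non-empty PSI lists of equal length")
    deltas = [abs(new - old) for old, new in zip(psi_old, psi_new)]
    n = len(deltas)
    pct_changed = 100.0 * sum(d > changed for d in deltas) / n
    pct_stable = 100.0 * sum(d < stable for d in deltas) / n
    return pct_changed, pct_stable


# Toy data (hypothetical): one of four exons changes by more than 5 PSI points.
print(impact_summary([100, 50, 10, 80], [100, 58, 10.5, 77]))  # (25.0, 50.0)

# The fraction reported for v2.1.1: 79 of 52621 exons.
print(round(100 * 79 / 52621, 2))  # 0.15
```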
v2.1.0
NEW
- The ANNOT (annotated exons; ANN) module from combine uses a slightly different strategy to define complex skipping reads, which may result in different PSIs for some events.
  In a human (-sp Hsa -a hg19) sample (SRR3102173), it significantly (|deltaPSI| > 5) impacts 86 (0.17%) ANN exons, while 99.16% have a |deltaPSI| < 1 in v2.1.0 with respect to the previous version. This change has been implemented because it performed better with reads simulating transcripts with random skipping of constitutive exons. Therefore, it may improve quantifications, particularly for 'artificial' conditions such as KDs of RNA binding proteins. NOTE: while it has been shown to decrease the false negative rate, it might slightly increase the false positive rate in real biological samples.
- A new module, compare_expr, has been included to identify differentially expressed genes based on fold changes of cRPKMs between samples. It uses a logic similar to the one used by compare to identify differentially alternatively spliced events.
- compare can provide all events (--print_all_ev) and all AS events (10<PSI<90 in at least one compared sample; --print_AS_ev) that pass the coverage criteria used in a given analysis. It can also print different sets of events to facilitate their downstream comparison using Matt (http://matt.crg.eu/) (--print_sets):
  - CS: all events with coverage and constitutively spliced (PSI>95 for AltEx, PSI<5 for IR).
  - CR: all events with coverage and cryptically spliced (PSI<5 for AltEx, PSI>95 for IR).
  - AS_NC: all events with coverage, alternative (10 < av_PSI < 90 in a group) and that do not change between the two conditions (abs(dPSI) < max_dPSI).
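The CS, CR and AS_NC definitions above can be written down as a small classifier. This is a sketch under stated assumptions rather than the logic used by compare: it assumes the event already passed the coverage criteria, that the CS/CR thresholds must hold for the average PSI of both groups, and that max_dPSI is 5 (an arbitrary value chosen for the example). The function and argument names are hypothetical.

```python
def classify_event(event_type, av_psi_a, av_psi_b, max_dpsi=5.0):
    """Assign an event to 'CS', 'CR', 'AS_NC' or None.

    Assumptions (not taken from the compare source code): av_psi_a and
    av_psi_b are the average PSIs of the two groups, CS/CR must hold in
    both groups, and max_dpsi=5 is an illustrative default.
    """
    if event_type not in ("AltEx", "IR"):
        raise ValueError("event_type must be 'AltEx' or 'IR'")
    psis = (av_psi_a, av_psi_b)
    all_high = all(p > 95 for p in psis)
    all_low = all(p < 5 for p in psis)
    if event_type == "AltEx":
        constitutive, cryptic = all_high, all_low
    else:  # for IR, the thresholds are mirrored (PSI<5 is constitutive)
        constitutive, cryptic = all_low, all_high
    if constitutive:
        return "CS"
    if cryptic:
        return "CR"
    # Alternative in at least one group and unchanged between conditions.
    alternative = any(10 < p < 90 for p in psis)
    if alternative and abs(av_psi_a - av_psi_b) < max_dpsi:
        return "AS_NC"
    return None


print(classify_event("AltEx", 98, 99))  # CS
print(classify_event("IR", 98, 99))     # CR
print(classify_event("AltEx", 40, 42))  # AS_NC
print(classify_event("AltEx", 40, 70))  # None
```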
Updates and fixes
- A trim5 option was added to align to skip the first X nucleotides of the forward read. This is handy when there are ambiguous nucleotides that would result in no mapping in the strand determination step as well as in the gene expression quantification.
- Minor corrections and bug fixes.
- Updates in help messages and README.
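To make the effect of the trim5 option concrete, the snippet below removes the first X nucleotides from a single FASTQ record. It is only a sketch of the idea: how align performs the trimming internally is not described in these notes, and the function name and the tuple layout are assumptions.

```python
def trim5(record, n):
    """Drop the first n nucleotides (and their qualities) from one FASTQ record.

    record is a (header, sequence, separator, quality) tuple of strings.
    """
    header, seq, sep, qual = record
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Sequence and quality must be trimmed together to stay aligned.
    return header, seq[n:], sep, qual[n:]


# Hypothetical read starting with three ambiguous nucleotides.
read = ("@read1", "NNNACGTACGT", "+", "###IIIIIIII")
print(trim5(read, 3))  # ('@read1', 'ACGTACGT', '+', 'IIIIIIII')
```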
v2.0.2
Minor bug fixes:
- Bug fix on merge. When using --move_to_PARTS, merged info files were moved to PARTS/ whereas those of the subsamples were left in the to_combine/ folder. This was interpreted by combine as if the merged file was not strand-specific. It only affected merges of strand-specific samples when using the --move_to_PARTS option. Any other conditions were handled fine.
- Bug fix on align with new versions of perl: Experimental pop on scalar is now forbidden.
v2.0.1
This is an important bug fix for align from v2.0.0. Non-strand-specific reads were often detected as strand-specific, usually resulting in the loss of mappability for half of the reads. It is strongly recommended that non-strand-specific reads mapped using vast-tools v2.0.0 are remapped with v2.0.1. This does not affect prior versions of vast-tools (v1).
v2.0.0
NEW
- align becomes strand-aware. Before mapping, reads are automatically tested to infer whether they are strand-specific or not, and in which direction (FR or RF). Mapping is then performed according to this information. It is possible to run any fastq file in the non-strand-aware mode (--ns), which is equivalent to running align from v1.
- combine includes a new module that generates PSIs for all annotated exons (provided they fulfill some mappability and read balance requirements; see README for more information). This means the final INCLUSION table now contains tens of thousands of new exons, often with PSI ~ 100 (i.e. constitutive exons). They can be distinguished by the first digit of the event ID (e.g. HsaEX6000001).
- IMPORTANT NOTE I: these changes require new VASTDB files to be installed. In particular, strand-specific mapping requires different mappability files and the annotation module uses a new template. It is recommended that the entire libraries from v1 are deleted and the new libraries re-installed from scratch. You may download the new libraries for each available species here (you will only need to untar them afterwards and make sure they are inside VASTDB/):
- Human (Hsa): http://vastdb.crg.eu/libs/vastdb.hsa.16.02.18.tar.gz
- Mouse (Mmu): http://vastdb.crg.eu/libs/vastdb.mmu.16.02.18.tar.gz
- Chicken (Gga): http://vastdb.crg.eu/libs/vastdb.gga.16.02.18.tar.gz
- Zebrafish (Dre): http://vastdb.crg.eu/libs/vastdb.dre.16.02.18.tar.gz
- Sea urchin (Spu): http://vastdb.crg.eu/libs/vastdb.spu.16.02.18.tar.gz
- Planarian (Sme): http://vastdb.crg.eu/libs/vastdb.sme.16.02.18.tar.gz
- IMPORTANT NOTE II: v2 and v1 align outputs are still relatively compatible for merge and combine. Intermediate outputs from align in v2 include a *.info file, which contains information about strand awareness. When running merge or combine, vast-tools will first search for all *.info files. For samples with no info file, it will assume they have been mapped in a non-strand-aware manner (e.g. in v1). For combine, each sample is processed according to its own information; therefore, a final INCLUSION table may include both strand-specific and non-strand-specific samples. For merge, if one sample of a group is non-strand-specific or was mapped in the non-strand-aware mode (--ns or in v1), all samples from the group will be treated as non-strand-specific. Users should keep in mind that merging strand-specific and non-strand-specific RNA-seq samples is risky.
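The merge rule described in NOTE II can be summarised in a few lines. This is a sketch of the stated rule only, not of the merge code: the "FR"/"RF"/"NS" labels, the dictionary standing in for the *.info files, and the decision to raise an error for groups mixing FR and RF samples (a case these notes do not cover) are all assumptions made for the example.

```python
def group_strandness(samples, info):
    """Decide how a merge group would be treated under the rule in NOTE II.

    samples: names of the samples in the group.
    info: dict mapping a sample name to "FR", "RF" or "NS"; a sample
          missing from the dict stands for a sample without a *.info file.
    """
    if not samples:
        raise ValueError("a merge group needs at least one sample")
    # A missing info file is read as non-strand-aware mapping (e.g. v1).
    modes = {info.get(name, "NS") for name in samples}
    if "NS" in modes:
        return "NS"  # one non-strand-specific sample downgrades the group
    if len(modes) > 1:
        # Hypothetical choice: the release notes do not say what happens here.
        raise ValueError("mixed FR and RF samples: not covered by the release notes")
    return modes.pop()


print(group_strandness(["a", "b"], {"a": "FR", "b": "FR"}))  # FR
print(group_strandness(["a", "b"], {"a": "FR"}))             # NS
```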
Updates and fixes
- Quantification of multi-microexon events has been modified so that only reads fully covering a microexon are used to support inclusion. For Spu and Dre, exon-microexon or microexon-exon junctions were also removed for simple microexon events. [This update was done to avoid false positive microexon calls that overlap with longer exons.]
- It is possible to obtain the number of counts per exon-exon junction also for the microexon and transcript-based (exskX and MULTI3X) pipelines using the option -ec, --EEJ_counts in align.
- The non-strand-specific mappability file for the MULTI pipeline in Mmu was updated.
v1.3.0
NEW
- Two new species have been added: zebrafish, Danio rerio (assembly danRer10; species key: Dre), and sea urchin, Strongylocentrotus purpuratus (assembly Spur 3.1; species key: Spu). The associated VastDB libraries can be downloaded from http://vastdb.crg.eu/libs/vastdb.dre.10.03.17.tar.gz and http://vastdb.crg.eu/libs/vastdb.spu.10.03.17.tar.gz.
- The coverage scoring for ALTA and ALTD events after combine was changed to better match that of EX and INT events:
  - VLOW: 15 <= X < 25 (previously 10 <= X < 20)
  - LOW: 25 <= X < 40 (previously 20 <= X < 40)
- Resume option in align. If a run stops before all the steps are finalized properly, align can be run using the option --resume and it will identify the last step that finished successfully and resume from there.
- New module tidy implemented to filter and clean INCLUSION tables from combine. The output of tidy is a simpler table with only PSIs (no quality scores) for each event that passes certain filters (including coverage in a minimum number of samples, minimum PSI variation across samples, etc). PSIs for samples that do not reach the minimum coverage threshold are converted into NA. The output of tidy is designed to be loaded directly into R. Finally, summary statistics by sample are provided (% of events without coverage, etc.).
- plot can now make plots for cRPKM values for gene expression (--expr=TRUE).
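The revised ALTA/ALTD coverage bins listed above can be expressed as a small lookup. The sketch models only the two bins quoted in these notes (VLOW and LOW), before and after the change; the other score categories are not listed here, so values outside the two bins return None. The function name is hypothetical.

```python
def alt_ss_coverage_label(x, legacy=False):
    """Return 'VLOW', 'LOW' or None for a coverage value x of an ALTA/ALTD event.

    legacy=True applies the thresholds used before v1.3.0.
    """
    vlow_min, low_min, low_max = (10, 20, 40) if legacy else (15, 25, 40)
    if vlow_min <= x < low_min:
        return "VLOW"
    if low_min <= x < low_max:
        return "LOW"
    return None  # other categories are not described in these notes


print(alt_ss_coverage_label(22))               # VLOW
print(alt_ss_coverage_label(22, legacy=True))  # LOW
print(alt_ss_coverage_label(12))               # None
```

The same value can therefore land in a lower bin than before, which is the intended tightening.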
Updates and fixes
- Updates in align:
  - fastA files can be used for all steps.
  - fastq soft links can be handled as inputs.
  - mapping can be restricted to the intron retention libraries (--onlyIR).
- Updates in combine:
  - "Last donor" is used for recursive exon-exon junction generation to improve complex PSI quantification (minor impact).
  - The minimum number of mappable positions per junction for an AltEx event to be valid in combi increases from 1 to 2 (minor impact).
  - Other minor improvements in combi sub-module quantification for AltEx events.
  - The actual number of reads is shown after the @ in the quality score when the number of reads is 0, 1 or 2 (before, all were rounded down to 0).
- merge was updated to handle nested merges and to have an initial check for inconsistencies.
- Problems with installation of VASTDB folder fixed.
- Updates in README to incorporate links to the VastDB web server and information.
Citation update:
- Main vast-tools paper, including benchmarking:
Tapial, J., Ha, K.C.H., Sterne-Weiler, T., Gohr, A., Braunschweig, U., Hermoso-Pulido, A., Quesnel-Vallières, M., Permanyer, J., Sodaei, R., Marquez, Y., Cozzuto, L., Wang, X., Gómez-Velázquez, M., Rayón, M., Manzanares, M., Ponomarenko, J., Blencowe, B.J., Irimia, M. (2017). An Alternative Splicing Atlas Reveals New Regulatory Programs and Genes Simultaneously Expressing Multiple Major Isoforms in Vertebrates. Genome Res, 27(10):1759-1768
- Zebrafish and sea urchin databases:
Burguera, D., Marquez, Y., Racioppi, C., Permanyer, J., Torres-Mendez, T., Esposito, R., Albuixech, B., Fanlo, L., D'Agostino, Y., Gohr, A., Navas-Perez, E., Riesgo, A., Cuomo, C., Benvenuto, G., Christiaen, L.A., Martí, E., D'Aniello, S., Spagnuolo, A., Ristoratore, F., Arnone, M.I., Garcia-Fernàndez, J., Irimia, M. (2017). Evolutionary recruitment of flexible Esrp-dependent splicing programs into diverse embryonic morphogenetic processes. Nat Commun, In press.
v1.2.0
New
- A new species has been added: planarian, Schmidtea mediterranea (assembly v31). The associated VastDB library can be downloaded from http://vastdb.crg.eu/libs/vastdb.sme.13.11.15.tar.gz. The species key is Sme.
- It is no longer necessary to provide the read length for quantifying gene expression, and reads no longer need to all have the same length.
vast-tools v1.1.0
New
- It is now possible to generate the INCLUSION_LEVELS_FULL table from combine for the newest mouse (mm10) and human (hg38) assemblies. For this, simply provide the assembly version using the -a [hg19|hg38|mm9|mm10] option. The default is still mm9/hg19.
  - Note: combine uses a conversion file for each species that is now included in a new VASTDB version (vastdb.hsa.22.06.16 and vastdb.mmu.22.06.16). If you already have VASTDB installed, the conversion files are also available for download from http://vastdb.crg.eu/libs/PATCH_mm10-hg38.tar.gz. They should be placed in the corresponding VASTDB/Sp/FILES/ folder.
  - Note: vast-tools still operates with mm9 and hg19 VASTDB versions. Only the coordinates are converted in the final output table.
Updates and fixes
- Fix in merge: incorrect behavior to avoid overwriting merged files if already present.
vast-tools v1.0.0-beta.2
This update contains some bug fixes from v1.0.0-beta.1.
New
- A new method to calculate intron retention is available (use --IR_version 2 in align and combine). It is a modification of the original one (as described in Braunschweig et al, 2014), but uses multi exon-exon junction read counts for skipping. It provides a more realistic estimate of the Percent Intron Retention (PIR) at the gene level. The original method can still be used with --IR_version 1.
  - New files are needed in the VASTDB/Species/FILES/ folder for the new IR option as well as to obtain gene IDs for GO analysis. Additional files for human, mouse and chicken can be downloaded here: http://vastdb.crg.eu/libs/PATCH_IRv2.tar.gz. To install the patch:
      tar -xzvf PATCH_IRv2.tar.gz
      rsync -av PATCH_IRv2/ /path/to/VASTDB/
  - Alternatively, the three full VASTDB libraries can be re-downloaded from:
    - Human (hg19) - 6.2G http://vastdb.crg.eu/libs/vastdb.hsa.13.11.15.tar.gz
    - Mouse (mm9) - 5.6G http://vastdb.crg.eu/libs/vastdb.mmu.13.11.15.tar.gz
    - Chicken (galGal3) - 1.4G http://vastdb.crg.eu/libs/vastdb.gga.13.11.15.tar.gz
- merge: a new module to merge align outputs from multiple subsamples into new sample files.
- compare: a new module to identify differentially spliced (DS) AS events based on average inclusion level differences. It also provides gene lists for GO analysis and directly plots DS AS events. This module is independent of diff and is not a replacement for it.
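The idea behind compare, calling events from the difference between average inclusion levels, can be illustrated as follows. This is a minimal sketch, not the module's implementation: it uses only the group averages mentioned above, the 15-point threshold is an arbitrary value for the example, and the function names are hypothetical. Any further filters the module applies are not modelled.

```python
from statistics import mean


def delta_psi(psis_a, psis_b):
    """Difference between the average PSI of group B and group A."""
    if not psis_a or not psis_b:
        raise ValueError("both groups need at least one PSI value")
    return mean(psis_b) - mean(psis_a)


def is_differentially_spliced(psis_a, psis_b, min_dpsi=15.0):
    """True if the absolute average difference reaches min_dpsi (illustrative threshold)."""
    return abs(delta_psi(psis_a, psis_b)) >= min_dpsi


# Hypothetical PSIs for three replicates per condition.
control = [10.0, 12.0, 14.0]
treated = [40.0, 44.0, 42.0]
print(delta_psi(control, treated))                  # 30.0
print(is_differentially_spliced(control, treated))  # True
```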
Updates and fixes
- Fix in the calculation of gene expression (cRPKMs) to properly account for read length. A new option, --readLen, or a specific sample name format (Sample-readLen.fq.gz) is enforced in align.
- Definition of species (--sp) is enforced in combine.
- Fix in combine. Some Alt3 and Alt5 AS events were not being output if their coordinates matched those of a cassette exon.
- Fix in ylim setting in plot.
- Update of install.R for the new VASTDB libraries.
- Update of documentation.
- Misc. updates.
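The Sample-readLen.fq.gz naming convention mentioned above lends itself to a simple check before running align. The parser below is a sketch based only on that one pattern; how align itself reads the name, and whether other suffixes or paired-end names are accepted, is not stated in these notes, so the regular expression and the function name are assumptions.

```python
import re

# Matches names of the form Sample-readLen.fq.gz, e.g. Sample-50.fq.gz.
_READLEN_NAME = re.compile(r"^(?P<sample>.+)-(?P<read_len>\d+)\.fq\.gz$")


def parse_read_length(filename):
    """Return (sample, read_length) if the name follows the pattern, else None."""
    match = _READLEN_NAME.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("read_len"))


print(parse_read_length("Sample-50.fq.gz"))  # ('Sample', 50)
print(parse_read_length("Sample.fq.gz"))     # None
```

A name that does not match would be a case for passing --readLen explicitly.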