The step DESEQ2_QC_STAR_SALMON fails in the script deseq2_qc.r when sample names have many underscore-separated components and the treatment group info is located in a component higher than 5. The error occurs on this line.
The root cause is how the columns of the data.frame coldata are named. For example, let's say I have six samples named as follows:
The script automatically extracts two groups from these names: Treated/Control (6th component) and replicate number (7th component). It adds two columns, Group6 and Group7, to the data.frame coldata. This happens on these lines here.
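The naming behaviour described above can be reproduced with a minimal sketch. The sample names are hypothetical placeholders, and the variable names (samples.vec, name_components, groupings, n_distinct) are assumptions about how the script is organised, not a copy of its code.

```r
# Hypothetical sample names: seven underscore-separated components,
# where only the 6th (condition) and 7th (replicate) carry group info.
samples.vec <- c(
    "A_B_C_D_E_Treated_1", "A_B_C_D_E_Treated_2", "A_B_C_D_E_Treated_3",
    "A_B_C_D_E_Control_1", "A_B_C_D_E_Control_2", "A_B_C_D_E_Control_3"
)

# Split each name into components and put one component per column.
name_components <- strsplit(samples.vec, "_")
n_components <- length(name_components[[1]])
groupings <- as.data.frame(do.call(rbind, name_components))

# Columns are named BEFORE uninformative ones are dropped,
# so each surviving column keeps its original position number.
names(groupings) <- paste0("Group", seq_len(n_components))

# Drop components that are constant or unique per sample.
n_distinct <- sapply(groupings, function(grp) length(unique(grp)))
groupings <- groupings[n_distinct != 1 & n_distinct != length(samples.vec)]

print(names(groupings))
```

With these placeholder names, only the 6th and 7th components survive the filter, so the remaining columns are called Group6 and Group7.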
Further down in the script, the script selects components <= 5. Since components 1-5 do not contain group information (only components 6 and 7 do), the data.frame long_pc_grp is empty, resulting in an error.
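The exact error message follows from how R handles zero-row data frames, which a small sketch can show. The column names component and grouper, and the toy values, are assumptions made for illustration; only the name long_pc_grp and the threshold of 5 come from the description above.

```r
# Toy stand-in for long_pc_grp: the group index column only holds 6 and 7.
long_pc_grp <- data.frame(
    component = c(1L, 2L, 1L, 2L),
    grouper   = c(6L, 6L, 7L, 7L)
)

# Keeping only indices <= 5 removes every row.
long_pc_grp <- subset(long_pc_grp, grouper <= 5)
print(nrow(long_pc_grp))

# paste() treats a zero-length argument as "", so the result has
# length 1 rather than length 0.
label <- paste("PC", long_pc_grp$component)
print(label)

# Assigning a length-1 value to a column of a zero-row data.frame fails.
res <- tryCatch({
    long_pc_grp$component <- label
    "no error"
}, error = function(e) conditionMessage(e))
print(res)
```

This is why the reported error shows value = "PC ": the label vector is built from an empty column, and the assignment then has one row to place into zero.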
The fix is straightforward: the columns need to be renumbered starting with 1. This can be done by moving the line that defines the column names a few lines further down, after the irrelevant columns have been removed. I will submit a PR shortly, making the required code change clear.
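A rough sketch of the proposed reordering, again using hypothetical sample names and assumed variable names rather than the script's actual code:

```r
# Hypothetical sample names, as in the scenario described above.
samples.vec <- c(
    "A_B_C_D_E_Treated_1", "A_B_C_D_E_Treated_2", "A_B_C_D_E_Treated_3",
    "A_B_C_D_E_Control_1", "A_B_C_D_E_Control_2", "A_B_C_D_E_Control_3"
)
name_components <- strsplit(samples.vec, "_")
groupings <- as.data.frame(do.call(rbind, name_components))

# First drop components that are constant or unique per sample ...
n_distinct <- sapply(groupings, function(grp) length(unique(grp)))
groupings <- groupings[n_distinct != 1 & n_distinct != length(samples.vec)]

# ... and only then name the survivors, so numbering starts at 1.
names(groupings) <- paste0("Group", seq_len(ncol(groupings)))

print(names(groupings))

# The group indices now pass a "<= 5" filter.
idx <- as.integer(sub("Group", "", names(groupings)))
print(all(idx <= 5))
```

Because the names are assigned after filtering, the two informative components become Group1 and Group2 regardless of where they sit in the sample name.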
Command used and terminal output
./deseq2_qc.r --count_file salmon.merged.gene_counts_length_scaled.tsv --outdir ./ --cores 6 --id_col 1 --outprefix deseq2 --count_col 3 --vst TRUE
Error in `$<-.data.frame`(`*tmp*`, "component", value = "PC ") :
  replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Relevant files
Here is a small test file to produce the error using the command above:
salmon.merged.gene_counts_length_scaled.tsv.zip
System information
No response