deseq2_qc.r fails when sample names have many components #988

Closed
suhrig opened this issue Mar 29, 2023 · 1 comment

suhrig commented Mar 29, 2023

Description of the bug

The step DESEQ2_QC_STAR_SALMON fails in the script deseq2_qc.r when sample names have many components (i.e., parts separated by underscores _) and the treatment group information is located in a component beyond the fifth. The error occurs on this line.

The root cause is how the columns of the data.frame coldata are named. For example, let's say I have six samples named as follows:

SRR0000001_Mus_musculus_Brain_Tissue_Control_Rep1
SRR0000002_Mus_musculus_Brain_Tissue_Control_Rep2
SRR0000003_Mus_musculus_Brain_Tissue_Control_Rep3
SRR0000004_Mus_musculus_Brain_Tissue_Treated_Rep1
SRR0000005_Mus_musculus_Brain_Tissue_Treated_Rep2
SRR0000006_Mus_musculus_Brain_Tissue_Treated_Rep3

The script automatically extracts two groups from these names: Treated/Control (6th component) and replicate number (7th component). It adds two columns, Group6 and Group7, to the data.frame coldata. This happens on these lines here.

Further down, the script selects only components <= 5. Since components 1-5 do not contain group information (only components 6 and 7 do), the data.frame long_pc_grp is empty, resulting in an error.
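The behaviour described above can be reproduced outside of R. The following is a minimal Python sketch, not the code from deseq2_qc.r: the function name `informative_groups`, the constant `MAX_GROUP`, and the exact rule for dropping uninformative components (constant across samples, or unique per sample) are assumptions made for illustration.

```python
samples = [
    "SRR0000001_Mus_musculus_Brain_Tissue_Control_Rep1",
    "SRR0000002_Mus_musculus_Brain_Tissue_Control_Rep2",
    "SRR0000003_Mus_musculus_Brain_Tissue_Control_Rep3",
    "SRR0000004_Mus_musculus_Brain_Tissue_Treated_Rep1",
    "SRR0000005_Mus_musculus_Brain_Tissue_Treated_Rep2",
    "SRR0000006_Mus_musculus_Brain_Tissue_Treated_Rep3",
]

def informative_groups(sample_names):
    """Split names on '_' and keep components that carry group information.

    Assumed rule: a component is dropped if it is identical in all samples
    or different in every sample. Columns are named by ORIGINAL position,
    which is the numbering that leads to the problem.
    """
    parts = [name.split("_") for name in sample_names]
    groups = {}
    for i in range(len(parts[0])):
        values = [p[i] for p in parts]
        n_distinct = len(set(values))
        if 1 < n_distinct < len(sample_names):
            groups[f"Group{i + 1}"] = values
    return groups

groups = informative_groups(samples)
print(list(groups))  # ['Group6', 'Group7']

# Hypothetical stand-in for the later "components <= 5" selection.
MAX_GROUP = 5
selected = {
    name: values
    for name, values in groups.items()
    if int(name[len("Group"):]) <= MAX_GROUP
}
print(selected)  # {} : nothing left to plot
```

With nothing selected, any later step that assumes at least one row has no data to work on, which is consistent with the error shown in the terminal output.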

The fix is straightforward: the columns need to be renumbered starting with 1. This can be accomplished simply by moving the line that defines the column names a few lines further down, after the irrelevant columns have been removed. I will submit a PR shortly to make the required code change clear.
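As a rough illustration of the renumbering idea, here is a minimal Python sketch in which the names are assigned only after the uninformative components have been dropped. It is not the change made to deseq2_qc.r: the function name `informative_groups_renumbered`, the constant `MAX_GROUP`, and the filtering rule (drop components that are constant or unique per sample) are assumptions for illustration.

```python
samples = [
    "SRR0000001_Mus_musculus_Brain_Tissue_Control_Rep1",
    "SRR0000002_Mus_musculus_Brain_Tissue_Control_Rep2",
    "SRR0000003_Mus_musculus_Brain_Tissue_Control_Rep3",
    "SRR0000004_Mus_musculus_Brain_Tissue_Treated_Rep1",
    "SRR0000005_Mus_musculus_Brain_Tissue_Treated_Rep2",
    "SRR0000006_Mus_musculus_Brain_Tissue_Treated_Rep3",
]

def informative_groups_renumbered(sample_names):
    """Filter components first, then number the survivors from 1."""
    parts = [name.split("_") for name in sample_names]
    # Transpose: one list per component position (assumes equal lengths).
    columns = [list(col) for col in zip(*parts)]
    kept = [
        col for col in columns
        if 1 < len(set(col)) < len(sample_names)
    ]
    # Naming happens AFTER filtering, so numbering is contiguous from 1.
    return {f"Group{i}": col for i, col in enumerate(kept, start=1)}

groups = informative_groups_renumbered(samples)
print(list(groups))  # ['Group1', 'Group2']

# Hypothetical stand-in for the later "components <= 5" selection.
MAX_GROUP = 5
selected = {
    name: values
    for name, values in groups.items()
    if int(name[len("Group"):]) <= MAX_GROUP
}
print(len(selected))  # 2
```

One consequence of this approach is that the number in a column name no longer matches the component's position in the sample name: here Group1 holds what was the 6th component.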

Command used and terminal output

./deseq2_qc.r --count_file salmon.merged.gene_counts_length_scaled.tsv --outdir ./ --cores 6 --id_col 1 --outprefix deseq2 --count_col 3 --vst TRUE

Error in `$<-.data.frame`(`*tmp*`, "component", value = "PC ") : 
    replacement has 1 row, data has 0
  Calls: $<- -> $<-.data.frame

Relevant files

Here is a small test file to produce the error using the command above:
salmon.merged.gene_counts_length_scaled.tsv.zip

System information

No response

suhrig added the bug label Mar 29, 2023
suhrig mentioned this issue Mar 30, 2023
drpatelh added this to the 3.11 milestone Mar 30, 2023
drpatelh (Member) commented

Fixed in #990
