You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Pipeline failing with input and matrix sample names including "-" character on CUSTOM_MATRIXFILTER step, error gives all names of samples with "-" ID. When I check the work folder for the process, the names in each of the files have had "-" changed to ".", but I think this might be being compared to the original names and failing?
If I change names with sed in input and matrix file to remove "-", the pipeline continues without error.
Command used and terminal output
nextflow run nf-core/differentialabundance \ -profile singularity \ -r dev \ --input diffabundance/inputs/samplesheet.csv \ --contrasts diffabundance/inputs/serCan_contrasts.csv \ --matrix diffabundance/inputs/data/rsem.merged.transcript_counts.tsv \ --gtf reference/serCan2020/serCan2020_no_genes.gtf \ -w diffabundance/work \ --features_id_col 'transcript_id' \ --outdir diffabundance/results \ --max_memory '160.GB' \ --deseq2_cores 8 \ -resume
# output:executor > local (2)[32/9bd716] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1, cached: 1 ✔[52/70c121] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1 ✔[ed/3225f0] process > NFCORE_DIFFERENTIALABUNDANC... [100%] 1 of 1, failed: 1 ✘[- ] process > NFCORE_DIFFERENTIALABUNDANC... -[- ] process > NFCORE_DIFFERENTIALABUNDANC... -[- ] process > NFCORE_DIFFERENTIALABUNDANC... -[- ] process > NFCORE_DIFFERENTIALABUNDANC... -[- ] process > NFCORE_DIFFERENTIALABUNDANC... -Execution cancelled -- Finishing pending tasks before exit-[nf-core/differentialabundance] Pipeline completed with errors-Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER ([id:study])'Caused by: Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER ([id:study])` terminated with an error exit status (1)Command executed [/.nextflow/assets/nf-core/differentialabundance/./workflows/../modules/nf-core/custom/matrixfilter/templates/matrixfilter.R]: #!/usr/bin/env Rscript # Filter rows based on the number of columns passing the abundance threshold. By # default this will be any row with a value of 1 or more, which would be a # permissive threshold for RNA-seq data. # # In RNA-seq studies it's often not enough to just remove genes not expressed in # any sample. We also want to remove anything likely to be part of noise, or # which has sufficiently low expression that differential analysis would not be # useful. For that reason we might require a higher threshold than 1, and # require that more than one sample passes. # # Often we want to know that a gene is expressed in a substantial enough number # of sample that differential analysis worthwhile, so we may pick a threshold # sample number related to group size. Note that we do not filter with an # awareness of the groups themselves, since this adds bias towards discovery # between those groups. ################################################ ################################################ ## Functions ## ################################################ ################################################ #' Parse out options from a string without recourse to optparse #' #' @param x Long-form argument list like --opt1 val1 --opt2 val2 #' @param opt_defaults A lis with default argument values #' #' @return named list of options and values similar to optparse parse_args <- function(x, opt_defaults){ args_list <- unlist(strsplit(x, ' ?--')[[1]])[-1] args_vals <- unlist(lapply(args_list, function(y) strsplit(y, ' +')), recursive = FALSE) # Ensure the option vectors are length 2 (key/ value) to catch empty ones args_vals <- lapply(args_vals, function(z){ length(z) <- 2; z}) parsed_args <- structure(lapply(args_vals, function(x) x[2]), names = lapply(args_vals, function(x) x[1])) parsed_args[! is.na(parsed_args)] # Now apply CLI options to override defaults opt_types <- lapply(opt_defaults, class) for ( ao in names(parsed_args)){ if (! ao %in% names(opt_defaults)){ stop(paste("Invalid option:", ao)) }else{ # Preserve classes from defaults where possible if (! is.null(opt_defaults[[ao]])){ parsed_args[[ao]] <- as(parsed_args[[ao]], opt_types[[ao]]) } opt_defaults[[ao]] <- parsed_args[[ao]] } } opt_defaults } #' Flexibly read CSV or TSV files #' #' @param file Input file #' @param header Passed to read.delim() #' @param row.names Passed to read.delim() #' @param nrows Passed to read.delim() #' #' @return output Data frame read_delim_flexible <- function(file, header = TRUE, row.names = NULL, nrows = -1 ){ ext <- tolower(tail(strsplit(basename(file), split = "\\.")[[1]], 1)) if (ext == "tsv" || ext == "txt") { separator <- "\t" } else if (ext == "csv") { separator <- "," } else { stop(paste("Unknown separator for", ext)) } read.delim( file, sep = separator, header = header, row.names = row.names ) } # Set up default options opt <- list( abundance_matrix_file = 'rsem.merged.transcript_counts.assay.tsv', sample_file = 'samplesheet.sample_metadata.tsv', minimum_abundance = 1, minimum_samples = 1, minimum_proportion = 0, grouping_variable = NULL ) opt <- parse_args('--minimum_samples 1 --minimum_abundance 1', opt) abundance_matrix <- read_delim_flexible(opt$abundance_matrix_file, row.names = 1) feature_id_name <- colnames( read_delim_flexible(opt$abundance_matrix_file, nrows = 1)[1]) # If a sample sheet was specified, validate the matrix against it if (opt$sample_file != ''){ # Read the sample sheet and check against matrix samplesheet <- read_delim_flexible(opt$sample_file, row.names = 1) missing_samples <- setdiff(rownames(samplesheet), colnames(abundance_matrix)) if (length(missing_samples) > 0){ stop( paste( paste(missing_samples, collapse = ', '), 'not represented in supplied abundance matrix' ) ) }else{ abundance_matrix <- abundance_matrix[,rownames(samplesheet)] } }else{ # If we're not using a sample sheet to select columns, then at least make # sure the ones we have are numeric (some upstream things like the RNA-seq # workflow have annotation colummns as well) numeric_columns <- unlist(lapply(1:ncol(abundance_matrix), function(x) is.numeric(abundance_matrix[,x]))) abundance_matrix <- abundance_matrix[,numeric_columns] } # If we want to define n based on the levels of a grouping variable... if ((opt$sample_file != '') && ( ! is.null(opt$grouping_variable))){ # Pick a minimum number of samples to pass threshold based on group size if (! opt$grouping_variable %in% colnames(samplesheet)){ stop(paste(opt$grouping_variable, "not in supplied sample sheet")) }else{ opt$minimum_samples <- min(table(samplesheet[[opt$grouping_variable]])) if ( opt$minimum_proportion > 0){ opt$minimum_samples <- opt$minimum_samples * opt$minimum_proportion } } }else if (opt$minimum_proportion > 0){ # Or if we want to define it based on a static proportion of the sample number opt$minimum_samples <- ncol(abundance_matrix) * opt$minimum_proportion } # Generate a boolean vector specifying the features to retain keep <- apply(abundance_matrix, 1, function(x){ sum(x > opt$minimum_abundance) >= opt$minimum_samples }) # Write out the matrix retaining the specified rows and re-prepending the # column with the feature identifiers prefix = ifelse('study' == 'null', '', 'study') write.table( data.frame(rownames(abundance_matrix)[keep], abundance_matrix[keep,,drop = FALSE]), file = paste0( prefix, '.filtered.tsv' ), col.names = c(feature_id_name, colnames(abundance_matrix)), row.names = FALSE, sep = ' ', quote = FALSE ) ################################################ ################################################ ## R SESSION INFO ## ################################################ ################################################ sink("R_sessionInfo.log") print(sessionInfo()) sink() ################################################ ################################################ ## VERSIONS FILE ## ################################################ ################################################ r.version <- strsplit(version[['version.string']], ' ')[[1]][3] writeLines( c( '"NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:CUSTOM_MATRIXFILTER":', paste(' r-base:', r.version) ), 'versions.yml') ################################################ ################################################ ################################################ ################################################Command exit status: 1Command output: (empty)Command error: Error: ES1_48_Control-lipid_0, ES2_48_Control-lipid_14, ES3_48_Control-lipid_21, ES4_164_Control-protein_0, ES5_164_Control-protein_14, ES6_164_Control-protein_21, ES7_SEM19-1354_Control-protein_14, ES8_SEM19-1354_Control-protein_21, ES9_172_Control-protein_0, ES10_172_Control-protein_14, ES11_172_Control-protein_21, ES12_209_MG-lipid_0, ES13_209_MG-lipid_14, ES14_209_MG-lipid_21, ES15_601_Control-protein_14, ES16_601_Control-protein_21, ES17_615_Control-protein_0, ES18_615_Control-protein_14, ES19_615_Control-protein_21, ES20_618_Control-lipid_0, ES21_618_Control-lipid_14, ES22_618_Control-lipid_21, ES23_Blue-010_Control-lipid_0, ES24_Blue-010_Control-lipid_21, ES25_619_MG-lipid_0, ES26_619_MG-lipid_14, ES27_619_MG-lipid_21, ES103_625_Control-lipid_0, ES104_625_Control-lipid_14, ES105_625_Control-lipid_21, ES31_Blue-055_MG-protein_0, ES32_Blue-055_MG-protein_14, ES33_101-LS_MG-lipid_0, ES34_101-LS_MG-lipid_14, ES35_101-LS_MG-lipid_21, ES36_AR17-21_MG-lipid_0, ES37_AR17-21_MG-lipid_14, Execution haltedWork dir: diffabundance/work/ed/3225f0361d750970c0c0439ac3407cTip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Description of the bug
Pipeline failing with input and matrix sample names including "-" character on CUSTOM_MATRIXFILTER step, error gives all names of samples with "-" ID. When I check the work folder for the process, the names in each of the files have had "-" changed to ".", but I think this might be being compared to the original names and failing?
If I change names with sed in input and matrix file to remove "-", the pipeline continues without error.
Command used and terminal output
Relevant files
command_files.tgz
System information
The text was updated successfully, but these errors were encountered: