zUMI parallel::mclapply AND 'strsplit': object 'GE' not found errors #408

vishaa5 · 2025-02-11T12:17:44Z

Hi! I am trying to use zUMI on the example data provided (it was taking too long to process my data) and I keep encountering these issues:

Filtering...
Tue Feb 11 01:05:38 PM CET 2025
[1] "43 barcodes detected."
[1] "193783 reads were assigned to barcodes that do not correspond to intact cells."
Warning message:
Using size aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use linewidth instead.
Mapping...
[1] "2025-02-11 13:05:52 CET"
/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_install/STAR-2.7.11b/source/STAR --readFilesCommand /home/icb/bhavishya.nelakuditi/local/bin/samtools view -@ 1 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_index_49 --sjdbGTFfile /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/Homo_sapiens.GRCh38.109.gtf --runThreadN 1 --sjdbOverhang 49 --readFilesType SAM SE --readFilesIn /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/zUMIs_output/.tmpMerge//Example.Exampleaa.filtered.tagged.bam --outFileNamePrefix /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/Example.filtered.tagged.
STAR version: 2.7.11b compiled: Mon Feb 10 12:36:35 PM CET 2025 cpusrv93.scidom.de
Feb 11 13:05:54 ..... started STAR run
Feb 11 13:05:54 ..... loading genome
Feb 11 13:06:19 ..... processing annotations GTF
Feb 11 13:06:34 ..... inserting junctions into the genome indices
Feb 11 13:07:50 ..... started mapping
Feb 11 13:08:18 ..... finished mapping
Feb 11 13:08:18 ..... finished successfully
Tue Feb 11 01:08:18 PM CET 2025
Counting...
[1] "2025-02-11 13:08:44 CET"
[1] "9e+07 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/Example.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\
|| ||
|| Input files : 1 BAM file ||
|| S Example.filtered.tagged.Aligned.out.bam ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid2340542 ... ||
|| Features : 358361 ||
|| Meta-features : 62710 ||
|| Chromosomes/contigs : 47 ||
|| ||
|| Process BAM file Example.filtered.tagged.Aligned.out.bam... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 853296 ||
|| Successfully assigned alignments : 356617 (41.8%) ||
|| Running time : 0.07 minutes ||
|| ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\
|| ||
|| Input files : 1 BAM file ||
|| S Example.filtered.tagged.Aligned.out.bam.ex ... ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid2340542 ... ||
|| Features : 238917 ||
|| Meta-features : 28537 ||
|| Chromosomes/contigs : 33 ||
|| ||
|| Process BAM file Example.filtered.tagged.Aligned.out.bam.ex.featureCou ... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 853296 ||
|| Successfully assigned alignments : 205712 (24.1%) ||
|| Running time : 0.07 minutes ||
|| ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

[1] "2025-02-11 13:14:36 CET"
[1] "Coordinate sorting final bam file..."
[1] "2025-02-11 13:14:41 CET"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 43 barcodes in this chunk..."
Error in FUN(X[[i]], ...) : unused argument (mc.cores = 1)
Calls: reads2genes_new -> lapply -> lapply
Execution halted
Tue Feb 11 01:14:43 PM CET 2025
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/zUMIs_output/expression/Example.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Feb 11 01:15:02 PM CET 2025
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2025-02-11 13:15:02 CET"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/zUMIs_output/expression/Example.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Feb 11 01:16:21 PM CET 2025

Here is the yaml file:
project: Example
sequence_files:
file1:
name: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/barcoderead_HEK.1mio.fq.gz
base_definition:
- BC(1-6)
- UMI(7-16)
file2:
name: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/cDNAread_HEK.1mio.fq.gz
base_definition:
- cDNA(1-50)
reference:
STAR_index: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_index_49
GTF_file: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/Homo_sapiens.GRCh38.109.gtf
additional_STAR_params: ''
additional_files: ~
out_dir: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test
num_threads: 1
mem_limit: 20
filter_cutoffs:
BC_filter:
num_bases: 1
phred: 20
UMI_filter:
num_bases: 1
phred: 20
barcodes:
barcode_num: ~
barcode_file: ~
automatic: yes
BarcodeBinning: 0
nReadsperCell: 100
counting_opts:
introns: yes
downsampling: '0'
strand: 0
Ham_Dist: 0
velocyto: no
primaryHit: yes
twoPass: no
make_stats: yes
which_Stage: Filtering
samtools_exec: /home/icb/bhavishya.nelakuditi/local/bin/samtools
pigz_exec: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/pigz-2.4/pigz
STAR_exec: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_install/STAR-2.7.11b/source/STAR
Rscript_exec: /home/icb/bhavishya.nelakuditi/miniconda3/bin/Rscript
zUMIs_directory: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zUMIs
read_layout: SE

I'm using Linux in a HPC cluster.

I keep getting the same errors when i implement it on my data too...

The text was updated successfully, but these errors were encountered:

mjoppich · 2025-02-11T20:58:01Z

I think this is mostly the same error as already mentioned in #358 #375 .

The error GE not found originates possibly first from the reads2genes_new function, if no GE tags were found.
This can be solved by checking for the presence of respective columns in the data.table:

if (!("GE" %in% colnames(dt)))
{
  if (dim(dt)[1] == 0)
  {
    dt[,GE:=character()]
  } else {
    dt[["GE"]] = NA
  }
  
}
if (!("GEin" %in% colnames(dt)))
{
  if (dim(dt)[1] == 0)
  {
    dt[,GEin:=character()]
  } else {
    dt[["GEin"]] = NA
  }
}

This, however, leads to a second error invalid 'size' argument, which originates from the .sampleReads4collapsing function.

Within the if-statement one needs to change the data.table manipulation slightly, particularly the sample-call:

testf = function( x, s)
{
    #print(head(x))
    #print(head(s))

    if (length(s)== 0)
    {
        s=0
    } else {
        s=unique(s)[1]
    }

    return(sample(x,s))
}

subrcl = rcl[ ,exn:=.N,by=RG][, targetN:=exn][ n> nmax, targetN:=rbinom(1,nmax,mean(exn)/mean(n) ), by=RG][targetN>exn, targetN:=exn][is.na(targetN),targetN :=0]
rcl_res=rcl[ unlist(subrcl[ ,list(list(testf(.I , targetN))),by = RG]$V1) ]
return(rcl_res)

Upon fixing those two issues, zUMIs ran through in my case. @cziegenhain , should i create a pull-request?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zUMI parallel::mclapply AND 'strsplit': object 'GE' not found errors #408

zUMI parallel::mclapply AND 'strsplit': object 'GE' not found errors #408

vishaa5 commented Feb 11, 2025

mjoppich commented Feb 11, 2025

zUMI parallel::mclapply AND 'strsplit': object 'GE' not found errors #408

zUMI parallel::mclapply AND 'strsplit': object 'GE' not found errors #408

Comments

vishaa5 commented Feb 11, 2025

mjoppich commented Feb 11, 2025