Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zUMI parallel::mclapply AND 'strsplit': object 'GE' not found errors #408

Open
vishaa5 opened this issue Feb 11, 2025 · 1 comment
Open

Comments

@vishaa5
Copy link

vishaa5 commented Feb 11, 2025

Hi! I am trying to use zUMI on the example data provided (it was taking too long to process my data) and I keep encountering these issues:

Filtering...
Tue Feb 11 01:05:38 PM CET 2025
[1] "43 barcodes detected."
[1] "193783 reads were assigned to barcodes that do not correspond to intact cells."
Warning message:
Using size aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use linewidth instead.
Mapping...
[1] "2025-02-11 13:05:52 CET"
/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_install/STAR-2.7.11b/source/STAR --readFilesCommand /home/icb/bhavishya.nelakuditi/local/bin/samtools view -@ 1 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_index_49 --sjdbGTFfile /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/Homo_sapiens.GRCh38.109.gtf --runThreadN 1 --sjdbOverhang 49 --readFilesType SAM SE --readFilesIn /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/zUMIs_output/.tmpMerge//Example.Exampleaa.filtered.tagged.bam --outFileNamePrefix /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/Example.filtered.tagged.
STAR version: 2.7.11b compiled: Mon Feb 10 12:36:35 PM CET 2025 cpusrv93.scidom.de
Feb 11 13:05:54 ..... started STAR run
Feb 11 13:05:54 ..... loading genome
Feb 11 13:06:19 ..... processing annotations GTF
Feb 11 13:06:34 ..... inserting junctions into the genome indices
Feb 11 13:07:50 ..... started mapping
Feb 11 13:08:18 ..... finished mapping
Feb 11 13:08:18 ..... finished successfully
Tue Feb 11 01:08:18 PM CET 2025
Counting...
[1] "2025-02-11 13:08:44 CET"
[1] "9e+07 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/Example.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\
|| ||
|| Input files : 1 BAM file ||
|| S Example.filtered.tagged.Aligned.out.bam ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid2340542 ... ||
|| Features : 358361 ||
|| Meta-features : 62710 ||
|| Chromosomes/contigs : 47 ||
|| ||
|| Process BAM file Example.filtered.tagged.Aligned.out.bam... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 853296 ||
|| Successfully assigned alignments : 356617 (41.8%) ||
|| Running time : 0.07 minutes ||
|| ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4

//========================== featureCounts setting ===========================\
|| ||
|| Input files : 1 BAM file ||
|| S Example.filtered.tagged.Aligned.out.bam.ex ... ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid2340542 ... ||
|| Features : 238917 ||
|| Meta-features : 28537 ||
|| Chromosomes/contigs : 33 ||
|| ||
|| Process BAM file Example.filtered.tagged.Aligned.out.bam.ex.featureCou ... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 853296 ||
|| Successfully assigned alignments : 205712 (24.1%) ||
|| Running time : 0.07 minutes ||
|| ||
|| ||
\===================== http://subread.sourceforge.net/ ======================//

[1] "2025-02-11 13:14:36 CET"
[1] "Coordinate sorting final bam file..."
[1] "2025-02-11 13:14:41 CET"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 43 barcodes in this chunk..."
Error in FUN(X[[i]], ...) : unused argument (mc.cores = 1)
Calls: reads2genes_new -> lapply -> lapply
Execution halted
Tue Feb 11 01:14:43 PM CET 2025
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/zUMIs_output/expression/Example.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Feb 11 01:15:02 PM CET 2025
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2025-02-11 13:15:02 CET"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test/zUMIs_output/expression/Example.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Tue Feb 11 01:16:21 PM CET 2025

Here is the yaml file:
project: Example
sequence_files:
file1:
name: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/barcoderead_HEK.1mio.fq.gz
base_definition:
- BC(1-6)
- UMI(7-16)
file2:
name: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/cDNAread_HEK.1mio.fq.gz
base_definition:
- cDNA(1-50)
reference:
STAR_index: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_index_49
GTF_file: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/Homo_sapiens.GRCh38.109.gtf
additional_STAR_params: ''
additional_files: ~
out_dir: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zumi_example/run_test
num_threads: 1
mem_limit: 20
filter_cutoffs:
BC_filter:
num_bases: 1
phred: 20
UMI_filter:
num_bases: 1
phred: 20
barcodes:
barcode_num: ~
barcode_file: ~
automatic: yes
BarcodeBinning: 0
nReadsperCell: 100
counting_opts:
introns: yes
downsampling: '0'
strand: 0
Ham_Dist: 0
velocyto: no
primaryHit: yes
twoPass: no
make_stats: yes
which_Stage: Filtering
samtools_exec: /home/icb/bhavishya.nelakuditi/local/bin/samtools
pigz_exec: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/pigz-2.4/pigz
STAR_exec: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/star_install/STAR-2.7.11b/source/STAR
Rscript_exec: /home/icb/bhavishya.nelakuditi/miniconda3/bin/Rscript
zUMIs_directory: /ictstr01/home/icb/bhavishya.nelakuditi/bulk_seq/new_counts/zUMIs
read_layout: SE

I'm using Linux in a HPC cluster.

I keep getting the same errors when i implement it on my data too...

@mjoppich
Copy link

I think this is mostly the same error as already mentioned in #358 #375 .

The error GE not found originates possibly first from the reads2genes_new function, if no GE tags were found.
This can be solved by checking for the presence of respective columns in the data.table:

if (!("GE" %in% colnames(dt)))
{
  if (dim(dt)[1] == 0)
  {
    dt[,GE:=character()]
  } else {
    dt[["GE"]] = NA
  }
  
}
if (!("GEin" %in% colnames(dt)))
{
  if (dim(dt)[1] == 0)
  {
    dt[,GEin:=character()]
  } else {
    dt[["GEin"]] = NA
  }
}

This, however, leads to a second error invalid 'size' argument, which originates from the .sampleReads4collapsing function.

Within the if-statement one needs to change the data.table manipulation slightly, particularly the sample-call:

testf = function( x, s)
{
    #print(head(x))
    #print(head(s))

    if (length(s)== 0)
    {
        s=0
    } else {
        s=unique(s)[1]
    }

    return(sample(x,s))
}

subrcl = rcl[ ,exn:=.N,by=RG][, targetN:=exn][ n> nmax, targetN:=rbinom(1,nmax,mean(exn)/mean(n) ), by=RG][targetN>exn, targetN:=exn][is.na(targetN),targetN :=0]
rcl_res=rcl[ unlist(subrcl[ ,list(list(testf(.I , targetN))),by = RG]$V1) ]
return(rcl_res)

Upon fixing those two issues, zUMIs ran through in my case. @cziegenhain , should i create a pull-request?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants