sc_long_pipeline.py--> ValueError: invalid contig `chr1` #37

jeneaadams · 2023-01-16T23:43:17Z

Hi, both my genome.fa and gff3 files use contig chr1. Is there support for this format or parameters I can set to solve this error?

Traceback (most recent call last):
File "PATH/TO/FLAMES/python/sc_long_pipeline.py", line 240, in
sc_long_pipeline(args)
File "PATH/TO/FLAMES/python/sc_long_pipeline.py", line 193, in sc_long_pipeline
raw_gff3=raw_splice_isoform if config_dict["global_parameters"]["generate_raw_isoform"] else None)
File "PATH/TO/FLAMES/python/sc_longread.py", line 1123, in group_bam2isoform
it_region = bamfile.fetch(ch, bl.s, bl.e)
File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
File "pysam/libchtslib.pyx", line 686, in pysam.libchtslib.HTSFile.parse_region
ValueError: invalid contig `chr1

The text was updated successfully, but these errors were encountered:

LuyiTian · 2023-01-17T04:32:51Z

Oh I think it is a pysam error which happens when reading the bam file. It is likely that your bam file is generated with a different reference without chr1 or there is no chr1 reads in your bam files so it give this error. you can check your bam file header using samtools view -H $your_bam_file.

jeneaadams · 2023-01-17T15:31:07Z

Here's my output. There seem to be chr1 reads.

@hd @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @sq @pg @pg @pg VN:1.6 SO:coordinate
SN:chr1 LN:248956422
SN:chr10 LN:133797422
SN:chr11 LN:135086622
SN:chr12 LN:133275309
SN:chr13 LN:114364328
SN:chr14 LN:107043718
SN:chr15 LN:101991189
SN:chr16 LN:90338345
SN:chr17 LN:83257441
SN:chr18 LN:80373285
SN:chr19 LN:58617616
SN:chr2 LN:242193529
SN:chr20 LN:64444167
SN:chr21 LN:46709983
SN:chr22 LN:50818468
SN:chr3 LN:198295559
SN:chr4 LN:190214555
SN:chr5 LN:181538259
SN:chr6 LN:170805979
SN:chr7 LN:159345973
SN:chr8 LN:145138636
SN:chr9 LN:138394717
SN:chrM LN:16569
SN:chrX LN:156040895
SN:chrY LN:57227415
SN:KI270728.1 LN:1872759
SN:KI270727.1 LN:448248
SN:KI270442.1 LN:392061
SN:KI270729.1 LN:280839
SN:GL000225.1 LN:211173
SN:KI270743.1 LN:210658
SN:GL000008.2 LN:209709
SN:GL000009.2 LN:201709
SN:KI270747.1 LN:198735
SN:KI270722.1 LN:194050
SN:GL000194.1 LN:191469
SN:KI270742.1 LN:186739
SN:GL000205.2 LN:185591
SN:GL000195.1 LN:182896
SN:KI270736.1 LN:181920
SN:KI270733.1 LN:179772
SN:GL000224.1 LN:179693
SN:GL000219.1 LN:179198
SN:KI270719.1 LN:176845
SN:GL000216.2 LN:176608
SN:KI270712.1 LN:176043
SN:KI270706.1 LN:175055
SN:KI270725.1 LN:172810
SN:KI270744.1 LN:168472
SN:KI270734.1 LN:165050
SN:GL000213.1 LN:164239
SN:GL000220.1 LN:161802
SN:KI270715.1 LN:161471
SN:GL000218.1 LN:161147
SN:KI270749.1 LN:158759
SN:KI270741.1 LN:157432
SN:GL000221.1 LN:155397
SN:KI270716.1 LN:153799
SN:KI270731.1 LN:150754
SN:KI270751.1 LN:150742
SN:KI270750.1 LN:148850
SN:KI270519.1 LN:138126
SN:GL000214.1 LN:137718
SN:KI270708.1 LN:127682
SN:KI270730.1 LN:112551
SN:KI270438.1 LN:112505
SN:KI270737.1 LN:103838
SN:KI270721.1 LN:100316
SN:KI270738.1 LN:99375
SN:KI270748.1 LN:93321
SN:KI270435.1 LN:92983
SN:GL000208.1 LN:92689
SN:KI270538.1 LN:91309
SN:KI270756.1 LN:79590
SN:KI270739.1 LN:73985
SN:KI270757.1 LN:71251
SN:KI270709.1 LN:66860
SN:KI270746.1 LN:66486
SN:KI270753.1 LN:62944
SN:KI270589.1 LN:44474
SN:KI270726.1 LN:43739
SN:KI270735.1 LN:42811
SN:KI270711.1 LN:42210
SN:KI270745.1 LN:41891
SN:KI270714.1 LN:41717
SN:KI270732.1 LN:41543
SN:KI270713.1 LN:40745
SN:KI270754.1 LN:40191
SN:KI270710.1 LN:40176
SN:KI270717.1 LN:40062
SN:KI270724.1 LN:39555
SN:KI270720.1 LN:39050
SN:KI270723.1 LN:38115
SN:KI270718.1 LN:38054
SN:KI270317.1 LN:37690
SN:KI270740.1 LN:37240
SN:KI270755.1 LN:36723
SN:KI270707.1 LN:32032
SN:KI270579.1 LN:31033
SN:KI270752.1 LN:27745
SN:KI270512.1 LN:22689
SN:KI270322.1 LN:21476
SN:GL000226.1 LN:15008
SN:KI270311.1 LN:12399
SN:KI270366.1 LN:8320
SN:KI270511.1 LN:8127
SN:KI270448.1 LN:7992
SN:KI270521.1 LN:7642
SN:KI270581.1 LN:7046
SN:KI270582.1 LN:6504
SN:KI270515.1 LN:6361
SN:KI270588.1 LN:6158
SN:KI270591.1 LN:5796
SN:KI270522.1 LN:5674
SN:KI270507.1 LN:5353
SN:KI270590.1 LN:4685
SN:KI270584.1 LN:4513
SN:KI270320.1 LN:4416
SN:KI270382.1 LN:4215
SN:KI270468.1 LN:4055
SN:KI270467.1 LN:3920
SN:KI270362.1 LN:3530
SN:KI270517.1 LN:3253
SN:KI270593.1 LN:3041
SN:KI270528.1 LN:2983
SN:KI270587.1 LN:2969
SN:KI270364.1 LN:2855
SN:KI270371.1 LN:2805
SN:KI270333.1 LN:2699
SN:KI270374.1 LN:2656
SN:KI270411.1 LN:2646
SN:KI270414.1 LN:2489
SN:KI270510.1 LN:2415
SN:KI270390.1 LN:2387
SN:KI270375.1 LN:2378
SN:KI270420.1 LN:2321
SN:KI270509.1 LN:2318
SN:KI270315.1 LN:2276
SN:KI270302.1 LN:2274
SN:KI270518.1 LN:2186
SN:KI270530.1 LN:2168
SN:KI270304.1 LN:2165
SN:KI270418.1 LN:2145
SN:KI270424.1 LN:2140
SN:KI270417.1 LN:2043
SN:KI270508.1 LN:1951
SN:KI270303.1 LN:1942
SN:KI270381.1 LN:1930
SN:KI270529.1 LN:1899
SN:KI270425.1 LN:1884
SN:KI270396.1 LN:1880
SN:KI270363.1 LN:1803
SN:KI270386.1 LN:1788
SN:KI270465.1 LN:1774
SN:KI270383.1 LN:1750
SN:KI270384.1 LN:1658
SN:KI270330.1 LN:1652
SN:KI270372.1 LN:1650
SN:KI270548.1 LN:1599
SN:KI270580.1 LN:1553
SN:KI270387.1 LN:1537
SN:KI270391.1 LN:1484
SN:KI270305.1 LN:1472
SN:KI270373.1 LN:1451
SN:KI270422.1 LN:1445
SN:KI270316.1 LN:1444
SN:KI270340.1 LN:1428
SN:KI270338.1 LN:1428
SN:KI270583.1 LN:1400
SN:KI270334.1 LN:1368
SN:KI270429.1 LN:1361
SN:KI270393.1 LN:1308
SN:KI270516.1 LN:1300
SN:KI270389.1 LN:1298
SN:KI270466.1 LN:1233
SN:KI270388.1 LN:1216
SN:KI270544.1 LN:1202
SN:KI270310.1 LN:1201
SN:KI270412.1 LN:1179
SN:KI270395.1 LN:1143
SN:KI270376.1 LN:1136
SN:KI270337.1 LN:1121
SN:KI270335.1 LN:1048
SN:KI270378.1 LN:1048
SN:KI270379.1 LN:1045
SN:KI270329.1 LN:1040
SN:KI270419.1 LN:1029
SN:KI270336.1 LN:1026
SN:KI270312.1 LN:998
SN:KI270539.1 LN:993
SN:KI270385.1 LN:990
SN:KI270423.1 LN:981
SN:KI270392.1 LN:971
SN:KI270394.1 LN:970
ID:minimap2 PN:minimap2 VN:2.24-r1122 CL:minimap2 -ax splice -k14 -t 1 --junc-bed /PATH/TO/gencode.v39.annotation.sorted.bed --secondary=no --sam-hit-only /PATH/TO/genome.fa PATH/TO//RPL41_204_205_5.fq.gz
ID:samtools PN:samtools PP:minimap2 VN:1.15.1 CL:samtools view -hb PATH/TO//RPL41_204_205_5.sam
ID:samtools.1 PN:samtools PP:samtools VN:1.15.1 CL:samtools sort -o PATH/TO//RPL41_204_205_5.sort.bam PATH/TO//RPL41_204_205_5.bam

jeneaadams · 2023-01-19T21:26:07Z

Hello,

Here's the structure of our input fastq from the match_cell_barcode step:

Here's the BAM file the full pipeline starts to output:

The transcript ID replaced the chromosome ID in the name file, and there's no chromosome contig. Also there's no read name, just a cell barcode as a read name in the bam file.

Usage for sc_long_pipeline.py:
~/FLAMES/python/sc_long_pipeline.py --gff3 ~/gencode.v39.annotation.gff3 --infq ~/out_pcr5_hq_fastq.gz --outdir ~/flames_test_MYL6/ --genomefa ~/gencode.v39.transcripts.fa --minimap2_dir ~/anaconda3/envs/FLAMES3/bin/

Original fastq before barcode match:

LuyiTian · 2023-01-22T04:52:27Z

The genomefa should be the fasta file of genome instead of transcript. For human the most recent primary assembly should be: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/GRCh38.primary_assembly.genome.fa.gz. other versions have similar naming convention.

The FLAMES software will generate the transcript fasta file, which contains the known transcript and the new transcript called from the bam file.

jeneaadams · 2023-01-25T22:29:57Z

I see, thanks. With the updated input, I get a new error:

Traceback (most recent call last):
File "~FLAMES_analysis/FLAMES/python/count_tr.py", line 159, in parse_realigned_bam
bc, umi = r.split("#")[0].split("_") # assume cleaned barcode
ValueError: not enough values to unpack (expected 2, got 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "~FLAMES/python/sc_long_pipeline.py", line 240, in
sc_long_pipeline(args)
File "~FLAMES/python/sc_long_pipeline.py", line 225, in sc_long_pipeline
realign_bam, transcript_fa_idx, config_dict["isoform_parameters"]["Min_sup_cnt"], config_dict["transcript_counting"]["min_tr_coverage"], config_dict["transcript_counting"]["min_read_coverage"])
File "~FLAMES/python/count_tr.py", line 163, in parse_realigned_bam
"Please check if barcode and UMI are delimited by "_"")
ValueError: Please check if barcode and UMI are delimited by "__"

Our barcode is the first 16nt and the umi is the next 10nt that follows.

The fastq I provided for --infq is the binary fastq file from the first barcode matching step. Can you also confirm this should be a binary file generated from match_cell_barcode?

ChangqingW · 2023-01-30T00:46:22Z

Yes, it is expecting fastq files outputted by match_cell_barcode, where the identifier field has the format of @[barcode]_[umi]#[original id]

yuntianf · 2023-02-03T16:36:58Z

But this fastq file couldn't be viewed by less, it's a biniary file. And it couldn't be aligned with minimap2 either, here is the error log:

Input parameters:
        gene annotation: /home/reference/annotation/human/gencode.v39.annotation.gff3
        genome fasta: /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa
        input fastq: /home/FLAMES/out.fastq
        output directory: /home/FLAMES/
        directory contains minimap2: /home/software/minimap2-2.24_x64-linux/
### align reads to genome using minimap2 2023-02-03 11:25:22
b''
Traceback (most recent call last):
  File "./python/sc_long_pipeline.py", line 240, in <module>
    sc_long_pipeline(args)
  File "./python/sc_long_pipeline.py", line 168, in sc_long_pipeline
    seed=config_dict["alignment_parameters"]["seed"])
  File "/home/software/FLAMES/python/minimap2_align.py", line 41, in minimap2_align
    shell=True, stderr=subprocess.STDOUT))
  File "/home/software/anaconda3/envs/py3.7/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/software/anaconda3/envs/py3.7/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/software/minimap2-2.24_x64-linux/minimap2 -ax splice -t 12 --junc-bed /home/FLAMES/tmp.splice_anno.bed12 --junc-bonus 1  -k14 --secondary=no --seed 2022 /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa /home/FLAMES//out.fastq | samtools view -bS -@ 4 -m 2G -o /home/FLAMES/tmp.alig
n.bam -  ']' returned non-zero exit status 1.

Here is the output statistic for match_cell_barcode, which may help:

###total read: 26292
###barcode hm match: 13191
###barcode fuzzy match: 2127
###barcode not match: 10974
###too short: 0

LuyiTian · 2023-02-11T14:21:11Z

@dontwantcode there seem to be an extra / in the minimap code: /home/FLAMES//out.fastq

subprocess.CalledProcessError: Command '['/home/software/minimap2-2.24_x64-linux/minimap2 -ax splice -t 12 --junc-bed /home/FLAMES/tmp.splice_anno.bed12 --junc-bonus 1  -k14 --secondary=no --seed 2022 /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa /home/FLAMES//out.fastq | samtools view -bS -@ 4 -m 2G -o /home/FLAMES/tmp.alig
n.bam -  ']'

yuntianf · 2023-02-12T03:14:34Z

Hi Luyi,
I don't think it should be the problem of \\, which is equivalent as \ in linux path.
Here is a verification:

Input parameters:
        gene annotation: /home/reference/annotation/human/gencode.v39.annotation.gff3
        genome fasta: /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa
        input fastq: /home/FLAMES/out.fastq
        output directory: /home/FLAMES
        directory contains minimap2: /home/software/minimap2-2.24_x64-linux/
### align reads to genome using minimap2 2023-02-11 22:09:24
b''
Traceback (most recent call last):
  File "./python/sc_long_pipeline.py", line 240, in <module>
    sc_long_pipeline(args)
  File "./python/sc_long_pipeline.py", line 168, in sc_long_pipeline
    seed=config_dict["alignment_parameters"]["seed"])
  File "/home/software/FLAMES/python/minimap2_align.py", line 41, in minimap2_align
    shell=True, stderr=subprocess.STDOUT))
  File "/home/software/anaconda3/envs/py3.7/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/software/anaconda3/envs/py3.7/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/software/minimap2-2.24_x64-linux/minimap2 -ax splice -t 12 --junc-bed /home/FLAMES/tmp.splice_anno.bed12 --junc-bonus 1  -k14 --secondary=no --seed 2022 /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa /home/FLAMES/out.fastq | samtools view -bS -@ 4 -m 2G -o /home/FLAMES/tmp.align
.bam -  ']' returned non-zero exit status 1.

ChangqingW · 2023-02-12T22:06:25Z

But this fastq file couldn't be viewed by less, it's a biniary file. And it couldn't be aligned with minimap2 either, here is the error log:

Input parameters:
        gene annotation: /home/reference/annotation/human/gencode.v39.annotation.gff3
        genome fasta: /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa
        input fastq: /home/FLAMES/out.fastq
        output directory: /home/FLAMES/
        directory contains minimap2: /home/software/minimap2-2.24_x64-linux/
### align reads to genome using minimap2 2023-02-03 11:25:22
b''
Traceback (most recent call last):
  File "./python/sc_long_pipeline.py", line 240, in <module>
    sc_long_pipeline(args)
  File "./python/sc_long_pipeline.py", line 168, in sc_long_pipeline
    seed=config_dict["alignment_parameters"]["seed"])
  File "/home/software/FLAMES/python/minimap2_align.py", line 41, in minimap2_align
    shell=True, stderr=subprocess.STDOUT))
  File "/home/software/anaconda3/envs/py3.7/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/software/anaconda3/envs/py3.7/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/software/minimap2-2.24_x64-linux/minimap2 -ax splice -t 12 --junc-bed /home/FLAMES/tmp.splice_anno.bed12 --junc-bonus 1  -k14 --secondary=no --seed 2022 /home/reference/refdata-gex-GRCh38-2020-A/fasta/genome.fa /home/FLAMES//out.fastq | samtools view -bS -@ 4 -m 2G -o /home/FLAMES/tmp.alig
n.bam -  ']' returned non-zero exit status 1.

Here is the output statistic for match_cell_barcode, which may help:

###total read: 26292
###barcode hm match: 13191
###barcode fuzzy match: 2127
###barcode not match: 10974
###too short: 0

Should be a gzipped fastq, could you check with zcat? Or maybe post the first line of xxd out.fastq

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sc_long_pipeline.py--> ValueError: invalid contig `chr1` #37

sc_long_pipeline.py--> ValueError: invalid contig `chr1` #37

jeneaadams commented Jan 16, 2023

LuyiTian commented Jan 17, 2023

jeneaadams commented Jan 17, 2023

jeneaadams commented Jan 19, 2023 •

edited

Loading

LuyiTian commented Jan 22, 2023

jeneaadams commented Jan 25, 2023 •

edited

Loading

ChangqingW commented Jan 30, 2023

yuntianf commented Feb 3, 2023

LuyiTian commented Feb 11, 2023

yuntianf commented Feb 12, 2023

ChangqingW commented Feb 12, 2023

sc_long_pipeline.py--> ValueError: invalid contig chr1 #37

sc_long_pipeline.py--> ValueError: invalid contig chr1 #37

Comments

jeneaadams commented Jan 16, 2023

LuyiTian commented Jan 17, 2023

jeneaadams commented Jan 17, 2023

jeneaadams commented Jan 19, 2023 • edited Loading

LuyiTian commented Jan 22, 2023

jeneaadams commented Jan 25, 2023 • edited Loading

ChangqingW commented Jan 30, 2023

yuntianf commented Feb 3, 2023

LuyiTian commented Feb 11, 2023

yuntianf commented Feb 12, 2023

ChangqingW commented Feb 12, 2023

sc_long_pipeline.py--> ValueError: invalid contig `chr1` #37

sc_long_pipeline.py--> ValueError: invalid contig `chr1` #37

jeneaadams commented Jan 19, 2023 •

edited

Loading

jeneaadams commented Jan 25, 2023 •

edited

Loading