MIRTOP_STATS IndexError #477

Open
anastasiaprime opened this issue Oct 15, 2024 · 18 comments
Labels: bug (Something isn't working)

@anastasiaprime

anastasiaprime commented Oct 15, 2024

Description of the bug

Hello!

I'm trying to process my small RNA-seq data using only R1 reads, and I always get the same error. What could be causing it? nextflow.log

I'm using smrnaseq v2.4.0 with Nextflow version 24.04.4.
I also tried the dev version and got the same error.
Command line: nextflow run nf-core/smrnaseq -profile docker --input samplesheet_1.csv --outdir Results_R1_test --fasta /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa --mirgenedb true --mirgenedb_species Hsa --mirgenedb_mature /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/miRNA/hsa.fas --mirgenedb_hairpin /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/miRNA/hsa-pre.fas --mirgenedb_gff /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/miRNA/hsa.gff --mirtrace_species hsa -c config
The config only sets resources (max_cpus, max_memory).
Error:
`ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (770902000404_S5)'

Caused by:
Process NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (770902000404_S5) terminated with an error exit status (1)

Command executed:

mirtop
stats

--out stats
770902000404_S5_mirtop.gff

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS":
mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
END_VERSIONS

Command exit status:
1

Command output:
['stats', '--out', 'stats', '770902000404_S5_mirtop.gff']
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/opt/conda/lib/python3.12/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a
future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you
still need the Bio.pairwise2 module.
warnings.warn(
10/15/2024 09:29:02 INFO Run stats.
10/15/2024 09:29:02 INFO Reading: 770902000404_S5_mirtop.gff
Traceback (most recent call last):
File "/opt/conda/bin/mirtop", line 10, in <module>
sys.exit(main())
^^^^^^
File "/opt/conda/lib/python3.12/site-packages/mirtop/command_line.py", line 34, in main
['stats', '--out', 'stats', '770902000404_S5_mirtop.gff']
stats(kwargs["args"])
File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 38, in stats
out.append(_calc_stats(fn))
^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 82, in _calc_stats
df = _summary(lines)
^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 130, in _summary
df_sum = _add_missing(df_sum)
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 110, in _add_missing
df2 = pd.DataFrame({'category': category, 'sample': df['sample'].iat[0], 'counts': 0}, index=[0])
~~~~~~~~~~~~~~~~^^^
File "/opt/conda/lib/python3.12/site-packages/pandas/core/indexing.py", line 2527, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/pandas/core/series.py", line 1234, in _get_value
return self._values[label]
~~~~~~~~~~~~^^^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
Work dir:
/mnt/cephfs8_rw/oncology/miRNA/220118_VH00195_67_AAAHV32M5_fastq4/work/39/2e514730d2d4ad416afc2023156668

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details

`
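
For anyone reading the traceback: the failing call is `df['sample'].iat[0]`, and the final message says the axis has size 0, so the summary table built from the GFF had no rows. Below is a minimal sketch of the same failure and of a guard that reports it more clearly, using a plain list in place of the pandas column. `first_sample` is a hypothetical helper for illustration only, not mirtop code.

```python
def first_sample(samples):
    """Return the first sample name, or raise a descriptive error if there is none."""
    if not samples:
        raise ValueError("no records found: the input GFF has no feature lines")
    return samples[0]


# An empty list stands in for the empty 'sample' column seen in the traceback.
try:
    [][0]
except IndexError as err:
    print(f"unguarded access: {err}")

try:
    first_sample([])
except ValueError as err:
    print(f"guarded access: {err}")
```

The guard does not fix the underlying problem (an input with no records); it only makes the cause visible without reading pandas internals.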

Command used and terminal output

No response

Relevant files

No response

System information

No response

@anastasiaprime anastasiaprime added the bug Something isn't working label Oct 15, 2024
@atrigila
Contributor

Hi! Thank you for reporting this bug. I think I know where the issue is and I am working on a solution. Just to confirm, could you please run the same command but without the --mirtrace_species hsa flag? Thank you!

@anastasiaprime
Author

Hi @atrigila! I ran the command as you asked and the pipeline completed successfully, but miRTrace and mirtop didn't run.
Image

@atrigila
Contributor

atrigila commented Oct 16, 2024

The mirtop step requires both the mirtrace_species and its corresponding mirtrace gff file. If mirtrace_species is provided but the mirtrace gff is not supplied, the pipeline still attempts to run mirtop, which results in an error. This occurs because mirtop expects input data that depends on the presence of a valid GFF file for the specified mirtrace_species. Without this, the tool cannot properly process the data and fails.

mirtop was also unavailable for runs using MirGeneDB inputs in previous versions of the pipeline. For example, v2.3.1 only ran it when --mirtrace_species was present:

if (params.mirtrace_species){
    MIRTOP_QUANT ( BOWTIE_MAP_SEQCLUSTER.out.bam.collect{it[1]}, FORMAT_HAIRPIN.out.formatted_fasta.collect{it[1]}, gtf )
    ch_mirtop_logs = MIRTOP_QUANT.out.logs
    ch_versions = ch_versions.mix(MIRTOP_QUANT.out.versions)
    TABLE_MERGE ( MIRTOP_QUANT.out.mirtop_table )
    ch_versions = ch_versions.mix(TABLE_MERGE.out.versions)
}

The same behavior can be reproduced in older versions of the pipeline (nextflow run smrnaseq -profile test_mirgenedb,docker --outdir test_mirgenedb_old --mirtrace_species hsa, commit id f8fd872034e214fe922118275cdfdf6e498a7f5c).

I will update the documentation to state clearly that mirtop supports mirtrace inputs only, and emit warnings in the code. I have also contacted the mirtop developers to ask whether there is a workaround for MirGeneDB inputs. I'll post any updates in this issue.

@nschcolnicov nschcolnicov mentioned this issue Nov 11, 2024
11 tasks
@nschcolnicov nschcolnicov closed this as completed by moving to Done in smrnaseq Nov 11, 2024
@nschcolnicov nschcolnicov mentioned this issue Nov 12, 2024
11 tasks
@atrigila
Contributor

@anastasiaprime the issue should be solved now, but let us know if you have any additional questions. Keep in mind that with MirGeneDB inputs, mirtop is hard-coded to use the pre sequences, which originate from the hairpin FASTA, rather than the pri sequences, which come from the mature FASTA. Users must provide pre files from the start to ensure consistency between the FASTA and GFF files, because the coordinates in the GFF file refer to pre sequences. This also ensures that the names in the BAM file match those in the GFF.
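
A minimal sketch of one way to check that FASTA/GFF consistency before launching the pipeline: it compares the sequence names in a hairpin FASTA with the values of one attribute in the GFF. The function names and the default `ID` attribute are assumptions for illustration; which attribute carries the matching name depends on your GFF, so adjust `key` accordingly.

```python
def fasta_names(path):
    """Return the set of sequence names (first word after '>') in a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gff_attribute_values(path, key="ID"):
    """Return the set of values of one attribute key from column 9 of a GFF3 file."""
    values = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9:
                continue  # not a complete feature line
            for pair in cols[8].split(";"):
                name, sep, value = pair.strip().partition("=")
                if sep and name == key:
                    values.add(value)
    return values


def missing_from_gff(fasta_path, gff_path, key="ID"):
    """Names present in the FASTA but absent from the chosen GFF attribute."""
    return sorted(fasta_names(fasta_path) - gff_attribute_values(gff_path, key))
```

An empty list from `missing_from_gff("hsa-pre.fas", "hsa.gff")` would suggest the names line up; a long list would point to a mismatch of the kind described above.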

@epfarias

epfarias commented Dec 24, 2024

Hello, happy holidays to all!!

I'm running into the same problem. However, I'm not using MirGeneDB inputs: I'm using --genome GRCh38 together with --mirtrace_species hsa and --mirna_gtf hsa.gff3, which I downloaded from miRBase. I tried running without --mirna_gtf, but the pipeline returned an error saying the flag is required.

I'm using the smrnaseq v2.4.0 and nextflow version 24.10.1 build 5930.

Command line: nextflow run nf-core/smrnaseq --input ./samplesheet.csv --outdir ./results --genome GRCh38 -profile singularity --mirtrace_species hsa --mirna_gtf hsa.gff3 -with-tower

The error returned by the pipeline:

> The exit status of the task that caused the workflow execution to fail was: 1
> 
> Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (SRR7419121)'
> 
> Caused by:
>   Process `NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (SRR7419121)` terminated with an error exit status (1)
> 
> 
> Command executed:
> 
>   mirtop \
>       stats \
>        \
>       --out stats \
>       SRR7419121_mirtop.gff
>   
>   cat <<-END_VERSIONS > versions.yml
>   "NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS":
>       mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
>   END_VERSIONS
> 
> Command exit status:
>   1
> 
> Command output:
>   ['stats', '--out', 'stats', 'SRR7419121_mirtop.gff']
> 
> Command error:
>   /opt/conda/lib/python3.12/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
>     warnings.warn(
>   12/23/2024 11:59:59 INFO Run stats.
>   12/23/2024 11:59:59 INFO Reading: SRR7419121_mirtop.gff
>   Traceback (most recent call last):
>     File "/opt/conda/bin/mirtop", line 10, in <module>
>       sys.exit(main())
>                ^^^^^^
>     File "/opt/conda/lib/python3.12/site-packages/mirtop/command_line.py", line 34, in main
>       stats(kwargs["args"])
>     File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 38, in stats
>       out.append(_calc_stats(fn))
>                  ^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 82, in _calc_stats
>       df = _summary(lines)
>            ^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 130, in _summary
>       df_sum = _add_missing(df_sum)
>                ^^^^^^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 110, in _add_missing
>       df2 = pd.DataFrame({'category': category, 'sample': df['sample'].iat[0], 'counts': 0}, index=[0])
>                                                           ~~~~~~~~~~~~~~~~^^^
>     File "/opt/conda/lib/python3.12/site-packages/pandas/core/indexing.py", line 2527, in __getitem__
>       return self.obj._get_value(*key, takeable=self._takeable)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.12/site-packages/pandas/core/series.py", line 1234, in _get_value
>       return self._values[label]
>              ~~~~~~~~~~~~^^^^^^^
>   IndexError: index 0 is out of bounds for axis 0 with size 0
> 
> Work dir:
>   /mnt/beegfs/scratch/eddffilho/Collabs/Thyroid_miRNA/Raw_data/work/84/1ae41e3a345a6ea55cd999fc2df43e
> 
> Container:
>   /home/eddffilho/scratch/singularity_images/community.wave.seqera.io-library-mirtop_pybedtools_pysam_samtools_pruned-60b8208f3dbb2910.img
> 
> Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

@epfarias epfarias reopened this Dec 25, 2024
@atrigila
Contributor

atrigila commented Dec 26, 2024

Hi @epfarias , I think I was able to replicate your error using public data:
nextflow run smrnaseq/ -profile singularity --outdir here --genome GRCh38 --input https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_skipfastp.csv --mirna_gtf https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hsa.gff3

Error:

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (Clone1_N1)'

Caused by:
  Process `NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (Clone1_N1)` terminated with an error exit status (1)


Command executed:

  mirtop \
      stats \
       \
      --out stats \
      Clone1_N1_mirtop.gff
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS":
      mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  WARNING: Error getting executable path, using default: readlink /proc/self/exe: no such file or directory
  .command.run: line 150: /proc/cpuinfo: No such file or directory
  .command.run: line 151: /proc/cpuinfo: No such file or directory
  grep: /proc/stat: No such file or directory
  .command.run: line 162: /dev/fd/63: No such file or directory
  .command.run: line 162: mem_proc: unbound variable

Work dir:
  /workspace/work/69/2c3d5a589b641e0b2c7ea57d09a857

Container:
  /workspace/work/singularity/community.wave.seqera.io-library-mirtop_pybedtools_pysam_samtools_pruned-60b8208f3dbb2910.img

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

 -- Check '.nextflow.log' file for details

The error is not exactly the same. We will look into this and post any updates in this ticket.

@atrigila
Contributor

@epfarias please try running the latest dev version to see if you still encounter the error:
nextflow run nf-core/smrnaseq --input ./samplesheet.csv --outdir ./results --genome GRCh38 -profile singularity --mirtrace_species hsa --mirna_gtf hsa.gff3 -with-tower -r dev

@epfarias

Hi @atrigila, thank you for the quick response. Unfortunately, the error remains. I checked the data for problems, but it looks fine.

@atrigila
Contributor

I opened a new branch with a small change to the Singularity container. With this change, the error no longer occurs for me with public data.
Could you please try the following:

  1. nextflow run nf-core/smrnaseq -profile singularity --outdir here --genome GRCh38 --input https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_skipfastp.csv --mirna_gtf https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hsa.gff3 -r issue_477
  2. If that works, then replace that with your data and let me know how it goes.
  3. I noticed you did not add a protocol profile (https://nf-co.re/smrnaseq/2.4.0/docs/usage/#introduction), which is required for the pipeline to run. You could also test what happens adding that: -profile illumina,singularity.

@epfarias

epfarias commented Jan 4, 2025

Hey Anabella,

Happy New Year!! I tested the first option and the pipeline ran to completion. However, when I used my own data the error occurred again. I also added the illumina profile, which I had not been using. This is the command I used:

nextflow run nf-core/smrnaseq -profile illumina,singularity --outdir results/ --genome GRCh38 --input samplesheet.csv --mirna_gtf https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/hsa.gff3 --mirtrace_species hsa -with-tower -r issue_477

Error report :

> The exit status of the task that caused the workflow execution to fail was: 1
> 
> Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (SRR7419066)'
> 
> Caused by:
>   Process `NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (SRR7419066)` terminated with an error exit status (1)
> 
> 
> Command executed:
> 
>   mirtop \
>       stats \
>        \
>       --out stats \
>       SRR7419066_mirtop.gff
>   
>   cat <<-END_VERSIONS > versions.yml
>   "NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS":
>       mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
>   END_VERSIONS
> 
> Command exit status:
>   1
> 
> Command output:
>   ['stats', '--out', 'stats', 'SRR7419066_mirtop.gff']
> 
> Command error:
>   /opt/conda/lib/python3.11/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you still need the Bio.pairwise2 module.
>     warnings.warn(
>   01/04/2025 02:06:50 INFO Run stats.
>   01/04/2025 02:06:50 INFO Reading: SRR7419066_mirtop.gff
>   Traceback (most recent call last):
>     File "/opt/conda/bin/mirtop", line 8, in <module>
>       sys.exit(main())
>                ^^^^^^
>     File "/opt/conda/lib/python3.11/site-packages/mirtop/command_line.py", line 34, in main
>       stats(kwargs["args"])
>     File "/opt/conda/lib/python3.11/site-packages/mirtop/gff/stats.py", line 38, in stats
>       out.append(_calc_stats(fn))
>                  ^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.11/site-packages/mirtop/gff/stats.py", line 82, in _calc_stats
>       df = _summary(lines)
>            ^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.11/site-packages/mirtop/gff/stats.py", line 130, in _summary
>       df_sum = _add_missing(df_sum)
>                ^^^^^^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.11/site-packages/mirtop/gff/stats.py", line 110, in _add_missing
>       df2 = pd.DataFrame({'category': category, 'sample': df['sample'].iat[0], 'counts': 0}, index=[0])
>                                                           ~~~~~~~~~~~~~~~~^^^
>     File "/opt/conda/lib/python3.11/site-packages/pandas/core/indexing.py", line 2527, in __getitem__
>       return self.obj._get_value(*key, takeable=self._takeable)
>              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     File "/opt/conda/lib/python3.11/site-packages/pandas/core/series.py", line 1234, in _get_value
>       return self._values[label]
>              ~~~~~~~~~~~~^^^^^^^
>   IndexError: index 0 is out of bounds for axis 0 with size 0
> 
> Work dir:
>   /mnt/beegfs/scratch/eddffilho/Collabs/Thyroid_miRNA/Raw_data/work/5d/7ad18cb2310336648584acf697764b
> 
> Container:
>   /home/eddffilho/scratch/singularity_images/community-cr-prod.seqera.io-docker-registry-v2-blobs-sha256-28-28ece5ab35c2432bf6f360682f58d4245aec76a0cbab3879478f44d248df0205-data.img
> 
> Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

@atrigila
Contributor

atrigila commented Jan 7, 2025

Hi, thank you for testing the suggested options.
It might be an issue with your samples. You could troubleshoot it with the following:

  1. Check the work dir of mirtop_stats /mnt/beegfs/scratch/eddffilho/Collabs/Thyroid_miRNA/Raw_data/work/5d/7ad18cb2310336648584acf697764b and check if the .gff is empty. If it is empty, check the previous work dirs, for example, that of mirtop_gff and see if it's generating empty files. Check the directory to see which command was run and with which files:
    mirtop \\
        gff \\
        $args \\
        --sps $species \\
        --hairpin $hairpin \\
        --gtf $gtf \\
        -o mirtop \\
        $bam
  2. In that work dir, compare the input files (hairpin, gtf, bam) and the other arguments (args, species) from your run with those from the public-data example that worked for you. Are they the same? Do all of these files have the same structure?
  3. Test with an older release of the pipeline (e.g. https://nf-co.re/smrnaseq/2.3.1/docs/usage/). Does the same error happen? Previous releases used a different parameter structure (see the example here: https://nf-co.re/smrnaseq/2.3.1/#usage), so you will have to adapt your run command. It is not guaranteed to work, though, as many bugs were fixed in the latest release.
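
The first check in the list (is the .gff empty?) can be sketched as a small script. It assumes that lines starting with `#` are header or comment lines and that any other non-blank line is a feature record. `count_gff_features` is a hypothetical helper, not part of mirtop or the pipeline.

```python
def count_gff_features(path):
    """Count lines that are neither blank nor '#' header/comment lines."""
    n_features = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                n_features += 1
    return n_features


def describe_gff(path):
    """Return a one-line, tab-separated summary: path, feature count, status."""
    n_features = count_gff_features(path)
    status = "EMPTY" if n_features == 0 else "ok"
    return f"{path}\t{n_features}\t{status}"
```

Running `print(describe_gff("SRR7419066_mirtop.gff"))` inside the work dir would show whether the file holds only header lines, which is the situation that would leave mirtop stats with nothing to summarise.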

@lpantano
Contributor

Catching up on this as well. Can you share the top lines of SRR7419066_mirtop.gff?

@epfarias

epfarias commented Jan 14, 2025

Hey @lpantano, I cleaned my work directory right after the message from @atrigila, because I was focused on understanding the data, since the problem could be with my samples. I reran the pipeline with the same samples to show you the .gff file, and I've attached a screenshot of the file for sample SRR7419086 as viewed in nano. I used the command below; to reproduce the error, I used 8 of the original 360 samples.

nextflow run nf-core/smrnaseq -profile illumina,singularity --outdir results/ --genome GRCh38 --input scratch/Collabs/Thyroid_miRNA/Raw_data/samplesheet_2.csv --mirtrace_species hsa -with-tower -r issue_477

Image

I do believe there is some issue with the data I'm using, maybe a problem with the barcodes: this data was originally multiplexed for sequencing, and the researchers uploaded it demultiplexed, without the barcode information or any other additional information. I'm still trying to figure out what could have happened. Here is the screenshot from right before the error:

Image

@lpantano
Contributor

OK, thanks. I see the GFF is empty, so there is a problem at the MIRTOP_GFF step. Can you check the output of that process? It would be great to have the log, if you can go into that working directory and share it. I believe we can get more information from there. It seems the issue is poor mapping to miRNAs, which could easily be due to the demultiplexing step, but maybe I can see more from the mirtop_gff process.

@epfarias

The MIRTOP_GFF work directory contains the following files:

Image

The .fa and .gff3 files look as expected; however, SRR7419086.bam and SRR7419086_sort.bam look odd. Screenshots are attached.

SRR7419086.bam
Image

SRR7419086_sort.bam
Image

The .log files are also attached.

run.log
trace.log
.command.log

@lpantano
Contributor

The content of run.log shows that no miRNA sequences were detected. Can you run this command on the BAM file:

samtools idxstats SRR7419086.bam

@epfarias

epfarias commented Jan 15, 2025

Unfortunately, that gives another error: it cannot find or load the index for this BAM file.

Image

This is my first time working with demultiplexed data, and I don't know how to solve this issue in order to preprocess the data. I've read some articles about handling multiplexed/demultiplexed data, and they all say the same thing: the barcode information associated with the multiplexing process is required. However, the data publisher did not release this information.

They described the cDNA library construction and the alignment process only superficially. I looked at the data structure to see if anything was wrong or abnormal and did not identify anything, apart from the high number of polyA tails, which fastp should deal with. Below is a screenshot of the first 12 reads of the FASTQ file for sample SRR7419086.

Image

If you could help me and give me some insight into what the cause could be, I'd be very happy.

@lpantano
Contributor

Thanks for the extra information. You can run samtools sort SRR7419086.bam > SRR7419086.sort.bam, then samtools index SRR7419086.sort.bam, and then run idxstats on the sorted file.
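
If it helps, the idxstats output can be summarised with a short script. `samtools idxstats` prints one tab-separated line per reference (name, sequence length, mapped count, unmapped count), with a final `*` line for reads that have no reference. The sketch below parses that text; the function names and the numbers in the example are made up for illustration.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into {reference: (length, mapped, unmapped)}."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")[:4]
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


def total_mapped(stats):
    """Sum the mapped-read column over all references."""
    return sum(mapped for _, mapped, _ in stats.values())


# Illustrative input only: every hairpin has zero mapped reads and all
# reads end up on the final '*' line.
example = "hsa-let-7a-1\t80\t0\t0\nhsa-mir-21\t72\t0\t0\n*\t0\t0\t1500\n"
stats = parse_idxstats(example)
print("mapped reads:", total_mapped(stats))
print("reads without a reference:", stats["*"][2])
```

A total of zero mapped reads across all hairpins would be consistent with the empty GFF reported earlier in this thread.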

You can also grep for a common miRNA, for instance let7-c or let7-b, in the raw FASTQ and the trimmed FASTQ to see where the mature miRNA sequence falls in your reads before and after trimming.
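
A rough sketch of that grep in Python, which also reports the offset at which the sequence starts in each read. It assumes standard 4-line FASTQ records (no wrapped sequence lines). The search string is an assumption: it is meant to be the DNA form of the 5' end shared by let-7 family members, so please verify it against miRBase before relying on it. `motif_offsets` and `LET7_PREFIX` are hypothetical names.

```python
import gzip
from collections import Counter

# Assumed DNA prefix shared by let-7 family mature sequences; verify in miRBase.
LET7_PREFIX = "TGAGGTAGTAGGTTGT"


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def motif_offsets(fastq_path, motif=LET7_PREFIX):
    """Count, per 0-based start offset, the reads that contain `motif`."""
    offsets = Counter()
    with open_text(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record is the sequence
                pos = line.strip().upper().find(motif)
                if pos >= 0:
                    offsets[pos] += 1
    return offsets
```

Comparing `motif_offsets("raw.fastq.gz")` with `motif_offsets("trimmed.fastq.gz")` (file names are placeholders) should show whether the miRNA starts at position 0 or sits behind a leftover barcode or adapter, and whether trimming removes it entirely.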

Labels
bug Something isn't working
Projects
Status: Done
Development

No branches or pull requests

5 participants