Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warning: no @SQ lines will be outputted #58

Closed
kanchanhole opened this issue Nov 13, 2017 · 28 comments
Closed

Warning: no @SQ lines will be outputted #58

kanchanhole opened this issue Nov 13, 2017 · 28 comments

Comments

@kanchanhole
Copy link

I am working on Oxford nanopore reads(1D). I am using Minimap2 for aligning it to the human genome. I made an index file for the human genome to be used as a reference. when I started the alignment I got following warning.
"For a multi-part index, no @sq lines will be outputted"
The alignment is finished. The .sam file does not contain any @sq headers, as a result, I could not get any mapping statistics using samtools flagstat.
The command I used for building index:

minimap2 -d ref.mmi ref.fa

The command I used for alignment:

minimap2 -ax map-ont ref.fa ont-reads.fa > aln.sam
I am not sure why I am getting this warning.

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

What reference genome are you using? What is the total length of all sequences in ref.fa? Thanks.

@kanchanhole
Copy link
Author

kanchanhole commented Nov 13, 2017

Homo_sapiens.GRCh38.fa

Human genome built 38
In total has 455 fasta sequences
Downloaded from
ftp://ftp.ensembl.org/pub/release-77/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

Is this "toplevel" file from Ensembl? If yes, you should use this file instead:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

@kanchanhole
Copy link
Author

Is this human genome?

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

Ensembl's "toplevel" fasta is very confusing and should be avoided in general. See this discussion here.

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

The FTP link above is the official version from GRC.

@kanchanhole
Copy link
Author

kanchanhole commented Nov 13, 2017

Thank you very much. If I download the "Reference Genome Sequence" from:https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/

will it work fine?

@kanchanhole
Copy link
Author

Thank you very much. I have started the index building again with the link you provided.

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

Your new link doesn't give you chromosome sequences, apparently. Anyway, the right GRCh38 for most analyses is the FTP link above. It seems that I need to write a blog post to explain the differences between the different versions...

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

I will keep this open until you confirm minimap2 works properly. Meanwhile, I will write a blog post to explain the different genome versions.

@kanchanhole
Copy link
Author

Hello,
The alignment runs fine. Thank you very much.

However, how does minimap2 creates a stat file.
showing statistics like overall mapping? uniquely mapping?

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

Unfortunately, minimap2 doesn't keep track of these information. You have to use 3rd-party tools like samtools flagstat, samtools stats or others to collect the info as a separate step.

@kanchanhole
Copy link
Author

Okay. Sure.

Thank you very much for the help. I really appreciate it.

@lh3
Copy link
Owner

lh3 commented Nov 13, 2017

Welcome. I am closing the issue now. Thanks!

@lh3 lh3 closed this as completed Nov 13, 2017
@kanchanhole
Copy link
Author

Hello,

Is there any way I could download the official version of GRC for build 37?
The link you suggested above gives for 38.

I greatly appreciate your help. Thank you.

@lh3
Copy link
Owner

lh3 commented Dec 3, 2017

@kanchanhole
Copy link
Author

Thank you very much. This is really helpful.

@kanchanhole
Copy link
Author

Hello, as per your blog I have used the reference GRCh 37 in my alignment. However, when I try to visualize the alignment in IGV (using the GRCh37/hg19 included in IGV) I could not see the specific gene I am interested in with the coordinates I have. (which I could see with GRCh38.)

Do I need to download gene list for the reference you provided and load the genome on IGV?

@AnjuniPeiris
Copy link

Hi, I'm running minimap2 on a reference genome for wheat stripe rust and having the same issue, and receive this error message:

[WARNING] For a multi-part index, no @sq lines will be outputted.

Here's the reference genome: http://mbio.asm.org/content/9/1/e02275-17
It's a small genome but we are getting this error message even when mapping for just the primary genome assembly, without haplotypes. We are not getting the mapping statistics either. Any advice to solve this issue?
Thanks.

@lh3
Copy link
Owner

lh3 commented Feb 22, 2018

Could you send me the exact reference fasta to me? Thanks.

@lh3 lh3 reopened this Feb 22, 2018
@AnjuniPeiris
Copy link

The reference fasta are available here:
https://github.com/BenjaminSchwessinger/Pst_104_E137_A-_genome/tree/master/Assembly

I used the 'Pst_104E_v13_p_ctg.fa' file and also combined it with the 'Pst_104E_v13_h_ctg.fa' to get the primary and haplotype assembly.

@lh3
Copy link
Owner

lh3 commented Feb 22, 2018

Could you put the exact fasta you use somewhere? If this is a bug, it will only happen in extreme cases. Files created at my hand may not reproduce the issue. Thank you.

@lh3
Copy link
Owner

lh3 commented Feb 22, 2018

Also, what is the command line and the version of minimap2 are you using? (BTW, I am unable to reproduce the issue on my machine)

@AnjuniPeiris
Copy link

I'm using minimap2 version 2.8-r711.

Here is the link to the exact fasta files and also the fastq inputs I used for mapping:
https://anu365-my.sharepoint.com/personal/u5587237_anu_edu_au/_layouts/15/onedrive.aspx?slrid=ae254d9e-a034-5000-27e2-fb6b08e169ce&id=%2Fpersonal%2Fu5587237_anu_edu_au%2FDocuments%2Fminimap_inputs&FolderCTID=0x012000AB71B6903A88AA4C88946B5BD9284013

I tried mapping to the primary and haplotype (ph) reference genome, and then only the primary (p), but still get the same SQ error. Both the reference genomes are in the linked file.

Here is the command line:
~/myapps/minimap2/minimap2/minimap2 -ax map-ont ${PBS_JOBFS}/albacore_fastq/${name}_pass.fastq ${PBS_JOBFS}/GENOME/${genome_file} > ${name}_pass.minimap2.out.sam
~/myapps/minimap2/minimap2/minimap2 -ax map-ont ${PBS_JOBFS}/albacore_fastq/${name}_fail.fastq ${PBS_JOBFS}/GENOME/${genome_file} > ${name}_fail.minimap2.out.sam

@lh3
Copy link
Owner

lh3 commented Feb 23, 2018

~/myapps/minimap2/minimap2/minimap2 -ax map-ont ${PBS_JOBFS}/albacore_fastq/${name}_pass.fastq ${PBS_JOBFS}/GENOME/${genome_file} > ${name}_pass.minimap2.out.sam

When you map to a reference genome, the reference fasta should be the first argument. You are putting the reference as the second argument.

@lh3
Copy link
Owner

lh3 commented Feb 23, 2018

I mean: the command line should be:

~/myapps/minimap2/minimap2/minimap2 -ax map-ont ${PBS_JOBFS}/GENOME/${genome_file} ${PBS_JOBFS}/albacore_fastq/${name}_pass.fastq > ${name}_pass.minimap2.out.sam

Anyway, thanks for reporting!

@jtamames
Copy link

jtamames commented Jan 7, 2021

Hello

Congrats for this excellent and useful software!
We are using minimap2 in our metagenomics pipeline SqueezeMeta to map metagenomic reads to assembled contigs. One of our users ran into a problem when mapping PacBio samples. Here you can see the command line:

Mapping with Minimap2 (Li 2018, Bioinformatics 34(18), 3094-3100)
Working with sample 1: SampleI
Getting raw reads
Aligning to reference with minimap2-pb
...........
Version: 2.12-r827
CMD: /home/mphilipp/anaconda3/envs/SQM1.3/SqueezeMeta/bin/minimap2 -ax asm20 -t 70 /scratch/mphilipp/msm89/08_sqm/mg02_canu2_hifi2/results/01.mg02_canu2_hifi2.fasta /scratch/mphilipp/msm89/08_sqm/mg02_canu2_hifi2/temp/mg02_canu2_hifi2.SampleI.current_1
Real time: 1498.507 sec; CPU: 68155.684 sec

The problem is that no @SQ headers are present in the resulting SAM. I have seen that this was sometimes due to a incorrect command line where reads were specified bafore reference, but this is not our case. Could you hint us about the problem here?

Best,
Javier

@lh3
Copy link
Owner

lh3 commented Jan 7, 2021

See FAQ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants