[QUESTION] Guessing FASTQ quality system using seqkit convert #254

Closed
alienzj opened this issue Oct 27, 2021 · 10 comments

Comments

@alienzj

alienzj commented Oct 27, 2021

seqkit version

➤ seqkit version
seqkit v0.16.1

Problem I encountered:

➤ seqkit convert V300109217_L03-506_1.fq.gz | head
[INFO] possible quality encodings: []
[ERRO] quality encoding not consistent

Dear Shenwei,

I used seqkit convert to guess the FASTQ quality encoding, but no encoding seems to be identified.
Any suggestions?

Thank you for developing seqkit and excellent documentation.

@alienzj
Author

alienzj commented Oct 27, 2021

➤ seqkit version
seqkit v2.0.0

➤ seqkit convert V300109217_L03-506_1.fq.gz | head
[INFO] possible quality encodings: []
[ERRO] quality encoding not consistent

@shenwei356
Owner

shenwei356 commented Oct 27, 2021

What's the sequencing platform?

@alienzj
Author

alienzj commented Oct 27, 2021

What's the sequencing platform?

Transcriptome resequencing on the DNBseq platform.

Below is the FastQC report:

[image: FastQC report]

@shenwei356
Owner

Will check it on Friday.

@shenwei356
Owner

Similar issue: #239.

I've extended the value range of 'QualityEncoding', which should correct this.
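
For context, range-based guessing can be sketched roughly as follows. This is a minimal illustration, not seqkit's implementation: the `RANGES` table holds the commonly cited ASCII ranges for each FASTQ variant (an assumption, not seqkit's internal values), and `candidate_encodings` is a hypothetical helper. A quality string is compatible with an encoding only if all of its characters fall inside that encoding's range, so widening a range lets more reads match.

```python
# Commonly cited ASCII ranges per FASTQ variant.
# Assumption: these are NOT taken from seqkit's source.
RANGES = {
    "Sanger": (33, 73),
    "Solexa": (59, 104),
    "Illumina-1.3+": (64, 104),
    "Illumina-1.5+": (66, 104),
    "Illumina-1.8+": (33, 74),
}


def candidate_encodings(qual, ranges=RANGES):
    """Return the encodings whose ASCII range contains every character of qual."""
    if not qual:
        return []
    codes = [ord(ch) for ch in qual]
    lo, hi = min(codes), max(codes)
    return [name for name, (a, b) in ranges.items() if a <= lo and hi <= b]


# A made-up quality string that mixes a low character ('5') with a high one ('K')
# fits none of the ranges above, so the candidate list is empty.
narrow = candidate_encodings("5FFK")

# Widening one range (here Sanger, to a hypothetical upper bound of 126)
# makes the same string match again.
wide = candidate_encodings("5FFK", {**RANGES, "Sanger": (33, 126)})
```

The upper bound of 126 is only an example value chosen for the sketch; the actual extended range is whatever the fix in seqkit defines.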

@alienzj
Author

alienzj commented Nov 1, 2021

Hi, Shenwei, thanks for your help.

I downloaded the latest compiled version of seqkit you provided: seqkit_linux_amd64.tar.gz,
but the quality system still does not seem to be identified.

➤ ./seqkit version
seqkit v2.0.0

➤ ./seqkit convert V300109217_L03-506_1.fq.gz | head
[INFO] possible quality encodings: []
[ERRO] quality encoding not consistent

@shenwei356
Owner

Please paste some sequences.

@alienzj
Author

alienzj commented Nov 1, 2021

test_100K.fq.gz

Dear Shenwei, thanks.
Please see the attachment for 100K reads, which were randomly sampled from a larger data set.

You will see the following result:

➤ ./seqkit convert test_100K.fq.gz | head
[INFO] possible quality encodings: []
[ERRO] quality encoding not consistent

@shenwei356
Owner

Some reads are treated as Illumina 1.5. This can be solved by using a smaller value for -N/--thresh-B-in-n-most-common, such as 2.

I also changed the default value from 4 to 2 (see the related discussion).
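
The effect of the threshold can be illustrated with a small sketch. It assumes, going by the flag's name, that a read is flagged as Illumina 1.5 when 'B' is among its N most common quality characters. `looks_like_illumina15` is a hypothetical helper and the quality string is made up; neither comes from seqkit.

```python
from collections import Counter


def looks_like_illumina15(qual, n):
    """Return True if 'B' is among the n most common quality characters of qual."""
    top = [ch for ch, _ in Counter(qual).most_common(n)]
    return "B" in top


# Made-up Phred+33 quality string: F x8, E x6, G x4, B x3.
# Here 'B' is just an ordinary quality value (Q33), not an Illumina 1.5 marker.
qual = "FFFFFFFFEEEEEEGGGGBBB"

flagged_with_4 = looks_like_illumina15(qual, 4)  # 'B' is the 4th most common character
flagged_with_2 = looks_like_illumina15(qual, 2)  # 'B' is not in the top 2
```

Under this assumption, a threshold of 4 would flag the read while a threshold of 2 would not, which is consistent with the advice to pass a smaller value such as `-N 2`.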

@alienzj
Author

alienzj commented Nov 1, 2021

➤ ./seqkit convert test_100K.fq.gz | head
[INFO] possible quality encodings: [Sanger Illumina-1.8+]
[INFO] guessed quality encoding: Sanger
[INFO] converting Sanger -> Sanger
[WARN] source and target quality encoding match.

Great, Shenwei, thank you so much!
