infinite loop in kseq_read for gz files that fail crc check #78
Comments
Another question is what is the correct behaviour of kseq in cases where
Yes, the return value from read/gzread should be checked. Could you send a pull request? Thanks a lot for this!
jkbonfield added a commit to jkbonfield/klib that referenced this issue (Mar 8, 2023)
While porting attractivechaos#78 over to htslib, I found a bug due to bracketing. In the original code, `c` is set to the result of the whole boolean expression in the `if` statement, so 0 or 1. Thus the `if (c == -3)` check can never pass.
jkbonfield added a commit to jkbonfield/htslib that referenced this issue (Mar 8, 2023)
Original PR by Pall Melsted, with only manual merging and one trivial bug fix by myself. Co-authored-by: Pall Melsted
daviesrob pushed a commit to samtools/htslib that referenced this issue (Mar 13, 2023)
Original PR by Pall Melsted, with only manual merging and one trivial bug fix by myself. Co-authored-by: Pall Melsted
daviesrob pushed a commit to daviesrob/htslib that referenced this issue (Apr 3, 2023)
Original PR by Pall Melsted, with only manual merging and one trivial bug fix by myself. Co-authored-by: Pall Melsted
clrpackages pushed a commit to clearlinux-pkgs/htslib that referenced this issue (Dec 28, 2023)
Andrew Whitwham (12): Uodate copyright for winter release. January 2023 NEWS update. Keeping the NEWS file up-to-date. More additions and improvements. Added htscodecs update to v1.4.0 More space. Switched back to openssl for Alpine. Amalgamate multiple CIGAR ops into single entry. (#1607) Stop the overwriting of the end value. Summer 2023 copyright update. Speed up removal of lines in large headers. Winter News 2023 (PR #1703) Bergur Ragnarsson (3): draft fix fix memory leak exit early on error David Seifert (1): Use POSIX `grep` Fabian Klötzl (1): improve parsing performance Fangrui Song (1): Apply the packed attribute to uint*_u types for Clang James Bonfield (86): Update man SEE ALSO sections from .BR to .IR so the website uses URLs bgzip text compression mode Make the bgzip -g option less opaque. Make tabix support CSI indices with large positions. Prevent crash when only FASTA entry has no sequence. Add an fai_line_length function. Check for invalid BC tags in fastq output. Warn if ref file is given but it doesn't contain the refs we need. Fix buffer read-overrun in bam_plp_insertion_mod. Fix ref fix from c91804c Make it easier to modify shared library permissions during install Add CRAM SQ/M5 header checking when specifying a fasta file. (PR #1522) Speed up load_ref_portion. Expand CRAM API a bit to cope with new samtools cram_size command. (PR #1546) Merges neighbouring I and D ops into one op within pileup. (PR #1552) Improve API docs for bgzf_mt FIx a bug in the codec learning algorithm for TOKA Fix a bug with multi-threading and embed_ref=2 on name sorted data Use non-ref mode when all else fails for CRAM encoding Add some documentation on cram encoder code structure Fix Cram compression container substitution matrix generation. Tweak the CRAM_SUBST_MATRIX table. Prevent spurious and random system errors from test_bgzf.c Fix cram_index_query_last function Avoid deeply nested containment list on old CRAM indices. 
Permit fastq output to create empty FASTQ records for seq "*". Fix a couple small VCF auto-indexing bugs. Backport attractivechaos/klib#78 to htslib. Slightly speed up various cram decoding functions (#1580) Remove CRAM 3.1 warning. Trivial fix to expr, removing "^". Add MZ:i tag as a check for base modification validity. (#1590) Fix typo in kh_int_hash_func2 macro. Rename aux tag MZ to MN. Protect against overly large containers. Don't create overly large CRAM blocks. Add a missing break statement in cram_codec_to_id. (#1614) Fix fd_seek on pipes on modern MinGW releases. Change bounds checking in probaln_glocal Fix a containment bug in cram_index_last. Migrate base modification code out of sam.c Correct base modification implicit / explicit status when mixed together. Add a bam_mods_queryi interface. Add bam_parse_basemod2 API with additional flags argument. Add more internal sam_mods.c documentation Update bam_next_basemod too to cope with HTS_MOD_REPORT_UNCHECKED. Fix decompress_peek_gz to cope with files starting on empty gzip blocks. Fix to 2e672f33 decompress_peek_gz change. Add fai_thread_pool interface. NEWS updates for pending release Makes bam_parse_cigar able to modify existing BAM records rather than Fix cut and paste errors in bam_aux2f documentation The first stage of vcf_parse_format speed improvements. Further VCF reading speeds optimisations. Revert most of the vcf_parse_info improvements. Add an hclen SAM filter function. Fix a minor memory leak in malformed CRAM EXTERNAL blocks. (#1671) Enable auto-vectorisation in CRAM 3.1 codecs. Cache key header lengths. Allow vcf_format to work on packed data, plus bcf_fmt_array improvements. Speed up kputd by approx 130%. Minor tweaks to bcf_fmt_array and bcf_str_missing usage. Fix a cram decode hang from block_resize. Always do the CRAM mutex lock/destroys. Add C++ casts for external headers. Enable optimisation level -O3 for SAM QUAL+33 formatting. 
Avoid a NULL pointer dereference while building CRAM embedded ref. Avoid a NULL pointer deref when erroring writing CRAM to stdout Make CRAM internal data structures use hts_pos_t. Prevent CRAM 3 from attempting to write out out-of-bounds ref positions Remove memory leak when cram_encode_container fails during a close. Fix out by one error on extend_ref memory allocation. Don't call cram_ref_decr on consensus-based references. Check for 64-bit values in BETA codec initialiser. Protect against CRAM slice end going beyond end of reference. Prevent extend_ref from making huge mallocs on very sparse data. Improve the fuzzer to write BAM/CRAM and BCF too. Fix memory leaks on failed CRAM encode. Permit embed_ref=2 mode to be reenabled after using no_ref. Tighten memory constraints for cram_decode. Further rewrite of the fuzz test harness Reduce maxmimum BCF header len in fuzzer. Fix buffer read overrun in cram_encode_aux. Disable hts_set_fai_filename call in hts_open_fuzzer. Avoid undefined behaviour integer overflow in extend_ref Fix integer overflow in cram_compress_block2 John Marshall (19): Add bam_aux_first()/bam_aux_next() tagged aux field iterator API Document that bam_aux_del()'s `s` parameter must be non-NULL (& reformat) Add symbol versioning to the ELF shared-object file Mention in INSTALL that using plugins may need -rdynamic Set _XOPEN_SOURCE in configure if it's not already set Add missing Makefile dependencies [minor] Make last_in a pointer to const [minor] Add "uncompressed" in hts_format_description() where appropriate Add hclose()-doesn't-close-fd option and use it for hopen("-") Take advantage of shared hopen("-") in htsfile.c Explicitly fclose(stdout) in test/test_view.c too Fix hfile_libcurl small seek bug Apply memchr optimisation in the general delimiter case too Add test case compiling the public headers as C++ Remove remnants of HTS_HAVE_NEON, unused since PR #1587 Remove NUMERIC_VERSION, unused since PR #1226 Document primarily MM/ML and fix 
base-mod-related typos Compare as off_t (not size_t) and fix printf specifiers Install annot-tsv.1 man page and alphabetise annot-tsv rules Lilian Janin (2): Fix error code 0 returned by bcftools after error Make bcftools return an error code != 0 after [E::bgzf_read_block] Invalid BGZF header at offset xxx Petr Danecek (15): Remove variable redeclaration warnings from perl test script Make bcf_hdr_seqnames() work with gapped chromosome ids Make bcf_hdr_idinfo_exists() more robust Check if VCF POS column could be fully parsed Allow repeated calls of bcf_sr_set_regions (PR #1624) An attempt to parse malformatted region such as {1:1}-2 should fail Add new annot-tsv program Output full diff on failing tests Output full diff on failing tests Make qsort in regidx order-reproducible across platforms Provide a nill (dot) value when the field is empty Use EXIT_SUCCESS with -h and EXIT_FAILURE on errors Address various comments VCF parsing fix Allow renaming of the default -a annotations (#1709) Rob Davies (65): Fix n-squared complexity in sample line with many adjacent tabs Switch to building libdeflate with cmake Add faidx_seq_len64() and fai_adjust_region() interfaces Rework / add new faidx tests Fix build on ancient versions of gcc Ensure strings in config_vars.h are escaped correctly Switch MacOS CI tests to an ARM-based image Cut down the number of embed_ref=2 tests that get run Cap bgzf_getline return value to INT_MAX Make tbx_parse1 work for lines longer than 2Gbytes Use correct type for ret in vcf_write() Don't error when making an iterator on a tid not in the index Happy New Year Catch errors from bgzf_getline() in hts_readlist, hts_readlines Add configure (enable|disable)-versioned-symbols options and tests Add Makefile rule to update the symbol version file Strip out symbol versions from shlib-exports-so.txt Update to htscodecs v1.4.0 Minor NEWS adjustment and additonal item Switch to CURLINFO_CONTENT_LENGTH_DOWNLOAD_T for newer libcurl Fix crypt4gh redirection 
Remove use of sprintf() from HTSlib source Make SIMD tests work when building multiarch binaries Make MacOS tests build a multiarch version of the library Fix bug where bin number could overflow when looking for max_off Make reg2bins faster on whole-chromosome queries Make reg2intervals() faster on whole-chromosome queries Update to latest htscodecs Don't set _POSIX_C_SOURCE for htscodecs tests Fix trailing space in config.h made by configure Ignore generated config_vars.h file in copyright check Switch to `/usr/bin/env perl` for all perl scripts Adjust comments in probaln_glocal() Expand test-bcf-sr.c capabilities Add synced reader region tests, and move no-index tests Fix possible double frees in bcf_hdr_add_hrec() error handling Prevent dangling hrec pointer after bcf_hdr_add_hrec() failure Remove items from hdict in bcf_hdr_remove() Ensure number of modifications is always set in bam_parse_basemod2() Ensure simple_test_driver.sh cleans up its temporary files Ensure base mod test result is noticed by the Makefile Improve test/test_mod.c Switch to htscodecs 1.5.1 Skip CRC checks when fuzzing Prevent out-of-memory reports when fuzzing Add missing dependency on libhts.a for hts_open_fuzzer Fix virtual offset adjustment when indexing on-the-fly Fix BCF/VCF on-the-fly indexing issues Fix crypt4gh redirection Update htscodecs to v1.5.2 (2aca18b3) Simplify run_test function Fix test/test.pl -F dependencies Simplify test/test.pl Move all annot_tsv tests into their own function Fix up some annot-tsv error checks / reports Reformulate man page for house style Make compiler flag detection work with zig cc Fix a couple of unused value warnings when built with NDEBUG Fix @pg linking when records make a loop Fix possible shift of negative value in cram_encode_aux() Move typecast to the right place Ensure no-stored-sequence reads are counted in container size Make output on unrecognised options go to stderr Update to latest htscodecs Update htscodecs to v1.6.0 kojix2 (2): Fix 
a typo in sam.h documentation Fix example in docs for sam_hdr_add_line pd3 (4): Remove a bottleneck in VCF header processing Add support for non-standard chromosome names containing [:-] characters Clarify usage, include output example Temporary workaround when excessive memory is required by FORMAT fields vasudeva8 (10): Adds bcf_strerror method (PR #1510) Ensure NUL termination of Z/H data in sam_format_aux1; fix base mod state reuse Changes to avoid segfault with uncompressed bam (PR #1632) Demonstration of htslib/sam api usage. formatting update bgzf_useek fails when offset is above block limits Add support for multiple files to bgzip Limited usage text to 80 chars per line man update for -a option (PR #1716) Fix example column names and explain core column renaming Étienne Mollier (2): cram/cram_external.c: fix external htscodecs include htslib-s3-plugin.7: fix whatis entry Noteworthy changes in release 1.19 (12th December 2023) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Updates ------- * A temporary work-around has been put in the VCF parser so that it is less likely to fail on rows with a large number of ALT alleles, where Number=G tags like PL can expand beyond the 2Gb limit enforced by HTSlib. For now, where this happens the offending tag will be dropped so the data can be processed, albeit without the likelihood data. In future work, the library will instead convert such tags into their local alternatives (see samtools/hts-specs#434). (NEWS truncated at 15 lines)
When a gz file fails the CRC check, kseq_read enters an infinite loop.
This happens at https://github.com/attractivechaos/klib/blob/master/kseq.h#L101
When backed by gzread, the __read method returns -1 on a failed CRC check, and the for loop never exits.
I have reproduced this bug using the test program from your website and the two supplied files. They are identical, except that I corrupted a CRC byte in t.fastq.gz.
reads_2.fastq.gz
t.fastq.gz
Interestingly, seqtk does not fail, although it doesn't read all the lines,
and gzcat (the Mac version of zcat) notices the error but still returns the original data.