Some of the stats programs offer k-mer reports (with -K) and probe-id counting (with -D). On very large files (>200 million reads), these programs can consume a lot of RAM (>10 GB), even with the highly efficient sparsehash library.
A disk-backed key-value store like LevelDB could offer hash-like performance while also allowing growth past available RAM. I'm thinking the code should switch to a DB-backed store at the 200 million record level. This would slow things down by about 3x (from ~1 million writes/sec to ~300k writes/sec), but would allow effectively unbounded growth. Enabling a large LRU cache could make it perform so similarly that the sparse hash could be abandoned entirely, especially if the DB remains an insignificant fraction of the overall stats-collection time.
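A minimal sketch of what the switch could look like, assuming C++ and the stock sparsehash and LevelDB APIs. The `KmerCounter` class, the record threshold, and the 1 GB cache size are illustrative assumptions, not anything in the existing code:

```cpp
// Sketch: count keys in a google::sparse_hash_map until a record
// threshold, then migrate everything into LevelDB with a large LRU
// block cache. KmerCounter is a hypothetical wrapper for illustration.
#include <cstdint>
#include <string>
#include <sparsehash/sparse_hash_map>
#include <leveldb/db.h>
#include <leveldb/cache.h>

class KmerCounter {
 public:
  explicit KmerCounter(const std::string& db_path,
                       uint64_t threshold = 200000000ULL)  // 200 mil records
      : db_path_(db_path), threshold_(threshold), records_(0),
        cache_(nullptr), db_(nullptr) {}

  ~KmerCounter() {
    delete db_;     // close the DB before releasing its block cache
    delete cache_;
  }

  void Increment(const std::string& key) {
    // Each Increment() stands in for one record here; a real
    // implementation would count input reads instead.
    if (db_ == nullptr && ++records_ > threshold_) SpillToDisk();
    if (db_ == nullptr) {
      ++mem_[key];
    } else {
      // Read-modify-write through LevelDB; the LRU block cache keeps
      // hot blocks in RAM so repeated keys stay fast.
      std::string val;
      uint64_t count = 0;
      leveldb::Status s = db_->Get(leveldb::ReadOptions(), key, &val);
      if (s.ok()) count = std::stoull(val);
      db_->Put(leveldb::WriteOptions(), key, std::to_string(count + 1));
    }
  }

 private:
  void SpillToDisk() {
    leveldb::Options options;
    options.create_if_missing = true;
    // 1 GB LRU cache for uncompressed blocks (size is an assumption).
    cache_ = leveldb::NewLRUCache(1024ULL * 1048576);
    options.block_cache = cache_;
    leveldb::DB::Open(options, db_path_, &db_);  // error handling omitted
    // Migrate the in-memory counts, then release the hash's RAM.
    for (const auto& kv : mem_)
      db_->Put(leveldb::WriteOptions(), kv.first, std::to_string(kv.second));
    mem_.clear();
  }

  std::string db_path_;
  uint64_t threshold_;
  uint64_t records_;
  google::sparse_hash_map<std::string, uint64_t> mem_;
  leveldb::Cache* cache_;
  leveldb::DB* db_;
};
```

The per-key read-modify-write through `Get`/`Put` is what would cap throughput near the ~300k writes/sec estimated above; batching updates with `leveldb::WriteBatch` and storing fixed-width binary counts instead of decimal strings would claw some of that back.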
Original issue reported on code.google.com by [email protected] on 9 Jul 2014 at 2:26