-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tests fqgrep #16
Tests fqgrep #16
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samfulcrum this is looking really really good! I left a lot of nit-picky comments, so my apologies in the advance for the extra work. I think the only thing I couldn't explain well was how to make call_opts
and write_opts
a single method, let me know if you want to chat about it.
src/main.rs
Outdated
let dir: TempDir = TempDir::new().unwrap(); | ||
let out_path = dir.path().join(String::from("output.fq")); | ||
// Clone path to read from below | ||
// TODO: refactor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixme todo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// TODO: refactor |
src/main.rs
Outdated
let out_path = dir.path().join(String::from("output.fq")); | ||
// Clone path to read from below | ||
// TODO: refactor | ||
let x = &out_path.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let x = &out_path.clone(); | |
let output_path = &out_path.clone(); |
src/main.rs
Outdated
let test_pattern = vec![String::from("^A")]; | ||
let mut opts_testcase = call_opts(&dir, &seqs, &test_pattern, true, Some(out_path)); | ||
// run fqgrep | ||
let _result = fqgrep_from_opts(&mut opts_testcase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let _result = fqgrep_from_opts(&mut opts_testcase); | |
let fqgrep_output = fqgrep_from_opts(&mut opts_testcase); |
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your great feedback! I have implemented all variable name changes/comment revision.
Note: In the future I will add suggestion to batch rather than committing each suggestion separately my bad
Larger changes include:
- implementing seq_io::fastq for both writing and reading fastq files
- write_opts() and call_opts() were merged into one function build_opts()
- slurp_output() is a new helper function that returns the sequences from a fastq file
- opts.count is automatically generated based on opts.output
- added sequences that do not match pattern to tests
I thought it would be easier to review these changes then add more tests like gzip file, more complicated patterns etc.
src/main.rs
Outdated
// Set io | ||
let io = Io::default(); | ||
// File name and path | ||
let name = String::from("pattern.txt"); | ||
let f1 = temp_dir.path().join(name); | ||
// Simply pass pattern to io.write_lines() | ||
io.write_lines(&f1, &*pattern).unwrap(); | ||
// Return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unnecessary comments, change 'name' to file_name, and 'f1' to pattern_path
// Set io | |
let io = Io::default(); | |
// File name and path | |
let name = String::from("pattern.txt"); | |
let f1 = temp_dir.path().join(name); | |
// Simply pass pattern to io.write_lines() | |
io.write_lines(&f1, &*pattern).unwrap(); | |
// Return | |
fn write_pattern(temp_dir: &TempDir, pattern: &Vec<String>) -> PathBuf { | |
let io = Io::default(); | |
let file_name = String::from("pattern.txt"); | |
let pattern_path = temp_dir.path().join(file_name); | |
io.write_lines(&pattern_path, &*pattern).unwrap(); |
gitignore Co-authored-by: Nils Homer <[email protected]>
cargo.toml
Co-authored-by: samfulcrum <[email protected]>
Major refactor of the tool and code to make its command line and behavior very similar to unix grep. 1. All reader, writer, and matching threads use a rayon thread pool. This means that `--threads` is respected. Previously reader and writer threads were always allocated outside the match pool, and there were specific arguments for the latter and compressing the output (the latter feature has been removed, plaintext FASTQ is the only output format, just pipe it if you need to). 2. Takes in a pattern as the first positional argument, which is now a regular expression (previously a fixed string). 3. Takes in zero or more file paths after the positional argument. Uses standard input if no file are given positionally or with `-f` below. 4. Input files are assumed to be plain uncompressed FASTQs unless the `--decompress` option is given, in which case they're assumed to be GZIP compressed. This includes standard input. The exception are `.gz/.bgz` and`.fastq/.fq` which are always treated as GZIP compressed and plain text respectively. 5. Implement the following options from grep: * `-c, --count`: simply return the count of matching records * `-F, --fixed-strings`: interpret pattern as a set of fixed strings * `-v,--invert-match`: Selected records are those not matching any of the specified patterns * `--color <color>`: color the output records with ANSI color codes * `-e, --regexp <regexp>...`: specify the pattern used during the search. Can be specified multiple times * `-f, --file <file>`: Read one or more newline separated patterns from file. * `-Z, --decompress`: treat all non `.gz`, `.bgz`, `.fastq`, and `.fq` files as GZIP compressed (default treat as uncompressed) 6. The exit code follows GREP, where we exit with 0 if one or more lines were selected, 1 if no lines were selected, and >1 if an error occurred. 7. Add non-grep options: * `--paired`: if one file or standard input is used, treat the input as an _interleaved_ paired end FASTQ. If more than one file is given, ensure that the number of files are a multiple of two, and treat each consecutive pair of files as R1 and R2 respectively. If the pattern matches either R1 _or_ R2, output both (interleaved FASTQ). * `--reverse-complement`: searches the reverse complement of the read sequences in addition * `--progress`: write progress (counts of records searched) * `-t, --threads <threads>`: see (1) above 8. Miscellaneous changes: * Add a rust-toolchain file in #15 * Unit tests added by @samfulcrum in #16 and #18.
Added test mod to main.rs and added hidden option --output:
Test Mod Helpers:
Tests:
test_single_fq()
pattern = "^A" or "^G" from opts.regexp
fastq structure = one fastq with two sequences - "AAGTCTGAATCCATGGAAAGCTATTG", "GGGTCTGAATCCATGGAAAGCTATTG"
looking for a return match count of 2
** test_multiple_fq()**
pattern = "^A" or "^G" from opts.file
fastq structure = two fastq each with two sequences - "AAGTCTGAATCCATGGAAAGCTATTG", "GGGTCTGAATCCATGGAAAGCTATTG"
opts.paired = true
looking for a return match count of 2
test_get_output()
pattern = "^A" from opts.file
opts.options = tempdir/output.fq
After calling fqgrep_from_opts() the output file is read into 'lines' where only lines containing sequences are retained ,the expected sequence match is manually defined as 'expected', and 'lines' should be equal to 'expected'. This function needs cleaning.