Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests fqgrep #16

Merged
merged 42 commits into from
Dec 14, 2022
Merged

Tests fqgrep #16

merged 42 commits into from
Dec 14, 2022

Conversation

sam-white04
Copy link

Added test mod to main.rs and added hidden option --output:

  • --output hidden option implemented in StructOpts and called in testing: takes Option
  • Added lines (80 - 98) to create 'maybe_writer' as None if --count, to a file if --output, else to stdout
  • Amended line 417 to include 'opts.output' when calling FastqWriter::new()
  • fqgrep_from_opts() was added so that an instance of StuctOpts could be passed to what was fqgrep() as a parameter
  • fqgrep() was amended to call fqgrep_from_opts() after creating opts using setup()

Test Mod Helpers:

  • write_fastq() helper function that generates fastq files from provided sequences in a temp directory
  • write_pattern() helper function that generates a .txt file with regex patterns in a temp directory
  • call_opts() helper function that creates an instance of StructOpts with the regex coming from either a PathBuf or a Vec
  • write_opts() helper function that actually returns an instance of StructOpts

Tests:

  • test_single_fq()

  • pattern = "^A" or "^G" from opts.regexp

  • fastq structure = one fastq with two sequences - "AAGTCTGAATCCATGGAAAGCTATTG", "GGGTCTGAATCCATGGAAAGCTATTG"

  • looking for a return match count of 2

  • ** test_multiple_fq()**

  • pattern = "^A" or "^G" from opts.file

  • fastq structure = two fastq each with two sequences - "AAGTCTGAATCCATGGAAAGCTATTG", "GGGTCTGAATCCATGGAAAGCTATTG"

  • opts.paired = true

  • looking for a return match count of 2

  • test_get_output()

  • pattern = "^A" from opts.file

    • fastq structure = two fastq one with "AAGTCTGAATCCATGGAAAGCTATTG" the other with "GGGTCTGAATCCATGGAAAGCTATTG"
  • opts.options = tempdir/output.fq

  • After calling fqgrep_from_opts() the output file is read into 'lines' where only lines containing sequences are retained ,the expected sequence match is manually defined as 'expected', and 'lines' should be equal to 'expected'. This function needs cleaning.

@sam-white04 sam-white04 requested a review from nh13 October 26, 2022 15:38
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samfulcrum this is looking really really good! I left a lot of nit-picky comments, so my apologies in the advance for the extra work. I think the only thing I couldn't explain well was how to make call_opts and write_opts a single method, let me know if you want to chat about it.

.gitignore Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated
let dir: TempDir = TempDir::new().unwrap();
let out_path = dir.path().join(String::from("output.fq"));
// Clone path to read from below
// TODO: refactor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixme todo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Suggested change
// TODO: refactor

src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated
let out_path = dir.path().join(String::from("output.fq"));
// Clone path to read from below
// TODO: refactor
let x = &out_path.clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let x = &out_path.clone();
let output_path = &out_path.clone();

src/main.rs Outdated
let test_pattern = vec![String::from("^A")];
let mut opts_testcase = call_opts(&dir, &seqs, &test_pattern, true, Some(out_path));
// run fqgrep
let _result = fqgrep_from_opts(&mut opts_testcase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let _result = fqgrep_from_opts(&mut opts_testcase);
let fqgrep_output = fqgrep_from_opts(&mut opts_testcase);

src/main.rs Outdated Show resolved Hide resolved
sam-white04 and others added 22 commits October 26, 2022 12:35
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
sam-white04 and others added 4 commits October 31, 2022 11:46
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
Co-authored-by: Nils Homer <[email protected]>
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
@fulcrumgenomics fulcrumgenomics deleted a comment from nh13 Nov 1, 2022
Copy link
Author

@sam-white04 sam-white04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your great feedback! I have implemented all variable name changes/comment revision.
Note: In the future I will add suggestion to batch rather than committing each suggestion separately my bad
Larger changes include:

  1. implementing seq_io::fastq for both writing and reading fastq files
  2. write_opts() and call_opts() were merged into one function build_opts()
  3. slurp_output() is a new helper function that returns the sequences from a fastq file
  4. opts.count is automatically generated based on opts.output
  5. added sequences that do not match pattern to tests

I thought it would be easier to review these changes then add more tests like gzip file, more complicated patterns etc.

src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated
Comment on lines 603 to 610
// Set io
let io = Io::default();
// File name and path
let name = String::from("pattern.txt");
let f1 = temp_dir.path().join(name);
// Simply pass pattern to io.write_lines()
io.write_lines(&f1, &*pattern).unwrap();
// Return
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unnecessary comments, change 'name' to file_name, and 'f1' to pattern_path

Suggested change
// Set io
let io = Io::default();
// File name and path
let name = String::from("pattern.txt");
let f1 = temp_dir.path().join(name);
// Simply pass pattern to io.write_lines()
io.write_lines(&f1, &*pattern).unwrap();
// Return
fn write_pattern(temp_dir: &TempDir, pattern: &Vec<String>) -> PathBuf {
let io = Io::default();
let file_name = String::from("pattern.txt");
let pattern_path = temp_dir.path().join(file_name);
io.write_lines(&pattern_path, &*pattern).unwrap();

src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
src/main.rs Outdated Show resolved Hide resolved
@nh13
Copy link
Member

nh13 commented Nov 28, 2022

@samfulcrum can this PR be closed and branch be deleted given #18? I am not sure what's the difference?

nevermind #18 targets this one.

src/main.rs Outdated Show resolved Hide resolved
@sam-white04 sam-white04 merged commit f5aea07 into refactor/grep-like Dec 14, 2022
@sam-white04 sam-white04 deleted the tests-fqgrep branch December 14, 2022 16:34
nh13 added a commit that referenced this pull request Dec 14, 2022
Major refactor of the tool and code to make its command line and behavior very similar to unix grep.

1. All reader, writer, and matching threads use a rayon thread pool.  This means that `--threads` is respected.  Previously reader and writer threads were always allocated outside the match pool, and there were specific arguments for the latter and compressing the output (the latter feature has been removed, plaintext FASTQ is the only output format, just pipe it if you need to).
2. Takes in a pattern as the first positional argument, which is now a regular expression (previously a fixed string).
3. Takes in zero or more file paths after the positional argument.  Uses standard input if no file are given positionally or with `-f` below.
4. Input files are assumed to be plain uncompressed FASTQs unless the `--decompress` option is given, in which case they're assumed to be GZIP compressed.  This includes standard input.  The exception are `.gz/.bgz` and`.fastq/.fq` which are always treated as GZIP compressed and plain text respectively.
5. Implement the following options from grep:

* `-c, --count`: simply return the count of matching records
* `-F, --fixed-strings`: interpret pattern as a set of fixed strings
* `-v,--invert-match`: Selected records are those not matching any of the specified patterns
* `--color <color>`: color the output records with ANSI color codes
* `-e, --regexp <regexp>...`: specify the pattern used during the search.  Can be specified multiple times
* `-f, --file <file>`: Read one or more newline separated patterns from file.
* `-Z, --decompress`: treat all non `.gz`, `.bgz`, `.fastq`, and `.fq` files as GZIP compressed (default treat as uncompressed)

6. The exit code follows GREP, where we exit with 0 if one or more lines were selected, 1 if no lines were selected, and >1 if an error occurred.

7.  Add non-grep options:

* `--paired`: if one file or standard input is used, treat the input as an _interleaved_ paired end FASTQ.  If more than one file is given, ensure that the number of files are a multiple of two, and treat each consecutive pair of files as R1 and R2 respectively.  If the pattern matches either R1 _or_ R2, output both (interleaved FASTQ).
* `--reverse-complement`: searches the reverse complement of the read sequences in addition
* `--progress`: write progress (counts of records searched)
* `-t, --threads <threads>`: see (1) above

8. Miscellaneous changes:

* Add a rust-toolchain file in #15
* Unit tests added by @samfulcrum in #16 and #18.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants