Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New CheckDuplicateMarking CLP #1507

Merged
merged 5 commits into from
Jul 10, 2020
Merged

Conversation

yfarjoun
Copy link
Contributor

  • added a new (undocumented) CLP that will check that all the records with the same queryname have the same duplicate marking.
  • added tests
  • moved another test into a markduplicates test directory

Description

Give your PR a concise yet descriptive title
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.


Checklist (never delete this)

Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.

Content

  • Added or modified tests to cover changes and any new functionality
  • Edited the README / documentation (if applicable)
  • All tests passing on Travis

Review

  • Final thumbs-up from reviewer
  • Rebase, squash and reword as applicable

For more detailed guidelines, see https://github.com/broadinstitute/picard/wiki/Guidelines-for-pull-requests

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments and questions. Also, what is the rationale to not make this documented?

private final Log log = Log.getInstance(CheckDuplicateMarking.class);
private final ProgressLogger progress = new ProgressLogger(log, (int) 1e6, "Checked.");

static private int NUM_WARNINGS = 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be final

Suggested change
static private int NUM_WARNINGS = 100;
private static final int NUM_WARNINGS = 100;


//sort by queryname if not already
final Iterable<SAMRecord> samRecordIterable;
if (reader.getFileHeader().getSortOrder() != SAMFileHeader.SortOrder.queryname) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be clearer to have this if block in a method, ensureSortedBam()


checkDuplicateMarkingsInIterable(samRecordIterable);
log.info("Found " + numBadRecords + " bad records.");
return numBadRecords > 0 ? 1:0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Suggested change
return numBadRecords > 0 ? 1:0;
return numBadRecords > 0 ? 1 : 0;

}

checkDuplicateMarkingsInIterable(samRecordIterable);
log.info("Found " + numBadRecords + " bad records.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use error not info if there are bad records, and/or only log if there's an error.

private String currentReadName = "";
private boolean currentReadDuplicateMarked = false;

private final Log log = Log.getInstance(CheckDuplicateMarking.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be static

Suggested change
private final Log log = Log.getInstance(CheckDuplicateMarking.class);
private static final Log log = Log.getInstance(CheckDuplicateMarking.class);

if (!rec.getReadName().equals(currentReadName)) {
currentReadName = rec.getReadName();
currentReadDuplicateMarked = rec.getDuplicateReadFlag();
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary return since this will fall through to the return at line 136



private void checkDuplicateMarkingsInIterable(final Iterable<SAMRecord> iterable) throws IOException {
try (PrintWriter writer = OUTPUT==null?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whitespace

Suggested change
try (PrintWriter writer = OUTPUT==null?
try (PrintWriter writer = OUTPUT == null ?
new PrintWriter(NullOutputStream.NULL_OUTPUT_STREAM) :

src/main/java/picard/sam/CheckDuplicateMarking.java Outdated Show resolved Hide resolved
import java.nio.file.Paths;

public class CheckDuplicateMarkingTest {
private static final String TEST_FILES_DIR="testdata/picard/sam/CheckDuplicateMarking";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reformat new code to follow conventions

Suggested change
private static final String TEST_FILES_DIR="testdata/picard/sam/CheckDuplicateMarking";
private static final String TEST_FILES_DIR = "testdata/picard/sam/CheckDuplicateMarking";


import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import picard.sam.DuplicationMetrics;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to move the test but not the class? Normally the test for a class is in the same package as the class being tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean you will move it back? It looks like this wasn't addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. not sure what happened there...sorry.

@pshapiro4broad pshapiro4broad added the Waiting For Revisions This PR has received comments from reviewers and is waiting for the Author to respond label May 27, 2020
@yfarjoun yfarjoun added Needs Review This PR is waiting for a reviewer to respond and removed Waiting For Revisions This PR has received comments from reviewers and is waiting for the Author to respond labels Jun 16, 2020
@yfarjoun yfarjoun requested a review from pshapiro4broad June 16, 2020 20:58
Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few nits left.


import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import picard.sam.DuplicationMetrics;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean you will move it back? It looks like this wasn't addressed.

}

private void checkDuplicateMarkingsInIterable(final Iterable<SAMRecord> iterable) throws IOException {
try (PrintWriter writer = OUTPUT == null ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could put ternary op inside constructor

        try (PrintWriter writer = new PrintWriter(OUTPUT != null ? new FileWriter(OUTPUT) : NullOutputStream.NULL_OUTPUT_STREAM)) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no...it's a different constructor..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrintWriter(Writer) vs. PrintWriter(OutputStream)

@pshapiro4broad pshapiro4broad added Waiting For Revisions This PR has received comments from reviewers and is waiting for the Author to respond and removed Needs Review This PR is waiting for a reviewer to respond labels Jul 7, 2020
@yfarjoun yfarjoun removed the Waiting For Revisions This PR has received comments from reviewers and is waiting for the Author to respond label Jul 7, 2020
Yossi Farjoun added 4 commits July 8, 2020 10:56
… with the same queryname have the same duplicate marking.

- added tests
- moved another test into a markduplicates test directory
@yfarjoun yfarjoun force-pushed the yf_Add_Check_DuplicateMarking branch from 5488339 to 4c295d4 Compare July 8, 2020 14:59
@yfarjoun yfarjoun merged commit c2e66e9 into master Jul 10, 2020
@yfarjoun yfarjoun deleted the yf_Add_Check_DuplicateMarking branch July 10, 2020 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants