Updated COSMIC to annotate protein change strings with their counts. #5181
Back to you, @jonn-smith
final String proteinChange = getProteinChangeStringFromResults(resultSet);
if ( !proteinChange.isEmpty() ) {
    if ( proteinChangeCounts.containsKey(proteinChange) ) {
        proteinChangeCounts.put(proteinChange, proteinChangeCounts.get(proteinChange) + 1);
There is a better method to use to do this:
final int count = proteinChangeCounts.getOrDefault(proteinChange, 0);
proteinChangeCounts.put(proteinChange, count+1);
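For illustration only, here is a self-contained sketch of that counting pattern. The class `ProteinChangeCountSketch`, the method `countChanges`, and the sample strings are hypothetical and not taken from the PR; only the `getOrDefault` idiom comes from the suggestion above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class; not part of the GATK code base.
final class ProteinChangeCountSketch {

    // Tallies each non-empty protein change string.
    // getOrDefault removes the need for a containsKey branch.
    static Map<String, Integer> countChanges(final String[] proteinChanges) {
        // LinkedHashMap is an assumption here: it keeps first-seen order.
        final Map<String, Integer> counts = new LinkedHashMap<>();
        for (final String change : proteinChanges) {
            if (!change.isEmpty()) {
                final int count = counts.getOrDefault(change, 0);
                counts.put(change, count + 1);
            }
        }
        return counts;
    }

    public static void main(final String[] args) {
        // Made-up sample input for demonstration.
        final String[] sample = {"p.E545K", "", "p.E542K", "p.E545K"};
        System.out.println(countChanges(sample)); // prints {p.E545K=2, p.E542K=1}
    }
}
```

`Map.merge(change, 1, Integer::sum)` would do the same update in a single call, if that reads better.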
Fixed!
}
catch (final SQLException ex) {
-    throw new GATKException("Cannot get Protein Position from column: " + GENOME_POSITION_COLUMN_NAME, ex);
+    throw new GATKException("Cannot get Protein Change from column: " + GENOME_POSITION_COLUMN_NAME, ex);
I think protein change should be lower case
Fixed!
Now the COSMIC data source produces counts of each protein change found in the COSMIC database.
That is, rather than a raw count of the total number of protein changes (e.g. `2` or `7`), it produces a count of each specific protein change found in the COSMIC database that overlaps a variant (e.g. `p.E545K(2)` or `p.E545K(2)|p.E542K(2)|p.H1047R(2)|p.N345K(1)`).

Fixes #4400
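
As a rough illustration of the annotation format described above, the sketch below renders a map of counts as `change(count)` entries joined by `|`. The class `ProteinChangeFormatSketch` and the method `format` are hypothetical, and the entry order (insertion order here) is an assumption; the actual ordering used by the COSMIC data source is not stated in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch class; not part of the GATK code base.
final class ProteinChangeFormatSketch {

    // Renders each entry as "change(count)" and joins the entries with '|'.
    static String format(final Map<String, Integer> counts) {
        return counts.entrySet().stream()
                .map(e -> e.getKey() + "(" + e.getValue() + ")")
                .collect(Collectors.joining("|"));
    }

    public static void main(final String[] args) {
        // Counts copied from the example in the PR description.
        final Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("p.E545K", 2);
        counts.put("p.E542K", 2);
        counts.put("p.H1047R", 2);
        counts.put("p.N345K", 1);
        System.out.println(format(counts)); // prints p.E545K(2)|p.E542K(2)|p.H1047R(2)|p.N345K(1)
    }
}
```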