Hg19 and hg38 from same workflow version #19

Open · wants to merge 49 commits into master
Conversation

@NagaComBio NagaComBio (Member) commented Mar 5, 2021

Overview

This merges the hg38 indel calling branch into the main hg19 branch. hg19 somatic indel analysis results remain the same as with version 2.6.0. There are some changes in the sample swap check that affect both hg19 and hg38.

Main

  • Support for both hg19 and hg38 (GRCh38_decoy_ebv_phiX_alt_hla_chr)
  • Support for both BAM and CRAM
  • Changes in the sample swap check
    • Because the new germline workflow is now available, the ANNOVAR annotation of the germline variants was removed.
    • Likewise, the bias annotations for the Platypus SNVs in the sample swap detection were removed.
  • The EVS and ExAC columns were removed from the hg19 annotations, so result files are not fully backward compatible; this requires a major version bump.

Minor

  • No control workflow
    • Local control max AF increased to 0.05
    • Annotation SNP_support_germline is based on the MAF filter value provided in the XML.

Integration test

  • Integration tests were performed for different scenarios; the results are in Phabricator T5184#87114

Summary by CodeRabbit

  • New Features

    • Support for hg38/GRCh38 reference in analysis tools.
    • New options for analysis workflows: --skipREMAP and runlowmaf.
    • Enhanced confidence scoring in variant analysis.
    • Added raw clustering and filtering steps in TiNDA-related processes.
    • Introduced maximum allele frequency controls in variant annotation.
  • Enhancements

    • Updated documentation to version 4.0.0 with detailed descriptions of new features and updates.
    • Improved plot generation to use raw clusters.
    • Streamlined annotation processes for somatic variants.
    • Reformatted command options for better readability in workflows.
  • Bug Fixes

    • Fixed issues with reading genomic data files.
    • Adjusted filtering logic in sample swap checks.
    • Removed outdated annotations and conditions from variant filtering scripts.
  • Refactor

    • Updated COWorkflowsBasePlugin to version 1.4.2.
    • Refactored file output and joining operations in analysis scripts.
    • Restructured annotation process for more accurate variant counting.
  • Documentation

    • Revised README.md to reflect new features and workflow adjustments.

@NagaComBio NagaComBio marked this pull request as ready for review March 12, 2021 10:55
@NagaComBio NagaComBio requested a review from vinjana March 12, 2021 10:55
@vinjana vinjana (Member) left a comment

Quite a few things have changed. In particular, the changes make the code behave differently (bias filtering), and some cases are no longer covered (e.g. /Tumor_Somatic/).


print "\n$annotation_command\n";
my $runAnnotation = system($annotation_command);

if($runAnnotation != 0 ) {
`rm $jsonFile`;
Member:

Suggested change
`rm $jsonFile`;
`rm '$jsonFile'`;
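
Quoting the interpolated variable protects against paths containing spaces or shell metacharacters. As a sketch of an alternative (an assumption, not part of this PR), Perl's built-in unlink avoids the shell entirely and therefore has no quoting issues:

unlink($jsonFile) or warn "Could not remove '$jsonFile': $!";  # no subshell, no quoting pitfalls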

description="-dbtype option used for the ANNOVAR_BINARY (see above)."/>

<!--Reference annotations-->
<cvalue name="CONFIDENCE_OPTS_INDEL" value="--refgenome GRCh38 ftp://ftp.sanger.ac.uk/pub/cancer/dockstore/human/GRCh38_hla_decoy_ebv/core_ref_GRCh38_hla_decoy_ebv.tar.gz" type="string" />
Member:

Could you add a description which values are allowed for refgenome?

@@ -213,14 +217,12 @@ def main(args):
    penalties = ""
    infos = []

-    if args.no_control:
+    if args.no_control or args.refgenome[0] == 'GRCh38' or args.skipREMAP:
Member:

  • Here "GRCh38" and above "hs37d5"? Wouldn't it be better to use the same identifier namespace, e.g. "GRCh37" und "GRCh38"?
  • Is the d5 important here?
  • What about using a functions is_hg37 and is_hg38 that even test multiple alternative identifiers (e.g. hg37, GRCh37, hs37d5 and GRCh38, hg38). This would also avaid that you have to repeat the strings multiple times.
  • At least (and in any case) it should be documented outside the code, which values are allowed.

The if/elses on the assembly name repeat multiple times. You may consider putting all code into a classes.

Member Author (NagaComBio):

  • I have observed the inconsistency, but I have left hs37d5 in for legacy reasons and worked around it.
  • hs37d5 is like a version identifier for the 1000 Genomes reference build of GRCh37.
  • Yes, it could be improved, I will look into these.

@vinjana vinjana (Member) commented Apr 4, 2023:

You could for instance add two functions grch37 and grch38 that each return true or false (0/1 in Perl :P) on a match against one or more equivalent identifiers. Then you could match hg37, hg19, hg38, hs37d5, GRCh37, GRCh38, etc. all equivalently.

If you want to get rid of the if/elses completely you'd probably have to extract the branch-code into methods on two classes, e.g. GRCh37 and GRCh38, both implementing the Assembly interface, or so. Not sure whether you want to go that far.
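
For illustration, a minimal Perl sketch of such alias-matching helpers; the function names follow the suggestion above, while the exact alias lists are assumptions rather than code from this PR:

sub grch37 {
    my ($assembly) = @_;
    # 1 if the identifier is any known alias of GRCh37/hg19, 0 otherwise
    return scalar grep { lc($assembly) eq lc($_) } qw(GRCh37 hg19 hg37 hs37d5);
}

sub grch38 {
    my ($assembly) = @_;
    # 1 if the identifier is any known alias of GRCh38/hg38, 0 otherwise
    return scalar grep { lc($assembly) eq lc($_) } qw(GRCh38 hg38);
}

Calling code could then fail fast on anything unrecognized, e.g. die "Unknown assembly: $assembly" unless grch37($assembly) or grch38($assembly);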

Member:

I just looked at the hg38 branch and saw that we had a related discussion about the used identifiers there. Probably the ideal would be to test these identifiers already in the Java/Groovy code, maybe even map from aliases to the canonical names used everywhere else in the code (e.g. hg19->hs37d5). However, maybe it is sufficient to have the identifiers documented well in the XML and maybe the README.

The room for improvement of the if/else spaghetti code remains a valid point, though.

README.md Outdated
@@ -83,6 +83,22 @@ TBD

# Changelist
Member:

Are there new workflow options, or were options removed that users may have to change (e.g. if the workflow is to be run with moved reference directories, dependent on the sample type, the scientific question, the sequencing type (WES/WGS), etc.)?

Would you add a description of these options to the README and ensure they are properly documented in the XMLs (e.g. with a short description of the meaning, the expected file content and format, and the default value)?

Member:

Consider limiting the heap-size with this approach.


coderabbitai bot commented Jan 24, 2024

Walkthrough

The update to version 4.0.0 brings substantial changes, including support for the hg38/GRCh38 reference genome, updates to the confidence scoring script, removal of certain annotations, and addition of new options like --skipREMAP and runlowmaf. The TiNDA-related processes have been adjusted. Additionally, the COWorkflowsBasePlugin is updated to 1.4.2. These changes reflect a shift towards more current genomic databases and a refinement of analysis tools for indel calling workflows.

Changes

  • README.md, buildinfo.txt — Updated to version 4.0.0, introducing hg38/GRCh38 support and the --skipREMAP and runlowmaf options, and updating COWorkflowsBasePlugin to 1.4.2.
  • buildversion.txt — Version updated from 3.1.1 to 4.0.0.
  • .../indelCallingWorkflow/TiN_CanopyBasedClustering.R — Improved chr.length reading, added a shiftLength column and raw filtering steps, and updated the plots and rescueInfo table to use raw clusters.
  • .../indelCallingWorkflow/checkSampleSwap_TiN.pl — Overhauled parameters, paths, filtering settings, the annotation process, and somatic variant counting, and added an allele frequency parsing subroutine.
  • .../indelCallingWorkflow/confidenceAnnotation_Indels.py — Introduced a new GnomAD frequency filter, updated headers, refined the confidence scoring logic, and added low-MAF punishment options.
  • .../indelCallingWorkflow/environments/tbi-cluster_checkSampleSwap.sh — Added a bash script for setting up the Indel Calling Workflow environment on a cluster.
  • .../indelCallingWorkflow/filter_vcf.sh — Removed the conditions related to FILTER_ExAC and FILTER_EVS and added a condition related to FILTER_1KGENOMES.
  • .../indelCallingWorkflow/indelCalling.sh, .../indelCallingWorkflow/platypusIndelAnnotation.sh — Reformatted commands for better readability, added a sorting step for indels, and modified the annotation pipeline for VCF files, replacing annotations for certain databases with GNOMAD WES and GNOMAD WGS databases; introduced new parameters for maximum MAF thresholds for various databases.


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 9

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR between 19ce429 and 61d12f4.
Files ignored due to path filters (7)
  • IndelCallingWorkflow.jar is excluded by: !**/*.jar
  • docs/images/denbi.png is excluded by: !**/*.png
  • resources/analysisTools/indelCallingWorkflow/environments/conda.yml is excluded by: !**/*.yml
  • resources/analysisTools/indelCallingWorkflow/readbiasfunctions.pyc is excluded by: !**/*.pyc
  • resources/analysisTools/indelCallingWorkflow/readbiasplots.pyc is excluded by: !**/*.pyc
  • resources/configurationFiles/analysisIndelCalling.xml is excluded by: !**/*.xml
  • resources/configurationFiles/analysisIndelCallingGRCh38.xml is excluded by: !**/*.xml
Files selected for processing (11)
  • README.md (1 hunks)
  • buildinfo.txt (1 hunks)
  • buildversion.txt (1 hunks)
  • resources/analysisTools/indelCallingWorkflow/TiN_CanopyBasedClustering.R (7 hunks)
  • resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl (10 hunks)
  • resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.sh (1 hunks)
  • resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (12 hunks)
  • resources/analysisTools/indelCallingWorkflow/environments/tbi-cluster_checkSampleSwap.sh (1 hunks)
  • resources/analysisTools/indelCallingWorkflow/filter_vcf.sh (1 hunks)
  • resources/analysisTools/indelCallingWorkflow/indelCalling.sh (2 hunks)
  • resources/analysisTools/indelCallingWorkflow/platypusIndelAnnotation.sh (2 hunks)
Files skipped from review due to trivial changes (3)
  • buildversion.txt
  • resources/analysisTools/indelCallingWorkflow/environments/tbi-cluster_checkSampleSwap.sh
  • resources/analysisTools/indelCallingWorkflow/indelCalling.sh
Additional comments: 18
buildinfo.txt (1)
  • 1-1: The dependency version for COWorkflowsBasePlugin has been updated from 1.4.1 to 1.4.2.

This change is consistent with the PR objectives and AI-generated summary.

resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.sh (4)
  • 31-34: Variables related to genomic databases have been renamed to reflect the updated database versions.

The changes align with the PR objectives to support hg38 alongside hg19.

  • 39-39: The --chr_prefix variable is set using the CHR_PREFIX environment variable.

Assuming CHR_PREFIX is correctly set in the environment, this change is appropriate.

  • 42-42: The --chrLengthFile parameter has been updated to use ${CHROM_SIZES_FILE}.

This change is consistent with the PR objectives and AI-generated summary.

  • 50-50: Output file parameters have been renamed to reflect their new purpose.

Renaming these parameters clarifies their usage in the context of the updated workflow.

resources/analysisTools/indelCallingWorkflow/filter_vcf.sh (1)
  • 33-33: The FILTER_1KGENOMES condition has been added to the FILTER_VALUES construction.

This addition is consistent with the PR objectives to remove ExAC and EVS conditions and add support for 1KGENOMES.

resources/analysisTools/indelCallingWorkflow/platypusIndelAnnotation.sh (2)
  • 55-56: Annotations related to GNOMAD WES and WGS databases have been added.

This change aligns with the PR objectives to update the annotation pipeline.

  • 127-128: New parameters for maximum MAF thresholds have been introduced in the confidence annotation step.

The addition of these parameters is consistent with the PR objectives to refine the confidence annotation process.

resources/analysisTools/indelCallingWorkflow/TiN_CanopyBasedClustering.R (5)
  • 64-65: The chr.length data frame is now read using read_tsv and a new column shiftLength is added.

This change is consistent with the PR objectives to improve the reading of chromosome length data.

  • 127-129: A raw filtering step has been added using mutate to create a new column rawCluster.

This addition is consistent with the PR objectives to introduce a new raw filtering step for cluster analysis.

  • 191-196: Plots have been modified to use raw clusters instead of canopy clusters.

This change aligns with the PR objectives to adjust plotting functions to utilize raw clusters.

  • 230-243: The rescueInfo table creation has been refactored to include raw clustering information.

This refactoring is consistent with the PR objectives to include raw clustering information in the rescueInfo table.

  • 287-291: Minor adjustments have been made to file output and joining operations.

These adjustments are consistent with the PR objectives to refine the script.

README.md (1)
  • 96-110: The changelog in the README.md file provides a clear summary of the major, minor, and patch changes for the version update to 4.0.0. It is important to ensure that all mentioned changes are reflected in the updated code and documentation.
resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl (4)
  • 21-25: The script parameters have been updated to remove $biasScript and add $chr_prefix, $outfile_tindaVCF, and $outfile_SJ. Ensure that all references to the removed parameter have been updated accordingly and that the new parameters are used correctly throughout the script.
Verification successful

The verification process confirms that $biasScript is no longer used in the checkSampleSwap_TiN.pl script. The new parameters $chr_prefix, $outfile_tindaVCF, and $outfile_SJ have been added and are being used correctly within the script.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that $biasScript is no longer used in the script.
rg --type perl '$biasScript' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
# Verify that the new parameters are used correctly.
ast-grep --lang perl --pattern $'my ($$$, $chr_prefix, $$$, $outfile_tindaVCF, $$$, $outfile_SJ) = $$$;' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

Length of output: 381



Script:

#!/bin/bash
# Confirm that $biasScript is not used in the script.
rg --type perl '\$biasScript' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

# Verify that the new parameters $chr_prefix, $outfile_tindaVCF, and $outfile_SJ are used correctly in the script.
rg --type perl '\$chr_prefix' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg --type perl '\$outfile_tindaVCF' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg --type perl '\$outfile_SJ' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

Length of output: 982

  • 396-416: The script has been updated to count the number of somatic variants. Ensure that the logic for counting and classifying somatic variants is consistent with the updated workflow requirements and that the counts are used appropriately downstream.
Verification successful

The verification process has confirmed the presence of the patterns Tumor_Somatic_Common, Tumor_Somatic_Rare, Control_Somatic_Common, and Control_Somatic_Rare in the checkSampleSwap_TiN.pl script. This indicates that the script contains logic to count somatic variants based on these categories, which aligns with the provided code snippet.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the logic for counting somatic variants is consistent with the workflow requirements.
ast-grep --lang perl --pattern $'if ($$$ =~ /Tumor_Somatic_Common/) { $$$ }' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
ast-grep --lang perl --pattern $'if ($$$ =~ /Tumor_Somatic_Rare/) { $$$ }' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
ast-grep --lang perl --pattern $'if ($$$ =~ /Control_Somatic_Common/) { $$$ }' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
ast-grep --lang perl --pattern $'if ($$$ =~ /Control_Somatic_Rare/) { $$$ }' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

Length of output: 1019



Script:

#!/bin/bash
# Use ripgrep to verify the logic for counting somatic variants in the Perl script.
rg 'if.*Tumor_Somatic_Common.*\{.*\}' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg 'if.*Tumor_Somatic_Rare.*\{.*\}' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg 'if.*Control_Somatic_Common.*\{.*\}' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg 'if.*Control_Somatic_Rare.*\{.*\}' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

Length of output: 431



Script:

#!/bin/bash
# Use ripgrep with a more lenient pattern to verify the logic for counting somatic variants in the Perl script.
rg 'Tumor_Somatic_Common' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg 'Tumor_Somatic_Rare' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg 'Control_Somatic_Common' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg 'Control_Somatic_Rare' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

Length of output: 541

  • 453-453: The script includes commands to remove intermediate files. Ensure that these cleanup operations are necessary and that no data required for downstream processes is being deleted.
Verification successful

The search results indicate that the filenames constructed with snvs_.*\.GTfiltered_gnomAD\.SomaticIn\.vcf and snvs_.*\.GTfiltered_gnomAD\.Germline\.Rare\.txt patterns do not appear elsewhere in the codebase, except for their definitions in checkSampleSwap_TiN.pl. This suggests that these files are not referenced in other parts of the codebase and are likely safe to delete as part of cleanup operations.

However, the pattern snvs_.*\.GTfiltered_gnomAD\.Germline\.Rare\.vcf is found in the checkSampleSwap_TiN.pl script, which is the same file where the deletion command is issued. This is expected as it is the definition of the variable that holds the filename.

Since no other references to these files were found in the codebase, it can be inferred that the cleanup operation in the checkSampleSwap_TiN.pl script does not remove data required for downstream processes within the scope of the codebase we have access to.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the removed files are not required for downstream processes.
rg --type perl '$snvsGT_RawFile' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg --type perl '$snvsGT_gnomADFile' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

Length of output: 208



Script:

#!/bin/bash
# Find where the variables $snvsGT_somatic, $snvsGT_germlineRare, and $snvsGT_germlineRare_txt are assigned values
rg --type perl '\$snvsGT_somatic\s*=' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg --type perl '\$snvsGT_germlineRare\s*=' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl
rg --type perl '\$snvsGT_germlineRare_txt\s*=' resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl

# Assuming the above commands give us the filenames, we will then search for these filenames across the codebase.
# Placeholder for searching the actual filenames across the codebase:
# rg 'actual_filename_of_$snvsGT_somatic'
# rg 'actual_filename_of_$snvsGT_germlineRare'
# rg 'actual_filename_of_$snvsGT_germlineRare_txt'

Length of output: 642



Script:

#!/bin/bash
# Search the entire codebase for references to the filenames identified.
rg 'snvs_.*\.GTfiltered_gnomAD\.SomaticIn\.vcf'
rg 'snvs_.*\.GTfiltered_gnomAD\.Germline\.Rare\.vcf'
rg 'snvs_.*\.GTfiltered_gnomAD\.Germline\.Rare\.txt'

Length of output: 668

  • 472-486: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [456-486]

The subroutine parse_AF is introduced to parse allele frequencies from annotation matches. Ensure that this subroutine is robust and handles all expected input formats correctly. Additionally, the median function is added; verify that it is used correctly and that its results are handled appropriately.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR between 61d12f4 and e4f9c9b.
Files selected for processing (2)
  • resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl (10 hunks)
  • resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (13 hunks)
Additional comments: 16
resources/analysisTools/indelCallingWorkflow/checkSampleSwap_TiN.pl (7)
  • 21-25: The parameters $biasScript, $outfile_RG, $outfile_SR, and $outfile_AS have been removed, and new parameters $chr_prefix, $outfile_tindaVCF, and $outfile_SJ have been added. Ensure that the removal of parameters does not affect any downstream processes and that the new parameters are properly integrated and used within the script.
  • 453-463: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [460-479]

A new subroutine parse_AF has been added to parse allele frequency from annotation lines. Ensure that this subroutine is robust against different formats of allele frequency data that may be encountered in the annotation lines.

  • 480-490: A median function has been added to calculate the median allele frequency. Verify that this function is called with the correct data and that it handles edge cases properly, such as when the list of values is empty or contains only one element (see the sketch after this list).
  • 193-193: The filtering logic has been updated to include a new chromosome prefix variable $chr_prefix. Confirm that this variable is being set correctly and that the updated regex pattern correctly matches the intended chromosome names.
  • 244-248: The annotation command has been updated to include new databases for annotation and remove some previous ones. Ensure that the new annotation sources are correctly formatted and accessible, and that the removal of the old sources does not negatively impact the analysis.
  • 432-445: The script calculates percentages for various metrics and stores them in a JSON file. Verify that the calculations are correct and that division by zero is handled appropriately.
  • 457-457: File cleanup operations have been updated to remove additional files. Confirm that these files are no longer needed and that their removal does not affect any subsequent steps in the workflow.
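
For the edge cases flagged above, here is a minimal Perl sketch of a median that handles empty and single-element lists. The name follows the review comment, but the signature and body are assumptions, since the actual implementation in checkSampleSwap_TiN.pl is not quoted in this diff.

sub median {
    my @sorted = sort { $a <=> $b } @_;
    return undef unless @sorted;                     # empty list: no median to report
    my $mid = int(@sorted / 2);
    return $sorted[$mid] if @sorted % 2;             # odd count: the middle element
    return ($sorted[$mid - 1] + $sorted[$mid]) / 2;  # even count: mean of the two middle values
}

A single-element list falls through to the odd-count branch and returns that element unchanged.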
resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (9)
  • 55-67: The get_fixed_headers function has been modified to conditionally include headers based on the args.refgenome and args.no_control parameters. This is a good use of conditional logic to adapt the headers to the context of the analysis.
  • 70-87: The get_variable_headers function uses a dictionary to map header names to regular expressions. This is a maintainable approach that allows for easy updates to header names.
  • 89-98: The check_max_maf function is a new addition that checks if any allele frequency in a given column exceeds a maximum threshold. This is a critical function for filtering variants based on frequency data and is a good example of encapsulating logic into a reusable component.
  • 163-163: The addition of a new filter FREQ in the VCF header is consistent with the PR objectives to introduce new filtering criteria based on allele frequency data.
  • 179-180: The variables fixed_headers and variable_headers are set using the newly added functions. This is a good example of using functions to modularize code and make it more readable.
  • 283-299: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [257-296]

The logic for setting flags based on various databases and allele frequencies has been refactored. This is a significant change that aligns with the PR objectives to adjust the annotation based on the MAF filter value from the XML.

  • 357-385: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [313-368]

The confidence scoring logic has been modified, and while the previous review comments suggested refactoring this complex logic, the developer has chosen not to do so in this PR. The changes made here seem to be in line with the new allele frequency filtering logic.

  • 478-536: The region confidence scoring logic has been updated to include new conditions. This is a complex section of code that could benefit from further refactoring, but it appears to be correctly implemented based on the PR objectives.
  • 641-657: New command-line arguments have been added to control the maximum allele frequencies for different databases and the option to run low MAF punishment. This is consistent with the PR objectives and provides users with more control over the filtering process.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR between e4f9c9b and cdd1988.
Files selected for processing (1)
  • resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (13 hunks)
Additional comments: 6
resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (6)
  • 55-67: The get_fixed_headers function dynamically adjusts headers based on the reference genome and control data presence. This is a good practice for maintainability and flexibility. However, ensure that all potential headers are accounted for in downstream processing.
  • 70-87: The get_variable_headers function effectively manages variable headers based on conditions such as the reference genome used and the presence of control data. This modular approach enhances code readability and maintainability.
  • 89-98: The check_max_maf function correctly identifies if any allele frequency exceeds the maximum allowed. This is crucial for filtering variants based on population frequency data. Ensure that the help dictionary structure and column_valid_key usage are consistent throughout the script.
  • 45-103: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [100-114]

The header generation in the main function is well-structured, incorporating both fixed and additional headers. This approach allows for flexible header management. Ensure that all headers are correctly documented and used consistently in downstream analyses.

  • 179-180: The retrieval of fixed and variable headers using get_fixed_headers and get_variable_headers functions is correctly implemented. This ensures that the script dynamically adjusts to the input data's specifics, such as the presence of control samples and the reference genome used.
  • 639-655: The argument parsing section is comprehensive, providing a wide range of options for customizing the script's behavior. Ensure that the help messages are clear and informative for all users, particularly for complex options like --hetcontr and --homcontr.

Comment on lines 48 to 67
## Helper functions
def is_hg37(args):
    return args.refgenome[0] == "hs37d5"

def is_hg38(args):
    return args.refgenome[0] == "GRCh38"

def get_fixed_headers(args):
    fixed_headers = ["^QUAL$", "^INFO$", "^FILTER$", "MAPABILITY", "SIMPLE_TANDEMREPEATS",
                     "REPEAT_MASKER", "^CONFIDENCE$", "^CLASSIFICATION$", "^REGION_CONFIDENCE$",
                     "^PENALTIES$", "^REASONS$"]

    hs37d5_headers = ["DAC_BLACKLIST", "DUKE_EXCLUDED", "HISEQDEPTH", "SELFCHAIN"]
    if is_hg37(args):
        fixed_headers += hs37d5_headers

    if not args.no_control:
        fixed_headers += ["^INFO_control", "^ANNOTATION_control$"]

    return fixed_headers
coderabbitai bot:

The implementation of is_hg37 and is_hg38 relies on hard-coded reference genome names. This approach is not scalable or flexible for future reference genomes. Consider adopting a more dynamic method for genome version checks.

- return args.refgenome[0] == "hs37d5"
- return args.refgenome[0] == "GRCh38"
+ return args.refgenome[0].startswith("hs37")
+ return args.refgenome[0].startswith("GRCh38")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
## Helper functions
def is_hg37(args):
-     return args.refgenome[0] == "hs37d5"
+     return args.refgenome[0].startswith("hs37")

def is_hg38(args):
-     return args.refgenome[0] == "GRCh38"
+     return args.refgenome[0].startswith("GRCh38")

def get_fixed_headers(args):
    fixed_headers = ["^QUAL$", "^INFO$", "^FILTER$", "MAPABILITY", "SIMPLE_TANDEMREPEATS",
                     "REPEAT_MASKER", "^CONFIDENCE$", "^CLASSIFICATION$", "^REGION_CONFIDENCE$",
                     "^PENALTIES$", "^REASONS$"]

    hs37d5_headers = ["DAC_BLACKLIST", "DUKE_EXCLUDED", "HISEQDEPTH", "SELFCHAIN"]
    if is_hg37(args):
        fixed_headers += hs37d5_headers

    if not args.no_control:
        fixed_headers += ["^INFO_control", "^ANNOTATION_control$"]

    return fixed_headers

Comment on lines 473 to 537
## DUKE, DAC, Hiseq, Self chain are only available for hg19 reference genome
if args.refgenome[0] == "hs37d5" and not args.skipREMAP:
    if help["DUKE_EXCLUDED_VALID"] or help["DAC_BLACKLIST_VALID"] or help["HISEQDEPTH_VALID"]:
        region_conf -= 3  # really bad region, usually centromeric repeats
        reasons += "Blacklist(-3)"

    if help["ANNOVAR_SEGDUP_COL_VALID"] or help["SELFCHAIN_VALID"]:
        region_conf -= 1
        reasons += "SelfchainAndOrSegdup(-1)"
else:
    if help["ANNOVAR_SEGDUP_COL_VALID"]:
        region_conf -= 1
        reasons += "Segdup(-1)"

if any(word in help["REPEAT_MASKER"] for word in ["Simple_repeat", "Low_", "Satellite", ]) or help["SIMPLE_TANDEMREPEATS_VALID"]:
    region_conf -= 2
    reasons += "Repeat(-2)"
# other repeat elements to penalize at least a bit
elif help["REPEAT_MASKER_VALID"]:
    region_conf -= 1
    reasons += "Other_repeat(-1)"

# Mapability is 1 for unique regions, 0.5 for regions appearing twice, 0.33... 3times, ...
# Everything with really high number of occurences is artefacts
# does not always correlate with the above regions
# is overestimating badness bc. of _single_ end read simulations
if help["MAPABILITY"] == ".":
    # in very rare cases (CEN), there is no mapability => ".", which is not numeric but interpreted as 0
    region_conf -= 5
    reasons += "Not_mappable(-5)"
else:
    reduce = 0
    # can have several entries for indels, e.g. 0.5&0.25 - take worst (lowest) or best (highest)?!
    mapability = min(map(float, help["MAPABILITY"].split("&")))  # just chose one - here: min
    if mapability < 0.5:
        # 0.5 does not seem to be that bad: region appears another time in
        # the genome and we have paired end data!
        region_conf -= 1
        reduce += 1

        #if mapability < 0.4: # 3-4 times appearing region is worse but still not too bad
        #    region_conf -= 1
        #    reduce += 1

        if mapability < 0.25:  # > 4 times appearing region
            region_conf -= 1
            reduce += 1

        if mapability < 0.1:  # > 5 times is bad
            region_conf -= 2
            reduce += 2

        if mapability < 0.05:  # these regions are clearly very bad (Lego stacks)
            region_conf -= 3
            reduce += 3

    reasons += "Low_mappability(%s=>-%d)" % (help["MAPABILITY"], reduce)

if classification != "somatic" and not "PASS" in help["FILTER"]:
    # such an indel is probably also "unclear" and not "germline"
@coderabbitai coderabbitai bot commented Jan 26, 2024:

📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [257-534]

The variant classification and confidence scoring logic is extensive and handles a wide range of conditions. While the logic appears correct, the extensive use of magic numbers and complex conditional structures makes this section hard to maintain. Consider refactoring for clarity and maintainability, possibly by breaking down into smaller functions or using a more descriptive approach for magic numbers.

- confidence -= 1
+ confidence -= CONFIDENCE_DECREMENT_FOR_CONDITION

Member Author (NagaComBio):

Not working on this logic in this PR.

coderabbitai bot:

@NagaComBio Understood, thank you for the clarification. If there's anything else you need assistance with, feel free to ask!

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR between cdd1988 and 4a2b713.
Files selected for processing (1)
  • resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (13 hunks)
Additional comments: 3
resources/analysisTools/indelCallingWorkflow/confidenceAnnotation_Indels.py (3)
  • 48-70: The addition of helper functions is_hg37, is_hg38, get_fixed_headers, and get_variable_headers improves modularity and readability by encapsulating specific logic for determining the reference genome and generating header lists based on conditions. This approach aligns well with best practices for maintainability and code clarity.
  • 91-100: The check_max_maf function introduces a streamlined way to filter variants based on allele frequency, enhancing the script's flexibility in handling different databases and their respective maximum allele frequency thresholds. This function is a valuable addition for maintaining the accuracy and specificity of variant annotation.
  • 475-539: > 📝 NOTE

This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [102-658]

The main function has been significantly updated to incorporate new command-line arguments for controlling maximum allele frequencies and to utilize the newly added helper functions for conditional logic based on the reference genome and control data presence. These changes enhance the script's functionality and adaptability to different datasets and analysis requirements. However, ensure that all new command-line arguments are documented and validated to prevent potential user errors.

Comment on lines 475 to 539
## DUKE, DAC, Hiseq, Self chain are only available for hg19 reference genome
if args.refgenome[0] == "hs37d5" and not args.skipREMAP:
    if help["DUKE_EXCLUDED_VALID"] or help["DAC_BLACKLIST_VALID"] or help["HISEQDEPTH_VALID"]:
        region_conf -= 3  # really bad region, usually centromeric repeats
        reasons += "Blacklist(-3)"

    if help["ANNOVAR_SEGDUP_COL_VALID"] or help["SELFCHAIN_VALID"]:
        region_conf -= 1
        reasons += "SelfchainAndOrSegdup(-1)"
else:
    if help["ANNOVAR_SEGDUP_COL_VALID"]:
        region_conf -= 1
        reasons += "Segdup(-1)"

if any(word in help["REPEAT_MASKER"] for word in ["Simple_repeat", "Low_", "Satellite", ]) or help["SIMPLE_TANDEMREPEATS_VALID"]:
    region_conf -= 2
    reasons += "Repeat(-2)"
# other repeat elements to penalize at least a bit
elif help["REPEAT_MASKER_VALID"]:
    region_conf -= 1
    reasons += "Other_repeat(-1)"

# Mapability is 1 for unique regions, 0.5 for regions appearing twice, 0.33... 3times, ...
# Everything with really high number of occurences is artefacts
# does not always correlate with the above regions
# is overestimating badness bc. of _single_ end read simulations
if help["MAPABILITY"] == ".":
    # in very rare cases (CEN), there is no mapability => ".", which is not numeric but interpreted as 0
    region_conf -= 5
    reasons += "Not_mappable(-5)"
else:
    reduce = 0
    # can have several entries for indels, e.g. 0.5&0.25 - take worst (lowest) or best (highest)?!
    mapability = min(map(float, help["MAPABILITY"].split("&")))  # just chose one - here: min
    if mapability < 0.5:
        # 0.5 does not seem to be that bad: region appears another time in
        # the genome and we have paired end data!
        region_conf -= 1
        reduce += 1

        #if mapability < 0.4: # 3-4 times appearing region is worse but still not too bad
        #    region_conf -= 1
        #    reduce += 1

        if mapability < 0.25:  # > 4 times appearing region
            region_conf -= 1
            reduce += 1

        if mapability < 0.1:  # > 5 times is bad
            region_conf -= 2
            reduce += 2

        if mapability < 0.05:  # these regions are clearly very bad (Lego stacks)
            region_conf -= 3
            reduce += 3

    reasons += "Low_mappability(%s=>-%d)" % (help["MAPABILITY"], reduce)

if classification != "somatic" and not "PASS" in help["FILTER"]:
    # such an indel is probably also "unclear" and not "germline"
coderabbitai bot:

📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [48-658]

While the script has been enhanced with new functionalities and improved modularity through the addition of helper functions, it's important to continuously evaluate the complexity of the variant annotation logic. Consider future refactoring opportunities to further simplify and decompose complex logic into smaller, more manageable functions. This will aid in maintainability and ease of understanding for new contributors to the project.

@NagaComBio NagaComBio requested a review from vinjana February 23, 2024 14:29