CNVds
is an R package developed to evaluate the pathogenicities of
human autosomal copy number variations (CNVs) based on the dosage
sensitivity scores of the genes that are encompassed within. Dosage
sensitivity (DS) refers to the phenotypic changes a gene may produce due
to changes in its copy numbers. This package accepts inputs in the
format of CNV calls, which includes the chromosome number, start & end
position, type of variation (deletion ‘DEL’ or duplication ‘DUP’), and
the number of copies. Such information can be obtains using existing CNV
detection tools like
PennCNV. In addition,
this package can also accept pre-defined lists of genes by gene symbols
(e.g. ‘BRCA1’, ‘CDC45’) for matching genes to their corresponding DS
scores.
This package focuses on the following three score metrics:
- Probability of loss intolerance (pLI): the probability that a gene is intolerant to a loss of function mutation.
- Probability of haploinsufficiency (pHI): the probability that a gene is sensitive to copy number loss.
- Probability of triplosensitivity (pTS): the probability that a gene is sensitive to copy number gain.
The reference DS scores are based on recent publications of genome-wide
dosage sensitivity mappings. Compared to solely measuring the physical
attributes of the CNVs (e.g. size, number of genes, or known association
with disease), DS scores provide additional metrics that are useful in
the systematic identification of pathogenic CNVs. The CNVds
package
was developed using R version 4.2.1 (2022-06-23 ucrt)
,
Platform: x86_64-windows-10_x64
, and running under:
version build 19044
.
You can install the development version of CNVds from GitHub with:
require("devtools")
devtools::install_github("jenydu/CNVds", build_vignettes = TRUE)
library("CNVds")
To run the Shiny app:
runCNVds()
ls("package:CNVds")
data(package = "CNVds")
browseVignettes("CNVds")
CNVds
currently contains 8 functions:
annotateCNV()
: Given a CNV call, return all genes that are fully contained in this region based on the GRCh37 or GRCh38 reference genome (specified by user).findpLI()
: Given a list of genes, find the corresponding pLI (prob. of loss intolerance) scores for each gene.findpHI()
: Given a list of genes, find the corresponding pHI (prob. of haploinsufficiency) scores for each gene.findpTS()
: Given a list of genes, find the corresponding pTS (prob. of triplosensitivity) scores for each gene.geneDSscores()
: Given a single gene symbol, find its associated pLI, pHI, and pTS scores.geneNoScores()
: Given a list of genes and the name of the DS score (pLI/pHI/pTS), return the list of genes that don’t have a score in the corresponding reference table, and print out the percentage of genes without a score. (Note: it is recommended that you run this function before finding the dosage sensitivity scores, for sanity check purposes.)plotCNVcounts()
: Given a list of CNV calls, plot the total number of unique duplication and deletion CNV regions in each chromosome.plotScoresByChr()
: Given a list of genes and their corresponding DS scores (pLI/pHI/pTS), plot the distribution of the scores and label genes that are especially dosage-sensitive (i.e. above a certain user-defined threshold).
The author of the package is Jun Ni Du. The approach to dosage sensitivity score calculation is based on Alexander-Bloch, A. et al. (2022)’s publication, with no direct codes taken. The reference tables are obtained from the following publications and/or packages:
grch37.rda
andgrch38.rda
are obtained from theannotables
package. The source files are published by Ensembl (Cunningham, F. et al. (2022)). They are subsets of the original datasets to only include the gene symbol, chromosome number, and the start & end position of all autosomal genes.pHaplo_pTriplo_data.rda
is obtained from Collins, R. L. et al. (2022).pLI_data.rda
is obtained from Karczewski, K. J. et al. (2020).
A sample input of 200 CNV calls, sampleInputCNV.csv
, is included for
package demonstration purposes. It is randomly subsetted from Ming, C.
et al. (2021).
The plotScoresByChr and plotCNVcounts functions uses the ggplot
function from the ggplot2
R package. The plotScoresByChr funciton
also uses the drop_na function from the tidyr
R package, and the
setNames function from the stats
R package. The runCNVds function
uses the shiny
and shinyalert
R packages to build the shiny app. Its
implementation adapted codes from the “009-upload” Shiny example, the
runTestingPackage function from the TestingPackage
R package, and
code snippets from online forums such as Stack Overflow (see
References section for a complete list of sources; additional details
can also be found in inst/shiny-scripts/app.R
).
Other than the above mentioned, the rest of functions are implemented solely by the author.
Alexander-Bloch, A. et al. (2022). Copy number variant risk scores associated with cognition, psychopathology, and brain structure in youths in the philadelphia neurodevelopmental cohort. JAMA Psychiatry 79, 699–709. https://doi.org/10.1001/jamapsychiatry.2022.1017
Collins, R. L. et al. (2022). A cross-disorder dosage sensitivity map of the human genome. Cell 185, 3041-3055.e25. https://doi.org/10.1016/j.cell.2022.06.036
Cunningham, F. et al. (2022). Ensembl 2022. Nucleic Acids Research 50(1), D988-D995. https://doi.org/10.1093/nar/gkab1049
Jgraph (2022). Drawio-desktop. Image created by Du, J. Retrieved November 14, 2022, from https://app.diagrams.net/
Karczewski, K. J. et al. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443. https://doi.org/10.1038/s41586-020-2308-7
Ming, C. et al. (2021). Whole genome sequencing–based copy number variations reveal novel pathways and targets in Alzheimer’s disease. Alzheimer’s & Dementia 18, 1846-1867. https://doi.org/10.1002/alz.12507
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
RStudio (2018). rstudio/shiny-examples: 009-upload. Github. https://github.com/rstudio/shiny-examples/tree/main/009-upload
Silva, A. (2022). Anjalisilva/TestingPackage: A Simple R Package Illustrating Components of an R Package: 2019-2022 BCB410H - Applied Bioinformatics, University of Toronto, Canada. GitHub. https://github.com/anjalisilva/TestingPackage
Thothal. (2019). Use default csv file when no file uploaded to shiny app. Stack Overflow. https://stackoverflow.com/questions/55566874/use-default-csv-file-when-no-file-uploaded-to-shiny-app
Turner, S. (2022). Stephenturner/annotables: R data package for annotating/converting Gene IDs. Github. https://github.com/stephenturner/annotables
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org
Wickham, H., Girlich, M. (2022). tidyr: Tidy Messy Data. https://tidyr.tidyverse.org
This package was developed as part of an assessment for 2022 BCB410H:
Applied Bioinformatics course at the University of Toronto, Toronto,
CANADA. CNVds
welcomes issues, enhancement requests, and other
contributions. To submit an issue, use the GitHub
issues. Many thanks to those
who provided feedback to improve this package.