diff --git a/DESCRIPTION b/DESCRIPTION
index 39f208a..79a6ebb 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: FastIntegration
 Type: Package
 Title: Fast Integration of Single-cell Data
-Version: 0.1.0
+Version: 1.1.0
 Author: c(
     person(given = "Mengwei", family = "Li", email = "li_mengwei@immunol.a-star.edu.sg", role = "aut"),
     person(given = "Xiaomeng", family = "Zhang", email = "zhang_xiao_meng@immunol.a-star.edu.sg", role = "aut"),
diff --git a/R/GetDiscoData.R b/R/GetDiscoData.R
new file mode 100644
index 0000000..641488a
--- /dev/null
+++ b/R/GetDiscoData.R
@@ -0,0 +1,116 @@
+GetDiscoSample = function(
+  verbose = T
+) {
+  if(!require("jsonlite")) {
+    stop("Please install jsonlite")
+  }
+
+  tryCatch(
+    {
+      if (verbose) {
+        message("Starting to download metadata from DISCO database")
+      }
+      meta = fromJSON("https://www.immunesinglecell.org/api/sample/all")
+      return(meta)
+    },
+    error=function(cond) {
+      stop("Network error. Please try again")
+    }
+  )
+}
+
+#' @import jsonlite
+#' @export
+FindSampleByMetadata = function(
+  tissue = c(),
+  disease = c(),
+  platform = c(),
+  project.id = c(),
+  sample.id = c(),
+  sample.type = c()
+) {
+  meta.all = GetDiscoSample()
+  if (length(tissue) > 0) {
+    meta.all = meta.all[which(meta.all$tissue %in% tissue),]
+  }
+  if (length(disease) > 0) {
+    meta.all = meta.all[which(meta.all$disease %in% disease),]
+  }
+  if (length(platform) > 0) {
+    meta.all = meta.all[which(meta.all$platform %in% platform),]
+  }
+  if (length(project.id) > 0) {
+    meta.all = meta.all[which(meta.all$projectId %in% project.id),]
+  }
+  if (length(sample.type) > 0) {
+    meta.all = meta.all[which(meta.all$sampleType %in% sample.type),]
+  }
+
+  meta.all = meta.all[which(meta.all$processStatus == "QC pass"),]
+  if (nrow(meta.all) == 0) {
+    stop("Sorry, no sample is found. Please try to use other filters.")
+  } else {
+    return(meta.all)
+  }
+}
+
+#' @export
+DownloadDiscoData = function(
+  metadata,
+  expressed.gene = c(),
+  unexpressed.gene = c(),
+  dir = "./disco_data"
+) {
+
+  meta = metadata
+  dir.create(dir)
+  tryCatch(
+    {
+      for (i in 1:nrow(meta)) {
+        message(paste0("Downloading sample ", i, " of ", nrow(meta)))
+        rna = readRDS(url(paste0(
+          "http://dc.vishuo.com:8887/api/vishuo/download/getExp?project=",meta$projectId[i],"&sample=",meta$sampleId[i]
+        )))
+
+        if (length(expressed.gene) > 0) {
+          for (j in 1:length(expressed.gene)) {
+            if (length(which(rna@assays$RNA@data[expressed.gene[j],] > 0)) > 0) {
+              rna = subset(rna, cells = names(which(rna@assays$RNA@data[expressed.gene[j],] > 0)))
+            } else {
+              rna = NULL
+              break
+            }
+          }
+        }
+
+        if (!is.null(rna) && length(unexpressed.gene) > 0) {
+          for (j in 1:length(unexpressed.gene)) {
+            if (length(which(rna@assays$RNA@data[unexpressed.gene[j],] == 0)) > 0) {
+              rna = subset(rna, cells = names(which(rna@assays$RNA@data[unexpressed.gene[j],] == 0)))
+            } else {
+              rna = NULL
+              break
+            }
+          }
+        }
+
+        if (is.null(rna) == F) {
+          rna@assays$RNA@counts = expm1(rna@assays$RNA@data)
+          saveRDS(rna, paste0(dir, "/", meta$sampleId[i], ".rds"), compress = F)
+        } else {
+          message(paste0("Skipping sample ", i, ". No cells are found after filtering."))
+        }
+
+      }
+    },
+    error=function(cond) {
+      stop("Network error. Please try again")
+    }
+  )
+  message("Job finished")
+}
+
+
+
+
+
diff --git a/README.md b/README.md
index d961a56..a4eff2a 100644
--- a/README.md
+++ b/README.md
@@ -1,43 +1,45 @@
-# FastIntegration v1.0.0
+# FastIntegration v1.1.0
+
 FastIntegration is a fast and high-capacity version of Seurat Integration. FastIntegrate can integrate thousands of scRNA-seq datasets and outputs batch-corrected values for downstream analysis.
+**Recent update: New functions which allow users to filter and download data in DISCO (), comprising 5200+ single-cell samples!**
 ## Requirement
+
 FastIntegration requires the following packages:
-* [R](https://www.r-project.org/) (>= 4.0.0)
-* [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html) (>= 4.0.0)
-* [SeuratObject](https://cran.r-project.org/web/packages/SeuratObject/index.html) (>= 4.0.0)
-* [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html)
-* [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html)
-* [tictoc](https://cran.r-project.org/web/packages/tictoc/index.html)
-* [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html)
-* [pbmcapply](https://cran.r-project.org/web/packages/pbmcapply/index.html)
+- [R](https://www.r-project.org/) (\>= 4.0.0)
+- [Seurat](https://cran.r-project.org/web/packages/Seurat/index.html) (\>= 4.0.0)
+- [SeuratObject](https://cran.r-project.org/web/packages/SeuratObject/index.html) (\>= 4.0.0)
+- [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html)
+- [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html)
+- [tictoc](https://cran.r-project.org/web/packages/tictoc/index.html)
+- [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html)
+- [pbmcapply](https://cran.r-project.org/web/packages/pbmcapply/index.html)
 We highly recommend you to build R with openblas which will accelerate integration 2-3x times. Here is the common way to do it:
-sudo yum install -y openblas openblas-threads openblas-openmp # for centos
+sudo yum install -y openblas openblas-threads openblas-openmp \# for centos
+
+sudo apt-get install libopenblas-dev \# for debian
-sudo apt-get install libopenblas-dev # for debian
+./configure --enable-R-shlib --enable-byte-compiled-packages --enable-BLAS-shlib --enable-memory-profiling
-./configure --enable-R-shlib --enable-byte-compiled-packages \
+--with-blas="-lopenblas"
-    --enable-BLAS-shlib --enable-memory-profiling \
-
-    --with-blas="-lopenblas"
-
 ## Installation
-```R
+``` r
 devtools::install_github("git@github.com:JinmiaoChenLab/FastIntegrate.git")
 ```
 ## Usage
 ### Preprocess
-```R
+
+``` r
 library(Seurat)
 library(pbmcapply)
 rna.list = readRDS("rna_list.rds") # read list of Seurat object, each element in list is a sample
@@ -50,15 +52,13 @@ for (i in 1:length(rna.list)) {
   rna.list[[i]] = FindVariableFeatures(rna.list[[i]])
   rna.list[[i]] = RenameCells(rna.list[[i]], new.names = paste0(Cells(rna.list[[i]]), "--", i))
 }
-
 ```
-
-
 ### Onestop function
-For large sample size (> 200 samples), we recommend to use step by step integration.
-```R
+For large sample size (\> 200 samples), we recommend to use step by step integration.
+
+``` r
 library(FastIntegration)
 # rna.list is the list of seurat object
 data = OneStopIntegration(
@@ -66,11 +66,11 @@ data = OneStopIntegration(
   tmp.dir = "./test/",
   max.cores = 30
 )
-
 ```
 ### Step by step integration
-```R
+
+``` r
 library(Seurat)
 library(pbmcapply)
 library(FastIntegration)
@@ -94,14 +94,13 @@ pbmclapply(
     rna.integrated = FastIntegration(tmp.dir = "./", npcs = 1:30, slot = "data", features.to.integrate = genes[idx[[i]]])
     saveRDS(rna.integrated, paste0("FastIntegrationTmp/inte/inte_", i, ".rds"), compress = F)
-  }, mc.cores = 20
+  }, mc.cores = 20
 )
-
 ```
-
 ### After integration
-```R
+
+``` r
 ##### create Seurat obj with the variable features of integration (For very big dataset) #####
 features = readRDS("FastIntegrationTmp/others/features.rds")
 rna.data = pbmclapply(
@@ -137,13 +136,27 @@ rna.data = RunPCA(rna.data, features = features)
 rna.data = FindNeighbors(rna.data, dims = 1:50)
 rna.data = FindClusters(rna.data, resolution = 0.5, algorithm = 2)
 rna.data = RunUMAP(rna.data, dims = 1:50)
-
 ```
+### Download data from DISCO
+
+``` r
+##### Filter samples and get metadata #####
+# You can filter samples by their different headers: tissue, disease, platform, project.id, sample.id, sample.type
+# For each header, you can select multiple items as follows:
+meta = FindSampleByMetadata(tissue = c("blood", "kidney"))
+
+##### Download sample #####
+# You can further filter cells by specifying expressed or unexpressed genes.
+# dir is the location where the files are saved
+DownloadDiscoData(meta, expressed.gene = c("CD3E"), unexpressed.gene = c("CD8A"), dir = "./disco") # mostly CD4 T cells (CD3E+CD8A-)
+
+```
 ## Usage Scenario
+
 We have apply FastIntegration to [DISCO](http://www.immunesinglecell.org/) database for integrating thousands of samples.
 ## License
-All other code in this repository is licensed under a [GPL-3](https://www.r-project.org/Licenses/GPL-3) license.
+All other code in this repository is licensed under a [GPL-3](https://www.r-project.org/Licenses/GPL-3) license.
diff --git a/src/FastIntegration.so b/src/FastIntegration.so
index 87595d9..075c3f4 100755
Binary files a/src/FastIntegration.so and b/src/FastIntegration.so differ
diff --git a/src/RcppExports.o b/src/RcppExports.o
index aee2f85..0482faa 100644
Binary files a/src/RcppExports.o and b/src/RcppExports.o differ
diff --git a/src/inetgration.o b/src/inetgration.o
index ce1441b..04d7e77 100644
Binary files a/src/inetgration.o and b/src/inetgration.o differ