Merge pull request #8 from 4dn-dcic/dev
Dev
SooLee authored Apr 2, 2017
2 parents f3637c3 + a8a8dc3 commit 81a0d71
Showing 23 changed files with 334 additions and 166 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -0,0 +1,2 @@
^.*\.Rproj$
^\.Rproj\.user$
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: Rpairix
Title: Rpairix
-Version: 0.0.3
+Version: 0.0.4
Authors@R: person("Soo", "Lee", email = "[email protected]", role = c("aut", "cre"))
Description: R binder for pairix, tool for querying a pair of genomic ranges in a pairs file (pairix-indexed bgzipped text file)
Depends:
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -1,5 +1,6 @@
# Generated by roxygen2: do not edit by hand

export(px_exists)
export(px_keylist)
export(px_query)
export(px_seq1list)
@@ -9,3 +10,4 @@ useDynLib(Rpairix,get_keylist)
useDynLib(Rpairix,get_keylist_size)
useDynLib(Rpairix,get_lines)
useDynLib(Rpairix,get_size)
useDynLib(Rpairix,key_exists)
30 changes: 30 additions & 0 deletions R/px_exists.R
@@ -0,0 +1,30 @@
#' Check function on pairix-indexed pairs file.
#'
#' This function allows you to check if a key (chr for 1D, chr pair for 2D) exists in a pairs file.
#'
#' @param filename a pairs file, or a bgzipped text file (sometextfile.gz) with an index file sometextfile.gz.px2 in the same folder.
#' @param key a pair of chromosomes in the query string format (e.g. "chr1|chr2"), or a chromosome for a 1D-indexed pairs file (e.g. "chr1").
#'
#' @keywords pairix check
#' @export px_exists
#' @examples
#' filename = system.file(".","test_4dn.pairs.gz", package="Rpairix")
#' key = "chrX|chrX"
#' res = px_exists(filename, key)
#' print(res)
#'
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz",package="Rpairix")
#' key = "10|20"
#' res = px_exists(filename, key)
#' print(res)
#'
#' filename = system.file(".","merged_nodups.space.chrblock_sorted.subsample1.txt.gz",
#' package="Rpairix")
#' key = "10|20"
#' res = px_exists(filename, key)
#' print(res)
#' @useDynLib Rpairix key_exists
px_exists<-function(filename, key){
out = .C("key_exists", filename, key, as.integer(0))
return(out[[3]][1])
}
9 changes: 8 additions & 1 deletion R/px_keylist.R
@@ -7,12 +7,19 @@
#' @keywords pairix query 2D
#' @export px_keylist
#' @examples
#' filename = system.file(".","test_4dn.pairs.gz", package="Rpairix")
#' res = px_keylist(filename)
#' print(res)
#'
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz",package="Rpairix")
#' res = px_keylist(filename)
#' print(res)
#'
#' filename = system.file(".","merged_nodups.space.chrblock_sorted.subsample1.txt.gz",
#' package="Rpairix")
#'
#' res = px_keylist(filename)
#' print(res)
#'
#' @useDynLib Rpairix get_keylist_size get_keylist
px_keylist<-function(filename){
# first-round, get the max length and the number of items in the key list.
16 changes: 12 additions & 4 deletions R/px_query.R
@@ -10,20 +10,28 @@
#' @keywords pairix query 2D
#' @export px_query
#' @examples
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz",package="Rpairix")
#' filename = system.file(".","test_4dn.pairs.gz", package="Rpairix")
#' querystr = "chrX|chrX"
#' res = px_query(filename, querystr)
#' print(res)
#'
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz", package="Rpairix")
#' querystr = "10:1-1000000|20"
#' res = px_query(filename,querystr)
#' res = px_query(filename, querystr)
#' print(res)
#'
#' filename = system.file(".","merged_nodups.space.chrblock_sorted.subsample1.txt.gz",
#' package="Rpairix")
#' querystr = "10:1-1000000|20"
#' res = px_query(filename,querystr)
#' res = px_query(filename, querystr)
#' print(res)
#'
#' @useDynLib Rpairix get_size get_lines
px_query<-function(filename, querystr, max_mem=100000000, stringsAsFactors=FALSE){

# first-round, get the max length and the number of lines of the result.
out =.C("get_size", filename, querystr, as.integer(0), as.integer(0), as.integer(0))
if(out[[5]][1] == -1 ) return(NULL) ## error
if(out[[5]][1] == -1 ) { message("Can't open input file"); return(NULL) } ## error
str_len = out[[4]][1]
n=out[[3]][1]
total_size = str_len * n
5 changes: 5 additions & 0 deletions R/px_seq1list.R
@@ -7,8 +7,13 @@
#' @keywords pairix query 2D
#' @export px_seq1list
#' @examples
#' filename = system.file(".","test_4dn.pairs.gz", package="Rpairix")
#' res = px_seq1list(filename)
#' print(res)
#'
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz",package="Rpairix")
#' res = px_seq1list(filename)
#' print(res)
px_seq1list<-function(filename){
seqpairs = px_keylist(filename)
seq1_list = unique(sapply(seqpairs,function(xx)strsplit(xx,'|',fixed=T)[[1]][1]))
5 changes: 5 additions & 0 deletions R/px_seq2list.R
@@ -7,8 +7,13 @@
#' @keywords pairix query 2D
#' @export px_seq2list
#' @examples
#' filename = system.file(".","test_4dn.pairs.gz", package="Rpairix")
#' res = px_seq2list(filename)
#' print(res)
#'
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz",package="Rpairix")
#' res = px_seq2list(filename)
#' print(res)
px_seq2list<-function(filename){
seqpairs = px_keylist(filename)
seq2_list = unique(sapply(seqpairs,function(xx)strsplit(xx,'|',fixed=T)[[1]][2]))
5 changes: 5 additions & 0 deletions R/px_seqlist.R
@@ -7,8 +7,13 @@
#' @keywords pairix query 2D
#' @export px_seqlist
#' @examples
#' filename = system.file(".","test_4dn.pairs.gz", package="Rpairix")
#' res = px_seqlist(filename)
#' print(res)
#'
#' filename = system.file(".","merged_nodup.tab.chrblock_sorted.txt.gz",package="Rpairix")
#' res = px_seqlist(filename)
#' print(res)
px_seqlist<-function(filename){
seqpairs = px_keylist(filename)
seq1_list = unique(sapply(seqpairs,function(xx)strsplit(xx,'|',fixed=T)[[1]][1]))
75 changes: 45 additions & 30 deletions README.md
@@ -18,7 +18,7 @@ R --no-site-file --no-environ --no-save --no-restore CMD INSTALL --install-tests
To install a specific version,
```
library(devtools)
-install_url("https://github.com/4dn-dcic/Rpairix/archive/0.0.3.zip")
+install_url("https://github.com/4dn-dcic/Rpairix/archive/0.0.4.zip")
```


@@ -72,43 +72,42 @@ px_seq2list(filename)
* The filename is sometextfile.gz and an index file sometextfile.gz.px2 must exist.
* The return value is a vector of second chromosomes.
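
As a rough illustration of how the first and second chromosomes relate to the keys, the sketch below splits a few keys on `|`, mirroring the `strsplit(..., fixed=T)` call used in `px_seq1list` and `px_seq2list`. It does not call Rpairix: the `keys` vector is an illustrative stand-in for the output of `px_keylist`, and the variable names are assumptions.

```r
# Illustrative keys in the "chr1|chr2" format (a stand-in for px_keylist() output).
keys <- c("chr1|chr1", "chr1|chr10", "chr10|chr20")

# fixed = TRUE treats "|" as a literal character, not as regex alternation.
parts <- strsplit(keys, "|", fixed = TRUE)

# First and second chromosome of every key, de-duplicated in order of appearance.
seq1 <- unique(vapply(parts, function(p) p[1], character(1)))
seq2 <- unique(vapply(parts, function(p) p[2], character(1)))

print(seq1)
print(seq2)
```

Without `fixed = TRUE`, `"|"` would be read as a regular expression and the keys would be split into single characters, which is presumably why the package sources pass that argument.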

### Check if a chromosome pair (or chromosome, for 1D) exists
```
px_exists(filename, key)
```
* The filename is sometextfile.gz and an index file sometextfile.gz.px2 must exist.
* The key is a chromosome pair (or a single chromosome for 1D).
* The return value is 1 (exists), 0 (does not exist), or -1 (error).
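
Because the result is a three-way integer code rather than a logical, callers may want to separate "key absent" from "error". The helper below is a minimal sketch of one way to do that; `px_code_to_logical` is a hypothetical name and is not part of Rpairix.

```r
# Hypothetical helper (not part of Rpairix): map the documented px_exists()
# codes to TRUE / FALSE, and stop on the error code.
px_code_to_logical <- function(code) {
  if (code == 1) return(TRUE)    # key exists
  if (code == 0) return(FALSE)   # key does not exist
  stop("px_exists() reported an error (code ", code, ")")
}

# With Rpairix installed, usage might look like:
#   px_code_to_logical(px_exists(filename, "chr10|chr20"))
found  <- px_code_to_logical(1L)
absent <- px_code_to_logical(0L)
print(c(found, absent))
```

Stopping on -1 keeps an unreadable file from being silently treated as a missing key.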

## Example run
```
> library(Rpairix)
-> filename = "inst/merged_nodup.tab.chrblock_sorted.txt.gz"
-> querystr = "10:1-1000000|20"
+> filename = "inst/test_4dn.pairs.gz"
+> querystr = "chr10:1-3000000|chr20"
> res = px_query(filename,querystr)
> print(res)
-V1 V2 V3 V4 V5 V6 V7 V8
-1 0 10 624779 1361 0 20 40941397 97868
-2 16 10 948577 2120 16 20 59816485 148396
->
-> keys = px_keylist("inst/merged_nodup.tab.chrblock_sorted.txt.gz")
+V1 V2 V3 V4 V5 V6 V7
+1 SRR1658581.51740952 chr10 157600 chr20 167993 - -
+2 SRR1658581.33457260 chr10 2559777 chr20 7888262 - +
+> keys = px_keylist(filename)
> length(keys)
-[1] 1239
+[1] 800
> keys[1:10]
-[1] "1|1" "1|10" "1|11" "1|12" "1|13" "1|14" "1|15" "1|16" "1|17" "1|18"
->
->chrs = px_seqlist("inst/merged_nodup.tab.chrblock_sorted.txt.gz")
->chrs
-[1] "1" "10" "11" "12" "13"
-[6] "14" "15" "16" "17" "18"
-[11] "19" "2" "20" "21" "22"
-[16] "3" "4" "5" "6" "7"
-[21] "8" "9" "GL000191.1" "GL000192.1" "GL000193.1"
-[26] "GL000194.1" "GL000195.1" "GL000196.1" "GL000197.1" "GL000198.1"
-[31] "GL000199.1" "GL000200.1" "GL000201.1" "GL000202.1" "GL000203.1"
-[36] "GL000204.1" "GL000205.1" "GL000206.1" "GL000208.1" "GL000209.1"
-[41] "GL000210.1" "GL000211.1" "GL000212.1" "GL000213.1" "GL000214.1"
-[46] "GL000215.1" "GL000216.1" "GL000217.1" "GL000218.1" "GL000219.1"
-[51] "GL000220.1" "GL000221.1" "GL000222.1" "GL000223.1" "GL000224.1"
-[56] "GL000225.1" "GL000226.1" "GL000227.1" "GL000228.1" "GL000229.1"
-[61] "GL000230.1" "GL000231.1" "GL000232.1" "GL000233.1" "GL000234.1"
-[66] "GL000235.1" "GL000236.1" "GL000237.1" "GL000238.1" "GL000239.1"
-[71] "GL000240.1" "GL000241.1" "GL000242.1" "GL000243.1" "GL000244.1"
-[76] "GL000245.1" "GL000246.1" "GL000247.1" "GL000248.1" "GL000249.1"
-[81] "MT" "NC_007605" "X" "Y"
+[1] "chr1|chr1" "chr1|chr10" "chr1|chr11"
+[4] "chr1|chr12" "chr1|chr13" "chr1|chr14"
+[7] "chr1|chr15" "chr1|chr16" "chr1|chr17"
+[10] "chr1|chr17_ctg5_hap1"
+> chrs = px_seqlist(filename)
+> length(chrs)
+[1] 82
+> chrs[1:10]
+[1] "chr1" "chr1_gl000191_random" "chr1_gl000192_random"
+[4] "chr10" "chr11" "chr11_gl000202_random"
+[7] "chr12" "chr13" "chr14"
+[10] "chr15"
+> px_exists(filename, "chr10|chr20")
+[1] 1
```
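
The query strings in the example run follow the pattern `chr:start-end|chr`. As a small sketch, such a string could be assembled programmatically and the `NULL` that `px_query` returns on error checked before use. `make_querystr` is a hypothetical helper, not an Rpairix function, and the `px_query` call is left commented out because it needs the package and an indexed file.

```r
# Hypothetical helper (not part of Rpairix): build a 2D query string of the
# form "chr1:start-end|chr2", as used in the example run.
make_querystr <- function(chr1, start1, end1, chr2) {
  sprintf("%s:%d-%d|%s", chr1, as.integer(start1), as.integer(end1), chr2)
}

querystr <- make_querystr("chr10", 1, 3000000, "chr20")
print(querystr)

# With Rpairix installed, the result could then be guarded like this:
#   res <- px_query(filename, querystr)
#   if (is.null(res)) stop("query failed; check the file and its .px2 index")
```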


@@ -122,3 +121,19 @@ document()
Individual R functions are written and documented in `R/`. The `src/rpairixlib.c` is the main C source file. Raw data files are under `inst/`.


## Version history
### 0.0.4
* Function px_exists is now added.
* Source is synced with pairix/pypairix 0.1.1.
* 4dn pairs example is added

### 0.0.3
* corrected a typo in README

### 0.0.2
* cleaned up repo
* added more instructions in README

### 0.0.1
* initial release

2 changes: 1 addition & 1 deletion Rpairix.Rproj
@@ -1,4 +1,4 @@
-Version: 1.0
+Version: 0.0.4

RestoreWorkspace: No
SaveWorkspace: No
Binary file added inst/test_4dn.pairs.gz
Binary file not shown.
Binary file added inst/test_4dn.pairs.gz.px2
Binary file not shown.
36 changes: 36 additions & 0 deletions man/px_exists.Rd

9 changes: 8 additions & 1 deletion man/px_keylist.Rd

14 changes: 11 additions & 3 deletions man/px_query.Rd

5 changes: 5 additions & 0 deletions man/px_seq1list.Rd

5 changes: 5 additions & 0 deletions man/px_seq2list.Rd

5 changes: 5 additions & 0 deletions man/px_seqlist.Rd

Binary file removed src/Rpairix.so
Binary file not shown.