AAStringSet support
missuse committed Oct 14, 2021
1 parent 1022fc4 commit a41306b
Showing 33 changed files with 382 additions and 64 deletions.
10 changes: 8 additions & 2 deletions DESCRIPTION
@@ -17,14 +17,17 @@ Description: Identification of plant hydroxyproline rich glycoprotein sequences
Depends: R (>= 3.3.0)
License: file LICENSE
LazyData: true
-Imports: xgboost (>= 1.1.1.1),
+Imports:
+xgboost (>= 1.1.1.1),
stringr (>= 1.2.0),
httr (>= 1.3.1),
xml2 (>= 1.1.1),
seqinr(>= 3.3),
utils (>= 3.3),
graphics (>= 3.3),
-ggplot2 (>= 2.2.1)
+ggplot2 (>= 2.2.1),
+Biostrings (>= 2.6),
+BiocGenerics (>= 0.3)
RoxygenNote: 7.1.1
Suggests:
protr,
@@ -37,3 +40,6 @@ URL: https://github.com/missuse/ragp
BugReports: https://github.com/missuse/ragp/issues
Encoding: UTF-8
VignetteBuilder: knitr
+Remotes:
+bioc::release/Biostrings
+bioc::release/BiocGenerics
14 changes: 14 additions & 0 deletions NAMESPACE
@@ -1,57 +1,71 @@
# Generated by roxygen2: do not edit by hand

+S3method(get_big_pi,AAStringSet)
S3method(get_big_pi,character)
S3method(get_big_pi,data.frame)
S3method(get_big_pi,default)
S3method(get_big_pi,list)
+S3method(get_espritz,AAStringSet)
S3method(get_espritz,character)
S3method(get_espritz,data.frame)
S3method(get_espritz,default)
S3method(get_espritz,list)
+S3method(get_hmm,AAStringSet)
S3method(get_hmm,character)
S3method(get_hmm,data.frame)
S3method(get_hmm,default)
S3method(get_hmm,list)
+S3method(get_netGPI,AAStringSet)
S3method(get_netGPI,character)
S3method(get_netGPI,data.frame)
S3method(get_netGPI,default)
S3method(get_netGPI,list)
+S3method(get_phobius,AAStringSet)
S3method(get_phobius,character)
S3method(get_phobius,data.frame)
S3method(get_phobius,default)
S3method(get_phobius,list)
+S3method(get_pred_gpi,AAStringSet)
S3method(get_pred_gpi,character)
S3method(get_pred_gpi,data.frame)
S3method(get_pred_gpi,default)
S3method(get_pred_gpi,list)
+S3method(get_signalp,AAStringSet)
S3method(get_signalp,character)
S3method(get_signalp,data.frame)
S3method(get_signalp,default)
S3method(get_signalp,list)
+S3method(get_signalp5,AAStringSet)
S3method(get_signalp5,character)
S3method(get_signalp5,data.frame)
S3method(get_signalp5,default)
S3method(get_signalp5,list)
+S3method(get_targetp,AAStringSet)
S3method(get_targetp,character)
S3method(get_targetp,data.frame)
S3method(get_targetp,default)
S3method(get_targetp,list)
+S3method(get_tmhmm,AAStringSet)
S3method(get_tmhmm,character)
S3method(get_tmhmm,data.frame)
S3method(get_tmhmm,default)
S3method(get_tmhmm,list)
+S3method(maab,AAStringSet)
S3method(maab,character)
S3method(maab,data.frame)
S3method(maab,default)
S3method(maab,list)
+S3method(predict_hyp,AAStringSet)
S3method(predict_hyp,character)
S3method(predict_hyp,data.frame)
S3method(predict_hyp,default)
S3method(predict_hyp,list)
+S3method(scan_ag,AAStringSet)
S3method(scan_ag,character)
S3method(scan_ag,data.frame)
S3method(scan_ag,default)
S3method(scan_ag,list)
+S3method(scan_nglc,AAStringSet)
S3method(scan_nglc,character)
S3method(scan_nglc,data.frame)
S3method(scan_nglc,default)
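
The `S3method(generic,AAStringSet)` registrations above rely on S3 dispatch picking a method by the class of the first argument, which also works when that argument is an S4 object such as an `AAStringSet`. The following is only a minimal sketch of that mechanism, not code from the package: `describe_input` and `ToySeqSet` are hypothetical stand-ins for a ragp generic and for the Biostrings class, so the example runs without Biostrings installed.

```r
# Hypothetical generic standing in for get_big_pi(), maab(), etc.
describe_input <- function(data, ...) UseMethod("describe_input")

# Methods for existing input types, mirroring the character/default entries.
describe_input.character <- function(data, ...) "character method"
describe_input.default <- function(data, ...) "default method"

# Hypothetical S4 class standing in for Biostrings' AAStringSet.
methods::setClass("ToySeqSet", slots = c(seqs = "character"))

# An S3 method named after the S4 class is picked up by UseMethod().
describe_input.ToySeqSet <- function(data, ...) "ToySeqSet method"

toy <- methods::new("ToySeqSet", seqs = c(a = "MKV"))
res_s4 <- describe_input(toy)    # dispatches on the S4 class name
res_chr <- describe_input("MKV") # dispatches to the character method
res_num <- describe_input(1)     # falls through to the default method
```

In a script the methods are found by name in the calling environment; inside a package, the `S3method()` directives in NAMESPACE are what make them visible to the generic.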
5 changes: 3 additions & 2 deletions NEWS.md
@@ -6,8 +6,9 @@ New Features

* added new function `get_signalp5()` which queries SignalP5 web server (http://www.cbs.dtu.dk/services/SignalP)
* added new function `get_tmhmm()` which queries TMHMM v. 2.0 web server (http://www.cbs.dtu.dk/services/TMHMM/)
-* `plot_prot()` `nsp` argument can now be `"signalp"`, `"signalp5"` or `"none"`. Default is `"signalp5"`. This argument determines if `get_signalp()` or `get_signalp5()` are used for N-sp prediction. **BREAKING CHANGE**
-* `plot_prot()` `tm` argument can now be `"phobius"`, `"tmhmm"` or `"none"`. Default is `"phobius"`. This argument determines if `get_phobius()` or `get_tmhmm()` are used for TM prediction. **BREAKING CHANGE**
+* `plot_prot()` `nsp` argument can now be `"signalp"`, `"signalp5"` or `"none"`. Default is `"signalp5"`. This argument determines if `get_signalp()` or `get_signalp5()` are used for N-sp prediction.
+* `plot_prot()` `tm` argument can now be `"phobius"`, `"tmhmm"` or `"none"`. Default is `"phobius"`. This argument determines if `get_phobius()` or `get_tmhmm()` are used for TM prediction.
+* all `get_*` and `scan_*` functions, as well as `maab()` now work with `AAStringSet` class objects. #5

Bug Fixes and Improvements
--------------------------
26 changes: 24 additions & 2 deletions R/get_big_pi.R
@@ -2,8 +2,8 @@
#'
#' big-PI Plant Predictor is a web server utilizing a scoring algorithm for prediction of GPI modification sites in plants.
#'
-#' @aliases get_big_pi get_big_pi.default get_big_pi.character get_big_pi.data.frame get_big_pi.list
-#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Should be left blank if vectors are provided to sequence and id arguments.
+#' @aliases get_big_pi get_big_pi.default get_big_pi.character get_big_pi.data.frame get_big_pi.list get_big_pi.AAStringSet
+#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Alternatively an `AAStringSet` object. Should be left blank if vectors are provided to sequence and id arguments.
#' @param sequence An appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id An appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param simplify A boolean indicating the type of returned object, defaults to TRUE.
@@ -502,3 +502,25 @@ get_big_pi.default <- function(data = NULL,
}
return(res_out)
}

+#' @rdname get_big_pi
+#' @method get_big_pi AAStringSet
+#' @export


+get_big_pi.AAStringSet <- function(data,
+...){
+sequence <- as.character(data)
+id <- names(sequence)
+sequence <- unname(sequence)
+sequence <- toupper(sequence)
+sequence <- sub("\\*$",
+"",
+sequence)

+res <- get_big_pi.default(sequence = sequence,
+id = id,
+...)
+return(res)
+}

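The `AAStringSet` method above, repeated for each generic in this commit, reduces the input to an unnamed, upper-case character vector with any trailing `*` (stop symbol) removed, and takes the ids from the element names. Below is a minimal sketch of those steps in base R. It assumes a named character vector as a stand-in for the result of `as.character()` on an `AAStringSet`; `normalize_seqs` and the sequences are made up for illustration and are not part of ragp.

```r
# Stand-in for as.character(<AAStringSet>): a named character vector.
raw <- c(prot1 = "mkvl*", prot2 = "MSTA", prot3 = "MA*LK")

# Hypothetical helper mirroring the body of the AAStringSet methods.
normalize_seqs <- function(x) {
  id <- names(x)                         # ids come from the element names
  sequence <- unname(x)                  # drop names from the sequences
  sequence <- toupper(sequence)          # upper-case the residues
  sequence <- sub("\\*$", "", sequence)  # strip a single trailing "*"
  list(sequence = sequence, id = id)
}

out <- normalize_seqs(raw)
out$sequence  # "MKVL" "MSTA" "MA*LK"
out$id        # "prot1" "prot2" "prot3"
```

Only a terminal `*` is removed, so an internal `*` as in `prot3` is passed on unchanged. If the input carries no names, `id` is `NULL`, and what happens next depends on how the `.default` method handles a missing `id`.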
24 changes: 22 additions & 2 deletions R/get_espritz.R
@@ -2,8 +2,8 @@
#'
#' Espritz web server predicts disordered regions from primary sequence. It utilizes Bi-directional Recursive Neural Networks and can process proteins on a genomic scale with little effort and state-of-the-art accuracy.
#'
-#' @aliases get_espritz get_espritz.default get_espritz.character get_espritz.data.frame get_espritz.list
-#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Should be left blank if vectors are provided to sequence and id arguments.
+#' @aliases get_espritz get_espritz.default get_espritz.character get_espritz.data.frame get_espritz.list get_espritz.AAStringSet
+#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Alternatively an `AAStringSet` object. Should be left blank if vectors are provided to sequence and id arguments.
#' @param sequence A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param model One of c('X-Ray', 'Disprot', 'NMR'), default is 'X-Ray'. Determines the model to be used for prediction. See details.
@@ -406,3 +406,23 @@ get_espritz.default <- function(data = NULL,
res)
return(res)
}

+#' @rdname get_espritz
+#' @method get_espritz AAStringSet
+#' @export

+get_espritz.AAStringSet <- function(data,
+...){
+sequence <- as.character(data)
+id <- names(sequence)
+sequence <- unname(sequence)
+sequence <- toupper(sequence)
+sequence <- sub("\\*$",
+"",
+sequence)

+res <- get_espritz.default(sequence = sequence,
+id = id,
+...)
+return(res)
+}
24 changes: 22 additions & 2 deletions R/get_hmm.R
@@ -3,8 +3,8 @@
#' hmmer web server offers biosequence analysis using profile hidden Markov Models. This function allows searching
#' of a protein sequence vs a profile-HMM database (Pfam-A).
#'
-#' @aliases get_hmm get_hmm.default get_hmm.character get_hmm.data.frame get_hmm.list
-#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Should be left blank if vectors are provided to sequence and id arguments.
+#' @aliases get_hmm get_hmm.default get_hmm.character get_hmm.data.frame get_hmm.list get_hmm.AAStringSet
+#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Alternatively an `AAStringSet` object. Should be left blank if vectors are provided to sequence and id arguments.
#' @param sequence A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param verbose Boolean, whether to print out the output for each sequence, defaults to FALSE.
@@ -556,3 +556,23 @@ get_hmm.default <- function(data = NULL,
rownames(pfam) <- NULL
return(pfam)
}

+#' @rdname get_hmm
+#' @method get_hmm AAStringSet
+#' @export

+get_hmm.AAStringSet <- function(data,
+...){
+sequence <- as.character(data)
+id <- names(sequence)
+sequence <- unname(sequence)
+sequence <- toupper(sequence)
+sequence <- sub("\\*$",
+"",
+sequence)

+res <- get_hmm.default(sequence = sequence,
+id = id,
+...)
+return(res)
+}
24 changes: 22 additions & 2 deletions R/get_netGPI.R
@@ -2,8 +2,8 @@
#'
#' NetGPI server offers GPI Anchor predictions
#'
-#' @aliases get_netGPI get_netGPI.default get_netGPI.character get_netGPI.data.frame get_netGPI.list
-#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Should be left blank if vectors are provided to sequence and id arguments.
+#' @aliases get_netGPI get_netGPI.default get_netGPI.character get_netGPI.data.frame get_netGPI.list get_netGPI.AAStringSet
+#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Alternatively an `AAStringSet` object. Should be left blank if vectors are provided to sequence and id arguments.
#' @param sequence A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param splitter An integer indicating the number of sequences to be in each .fasta file that is to be sent to the server. Defaults to 2500. Change only in case of a server side error. Accepted values are in range of 1 to 5000.
@@ -535,3 +535,23 @@ get_netGPI.default <- function(data = NULL,
res <- get_netGPI.character(data = file_name, ...)
return(res)
}

+#' @rdname get_netGPI
+#' @method get_netGPI AAStringSet
+#' @export

+get_netGPI.AAStringSet <- function(data,
+...){
+sequence <- as.character(data)
+id <- names(sequence)
+sequence <- unname(sequence)
+sequence <- toupper(sequence)
+sequence <- sub("\\*$",
+"",
+sequence)

+res <- get_netGPI.default(sequence = sequence,
+id = id,
+...)
+return(res)
+}
24 changes: 22 additions & 2 deletions R/get_phobius.R
@@ -2,8 +2,8 @@
#'
#' Phobius web server is a combined transmembrane topology and signal peptide (N-sp) predictor. Currently only "normal prediction" of signal peptides is supported by the function.
#'
-#' @aliases get_phobius get_phobius.default get_phobius.character get_phobius.data.frame get_phobius.list
-#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Should be left blank if vectors are provided to sequence and id arguments.
+#' @aliases get_phobius get_phobius.default get_phobius.character get_phobius.data.frame get_phobius.list get_phobius.AAStringSet
+#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Alternatively an `AAStringSet` object. Should be left blank if vectors are provided to sequence and id arguments.
#' @param sequence A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param progress Boolean, whether to show the progress bar, at default set to FALSE.
@@ -305,3 +305,23 @@ get_phobius.default <- function(data = NULL,
...)
return(res)
}

+#' @rdname get_phobius
+#' @method get_phobius AAStringSet
+#' @export

+get_phobius.AAStringSet <- function(data,
+...){
+sequence <- as.character(data)
+id <- names(sequence)
+sequence <- unname(sequence)
+sequence <- toupper(sequence)
+sequence <- sub("\\*$",
+"",
+sequence)

+res <- get_phobius.default(sequence = sequence,
+id = id,
+...)
+return(res)
+}
22 changes: 20 additions & 2 deletions R/get_pred_gpi.R
@@ -2,8 +2,8 @@
#'
#' PredGPI web server is a predictor of GPI modification sites.
#'
-#' @aliases get_pred_gpi get_pred_gpi.default get_pred_gpi.character get_pred_gpi.data.frame get_pred_gpi.list
-#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Should be left blank if vectors are provided to sequence and id arguments.
+#' @aliases get_pred_gpi get_pred_gpi.default get_pred_gpi.character get_pred_gpi.data.frame get_pred_gpi.list get_pred_gpi.AAStringSet
+#' @param data A data frame with protein amino acid sequences as strings in one column and corresponding id's in another. Alternatively a path to a .fasta file with protein sequences. Alternatively a list with elements of class "SeqFastaAA" resulting from \code{\link[seqinr]{read.fasta}} call. Alternatively an `AAStringSet` object. Should be left blank if vectors are provided to sequence and id arguments.
#' @param sequence A vector of strings representing protein amino acid sequences, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param id A vector of strings representing protein identifiers, or the appropriate column name if a data.frame is supplied to data argument. If .fasta file path, or list with elements of class "SeqFastaAA" provided to data, this should be left blank.
#' @param spec Numeric in the 0-1 range, indicating the threshold specificity.
@@ -348,4 +348,22 @@ get_pred_gpi.default <- function(data = NULL,
return(res)
}

+#' @rdname get_pred_gpi
+#' @method get_pred_gpi AAStringSet
+#' @export

+get_pred_gpi.AAStringSet <- function(data,
+...){
+sequence <- as.character(data)
+id <- names(sequence)
+sequence <- unname(sequence)
+sequence <- toupper(sequence)
+sequence <- sub("\\*$",
+"",
+sequence)

+res <- get_pred_gpi.default(sequence = sequence,
+id = id,
+...)
+return(res)
+}