Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SignedGenesets gives NaNs for valType %in% c("r", "f") #31

Open
idavydov opened this issue Aug 12, 2021 · 4 comments
Open

SignedGenesets gives NaNs for valType %in% c("r", "f") #31

idavydov opened this issue Aug 12, 2021 · 4 comments

Comments

@idavydov
Copy link
Contributor

Hi @Accio ,

I noticed that wmwTest() on SignedGenesets returns NaNs for valTypes "r" and "f". Not sure if that is intentional.

m <- matrix(c(
  0, 0, 0, 0, 0, 5,
  0, 0, 0, 0, 5, 5,
  0, 0, 0, 5, 5, 5,
  5, 5, 0, 0, 0, 0,
  5, 0, 0, 0, 0, 0
), byrow=TRUE, nrow=5)

rownames(m) <- paste0("g", 1:5)
colnames(m) <- paste0("s", 1:6)

sign <- BioQC::SignedGenesets(list(
  list(name="grows left to right", positive = c("g1", "g2", "g3"), negative = c()),
  list(name="grows left to right with neg", positive = c("g1", "g2", "g3"), negative = c("g4", "g5"))
))

BioQC::wmwTest(m, sign, valType="U1")
#>                              s1   s2 s3 s4 s5 s6
#> grows left to right           0  1.5  3  4  5  6
#> grows left to right with neg -6 -3.0  0  2  4  6
BioQC::wmwTest(m, sign, valType="r")
#>                                s1   s2  s3        s4        s5  s6
#> grows left to right            -1 -0.5   0 0.3333333 0.6666667   1
#> grows left to right with neg -Inf -Inf NaN       Inf       Inf Inf
BioQC::wmwTest(m, sign, valType="f")
#>                                s1   s2  s3        s4        s5  s6
#> grows left to right             0 0.25 0.5 0.6666667 0.8333333   1
#> grows left to right with neg -Inf -Inf NaN       Inf       Inf Inf

Created on 2021-08-12 by the reprex package (v2.0.1)

@idavydov
Copy link
Contributor Author

Interestingly, valType="p.greater" always returns either 1 or 0. Not sure if that's correct here.

@idavydov
Copy link
Contributor Author

Ok, I think I got it. nBg = nTotal-nInds, in grows left to right with neg all genes are included, so nBg is zero.

Then:

res = u1 / nInds / nBg;

Which leads to Inf. Ok; maybe fair enough for "r" and "f".

But for p-values this leads to overly optimistic/pessimistic estimates.Basically, p-values will be always 0 or 1.

This probably means that p-values are "too extreme" in all the cases.

Maybe in this case the correct way would be to compute two different p-values and use something like Fisher's method to aggregate them?

@idavydov
Copy link
Contributor Author

Here's an example. With totally random signature covering 95 genes out of 100, we always get a skewed p-value distibution:

set.seed(42)
ngenes <- 100
nsamples <- 1000
m <- matrix(rnorm(ngenes * nsamples), ncol=nsamples)
rownames(m) <- paste0("g", seq_len(nrow(m)))
colnames(m) <- paste0("s", seq_len(ncol(m)))

# using all the genes except for the last five
big_signature <- sample(c("pos", "neg"), ngenes - 5, replace = TRUE)

random_sign <- BioQC::SignedGenesets(list(
  list(
    name = "random signature",
    pos = paste0("g", which(big_signature == "pos")),
    neg = paste0("g", which(big_signature == "neg"))
  )
))

hist(BioQC::wmwTest(m, random_sign, valType="p.less"))

Created on 2021-08-12 by the reprex package (v2.0.1)

@idavydov
Copy link
Contributor Author

I think one way to resolve this problem is the following:

  1. We compute U for positive genes, and -U for negative genes.
  2. Then we convert both values to z-scores (z_pos and z_neg). z = (U - m_u) / sigma_U. Usually m_u = n1 * n2, but perhaps we would want to exclude negative class when computing z_pos and vice versa. Same goes for n1 and n2 in sigma computations with tie correction.
  3. We can compute z = (z_pos + z_neg) / sqrt(2), since sum of two normal distribution is a normal distribution.
  4. Finally, we can compute a p-value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant