Good question. It's totally reasonable to compare the two. What that section means is this: Fst = X / Y, where X and Y are quantities computed from the genotypes. Now let X' and Y' be the corresponding branch stats; these are defined so that E[X] = X' and E[Y] = Y', where the expectations are conditioned on the trees. So you would reasonably expect X' / Y' to be close to Fst computed from the genotypes; however, the expectations are not equal: it is not true that E[Fst] = X' / Y', because the expectation of a ratio is not in general the ratio of the expectations. In other words, even though X and Y are unbiased estimators of X' and Y', Fst is a biased estimator of X' / Y', although the bias goes away as the size of the window increases. (Does this count as "simpler language"?)
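If it helps, here is a toy sketch of the same point in plain Python. It is not tskit code and not how Fst is actually computed: X and Y are just made-up, independent noisy averages of n pieces (n stands in very loosely for window size), with hypothetical target values X' = 1 and Y' = 4. Each of X and Y is unbiased for its target, but the average of X / Y overshoots X' / Y', and the overshoot shrinks as n grows.

```python
import random

def mean_ratio(n, x_true=1.0, y_true=4.0, reps=20000, seed=1):
    """Average of X / Y over many replicates (toy model, not tskit)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # The mean of n independent exponentials with mean m is distributed as
        # Gamma(shape=n, scale=m / n), so X and Y are unbiased for x_true, y_true.
        x = rng.gammavariate(n, x_true / n)
        y = rng.gammavariate(n, y_true / n)
        total += x / y
    return total / reps

target = 1.0 / 4.0  # X' / Y' in this toy setup
results = {n: mean_ratio(n) for n in (5, 50, 500)}
for n, est in results.items():
    print(f"n={n:4d}  mean(X/Y)={est:.4f}  target X'/Y'={target:.4f}")
```

In this particular toy model the expected ratio works out to (X' / Y') * n / (n - 1), so the bias is visible at n = 5 and nearly gone by n = 500. In real data X and Y are correlated, so the size and even the sign of the bias can differ; the sketch only shows that unbiased numerator and denominator do not give an unbiased ratio.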
In the docs it says: "Most statistics have the property that mode="branch" and mode="site" are “dual” in the sense that they are equal, on average, under a high neutral mutation rate. Fst() and Tajimas_D() do not have this property (since both are ratios of statistics that do have this property)."
I don't quite understand what's being said here. I think it's reasonable to compare the branch-length version of Fst (across the entire genome) with the sitewise one (and in fact, you don't need to multiply by a mutation rate, because that cancels out on the top and bottom). The duality paper talks about how these ratio-based statistics aren't additive across windows, which I can see is the case. But could someone explain to me in simpler language what the paragraph above actually means?
Here, for example, is something I'd like to use in the docs to illustrate the reduced variance when using mode="branch". I think this is reasonable, isn't it?