-
Hi all, The short story is that I'm trying to learn something about branch lengths under selection, so I've simulated some data in slim, then kept the tree seq, and I'm importing it into tskit in python. I'm by no means a tskit expert (and honestly I'm a pretty bad python coder in general, since R is more my speed), as I've hardly used it at all, but what I'm doing is essentially this:
which may not be optimal, but runs fast enough for my purposes. My logic here is that the documentation says that the the node member of mutation is the first node "below" the mutation, which I take to mean the node closer in time to the present. Then, I figure out what edges have that node as a child (which again, I take the child of the edge to be the node closer to the present) and also cover the position of the mutation. Then I'm just trying to calculate the branch length by taking the difference in the parent and child times. The problem is that it doesn't work! I often end up with the edge object being empty, and it seems to be one of two problems:
Clearly this means I'm misunderstanding something about how this data is stored, and if you could give me a hand I'd really appreciate it! I can't seem to upload the .trees file since it's not a supported file type, but happy to share it and the slim code that generated it if necessary. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 5 replies
-
Here's one approach: import msprime
import numpy as np
import tskit
ts = msprime.sim_ancestry(samples=100, sequence_length=1e6, population_size=1e4, recombination_rate=1e-8)
ts = msprime.sim_mutations(ts, rate=1.25e-8)
branch_lengths = np.full(ts.num_mutations, np.nan)
for m in ts.mutations():
if m.edge != tskit.NULL: # skip unmapped mutations; these'll be nan in "branch_lengths"
p, c = ts.edges_parent[m.edge], ts.edges_child[m.edge]
branch_lengths[m.id] = ts.nodes_time[p] - ts.nodes_time[c] |
Beta Was this translation helpful? Give feedback.
Here's one approach: