-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grid restrictions #25
Comments
Hey, thanks for getting in touch.
Another option to consider might be a nearest neighbor Gaussian process (see here for details). The Fourier methods are generally quite a bit faster though (see Figure 2 on page 13 here). I saw you work for HMS. I'm just across the road at HSPH. Happy to meet up if you're around on campus. |
Hi there, Ah okay, the suggestion of fill-in points to convert it to a rectangular grid makes perfect sense. I take it then that rectangular (as opposed to perfectly square) grids are okay? The number of spots in a sample can go up to 25k, though 95% are less than 5k (the sample shown has 1.4k). We have a large number (500+) of correlated, ~negative binomially distributed features for each spot. My hope is to model all of these features plus some other complicating factors using joint correlated GPs, though I'm unsure if this is computationally feasible. You can see why FFT methods are appealing here given the spots lie on a nice grid. Nearest neighbor GPs are on the table too, but my guess is that they're suboptimal because there can be long-range correlations in the data -- a given tissue strata might span the whole width of the sample, which can create long bands of highly similar expression patterns across many spots. The covariance matrix might not be all that sparse. I'll need a week or so to play around with this, but yes a meet up would be great. My office is in Countway so I think we're right next door. I'll send an email to your HSPH address to schedule. |
Yes, in practice there are two ways to deal with the non-square grid.
parameters {
real<lower=0> length_scale;
}
transformed parameters {
vector<lower=0> [2] length_scales = (2 * length_scale, length_scale)';
} The factor of two accounts for the oversampling in the horizontal direction. The grid is more densely spaced in this direction so, if we "rescaled" things to get a square grid, the correlation length has to be longer by a factor of two. I think that's right, but I always get myself confused about where the factor of two should go. One way to double check would be to sample from the GP prior and make sure the posterior samples look as expected.
Yes, that scale may be tricky. How many correlated GPs are you using and what's the correlation structure? If Stan isn't able to handle the scale, variational inference using numpyro may be an alternative. The components for FFT GPs are there (see e.g., pyro-ppl/numpyro#1762), but there's a bit of legwork to be done. Stan also implements variational inference, but it's a bit brittle.
Sounds great! |
Hi there, I'd like to make use of this package on a spatial transcriptomics dataset and wanted to ask about whether it could work with my data. The two details I'm wondering about are:
gp_inv_rfft
and friends work on any regularly spaced pattern or does it specifically require points that lie on the corners of tiled squares?The plot of measured points below demonstrates both of these issues. I'd appreciate any help you could provide. I started to dig into the math because it feels like both of these problems should be solvable in theory, but I thought I'd ask since I imagine it will be immediately obvious to the authors if this could be made to work or not. Thanks!
The text was updated successfully, but these errors were encountered: