Sample of permutations? #1
Comments
Could you give me an input/output example of the data you are trying to produce?
Here is an example, but there is a detail:

    import numpy as np
    from scipy.stats import bernoulli, norm

    def get_data(
        xsize=500000, ysize=2000, score_type="manual"
    ) -> tuple[np.ndarray, np.ndarray]:
        # Sparse boolean state matrix: each entry is True with probability 0.001
        state = bernoulli.rvs(p=0.001, size=(xsize, ysize))
        state = state.astype(bool)  # type: ignore
        # Pull scores from a distribution
        score = norm.rvs(size=xsize, loc=0.1, scale=0.5)
        score = score**2
        return state, score

I actually have 10k different scores to test. The statistic looks like this:
The permutation test simply compares the statistic from 9999 random 'states' with the statistic from the real data. I discovered that, because my data is quite sparse along each 'y' column, looping through the 2k states and using the Fisher-Yates shuffle on each one is faster. I was trying to permute the whole 2D array, but I actually need to keep each y column separate (each has a different number of True entries, which is the detail I mentioned).
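For readers following along, here is a minimal sketch of the per-column approach described above, in plain NumPy rather than Cython. It is an illustration under assumptions, not the code from this thread: the actual statistic was not posted, so the sum of scores over the True rows is used as a stand-in, and the names `partial_fisher_yates` and `column_p_value` are hypothetical. The idea is that shuffling a sparse boolean column is equivalent to drawing `k` distinct row indices, which a partial Fisher-Yates shuffle does in `k` swaps.

```python
import numpy as np


def partial_fisher_yates(idx, k, rng):
    """Move k uniformly chosen distinct entries of idx into idx[:k].

    idx may start in any order, so the same buffer can be reused
    across permutations without being reset.
    """
    n = idx.shape[0]
    for i in range(k):
        j = rng.integers(i, n)  # uniform over the untouched tail, i <= j < n
        idx[i], idx[j] = idx[j], idx[i]
    return idx[:k]


def column_p_value(state_col, score, n_perm=9999, seed=0):
    """One-sided permutation p-value for a single 'y' column.

    ASSUMPTION: the statistic is the sum of scores over the True rows;
    substitute the real statistic here.
    """
    rng = np.random.default_rng(seed)
    n = state_col.shape[0]
    k = int(state_col.sum())           # number of True rows in this column
    observed = score[state_col].sum()
    idx = np.arange(n)                 # index buffer, allocated once per column
    hits = 0
    for _ in range(n_perm):
        sample = partial_fisher_yates(idx, k, rng)
        if score[sample].sum() >= observed:
            hits += 1
    # Count the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

Each column is handled on its own, so columns with different numbers of True entries need no special treatment, and the work per permutation scales with `k` rather than with the number of rows.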
So you want to shuffle (not permute) the output of get_data()?
Seems to be kind of the same thing, right?
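To make the shuffle/permute distinction concrete, here is a small sketch using NumPy's `Generator` API. The toy `state` array is made up for illustration. In NumPy's terminology, `shuffle` and `permutation` do the same reordering (in place versus returning a copy) and move whole rows together, whereas `permuted` shuffles each column independently, which is what keeping each y separate would need.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the boolean state matrix: 4 rows, 3 'y' columns.
state = np.array(
    [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 0, 1]],
    dtype=bool,
)

# permutation: returns a reordered copy; each row moves as a unit.
rows_moved = rng.permutation(state, axis=0)

# shuffle: the same reordering, but done in place.
in_place = state.copy()
rng.shuffle(in_place, axis=0)

# permuted: every column is shuffled independently of the others.
per_column = rng.permuted(state, axis=0)
```

In all three cases the number of True entries per column is preserved; only `permuted` breaks the association between columns within a row.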
This should give you a rough outline of how you could calculate it. I didn't understand 100% what you want to do, but I think it is close to my code. Very important: YOU HAVE TO USE MEMORY VIEWS and ALWAYS USE TYPED INDEXING! The way you are using Cython now is completely useless: it gives you a speed-up of at most 5%. With typed memory views you get C speed.
You can also add prange (parallel) to the outer loop (don't forget to compile with OpenMP).
Thanks for making this available!
I'm banging my head against Cython trying to write a permutation sampling function. The issue is that my arrays are quite large (e.g. 500,000 rows, and I want to calculate statistics for 10,000,000 permutations...).
I can't see that your code produces a sample of permutations, unless I'm wrong?
If I get it working, would you look at a PR?
Many thanks,
Dan.