Sample of permutations? #1

dbolser · 2024-10-21T14:56:45Z

Thanks for making this available!

I'm hitting my head against cython to try to write a permutation sampling function. The issues is that my arrays are quite large (e.g. 500,000 rows, and I want to calculate statistics for 10,000,000 permutations...).

I can't see that your code produces a sample of permutations, unless I'm wrong?

If I get it working, would you look at a PR?

Many thanks,
Dan.

hansalemaos · 2024-10-21T15:17:26Z

Could you give me an input/output example of the data you are trying to produce?

dbolser · 2024-10-24T16:52:20Z

Here is an example, but there is a detail:

def get_data(
    xsize=500000, ysize=2000, score_type="manual"
) -> tuple[np.ndarray, np.ndarray]:

    state = bernoulli.rvs(p=0.001, size=(xsize, ysize))
    state = state.astype(bool)  # type: ignore

    # Pull scores from a distribution
    score = norm.rvs(size=xsize, loc=0.1, scale=0.5)
    score = score**2

I actually have 10k different scores to test.

The statistic looks like this:

def statistic(score: np.ndarray, state: np.ndarray, axis: int) -> float:
    return np.sum(score * state, axis=axis)

The permutation test simply compares the statistic from 9999 random 'states' vs. the real data.

I discovered that, because my data is quite sparse per 'y' axis, looping through the 2k states and using the fisher yates shuffle is faster. I was trying to permute the 2d array, but actually, I need to keep each y separate (there are a different number of each, which is the detail I mentioned).

hansalemaos · 2024-10-24T20:03:02Z

So you want to shuffel (not permute) the output of get_data()?

hansalemaos · 2024-10-24T20:07:51Z

Seems to be kind of the same thing, right?
https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
I wouldn't change the original data. I would create an index array of a smaller datatype and loop through the index and get the data. But I don't know if it is the best idea. I couldn't even run the code you sent me (out of memory). After work, I will try it with a smaller array;

hansalemaos · 2024-10-24T21:36:47Z

This should give you a rough outline how you could calculate it. I didn't get 100% what you want to do, but I think it is close to my code. Very important: YOU HAVE TO USE MEMORY VIEWS and ALWAYS TYPED INDEXING!! The way you are using Cython is completely useless. That gives you a speed-up of max. 5%. With typed memory views you get to C-speed

cimport numpy as np
def statistic():
    cdef:
        Py_ssize_t xsize=5000
        Py_ssize_t ysize=2000
        np.ndarray score=bernoulli.rvs(p=0.001, size=(xsize, ysize)).astype(bool).astype(np.uint8) #there might be something better to save memory, i don't like 2x as type.
        np.ndarray state=norm.rvs(size=xsize, loc=0.1, scale=0.5)**2
        unsigned char[:,:] scoreview
        double[:] stateview
        np.ndarray axis0_full=np.arange(0,xsize,dtype=np.int64).reshape((-1,xsize)) # replace with algor. for permutation -> you can use mine
        np.ndarray axis1_full=np.arange(0,ysize,dtype=np.int64).reshape((-1,ysize)) # replace with  algor. for permutation
        Py_ssize_t[:,:] axis0_full_view=axis0_full
        Py_ssize_t[:,:] axis1_full_view=axis1_full
        Py_ssize_t i,j,k,l
        Py_ssize_t number_of_loops=1 ; # number of desired permutations - should be axis 0 of the permutation array
        double result

    #print(score) 
    #print(state)
    scoreview=score
    stateview=state
    for k in range(number_of_loops):
        result=0
        for i in range(axis0_full_view.shape[1]):
            for j in range(axis1_full_view.shape[1]):
                result=result+(scoreview[axis1_full_view[k,j],axis0_full_view[k,i]]*stateview[axis1_full_view[k,j]])
        print(result)

statistic()

And you can add prange (parallel) to the outer loop (don't forget to compile with openmp)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sample of permutations? #1

Sample of permutations? #1

dbolser commented Oct 21, 2024

hansalemaos commented Oct 21, 2024

dbolser commented Oct 24, 2024

hansalemaos commented Oct 24, 2024

hansalemaos commented Oct 24, 2024

hansalemaos commented Oct 24, 2024 •

edited

Loading

Sample of permutations? #1

Sample of permutations? #1

Comments

dbolser commented Oct 21, 2024

hansalemaos commented Oct 21, 2024

dbolser commented Oct 24, 2024

hansalemaos commented Oct 24, 2024

hansalemaos commented Oct 24, 2024

hansalemaos commented Oct 24, 2024 • edited Loading

hansalemaos commented Oct 24, 2024 •

edited

Loading