[WIP] ENH: resampling with annotations #11408
base: main
Conversation
I think it's actually going to be a bit worse than this (ignoring preload vs. not for now, which I think isn't the most annoying part). The result of resampling with and without annotations needs to end up with the same number of samples, and the annotation boundaries will likely land in between samples, so we have to handle that properly... For example, if you have a 1 sec signal at 1000 Hz with the 100-200 ms span annotated as bad, in principle you want to resample samples 0-99 and 200-999, and copy 100-199. But in practice, if you are resampling by some non-integer factor like 2/3 of the sample rate (e.g., ~667 Hz or something), you have to be very careful about floor/round/ceil-ing the number of samples within each resampled or copied interval... yikes. Also, I'm realizing now that we might not need to add ...
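To make the rounding hazard concrete, here's a minimal sketch (numbers taken from the example above: 1000 samples, bad span at samples 100-199, factor 2/3) showing that per-segment output lengths need not sum to the whole-signal output length:

```python
import numpy as np

ratio = 2.0 / 3.0  # non-integer resampling factor from the example above
# segment boundaries in original samples: good [0, 100), bad [100, 200), good [200, 1000)
seg_lens = np.diff([0, 100, 200, 1000])  # [100, 100, 800]

for fn in (np.floor, np.round, np.ceil):
    per_segment = fn(seg_lens * ratio).astype(int)
    whole_signal = int(fn(1000 * ratio))
    print(fn.__name__, per_segment, per_segment.sum(), "vs", whole_signal)
# floor: [66 66 533] -> 665 vs 666; ceil: [67 67 534] -> 668 vs 667
```

`np.round` happens to agree here (667 vs. 667), but floor and ceil are already off by a sample or more, and which convention a resampler effectively applies per segment is implementation-dependent.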
I guess we could have a unit test where the annotation period contains a huge artifact (like in my data), and then check that the artifact doesn't spread into the non-annotation periods... that would probably address the rounding issues you are concerned about. I like the idea of using the same resampling method in annotation vs. non-annotation periods... although I'd be happy if it works even with just FFT-based methods.
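A hedged sketch of what that test could look like (the skip-by-annotation behavior of `Raw.resample` is what this PR proposes, not existing API, so as written the assertion would only pass once the feature lands):

```python
import numpy as np
import mne

def test_resample_artifact_confined_to_annotation():
    """Proposed test: a huge artifact inside a bad annotation must not spread."""
    sfreq = 1000.0
    rng = np.random.RandomState(0)
    data = rng.randn(1, 1000) * 1e-6
    data[0, 100:200] = 1.0  # huge artifact inside the annotated span
    raw = mne.io.RawArray(data, mne.create_info(["ch0"], sfreq, "eeg"))
    raw.set_annotations(mne.Annotations(onset=0.1, duration=0.1,
                                        description="bad_artifact"))
    # with this PR, something like skip_by_annotation="bad_artifact" would be
    # passed here; without it, FFT ringing makes the assertion below fail
    raw_rs = raw.copy().resample(500.0)
    data_rs = raw_rs.get_data()[0]
    # annotated span maps to samples 50-100 at 500 Hz; leave a small buffer
    good = np.r_[np.arange(0, 45), np.arange(105, 500)]
    assert np.abs(data_rs[good]).max() < 1e-4
```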
Hi @larsoner and @jasmainak - I think having a "skip_by_annotation"-like behavior for resampling would be good for eye-tracking data (for example, there can be NaNs in the signal during blinks or tracker dropout). If you still think this is feasible given the challenges you raised regarding non-integer sampling frequencies, let me know; maybe I can help out on this at some point this summer.
This PR dropped from my priority list... feel free to take over from where I left off!
The more I think about this, the more I think it'll be really difficult to resample segment by segment. A potentially simple way to see the segment-resampling mapping problem is to take an original signal of length 10 with samples 3-6 annotated as bad, and downsample by a factor of 2: mapping the segment boundaries `[0, 3, 7, 10]` to the new rate scales them to `[0, 1.5, 3.5, 5]`, which rounds to `[0, 2, 4, 5]`. You get some ugly stuff here, like the first three samples ending up resampled to 2 samples in the output while the last 3 samples get resampled to 1 sample in the output. These have different downsampling factors (!). This problem shrinks asymptotically as the number of samples increases, but it does not go away completely, so frequencies get remapped differently depending on each segment's offset and length relative to the downsampling factor. So instead, I'm starting to think at least one safe thing to do would be to:

1. Zero out the data in the annotated-bad spans.
2. Resample the entire signal in one shot with polyphase resampling.
3. Map the annotations onto the new sampling rate.
4. Expand each annotation to cover the neighboring samples that the resampling filter smears the (zeroed) bad span into.
Step (4) could be made optional if people really want it to be, i.e., we could add an option to disable the annotation expansion.
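A rough sketch of steps (1)-(4), assuming a hypothetical helper name (not existing MNE API) and `scipy.signal.resample_poly`, whose default anti-aliasing FIR has `10 * max(up, down)` taps per side on the upsampled grid:

```python
import numpy as np
from scipy.signal import resample_poly

def resample_with_annotations(x, bad_mask, up, down):
    """Hypothetical helper sketching steps (1)-(4) above."""
    # (1) zero out the annotated-bad spans
    x = np.where(bad_mask, 0.0, x)
    # (2) one-shot polyphase resample of the whole signal
    y = resample_poly(x, up, down)
    # (3) map the mask onto the new rate (any overlap with a bad span counts)
    bad = np.abs(resample_poly(bad_mask.astype(float), up, down)) > 1e-6
    # (4) expand by roughly the filter half-length: 10 * max(up, down) taps
    # on the upsampled grid is about 10 * max(up, down) / down output samples
    pad = 10 * max(up, down) // down + 1
    bad = np.convolve(bad.astype(float), np.ones(2 * pad + 1), mode="same") > 0
    return y, bad
```

With this, each annotation grows by about `pad` samples per side at the new rate, which is exactly the "annotation expansion" in step (4).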
I think I can see your point about why the first simple approach is limited. I'm not too familiar with polyphase resampling, but if I'm understanding correctly, is the idea that when the bad spans are zeroed and the whole signal is resampled at once, the filter will smear the (zeroed) bad region into its neighbors, so the annotations need to grow to cover that smearing?
Yes, this is it -- bad values will spread, but we spread the annotations as well, so every sample the bad span can contaminate is itself marked bad.
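A quick way to see (and quantify) the spread with `scipy.signal.resample_poly` is to push a single impulse through it:

```python
import numpy as np
from scipy.signal import resample_poly

x = np.zeros(1000)
x[500] = 1.0  # one "bad" value surrounded by good zeros
y = resample_poly(x, 2, 3)  # resample by a factor of 2/3
spread = np.flatnonzero(np.abs(y) > 1e-12)
print(spread.min(), spread.max())  # nonzero on both sides of 500 * 2/3 = 333
```

The width of that nonzero region is how far a single bad sample bleeds, i.e., how much the annotations need to be expanded per side.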
It seems like if you interpolate to replace the 0 values ahead of downsampling, you at least partly mitigate this issue, because the valid values will bleed into the 0s rather than the other way around. So linear interpolation in my scheme would look like `1 1 1 1.5 2 2 2.25 2.5 2.75 3 3 3`, and then running a decimate with factor 2x gets you ... On my end, I've been using cubic splines to interpolate across dropped samples and then preprocessing as normal.
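A sketch of that scheme (inferring the pre-interpolation signal `1 1 1 0 2 2 0 0 0 3 3 3` from the interpolated values quoted above, and using scipy's polyphase resampler for the 2x decimate):

```python
import numpy as np
from scipy.signal import resample_poly

x = np.array([1, 1, 1, 0, 2, 2, 0, 0, 0, 3, 3, 3], float)  # 0 = dropped samples
bad = x == 0
# linearly interpolate across the dropped samples
x[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(~bad), x[~bad])
print(x)  # [1. 1. 1. 1.5 2. 2. 2.25 2.5 2.75 3. 3. 3.]
y = resample_poly(x, 1, 2)  # 2x decimation with an anti-aliasing FIR
```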
But the mapping between the original-signal sample counts (three for the first good segment, three for the last good segment) and the resampled-signal sample counts (two for the first good segment, one for the last good segment) will always have this wacky ratio problem. So I think the "resample segment by segment" idea is likely still doomed unless you're (somehow) very careful about padding each good segment the same way, which would have its own issues. So I think the polyphase + annotation expansion approach is probably safest, and a pretty straightforward route forward. Maybe someday we could add cubic or linear interpolation in the bad segments, but I think that can be a separate function like ...
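The wacky ratio problem fits in a couple of lines of numpy (using the length-10, downsample-by-2 example from earlier):

```python
import numpy as np

bounds = np.array([0, 3, 7, 10])  # good / bad / good segment edges
new_bounds = np.round(bounds / 2).astype(int)  # -> [0, 2, 4, 5]
print(np.diff(new_bounds))  # [2 2 1]: 3 input samples -> 2 outputs, then 3 -> 1
```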
closes #10447
It begins to get a little hairy when one gets into the details, particularly preloaded vs. not ... and how to handle concatenated raws.