Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preload subset of channels or channel types from raw.fif #1766

Closed
haribharadwaj opened this issue Jan 28, 2015 · 30 comments · Fixed by #7507
Closed

Preload subset of channels or channel types from raw.fif #1766

haribharadwaj opened this issue Jan 28, 2015 · 30 comments · Fixed by #7507
Assignees

Comments

@haribharadwaj
Copy link
Contributor

I have simultaneous MEG/EEG data sampled at 5 kHz (because I'm also recording brainstem activity) and it would nice to be able to preload say only EEG or only subsets of channels etc. and do analysis/eye-balling.. There are probably other good use cases for this too..

I can see this being possible quickly in a few different ways:
(1) Add kwargs like eeg=True, meg=False etc. to pick only certain channel types..
(2) Take channel names (as opposed to indices) list for a pickNames argument..
(3) Add a regular picks argument and then the user does two steps... (i) load with preload=False and determine picks from the info and then (ii) re-run the file read with preload=True and picks=picks

Thoughts?

@larsoner
Copy link
Member

There is now a simple preload_data function in master. Whatever you do, I'm
-1 on adding more arguments to the Raw constructor and +1 on putting them
there for such specialized use cases. Your option 3 would fit nicely there.

I suspect that adding these options will make things quite a bit more
complex, though. Have you looked at the raw data reading functionality? It
is already somewhat complex, so I worry that making it more complex will
lead to bugs.

If we do go this route, you'll probably have to make a ton of tests to make
sure we don't miss corner cases, and do performance profiling to make sure
we aren't reducing read performance for standard use cases.

@agramfort
Copy link
Member

something along these lines:

raw = mne.io.Raw(...)
picks = mne.pick_types(raw.info, ...)
ch_names = [raw.ch_names[k] for k in picks]
raw.pick_channels(ch_names)
raw.preload_data()

?

but currently you cannot pick_channels on non-preloaded data....

@larsoner
Copy link
Member

The API I'm proposing is more like this:

raw = mne.io.Raw(..., preload=False)
picks = mne.pick_types(raw.info, ...)
raw.preload_data(picks=picks)

And the raw data fetching functions, and all functions that use raw._preloaded (or whatever we call it) are made smart enough to deal with partially preloaded data.

You're right that doing pick_channels might also work for the use case, especially if there is a complementary function that allows the user to re-combine the two (or more) datasets after they've been split. But in that case, you're likely to end up back at having data partially pre-loaded again, so I'm not sure it buys too much. It does depend on the specific use cases, though.

@agramfort
Copy link
Member

I like what you suggest. +1 for picks argument in preload_data

@haribharadwaj
Copy link
Contributor Author

Ah... This is more complex than I imagined.. I just learnt about raw.preload_data()... +1 for picks arg in preload_data().. Is it Ok that the raw object would be left with a sort of out-dated info?

@larsoner
Copy link
Member

No -- the idea is that the info would still be correct, as all channels would still be present. But doing this would only load the selected data from disk into memory:

picks_eeg = pick_types(raw.info, meg=False, eeg=True)
raw.preload_data(picks=picks_eeg)

So that this operation would be immediate:

raw[picks_eeg]

And things like filtering only the EEG channels would work. But operations like this would still be permitted:

picks_meg = pick_types(raw.info, meg=True, eeg=False)
raw[picks_meg]

This would thus read from disk, instead of memory. And filtering the MEG channels would not be permitted, at least not until someone did this:

raw.preload_data(picks=picks_meg)

Which would cause the MEG data to also be preloaded (kind of like an append operation). Eventually we could add a preload_data(picks='clear') or something that would make it un-preload data if we need it, but we wouldn't need it for the first pass.

@haribharadwaj
Copy link
Contributor Author

Got it! So if the picks is simply propagated to the sel arg of _read_segment(.) it would do the job, wouldn't it?

@larsoner
Copy link
Member

I'm sure that would be part of it, but the overall job would be much more complex. You'd probably need to, at a minimum:

  1. Allow raw._preloaded to be a list, or a boolean.
  2. Change all checks of raw._preloaded to raw._preloaded is True or <all picks in _preloaded list>.
  3. Deal with multiple calls to .preload_data(picks=picks) with proper creation / expansion / checking of raw._data array.
  4. Deal with calls to the get data segment functions, so that they pull from the partially loaded raw._data, or the disk, as necessary.
  5. Deal with how raw concatenation works with ._preloaded -- if two files have the same subset of channels preloaded, they should stay preloaded when being concatenated.
  6. Implement tests for all functions that check the preload state that check different cases of partial preload_data calls.
  7. Check that performance has not decreased due to the changes.

There are probably other things, but this is what comes to mind when I think about how I'd implement it.

@agramfort
Copy link
Member

let's keep it simple: preload_data can only be called once and everything
stays almost the same.

@dengemann
Copy link
Member

Do we really need this? If so please keep it as simple as possible.

On 28 Jan 2015, at 18:10, Alexandre Gramfort [email protected] wrote:

let's keep it simple: preload_data can only be called once and everything
stays almost the same.

Reply to this email directly or view it on GitHub.

@haribharadwaj
Copy link
Contributor Author

@agramfort, @dengemann: So you would suggest the

raw.pick_channels(...)
raw.preload_data()

workflow?

@dengemann
Copy link
Member

@haribharadwaj we could make our pick/drop_channels function make return self
raw.pick_channels(...).preload_data() if you need a one liner.

I'm also fine with only allowing to call preload_data onece and add a picks etc argument.

@agramfort
Copy link
Member

I would do:

raw = mne.io.Raw(...)
picks = mne.pick_types(raw.info, ...)
raw.preload_data(picks=picks)

and no later possibility to load other channels

@larsoner
Copy link
Member

FWIW I don't think the possibility of later loading channels is the really challenging part of the coding. Most of the challenge lies in keeping track of which channels are preloaded and which are not, and triaging behavior appropriately. Adding more channels to preload later is pretty trivial by comparison (append/mix matrices, and extend list of loaded channels)...

@agramfort
Copy link
Member

doable but it's more work. You'd need to have raw._data maybe as a list of
arrays to avoid reallocs. With my suggestion you just call pick_info and
you're done. You then have in info only the preloaded channels.

@larsoner
Copy link
Member

As a user I wouldn't expect the function preload_data(picks=...) to restrict my raw file (and info) to then only contain a subset of channels. It seems like the API workflow would be better as raw.pick_channels(...).preload_data() as @dengemann suggested.

@dengemann
Copy link
Member

+1 let's add one basic version now. it's a rare usecase anyways, I suppose

On 28 Jan 2015, at 22:59, Alexandre Gramfort [email protected] wrote:

doable but it's more work. You'd need to have raw._data maybe as a list of
arrays to avoid reallocs. With my suggestion you just call pick_info and
you're done. You then have in info only the preloaded channels.

Reply to this email directly or view it on GitHub.

@agramfort
Copy link
Member

As a user I wouldn't expect the function preload_data(picks=...) to restrict my raw file (and info) to then only contain a subset of channels.

I would say as long as it's documented and error message are clear
that would be fine with me. It's also a really simple change. But you
may feel brave and nerd snipped :)

@larsoner
Copy link
Member

I see how it would be simpler from an implementation/developer perspective, but I don't think it makes sense from an API/user perspective. To me, channel picking/restricting should be done by pick_channels, and data preloading should be done by preload_data (regardless of whether it loads all or a subset of the data). Mixing both channel-picking and preloading functionality into preload_data seems like a strange, unpythonic move to me.

Maybe it's not so difficult to have raw.pick_channels() work with non-preloaded data... I'd have to think about it :) The _read_raw_segment function already can read a specific section of channels, so I think it would just require keeping track of which original file indices are the correct ones.

@dengemann
Copy link
Member

I agree with hari that chaining would be maximum clean and self-documenting

It's already supported as is I think, so nothing to add. just make sure that the pick/drop methods return self if copy=False --- if not a minimum change

On 28 Jan 2015, at 23:04, Alexandre Gramfort [email protected] wrote:

As a user I wouldn't expect the function preload_data(picks=...) to restrict my raw file (and info) to then only contain a subset of channels.

I would say as long as it's documented and error message are clear
that would be fine with me. It's also a really simple change. But you
may feel brave and nerd snipped :)

Reply to this email directly or view it on GitHub.

@agramfort
Copy link
Member

I'll review the PR :)

@larsoner
Copy link
Member

@dengemann IIUC the challenge with this approach is that pick_channels currently only works for preloaded data. Thus my comment about needing to keep track of which indices in the original data file need to be used to load the data after doing raw.pick_channels().

@larsoner
Copy link
Member

In any case, @haribharadwaj feel free to take a stab at it. If you're too busy I'll try to find time this week or next.

@haribharadwaj
Copy link
Contributor Author

Thanks all..
@Eric89GXL, I am always too busy but lets see where I am in the next couple of days :)

@dengemann
Copy link
Member

IIUC the challenge with this approach is that pick_channels currently only works for preloaded data.

but we could change that policy. I don't remember the exact reaosons why we introduced that in the first place. If data are not preloaded it would amount to picking the info and some attributes.

@agramfort
Copy link
Member

simplicity was the reason.

@dengemann
Copy link
Member

simplicity was the reason.

no there was another reason .... I remember havin opened an issue once / added this constraint.
But thinking about simplicity, picking without preload + chaining would be an optimal and clean API.
If not preloaded it should ammount returning/ making an object with a picked info.

@larsoner
Copy link
Member

@haribharadwaj do you still need this, or were there suitable workarounds?

@larsoner
Copy link
Member

larsoner commented Mar 1, 2017

I think most of these use cases can (hopefully) be solved by doing allowing pick_channels without data being loaded. @jaeilepp did something similar for EDF or some other reader recently, so hopefully it can be generalized.

@kingjr
Copy link
Member

kingjr commented Mar 1, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants