Pairing arrange of CSP filters removed #8074

Closed
KarnBaca opened this issue Jul 31, 2020 · 19 comments · Fixed by #8151
@KarnBaca commented Jul 31, 2020

I have a question about the following lines:

f4b39c7#diff-12d11fc13bc79dc930808d5394b788b2L148-L150

Currently (from maint/0.13 until now) the paired arrangement of the columns (first, last, second, second-to-last, ...) has been removed. Was this done for a reason?

If I understand correctly, in the current implementation the arrangement of the filters is based only on the eigenvalues (descending order), so one is not able to identify the CSP pairs.

@kingjr @cbrnr @mbillingr

Thanks in advance.

@agramfort (Member)

ping @cbrnr or @mbillingr maybe?

@kingjr (Member) commented Jul 31, 2020 via email

@mbillingr (Contributor)

I think this line has the same effect:

```python
ix = np.argsort(np.abs(eigen_values - 0.5))[::-1]
```

It seems a bit brittle because it relies on eigenvalues being pairwise symmetric around 0.5.
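
For illustration, a minimal sketch (not MNE's actual code) of why the one-liner reproduces the old interleaved order when the eigenvalues are pairwise symmetric around 0.5:

```python
import numpy as np

# Eigenvalues sorted ascending, pairwise symmetric around 0.5:
# ev[i] + ev[n - 1 - i] == 1 for every pair.
ev = np.array([0.1, 0.3, 0.7, 0.9])

# New one-liner: decreasing distance from 0.5. With exact symmetry the two
# members of a pair tie; reversing a stable sort puts the larger one first,
# which matches the old order.
new_order = np.argsort(np.abs(ev - 0.5), kind="stable")[::-1]

# Old explicit interleaving: largest, smallest, second-largest, ...
desc, asc = np.argsort(ev)[::-1], np.argsort(ev)
old_order = np.empty(len(ev), dtype=int)
old_order[0::2] = desc[: (len(ev) + 1) // 2]
old_order[1::2] = asc[: len(ev) // 2]

print(new_order)  # [3 0 2 1]
print(old_order)  # [3 0 2 1] -- identical under the symmetry assumption
```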

@cbrnr (Contributor) commented Jul 31, 2020

> It seems a bit brittle because it relies on eigenvalues being pairwise symmetric around 0.5.

Does this mean you assume that the eigenvalues are normalized? And why 0.5? Is this documented somewhere (https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigh.html)?

@mbillingr (Contributor)

> Does this mean you assume that the eigenvalues are normalized?

They are, in a way. eigh normalizes the eigenvectors, and the covariance matrices are normalized to a trace of 1.

> And why 0.5?

The eigenvalues are all exactly 0.5 if you plug in identity covariance matrices. I don't know whether this transfers to the general case.

Regardless of whether the new code is correct, I think the old code is better because it is more explicit about what it does.
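
For what it's worth, a quick sketch of that identity special case, assuming the two-class problem is solved as the generalized eigenproblem C1 w = lambda (C1 + C2) w, which is what scipy.linalg.eigh(C1, C1 + C2) computes:

```python
import numpy as np
from scipy.linalg import eigh

n = 4
C1 = np.eye(n) / n  # identity covariance, trace-normalized to 1
C2 = np.eye(n) / n

# Generalized eigenproblem: C1 w = lambda * (C1 + C2) w
evals, evecs = eigh(C1, C1 + C2)
print(evals)  # [0.5 0.5 0.5 0.5] -- every eigenvalue is exactly 0.5
```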

@cbrnr (Contributor) commented Aug 1, 2020

I agree that the old code is better because it is more explicit. This question keeps coming up again and again, and we should either revert to the old code with explicit sorting, or at least include detailed comments explaining what is going on in the current code. Do you have time to implement this in a PR?

@mbillingr (Contributor)

If it's not urgent I can do it, but it will take me a few weeks at least.

This may be an opportunity to review if the multi-class implementation could handle two classes without a special case.

@cbrnr (Contributor) commented Aug 2, 2020

There's no hurry. It'd be great if you could go over the existing code, because I don't really understand what's going on in many places, e.g. https://github.com/mne-tools/mne-python/blob/master/mne/decoding/csp.py#L235 (some comments would have been great). If the two-class and multi-class branches can be unified, this would probably also make the code a bit easier to understand.

@mbillingr (Contributor)

Short update:

So far I've been refactoring the code a bit to make it easier to work with.

I have also added the old method of ordering the patterns and it gives different results than the new method.
To clarify:

  • old method: order = [largest eigenval, smallest eigenval, second-to-largest, second-to-smallest, ...]
  • new method: order = decreasing abs(eigenval - 0.5)

The new method results in a different order than the old method. However, the new method's order is consistent with ordering by mutual information.
This means that there was probably a good reason for this change in ordering and we can't simply switch back to the old order without potentially breaking applications.

@cbrnr (Contributor) commented Aug 19, 2020

Thanks for the update @mbillingr. AFAIK, though, the old method seems to be the most commonly used one (meaning that most papers on CSP order components this way). Do you have a motivation and/or a reference where ordering by MI is used? And why is abs(eigenval - 0.5) equal to MI?

I agree that it is not possible to replace the ordering now without breaking current behavior. Do you think an argument to choose the order might be an option (the default should be the current way so we don't break existing code)?

@mbillingr (Contributor) commented Aug 19, 2020

> motivation and/or a reference where ordering by MI is used?

MI is used in the multi-class implementation where we don't have eigenvalues. I don't know who wrote that code (copied from pyRiemann?) and what their references say.

Here is an example of eigenvalues and corresponding mutual information:

```python
EV = [0.06, 0.10,  0.50, 0.80]
MI = [0.26, 0.18, -1.55, 0.08]
```

The second-smallest eigenvalue (0.1) is more informative than the largest eigenvalue (0.8); it is also further from 0.5.
My intuition says ordering by MI makes perfect sense, while interleaving large and small eigenvalues is a rather rough hammer that misses some finer nuances. Of course my intuition is not worth much if not backed by a reference, but I can't do any literature research now because I lack time and access to journals.
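
A tiny check on these toy numbers confirms that sorting by abs(EV - 0.5) and sorting by MI yield the same order:

```python
import numpy as np

EV = np.array([0.06, 0.10, 0.50, 0.80])
MI = np.array([0.26, 0.18, -1.55, 0.08])

# Both criteria rank the components identically on this example:
print(np.argsort(np.abs(EV - 0.5))[::-1])  # [0 1 3 2]
print(np.argsort(MI)[::-1])                # [0 1 3 2]
```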

> And why is abs(eigenval - 0.5) equal to MI?

No idea :) In the various toy data I've been throwing at CSP, I have observed that the eigenvalues are always between 0 and 1, with uninformative components being closer to 0.5. This probably has to do with symmetry: if you swap the classes you get the same components in reverse, which must be reflected in the eigenvalues somehow. I failed to derive a mathematical explanation, though.

> Do you think an argument to choose the order might be an option

Absolutely! That's how I've implemented it at the moment, so that I can compare the various methods.

Should I open a [WIP] PR so we have some code to accompany the discussion?

@cbrnr (Contributor) commented Aug 19, 2020

Your intuition makes sense, but I don't think it's as simple as abs(eigenval - 0.5). The code from pyRiemann computes MI according to equation 15 described in Grosse-Wentrup 2008: https://github.com/alexandrebarachant/pyRiemann/blob/master/pyriemann/spatialfilters.py#L346-L364

So I'm really not sure where that simple formula comes from - maybe @agramfort or @kingjr remember (they wrote that part of the code)?

Let's wait until we've figured out if the current ordering makes sense before implementing the old sorting in a PR.
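
For reference, a self-contained sketch of that MI approximation, paraphrased from the linked pyRiemann code (eq. 15 in Grosse-Wentrup 2008); the function and variable names here are mine, details may differ, and the linked source is authoritative:

```python
import numpy as np

# Approximate mutual information of each spatial filter (column of evecs),
# given per-class covariance matrices `covs` and class priors `priors`.
def csp_mutual_info(evecs, covs, priors):
    mi = []
    for j in range(evecs.shape[1]):
        a = b = 0.0
        for cov, p in zip(covs, priors):
            v = evecs[:, j] @ cov @ evecs[:, j]  # class variance along filter j
            a += p * np.log(np.sqrt(v))
            b += p * (v ** 2 - 1.0)
        mi.append(-(a + (3.0 / 16.0) * b ** 2))
    return np.array(mi)

# Components would then be ordered by decreasing MI:
# order = np.argsort(csp_mutual_info(evecs, covs, priors))[::-1]
```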

@agramfort (Member) commented Aug 19, 2020 via email

@mbillingr (Contributor)

The source code forensics team just reported in:

> the further is the eigenvalue from the symmetric point 1/2, the more the distance supported by the spatial filter between the two matrices is important.

I didn't read it in any detail, but I take this as proof that the current ordering makes sense.

Special-casing the two-class problem as an eigenvalue decomposition plus this ordering is probably just a performance optimization, which may explain why it's equivalent to using mutual information. I'm sure there is more in the paper.

@cbrnr (Contributor) commented Aug 19, 2020

Ah, I missed the two-class special case, thanks for digging that up. Indeed, this might be a simplification of the general multi-class case (thanks for the link to the paper).

So given that the current method makes sense, do you still think implementing the old sorting makes sense? It probably does, because that's how it is done in the classic CSP papers, at least. On the other hand, we could also argue that the new method is better and not provide the old one. Personally, I'd prefer to have an option to choose between the two methods, because I guess this won't be too much work (basically just replacing this one line in the two-class case).

@mbillingr (Contributor)

> I guess this won't be too much work

Indeed. Actually, I already did this so I could run the comparisons.

@cbrnr (Contributor) commented Aug 21, 2020

Can you make a PR? I'm not sure that we need to run comparisons, because both methods are already described in the CSP literature. We just implement them and people can choose between them.

@mbillingr (Contributor)

Sure. Do you have a reference that describes the classic method so we can refer to it in the documentation?

I've also cleaned up the CSP code a bit. Do you want these changes as well or just the choice between classic and new?

@cbrnr (Contributor) commented Aug 21, 2020

The paper by Blankertz et al. (2008) seems to contain this recommendation:

> Therefore, the common practice in a classification setting is to use several eigenvectors from both ends of the eigenvalue spectrum as spatial filters {wj} in Eq. (2). If the number of components J is too small, the classifier would fail to fully capture the discrimination between two classes (see also the discussion in Sec. V-B on the influence of artifacts); on the other hand, the classifier weights {βj} could severely overfit if J is too large. In practice we find J = 6, i.e., three eigenvectors from both ends, often satisfactory. Alternatively one can choose the eigenvectors according to different criterion (see Sec. A) or use cross-validation to determine the number of components.

If you have also done some housekeeping, of course I'd be happy if you included that in your PR!
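
To make the Blankertz et al. recommendation concrete, a hypothetical helper (the name is mine, not MNE's API) that keeps J/2 eigenvectors from each end of an ascending eigenvalue spectrum could look like:

```python
import numpy as np

# Keep n_components // 2 eigenvectors from each end of the spectrum,
# assuming eigvals (and the columns of eigvecs) are sorted ascending.
def pick_both_ends(eigvals, eigvecs, n_components=6):
    half = n_components // 2
    idx = np.r_[np.arange(half), np.arange(len(eigvals) - half, len(eigvals))]
    return eigvecs[:, idx]
```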
