-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added PCA subspace comparison methods and fixed n_components selection #2613
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2613 +/- ##
========================================
Coverage 91.12% 91.13%
========================================
Files 175 175
Lines 23642 23708 +66
Branches 3090 3100 +10
========================================
+ Hits 21544 21606 +62
- Misses 1480 1482 +2
- Partials 618 620 +2
Continue to review full report at Codecov.
|
if self._calculated: | ||
if n is None: | ||
n = len(self._variance) | ||
self.variance = self._variance[:n] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to permanently change self.variance
?
In a recent discussion with @ianmkenney and @VOD555 we found it confusing that the "total" variance might only be calculated from a subset of components. The docs were not very clear on this to start with. This can be confusing if you want to plot how much each PC contributes to the total observed variance but only selected, say, n=5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, hence the addition of the new and static attributes _variance
and _p_components
. The cumulative variance is also calculated from the total _variance
so that now it does not add up to 1 if n_components
is not None
. TBH I don't really get the idea of throwing away information that was expensive to compute and is already in memory, so setting n_components
is more of a convenient way to truncate or extend the visible variance, cumulated_variance, and p_components. It does not alter their values. We could suggest to users that they manually set self._variance = self.variance
if they want to save space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should fix the issue that @ianmkenney and @VOD555 saw – guys, if you still see a problem here, please say something now.
n = len(self._variance) | ||
self.variance = self._variance[:n] | ||
self.cumulated_variance = (np.cumsum(self._variance) / | ||
np.sum(self._variance))[:n] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, this line should fix the issue that @ianmkenney and @VOD555 saw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes suggested but I'll approve it already.
@lilyminium Please merge when the CI is happy. Either squash-merge or consolidate your history yourself if you think it makes more sense as separate commits. |
Thanks @orbeckst! |
Fixes #2623
This could be helpful for looking at subspaces.
Changes made in this Pull Request:
PR Checklist