-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update rates-based statistics to be modular #4608
base: master
Are you sure you want to change the base?
Update rates-based statistics to be modular #4608
Conversation
for feature in opts.statistic_features: | ||
if feature not in _allowed_statistic_features: | ||
err_msg = f"--statistic-feature {feature} not recognised" | ||
raise NotImplementedError(err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't actually happen due to the argparse choices, but safety is best
**extra_kwargs) | ||
trigger_times = sds['end_time'] | ||
stat_t = rank_method.rank_stat_single((ifo, sds)) | ||
trigger_times = trigs['end_time'][:][trigger_keep_ids] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some singles objects don't have the end time included
@@ -33,9 +33,18 @@ | |||
from .eventmgr_cython import logsignalrateinternals_computepsignalbins | |||
from .eventmgr_cython import logsignalrateinternals_compute2detrate | |||
|
|||
_allowed_statistic_features = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where is best to describe each feature here to be honest
# Assume best case scenario and use maximum signal rate | ||
s1 -= 2. * self.hist_max | ||
s1[s1 < 0] = 0 | ||
return s1 ** 0.5 | ||
|
||
|
||
class ExpFitStatistic(QuadratureSumStatistic): | ||
class ExpFitStatistic(PhaseTDStatistic): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subclassing PhaseTDStatistic here in order to get the phasetd stuff in init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this then not require the PhaseTD files, which I thought ExpFit statistic might not use if not using the "PhaseTD" feature?
3b1dd80
to
bfa528a
Compare
The numbers in the comparison table have been updated I am sure it is not a coincidence that the error in the coincs is double that of the singles, but I think a ~4e-6 difference is not too important given the dynamic range of the statistic. |
I am adding a description of the statistics to the docs - I am writing it in the |
Note that I found and fixed a bug in the way that |
5f7cd92
to
9c9bbee
Compare
2af01a3
to
2d7148a
Compare
2d7148a
to
e0214a1
Compare
I thought it best to check the memory usage, and for pycbc_coinc_findtrigs with 1/140 of the bank and the
OLD:
For the same statistic with
OLD:
The increase in user time seems to be because of a lower % of CPU (27 vs 75) So basically this looks like it doesn't change anything with regard to performance, as I would expect |
Noting here that it was suggested on today's call that we should wait to merge this until after creating the new PyCBC Live branch intended for the rest of O4 |
55fb556
to
33678e4
Compare
33678e4
to
13489d7
Compare
I think this is ready for review now. All tests are passing I haven't been able to test how this works with Live as well as I would like to; there seems to be a problem with my testing environment |
Poke @spxiwh post-winter break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments below, but nothing significant. It's hard to review the bulk of changes here because of the amount of code being moved around, but the sanity checks look convincing. @ahnitz do you any concerns/comments?
- If using the ``chirp_mass`` feature, this chirp mass defines a maximum weighting which can be applied to the statistic. | ||
* - ``sngl_ranking_*`` | ||
- This is used to provide the keyword arguments to functions in `the events.ranking module <https://pycbc.org/pycbc/latest/html/_modules/pycbc/events/ranking.html>`_. For example, to use a different psdvar threshold in the newsnr_sgveto_psdvar_threshold function, we would use ``sngl_ranking_psd_var_val_threshold:10``. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouild this documentation like in the middle of explanation of make_offline_search_workflow
? Would this be better as a separate documentation page on the stat module itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put it into a separate page within a "search" section of the docs page - the rest of this would obviously need building up more to be particularly useful!
# Assume best case scenario and use maximum signal rate | ||
s1 -= 2. * self.hist_max | ||
s1[s1 < 0] = 0 | ||
return s1 ** 0.5 | ||
|
||
|
||
class ExpFitStatistic(QuadratureSumStatistic): | ||
class ExpFitStatistic(PhaseTDStatistic): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this then not require the PhaseTD files, which I thought ExpFit statistic might not use if not using the "PhaseTD" feature?
…ha_below_thresh keyword
…eclimate to be quiet" This reverts commit 4f082ea.
980949f
to
fa13a45
Compare
This will cause a problem if using the (for some reason I can't reply directly to the comment) |
Overview of changes
The exponential fit statistics are all similar, with simple factors added or subtracted, as discussed in #4594
I have refactored the
ExpFitStatistic
to be able to use the different features using a--statistic-features
option. These then all useexp_fit
as the--ranking-statistic
optionThe available features are:
phasetd
kde
dq
chirp_mass
sensitive_volume
normalize_fit_rate
exp_fit
statistics, but is implemented here explicitly so that theexp_fit_csnr
statistic can reuse thelognoiserate
function from theExpFitStatistic
I have also removed the different treatment of triggers with sngl_ranking below threshold; this is now required explicitly as
--statistic-keywords alpha_below_threshold:6
. Again this was so that theexp_fit_csnr
statistic can reuse thelognoiserate
function from theExpFitStatistic
In addition, there are some minor changes as well to fix some of the statistics which didn't work at all on previous master. There is a (just for reference) PR at #4607 to show these.
Testing
I have tested all existing (and working) statistics to check that given the appropriate features, the output remains identical.
The testing is done against the codes in #4607, so that we can test statistics against what they should be, rather than what they are.
Initial tests with a very small fraction of the bank have shown that the SNR-like statistics output files have identical hashes. The exp_fit statistics outputs are all the same to within a numpy.isclose test, i.e. ~1e-6 difference for values O(1). But I will add the results of more stringent testing here.
phasetd sensitive_volume normalize_fit_rate
alpha_below_thresh:6
phasetd sensitive_volume normalize_fit_rate chirp_mass
alpha_below_thresh:6
phasetd sensitive_volume normalize_fit_rate kde
alpha_below_thresh:6
phasetd sensitive_volume normalize_fit_rate dq
alpha_below_thresh:6
phasetd sensitive_volume normalize_fit_rate dq kde
alpha_below_thresh:6