
Could bss_eval return min and max as well? #192

Closed
carlthome opened this issue May 16, 2016 · 14 comments

Comments

@carlthome
Contributor

First of all, love this package! 💃

Curious whether bss_eval could return the minimum and maximum SDR, SIR, and SAR as well as the average? I realize it's based on Emmanuel Vincent's method, but variance seems important in many cases. Say the average SIR over a song is 10 dB. Roughly speaking, it could be that the chorus, with lots of harmonic and percussive elements, is easy to separate while the verse, with, say, less instrumentation, is harder to separate. Obviously one could pre-segment the signal (see the sketch below), but I figured asking doesn't hurt.
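
For concreteness, a minimal sketch of that pre-segmenting idea, assuming mir_eval.separation.bss_eval_sources and fixed-length, non-overlapping segments (the helper name and aggregation scheme are just illustrative):

```python
import numpy as np
import mir_eval

def segmentwise_bss_eval(reference_sources, estimated_sources, segment_length):
    """Evaluate SDR/SIR/SAR per non-overlapping segment and aggregate.

    reference_sources, estimated_sources: arrays of shape (nsrc, nsamples).
    segment_length: segment size in samples.
    """
    nsamples = reference_sources.shape[1]
    metrics = []  # one (sdr, sir, sar) triple per segment
    for start in range(0, nsamples - segment_length + 1, segment_length):
        stop = start + segment_length
        # bss_eval is undefined when a reference source is silent within a
        # segment, so such segments may need to be skipped in practice.
        sdr, sir, sar, _ = mir_eval.separation.bss_eval_sources(
            reference_sources[:, start:stop],
            estimated_sources[:, start:stop])
        metrics.append((sdr, sir, sar))
    metrics = np.array(metrics)  # shape (nsegments, 3, nsrc)
    return {'mean': metrics.mean(axis=0),
            'min': metrics.min(axis=0),
            'max': metrics.max(axis=0)}
```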

@craffel
Collaborator

craffel commented May 17, 2016

Hey, thanks for the kind words. Usually we focus on implementing evaluation metrics as they are widely used and specifically defined in research, so extensions like this are a little out of scope for mir_eval unless they start being used in publications or MIREX (or DCASE, as the case may be). Do you know of people reporting min/max too? Otherwise, another goal of mir_eval is to make it very easy to modify the metrics so that they can be customized :)

@carlthome
Contributor Author

carlthome commented May 18, 2016

Well, most recent papers seem to introduce their own additional evaluation besides SDR, SIR, and SAR, and many still rely on listening tests, but there's also a more recent framework (2011), PEASS, with objective and subjective metrics that promise to be more in line with perceptual quality, by some of the same people who originally proposed bss_eval back in 2006, including Vincent.

PEASS has three parts: one with objective metrics, one with subjective metrics derived from listening tests, and one for building new listening tests. Maybe the first part could be a welcome addition to mir_eval?

Similarly to BSS Eval, the distortion signal is decomposed into three components: target distortion, interference, and artifacts. These components are then used to compute four quality scores: OPS (Overall Perceptual Score), TPS (Target-related Perceptual Score), IPS (Interference-related Perceptual Score), and APS (Artifact-related Perceptual Score). These scores correlate better with human assessments than the SDR/ISR/SIR/SAR measures of BSS Eval.

I'd be pretty content with a minimum SDR, SIR, and SAR though, since that at least gives an idea of a lower bound on how poorly a separation method can perform in a given time frame. I think many papers don't report variance because they want to hide the fact that their model is not robust and has adversarial examples, for example when sources have similar energy distributions and the model discards phase information. Say a singer is particularly good at staying in tune with the instrument and also happens to display formants similar to the instrument's resonances. That seems to be surprisingly problematic, in my limited experience.

Part of the issue also stems from the fact that bss_eval is often used for speech enhancement as well as music source separation, where it's probably pretty unlikely in practice for the speech to have an energy distribution similar to the background noise. Thus there's no strong tradition of reporting variance across the test samples, because it's typically negligibly small (I'm guessing). Since mir_eval focuses on music, though, I think the priorities should be different!

@craffel
Collaborator

craffel commented May 18, 2016

> Well, most recent papers seem to introduce their own additional evaluation besides SDR, SIR, and SAR, and many still rely on listening tests, but there's also a more recent framework (2011), PEASS, with objective and subjective metrics that promise to be more in line with perceptual quality, by some of the same people who originally proposed bss_eval back in 2006, including Vincent.

Yes, in fact we have a TODO issue about adding PEASS (and other metrics) #68! A PR would be welcome.

> I'd be pretty content with a minimum SDR, SIR, and SAR though, since that at least gives an idea of a lower bound on how poorly a separation method can perform in a given time frame.

For now, unless we have some examples of papers/contests which are using minimum (or maximum, or variance), I'd prefer to stick to the community standard. It should be pretty straightforward to create a custom version of the function for your own purposes, which is one of the intentions of mir_eval.

@carlthome
Contributor Author

carlthome commented May 18, 2016

> Yes, in fact we have a TODO issue about adding PEASS (and other metrics) #68! A PR would be welcome.

Cool! 👍

> For now, unless we have some examples of papers/contests which are using minimum (or maximum, or variance), I'd prefer to stick to the community standard. It should be pretty straightforward to create a custom version of the function for your own purposes, which is one of the intentions of mir_eval.

Sensible! Let me try to nag research teams, MIREX, and SiSEC about it, and I'll hopefully get back to this. It seems like bad practice not to estimate worst-case performance over a common test set, and it makes different methods hard to compare for real-world usage.

@carlthome
Contributor Author

carlthome commented May 18, 2016

I'm curious though: the paper that introduced BSS_EVAL mentions "local performance measures" for calculating SDR, SIR, and SAR when performance is expected to vary noticeably over time. Essentially it amounts to taking a window function and sliding it across the signal, just like the STFT does with the FFT (a rough sketch follows). This doesn't seem to be provided in the actual implementation though. Correct?
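
To make that concrete, a rough sketch of the idea, assuming mir_eval.separation.bss_eval_sources and a plain window/hop loop (this is just the concept, not the official BSS Eval code):

```python
import numpy as np
import mir_eval

def local_sdr(reference_sources, estimated_sources, window, hop):
    """Per-window SDR curve, returned with shape (nsrc, nframes).

    reference_sources, estimated_sources: arrays of shape (nsrc, nsamples).
    window, hop: frame size and hop size in samples.
    """
    nsamples = reference_sources.shape[1]
    frames = []
    for start in range(0, nsamples - window + 1, hop):
        sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
            reference_sources[:, start:start + window],
            estimated_sources[:, start:start + window])
        frames.append(sdr)
    return np.stack(frames, axis=1)
```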

@craffel
Collaborator

craffel commented May 19, 2016

> This doesn't seem to be provided in the actual implementation though. Correct?

Yes, I don't know of this being implemented/used, although it's useful. @dawenl do you know?

@faroit

faroit commented May 19, 2016

Maybe it's time to jump in here. I will be helping @aliutkus organise the upcoming SiSEC evaluation, and we will be releasing a modified version of the MATLAB bss_eval code together with a Python wrapper. Both will be on GitHub in a couple of days.

@carlthome As far as I understand your comment: yes, the upcoming bss_eval MATLAB version also lets you output the instantaneous SDR values for a given window size. You can have a look in a couple of days.

I am also working on a pure-Python version of bss_eval_images as well as a Cython version; it should be finished soon and could eventually be merged into mir_eval.

@craffel
Collaborator

craffel commented May 19, 2016

> I am also working on a pure-Python version of bss_eval_images as well as a Cython version; it should be finished soon and could eventually be merged into mir_eval.

That would be great, thanks!

@carlthome
Contributor Author

carlthome commented May 19, 2016

@faroit Cool. 😄 Looking forward to the Python wrapper.

Do you have any thoughts on why the variance of the local SDR, SIR, and SAR measures is seldom listed in papers (such as Huang, Erdogan, Weninger, the Wangs, etc.), by the way? Most seem to go for a point estimate of the average "global" SDR, SIR, and SAR over all estimated sources in a test set.

@dawenl
Collaborator

dawenl commented May 19, 2016

It's been a while since I last touched anything related to source separation, but it seems that everything is solved (or hopefully will be resolved soon)?

@aliutkus

Thanks so much @faroit for handling all this Python evaluation voodoo magic! The upcoming dsd100 package he's been preparing for a while makes it very practical to test separation stuff in Python indeed!

Concerning everything being solved, we're not quite there yet, but yeah, the community has been very active and it's a pleasure to see things working quite well now =D

@faroit

faroit commented Jun 2, 2016

> Yes, I don't know of this being implemented/used, although it's useful. @dawenl do you know?

@craffel @carlthome see here. This is what will be used for the upcoming SISEC.

@craffel
Collaborator

craffel commented Aug 19, 2016

Has this been covered by the recent enhancements to separation eval (e.g. by using framewise eval and computing max/min by hand)? If so, can you close?
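
For reference, a minimal sketch of doing exactly that, assuming mir_eval.separation.bss_eval_sources_framewise with window and hop given in samples (the dummy signals are just to make the example self-contained):

```python
import numpy as np
import mir_eval

# Dummy signals: two 4-second sources at 44.1 kHz, with the estimates
# being the references plus a little noise.
rng = np.random.RandomState(0)
reference_sources = rng.randn(2, 44100 * 4)
estimated_sources = reference_sources + 0.1 * rng.randn(2, 44100 * 4)

# 1-second windows with 50% overlap; frames where the metrics are
# undefined (e.g. a silent source) may come back as NaN, hence nanmin/nanmax.
sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources_framewise(
    reference_sources, estimated_sources, window=44100, hop=22050)

print('per-source SDR min:', np.nanmin(sdr, axis=1))
print('per-source SDR max:', np.nanmax(sdr, axis=1))
```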

@carlthome
Contributor Author

Yes. Great.
