[FEAT] viz blocking rules using upset chart #2325

NickCrews · 2023-09-17T23:24:23Z

I can't remember whether I already brought this up here; I did an issue search and couldn't find anything.

It would be awesome if we could use https://upset.app/ to visualize blocking rules.

Currently, the number of pairs attributed to each blocking rule depends on the order of the rules, because each rule is only credited with the pairs that no earlier rule has already generated. This makes the earlier rules look very broad, which isn't really true: if you moved a different rule to the front, it would also generate a lot of pairs.
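
To make the ordering effect concrete, here is a minimal sketch of order-dependent attribution, assuming each blocking rule's candidate pairs are available as a Python set. The rule names, record IDs and the `marginal_counts` helper are hypothetical toy examples, not Splink code.

```python
# Toy data: each (hypothetical) blocking rule mapped to the set of
# candidate record pairs it would generate on its own.
PAIRS_BY_RULE = {
    "same_surname": {(1, 2), (1, 3), (2, 3), (4, 5)},
    "same_dob": {(1, 2), (2, 3), (4, 5), (6, 7)},
    "same_postcode": {(1, 3), (6, 7), (8, 9)},
}


def marginal_counts(rule_order, pairs_by_rule):
    """Credit each rule only with pairs that no earlier rule has generated."""
    seen = set()
    counts = {}
    for rule in rule_order:
        new_pairs = pairs_by_rule[rule] - seen
        counts[rule] = len(new_pairs)
        seen |= new_pairs
    return counts


print(marginal_counts(["same_surname", "same_dob", "same_postcode"], PAIRS_BY_RULE))
# {'same_surname': 4, 'same_dob': 1, 'same_postcode': 1}
print(marginal_counts(["same_dob", "same_surname", "same_postcode"], PAIRS_BY_RULE))
# {'same_dob': 4, 'same_surname': 1, 'same_postcode': 1}
```

Whichever of `same_surname` or `same_dob` goes first is credited with 4 pairs, even though the total (6 distinct pairs) is the same in both orders.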

An UpSet chart gives a more holistic and accurate view of how your blocking rules affect performance: which sets of blocking rules are doing pretty much the same thing (so maybe some of them could be removed), and which blocking rules are truly finding new candidate pairs (so I should go exploring for other similar rules to add).
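
The bars of an UpSet chart count pairs by the exact combination of rules that generates them. A minimal sketch of that computation, assuming the same kind of per-rule pair sets as plain Python sets; the data and the `upset_intersections` helper are hypothetical, not an existing Splink or upset.app API.

```python
from collections import Counter

# Toy data: each (hypothetical) blocking rule mapped to its candidate pairs.
PAIRS_BY_RULE = {
    "same_surname": {(1, 2), (1, 3), (2, 3), (4, 5)},
    "same_dob": {(1, 2), (2, 3), (4, 5), (6, 7)},
    "same_postcode": {(1, 3), (6, 7), (8, 9)},
}


def upset_intersections(pairs_by_rule):
    """Count pairs by the exact combination of rules that generates them.

    Every pair lands in exactly one combination, so, unlike order-based
    attribution, the result does not depend on the order of the rules.
    """
    rules_per_pair = {}
    for rule, pairs in pairs_by_rule.items():
        for pair in pairs:
            rules_per_pair.setdefault(pair, set()).add(rule)
    return Counter(frozenset(rules) for rules in rules_per_pair.values())


intersections = upset_intersections(PAIRS_BY_RULE)
for combo, n in intersections.most_common():
    print(" & ".join(sorted(combo)), n)

# A singleton combination holds the pairs that only that rule finds.
# Counter returns 0 for missing keys without inserting them.
for rule in PAIRS_BY_RULE:
    print(rule, "pairs found by no other rule:", intersections[frozenset({rule})])
```

On this toy data, `same_surname` and `same_dob` share 3 of their 4 pairs and neither finds a pair the other rules miss, while `same_postcode` is the only rule with an exclusive pair, `(8, 9)`.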

We could do this in Altair.
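
Altair charts are usually built from long-format (tidy) data, so one possible first step is flattening intersection counts (the bar heights of an UpSet chart) into one row per (intersection, rule): `size` could drive the bar panel and `member` the dot-matrix panel. This is a hypothetical plain-Python helper with toy counts, not an existing Splink or Altair function.

```python
def upset_records(intersections, rule_names):
    """Flatten UpSet intersection counts into tidy rows for plotting.

    Produces one row per (intersection, rule). Intersections are ranked by
    descending size, with ties broken by their sorted rule names.
    """
    ranked = sorted(intersections.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    rows = []
    for rank, (combo, size) in enumerate(ranked):
        label = " & ".join(sorted(combo))
        for rule in rule_names:
            rows.append(
                {
                    "intersection": label,
                    "rank": rank,
                    "size": size,
                    "rule": rule,
                    "member": rule in combo,
                }
            )
    return rows


# Toy counts, keyed by the exact set of rules that find each pair.
INTERSECTIONS = {
    frozenset({"same_surname", "same_dob"}): 3,
    frozenset({"same_surname", "same_postcode"}): 1,
    frozenset({"same_dob", "same_postcode"}): 1,
    frozenset({"same_postcode"}): 1,
}
RULES = ["same_surname", "same_dob", "same_postcode"]

rows = upset_records(INTERSECTIONS, RULES)
print(len(rows))  # 12
print(rows[0])
```

Passing `rows` to something like `pandas.DataFrame(rows)` would give a table that Altair can encode directly, for example with `rank` on the x-axis of both panels so the bars line up with the dot matrix.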

We could even do something like the box-and-whisker plot or the scatterplot they have here, showing the distribution of match scores for each blocking rule! Then you could see, "Oh, this blocking rule is only giving me about 10% true matches." You could then investigate what the other 90% (the non-matches) were, and look for a restriction to add to the blocking rule that cuts out some of that 90%. This would really help improve the efficiency of each blocking rule.
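
A per-rule score summary is the kind of data a box-and-whisker chart would need. A minimal sketch, assuming each candidate pair already has a model match probability (the values below are invented) and using the share of pairs at or above an arbitrary 0.5 threshold as a stand-in for "% true matches", since real labels are usually unavailable. `summarise_rule_scores` is a hypothetical helper, not a Splink function.

```python
from statistics import median

# Toy data: each (hypothetical) blocking rule mapped to its candidate pairs.
PAIRS_BY_RULE = {
    "same_surname": {(1, 2), (1, 3), (2, 3), (4, 5)},
    "same_dob": {(1, 2), (2, 3), (4, 5), (6, 7)},
    "same_postcode": {(1, 3), (6, 7), (8, 9)},
}

# Hypothetical match probabilities from a trained model, one per pair.
MATCH_PROBABILITY = {
    (1, 2): 0.97, (1, 3): 0.12, (2, 3): 0.91,
    (4, 5): 0.05, (6, 7): 0.08, (8, 9): 0.02,
}


def summarise_rule_scores(pairs_by_rule, match_probability, threshold=0.5):
    """Box-plot-style summary (min, median, max) of each rule's pair scores,
    plus the share of pairs scoring at or above `threshold`."""
    summary = {}
    for rule, pairs in pairs_by_rule.items():
        scores = sorted(match_probability[pair] for pair in pairs)
        if not scores:
            continue  # a rule that generates no pairs has nothing to summarise
        summary[rule] = {
            "n_pairs": len(scores),
            "min": scores[0],
            "median": median(scores),
            "max": scores[-1],
            # Proxy for "share of true matches" when no labels are available.
            "share_above_threshold": sum(s >= threshold for s in scores) / len(scores),
        }
    return summary


for rule, stats in summarise_rule_scores(PAIRS_BY_RULE, MATCH_PROBABILITY).items():
    print(rule, stats)
```

In this toy data none of `same_postcode`'s pairs clear the threshold, which makes it the kind of rule worth inspecting for a tighter restriction.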

ThomasHepworth (Maintainer) · 2023-09-20T21:01:13Z

Thanks as ever, Nick. We really appreciate all of the comments, feedback and code you send our way.

And yes, you have brought this up previously, but in the form of a discussion.

Regarding blocking rule updates and improvements, at present we're waiting for Robin to return from paternity leave and finish his autoblocking work before doing any further work fleshing out our blocking rules. At that point we'll look at what we can do to further improve our existing charts and blocking methodology.

We're also planning to implement one of your previous suggestions, #1493, at some point when we find the capacity.

NickCrews (Author) · 2023-09-21T17:34:20Z

lol, I THOUGHT I had already found that UpSet chart, but I found it again and was blown away a second time :)

Sounds good, no rush at all; I didn't double-post out of impatience.

I would vote that we move the discussion from there to here, as this is a proper issue and will probably be more visible.

NickCrews (Author) · 2023-10-20T23:38:41Z

FYI, I have a basic implementation of this here, and you can see what it looks like in this walkthrough.
In the future I would like to refactor the UpSet plot code so that it has a nicer, more generalized API, and then put it on PyPI. If you would like to help design that API, let me know!

ThomasHepworth (Maintainer) · 2023-11-10T16:58:59Z

Thanks so much. I've had a quick poke around your code, but I'm quite behind due to being sick for 2 of the last 3 weeks, so I haven't had as much time as I'd like.

I'll give you a proper response on Monday.

The chart looks great, and some of your surrounding code is something we're also working towards implementing on our end, so it's nice to see a version in the wild.
