Replies: 4 comments
-
Thanks as ever, Nick. We really appreciate all of the comments, feedback and code you send our way. And yes, you have brought this up previously, but in the form of a discussion. RE: blocking rule updates and improvements: at present we're waiting for Robin to return from paternity leave and conclude his autoblocking work before doing any further work fleshing out our blocking rules. At that point we'll look at what we can do to improve our previous charts and blocking methodology. We're also planning to implement one of your previous suggestions, #1493, when we find the capacity.
-
lol, I THOUGHT I had already found that upset chart, but I found it again and was blown away a second time :) Sounds good, no rush at all, I didn't double post out of impatience. I would vote that we move the discussion from there to here, as this is a proper issue that will probably be more visible.
-
FYI I have a basic implementation of this here,
-
Thanks so much. I've had a quick poke around your code, but I'm quite behind due to being sick for 2 of the last 3 weeks, so I haven't had as much time as I'd like. I'll give you a proper response on Monday. The chart looks great, and some of your surrounding code is also something we are working towards implementing on our end (so it's nice to see a version in the wild).
-
I can't remember if I already brought this up here. I did an issue search and couldn't find anything.
It would be awesome if we could use https://upset.app/ to visualize blocking rules.
Currently, the number of pairs each blocking rule generates depends on the ordering of the rules. This makes it look like the earlier rules are really broad, which isn't really true: if you moved a different rule first, it would also generate a lot of pairs.
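To make the ordering effect concrete, here is a toy sketch. The records, the three rule names, and the `sequential_counts` helper are all made up for illustration, and it assumes each pair is credited only to the first rule that catches it, which is how I understand the current counts to behave:

```python
from itertools import combinations

# Toy records; the field names and values are made up for illustration.
records = [
    {"id": 1, "first": "ann", "last": "lee", "city": "york"},
    {"id": 2, "first": "ann", "last": "lee", "city": "york"},
    {"id": 3, "first": "ann", "last": "ray", "city": "york"},
    {"id": 4, "first": "bob", "last": "lee", "city": "bath"},
]

# Hypothetical blocking rules: a pair is generated when the two values agree.
rules = {
    "first": lambda a, b: a["first"] == b["first"],
    "last": lambda a, b: a["last"] == b["last"],
    "city": lambda a, b: a["city"] == b["city"],
}


def sequential_counts(order):
    """Credit each pair only to the first rule in `order` that generates it."""
    counts = {name: 0 for name in order}
    for a, b in combinations(records, 2):
        for name in order:
            if rules[name](a, b):
                counts[name] += 1
                break  # later rules never get credit for this pair
    return counts


print(sequential_counts(["first", "last", "city"]))
print(sequential_counts(["city", "last", "first"]))
```

With `first` at the front, `city` is credited with zero pairs; with `city` at the front, `first` is credited with zero. Neither rule changed, only the order did.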
An upset chart gives a more holistic and accurate view of how your blocking rules are affecting performance: which sets of blocking rules are doing pretty much the same thing (so some of them could maybe be removed), and which blocking rules are truly finding new candidate pairs (so I should go looking for other similar rules to add).
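The data behind such a chart is just a count of pairs per exact combination of rules. A minimal sketch, assuming we can get, for each candidate pair, the set of rules that would generate it on their own (the `pair_rules` mapping and both helper names below are hypothetical):

```python
from collections import Counter

# Hypothetical input: candidate pair -> set of blocking rules that would
# generate that pair if each rule were run independently.
pair_rules = {
    (1, 2): {"first", "last", "city"},
    (1, 3): {"first", "city"},
    (2, 3): {"first", "city"},
    (1, 4): {"last"},
    (2, 4): {"last"},
}


def upset_counts(pair_rules):
    """Count pairs per exact combination of rules (one bar per combination)."""
    return Counter(frozenset(rule_set) for rule_set in pair_rules.values())


def unique_pairs_per_rule(pair_rules):
    """Count pairs that exactly one rule finds, keyed by that rule."""
    counts = Counter()
    for rule_set in pair_rules.values():
        if len(rule_set) == 1:
            counts[next(iter(rule_set))] += 1
    return counts


for combo, n in upset_counts(pair_rules).most_common():
    print(sorted(combo), n)
print(unique_pairs_per_rule(pair_rules))
```

In this toy data `first` and `city` always fire together and neither finds a pair on its own, so one of them looks removable, while `last` is the only rule contributing pairs nothing else finds.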
We could do this in Altair.
We could even do something like the box-and-whisker or the scatterplot they have here, showing the distribution of match scores for each blocking rule! Then you could see "Oh well, this blocking rule is only giving me like 10% true matches". You could then investigate what the other 90% of non-matches were and look for some restriction on the blocking rule that cuts out part of that 90%. This would really help with improving the efficiency of each blocking rule.
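The per-rule summary could start out as simple as this. It is only a sketch: the `scored_pairs` list, the `rule_summary` helper, and the 0.5 threshold are assumptions for illustration, and the scores are invented rather than real model output:

```python
from statistics import median

# Hypothetical scored pairs: (blocking rule that produced the pair,
# predicted match probability). The numbers are invented.
scored_pairs = [
    ("first", 0.95), ("first", 0.10), ("first", 0.05), ("first", 0.02),
    ("last", 0.99), ("last", 0.90),
]


def rule_summary(scored_pairs, threshold=0.5):
    """Summarise match scores per rule: pair count, share at or above threshold, median."""
    by_rule = {}
    for rule, prob in scored_pairs:
        by_rule.setdefault(rule, []).append(prob)
    return {
        rule: {
            "n_pairs": len(probs),
            "share_above_threshold": sum(p >= threshold for p in probs) / len(probs),
            "median_probability": median(probs),
        }
        for rule, probs in by_rule.items()
    }


for rule, stats in rule_summary(scored_pairs).items():
    print(rule, stats)
```

Here only a quarter of the pairs from `first` score above the threshold, so that would be the rule to tighten; the low-scoring pairs it produces are the ones to inspect.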