Implement `relationship` and refine join match warning
#6753
Merged
Closes #6731
Closes #6717
Joint PR on the vctrs side: r-lib/vctrs#1791, which contains even more detail.
This PR hopefully fixes most of the complaints we've seen about the multiple match warning. At a high level:
As outlined in #6731, warning only on many-to-many makes much more sense. One-to-many is generally pretty useful and is symmetric with many-to-one, which we don't warn on, so warning on one but not the other didn't make much sense. As further proof that we should warn on many-to-many, most SQL dialects won't even let you create a many-to-many "relationship" between two tables; instead you have to create a 3rd junction table to break it into two one-to-many relationships.
This PR accomplishes this in two steps by:
- Making `multiple` default to `"all"` (same as SQL), and deprecating the `NULL`, `"error"`, and `"warning"` options
- Adding a `relationship` argument which replaces the `multiple = "error"` case and holistically handles multiple matches between `x` and `y`
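
As a minimal sketch of the new default, assuming a dplyr version that includes this PR (1.1.1 or later): the `customers` and `orders` tables below are hypothetical toy data, not taken from the PR. A one-to-many join returns every match and is not expected to warn, because only many-to-many relationships trigger the warning.

```r
library(dplyr)

# Hypothetical toy tables, for illustration only
customers <- tibble(id = c(1, 2), name = c("a", "b"))
orders <- tibble(id = c(1, 1, 2), amount = c(10, 20, 30))

# One-to-many: customer 1 matches two orders, customer 2 matches one.
# `multiple` is left at its default, so all matches are returned.
out <- left_join(customers, orders, by = "id")

nrow(out)
```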
The `relationship` argument takes on various options:
- `NULL` (default, chooses vctrs options of `"none"` or `"warn-many-to-many"` automatically)
- `"one-to-one"`
- `"one-to-many"`
- `"many-to-one"`
- `"many-to-many"`
These are inspired by database table relationships, which end up being orthogonal to the idea of the "kind" of join you are doing (i.e. left/right/full/inner), so we can add an argument for this without any ambiguity or conflicts.
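
To illustrate that the relationship is independent of the kind of join, here is a small sketch assuming dplyr 1.1.1 or later. The tables `x` and `y` are hypothetical; their keys are unique on both sides, so `"one-to-one"` holds for each join, and only the join kind decides what happens to unmatched rows.

```r
library(dplyr)

# Hypothetical toy tables with unique keys on both sides
x <- tibble(id = c(1, 2, 3), a = c("a1", "a2", "a3"))
y <- tibble(id = c(2, 3, 4), b = c("b2", "b3", "b4"))

# The same relationship constraint combined with three join kinds
inner <- inner_join(x, y, by = "id", relationship = "one-to-one")
left <- left_join(x, y, by = "id", relationship = "one-to-one")
full <- full_join(x, y, by = "id", relationship = "one-to-one")

# Row counts differ by join kind, not by relationship
counts <- c(inner = nrow(inner), left = nrow(left), full = nrow(full))
counts
```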
The choice of `relationship` activates a constraint on `x` and `y`. For example, `"many-to-one"` says that:
- Each row in `x` can match at most 1 row in `y`
- Each row in `y` can match any number of rows in `x`
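
The constraint can be sketched as follows, assuming dplyr 1.1.1 or later. The `orders`, `customers`, and `customers_dup` tables are hypothetical. The first join satisfies `"many-to-one"`; the second is expected to error because a row in `x` matches more than one row in `y`. The error is caught with `tryCatch()` so the example runs to completion.

```r
library(dplyr)

# Hypothetical toy tables, for illustration only
orders <- tibble(id = c(1, 1, 2), amount = c(10, 20, 30))
customers <- tibble(id = c(1, 2), name = c("a", "b"))

# Valid many-to-one: each order matches at most one customer
ok <- left_join(orders, customers, by = "id", relationship = "many-to-one")

# A duplicated key in `y` means some rows in `x` match two rows in `y`,
# which violates the "many-to-one" constraint
customers_dup <- tibble(id = c(1, 1, 2), name = c("a", "a2", "b"))

result <- tryCatch(
  left_join(orders, customers_dup, by = "id", relationship = "many-to-one"),
  error = function(e) "constraint violated"
)

result
```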
Which makes `relationship = "many-to-one"` a nice replacement for `multiple = "error"`. But we also now have `"one-to-one"` and `"one-to-many"` too!

Note that "at most 1" means this doesn't handle the 0 match case, which is instead handled jointly by `unmatched` and your choice of join.

For `NULL`, we warn if a many-to-many relationship is detected; set `relationship = "many-to-many"` to silence the warning if that is expected. This is the only time we now automatically warn, so it should come up much less often.
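
A small sketch of the default behavior, assuming dplyr 1.1.1 or later and hypothetical tables `x` and `y` with duplicated keys on both sides. The default join is wrapped in `withCallingHandlers()` only to record that a warning was raised; the second join declares the relationship explicitly.

```r
library(dplyr)

# Hypothetical toy tables: key 1 is duplicated on both sides
x <- tibble(key = c(1, 1), a = c("x1", "x2"))
y <- tibble(key = c(1, 1), b = c("y1", "y2"))

# Default (`relationship = NULL`): record whether a warning is raised
warned <- FALSE
out_default <- withCallingHandlers(
  inner_join(x, y, by = "key"),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

# Declaring the many-to-many relationship states that it is expected
out_explicit <- inner_join(x, y, by = "key", relationship = "many-to-many")

warned
```

Both joins return the same rows; the explicit form only documents the intent and avoids the warning.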