Add Multi-Cardinality Support to DedupliFHIR Backend #122
Signed-off-by: Isaac Milarsky <[email protected]>
    else:
        blocking_rules.append(block_on(rule))
def get_additional_comparison_rules(parsed_data_df):
[pylint] reported by reviewdog 🐶
C0303: Trailing whitespace (trailing-whitespace)
Looks like the installation for tests on 3.11 is failing. Is that a fluke, or does something need to be updated?
Sorry, I was just merging in Dependabot updates. It's just a fluke and is fixed now.
This looks good for now as an approach to support multiple values! It would be good to add a ticket to the backlog for comparing street_address1 to street_address2, in addition to comparing street_address1 to street_address1, but it sounds like that would be a heavier lift.
Add Multi-Cardinality Support to DedupliFHIR Backend
Problem
Currently, the DedupliFHIR tool supports only a fixed column structure when processing user-supplied data: the columns that the tool checks for and compares are predefined and static.
For example, if a user's input record includes more than one street address or postal code, the additional values are ignored by the tool.
This problem is detailed in issue #54.
Solution
I have added support for multi-cardinality fields when they are present in the input data. Both FHIR and CSV inputs now support multiple names, addresses, and postal codes.
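To illustrate what multi-cardinality means for a single record, here is a minimal sketch that expands list-valued fields into numbered columns such as street_address1 and street_address2 (the naming used in the review comment). The function name `flatten_multi_valued`, the `family_name` field, and the input record shape are hypothetical and are not taken from the DedupliFHIR code.

```python
def flatten_multi_valued(record, fields=("street_address", "postal_code")):
    """Expand list-valued fields into numbered columns (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        if key in fields and isinstance(value, list):
            # One numbered column per value: street_address1, street_address2, ...
            for index, item in enumerate(value, start=1):
                flat[f"{key}{index}"] = item
        else:
            # Single-valued fields are passed through unchanged.
            flat[key] = value
    return flat


# Illustrative record with two street addresses and one postal code.
record = {
    "family_name": "Smith",
    "street_address": ["1 Main St", "22 Oak Ave"],
    "postal_code": ["12345"],
}
flat = flatten_multi_valued(record)
```

With this layout, a second address is kept as its own column rather than being dropped.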
Result
Tests have been adapted to use more than one address, and the settings for the Splink backend are now computed after the data is parsed rather than before, among other changes.
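The reason settings have to be computed after parsing is that the set of numbered columns is only known once the data has been read. The sketch below shows one way that discovery step could look, assuming columns are named with a numeric suffix (street_address1, street_address2). The helper `numbered_columns` is hypothetical; it does not call Splink and is not the PR's actual implementation of get_additional_comparison_rules.

```python
import re


def numbered_columns(columns, base_names):
    """Group columns like street_address1, street_address2 by base name."""
    found = {base: [] for base in base_names}
    for column in columns:
        # Split a column name into a base name and a trailing number.
        match = re.fullmatch(r"(.+?)(\d+)", column)
        if match and match.group(1) in found:
            found[match.group(1)].append((int(match.group(2)), column))
    # Sort by the numeric suffix so street_address2 comes before street_address10.
    return {base: [name for _, name in sorted(cols)] for base, cols in found.items()}


# Column names as they might appear after parsing (illustrative only).
parsed_columns = ["family_name", "street_address1", "street_address2", "postal_code1"]
extra = numbered_columns(parsed_columns, ["street_address", "postal_code"])
```

The resulting mapping could then drive which comparison rules are added for each base field.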
Test Plan
Run
make test