Assessment tools with "False" and "True" strings are parsed as boolean, breaking things #107

Closed
2 tasks done
surchs opened this issue Mar 31, 2023 · 0 comments · Fixed by #115
Labels
Epic: A collection of issues that are related by topic and can be addressed together.

Comments


surchs commented Mar 31, 2023

When an assessment tool column contains only values that look like Python's boolean literals (i.e. "True" and "False"), pandas reads the whole column as type bool. If "False" is meant to indicate a MissingValue, that is now silently ignored: the data dictionary describes it as the string "False", but in the loaded bagel.tsv it has become the boolean False.
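A minimal reproduction of the type inference, assuming pandas' default `read_csv` settings (the actual loading call in the tool may pass other options). Passing `dtype=str` is shown as one possible way to keep the raw strings; it is an illustration, not necessarily how #115 fixed the issue.

```python
import io

import pandas as pd

# A bagel.tsv-style table whose status column holds only the strings "True" and "False"
TSV = (
    "participant_id\tmoca_total_status\n"
    "sub-01\tTrue\n"
    "sub-02\tFalse\n"
    "sub-03\tTrue\n"
)

# Default parsing: pandas recognizes "True"/"False" and infers a bool column
inferred = pd.read_csv(io.StringIO(TSV), sep="\t")
print(inferred["moca_total_status"].dtype)  # bool
# The string "False" no longer occurs anywhere in the column
print("False" in inferred["moca_total_status"].tolist())  # False

# Forcing string dtype keeps the values exactly as written in the file
as_str = pd.read_csv(io.StringIO(TSV), sep="\t", dtype=str)
print(as_str["moca_total_status"].tolist())  # ['True', 'False', 'True']
```

Reading everything as strings also means numeric columns arrive as text, so a loader taking this route would need to convert those explicitly where the data dictionary calls for it.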

Here is an example of a data dictionary entry for a column with "True" and "False" values:

```json
  "moca_total_status": {
    "Description": "Montreal Cognitive Assessment",
    "Levels": {
      "True": "Completed",
      "False": "Not completed"
    },
    "Annotations": {
      "IsAbout": {
        "TermURL": "bg:Assessment",
        "Label": "Assessment Tool"
      },
      "IsPartOf": {
        "TermURL": "cogAtlas:trm_57964b8a66aed",
        "Label": "Montreal Cognitive Assessment"
      },
      "MissingValues": ["False"]
    }
  }
```

And here is what the corresponding bagel.tsv file would look like:

```
participant_id	moca_total_status
sub-01	True
sub-02	False
sub-03	True
```

All three subjects (sub-01, sub-02, sub-03) would then be shown to "have" the MoCA tool, because none of them has the value "False" in the loaded bagel.tsv: the column was cast to boolean, and in Python False != "False". sub-02 should instead have been treated as missing.
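To see why sub-02 slips through, here is a minimal sketch of a MissingValues check. `has_assessment` is a hypothetical helper standing in for whatever availability logic the tool actually uses; the point is only that the strings from the data dictionary never match Python bools.

```python
# MissingValues as written in the JSON data dictionary: always strings
missing_values = ["False"]

# Values as they end up after pandas has inferred a bool column
loaded_values = {"sub-01": True, "sub-02": False, "sub-03": True}


def has_assessment(value, missing_values):
    """Hypothetical check: a value counts as available unless it is listed as missing."""
    return value not in missing_values


print(False == "False")  # False: a bool never compares equal to a str
for sub, value in loaded_values.items():
    print(sub, has_assessment(value, missing_values))
# sub-01 True
# sub-02 True   <- intended to be missing ("Not completed")
# sub-03 True

# Comparing the string form happens to give the intended answer for sub-02,
# but only because str(False) == "False"; loading the column as strings is sturdier
print(has_assessment(str(loaded_values["sub-02"]), missing_values))  # False
```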

Sneakily, all of this happens completely silently and without any error, because we are not checking whether the values described in the data dictionary actually exist in the loaded bagel.tsv file.
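The missing consistency check could look something like this sketch. `documented_values_not_found` is a hypothetical helper; it assumes the data dictionary layout from the example above and a column already loaded into a plain list.

```python
import json

# Trimmed copy of the data dictionary entry from the issue
DATA_DICTIONARY = json.loads(
    """
    {
      "moca_total_status": {
        "Levels": {"True": "Completed", "False": "Not completed"},
        "Annotations": {"MissingValues": ["False"]}
      }
    }
    """
)


def documented_values_not_found(column, values, data_dictionary):
    """Hypothetical check: list Levels keys and MissingValues that never occur in the loaded column."""
    entry = data_dictionary[column]
    documented = set(entry.get("Levels", {}))
    documented |= set(entry.get("Annotations", {}).get("MissingValues", []))
    return sorted(documented - set(values))


# Column as loaded after bool inference: none of the documented values are found
print(documented_values_not_found("moca_total_status", [True, False, True], DATA_DICTIONARY))
# ['False', 'True']

# Column loaded as strings: every documented value is present
print(documented_values_not_found("moca_total_status", ["True", "False", "True"], DATA_DICTIONARY))
# []
```

A non-empty result is not always a bug (a level may simply not occur in a given dataset), so a warning may fit better than a hard error; in this case, though, it would have flagged that every documented value had disappeared.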

I think two things should be done:

surchs converted this from a draft issue Mar 31, 2023
surchs moved this from Backlog to Specify - Active in Neurobagel Mar 31, 2023
surchs added the Epic label Mar 31, 2023
surchs removed the status in Neurobagel Mar 31, 2023
alyssadai moved this to Implement - Done in Neurobagel Apr 12, 2023
alyssadai self-assigned this Apr 12, 2023
surchs moved this from Implement - Done to Review - Active in Neurobagel Apr 12, 2023
github-project-automation bot moved this from Review - Active to Review - Done in Neurobagel Apr 13, 2023
Projects
Archived in project
2 participants