Updates to handle Bioportal normalization #16

Merged 7 commits on Oct 21, 2022

@caufieldjh commented Oct 20, 2022

@caufieldjh marked this pull request as ready for review October 21, 2022 19:29
@caufieldjh merged commit 81a97db into main Oct 21, 2022
@caufieldjh deleted the bioportal_parsing branch October 21, 2022 19:32
@@ -406,3 +399,48 @@ def load_sssom_maps(maps) -> tuple:
    print(f"Loaded {len(cat_map)} category mappings.")

    return (id_map, cat_map)


def obo_handle(old_id: str) -> str:

We should really be fixing the sources rather than writing exception-handling code like this. Is there a way to get a report of all the "exceptions" fixed this way, so we can try to correct them in the ontologies?

Contributor Author

I'd love a way to automate fixing this across the original ~1,000 Bioportal entries (because that's mostly what this is for), but for now, all IDs are written out to one of three different reports, as needed:

  • IDs of unexpected format, e.g.,

    ID
    OBO:ExO_0000030
    OBO:ExO_0000151
    OBO:ExO_0000152

  • IDs with remapped categories

    Old ID             New Category
    OBO:ExO_0000030    biolink:NamedThing
    OBO:ExO_0000151    biolink:NamedThing
    OBO:ExO_0000152    biolink:NamedThing

  • IDs with remapped IDs

    Old ID             New ID
    OBO:ExO_0000030    EXO:0000030
    OBO:ExO_0000151    EXO:0000151
    OBO:ExO_0000152    EXO:0000152

So that last report would be most useful for finding the easily-solved exceptions, but the first report may also contain some candidates for repair.
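
For illustration, here is a minimal sketch of how IDs like the ones above might be remapped and sorted into the first and third reports. It is not the `obo_handle` function from this PR: the names `remap_obo_id` and `collect_reports`, the regular expression, and the rule that every `OBO:`-prefixed ID counts as "unexpected format" are all assumptions. The category report is left out because its logic is not described here.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed shape of a remappable ID: "OBO:" followed by "<prefix>_<digits>".
OBO_STYLE_ID = re.compile(r"^OBO:([A-Za-z][A-Za-z0-9]*)_(\d+)$")


def remap_obo_id(old_id: str) -> Optional[str]:
    """Return an upper-cased "PREFIX:digits" CURIE, or None if no match."""
    match = OBO_STYLE_ID.match(old_id)
    if match is None:
        return None
    prefix, local_id = match.groups()
    return f"{prefix.upper()}:{local_id}"


def collect_reports(ids: Iterable[str]) -> Tuple[List[str], Dict[str, str]]:
    """Split IDs into an unexpected-format list and an old-to-new ID map."""
    unexpected: List[str] = []
    remapped: Dict[str, str] = {}
    for old_id in ids:
        if not old_id.startswith("OBO:"):
            continue  # Assumed to be well-formed already; not reported.
        unexpected.append(old_id)
        new_id = remap_obo_id(old_id)
        if new_id is not None:
            remapped[old_id] = new_id
    return unexpected, remapped


unexpected, remapped = collect_reports(
    ["OBO:ExO_0000030", "OBO:not-a-valid-id", "EXO:0000151"]
)
print(unexpected)
print(remapped)
```

In this sketch, an ID that lands in the unexpected-format list but not in the remap dictionary is one that could not be fixed automatically, which is where the remaining candidates for repair at the source would show up.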

Remind me, why are these not correctly understood to be: ExO:0000030?

Contributor Author

In this case, it's to align with the Bioportal ID (EXO). I'm thinking about adding a profile option to use "OBO mode" so the Bioportal prefixes can still be used for mapping but will be normalized to the preferred forms like ExO.
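
As a rough sketch of what such an "OBO mode" could look like (nothing here exists in the codebase yet): the `PREFERRED_PREFIXES` table, the `normalize_curie` function, and the `obo_mode` flag are hypothetical names, and the table holds only the one prefix pair mentioned in this thread.

```python
from typing import Dict

# Hypothetical lookup from Bioportal prefix to the preferred OBO form.
PREFERRED_PREFIXES: Dict[str, str] = {"EXO": "ExO"}


def normalize_curie(curie: str, obo_mode: bool = False) -> str:
    """Swap a Bioportal prefix for its preferred form when obo_mode is on."""
    if not obo_mode or ":" not in curie:
        return curie  # Default behavior: keep the Bioportal-aligned ID.
    prefix, local_id = curie.split(":", 1)
    # Prefixes missing from the table pass through unchanged.
    return f"{PREFERRED_PREFIXES.get(prefix, prefix)}:{local_id}"


print(normalize_curie("EXO:0000030"))
print(normalize_curie("EXO:0000030", obo_mode=True))
```

Keeping the flag off by default would leave the Bioportal prefixes available for mapping, with the preferred forms applied only when the profile asks for them.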

2 participants