berkeley: Update minter to use first typecode when pattern contains multiple typecodes #597

eecavanna
Collaborator

Description

In this branch, I updated the minter so that, in addition to extracting typecodes verbatim from patterns in the schema, it now also extracts the first typecode in a (...|...|...)-formatted sequence.

I also added a TODO comment about moving away from extracting typecodes from these pattern strings, which I think are authored with the mindset of "here's what I want the schema to allow for values in this field," as opposed to "here's what I want the minter to use when creating an ID."
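As a rough illustration of the behavior described above, here is a minimal sketch. The function name `first_typecode` and the assumption that the input is the part of a pattern that begins at the typecode (with a hyphen right after it) are hypothetical; this is not the code in nmdc_runtime/minter/config.py.

```python
def first_typecode(fragment: str) -> str:
    """Return the typecode at the start of a pattern fragment.

    Assumption (not taken from the PR): `fragment` starts at the typecode,
    and the typecode portion ends at the first hyphen.
    """
    # Everything before the first hyphen is the typecode portion.
    head = fragment.split("-", maxsplit=1)[0]

    # If that portion is a parenthesized group such as "(a|b|c)",
    # keep only the first alternative.
    if head.startswith("(") and head.endswith(")"):
        head = head[1:-1].split("|", maxsplit=1)[0]

    return head


# A verbatim typecode is returned unchanged.
plain = first_typecode("dgns-11-abc")

# With several alternatives, only the first one is used.
grouped = first_typecode("(dgns|omprc)-11-abc")

print(plain, grouped)
```

Under this approach, the order of the alternatives in the pattern determines which typecode gets minted, so swapping the two entries in the group would change the result.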

Fixes #592

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • I ran $ python -m doctest -v nmdc_runtime/minter/config.py within the fastapi container in my development environment. Based on the pytest documentation, it looks like these doctests could be incorporated into the existing pytest infrastructure, but I haven't done that yet.
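For context, pytest can collect doctests from modules when it is run with its `--doctest-modules` option. The sketch below shows the kind of doctest involved and runs it with the standard-library `doctest` module. The helper names `first_alternative` and `run_doctests` are hypothetical and are not part of this PR.

```python
import doctest


def first_alternative(group: str) -> str:
    """Return the first option in a "(a|b|c)"-style group (hypothetical helper).

    >>> first_alternative("(dgns|omprc)")
    'dgns'
    >>> first_alternative("dgns")
    'dgns'
    """
    if group.startswith("(") and group.endswith(")"):
        group = group[1:-1]
    return group.split("|", maxsplit=1)[0]


def run_doctests(func):
    """Run the doctests in one function's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder(recurse=False)
    runner = doctest.DocTestRunner(verbose=False)
    # module=False skips module lookup; globs makes `func` visible to the examples.
    tests = finder.find(
        func, name=func.__name__, module=False, globs={func.__name__: func}
    )
    for test in tests:
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(first_alternative)
print(results)
```

Keeping the examples in the docstring means the same checks can be run either with `python -m doctest` or by a pytest run that has doctest collection enabled.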

Definition of Done (DoD) Checklist:

  • My code follows the style guidelines of this project (have you run black nmdc_runtime/?)

    No, I'll leave that to the GitHub Actions workflow once I open the PR

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (in docs/ and in <https://github.com/microbiomedata/NMDC_documentation/>?)
  • I have added tests that prove my fix is effective or that my feature works, incl. considering downstream usage (e.g. https://github.com/microbiomedata/notebook_hackathons) if applicable.
  • New and existing unit and functional tests pass locally with my changes (make up-test && make test-run)

@eecavanna eecavanna requested a review from dwinston July 17, 2024 01:15
@eecavanna eecavanna self-assigned this Jul 17, 2024
@eecavanna eecavanna marked this pull request as draft July 17, 2024 01:16
@eecavanna eecavanna removed the request for review from dwinston July 17, 2024 01:16
@eecavanna eecavanna changed the base branch from berkeley to main July 17, 2024 01:17
@eecavanna eecavanna marked this pull request as ready for review July 17, 2024 01:18
@turbomam
Member

turbomam commented Jul 17, 2024

Thanks @eecavanna, this looks good to me and was turned around very fast.

I am going to restate the obvious: this requires that we agree on which typecode should appear first in the (...|...) regular expression fragment. In our two current use cases, I think it's pretty clear that the contemporary typecode (like dgns) should go before the legacy typecode (like omprc). So that should be documented somewhere, presumably at least in the schema documentation, but possibly other places too? Or at least linked from multiple places?

@aclum @kheal what are your thoughts about that? I think you or others you nominate should be end-user reviewers of this PR.

@kheal

kheal commented Jul 17, 2024

@turbomam, good point, and I agree re: documentation. I think this is a schema/documentation request that can, at the least, be naturally incorporated into the CONTRIBUTING guide in microbiomedata/berkeley-schema-fy24#225. I've added it to that PR.

@aclum
Contributor

aclum commented Jul 17, 2024

This is fine as a workaround, but longer term I'd like something more explicit in the schema, like a separate slot that specifies the minting typecode. @turbomam

@eecavanna
Collaborator Author

I will work with @kheal to create either an Issue or a Discussion in nmdc-schema about representing the typecodes more explicitly in the schema. I will include @aclum and @SamuelPurvine (he has expressed interest) in the Issue or Discussion we come up with.

@eecavanna
Collaborator Author

There is now a discussion in the nmdc-schema repo about defining the typecode in the schema in a more explicit way. Discussion: microbiomedata/nmdc-schema#2125

@eecavanna eecavanna changed the base branch from main to berkeley July 18, 2024 00:30
@eecavanna
Collaborator Author

I changed the base branch from main to berkeley. I'll merge this into berkeley so people can try minting things that have the more complex patterns.

@eecavanna eecavanna merged commit 19dd7dc into berkeley Jul 18, 2024
2 checks passed
@eecavanna eecavanna deleted the 592-berkeley-update-minter-to-use-first-typecode-when-pattern-contains-multiple-typecodes branch July 18, 2024 00:31
@eecavanna eecavanna changed the title Update minter to use first typecode when pattern contains multiple typecodes berkeley: Update minter to use first typecode when pattern contains multiple typecodes Jul 18, 2024