Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curate apply-geolocation-rules: use default rules and custom rules #1745

Merged
merged 5 commits into from
Feb 11, 2025

Conversation

joverlee521
Copy link
Contributor

@joverlee521 joverlee521 commented Feb 6, 2025

Description of proposed changes

This PR is based on #1744

Improvements to augur curate apply-geolocation-rules:

  • use the default geolocation rules by default
  • custom rules provided via --geolocation-rules have precedence over the default rules
  • allow users to ignore the default rules with --no-default-rules flag
  • raise an error if both default rules and custom rules are not provided

Related issue(s)

Resolves #1648

Checklist

  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

Copy link

codecov bot commented Feb 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.11%. Comparing base (58eade9) to head (430e70e).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1745      +/-   ##
==========================================
+ Coverage   73.06%   73.11%   +0.04%     
==========================================
  Files          79       79              
  Lines        8335     8350      +15     
  Branches     1700     1704       +4     
==========================================
+ Hits         6090     6105      +15     
  Misses       1958     1958              
  Partials      287      287              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

joverlee521 added a commit that referenced this pull request Feb 7, 2025
The new geolocation_rules.tsv does not exist yet, so the linkcheck fails
with a 404 status code. Temporarily ignore this URL until we do a new
release of Augur that includes the file.

<#1745 (comment)>
[record.get(field, '') for field in location_fields],
args.case_sensitive)

# If no custom rules are applied, then try to apply the default rules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[scanning through code] Is this the bit that resolves #1648?

Copy link
Contributor Author

@joverlee521 joverlee521 Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! I've tried to make it straight-forward that a record will either get changed by the custom rules or the default rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I imagine you can still get circular-errors within the custom rules, but this seems acceptable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd consider circular-errors within the custom rules a user error. Hmm, maybe we should make it easier to check for cyclic rules in the rules file...

Comment on lines +12 to +13
class NoGeolocationRulesProvidedError(AugurError):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[not a request for changes] I'm curious, what's the benefit of a new error vs AugurError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably overkill in this case, but usually the main benefit for me is being able to catch specific errors.

@joverlee521 joverlee521 force-pushed the default-geolocation-rules branch from 96298c0 to 58eade9 Compare February 10, 2025 19:56
`--geolocation-rules` is now optional and acts as custom rules that
have precedence over the default geolocation rules in Augur.

The custom rules are applied first and if none of the custom rules are
applied to a record, then use the default rules. This means that if the
custom rules have wildcards, then all locations that match the wildcards
will no longer use the default rules. This is best demonstrated in the
Cram test included in this commit.
Allow users to completely ignore Augur's default geolocation rules
without having to override all of them in the custom rules.
In the case where user is using the `--no-default-rules` flag and
have not provided custom rules via the `--geolocation-rules` option,
the records will pass through without changes.

Rather than have this be a no-op, I think this is a user error and
should fail loudly.
The new geolocation_rules.tsv does not exist yet, so the linkcheck fails
with a 404 status code. Temporarily ignore this URL until we do a new
release of Augur that includes the file.

<#1745 (comment)>
@joverlee521 joverlee521 force-pushed the use-geolocation-rules branch from 5e7e10d to 02ad675 Compare February 10, 2025 20:07
@joverlee521 joverlee521 force-pushed the use-geolocation-rules branch from 02ad675 to 430e70e Compare February 10, 2025 20:13
Base automatically changed from default-geolocation-rules to master February 11, 2025 21:38
@joverlee521 joverlee521 merged commit e6df775 into master Feb 11, 2025
36 checks passed
@joverlee521 joverlee521 deleted the use-geolocation-rules branch February 11, 2025 21:39
joverlee521 added a commit to nextstrain/pathogen-repo-guide that referenced this pull request Feb 13, 2025
Remove the use of the ncov-ingest geolocation rules since Augur
now uses the built-in geolocation rules by default.

Depends on the release of
<nextstrain/augur#1745>
joverlee521 added a commit to nextstrain/zika that referenced this pull request Feb 13, 2025
Remove the use of the ncov-ingest geolocation rules since Augur
now uses the built-in geolocation rules by default.

Depends on the release of
<nextstrain/augur#1745>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

curate apply-geolocation-rules: Allow rule overrides
4 participants