Modifications to support new StructuredFile ingestor #14
…schema_path' to 'record_schema_path'
Looks good! My only request is that the contents of `specs/schemas` and `configuration/etl/etl_schemas.d/value_analytics` be deduplicated and references to those directories be cleaned up. This could be accomplished either by deleting `specs/schemas` and removing the reference to it in `build.json`, or by deleting `configuration/etl/etl_schemas.d/value_analytics` and updating `build.json` to use this directory for builds instead of the old `etl_specs.d` directory.
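Before deleting either copy, it may be worth confirming that the two directories really are duplicates. The sketch below is a hypothetical helper (`find_differences` is not part of XDMoD) built on Python's standard `filecmp` module: it walks both trees and reports files that exist on only one side, paths that are a file on one side and a directory on the other, and files whose contents differ. Note that `filecmp.dircmp` skips its default ignore list (e.g. `.git`, `CVS`, `__pycache__`).

```python
import filecmp
import os


def find_differences(left: str, right: str) -> list[str]:
    """Recursively compare two directory trees (hypothetical helper).

    Returns human-readable differences; an empty list means both trees
    contain the same names with byte-for-byte identical file contents.
    """
    differences: list[str] = []

    def walk(cmp: filecmp.dircmp, rel: str) -> None:
        for name in cmp.left_only:
            differences.append(f"only in {left}: {os.path.join(rel, name)}")
        for name in cmp.right_only:
            differences.append(f"only in {right}: {os.path.join(rel, name)}")
        # Same name, but a file on one side and a directory on the other.
        for name in cmp.common_funny:
            differences.append(f"type mismatch: {os.path.join(rel, name)}")
        # dircmp's own diff_files uses a shallow (os.stat-based) check, so
        # re-compare the common files with shallow=False to read contents.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        for name in mismatch + errors:
            differences.append(f"differs: {os.path.join(rel, name)}")
        # Recurse into subdirectories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(rel, name))

    walk(filecmp.dircmp(left, right), "")
    return differences
```

For example, `find_differences("specs/schemas", "configuration/etl/etl_schemas.d/value_analytics")` returning `[]` would indicate that either directory can be removed without losing anything.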
@tyearke Requested changes completed. I also updated the ETL config file to define filters in an array to bring it in line with the changes in ubccr/xdmod#145 (and re-ran the tests).
`specs/schemas` just needs to be removed from `include_paths` in `build.json` and it should be good to go.
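If you would rather script the `build.json` change than edit it by hand, here is a minimal sketch. It assumes, hypothetically, that `include_paths` is a top-level JSON array of path strings (the full layout of `build.json` isn't shown in this thread), and `drop_include_path` / `update_build_json` are illustrative names only.

```python
import json


def drop_include_path(config: dict, path: str) -> bool:
    """Remove every occurrence of `path` from config["include_paths"].

    Trailing slashes are ignored when matching. Returns True if an entry
    was removed. Assumes include_paths is a top-level list of strings.
    """
    paths = config.get("include_paths", [])
    kept = [p for p in paths if p.rstrip("/") != path.rstrip("/")]
    if len(kept) == len(paths):
        return False
    config["include_paths"] = kept
    return True


def update_build_json(filename: str, path: str) -> bool:
    """Load build.json, drop `path` from include_paths, and write it back."""
    with open(filename, encoding="utf-8") as f:
        config = json.load(f)
    changed = drop_include_path(config, path)
    if changed:
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=4)
            f.write("\n")
    return changed
```

Rewriting the file with `json.dump` also normalizes its indentation, so for a one-line change a manual edit keeps the diff smaller.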
Looks good! Also, thanks for fixing the use of MySQL type aliases for some of the columns - I didn't realize using those would cause table alterations on every run.
I have it on my list to add better support for things like
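For context on why the aliases caused churn: MySQL stores a column declared with an alias such as `INTEGER` or `BOOLEAN` under its canonical type (`int`, `tinyint(1)`), so a tool that compares the declared type string with what the server reports sees a difference on every run and issues an `ALTER TABLE`. The sketch below is a hypothetical illustration of that comparison, not the XDMoD ETL's actual logic; normalizing aliases before comparing is one way to avoid the spurious diff, while using canonical types in the table definitions (as this PR does) avoids it at the source. The alias map is partial and deliberately ignores integer display widths, which some MySQL versions omit when reporting column types.

```python
import re

# A partial map of MySQL type aliases to the canonical type the server
# reports. REAL maps to DOUBLE unless the REAL_AS_FLOAT SQL mode is set.
ALIASES = {
    "bool": "tinyint(1)",
    "boolean": "tinyint(1)",
    "integer": "int",
    "dec": "decimal",
    "numeric": "decimal",
    "fixed": "decimal",
    "real": "double",
    "double precision": "double",
}


def normalize_type(declared: str) -> str:
    """Lower-case a column type, tidy whitespace, and replace a leading
    alias with its canonical name, keeping any (length) or attributes."""
    t = re.sub(r"\s+", " ", declared.strip().lower())
    t = re.sub(r"\s*,\s*", ",", t)  # "(10, 2)" -> "(10,2)"
    # Check longer aliases first so "double precision" is matched whole.
    for alias in sorted(ALIASES, key=len, reverse=True):
        if t == alias or t.startswith(alias + "(") or t.startswith(alias + " "):
            return ALIASES[alias] + t[len(alias):]
    return t


def needs_alter(declared: str, actual: str) -> bool:
    """True only if the types differ after alias normalization."""
    return normalize_type(declared) != normalize_type(actual)
```

A naive string comparison of `"integer"` with `"int"` reports a change every time; `needs_alter("integer", "int")` does not.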
These changes are in support of the new StructuredFile data endpoint. Note that this PR requires XDMoD PR ubccr/xdmod#145.

* `parse()` methods
* `sed` to collapse the various NIH agencies (e.g. NIH-NLM) into a single "NIH" agency using `sed -r 's/("agency": "NIH)(.*)",/\1",/'`
* `array_element_schema_path` to `record_schema_path`, since we no longer require the data to be an array

Testing was performed by comparing a dump of the `modw_value_analytics` tables on va-demo to the newly ingested tables. The only differences are that the existing VA Demo data has no abbreviation for Indiana University, and that the individual NIH funding agencies are collapsed into a single NIH agency in the newly ingested data.
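The `sed -r` substitution described above can be reproduced in Python to check exactly what it does to the data files. The pattern and replacement are copied verbatim from the PR description; `collapse_nih` is a hypothetical name. Because `.` does not match newlines and `(.*)` is greedy, each line is rewritten at most once, from `"agency": "NIH` up to the last `",` on that line, which mirrors sed's line-by-line behavior without the `g` flag.

```python
import re

# Pattern and replacement taken from the PR's sed -r command.
# For this pattern, Python's regex syntax behaves like sed's -r (ERE).
NIH_PATTERN = re.compile(r'("agency": "NIH)(.*)",')


def collapse_nih(text: str) -> str:
    """Rewrite e.g. '"agency": "NIH-NLM",' to '"agency": "NIH",'.

    `.` never crosses a newline, so multi-line text is processed line by
    line; the greedy (.*) runs to the last '",' on each matching line.
    """
    return NIH_PATTERN.sub(r'\1",', text)
```

Two consequences of the pattern are worth keeping in mind: it assumes one key per line (pretty-printed JSON), since any keys that follow the agency on the same line are dropped, and it requires a trailing comma, so an `agency` that is the last key in its object is left unchanged.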