-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use read_csv_to_dataframe
in validation
#1419
Conversation
@@ -462,6 +462,13 @@ | |||
"type": "integer", | |||
"about": "Shore point ID" | |||
}, | |||
"R_hab": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this up so that the r_hab
column is matched before we get to the [HABITAT]
pattern, which matches everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out! Could you add an inline comment about this so we don't forget in the future?
"units": u.kilowatt | ||
}, | ||
"hsmax": { | ||
"columns": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched this to columns so that we don't have to validate the individual rows, as discussed on the last team call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, makes sense! Could you add an inline comment about this so we are sure to remember?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @emlys ! I really like the new contract being enforced here in the validated dataframe ... really cleans things up conceptually.
I just had a few minor comments and suggestions, and then would you object to adding a note to HISTORY
about this? I think the main thing that might be worth mentioning there is something along the lines of improving consistency in validation of tables, which should improve the readability of validation errors. Just an idea ... I defer to you on it!
def read_csv_to_dataframe(path, spec, **kwargs): | ||
def read_csv_to_dataframe(path, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're removing the spec, I think this docstring becomes much, much simpler! Could you update the docstring to reflect the current state of this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh true!
elif col_spec['type'] == 'boolean': | ||
df[col] = df[col].astype('boolean') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have an else
base case, in case something falls through the other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! That should never happen with our code because the model specs are tested, but it would be good to have in case of plugins.
@@ -462,6 +462,13 @@ | |||
"type": "integer", | |||
"about": "Shore point ID" | |||
}, | |||
"R_hab": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out! Could you add an inline comment about this so we don't forget in the future?
"units": u.kilowatt | ||
}, | ||
"hsmax": { | ||
"columns": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, makes sense! Could you add an inline comment about this so we are sure to remember?
Co-authored-by: James Douglass <[email protected]>
Co-authored-by: James Douglass <[email protected]>
@phargogh I think I addressed all your comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you!
Description
Fixes #1379
This sets us up for #327.
utils.read_csv_to_dataframe
in a new function,validation.get_validated_dataframe
. This new function enforces a contract: given a CSV path and a spec,get_validated_dataframe
will either return a dataframe that matches the spec, or raise an error if that's not possible.utils.read_csv_to_dataframe
now handles the lower-level CSV parsing details.validation.get_validated_dataframe
is used both in validation and in execute. If it raises an error during validation, that error message will be returned as a validation message. If it raises an error in execute, that error will be raised. This is convenient because we get the same message in either context without duplicating.utils.read_csv_to_dataframe
(indirectly throughvalidation.get_validated_dataframe
in most cases, or directly in the few places (HRA criteria table, Wave Energy machine performance table)). This lets us apply the same basic rules to all CSVs, such as encoding, dropping empty rows/columns, and lower-casing column names.Checklist