Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/add data checks #179

Open
wants to merge 34 commits into
base: master
Choose a base branch
from
Open

Fix/add data checks #179

wants to merge 34 commits into from

Conversation

AngRodrigues
Copy link
Member

@AngRodrigues AngRodrigues commented Jan 14, 2025

Description

added data_checks, a series of functions that check the main datatypes in map2loop: Geology, Structure, Faults and Folds. Most of the code was refactored to avoid code repetition between the different functions, but checks that are valid for one single datatype are kept in the same functions

Also changed the config dictionary data checks into this file, and the functionality of not allowing invalid args into project (from issue 162), as it made sense to have these fixes together. Please check PR #171 for more info on this.

Fixes #162
Fixes #138

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Test improvement

How Has This Been Tested?

I am attaching 3 notebooks that show some of the common warnings and errors.
Geology - https://gist.github.com/AngRodrigues/a273380a082d024669699d13fa99a0f8
Structure - https://gist.github.com/AngRodrigues/7e7c5b9bb178cb89f1f815c3392db417
Fault - https://gist.github.com/AngRodrigues/6db7a0eca5421b90f4c79ae6021f69f6

Also checked this with the notebooks from source data and Loop server data and it seems to work well.

Branch: fix/add_data_checks

AngRodrigues and others added 30 commits December 17, 2024 13:29
@AngRodrigues AngRodrigues mentioned this pull request Jan 14, 2025
2 tasks
@AngRodrigues AngRodrigues marked this pull request as ready for review January 14, 2025 05:18
Copy link
Member

@lachlangrose lachlangrose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks pretty good, my only main comment is to not duplicate the logging. I think having the warnings/errors in the check functions and then raise the exception in the mapdata is good.

I would also consider moving the notebooks into the documentation examples because it probably would be useful for people to be able to review this.

Geopandas make_valid could be used before checking for validity. Or maybe as a configuration option?
@AngRodrigues let me know what changes you want to make and i can do the rest.

Comment on lines +510 to +511
if not geodata.geometry.is_valid.all():
logger.error(f"Invalid geometries found in datatype {datatype_name}. Please fix them before proceeding.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worthwhile trying to make the geometry valid?
https://geopandas.org/en/v1.0.1/docs/reference/api/geopandas.GeoSeries.make_valid.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure exactly what make_valid does, so it could potentially make a change we don't want but maybe as a config option

if geodata[id_column].isnull().any():
error_msg = (
f"Datatype {geodata_name}: Column '{id_column}' "
f"(config key: '{id_config_key}') contains non-numeric or NaN values. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good, we could make the error message even better by providing the strings (or some of) for the values that couldn't be coerced

Comment on lines +707 to +715
# else:
# warning_msg = (
# f"Datatype {datatype_name.upper()}: Optional column '{column_name}' "
# f"(config key: '{config_key}') is missing from the data. "
# )
####### this might be taking it a bit too far

# logger.info(warning_msg)
# warnings.append(warning_msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keep that as an info log, but not as a warning. Good to have the extra information avaliable if we need it

elif datatype == Datatype.FAULT:
validity_check, message = check_fault_fields_validity(mapdata = self)
if validity_check:
logger.error(f"Datatype FAULT - data validation failed: {message}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is too much duplication in the error messages. The validity checker prints out nice error messages. I think it would be enough to just raise the exception Datatype FAULT - data validation failed

Comment on lines +257 to +258
"bounding_box": bounding_box,
"working_projection": working_projection, # this may be removed when fix is added for https://github.com/Loop3D/map2loop/issues/103
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could/should we calculate the bounding box from the extent of the shapefiles?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] project allows invalid arguments [Bug] - map2loop crashes if required fields contain null
2 participants