-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data consistency issues #438
Comments
Hi Kurtis, I think most of the information you have questions about was automatically created from the Electronic CFR page when we made the templates for the AVAs waiting to be digitized, and I know at least some of the inconsistencies like the CFR Index (See Issue #333 ) originate from there . I suspect the escape characters also came from there. I'd be happy to have your ideas of how to fix some of these issues. The date format has been a bit of a moving target. Sometimes QGIS seems to reformat the dashes or slashes so I've been letting that slide for now, but ideally they should be consistent. I'm the person who creates the big file with all the AVAs when a new AVA is submitted. The R script I use is in the rcode folder. I'm realizing that is out of date after a couple of pull request the last few days so I'm going to go update that now. best, |
Now that my brain has had a chance to think about all these things, I'm planning to write a script to clean up the format issues you pointed out. |
I think a good first step would be to document what should be in the GeoJSON files as well as the overall structure. What properties are required, what's optional, data type, format, etc. And also what to do if the information isn't provided. For example with Be very specific. For example, Also, it might be better to remove the top level name property entirely since it's redundant. I've also encountered leading spaces in the Additional thoughts (maybe open a separate issue for each to discuss with others?):
{
"type": "FeatureCollection",
"features": [{
"type": "Feature",
"geometry": {
"type": "MultiPolygon",
"coordinates": [
[
[
[-93.8900165363182, 35.5418952876998],
[-93.9211350905844, 35.5089957584565],
[-93.858897982052, 35.5089957584565],
[-93.8900165363182, 35.5418952876998]
]
]
]
},
"properties": {
"ava_id": "altus",
"name": "Altus",
"aka": null,
"created": null,
"removed": null,
"county": "Franklin",
"state": "AR",
"within": "Arkansas Mountain|Ozark Mountain",
"contains": null,
"petitioner": null,
"cfr_author": null,
"cfr_index": "9.77",
"cfr_revision_history": "[T.D. ATF-176, 49 FR 22471, May 30, 1984]",
"approved_maps": "(1) Ozark Quadrangle, 1966 ...",
"boundary_description": "General. The Altus viticultural area ...",
"used_maps": null,
"valid_start": null,
"valid_end": null,
"lcsh": null,
"sameas": null
}
}]
} Have a great weekend! Kurtis |
Most GIS software that reads geojson won't be able to handle or edit such attributes
I think the contains and within are actually in the AVA text descriptions. I'm not sure if any mention overlap. Easy enough to derive if needed.
Most GIS software that writes geojson will not preserve this, if a consistent method could be found that could be useful for the diffs. Fields are defined in the Readme. I'm not sure there's a way to consistently handle empty fields, once again the software used to edit sometimes changes if it's null. Many of these fixes can be applied at the R script after submission. |
I'm going to make individual issues for the pieces of this that need to be addressed and close this issue. I agree with Alex that some of these are things that aren't possible given that the focus here is producing a spatial data format readable by GIS software, however, I am getting started on an R script to fix the major inconsistency issues. See the consistent-formatting branch. |
I'll be submitting a pull request this afternoon to correct some errors with the
cfr_index
property on three files in the avas folder.I've noticed several other consistency issues and would be happy to correct these as well in a separate pull request. I want to be sure before I spend the time though.
They are:
name
property? Many do not.properties.ava_id
be all lower case? Some are not.YYYY-MM-DD
andYYYY/MM/DD
are used. The README indicates that it should be the former./n
, throughout many of the text properties. Some are trailing and some in the body. Those at the end should definitely be removed. And since this is a dataset we should probably avoid adding any kind of formatting as well such as what I'm seeing in theused_maps
properties.Maybe there should be some detailed documentation regarding how each GeoJSON file should be formatted. We could even write some tests to ensure the integrity of the data. Contributors would have to run the tests first to make sure they all pass before submitting a pull request.
Also, when the individual AVA files are updated. Who's responsible for updating the consolidated versions?
This is a fantastic project. I look forward to contributing to it further.
Thanks,
Kurtis
The text was updated successfully, but these errors were encountered: