
Validate CSV import files using CSVLint #285

Closed
pmackay opened this issue Dec 3, 2014 · 7 comments

Comments

@pmackay
Contributor

pmackay commented Dec 3, 2014

There is a bunch of good work on CSV validation by the ODI; see http://theodi.org/blog/introducing-csvlint. What about refactoring validation by:

This might also be a route to supporting more localized validation. Currently several of the validators are US-specific. With schemas, perhaps the country-specific validations could be extracted and defined per locale.
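To make the per-locale idea concrete, here is a minimal Ruby sketch (Ruby because Ohana API is a Rails app) of country-specific patterns pulled out into a table keyed by locale. The locale keys, field names, and regexes are all hypothetical assumptions: they are not the patterns used in ohana-api's validators, and the UK ones in particular are simplified.

```ruby
# Hypothetical per-locale validation patterns. These regexes are illustrative
# assumptions, not the patterns used by ohana-api's ZipValidator or PhoneValidator.
LOCALE_PATTERNS = {
  "en-US" => {
    postal_code: /\A\d{5}(-\d{4})?\z/,               # ZIP or ZIP+4
    phone: /\A\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\z/   # 10 digits, optional () and separators
  },
  "en-GB" => {
    # Simplified UK postcode shape; real postcodes follow more rules.
    postal_code: /\A[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\z/i,
    # Digits only, leading +44 or 0; a deliberate simplification.
    phone: /\A(?:\+44|0)\d{9,10}\z/
  }
}.freeze

# Returns the names of the fields whose values do not match the locale's pattern.
# Raises KeyError for a locale that has no patterns defined.
def invalid_fields(record, locale)
  patterns = LOCALE_PATTERNS.fetch(locale)
  patterns.reject { |field, pattern| pattern.match?(record[field].to_s) }.keys
end
```

For example, `invalid_fields({ postal_code: "SW1A 1AA", phone: "01632960123" }, "en-GB")` returns `[]`. Using `fetch` means an unconfigured locale fails loudly instead of silently skipping validation.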

@monfresh
Member

monfresh commented Dec 3, 2014

Adding a link in the Wiki to the CSV lint tool to check that the files are proper CSV files is a good idea. As for rewriting the whole import process, if you can get it to work just as well as what we have currently, I'd be glad to review a pull request, but note that the data still has to go through the Rails validations.

In order to use different validations for postal code and phone numbers, you'd have to change the source code, so I'm not sure how the ODI tool will help, but you might know better than me.

The two US-specific validations that I can think of are for postal code and phone numbers. Those validations are based on a regular expression, which you can easily change to suit your needs in these files:

https://github.com/codeforamerica/ohana-api/blob/master/app/validators/zip_validator.rb#L5

https://github.com/codeforamerica/ohana-api/blob/master/app/validators/phone_validator.rb#L5

@monfresh
Member

monfresh commented Dec 3, 2014

I also wanted to add that I don't think it's the API's job to validate that a CSV file is a proper CSV file. The API assumes that you are importing valid files to begin with, but it is able to catch errors in the data thanks to the Rails validations.

Ensuring that the files are kosher before importing them should be a separate step. Linking to the CSV lint tool is a good first step. Beyond that, checking for the schema and other things should probably live in an external tool, like the GTFS validator.
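As a rough illustration of such a separate pre-import step, the Ruby sketch below uses the standard `csv` library to check that a file parses as CSV and has the expected header row. It is not CSVLint and does not use its API; the `REQUIRED_HEADERS` list and the `precheck_csv` name are hypothetical placeholders rather than the real Open Referral column set.

```ruby
require "csv"

# Hypothetical list of headers an import file must contain; the real
# Open Referral / Ohana API column set is not reproduced here.
REQUIRED_HEADERS = %w[id name].freeze

# Checks CSV text before import: does it parse, and does it have the required
# headers? Returns an array of human-readable problems (empty if none found).
def precheck_csv(text, required_headers = REQUIRED_HEADERS)
  table = CSV.parse(text, headers: true)
  headers = table.headers.compact
  missing = required_headers - headers
  missing.empty? ? [] : ["missing headers: #{missing.join(', ')}"]
rescue CSV::MalformedCSVError => e
  # Structural problems such as unclosed quotes end up here.
  ["malformed CSV: #{e.message}"]
end
```

Keeping this outside the importer matches the split described above: the tool rejects broken files early, while the app's own validations still check the data itself.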

@niveditc started such an Open Referral validation tool, so I would recommend that you collaborate with her if you want to build one that uses the ODI tools.

@pmackay
Contributor Author

pmackay commented Dec 3, 2014

Thanks for the info. But should the API be doing custom validation on postal and phone numbers in that case?

@monfresh
Member

monfresh commented Dec 3, 2014

Yes, the API should validate anything that needs to be validated. Those validations are used for the admin interface as well. Most people using this project are based in the US, which is why the postal code and phone validations are US-based by default, but can easily be changed if necessary.

The geographic bounds for location-based searches are also US-based by default, but those can easily be changed as well, as can many other things that are customizable.

@pmackay
Contributor Author

pmackay commented Dec 3, 2014

Ah, I see the difference. I was mainly suggesting using CSVLint as a way to do content validation against a table schema, rather than validating that the CSV files are well-formed CSV in terms of quotes, whitespace, etc. But both would be useful, I suppose, in a separate tool.
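To show what "content validation against a table schema" means, as opposed to checking CSV syntax, here is a minimal Ruby sketch of the kind of check an external tool might run. The schema format (per-column `required` and `pattern` constraints), the column names, and the `content_errors` helper are all hypothetical; this is not CSVLint's schema format or API.

```ruby
require "csv"

# Hypothetical minimal table schema: each column may be required and/or
# constrained by a regex. NOT CSVLint's schema format.
SCHEMA = {
  "name" => { required: true },
  "postal_code" => { required: true, pattern: /\A\d{5}(-\d{4})?\z/ }
}.freeze

# Validates every data row against the schema and returns
# [line_number, column, message] triples. Line numbers assume no quoted
# field spans multiple lines.
def content_errors(text, schema = SCHEMA)
  errors = []
  CSV.parse(text, headers: true).each_with_index do |row, index|
    line = index + 2 # +1 for the header line, +1 for 1-based numbering
    schema.each do |column, rules|
      value = row[column].to_s.strip
      if value.empty?
        errors << [line, column, "is required"] if rules[:required]
      elsif rules[:pattern] && !rules[:pattern].match?(value)
        errors << [line, column, "does not match pattern"]
      end
    end
  end
  errors
end
```

Because the rules live in data rather than code, a schema like this could in principle be swapped per locale without touching the checker.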

@monfresh
Member

monfresh commented Dec 3, 2014

In Ohana API, content validation is done in the app via Rails because it's needed on an ongoing basis to ensure that the data stays valid when it is updated via the admin interface or when writing to the API. It's not just a one-time validation during CSV import. Doing content validation for Ohana API with another tool like CSVLint would be redundant.

However, since there are potentially other apps that would use Open Referral data that might not necessarily have internal validations of their own, an external content validation tool could be useful in that case.

@monfresh
Member

monfresh commented Dec 9, 2014

Closing this, since we've established that content validation with CSVLint would not solve any problems that aren't already addressed by the Rails validations, or that can only be addressed by them.

@monfresh monfresh closed this as completed Dec 9, 2014
Development

No branches or pull requests

2 participants