Implement script that can populate the DB from Open Referral-compliant CSV files. #243
Thanks for your comment, and sorry for the confusion. The current import script is based on the way we transformed San Mateo County's data from the MARC 21 format to JSON. At the time, converting to JSON made the most sense since we were working with just a flat file and had to create the schema from that file.

Now that we are developing a Human Services Data Specification, aka Open Referral, the goal is to replace the current script with one that takes Open Referral-compliant CSV files and populates the DB from them. The spec is only at version 0.2 at the moment. 1.0 is scheduled to be released on January 5, 2015. Overall, I don't think the schema will change dramatically between now and then, but some of the fields within each table (or CSV file) might.

So, to start experimenting with this new CSV-based import script, the first step would be to obtain a demo set of CSV files, which I think @spara was working on providing. I will rename this Issue to represent the need for a new import script based on Open Referral CSV files.
Working on this now. Hope to have it done by the end of the week.
👍
Here is a set of valid CSV files for reference: https://docs.google.com/a/codeforamerica.org/spreadsheets/d/1qkysLj5MAAvaysuac_9H0hOvTAb1aJEMhUVJAECmj2c/edit#gid=0

The next step is to write a script that parses the CSV files and imports them into the DB.
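To make that step concrete, here is a rough sketch of what a minimal parse-and-import pass could look like, assuming one ActiveRecord model per CSV file. The file path and column names below are placeholders for illustration, not the actual Open Referral field names.

```ruby
# Rough sketch only: read one Open Referral-style CSV file and create records.
# Assumes an ActiveRecord Organization model; the file path and column names
# are placeholders, not the final Open Referral fields.
require 'csv'

CSV.foreach('data/organizations.csv', headers: true) do |row|
  Organization.create!(
    name:        row['name'],
    description: row['description']
  )
end
```

The other files (locations, services, and so on) would presumably get a similar pass, in an order that satisfies their foreign-key dependencies.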
A couple of thoughts on this. Short feedback loop: you shouldn't need to wait until the end of the script to see the errors. Maybe the way to go is to stop the import process when an error occurs, giving the user time to correct it before resuming. API endpoint to bulk upload data: this would enable something I've heard a lot of requests for: bulk updates/uploads from the admin interface. It could also be very useful in the "welcome" interface.
How would this work when you're reading from CSV files?
What kind of bulk updates? What kind of bulk uploads that can't be performed with the import script?
Why can't the "welcome" interface use the CSV-to-DB import script?
I started a prototype of this the other day. I can look it up tomorrow, but the basic idea was to use a JSON stream parser to consume chunks of the file without buffering the whole thing, so you can stop at any point to provide feedback and resume again. I think my plan then was to use the Google Docs API, which can give very detailed info about where the data is coming from, and if it generates an error, you can point them to the row number so they can edit it. You might be able to use this with an uploaded CSV file, but the error/feedback/edit loop wouldn't be as swift.
Updates to data that have some sort of access control on them, like an organization bulk uploading services and locations, or an app developer updating said services for the organization.
I guess it could. It doesn't need to be part of the API because it doesn't require any access control check and can only be done by super_admin users.
The stream parser is probably overkill. I remember that's where I got stuck, but as an alternative, maybe you could save the row where the error happened and then modify the request to Google Docs using the row number as a parameter (not sure how to do that yet), which would lessen the annoyance of long wait times while the script catches up to where the error happened. Here is a URL I had prepared for a mock spreadsheet import file.
What I'm working on right now is a simple solution that parses local CSV files, not Google Docs. If an error is encountered, it outputs the line number and the column that has the error to the command line.
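As a rough illustration of that error-reporting loop (the model, file path, and attributes are placeholders, and this is not the actual script):

```ruby
# Sketch of the error-reporting idea: validate each row and report the
# offending line before moving on. Row 1 is the header, so data starts at line 2.
require 'csv'

CSV.foreach('data/locations.csv', headers: true).with_index(2) do |row, line_number|
  location = Location.new(name: row['name'], description: row['description'])
  if location.valid?
    location.save!
  else
    # Validation messages name the attribute, i.e. the column that failed.
    location.errors.full_messages.each do |message|
      puts "Line #{line_number}: #{message}"
    end
  end
end
```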
Here is what I have so far: 925d44c
This is done now and the Wiki article has been updated: https://github.com/codeforamerica/ohana-api/wiki/Populating-the-Postgres-database-from-OpenReferral-compliant-CSV-files. Anyone with OR-compliant CSV files can now import them into Ohana API.
Oh, and instructions for running the script are in the updated INSTALL.md: https://github.com/codeforamerica/ohana-api/blob/master/INSTALL.md#uploading-and-validating-your-own-data
One of the main challenges I am seeing in importing data into new Ohana installations is the layers of abstraction I have to jump through to get datasets from flat CSV files into the nested JSON file that Ohana expects.
What if we made this easier? Why not make the expected data format a denormalized version of the database schema, and not bother maintainers with transforming it into any intermediary data formats?
In addition, it's common for the service providers to organize their data around services, rather than organizations. So why does the import script assume the organizations will be at the top-level?
My ideal import file would be a CSV file where each row represents a single service, with the script taking care of the joins.
For example:
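Purely hypothetical column names and values, but something along these lines, where every row carries enough organization and location context for the script to resolve the joins:

```csv
organization_name,location_name,location_address,service_name,service_description
Example Agency,Downtown Office,"123 Main St, San Jose, CA",Food Pantry,Weekly groceries for families
Example Agency,Downtown Office,"123 Main St, San Jose, CA",Job Counseling,Resume help and interview prep
```

The script could then de-duplicate organizations and locations as it goes. A sketch, assuming the usual Ohana associations (an organization has many locations, a location has many services) and the made-up column names above:

```ruby
# Sketch of the join logic: the script, not the data publisher, resolves
# organizations and locations from each row-per-service record.
require 'csv'

CSV.foreach('data/services.csv', headers: true) do |row|
  org      = Organization.find_or_create_by!(name: row['organization_name'])
  location = org.locations.find_or_create_by!(name: row['location_name'])
  location.services.create!(
    name:        row['service_name'],
    description: row['service_description']
  )
end
```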