Bulk catalog access option: access to all datasets in a single file #7
I think it would actually be nice if there could be two levels of compliance, where the single-file download looked like a valid REST endpoint, but with reduced functionality. One level of compliance would be built with Catalogs in mind, and one built mainly with Data Sources in mind. The bulk catalog operations you mention would mainly be targeted at data sources or very small catalogs. So, the spec could be structured so that the dataset endpoint returns the JSON of the catalog up to the first 1000 records, with full results included (so a sync would not require getting the list and then doing a full round trip for each dataset). Any catalog with fewer than 1000 datasets would then be able to provide the simpler access. Any catalog providing more datasets would be required to meet a higher compliance level that provides paging, query by change date, etc.
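As a sketch of the consumer-side logic this comment implies, a client could use the single-file response directly when the catalog fits under the proposed 1000-record threshold, and fall back to the paged API otherwise. The function and parameter names here are hypothetical, not part of any spec:

```python
BULK_LIMIT = 1000  # threshold suggested in the comment above


def sync_catalog(listing: list[dict], total_count: int) -> list[dict]:
    """Decide between single-file and paged access.

    If the catalog fits under the limit, the single response already
    contains full records and no further round trips are needed.
    """
    if total_count <= BULK_LIMIT:
        return listing  # full records returned inline
    # Larger catalogs would be required to expose the higher-compliance
    # API (paging, query by change date); that path is not sketched here.
    raise NotImplementedError("use the paged, higher-compliance API")
```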
@willpugh nice suggestion. I guess we still need a way to signal your level of compliance?
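One hypothetical way to signal compliance would be a field in the catalog's root document; the `compliance_level` field name and its values are assumptions for illustration, not anything defined in the spec:

```python
# Hypothetical sketch: a consumer inspecting a catalog's root document for a
# "compliance_level" field (this field name is an assumption, not in the spec).
catalog_root = {
    "title": "Example Catalog",
    "compliance_level": "bulk",  # e.g. "bulk" (single file) or "full" (paged API)
    "datasets": "http://example.org/catalog/datasets.json",
}


def supports_paging(root: dict) -> bool:
    """Return True if the catalog advertises the full, paged API."""
    return root.get("compliance_level", "bulk") == "full"
```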
Adding a comment from @tgherzog which seems to have gone missing:

My initial reaction is that from a consumer standpoint the important thing is to have a consistent protocol across all catalogs that implement DCIP, regardless of the size of the catalog. One approach might be to standardize the variable names used in the responses from the List Dataset API and the Dataset API.

If you standardized fields in this manner, then the catalog could include as many "full dataset" fields as it wanted (beyond the required ones) in the bulk catalog (aka List Data API) listing. For example:

```js
[
  {
    id: "123",
    revision: "1",
    url: "http://data.worldbank.org/catalog/123.json",
    modified: "2012-06-01",
    change_type: "update",
    title: "123 Data",
    publisher: "http://www.worldbank.org", // um, literal or resource here?
    // etc
  },
]
```

In other words, the bulk catalog and the List Data API are the same, from the consumer's standpoint. In practice, different catalogs would have the discretion of publishing either complete, nearly complete, or sparse datasets in the bulk catalog, depending on their respective implementations. Consumers would start by accessing the bulk catalog listing, and request any missing fields via the "url" field.
I like tgherzog's suggestions. I think consistency between the listing APIs and the Dataset API is a good thing in general, and makes this case easier. There are 3 reasonable suggestions here:
I think #3 seems more elegant.
This proposes a substantive change to the DCIP spec. Key features:
This option could be provided both in addition to and as a substitute for the full API option.
Benefits:
Possible problems:
To Discuss