api call to retrieve packages based on 'publication date' #25

Closed
nirmalapudota opened this issue Jan 19, 2019 · 8 comments
Assignees
Labels
enhancement New feature or request Upcoming Items currently under development and will be available in our next release

Comments

@nirmalapudota

Hi,

The startDate and endDate parameters search against the lastModified value of the individual packages. Could you let me know if there is a way to get packages based on their publication date?

I was trying to get FR issues published on December 10, 2018. Here is the API call I tried and the JSON response I received.
API Call: https://api.govinfo.gov/collections/FR/2018-12-10T00:00:00Z/2018-12-11T00:00:00Z?offset=0&pageSize=100&api_key=DEMO_KEY
JSON Output: {"count":0,"message":"No results found","nextPage":null,"previousPage":null,"packages":[]}

After moving the start date back like this, I got the result I was looking for.
API Call: https://api.govinfo.gov/collections/FR/2018-12-08T00:00:00Z/2018-12-11T00:00:00Z?offset=0&pageSize=100&api_key=DEMO_KEY
JSON Output: {"count":1,"message":null,"nextPage":null,"previousPage":null,"packages":[{"packageId":"FR-2018-12-10","lastModified":"2018-12-08T05:24:32Z","packageLink":"https://api.govinfo.gov/packages/FR-2018-12-10/summary","docClass":"FR","title":"Federal Register Volume 83 Issue 236 (December 10, 2018)","congress":null}]}

I understand that because the lastModified date of the “FR-2018-12-10” package is December 8, it fell outside the date range of my first API call.
What I am looking for is a way to retrieve all packages published on a specific date.

Thank you.
Nirmala
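The difference between the two calls can be reproduced offline. The sketch below is only an illustration: it reuses the JSON returned by the second call and checks the package's lastModified value against each requested window. The helper names `parse_utc` and `in_window` are made up for this example, and treating both window bounds as inclusive is an assumption about how the collections service compares dates.

```python
import json
from datetime import datetime, timezone

# JSON body copied from the second API call above.
RESPONSE = json.loads('''
{"count":1,"message":null,"nextPage":null,"previousPage":null,
 "packages":[{"packageId":"FR-2018-12-10",
              "lastModified":"2018-12-08T05:24:32Z",
              "packageLink":"https://api.govinfo.gov/packages/FR-2018-12-10/summary",
              "docClass":"FR",
              "title":"Federal Register Volume 83 Issue 236 (December 10, 2018)",
              "congress":null}]}
''')


def parse_utc(timestamp):
    """Parse a timestamp such as 2018-12-08T05:24:32Z into an aware datetime."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def in_window(package, start, end):
    """Return True if the package's lastModified lies between start and end.

    Inclusive bounds are an assumption made for this sketch.
    """
    modified = parse_utc(package["lastModified"])
    return parse_utc(start) <= modified <= parse_utc(end)


package = RESPONSE["packages"][0]
first_call = in_window(package, "2018-12-10T00:00:00Z", "2018-12-11T00:00:00Z")
second_call = in_window(package, "2018-12-08T00:00:00Z", "2018-12-11T00:00:00Z")
print(first_call, second_call)  # False True
```

The issue dated December 10 was last modified on December 8, so only the wider window includes it.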

@jonquandt
Member

@nirmalapudota -- thanks for this feedback. This is probably something we'll try to handle with our Search Service (#1), but at a minimum I can think about a way to make the distinction between lastModified and publish date more explicit.

@jonquandt jonquandt self-assigned this Jan 25, 2019
@jonquandt jonquandt added the enhancement New feature or request label Jan 25, 2019
@nirmalapudota
Author

Thank you for the response. Getting API results based on "published_date" would be very helpful for extracting data daily and for retrieving only final published packages.

Could you please confirm the following? I was looking at the output of the FR 'packages' API,
e.g., the API call "https://api.govinfo.gov/packages/FR-2012-04-27/summary?api_key=DEMO_KEY"

  1. The output of the above API call has a 'dateIssued' key/attribute. Is it the same as 'published_date'?
  2. The same output says the 'dateIssued' of "FR-2012-04-27" is "2012-04-27", but the lastModified date is "2018-12-14T19:05:13Z". Does this mean the publication was last modified in December 2018? If so, is there a way to find out what was modified, from either the JSON or the XML links output?

Thank you so much.
Nirmala

I was looking at extracting Federal Register packages from the collections/packages/granules JSON output.

@jonquandt
Member

jonquandt commented Jan 29, 2019

@nirmalapudota

  1. Yes, dateIssued is equivalent to publish date -- there are some instances in other collections where other values might take precedence, particularly for granules, but for the purposes of FR packages, they are the same.
  2. The lastModified date indicates the last change to the package -- in this case, it was reprocessed -- likely to include additional data within mods. The Premis preservation metadata will indicate what events have occurred to the package throughout its lifecycle, though it doesn't necessarily tell you specifically what may have changed. My recommendation would be to treat a package as a whole unit -- if the lastModified date changes for the package, you will likely need to re-extract the content and metadata to ensure that it is fully up to date.

https://api.govinfo.gov/packages/FR-2012-04-27/premis
Here's the event that caused the overall lastModified date to change - see the eventDetail.

<event>
	<eventIdentifier>
		<eventIdentifierType>FDsys:event</eventIdentifierType>
		<eventIdentifierValue>8e09a323-f1e4-464e-9b81-47f901832e94</eventIdentifierValue>
	</eventIdentifier>
	<eventType>Reprocessed for Access</eventType>
	<eventDateTime>2018-12-14T14:04:44-05:00</eventDateTime>
	<eventDetail>
		11002ee180000964 has reprocessed ACP P0b002ee1825e9e09 for access, which includes deleting and regenerating the granule folder and derived renditions. The content has been reparsed and there may be updates to the descriptive metadata in AIP and ACP.
	</eventDetail>
	<eventOutcomeInformation>
		<eventOutcome>Success</eventOutcome>
	</eventOutcomeInformation>
	<linkingAgentIdentifier>
		<linkingAgentIdentifierType>FDsys:agent</linkingAgentIdentifierType>
		<linkingAgentIdentifierValue>11002ee180000964</linkingAgentIdentifierValue>
		<linkingAgentRole>implementer</linkingAgentRole>
	</linkingAgentIdentifier>
	<linkingObjectIdentifier>
		<linkingObjectIdentifierType>FDsys</linkingObjectIdentifierType>
		<linkingObjectIdentifierValue>P0b002ee1825e9e09</linkingObjectIdentifierValue>
		<linkingObjectRole>source</linkingObjectRole>
	</linkingObjectIdentifier>
</event>
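As a rough illustration of reading such an event programmatically, the sketch below parses a trimmed copy of the event above with Python's standard library and converts the event time to UTC so it can be compared with the package's lastModified value. The function name `summarize_event` is invented for this example, and the snippet assumes un-namespaced elements exactly as quoted here; the full Premis document may declare XML namespaces, in which case the lookups would need adjusting.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the event quoted above (identifier blocks omitted).
EVENT_XML = """<event>
    <eventType>Reprocessed for Access</eventType>
    <eventDateTime>2018-12-14T14:04:44-05:00</eventDateTime>
    <eventOutcomeInformation>
        <eventOutcome>Success</eventOutcome>
    </eventOutcomeInformation>
</event>"""


def summarize_event(xml_text):
    """Extract type, outcome, and UTC timestamp from one event element."""
    event = ET.fromstring(xml_text)
    local_time = datetime.fromisoformat(event.findtext("eventDateTime"))
    return {
        "type": event.findtext("eventType"),
        "outcome": event.findtext("eventOutcomeInformation/eventOutcome"),
        "when_utc": local_time.astimezone(timezone.utc).isoformat(),
    }


summary = summarize_event(EVENT_XML)
print(summary["type"], summary["when_utc"])
# Reprocessed for Access 2018-12-14T19:04:44+00:00
```

The converted time falls within a minute of the lastModified value "2018-12-14T19:05:13Z" mentioned earlier in the thread, which is consistent with this reprocessing event being what changed it.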

As an aside, since the packageid is predictable, you could construct package service requests for any package via: https://api.govinfo.gov/packages/FR-`YYYY`-`MM`-`DD`/

and use the relevant endpoint for your request, such as:
/summary - json metadata summary
/pdf - pdf content
/xml - xml content
/mods - descriptive metadata
/premis - preservation metadata
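A minimal sketch of that construction, assuming the FR package id pattern and the five endpoints listed above. The function name `fr_package_url` is hypothetical, and DEMO_KEY is the demo key used elsewhere in this thread.

```python
from datetime import date

BASE_URL = "https://api.govinfo.gov/packages"
# Endpoints named in the comment above.
ENDPOINTS = {"summary", "pdf", "xml", "mods", "premis"}


def fr_package_url(issue_date, endpoint, api_key="DEMO_KEY"):
    """Build a package service URL for the Federal Register issue of a given date."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    package_id = f"FR-{issue_date:%Y-%m-%d}"
    return f"{BASE_URL}/{package_id}/{endpoint}?api_key={api_key}"


url = fr_package_url(date(2012, 4, 27), "summary")
print(url)
# https://api.govinfo.gov/packages/FR-2012-04-27/summary?api_key=DEMO_KEY
```

A request for a date with no Federal Register issue (a weekend or federal holiday, for example) would presumably fail, so a caller should be prepared to handle error responses.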

Handling this use case will be one of the first tests for the search service as we work on development.

@nirmalapudota
Author

Thank you. This is very helpful. I will be waiting to see these new features in the API.

Thank you.
Nirmala

@aelfric

aelfric commented Sep 27, 2019

Is there any kind of efficient workaround for this with the current API? We're trying to look at the CFR collection, which has over 5000 packages. It seems the last modified dates are all within the last few months, even for versions of the packages that are several years old.

In our use case, we want to grab all the CFR volume entries for a given year. The only two options I can think of seem very wasteful of network resources: either (1) query the whole list, look up the summary of each package to get its published date, and then filter the list accordingly, which would require a large number of round trips, or (2) enumerate all possible URLs and check whether we picked up all the volumes of each title.

@jonquandt
Member

@aelfric -- we recently republished a large amount of the content on the system to update some data within our search indices.

Currently there's not a way within the API to flag the date values to go by publish date instead of lastModified. That's something we're looking at.

My suggestion for the moment would be to look at the CFR sitemaps. These are broken down by year. You could pull the package id out of the sitemap loc value by stripping the "https://www.govinfo.gov/app/details/" out.

Here's an example for 2019:
https://www.govinfo.gov/sitemap/CFR_2019_sitemap.xml

Once you have that list of package ids, you can grab the zips or whatever content version you want by inserting each package id into the api packages service.

Understandably, this isn't perfect, but might be slightly faster than doing either 1 or 2 above.
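The sitemap approach could be sketched as follows. This is an assumption-laden example rather than a tested client: it parses an inline sample instead of downloading the real file, the two package ids in the sample are illustrative placeholders, and the function name `package_ids` is made up. The XML namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

DETAILS_PREFIX = "https://www.govinfo.gov/app/details/"
# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative sample; the real file would come from
# https://www.govinfo.gov/sitemap/CFR_2019_sitemap.xml
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.govinfo.gov/app/details/CFR-2019-title1-vol1</loc></url>
  <url><loc>https://www.govinfo.gov/app/details/CFR-2019-title2-vol1</loc></url>
</urlset>"""


def package_ids(sitemap_xml):
    """Return package ids by stripping the details prefix from each loc value."""
    root = ET.fromstring(sitemap_xml)
    ids = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        if url.startswith(DETAILS_PREFIX):
            ids.append(url[len(DETAILS_PREFIX):])
    return ids


ids = package_ids(SAMPLE_SITEMAP)
print(ids)  # ['CFR-2019-title1-vol1', 'CFR-2019-title2-vol1']
```

Each id can then be inserted into a packages service URL such as https://api.govinfo.gov/packages/{packageId}/summary.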

Let me know if there's anything I can clarify.

@jonquandt jonquandt added the Upcoming Items currently under development and will be available in our next release label Feb 7, 2020
@jonquandt
Member

jonquandt commented Feb 7, 2020

Hello, we are currently previewing a new published endpoint that will allow retrieval by publication date rather than lastModified time. This is still in development, but we'd like input on the functionality that's available so far.

https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=CFR&api_key=DEMO_KEY

Some additional features:

@nirmalapudota @aelfric

@cnizzardini -- this may help with #57

@jonquandt
Member

jonquandt commented Feb 28, 2020

Format:

https://api.govinfo.gov/published/dateIssuedStartDate/dateIssuedEndDate?offset=startingRecord&pageSize=number of records in call&collection=comma-separated list of values&api_key=your api.data.gov api key

Examples:

BILLS issued between January and July 2019:
https://api.govinfo.gov/published/2019-01-01/2019-07-31?offset=0&pageSize=100&collection=BILLS&api_key=DEMO_KEY

Federal Register and CFR packages in 2019:
https://api.govinfo.gov/published/2019-01-01/2019-12-31?offset=0&pageSize=100&collection=CFR,FR&modifiedSince=2020-01-01T00:00:00&api_key=DEMO_KEY

Required parameters

Optional parameters:

  • dateIssuedEndDate: the latest package you are requesting by dateIssued – YYYY-MM-DD
  • docClass: Filter the results by overarching collection-specific categories. The values vary from collection to collection. For example, docClass in BILLS corresponds with Bill Type --e.g. s, hr, hres, sconres. CREC (the Congressional Record) has docClass by CREC section: HOUSE, SENATE, DIGEST, and EXTENSIONS
  • congress: congress number (e.g. “116”)
  • modifiedSince: equivalent to the startDate parameter in the collections service, which is based on lastModified – allows you to request only packages that have been modified since a given date/time – useful for tracking updates. Requires ISO 8601 format -- e.g. 2020-02-28T00:00:00Z
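For anyone scripting against the preview endpoint, here is a small sketch of assembling these URLs from the format and parameters described above. The function name `published_url` and its keyword arguments are hypothetical conveniences, not part of the API; only the path layout and the query parameter names come from this comment.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.govinfo.gov/published"


def published_url(start, end=None, *, collections, offset=0, page_size=100,
                  api_key="DEMO_KEY", **optional):
    """Build a published-service URL.

    start and end are dateIssued bounds in YYYY-MM-DD form. Extra keyword
    arguments (docClass, congress, modifiedSince) are passed through as
    query parameters when they are not None.
    """
    path = f"{BASE_URL}/{start}"
    if end is not None:
        path += f"/{end}"
    params = {
        "offset": offset,
        "pageSize": page_size,
        "collection": ",".join(collections),
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    params["api_key"] = api_key
    # Leave commas and colons unescaped so the URL matches the examples above.
    return f"{path}?{urlencode(params, safe=',:')}"


bills = published_url("2019-01-01", "2019-07-31", collections=["BILLS"])
print(bills)
# https://api.govinfo.gov/published/2019-01-01/2019-07-31?offset=0&pageSize=100&collection=BILLS&api_key=DEMO_KEY

updated = published_url("2019-01-01", "2019-12-31", collections=["CFR", "FR"],
                        modifiedSince="2020-01-01T00:00:00Z")
```

The first call reproduces the BILLS example above; the second corresponds to the Federal Register and CFR example, with modifiedSince written in the Z-suffixed form shown in the parameter description.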
