Offline copy of data #2530

Open
anthonyharrison opened this issue Feb 17, 2023 · 36 comments
Labels: api (changes that relate to the API), enhancement (new feature or request)

Comments

@anthonyharrison

I really like the idea, but to avoid repeated API calls for every product I want data on, I would like to maintain a local copy of the data and only download updates each time I start my application (or after a set period, e.g. request updates at most once every 24 hours).

Ideally, I would be able to get the data in JSON format which I can then manage locally.

The alternative would be to call the API once per product to get each product's data. But this would also require that I know all of the products in the first place, which, given the dynamic nature of the data, isn't very attractive.
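
A minimal sketch of the "refresh at most once every 24 hours" idea, assuming the local copy is a single JSON file. The `fetch` callable is hypothetical and stands in for whatever performs the real download; `load_with_refresh` and the `fetched_at` key are illustrative names, not part of the endoflife.date API.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

MAX_AGE_SECONDS = 24 * 60 * 60  # refresh at most once every 24 hours


def load_with_refresh(cache_file: Path,
                      fetch: Callable[[], dict],
                      max_age: float = MAX_AGE_SECONDS,
                      now: Optional[float] = None) -> dict:
    """Return cached data, calling fetch() only when the cache is missing or stale."""
    now = time.time() if now is None else now
    if cache_file.exists():
        cached = json.loads(cache_file.read_text(encoding="utf-8"))
        if now - cached["fetched_at"] < max_age:
            return cached["data"]  # still fresh: no network call
    data = fetch()  # hypothetical downloader supplied by the caller
    cache_file.write_text(
        json.dumps({"fetched_at": now, "data": data}), encoding="utf-8"
    )
    return data
```

The timestamp is stored next to the data so that staleness does not depend on file-system modification times.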

@anthonyharrison anthonyharrison added the enhancement New feature or request label Feb 17, 2023
@welcome

welcome bot commented Feb 17, 2023

Thank you for opening your first issue here 👍. Be sure to follow the issue template if you chose one.

@adriens
Contributor

adriens commented Feb 17, 2023

I'm actually working on something like that 😸

@marcwrobel
Member

Hi @anthonyharrison, thank you for the idea.

endoflife.date uses the static site generator Jekyll. Given the static nature of endoflife.date, that may be difficult to implement: JSON and HTML files are only generated when there is an update on the master branch.

@captn3m0
Member

Would a dataset published via an npm package be good enough? Or a separate git repository, which could easily fulfill the "update whenever needed" requirement?

I've been wanting to do this for a while, by means of uploading the generated JSON files (preferably in the v1 API format) to a release on GitHub.

But this would also require that I know all of the products in the first place

As an aside, we have an endpoint that solves this: https://endoflife.date/api/all.json.

@adriens Could you detail your plan to solve for this?

@captn3m0 captn3m0 self-assigned this Feb 18, 2023
@anthonyharrison
Author

anthonyharrison commented Feb 18, 2023 via email

@adriens
Contributor

adriens commented Feb 18, 2023

Maybe you would appreciate this repo: https://github.com/adriens/endoflife.date-nested

@anthonyharrison
Author

anthonyharrison commented Feb 20, 2023

@adriens I can certainly use this as a starting point. However, https://endoflife.date/api/all.json already provides the data in JSON. If this was enhanced to include some more metadata, e.g. the date of the data dump, this would be the start of something very useful.

@anthonyharrison
Author

Hi @anthonyharrison, thank you for the idea.

endoflife.date uses the static site generator Jekyll. Given the static nature of endoflife.date, that may be difficult to implement: JSON and HTML files are only generated when there is an update on the master branch.

@captn3m0 Would it not be possible to maintain a history of changes to the information contained within the _data directory and then return details of the products which have changed via an API? The existing API endpoints let me get all of the data, but they require that I fetch everything and not just what has been updated.

@marcwrobel
Member

If this was enhanced to include some more metadata, e.g. the date of the data dump, this would be the start of something very useful.

A timestamp containing the date of the JSON file would be easy to add, but it requires the v1 API format (under development, see #2080 and https://deploy-preview-2080--endoflife-date.netlify.app/docs/api/v1/ for a preview). Unfortunately the current format (v0) cannot be updated without introducing a breaking change, and we did not plan to add new endpoints.

I do not mind adding a new /v1/products/all endpoint containing all the products with their corresponding release cycles. But that file will be big (I don't know exactly how big, but at least a few MB). So I think we should consider Netlify bandwidth limits before doing that. @captn3m0, do you think it may be problematic?

@captn3m0
Member

Our current bandwidth usage is around 50 GB out of our 1 TB limit, so I don't see any issue there. If this endpoint ever becomes a problem, we can easily set up a redirect to another host, implement caching, etc.

However, I don't think we should be abusing our API to essentially serve a dataset. I can suggest a few alternative approaches:

  1. A separate repository called eol-dataset with a dump of all JSON files. This can be imported as a submodule for any usage easily.
  2. Setting up GitHub releases on this, or a separate repository, where we publish the data regularly. If this is automated correctly, a link like https://github.com/endoflife-date/endoflife.date/releases/latest/download/dataset.tar.gz will always point to the latest version of the dataset, and that can be used for any programmatic usage.
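
As an illustration of how a client could consume such a release asset, here is a minimal sketch that reads a dataset.tar.gz archive already downloaded into memory. It assumes the archive holds one JSON file per product, named after the product; that layout and the `read_dataset` name are assumptions, since no such dataset format has been defined in this thread.

```python
import io
import json
import tarfile


def read_dataset(archive_bytes: bytes) -> dict:
    """Map each product name to the parsed content of its JSON file in a .tar.gz archive."""
    products = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and anything that is not a JSON file.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            # Assumed layout: "<any/dir>/<product>.json"
            name = member.name.rsplit("/", 1)[-1][: -len(".json")]
            products[name] = json.load(archive.extractfile(member))
    return products
```

Members are parsed directly from the archive, so nothing is written to disk and no extraction path has to be trusted.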

@anthonyharrison I'd be curious about the use case here, to see if we can improve the API/documentation/roadmap further to account for this.

@usta
Member

usta commented Feb 22, 2023

We could also add a new JSON endpoint called XYZ_meta.json that holds only the metadata for XYZ. Users can then decide whether to fetch the full data, XYZ_data.json, from a hosted cache or from our usual location.
XYZ_meta.json would contain only metadata such as revision_id, revision_date and revision_dataurl, so that projects like adriens's, or anyone else's, can run a check before fetching the actual data.
This lets them avoid downloading the same big file again (for example all_data.json) when its revision_date is the same as that of their own copy.

NOTE: adding just revision_date to our current endpoints won't fix the main problem: users would still need to re-download the same big file unless we implement this idea.
@captn3m0 @marcwrobel @anthonyharrison @adriens
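
A minimal sketch of the client-side check this proposal would enable, assuming the meta file is a JSON object with the revision_id and revision_date fields named above. The `needs_download` helper is hypothetical, and no such endpoint exists in the current API.

```python
from typing import Optional


def needs_download(local_meta: Optional[dict], remote_meta: dict) -> bool:
    """Return True when the remote revision differs from the locally stored one."""
    if local_meta is None:  # nothing cached yet
        return True
    return (
        local_meta.get("revision_id") != remote_meta.get("revision_id")
        or local_meta.get("revision_date") != remote_meta.get("revision_date")
    )
```

A client would fetch the small meta file first and follow revision_dataurl only when this returns True.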

@marcwrobel
Member

NOTE: adding just revision_date to our current endpoints won't fix the main problem: users would still need to re-download the same big file unless we implement this idea.

@usta, is XYZ the product name? If yes, the product files are not that big (2 to 20 KB each, I would say), so I think sending two separate requests could take longer than retrieving all the data in one shot.

Note that the v1 product API endpoint already includes a lastModified field, corresponding to the last time the product file was updated. Example: https://deploy-preview-2080--endoflife-date.netlify.app/api/v1/products/ansible/.
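
A minimal sketch of how a client could use lastModified to refresh only what changed. It assumes the client has reduced both its local copy and the remote listing to a mapping from product name to lastModified string, and that the values are ISO 8601 timestamps with an explicit UTC offset; both are assumptions about a preview API, and `changed_products` is a hypothetical helper.

```python
from datetime import datetime


def changed_products(local: dict, remote: dict) -> list:
    """List products that are new or whose remote lastModified is newer than the local one."""
    changed = []
    for name, remote_ts in remote.items():
        local_ts = local.get(name)
        if local_ts is None or (
            datetime.fromisoformat(remote_ts) > datetime.fromisoformat(local_ts)
        ):
            changed.append(name)
    return sorted(changed)
```

Parsing the timestamps instead of comparing strings keeps the comparison correct when the two sides use different UTC offsets.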

@anthonyharrison
Author

@anthonyharrison I'd be curious about the use case here, to see if we can improve the API/documentation/roadmap further to account for this.

@captn3m0 I am trying to develop an automated audit function which will identify whether a product is under support, under extended support or EOL, and trigger some workflows. For products which are nearing end of support, I want to be able to trigger a workflow to look at the upgrade path; for those which are EOL (or nearing EOL), I would want to trigger a different workflow.
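
A minimal sketch of such an audit classification, under stated assumptions: each release cycle has been normalized to a record with optional ISO dates under the hypothetical keys `support_end` and `eol` (this is not the API's actual schema), "extended support" is taken to mean past the end of active support but not yet EOL, and the 90-day warning window is an arbitrary choice.

```python
from datetime import date, timedelta
from typing import Optional


def _parse(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string, passing None through."""
    return date.fromisoformat(value) if value else None


def classify(cycle: dict, today: date, warn_days: int = 90) -> str:
    """Return 'eol', 'nearing-eol', 'extended-support' or 'supported' for one release cycle."""
    eol = _parse(cycle.get("eol"))
    support_end = _parse(cycle.get("support_end"))
    if eol is not None and eol <= today:
        return "eol"
    if eol is not None and eol - today <= timedelta(days=warn_days):
        return "nearing-eol"
    if support_end is not None and support_end <= today:
        return "extended-support"
    return "supported"
```

Each returned status could then be mapped to the workflow it should trigger, for example an upgrade-path review for "nearing-eol".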

@usta
Member

usta commented Feb 23, 2023

@usta, is XYZ the product name? If yes, the product files are not that big (2 to 20 KB each, I would say), so I think sending two separate requests could take longer than retrieving all the data in one shot.

@marcwrobel Nope, I mean the all, upcomingEOL, ... endpoints.

@adriens
Contributor

adriens commented Feb 26, 2023

@adriens Could you detail your plan to solve for this?

@captn3m0, I'll release a first draft in a few minutes 🤞

@adriens
Contributor

adriens commented Feb 26, 2023

@captn3m0, here is a first proof of concept:

https://www.kaggle.com/datasets/adriensales/endoflifedate/

Please note that:

  • My primary goal is to make the data as accessible as possible to any data scientist or data storyteller, in the hope that it will be discovered and used by as many people as possible 🤞
  • I have added strongly typed columns for booleans, dates, years, etc.
  • I'll produce a dedicated blog post and video demo
  • Some work has yet to be done
  • The API v1 will be a huge step forward (categories, etc.)
  • I still have to improve the column documentation
  • Your feedback is very welcome, so I know how this could be improved before I automate its release in a dedicated open source repo so that the data is regularly updated

@adriens
Contributor

adriens commented Feb 26, 2023

-- Categories that appear in more than 10 rows of product_categories
select category,
       count(*) as row_count
from product_categories
group by category
having count(*) > 10;

@adriens
Contributor

adriens commented Feb 26, 2023

There are some cool surprises I'm working on too, on the same topic.

@adriens
Contributor

adriens commented Feb 26, 2023

@adriens
Contributor

adriens commented Feb 26, 2023

☝️ Other files will be added : does anyone want to give a try to a ;

@adriens
Contributor

adriens commented Feb 26, 2023

🤩

@adriens
Contributor

adriens commented Feb 26, 2023

@adriens
Contributor

adriens commented Mar 6, 2023

@adriens
Contributor

adriens commented Mar 6, 2023

@MartinPetkov
Contributor

I opened a PR to implement the idea in #2530 (comment), since I liked that idea and would make use of it myself.

@adriens I see this as orthogonal to your efforts. Your work seems much more full-featured as compared to the simple GitHub Action I wrote, but I still think having a GitHub Release with a simple file is useful.

marcwrobel added a commit that referenced this issue Dec 17, 2023
This is a major rework of the API with a lot of breaking changes. See CHANGELOG_API.md for more information.

Note that we thought of disabling API generation in development (using JEKYLL_ENV like the Jekyll Feed plugin - see https://github.com/jekyll/jekyll-feed/blob/master/lib/jekyll-feed/generator.rb#L145), but this was finally reverted. It does not work well with Netlify previews, and it generates production URLs (i.e. https://endoflife.date URLs) in development, which makes it difficult to use.
@TimBrown1611

Any ETA on deploying this?
It would be very helpful.

@anthonyharrison
Author

@TimBrown1611 I use the API endpoint https://endoflife.date/api/all.json to get the list of products and then request the data for each product. I store the set of data in a local file so I can then operate offline.
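
A minimal sketch of that workflow, assuming all.json returns a JSON array of product names and that each product's data lives at /api/<product>.json (both are assumptions about the v0 API). The `fetch_json` callable is hypothetical and stands in for a real HTTP client; `mirror` is an illustrative name.

```python
import json
from pathlib import Path
from typing import Callable

BASE_URL = "https://endoflife.date/api"


def mirror(fetch_json: Callable[[str], object], target: Path) -> int:
    """Fetch the product list, then every product, and store everything in one local file."""
    names = fetch_json(f"{BASE_URL}/all.json")
    # One request per product; assumed URL pattern, see the note above.
    snapshot = {name: fetch_json(f"{BASE_URL}/{name}.json") for name in names}
    target.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    return len(snapshot)
```

Passing the fetcher in as a parameter keeps the function usable offline with a stub, and leaves retry and rate-limit policy to the caller.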

@adriens
Contributor

adriens commented Aug 28, 2024

@TimBrown1611 I use the API endpoint https://endoflife.date/api/all.json to get the list of products and then request the data for each product. I store the set of data in a local file so I can then operate offline.

You can also get a ready-to-use snapshot here on Kaggle: ⏳ endoflife.date database (duckdb)

@tomersein

@TimBrown1611 I use the API endpoint https://endoflife.date/api/all.json to get the list of products and then request the data for each product. I store the set of data in a local file so I can then operate offline.

You can also get a ready-to-use snapshot here on Kaggle: ⏳ endoflife.date database (duckdb)

Hello @adriens,
is the snapshot being updated occasionally?
Thanks!

@adriens
Contributor

adriens commented Dec 7, 2024

Sure @tomersein : Daily 😆

@adriens
Contributor

adriens commented Dec 7, 2024
