HTTP import with DIC - Design Proposal #243

Open: wants to merge 2 commits into base: main
new design proposal
Signed-off-by: Ido Aharon <[email protected]>
Ido Aharon committed May 2, 2024

commit 99cffdc4e4b0a8741e948d7211ed06828b8fb438
32 changes: 22 additions & 10 deletions design-proposals/data-import-cron-http-import.md
@@ -6,24 +6,23 @@ The design will also demonstrate to the user how to operate with such import typ
## Motivation
Currently, DataImportCron supports only registry imports. Recently, there has been demand to also support HTTP imports.
The problem with HTTP imports is that there is no common convention among sources, so there is no generic way to tell when a given source's image has been updated, which makes polling harder than for standard registry sources.
One possible solution, if the URL is static, is to send a GET request with an If-Modified-Since header specifying the date from which we want to check for changes. If the resource has changed since that date, the request returns 200 OK, and we know that the image has been updated since that date.

Currently, HTTP imports can be enabled by updating the DesiredDigest label.
The current approach supports only sources that honor the If-Modified-Since header.
If the resource has changed since the specified date, the request returns 200 OK, and we know that the image has been updated since that date. Otherwise the server returns 304 Not Modified.

## Goals

* Allow the user to perform HTTP imports manually with DataImportCron
* Create a poller that covers as many cases as possible for automatic updating
* Allow the user to perform HTTP imports manually with DataImportCron.
* Create a poller that handles automatic updating using the If-Modified-Since header.

## Non Goals

* The poller will probably not cover all import cases, and sometimes the user will have to update the digest manually
* The poller will probably not cover all import cases, and sometimes the user will have to perform a manual update.

## User Stories

* As a user, I would like to import images from an HTTP source using DataImportCron
* As a user, I would like the poller to automatically update the image when it is needed
* As a user, I would like to manually trigger an HTTP import with DataImportCron
* As a user, I would like to import images from an HTTP source using DataImportCron.
* As a user, I would like the poller to automatically update the image when the source is updated.
* As a user, I would like to manually trigger an HTTP import with DataImportCron.

## Repos

@@ -41,7 +40,6 @@ In the current situation we can enable HTTP imports by updating the DesiredDiges

# Design

If the URL is static:
* Make a GET request with the If-Modified-Since header, using the date stored in the AnnLastCronTime annotation
* If the returned status is 200 OK, perform the import again
* Update AnnLastCronTime to time.Now()
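
The polling steps above could be sketched in Go roughly as follows. This is a minimal illustration, not CDI's implementation: `pollHTTPSource` and `sourceChanged` are hypothetical names, the URL in `main` is a placeholder, and the last poll time is passed in as a plain `time.Time` rather than read from the AnnLastCronTime annotation. It also assumes the server answers 304 Not Modified when nothing has changed, which is the standard behavior for conditional requests.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// sourceChanged interprets the status code of a conditional request.
// 200 OK means the resource changed after the If-Modified-Since date,
// 304 Not Modified means it did not. Any other status is an error.
func sourceChanged(statusCode int) (bool, error) {
	switch statusCode {
	case http.StatusOK:
		return true, nil
	case http.StatusNotModified:
		return false, nil
	default:
		return false, fmt.Errorf("unexpected status %d from HTTP source", statusCode)
	}
}

// pollHTTPSource (hypothetical helper) sends a GET with If-Modified-Since
// set to lastCronTime and reports whether a new import is needed.
func pollHTTPSource(ctx context.Context, client *http.Client, url string, lastCronTime time.Time) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	// HTTP dates must be expressed in GMT; http.TimeFormat expects UTC input.
	req.Header.Set("If-Modified-Since", lastCronTime.UTC().Format(http.TimeFormat))

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	// Only the status matters here, so the body is closed without being read.
	defer resp.Body.Close()
	return sourceChanged(resp.StatusCode)
}

func main() {
	// Placeholder values for illustration only.
	lastCronTime := time.Now().Add(-24 * time.Hour)
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	changed, err := pollHTTPSource(ctx, http.DefaultClient, "https://example.com/image.qcow2", lastCronTime)
	if err != nil {
		fmt.Println("poll failed:", err)
		return
	}
	fmt.Println("re-import needed:", changed)
}
```

A GET is used to match the proposal. Treating every status other than 200 and 304 as an error is a design choice of this sketch, so that a broken source does not silently trigger or suppress imports.
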
Member


Have you considered that some sources provide a separate file with the hash of the image? For instance, the Fedora cloud images come with a file that contains the checksum of the image. We could download that file, compare the hash against the one we have, and decide that way whether a new image exists.

Author


Good point. The question is what percentage of sources support this and whether all sources provide the file in the same format?
For example https://cloud.debian.org/images/cloud/bookworm/daily/latest/SHA512SUMS looks different from the file you mentioned

Contributor


As far as I can tell these are the areas where this file structure diverges:

  • Hash algorithm (MD5 in Tiny Core, SHA512 in Debian, SHA256 in Fedora)
  • File structure (`hash filename` in most, but Fedora uses `SHA256 (filename) = hash`)

Member


Maybe we can catalog these. If there is a limited number of permutations of these formats, we can implement them and support only those options. I understand there might be many, but if we capture the most common ones, that might be enough.
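
To make the cataloging idea concrete, here is a rough Go sketch that recognizes the two line layouts mentioned in this thread: the `hash  filename` layout produced by tools such as `sha512sum`, and the `SHA256 (filename) = hash` layout used by Fedora. Everything in it is an assumption for illustration: `checksumEntry`, `parseChecksumLine`, and `findChecksum` are hypothetical names, and the hashes in `main` are shortened placeholders, not real digests.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// checksumEntry is a hypothetical type holding one parsed checksum line.
type checksumEntry struct {
	Algorithm string // empty when the line itself does not name the algorithm
	Filename  string
	Hash      string // lowercase hex
}

var (
	// "SHA256 (name) = hex"
	taggedLine = regexp.MustCompile(`^([A-Za-z0-9-]+) \((.+)\) = ([0-9a-fA-F]+)$`)
	// "hex  name" or "hex *name"
	plainLine = regexp.MustCompile(`^([0-9a-fA-F]+) [ *](.+)$`)
)

// parseChecksumLine tries both layouts and reports whether the line matched.
func parseChecksumLine(line string) (checksumEntry, bool) {
	line = strings.TrimSpace(line)
	if m := taggedLine.FindStringSubmatch(line); m != nil {
		return checksumEntry{Algorithm: strings.ToUpper(m[1]), Filename: m[2], Hash: strings.ToLower(m[3])}, true
	}
	if m := plainLine.FindStringSubmatch(line); m != nil {
		return checksumEntry{Filename: m[2], Hash: strings.ToLower(m[1])}, true
	}
	return checksumEntry{}, false
}

// findChecksum scans a checksum file's content for the given filename,
// skipping lines (comments, signatures) that match neither layout.
func findChecksum(content, filename string) (checksumEntry, bool) {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		if e, ok := parseChecksumLine(sc.Text()); ok && e.Filename == filename {
			return e, true
		}
	}
	return checksumEntry{}, false
}

func main() {
	// Placeholder content; real files carry full-length digests.
	content := "# example checksum file\n" +
		"SHA256 (image-a.qcow2) = 0123abcd\n" +
		"89ef4567  image-b.qcow2\n"
	for _, name := range []string{"image-a.qcow2", "image-b.qcow2"} {
		if e, ok := findChecksum(content, name); ok {
			fmt.Printf("%s: algorithm=%q hash=%s\n", e.Filename, e.Algorithm, e.Hash)
		}
	}
}
```

For the plain layout the algorithm would have to come from elsewhere, for example the checksum file's name (such as SHA512SUMS) or a user-supplied setting.
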

@@ -76,3 +74,17 @@ spec:
  importsToKeep: 2
  managedDataSource: fedora
```

This OS image mirror example supports the If-Modified-Since header: https://mirrors.dc.clear.net.ar/ubuntu/ls-lR.gz
```
iaharon@home ~ $ curl -I https://mirrors.dc.clear.net.ar/ubuntu/ls-lR.gz
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Thu, 02 May 2024 08:58:16 GMT
Content-Type: application/octet-stream
Content-Length: 26925864
Last-Modified: Tue, 30 Apr 2024 13:22:27 GMT <<
Connection: keep-alive
ETag: "6630f093-19adb28"
Accept-Ranges: bytes
```
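
As a small illustration of how the Last-Modified value in the output above could be compared against the last poll time, here is a hedged Go sketch. `modifiedSince` is a hypothetical helper, and storing the last poll time as an RFC 3339 string is an assumption of this example, not something the proposal specifies.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// modifiedSince (hypothetical helper) reports whether the Last-Modified
// header value is later than lastCron, given as an RFC 3339 timestamp.
func modifiedSince(lastModified, lastCron string) (bool, error) {
	// http.ParseTime understands the date formats allowed in HTTP headers.
	modTime, err := http.ParseTime(lastModified)
	if err != nil {
		return false, fmt.Errorf("parsing Last-Modified %q: %w", lastModified, err)
	}
	cronTime, err := time.Parse(time.RFC3339, lastCron)
	if err != nil {
		return false, fmt.Errorf("parsing last poll time %q: %w", lastCron, err)
	}
	return modTime.After(cronTime), nil
}

func main() {
	// Last-Modified is taken from the curl output above; the poll time is made up.
	changed, err := modifiedSince("Tue, 30 Apr 2024 13:22:27 GMT", "2024-04-29T00:00:00Z")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("source changed since last poll:", changed)
}
```

In practice the server performs this comparison itself when it receives If-Modified-Since; the sketch only shows the same check done on the client side.
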