
upload-artifact zips .tar.gz archive #109

Open
2 of 6 tasks
n0toose opened this issue Aug 5, 2020 · 7 comments


n0toose commented Aug 5, 2020

Describe the bug
Uploading a .tar.gz artifact results in the archive being wrapped in a .zip. Many (if not most) projects on GitHub prefer the .tar.gz extension over .zip, so it most likely isn't a good idea to archive uploads again under a different format and/or force maintainers to use a specific one.

Version

  • V1
  • V2

Environment

  • self-hosted
  • Linux
  • Windows
  • Mac


Run/Repo Url
As I may take down or otherwise modify my tree at any given moment, here's the pull request in question.

How to reproduce

  • Step 1: Create .tar.gz archive in a workflow.
  • Step 2: Upload it.
  • Step 3: 🎊
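The steps above can be sketched as a shell fragment (filenames and paths below are illustrative, not taken from the original workflow):

```shell
# Step 1: create the tarball inside a workflow step (names are illustrative).
mkdir -p dist && echo "hello" > dist/hello.txt
tar -czf my-release.tar.gz dist/

# Step 2: upload it, e.g. with actions/upload-artifact and
# `path: my-release.tar.gz` in the workflow YAML.

# Step 3: downloading the artifact from the run's UI yields my-release.zip,
# a zip that merely wraps the unchanged my-release.tar.gz. The tarball
# itself is fine; the wrapping happens on GitHub's side at download time.
tar -tzf my-release.tar.gz
```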

Additional context
Also see: #39

@n0toose n0toose added the bug Something isn't working label Aug 5, 2020
@konradpabjan
Collaborator

If you currently upload any archive file (.zip, .zipx, .7z, .tar, .tar.xz, etc.) and then attempt to download it from the UI, the current behavior is that you'll get a zip of whatever you uploaded.

This is a limitation of our APIs and our UI; some of my earlier comments go into more detail: #39 (comment) and #39 (comment)

If you also look at our public API to download an artifact, you'll notice that we currently require `zip` as the `:archive_format`: https://developer.github.com/v3/actions/artifacts/#download-an-artifact and that is effectively what is used when you click to download an artifact. Ideally we should have options that let you get the raw contents of whatever was uploaded without any archiving format, but we currently don't have any solutions 😞
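To illustrate the point (owner, repo, and artifact id below are placeholders, and the commented-out curl call assumes a token with `actions:read` scope), the documented route hard-codes zip as the only `:archive_format`:

```shell
# Placeholder values -- substitute your own; nothing here is from the issue.
OWNER=octocat
REPO=hello-world
ARTIFACT_ID=12345

# "zip" is currently the only accepted :archive_format in the route below.
URL="https://api.github.com/repos/$OWNER/$REPO/actions/artifacts/$ARTIFACT_ID/zip"
echo "$URL"
# curl -sL -H "Authorization: token $GITHUB_TOKEN" -o artifact.zip "$URL"
```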

@sagikazarmark

@konradpabjan is this behavior going to change at some point, or are we stuck with it for life? It's unclear from your comments.

(I get that it's not a limitation of this action, but this is where users are affected, so I think it would make sense to post some updates here or involve someone who works on the API.)

@konradpabjan
Collaborator

konradpabjan commented Sep 25, 2020

@sagikazarmark

We absolutely have plans to address this; it's on our backlog, but there are other priorities currently preventing us from picking up the work. Our public roadmap has some high-level priorities for Actions for those curious: https://github.com/github/roadmap/projects/1

The experience right now leaves a lot to be desired and an overhaul for the Artifacts UI + better API support has come up in internal discussions multiple times. We're definitely planning on addressing this.

nistei added a commit to flybywiresim/aircraft that referenced this issue Mar 6, 2021
Benjozork pushed a commit to flybywiresim/aircraft that referenced this issue Mar 6, 2021
* chore: update fragmenter package

* build: use PR action for testing

* build: revert fragmenter and don't use dev-env for zip build

* build: upload modules as artifact

* fix: typo in lftp options

* build: remove ascii option

* build: use fragmenter output for artifacts

* build: remove fragmenter from PR action

* fix(ci): remove ascii option

* build: zip before artifact upload

refs: actions/upload-artifact#109
sarcasticadmin added a commit to socallinuxexpo/scale-network that referenced this issue Aug 30, 2021
The upload workflow within github actions leaves a lot to be desired.
There were many initial assumptions with GITHUB_WORKSPACE, paths from
test repos, etc. that were originally left here. These more or less have
been vetted now and this is the best result I have found to date for
scale-network.

Additionally, I am watching a few upstream stories in the upload actions
repo just to see if it improves things for us in the near term:
  - actions/upload-artifact#109
  - actions/upload-artifact#21
@awakecoding

+1 on this. I compress all my artifacts to .tar.xz, but I have no choice but to download them as .zip files that I have to unzip to recover the already-compressed .tar.xz artifact. Is this going to be fixed anytime soon? It's a pain.

sarcasticadmin added a commit to socallinuxexpo/scale-network that referenced this issue Oct 2, 2021
The upload workflow within github actions leaves a lot to be desired.
There were many initial assumptions with GITHUB_WORKSPACE, paths from
test repos, etc. that were originally left here. These more or less have
been vetted now and this is the best result I have found to date for
scale-network.

Additionally, I am watching a few upstream stories in the upload actions
repo just to see if it improves things for us in the near term:
  - actions/upload-artifact#109
  - actions/upload-artifact#21
@fregante

fregante commented Oct 27, 2021

This is basically a duplicate of:

The solution for both would be to avoid zipping already-compressed archives
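A sketch of what "avoid zipping compressed archives" could look like (purely illustrative; this is not how the action currently behaves):

```shell
# Decide per file whether compression is worthwhile, judging by extension.
should_compress() {
  case "$1" in
    *.zip|*.gz|*.tgz|*.xz|*.zst|*.bz2|*.7z) return 1 ;;  # already compressed
    *) return 0 ;;
  esac
}

should_compress release.tar.gz || echo "store as-is"   # prints "store as-is"
should_compress notes.txt && echo "compress"           # prints "compress"
```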

agraef added a commit to agraef/pd-faustgen that referenced this issue Jan 22, 2023
Yes, it's a tarball. But due to limitations in the GH API, at present it
will be wrapped in a zip file.

cf. actions/upload-artifact#109
mattstruble added a commit to mattstruble/zero-router that referenced this issue Nov 29, 2023
It was originally thought that adding the .img.gz extension would allow
the downloaded artifact to not be double-zipped. However, this seems
to be a bug in GitHub, which can be tracked via:

actions/upload-artifact#39
actions/upload-artifact#109

Signed-off-by: Matt Struble <[email protected]>
@rmunn

rmunn commented Jun 21, 2024

#109 (comment) says:

Our public roadmap has some high level priorities for Actions for those curious: https://github.com/github/roadmap/projects/1

Currently that link still works but gives a warning that classic Projects will be sunset on August 23, 2024. For anyone reading this after August 23, 2024, the address for GitHub's public roadmap is now https://github.com/orgs/github/projects/4247/views/1.

@johnnynunez

Any news about this? It always uploads as zip...

PastaPastaPasta added a commit to dashpay/dash that referenced this issue Feb 25, 2025
…ional test logs as artifacts

3461c14 ci: tentatively drop multiprocess and tsan functional tests (Kittywhiskers Van Gogh)
5db8fa0 ci: cache previous releases if running `linux64` variant (Kittywhiskers Van Gogh)
cca0d89 ci: add functional tests for linux builds (Kittywhiskers Van Gogh)
fc2efb6 ci: upload functional test logs as artifacts (Kittywhiskers Van Gogh)
b25e846 ci: handle space exhaustion by deleting files we don't need (Kittywhiskers Van Gogh)
0a1e635 ci: add reusable workflow for running functional tests (Kittywhiskers Van Gogh)
57cf278 ci: use helper script to bundle artifacts (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * [`actions/cache`](https://github.com/marketplace/actions/cache) allows specifying the directories to be cached using glob expressions, which extend further into defining exclusions, needed to keep cache sizes maintainable ([source](https://github.com/dashpay/dash/blob/bb469687d3d936f82fd8e8fbe0934eec5e17df5e/.gitlab-ci.yml#L128-L138)).

    * Unfortunately, the implementation of globbing with respect to exclusions is more-or-less broken (see [actions/toolkit#713](actions/toolkit#713 (comment))) with the requirement that the inclusion depth should match the depth of the exclusion. Attempting to play by these rules more or less fails ([build](https://github.com/kwvg/dash/actions/runs/13344612118/job/37273624710#step:5:4634)).

    * Attempting to use third-party actions like [`tj-actions/glob`](https://github.com/marketplace/actions/glob-match) provides a much saner experience, but they enumerate individual files that match patterns, not folders. This means that when we pass them to `actions/cache`, we breach the argument-length limit ([build](https://github.com/kwvg/dash/actions/runs/13343953711/job/37272121153#step:9:4409)).

      * Modifying `ulimit` to get around this isn't very feasible due to odd behavior surrounding it (see [actions/runner#3421](actions/runner#3421)), and the general consensus is to save to a file and have the next action read from the file ([source](https://stackoverflow.com/a/71349472)).

         [`tj-actions/glob`](https://github.com/marketplace/actions/glob-match) graciously does this with the `paths-output-file` output but it takes two to play and [`actions/cache`](https://github.com/marketplace/actions/cache) does not accept files (`path` must be a newline-delimited string).

    The path of least resistance, it seems, is to use a script to bundle our cache into a neat input and leave [`actions/cache`](https://github.com/marketplace/actions/cache) out of it entirely; this is the approach taken.

  * As we aren't using self-hosted runners, we are subject to GitHub's limits for everything: runner space, total artifact storage budget, total cache storage budget.

    * Caches that are not **accessed** in 7 days are evicted, and there's a 10 GB budget for all caches ([source](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy)); GitHub will evict the oldest (presumably by **creation**?) caches to make sure that limit is adhered to.

      * What makes this limit troubling is the immutable nature of caches: unlike GitLab, which is more conducive to shared caches ([source](https://github.com/dashpay/dash/blob/bb469687d3d936f82fd8e8fbe0934eec5e17df5e/.gitlab-ci.yml#L55-L69)), GitHub insists on immutability (see [actions/toolkit#505](actions/toolkit#505)), and the only way to "update" a cache is to structure your cache key so that updated content is reflected in the key itself, or to delete the cache and create a new one, which brings race-condition concerns ([comment](#6406 (review))).

        Sidenote: overwriting contents is allowed for artifacts ([source](https://github.com/actions/upload-artifact/blob/65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08/README.md#overwriting-an-artifact)), just not caches.

      * This means we need to be proactive in getting rid of caches with a short shelf life, to keep longer-lasting caches (like `depends-sources`) from being evicted due to old age as we breach the 10 GB limit. We cannot just set a short retention period, as GitHub doesn't offer that for caches like it does for artifacts ([source](https://github.com/actions/upload-artifact/blob/65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08/README.md#retention-period)).

        * ~~While doing this properly would require us to implement a cache reaper workflow, this still needed to be addressed now as the contents of `build-ci` need to be passed onto the functional test workflow and this creates an ephemeral cache with a short lifespan that threatens longer-living (but older) caches.~~

       ~~This is currently approached by deleting `build-ci` (output) caches when the functional test runner is successful, we let the cache stick around if the build fails to allow for rerunning failed instances.~~

       ~~If for whatever reason a successful build has to be rerun, the build workflow would need to be rerun (though the `ccache` cache will speed this up significantly) to generate the output cache again for the test workflow to succeed. Failing to do this will result in a cache miss and run failure.~~ **Edit:** Switched to using artifacts to mitigate cache thrashing concerns. Auto-expiration is a huge plus, too.

    * Runners are limited to 14 GB of **addressable** storage space ([source](https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories)) and breaching this limit will cause the runners to fail.

      * Our TSan ([build](https://github.com/kwvg/dash/actions/runs/13355816205/job/37298587344#step:5:1178)) and multiprocess ([build](https://github.com/kwvg/dash/actions/runs/13355816205/job/37298658464#step:5:1190)) test variants breach this limit when collecting logs and deleting the `build-ci` (see 2153b0b) cache doesn't make enough of a dent to help ([build](https://github.com/kwvg/dash/actions/runs/13356474530)).

        * Omitting the logs from successful runs would be a regression in content for our `test_logs` artifacts and therefore wasn't considered.

      * While third-party actions like [`AdityaGarg8/remove-unwanted-software`](https://github.com/marketplace/actions/maximize-build-disk-space-only-remove-unwanted-software) can bring significant space savings ([build](https://github.com/kwvg/dash/actions/runs/13357806504/job/37302970610#step:2:150)), they cannot be run in jobs that utilize the [`container`](https://docs.github.com/en/actions/writing-workflows/choosing-where-your-workflow-runs/running-jobs-in-a-container) context (so, all jobs after the container creation workflow) as they'd be executed inside the container when we want to affect the runner underneath ([build](https://github.com/kwvg/dash/actions/runs/13357260369/job/37301757225#step:3:29), notice the step being run after "Initialize containers").
        * There are no plans to implement definable "pre-steps" (see [actions/runner#812](actions/runner#812)) and the only way to implement "before" and "after" steps is by self-hosting ([source](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/running-scripts-before-or-after-a-job)).

       This has been tentatively sidestepped by omitting the TSan and multiprocess builds, as any attempted fixes would require precision gutting for borderline space savings that could easily be nullified by future code changes; such measures are better reserved for future PRs.

    * Artifacts share their storage quota with GitHub Packages ([source](https://docs.github.com/en/billing/managing-billing-for-your-products/managing-billing-for-github-packages/about-billing-for-github-packages#about-billing-for-github-packages)) and by default linger for 90 days ([source](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/storing-and-sharing-data-from-a-workflow#about-workflow-artifacts)); considering that each run can generate multiple gigabytes of artifacts, testing the upper limit runs the risk of being very expensive. **Edit:** It appears pricing reflects artifacts in private repos; public repos don't seem to run this risk.

      * ~~All artifacts generated have an expiry of one day (compared to GitLab's three days, [source](https://github.com/dashpay/dash/blob/bb469687d3d936f82fd8e8fbe0934eec5e17df5e/.gitlab-ci.yml#L165), but they are self-hosted).~~ **Edit:** Artifacts now have an expiry of three days, matching GitLab.

      * Artifacts are compressed as ZIP archives and there is no way around that as of now (see [actions/upload-artifact#109](actions/upload-artifact#109 (comment))) and the permissions loss it entails is acknowledged by GitHub and their solution is... to put them in a tarball ([source](https://github.com/actions/upload-artifact/blob/65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08/README.md#permission-loss)).

      To keep size under control, artifacts contain `zstd-5` compressed tarballs (compression level determined by benchmarks, see below) generated by bundling scripts alongside their checksum.

      <details>

      <summary>Benchmarks:</summary>

      ```
      $ zstd -b1 -e22 -T0 artifacts-linux64_multiprocess.tar
      1#_multiprocess.tar :1586411520 -> 537552492 (x2.951), 2396.2 MB/s, 1367.8 MB/s
      2#_multiprocess.tar :1586411520 -> 499098623 (x3.179), 2131.8 MB/s, 1306.6 MB/s
      3#_multiprocess.tar :1586411520 -> 474452284 (x3.344), 1371.6 MB/s, 1245.6 MB/s
      4#_multiprocess.tar :1586411520 -> 470931621 (x3.369),  620.3 MB/s, 1239.1 MB/s
      5#_multiprocess.tar :1586411520 -> 459075785 (x3.456),  457.2 MB/s, 1230.1 MB/s
      6#_multiprocess.tar :1586411520 -> 449594612 (x3.529),  415.3 MB/s, 1289.7 MB/s
      7#_multiprocess.tar :1586411520 -> 446208421 (x3.555),  282.6 MB/s, 1296.3 MB/s
      8#_multiprocess.tar :1586411520 -> 442797797 (x3.583),  254.3 MB/s, 1338.4 MB/s
      9#_multiprocess.tar :1586411520 -> 438690318 (x3.616),  210.8 MB/s, 1331.5 MB/s
      10#_multiprocess.tar :1586411520 -> 437195147 (x3.629),  164.1 MB/s, 1337.4 MB/s
      11#_multiprocess.tar :1586411520 -> 436501141 (x3.634),  108.2 MB/s, 1342.5 MB/s
      12#_multiprocess.tar :1586411520 -> 436405679 (x3.635),  102.7 MB/s, 1344.0 MB/s
      13#_multiprocess.tar :1586411520 -> 436340981 (x3.636),   65.9 MB/s, 1344.0 MB/s
      14#_multiprocess.tar :1586411520 -> 435626720 (x3.642),   61.5 MB/s, 1346.9 MB/s
      15#_multiprocess.tar :1586411520 -> 434882716 (x3.648),   49.4 MB/s, 1352.9 MB/s
      16#_multiprocess.tar :1586411520 -> 411221852 (x3.858),   33.6 MB/s, 1049.2 MB/s
      17#_multiprocess.tar :1586411520 -> 399523001 (x3.971),   26.0 MB/s, 1003.7 MB/s
      18#_multiprocess.tar :1586411520 -> 379278765 (x4.183),   21.0 MB/s,  897.5 MB/s
      19#_multiprocess.tar :1586411520 -> 378022246 (x4.197),   14.7 MB/s,  896.0 MB/s
      20#_multiprocess.tar :1586411520 -> 375741653 (x4.222),   14.0 MB/s,  877.6 MB/s
      21#_multiprocess.tar :1586411520 -> 373303486 (x4.250),   11.9 MB/s,  866.8 MB/s
      22#_multiprocess.tar :1586411520 -> 358172556 (x4.429),   6.09 MB/s,  884.9 MB/s
      ```

      </details>

        > **Note:** As mentioned above, we use similar bundling scripts for the outputs cache, but unlike artifacts, we cannot disable their compression routines or even adjust compression levels (see [actions/toolkit#544](actions/toolkit#544))
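The bundling approach described above can be sketched like this (filenames are assumptions, and gzip stands in for the PR's `zstd -5` so the snippet runs without zstd installed); tarring first also keeps the execute bits that the artifact zip would otherwise drop:

```shell
# Assumed layout; the PR's actual bundling script differs.
mkdir -p build
printf '#!/bin/sh\necho ok\n' > build/run.sh
chmod +x build/run.sh

# Bundle a tarball with its checksum alongside (gzip in place of zstd -5).
tar -czf artifacts.tar.gz build/
sha256sum artifacts.tar.gz > artifacts.tar.gz.sha256

# Round-trip: permissions survive inside the tarball.
mkdir -p out && tar -xzf artifacts.tar.gz -C out
test -x out/build/run.sh && echo "exec bit preserved"   # prints "exec bit preserved"
```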

  ## Notes

  * ~~If we add or remove binaries in terms of compile output, `bundle-build.sh` needs to be updated. Without updating it, it will fail if it cannot find the files it was looking for and it will not include files that it wasn't told to include.~~ No longer applicable.

  ## Breaking Changes

  None expected.

  ## Checklist

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)**
  - [x] I have made corresponding changes to the documentation **(note: N/A)**
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  PastaPastaPasta:
    utACK 3461c14
  UdjinM6:
    ACK 3461c14

Tree-SHA512: ef55bc10902c57673ffd9bee6562b362a87658e4c51e543b8553bf48c41544a302d6acad7c5a30395fbfcfd085354251a07327c3e78c93c750585496926be9f6