Fix artifact v4 upload above 8MB #31664
Conversation
Doesn't work yet. 40MB uploads are fixed now, but ca. 800MB still causes checksum errors. Are there some asynchronous things going on? Parallel chunk updates have been seen; 6 of 9 missing chunks were found in the log as duplicates.
After some tests the chunks are still out of order. I'm investigating implementing a blockList and storing chunks under the name specified per query; during merging, the block list is read to build the reader over every chunk.
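A rough sketch of that merge idea, opening the stored chunks in the order given by the block list and streaming them through one hash. The `openChunk` helper and the on-disk naming are assumptions for illustration, not Gitea's actual storage API or the code in this PR:

```go
package example

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// openChunk is a hypothetical helper that opens one stored chunk by its block ID.
func openChunk(dir, blockID string) (io.ReadCloser, error) {
	return os.Open(fmt.Sprintf("%s/block-%s", dir, blockID))
}

// mergeByBlockList streams every chunk in block-list order into dst and returns
// the SHA-256 of the merged content, so the checksum is computed over the blocks
// in the order the client declared, not the order they happened to arrive in.
func mergeByBlockList(dir string, blockIDs []string, dst io.Writer) (string, error) {
	h := sha256.New()
	for _, id := range blockIDs {
		rc, err := openChunk(dir, id)
		if err != nil {
			return "", err
		}
		// Feed the hash while copying the chunk into the destination.
		if _, err := io.Copy(io.MultiWriter(dst, h), rc); err != nil {
			rc.Close()
			return "", err
		}
		rc.Close()
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```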
I think the issue is fixed this way, but we need to look at the security of storing a blockid in blob storage, or find alternatives.
Uploaded bytes 746586112
Uploaded bytes 754974720
Uploaded bytes 763363328
Uploaded bytes 771751936
Uploaded bytes 780140544
Uploaded bytes 788529152
Uploaded bytes 796917760
Uploaded bytes 805306368
Uploaded bytes 807753807
Finished uploading artifact content to blob storage!
SHA256 hash of uploaded artifact zip is 6feeaf6b88049a4c6ced1240ed3911afaa819229cd082999be5c81eff09c33f1
Finalizing artifact upload
Artifact artifact.zip successfully finalized. Artifact ID 22
Artifact artifact has been successfully uploaded! Final size is 807753807 bytes. Artifact ID is 22
Artifact download URL: http://localhost:3000/test/artifact-upload-big/actions/runs/39/artifacts/22
routers/api/actions/artifactsv4.go
Outdated
_, err := r.fs.Save(fmt.Sprintf("tmp%d/block-%d-%d-%s", task.Job.RunID, task.Job.RunID, ctx.Req.ContentLength, blockid), ctx.Req.Body, -1) | ||
if err != nil { | ||
log.Error("Error runner api getting task: task is not running") | ||
ctx.Error(http.StatusInternalServerError, "Error runner api getting task: task is not running") | ||
return |
If we have a blockid, delay ordering the chunks until the end and use its blockid to form the name.
I notice that this might be a security issue, because an attacker could control the filesystem name, but what alternatives do we have?
Sanitizing is probably the way to go here.
I switched to base64url encoding of the blockid, which doesn't allow traversing the filesystem or doing other bad things.
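For illustration, a minimal sketch of that encoding step; the helper name is an assumption, not the PR's code:

```go
package example

import "encoding/base64"

// blockIDToName maps a client-supplied block ID to a storage-safe name component.
// The base64url alphabet is [A-Za-z0-9-_] plus "=" padding, so the encoded value
// can never contain a path separator or a ".." segment.
func blockIDToName(blockID string) string {
	return base64.URLEncoding.EncodeToString([]byte(blockID))
}
```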
routers/api/actions/artifactsv4.go
Outdated
case "blocklist": | ||
_, err := r.fs.Save(fmt.Sprintf("tmp%d/%d-blocklist", task.Job.RunID, task.Job.RunID), ctx.Req.Body, -1) |
Now we get the final block order (by blockid) from the XML; save it for the merge step.
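For reference, a minimal sketch of parsing such a block list, assuming the body follows the Azure-style "Put Block List" XML shape with `<Latest>` entries; the struct and helper here are illustrative:

```go
package example

import (
	"encoding/xml"
	"io"
)

// BlockList mirrors an Azure-style "Put Block List" body: an ordered sequence of
// block IDs that defines the final layout of the blob.
type BlockList struct {
	Latest []string `xml:"Latest"`
}

// parseBlockList decodes the uploaded XML body into the ordered list of block IDs.
func parseBlockList(r io.Reader) (*BlockList, error) {
	var bl BlockList
	if err := xml.NewDecoder(r).Decode(&bl); err != nil {
		return nil, err
	}
	return &bl, nil
}
```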
routers/api/actions/artifactsv4.go
Outdated
log.Warn("Error merge chunks: %v", err) | ||
chunkMap, err := listChunksByRunID(r.fs, runID) | ||
if err != nil { | ||
log.Error("Error merge chunks: %v", err) | ||
ctx.Error(http.StatusInternalServerError, "Error merge chunks") | ||
return | ||
} | ||
chunks, ok = chunkMap[artifact.ID] | ||
if !ok { | ||
log.Error("Error merge chunks") | ||
ctx.Error(http.StatusInternalServerError, "Error merge chunks") | ||
return | ||
} |
My tests don't upload a block list yet, so this falls back to trying the old broken mode.
@@ -123,6 +123,49 @@ func listChunksByRunID(st storage.ObjectStorage, runID int64) (map[int64][]*chunkFileItem, error) {
	return chunksMap, nil
}

func listChunksByRunIDV4(st storage.ObjectStorage, runID int64, artifactID int64, blist *BlockList) ([]*chunkFileItem, error) {
Synthesize a chunk list with the correct start/end offsets and artifact entries, based on the XML block list.
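A minimal sketch of what that synthesis could look like. The type and helpers (`chunkItem`, `sizeOf`, `pathOf`) are illustrative assumptions, not Gitea's actual API, and the end offset is treated as inclusive here:

```go
package example

// chunkItem is an illustrative stand-in for the internal chunk descriptor; the
// real type and field names in Gitea may differ.
type chunkItem struct {
	ArtifactID int64
	Start      int64 // first byte of this block in the final artifact
	End        int64 // last byte of this block (inclusive)
	Path       string
}

// synthesizeChunks builds an ordered chunk list from the block list: each block's
// start offset is the running total of the sizes of the blocks before it.
// sizeOf and pathOf are hypothetical lookups for a block's stored size and object key.
func synthesizeChunks(artifactID int64, blockIDs []string, sizeOf func(string) int64, pathOf func(string) string) []*chunkItem {
	chunks := make([]*chunkItem, 0, len(blockIDs))
	var offset int64
	for _, id := range blockIDs {
		size := sizeOf(id)
		chunks = append(chunks, &chunkItem{
			ArtifactID: artifactID,
			Start:      offset,
			End:        offset + size - 1,
			Path:       pathOf(id),
		})
		offset += size
	}
	return chunks
}
```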
I also encountered the same problem.
Isn't the storage considered "private" anyway?
I don't know if the Gitea object storage is protected against a pattern like tmpv4267/block-267-9-/../some/custom/path. All runs of (all users?) use the same artifact storage container. I have double fixed that now, so it's no potential problem anymore.
Ah, I see what you are getting at. Yes, cross-repo contamination should be avoided.
Is a test easily possible here? Might be valuable.
A test that uploads a block with a blockid like the traversal pattern above: yes, I think this should be possible for me to do tomorrow.
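For illustration, a sketch of the kind of test discussed above, built on a hypothetical `blockKey` helper; the real key layout in the PR may differ:

```go
package example

import (
	"encoding/base64"
	"fmt"
	"strings"
	"testing"
)

// blockKey is a hypothetical key builder: the run ID scopes the temporary prefix
// so runs cannot touch each other's blocks, and the base64url-encoded block ID
// cannot introduce "/" or ".." into the key.
func blockKey(runID, artifactID int64, blockID string) string {
	encoded := base64.URLEncoding.EncodeToString([]byte(blockID))
	return fmt.Sprintf("tmpv4%d/block-%d-%d-%s", runID, runID, artifactID, encoded)
}

// TestTraversalBlockIDStaysInPrefix checks that a path-traversal style block ID
// cannot produce a key outside the run's temporary prefix.
func TestTraversalBlockIDStaysInPrefix(t *testing.T) {
	key := blockKey(267, 9, "../some/custom/path")
	if !strings.HasPrefix(key, "tmpv4267/") || strings.Contains(key, "..") {
		t.Fatalf("block key escapes the run prefix: %q", key)
	}
}
```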
My PR seems to have issues with minio and azurite in the backend tests... Could it be that a blob that has been written by storage.Actions.Save(...) is not immediately available via storage.Actions.Open? For minio I get an error, and for azurite as well; the local backend has a 100% success rate. I'm pretty sure the file has been created a few cycles before the open call. Do you have any advice here?
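Since the follow-up commit below mentions retries, here is a minimal sketch of such a retry wrapper. `open` stands in for something like `storage.Actions.Open`, and the retry count and delay are arbitrary assumptions:

```go
package example

import (
	"fmt"
	"io"
	"time"
)

// openWithRetry retries an Open call a few times to paper over storage backends
// where a freshly written object is not immediately visible.
func openWithRetry(open func(path string) (io.ReadCloser, error), path string) (io.ReadCloser, error) {
	var lastErr error
	for i := 0; i < 5; i++ {
		rc, err := open(path)
		if err == nil {
			return rc, nil
		}
		lastErr = err
		time.Sleep(200 * time.Millisecond)
	}
	return nil, fmt.Errorf("object %s not visible after retries: %w", path, lastErr)
}
```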
…s via retries
goooooooooooooooooood !!!
* giteaofficial/main:
  * [skip ci] Updated licenses and gitignores
  * Fix rename branch permission bug (go-gitea#32066)
  * Fix artifact v4 upload above 8MB (go-gitea#31664)
  * [skip ci] Updated translations via Crowdin
  * Add bin to Composer Metadata (go-gitea#32099)
  * Fix wrong last modify time (go-gitea#32102)
  * Fix upload maven pacakge parallelly (go-gitea#31851)
  * Repo Activity: count new issues that were closed (go-gitea#31776)
  * Count typescript files as frontend for labeling (go-gitea#32088)
  * Use camo.Always instead of camo.Allways (go-gitea#32097)
I think this needs to be backported to v1.22.
* SECURITY
  * Fix basic auth with webauthn (go-gitea#32531) (go-gitea#32536)
  * Refactor internal routers (partial backport, auth token const time comparing) (go-gitea#32473) (go-gitea#32479)
* PERFORMANCE
  * Remove transaction for archive download (go-gitea#32186) (go-gitea#32520)
* BUGFIXES
  * Fix `missing signature key` error when pulling Docker images with `SERVE_DIRECT` enabled (go-gitea#32365) (go-gitea#32397)
  * Fix get reviewers fails when selecting user without pull request permissions unit (go-gitea#32415) (go-gitea#32616)
  * Fix adding index files to tmp directory (go-gitea#32360) (go-gitea#32593)
  * Fix PR creation on forked repositories via API (go-gitea#31863) (go-gitea#32591)
  * Fix missing menu tabs in organization project view page (go-gitea#32313) (go-gitea#32592)
  * Support HTTP POST requests to `/userinfo`, aligning to OpenID Core specification (go-gitea#32578) (go-gitea#32594)
  * Fix debian package clean up cron job (go-gitea#32351) (go-gitea#32590)
  * Fix GetInactiveUsers (go-gitea#32540) (go-gitea#32588)
  * Allow the actions user to login via the jwt token (go-gitea#32527) (go-gitea#32580)
  * Fix submodule parsing (go-gitea#32571) (go-gitea#32577)
  * Refactor find forks and fix possible bugs that weaken permissions check (go-gitea#32528) (go-gitea#32547)
  * Fix some places that don't respect org full name setting (go-gitea#32243) (go-gitea#32550)
  * Refactor push mirror find and add check for updating push mirror (go-gitea#32539) (go-gitea#32549)
  * Fix basic auth with webauthn (go-gitea#32531) (go-gitea#32536)
  * Fix artifact v4 upload above 8MB (go-gitea#31664) (go-gitea#32523)
  * Fix oauth2 error handle not return immediately (go-gitea#32514) (go-gitea#32516)
  * Fix action not triggered when commit message is too long (go-gitea#32498) (go-gitea#32507)
  * Fix `GetRepoLink` nil pointer dereference on dashboard feed page when repo is deleted with actions enabled (go-gitea#32501) (go-gitea#32502)
  * Fix `missing signature key` error when pulling Docker images with `SERVE_DIRECT` enabled (go-gitea#32397) (go-gitea#32397)
  * Fix the permission check for user search API and limit the number of returned users for `/user/search` (go-gitea#32310)
  * Fix SearchIssues swagger docs (go-gitea#32208) (go-gitea#32298)
  * Fix dropdown content overflow (go-gitea#31610) (go-gitea#32250)
  * Disable Oauth check if oauth disabled (go-gitea#32368) (go-gitea#32480)
  * Respect renamed dependencies of Cargo registry (go-gitea#32430) (go-gitea#32478)
  * Fix mermaid diagram height when initially hidden (go-gitea#32457) (go-gitea#32464)
  * Fix broken releases when re-pushing tags (go-gitea#32435) (go-gitea#32449)
  * Only provide the commit summary for Discord webhook push events (go-gitea#32432) (go-gitea#32447)
  * Only query team tables if repository is under org when getting assignees (go-gitea#32414) (go-gitea#32426)
  * Fix created_unix for mirroring (go-gitea#32342) (go-gitea#32406)
  * Respect UI.ExploreDefaultSort setting again (go-gitea#32357) (go-gitea#32385)
  * Fix broken image when editing comment with non-image attachments (go-gitea#32319) (go-gitea#32345)
  * Fix disable 2fa bug (go-gitea#32320) (go-gitea#32330)
  * Always update expiration time when creating an artifact (go-gitea#32281) (go-gitea#32285)
  * Fix null errors on conversation holder (go-gitea#32258) (go-gitea#32266) (go-gitea#32282)
  * Only rename a user when they should receive a different name (go-gitea#32247) (go-gitea#32249)
  * Fix checkbox bug on private/archive filter (go-gitea#32236) (go-gitea#32240)
  * Add a doctor check to disable the "Actions" unit for mirrors (go-gitea#32424) (go-gitea#32497)
  * Quick fix milestone deadline 9999 (go-gitea#32423)
  * Make `show stats` work when only one file changed (go-gitea#32244) (go-gitea#32268)
  * Make `owner/repo/pulls` handlers use "PR reader" permission (go-gitea#32254) (go-gitea#32265)
  * Update scheduled tasks even if changes are pushed by "ActionsUser" (go-gitea#32246) (go-gitea#32252)
* MISC
  * Remove unnecessary code: `GetPushMirrorsByRepoID` called on all repo pages (go-gitea#32560) (go-gitea#32567)
  * Improve some sanitizer rules (go-gitea#32534)
  * Update nix development environment vor v1.22.x (go-gitea#32495)
  * Add warn log when deleting inactive users (go-gitea#32318) (go-gitea#32321)
  * Update github.com/go-enry/go-enry to v2.9.1 (go-gitea#32295) (go-gitea#32296)
  * Warn users when they try to use a non-root-url to sign in/up (go-gitea#32272) (go-gitea#32273)
Multiple chunks are uploaded with type "block" without using "appendBlock" and end up out of order for bigger uploads. 8MB seems to be the chunk size.
This change parses the blockList uploaded after all blocks to get the final artifact size and to order the blocks correctly before calculating the sha256 checksum over all of them.
Fixes #31354