I recently set up a rebuilderd instance and noticed around 20 packages failing to reproduce, even though the results in `build/` are exactly identical to what I can download from the mirror. After examining my logs closely, I noticed that the workers were reporting download sizes smaller than the size of the files on the mirror by a small amount (usually only a few percent). I think I triggered this issue by running a large number of workers relative to the size of the machine, which for one reason or another (thread contention?) forced some writes to return partial results, causing the rest of the write buffer to be discarded.
Looking at `worker/src/download.rs`, I see that we write from the request stream using this loop:
where `f` is a `tokio::fs::File` and `write` is from the trait `tokio::io::AsyncWriteExt`. This loop assumes that successful writes always write the whole buffer, but the documentation for this method explicitly rejects that:
> This function will attempt to write the entire contents of `buf`, but the entire write may not succeed, or the write may also generate an error. A call to `write` represents at most one attempt to write to any wrapped object.

Instead, the loop should keep writing until the entire buffer has been written.
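To make the failure mode concrete, here is a minimal sketch using only `std::io::Write`, whose `write`/`write_all` pair has the same "at most one attempt" contract as the tokio methods. It is not the code from `worker/src/download.rs`: `ShortWriter`, `copy_single_write`, and `copy_write_all` are hypothetical names invented for this illustration, and the async version would presumably use `AsyncWriteExt::write_all` in the same way.

```rust
use std::io::{self, Write};

/// Hypothetical writer that accepts at most `limit` bytes per `write` call,
/// standing in for a file whose writes come back short under load.
struct ShortWriter {
    data: Vec<u8>,
    limit: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Accept only the first `limit` bytes and report how many were taken.
        let n = buf.len().min(self.limit);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Problematic pattern: one `write` call per chunk. Whatever the writer
/// did not accept is silently dropped.
fn copy_single_write<W: Write>(chunks: &[&[u8]], out: &mut W) -> io::Result<usize> {
    let mut total = 0;
    for chunk in chunks {
        total += out.write(chunk)?;
    }
    Ok(total)
}

/// Corrected pattern: `write_all` keeps calling `write` until the whole
/// chunk has been written, or returns an error.
fn copy_write_all<W: Write>(chunks: &[&[u8]], out: &mut W) -> io::Result<usize> {
    let mut total = 0;
    for chunk in chunks {
        out.write_all(chunk)?;
        total += chunk.len();
    }
    Ok(total)
}
```

With a `ShortWriter` limited to 4 bytes per call and the chunks `b"hello "` and `b"world"`, the first function stores only `hellworl` (8 bytes), while the second stores all 11 bytes. That matches the symptom of downloads coming out slightly smaller than the file on the mirror.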
More generally, there should also be some integrity checking on the download, just to ensure that the downloaded package hasn't been corrupted. Otherwise we risk wasting a bunch of time trying to reproduce a package that can't be reproduced.
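One cheap form of such a check is to compare the size of the file on disk with the size the server advertised. The sketch below assumes the expected length is already known (for example from a `Content-Length` header); `verify_download_size` is a hypothetical helper, not an existing rebuilderd function, and a checksum comparison against a trusted hash would be a stronger check where one is available.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: fail if the downloaded file's size on disk differs
/// from the size we expected to receive.
fn verify_download_size(path: &Path, expected_len: u64) -> io::Result<()> {
    let actual_len = fs::metadata(path)?.len();
    if actual_len == expected_len {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "size mismatch for {}: expected {} bytes, found {}",
                path.display(),
                expected_len,
                actual_len
            ),
        ))
    }
}
```

Running a check like this right after the download finishes would let the worker fail fast with a clear error instead of spending a full rebuild on a truncated package.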