Writes can be silently truncated during package download #40

fergus-dall · 2020-12-27T05:32:09Z

I recently set up a rebuilderd instance and noticed around 20 packages failing to reproduce, even though the results in build/ are identical to what I can download from the mirror. After examining my logs closely, I noticed that the workers were reporting download sizes slightly smaller than the size of the files on the mirror (usually only a few percent). I think I triggered this issue by running a large number of workers relative to the size of the machine, which for one reason or another (thread contention?) caused some writes to return partial results, so the rest of the write buffer was discarded.

Looking at worker/src/download.rs, I see that we write from the request stream using this loop:

    let mut bytes: u64 = 0;
    while let Some(item) = stream.next().compat().await {
        let item = item?;
        bytes += f.write(&item).await? as u64;
    }
    info!("Downloaded {} bytes", bytes);

where f is a tokio::fs::File and write comes from the trait tokio::io::AsyncWriteExt. This loop assumes that a successful write always writes the whole buffer, but the documentation for this method explicitly says otherwise:

This function will attempt to write the entire contents of buf, but the entire write may not succeed, or the write may also generate an error. A call to write represents at most one attempt to write to any wrapped object.

Instead there should be a loop like this:

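    let mut bytes: u64 = 0;
    while let Some(item) = stream.next().compat().await {
        let item = item?;
        // write() may accept only part of the chunk, so keep going until it is
        // fully written. (A write() that returns Ok(0) would still need handling.)
        let mut buf = &item[..];
        while !buf.is_empty() {
            let written = f.write(buf).await?;
            bytes += written as u64;
            buf = &buf[written..];
        }
    }
    info!("Downloaded {} bytes", bytes);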
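
To illustrate the failure mode without tokio, here is a minimal sketch using the synchronous std::io::Write trait, whose write and write_all have the same partial-write semantics. ShortWriter, copy_single_write and copy_write_all are hypothetical names made up for this example and are not part of rebuilderd. The writer accepts at most `limit` bytes per call, which simulates a short write.

```rust
use std::io::{self, Write};

/// Hypothetical writer that accepts at most `limit` bytes per `write` call,
/// simulating a short (partial) write.
struct ShortWriter {
    data: Vec<u8>,
    limit: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.limit);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Mirrors the buggy loop: one `write` per chunk, so any unwritten
/// remainder of the chunk is silently dropped.
fn copy_single_write<W: Write>(w: &mut W, chunks: &[&[u8]]) -> io::Result<u64> {
    let mut bytes = 0u64;
    for chunk in chunks {
        bytes += w.write(chunk)? as u64;
    }
    Ok(bytes)
}

/// Uses `write_all`, which keeps calling `write` until the whole chunk
/// has been accepted (or an error occurs).
fn copy_write_all<W: Write>(w: &mut W, chunks: &[&[u8]]) -> io::Result<u64> {
    let mut bytes = 0u64;
    for chunk in chunks {
        w.write_all(chunk)?;
        bytes += chunk.len() as u64;
    }
    Ok(bytes)
}
```

With a limit of 4 bytes and the chunks b"hello " and b"world", the first function stores only 8 of the 11 bytes without reporting an error, while the second stores all 11.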

More generally, there should also be some integrity checking on the download, just to ensure that the downloaded package hasn't been corrupted. Otherwise we risk wasting a bunch of time trying to reproduce a package that can't be reproduced.
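
As a rough sketch of the simplest form such a check could take, the function below compares the size of the file on disk against an expected size. verify_download_len and expected_len are hypothetical names, and where the expected size would come from (for example a Content-Length header or the mirror's package index) is an assumption on my part. A size check only catches truncation; comparing a cryptographic hash would be a stronger check.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns an error unless the file at `path`
/// is exactly `expected_len` bytes long.
fn verify_download_len(path: &Path, expected_len: u64) -> io::Result<()> {
    let actual = fs::metadata(path)?.len();
    if actual != expected_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "size mismatch: expected {} bytes, found {}",
                expected_len, actual
            ),
        ));
    }
    Ok(())
}
```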

kpcyrd · 2020-12-27T14:03:53Z

Thank you for the very detailed bug report! Please have a look at #42 (using write_all), I'm planning to merge and release this as 0.9.1 today. :)

kpcyrd · 2020-12-27T21:00:04Z

This bug has been fixed in the 0.9.1 release, thank you very much!

kpcyrd added a commit that referenced this issue Dec 27, 2020

Fix lost data with partial writes discovered by @fergus-dall

35e0998

Closes #40

kpcyrd mentioned this issue Dec 27, 2020

Fix partially lost writes #42

Merged

kpcyrd added the bug label Dec 27, 2020

kpcyrd closed this as completed in #42 Dec 27, 2020
