Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download of compressed files on arm-plattforms #110

Closed
simonhbw opened this issue Apr 27, 2021 · 8 comments
Closed

download of compressed files on arm-plattforms #110

simonhbw opened this issue Apr 27, 2021 · 8 comments

Comments

@simonhbw
Copy link

hi, ive a problem using tools like @azure/storage-blob or node-fetch on armv7 devices with any node-version, which uses llhttp instead of the legacy parser. when downloading a compressed file, the response is always decompressed what cause an data corruption failure. i can reproduce the issue using whatever @azure/storage-blob or node-fetch directly.

the interessting thing is, that the issue only ocurres on arm(v7)-plattforms. running the same code on an x64/86 plattform, doesnt cause any issues.
changin the node-version to 11.x or 12.x, with "http-parser=legacy" also doesnt cause any issues

code sample:

const { BlockBlobClient } = require('@azure/storage-blob');
const path = require('path')
const fs = require('fs')
const cliProgress = require('cli-progress');

const repositoryPath = '../repository'
const uri = 'placeUrlWithSasHere'

async function startDownload(sasUri) {
try {
    console.log('starting download from url: %s', sasUri);
    const blobClient = new BlockBlobClient(sasUri);
    const props = await blobClient.getProperties();
    // console.log({ props });
    
    const fileName = blobClient._name;
    const filePath = path.join(repositoryPath, fileName);
    const bar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);

    const downloadBlockBlobResponse = await blobClient.download(null, null,
        {
            onProgress: ((ev) => {
                bar.update(ev.loadedBytes);
            }),
        });
    
    const readStream = downloadBlockBlobResponse.readableStreamBody;
    const writeStream = fs.createWriteStream(filePath);
    
    bar.start(props.contentLength, 0);
    readStream
        // .pipe(throttler)
        .pipe(writeStream);
    
    readStream.on('error', (e) => { 
        bar.stop()
        console.error(e.message)
    });
    writeStream.on('error', (e) => { 
        bar.stop()
        console.error(e.message)
    });

    readStream.on('end', () => {
        bar.stop()
        console.log('readstream finished')
    })
} catch (error) {
   console.error(error)
}
}

startDownload(uri)

result on armv7:

Data corruption failure: received less data than required and reached maxRetires limitation. Received data offset: 1473222655, data needed offset: 10063157247, retries: 5, max retries: 5

result on x64

readstream finished

@Chaphasilor
Copy link

I'm also having this issue, check out the original issue (nodejs/node#37053) I created, which includes a basic setup for reproducing the issue :)

@indutny
Copy link
Member

indutny commented May 12, 2021

Thanks to undici issue: nodejs/undici#803 . I finally know what's going on there. There is a truncating conversion of uint64_t to size_t in consume in llparse. Will work on the fix soon.

indutny added a commit to nodejs/llparse that referenced this issue May 12, 2021
On 32bit platforms `size_t` is essentially `uint32_t` (or at times
even meager `uint16_t`). Loading `uint64_t` field value into `size_t` on
these platforms would truncate the high bits and leave only the low 32
(16) bits in place. This leads to various interesting errors in
downstream modules. See:

- nodejs/llhttp#110
- nodejs/undici#803
@indutny
Copy link
Member

indutny commented May 12, 2021

Fix: nodejs/llparse#44

indutny added a commit to nodejs/llparse that referenced this issue May 12, 2021
On 32bit platforms `size_t` is essentially `uint32_t` (or at times
even meager `uint16_t`). Loading `uint64_t` field value into `size_t` on
these platforms would truncate the high bits and leave only the low 32
(16) bits in place. This leads to various interesting errors in
downstream modules. See:

- nodejs/llhttp#110
- nodejs/undici#803

This patch makes all field loads go into their respective types.
Truncation doesn't happen in this case because C coercion rules will
cast both types to the largest necessary datatype to hold either of
them.
indutny added a commit to nodejs/llparse that referenced this issue May 12, 2021
On 32bit platforms `size_t` is essentially `uint32_t` (or at times
even meager `uint16_t`). Loading `uint64_t` field value into `size_t` on
these platforms would truncate the high bits and leave only the low 32
(16) bits in place. This leads to various interesting errors in
downstream modules. See:

- nodejs/llhttp#110
- nodejs/undici#803

This patch makes all field loads go into their respective types.
Truncation doesn't happen in this case because C coercion rules will
cast both types to the largest necessary datatype to hold either of
them.

PR-URL: #44
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Daniele Belardi <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
@indutny
Copy link
Member

indutny commented May 13, 2021

PR for node.js: nodejs/node#38665

@indutny indutny closed this as completed May 13, 2021
@simonhbw
Copy link
Author

simonhbw commented Jun 7, 2021

for my specific problem, the issue is not fixed.
im getting the following results when running the code from the first post, using a large file (~10GB):

using [email protected]

readstream finished

using [email protected]

Data corruption failure: received less data than required and reached maxRetires limitation. Received data offset: 1473222655, data needed offset: 10063157247, retries: 5, max retries: 5

using [email protected]

the download progress stops after at a random part between 5GB or 8GB without any log message
note that the "Data curruption failure" does not ocurr anymore.

@Chaphasilor
Copy link

using [email protected]

the download progress stops after at a random part between 5GB or 8GB without any log message
note that the "Data curruption failure" does not ocurr anymore.

Can you provide some demo code or the link it happens with? Would love to try it out, given that it worked for me with a 12+ GB file on Node 16.3.0...
Also, Node 16 didn't give an error message ever, it just closed the connection :)

@simonhbw
Copy link
Author

simonhbw commented Jun 7, 2021

using [email protected]

the download progress stops after at a random part between 5GB or 8GB without any log message
note that the "Data curruption failure" does not ocurr anymore.

Can you provide some demo code or the link it happens with? Would love to try it out, given that it worked for me with a 12+ GB file on Node 16.3.0...
Also, Node 16 didn't give an error message ever, it just closed the connection :)

you can use the code from my first post. because the file is non-public, i could send an email with the url. is this ok for you?

@Chaphasilor
Copy link

you can use the code from my first post. because the file is non-public, i could send an email with the url. is this ok for you?

Maybe you could try reducing your code further. Is there a way to use that azure library to get a (temorary) download URL for that file?

If yes, you could try download the file from that URL, using the built-in http(s) module.
This would make sure that the issue isn't in the azure package :)

That is, unless you already tried that ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants