Payload conversion converts data to utf8 #230

Closed
lteacher opened this issue Apr 5, 2017 · 15 comments · Fixed by #394

Comments

@lteacher
Contributor

lteacher commented Apr 5, 2017

Hi,

Just wanted to point out an issue with the change in #224. The toString() here decodes the payload as utf8. This makes binary data unusable.

As an example, I am passing Lambda some binary content, and when using lambda-proxy the binary is converted into a base64 string. On AWS you can check the isBase64Encoded flag to know when to decode that base64, and because the original payload content was never converted to utf8, it works fine. In serverless-offline the content is destroyed.
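
To make the failure mode concrete, here is a minimal Node.js sketch. The sample bytes and variable names are made up for illustration and are not taken from serverless-offline. It compares a utf8 round trip, which is lossy for arbitrary bytes, with a base64 round trip, which is reversible:

```javascript
// Illustrative bytes only: 0xff and 0xfe can never appear in valid UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe, 0x00, 0x10]);

// Decoding as utf8 replaces each invalid byte with U+FFFD,
// so re-encoding the string does not give the original bytes back.
const viaUtf8 = Buffer.from(original.toString('utf8'), 'utf8');

// base64 is a reversible text encoding, so the bytes survive the trip.
const viaBase64 = Buffer.from(original.toString('base64'), 'base64');

console.log(original.equals(viaUtf8));   // false
console.log(original.equals(viaBase64)); // true
```

Once the utf8 string has been produced, the handler has no way to recover the original bytes.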

I don't have a solution, but I have temporarily changed it in a fork to work around this.

cc @neverendingqs

@dherault
Owner

dherault commented Apr 6, 2017

Thanks @lteacher,
Please feel free to PR as soon as you have a more general fix.

@Andriy-Kulak

@lteacher so does your solution work for any multipart/form-data data? I am trying to upload an image in FormData format so I can alter it in Lambda, but I am getting some errors when reading the data with serverless-offline, so I am trying to figure out the problem.

@Andriy-Kulak

Andriy-Kulak commented Apr 8, 2017

Also @lteacher and @dherault, it appears that API Gateway does not support multipart/form-data per this link.

Any thoughts on this? How can I send an image to Lambda so I can alter it and then store it in S3?

I wanted to avoid storing one image in S3, then upon success, triggering a Lambda function to alter that image, and send back a URL with the altered image. This requires two calls instead of keeping it as one request but maybe I am just overthinking this.

@lteacher
Contributor Author

lteacher commented Apr 8, 2017

Hi @Andriy-Kulak

There are a few points to your question, but currently I am using multipart/form-data in lambda. Maybe by 'support' the comments are referring to some kind of native parsing or something... not sure since the binary support was added before the comments.

Firstly, the content is not parsed in any way, so your own logic will need to handle that. Second, you need to enable binary support. See this link. Of course, when setting this up you use the multipart/form-data content type.

Then you need to make sure your request is coming through OK. I am using lambda-proxy, and the body gets converted to base64 content; if you are using just lambda, you need to make sure no template messes it up. When AWS converts to base64, it sets a flag called isBase64Encoded on the request. Serverless or other tools would not do this conversion, so you need slightly special logic. In my case that means checking this flag and calling Buffer.from with either the binary or the base64 encoding.
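
The flag check described above could look roughly like the sketch below. The helper name bodyToBuffer and the sample events are hypothetical. The 'binary' fallback assumes the local emulator hands the body over as a binary (latin1) string, which only holds if the body was not already converted to utf8:

```javascript
// Hypothetical helper: turn the body of a lambda-proxy event into a Buffer.
// API Gateway sets isBase64Encoded when it has base64-encoded a binary body.
function bodyToBuffer(event) {
  if (event.body == null) {
    return Buffer.alloc(0);
  }
  // Assumption: without the flag, the body is a binary (latin1) string.
  const encoding = event.isBase64Encoded ? 'base64' : 'binary';
  return Buffer.from(event.body, encoding);
}

// Made-up events carrying the same three bytes in both representations.
const bytes = Buffer.from([0xff, 0xd8, 0xff]);
const fromAws = bodyToBuffer({ body: bytes.toString('base64'), isBase64Encoded: true });
const fromLocal = bodyToBuffer({ body: bytes.toString('binary'), isBase64Encoded: false });

console.log(fromAws.equals(fromLocal)); // true
```

The resulting Buffer can then be handed to whatever multipart parser you use.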

Finally, if you are trying to use serverless-offline it may never work for your content, depending on the file encoding, as that file will get converted to a utf8 string, which is the only part my workaround fixes.

@hackash

hackash commented May 18, 2017

@lteacher thank you for the workaround and explanation. I tried to install your fork, branch fix-utf8-conversion, but it still does not work. I want to use multipart/form-data; any ideas? Thanks in advance.

@lteacher
Contributor Author

lteacher commented May 18, 2017

@hackash you need to be a bit more specific about how it doesn't work. I am still using the fork, as I don't have a better solution than looking at the content type to whitelist probable binary types. So the key point is that it does work for me.

Just wondering where it fails for you, because I have to do some parsing with busboy to make it work with my content. Nothing does that for you, and that part is unrelated to serverless-offline.

Edit: Here is a gist to show the parsing I need to do for this.

I have a package which allows before hooks, so I add this parseForm (or parseRequest, actually) to run before my file upload handler.

Maybe that info can help you out. Perhaps I should propose a PR here to get the discussion going on how to deal with the utf8 conversion.

@hackash

hackash commented May 19, 2017

@lteacher thank you for the answer and for the gist as well. To be more specific, here is the actual problem on my end.

I use aws-serverless-express and formidable just to get the parsed multipart form data (files and fields), and here is the code I use to save files on the server.

    fs.readFile(req.files['file'].path, 'binary', (err, data) => {
        if (err) {
            throw err;
        }
        // fs.writeFile's callback only receives an error argument
        fs.writeFile('some.[some ext]', data, 'binary', (err) => {
            if (err) {
                throw err;
            }
            res.end('OK');
        });
    });

After running this, I get the file on my local server, but it is corrupted, so I can't open it.
e.g. images are not shown in normal image preview applications
e.g. zip files cannot be opened by archivers: ERROR 21 is not a directory.

Even with your fork installed. Anyway, I'll try the gist as well. If you have suggestions for the code above, please leave a comment. Many thanks.

@lteacher
Contributor Author

lteacher commented May 19, 2017

@hackash thanks for the info. I took some time to look at your issue and noticed that the package you mentioned has some interesting stuff going on with the event mapping.

Assuming you are using that server thing: from a quick glance over the code, I saw this section here which appears to convert to utf-8.

The author(s) have provided an input in the third param to say which mime types are base64 encoded, apparently.

So I think you need to pass that mime type in there and... hmm, maybe base64 encode. Not sure about that part, I didn't look that closely.

@hackash

hackash commented May 19, 2017

@lteacher I tried the parser via busboy and it is working fine. What is good is that I can get rid of the express stuff in lambda; I think this is a better approach than having express stuff in lambda. I'll modify it a bit for my needs. Thank you, man.

@gitowiec

Hi
I think I have a similar issue. I would like to make my lambda function receive an uploaded file. I chose to POST raw data; for development purposes I use a curl command line like this:
curl --header "Content-Type:application/octet-stream" --trace-ascii debugdump.txt --data-binary @test-files/small-test.zip http://localhost:8000/chrome
You can see chrome as the resource because, yes, it is the serverless-chrome project (https://github.com/adieuadieu/serverless-chrome). I forked it and am changing it to my needs. I am using serverless-offline to develop the solution.

I am an absolute beginner regarding AWS, Lambda and Serverless. This is my first encounter with these technologies.

In the Node JS code (lambda) I take AWS Lambda event object and its event.body property into this function:

handler.js:

    import fs from 'fs';

    export default async function run(event, context, callback) {
        // Here event.body holds the content of small-test.zip.
        // The logged content is full of \u0000 escapes (see the screenshots).
        console.log(event);
        writeToDiskAndUnpackDocument(event.body);
    }

    function writeToDiskAndUnpackDocument(binaryFileContents) {
        // Here I get the binary content (see the screenshots).
        console.log(binaryFileContents);
        // This writes binaryFileContents to disk, but the resulting file
        // is not the same as the one passed to curl.
        fs.writeFile('/tmp/document.zip', binaryFileContents, 'binary', (err) => {
            if (err) throw err;
        });
    }

My problem is that when I send the file via curl, I get the content of the file in the event.body property in the handler function. It looks kind of wrong: there are a lot of \u0000 sequences, which I suspect come from converting binary data into a UTF-8 string... Then I would like to write it as a ZIP file.

The document.zip lands inside /tmp folder, when I issue cat document.zip it looks different from the input file test-files/small-test.zip. I don't know why it differs. I suspect encoding to UTF-8 in JSON. Here is the screenshot, at the top is original file, at the bottom is received file.

screenshot_20171012_020638
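
A small sketch of why the written file can differ, assuming the payload was decoded as utf8 before it reached the handler (the bytes below are a made-up stand-in, not the real small-test.zip). The \u0000 escapes are just NUL bytes and survive intact; the damage is to bytes that are not valid UTF-8:

```javascript
// Made-up stand-in: a zip-like header, two NUL bytes, and two non-UTF-8 bytes.
const sent = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x00, 0x00, 0xff, 0xfe]);

// What the handler would receive if the payload was decoded as utf8 on the way in.
const body = sent.toString('utf8');

// What fs.writeFile(path, body, 'binary') would put on disk:
// 'binary' (latin1) keeps only the low byte of each character.
const written = Buffer.from(body, 'binary');

console.log(JSON.stringify(body));    // NUL bytes show up as \u0000 escapes
console.log(sent.toString('hex'));    // 504b03040000fffe
console.log(written.toString('hex')); // 504b03040000fdfd
```

Each invalid byte became U+FFFD during decoding, and writing with 'binary' turns that into 0xfd, so the file on disk no longer matches the input.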

Will the solution of @lteacher help me with my use case?

screenshot_20171013_133347

@Tutch

Tutch commented Apr 11, 2018

I'm running into the same issue as @gitowiec. I've tried to apply @lteacher's parse function using Busboy, but the end results are still the same. I get the contents of the file as \u characters on the lambda event, and the resulting buffer turns into a corrupted image when saved to disk using fs.writeFile.

In my particular case, I'm uploading the files using ng2-file-upload, but it doesn't appear to be the problem here.

I lost a full day trying different methods to solve this issue, but @lteacher's fork does the job. I hope it gets merged into the main project. Thank you very much, you are a lifesaver.

@lteacher
Contributor Author

lteacher commented Apr 11, 2018

@Tutch sorry, I haven't done anything with serverless in a long time, so even the one place where we use this is still on my fork, and I had forgotten about this change. I guess it's just not so common that people are trying to do this multipart form parsing...

Since the code base really hasn't changed in that location, maybe we should just whitelist binary types using the content-type as in that fork, so... I'll see if I can open a PR to fix it. I wouldn't have the time to test it on AWS, though. I did test with actual data and the multipart form parser, actually.
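
For the sake of discussion, a content-type whitelist could be sketched roughly as follows. The list, the function names detectEncoding and payloadToBody, and the assumption that header names arrive lowercased are all hypothetical; this is not the code from the fork or from serverless-offline:

```javascript
// Hypothetical whitelist of content types to treat as binary.
const BINARY_CONTENT_TYPES = [
  'multipart/form-data',
  'application/octet-stream',
  'application/zip',
  'image/',
];

// Pick a Buffer encoding from the request's content-type header.
function detectEncoding(headers) {
  const contentType = String(headers['content-type'] || '').toLowerCase();
  const isBinary = BINARY_CONTENT_TYPES.some((type) => contentType.startsWith(type));
  return isBinary ? 'binary' : 'utf8';
}

// Convert the raw payload Buffer to the string handed to the handler.
function payloadToBody(payload, headers) {
  return payload.toString(detectEncoding(headers));
}

// Made-up payload: whitelisted types keep their bytes recoverable.
const payload = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff]);
const body = payloadToBody(payload, { 'content-type': 'multipart/form-data; boundary=x' });

console.log(Buffer.from(body, 'binary').equals(payload));            // true
console.log(detectEncoding({ 'content-type': 'application/json' })); // utf8
```

The trade-off is that any binary type missing from the list still gets converted to utf8, so the list has to be maintained.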

@Tutch

Tutch commented Apr 12, 2018

@lteacher, whitelisting does look like the best solution in this case. Thank you again! Hope it gets merged, since it would be nice to have the fix on the master branch.

@ashish-dirkmedia-de

Having the same issue. The file gets corrupted if I use lambda proxy in API Gateway. I am using Angular ng2-file-upload to upload the file. Please help: should I change something on the Angular side or in API Gateway, and what change should I make?

@lteacher
Contributor Author

@ashish-dirkmedia-de sorry but you would need to provide info about the api implementation.

Also, pretty much everything you would need to know has been covered in this issue, and in #464 if the implementation is using serverless-http, so maybe you should first check those against your implementation as well.
