Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper support UTF8 payloads #875

Closed
johann-sonntagbauer opened this issue Oct 12, 2015 · 12 comments · Fixed by gluckgames/pixi-packer#18
Closed

Proper support UTF8 payloads #875

johann-sonntagbauer opened this issue Oct 12, 2015 · 12 comments · Fixed by gluckgames/pixi-packer#18

Comments

@johann-sonntagbauer
Copy link
Contributor

Sinon version : 1.17.1 Environment : Windows 8.1 all Browsers (tested with IE9-11, Firefox latest, Chrome latest)

Bug description

With the following commit f809165
ArrayBuffer and Blob support are introduced. So every response body now gets encoded according the request.responseType setting.

One problem with the current implementation of that feature is proper UTF8 support. The converstion for now can only handle ASCII characters f809165#diff-4b4d9095007a078c5beda46091f1111aR293.

It would be great to proper encode the body with the help of TextEncoder. An implementation example can be found in the specs https://encoding.spec.whatwg.org/#utf-8-encode. For cross browser support <IE10 there exists even a polyfill https://github.com/inexorabletash/text-encoding

If you accept that proposal I could write a PR for that.
Thx for your effort so far.

@fatso83
Copy link
Contributor

fatso83 commented Oct 19, 2015

After looking into this a bit, I think this sounds quite sound, and a PR would not be such a big effort when we can make use of the existing text-encoding library. I was a bit wary of using a polyfill, but as we are now transitioning to a CommonJS build, the library would not be used as a polyfill, but rather as a simple standalone library:

const { TextEncoder, TextDecoder } = require('text-encoding')

With that background I think we should accept your proposal and welcome your efforts.

@johann-sonntagbauer
Copy link
Contributor Author

Ok, so I will write a PR for this. Thx

@fatso83
Copy link
Contributor

fatso83 commented Oct 19, 2015

Great! It would be nice if you included a small test case in your PR that fails in the current implementation.

@johann-sonntagbauer
Copy link
Contributor Author

+1

fatso83 added a commit that referenced this issue Oct 22, 2015
@henryptung
Copy link
Contributor

Whoa! So my implementation was meant to support "binary" encoding, I.e. arbitrary non utf8 binary sequences. Is that still supported, or is it broken by this change?

@henryptung
Copy link
Contributor

To be clear, my use case is protobufs - true binary data.

@fatso83
Copy link
Contributor

fatso83 commented Nov 1, 2015

@henryptung : not sure which "implementation" you are talking about here? have you formerly contributed to an earlier version of this, or is this implementation some client code? I don't see what the new code should break; using an ASCII encoder should still work, no?

The easiest way of finding out is just testing whatever use case you have. Maybe you can enlighten us instead? :-)

@henryptung
Copy link
Contributor

Yes, I wrote the original commit for arraybuffer and blob support. The question is whether ASCII encoding supports extended bytes like 0x80, 0xFF, etc.

@fatso83
Copy link
Contributor

fatso83 commented Nov 1, 2015

Aha, hopefully @johann-sonntagbauer knows.

@henryptung : maybe you have some code snippet that you know works with the old version that also should work with the new? Then we could have a regression test. Maybe you included such a thing in your original work, in which case, it works :-)

@henryptung
Copy link
Contributor

Yeah; sorry, I just thought that "binary string" in the code was explicit enough that it was not meant to be UTF-8, but a different format; in this case, the string returned by https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/btoa. I hadn't accounted for redefinition of the input type in the tests, and mostly focused on testing proper preservation/delivery of data.

At root, the redefinition of behavior on strings with code points >= 0x80 is an API break from my code's point of view. The single-byte encodings supported by the standard are defined at https://encoding.spec.whatwg.org/#legacy-single-byte-encodings, and none of them seem to correspond to the "binary string" format used by WindowBase64. Given the situation, and to avoid breaking API on @johann-sonntagbauer, I'd say it's better to go with one of two options:

  1. Define a "base64" encoding for binary data, and special-case that type when calling into TextEncoder/Decoder. Since this type isn't defined among the encodings, it shouldn't be considered an API break.
  2. Add support for ArrayBuffer(View) directly in the API. This is IMO the superior API from my point of view, since it avoids the confusion described in https://codereview.chromium.org/143943017/ (they rejected solution (1) above precisely due to the confusion of mixing characters-as-binary with binary-as-characters), but involves much more implementation lift in the code. Slight downside: ArrayBuffer(View)s, unlike strings, are not immutable, so we'll need to either make defensive copies, or warn users against mutating the buffer after passing it in.

I currently don't have the cycles to spare on this, but possibly in a few weeks. @johann-sonntagbauer, any thoughts on the above?

@johann-sonntagbauer
Copy link
Contributor Author

@henryptung you are totaly right. I made the changes with the assumption, that sinon.respond(With) can only handle strings. Therefor they have to be encoded to proper work. Sorry for the confusion, it was definitly a lack of knowledge on my side.

According your suggestions I would also prefer to add ArrayBuffer in the sinon API in sake of clarity.

So to ensure I fully understand all the problems :)

  • sinon respond API should support two flavours:
    • Strings which should get encoded/decoded with TextEncoder mechanism
    • Blob / ArrayBuffer which should be stuffed directly into the response body
  • the same API enhancement should be reflected in the test assertion helper functions
  • update the documentation to make things clear to users

Please let me know if i got things right and then I will try to make a PR as soon as possible (within the next day)

@henryptung
Copy link
Contributor

@johann-sonntagbauer Sorry, didn't mean to come off so brusque - the usage of binary strings was also a concession on my part due to laziness, and I figured there would be some confusion between binary and text strings in the API. Your interpretation of the string API is more consistent with the rest of the API, and the right way to include binary support is allowing ArrayBufferViews as a mock response body type.

If I may offer a suggestion, supporting two different operation modes in FakeXMLHttpRequest (one for text, one for binary) seems non-ideal. By leveraging the encoding field you introduced, the codepaths (in particular chunking logic) can probably be unified into an ArrayBuffer-based pipeline, since TextDecoder/Encoder means all mock responses can be serialized to ArrayBuffer, and deserialized back to text (if desired), just like a real response. In particular, every mock response, text or binary, can be interpreted using every responseType (format permitting).

Again, though, there's no real rush here - while this might frustrate attempts to upgrade down the line, that's definitely not happening anytime soon. I'll be happy to chip in too, once I actually get some free time again >.<

tzrh added a commit to tzrh/nise that referenced this issue Sep 27, 2017
Fix sinonjs/sinon#1570

Binary strings will be encoded using utf-8, instead of being returned
unchanged in the response body. This commit allows binary response
data to be passed as an array buffer directly to nise, while keeping the
existing behavior of encoding normal strings in utf-8 as discussed in
sinonjs/sinon#875 (comment).
fatso83 pushed a commit to sinonjs/nise that referenced this issue Oct 23, 2017
Fixes sinonjs/sinon#1570
Supports raw binary responses

Binary strings will be encoded using utf-8, instead of being returned
unchanged in the response body. This commit allows binary response
data to be passed as an array buffer directly to nise, while keeping the
existing behavior of encoding normal strings in utf-8 as discussed in
sinonjs/sinon#875 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants