Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MIME charset based on Emacs coding system #18

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Add MIME charset based on Emacs coding system #18

wants to merge 2 commits into from

Conversation

kqr
Copy link

@kqr kqr commented Feb 17, 2018

Companion to, but obviously independent of, PR #6 on impatient-mode.

@skeeto
Copy link
Owner

skeeto commented Feb 17, 2018 via email

@kqr
Copy link
Author

kqr commented Feb 17, 2018

  1. fst was just a brainfart that somehow made it into the code after I had tested it with car, which is the function I meant. I have pushed a fix, but one will probably want to squash them while merging.

  2. No, Emacs does not ignore original encoding when reading into a multibyte buffer. The original encoding (or at least a best guess) is saved to buffer-file-coding-system.

  3. You can try for yourself and see what

    (seq-every-p (lambda (x) (eq x 'utf-8))
        (mapcar #'coding-system-type
             '(utf-8 utf-8-auto utf-8-auto-dos utf-8-auto-mac utf-8-auto-unix utf-8-dos utf-8-emacs utf-8-emacs-dos utf-8-emacs-mac utf-8-emacs-unix utf-8-hfs utf-8-hfs-dos utf-8-hfs-mac utf-8-hfs-unix utf-8-mac utf-8-unix utf-8-with-signature utf-8-with-signature-dos utf-8-with-signature-mac utf-8-with-signature-unix)))
    

    returns. It's true! Many of the non-utf-8 cases are handled by the charset case.

@kqr
Copy link
Author

kqr commented Feb 17, 2018

Re. your concern about binary files I'm not sure... One could retrieve charset only for text/ mime types. Or guess based on presence of null bytes.

@skeeto
Copy link
Owner

skeeto commented Feb 17, 2018 via email

@kqr
Copy link
Author

kqr commented Feb 18, 2018

However, for this server the buffer is sent out with process-send-region, which uses the connection process' coding system, not the buffer's file coding system. Since the connection process is set to "binary," Emacs will send out the buffer's raw UTF-8 content regardless of the original file's encoding.

Aaah, that makes sense. Sorry, I misunderstood completely. Thank you for teaching me a little more about Emacs encoding handling.

If I understand the rest correctly (i.e. regardless of this patch or not, Emacs will push UTF-8 into the pipes), would it make sense to rewrite this patch to instead just set charset=utf-8 on everything? It's not optimal (since recoding is not a reversible process in general), but it should be strictly better than the current situation where UTF-8 is used with no indication of it, right?

@skeeto
Copy link
Owner

skeeto commented Feb 18, 2018 via email

skeeto added a commit that referenced this pull request Feb 18, 2018
If the client uses HTTP/1.0 or "Connection: close" then the server
*must* close the connection after the response has been sent.
@kqr
Copy link
Author

kqr commented Feb 19, 2018

Ah, yes, I forgot the buffer was not multibyte pre-patch. Sorry. Now that part makes sense as well. I agree fully that the server is out of line if it attempts to recode the content before passing it along to the user (beside the philosophical issue, it would wreck any checksums or HMACs associated with the content.)

You haven't said why you want this change. I see you paired this with a PR over in impatient-mode. Are you having trouble with a particular document?

Yes, although I'm starting to suspect something else is wrong, because my problem contradicts your other observations. In particular,

JavaScript included from an HTML document adopts that document's encoding

appeared not to be the case for me. Firefox assumed (as you said) that a JavaScript file included in my HTML was ASCII encoded rather than UTF-8, and then when it encountered GREEK SMALL LETTER MU in an identifier it errored out. The HTML file had UTF-8 specified as a meta tag and it appeared to understand UTF-8 sequences.

That said, I am 100% content with the solution you suggested of pushing a new mime type onto the alist. I wish I had thought of that to begin with, and not tried to be clever. If it's okay with you, I can close this pull request as well as its companion.

@skeeto
Copy link
Owner

skeeto commented Feb 19, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants