There is no underscore in the character class in the regular expression capture for charset detection in URL previews #10307
Please could you explain what the user-visible symptoms of this issue are?
babolivier, thank you for correcting my grammar.
I'm assuming this is to match additional charsets, e.g. both Shift-JIS and Shift_JIS? This should probably be fine. Do you get mojibake without this change?
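To make the symptom concrete, here is a minimal sketch using a simplified meta-tag pattern. The pattern is an assumption modeled on the character class discussed in this issue, not the exact Synapse regex. With `[a-z0-9-]+` the capture stops at the underscore, so `Shift_JIS` is truncated to `Shift`; adding `_` to the class captures the whole label.

```python
import re
from typing import Optional

# Hypothetical, simplified <meta charset=...> patterns; only the character class differs.
OLD_CHARSET = re.compile(rb'<\s*meta[^>]*charset\s*=\s*"?([a-z0-9-]+)"?', flags=re.I)
NEW_CHARSET = re.compile(rb'<\s*meta[^>]*charset\s*=\s*"?([a-z0-9_-]+)"?', flags=re.I)


def sniff_charset(pattern: "re.Pattern[bytes]", body: bytes) -> Optional[str]:
    """Return the charset label captured from a <meta> tag, or None if absent."""
    match = pattern.search(body)
    return match.group(1).decode("ascii") if match else None


html = b'<html><head><meta charset="Shift_JIS"></head></html>'
print(sniff_charset(OLD_CHARSET, html))  # Shift
print(sniff_charset(NEW_CHARSET, html))  # Shift_JIS
```

Labels without underscores (for example `EUC-JP` or `UTF-8`) are captured identically by both patterns, which is why the bug only shows up for labels such as `Shift_JIS`.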
Sorry for the late reply.

Steps to reproduce
I used the following article to find sample URLs for verification: https://w3techs.com/technologies/overview/character_encoding

After changing the regex character class, restarting the server, and repeating the same steps, the result looked like the screenshot below.

I checked the response headers with Firefox's web developer tools. For the first two URLs pasted in the verification image, the Content-Type line of the HTTP response header did not include a charset definition. For the third, www.jalan.net, the preview is still not rendered properly, even though the HTTP response header contains "charset = Windows-31J".

I haven't traced how the retrieved charset value is processed after it is stored in the variable, but these charset names are clearly defined in the mapping in webencodings.labels.

My understanding is that in Japan, web servers historically tended not to declare a specific charset in their response headers, in order to avoid problems. At least in the era when individuals running their own blogs (including the servers) became popular, this configuration appeared in many server-setup articles written by individuals. My personal server uses the same settings.

I wrote this with the help of machine translation; I apologize if anything sounds rude.
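The downstream effect can be sketched with Python's standard `codecs` module, used here as a stand-in for the `webencodings.labels` lookup mentioned above. This is an assumption: the real Synapse code path was not traced in this thread, and the UTF-8 fallback below is a hypothetical simplification. The truncated label `Shift` is not a known encoding, so the fallback is used and the Japanese text comes out garbled, while the full label `Shift_JIS` decodes correctly.

```python
import codecs
from typing import Optional


def decode_body(body: bytes, label: Optional[str], fallback: str = "utf-8") -> str:
    """Decode body with the codec named by label, or with fallback if unknown.

    The fallback behaviour is a hypothetical simplification for illustration.
    """
    encoding = fallback
    if label:
        try:
            encoding = codecs.lookup(label).name
        except LookupError:
            pass  # unknown label, e.g. the truncated "Shift"
    return body.decode(encoding, errors="replace")


title = "日本語のタイトル"
body = title.encode("shift_jis")

print(decode_body(body, "Shift_JIS"))  # 日本語のタイトル
print(decode_body(body, "Shift"))      # mojibake containing U+FFFD replacement characters
```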
@srividyut Thanks for including the screenshots! That makes it clear what's happening. I think the original PR you had put up (#10306) was correct. You'll just need to:
https://github.com/matrix-org/synapse/blob/master/CONTRIBUTING.md#9-submit-your-patch has a bit more info about this.
I made a mistake not only in the grammar but also in the code fix itself.
Signed-off-by: sri-vidyut <[email protected]>
There is no underscore in the character class in the regular expression capture for charset detection in URL previews:
synapse/synapse/rest/media/v1/preview_url_resource.py, line 61 (at commit 4b965c8)
synapse/synapse/rest/media/v1/preview_url_resource.py, line 63 (at commit 4b965c8)
([a-z0-9-]+) -> ([a-z0-9-_]+)
(21/07/16 02:00) Corrected to:
([a-z0-9-]+) -> ([a-z0-9_-]+)
(21/07/16 02:00) Inside a character class, a hyphen must be escaped unless it is the first or last character, so I moved it to the end. The "Source Editor Screenshot" was also incorrect, so I deleted it.
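The placement rule above can be checked quickly with Python's `re` module (other regex engines may treat an unescaped interior hyphen differently). With the hyphen as the last character of the class, or escaped, it is a literal, so both corrected forms below accept the same charset labels, while the original class rejects the underscore. The label list is illustrative only.

```python
import re

# Three variants of the capture's character class.
ORIGINAL = re.compile(r"[a-z0-9-]+", flags=re.I)       # no underscore
HYPHEN_LAST = re.compile(r"[a-z0-9_-]+", flags=re.I)   # hyphen last, so literal
ESCAPED = re.compile(r"[a-z0-9_\-]+", flags=re.I)      # hyphen escaped, so literal

labels = ["utf-8", "Shift_JIS", "EUC-JP", "ISO-2022-JP", "Windows-31J"]

for label in labels:
    # fullmatch: does the whole label consist only of characters in the class?
    results = [bool(p.fullmatch(label)) for p in (ORIGINAL, HYPHEN_LAST, ESCAPED)]
    print(f"{label:12} original={results[0]} hyphen_last={results[1]} escaped={results[2]}")
```

Only `Shift_JIS` differs between the original class and the corrected ones; putting the hyphen last avoids an escape while keeping the class unambiguous.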
Ignore this, as it seems to be excluded from the tests.