(XML) Unicode letters should be recognized in element and attribute names #3256
Comments
I am not sure the whole range of Unicode letters or the same range as used in the XML spec is expressible in the four digit
I think.
Could we just use
If so, perhaps let's reopen #3257 with that approach instead...
I will hopefully be able to look into this next week.
Ping. Still interested in pursuing this?
I will give it a try during this week (i.e. until Sunday, May 1st).
Describe the issue
XML allows Unicode letters as element and attribute names while your xml.js language mode uses a regular expression just checking for ASCII letters A-Z.
As a result, anyone trying to highlight XML with non-ASCII letters in element or attribute names doesn't get highlighting. For example, in
<categoría>producto</categoría>
the Spanish word categoría, which is a well-formed XML element name, is not recognized as such by the regular expression
const TAG_NAME_RE = regex.concat(/[A-Z_]/, regex.optional(/[A-Z0-9_.-]*:/), /[A-Z0-9_.-]*/);
in https://github.com/highlightjs/highlight.js/blob/main/src/languages/xml.js#L12
Which language seems to have the issue?
XML from https://github.com/highlightjs/highlight.js/blob/main/src/languages/xml.js
Are you using highlight or highlightAuto?
highlight
Sample Code to Reproduce
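A minimal sketch of the mismatch, using only the pattern quoted above rather than highlight.js itself. The plain-regex form is a hand-written equivalent of the `regex.concat(...)` call, the `i` flag assumes the grammar is matched case-insensitively, and `matchedPrefix` is a hypothetical helper introduced for illustration:

```javascript
// Hand-written equivalent of the TAG_NAME_RE quoted in the issue.
// Assumption: the XML grammar is applied case-insensitively, hence the `i` flag.
const ASCII_TAG_NAME = /[A-Z_](?:[A-Z0-9_.-]*:)?[A-Z0-9_.-]*/i;

// Hypothetical helper: returns the part of `name` that the pattern accepts
// when matching starts at the first character, or "" if nothing matches there.
function matchedPrefix(name) {
  const m = ASCII_TAG_NAME.exec(name);
  return m !== null && m.index === 0 ? m[0] : "";
}

console.log(matchedPrefix("category"));        // the whole ASCII name is accepted
console.log(matchedPrefix("categor\u00eda"));  // matching stops before the "í"
```

Because the match ends at the first non-ASCII letter, the full name is never recognized as a tag name.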
Expected behavior
The output for the XML markup
<categoría>test</categoría>
currently is
&lt;categoría&gt;test&lt;/categoría&gt;
while it should be
<span class="hljs-tag">&lt;<span class="hljs-name">categoría</span>&gt;</span>test<span class="hljs-tag">&lt;/<span class="hljs-name">categoría</span>&gt;</span>
Additional context
https://www.w3.org/TR/xml/#NT-NameStartChar and https://www.w3.org/TR/xml/#NT-NameChar definitions from XML spec. I think it should be possible to fix the regular expressions used in xml.js, either by using ranges of the characters given in the XML spec or, if the Unicode support in JavaScript regular expressions is used, by using e.g.
\p{Letter}
instead of A-Z.
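The two approaches mentioned above could be sketched roughly as follows. This is an illustration only, not the code in xml.js: the constant names are hypothetical, the patterns are anchored with `^` and `$` purely so that whole names can be tested, and whether highlight.js can compile grammar patterns with the `u` flag is an open question here. Option B covers only the Basic Multilingual Plane ranges of NameStartChar and NameChar from the XML spec, since `\uXXXX` escapes without the `u` flag cannot express `#x10000-#xEFFFF`.

```javascript
// Option A: Unicode property escapes. Requires the `u` flag (an assumption
// about what the surrounding tooling accepts). Note that \p{L} covers letters
// only, so combining marks allowed by NameChar are not matched here.
const NAME_UNICODE = /^[\p{L}_](?:[\p{L}0-9_.-]*:)?[\p{L}0-9_.-]*$/u;

// Option B: explicit ranges taken from the XML spec, BMP part only,
// written with four-digit \uXXXX escapes so no `u` flag is needed.
const START =
  "A-Za-z_\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D" +
  "\\u037F-\\u1FFF\\u200C\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF" +
  "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD";
// NameChar additions: digits, ".", middle dot, combining marks, U+203F-U+2040,
// and "-" (kept last so it is read as a literal hyphen inside the class).
const REST = START + "0-9.\\u00B7\\u0300-\\u036F\\u203F\\u2040-";
const NAME_BMP = new RegExp(`^[${START}](?:[${REST}]*:)?[${REST}]*$`);

for (const name of ["categor\u00eda", "xsl:template", "1abc"]) {
  console.log(name, NAME_UNICODE.test(name), NAME_BMP.test(name));
}
```

Option A is shorter but slightly broader and narrower than the spec in different places; Option B follows the spec's ranges more closely at the cost of a longer character class.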