Underscore is sometimes an unsafe character #347

Open
stefansundin opened this issue Feb 10, 2022 · 5 comments

@stefansundin
Contributor

Hello there,

Just wanted to open this issue to discuss the underscore character. It is unsafe when it is the very last character of a URL on GitHub: the trailing underscore is left out of the auto-generated link. (I also tested it on Twitter, which handles it correctly.)

Here's an example: https://example.com/abcdefgh_

I noticed this problem because I use https://www.npmjs.com/package/react-markdown in a project and it rendered a link in this broken way which did ultimately cause problems for a user. If it is happening to me then I'm sure other people are experiencing it too (perhaps unwittingly since it is kinda rare, 1/64 chance).

If a non-_ character is added at the end then the link is no longer broken. https://example.com/abcdefgh_/

Anyway, I wanted to start a discussion here to raise the problem. Perhaps the react-markdown package can be improved to handle this situation? Are there other libraries and packages that may misbehave in a similar way?

I am going to use a custom alphabet in my project, although I find the API for using a custom alphabet a bit cumbersome. I will open a PR soon with a suggested API change (but I need help to finish it).
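For illustration, here is a minimal sketch of what generating IDs from an alphabet without `_` could look like. It is not nanoid's implementation (nanoid provides `customAlphabet(alphabet, size)` for this purpose); the names `ALPHABET_WITHOUT_UNDERSCORE` and `generateId` are hypothetical, and the 63-character alphabet is only one possible choice.

```javascript
// Web Crypto is global in browsers and recent Node.js; older Node.js
// versions expose the same API as require("node:crypto").webcrypto.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Hypothetical 63-character alphabet: A-Z, a-z, 0-9 and "-", without "_".
const ALPHABET_WITHOUT_UNDERSCORE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-";

// Hypothetical helper: build an ID of `size` characters from `alphabet`.
function generateId(alphabet, size) {
  if (alphabet.length < 2 || alphabet.length > 256) {
    throw new RangeError("alphabet must contain between 2 and 256 characters");
  }
  // Smallest mask of the form 2^k - 1 that covers every alphabet index.
  let mask = 1;
  while (mask < alphabet.length - 1) mask = (mask << 1) | 1;

  let id = "";
  const bytes = new Uint8Array(size * 2);
  while (id.length < size) {
    webCrypto.getRandomValues(bytes);
    for (const byte of bytes) {
      const index = byte & mask;
      // Rejection sampling: skip indexes past the end of the alphabet
      // so that every character stays equally likely.
      if (index < alphabet.length) {
        id += alphabet[index];
        if (id.length === size) break;
      }
    }
  }
  return id;
}

console.log(generateId(ALPHABET_WITHOUT_UNDERSCORE, 21));
```

Dropping a character leaves 63 symbols, which is why the sketch rejects out-of-range indexes instead of using every masked byte directly.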

Thanks!

@ai
Owner

ai commented Feb 10, 2022

Interesting case.

What symbol can we use in return to keep the same size of default alphabet?

@ehoogeveen-medweb

I don't think there are any perfect options - the only characters that are unreserved in all contexts for URIs (aside from letters and numbers) are -, ., _ and ~ - and of those, only - stays as part of the link on GitHub (and - is already part of the standard alphabet).

I think the closest would be $ - it's reserved as a sub-delimiter and correspondingly encodeURIComponent will encode it, but it doesn't have any specified special meaning and it's valid for use in path segments (see e.g. section 3.3 of RFC 3986).

Example: https://example.com/abcdefgh$
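The encoding behaviour described above can be checked directly with the standard `encodeURIComponent` function. This is only a small sketch; the variable names are illustrative.

```javascript
// Candidate characters from the discussion above.
const candidates = ["-", ".", "_", "~", "$"];

// Map each character to its encodeURIComponent output.
const encoded = Object.fromEntries(
  candidates.map((c) => [c, encodeURIComponent(c)])
);

// Characters that encodeURIComponent leaves untouched.
const untouched = candidates.filter((c) => encoded[c] === c);

console.log(encoded);   // "$" is percent-encoded, the others are not
console.log(untouched); // the four unreserved characters
```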

@linhub15

linhub15 commented Apr 24, 2022

We can wrap the link with angle brackets < > to render the URL correctly in GitHub and probably most other Markdown libraries.

Since Markdown uses underscores to denote italic text, the parser cannot accurately decide if https://a_ is a link. Markdown parsers are pretty good at guessing that https://a is a link most of the time, so we're conditioned to writing links without the angle brackets.
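As a rough sketch of the angle-bracket approach, a small helper could wrap generated URLs before they are placed in Markdown text. The function name `toAutolink` is hypothetical, and the check only covers the characters that CommonMark disallows inside an autolink (whitespace and angle brackets).

```javascript
// Hypothetical helper: wrap a URL in angle brackets so that CommonMark-style
// parsers treat the whole string, trailing underscore included, as a link.
function toAutolink(url) {
  // Autolinks cannot contain whitespace or angle brackets, so reject those.
  if (/[\s<>]/.test(url)) {
    throw new Error("URL contains characters that are not allowed in an autolink");
  }
  return `<${url}>`;
}

console.log(toAutolink("https://example.com/abcdefgh_"));
```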

Underscores are definitely URL safe.
They are included in the list of special characters that can be used without encoding (RFC 1738, Section 2.2).

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

Markdown Specs

Markdown Guide
https://www.markdownguide.org/basic-syntax/#urls-and-email-addresses

CommonMark Spec defines these as "autolinks"
https://spec.commonmark.org/0.30/#autolinks

@CarlosNZ

CarlosNZ commented Dec 2, 2022

I just came here to report this exact same thing, so glad to see others have the same problem.

The solution I used is just to check if the last character in the id is _ and replace it with a random alphanumeric value if so.
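A minimal sketch of that kind of check, assuming the ID is already a string. The names `ALPHANUMERIC`, `randomAlphanumeric` and `replaceTrailingUnderscore` are hypothetical and this is not part of nanoid.

```javascript
// Web Crypto is global in browsers and recent Node.js; older Node.js
// versions expose the same API as require("node:crypto").webcrypto.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// The 62 alphanumeric characters used as replacements.
const ALPHANUMERIC =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Pick one alphanumeric character uniformly at random.
function randomAlphanumeric() {
  const byte = new Uint8Array(1);
  for (;;) {
    webCrypto.getRandomValues(byte);
    const index = byte[0] & 63; // 0..63
    // Reject 62 and 63 so all 62 characters stay equally likely.
    if (index < ALPHANUMERIC.length) return ALPHANUMERIC[index];
  }
}

// Replace a trailing "_" with a random alphanumeric character;
// IDs that do not end in "_" are returned unchanged.
function replaceTrailingUnderscore(id) {
  return id.endsWith("_") ? id.slice(0, -1) + randomAlphanumeric() : id;
}

console.log(replaceTrailingUnderscore("abcdefgh_"));
```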

I realise that nanoid probably shouldn't be making modifications to fix other apps' bugs/quirks. However, it would be super handy if nanoid had this option built in as an optional "url-parser safety" parameter.

Obviously it's only a tiny reduction in security (64^21 down to 64^20 * 63 possibilities), so can't see a problem there.
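The size of that reduction can be worked out with BigInt arithmetic. This is just a sketch of the calculation for 21-character IDs over 64 symbols; the variable names are illustrative.

```javascript
// Every 21-character ID over a 64-symbol alphabet.
const total = 64n ** 21n;

// Same, but the last character is limited to 63 symbols (no "_").
const reduced = 64n ** 20n * 63n;

// The ratio reduced / total is exactly 63/64.
const ratioIsExact = total * 63n === reduced * 64n;

// Entropy given up, in bits: log2(64 / 63), roughly 0.02 bits out of 126.
const bitsLost = Math.log2(64 / 63);

console.log(ratioIsExact, bitsLost);
```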

@ai
Owner

ai commented Jan 1, 2025

I think the best way to solve it is to add a trailing / to the URL in your webpages:

https://example.com/abcdefgh_/
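One way to apply this when building links is sketched below with the standard `URL` class. The helper name `withTrailingSlash` is hypothetical, and the sketch assumes your server treats the path with and without the trailing slash as the same page.

```javascript
// Hypothetical helper: make sure the path of a URL ends with "/".
function withTrailingSlash(input) {
  const url = new URL(input);
  if (!url.pathname.endsWith("/")) {
    url.pathname += "/";
  }
  return url.href;
}

console.log(withTrailingSlash("https://example.com/abcdefgh_"));
```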

5 participants