Skip to content

Commit

Permalink
remove all non-word characters from heading id's. see #181.
Browse files Browse the repository at this point in the history
  • Loading branch information
chjj committed Aug 4, 2013
1 parent 6dca44b commit 75dff71
Show file tree
Hide file tree
Showing 4 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion lib/marked.js
Original file line number Diff line number Diff line change
Expand Up @@ -838,7 +838,7 @@ Parser.prototype.tok = function() {
return '<h'
+ this.token.depth
+ ' id="'
+ this.token.text.toLowerCase().replace(/\s/g, '-')
+ this.token.text.toLowerCase().replace(/[^\w]+/g, '-')

This comment has been minimized.

Copy link
@jasonkarns

jasonkarns Aug 5, 2013

Contributor

Why the change? Any non-whitespace character is a valid ID in the HTML5 spec.

If the goal is for HTML4 compliance (which isn't really necessary because the HTML5 behavior is already browser compatible), then we need to also be ensuring the first character is not numeric.

This comment has been minimized.

Copy link
@chjj

chjj Aug 5, 2013

Author Member

The goal is to not be messy.

Anyway, we can change it, but if we're going to do all non-whitespace characters, I'm going to add a + quantifier and escape the text for html. The original implementation was slightly flawed in that regard. A single double quote would break the markup.

This comment has been minimized.

Copy link
@jasonkarns

jasonkarns via email Aug 5, 2013

Contributor

This comment has been minimized.

Copy link
@chjj

chjj Aug 6, 2013

Author Member

I think using \w is pretty simple. It's just [a-z0-9_]. It's not even that hard to guess what a "word" character might be if you are unfamiliar with regexes. We can document it in the readme.

One thing I'm also thinking about is the possibility of collisions. Say someone had a simple one-word heading. It's likely it may collide with some other id which is not part of the content. I'm thinking of a few different possibilities to solve this and some other things:

  • Make header id's optional.
  • Add a headerPrefix option, similar to langPrefix.
  • Possibly increment a counter on the parser object (should only be a one-line addition) to prefix the id's with (e.g. id="1-hello-world"). I suppose this would add complexity for the author to think about, but would it be that hard to count how many headers there are?
  • Possibly switch back to using [^\s], but we'll have to escape it.

This comment has been minimized.

Copy link
@walterdavis

walterdavis via email Aug 6, 2013

This comment has been minimized.

Copy link
@jasonkarns

jasonkarns Aug 7, 2013

Contributor

I think using \w is pretty simple. It's just [a-z0-9_]. It's not even that hard to guess what a "word" character might be if you are unfamiliar with regexes. We can document it in the readme.

Agreed. But if we are replacing non-word characters, then the character that we replace them with should actually be a word character. So we should use replace(/[^\w]+/g, '_') instead of replace(/[^\w]+/g, '-'); that is, underscore instead of hyphen. Because there would be some strange cognitive dissonance if the character that replaces non-word characters is itself not a word character.

This comment has been minimized.

Copy link
@chjj

chjj Aug 8, 2013

Author Member

@jasonkarns, good point. Taking that into account, I think I'd rather do [^\w\-]+.

@walterdavis, as @jasonkarns pointed out, any non-whitespace character is conforming for @id in html5. html5 is what marked cares about, and as far as browsers go, they don't validate markup, so it doesn't really matter. I'm guessing every browser made in the html4 era can successfully render a page containing an id starting with a number.

This comment has been minimized.

Copy link
@justan

justan Aug 12, 2013

How about encodeURIComponent ?
Just like github did.

+ '">'
+ this.inline.output(this.token.text)
+ '</h'
Expand Down
2 changes: 1 addition & 1 deletion test/index.js
Original file line number Diff line number Diff line change
Expand Up @@ -349,7 +349,7 @@ function fix(options) {
.replace(/&lt;/g, '<')
.replace(/&amp;/g, '&');

id = id.toLowerCase().replace(/\s/g, '-');
id = id.toLowerCase().replace(/[^\w]+/g, '-');

return '<' + h + ' id="' + id + '">' + text + '</' + h + '>';
});
Expand Down
6 changes: 3 additions & 3 deletions test/tests/markdown_documentation_basics.html
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
<h1 id="markdown:-basics">Markdown: Basics</h1>
<h1 id="markdown-basics">Markdown: Basics</h1>

<ul id="ProjectSubmenu">
<li><a href="/projects/markdown/" title="Markdown Project Page">Main</a></li>
Expand All @@ -8,7 +8,7 @@ <h1 id="markdown:-basics">Markdown: Basics</h1>
<li><a href="/projects/markdown/dingus" title="Online Markdown Web Form">Dingus</a></li>
</ul>

<h2 id="getting-the-gist-of-markdown's-formatting-syntax">Getting the Gist of Markdown&#39;s Formatting Syntax</h2>
<h2 id="getting-the-gist-of-markdown-s-formatting-syntax">Getting the Gist of Markdown&#39;s Formatting Syntax</h2>

<p>This page offers a brief overview of what it&#39;s like to use Markdown.
The <a href="/projects/markdown/syntax" title="Markdown Syntax">syntax page</a> provides complete, detailed documentation for
Expand All @@ -24,7 +24,7 @@ <h2 id="getting-the-gist-of-markdown's-formatting-syntax">Getting the Gist of Ma
<p><strong>Note:</strong> This document is itself written using Markdown; you
can <a href="/projects/markdown/basics.text">see the source for it by adding &#39;.text&#39; to the URL</a>.</p>

<h2 id="paragraphs,-headers,-blockquotes">Paragraphs, Headers, Blockquotes</h2>
<h2 id="paragraphs-headers-blockquotes">Paragraphs, Headers, Blockquotes</h2>

<p>A paragraph is simply one or more consecutive lines of text, separated
by one or more blank lines. (A blank line is any line that looks like a
Expand Down
2 changes: 1 addition & 1 deletion test/tests/markdown_documentation_syntax.html
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
<h1 id="markdown:-syntax">Markdown: Syntax</h1>
<h1 id="markdown-syntax">Markdown: Syntax</h1>

<ul id="ProjectSubmenu">
<li><a href="/projects/markdown/" title="Markdown Project Page">Main</a></li>
Expand Down

0 comments on commit 75dff71

Please sign in to comment.