Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swap out simple-xss for sanitize-html #190

Merged
merged 7 commits into from
Jun 21, 2014

Conversation

Martii
Copy link
Member

@Martii Martii commented Jun 20, 2014

PR is NOT COMPLETE... better than what we have and also may have future dependencies
Advantages:

  • Works unlike simple-xss
  • Lots of maintainers/contributors
  • Smaller than simple-xss
  • Correctly allows for sanitizing text first then applying markdown
  • Optional list is included for current defaults... this can be removed but I would feel safer matching it in case something comes down the line we don't like.
  • Actually sanitizes and doesn't just escape to HTML Entities
  • Does filter out javascript: items just like simple-xss

npm homepage
gh homepage

Related:

Tested in dev on this page


Additional bug report(s):


MISSING:

  • Rest of GH supported attributes on allowed tags... TRACKING UPSTREAM
  • Showing raw HTML support in md code blocks
  • Resolves html in markup code blocks

Refs:

Advantages:
* **Works** unlike simple-xss
* Lots of maintainers/contributors
* Smaller than simple-xss
* Correctly allows sanitizing text **first** then applying markdown
* Optional list is included for current defaults... this **can be removed** but I would feel safer matching it in case something comes down the line we don't like.
* Does filter out `javascript:` items just like xss

[npm homepage](https://www.npmjs.org/package/sanitize-html)
[gh homepage](https://github.com/punkave/sanitize-html)

Related:
* OpenUserJS#168
* OpenUserJS#125

Tested in dev on [this page](http://localhost:8080/scripts/marti/httplocalhost.localdomain/RFC_2606%C2%A73_-_license_and_licence_Unit_Test)
@Martii Martii added bug and removed expedite labels Jun 20, 2014
@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

Everyone please let me know if you want any default tags/attributes removed or added... I personally have never heard of the nl tag but it's in their default list.

The goal here is to encourage markdown but still allow some markup since md does have some disadvantages.


Does have one issue with highlighting but so does simple-xss... working on this.

// *********************
// This code is fenced via gfm and highlighted?
// *********************

var hello = "world";
<html>
<head></head>
<body>
</body>
</html>

@Martii Martii self-assigned this Jun 20, 2014
* Swap to markdown first then sanitize-html to allow html code blocks
* Allow img tag ... this deviates from their defaults so we need to define our own.
@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

hmmm don't think that last commit is going to generate the DOM correctly if and when highlighting gets fixed.

@sizzlemctwizzle
Copy link
Member

please let me know if you want any default tags/attributes removed or added

For compatibility, use the settings GitHub uses.

@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

@sizzlemctwizzle
Thanks... I think we need a markdown that has a callback sanitizing instead of just a boolean... and somethings wrong with highlighting even on pro now... also if I backout simple-xss altogether it still doesn't highlight.

@sizzlemctwizzle
Copy link
Member

I think we need a markdown

We don't have anything else (but I could always fork it again). What was the problem with running sanitize before the markdown renderer?

@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

We don't have anything else. What was the problem with running sanitize before the markdown renderer?

HTML as denoted in the commit message. This is why I never use innerHTML unless I have to in code. Somehow GH is doing it but they probably have it all integrated into one object.

…ilities even though it doesn't currently work
@sizzlemctwizzle
Copy link
Member

HTML as denoted in the commit message.

What happens to the HTML in the code blocks? My working tree is dirty so I can't test right now.

@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

What happens to the HTML in the code blocks?

Totally stripped in code fences and I haven't tried inside a JavaScript block yet... I suppose we could tell them to use pre and code tags and go with intended behavior.


Do you really want to support githubs schemes? e.g. github-windows:// and github-mac://

…) stuff.

* Still need to see if sanitize-html has a wildcard
@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

Still need to see if sanitize-html has a wildcard

See also:

blehh this is going to be lengthy.

@sizzlemctwizzle
Copy link
Member

Do you really want to support githubs schemes?

No.

Totally stripped in code fences

Which it should. This might help if added to markdown.js

function escape(html, encode) {
  return html
    .replace(!encode ? /&(?!#?\w+;)/g : /&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

renderer.code = function (code, lang) {
  code = this.options.highlight(escape(code, true), lang);

  if (!lang) {
    return '<pre><code>' + code + '\n</code></pre>';
  }

  return '<pre><code class="'
    + this.options.langPrefix
    + escape(lang, true)
    + '">'
    + code
    + '\n</code></pre>\n';
};

Edit: Nvm the code blocks need be escaped before or during HTML sanitization.

@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

EDIT:

Which it should. This might help if added to markdown.js

That makes JavaScript code fences look like this:

var hello = &quot;world&quot;;

instead of:

var hello = "world";

I'm running out of steam for the night but I do want to follow up with this when I get back from the weekend. This should give some time for sanitize-html maintainers to answer a question too. hopefully


Nvm the code blocks need be escaped before or during HTML sanitization.

This might be what is fubarring the highlighting btw. I noticed a &gt; with inspector and I doubt our highlighter handles that. IDKY

@Martii Martii added the sooner label Jun 20, 2014
@sizzlemctwizzle
Copy link
Member

Essentially we've ran into a catch-22. If we sanitize before we render, we strip html inside markdown code blocks (plus the sanitizer is meant for html, not markdown). But if we sanitize after, we lose the highlighted code because of the class attribute.

I noticed a &gt; with inspector and I doubt our highlighter handles that.

It does actually (check the page source to see). We're supposed to escape it before we send it to the highlighter.

As far as I can tell, the only solution: markdown (w/ escape code blocks) -> sanitize -> highlight. This might mean that we need to give the markdown library a sanitizer option that is a callback to use an external sanitization function rather than just the "all or nothing" html escape that is used when sanitize: true.

@Martii
Copy link
Member Author

Martii commented Jun 20, 2014

option that is a callback to use an external sanitization function

Possibly two pass as well.

@sizzlemctwizzle sizzlemctwizzle merged commit 41c359f into OpenUserJS:master Jun 21, 2014
@sizzlemctwizzle
Copy link
Member

I fixed it. I should have spent more time trying to understand our markdown renderer.

@sizzlemctwizzle
Copy link
Member

Okay, so ended up having to individually sanitize the output of the various block level markdown renders (obviously omitting the code one) and now everything seems to work correctly.

@Zren
Copy link
Contributor

Zren commented Jun 21, 2014

tables + images are broken.

https://openuserjs.org/discuss/The_new_layout#comment-146bdc24a14

@Martii
Copy link
Member Author

Martii commented Jun 21, 2014

@sizzlemctwizzle
Btw I'm still waiting on an answer from up here from those guys so this pr is partially incomplete at the moment even though it's merged and deployed. If you are feeling adventorous you can add all the attributes in per tag name into the .json file but I thought I'd wait and see first rather than a ton of typing with a ton of copy/pasting... and having another pr/commit. :) Zren also made #192 which I adjusted and explained how to fix for his comment up here currently right before this comment (which is the same as #192).

@Martii Martii removed the sooner label Jun 21, 2014
@sizzlemctwizzle
Copy link
Member

Btw I'm still waiting on an answer from up here

I've looked at their code and the answer is "no, there isn't". But I've identified the code that could be modified to add the wildcard, so I'll probably submit a PR sometime. Until then, I've added some code to whitelist a list of attributes for all whitelisted tags.

If you are feeling adventurous you can add all the attributes in per tag name into the .json file

No need to do this by hand.

@Martii Martii removed their assignment Jul 3, 2014
@Martii Martii deleted the fixXSSandMD branch July 7, 2014 09:24
@Martii Martii removed the tracking upstream Waiting, watching, wanting. label Feb 7, 2015
This was referenced Aug 25, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug You've guessed it... this means a bug is reported.
Development

Successfully merging this pull request may close these issues.

3 participants