Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use the implemented interface for element/attributes #147

Closed
mozfreddyb opened this issue Jun 10, 2022 · 10 comments · Fixed by #208
Closed

Do not use the implemented interface for element/attributes #147

mozfreddyb opened this issue Jun 10, 2022 · 10 comments · Fixed by #208
Milestone

Comments

@mozfreddyb
Copy link
Collaborator

Instead of talking about interfaces, we better talk about the element name directly.
Notably, HTML elements like blink and applet both implement the HTMLUnknownElement interface, yet we'd probably remove the former and keep the latter. (Even though applet is specified not to do anything, I'd prefer we don't allow it)

@otherdaniel
Copy link
Collaborator

This makes total sense. The problem I had is that I didn't have a proper classification I could adopt, or derive ours from. IMHO, we need to categorize elements and attributes into (at least) 3 classes each: known-good; known-bad; other. If we maintain explicit lists for the first two (or derive them from the HTML spec), we should be good.

It might be slightly better if we pick 4 classes: known-good ( == default-listed); known-maybe ( == not default-allowed, but permissible); known-bad ( == not in basedline == not permissable); and other.

Maybe a good transition strategy would be to define these name sets manually in an appendix, base the remaining spec on them, and then transition to deriving those same sets from the HTML spec.

I guess we'll have to come up with proper names, though. I'm pretty sure "known-good", "known-maybe" and "known-bad" won't be agreeable. :-)

@mozfreddyb
Copy link
Collaborator Author

Yeah, that seems fine in terms of next-steps. IIRC @annevk suggested we won't necessarily derive those sets from the HTML spec immediately but could do self-serving lists as part of the Sanitizer API at first.

@annevk
Copy link
Collaborator

annevk commented Jun 15, 2022

@domenic might have an alternative idea here, but I think new static lists is the way to go for now. And then later we can figure out if some of that information can be deduced from other sources or should be listed as part of the definition of an element.

@koto
Copy link

koto commented Jun 15, 2022

We used [StringContext] to enumerate the XSSy attributes in TT, with a JS API for querying.

@evilpie
Copy link

evilpie commented Aug 1, 2022

It's starting to look like we need some kind of resolution here for me to finish the new implementation of the Sanitizer API for Firefox that closely follows the spec.

It doesn't seem like Gecko has any central list of supported attributes, so implementing the current spec is rather harder.

Aside: What I do find interesting is that the list of allowed HTML attributes (plus a few specials ones like URLs) in the current sanitizer is maybe half as long as the current baseline list of the Sanitizer API.

@otherdaniel
Copy link
Collaborator

We (Freddy + myself) discussed this offline. The gist is:

  • Conceptually, the Sanitizer requires three lists: known element/attributes; default element/attributes; baseline elements/attributes.
  • Two of those - baseline + default - are explicit in the spec. The third - known element/attributes - is "hidden" in the element kind / attribute kind definitions. We should re-word the spec to be more clear about this.
  • Conceptually, the known element/attributes "list" is contained in the HTML spec, but unfortunately not in the shape of an actual list that one could somehow reference. It's not entirely clear how to deal with this. One option would be to just maintain an additional list in the Sanitizer for now; but that risks getting out of date.
  • The known element/attributes lists are important to handle the case of allowUnknownMarkup, since unknown markup cannot be tested against the baseline list - it can't be in there - but we still want baseline checks with allowUnknownMarkup against known elements, because otherwise we'd violate the one hard guarantee the Sanitizer wants to give, namely to not contain script-y content.

@otherdaniel
Copy link
Collaborator

I've sent a first PR to resolve the more trivial instances of this in "handling funky elements", #173.

I'm preparing a second, more elaborate one that puts a set of lists for default/baseline/known elements/attributes into the repo, plus a script to assemble appropriate lists for Appendix A. That should allow us to

  • more easily reason about what we're specifying (e.g., we can just diff "known" and "baseline" elements),
  • write this spec almost entirely in terms of list membership,
  • easily adapt this to whatever namespace representation we pick by changing the script.

And if some day the HTML spec can export lists of spec-defined elements/attributes, we should be able to hook those up quite easily, too.

@annevk
Copy link
Collaborator

annevk commented Sep 8, 2022

Bit confused by the last paragraph. The idea is still to upstream most of this to HTML, right?

@mozfreddyb
Copy link
Collaborator Author

Yeah, I believe we agreed that all "lists" would be part of the Sanitizer API at first, because HTML has no "these are all the elements we know of" list that one could easily refer to (and similar for a attributes).

@annevk
Copy link
Collaborator

annevk commented Sep 9, 2022

Right and then the list would be moved over to the HTML Standard upon integration. (And we'd add a note to the documentation for updating elements that this list might need to be changed as well.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants