writeStats adds comments and doctype to used tags #8396

Closed
davidsneighbour opened this issue Apr 7, 2021 · 3 comments

Comments

@davidsneighbour
Contributor

❯ hugo version
hugo v0.82.0-9D960784+extended linux/amd64 BuildDate=2021-03-21T17:28:04Z VendorInfo=gohugoio

When build > writeStats is set to true, Hugo writes a list of used tags into hugo_stats.json. In my list the first two items appear to be erroneous: a comment start (the comment end is missing) and !doctype, which can't be targeted via CSS anyway.

I am only aware of this file being used for PurgeCSS, so it might not be a bug if these entries are useful for some other use case, but I can't find a use case for comment-start and doctype.

If they make sense, then comment-end should be part of the list, so this is a bug in either case ;]

{
  "htmlElements": {
    "tags": [
      "!--",
      "!doctype",
      "a",
      ...
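Until the list is cleaned up at the source, a consumer of hugo_stats.json could drop these entries itself. The following is only a minimal sketch of that idea: `filterMarkupDeclarations` is a hypothetical helper, not part of Hugo, and it assumes that every entry starting with `!` (such as `!--` and `!doctype`) is unwanted.

```go
package main

import (
	"fmt"
	"strings"
)

// filterMarkupDeclarations is a hypothetical helper (not part of Hugo).
// It returns the tags that do not start with "!", which drops entries
// such as "!--" and "!doctype" while keeping real element names.
func filterMarkupDeclarations(tags []string) []string {
	kept := make([]string, 0, len(tags))
	for _, t := range tags {
		if strings.HasPrefix(t, "!") {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	// The first entries of the "tags" list shown in this report.
	tags := []string{"!--", "!doctype", "a"}
	fmt.Println(filterMarkupDeclarations(tags)) // prints [a]
}
```

This only works around the symptom on the consumer side; it does not change what Hugo writes.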
@bep
Member

bep commented Apr 7, 2021

I'll reopen this; we should try to make this as clean as possible, but there will be false positives.

@bep bep added the Enhancement label Apr 7, 2021
@bep bep added this to the v0.83 milestone Apr 7, 2021
@dirkolbrich
Contributor

I narrowed it down to the method func (c *cssClassCollectorWriter) insertStandinHTMLElement(el string) (string, string) in publisher/htmlElementsCollector.go.
There, a <!DOCTYPE html> line is converted to <div html> and a <!-- comment --> line to <div comment -->. This disguises these lines as a standard html.ElementNode for the next step, instead of as an html.DoctypeNode or html.CommentNode, which would be discarded.
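To make the conversion concrete, here is a small standalone sketch that copies only the replacement logic of that method, without any filter. The free function `standIn` is a hypothetical name used for illustration; the real code is a method on the collector writer.

```go
package main

import (
	"fmt"
	"strings"
)

// standIn is a hypothetical, standalone copy of the replacement logic:
// it swaps the first token after "<" for "div" and returns the rewritten
// element together with the lowercased original token.
func standIn(el string) (string, string) {
	tag := el[1:]
	if spacei := strings.Index(tag, " "); spacei != -1 {
		tag = tag[:spacei]
	}
	tag = strings.Trim(tag, "\n ")
	return strings.Replace(el, tag, "div", 1), strings.ToLower(tag)
}

func main() {
	for _, el := range []string{"<!DOCTYPE html>", "<!-- comment -->"} {
		out, tag := standIn(el)
		fmt.Printf("%q -> %q, tag %q\n", el, out, tag)
	}
	// "<!DOCTYPE html>"  becomes "<div html>"       with tag "!doctype"
	// "<!-- comment -->" becomes "<div comment -->" with tag "!--"
}
```

Both results look like ordinary elements to a later parsing step, which is how `!doctype` and `!--` end up being recorded as tags.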

I have added a simple filter (see branch master...dirkolbrich:writestats-exclude-comments-and-doctype), but I'm stuck with the lowercase <!doctype html> line. If I add this to the filter, a lot of errors for github.com/gohugoio/hugo/hugolib appear and a lot of tests blow up.

// The net/html parser does not handle single table elements as input, e.g. tbody.
// We only care about the element/class/ids, so just store away the original tag name
// and pretend it's a <div>.
func (c *cssClassCollectorWriter) insertStandinHTMLElement(el string) (string, string) {
	tag := el[1:]

	if strings.HasPrefix(tag, "!DOCTYPE") ||
		// if activating this line, tests for github.com/gohugoio/hugo/hugolib blow up
		// if leaving it out, "!doctype" (lowercase) gets written to publisher.HTMLElements.Tags
		// strings.HasPrefix(tag, "!doctype") ||
		strings.HasPrefix(tag, "!--") {
		return el, ""
	}

	spacei := strings.Index(tag, " ")
	if spacei != -1 {
		tag = tag[:spacei]
	}
	tag = strings.Trim(tag, "\n ")
	newv := strings.Replace(el, tag, "div", 1)
	return newv, strings.ToLower(tag)
}
> hugo version
hugo v0.82.0+extended darwin/amd64 BuildDate=unknown
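For the upper/lowercase problem, one option might be to match the doctype prefix case-insensitively instead of listing both spellings. The sketch below shows that check in isolation; `isDoctypeOrComment` is a hypothetical helper, and it takes the full element including the leading `<`. It says nothing about why the hugolib tests fail once lowercase `<!doctype` is filtered, so it should be read as an illustration of the matching only, not as a fix.

```go
package main

import (
	"fmt"
	"strings"
)

// isDoctypeOrComment is a hypothetical helper (not Hugo's code).
// It reports whether el opens an HTML comment or a doctype declaration,
// matching the doctype keyword regardless of case.
func isDoctypeOrComment(el string) bool {
	if strings.HasPrefix(el, "<!--") {
		return true
	}
	const doctype = "<!doctype"
	return len(el) >= len(doctype) && strings.EqualFold(el[:len(doctype)], doctype)
}

func main() {
	samples := []string{
		"<!DOCTYPE html>",
		"<!doctype html>",
		"<!-- note -->",
		`<div class="a">`,
	}
	for _, el := range samples {
		fmt.Printf("%q: %v\n", el, isDoctypeOrComment(el))
	}
}
```

With these samples the first three report true and the `<div>` reports false.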

dirkolbrich added a commit to dirkolbrich/hugo that referenced this issue Apr 17, 2021
- Reorder code blocks
- Rename cssClassCollectorWriter to htmlElementCollectorWriter, as it just collect html element information
- Expand benchmark to test for minified and unminified content

Fixes gohugoio#8396, Fixes gohugoio#8417
dirkolbrich added a commit to dirkolbrich/hugo that referenced this issue Apr 20, 2021
- Reorder code blocks
- Rename cssClassCollectorWriter to htmlElementCollectorWriter, as it just collect html element information
- Expand benchmark to test for minified and unminified content

Fixes gohugoio#8396, Fixes gohugoio#8417
@bep bep closed this as completed in bc80022 Apr 20, 2021
@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 21, 2022