Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/html: Tokenizer - incorrect handling of attribute states when U+002F SOLIDUS (/) is encountered #63402

Closed
maciekmm opened this issue Oct 5, 2023 · 3 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@maciekmm
Copy link

maciekmm commented Oct 5, 2023

What version of Go are you using (go version)?

$ go version
go version go1.21.1 linux/amd64
golang.org/x/net v0.15.0

Does this issue reproduce with the latest release?

Yes, golang.org/x/net v0.15.0 is the latest version

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/maciekmm/.cache/go-build'
GOENV='/home/maciekmm/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/maciekmm/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/maciekmm/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2511272611=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Calling the Tokenizer with HTML element containing SOLIDUS (/) in the attribute name results in incorrect tokenization.

This is due to violation of the following rules in the WHATWG spec:

Test cases:
https://go.dev/play/p/ne5aV9XWVBd

What did you expect to see?

I expected to have the HTML code with attributes containing the solidus / character tokenized correctly with following inputs:

	{
		"forward slash before attribute name",
		`<p/=">`,
		`<p ="="">`,
	},
	{
		"forward slash before attribute name with spaces around",
		`<p / =">`,
		`<p ="="">`,
	},
	{
		"forward slash in the attribute name followed by character",
		`<p a/ ="">`,
		`<p a="" =""="">`,
	}

What did you see instead?

<p/="> -> EOF
<p / ="> -> EOF
<p a/ =""> -> <p a="">

@gopherbot gopherbot added this to the Unreleased milestone Oct 5, 2023
@maciekmm maciekmm changed the title x/net/html: incorrect handling of attribute states when U+002F SOLIDUS (/) is encountered x/net/html: Tokenizer - incorrect handling of attribute states when U+002F SOLIDUS (/) is encountered Oct 5, 2023
@prattmic
Copy link
Member

prattmic commented Oct 6, 2023

cc @neild

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 6, 2023
maciekmm added a commit to maciekmm/net that referenced this issue Oct 7, 2023
Calling the Tokenizer with HTML element containing SOLIDUS (/) in the attribute name results in incorrect tokenization.

This is due to violation of the following rule transitions in the WHATWG spec:
- https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state (we are not reconsuming the character if '/' is encountered)
- https://html.spec.whatwg.org/multipage/parsing.html#after-attribute-name-state (we are not switching to self closing state)

Fixes golang/go#63402

Signed-off-by: Maciej Mionskowski <[email protected]>
maciekmm added a commit to maciekmm/net that referenced this issue Oct 7, 2023
Calling the Tokenizer with HTML element containing SOLIDUS (/) in the attribute name results in incorrect tokenization.

This is due to violation of the following rule transitions in the WHATWG spec:
- https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state (we are not reconsuming the character if '/' is encountered)
- https://html.spec.whatwg.org/multipage/parsing.html#after-attribute-name-state (we are not switching to self closing state)

Fixes golang/go#63402

Signed-off-by: Maciej Mionskowski <[email protected]>
maciekmm added a commit to maciekmm/net that referenced this issue Oct 7, 2023
Calling the Tokenizer with HTML element containing SOLIDUS (/) in the attribute name results in incorrect tokenization.

This is due to violation of the following rule transitions in the WHATWG spec:
- https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state (we are not reconsuming the character if '/' is encountered)
- https://html.spec.whatwg.org/multipage/parsing.html#after-attribute-name-state (we are not switching to self closing state)

Fixes golang/go#63402

Signed-off-by: Maciej Mionskowski <[email protected]>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/533518 mentions this issue: html: fix SOLIDUS '/' handling in attribute parsing

@maciekmm
Copy link
Author

maciekmm commented Feb 7, 2024

Hi, @neild, @prattmic
any idea when this is going to be merged? It's been in "ready to submit" state for months.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Feb 13, 2024
# 0.10.0

Adds support for various LevelDB formats (thanks @mikez) and Garmin Flexible and Interoperable Data Transfer format (FIT) (thanks @mlofjard).

And as usual some small fixes and dependency updates.

## Changes

- On macOS fq now reads init script from `~/.config/fq` in addition to `~/Library/Application Support/fq`. #871
- Switch readline module from own fork to https://github.com/ergochat/readline #854
- Updated gojq fork. Notable changes from upstream below. #844
  - Fix pre-defined variables to be available in initial modules
  - Fix object construction with duplicate keys

## Format changes

- `aac_frame` Decode instance tag and common window flag. #859
- `fit` Add support for Garmin Flexible and Interoperable Data Transfer decoder. Thanks @mlofjard #874
  - This format is used by various GPS tracking devices to record location, speed etc.
  - Example of converting position information to KML:
  ```jq
  # to_kml.jq
  # convert locations in a fit structure to KML
  def to_kml:
    ( [ .data_records[].data_message as {position_lat: $lat, position_long: $long}
      | select($lat and $long)
      | [$long, $lat, 0]
      | join(",")
      ]
    | join(". ")
    | { kml: {"@xmlns":"http://earth.google.com/kml/2.0"
      , Document: {Placemark: {LineString: {coordinates: .}}}}
      }
    | to_xml({indent:2})
    );
  ```
  Usage:
  ```sh
  # -L to add current directory to library path
  # -r for raw string output
  # 'include "to_ml";' to include to_kml.jq
  # to_kml calls to_kml function
  $ fq -L . -r 'include "to_kml"; to_kml' file.fit > file.kml
  ```


- `hevc_sps` Fix some incorrect profile_tier_level decoding. #829
- `html` Fix issue parsing elements including SOLIDUS "/". #870
  - Upstream issue golang/go#63402
- `mpeg_es` Support ES_ID_Inc and decode descriptors for IOD tags
- `leveldb_descriptor`, `leveldb_log`, `leveldb_table` Add support for LevelDB. Thanks @mikez #824
  - This format is used by many database backends and applications like Google chrome.
- `pcapng` Decode all section headers instead of just the first. #860
- `png` Fix incorrect decoding of type flags. #847
- `hevc_sps` Fix incorrect decoding of profile_tier_level. #829
- `tls` Fix field name typos. #839
- `mp4`
  - Don't try decode samples for a track that has an external reference. #834
  - Use box structure instead of track id to keep track for sample table data. #833
  - `ctts` box v0 sample offset seems to be signed in practice but not in spec. #832
- `webp` Decode width, height and flags for lossless WebP. #857
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants