Redo validation errors in the IPv4 parser #739

Merged
merged 2 commits into from
Jan 20, 2023
50 changes: 28 additions & 22 deletions url.bs
@@ -413,7 +413,8 @@ point <a for=/>URLs</a> from <var>A</var> can come from untrusted sources.

<div class=example id=example-host-parsing>
<p>A <a lt="host parser">parse</a>-<a lt="host serializer">serialize</a> roundtrip gives the
following results, depending on the <var>isNotSpecial</var> argument to the <a>host parser</a>:
following results, depending on the <var ignore>isNotSpecial</var> argument to the
<a>host parser</a>:

<table>
<tr>
@@ -731,9 +732,10 @@ to be distinguished.

<h3 id=host-parsing>Host parsing</h3>

<div algorithm>
<p>The <dfn export id=concept-host-parser lt="host parser|host parsing">host parser</dfn> takes a
<a>scalar value string</a> <var>input</var> with an optional boolean <var>isNotSpecial</var>
(default false), and then runs these steps:
(default false), and then runs these steps. They return failure or a <a for=/>host</a>.

<ol>
<li>
@@ -771,11 +773,13 @@ to be distinguished.

<li><p>Return <var>asciiDomain</var>.
</ol>
</div>

<hr>

<div algorithm>
<p>The <dfn>ends in a number checker</dfn> takes an <a>ASCII string</a> <var>input</var> and then
runs these steps:
runs these steps. They return a boolean.

<ol>
<li><p>Let <var>parts</var> be the result of <a>strictly splitting</a> <var>input</var> on
@@ -807,26 +811,24 @@ runs these steps:

<li><p>Return false.
</ol>
</div>

<div algorithm>
<p>The <dfn id=concept-ipv4-parser>IPv4 parser</dfn> takes an <a>ASCII string</a> <var>input</var>
and then runs these steps:
and then runs these steps. They return failure or an <a for=/>IPv4 address</a>.

<ol>
<li>
<p>Let <var>validationError</var> be false.

<p class=note>This uses <var>validationError</var> to track <a>validation errors</a> to avoid
reporting them before we are confident we want to parse <var>input</var> as an IPv4 address as the
<a>host parser</a> almost always invokes the <a>IPv4 parser</a>.
<p class=note>The <a for=/>IPv4 parser</a> is not to be invoked directly. Instead check that the
return value of the <a for=/>host parser</a> is an <a for=/>IPv4 address</a>.
Member

I don't understand why IPv4 and IPv6 would be different...

Member Author

The host parser does percent-decoding and ToASCII before determining whether something is an IPv4 address. The only thing it does before invoking the IPv6 parser is remove the square brackets. In practice, though, I think people should always use the host parser, which is why neither the IPv4 parser nor the IPv6 parser is exported.
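
To make the asymmetry described in the reply above concrete, here is a minimal Python sketch of the preprocessing each kind of input goes through before its sub-parser sees it. The function name `route_host` is hypothetical, the failure on an unbalanced bracket is an assumption about the elided host parser steps, and `str.lower()` is only a crude stand-in for ToASCII, which does considerably more.

```python
from typing import Tuple
from urllib.parse import unquote


def route_host(host: str) -> Tuple[str, str]:
    """Hypothetical helper: returns (route, text handed to the next stage)."""
    if host.startswith("["):
        # IPv6 route: the only preprocessing is removing the square brackets.
        if not host.endswith("]"):
            return ("failure", "")  # assumption: unbalanced bracket fails
        return ("ipv6", host[1:-1])
    # Domain/IPv4 route: percent-decode first, then ToASCII.
    decoded = unquote(host)
    ascii_domain = decoded.lower()  # stand-in for ToASCII (assumption)
    return ("domain-or-ipv4", ascii_domain)


print(route_host("[::1]"))
print(route_host("%31%32%37.0.0.1"))
```

The second call shows why calling an IPv4 parser directly on raw input would miss cases: the percent-encoded digits only look like an IPv4 address after the decoding step that the host parser performs.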


<ol>
<li><p>Let <var>parts</var> be the result of <a>strictly splitting</a> <var>input</var> on
U+002E (.).

<li>
<p>If the last <a for=list>item</a> in <var>parts</var> is the empty string, then:

<ol>
<li><p>Set <var>validationError</var> to true.
<li><p><a>Validation error</a>.

<li><p>If <var>parts</var>'s <a for=list>size</a> is greater than 1, then <a for=list>remove</a>
the last <a for=list>item</a> from <var>parts</var>.
@@ -848,18 +850,11 @@ and then runs these steps:

<li><p>If <var>result</var> is failure, <a>validation error</a>, return failure.

<li><p>If <var>result</var>[1] is true, then set <var>validationError</var> to true.
<li><p>If <var>result</var>[1] is true, <a>validation error</a>.

<li><p><a for=list>Append</a> <var>result</var>[0] to <var>numbers</var>.
</ol>

<li>
<p>If <var>validationError</var> is true, <a>validation error</a>.

<p class="note">At this point each part was parsed into a number and <var>input</var> will be
treated as an IPv4 address (or failure). And therefore error reporting resumes.
</li>

<li><p>If any item in <var>numbers</var> is greater than 255, <a>validation error</a>.

<li><p>If any but the last <a for=list>item</a> in <var>numbers</var> is greater than 255, then
@@ -887,7 +882,9 @@ and then runs these steps:

<li><p>Return <var>ipv4</var>.
</ol>
</div>
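
The behavioural change in this hunk (reporting each validation error as soon as it is found instead of accumulating a <var>validationError</var> flag) can be sketched in Python as follows. This is an illustration under stated assumptions, not the specification text: the `report` callback and both function names are hypothetical, and every step that the diff elides (the four-part limit, the range check on the last number, the final accumulation, and the body of the number parser) is filled in from my reading of the algorithm and marked as an assumption in the comments.

```python
from typing import Callable, List, Optional, Tuple

DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}


def parse_ipv4_number(s: str) -> Optional[Tuple[int, bool]]:
    """Returns (value, validation_error) or None for failure.
    Only the final return shape is visible in the diff; the rest is assumed."""
    if s == "":
        return None
    validation_error, radix = False, 10
    if len(s) >= 2 and s[:2] in ("0x", "0X"):
        validation_error, radix, s = True, 16, s[2:]
    elif len(s) >= 2 and s[0] == "0":
        validation_error, radix, s = True, 8, s[1:]
    if s == "":
        return (0, True)
    if any(c not in DIGITS[radix] for c in s):
        return None
    return (int(s, radix), validation_error)


def parse_ipv4(s: str, report: Callable[[str], None] = lambda msg: None) -> Optional[int]:
    parts = s.split(".")
    if parts[-1] == "":
        report("trailing dot")  # reported immediately, no deferred flag
        if len(parts) > 1:
            parts.pop()
    if len(parts) > 4:  # assumption: elided step
        report("more than four parts")
        return None
    numbers: List[int] = []
    for part in parts:
        result = parse_ipv4_number(part)
        if result is None:
            report("part is not a number")
            return None
        if result[1]:
            report("part is not plain decimal")
        numbers.append(result[0])
    if any(n > 255 for n in numbers):
        report("part greater than 255")
    if any(n > 255 for n in numbers[:-1]):
        return None
    if numbers[-1] >= 256 ** (5 - len(numbers)):  # assumption: elided step
        return None
    ipv4 = numbers[-1]  # assumption: elided accumulation
    for counter, n in enumerate(numbers[:-1]):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```

Passing `errors.append` as `report` collects the messages in order. Because errors are now emitted eagerly, an input whose early parts parse but whose last part does not would report errors before failing, which is consistent with the new note that callers should go through the host parser rather than invoke this parser directly.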

<div algorithm>
<p>The <dfn>IPv4 number parser</dfn> takes an <a>ASCII string</a> <var>input</var> and then runs
these steps:

@@ -938,11 +935,16 @@ these steps:

<li><p>Return (<var>output</var>, <var>validationError</var>).
</ol>
</div>

<hr>

<div algorithm>
<p>The <dfn id=concept-ipv6-parser>IPv6 parser</dfn> takes a <a>scalar value string</a>
<var>input</var> and then runs these steps:
<var>input</var> and then runs these steps. They return failure or an <a for=/>IPv6 address</a>.

<p class=note>The <a for=/>IPv6 parser</a> could in theory be invoked directly, but please discuss
actually doing that with the editors of this document first.

<ol>
<li><p>Let <var>address</var> be a new <a>IPv6 address</a> whose <a>IPv6 pieces</a> are all 0.
@@ -1088,11 +1090,14 @@ these steps:

<li><p>Return <var>address</var>.
</ol>
</div>

<hr>

<div algorithm>
<p>The <dfn export id=concept-opaque-host-parser>opaque-host parser</dfn> takes a
<a>scalar value string</a> <var>input</var>, and then runs these steps:
<a>scalar value string</a> <var>input</var>, and then runs these steps. They return failure or an
<a for=/>opaque host</a>.

<ol>
<li><p>If <var>input</var> contains a <a>forbidden host code point</a>,
@@ -1107,6 +1112,7 @@ these steps:
<li><p>Return the result of running <a for=string>UTF-8 percent-encode</a> on <var>input</var>
using the <a>C0 control percent-encode set</a>.
</ol>
</div>


<h3 id=host-serializing>Host serializing</h3>