parser: two-phase parsing #120

Merged: 4 commits, Jan 29, 2025
80 changes: 61 additions & 19 deletions doc/modules/ROOT/pages/design_requirements/parser.adoc
@@ -49,6 +49,31 @@ significant computational resources. For example:
chunk directly after the current without having to perform memory movement
due to the existence of a chunk header.

== Two-Phase Parsing

The parser must return immediately after parsing the header and must not process
the body until the next `parse()` call. For bodiless messages and responses to
HEAD requests, it must transition directly to the `complete_in_place` state
after parsing the header, making further `parse()` calls unnecessary (but still
valid).

Two-phase parsing offers several benefits while adding almost no complexity for
users of the API:

- It provides an optimization opportunity for users who attach a body
immediately after parsing the header (which is often the case): no internal
buffer has to be allocated for the message body, so all available space can be
used for the input buffer.
- Since parsing the body might result in an error, returning after parsing the
header lets users access the header first and encounter the body error only on
the next `parse()` call.
- A body limit is only meaningful if it is set before the body is parsed, so
returning immediately after parsing the header provides a window for setting
such limits.
- If users attach a body immediately after parsing the header, the parser avoids
an extra buffer copy operation (for example, when the user attaches an elastic
buffer).
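
The benefits listed above can be illustrated with a minimal sketch. Everything in it is hypothetical: `toy_parser`, `toy_state`, `toy_error`, `set_body_limit`, and `two_phase_demo` are names invented for this example and are not the library's API. The toy only understands a single `Content-Length` field, and it assumes a default body limit of 64 bytes. What it shows is that the first `parse()` call stops after the header even when body bytes are already buffered, which leaves a window to set a limit, and that a body error surfaces only on the second call while the header data stays readable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical toy types for illustration only (not the library's API).
enum class toy_state { header, body, complete };
enum class toy_error { none, need_more_input, body_too_large };

class toy_parser
{
    std::string in_;                  // all committed input
    std::size_t content_length_ = 0;  // taken from the header
    std::size_t body_limit_ = 64;     // assumed default limit
    std::size_t body_begin_ = 0;      // offset of the first body byte
    toy_state st_ = toy_state::header;

public:
    void commit(std::string_view bytes) { in_.append(bytes); }
    void set_body_limit(std::size_t n) { body_limit_ = n; }
    bool got_header() const { return st_ != toy_state::header; }
    bool is_complete() const { return st_ == toy_state::complete; }
    std::size_t content_length() const { return content_length_; }

    toy_error parse()
    {
        if(st_ == toy_state::header)
        {
            auto const end = in_.find("\r\n\r\n");
            if(end == std::string::npos)
                return toy_error::need_more_input;
            static constexpr std::string_view key = "Content-Length: ";
            auto const pos = in_.find(key);
            if(pos != std::string::npos && pos < end)
                content_length_ = std::stoul(in_.substr(pos + key.size()));
            body_begin_ = end + 4;
            // Bodiless messages complete right away; otherwise stop here.
            st_ = content_length_ == 0 ? toy_state::complete : toy_state::body;
            return toy_error::none; // phase 1 ends without touching the body
        }
        if(st_ == toy_state::body)
        {
            if(content_length_ > body_limit_)
                return toy_error::body_too_large;
            if(in_.size() - body_begin_ < content_length_)
                return toy_error::need_more_input;
            st_ = toy_state::complete;
        }
        return toy_error::none; // calling parse() after completion stays valid
    }
};

inline void two_phase_demo()
{
    toy_parser pr;
    pr.commit("Content-Length: 5\r\n\r\nhello");

    // Phase 1: only the header is parsed, although the body is buffered.
    assert(pr.parse() == toy_error::none);
    assert(pr.got_header() && !pr.is_complete());
    assert(pr.content_length() == 5);

    // The window between the two phases: tighten the limit.
    pr.set_body_limit(4);

    // Phase 2: the body error appears now; header data is still available.
    assert(pr.parse() == toy_error::body_too_large);
    assert(pr.content_length() == 5);
}
```

Calling `two_phase_demo()` from a `main` function exercises the flow. A single-phase design would have reported the body error from the first call, before the caller had a chance to inspect the header or adjust the limit.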

== Use Cases and Interfaces

To keep things simple, we will use the following synchronous free functions to
@@ -57,38 +82,55 @@ demonstrate the flow of the parse operation in each example:
[source,cpp]
----
void
read_some(stream& s, parser& pr, error_code& ec)
{
    // Parse whatever is already buffered; return unless more input is needed.
    pr.parse(ec);
    if(ec != condition::need_more_input)
        return;

    // Read more bytes from the stream into the parser's input buffer.
    auto n = s.read_some(pr.prepare(), ec);
    pr.commit(n);
    if(ec == asio::error::eof)
    {
        // End of stream is reported to the parser, not to the caller.
        pr.commit_eof();
        ec = {};
    }
    else if(ec.failed())
    {
        return;
    }

    pr.parse(ec);
}

void
read_header(stream& s, parser& pr)
{
    do
    {
        error_code ec;
        read_some(s, pr, ec);
        if(ec == condition::need_more_input)
            continue; // re-checks the loop condition, then reads again
        if(ec.failed())
            throw system::system_error(ec);
    }
    while(! pr.got_header());
}

void
read(stream& s, parser& pr)
{
    do
    {
        error_code ec;
        read_some(s, pr, ec);
        if(ec == condition::need_more_input)
            continue; // re-checks the loop condition, then reads again
        if(ec.failed())
            throw system::system_error(ec);
    }
    while(! pr.is_complete());
}
----

4 changes: 3 additions & 1 deletion include/boost/http_proto/header_limits.hpp
@@ -43,7 +43,9 @@ struct header_limits
@li <a href="https://datatracker.ietf.org/doc/html/rfc9112#section-2.1"
>2.1. Message Format (rfc9112)</a>
@li <a href="https://datatracker.ietf.org/doc/html/rfc9112#section-5"
>5. Field Syntax (rfc9112)</a>@see
@li <a href="https://stackoverflow.com/questions/686217/maximum-on-http-header-values"
>Maximum on HTTP header values (Stackoverflow)</a>
*/
std::size_t max_size = 8 * 1024;
