Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize Content-Type for multipart/form-data requests #1139

Merged
merged 1 commit into from
Jan 17, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 17 additions & 3 deletions etc/tempesta_fw.conf
Original file line number Diff line number Diff line change
Expand Up @@ -656,8 +656,8 @@
# from HTTP tables (see 'http_chain' directive).
#
# <directive> is one of 'location', 'proxy_pass', 'cache_bypass', 'cache_fulfill',
# 'nonidempotent' or 'hdr_add' directives (see the corresponding directives'
# description).
# 'nonidempotent', 'hdr_add' or 'http_post_validate' directives (see the
# corresponding directives' description).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra asterisk after directives

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that was a "possessive 's". Since "directives" is plural, "'s" reduces to a sole apostrophe.

#
# Example:
# vhost app {
Expand Down Expand Up @@ -693,7 +693,7 @@
# <OP> is a match operator, one of 'eq', 'prefix', 'suffix', or '*'.
# <string> is a verbatim string matched against URL in a request.
# <directive> is one of 'proxy_pass', 'cache_bypass', 'cache_fulfill',
# 'nonidempotent', 'hdr_add' or Frang limit directives.
# 'nonidempotent', 'hdr_add', 'http_post_validate' or Frang limit directives.
#
# Default:
# None.
Expand Down Expand Up @@ -894,6 +894,20 @@
# /etc/ddos_redirect.html
#

# TAG: http_post_validate
#
# Validate POST requests.
# Parses Content-Type header field, and rewrites it for multipart/form-data type
# of payload, to prevent evasion attacks. All parameters other than "boundary"
# are removed.
#
# Syntax:
# http_post_validate
#
# Default:
# Validation is disabled.
#

#
# Frang configuration.
#
Expand Down
61 changes: 61 additions & 0 deletions tempesta_fw/http.c
Original file line number Diff line number Diff line change
Expand Up @@ -2114,6 +2114,58 @@ tfw_http_set_loc_hdrs(TfwHttpMsg *hm, TfwHttpReq *req)
return 0;
}

/**
* Compose Content-Type header field from scratch.
*
* A POST-request with multipart/form-data payload need a boundary, which is
* supplied by a parameter in the Content-Type header field. There are strict
* instructions on how to parse that parameter (see RFC 7231 and RFC 7230), but
* application servers parse it in a non-standard way. For example, PHP checks
* whenever parameter name contains substring "boundary", and thus happily takes
* "xxboundaryxx". Such quirks are used to bypass web application firewalls.
*
* To make evasions harder, this function composes value of the Content-Type
* field from the parsed data. All parameters other than "boundary" are dropped.
*/
static int
tfw_http_recreate_content_type_multipart_hdr(TfwHttpReq *req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR has a good comment, why the header is recreated from parsed value. Let's keep it with the function, since the backend compatibility issue is not a trivial case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to the function. Text is mostly from the pull request descriptions, with references to the pull request removed.

{
TfwStr replacement = {
.ptr = (TfwStr []) {
TFW_STR_FROM("Content-Type"),
TFW_STR_FROM(": "),
TFW_STR_FROM("multipart/form-data; boundary="),
req->multipart_boundary_raw,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

req->multipart_boundary_raw can't be compound string according to the usage. This should be asserted in unit tests, but i didn't found such check. But as I see usage of the variable in the http_parser, it actually is compound.

Copy link
Contributor Author

@i-rinat i-rinat Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, req->multipart_boundary_raw can be a compound string, and I totally forgot about that. Fixed by explicitly handling both cases of compound and plain types of req->multipart_boundary_raw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved back to the code version that expected req->multipart_boundary_raw to be a single-chunk string.

Since this data is then used to recreate Content-Type field of a request header, there is a huge possibility that buffer containing boundary string will overlap with the buffer it is copied into. Handling all the special cases looks infeasible at the moment, and the request parser was changed to make a copy of req->multipart_boundary_raw. So it's always linear at this point.

},
.flags = 4 << TFW_STR_CN_SHIFT,
};
TfwStr *c = replacement.ptr;

BUG_ON(!TFW_STR_PLAIN(&req->multipart_boundary_raw));
replacement.len = c[0].len + c[1].len + c[2].len + c[3].len;
return tfw_http_msg_hdr_xfrm_str((TfwHttpMsg *)req, &replacement,
TFW_HTTP_HDR_CONTENT_TYPE, false);
}

static bool
tfw_http_should_validate_post_req(TfwHttpReq *req)
{
if (req->location && req->location->validate_post_req)
return true;

if (!req->vhost)
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of this function (as well as the whole tfw_http_adjust_req() procedure) implies that req->vhost should be already known at the moment the function is used. If req->vhost == NULL it means that the function is used in wrong place. So, it should be better to use

if (WARN_ON_ONCE(!req->vhost))
	return false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, if req->vhost is NULL, this function is not called at all. Added WARN_ON_ONCE in #1154.


if (req->vhost->loc_dflt && req->vhost->loc_dflt->validate_post_req)
return true;

if (req->vhost->vhost_dflt &&
req->vhost->vhost_dflt->loc_dflt->validate_post_req)
return true;

return false;
}

/**
* Adjust the request before proxying it to real server.
*/
Expand Down Expand Up @@ -2143,6 +2195,15 @@ tfw_http_adjust_req(TfwHttpReq *req)
if (r < 0)
return r;

if (req->method == TFW_HTTP_METH_POST &&
test_bit(TFW_HTTP_B_CT_MULTIPART, req->flags) &&
tfw_http_should_validate_post_req(req))
{
r = tfw_http_recreate_content_type_multipart_hdr(req);
if (r)
return r;
}

return tfw_http_set_hdr_connection(hm, BIT(TFW_HTTP_B_CONN_KA));
}

Expand Down
8 changes: 8 additions & 0 deletions tempesta_fw/http.h
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,10 @@ enum {
TFW_HTTP_B_CONN_EXTRA,
/* Chunked transfer encoding. */
TFW_HTTP_B_CHUNKED,
/* Media type is multipart/form-data. */
TFW_HTTP_B_CT_MULTIPART,
/* Multipart/form-data request have boundary parameter. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have -> has

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #1154. Thanks!

TFW_HTTP_B_CT_MULTIPART_HAS_BOUNDARY,
/* Singular header presents more than once. */
TFW_HTTP_B_FIELD_DUPENTRY,

Expand Down Expand Up @@ -369,6 +373,8 @@ typedef struct {
* @host - host in URI, may differ from Host header;
* @uri_path - path + query + fragment from URI (RFC3986.3);
* @mark - special hash mark for redirects handling in session module;
* @multipart_boundary_raw - multipart boundary as is, maybe with escaped chars;
* @multipart_boundary - decoded multipart boundary;
* @fwd_list - member in the queue of forwarded/backlogged requests;
* @nip_list - member in the queue of non-idempotent requests;
* @method - HTTP request method, one of GET/PORT/HEAD/etc;
Expand All @@ -394,6 +400,8 @@ struct tfw_http_req_t {
TfwStr host;
TfwStr uri_path;
TfwStr mark;
TfwStr multipart_boundary_raw;
TfwStr multipart_boundary;
struct list_head fwd_list;
struct list_head nip_list;
unsigned char method;
Expand Down
Loading