Backpressure for broker-to-historical communication #4933
Also related: #6014, an attempt to solve this.
#6014 was closed but still has relevant discussion: #6014 (comment)
gianm added a commit to gianm/druid that referenced this issue on Sep 7, 2018 (merged):
Adds a new property "druid.broker.http.maxQueuedBytes" and a new context parameter "maxQueuedBytes". Both represent a maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Fixes apache#4933.
gianm added a commit that referenced this issue on Sep 10, 2018:
* Broker backpressure. Adds a new property "druid.broker.http.maxQueuedBytes" and a new context parameter "maxQueuedBytes". Both represent a maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Fixes #4933.
* Fix query context doc.
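For context, a hedged sketch of how these two knobs would be used once that change lands. The property and context parameter names come from the commit messages above; the numeric value is a placeholder, not a recommendation.

```properties
# Broker runtime.properties (placeholder value): maximum number of bytes
# queued per query before exerting backpressure on the channel to the
# data server.
druid.broker.http.maxQueuedBytes=25000000
```

The same limit can be overridden per query via the "maxQueuedBytes" context parameter; in a native JSON query (datasource and interval are placeholders) that would look like:

```json
{
  "queryType": "scan",
  "dataSource": "example_datasource",
  "intervals": ["2018-08-01/2018-09-01"],
  "context": { "maxQueuedBytes": 25000000 }
}
```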
If the result set is very large and the client making the query is not pulling results out fast enough, brokers can run out of memory (OOM). This is because the broker lacks backpressure for large result sets, so they can "pile up" in memory. I believe the specific part lacking backpressure is the collection and merging of results from individual historical (and realtime) nodes.
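To make the missing mechanism concrete, here is a minimal, hypothetical Java sketch (not Druid's actual code) of per-query byte-based backpressure: the thread reading from a data server blocks once too many bytes are buffered but not yet drained by the client.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical illustration only, not Druid's implementation: a per-query
 * byte budget. The thread reading from a data server calls reserve() before
 * buffering a chunk and blocks while the budget is exhausted; the thread
 * draining results to the client calls release() as chunks are consumed.
 */
class ByteBackpressureGate {
    private final long maxQueuedBytes;
    private long queuedBytes = 0;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition belowLimit = lock.newCondition();

    ByteBackpressureGate(long maxQueuedBytes) {
        this.maxQueuedBytes = maxQueuedBytes;
    }

    /** Blocks the producer (channel reader) while the budget is exhausted. */
    void reserve(long bytes) throws InterruptedException {
        lock.lock();
        try {
            while (queuedBytes + bytes > maxQueuedBytes) {
                belowLimit.await();
            }
            queuedBytes += bytes;
        } finally {
            lock.unlock();
        }
    }

    /** Called by the consumer (client drain) as buffered bytes are freed. */
    void release(long bytes) {
        lock.lock();
        try {
            queuedBytes -= bytes;
            belowLimit.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```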
This is mostly an issue with scan queries and groupBy queries, which can generate large result sets.
When implementing backpressure, it would be important to consider the effect this may have on the historical nodes. Backpressure would presumably extend all the way down to them, and in that case, it would block threads in their HTTP server pools.
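A toy model of that concern (assumptions: a two-thread fixed pool stands in for a historical's HTTP server pool, and the semaphore stands in for a broker that has temporarily stopped reading):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model only, not Druid code: while the broker is not draining results,
// each in-flight query pins one historical HTTP server thread. With a
// 2-thread pool and 3 queries, the third query cannot even start until
// backpressure is relieved.
public class PoolSaturationDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService httpServerPool = Executors.newFixedThreadPool(2);
        Semaphore brokerReading = new Semaphore(0); // 0 permits: backpressure in effect

        for (int i = 0; i < 3; i++) {
            final int query = i;
            httpServerPool.submit(() -> {
                System.out.println("query " + query + " started writing results");
                brokerReading.acquireUninterruptibly(); // blocks the server thread
                System.out.println("query " + query + " finished");
            });
        }

        TimeUnit.SECONDS.sleep(1); // only queries 0 and 1 will have started
        brokerReading.release(3);  // broker resumes reading; all queries finish
        httpServerPool.shutdown();
        httpServerPool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```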
Related: #4229, which introduced maxScatterGatherBytes as a kind of workaround to prevent broker crashes (although it will fail queries).
Also related: #4865, a symptom.