
Make max slice size in ORC slice reader configurable #24202

Merged: 1 commit merged into prestodb:master on Dec 6, 2024

Conversation

@sdruzkin (Collaborator) commented Dec 5, 2024

Description

The ORC slice reader has a hardcoded max slice size of 1GB and throws when one attempts to read a slice larger than that. Make the limit configurable so the threshold can be increased in Spark for some failing jobs.

The new value is plumbed through OrcReaderOptions -> OrcRecordReaderOptions -> OrcReader -> SelectiveReaderContext -> SliceDirectSelectiveStreamReader.
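
As a rough illustration only (the class and method names below are simplified placeholders, not the actual Presto classes), the plumbing amounts to carrying one extra value through the options and context objects and checking it in the stream reader:

    // Minimal sketch; all names are hypothetical stand-ins for the real
    // OrcReaderOptions / SelectiveReaderContext / SliceDirectSelectiveStreamReader.
    class OrcReaderOptionsSketch
    {
        private final long maxSliceSizeBytes;

        OrcReaderOptionsSketch(long maxSliceSizeBytes)
        {
            this.maxSliceSizeBytes = maxSliceSizeBytes;
        }

        long getMaxSliceSizeBytes()
        {
            return maxSliceSizeBytes;
        }
    }

    // Context handed to the selective stream readers; carries the threshold down.
    class SelectiveReaderContextSketch
    {
        private final long maxSliceSizeBytes;

        SelectiveReaderContextSketch(OrcReaderOptionsSketch options)
        {
            this.maxSliceSizeBytes = options.getMaxSliceSizeBytes();
        }

        long getMaxSliceSizeBytes()
        {
            return maxSliceSizeBytes;
        }
    }

    // The stream reader compares the requested slice size against the configured
    // limit instead of a hardcoded 1GB constant, and throws when it is exceeded.
    class SliceSizeGuardSketch
    {
        static void checkSliceSize(long requestedBytes, SelectiveReaderContextSketch context)
        {
            if (requestedBytes > context.getMaxSliceSizeBytes()) {
                throw new IllegalStateException(
                        "Slice of " + requestedBytes + " bytes exceeds max of "
                                + context.getMaxSliceSizeBytes() + " bytes");
            }
        }
    }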

Motivation and Context

The hardcoded value of 1GB is too low for some files and needs to be increased to accommodate them.

Impact

No impact.

Test Plan

Existing and new unit tests.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@sdruzkin sdruzkin requested a review from a team as a code owner December 5, 2024 04:23
@sdruzkin sdruzkin requested a review from presto-oss December 5, 2024 04:23
@facebook-github-bot (Collaborator)

This pull request was exported from Phabricator. Differential Revision: D66800897

sdruzkin added a commit to sdruzkin/presto that referenced this pull request Dec 5, 2024
Summary:

Make max slice size in ORC slice reader configurable to be able to increase the threshold in Spark for failing Data Mine jobs.

Differential Revision: D66800897

@rschlussel (Contributor)

why do we have this limit in the first place? Is the problem that we aren't reserving the memory for the slice before we read?

@steveburnett (Contributor)

If this is a new configuration (or session) property, please add documentation in the appropriate pages of the Presto doc:

Presto Session Properties
Presto Configuration Properties
Presto C++ Session Properties
Presto C++ Configuration Properties

@sdruzkin (Collaborator, Author) commented Dec 6, 2024

why do we have this limit in the first place? Is the problem that we aren't reserving the memory for the slice before we read?

I guess so; the ORC memory context does not do any good with memory reservation. The limit was added sometime before 2021-2022, when a typical cluster had under 30GB of total memory and very little headroom.

If this is a new configuration (or session) property, please add documentation in the appropriate pages of the Presto doc:

This setting cannot be configured through the session or cluster properties.
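
Since the threshold is not exposed as a session or configuration property, it would have to be raised in code where the reader options are built. Continuing the hypothetical sketch from the description above (again, placeholder names, not the actual Presto API):

    // Hypothetical wiring: raise the limit to 2GB when constructing the reader
    // options, so a slice above the old 1GB default no longer makes the guard throw.
    class ConfigureMaxSliceSizeSketch
    {
        public static void main(String[] args)
        {
            OrcReaderOptionsSketch options = new OrcReaderOptionsSketch(2L * 1024 * 1024 * 1024);
            SelectiveReaderContextSketch context = new SelectiveReaderContextSketch(options);

            // A 1.5GB slice would have thrown with the hardcoded 1GB limit; it passes now.
            SliceSizeGuardSketch.checkSliceSize(1_500_000_000L, context);
        }
    }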

@sdruzkin sdruzkin merged commit 7f1bae2 into prestodb:master Dec 6, 2024
58 checks passed