Make max slice size in ORC slice reader configurable #24202
Conversation
This pull request was exported from Phabricator. Differential Revision: D66800897
Why do we have this limit in the first place? Is the problem that we aren't reserving the memory for the slice before we read?
If this is a new configuration (or session) property, please add documentation in the appropriate pages of the Presto doc: Presto Session Properties
I guess so; the ORC memory context does not do anything useful for memory reservation. The limit was added sometime before 2021-2022, when a typical cluster had under 30GB of total memory and very little headroom.
This setting cannot be configured through the session or cluster properties.
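Since the threshold is not exposed as a session or cluster property, it would have to be set where the reader options are constructed. A minimal usage sketch, assuming a hypothetical `withMaxSliceSize` builder method modeled on the existing `OrcReaderOptions` builder setters (the method name is not confirmed by this patch):

```java
import io.airlift.units.DataSize;
import static io.airlift.units.DataSize.Unit.GIGABYTE;

// Hypothetical: withMaxSliceSize is an assumed builder method, mirroring
// existing OrcReaderOptions setters such as withMaxMergeDistance.
OrcReaderOptions options = OrcReaderOptions.builder()
        .withMaxSliceSize(new DataSize(2, GIGABYTE)) // raise the old 1GB ceiling to 2GB
        .build();
```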
Description
The ORC slice reader has a hardcoded max slice size of 1GB and throws when one attempts to read a slice larger than that. Make the limit configurable so the threshold can be increased in Spark for some failing jobs.
Plumbed the new value through OrcReaderOptions -> OrcRecordReaderOptions -> OrcReader -> SelectiveReaderContext -> SliceDirectSelectiveStreamReader.
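As a rough illustration of the end of that chain, here is a simplified, self-contained sketch of how the guard in SliceDirectSelectiveStreamReader changes once the limit comes from the options instead of a constant (names and exception type are simplified stand-ins; the real patch threads the value through the classes listed above rather than one class):

```java
// Simplified sketch of the configurable limit; not the actual Presto classes.
public final class MaxSliceSizeSketch
{
    // The previous behavior: a hardcoded 1GB ceiling.
    private static final long DEFAULT_MAX_SLICE_SIZE_BYTES = 1L << 30;

    private final long maxSliceSizeBytes;

    public MaxSliceSizeSketch(long maxSliceSizeBytes)
    {
        this.maxSliceSizeBytes = maxSliceSizeBytes;
    }

    // Stand-in for the guard in the slice reader: compare against the
    // configured value instead of the hardcoded constant.
    void checkTotalLength(long totalLengthBytes)
    {
        if (totalLengthBytes > maxSliceSizeBytes) {
            throw new IllegalStateException(
                    "Total length of slices " + totalLengthBytes
                            + " exceeds configured maximum " + maxSliceSizeBytes);
        }
    }

    public static void main(String[] args)
    {
        // A 1.5GB batch of slices fails under the old default but passes at 2GB.
        MaxSliceSizeSketch oldLimit = new MaxSliceSizeSketch(DEFAULT_MAX_SLICE_SIZE_BYTES);
        MaxSliceSizeSketch raised = new MaxSliceSizeSketch(2L * (1L << 30));
        raised.checkTotalLength(1_500_000_000L); // passes with the raised threshold
        try {
            oldLimit.checkTotalLength(1_500_000_000L);
        }
        catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```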
Motivation and Context
The hardcoded value of 1GB is too low for some files and needs to be increased to accommodate them.
Impact
No impact.
Test Plan
Existing and new unit tests.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.