[SQL] More aggressive defaults #3064
marmbrus commented on Nov 3, 2014
- Turns on compression for in-memory cached data by default.
- Changes the default Parquet compression format back to gzip (we have seen more OOMs with production workloads due to the way Snappy allocates memory).
- Ups the batch size to 10,000 rows.
- Increases the broadcast threshold to 10 MB.
- Uses our Parquet implementation instead of the Hive one by default.
- Caches Parquet metadata by default.
/cc @liancheng @mateiz
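For readers who want to pin these values explicitly rather than rely on the defaults, the list above can be sketched as a set of key/value settings. This is only an illustration: the configuration key names are assumptions based on the Spark 1.2-era `SQLConf` and are not quoted anywhere in this PR, the `AggressiveDefaults` object is hypothetical, and reading "10mb" as 10 * 1024 * 1024 bytes is also an assumption.

```scala
// Hypothetical helper, not part of this PR. The key names below are
// assumptions (Spark 1.2-era SQLConf names); check them against the
// Spark version you actually run.
object AggressiveDefaults {
  val settings: Map[String, String] = Map(
    // Compress in-memory cached columnar data.
    "spark.sql.inMemoryColumnarStorage.compressed" -> "true",
    // Write Parquet files with gzip instead of Snappy.
    "spark.sql.parquet.compression.codec" -> "gzip",
    // Rows per batch in the in-memory columnar cache.
    "spark.sql.inMemoryColumnarStorage.batchSize" -> "10000",
    // Broadcast-join threshold in bytes (assumed: 10 MB = 10 * 1024 * 1024).
    "spark.sql.autoBroadcastJoinThreshold" -> (10 * 1024 * 1024).toString,
    // Use Spark SQL's Parquet support for Hive metastore Parquet tables.
    "spark.sql.hive.convertMetastoreParquet" -> "true",
    // Cache Parquet schema metadata.
    "spark.sql.parquet.cacheMetadata" -> "true"
  )

  // Prints the settings in key=value form, one per line.
  def main(args: Array[String]): Unit =
    settings.foreach { case (key, value) => println(s"$key=$value") }
}
```

With a `SQLContext` in scope, the same map could be applied with something like `AggressiveDefaults.settings.foreach { case (k, v) => sqlContext.setConf(k, v) }`; to get the previous behavior back, set the individual keys to their old values instead.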
@@ -109,7 +109,7 @@ private[sql] trait SQLConf {
 * Hive setting: hive.auto.convert.join.noconditionaltask.size, whose default value is also 10000.
nit: comment seems slightly out of date now
Test build #22781 has started for PR 3064 at commit
Test FAILed.
Test FAILed.
Test build #506 has started for PR 3064 at commit
Test build #506 has finished for PR 3064 at commit
Test build #22781 has finished for PR 3064 at commit
Test PASSed.
This LGTM. (Why does MiMA complain about interfaces introduced by the foreign data source API here?)
Thanks for looking at this! I don't think that's MIMA, as that would actually fail the build (and is
- Turns on compression for in-memory cached data by default
- Changes the default parquet compression format back to gzip (we have seen more OOMs with production workloads due to the way Snappy allocates memory)
- Ups the batch size to 10,000 rows
- Increases the broadcast threshold to 10mb.
- Uses our parquet implementation instead of the hive one by default.
- Cache parquet metadata by default.

Author: Michael Armbrust <[email protected]>

Closes #3064 from marmbrus/fasterDefaults and squashes the following commits:

97ee9f8 [Michael Armbrust] parquet codec docs
e641694 [Michael Armbrust] Remote also
a12866a [Michael Armbrust] Cache metadata.
2d73acc [Michael Armbrust] Update docs defaults.
d63d2d5 [Michael Armbrust] document parquet option
da373f9 [Michael Armbrust] More aggressive defaults

(cherry picked from commit 25bef7e)
Signed-off-by: Michael Armbrust <[email protected]>