-
Notifications
You must be signed in to change notification settings - Fork 28.5k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling
This patch will try to infer schema for RDD which has empty value (None, [], {}) in the first row. It will try first 100 rows and merge the types into schema, also merge fields of StructType together. If there is still NullType in schema, then it will show an warning, tell user to try with sampling. If sampling is presented, it will infer schema from all the rows after sampling. Also, add samplingRatio for jsonFile() and jsonRDD() Author: Davies Liu <[email protected]> Author: Davies Liu <[email protected]> Closes #2716 from davies/infer and squashes the following commits: e678f6d [Davies Liu] Merge branch 'master' of github.com:apache/spark into infer 34b5c63 [Davies Liu] Merge branch 'master' of github.com:apache/spark into infer 567dc60 [Davies Liu] update docs 9767b27 [Davies Liu] Merge branch 'master' into infer e48d7fb [Davies Liu] fix tests 29e94d5 [Davies Liu] let NullType inherit from PrimitiveType ee5d524 [Davies Liu] Merge branch 'master' of github.com:apache/spark into infer 540d1d5 [Davies Liu] merge fields for StructType f93fd84 [Davies Liu] add more tests 3603e00 [Davies Liu] take more rows to infer schema, or infer the schema by sampling the RDD (cherry picked from commit 24544fb) Signed-off-by: Michael Armbrust <[email protected]>
- Loading branch information
Showing
3 changed files
with
148 additions
and
69 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters