
Spark 3.5: Adapt PlanningBenchmark for DVs #11531

Merged (1 commit) · Nov 15, 2024

Conversation

@aokolnychyi (Contributor) commented Nov 12, 2024

This PR adapts our PlanningBenchmark for DVs.

Benchmark                                                                          (type)  Mode  Cnt            Score              Error   Units

PlanningBenchmark.distributedPlanningWithMinMaxFilter                           partition    ss    5            1.415 ±            0.553    s/op
PlanningBenchmark.distributedPlanningWithMinMaxFilter                                file    ss    5            5.000 ±            1.021    s/op
PlanningBenchmark.distributedPlanningWithMinMaxFilter                                  dv    ss    5            2.312 ±            0.307    s/op

PlanningBenchmark.distributedPlanningWithoutFilter                              partition    ss    5            2.147 ±            1.272    s/op
PlanningBenchmark.distributedPlanningWithoutFilter                                   file    ss    5            6.452 ±            2.327    s/op
PlanningBenchmark.distributedPlanningWithoutFilter                                     dv    ss    5            2.863 ±            0.590    s/op

PlanningBenchmark.distributedPlanningWithoutFilterWithStats                     partition    ss    5           49.325 ±            7.451    s/op
PlanningBenchmark.distributedPlanningWithoutFilterWithStats                          file    ss    5           54.895 ±           12.604    s/op
PlanningBenchmark.distributedPlanningWithoutFilterWithStats                            dv    ss    5           50.448 ±            3.760    s/op

PlanningBenchmark.localPlanningWithMinMaxFilter                                 partition    ss    5            1.313 ±            0.645    s/op
PlanningBenchmark.localPlanningWithMinMaxFilter                                      file    ss    5            3.546 ±            2.172    s/op
PlanningBenchmark.localPlanningWithMinMaxFilter                                        dv    ss    5            2.421 ±            1.126    s/op

PlanningBenchmark.localPlanningWithoutFilter                                    partition    ss    5            3.163 ±            0.283    s/op
PlanningBenchmark.localPlanningWithoutFilter                                         file    ss    5            5.455 ±            0.859    s/op
PlanningBenchmark.localPlanningWithoutFilter                                           dv    ss    5            4.296 ±            0.415    s/op

PlanningBenchmark.localPlanningWithoutFilterWithStats                           partition    ss    5           10.992 ±            3.095    s/op
PlanningBenchmark.localPlanningWithoutFilterWithStats                                file    ss    5           19.860 ±           23.286    s/op
PlanningBenchmark.localPlanningWithoutFilterWithStats                                  dv    ss    5           11.543 ±            6.390    s/op

The benchmark is set up so that there is much more delete metadata to read with file-scoped deletes and DVs than with partition-scoped deletes, which is why those cases are slower. DVs plan faster than file-scoped deletes because they produce less garbage and trigger fewer GC operations.
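To put the numbers in perspective, here is a small standalone Python snippet (not part of the PR) that derives the file-scoped-vs-DV planning speedup from the JMH scores (s/op) in the table above:

```python
# Scores copied from the benchmark table above (column: Score, units: s/op).
scores = {
    "distributedPlanningWithMinMaxFilter":       {"file": 5.000,  "dv": 2.312},
    "distributedPlanningWithoutFilter":          {"file": 6.452,  "dv": 2.863},
    "distributedPlanningWithoutFilterWithStats": {"file": 54.895, "dv": 50.448},
    "localPlanningWithMinMaxFilter":             {"file": 3.546,  "dv": 2.421},
    "localPlanningWithoutFilter":                {"file": 5.455,  "dv": 4.296},
    "localPlanningWithoutFilterWithStats":       {"file": 19.860, "dv": 11.543},
}

for name, s in scores.items():
    # Ratio > 1 means DV planning is faster than file-scoped delete planning.
    speedup = s["file"] / s["dv"]
    print(f"{name}: {speedup:.2f}x")
```

Across these runs, DV planning ranges from roughly on par (the with-stats cases, where statistics reading dominates) to over 2x faster than file-scoped deletes, though note the reported error bars are wide for several benchmarks.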

This work is part of #11122.

@github-actions bot added the spark label on Nov 12, 2024
@jbonofre (Member) left a comment:

LGTM, thanks ! I will run benchmark on my machine (curiosity 😄 ).

@aokolnychyi aokolnychyi merged commit 7e4fd1b into apache:main Nov 15, 2024
31 checks passed
@aokolnychyi (Contributor, Author):

Thanks, @jbonofre @nastra!

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024