Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49544][CONNECT] Replace coarse-locking in SparkConnectExecutionManager with ConcurrentMap #48034

Closed
wants to merge 3 commits into from

Conversation

changgyoopark-db
Copy link
Contributor

What changes were proposed in this pull request?

Replace the coarse-locking mechanism implemented in SparkConnectExecutionManager with ConcurrentMap in order to ameliorate lock contention.

Why are the changes needed?

When there are too many threads, e.g., ~10K threads on a 4-core node, OS scheduling may cause priority inversion that leads to a serious performance problems, e.g., a 1000s delay when reattaching to an execute holder.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@changgyoopark-db changgyoopark-db changed the title [SPARK-49544] Replace coarse-locking in SparkConnectExecutionManager with ConcurrentMap [WIP][SPARK-49544][CONNECT] Replace coarse-locking in SparkConnectExecutionManager with ConcurrentMap Sep 9, 2024
@changgyoopark-db changgyoopark-db changed the title [WIP][SPARK-49544][CONNECT] Replace coarse-locking in SparkConnectExecutionManager with ConcurrentMap [SPARK-49544][CONNECT] Replace coarse-locking in SparkConnectExecutionManager with ConcurrentMap Sep 9, 2024
@changgyoopark-db
Copy link
Contributor Author

@juliuszsompolski Can you please review this PR too? Thanks!

Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussions. It looks good to me.

@changgyoopark-db
Copy link
Contributor Author

@hvanhovell @HyukjinKwon Hello Herman and Hyukjin, would you mind merging this PR? Thanks!

@changgyoopark-db
Copy link
Contributor Author

@grundprinzip Hey Martin, can you review and merge this change? Thanks!

HyukjinKwon pushed a commit that referenced this pull request Sep 11, 2024
…anager with ConcurrentMap

### What changes were proposed in this pull request?

Replace the coarse-locking in SparkConnectSessionManager with ConcurrentMap in order to minimise lock contention when there are many sessions.

### Why are the changes needed?

It is a spin-off from #48034 where #48034 addresses many-execution cases whereas this addresses many-session situations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48036 from changgyoopark-db/SPARK-49548.

Authored-by: Changgyoo Park <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@HyukjinKwon
Copy link
Member

Merged to master.

@changgyoopark-db changgyoopark-db deleted the SPARK-49544 branch September 19, 2024 06:13
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…anager with ConcurrentMap

### What changes were proposed in this pull request?

Replace the coarse-locking in SparkConnectSessionManager with ConcurrentMap in order to minimise lock contention when there are many sessions.

### Why are the changes needed?

It is a spin-off from apache#48034 where apache#48034 addresses many-execution cases whereas this addresses many-session situations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48036 from changgyoopark-db/SPARK-49548.

Authored-by: Changgyoo Park <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…nManager with ConcurrentMap

### What changes were proposed in this pull request?

Replace the coarse-locking mechanism implemented in SparkConnectExecutionManager with ConcurrentMap in order to ameliorate lock contention.

### Why are the changes needed?

When there are too many threads, e.g., ~10K threads on a 4-core node, OS scheduling may cause priority inversion that leads to a serious performance problems, e.g., a 1000s delay when reattaching to an execute holder.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48034 from changgyoopark-db/SPARK-49544.

Authored-by: Changgyoo Park <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…anager with ConcurrentMap

### What changes were proposed in this pull request?

Replace the coarse-locking in SparkConnectSessionManager with ConcurrentMap in order to minimise lock contention when there are many sessions.

### Why are the changes needed?

It is a spin-off from apache#48034 where apache#48034 addresses many-execution cases whereas this addresses many-session situations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48036 from changgyoopark-db/SPARK-49548.

Authored-by: Changgyoo Park <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
…nManager with ConcurrentMap

### What changes were proposed in this pull request?

Replace the coarse-locking mechanism implemented in SparkConnectExecutionManager with ConcurrentMap in order to ameliorate lock contention.

### Why are the changes needed?

When there are too many threads, e.g., ~10K threads on a 4-core node, OS scheduling may cause priority inversion that leads to a serious performance problems, e.g., a 1000s delay when reattaching to an execute holder.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing test cases.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#48034 from changgyoopark-db/SPARK-49544.

Authored-by: Changgyoo Park <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants