[SPARK-23186][SQL] Initialize DriverManager first before loading JDBC Drivers #20359

dongjoon-hyun · 2018-01-23T05:03:34Z

What changes were proposed in this pull request?

Some JDBC drivers call DriverManager from their class initialization code, so we need to initialize DriverManager before loading them to avoid potential executor-side deadlocks like the following (see also STORM-2527).

Thread 9587: (state = BLOCKED)
 - sun.reflect.NativeConstructorAccessorImpl.newInstance0(java.lang.reflect.Constructor, java.lang.Object[]) @bci=0 (Compiled frame; information may be imprecise)
 - sun.reflect.NativeConstructorAccessorImpl.newInstance(java.lang.Object[]) @bci=85, line=62 (Compiled frame)
 - sun.reflect.DelegatingConstructorAccessorImpl.newInstance(java.lang.Object[]) @bci=5, line=45 (Compiled frame)
 - java.lang.reflect.Constructor.newInstance(java.lang.Object[]) @bci=79, line=423 (Compiled frame)
 - java.lang.Class.newInstance() @bci=138, line=442 (Compiled frame)
 - java.util.ServiceLoader$LazyIterator.nextService() @bci=119, line=380 (Interpreted frame)
 - java.util.ServiceLoader$LazyIterator.next() @bci=11, line=404 (Interpreted frame)
 - java.util.ServiceLoader$1.next() @bci=37, line=480 (Interpreted frame)
 - java.sql.DriverManager$2.run() @bci=21, line=603 (Interpreted frame)
 - java.sql.DriverManager$2.run() @bci=1, line=583 (Interpreted frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
 - java.sql.DriverManager.loadInitialDrivers() @bci=27, line=583 (Interpreted frame)
 - java.sql.DriverManager.<clinit>() @bci=32, line=101 (Interpreted frame)
 - org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(java.lang.String, java.lang.Integer, java.lang.String, java.util.Properties) @bci=12, line=98 (Interpreted frame)
 - org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(org.apache.hadoop.conf.Configuration, java.util.Properties) @bci=22, line=57 (Interpreted frame)
 - org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(org.apache.hadoop.mapreduce.JobContext, org.apache.hadoop.conf.Configuration) @bci=61, line=116 (Interpreted frame)
 - org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext) @bci=10, line=71 (Interpreted frame)
 - org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(org.apache.spark.rdd.NewHadoopRDD, org.apache.spark.Partition, org.apache.spark.TaskContext) @bci=233, line=156 (Interpreted frame)

Thread 9170: (state = BLOCKED)
 - org.apache.phoenix.jdbc.PhoenixDriver.<clinit>() @bci=35, line=125 (Interpreted frame)
 - sun.reflect.NativeConstructorAccessorImpl.newInstance0(java.lang.reflect.Constructor, java.lang.Object[]) @bci=0 (Compiled frame)
 - sun.reflect.NativeConstructorAccessorImpl.newInstance(java.lang.Object[]) @bci=85, line=62 (Compiled frame)
 - sun.reflect.DelegatingConstructorAccessorImpl.newInstance(java.lang.Object[]) @bci=5, line=45 (Compiled frame)
 - java.lang.reflect.Constructor.newInstance(java.lang.Object[]) @bci=79, line=423 (Compiled frame)
 - java.lang.Class.newInstance() @bci=138, line=442 (Compiled frame)
 - org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(java.lang.String) @bci=89, line=46 (Interpreted frame)
 - org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply() @bci=7, line=53 (Interpreted frame)
 - org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply() @bci=1, line=52 (Interpreted frame)
 - org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, org.apache.spark.Partition, org.apache.spark.TaskContext) @bci=81, line=347 (Interpreted frame)
 - org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(org.apache.spark.Partition, org.apache.spark.TaskContext) @bci=7, line=339 (Interpreted frame)

How was this patch tested?

N/A

SparkQA · 2018-01-23T08:05:01Z

Test build #86514 has finished for PR 20359 at commit 234a637.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-01-23T13:25:54Z

retest this please

SparkQA · 2018-01-23T16:41:41Z

Test build #86530 has finished for PR 20359 at commit 234a637.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-01-23T16:48:29Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

@@ -32,6 +32,9 @@ import org.apache.spark.util.Utils
 */
 object DriverRegistry extends Logging {

+  // Initialize DriverManager first to prevent potential deadlocks between DriverManager and Driver
+  DriverManager.getDrivers


Hi, @rxin .
Could you give me some comments on this?

Could you give more context about this change? Based on your PR description, is this meant to resolve deadlocks among executors? How does it behave after applying this change?

It's the same situation as the STORM issue in the PR description, and it occurs in Spark, too.
In the Spark executor, the stack traces show a deadlock between DriverManager and the Driver.

The Phoenix library triggers DriverManager.loadInitialDrivers() (Thread 9587).

Spark's DriverRegistry calls the PhoenixDriver constructor before DriverManager has been initialized (Thread 9170).

Unfortunately, this log is all I have so far. It's difficult to reproduce this deadlock.
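The two blocked threads can be read as a cycle of class-initialization waits. The JVM runs each class's static initializer at most once, and any other thread that touches the class while that initializer is still running blocks until it finishes. The following is a minimal Java sketch of that blocking behavior only; `ClassInitLockDemo`, `SlowInit`, and `bothThreadsSeeOneInitialization` are hypothetical names for illustration, not Spark, Phoenix, or JDK code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ClassInitLockDemo {
    // Counts how many times the static initializer below actually runs.
    static final AtomicInteger initRuns = new AtomicInteger();

    // Hypothetical class with a slow static initializer, standing in for a
    // JDBC driver (or DriverManager) whose <clinit> does real work.
    static class SlowInit {
        static final long READY_AT;
        static {
            initRuns.incrementAndGet();
            try {
                Thread.sleep(200); // hold the class-initialization lock for a while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            READY_AT = System.nanoTime();
        }
        static long readyAt() { return READY_AT; }
    }

    // Two threads touch SlowInit concurrently. One runs the initializer; the
    // other waits for it. Both then observe the same fully initialized value.
    static boolean bothThreadsSeeOneInitialization() {
        long[] seen = new long[2];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            final int id = i;
            threads[i] = new Thread(() -> seen[id] = SlowInit.readyAt());
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return initRuns.get() == 1 && seen[0] == seen[1];
    }
}
```

In the trace, Thread 9587 is inside `DriverManager.<clinit>` instantiating a driver through `ServiceLoader`, while Thread 9170 is inside `PhoenixDriver.<clinit>`. If each thread needs the class the other is still initializing, as the trace suggests, neither wait can ever end.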

How did you test it?

We need a test; otherwise, merging such a PR is not the right thing to do.

Do you mean a unit test case?

I'm wondering whether you could merge this PR if I test this patch somewhere on our customer cluster. :)

If it's too hard to write a UT, I think a manual test is also fine.

Thank you for the review, @cloud-fan.
So far, this deadlock has been reported only intermittently, and without any logs.

dongjoon-hyun · 2018-01-24T16:34:59Z

Ping, @gatorsmile and @cloud-fan .

cloud-fan · 2018-01-25T04:59:54Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala

@@ -32,6 +32,9 @@ import org.apache.spark.util.Utils
 */
 object DriverRegistry extends Logging {

+  // Initialize DriverManager first to prevent potential deadlocks between DriverManager and Driver


We need to say more about why this can avoid deadlock.

We can copy something from the Storm PR: https://github.com/apache/storm/pull/2134/files
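One way to spell out the ordering argument is the Java sketch below. It is an illustration under stated assumptions, not the patch itself: `SelfRegisteringDriver` is a hypothetical stand-in for a driver such as PhoenixDriver, and `registerSafely` is a hypothetical stand-in for `DriverRegistry.register`. Calling `DriverManager.getDrivers()` first makes `DriverManager` complete its own initialization, including its scan for service-provider drivers, before this thread starts initializing any driver class. Every thread then takes the two class-initialization locks in the same order (DriverManager, then Driver), so the driver's static initializer should not end up waiting on a half-initialized `DriverManager`.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

class DriverInitOrder {

    // Hypothetical driver: like many JDBC drivers, its static initializer
    // calls back into DriverManager to register itself.
    static class SelfRegisteringDriver implements Driver {
        static {
            try {
                DriverManager.registerDriver(new SelfRegisteringDriver());
            } catch (SQLException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
        // Returning null tells DriverManager this driver does not handle the URL.
        @Override public Connection connect(String url, Properties info) { return null; }
        @Override public boolean acceptsURL(String url) { return false; }
        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        @Override public int getMajorVersion() { return 1; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() { return Logger.getLogger("hypothetical.driver"); }
    }

    // Hypothetical helper mirroring the idea of the patch.
    static boolean registerSafely(String className) {
        // Step 1: force DriverManager to finish initializing before any driver class.
        DriverManager.getDrivers();
        // Step 2: only now load and initialize the driver class, which may
        // call DriverManager.registerDriver from its static initializer.
        try {
            Class.forName(className, true, DriverInitOrder.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            return false;
        }
        // Report whether a driver of that class is now registered.
        return Collections.list(DriverManager.getDrivers()).stream()
                .anyMatch(d -> d.getClass().getName().equals(className));
    }
}
```

The patch achieves the first step by placing the `DriverManager.getDrivers` call in the body of the `DriverRegistry` object, so it runs when that object is first initialized, before any `register` call.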

dongjoon-hyun · 2018-01-25T16:02:23Z

Thank you for the review, @cloud-fan and @gatorsmile.
So far, there is no test case that reproduces this kind of executor-side Spark job hang.
I fully understand your viewpoints. I'll try this change in our customer production environment first, after internal QE. Since this is an intermittent deadlock, we can't know for certain whether it has been eliminated, but at least we can monitor it.

SparkQA · 2018-02-03T04:54:44Z

Test build #87020 has finished for PR 20359 at commit 52e6f19.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-02-03T05:39:36Z

It's a well-known, unrelated failure.

org.apache.spark.sql.kafka010.KafkaContinuousSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false

dongjoon-hyun · 2018-02-03T05:39:44Z

Retest this please.

SparkQA · 2018-02-03T08:05:01Z

Test build #87025 has finished for PR 20359 at commit 52e6f19.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-02-03T16:37:33Z

Retest this please

SparkQA · 2018-02-03T19:49:24Z

Test build #87036 has finished for PR 20359 at commit 52e6f19.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2018-02-08T20:57:13Z

FYI, this has been tested on a production cluster for two weeks, and the deadlock has not been reported since.

dongjoon-hyun · 2018-02-08T22:17:48Z

Thank you for the review and approval, @srowen.

cloud-fan · 2018-02-09T04:55:30Z

thanks, merging to master/2.3!

… Drivers

## What changes were proposed in this pull request?

Since some JDBC Drivers have class initialization code to call `DriverManager`, we need to initialize `DriverManager` first in order to avoid potential executor-side **deadlock** situations like the one in the stack trace of the PR description above (or [STORM-2527](https://issues.apache.org/jira/browse/STORM-2527)).

## How was this patch tested?

N/A

Author: Dongjoon Hyun <[email protected]>

Closes #20359 from dongjoon-hyun/SPARK-23186.

(cherry picked from commit 8cbcc33)
Signed-off-by: Wenchen Fan <[email protected]>

dongjoon-hyun · 2018-02-09T06:05:11Z

Thank you for merging, @cloud-fan .
And thank you again, @HyukjinKwon , @gatorsmile , and @srowen !

dongjoon-hyun · 2018-02-09T19:13:12Z

@cloud-fan . Can we have this in branch-2.2, too? Although it's cherry-pickable, I'll make a PR if you want to trigger the Jenkins test.

cloud-fan · 2018-02-09T20:41:59Z

let's send a new PR

dongjoon-hyun · 2018-02-09T21:31:05Z

Thank you!

… Drivers

## What changes were proposed in this pull request?

Since some JDBC Drivers have class initialization code to call `DriverManager`, we need to initialize `DriverManager` first in order to avoid potential executor-side **deadlock** situations like the one in the stack trace of the PR description above (or [STORM-2527](https://issues.apache.org/jira/browse/STORM-2527)).

## How was this patch tested?

N/A

Author: Dongjoon Hyun <[email protected]>

Closes apache#20359 from dongjoon-hyun/SPARK-23186.

[SPARK-23186][SQL] Initialize DriverManager first before loading Drivers

234a637

cloud-fan reviewed Jan 25, 2018

Update comments.

52e6f19

srowen approved these changes Feb 8, 2018

asfgit closed this in 8cbcc33 Feb 9, 2018

dongjoon-hyun deleted the SPARK-23186 branch February 9, 2018 06:05
