
[#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed in some environment #6432

Merged
merged 3 commits into from
Feb 27, 2025

Conversation

FANNG1
Contributor

@FANNG1 FANNG1 commented Feb 10, 2025

What changes were proposed in this pull request?

I'm not sure of the root cause; it seems the MySQL JDBC driver was not loaded automatically under some conditions. This PR loads the MySQL driver explicitly.

Why are the changes needed?

Fix: #6392

Does this PR introduce any user-facing change?

no

How was this patch tested?

Tested on a local machine.
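The fix relies on the standard JDBC self-registration mechanism: loading a driver class runs its static initializer, which registers it with DriverManager under the loading classloader. A minimal self-contained sketch of that mechanism, using a hypothetical ToyDriver in place of com.mysql.jdbc.Driver (which would require the MySQL connector on the classpath):

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical stand-in for com.mysql.jdbc.Driver: like real JDBC drivers, it
// registers itself with DriverManager from its static initializer, which runs
// when Class.forName(...) loads the class.
public class ToyDriver implements Driver {
  static {
    try {
      DriverManager.registerDriver(new ToyDriver());
    } catch (SQLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  @Override public Connection connect(String url, Properties info) { return null; } // toy only
  @Override public boolean acceptsURL(String url) { return url != null && url.startsWith("jdbc:toy:"); }
  @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) { return new DriverPropertyInfo[0]; }
  @Override public int getMajorVersion() { return 1; }
  @Override public int getMinorVersion() { return 0; }
  @Override public boolean jdbcCompliant() { return false; }
  @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    throw new SQLFeatureNotSupportedException();
  }

  public static void main(String[] args) throws Exception {
    // The PR calls Class.forName(...) from test setup code to force loading
    // and registration; here the static block already ran when this class
    // itself was loaded, so the driver is findable.
    Class.forName(ToyDriver.class.getName());
    Driver d = DriverManager.getDriver("jdbc:toy:mem");
    System.out.println(d instanceof ToyDriver); // true
  }
}
```

The key point is which classloader runs the static initializer: the driver is only visible to DriverManager callers whose classloader can produce the same Class object.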

@FANNG1 FANNG1 changed the title fix comment [#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed Feb 10, 2025
@FANNG1 FANNG1 changed the title [#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed [#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed in some enviroment Feb 10, 2025
@FANNG1
Contributor Author

FANNG1 commented Feb 10, 2025

@jerryshao @shaofengshi PTAL.

// Fix https://github.com/apache/gravitino/issues/6392: the MySQL JDBC driver may not load
// automatically.
try {
  Class.forName("com.mysql.jdbc.Driver");
} catch (ClassNotFoundException e) {
  throw new RuntimeException("Failed to load the MySQL JDBC driver", e);
}
Contributor

Why do we need to load the class explicitly? Can you provide more detail about it?

Contributor Author

I'm not sure of the root cause; maybe the JDBC driver was loaded by another classloader.

Contributor

@jerryshao jerryshao Feb 14, 2025

It would be better to figure out the root cause before the fix.

Contributor

@yuqi1129 yuqi1129 Feb 25, 2025

@FANNG1 @jerryshao

The root cause is as follows:

When SparkIcebergCatalogRestBackendIT33 executes Spark SQL, it uses the IsolatedClientLoader (which isolates Hive dependencies; its parent is the root classloader) to run part of the code. This causes the static initializer (loadInitialDrivers) of java.sql.DriverManager to be triggered from the IsolatedClientLoader, so the driver classloader recorded in the DriverInfo registered with DriverManager is the IsolatedClientLoader.

SparkIcebergCatalogRestBackendIT33 and SparkJdbcMysqlCatalogIT33 run in the same JVM. Since the static initializer of a class loaded by the root classloader runs only once, executing SparkJdbcMysqlCatalogIT33 afterward fails because of the wrong driver classloader.
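The mismatch can be reproduced without Spark or MySQL. A minimal sketch, assuming only the JDK: a child-first classloader (standing in for IsolatedClientLoader) defines its own copy of a class, so Class.forName against the app classloader yields a different Class object, exactly the comparison that DriverManager's isDriverAllowed performs:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Standalone sketch (no Spark, no MySQL) of the classloader mismatch.
public class IsolationDemo {

  public static class Payload {} // stands in for com.mysql.jdbc.Driver

  // Child-first for Payload: defines its own copy instead of delegating to the
  // parent, just as IsolatedClientLoader isolates its classes.
  public static class IsolatedLoader extends ClassLoader {
    public IsolatedLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
      if (name.equals(Payload.class.getName())) {
        try (InputStream in = getResourceAsStream(name.replace('.', '/') + ".class")) {
          ByteArrayOutputStream out = new ByteArrayOutputStream();
          byte[] buf = new byte[4096];
          for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
          byte[] bytes = out.toByteArray();
          return defineClass(name, bytes, 0, bytes.length);
        } catch (Exception e) {
          throw new ClassNotFoundException(name, e);
        }
      }
      return super.loadClass(name, resolve);
    }
  }

  public static void main(String[] args) throws Exception {
    ClassLoader app = IsolationDemo.class.getClassLoader();
    Class<?> isolated = new IsolatedLoader(app).loadClass(Payload.class.getName());
    Class<?> viaApp = Class.forName(Payload.class.getName(), true, app);

    // Same class name, different Class objects: this is why isDriverAllowed
    // rejects a driver registered under IsolatedClientLoader when the caller
    // resolves classes through the app classloader.
    System.out.println(isolated == viaApp);                          // false
    System.out.println(isolated.getName().equals(viaApp.getName())); // true
  }
}
```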

@CallerSensitive
public static Driver getDriver(String url) throws SQLException {

    println("DriverManager.getDriver(\"" + url + "\")");

    Class<?> callerClass = Reflection.getCallerClass();

    // Walk through the loaded registeredDrivers attempting to locate someone
    // who understands the given URL.
    for (DriverInfo aDriver : registeredDrivers) {
        // If the caller does not have permission to load the driver then
        // skip it.

        // A driver loaded by IsolatedClientLoader will fail here.
        if (isDriverAllowed(aDriver.driver, callerClass)) {
            try {
                if (aDriver.driver.acceptsURL(url)) {
                    // Success!
                    println("getDriver returning " + aDriver.driver.getClass().getName());
                    return (aDriver.driver);
                }
            } catch (SQLException sqe) {
                // Drop through and try the next driver.
            }
        } else {
            println("    skipping: " + aDriver.driver.getClass().getName());
        }
    }

    println("getDriver: no suitable driver");
    throw new SQLException("No suitable driver", "08001");
}

private static boolean isDriverAllowed(Driver driver, ClassLoader classLoader) {
    boolean result = false;
    if (driver != null) {
        Class<?> aClass = null;
        try {
            aClass = Class.forName(driver.getClass().getName(), true, classLoader);
        } catch (Exception ex) {
            result = false;
        }

        // Here aClass is loaded by the AppClassLoader while driver.getClass() was
        // loaded by IsolatedClientLoader; that is why the driver does exist but
        // cannot be fetched successfully.
        result = (aClass == driver.getClass()) ? true : false;
    }

    return result;
}
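When debugging this kind of failure, it can help to print each registered driver and its classloader. A small sketch, with the caveat noted in the comment:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class DriverAudit {
  public static void main(String[] args) {
    // Note: getDrivers() applies the same caller-classloader filter as
    // getDriver(), so a driver registered under an isolated classloader may
    // not even appear in this listing from app-classloader code.
    Enumeration<Driver> drivers = DriverManager.getDrivers();
    while (drivers.hasMoreElements()) {
      Driver d = drivers.nextElement();
      System.out.println(d.getClass().getName() + " loaded by " + d.getClass().getClassLoader());
    }
  }
}
```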

The order of these unit tests matters: if SparkJdbcMysqlCatalogIT33 is executed first, there is no problem, which is why this issue is not consistently reproducible.

Solutions:

  • Re-register the driver via Class.forName("com.mysql.jdbc.Driver") so that it is loaded by the app classloader.
  • Ensure that SparkJdbcMysqlCatalogIT33 is always executed first in the module.
  • Run each test class in a separate JVM. In Gradle this can be achieved with:
test {
    forkEvery = 1         // fork a new JVM for each test class
    maxParallelForks = 1
}

Contributor

Do we have a similar issue in real-world scenarios, or does it only happen in the UT? Why do we only meet this issue in the Spark 3.3 test, not in the other Spark module tests, and why did it not happen previously?

Contributor Author

Spark loads the JDBC driver explicitly in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala#L46-L62, so it seems OK to use Spark SQL to query JDBC data; I tested this in my local environment.

Contributor

It is the classloader that loads the Driver that matters. If the classloader that loads the Driver is the same as the one that uses it, there should be no problem.

Contributor Author

Yes, Spark uses the AppClassLoader, not the IsolatedClientLoader, to load the corresponding JDBC driver explicitly.
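One way to make a driver usable regardless of which classloader produced it, in the spirit of Spark's DriverRegistry/DriverWrapper approach, is to register a wrapper that the app classloader itself loaded and have it delegate to the real driver. A hedged sketch (class and method names here are illustrative, not Spark's actual code):

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Because DriverShim itself is loaded by the app classloader, DriverManager's
// isDriverAllowed check passes for app-classloader callers, even if the
// wrapped driver instance came from an isolated classloader.
public class DriverShim implements Driver {
  private final Driver delegate;

  public DriverShim(Driver delegate) { this.delegate = delegate; }

  @Override public Connection connect(String url, Properties info) throws SQLException {
    return delegate.connect(url, info);
  }
  @Override public boolean acceptsURL(String url) throws SQLException {
    return delegate.acceptsURL(url);
  }
  @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) throws SQLException {
    return delegate.getPropertyInfo(url, info);
  }
  @Override public int getMajorVersion() { return delegate.getMajorVersion(); }
  @Override public int getMinorVersion() { return delegate.getMinorVersion(); }
  @Override public boolean jdbcCompliant() { return delegate.jdbcCompliant(); }
  @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    return delegate.getParentLogger();
  }

  // Hypothetical helper: instantiate the driver with a chosen classloader and
  // register a shim for it with DriverManager.
  public static void register(String driverClassName, ClassLoader loader) throws Exception {
    Driver d = (Driver) Class.forName(driverClassName, true, loader)
        .getDeclaredConstructor().newInstance();
    DriverManager.registerDriver(new DriverShim(d));
  }
}
```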

Contributor Author

// These steps load JDBC drivers in `IsolatedClientLoader`, and those drivers can't be reused from the app classloader
1. getSparkSession().sql(query).collectAsList();
// Get a MySQL connection in the app classloader
2. DriverManager.getConnection()

I guess this would report an error, but the pattern is seldom used and is outside the scope of Gravitino. It mirrors why the failure happened: SparkIcebergCatalogRestBackendIT33 loads the driver in the IsolatedClientLoader, and SparkJdbcMysqlCatalogIT33 then fails to get a MySQL connection because the driver is missing. The Spark JDBC catalog loads the corresponding JDBC driver explicitly to make sure the driver is loaded when executing Spark SQL.

Contributor

I see, that is acceptable.

@yuqi1129 yuqi1129 left a comment

LGTM

@yuqi1129 yuqi1129 added the branch-0.8 Automatically cherry-pick commit to branch-0.8 label Feb 27, 2025
@yuqi1129 yuqi1129 merged commit b8a2349 into apache:main Feb 27, 2025
28 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 27, 2025
Labels
branch-0.8 Automatically cherry-pick commit to branch-0.8
Development

Successfully merging this pull request may close these issues.

[Bug report] SparkJdbcMysqlCatalogIT33 failed
3 participants