[#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed in some environment #6432
Conversation
@jerryshao @shaofengshi PTAL.

// Fix https://github.com/apache/gravitino/issues/6392, MYSQL JDBC driver may not load
// automatically.
try {
  Class.forName("com.mysql.jdbc.Driver");
Why do we need to load the class explicitly? Can you provide more detail about it?
I'm not sure of the root cause; maybe the JDBC driver was loaded by another classloader.
It would be better to figure out the root cause before the fix.
The root cause is as follows:

When SparkIcebergCatalogRestBackendIT33 executes Spark SQL, it enables the IsolatedClientLoader (used for isolating Hive dependencies; the parent of this classloader is the root classloader) to execute part of the code. This causes the static block (loadInitialDrivers) of java.sql.DriverManager to be triggered under the IsolatedClientLoader, so the driver's classloader recorded in the DriverInfo registered with DriverManager is IsolatedClientLoader.

SparkIcebergCatalogRestBackendIT33 and SparkJdbcMysqlCatalogIT33 run in the same JVM. Since the static block of a class loaded by the root classloader is initialized only once, executing SparkJdbcMysqlCatalogIT33 afterward fails because of the incorrect driver classloader.
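To make the mechanism concrete, here is a minimal standalone sketch of the mismatch. It approximates IsolatedClientLoader with a plain URLClassLoader whose parent bypasses the application classloader; the jar path is a placeholder:

import java.net.URL;
import java.net.URLClassLoader;
import java.sql.DriverManager;
import java.sql.SQLException;

public class IsolatedDriverRepro {
  public static void main(String[] args) throws Exception {
    // Approximates IsolatedClientLoader: its parent skips the app classloader,
    // so classes it defines are invisible to application code.
    ClassLoader isolated = new URLClassLoader(
        new URL[] {new URL("file:/path/to/mysql-connector.jar")}, // placeholder path
        ClassLoader.getSystemClassLoader().getParent());

    // Initializing the driver class runs its static block, which calls
    // DriverManager.registerDriver(...); the DriverInfo now records a driver
    // whose class was defined by `isolated`.
    Class.forName("com.mysql.jdbc.Driver", true, isolated);

    try {
      // This caller was loaded by the app classloader, so isDriverAllowed()
      // re-resolves the driver class through that loader, ends up with a
      // different (or no) Class object, and skips the registered driver.
      DriverManager.getDriver("jdbc:mysql://localhost:3306/test");
    } catch (SQLException e) {
      System.out.println(e.getMessage()); // "No suitable driver"
    }
  }
}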
@CallerSensitive
public static Driver getDriver(String url)
    throws SQLException {

    println("DriverManager.getDriver(\"" + url + "\")");

    Class<?> callerClass = Reflection.getCallerClass();

    // Walk through the loaded registeredDrivers attempting to locate someone
    // who understands the given URL.
    for (DriverInfo aDriver : registeredDrivers) {
        // If the caller does not have permission to load the driver then
        // skip it.
        // aDriver loaded by IsolatedClientLoader will fail here
        if (isDriverAllowed(aDriver.driver, callerClass)) {
            try {
                if (aDriver.driver.acceptsURL(url)) {
                    // Success!
                    println("getDriver returning " + aDriver.driver.getClass().getName());
                    return (aDriver.driver);
                }
            } catch (SQLException sqe) {
                // Drop through and try the next driver.
            }
        } else {
            println("    skipping: " + aDriver.driver.getClass().getName());
        }
    }

    println("getDriver: no suitable driver");
    throw new SQLException("No suitable driver", "08001");
}
private static boolean isDriverAllowed(Driver driver, ClassLoader classLoader) {
    boolean result = false;
    if (driver != null) {
        Class<?> aClass = null;
        try {
            aClass = Class.forName(driver.getClass().getName(), true, classLoader);
        } catch (Exception ex) {
            result = false;
        }
        // The classloader of aClass is AppClassLoader, while that of driver.getClass()
        // is IsolatedClientLoader; that is why the driver does exist but can't be
        // fetched successfully.
        result = (aClass == driver.getClass()) ? true : false;
    }
    return result;
}
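As an aside, the aClass == driver.getClass() comparison fails because class identity in the JVM is the pair (class name, defining classloader). A small sketch under the same placeholder-jar assumption; each sibling loader defines its own copy of the class because their shared parent cannot see the jar:

import java.net.URL;
import java.net.URLClassLoader;

public class ClassIdentityDemo {
  public static void main(String[] args) throws Exception {
    URL[] jar = {new URL("file:/path/to/mysql-connector.jar")}; // placeholder path
    ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
    ClassLoader l1 = new URLClassLoader(jar, parent);
    ClassLoader l2 = new URLClassLoader(jar, parent);

    // initialize=false avoids running the drivers' static registration blocks.
    Class<?> a = Class.forName("com.mysql.jdbc.Driver", false, l1);
    Class<?> b = Class.forName("com.mysql.jdbc.Driver", false, l2);

    System.out.println(a.getName().equals(b.getName())); // true: same name
    System.out.println(a == b); // false: different defining loaders
  }
}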
The order of these unit tests matters. If SparkJdbcMysqlCatalogIT33 is executed first each time, there will be no problem. Therefore, this issue is not consistently reproducible.
Solutions:
- Re-register the driver in the App classloader using Class.forName("com.mysql.jdbc.Driver").
- Ensure that SparkJdbcMysqlCatalogIT33 is always executed first in the module.
- Run each test class in a separate JVM. This can be achieved by:

test {
    forkEvery = 1        // at most one test class per forked JVM
    maxParallelForks = 1
}
Do we have a similar issue in real scenarios, or does it only happen in the UT? Why do we only hit this issue in the Spark 3.3 test, not in the other Spark module tests, and why didn't it happen previously?
Spark loads the JDBC driver explicitly in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala?#L46-#L62, so it seems fine to use Spark SQL to query JDBC data; I tested this in my local environment.
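For reference, the trick behind DriverRegistry is a wrapper driver: the wrapper class itself is loaded by a classloader visible to callers, so DriverManager's isDriverAllowed() check passes, while every call is delegated to the real driver, whichever classloader loaded it. A simplified Java rendering of that idea (Spark's actual DriverWrapper is Scala and may differ in detail):

import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverWrapperSketch implements Driver {
  private final Driver wrapped; // may come from any classloader

  public DriverWrapperSketch(Driver wrapped) {
    this.wrapped = wrapped;
  }

  @Override
  public Connection connect(String url, Properties info) throws SQLException {
    return wrapped.connect(url, info);
  }

  @Override
  public boolean acceptsURL(String url) throws SQLException {
    return wrapped.acceptsURL(url);
  }

  @Override
  public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) throws SQLException {
    return wrapped.getPropertyInfo(url, info);
  }

  @Override
  public int getMajorVersion() { return wrapped.getMajorVersion(); }

  @Override
  public int getMinorVersion() { return wrapped.getMinorVersion(); }

  @Override
  public boolean jdbcCompliant() { return wrapped.jdbcCompliant(); }

  @Override
  public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    return wrapped.getParentLogger();
  }
}

Registering it is then DriverManager.registerDriver(new DriverWrapperSketch(realDriver)), where realDriver is the driver instance obtained from whatever classloader actually has the jar.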
It is the classloader that loads the Driver that matters. If the classloader that loads the Driver is the same as the one that uses it, there should be no problem.
Yes, Spark uses AppClassLoader, not IsolatedClientLoader, to load the corresponding JDBC driver explicitly.
// These steps load JDBC drivers in `IsolatedClientLoader`, and those drivers can't be reused in the App classloader
1. getSparkSession().sql(query).collectAsList();
// Get a MySQL connection in the App classloader
2. DriverManager.getConnection()
I guess this would report an error, but this pattern is seldom used and is out of the scope of Gravitino. The pattern mirrors why the failure happened: SparkRESTIT loads the driver in IsolatedClientLoader, then SparkJDBCIT fails to get a MySQL connection because the driver is missing. The Spark JDBC catalog loads the corresponding JDBC driver explicitly to make sure the driver is loaded when executing Spark SQL.
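Concretely, that risky pattern would look something like this hypothetical test fragment (the catalog, table, and JDBC coordinates are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MixedAccessSketch {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().master("local[1]").getOrCreate();

    // Step 1: executing Spark SQL can pull JDBC drivers into IsolatedClientLoader
    // as a side effect; those registrations are unusable from the app classloader.
    List<Row> rows = spark.sql("SELECT * FROM mysql_catalog.db.tbl").collectAsList();

    // Step 2: a plain JDBC call from the app classloader may now fail with
    // "No suitable driver" unless the driver was also loaded here.
    try (Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/test", "user", "pwd")) {
      // ... verify data directly against MySQL ...
    }
  }
}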
I see, that is acceptable.
LGTM
What changes were proposed in this pull request?
I'm not sure of the root cause; it seems the MySQL JDBC driver was not loaded automatically under some conditions. In this PR, the MySQL driver is loaded explicitly.
Why are the changes needed?
Fix: #6392
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested on my local machine.