
[#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed in some environment #6432

Merged
merged 3 commits into from
Feb 27, 2025

Conversation

FANNG1
Contributor

@FANNG1 FANNG1 commented Feb 10, 2025

What changes were proposed in this pull request?

I'm not sure of the root cause; it seems the MySQL JDBC driver was not loaded automatically under some conditions. This PR loads the MySQL driver explicitly.

Why are the changes needed?

Fix: #6392

Does this PR introduce any user-facing change?

no

How was this patch tested?

Tested on a local machine.
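The fix relies on the standard JDBC self-registration mechanism: loading a driver class runs its static initializer, which registers it with DriverManager under the loading classloader. A minimal self-contained sketch of that mechanism, using a hypothetical ToyDriver in place of com.mysql.jdbc.Driver (which would require the MySQL connector on the classpath):

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical stand-in for com.mysql.jdbc.Driver: like real JDBC drivers, it
// registers itself with DriverManager from its static initializer, which runs
// when Class.forName(...) loads the class.
public class ToyDriver implements Driver {
  static {
    try {
      DriverManager.registerDriver(new ToyDriver());
    } catch (SQLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  @Override public Connection connect(String url, Properties info) { return null; } // toy only
  @Override public boolean acceptsURL(String url) { return url != null && url.startsWith("jdbc:toy:"); }
  @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) { return new DriverPropertyInfo[0]; }
  @Override public int getMajorVersion() { return 1; }
  @Override public int getMinorVersion() { return 0; }
  @Override public boolean jdbcCompliant() { return false; }
  @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    throw new SQLFeatureNotSupportedException();
  }

  public static void main(String[] args) throws Exception {
    // The PR calls Class.forName(...) from test setup code to force loading
    // and registration; here the static block already ran when this class
    // itself was loaded, so the driver is findable.
    Class.forName(ToyDriver.class.getName());
    Driver d = DriverManager.getDriver("jdbc:toy:mem");
    System.out.println(d instanceof ToyDriver); // true
  }
}
```

The key point is which classloader runs the static initializer: the driver is only visible to DriverManager callers whose classloader can produce the same Class object.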

@FANNG1 FANNG1 changed the title fix comment [#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed Feb 10, 2025
@FANNG1 FANNG1 changed the title [#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed [#6392] fix(test): SparkJdbcMysqlCatalogIT33 failed in some enviroment Feb 10, 2025
@FANNG1
Contributor Author

FANNG1 commented Feb 10, 2025

@jerryshao @shaofengshi PTAL.

// Fix https://github.com/apache/gravitino/issues/6392: the MySQL JDBC driver may not load
// automatically.
try {
  Class.forName("com.mysql.jdbc.Driver");
} catch (ClassNotFoundException e) {
  throw new RuntimeException("Failed to load the MySQL JDBC driver", e);
}
Contributor

Why do we need to load the class explicitly? Can you provide more detail about it?

Contributor Author

I'm not sure of the root cause; maybe the JDBC driver was loaded by another classloader.

Contributor

@jerryshao jerryshao Feb 14, 2025

It would be better to figure out the root cause before the fix.

Contributor

@yuqi1129 yuqi1129 Feb 25, 2025

@FANNG1 @jerryshao

The root cause is as follows:

When SparkIcebergCatalogRestBackendIT33 executes Spark SQL, it uses the IsolatedClientLoader (which isolates Hive dependencies; its parent is the root classloader) to run part of the code. This causes the static initializer (loadInitialDrivers) of java.sql.DriverManager to be triggered from the IsolatedClientLoader, so the driver classloader recorded in the DriverInfo registered with DriverManager is the IsolatedClientLoader.

SparkIcebergCatalogRestBackendIT33 and SparkJdbcMysqlCatalogIT33 run in the same JVM. Since the static initializer of a class loaded by the root classloader runs only once, executing SparkJdbcMysqlCatalogIT33 afterward fails because of the wrong driver classloader.
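The mismatch can be reproduced without Spark or MySQL. A minimal sketch, assuming only the JDK: a child-first classloader (standing in for IsolatedClientLoader) defines its own copy of a class, so Class.forName against the app classloader yields a different Class object, exactly the comparison that DriverManager's isDriverAllowed performs:

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Standalone sketch (no Spark, no MySQL) of the classloader mismatch.
public class IsolationDemo {

  public static class Payload {} // stands in for com.mysql.jdbc.Driver

  // Child-first for Payload: defines its own copy instead of delegating to the
  // parent, just as IsolatedClientLoader isolates its classes.
  public static class IsolatedLoader extends ClassLoader {
    public IsolatedLoader(ClassLoader parent) { super(parent); }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
      if (name.equals(Payload.class.getName())) {
        try (InputStream in = getResourceAsStream(name.replace('.', '/') + ".class")) {
          ByteArrayOutputStream out = new ByteArrayOutputStream();
          byte[] buf = new byte[4096];
          for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
          byte[] bytes = out.toByteArray();
          return defineClass(name, bytes, 0, bytes.length);
        } catch (Exception e) {
          throw new ClassNotFoundException(name, e);
        }
      }
      return super.loadClass(name, resolve);
    }
  }

  public static void main(String[] args) throws Exception {
    ClassLoader app = IsolationDemo.class.getClassLoader();
    Class<?> isolated = new IsolatedLoader(app).loadClass(Payload.class.getName());
    Class<?> viaApp = Class.forName(Payload.class.getName(), true, app);

    // Same class name, different Class objects: this is why isDriverAllowed
    // rejects a driver registered under IsolatedClientLoader when the caller
    // resolves classes through the app classloader.
    System.out.println(isolated == viaApp);                          // false
    System.out.println(isolated.getName().equals(viaApp.getName())); // true
  }
}
```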

@CallerSensitive
public static Driver getDriver(String url) throws SQLException {

    println("DriverManager.getDriver(\"" + url + "\")");

    Class<?> callerClass = Reflection.getCallerClass();

    // Walk through the loaded registeredDrivers attempting to locate someone
    // who understands the given URL.
    for (DriverInfo aDriver : registeredDrivers) {
        // If the caller does not have permission to load the driver then
        // skip it.

        // A driver loaded by IsolatedClientLoader will fail here.
        if (isDriverAllowed(aDriver.driver, callerClass)) {
            try {
                if (aDriver.driver.acceptsURL(url)) {
                    // Success!
                    println("getDriver returning " + aDriver.driver.getClass().getName());
                    return (aDriver.driver);
                }
            } catch (SQLException sqe) {
                // Drop through and try the next driver.
            }
        } else {
            println("    skipping: " + aDriver.driver.getClass().getName());
        }
    }

    println("getDriver: no suitable driver");
    throw new SQLException("No suitable driver", "08001");
}

private static boolean isDriverAllowed(Driver driver, ClassLoader classLoader) {
    boolean result = false;
    if (driver != null) {
        Class<?> aClass = null;
        try {
            aClass = Class.forName(driver.getClass().getName(), true, classLoader);
        } catch (Exception ex) {
            result = false;
        }

        // Here aClass is loaded by the AppClassLoader while driver.getClass() was
        // loaded by IsolatedClientLoader; that is why the driver does exist but
        // cannot be fetched successfully.
        result = (aClass == driver.getClass()) ? true : false;
    }

    return result;
}
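When debugging this kind of failure, it can help to print each registered driver and its classloader. A small sketch, with the caveat noted in the comment:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public class DriverAudit {
  public static void main(String[] args) {
    // Note: getDrivers() applies the same caller-classloader filter as
    // getDriver(), so a driver registered under an isolated classloader may
    // not even appear in this listing from app-classloader code.
    Enumeration<Driver> drivers = DriverManager.getDrivers();
    while (drivers.hasMoreElements()) {
      Driver d = drivers.nextElement();
      System.out.println(d.getClass().getName() + " loaded by " + d.getClass().getClassLoader());
    }
  }
}
```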

The order of these unit tests matters: if SparkJdbcMysqlCatalogIT33 is executed first, there is no problem, which is why this issue is not consistently reproducible.

Solutions:

  • Re-register the driver via Class.forName("com.mysql.jdbc.Driver") so that it is loaded by the app classloader.
  • Ensure that SparkJdbcMysqlCatalogIT33 is always executed first in the module.
  • Run each test class in a separate JVM. In Gradle this can be achieved with:
test {
    forkEvery = 1         // fork a new JVM for each test class
    maxParallelForks = 1
}

Contributor

Do we have a similar issue in real-world scenarios, or does it only happen in the UT? Why do we only meet this issue in the Spark 3.3 test, not in the other Spark module tests, and why did it not happen previously?

Contributor Author

Spark loads the JDBC driver explicitly in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/DriverRegistry.scala#L46-L62, so it seems OK to use Spark SQL to query JDBC data; I tested this in my local environment.

Contributor

It is the classloader that loads the Driver that matters. If the classloader that loads the Driver is the same as the one that uses it, there should be no problem.

Contributor Author

Yes, Spark uses the AppClassLoader, not the IsolatedClientLoader, to load the corresponding JDBC driver explicitly.
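One way to make a driver usable regardless of which classloader produced it, in the spirit of Spark's DriverRegistry/DriverWrapper approach, is to register a wrapper that the app classloader itself loaded and have it delegate to the real driver. A hedged sketch (class and method names here are illustrative, not Spark's actual code):

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Because DriverShim itself is loaded by the app classloader, DriverManager's
// isDriverAllowed check passes for app-classloader callers, even if the
// wrapped driver instance came from an isolated classloader.
public class DriverShim implements Driver {
  private final Driver delegate;

  public DriverShim(Driver delegate) { this.delegate = delegate; }

  @Override public Connection connect(String url, Properties info) throws SQLException {
    return delegate.connect(url, info);
  }
  @Override public boolean acceptsURL(String url) throws SQLException {
    return delegate.acceptsURL(url);
  }
  @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) throws SQLException {
    return delegate.getPropertyInfo(url, info);
  }
  @Override public int getMajorVersion() { return delegate.getMajorVersion(); }
  @Override public int getMinorVersion() { return delegate.getMinorVersion(); }
  @Override public boolean jdbcCompliant() { return delegate.jdbcCompliant(); }
  @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
    return delegate.getParentLogger();
  }

  // Hypothetical helper: instantiate the driver with a chosen classloader and
  // register a shim for it with DriverManager.
  public static void register(String driverClassName, ClassLoader loader) throws Exception {
    Driver d = (Driver) Class.forName(driverClassName, true, loader)
        .getDeclaredConstructor().newInstance();
    DriverManager.registerDriver(new DriverShim(d));
  }
}
```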

Contributor Author

// These steps load JDBC drivers in `IsolatedClientLoader`, and those drivers can't be reused from the app classloader
1. getSparkSession().sql(query).collectAsList();
// Get a MySQL connection in the app classloader
2. DriverManager.getConnection()

I guess this would report an error, but the pattern is seldom used and is outside the scope of Gravitino. It mirrors why the failure happened: SparkIcebergCatalogRestBackendIT33 loads the driver in the IsolatedClientLoader, and SparkJdbcMysqlCatalogIT33 then fails to get a MySQL connection because the driver is missing. The Spark JDBC catalog loads the corresponding JDBC driver explicitly to make sure the driver is loaded when executing Spark SQL.

Contributor

I see, that is acceptable.

@yuqi1129 yuqi1129 left a comment

LGTM

@yuqi1129 yuqi1129 added the branch-0.8 Automatically cherry-pick commit to branch-0.8 label Feb 27, 2025
@yuqi1129 yuqi1129 merged commit b8a2349 into apache:main Feb 27, 2025
28 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 27, 2025
Labels
branch-0.8 Automatically cherry-pick commit to branch-0.8
Development

Successfully merging this pull request may close these issues.

[Bug report] SparkJdbcMysqlCatalogIT33 failed
3 participants