-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding limited support for case-sensitive table names #8674
Conversation
Based on needs in issue prestodb#2863, where in RDBMS that supports case-sensitive table names and have tables with case-sensitive names, then those tables are unable to be used with PrestoDB. This change adds optional (configuration controlled) mapping for table names when executing queries that end up calling getTableHandle(). It maps a lowercased name to the name of the table from the database, and supports reloading the mapping in the event that a table is referenced but doesn't currently have a mapping so new tables can be used without restarting Presto. A test was attempted to be written similar to the TestJdbcClient with a new TestingDatabase, however it was discovered that H2 will return TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE is using MODE=MYSQL, however in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code with the call to metadata.storesUpperCaseIdentifiers().
@@ -101,6 +104,10 @@ | |||
protected final String connectionUrl; | |||
protected final Properties connectionProperties; | |||
protected final String identifierQuote; | |||
private final boolean mapTableNames; | |||
|
|||
private Map<String, Map<String, String>> schemaTableMapping = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like Map of Map things, can you please extract some class instead? That class could have different implementation for mapTableNames
equal to true
and false
. You could have a lock there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is caching such information is correct? What if someone do some rename in underlying database in the meantime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can extract that to a separate Class. Is it preferred to be an inner class for this type of usage? Or a properly separate class file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, in response about the lack of tests, if you read the 2nd part of my initial commit comment:
A test was attempted to be written similar to the TestJdbcClient with a new TestingDatabase, however it was discovered that H2 will return TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE is using MODE=MYSQL, however in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code with the call to metadata.storesUpperCaseIdentifiers().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can extract that to a separate Class. Is it preferred to be an inner class for this type of usage? Or a properly separate class file?
Up to you. Choose the thing what fits bets.
Regarding the tests, sorry for not reading the commit message. What about using PSQL instead of H2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using what was there for the existing tests. Also was other wanting to end up with external applications needing to be run. You're referring to Postgres? In a quick search I actually see some Java libs for embedding Postgres and MySQL for testing purposes, I'll look into both if them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MySQL and PostgreSQL connectors already run tests against the embedded versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use a Guava LoadingCache
which will allow expiration and background refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, looking at the MySQL specific connector would have been smart to look at for the tests. Thanks for pointing that out, and I'll look at LoadingCache.
Also I have not noticed tests, can you please point me what ensures coverage for your change? |
{ | ||
// Only have 1 thread at a time load the mapping in. This may result in having some queries not return anything or fail because they table | ||
ReentrantLock lock = schemaTableMappingLock.get(jdbcSchemaName); | ||
if (lock == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use ConcurrentMap::computeIfAbsent here
} | ||
|
||
@Config("connection-map-tables") | ||
public BaseJdbcConfig setMapLowercaseTableNames(boolean mapLowercaseTableNames) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be configurable as the feature is needed in order for the connectors to work properly given Presto's limitations on lowercase table names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought for having a flag is to make this an optional feature for people, so as to preserve current behavior for those who don't want to use this. I'm very happy to make it default on, or even just always on without a flag.
So I'm running into a situation where I'm having to place the test code in the presto-mysql because I need to use the MySqlClient, unless I essentially copy the MySqlClient (even if as an anonymous class) in order to deal with the quirks of MySQL compared to the BaseJdbcClient. So the tests for this, which really IMO belong in presto-base-jdbc, would be challenging to put there. Thoughts? |
It's fine to put the tests there.
|
Based on the pull request review, making some major changes. List of changes: * Handling mapping schema names now as well as table names * Using CacheLoader to handle loading/adding/reloading the name mapping for both schema and tables * Needed to change so that the concrete client can return the raw schema/table names from protected methods, which the base class then lowercases as needed * Needed to have a way to prevent initial cache loading for the Plugin tests for each of the clients, otherwise it would fail as no server was created to load schemas/tables from * Updated the Plugin tests for each JDBC concrete client to set the flag to not auto-load the schema/table mappings * Test checking the mapping and case sensitivity is in the presto-mysql plugin, as the issues with H2 for a case-sensitive mean I need MySQL instance to test against, which means the MySqlClient, which I can't pull into presto-base-jdbc as it would cause a circular dependancy
Sorry for the delay, lots of work on this and of course other things also came up at work. Hopefully I've managed to take into account all of the concerns, see the latest commit message and code for full details. |
presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java
Show resolved
Hide resolved
private Map<String, Map<String, String>> schemaTableMapping = new HashMap<>(); | ||
private Map<String, ReentrantLock> schemaTableMappingLock = new ConcurrentHashMap<>(); | ||
private final LoadingCache<String, Optional<String>> schemaMappingCache; | ||
private final Map<String, LoadingCache<String, Optional<String>>> schemaTableMapping = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would hide these fields and related methods as a separate class. Then it seems that with some abstraction over JDBC (which is used to load the cache) you could write some additional low level unit tests without using mysql. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you're asking to hide these fields as a separate class. Something like a SchemaTableMapping class? Pass in a BaseJdbcClient object to the constructor, and use that to perform all of the loading? Problem with this is how would I get access to the raw RDBMS schema/table names? The public API (currently) only provides for the lowercased names).
Hmm...I suppose I could instead create an implementation of the BaseJdbcClient in the test context which doesn't actually connect to JDBC and returns pre-defined lists of tables/schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Sounds a bit like an over engineering.
* @param schema the schema to list the tables for, or NULL for all | ||
* @return the schema + table names | ||
*/ | ||
protected List<String[]> getOriginalTablesWithSchema(Connection connection, String schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning List of arrays does not look good. Why not to use SchemaTableName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SchemaTableName
in it's constructor specifically lowercases both the schema and the table strings, thus can't be used for this.
I chose an array as an easy, simple, basic structure that wouldn't take much effort. Also can't be a Map since a schema can have a multiple tables, and we might be pulling from multiple schemas at once rather than just 1 schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead you could create another new class like SchemaTableName
, which remain the names casing.
@@ -29,6 +29,6 @@ public void testCreateConnector() | |||
{ | |||
Plugin plugin = new PostgreSqlPlugin(); | |||
ConnectorFactory factory = getOnlyElement(plugin.getConnectorFactories()); | |||
factory.create("test", ImmutableMap.of("connection-url", "test"), new TestingConnectorContext()); | |||
factory.create("test", ImmutableMap.of("connection-url", "test", "connection-load-table-mappings", "false"), new TestingConnectorContext()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could tests for this and sqlserver
connector as well? What do you think? Maybe you could define a set of tests in presto-base-jdbc
and them call them from each connector which support mapping? And from jdbc connectors which do not support that we could have negative tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm not sure what you mean for this, I do have this tests updated/fixed in sqlserver
module, as this particular one just seems to be to test that the plugin factory can be fetched and loaded and to create an instance of the plugin successfully.
I do see where defining the tests in presto-base-jdbc
for handling mixed-case schema/table names that each JDBC implementation can pull to test individually in their own module if the DB handles mixed-case is useful.
So I'm leaving for vacation for 2 weeks, so I won't be around for any replies or what not. |
Ok, back to work and getting back into the swing of things, so looking to get back into this and move this forward. @kokosing, any thoughts on my replies to your comments? |
I am sorry, but I won't be able to continue work on this. Please ask somebody else for a review. |
* CaseSensitiveMappedSchemaTableName used to return the raw table names instead of a String[] * Added tests by creating some mocked implementations of a Driver and metadata and supporting methods necessary for use with the BaseJdbcClient to perform some basic case-sensitive name tests See TestJdbcClientNameMapping and TestingNameMappingDriver
So, reviewing the 2 checks that failed, both seem to involve trying to connect to the servers but fail to. Is there an issue with the current server startup for these 2 checks? Or could it be that my pre-load code is happening too early as it's occurring in the constructor and so the rest of the surrounding code hasn't properly completed something needed to correctly connect? |
|
||
public CaseSensitiveMappedSchemaTableName(String schemaName, String tableName) | ||
{ | ||
if (schemaName == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Presto way to do this appears to be requireNonNull(schemaName, "schemaName is null");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, thank you.
} | ||
|
||
// if someone is listing all of the schema names, throw them all into the cache as a refresh since we already spent the time pulling them from the DB | ||
schemaMappingCache.putAll(mappedNames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method appears to be doing too much. (1) it's computing all the schema names and (2) caching the values.
It's unclear where (2) is used and why (1) doesn't use it.
Furthermore, it forces that reloadCache
must call getSchemaNames
before invalidating the values in schemaTableMapping
. This won't be clear to a future maintainer of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't use the cache because this method, in my view, must return the current list of schema names directly from the DB. I also see this change as something which needs to NOT change the output of any method, and will only change the input, and even then as little as possible in order to get a query to execute.
The caching being done on the schema names is so that other requests for data can successfully execute those queries, even if it's just for listing the table names if the schema name has mixed case name.
for (String key : schemaTableMapping.keySet()) { | ||
schemaTableMapping.get(key).invalidateAll(); | ||
schemaTableMapping.remove(key); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason you can't just iterate through the values?
schemaTableMapping.values().foreach(v -> v.invalidateAll());
schemaTableMapping.clear();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't typically used the Java8 Lambdas because of our code base, which sadly has an incredibly out of date Hibernate which barfs if we try and use Lambdas in the wrong place. sigh
But yes, that forEach is more efficient, especially the clear() after rather than doing a bunch of removes.
|
||
schema = finalizeSchemaName(metadata, schema); | ||
Map<String, Map<String, String>> schemaMappedNames = new HashMap<>(); | ||
ImmutableList.Builder<SchemaTableName> list = ImmutableList.builder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call this tableNames, perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the list? It's a local only variable, but sure.
Some small tweaks from the feedback, doing things a bit more prestodb way.
any update? hive metastore tables in mysql like DBS, PARTITIONS can't be queried. presto> select * from mysql.metastore.DBS; |
Is this issue resolved ?? |
look forward for this! I am eager to know if there are any updates! |
@hamlet-lee, @babayega I am currently working on adding support for case-sensitive identifiers. You can track the work here(trinodb/trino#354) |
@hamlet-lee, @babayega, |
@Praveen2112 thanks for the information, I'll track that PR. |
@hamlet-lee sure. Just note that Starburst Presto is freely available. |
Closing. This will be supported by https://github.com/prestodb/presto/pulls/kewang1024 |
Based on needs in issue #2863, where in RDBMS that supports case-sensitive table names and have tables with case-sensitive names, then those tables are unable to be used with PrestoDB. This change adds optional (configuration controlled) mapping for table names when executing queries that end up calling getTableHandle(). It maps a lowercased name to the name of the table from the database, and supports reloading the mapping in the event that a table is referenced but doesn't currently have a mapping so new tables can be used without restarting Presto.
A test was attempted to be written similar to the TestJdbcClient with a new TestingDatabase, however it was discovered that H2 will return TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to have that return FALSE is using MODE=MYSQL, however in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code with the call to metadata.storesUpperCaseIdentifiers().
As a first time PR for this project from me, I did complete the CLA as per the repository contribution guidelines.