Adding limited support for case-sensitive table names #8674

Drizzt321 · 2017-08-04T20:53:09Z

Based on the needs in issue #2863: when an RDBMS supports case-sensitive table names and has tables with such names, those tables cannot be used with PrestoDB. This change adds optional (configuration-controlled) mapping for table names when executing queries that end up calling getTableHandle(). It maps a lowercased name to the name of the table in the database, and it reloads the mapping when a table is referenced that does not yet have a mapping, so new tables can be used without restarting Presto.
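A minimal sketch of the lowercase-to-original mapping with reload on a miss, for illustration only. The class name TableNameMapping, the method toRemoteName, and the Supplier that stands in for a JDBC metadata call are all hypothetical and are not taken from the PR's code.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper, not the PR's code: maps lowercased table names to the
// names as stored in the remote database, reloading once when a lookup misses.
final class TableNameMapping
{
    // Stands in for a JDBC metadata call that lists the remote table names.
    private final Supplier<List<String>> remoteTableNames;
    private final Map<String, String> lowercaseToRemote = new ConcurrentHashMap<>();

    TableNameMapping(Supplier<List<String>> remoteTableNames)
    {
        this.remoteTableNames = remoteTableNames;
    }

    Optional<String> toRemoteName(String requestedName)
    {
        String key = requestedName.toLowerCase(Locale.ENGLISH);
        String remote = lowercaseToRemote.get(key);
        if (remote == null) {
            // The table may have been created after the last load, so reload once.
            reload();
            remote = lowercaseToRemote.get(key);
        }
        return Optional.ofNullable(remote);
    }

    private void reload()
    {
        for (String name : remoteTableNames.get()) {
            lowercaseToRemote.put(name.toLowerCase(Locale.ENGLISH), name);
        }
    }
}
```

Two limitations of this sketch are worth noting: tables whose names differ only by case collide on the same key (the last one loaded wins), and entries for dropped or renamed tables are never removed.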

I attempted to write a test similar to TestJdbcClient with a new TestingDatabase, but discovered that H2 returns TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to make it return FALSE is to use MODE=MYSQL, but in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code that calls metadata.storesUpperCaseIdentifiers().

This is my first PR for this project; I have completed the CLA as per the repository contribution guidelines.

kokosing · 2017-08-07T05:40:55Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

@@ -101,6 +104,10 @@
    protected final String connectionUrl;
    protected final Properties connectionProperties;
    protected final String identifierQuote;
+    private final boolean mapTableNames;
+
+    private Map<String, Map<String, String>> schemaTableMapping = new HashMap<>();


I do not like Map of Map things; can you please extract a class instead? That class could have different implementations for mapTableNames equal to true and false. You could have a lock there as well.

Is caching such information correct? What if someone renames something in the underlying database in the meantime?

I can extract that to a separate Class. Is it preferred to be an inner class for this type of usage? Or a properly separate class file?

Also, in response about the lack of tests, if you read the 2nd part of my initial commit comment:

I attempted to write a test similar to TestJdbcClient with a new TestingDatabase, but discovered that H2 returns TRUE for metadata.storesUpperCaseIdentifiers() even if DATABASE_TO_UPPER=FALSE is set. The only way to make it return FALSE is to use MODE=MYSQL, but in that case H2 lowercases all table names. So it is not possible to create a test with H2 without modifying the section of code that calls metadata.storesUpperCaseIdentifiers().

I can extract that to a separate Class. Is it preferred to be an inner class for this type of usage? Or a properly separate class file?

Up to you. Choose whatever fits best.

Regarding the tests, sorry for not reading the commit message. What about using PSQL instead of H2?

I was using what was there for the existing tests. I also didn't want to end up needing external applications to be run. You're referring to Postgres? In a quick search I actually see some Java libs for embedding Postgres and MySQL for testing purposes; I'll look into both of them.

The MySQL and PostgreSQL connectors already run tests against the embedded versions.

We can use a Guava LoadingCache which will allow expiration and background refresh.

You know, looking at the MySQL-specific connector for the tests would have been smart. Thanks for pointing that out, and I'll look at LoadingCache.

kokosing · 2017-08-07T05:42:49Z

Also, I have not noticed any tests. Can you please point me to what ensures coverage for your change?

lvijay · 2017-08-07T05:47:04Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

+    {
+        // Only have 1 thread at a time load the mapping in. This may result in having some queries not return anything or fail because they table
+        ReentrantLock lock = schemaTableMappingLock.get(jdbcSchemaName);
+        if (lock == null) {


You could use ConcurrentMap::computeIfAbsent here
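A minimal sketch of the suggestion above, assuming the per-schema locks live in a ConcurrentMap keyed by schema name as in the diff. The wrapper class SchemaLocks and the method lockFor are hypothetical names used only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wrapper: replaces the "get, check for null, then put" sequence
// with a single atomic computeIfAbsent call.
final class SchemaLocks
{
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String jdbcSchemaName)
    {
        // Creates the lock on first use; concurrent callers asking for the
        // same schema all receive the same ReentrantLock instance.
        return locks.computeIfAbsent(jdbcSchemaName, name -> new ReentrantLock());
    }
}
```

With an explicit null check, two threads can both observe a missing lock and each create their own; computeIfAbsent avoids that race because the lookup and the insertion happen as one atomic step.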

electrum · 2017-08-07T16:45:48Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcConfig.java

+    }
+
+    @Config("connection-map-tables")
+    public BaseJdbcConfig setMapLowercaseTableNames(boolean mapLowercaseTableNames)


I don't think this should be configurable as the feature is needed in order for the connectors to work properly given Presto's limitations on lowercase table names.

My thought for having a flag is to make this an optional feature for people, so as to preserve current behavior for those who don't want to use this. I'm very happy to make it default on, or even just always on without a flag.

Drizzt321 · 2017-08-10T17:19:48Z

So I'm running into a situation where I have to place the test code in presto-mysql because I need to use the MySqlClient, unless I essentially copy the MySqlClient (even if as an anonymous class) in order to deal with the quirks of MySQL compared to the BaseJdbcClient. So the tests for this, which really IMO belong in presto-base-jdbc, would be challenging to put there. Thoughts?

electrum · 2017-08-11T14:24:08Z

It's fine to put the tests there.

Based on the pull request review, making some major changes. List of changes:
* Now handling mapping of schema names as well as table names
* Using CacheLoader to handle loading/adding/reloading the name mapping for both schemas and tables
* Needed to change things so that the concrete client can return the raw schema/table names from protected methods, which the base class then lowercases as needed
* Needed a way to prevent initial cache loading for the Plugin tests for each of the clients, otherwise they would fail as no server was created to load schemas/tables from
* Updated the Plugin tests for each concrete JDBC client to set the flag to not auto-load the schema/table mappings
* The test checking the mapping and case sensitivity is in the presto-mysql plugin, as the issues with H2 and case sensitivity mean I need a MySQL instance to test against, which means the MySqlClient, which I can't pull into presto-base-jdbc as it would cause a circular dependency

Drizzt321 · 2017-08-16T18:42:33Z

Sorry for the delay, lots of work on this and of course other things also came up at work. Hopefully I've managed to take into account all of the concerns, see the latest commit message and code for full details.

kokosing · 2017-08-17T05:39:18Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

-    private Map<String, Map<String, String>> schemaTableMapping = new HashMap<>();
-    private Map<String, ReentrantLock> schemaTableMappingLock = new ConcurrentHashMap<>();
+    private final LoadingCache<String, Optional<String>> schemaMappingCache;
+    private final Map<String, LoadingCache<String, Optional<String>>> schemaTableMapping = new ConcurrentHashMap<>();


I would hide these fields and related methods as a separate class. Then it seems that with some abstraction over JDBC (which is used to load the cache) you could write some additional low level unit tests without using mysql. What do you think?

Not sure why you're asking to hide these fields in a separate class. Something like a SchemaTableMapping class? Pass in a BaseJdbcClient object to the constructor, and use that to perform all of the loading? The problem with this is how I would get access to the raw RDBMS schema/table names. The public API (currently) only provides the lowercased names.

Hmm...I suppose I could instead create an implementation of the BaseJdbcClient in the test context which doesn't actually connect to JDBC and returns pre-defined lists of tables/schemas.

I see. Sounds a bit like over-engineering.

kokosing · 2017-08-17T05:41:47Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

+     * @param schema the schema to list the tables for, or NULL for all
+     * @return the schema + table names
+     */
+    protected List<String[]> getOriginalTablesWithSchema(Connection connection, String schema)


Returning a List of arrays does not look good. Why not use SchemaTableName?

SchemaTableName specifically lowercases both the schema and the table strings in its constructor, and thus can't be used for this.

I chose an array as an easy, simple, basic structure that wouldn't take much effort. It also can't be a Map, since a schema can have multiple tables, and we might be pulling from multiple schemas at once rather than just one schema.

Maybe instead you could create another new class like SchemaTableName, which retains the names' casing.
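A minimal sketch of such a case-preserving value class, as an alternative to returning String[] pairs. The name RawSchemaTableName and its accessors are hypothetical and chosen only for this illustration; the sketch assumes both names are always present.

```java
import java.util.Objects;

// Hypothetical value class: holds a schema/table pair exactly as the remote
// database reports it, without lowercasing either name.
final class RawSchemaTableName
{
    private final String schemaName;
    private final String tableName;

    RawSchemaTableName(String schemaName, String tableName)
    {
        this.schemaName = Objects.requireNonNull(schemaName, "schemaName is null");
        this.tableName = Objects.requireNonNull(tableName, "tableName is null");
    }

    String getSchemaName()
    {
        return schemaName;
    }

    String getTableName()
    {
        return tableName;
    }

    @Override
    public boolean equals(Object obj)
    {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof RawSchemaTableName)) {
            return false;
        }
        RawSchemaTableName other = (RawSchemaTableName) obj;
        // Comparison is case-sensitive on purpose.
        return schemaName.equals(other.schemaName) && tableName.equals(other.tableName);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(schemaName, tableName);
    }

    @Override
    public String toString()
    {
        return schemaName + "." + tableName;
    }
}
```

Compared with a String[] of length two, a small class like this names its fields, rejects nulls early, and can be used as a map key because it defines equals and hashCode.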

kokosing · 2017-08-17T05:49:54Z

presto-postgresql/src/test/java/com/facebook/presto/plugin/postgresql/TestPostgreSqlPlugin.java

@@ -29,6 +29,6 @@ public void testCreateConnector()
    {
        Plugin plugin = new PostgreSqlPlugin();
        ConnectorFactory factory = getOnlyElement(plugin.getConnectorFactories());
-        factory.create("test", ImmutableMap.of("connection-url", "test"), new TestingConnectorContext());
+        factory.create("test", ImmutableMap.of("connection-url", "test", "connection-load-table-mappings", "false"), new TestingConnectorContext());


Maybe we could add tests for this and the sqlserver connector as well? What do you think? Maybe you could define a set of tests in presto-base-jdbc and then call them from each connector that supports mapping? And for JDBC connectors that do not support it we could have negative tests.

So I'm not sure what you mean by this. I do have these tests updated/fixed in the sqlserver module, as this particular one just seems to test that the plugin factory can be fetched and loaded and that an instance of the plugin can be created successfully.

I do see where defining the tests in presto-base-jdbc for handling mixed-case schema/table names that each JDBC implementation can pull to test individually in their own module if the DB handles mixed-case is useful.

Drizzt321 · 2017-08-21T16:43:43Z

So I'm leaving for vacation for 2 weeks, so I won't be around for any replies or what not.

Drizzt321 · 2017-09-12T21:22:43Z

Ok, back to work and getting back into the swing of things, so looking to get back into this and move this forward. @kokosing, any thoughts on my replies to your comments?

kokosing · 2017-09-13T19:08:00Z

I am sorry, but I won't be able to continue work on this. Please ask somebody else for a review.

* CaseSensitiveMappedSchemaTableName is used to return the raw table names instead of a String[]
* Added tests by creating mocked implementations of a Driver, metadata, and the supporting methods necessary for use with the BaseJdbcClient to perform some basic case-sensitive name tests

See TestJdbcClientNameMapping and TestingNameMappingDriver

Drizzt321 · 2017-09-22T20:33:21Z

So, reviewing the 2 checks that failed, both seem to involve trying to connect to the servers but fail to. Is there an issue with the current server startup for these 2 checks? Or could it be that my pre-load code is happening too early as it's occurring in the constructor and so the rest of the surrounding code hasn't properly completed something needed to correctly connect?

Drizzt321 · 2017-09-25T17:08:15Z

@electrum, @lvijay, please take a look.

lvijay · 2017-10-01T15:59:04Z

...e-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/CaseSensitiveMappedSchemaTableName.java

+
+    public CaseSensitiveMappedSchemaTableName(String schemaName, String tableName)
+    {
+        if (schemaName == null) {


The Presto way to do this appears to be requireNonNull(schemaName, "schemaName is null");

Ah, yes, thank you.

lvijay · 2017-10-01T16:16:01Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

+        }
+
+        // if someone is listing all of the schema names, throw them all into the cache as a refresh since we already spent the time pulling them from the DB
+        schemaMappingCache.putAll(mappedNames);


This method appears to be doing too much. (1) it's computing all the schema names and (2) caching the values.

It's unclear where (2) is used and why (1) doesn't use it.

Furthermore, it forces that reloadCache must call getSchemaNames before invalidating the values in schemaTableMapping. This won't be clear to a future maintainer of the code.

I don't use the cache because this method, in my view, must return the current list of schema names directly from the DB. I also see this change as something which needs to NOT change the output of any method, and will only change the input, and even then as little as possible in order to get a query to execute.

The schema names are cached so that other requests for data can successfully execute their queries, even if it's just for listing the table names when the schema has a mixed-case name.

lvijay · 2017-10-01T16:37:04Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

+        for (String key : schemaTableMapping.keySet()) {
+            schemaTableMapping.get(key).invalidateAll();
+            schemaTableMapping.remove(key);
+        }


any reason you can't just iterate through the values?

schemaTableMapping.values().forEach(v -> v.invalidateAll()); schemaTableMapping.clear();

I haven't typically used the Java8 Lambdas because of our code base, which sadly has an incredibly out of date Hibernate which barfs if we try and use Lambdas in the wrong place. sigh

But yes, that forEach is more efficient, especially the clear() after rather than doing a bunch of removes.
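A minimal, self-contained sketch of the pattern discussed above. The nested TableCache class is a hypothetical stand-in for the per-schema cache; only its invalidateAll() method matters here, and the outer class and method names are likewise illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InvalidateAllSketch
{
    // Hypothetical stand-in for a per-schema cache of table name mappings.
    static final class TableCache
    {
        private final Map<String, String> entries = new ConcurrentHashMap<>();

        void put(String lowercaseName, String remoteName)
        {
            entries.put(lowercaseName, remoteName);
        }

        void invalidateAll()
        {
            entries.clear();
        }

        int size()
        {
            return entries.size();
        }
    }

    // Invalidates every per-schema cache, then drops all of them in one call
    // instead of removing the keys one by one.
    static void reset(Map<String, TableCache> schemaTableMapping)
    {
        schemaTableMapping.values().forEach(TableCache::invalidateAll);
        schemaTableMapping.clear();
    }
}
```

Iterating over values() also avoids the extra get(key) lookup that the key-based loop performs for every schema.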

lvijay · 2017-10-01T16:43:04Z

presto-base-jdbc/src/main/java/com/facebook/presto/plugin/jdbc/BaseJdbcClient.java

+
+            schema = finalizeSchemaName(metadata, schema);
+            Map<String, Map<String, String>> schemaMappedNames = new HashMap<>();
+            ImmutableList.Builder<SchemaTableName> list = ImmutableList.builder();


call this tableNames, perhaps?

You mean the list? It's a local only variable, but sure.

Some small tweaks from the feedback, doing things a bit more prestodb way.

tooptoop4 · 2018-09-13T01:57:50Z

any update? hive metastore tables in mysql like DBS, PARTITIONS can't be queried.

presto> select * from mysql.metastore.DBS;
Query 20180913_015448_00010_774as failed: line 1:15: Table mysql.metastore.dbs does not exist

babayega · 2018-12-10T12:55:24Z

Is this issue resolved ??
Because I am using presto 0.214 and still facing it.

hamlet-lee · 2019-03-13T03:06:13Z

Looking forward to this! I am eager to know if there are any updates!

Praveen2112 · 2019-03-13T06:50:04Z

@hamlet-lee, @babayega I am currently working on adding support for case-sensitive identifiers. You can track the work here (trinodb/trino#354).

findepi · 2019-03-13T08:12:09Z

@hamlet-lee, @babayega,
until @Praveen2112 adds proper case-sensitive identifiers, you can use Starburst Presto release
which provides a case-insensitive-name-matching = true | false option for all JDBC-based connectors.
See https://docs.starburstdata.com/latest/release/release-302-e.html

hamlet-lee · 2019-03-13T11:44:40Z

@Praveen2112 thanks for the information, I'll track that PR.
@findepi thanks for the information. Currently, we'd like to stick to community version :）

findepi · 2019-03-13T12:48:18Z

@hamlet-lee sure. Just note that Starburst Presto is freely available.
We planned to contribute the change I mentioned, just haven't had time.
However, given that @Praveen2112 's work is already in progress, I don't think
there would be an interest in our change, as it will become obsolete in the long-term.

rschlussel · 2019-08-12T17:14:34Z

Closing. This will be supported by https://github.com/prestodb/presto/pulls/kewang1024

facebook-github-bot added the CLA Signed label Aug 4, 2017

kokosing suggested changes Aug 7, 2017

lvijay reviewed Aug 7, 2017

electrum reviewed Aug 7, 2017

kokosing previously requested changes Aug 17, 2017

lvijay reviewed Oct 1, 2017

Further feedback tweaks

1ce1aad

Some small tweaks from the feedback, doing things a bit more prestodb way.

lvijay mentioned this pull request Nov 28, 2017

MYSQL Connector does not identifies Upper Case Database Name and Table Name #3470

Closed

Drizzt321 mentioned this pull request Apr 18, 2018

Add support for case sensitive identifiers #2863

Open

findepi mentioned this pull request Oct 3, 2018

WIP: Support non-lower-case table names in JDBC connectors #11633

Closed

rschlussel closed this Aug 12, 2019
