Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Case Insensitive Collations #7519

Closed
morgo opened this issue Aug 28, 2018 · 6 comments
Closed

Support Case Insensitive Collations #7519

morgo opened this issue Aug 28, 2018 · 6 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@morgo
Copy link
Contributor

morgo commented Aug 28, 2018

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
CREATE TABLE t2 (a varchar(255)) character set utf8 collate utf8_general_ci;
INSERT INTO t2 VALUES ('a'), ('A');
SELECT * FROM t2 WHERE a='A';

SHOW CREATE TABLE t2;
  1. What did you expect to see?
    MySQL result:
mysql [localhost] {msandbox} (test) > CREATE TABLE t2 (a varchar(255)) character set utf8 collate utf8_general_ci;
Query OK, 0 rows affected, 1 warning (0.16 sec)

mysql [localhost] {msandbox} (test) > show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                     |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 3719 | 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous. |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > INSERT INTO t2 VALUES ('a'), ('A');
Query OK, 2 rows affected (0.09 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > SELECT * FROM t2 WHERE a='A';
+------+
| a    |
+------+
| a    |
| A    |
+------+
2 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > SHOW CREATE TABLE t2;
+-------+------------------------------------------------------------------------------------------+
| Table | Create Table                                                                             |
+-------+------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `a` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
  1. What did you see instead?
mysql> CREATE TABLE t2 (a varchar(255)) character set utf8 collate utf8_general_ci;
Query OK, 0 rows affected (0.13 sec)

mysql> INSERT INTO t2 VALUES ('a'), ('A');
Query OK, 2 rows affected (0.13 sec)

mysql> SELECT * FROM t2 WHERE a='A';
+------+
| a    |
+------+
| A    |
+------+
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE t2;
+-------+------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                     |
+-------+------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `a` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci |
+-------+------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
  1. What version of TiDB are you using (tidb-server -V or run select tidb_version(); on TiDB)?
mysql> SELECT tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: rc1-3104-g51046aa7
Git Commit Hash: 51046aa717e782d3843ae2c91f0a0952de05a4e5
Git Branch: tidb_tracing_prototype
UTC Build Time: 2018-08-23 08:21:06
GoVersion: go version go1.10.1 linux/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
1 row in set (0.00 sec)
@morgo
Copy link
Contributor Author

morgo commented Aug 28, 2018

It looks like it is only collating as binary:

MySQL 8.0:

mysql [localhost] {msandbox} (test) > show variables like 'character%';
+--------------------------+----------------------------------------------+
| Variable_name            | Value                                        |
+--------------------------+----------------------------------------------+
| character_set_client     | utf8                                         |
| character_set_connection | utf8                                         |
| character_set_database   | utf8mb4                                      |
| character_set_filesystem | binary                                       |
| character_set_results    | utf8                                         |
| character_set_server     | utf8mb4                                      |
| character_set_system     | utf8                                         |
| character_sets_dir       | /home/morgo/opt/mysql/8.0.12/share/charsets/ |
+--------------------------+----------------------------------------------+
8 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT * FROM t2 WHERE a='á';
+------+
| a    |
+------+
| a    |
| A    |
+------+
2 rows in set (0.00 sec)

TiDB:

mysql> show variables like 'character%';
+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_sets_dir       | /usr/local/mysql-5.6.25-osx10.8-x86_64/share/charsets/ |
| character_set_database   | utf8                                                   |
| character_set_client     | utf8                                                   |
| character_set_results    | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_server     | utf8                                                   |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> SELECT * FROM t2 WHERE a='á';
Empty set (0.00 sec)

@morgo
Copy link
Contributor Author

morgo commented Aug 28, 2018

I did some research, looks like this is a Roadmap feature. Should I close this bug or leave it open?

@morgo morgo added the type/enhancement The issue or PR belongs to an enhancement. label Oct 19, 2018
@morgo morgo changed the title utf8_general_ci does not collate insensitively Support Case Insensitive Collations Nov 24, 2018
@morgo
Copy link
Contributor Author

morgo commented Nov 25, 2018

Closing as duplicate of #222

@morgo morgo closed this as completed Nov 25, 2018
@zhexuany zhexuany reopened this Apr 21, 2019
@zhexuany
Copy link
Contributor

reopen since it has more details.

@bb7133
Copy link
Member

bb7133 commented Jan 24, 2020

hi @morgo @zhexuany , now we have the plan in #14573 and #14574, your opinions are appreciated, thanks!

@ghost
Copy link

ghost commented Aug 6, 2020

New collations are now available, and documented at https://docs.pingcap.com/tidb/stable/character-set-and-collation#new-framework-for-collations

I am going to reclose this issue. Thanks!

@ghost ghost closed this as completed Aug 6, 2020
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

3 participants