Add configuration for database connection collation #22564
Nice and tidy
@mneudert I'm wondering if we should also introduce a way to ensure that all tables are created with the same collation (like we do for TiDB). At the moment a table's collation is also based on the default collation of the database; not sure if that could cause problems as well at some point, if the default collation changes 🤔
Good point @sgiehl. I just checked how MariaDB behaves, and found that the currently proposed fix is not enough :( As we create tables using [...]. So we indeed need to allow setting the collation in all create statements. Should the config setting be renamed to the more generic [...]?
Guess that makes the most sense. During installation we should detect the current default collation in order to write it to the config. Not sure if we should also do that during the update, to ensure every installation has a collation set in the end.
Further changes required by other reviewers
The changes related to table creation look good. Are there still any changes pending related to the installation and automatic config setup?
51b3e8a to a919e26
Would appreciate a patch release for this soon! My archive tasks are consistently failing at the moment.
* Add configuration for database connection collation
* Rename "connection_collation" config to "collation"
* Pass configured database collation to table creation statements
* Detect default collation to be used during database creation
* Save default database collation to config during installation
* Update database collation config if auto-detectable
* Add database collation to diagnostics
* Configure default collation for test environment
* Update expected screenshots
* Look at most recent archive table for update collation detection
Description:

The MariaDB `11.5` release introduced a change in the default Unicode collation. Because of the way Matomo connects to the database (i.e. sending `SET NAMES` without a `COLLATE` parameter), this can change the collation used in some queries.

If the database was created with a previous MariaDB version, like `11.3`, and the `charset` setting was configured as `utf8mb4`, the effective collation after `SET NAMES` should be `utf8mb4_general_ci`. This collation will then also be used to create archive tables like `archive_blob_2024_08`.

With MariaDB `11.5`, the collation will change to `utf8mb4_uca1400_ai_ci`, although this may depend on the individual server configuration. This collation will then be used to create new archive tables.

The problems arise when queries use variables, for example during archiving:
If the table `archive_blob_2024_08` was created using `utf8mb4_general_ci`, and the connection collation is set to `utf8mb4_uca1400_ai_ci`, the variable assignments in the `WHERE` clause will create a forbidden collation mix. And this breaks archiving.

While one way to work around that issue is to reconfigure the `character_set_collations` server configuration from `utf8mb4=utf8mb4_uca1400_ai_ci` to `utf8mb4=utf8mb4_general_ci`, this may not be possible in many environments, like shared hosting.

This PR introduces a new, optional, database setting `connection_collation`. If it is set alongside a database `charset`, this value will be passed to a `SET NAMES ... COLLATE ...` statement, setting the connection collation back to the value required for an uninterrupted service.

Config update
During installation and the next update, the config will be checked to see whether a collation has been set.
If that is not the case, an automatic update is attempted, based on comparing the collation of the `user` table with the one returned from `SELECT @@collation_connection`. If both are the same, the configuration will be updated to this value.

If not, the update will check the most recent archive table (by name). If the collation of that table matches the `user` table, it will be chosen for the config update.
Otherwise no update will take place, so we don't accidentally break any setups. For this case the diagnostics have been updated to show the connection collation in use and suggest manually updating the configuration to a suitable value.
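
The decision steps above can be sketched as a small pure function. This is only an illustration of the described order of checks, not the code from this PR: the function name `detect_collation` and its parameters are hypothetical, and the three collation values are assumed to have been read from the database beforehand.

```python
from typing import Optional


def detect_collation(
    configured: Optional[str],
    user_table: Optional[str],
    connection: Optional[str],
    latest_archive_table: Optional[str],
) -> Optional[str]:
    """Return the collation to write to the config, or None to leave it untouched.

    Hypothetical helper mirroring the described checks; all inputs are
    assumed to be collation names already fetched from the database.
    """
    if configured:
        # A collation is already set, so there is nothing to detect.
        return None
    if not user_table:
        # Without a reference table collation no safe choice can be made.
        return None
    if user_table == connection:
        # The user table and the connection agree: use that value.
        return user_table
    if latest_archive_table is not None and latest_archive_table == user_table:
        # The most recent archive table agrees with the user table.
        return user_table
    # Ambiguous setup: do not change the config automatically.
    return None
```

With a `user` table in `utf8mb4_general_ci`, a connection reporting `utf8mb4_uca1400_ai_ci`, and a latest archive table in `utf8mb4_general_ci`, this sketch picks `utf8mb4_general_ci`. If the latest archive table had already been created with `utf8mb4_uca1400_ai_ci`, it returns `None` and the manual path via the diagnostics applies.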
Note: It can happen that, after an upgrade, a mix of database tables has been created. For example, `2024_08` and `2024_09` could have been created with different collations. In this case, one of the tables has to be manually altered (or deleted and recreated by invalidation and rearchiving), so all tables have the same collation.

Fixes #22536
Refs DEV-18459
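
For the mixed-collation case mentioned in the note, a minimal sketch of how the manual alteration could be prepared is shown below. It only generates `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...` statements for tables whose collation differs from the target and does not execute anything. The helper name `alter_statements` and the unprefixed table names are assumptions for illustration; real installations may use a table prefix, and the statements should be reviewed before running them.

```python
import re
from typing import Dict, List

# Only plain identifiers are accepted, since the names are interpolated into SQL.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def alter_statements(
    table_collations: Dict[str, str], charset: str, target: str
) -> List[str]:
    """Build ALTER statements for tables not yet using the target collation.

    Hypothetical helper: `table_collations` maps table name to its current
    collation, as could be read from information_schema.TABLES.
    """
    for name in (charset, target, *table_collations):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {target}"
        for table, collation in sorted(table_collations.items())
        if collation != target
    ]


statements = alter_statements(
    {
        "archive_blob_2024_08": "utf8mb4_general_ci",
        "archive_blob_2024_09": "utf8mb4_uca1400_ai_ci",
    },
    charset="utf8mb4",
    target="utf8mb4_general_ci",
)
for statement in statements:
    print(statement)
```

Generating the statements instead of executing them keeps the step reviewable, which seems appropriate given that the PR itself avoids automatic changes when the setup is ambiguous.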