Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to 4.x sql duplicate key #37141

Closed
brianteeman opened this issue Feb 25, 2022 · 18 comments
Closed

Update to 4.x sql duplicate key #37141

brianteeman opened this issue Feb 25, 2022 · 18 comments

Comments

@brianteeman
Copy link
Contributor

One of the errors that is coming up quite often is a result of a user trying a failed update for a second time and running into this sql error

2022-02-25T07:30:46+00:00 INFO 80.244.198.253 update Ran query from file 4.0.0-2018-05-15. Query text: CREATE TABLE IF NOT EXISTS `#__workflows` ( `id` int NOT NULL AUTO_INCREMENT, .
2022-02-25T07:30:46+00:00 INFO 80.244.198.253 update Ran query from file 4.0.0-2018-05-15. Query text: INSERT INTO `#__workflows` (`id`, `asset_id`, `published`, `title`, `description.
2022-02-25T07:30:46+00:00 INFO 80.244.198.253 update JInstaller: :Install: Error SQL Duplicate entry '1' for key 'PRIMARY

There are several options available to us to resolve by amending the query "insert into ..."

In our specific case the most relevant would appear to be changing the query to "replace into ..."

When issuing a REPLACE statement, there are two possible outcomes for each issued command:

  • No existing data row is found with matching values and thus a standard INSERT statement is performed.
  • A matching data row is found, causing that existing row to be deleted with the standard DELETE statement, then a normal INSERT is performed afterward.

@richard67 thoughts? You know way more than me on sql

@brianteeman
Copy link
Contributor Author

It would only be the inserts that include the primary incremental record id that would need changing so that should only be the ones for workflows and maybe finder

@richard67
Copy link
Member

Using a "replace" statement in MySQL or doing the equivalent in PostgreSQL is called "upsert". Unfortunately the minimum version requirements of J4 don't allow to use that because we still have to support older MySQL and MariaDB versions which do not support that.

It could be solved by changing the insert statements to be done record by record, and for each record they use a select statement as a data source, similar as we do when we insert into the menu table and select the extension ID from the extensions table and put the result together with literal values for the other columns.

That select statement would contain a condition which returns no result when there is already a record with that ID (or other PK which would be violated).

We have done something similar for inserting the pkg_search record when updating.

That solution would have 2 disadvantages:

  1. It would cause a discussion with someone because we have to change old update SQL scripts which should be immutable in a ideal world.
  2. The SQL would be hard to understand and so if we want to do such inserts in future in a similar way hard for contributors to do it in the right way.

@brianteeman
Copy link
Contributor Author

brianteeman commented Feb 25, 2022

The minimum mysql version supported is 5.6 and that does support replace https://dev.mysql.com/doc/refman/5.6/en/replace.html

@brianteeman
Copy link
Contributor Author

there is also INSERT ON DUPLICATE KEY UPDATE and INSERT IGNORE

or am I missing something?

@richard67
Copy link
Member

Well, then it was either MariaDB or PostgreSQL where there was some problem why we could not do that.

Another problem is that even when we fix the duplicate key issue, there will be other issues which we can't fix with SQL only, see my comments in and it is also worth a question if we should leave the people with the illusion that their site is fine when we fix some of the glitches after failed updates but other not so visible glitches are remaining.

@brianteeman
Copy link
Contributor Author

MariaDB supports all the methods I described. It is postgres that does not and would require a more complicated solution.

However as 98% of all installs are on mysql/mariadb just changing those updates would be a massive improvement. As only a small % of updates hit this issue then there will not be very many running postgresql at all.

Regarding the other issues. I agree with you and its why I introduced the logging although it still says installation success at the end when the sql updates terminated :( but lets fix the issues we can fix now.

And no I wont complain about immutable sql files this time (especially if it is commented as you did with the child templates)

@richard67
Copy link
Member

I will investigate again on weekend and report back. But I'm still not sure if we should do something with it because it will not solve other problems like indexes already existing and MySQL not supporting an IX NOT EXISTS for adding columns and so on.

@brianteeman
Copy link
Contributor Author

found this for postgres upsert
https://www.postgresqltutorial.com/postgresql-upsert/

@brianteeman
Copy link
Contributor Author

and for adding columns can we use a stored procedure? https://thispointer.com/mysql-add-column-if-not-exists/

@richard67
Copy link
Member

We have to check since which version upsert is supported in PostgreSQL.

I think it could be sufficient just to use INSERT IGNORE at some places.

Regarding stored procedures: That could be a way, but the SQL using that would not be handled by the database checker.

@richard67
Copy link
Member

See also #35664 .

@richard67
Copy link
Member

As far as I can see we could use INSERT ... ON DUPLICATE KEY UPDATE for MySQL / MariaDB and INSERT ... ON CONFLICT in our SQL scripts, since these statements are supported with the minimum database server versions for J4.

The question is if we really need that updating of existing records or if it would not just be enough to do nothing on conflict, i.e. not insert or update anything when the record already exists, because in the cases handled here (new stuff e.g. for workflows added with an update SQL) we can assume that the existing records after a first, unsuccessful update attempt will not be modified and nobody will create new workflows stuff in backend manually before the 2nd update attempt.

Also a way could be just to delete the records in the tables for workflows before inserting the new records (but after the CREATE TABLE ... IF NOT EXISTS).

For other stuff which we insert on updates, like e.g. records in the menu table or in the extensions table or the assets table, we do not use hard-coded IDs. In such a case, when we run the update SQL again, we insert again new items for the same thing but with a different ID. This will result in redundant records, and an update on a violated primary key will not help. We would need other conditions (or keys) to identify what "the same thing" means in a particular table, like e.g. the combination of type, element, folder and client_id is in the extensions table. Not in all cases we have a key for that (but could create one), and should use that key to identify if there is already a record or not.

Another way to solve this and other issues would be PR #36506 , but I am not sure if it is sufficient to just allow the SQL to fail or if we would need some kind of condition which failure we allow. The correct update of the schema version in the database after each successful run of an SQL update script which that PR adds is a necessary fix independently from that condition thing.

@richard67
Copy link
Member

richard67 commented Feb 26, 2022

I think for the duplicate keys we can assume that nobody modified the records in the mean time, so we do not need to update them if we run the update SQL script a second time, we simply need to do nothing when the record with that PK already exists.

That means we can use INSERT IGNORE with MySQL/MariaDB and ON CONFLICT DO NOTHING with PostgreSQL.

For the latter see https://www.postgresqltutorial.com/postgresql-upsert/ .

I will prepare a PR in the next days.

@richard67
Copy link
Member

If you are curious: See #36506 (comment) ... I have created a branch with the change from PR #36506 and the necessary changes on the old update SQL scripts plus the change from #37154 .

The changes can be seen here: 4.1-dev...richard67:test-pr-36506 .

It only needs some additional comments in the old update SQL scripts.

PR will come tomorrow, I think.

@richard67
Copy link
Member

PR is #37156 , but as long as it's work in progress we leave this issue here open. The code changes are already done, it just needs to add comments to the changes and complete the PR description and testing instructions.

@brianteeman
Copy link
Contributor Author

I just came to close it but if you want it to stay open for now I am happy with that also

@richard67
Copy link
Member

Closing as having a pull request. Please test #37156 . If you have tested that with success, you can also mark a successful test result for #36506 . And in my PR #37156 please check section "RFC: What is worth a discussion - feedback is welcome" in the description and let me know your opinion.

Thanks in advance.

@richard67
Copy link
Member

Update: The test of my PR does not cover all changes in the other PR #36506 , so it needs to test both PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants