-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new topological sort implementation #10592
Merged
derrabus
merged 1 commit into
doctrine:entity-level-commit-order
from
mpdude:new-topo-sort
May 8, 2023
Merged
Add a new topological sort implementation #10592
derrabus
merged 1 commit into
doctrine:entity-level-commit-order
from
mpdude:new-topo-sort
May 8, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
21 tasks
greg0ire
reviewed
Mar 22, 2023
greg0ire
reviewed
Mar 23, 2023
mpdude
force-pushed
the
new-topo-sort
branch
2 times, most recently
from
March 23, 2023 08:20
8e3d0d8
to
8b5f8ee
Compare
Git says the concept of "weight" was introduced in #1570 |
This is the first chunk to break doctrine#10547 into smaller PRs suitable for reviewing. It adds a new topological sort implementation. #### Background Topological sort is an algorithm that sorts the vertices of a directed acyclic graph (DAG) in a linear order such that for every directed edge from vertex A to vertex B, vertex A comes before vertex B in the ordering. This ordering is called a topological order. Ultimately (beyond the scope of this PR), in the ORM we'll need this to find an order in which we can insert new entities into the database. When one entity needs to refer to another one by means of a foreign key, the referred-to entity must be inserted before the referring entity. Deleting entities is similar. A topological sorting can be obtained by running a depth first search (DFS) on the graph. The order in which the DFS finishes on the vertices is a topological order. The DFS is possible iif there are no cycles in the graph. When there are cycles, the DFS will find them. For more information about topological sorting, as well as a description of an DFS-based topological sorting algorithm, see https://en.wikipedia.org/wiki/Topological_sorting. #### Current situation There is a DFS-based topological sorting implemented in the `CommitOrderCalculator`. This implementation has two kinks: 1. It does not detect cycles When there is a cycle in the DAG that cannot be resolved, we need to know about it. Ultimately, this means we will not be able to insert entities into the database in any order that allows all foreign keys constraints to be satisfied. If you look at `CommitOrderCalculator`, you'll see that there is no code dealing with this situation. 2. It has an obscure concept of edge "weights" To me, it is not clear how those are supposed to work. The weights are related to whether a foreign key is nullable or not, but can (could) be arbitrary integers. An edge will be ignored if it has a higher (lower) weight than another, already processed edge... 🤷🏻? #### Suggested change In fact, when inserting entities into the database, we have two kinds of foreign key relationships: Those that are `nullable`, and those that are not. Non-nullable foreign keys are hard requirements: Referred-to entities must be inserted first, no matter what. These are "non-optional" edges in the dependency graph. Nullable foreign keys can be set to `NULL` when first inserting an entity, and then coming back and updating the foreign key value after the referred-to (related) entity has been inserted into the database. This is already implemented in `\Doctrine\ORM\UnitOfWork::scheduleExtraUpdate`, at the expense of performing one extra `UPDATE` query after all the `INSERT`s have been processed. These edges are "optional". When finding a cycle that consists of non-optional edges only, treat it as a failure. We won't be able to insert entities with a circular dependency when all foreign keys are non-NULLable. When a cycle contains at least one optional edge, we can use it to break the cycle: Use backtracking to go back to the point before traversing the last _optional_ edge. This omits the edge from the topological sort order, but will cost one extra UPDATE later on. To make the transition easier, the new implementation is provided as a separate class, which is marked as `@internal`. #### Outlook Backtracking to the last optional edge is the simplest solution for now. In general, it might be better to find _another_ (optional) edge that would also break the cycle, if that edge is also part of _other_ cycles. Remember, every optional edge skipped costs an extra UPDATE query later on. The underlying problem is known as the "minimum feedback arc set" problem, and is not easily/efficiently solvable. Thus, for the time being, picking the nearest optional edge seems to be reasonable.
@derrabus I've added the |
derrabus
approved these changes
May 8, 2023
mpdude
added a commit
to mpdude/doctrine2
that referenced
this pull request
May 8, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead. #### Current situation `UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code: https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325 #### Suggested change * Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities. * In the dependency graph, add edges for all to-one association target entities. * Make edges "optional" when the association is nullable. #### Test changes I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up. Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude
added a commit
to mpdude/doctrine2
that referenced
this pull request
May 8, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead. #### Current situation `UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code: https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325 #### Suggested change * Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities. * In the dependency graph, add edges for all to-one association target entities. * Make edges "optional" when the association is nullable. #### Test changes I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up. Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude
added a commit
to mpdude/doctrine2
that referenced
this pull request
May 9, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead. #### Current situation `UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code: https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325 #### Suggested change * Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities. * In the dependency graph, add edges for all to-one association target entities. * Make edges "optional" when the association is nullable. #### Test changes I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up. Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude
added a commit
to mpdude/doctrine2
that referenced
this pull request
May 9, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead. #### Current situation `UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code: https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325 #### Suggested change * Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities. * In the dependency graph, add edges for all to-one association target entities. * Make edges "optional" when the association is nullable. #### Test changes I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up. Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
mpdude
added a commit
to mpdude/doctrine2
that referenced
this pull request
May 31, 2023
This is the third step to break doctrine#10547 into smaller PRs suitable for reviewing. It uses the new topological sort implementation from doctrine#10592 and the refactoring from doctrine#10651 to compute the UoW's commit order for entity insertions not on the entity class level, but for single entities and their actual dependencies instead. #### Current situation `UnitOfWork::getCommitOrder()` would compute the entity sequence on the class level with the following code: https://github.com/doctrine/orm/blob/70477d81e96c0044ad6fd8c13c37b2270d082792/lib/Doctrine/ORM/UnitOfWork.php#L1310-L1325 #### Suggested change * Instead of considering the classes of all entities that need to be inserted, updated or deleted, consider the new (inserted) entities only. We only need to find a sequence in situations where there are foreign key relationships between two _new_ entities. * In the dependency graph, add edges for all to-one association target entities. * Make edges "optional" when the association is nullable. #### Test changes I have not tried to fully understand the few changes necessary to fix the tests. My guess is that those are edge cases where the insert order changed and we need to consider this during clean-up. Keep in mind that many of the functional tests we have assume that entities have IDs assigned in the order that they were added to the EntityManager. That does not change – so the order of entities is generally stable, equal to the previous implementation.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is the first chunk to break #10547 into smaller PRs suitable for reviewing. It adds a new topological sort implementation.
Background
Topological sort is an algorithm that sorts the vertices of a directed acyclic graph (DAG) in a linear order such that for every directed edge from vertex A to vertex B, vertex A comes before vertex B in the ordering. This ordering is called a topological order.
Ultimately (beyond the scope of this PR), in the ORM we'll need this to find an order in which we can insert new entities into the database. When one entity needs to refer to another one by means of a foreign key, the referred-to entity must be inserted before the referring entity. Deleting entities is similar.
A topological sorting can be obtained by running a depth first search (DFS) on the graph. The order in which the DFS finishes on the vertices is a topological order. The DFS is possible iif there are no cycles in the graph. When there are cycles, the DFS will find them.
For more information about topological sorting, as well as a description of an DFS-based topological sorting algorithm, see https://en.wikipedia.org/wiki/Topological_sorting.
Current situation
There is a DFS-based topological sorting implemented in the
CommitOrderCalculator
. This implementation has two kinks:When there is a cycle in the graph, we need to know about it. Ultimately, this means we will not be able to insert entities into the database in any order that allows all foreign keys constraints to be satisfied.
If you look at
CommitOrderCalculator
, you'll see that there is no code dealing with this situation.To me, it is not clear how those are supposed to work. The weights are related to whether a foreign key is nullable or not, but can (could) be arbitrary integers. An edge will be ignored if it has a higher (lower) weight than another, already processed edge... 🤷🏻?
Suggested change
In fact, when inserting entities into the database, we have two kinds of foreign key relationships: Those that are
nullable
, and those that are not.Non-nullable foreign keys are hard requirements: Referred-to entities must be inserted first, no matter what. These are "non-optional" edges in the dependency graph.
Nullable foreign keys can be set to
NULL
when first inserting an entity, and then coming back and updating the foreign key value after the referred-to (related) entity has been inserted into the database. This is already implemented in\Doctrine\ORM\UnitOfWork::scheduleExtraUpdate
, at the expense of performing one extraUPDATE
query after all theINSERT
s have been processed. These edges are "optional".When finding a cycle that consists of non-optional edges only, treat it as a failure. We won't be able to insert entities with a circular dependency when all foreign keys are non-NULLable.
When a cycle contains at least one optional edge, we can use it to break the cycle: Use backtracking to go back to the point before traversing the last optional edge. This omits the edge from the topological sort order, but will cost one extra UPDATE later on.
To make the transition easier, the new implementation is provided as a separate class, which is marked as
@internal
.Outlook
Backtracking to the last optional edge is the simplest solution for now. In general, it might be better to find another (optional) edge that would also break the cycle, if that edge is also part of other cycles.
Remember, every optional edge skipped costs an extra UPDATE query later on. The underlying problem is known as the "minimum feedback arc set" problem, and is not easily/efficiently solvable. Thus, for the time being, picking the nearest optional edge seems to be reasonable.
Extra bonus
This is what the DALL·E AI thinks it looks like when you have a directed graph with cycles, and some wizardry finds a solution to break those.