Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Vector for intermediate computations in Dijkstra #493

Merged
merged 37 commits into from
Jan 14, 2022

Conversation

IvanIsCoding
Copy link
Collaborator

@IvanIsCoding IvanIsCoding commented Nov 24, 2021

Related to #492

Replaces DictMap with Vector for intermediate computations in our Dijkstra's implementation.

  • Instead of always using DictMap to store the scores, we use a Vector and create the DictMap the end. That way, users can get a nice deterministic output and we can keep computations fast

@coveralls
Copy link

coveralls commented Nov 24, 2021

Pull Request Test Coverage Report for Build 1699742967

  • 98 of 99 (98.99%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.003%) to 98.479%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/shortest_path/mod.rs 62 63 98.41%
Totals Coverage Status
Change from base Build 1699064131: 0.003%
Covered Lines: 11593
Relevant Lines: 11772

💛 - Coveralls

@mtreinish
Copy link
Member

This makes the overhead worse, adding a sort step looks like it causes an even worse regression:

single_source_shortest_path_2
single_source_shortest_path
all_pairs

@IvanIsCoding
Copy link
Collaborator Author

This makes the overhead worse, adding a sort step looks like it causes an even worse regression:

single_source_shortest_path_2 single_source_shortest_path all_pairs

Good to know, I will try another idea

@IvanIsCoding IvanIsCoding changed the title Use HashMap for intermediate computations in Dijkstra Use Vector for intermediate computations in Dijkstra Nov 24, 2021
@IvanIsCoding
Copy link
Collaborator Author

Tried using Vector instead of HashMap, that might give us more margin.

Copy link
Collaborator

@georgios-ts georgios-ts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are here, we could also see if there is any performance benfit replacing BinaryHeap with a QuaternaryHeap. At least, BGL (that graph-tool depends on) uses a 4-ary heap.

retworkx-core/src/shortest_path/dijkstra.rs Outdated Show resolved Hide resolved
@IvanIsCoding
Copy link
Collaborator Author

IvanIsCoding commented Nov 25, 2021

While we are here, we could also see if there is any performance benfit replacing BinaryHeap with a QuaternaryHeap. At least, BGL (that graph-tool depends on) uses a 4-ary heap.

That crate seems a bit too young, I will hold on the Quaternary Heap for a bit. The MSRVs are incompatible as well.

@mtreinish
Copy link
Member

Running with the current version there is still a regression, but an improvement over main:

single_source_shortest_path_2
single_source_shortest_path
all_pairs

@IvanIsCoding
Copy link
Collaborator Author

IvanIsCoding commented Nov 29, 2021

Edit: I removed the QuaternaryHeap, the bottleneck is not there so I will save it for another PR when we benchmark with more details.

I added the QuaternaryHeap, they had a version without const generics with Rust MSRV 1.31

@IvanIsCoding
Copy link
Collaborator Author

@mtreinish I think the biggest culprit is https://github.com/Qiskit/retworkx/blob/main/src/shortest_path/mod.rs#L221-L231, every time we map through a HashMap/DictMap we allocate more memory. Because DictMap uses more memory and the graph had millions of nodes, the results got worse.

The code I have now is not the cleanest, but it creates a DictMap only once. That does lower the performance regression

@mtreinish
Copy link
Member

@IvanIsCoding yeah that does the trick now. Running the benchmarks again it's faster than 0.10.2 now:

single_source_shortest_path_2
single_source_shortest_path
all_pairs

@IvanIsCoding
Copy link
Collaborator Author

I will try to clean the code for this one, and flag other similar cases in #492.

@georgios-ts
Copy link
Collaborator

It's great that we managed to fixed the regression! Having two separate functions dijkstra, dijkstra_vector is not the end of the world but it's not the cleanest interface, so I have some guesses/suggestions to make:

  1. Given that the biggest regression was in the single source shortest path benchmark between two nodes and hopefully the code https://github.com/Qiskit/retworkx/blob/main/src/shortest_path/mod.rs#L221-L231 does not allocate memory for million of nodes if goal is set, I'm wondering if the early return here https://github.com/Qiskit/retworkx/blob/4673692bff01f924f71c6aeba8e8203db7fea6ee/src/shortest_path/mod.rs#L223-L234 that avoids iterating over all nodes in the dict could fix alone the regression.

  2. Keep the code from main in dijkstra and change the type of scores from DictMap<G::NodeId, K> to DictMap<usize, K>, so we avoid iterate/and allocate a new dict to store inside PathLengthMapping. Or, do the opposite and change the dict type inside PathLengthMapping to DictMap<NodeIndex, f64>.

  3. Keep only dijkstra_vector in retworkx-core.

  4. Define a trait:

    pub trait DistMap<N, K> { 
       fn get(&self,  a: N) -> Option<K>;
       fn put(&mut self, a: N, val: K)
    }

    similar to VisitMap, implement it for HashMap/DictMap/Vec and make dijkstra generic like

    pub fn dijkstra<G, F, K, E>(
        graph: G,
        start: G::NodeId,
        goal: Option<G::NodeId>,
        edge_cost: F,
        scores: impl DistMap<G::NodeId, K>
        path: Option<&mut DictMap<G::NodeId, Vec<G::NodeId>>>,
    )

    so Rust users can choose between HashMap/DictMap/Vec.

Copy link
Collaborator

@georgios-ts georgios-ts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some inline comments. Apart from that, it's weird that we use internally a vector for the computation in dijkstra_shortest_path_lengths but a DictMap in dijkstra_shortest_paths. For maximum flexibility and control over memory - speed trade-off, it seems a good choice to add a kwarg sparse in order to let users choose which data structure best fits their needs.

retworkx-core/src/shortest_path/dijkstra.rs Outdated Show resolved Hide resolved
retworkx-core/src/shortest_path/dijkstra.rs Outdated Show resolved Hide resolved
retworkx-core/src/shortest_path/dijkstra.rs Outdated Show resolved Hide resolved
retworkx-core/src/shortest_path/dijkstra.rs Show resolved Hide resolved
retworkx-core/src/shortest_path/dijkstra.rs Outdated Show resolved Hide resolved
retworkx-core/src/shortest_path/k_shortest_path.rs Outdated Show resolved Hide resolved
src/shortest_path/all_pairs_dijkstra.rs Outdated Show resolved Hide resolved
@IvanIsCoding
Copy link
Collaborator Author

I left some inline comments. Apart from that, it's weird that we use internally a vector for the computation in dijkstra_shortest_path_lengths but a DictMap in dijkstra_shortest_paths. For maximum flexibility and control over memory - speed trade-off, it seems a good choice to add a kwarg sparse in order to let users choose which data structure best fits their needs.

I addressed the comments, but I think adding a sparse kwarg is not the best choice. We'd need to have a conditional calling the same function with different types, which is a bit cumbersome. I think it's ok to make a choice for the user in this case.

@georgios-ts
Copy link
Collaborator

Yeah, it's ok to make a choice for the users but then why not use a vector internally for dijkstra_shortest_paths (or even in all_pairs_dijkstra_path_lengths) too?

@mtreinish mtreinish added this to the 0.11.0 milestone Jan 4, 2022
@IvanIsCoding
Copy link
Collaborator Author

all_pairs_dijkstra_path_lengths

I've updated the PR to include that as well

Copy link
Member

@mtreinish mtreinish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, thanks for keeping with this. The code looks great and the performance definitely is where we want it to be:

all_pairs
single_source_shortest_path
single_source_shortest_path_2

It'd be good if @georgios-ts gave this a look too since he had comments on earlier revisions.

I just had two quick comments inline about the documentation and comments before we merge this around the DistanceMap trait. It might be worth manually checking the cargo doc output to verify how it renders.

retworkx-core/src/distancemap.rs Show resolved Hide resolved
retworkx-core/src/distancemap.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@georgios-ts georgios-ts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and performance looks great!

retworkx-core/src/distancemap.rs Outdated Show resolved Hide resolved
@IvanIsCoding IvanIsCoding merged commit 0b542e9 into Qiskit:main Jan 14, 2022
InnovativeInventor pushed a commit to InnovativeInventor/retworkx that referenced this pull request Mar 19, 2022
* Sort Dijkstra output at the end

* Handle dense cases at the end

* Change condition for sorting output

* Use Vector for intermediate calculations

* Use vector in k_shortest_path

* Address clippy comments

* Fix bug

* Incorporate feedback from PR

* Add quaternary heap

* Fix steiner tree test

* Avoid creating duplicated dictmaps

* Revert "Fix steiner tree test"

This reverts commit 4a17a25.

* Add tests and docstring

* Use trait to reduce duplication

* Move DistanceMap to its own file

* Use DistanceMap in k_shortest_path

* Add test coverage

* Support HashMap in DistanceMap

* Remove type casting

* Remove unnecessary IndexType

* Use Vector in dijkstra_shortest_paths and all_pairs_dijkstra_path_lengths

* Update distancemap.rs

* Update retworkx-core/src/distancemap.rs

Co-authored-by: georgios-ts <[email protected]>

* Add docstrings

* Cargo fmt

Co-authored-by: georgios-ts <[email protected]>
@IvanIsCoding IvanIsCoding deleted the dijkstra-regression branch June 30, 2022 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants