Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Link Analysis functions #315

Closed
IvanIsCoding opened this issue May 6, 2021 · 2 comments · Fixed by #790
Closed

Add Link Analysis functions #315

IvanIsCoding opened this issue May 6, 2021 · 2 comments · Fixed by #790
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@IvanIsCoding
Copy link
Collaborator

What is the expected enhancement?

Add functions to calculate PageRank and HITS to expand the retworkx's link analysis capabilities.

These algorithms are popular among users of graph libraries, and also appear frequently in benchmarks. The hope by adding those algorithms is to make retworkx appeal to more users and become more popular.

For inspiration, our functions couls have an API that is similar to networkx's Link Analysis API:

@mtreinish mtreinish added enhancement New feature or request good first issue Good for newcomers labels May 6, 2021
@mtreinish
Copy link
Member

What about google_matrix, hub_matrix, and authority_matrix looking at those it looks like they're used internally by the functions you listed but also exposed as public functions in networkx. Do you think it makes sense to do the same for retworkx?

@IvanIsCoding
Copy link
Collaborator Author

What about google_matrix, hub_matrix, and authority_matrix looking at those it looks like they're used internally by the functions you listed but also exposed as public functions in networkx. Do you think it makes sense to do the same for retworkx?

It would be nice to expose those publicly if it is easy to do so. It also helps to keep the code organized, so we could follow networkx implemenation style and separate it as calculate matrix in one function, than interpret the eigenvectors in another function.

However, the main focus should be implementing pagerank and hits. google_matrix and hub_matrix would be a nice to have, but pagerank and hits are an order of magnitude more popular than the internals they use. For link anaylsis, what matters is more the end result.

This was referenced Jan 23, 2023
mtreinish added a commit that referenced this issue May 26, 2023
Related to #315

Adds an implementation of the PageRank algorithm using sparse matrices. It uses the sprs crate combined with ndarray to implement a Power Method approach of finding the PageRank.

Also, we test this implementation against NetworkX's implementation of the PageRank. We accept all the arguments that NetworkX accepts: tolerance, max_iter, personalization, dangling, etc.

* Add the sketch of PageRank

* More progress towards pagerank

* Use CentralityMapping and FailedToConverge

* Finalize PageRank

* First test does not run

* Remove unwanted triplet

* Fix clippy warning

* Handle personalization correctly

* Add more tests

* Cargo fmt

* Add scipy to test requirements

* Skip SciPy tests in case architecture does not have it

* Ignore flake8 errors that do not help

* Flake8

* Handle dangling weights

* Add more tests

* Add nstart argument

* Cargo Clippy

* Documentation

* Fix typo in URL

* Update releasenotes/notes/add-pagerank-bef0de7d46026071.yaml

Co-authored-by: Matthew Treinish <[email protected]>

* Tweak pyfunction signature

* Add scipy to aarch64 test requirements

* Address comments from code review

* Clippy is always right

---------

Co-authored-by: Matthew Treinish <[email protected]>
@mergify mergify bot closed this as completed in #790 May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants