Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significantly improve performance #40

Merged
merged 29 commits into from
Aug 21, 2020
Merged

Significantly improve performance #40

merged 29 commits into from
Aug 21, 2020

Conversation

ragibson
Copy link
Contributor

leiden_runtimes

I noticed that igraph's community_leiden(objective_function="modularity") was faster than this package and wanted to make the difference less significant.

I think most of the changes here should be fairly self-explanatory if you look at one commit at a time. Feel free to comment on or ask for clarification on the more complicated changes.

Up until the very last commit here, the changes are "RNG consistent" with your master branch. That is, the results should be exactly the same as the current code. For example,

import igraph as ig
import leidenalg as la
from random import seed

seed(0)
G = ig.Graph.Erdos_Renyi(n=100, m=500, directed=True)

optimiser = la.Optimiser()
optimiser.set_rng_seed(0)
part = la.RBConfigurationVertexPartition(G)
optimiser.optimise_partition(part)
print(part.membership)

prints

[2, 0, 3, 2, 0, 1, 1, 0, 2, 4, 0, 0, 1, 3, 1, 0, 2, 1, 4, 3, 2, 3, 0, 3, 1, 2, 2, 3, 1, 4, 0, 0, 0, 1, 1, 4, 4, 1, 1, 3, 0, 2, 3, 2, 3, 0, 0, 0, 0, 0, 3, 4, 3, 1, 1, 0, 2, 2, 0, 2, 2, 1, 4, 2, 1, 0, 2, 4, 0, 0, 1, 0, 0, 4, 1, 1, 0, 1, 4, 2, 2, 4, 0, 2, 0, 0, 0, 2, 0, 1, 2, 3, 1, 0, 2, 4, 3, 0, 1, 1]

in both versions.

The last commit replaces some set iteration (which iterates in value-sorted order for size_t) with vector iteration (which iterates in the order in which the node's neighbors appear in igraph). This doesn't change the underlying algorithm, but the new iteration order slightly changes the results for any given RNG seed.

There's definitely more work that can be done, especially since this only makes tweaks to code paths taken with the default Optimiser parameters, but I think this is good enough for consideration at this point.

@ragibson
Copy link
Contributor Author

It seems I have broken something with the fixed nodes -- I will look into this soon.

@ragibson
Copy link
Contributor Author

I hadn't considered the fact that fixed nodes could force community labels to be nonconsecutive after calling renumber_communities(). Hopefully the issue is now fixed.

@ragibson
Copy link
Contributor Author

ragibson commented Jul 13, 2020

I have to look into this more.

@ragibson ragibson closed this Jul 13, 2020
@ragibson
Copy link
Contributor Author

I've added support for rearrange_community_labels() calls that create empty communities (this only occurs with fixed nodes as far as I can tell).

The testing scripts are now valgrind-clean on my end, so this will probably do the trick.

@ragibson ragibson reopened this Jul 13, 2020
@ragibson
Copy link
Contributor Author

The fixed nodes test fails on i686 architectures? I'll have to check this in a VM tonight.

@vtraag
Copy link
Owner

vtraag commented Jul 13, 2020

Great, thanks for this @ragibson ! I will have to take a closer look, but it seems promising indeed. The performance gain from not using a set in 8c08b3c makes a lot of sense, and destroying the RNG compatibility with previous versions is no problem I think. People can still get identical results by using identical versions (which is not a strange requirement if you insist on exact replication).

(Btw, you can also mark a PR as a draft, and then mark it as finished if it is ready for review).

I will try to take a closer look somewhat later this week.

The fixed nodes test fails on i686 architectures? I'll have to check this in a VM tonight.

No idea why this happens, I will also have to check in more detail, at first sight it seems rather strange indeed.

@ragibson ragibson marked this pull request as draft July 13, 2020 23:08
@ragibson
Copy link
Contributor Author

ragibson commented Jul 14, 2020

It turns out my code for the rearranging of community labels didn't update the bookkeeping correctly if the number of communities increased during the call. Without fixed nodes, this was only used to reorder labels and remove empty communities, so I hadn't fully considered how to handle this case.

I still want to test this a bit more (and maybe add a few test cases) before calling this "ready to merge".

If you're willing to go down a fascinating rabbit hole, the reason the test was only failing on the i686 architecture is partially due to that architecture having faulty (i.e. not IEEE 754 compliant) floating-point units. In fact, if you look at the CI run in 2843e4b, you'll see

possible_improv=1.98 > max_improv=1.98 ?
possible_improv=0x1.fae147ae147aep+0 > max_improv=0x1.fae147ae147aep+0 ?
(gt=1, eq=0, lt=0)

where the CPU evaluated possible_improv as being greater than max_improv, despite having the exact same double-precision value.

The gist of what's happening is that double comparisons here actually use 80-bit extended precision when values are stored in a register and 64-bit precision otherwise.

Bizarrely, the architecture allows both possible_improv > max_improv and possible_improv == max_improv to be true in a single line of code if the variables' register allocation changes. I actually had a few tests on a local VM run

printf("(gt=%d, eq=%d, lt=%d)\n", possible_improv > max_improv, possible_improv == max_improv, possible_improv < max_improv);

and print

(gt=1, eq=1, lt=0)

where the CPU presumably did the first comparison with one of the variables in a register and then moved it to main memory before doing the next comparison. This truncates 16 bits of precision, making the values equal even though the variables themselves were never changed.

This ended up making the i686 architecture follow slightly different paths through the code than all the other ones, revealing the bug.

@ragibson
Copy link
Contributor Author

I don't see any remaining issues, even with DEBUG enabled, so I think this is ready for an initial, cursory review now.

I added one last fast path for undirected graphs, which makes the optimization essentially as fast as igraph's.

undirected_leiden_runtimes

Directed graphs (which igraph doesn't support right now) obviously see less of a benefit, but the earlier optimizations are still very significant.

directed_leiden_runtimes

@ragibson ragibson marked this pull request as ready for review July 16, 2020 03:39
@vtraag
Copy link
Owner

vtraag commented Jul 18, 2020

Again, great work @ragibson !

Good to see the i686 architecture revealed an actual bug. And indeed a fascinating rabbit hold to go down! Never would I have imagine to come across an implementation that would allow for strict inequality and equality at the same time.

I never expected this large performance gain still. The algorithm initially was designed to be flexible, not for speed, this was also the reason I included the Leiden algorithm in igraph itself. Seeing it now becoming almost as fast as the igraph implementation is a great improvement!

I will try to review the PR early next week. When it is merged I will probably do a quick release as soon as possible.

Copy link
Owner

@vtraag vtraag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ragibson, it all looks fine to me, thanks again! I went through all the changes, and they actually make a lot of sense. There is one part that we may improve still (for performance reasons when using a larger number of fixed nodes). All the rest are just some minor things.

I'm actually going on holidays for three weeks now. Feel free to make these changes yourself, or otherwise I can make these changes when I get back.

include/GraphHelper.h Outdated Show resolved Hide resolved
src/GraphHelper.cpp Show resolved Hide resolved
src/GraphHelper.cpp Outdated Show resolved Hide resolved
src/GraphHelper.cpp Show resolved Hide resolved
src/MutableVertexPartition.cpp Outdated Show resolved Hide resolved
src/MutableVertexPartition.cpp Outdated Show resolved Hide resolved
src/MutableVertexPartition.cpp Show resolved Hide resolved
src/MutableVertexPartition.cpp Show resolved Hide resolved
src/Optimiser.cpp Show resolved Hide resolved
tests/test_Optimiser.py Show resolved Hide resolved
@ragibson
Copy link
Contributor Author

ragibson commented Jul 24, 2020

Yes, there are also some tweaks that can be made with multilayer networks (e.g. relabeling instead of renumbering communities for each layer) and some non-default optimization modes (optimise_routine == Optimiser::MERGE_NODES and refine_routine == Optimiser::MOVE_NODES, for example), but those were outside the scope of my optimizations. In any case, those are not the "expected use case" for most users.

I'll look into the changes soon. Have a good holiday!

@ragibson
Copy link
Contributor Author

ragibson commented Aug 13, 2020

Alright, I think that's all the changes you requested -- let me know what you think when you get back from vacation.

Note that the CI failures do not seem real. They fail with

downloading mingw64.db...
downloading mingw64.db.sig...
error: mingw64: signature from "David Macek <[email protected]>" is unknown trust
error: failed to update mingw64 (invalid or corrupted database (PGP signature))
downloading msys.db...
downloading msys.db.sig...
error: msys: signature from "David Macek <[email protected]>" is unknown trust
error: failed to update msys (invalid or corrupted database (PGP signature))
error: failed to synchronize all databases

so they should be rerun once this is fixed.

I think you'll be happy to know that your suggested tweak to calculate degrees directly did make a measurable difference, so this is slightly faster than before.
final_leiden_runtimes

@vtraag vtraag self-requested a review August 21, 2020 08:15
@vtraag vtraag merged commit ce10c99 into vtraag:master Aug 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants