-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significantly improve performance #40
Conversation
It seems I have broken something with the fixed nodes -- I will look into this soon. |
I hadn't considered the fact that fixed nodes could force community labels to be nonconsecutive after calling |
I have to look into this more. |
I've added support for The testing scripts are now valgrind-clean on my end, so this will probably do the trick. |
The fixed nodes test fails on i686 architectures? I'll have to check this in a VM tonight. |
Great, thanks for this @ragibson ! I will have to take a closer look, but it seems promising indeed. The performance gain from not using a set in 8c08b3c makes a lot of sense, and destroying the RNG compatibility with previous versions is no problem I think. People can still get identical results by using identical versions (which is not a strange requirement if you insist on exact replication). (Btw, you can also mark a PR as a draft, and then mark it as finished if it is ready for review). I will try to take a closer look somewhat later this week.
No idea why this happens, I will also have to check in more detail, at first sight it seems rather strange indeed. |
Also, remove the temporary printf statements for testing the CI
It turns out my code for the rearranging of community labels didn't update the bookkeeping correctly if the number of communities increased during the call. Without fixed nodes, this was only used to reorder labels and remove empty communities, so I hadn't fully considered how to handle this case. I still want to test this a bit more (and maybe add a few test cases) before calling this "ready to merge". If you're willing to go down a fascinating rabbit hole, the reason the test was only failing on the i686 architecture is partially due to that architecture having faulty (i.e. not IEEE 754 compliant) floating-point units. In fact, if you look at the CI run in 2843e4b, you'll see
where the CPU evaluated The gist of what's happening is that double comparisons here actually use 80-bit extended precision when values are stored in a register and 64-bit precision otherwise. Bizarrely, the architecture allows both printf("(gt=%d, eq=%d, lt=%d)\n", possible_improv > max_improv, possible_improv == max_improv, possible_improv < max_improv); and print
where the CPU presumably did the first comparison with one of the variables in a register and then moved it to main memory before doing the next comparison. This truncates 16 bits of precision, making the values equal even though the variables themselves were never changed. This ended up making the i686 architecture follow slightly different paths through the code than all the other ones, revealing the bug. |
I don't see any remaining issues, even with DEBUG enabled, so I think this is ready for an initial, cursory review now. I added one last fast path for undirected graphs, which makes the optimization essentially as fast as igraph's. Directed graphs (which igraph doesn't support right now) obviously see less of a benefit, but the earlier optimizations are still very significant. |
Again, great work @ragibson ! Good to see the i686 architecture revealed an actual bug. And indeed a fascinating rabbit hold to go down! Never would I have imagine to come across an implementation that would allow for strict inequality and equality at the same time. I never expected this large performance gain still. The algorithm initially was designed to be flexible, not for speed, this was also the reason I included the Leiden algorithm in I will try to review the PR early next week. When it is merged I will probably do a quick release as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ragibson, it all looks fine to me, thanks again! I went through all the changes, and they actually make a lot of sense. There is one part that we may improve still (for performance reasons when using a larger number of fixed nodes). All the rest are just some minor things.
I'm actually going on holidays for three weeks now. Feel free to make these changes yourself, or otherwise I can make these changes when I get back.
Yes, there are also some tweaks that can be made with multilayer networks (e.g. relabeling instead of renumbering communities for each layer) and some non-default optimization modes ( I'll look into the changes soon. Have a good holiday! |
Alright, I think that's all the changes you requested -- let me know what you think when you get back from vacation. Note that the CI failures do not seem real. They fail with
so they should be rerun once this is fixed. I think you'll be happy to know that your suggested tweak to calculate degrees directly did make a measurable difference, so this is slightly faster than before. |
I noticed that igraph's
community_leiden(objective_function="modularity")
was faster than this package and wanted to make the difference less significant.I think most of the changes here should be fairly self-explanatory if you look at one commit at a time. Feel free to comment on or ask for clarification on the more complicated changes.
Up until the very last commit here, the changes are "RNG consistent" with your master branch. That is, the results should be exactly the same as the current code. For example,
prints
in both versions.
The last commit replaces some set iteration (which iterates in value-sorted order for
size_t
) with vector iteration (which iterates in the order in which the node's neighbors appear in igraph). This doesn't change the underlying algorithm, but the new iteration order slightly changes the results for any given RNG seed.There's definitely more work that can be done, especially since this only makes tweaks to code paths taken with the default Optimiser parameters, but I think this is good enough for consideration at this point.