Refactor indexes creation #335
@Yomguithereal just wondering if you have done any performance testing on the above? I experimented with changing sigma's three neighbour indexes to store only the edge id rather than the full edge object, and could not see a significant improvement in either memory or timing with a ~20,000-edge graph. I noticed, though, that you store only ids in graphology, so I'm curious what your experience has been, if any?
One of the major issues with sigma's indexing is that it stores three huge indices built on nested objects: one undirected (nodeA -> nodeB, plus nodeB -> nodeA, -> edge), plus two directed ones (in & out). This means memory usage that is far from reasonable and that is, what's more, very costly to write. If you really want to see performance drop when instantiating sigma, try a graph of 50k nodes and 2M edges: the browser will block synchronously for a long time while sigma loads the graph. On a side note, do you know of another way to index a graph's structure (i.e. be able to retrieve relevant edges & neighbours in constant time) than this one?
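To make the shape of such a nested neighbour index concrete, here is an illustrative sketch (not sigma's actual code; all names are hypothetical) of an index keeping, for each edge, symmetric `source -> target -> edge ids` entries. Storing only ids instead of full edge objects is the change discussed above:

```javascript
// Illustrative nested neighbour index: node -> neighbour -> set of edge ids.
// A real implementation would keep three of these (in, out, all); only the
// undirected "all" variant is shown here.
function createNeighbourIndex() {
  return { all: {} };
}

function indexEdge(index, edge) {
  const { id, source, target } = edge;

  // Symmetric entries so neighbours can be found from either endpoint.
  ((index.all[source] = index.all[source] || {})[target] =
    index.all[source][target] || {})[id] = id;
  ((index.all[target] = index.all[target] || {})[source] =
    index.all[target][source] || {})[id] = id;
}

const index = createNeighbourIndex();
indexEdge(index, { id: 'e1', source: 'n1', target: 'n2' });

// Neighbours of n1, and the connecting edge ids, in constant time:
Object.keys(index.all.n1);    // ['n2']
Object.keys(index.all.n1.n2); // ['e1']
```

The memory cost comes from the sheer number of small objects allocated: one inner object per (node, neighbour) pair, times three indices in sigma's case.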
Thanks for those comments @Yomguithereal! I agree with you that there should certainly be more efficient data structures than those sigma.js currently uses. I have spent a bit of time reviewing indices in graphology so far and it looks promising. Your last question is a very interesting one. I cannot claim to be an expert on the topic, but I think framing it in terms of sparse matrices is possible and potentially helpful. We have an adjacency matrix that, for large networks, is usually sparse; the question is how to store and access it efficiently. Sigma.js effectively takes a list-of-lists (LIL) approach to the sparse matrix problem (albeit with some duplication of data). The alternative would be something like CSR (compressed sparse row) format, which would in most instances be more space-efficient but come at the cost of access speed relative to LIL. That said, there appears to be some interesting work on hashtables vs CSR, e.g. https://pdfs.semanticscholar.org/e284/4574e260ec83437e95c133c0dc56469f8357.pdf. Not sure that going down that path is necessarily a good idea, though...
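For readers unfamiliar with the format, here is a toy CSR adjacency in JavaScript, assuming nodes are numbered 0..n-1. This is only a sketch of the format mentioned above, not sigma or graphology code: `rowPtr[i]..rowPtr[i+1]` delimits node i's neighbour slice inside one flat typed array, which is why CSR is compact but awkward to mutate.

```javascript
// Build a CSR (compressed sparse row) adjacency from an edge list.
function buildCSR(n, edges) {
  const sorted = edges.slice().sort((a, b) => a[0] - b[0]);
  const rowPtr = new Uint32Array(n + 1);
  const colIdx = new Uint32Array(sorted.length);

  // Count out-degree per node, then turn counts into prefix sums.
  for (const [source] of sorted) rowPtr[source + 1]++;
  for (let i = 0; i < n; i++) rowPtr[i + 1] += rowPtr[i];

  // Scatter targets into each node's slice of colIdx.
  const cursor = rowPtr.slice(0, n);
  for (const [source, target] of sorted) colIdx[cursor[source]++] = target;

  return { rowPtr, colIdx };
}

function neighbours({ rowPtr, colIdx }, node) {
  return Array.from(colIdx.subarray(rowPtr[node], rowPtr[node + 1]));
}

const csr = buildCSR(4, [[0, 1], [0, 2], [2, 3], [1, 3]]);
neighbours(csr, 0); // [1, 2]
```

Note the trade-off visible here: reading a node's neighbours is a cheap contiguous scan, but adding a single edge requires rebuilding (or shifting) the arrays, which is why CSR suits static graphs better than a mutable renderer index.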
The structure used by
v2 now uses graphology, so this issue is obsolete and will be closed. For reference, graphology's standard implementation does not internally rely on a LIL or CSR but rather on a custom DoD, for performance reasons: this is the best trade-off to get constant-time access for all operations without hurting memory too much. CSR and LIL just cannot offer the same all-around read performance, but they would be interesting backends for specialized graphology implementations.
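A rough sketch of a DoD ("dictionary of dictionaries") adjacency in the spirit of what is described above (class and method names here are illustrative, not graphology's API). Every operation below, i.e. adding an edge, looking up the edge between two given nodes, and listing a node's neighbours, is constant time (listing is linear only in the result size), which is the all-around behaviour CSR and LIL cannot offer at once:

```javascript
// Minimal DoD adjacency: source -> Map(target -> edge id).
class DodGraph {
  constructor() {
    this.out = new Map();
  }

  addEdge(id, source, target) {
    if (!this.out.has(source)) this.out.set(source, new Map());
    this.out.get(source).set(target, id);
  }

  // Constant-time lookup of the edge between two given nodes.
  edge(source, target) {
    const row = this.out.get(source);
    return row ? row.get(target) : undefined;
  }

  outNeighbours(source) {
    const row = this.out.get(source);
    return row ? Array.from(row.keys()) : [];
  }
}

const g = new DodGraph();
g.addEdge('e1', 'a', 'b');
g.addEdge('e2', 'a', 'c');
g.edge('a', 'b');      // 'e1'
g.outNeighbours('a');  // ['b', 'c']
```

Compared to CSR, mutation is cheap; compared to flat nested plain objects, hash maps keep lookups constant without the per-pair object churn, at the cost of higher memory than the packed-array formats.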
On huge graphs (e.g. 3,000 nodes, 50,000 edges), the initial graph processing is way too slow, and the cost comes from the neighbour indexes' creation.