
Refactor indexes creation #335

Closed

Yomguithereal opened this issue Jul 8, 2014 · 5 comments

@Yomguithereal
Collaborator

On huge graphs (e.g. 3,000 nodes, 50,000 edges), the initial graph processing is way too slow, and the bottleneck is the creation of the neighbour indexes.

@apitts
Contributor

apitts commented Feb 13, 2017

@Yomguithereal just wondering if you have done any performance testing on the above? I just experimented with changing sigma's three NeighborIndexes to use only the id rather than the full edge and could not see a significant improvement in either memory or timing with a ~20,000-edge graph. I noticed, though, that you are using only ids in graphology... so I'm curious what your experience has been, if any?
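
For concreteness, this is roughly the kind of change I tried; the index shape below is a simplification for illustration, not sigma's exact internals:

```js
// Simplified illustration only, not sigma's exact internals: a neighbor index
// keyed by source node id, then target node id, then edge id.
var neighborIndex = {};

function indexEdge(edge) {
  neighborIndex[edge.source] = neighborIndex[edge.source] || {};
  neighborIndex[edge.source][edge.target] =
    neighborIndex[edge.source][edge.target] || {};

  // Before: keep a reference to the full edge object in the index.
  // neighborIndex[edge.source][edge.target][edge.id] = edge;

  // After: keep only the id and look the edge up in the main edge store when needed.
  neighborIndex[edge.source][edge.target][edge.id] = edge.id;
}
```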

@Yomguithereal
Collaborator Author

One of the major issues with sigma's indexing is that it stores three huge indices built on nested objects: one undirected index mapping nodeA -> nodeB (plus nodeB -> nodeA) -> edge, plus two directed ones (in & out). This means memory usage that is far beyond reasonable and that is, what's more, very costly to write. graphology uses several strategies to optimize this, such as storing the indices directly in the nodes' register. I don't believe that using ids instead of references does much to improve performance by itself, even if it might give a boost on some JavaScript engines.
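
To make the shape concrete, here is a rough sketch of what those three nested indices amount to (schematic, not sigma's literal code):

```js
// Schematic only, not sigma's literal code: three nested indices, each mapping
// node id -> node id -> edge id -> edge, so every edge gets written several
// times and every nesting level allocates its own object.
var inIndex = {};  // target -> source -> edges pointing at the target
var outIndex = {}; // source -> target -> edges leaving the source
var allIndex = {}; // undirected view: both directions for every edge

function indexEdge(edge) {
  addToIndex(outIndex, edge.source, edge.target, edge);
  addToIndex(inIndex, edge.target, edge.source, edge);
  addToIndex(allIndex, edge.source, edge.target, edge);
  addToIndex(allIndex, edge.target, edge.source, edge);
}

function addToIndex(index, a, b, edge) {
  index[a] = index[a] || {};
  index[a][b] = index[a][b] || {};
  index[a][b][edge.id] = edge;
}
```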

However, if you really want to see performance drop when instantiating sigma, try a graph of 50k nodes and 2M edges: you should see the browser hang synchronously for a long time while sigma loads the graph.

On a side note, do you know of any way other than this one to index a graph's structure (i.e. to be able to retrieve the relevant edges & neighbors in constant time)?

@apitts
Contributor

apitts commented Feb 14, 2017

Thanks for those comments @Yomguithereal! I agree that more efficient data structures than those currently used by sigma.js should certainly be possible. I have spent a bit of time reviewing the indices in graphology so far and it looks promising.

Your last question is a very interesting one. I cannot claim to be an expert on the topic, but I think framing it in terms of sparse matrices is possible and potentially helpful. We have an adjacency matrix that, for large networks, is usually sparse. The question is how to store and access it efficiently. Sigma.js effectively takes a list-of-lists (LIL) approach to the sparse matrix problem (albeit with some duplication of data). The alternative would be something like CSR (compressed sparse row) format, which would in most instances be more space efficient but come at the cost of access speed relative to LIL. That said, there appears to be some interesting work on hashtables vs CSR, e.g. https://pdfs.semanticscholar.org/e284/4574e260ec83437e95c133c0dc56469f8357.pdf. Not sure that going down that path is necessarily a good idea though...
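
Just to make the contrast concrete, here is a toy comparison of the two layouts (illustrative only, not a proposal for sigma's actual code):

```js
// Toy example: the same 4-node directed graph with edges 0->1, 0->2 and 2->3.

// LIL: one list of neighbors per node. Cheap to build and to mutate, but each
// row is its own array, which costs memory and pointer chasing.
var lil = [
  [1, 2], // neighbors of node 0
  [],     // neighbors of node 1
  [3],    // neighbors of node 2
  []      // neighbors of node 3
];

// CSR: two flat typed arrays. Node i's neighbors live in colIdx between
// rowPtr[i] and rowPtr[i + 1]. Very compact, but costly to update once built.
var rowPtr = new Uint32Array([0, 2, 2, 3, 3]);
var colIdx = new Uint32Array([1, 2, 3]);

function csrNeighbors(i) {
  return Array.prototype.slice.call(colIdx, rowPtr[i], rowPtr[i + 1]);
}
```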

@Yomguithereal
Collaborator Author

The structure used by graphology is actually a sparse matrix (sigma also uses sparse matrices, but you could say it uses three of them instead of one). Hashtables and CSR are interesting approaches, but I would not use them in graphology's reference implementation (in a variant implementation, why not?) because it must remain as fast as possible, keeping operations in constant time as often as possible.

@Yomguithereal
Collaborator Author

v2 now uses graphology, so this issue is obsolete and will be closed. For reference, graphology's standard implementation does not internally rely on a LIL or CSR but rather on a custom DoD (dictionary of dictionaries), since this is the best trade-off to get constant-time access for all operations without hurting memory too much. CSR and LIL just cannot offer the same all-around read performance, but they would be interesting backends for specialized graphology implementations.
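
Schematically, the DoD idea looks like this (a rough sketch of the general technique, not graphology's actual source):

```js
// Rough sketch of a dictionary-of-dictionaries adjacency, not graphology's
// actual source: each node entry carries its own neighbor maps, so adding an
// edge or listing a node's neighbors never requires scanning the whole graph.
var nodes = {};
var edges = {};

function addNode(id) {
  nodes[id] = {attributes: {}, out: {}, in: {}};
}

function addDirectedEdge(id, source, target) {
  edges[id] = {source: source, target: target, attributes: {}};

  // Register the edge id under both endpoints' neighbor maps.
  nodes[source].out[target] = nodes[source].out[target] || {};
  nodes[source].out[target][id] = true;

  nodes[target].in[source] = nodes[target].in[source] || {};
  nodes[target].in[source][id] = true;
}

addNode('a');
addNode('b');
addDirectedEdge('e1', 'a', 'b');
```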
