[Bug] Neighbor expansion is slow and unreliable #324

kmcginnes · 2024-04-25T21:56:17Z

Community Note

Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Description

When performing neighbor expansion, the generated queries can be quite complex and resource intensive. Some create 50 or more network requests and can overwhelm the database server where the requests will begin to fail.

OS: macOS 14.4.1
Browser: Arc (Google Chromium)
Graph Explorer Version: 1.6
Graph Database & Version: Amazon Neptune

To Reproduce

Gremlin

I selected one node, and expanded 1 neighbor.

g.V("Keri Eastman").project("vertices", "edges")
  .by(both().hasLabel("post").dedup().range(0, 1).fold())
  .by(bothE().where(otherV().hasLabel("post")).dedup().fold())

This second query will be repeated for each neighbor being expanded. So if you have 50 neighbors, you would have 50 requests.

g.V("https://example.com/blogs/unlocking-the-power-of-relationships-how-graph-databases-are-revolutionizing-data-management/")
  .both().limit(500).dedup().group().by(label).by(count())

openCypher

Expanding an author with 6 posts:

MATCH (v)-[e]-(tgt:post) 
WHERE ID(v) = "Keri Eastman" 
WITH 
  collect(DISTINCT tgt)[..6] AS vObjects, 
  collect({edge: e, sourceType: labels(v), targetType: labels(tgt)})[..6] AS eObjects 
RETURN vObjects, eObjects

MATCH (v)-[e]-(tgt:post) 
WHERE ID(v) = "Keri Eastman" 
  AND ID(e) IN ["5E2E0417-FE9B-43E9-9D80-3D4F83AD90AC","92072F3D-7B73-4E58-8E4E-237B9866601F","2C7DABC9-074F-4BC9-B742-F178699FE5D9","B962DF33-D8C6-42DC-BE3D-4221F1189399","2CFFB4AF-F4E1-471A-AEDE-2EBE8EFA5C4C","B3D9257D-AE4C-4F38-B816-76ADBE6CBA8C"] 
WITH 
  collect(DISTINCT tgt)[..6] AS vObjects, 
  collect({edge: e, sourceType: labels(v), targetType: labels(tgt)})[..6] AS eObjects 
RETURN vObjects, eObjects

One request per neighbor, so 6 in this case:

MATCH (v) -[e]- (t) 
WHERE ID(v) = "https://example.com/blogs/unlocking-the-power-of-relationships-how-graph-databases-are-revolutionizing-data-management/" 
RETURN labels(t) AS vertexLabel, count(DISTINCT t) AS count 
LIMIT 500

SPARQL

Since the RDF data was different than PG data in the same Neptune instance, this is a single Airpot node expanding a single Country node. This resulted in 3 queries.

SELECT ?subject ?pred ?value ?subjectClass ?pToSubject ?pFromSubject {
  ?subject a     ?subjectClass;
           ?pred ?value {
    SELECT DISTINCT ?subject ?pToSubject ?pFromSubject {
      BIND(<http://kelvinlawrence.net/air-routes/resource/303> AS ?argument)
      VALUES ?subjectClass { <http://kelvinlawrence.net/air-routes/class/Country> }
      {
        ?argument ?pToSubject ?subject.
        ?subject a         ?subjectClass;
                 ?sPred    ?sValue .
        
      }
      UNION
      {
        ?subject ?pFromSubject ?argument;
                 a         ?subjectClass;
                 ?sPred    ?sValue .
       
      }
    }
    LIMIT 75 OFFSET 0
  }
  FILTER(isLiteral(?value))
}

SELECT ?subject ?subjectClass ?predToSubject ?predFromSubject {
  BIND(<http://kelvinlawrence.net/air-routes/resource/303> AS ?argument)
  VALUES ?subject { <http://kelvinlawrence.net/air-routes/resource/3505>}
  { 
    ?argument ?predToSubject ?subject.
    ?subject a ?subjectClass.
  }
  UNION
  { 
    ?subject ?predFromSubject ?argument;
             a                ?subjectClass.
  }
}

SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://kelvinlawrence.net/air-routes/resource/3505> }
      UNION
      { <http://kelvinlawrence.net/air-routes/resource/3505> ?p ?subject }
    }
    
  }
}
GROUP BY ?class

Expected behavior

Each language should be optimized. The query seems to be retrieving data for multiple parts of the app (graph and sidebar). We can split those tasks to further optimize, deferring the side bar work until needed.

Related issues

[Bug] Value in "Limit returned neighbors to" is exceeded (SPARQL ENDPOINT) #184

Tasks

kmcginnes · 2024-04-25T23:28:27Z

Gremlin also seems to have issues when limiting the neighbors to expand in the side bar.

This is the Gremlin generated to find 4 movies (limited to a max of 4) connected to Sean Connery:

g.V(\"nm0000125\").
    project(\"vertices\", \"edges\").
      by(both().hasLabel(\"movie\").dedup().range(0, 4).fold()).
      by(bothE().where(otherV().hasLabel(\"movie\")).dedup().fold())

Which returns the following and includes a lot of extra edges (75 in all) as there is no limit placed on the second by

{
    'vertices': [v[tt0094226], v[tt0102034], v[tt0058150], v[tt0081633]], 
    'edges': [
        e[tt0094226-actor-nm0000125][tt0094226-actor->nm0000125], 
        e[tt0102034-actor-nm0000125][tt0102034-actor->nm0000125], 
        e[tt0058150-actor-nm0000125][tt0058150-actor->nm0000125], 
        e[tt0081633-actor-nm0000125][tt0081633-actor->nm0000125], 
        e[tt0107969-actor-nm0000125][tt0107969-actor->nm0000125], 
        e[tt0070948-actor-nm0000125][tt0070948-actor->nm0000125], 
        e[tt0145734-actor-nm0000125][tt0145734-actor->nm0000125], 
        e[tt0075784-actor-nm0000125][tt0075784-actor->nm0000125], 
        e[tt0097328-actor-nm0000125][tt0097328-actor->nm0000125], 
        e[tt0117500-actor-nm0000125][tt0117500-actor->nm0000125], 
        e[tt0113071-actor-nm0000125][tt0113071-actor->nm0000125], 
        e[tt0058754-actor-nm0000125][tt0058754-actor->nm0000125], 
        e[tt0066995-actor-nm0000125][tt0066995-actor->nm0000125], 
        e[tt0057076-actor-nm0000125][tt0057076-actor->nm0000125], 
        e[tt0055262-actor-nm0000125][tt0055262-actor->nm0000125], 
        e[tt0073796-actor-nm0000125][tt0073796-actor->nm0000125], 
        e[tt0084750-actor-nm0000125][tt0084750-actor->nm0000125], 
        e[tt0071877-actor-nm0000125][tt0071877-actor->nm0000125], 
        e[tt0051364-actor-nm0000125][tt0051364-actor->nm0000125], 
        e[tt0091605-actor-nm0000125][tt0091605-actor->nm0000125], 
        e[tt0084920-actor-nm0000125][tt0084920-actor->nm0000125], 
        e[tt0074962-actor-nm0000125][tt0074962-actor->nm0000125], 
        e[tt0066767-actor-nm0000125][tt0066767-actor->nm0000125], 
        e[tt0052722-actor-nm0000125][tt0052722-actor->nm0000125], 
        e[tt0099810-actor-nm0000125][tt0099810-actor->nm0000125], 
        e[tt0067315-actor-nm0000125][tt0067315-actor->nm0000125], 
        e[tt0104839-actor-nm0000125][tt0104839-actor->nm0000125], 
        e[tt0075147-actor-nm0000125][tt0075147-actor->nm0000125], 
        e[tt0079550-actor-nm0000125][tt0079550-actor->nm0000125], 
        e[tt0109920-actor-nm0000125][tt0109920-actor->nm0000125], 
        e[tt0118661-actor-nm0000125][tt0118661-actor->nm0000125], 
        e[tt0091203-actor-nm0000125][tt0091203-actor->nm0000125], 
        e[tt0083947-actor-nm0000125][tt0083947-actor->nm0000125], 
        e[tt0097576-actor-nm0000125][tt0097576-actor->nm0000125], 
        e[tt0082869-actor-nm0000125][tt0082869-actor->nm0000125], 
        e[tt0055928-actor-nm0000125][tt0055928-actor->nm0000125], 
        e[tt0058329-actor-nm0000125][tt0058329-actor->nm0000125], 
        e[tt0116136-actor-nm0000125][tt0116136-actor->nm0000125], 
        e[tt0181536-actor-nm0000125][tt0181536-actor->nm0000125], 
        e[tt0113501-actor-nm0000125][tt0113501-actor->nm0000125], 
        e[tt0073341-actor-nm0000125][tt0073341-actor->nm0000125], 
        e[tt0137494-actor-nm0000125][tt0137494-actor->nm0000125], 
        e[tt0086006-actor-nm0000125][tt0086006-actor->nm0000125], 
        e[tt0070468-actor-nm0000125][tt0070468-actor->nm0000125], 
        e[tt0062512-actor-nm0000125][tt0062512-actor->nm0000125], 
        e[tt0084011-actor-nm0000125][tt0084011-actor->nm0000125], 
        e[tt0054898-actor-nm0000125][tt0054898-actor->nm0000125], 
        e[tt0095897-actor-nm0000125][tt0095897-actor->nm0000125], 
        e[tt0073906-actor-nm0000125][tt0073906-actor->nm0000125], 
        e[tt0311429-actor-nm0000125][tt0311429-actor->nm0000125], 
        e[tt0066090-actor-nm0000125][tt0066090-actor->nm0000125], 
        e[tt0060414-actor-nm0000125][tt0060414-actor->nm0000125], 
        e[tt0063592-actor-nm0000125][tt0063592-actor->nm0000125], 
        e[tt0100530-actor-nm0000125][tt0100530-actor->nm0000125], 
        e[tt0079013-actor-nm0000125][tt0079013-actor->nm0000125], 
        e[tt0059800-actor-nm0000125][tt0059800-actor->nm0000125], 
        e[tt0059274-actor-nm0000125][tt0059274-actor->nm0000125], 
        e[tt0079240-actor-nm0000125][tt0079240-actor->nm0000125], 
        e[tt1527819-actor-nm0000125][tt1527819-actor->nm0000125]
    ]
}

here is how it could also be written (simpler and no extra edges)

g.V("nm0000125").
    bothE().as('e').
    otherV().
    hasLabel('movie').as('n').
    limit(4).
    select('n','e')

which returns

{'n': v[tt0094226], 'e': e[tt0094226-actor-nm0000125][tt0094226-actor->nm0000125]}
{'n': v[tt0102034], 'e': e[tt0102034-actor-nm0000125][tt0102034-actor->nm0000125]}
{'n': v[tt0081633], 'e': e[tt0081633-actor-nm0000125][tt0081633-actor->nm0000125]}
{'n': v[tt0058150], 'e': e[tt0058150-actor-nm0000125][tt0058150-actor->nm0000125]}

kmcginnes added bug Something isn't working performance Issues relating to performance reliability Issues relating to improvements in reliability exploration Issues related to graph exploration, node & edge details, neighbor expansion, etc labels Apr 25, 2024

kmcginnes added this to Graph Explorer Planning Apr 25, 2024

github-project-automation bot moved this to 🆕 New in Graph Explorer Planning Apr 25, 2024

kmcginnes moved this from 🆕 New to 🔖 Ready in Graph Explorer Planning Apr 25, 2024

kmcginnes moved this from 🔖 Ready to 👍 Approved in Graph Explorer Planning May 1, 2024

kmcginnes mentioned this issue May 1, 2024

[Bug] RDF: Adding resource to the canvas is very slow or fails with out of memory #263

Open

kmcginnes added this to the Release 1.6.2 milestone May 1, 2024

kmcginnes mentioned this issue May 1, 2024

[Bug] Using the air-routes data set the DFW airport does not expand correctly #314

Closed

kmcginnes moved this from 👍 Approved to 🏗 In progress in Graph Explorer Planning May 8, 2024

kmcginnes mentioned this issue May 8, 2024

[Project] Optimize database queries #358

Open

kmcginnes removed this from the Release 1.8.0 milestone May 8, 2024

kmcginnes removed this from Graph Explorer Planning May 8, 2024

kmcginnes mentioned this issue May 13, 2024

[Task] Optimize the Gremlin query for neighbor types and count #342

Closed

kmcginnes modified the milestones: Release 1.8.0, Release 1.9.0 Jun 5, 2024

kmcginnes mentioned this issue Jun 11, 2024

[Project] Improve neighbor count query experience #432

Closed

12 tasks

kmcginnes modified the milestones: Release 1.9.0, Release 1.8.0 Jun 11, 2024

kmcginnes modified the milestones: Release 1.8.0, Release 1.9.0 Jul 2, 2024

kmcginnes modified the milestones: Release 1.9.0, Release 1.10.0 Aug 1, 2024

kmcginnes removed this from the Release 1.10.0 milestone Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Neighbor expansion is slow and unreliable #324

[Bug] Neighbor expansion is slow and unreliable #324

kmcginnes commented Apr 25, 2024 •

edited

Loading

kmcginnes commented Apr 25, 2024

[Bug] Neighbor expansion is slow and unreliable #324

[Bug] Neighbor expansion is slow and unreliable #324

Comments

kmcginnes commented Apr 25, 2024 • edited Loading

Community Note

Description

To Reproduce

Gremlin

openCypher

SPARQL

Expected behavior

Related issues

Tasks

kmcginnes commented Apr 25, 2024

kmcginnes commented Apr 25, 2024 •

edited

Loading