Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Neighbor expansion is slow and unreliable #324

Open
3 of 4 tasks
kmcginnes opened this issue Apr 25, 2024 · 1 comment
Open
3 of 4 tasks

[Bug] Neighbor expansion is slow and unreliable #324

kmcginnes opened this issue Apr 25, 2024 · 1 comment
Labels
bug Something isn't working exploration Issues related to graph exploration, node & edge details, neighbor expansion, etc performance Issues relating to performance reliability Issues relating to improvements in reliability

Comments

@kmcginnes
Copy link
Collaborator

kmcginnes commented Apr 25, 2024

Community Note

  • Please use a 👍 reaction to provide a +1/vote. This helps the community and maintainers prioritize this request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Description

When performing neighbor expansion, the generated queries can be quite complex and resource intensive. Some create 50 or more network requests and can overwhelm the database server where the requests will begin to fail.

  • OS: macOS 14.4.1
  • Browser: Arc (Google Chromium)
  • Graph Explorer Version: 1.6
  • Graph Database & Version: Amazon Neptune

To Reproduce

Gremlin

I selected one node, and expanded 1 neighbor.

g.V("Keri Eastman").project("vertices", "edges")
  .by(both().hasLabel("post").dedup().range(0, 1).fold())
  .by(bothE().where(otherV().hasLabel("post")).dedup().fold())

This second query will be repeated for each neighbor being expanded. So if you have 50 neighbors, you would have 50 requests.

g.V("https://example.com/blogs/unlocking-the-power-of-relationships-how-graph-databases-are-revolutionizing-data-management/")
  .both().limit(500).dedup().group().by(label).by(count())

openCypher

Expanding an author with 6 posts:

MATCH (v)-[e]-(tgt:post) 
WHERE ID(v) = "Keri Eastman" 
WITH 
  collect(DISTINCT tgt)[..6] AS vObjects, 
  collect({edge: e, sourceType: labels(v), targetType: labels(tgt)})[..6] AS eObjects 
RETURN vObjects, eObjects
MATCH (v)-[e]-(tgt:post) 
WHERE ID(v) = "Keri Eastman" 
  AND ID(e) IN ["5E2E0417-FE9B-43E9-9D80-3D4F83AD90AC","92072F3D-7B73-4E58-8E4E-237B9866601F","2C7DABC9-074F-4BC9-B742-F178699FE5D9","B962DF33-D8C6-42DC-BE3D-4221F1189399","2CFFB4AF-F4E1-471A-AEDE-2EBE8EFA5C4C","B3D9257D-AE4C-4F38-B816-76ADBE6CBA8C"] 
WITH 
  collect(DISTINCT tgt)[..6] AS vObjects, 
  collect({edge: e, sourceType: labels(v), targetType: labels(tgt)})[..6] AS eObjects 
RETURN vObjects, eObjects

One request per neighbor, so 6 in this case:

MATCH (v) -[e]- (t) 
WHERE ID(v) = "https://example.com/blogs/unlocking-the-power-of-relationships-how-graph-databases-are-revolutionizing-data-management/" 
RETURN labels(t) AS vertexLabel, count(DISTINCT t) AS count 
LIMIT 500

SPARQL

Since the RDF data was different than PG data in the same Neptune instance, this is a single Airpot node expanding a single Country node. This resulted in 3 queries.

SELECT ?subject ?pred ?value ?subjectClass ?pToSubject ?pFromSubject {
  ?subject a     ?subjectClass;
           ?pred ?value {
    SELECT DISTINCT ?subject ?pToSubject ?pFromSubject {
      BIND(<http://kelvinlawrence.net/air-routes/resource/303> AS ?argument)
      VALUES ?subjectClass { <http://kelvinlawrence.net/air-routes/class/Country> }
      {
        ?argument ?pToSubject ?subject.
        ?subject a         ?subjectClass;
                 ?sPred    ?sValue .
        
      }
      UNION
      {
        ?subject ?pFromSubject ?argument;
                 a         ?subjectClass;
                 ?sPred    ?sValue .
       
      }
    }
    LIMIT 75 OFFSET 0
  }
  FILTER(isLiteral(?value))
}
SELECT ?subject ?subjectClass ?predToSubject ?predFromSubject {
  BIND(<http://kelvinlawrence.net/air-routes/resource/303> AS ?argument)
  VALUES ?subject { <http://kelvinlawrence.net/air-routes/resource/3505>}
  { 
    ?argument ?predToSubject ?subject.
    ?subject a ?subjectClass.
  }
  UNION
  { 
    ?subject ?predFromSubject ?argument;
             a                ?subjectClass.
  }
}
SELECT ?class (COUNT(?class) AS ?count) {
  ?subject a ?class {
    SELECT DISTINCT ?subject ?class {
      ?subject a ?class .
      { ?subject ?p <http://kelvinlawrence.net/air-routes/resource/3505> }
      UNION
      { <http://kelvinlawrence.net/air-routes/resource/3505> ?p ?subject }
    }
    
  }
}
GROUP BY ?class

Expected behavior

Each language should be optimized. The query seems to be retrieving data for multiple parts of the app (graph and sidebar). We can split those tasks to further optimize, deferring the side bar work until needed.

Related issues

Tasks

@kmcginnes kmcginnes added bug Something isn't working performance Issues relating to performance reliability Issues relating to improvements in reliability exploration Issues related to graph exploration, node & edge details, neighbor expansion, etc labels Apr 25, 2024
@kmcginnes kmcginnes moved this from 🆕 New to 🔖 Ready in Graph Explorer Planning Apr 25, 2024
@kmcginnes
Copy link
Collaborator Author

Gremlin also seems to have issues when limiting the neighbors to expand in the side bar.

This is the Gremlin generated to find 4 movies (limited to a max of 4) connected to Sean Connery:

g.V(\"nm0000125\").
    project(\"vertices\", \"edges\").
      by(both().hasLabel(\"movie\").dedup().range(0, 4).fold()).
      by(bothE().where(otherV().hasLabel(\"movie\")).dedup().fold())

Which returns the following and includes a lot of extra edges (75 in all) as there is no limit placed on the second by

{
    'vertices': [v[tt0094226], v[tt0102034], v[tt0058150], v[tt0081633]], 
    'edges': [
        e[tt0094226-actor-nm0000125][tt0094226-actor->nm0000125], 
        e[tt0102034-actor-nm0000125][tt0102034-actor->nm0000125], 
        e[tt0058150-actor-nm0000125][tt0058150-actor->nm0000125], 
        e[tt0081633-actor-nm0000125][tt0081633-actor->nm0000125], 
        e[tt0107969-actor-nm0000125][tt0107969-actor->nm0000125], 
        e[tt0070948-actor-nm0000125][tt0070948-actor->nm0000125], 
        e[tt0145734-actor-nm0000125][tt0145734-actor->nm0000125], 
        e[tt0075784-actor-nm0000125][tt0075784-actor->nm0000125], 
        e[tt0097328-actor-nm0000125][tt0097328-actor->nm0000125], 
        e[tt0117500-actor-nm0000125][tt0117500-actor->nm0000125], 
        e[tt0113071-actor-nm0000125][tt0113071-actor->nm0000125], 
        e[tt0058754-actor-nm0000125][tt0058754-actor->nm0000125], 
        e[tt0066995-actor-nm0000125][tt0066995-actor->nm0000125], 
        e[tt0057076-actor-nm0000125][tt0057076-actor->nm0000125], 
        e[tt0055262-actor-nm0000125][tt0055262-actor->nm0000125], 
        e[tt0073796-actor-nm0000125][tt0073796-actor->nm0000125], 
        e[tt0084750-actor-nm0000125][tt0084750-actor->nm0000125], 
        e[tt0071877-actor-nm0000125][tt0071877-actor->nm0000125], 
        e[tt0051364-actor-nm0000125][tt0051364-actor->nm0000125], 
        e[tt0091605-actor-nm0000125][tt0091605-actor->nm0000125], 
        e[tt0084920-actor-nm0000125][tt0084920-actor->nm0000125], 
        e[tt0074962-actor-nm0000125][tt0074962-actor->nm0000125], 
        e[tt0066767-actor-nm0000125][tt0066767-actor->nm0000125], 
        e[tt0052722-actor-nm0000125][tt0052722-actor->nm0000125], 
        e[tt0099810-actor-nm0000125][tt0099810-actor->nm0000125], 
        e[tt0067315-actor-nm0000125][tt0067315-actor->nm0000125], 
        e[tt0104839-actor-nm0000125][tt0104839-actor->nm0000125], 
        e[tt0075147-actor-nm0000125][tt0075147-actor->nm0000125], 
        e[tt0079550-actor-nm0000125][tt0079550-actor->nm0000125], 
        e[tt0109920-actor-nm0000125][tt0109920-actor->nm0000125], 
        e[tt0118661-actor-nm0000125][tt0118661-actor->nm0000125], 
        e[tt0091203-actor-nm0000125][tt0091203-actor->nm0000125], 
        e[tt0083947-actor-nm0000125][tt0083947-actor->nm0000125], 
        e[tt0097576-actor-nm0000125][tt0097576-actor->nm0000125], 
        e[tt0082869-actor-nm0000125][tt0082869-actor->nm0000125], 
        e[tt0055928-actor-nm0000125][tt0055928-actor->nm0000125], 
        e[tt0058329-actor-nm0000125][tt0058329-actor->nm0000125], 
        e[tt0116136-actor-nm0000125][tt0116136-actor->nm0000125], 
        e[tt0181536-actor-nm0000125][tt0181536-actor->nm0000125], 
        e[tt0113501-actor-nm0000125][tt0113501-actor->nm0000125], 
        e[tt0073341-actor-nm0000125][tt0073341-actor->nm0000125], 
        e[tt0137494-actor-nm0000125][tt0137494-actor->nm0000125], 
        e[tt0086006-actor-nm0000125][tt0086006-actor->nm0000125], 
        e[tt0070468-actor-nm0000125][tt0070468-actor->nm0000125], 
        e[tt0062512-actor-nm0000125][tt0062512-actor->nm0000125], 
        e[tt0084011-actor-nm0000125][tt0084011-actor->nm0000125], 
        e[tt0054898-actor-nm0000125][tt0054898-actor->nm0000125], 
        e[tt0095897-actor-nm0000125][tt0095897-actor->nm0000125], 
        e[tt0073906-actor-nm0000125][tt0073906-actor->nm0000125], 
        e[tt0311429-actor-nm0000125][tt0311429-actor->nm0000125], 
        e[tt0066090-actor-nm0000125][tt0066090-actor->nm0000125], 
        e[tt0060414-actor-nm0000125][tt0060414-actor->nm0000125], 
        e[tt0063592-actor-nm0000125][tt0063592-actor->nm0000125], 
        e[tt0100530-actor-nm0000125][tt0100530-actor->nm0000125], 
        e[tt0079013-actor-nm0000125][tt0079013-actor->nm0000125], 
        e[tt0059800-actor-nm0000125][tt0059800-actor->nm0000125], 
        e[tt0059274-actor-nm0000125][tt0059274-actor->nm0000125], 
        e[tt0079240-actor-nm0000125][tt0079240-actor->nm0000125], 
        e[tt1527819-actor-nm0000125][tt1527819-actor->nm0000125]
    ]
}

here is how it could also be written (simpler and no extra edges)

g.V("nm0000125").
    bothE().as('e').
    otherV().
    hasLabel('movie').as('n').
    limit(4).
    select('n','e')

which returns

{'n': v[tt0094226], 'e': e[tt0094226-actor-nm0000125][tt0094226-actor->nm0000125]}
{'n': v[tt0102034], 'e': e[tt0102034-actor-nm0000125][tt0102034-actor->nm0000125]}
{'n': v[tt0081633], 'e': e[tt0081633-actor-nm0000125][tt0081633-actor->nm0000125]}
{'n': v[tt0058150], 'e': e[tt0058150-actor-nm0000125][tt0058150-actor->nm0000125]}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exploration Issues related to graph exploration, node & edge details, neighbor expansion, etc performance Issues relating to performance reliability Issues relating to improvements in reliability
Projects
None yet
Development

No branches or pull requests

1 participant