Escape entity details queries #793

Merged
merged 5 commits into from
Feb 17, 2025

Conversation

kmcginnes
Collaborator

@kmcginnes kmcginnes commented Feb 13, 2025

Description

This change reduces the chance of an injection attack through the saved graph file.

When importing the graph file, we now run some basic sanity checks on the vertex and edge IDs.

  • Trim any leading or trailing whitespace
  • Ensure they are not empty strings
  • If property graph, escape strings
  • If RDF, ensure no > characters exist
  • If RDF, ensure edge format is correct
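
As a rough illustration of the checks listed above, here is a minimal sketch for vertex IDs. The helper names match the ones visible in the diff excerpt in this conversation, but their bodies, the `QueryEngine` and `RawId` types, the `sanitizeVertexIds` wrapper, and the escaping rule (backslashes and double quotes) are assumptions made for the example, not the merged implementation. The RDF edge format check is not covered.

```typescript
// Hypothetical sketch: helper bodies and types are assumptions, not the merged code.
type QueryEngine = "gremlin" | "openCypher" | "sparql";
type RawId = string | number;

// Trim leading and trailing whitespace from string IDs; leave numbers alone.
const trimIfString = (id: RawId): RawId =>
  typeof id === "string" ? id.trim() : id;

// Drop string IDs that are empty after trimming.
const isNotEmptyIfString = (id: RawId): boolean =>
  typeof id !== "string" || id.length > 0;

// For RDF, drop any ID containing ">", which could close the <...> IRI early.
const isNotMaliciousIfSparql =
  (engine: QueryEngine) =>
  (id: RawId): boolean =>
    engine !== "sparql" || !String(id).includes(">");

// For property graphs, escape backslashes and double quotes so the ID stays
// inside a double-quoted string literal (assumed quoting style).
const escapeIfPropertyGraphAndString =
  (engine: QueryEngine) =>
  (id: RawId): RawId =>
    engine !== "sparql" && typeof id === "string"
      ? id.replace(/\\/g, "\\\\").replace(/"/g, '\\"')
      : id;

// Hypothetical wrapper that chains the four steps in the order shown in the diff.
function sanitizeVertexIds(ids: RawId[], engine: QueryEngine): RawId[] {
  return ids
    .map(trimIfString)
    .filter(isNotEmptyIfString)
    .filter(isNotMaliciousIfSparql(engine))
    .map(escapeIfPropertyGraphAndString(engine));
}
```

Invalid IDs are skipped here rather than causing the whole file to be rejected, which is one possible design among several.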

Other changes

  • Fixed toast notification where 1 node or edge was not found (said "were" instead of "was")
  • Removed idType that was no longer necessary

Validation

  • Verified export and import still work properly
  • Verified modified exported graph files fail gracefully

Related Issues

Check List

  • I confirm that my contribution is made under the terms of the Apache 2.0
    license.
  • I have run pnpm checks to ensure code compiles and meets standards.
  • I have run pnpm test to check if all tests are passing.
  • I have covered new added functionality with unit tests if necessary.
  • I have added an entry in the Changelog.md.

@kmcginnes kmcginnes force-pushed the escape-entity-details-queries branch 2 times, most recently from 983d154 to 8a59cf2 Compare February 13, 2025 22:21
@kmcginnes kmcginnes force-pushed the escape-entity-details-queries branch from 8a59cf2 to 4d8239d Compare February 13, 2025 22:39
@kmcginnes kmcginnes marked this pull request as ready for review February 13, 2025 22:50
.map(trimIfString)
.filter(isNotEmptyIfString)
.filter(isNotMaliciousIfSparql(connection.queryEngine))
.map(escapeIfPropertyGraphAndString(connection.queryEngine))

@andreachild andreachild Feb 14, 2025

Nit: Does any escaping need to be done for SPARQL IRIs in case there are special characters? Or does it rely on _sparqlFetch, which calls encodeURIComponent?

Collaborator Author

I am choosing not to do any manual encoding of the string. So if there are special characters in the IRI then they will be passed to the database and rejected there. The main worry is that a string would escape out of the bounds of an IRI value in the query.

SELECT ?s ?p ?o
WHERE {
  <${subject}> ?p ?o
}

The subject between the < and > is the IRI. So as long as the IRI string does not contain a > then the database will receive the string as an IRI and reject any invalid IRIs.

I could not find a way to ensure the IRI string is properly encoded. If I were to encode the IRI myself, then the encoding function would re-encode any already properly encoded characters, rendering the value invalid. So I'm taking the path of least resistance and checking specifically for the one character that will certainly allow an injection attack.
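
To make the two points above concrete, here is a small sketch. It first shows how re-encoding an already percent-encoded IRI changes it, then guards the query template against a `>` in the subject. The function name `buildSubjectQuery` and the choice to throw are assumptions for illustration only; they are not taken from this PR.

```typescript
// An IRI that is already percent-encoded (the space is written as %20).
const alreadyEncoded = "http://example.com/a%20b";

// Encoding it again also encodes the "%" itself, so the result
// no longer refers to the same resource.
const reEncoded = encodeURI(alreadyEncoded);
// reEncoded === "http://example.com/a%2520b"

// Hypothetical helper: refuse any subject that could close the <...> early,
// and otherwise pass the string through untouched for the database to judge.
function buildSubjectQuery(subject: string): string {
  if (subject.includes(">")) {
    throw new Error("IRI must not contain '>'");
  }
  return `SELECT ?s ?p ?o\nWHERE {\n  <${subject}> ?p ?o\n}`;
}
```

With this guard, an input such as `http://x> ?p ?o } #` is refused before it reaches the template, while a merely malformed IRI is still sent on and left for the database to reject.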

@andreachild

LGTM, just a couple of questions open for discussion, but non-blocking.

edges: z
  .array(z.union([z.string(), z.number()]))
  .transform(ids => ids.map(id => createEdgeId(id))),
vertices: z.array(z.union([z.string(), z.number()])),

Nit: have you explored the idea of moving the invalid id filtering logic into zod using refine and z.string().url? I'm guessing this would cause the file to be rejected if it contained anything strange which would be different from the current logic which skips over invalid ids (maybe that would be a good thing?).

ChatGPT suggested this for conditional validation of the ids to be valid URLs depending on the queryEngine:

vertices: z.array(z.union([z.string(), z.number()]))
  .refine((vertices, ctx) => {
    const { queryEngine } = ctx.parent.connection;
    if (queryEngine === "SPARQL") {
      // Ensure that vertices are URLs if queryEngine is SPARQL
      if (!vertices.every(v => typeof v === "string" && urlValidator.safeParse(v).success)) {
        ctx.addIssue({
          code: z.ZodIssueCode.custom,
          message: "All vertices must be valid URLs when queryEngine is SPARQL.",
          path: ["vertices"],
        });
        return false;
      }
    }
    return true;
  }),

Collaborator Author

@kmcginnes kmcginnes Feb 17, 2025

First off, TIL I could do this in Zod (I was looking for this functionality):

const { queryEngine } = ctx.parent.connection;

I did consider using Zod for this validation, but I decided against it as there were too many validations that were connection-dependent. So I went with an approach that treats the Zod schema as strictly the shape of the file, followed by a separate validation and mapping step that converts the data to the shape expected by Graph Explorer.

I might change my mind about this when I explore adding graph data into the URL via query params. The shape of the data will be fairly similar to the shape of the file, so I think it would be a good fit. And I'm curious how either approach will scale as we add future versions of the export file format and must maintain backwards compatibility.
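
The two-step approach described above could look roughly like the sketch below. To keep it dependency-free, a plain type guard stands in for the Zod schema. The `GraphFile` shape, the `Engine` and `Id` types, and the function names `parseShape` and `toVertexIds` are all assumptions for illustration and do not reflect the real file format.

```typescript
// Hypothetical types; the real export file format is not shown in this PR.
type Engine = "gremlin" | "openCypher" | "sparql";
type Id = string | number;

interface GraphFile {
  connection: { queryEngine: Engine };
  vertices: Id[];
  edges: Id[];
}

const isId = (v: unknown): v is Id =>
  typeof v === "string" || typeof v === "number";

const isEngine = (v: unknown): v is Engine =>
  v === "gremlin" || v === "openCypher" || v === "sparql";

// Step 1: check only the shape of the file (the role the schema plays).
function parseShape(input: unknown): GraphFile | null {
  if (typeof input !== "object" || input === null) return null;
  const { connection, vertices, edges } = input as Record<string, unknown>;
  if (typeof connection !== "object" || connection === null) return null;
  const engine = (connection as Record<string, unknown>).queryEngine;
  if (!isEngine(engine)) return null;
  if (!Array.isArray(vertices) || !vertices.every(isId)) return null;
  if (!Array.isArray(edges) || !edges.every(isId)) return null;
  return { connection: { queryEngine: engine }, vertices, edges };
}

// Step 2: connection-dependent validation and mapping.
// Invalid IDs are skipped instead of rejecting the whole file.
function toVertexIds(file: GraphFile): Id[] {
  const engine = file.connection.queryEngine;
  return file.vertices
    .map(id => (typeof id === "string" ? id.trim() : id))
    .filter(id => id !== "")
    .filter(id => engine !== "sparql" || !String(id).includes(">"));
}
```

Keeping step 1 free of connection logic means a file with the wrong shape is rejected outright, while a well-formed file with a few bad IDs still imports the rest.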

@kmcginnes
Collaborator Author

I'm going to go ahead and merge this since I don't think any of the open discussions are risky enough to hold it back.

However, I would like to continue the discussions after merging, which may result in future PR changes.

@kmcginnes kmcginnes merged commit 01b4fd6 into aws:main Feb 17, 2025
2 checks passed
@kmcginnes kmcginnes deleted the escape-entity-details-queries branch February 17, 2025 17:58
Development

Successfully merging this pull request may close these issues.

Prevent injection attack through graph JSON file
2 participants