Escape entity details queries #793

kmcginnes · 2025-02-13T22:13:35Z

Description

This change reduces the chance of an injection attack through the saved graph file.

In the process of importing the graph file we now do some basic sanity checks on the vertex and edge IDs.

Trim any leading or trailing whitespace
Ensure they are not empty strings
If property graph, escape strings
If RDF, ensure no > characters exist
If RDF, ensure edge format is correct
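
The checks listed above can be sketched as a small pipeline. The helper names match the ones visible in the diff for exportedGraph.ts, but the bodies here are assumptions written for illustration, as are the `QueryEngine` values and the choice to escape backslashes and double quotes. The RDF edge format check is left out because its rules are not described in this PR.

```typescript
// Illustrative sketch only: helper bodies are assumed, not copied from the PR.
type QueryEngine = "gremlin" | "openCypher" | "sparql"; // assumed values
type RawId = string | number;

// Trim leading and trailing whitespace; numbers pass through unchanged.
const trimIfString = (id: RawId): RawId =>
  typeof id === "string" ? id.trim() : id;

// Drop empty strings; numbers are always kept.
const isNotEmptyIfString = (id: RawId): boolean =>
  typeof id !== "string" || id.length > 0;

// For SPARQL, drop any ID containing ">", which would close the IRI early.
const isNotMaliciousIfSparql =
  (engine: QueryEngine) =>
  (id: RawId): boolean =>
    engine !== "sparql" || !String(id).includes(">");

// For property graphs, escape backslashes and double quotes (assumed rules).
const escapeIfPropertyGraphAndString =
  (engine: QueryEngine) =>
  (id: RawId): RawId =>
    engine !== "sparql" && typeof id === "string"
      ? id.replace(/\\/g, "\\\\").replace(/"/g, '\\"')
      : id;

function sanitizeIds(ids: RawId[], engine: QueryEngine): RawId[] {
  return ids
    .map(trimIfString)
    .filter(isNotEmptyIfString)
    .filter(isNotMaliciousIfSparql(engine))
    .map(escapeIfPropertyGraphAndString(engine));
}
```

Invalid IDs are skipped rather than causing the whole file to be rejected, which matches the behavior discussed in the review.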

Other changes

Fixed toast notification where 1 node or edge was not found (said "were" instead of "was")
Removed idType that was no longer necessary
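
The toast fix amounts to making the verb agree with the count. A minimal sketch follows, assuming a hypothetical `notFoundMessage` helper and message wording; the actual text used by Graph Explorer is not shown in this PR.

```typescript
// Hypothetical helper: the name and message wording are assumptions.
function notFoundMessage(count: number, kind: "node" | "edge"): string {
  const noun = count === 1 ? kind : `${kind}s`;
  const verb = count === 1 ? "was" : "were";
  return `${count} ${noun} ${verb} not found`;
}
```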

Validation

Verified export and import still work properly
Verified modified exported graph files fail gracefully

Related Issues

Fixes Prevent injection attack through graph JSON file #789

Check List

I confirm that my contribution is made under the terms of the Apache 2.0 license.
I have run pnpm checks to ensure code compiles and meets standards.
I have run pnpm test to check if all tests are passing.
I have covered new added functionality with unit tests if necessary.
I have added an entry in the Changelog.md.

andreachild · 2025-02-14T19:57:47Z

packages/graph-explorer/src/modules/GraphViewer/exportedGraph.ts

+      .map(trimIfString)
+      .filter(isNotEmptyIfString)
+      .filter(isNotMaliciousIfSparql(connection.queryEngine))
+      .map(escapeIfPropertyGraphAndString(connection.queryEngine))


Nit: Does any escaping need to be done for SPARQL IRIs in case there are special characters? Or does it rely on _sparqlFetch, which calls encodeURIComponent?

I am choosing not to do any manual encoding of the string. So if there are special characters in the IRI then they will be passed to the database and rejected there. The main worry is that a string would escape out of the bounds of an IRI value in the query.

SELECT ?s ?p ?o WHERE { <${subject}> ?p ?o }

The subject between the < and > is the IRI. So as long as the IRI string does not contain a > then the database will receive the string as an IRI and reject any invalid IRIs.
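
To make the risk concrete, here is a small sketch of what happens when a subject containing > is interpolated into that template. The `buildQuery` and `isSafeIri` names and the example values are made up for illustration.

```typescript
// Builds the query from the discussion by plain string interpolation.
const buildQuery = (subject: string): string =>
  `SELECT ?s ?p ?o WHERE { <${subject}> ?p ?o }`;

// The guard described above: reject any IRI string that contains ">".
const isSafeIri = (iri: string): boolean => !iri.includes(">");

const benign = "http://example.com/person/1";
// The ">" closes the IRI early, and "#" comments out the rest of the line.
const hostile = "http://example.com/x> ?p ?o } #";

console.log(buildQuery(hostile));
// SELECT ?s ?p ?o WHERE { <http://example.com/x> ?p ?o } #> ?p ?o }

console.log(isSafeIri(benign)); // true
console.log(isSafeIri(hostile)); // false
```

Everything after the first > in the hostile value is parsed as query text rather than as part of the IRI, which is exactly what the check is meant to rule out.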

I could not find a way to ensure the IRI string is properly encoded. If I were to encode the IRI myself, then the encoding function would re-encode any already properly encoded characters, rendering the value invalid. So I'm taking the path of least resistance and checking specifically for the one character that will certainly allow an injection attack.
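
The re-encoding problem is easy to reproduce with the built-in `encodeURI`. The IRI below is a made-up example.

```typescript
// An IRI whose space is already percent-encoded as "%20".
const alreadyEncoded = "http://example.com/a%20b";

// encodeURI escapes the "%" itself, so "%20" becomes "%2520".
const reEncoded = encodeURI(alreadyEncoded);

console.log(reEncoded); // http://example.com/a%2520b
console.log(reEncoded === alreadyEncoded); // false: the IRI now names a different resource
```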

andreachild · 2025-02-14T20:29:27Z

LGTM, just a couple of questions open for discussion, but non-blocking.

andreachild · 2025-02-14T20:50:55Z

packages/graph-explorer/src/modules/GraphViewer/exportedGraph.ts

-    edges: z
-      .array(z.union([z.string(), z.number()]))
-      .transform(ids => ids.map(id => createEdgeId(id))),
+    vertices: z.array(z.union([z.string(), z.number()])),


Nit: have you explored the idea of moving the invalid ID filtering logic into Zod using refine and z.string().url()? I'm guessing this would cause the file to be rejected if it contained anything strange, which would differ from the current logic that skips over invalid IDs (maybe that would be a good thing?).

ChatGPT suggested this for conditional validation of the IDs as valid URLs depending on the queryEngine:

vertices: z.array(z.union([z.string(), z.number()]))
  .refine((vertices, ctx) => {
    const { queryEngine } = ctx.parent.connection;
    if (queryEngine === "SPARQL") {
      // Ensure that vertices are URLs if queryEngine is SPARQL
      if (!vertices.every(v => typeof v === "string" && urlValidator.safeParse(v).success)) {
        ctx.addIssue({
          code: z.ZodIssueCode.custom,
          message: "All vertices must be valid URLs when queryEngine is SPARQL.",
          path: ["vertices"],
        });
        return false;
      }
    }
    return true;
  }),

First off, TIL I could do this in Zod (I was looking for this functionality):

const { queryEngine } = ctx.parent.connection;

I did consider using Zod for this validation, but I decided against it as there were too many validations that were connection dependent. So I went with an approach that treats the Zod schema as strictly the shape of the file, and then a separate validation and mapping step to convert the data to the shape expected by Graph Explorer.
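
As a rough illustration of that split, the sketch below uses a hand-written shape check in place of the Zod schema, followed by a separate connection-dependent step. The `parseShape` and `toValidVertexIds` names, the `QueryEngine` values, and the exact file fields are assumptions for illustration, not the PR's code.

```typescript
type QueryEngine = "gremlin" | "openCypher" | "sparql"; // assumed values
type RawId = string | number;

interface ExportedGraphFile {
  connection: { queryEngine: QueryEngine };
  vertices: RawId[];
  edges: RawId[];
}

const isRawIdArray = (v: unknown): v is RawId[] =>
  Array.isArray(v) &&
  v.every(x => typeof x === "string" || typeof x === "number");

// Stage 1: shape only. A malformed file is rejected outright.
function parseShape(data: unknown): ExportedGraphFile {
  const d = data as Partial<ExportedGraphFile> | null;
  const engine = d?.connection?.queryEngine;
  if (
    !d ||
    (engine !== "gremlin" && engine !== "openCypher" && engine !== "sparql") ||
    !isRawIdArray(d.vertices) ||
    !isRawIdArray(d.edges)
  ) {
    throw new Error("Unexpected file shape");
  }
  return {
    connection: { queryEngine: engine },
    vertices: d.vertices,
    edges: d.edges,
  };
}

// Stage 2: connection-dependent validation. Bad IDs are skipped, not fatal.
function toValidVertexIds(file: ExportedGraphFile): RawId[] {
  const isSparql = file.connection.queryEngine === "sparql";
  return file.vertices.filter(id => !isSparql || !String(id).includes(">"));
}
```

Keeping the two stages apart means a malformed file fails fast, while a well-formed file with a few suspicious IDs still loads the rest.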

I might change my mind about this when I explore adding graph data into the URL via query params. The shape of the data will be fairly similar to the shape of the file, so I think it would be a good fit. And I'm curious how either approach will scale as we add future versions of the export file format and must maintain backwards compatibility.

kmcginnes · 2025-02-17T17:58:24Z

I'm going to go ahead and merge this since I don't think any of the open discussions are risky enough to hold it back.

However, I would like to continue the discussions after merging, which may result in future PR changes.

kmcginnes force-pushed the escape-entity-details-queries branch 2 times, most recently from 983d154 to 8a59cf2 on February 13, 2025 22:21

kmcginnes added 3 commits February 13, 2025 16:39

Prevent injection attacks from graph file

e1a75d6

Fix notification message when exactly 1 not found

6aa6a31

Update changelog

4d8239d

kmcginnes force-pushed the escape-entity-details-queries branch from 8a59cf2 to 4d8239d on February 13, 2025 22:39

kmcginnes marked this pull request as ready for review February 13, 2025 22:50