Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly format normalized path names #113

Merged
merged 1 commit into from
Jan 13, 2025

Conversation

theory
Copy link
Contributor

@theory theory commented Jan 13, 2025

The RFC defines how to format normalized path names. Notably, it defines the escape character as ESC normal-escapable, but doesn't define ESC in that section. It is, however, defined for name selector syntax as %x5C; \ backslash. I'm therefore reasonably sure that the current behavior of serde_json_path when formatting normalized paths is incorrect, as it results in no escaping of ' at all.

For example, using the sandbox to select $.* from {"O'Reilly": true} returns

[
  {
    "loc": "$['O'Reilly']",
    "node": true
  }
]

I worked this out while building the location feature into Go jsonpath (many thanks for the inspiration in your crate!). This playground link shows what I believe to be the correct escaping (doubled for inclusion in JSON):

[
  {
    "node": true,
    "path": "$['O\\'Reilly']"
  }
]

So expand Display for PathElement to properly format characters strictly according to the spec. Iterate over each character in a name and escape exactly the characters it defines to be escaped.

The RFC defines how to [format normalized path names]. Notably, it
defines the escape character as `ESC normal-escapable`, but doesn't
define `ESC` in that section. It is, however, defined for [name selector
syntax] as `%x5C; \ backslash`. I'm therefore reasonably sure that the
current behavior of serde_json_path when formatting normalized paths is
incorrect, as it results in no escaping of `'` at all.

For example, using [the sandbox] to select `$.*` from
`{"O'Reilly": true}` returns

```json
[
  {
    "loc": "$['O'Reilly']",
    "node": true
  }
]
```

I worked this out while building the location feature into [Go jsonpath]
(many thanks for the inspiration in your crate!). This [playground link]
shows what I believe to be the correct escaping (doubled for inclusion
in JSON):

``` json
[
  {
    "node": true,
    "path": "$['O\\'Reilly']"
  }
]
```

So expand `Display for PathElement` to properly format characters
strictly according to the spec. Iterate over each character in a name
and escape exactly the characters it defines to be escaped.

  [format normalized path names]: https://www.rfc-editor.org/rfc/rfc9535#section-2.7
  [name selector syntax]: https://www.rfc-editor.org/rfc/rfc9535#name-syntax-3
  [the sandbox]: https://serdejsonpath.live
  [Go jsonpath]: https://github.com/theory/jsonpath
  [playground link]: https://theory.github.io/jsonpath/?p=%2524.*&j=%257B%2522O%27Reilly%2522%253A%2520true%257D&o=1
@theory theory force-pushed the path-element-format branch from d603cee to c56a4e5 Compare January 13, 2025 16:56
@hiltontj hiltontj added help wanted Extra attention is needed good first issue Good for newcomers bug Something isn't working labels Jan 13, 2025
@hiltontj
Copy link
Owner

Great find @theory and thanks for opening the issue! Another important thing that this made me realize is that I don't think the compliance test suite is providing a set of tests for validating how a library forms normalized paths.

I will see if I can find time to sort this out, but for now added some labels to 🎣 if anyone wants to try and fix it.

@theory
Copy link
Contributor Author

theory commented Jan 13, 2025

This PR should fix it.

@hiltontj
Copy link
Owner

Oh 🤦 I thought this was an issue, and not a PR. I will take a look...

Copy link
Owner

@hiltontj hiltontj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @theory - really appreciate the PR 🎉

@hiltontj hiltontj merged commit 0022f33 into hiltontj:main Jan 13, 2025
5 checks passed
@theory theory deleted the path-element-format branch January 14, 2025 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants