Generate lexical information of `Pair` and `Pairs` as JSON #381

Goncalerta · 2019-03-05T19:30:25Z

Closes #377

Implements the `to_json` function for both `Pair` and `Pairs`, which generates pretty-printed JSON containing their lexical information.

I followed the sketch given in #377, but I wonder whether, instead of `"pos": [start, end]`, it would be better to have `"start"` and `"end"` as separate fields: in the pretty-printed format I personally find less nesting easier to read.

This is the generated string right now:

{
  "pos": [
    0,
    5
  ],
  "pairs": [
    {
      "pos": [
        0,
        3
      ],
      "rule": "a",
      "inner": {
        "pos": [
          1,
          2
        ],
        "pairs": [
          {
            "pos": [
              1,
              2
            ],
            "rule": "b",
            "inner": "b"
          }
        ]
      }
    },
    {
      "pos": [
        4,
        5
      ],
      "rule": "c",
      "inner": "e"
    }
  ]
}

This is what it would look like with `"start"` and `"end"`:

{
  "start": 0,
  "end": 5,
  "pairs": [
    {
      "start": 0,
      "end": 3,
      "rule": "a",
      "inner": {
        "start": 1,
        "end": 2,
        "pairs": [
          {
            "start": 1,
            "end": 2,
            "rule": "b",
            "inner": "b"
          }
        ]
      }
    },
    {
      "start": 4,
      "end": 5,
      "rule": "c",
      "inner": "e"
    }
  ]
}

CAD97 · 2019-03-05T20:01:15Z

(sigh, why must unit tests still not count for coverage...)

A few drive-by notes:

- My original sketch had `pos: (start, end)` because I was thinking of the serde data model, which does have tuples whereas JSON doesn't. If it were on the same line (full control over pretty printing) then the array/tuple solution might've been nice, but inline works nicely as well.
- Personally I expected a `R: Serialize` bound rather than using enum debug formatting. (This is equivalent in "human"/text formats but different in the serde model and in compact formats.) Though I suppose that'd require telling serde_derive to emit a `#[derive(Serialize)]` for the token enum as well.
- I'm not certain exactly how serde_derive handles "untagged enum" representations, but serializing `inner` as one of two types rather than as a serde enum scares me for some reason. I know we don't need to deserialize, but it still squicks me out a bit, even if we're only targeting self-describing, human-oriented formats.
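
The "enum debug formatting" approach from the second note can be sketched with the standard library alone. The `Rule` enum and the `rule_name` helper below are hypothetical stand-ins, not pest's actual code; the sketch only shows that a `Debug` bound is enough to turn a rule into the string stored in the `"rule"` field, without requiring `R: Serialize`.

```rust
use std::fmt::Debug;

// Hypothetical stand-in for a pest-generated rule enum (assumption, not pest's code).
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy)]
enum Rule {
    a,
    b,
    c,
}

// Hypothetical helper: produces the text for the "rule" field using only `Debug`.
fn rule_name<R: Debug>(rule: &R) -> String {
    format!("{:?}", rule)
}

fn main() {
    // The derived `Debug` output of a unit variant is the variant's name.
    println!("{}", rule_name(&Rule::a));
    println!("{}", rule_name(&Rule::c));
}
```

In a text format the result is the same string a derived `Serialize` impl would write for a unit variant, which is why the two only diverge in the serde data model and in compact formats.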

Goncalerta · 2019-03-06T12:55:49Z

> My original sketch had `pos: (start, end)` because I was thinking serde data model which does have tuples whereas JSON doesn't. If it were on the same line (full control over pretty printing) then the array/tuple solution might've been nice, but inline works nicely as well.

Oh, I see. Well, in order to have full control over pretty printing, there is also the possibility of using `format!`. That would even lift the dependency on serde and it shouldn't be hard to implement, as the template is pretty simple. However, this would make it harder to serialize to other data formats if that is ever desired.

> Personally I expected a `R: Serialize` bound rather than using enum debug formatting. (This is equivalent in "human"/text formats but different in the serde model and in compact formats.) Though I suppose that'd require telling serde_derive to emit a `#[derive(Serialize)]` for the token enum as well.

Well, I personally think that would add too much complexity, because in order to be able to call `#[derive(Serialize)]` on the macro, we would need to expose serde on pest's API.

> I'm not certain exactly how serde_derive handles "untagged enum" representations, but serializing `inner` as one of two types rather than as a serde enum scares me for some reason. I know we don't need to deserialize, but it still squicks me out a bit, even if we're only targeting self-describing, human-oriented formats.

Well, if you prefer I could make a local enum for it, I just thought it wasn't worth the effort in this case.
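
For illustration, the `format!` alternative mentioned above might look roughly like the following. This is a minimal sketch under stated assumptions, not what the PR implements: `Node`, `Inner`, `escape`, `render`, `to_json` and `sample` are hypothetical names, the types are simplified stand-ins for `Pair`/`Pairs`, and nested children are written as a plain array rather than the `{ "pos", "pairs" }` object shown in the PR description. It keeps `"pos": [start, end]` on a single line and uses a local enum for `inner`.

```rust
// Hypothetical local enum for the two shapes of `inner` (assumption, not pest's code).
enum Inner {
    Leaf(String),      // no children: the matched text
    Nested(Vec<Node>), // child pairs
}

// Simplified stand-in for a `Pair`.
struct Node {
    start: usize,
    end: usize,
    rule: String,
    inner: Inner,
}

// Escapes the characters that must be escaped inside a JSON string.
fn escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Writes one node per line, with `pos` kept inline; recurses into children.
fn render(nodes: &[Node], depth: usize, out: &mut String) {
    let pad = "  ".repeat(depth);
    for (i, node) in nodes.iter().enumerate() {
        let comma = if i + 1 < nodes.len() { "," } else { "" };
        out.push_str(&format!(
            "{}{{ \"pos\": [{}, {}], \"rule\": \"{}\", \"inner\": ",
            pad, node.start, node.end, escape(&node.rule)
        ));
        match &node.inner {
            Inner::Leaf(text) => {
                out.push_str(&format!("\"{}\" }}{}\n", escape(text), comma));
            }
            Inner::Nested(children) => {
                out.push_str("[\n");
                render(children, depth + 1, out);
                out.push_str(&format!("{}] }}{}\n", pad, comma));
            }
        }
    }
}

fn to_json(nodes: &[Node]) -> String {
    let mut out = String::from("[\n");
    render(nodes, 1, &mut out);
    out.push(']');
    out
}

// The tree from the PR description, minus the outer `Pairs` wrapper.
fn sample() -> Vec<Node> {
    let b = Node { start: 1, end: 2, rule: "b".into(), inner: Inner::Leaf("b".into()) };
    let a = Node { start: 0, end: 3, rule: "a".into(), inner: Inner::Nested(vec![b]) };
    let c = Node { start: 4, end: 5, rule: "c".into(), inner: Inner::Leaf("e".into()) };
    vec![a, c]
}

fn main() {
    println!("{}", to_json(&sample()));
}
```

The trade-off is the one already named: the layout is fully under control and serde is not needed, but the escaping has to be maintained by hand and the output is tied to JSON.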

CAD97 · 2019-03-06T19:39:57Z

> > My original sketch had `pos: (start, end)` because I was thinking serde data model which does have tuples whereas JSON doesn't. If it were on the same line (full control over pretty printing) then the array/tuple solution might've been nice, but inline works nicely as well.
>
> Oh, I see. Well, in order to have full control over pretty printing, there is also the possibility of using `format!`. That would even lift the dependency on serde and it shouldn't be hard to implement, as the template is pretty simple. However, this would make it harder to serialize to other data formats if that is ever desired.

Yeah; I'd potentially go so far as to just do the `impl Serialize` and allow the user to specify a data format themselves. (Personally, I like RON or YAML for "read-only" data formats over JSON.)

> Well, I personally think that would add too much complexity, because in order to be able to call `#[derive(Serialize)]` on the macro, we would need to expose serde on pest's API.

Yep, the solution here is great for this.

> Well, if you prefer I could make a local enum for it, I just thought it wasn't worth the effort in this case.

I'd feel better with a `#[derive(Serialize)] #[serde(untagged)] enum` here, as there's some guarantee that the serialization API isn't being misused if we go through serde_derive here. But I guess this is more up to @dragostis here.
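
To make the "untagged" distinction concrete, here is a small standard-library sketch that writes the same hypothetical `Inner` enum in two ways by hand: without the variant name, which is what `#[serde(untagged)]` asks for, and wrapped in an object keyed by the variant name, which is serde's default (externally tagged) representation for enums. The enum, its variants, and the helper functions are assumptions for illustration and do not use serde itself; strings are assumed to need no JSON escaping.

```rust
// Hypothetical local enum for the `inner` field (not pest's or serde's code).
enum Inner {
    Text(String),
    Pairs(Vec<String>), // child rule names, kept flat to keep the sketch short
}

// Writes a list of strings as a compact JSON array (no escaping, by assumption).
fn list(items: &[String]) -> String {
    let quoted: Vec<String> = items.iter().map(|s| format!("\"{}\"", s)).collect();
    format!("[{}]", quoted.join(","))
}

// Untagged: only the payload is written, so the variant is not recorded.
fn untagged(inner: &Inner) -> String {
    match inner {
        Inner::Text(t) => format!("\"{}\"", t),
        Inner::Pairs(p) => list(p),
    }
}

// Externally tagged: the payload is wrapped in an object keyed by the variant name.
fn externally_tagged(inner: &Inner) -> String {
    match inner {
        Inner::Text(t) => format!("{{\"Text\":\"{}\"}}", t),
        Inner::Pairs(p) => format!("{{\"Pairs\":{}}}", list(p)),
    }
}

fn main() {
    let leaf = Inner::Text("e".to_string());
    let nested = Inner::Pairs(vec!["b".to_string()]);
    println!("{} vs {}", untagged(&leaf), externally_tagged(&leaf));
    println!("{} vs {}", untagged(&nested), externally_tagged(&nested));
}
```

The untagged form matches the `"inner": "e"` shape in the PR description; a reader of that output has to infer the variant from the payload's type, which is acceptable here because nothing deserializes it.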

ice1000 · 2019-04-10T19:44:04Z

What's the current status of this PR?

ice1000 · 2019-04-10T19:47:28Z

Could there be a binary that takes a text file and a grammar file and outputs the JSON (to stdout or to a file, for example)?

ice1000 · 2019-04-10T19:48:35Z

One use case would be my IDE plugin: on-the-fly syntax highlighting of your code.

dragostis

Sorry for the late review. This looks fine, especially given that it's behind a flag. Thanks a lot!

bors r+

bors · 2019-04-14T08:39:10Z

Build succeeded

continuous-integration/travis-ci/push

Generate lexical information of Pair and Pairs as JSON

f0faa5a

CAD97 requested a review from dragostis March 5, 2019 21:36

CAD97 requested review from dragostis and removed request for dragostis April 10, 2019 20:20

ice1000 mentioned this pull request Apr 12, 2019

Atom/IntelliJ pest plugin #174

Closed

dragostis approved these changes Apr 14, 2019

bors bot merged commit f0faa5a into pest-parser:master Apr 14, 2019
