Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate lexical information of Pair and Pairs as JSON #381

Merged
merged 1 commit into from
Apr 14, 2019
Merged

Generate lexical information of Pair and Pairs as JSON #381

merged 1 commit into from
Apr 14, 2019

Conversation

Goncalerta
Copy link
Contributor

Closes #377

Implements the to_json function for both Pair and Pairs which generates a pretty-printed JSON that contains their information.

I followed the sketch given in #377, but I wonder if instead of "pos": [start, end] it would be better to have "start" and "end" as separate fields, as in pretty-printed format I personally find it easier to read with less nesting.

This is the generated string right now:

{
  "pos": [
    0,
    5
  ],
  "pairs": [
    {
      "pos": [
        0,
        3
      ],
      "rule": "a",
      "inner": {
        "pos": [
          1,
          2
        ],
        "pairs": [
          {
            "pos": [
              1,
              2
            ],
            "rule": "b",
            "inner": "b"
          }
        ]
      }
    },
    {
      "pos": [
        4,
        5
      ],
      "rule": "c",
      "inner": "e"
    }
  ]
}

This would be with "start" and "end":

{
  "start": 0,
  "end": 5,
  "pairs": [
    {
      "start": 0,
      "end": 3,
      "rule": "a",
      "inner": {
        "start": 1,
        "end": 2,
        "pairs": [
          {
            "start": 1,
            "end": 2,
            "rule": "b",
            "inner": "b"
          }
        ]
      }
    },
    {
      "start": 4,
      "end": 5,
      "rule": "c",
      "inner": "e"
    }
  ]
}

@CAD97
Copy link
Contributor

CAD97 commented Mar 5, 2019

(sigh, why must unit tests still not count for coverage...)

A few drive-by notes:

  • My original sketch had pos: (start, end) because I was thinking serde data model which does have tuples whereas JSON doesn't. If it were on the same line (full control over pretty printing) then the array/tuple solution might've been nice, but inline works nicely as well.
  • Personally I expected a R: Serialize bound rather than using enum debug formatting. (This is equivalent in "human"/text formats but different in the serde model and in compact formats.) Though I suppose that'd require telling serde_derive to emit a #[derive(Serialize)] for the token enum as well.
  • I'm not certain exactly how serde_derive handles "untagged enum" representations, but serializing inner as one of two types rather than as a serde enum scares me for some reason. I know we don't need to deserialize, but it still squicks me out a bit, even if we're only targeting self-describing, human-oriented formats.

@CAD97 CAD97 requested a review from dragostis March 5, 2019 21:36
@Goncalerta
Copy link
Contributor Author

  • My original sketch had pos: (start, end) because I was thinking serde data model which does have tuples whereas JSON doesn't. If it were on the same line (full control over pretty printing) then the array/tuple solution might've been nice, but inline works nicely as well.

Oh, I see. Well, in order to have full control over pretty printing, there is also the possibility of using format!. That would even lift the dependency on serde and it shouldn't be hard to implement, as the template is pretty simple. However, this would make it harder to serialize to other data formats if that is ever desired.

  • Personally I expected a R: Serialize bound rather than using enum debug formatting. (This is equivalent in "human"/text formats but different in the serde model and in compact formats.) Though I suppose that'd require telling serde_derive to emit a #[derive(Serialize)] for the token enum as well.

Well, I personally think that would add too much complexity, because in order to be able to call #[derive(Serialize)] on the macro, we would need to expose serde on pest's API.

  • I'm not certain exactly how serde_derive handles "untagged enum" representations, but serializing inner as one of two types rather than as a serde enum scares me for some reason. I know we don't need to deserialize, but it still squicks me out a bit, even if we're only targeting self-describing, human-oriented formats.

Well, if you prefer I could make a local enum for it, I just thought it wasn't worth the effort in this case.

@CAD97
Copy link
Contributor

CAD97 commented Mar 6, 2019

My original sketch had pos: (start, end) because I was thinking serde data model which does have tuples whereas JSON doesn't. If it were on the same line (full control over pretty printing) then the array/tuple solution might've been nice, but inline works nicely as well.

Oh, I see. Well, in order to have full control over pretty printing, there is also the possibility of using format!. That would even lift the dependency on serde and it shouldn't be hard to implement, as the template is pretty simple. However, this would make it harder to serialize to other data formats if that is ever desired.

Yeah; I'd potentially go so far as to just do the impl Serialize and allow the user to specify a data format themselves. (Personally, I like RON or YAML for "read-only" data formats over JSON.)

Well, I personally think that would add too much complexity, because in order to be able to call #[derive(Serialize)] on the macro, we would need to expose serde on pest's API.

Yep, the solution here is great for this.

Well, if you prefer I could make a local enum for it, I just thought it wasn't worth the effort in this case.

I'd feel better with a #[derive(Serialize)] #[serde(untagged)] enum here, as there's some guarantee that the serialization API isn't being misused if we go through serde_derive here. But I guess this is more up to @dragostis here.

@ice1000
Copy link
Member

ice1000 commented Apr 10, 2019

What's the current status of this pr?

@ice1000
Copy link
Member

ice1000 commented Apr 10, 2019

Can there be a bin that takes a text file and the grammar file and outputs the json (like to stdout or to a file)?

@ice1000
Copy link
Member

ice1000 commented Apr 10, 2019

One usage can be in my IDE plugin: on-the-fly syntax highlighting your code.

@CAD97 CAD97 requested review from dragostis and removed request for dragostis April 10, 2019 20:20
Copy link
Contributor

@dragostis dragostis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. This looks fine, especially given that it's behind a flag. Thanks a lot!

bors r+

bors bot added a commit that referenced this pull request Apr 14, 2019
381: Generate lexical information of `Pair` and `Pairs` as JSON r=dragostis a=Goncalerta

Closes #377

Implements the `to_json` function for both `Pair` and `Pairs` which generates a pretty-printed JSON that contains their information.

I followed the sketch given in #377, but I wonder if instead of `"pos": [start, end]` it would be better to have `"start"` and `"end"` as separate fields, as in pretty-printed format I personally find it easier to read with less nesting.

This is the generated string right now:
```
{
  "pos": [
    0,
    5
  ],
  "pairs": [
    {
      "pos": [
        0,
        3
      ],
      "rule": "a",
      "inner": {
        "pos": [
          1,
          2
        ],
        "pairs": [
          {
            "pos": [
              1,
              2
            ],
            "rule": "b",
            "inner": "b"
          }
        ]
      }
    },
    {
      "pos": [
        4,
        5
      ],
      "rule": "c",
      "inner": "e"
    }
  ]
}
```

This would be with `"start"` and `"end"`:
```
{
  "start": 0,
  "end": 5,
  "pairs": [
    {
      "start": 0,
      "end": 3,
      "rule": "a",
      "inner": {
        "start": 1,
        "end": 2,
        "pairs": [
          {
            "start": 1,
            "end": 2,
            "rule": "b",
            "inner": "b"
          }
        ]
      }
    },
    {
      "start": 4,
      "end": 5,
      "rule": "c",
      "inner": "e"
    }
  ]
}
```

Co-authored-by: PedroGonçaloCorreia <[email protected]>
@bors
Copy link
Contributor

bors bot commented Apr 14, 2019

Build succeeded

@bors bors bot merged commit f0faa5a into pest-parser:master Apr 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: export lexical information as json/xml/whatever
4 participants