[lang] Add escape syntax for field names (#146)
Related to #99

Currently field names containing a space, period, or escaped quote,
e.g. `date received` or `grpc.method`, cannot be parsed. This could
be worked around using `jq` or similar tools to rewrite the field
name, but that's a pain.

This commit adds an escaped field name syntax of `["<FIELD>"]`. This is
based on the Object Identifier-Index syntax[0] used by `jq`, so it
should be somewhat familiar to many people who parse JSON on the
command line.

The more obvious option of delimiting with just quotes, e.g.
"date received", creates an ambiguity between string literals and
escaped field names. For example, does `where foo == "date received"`
mean field `foo` matches field `date received`, or field `foo` matches
the string "date received"?

Example query:

```
* | json | where ["grpc.method"] == "Foo" | count by ["date received"]
```
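
The distinction between a bare field name and an escaped one can be sketched as a small hand-written parser. This is only an illustration, not the `nom`-based parser this commit changes: `parse_field_name` is a hypothetical name, and the rule for bare identifiers (ASCII letter or underscore first, then letters, digits, or underscores) is an assumption, since `starts_ident` and `is_ident` are not shown here.

```rust
/// Parse one field name from the start of `input`, returning the name and
/// the unparsed remainder. Hypothetical helper, not part of angle-grinder.
fn parse_field_name(input: &str) -> Option<(String, &str)> {
    if let Some(after_open) = input.strip_prefix("[\"") {
        // Escaped form `["<FIELD>"]`: scan to the first unescaped quote.
        let mut escaped = false;
        for (i, c) in after_open.char_indices() {
            match c {
                '\\' if !escaped => escaped = true,
                '"' if !escaped => {
                    // The closing quote must be followed by `]`.
                    let rest = after_open[i + 1..].strip_prefix(']')?;
                    // Backslashes are kept verbatim in the returned name.
                    return Some((after_open[..i].to_string(), rest));
                }
                _ => escaped = false,
            }
        }
        None // no closing quote found
    } else {
        // Bare form. ASSUMPTION: letter or underscore first, then
        // letters, digits, or underscores.
        let first = input.chars().next()?;
        if !(first.is_ascii_alphabetic() || first == '_') {
            return None;
        }
        let end = input
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(input.len());
        Some((input[..end].to_string(), &input[end..]))
    }
}
```

Because an escaped name must begin with `["`, a plain `"date received"` is never taken for a field name under this sketch, so the right-hand side of `where foo == "date received"` stays unambiguously a string literal.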

[0] https://stedolan.github.io/jq/manual/#ObjectIdentifier-Index:.foo,.foo.bar

Co-authored-by: Will Chandler <[email protected]>
wfchandler and Will Chandler authored Jul 23, 2021
1 parent f1c7da9 commit a513dcb
Showing 3 changed files with 40 additions and 1 deletion.
8 changes: 8 additions & 0 deletions README.md
@@ -66,6 +66,14 @@ A simple query that operates on JSON logs and counts the number of logs per leve
agrind '* | json | count by log_level'
```

### Escaping Field Names

Field names containing spaces, periods, or quotes must be escaped using `["<FIELD>"]`:

```bash
agrind '* | json | count by ["date received"], ["grpc.method"]'
```
### Filters
There are three basic filters:
20 changes: 19 additions & 1 deletion src/lang.rs
@@ -431,12 +431,18 @@ named!(column_ref<Span, Expr>, do_parse!(
(Expr::Column { head: DataAccessAtom::Key(head), rest: rest })
));

named!(ident<Span, String>, do_parse!(
named!(ident<Span, String>, alt!(bare_ident | escaped_ident));

named!(bare_ident<Span, String>, do_parse!(
    start: take_while1!(starts_ident) >>
    rest: take_while!(is_ident) >>
    (start.fragment.0.to_owned() + rest.fragment.0)
));

named!(escaped_ident<Span, String>,
    delimited!(tag!("["), map!(quoted_string, |s| s.to_owned()), tag!("]"))
);

named!(arguments<Span, Vec<Expr>>, add_return_error!(SyntaxErrors::StartOfError.into(), delimited!(
tag!("("),
separated_list!(tag!(","), expr),
@@ -1166,6 +1172,18 @@ mod tests {
        expect_fail!(ident, "5x");
    }

    #[test]
    fn parse_quoted_ident() {
        expect!(ident, "[\"hello world\"]", "hello world".to_string());
        expect!(ident, "[\"hello.world\"]", "hello.world".to_string());
        expect!(
            ident,
            r#"["hello \"world\""]"#,
            r#"hello \"world\""#.to_string()
        );
        expect_fail!(ident, "\"\"");
    }

    #[test]
    fn parse_var_list() {
        expect!(
13 changes: 13 additions & 0 deletions tests/structured_tests/escaped_ident.toml
@@ -0,0 +1,13 @@
query = """
* | json | count by ["grpc.method"], ["start time"], nested.["user.name"]
"""
input = """
{"start time": "today", "grpc.method": "Foo", "nested": {"user.name": "user1"}}
{"start time": "today", "grpc.method": "Bar", "nested": {"user.name": "user1"}}
"""
output = """
["grpc.method"] ["start time"] nested.["user.name"] _count
---------------------------------------------------------------------------------------
Bar today user1 1
Foo today user1 1
"""

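The `nested.["user.name"]` column in this test mixes a bare segment with an escaped one, so a `.` only separates segments when it sits outside `["..."]`. A rough sketch of that idea follows, under stated assumptions: `Value`, `split_path`, and `lookup` are hypothetical names rather than angle-grinder's data model or API, and escaped quotes inside a segment are deliberately not handled.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a parsed JSON value (hypothetical type).
#[derive(Debug, PartialEq)]
enum Value {
    Str(String),
    Obj(BTreeMap<String, Value>),
}

/// Split a column reference such as `nested.["user.name"]` into key segments.
/// Returns `None` for malformed or empty segments.
fn split_path(path: &str) -> Option<Vec<String>> {
    let mut segments = Vec::new();
    let mut rest = path;
    loop {
        let (segment, tail) = if let Some(inner) = rest.strip_prefix("[\"") {
            // Escaped segment: runs to the first `"]`, dots included.
            let end = inner.find("\"]")?;
            (&inner[..end], &inner[end + 2..])
        } else {
            // Bare segment: runs to the next `.` or the end of the input.
            let end = rest.find('.').unwrap_or(rest.len());
            (&rest[..end], &rest[end..])
        };
        if segment.is_empty() {
            return None;
        }
        segments.push(segment.to_string());
        match tail.strip_prefix('.') {
            Some(next) => rest = next,
            None if tail.is_empty() => return Some(segments),
            None => return None, // trailing characters after a segment
        }
    }
}

/// Walk nested objects one key at a time.
fn lookup<'a>(root: &'a Value, segments: &[String]) -> Option<&'a Value> {
    segments.iter().try_fold(root, |value, key| match value {
        Value::Obj(fields) => fields.get(key),
        Value::Str(_) => None,
    })
}
```

With the first input record above, splitting `nested.["user.name"]` gives the segments `nested` and `user.name`, and the lookup then reaches the `user1` value.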