EQL: Unicode escape sequences #62832

Closed
rw-access opened this issue Sep 23, 2020 · 2 comments · Fixed by #70514
Labels: :Analytics/EQL, Team:QL (Deprecated)

Comments

@rw-access
Contributor

rw-access commented Sep 23, 2020

Related to #61659

This is a proposal to add two new escape sequences for Unicode code points to EQL " strings. Occasionally, we need to inject non-printable Unicode characters into an EQL string. We first ran into this during the MITRE 2019 eval, and managed to work around it by carefully copying and pasting the character. That was tricky because the character happened to be the RTL encoding character: as soon as it was pasted, the browser interpreted it immediately and flipped all the subsequent text.

I propose we add these new escape sequences, which are widely used and similar to how JSON typically escapes non-ASCII characters.

  • \uXXXX
  • \UXXXXXXXX
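
For illustration, a minimal Python sketch of how the two proposed fixed-width forms could be decoded. The function name `decode_fixed_width_escapes` and the regular expression are assumptions made for this example, not part of EQL, and the sketch ignores every other escape sequence (including an escaped backslash in front of `u`).

```python
import re

# Hypothetical pattern: \u followed by exactly 4 hex digits,
# or \U followed by exactly 8 hex digits.
_FIXED_WIDTH = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def decode_fixed_width_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX sequences with the code point they name."""
    def _replace(match):
        # Exactly one of the two groups is set, depending on which form matched.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _FIXED_WIDTH.sub(_replace, text)
```

Because the width is fixed, any hex digit that follows the escape is ordinary text: `decode_fixed_width_escapes(r"\u00411")` yields `A1`, not a single character.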

I believe our current escape sequences are these:

  • \n for newlines
  • \r for carriage returns
  • \b for backspace
  • \t for a tab
  • \f for form feed
  • \\ for backslash

Currently, if \ is not followed by one of these characters, it is invalid syntax. So this change is purely additive and doesn't break existing queries. I have no preference between targeting 7.10 and 7.11.
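
A rough Python sketch of the behavior described above, to show why the change is additive. It only models the escapes listed in this comment (a real EQL parser may accept others), and the names `SIMPLE_ESCAPES` and `unescape` are hypothetical.

```python
# Escape sequences listed in the issue; this table is an assumption
# for illustration and may not match the actual EQL grammar.
SIMPLE_ESCAPES = {
    "n": "\n",
    "r": "\r",
    "b": "\b",
    "t": "\t",
    "f": "\f",
    "\\": "\\",
}


def unescape(body: str) -> str:
    """Resolve the listed escapes; reject any other use of a backslash."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # A backslash must be followed by one of the known escape characters.
        if i + 1 >= len(body) or body[i + 1] not in SIMPLE_ESCAPES:
            raise ValueError(f"invalid escape sequence at index {i}")
        out.append(SIMPLE_ESCAPES[body[i + 1]])
        i += 2
    return "".join(out)
```

In this model `unescape(r"\u0041")` raises `ValueError`, so no string that is accepted today contains `\u`, and giving `\u` a meaning cannot change the result of an existing valid query.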

Update:
Use an alternative, less visually ambiguous form for the escape sequence: \u{xxxxxxxx} (2-8 hex characters)
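
A minimal Python sketch of decoding the braced form, assuming the 2-8 hex digit rule stated above. The name `decode_braced_escapes` and the regular expression are hypothetical, and the sketch does not handle other escape sequences.

```python
import re

# Hypothetical pattern: \u{ then 2 to 8 hex digits then }
_BRACED = re.compile(r"\\u\{([0-9a-fA-F]{2,8})\}")


def decode_braced_escapes(text: str) -> str:
    """Replace each \\u{...} sequence with the code point it names."""
    def _replace(match):
        code_point = int(match.group(1), 16)
        # chr() raises an exception for values above 0x10FFFF.
        return chr(code_point)

    return _BRACED.sub(_replace, text)
```

The closing brace marks where the escape ends, so a hex digit that follows it is unambiguous: `decode_braced_escapes(r"\u{41}1")` yields `A1`. A sequence with a single digit, such as `\u{1}`, does not match the pattern and is left as-is by this sketch.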

rw-access added the team-discuss, :Analytics/EQL, Team:QL (Deprecated), and needs:triage labels Sep 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

rw-access removed the needs:triage label Sep 23, 2020
matriv self-assigned this Sep 28, 2020
matriv self-assigned this Mar 2, 2021
matriv added a commit to matriv/elasticsearch that referenced this issue Mar 17, 2021
Occasionally, it's useful to be able to use non-printable,
RTL (right-to-left) or other non-standard unicode characters
in an EQL query.

Introducing the standard \uXXXX escape sequence as well as
the variable 2-8 char escape sequence \u{XXXXXXXX}, e.g.:

```
\u0023
\u{35}
\u{1f2da}
\u{002acd1}
```

Closes: elastic#62832
matriv added a commit that referenced this issue Mar 22, 2021
Occasionally, it's useful to be able to use non-printable,
RTL (right-to-left) or other non-standard unicode characters
in an EQL query.

Introducing an escape sequence `\u{XXXXXXXX}` where 2-8 hex
digits are allowed within the curly braces, with zero padding from the
left implied, e.g.:

```
\u{35}
\u{1f2da}
\u{002acd1}
```

Closes: #62832
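
To make the implied zero padding concrete, here is a small Python sketch that maps the digits inside the braces to a code point. The function name `code_point_from_braces` is hypothetical, and rejecting values above U+10FFFF is an assumption: the commit message does not say how out-of-range values are handled.

```python
import string

MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def code_point_from_braces(hex_digits: str) -> int:
    """Return the code point named by the digits inside \\u{...}."""
    if not 2 <= len(hex_digits) <= 8:
        raise ValueError("expected 2-8 hex digits")
    if any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("non-hex character in escape")
    value = int(hex_digits, 16)  # leading zeros do not change the value
    if value > MAX_CODE_POINT:
        # Assumption: out-of-range values are rejected; the issue does not say.
        raise ValueError("code point out of range")
    return value
```

With this sketch, `code_point_from_braces("35")` and `code_point_from_braces("00000035")` both return `0x35`, and `"002acd1"` gives `0x2acd1`. Eight hex digits can express values far beyond the Unicode range, which is why the sketch adds an upper-bound check.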
@matriv
Contributor

matriv commented Mar 23, 2021

master : cb6a6e0
7.x : de237f0

matriv added a commit that referenced this issue Mar 23, 2021
(cherry picked from commit cb6a6e0)

3 participants