Extend the PPL identifier definition (#888)
* Extend the PPL identifier definition

* update

* fix bug
penghuo authored Dec 4, 2020
1 parent 80033fc commit 7dc8699
Showing 4 changed files with 54 additions and 13 deletions.
16 changes: 9 additions & 7 deletions docs/experiment/ppl/general/identifiers.rst
@@ -23,6 +23,13 @@ Description

A regular identifier is a string of characters that must start with an ASCII letter (lower or upper case). The subsequent characters can be any combination of letters, digits, and underscores (``_``). It cannot be a reserved keyword, and whitespace and other special characters are not allowed.

+For Elasticsearch, the following identifiers are supported extensionally:
+
+1. Identifiers prefixed by dot ``.``: this is called hidden index in Elasticsearch, for example ``.kibana``.
+2. Identifiers prefixed by at sign ``@``: this is common for meta fields generated in Logstash ingestion.
+3. Identifiers with ``-`` in the middle: this is mostly the case for index name with date information.
+4. Identifiers with star ``*`` present: this is mostly an index pattern for wildcard match.
+
Examples
--------

@@ -46,12 +53,7 @@ Delimited Identifiers
Description
-----------

-A delimited identifier is an identifier enclosed in back ticks `````. In this case, the identifier enclosed is not necessarily a regular identifier. In other words, it can contain any special character not allowed by regular identifier. For Elasticsearch, the following identifiers are supported extensionally:
-
-1. Identifiers prefixed by dot ``.``: this is called hidden index in Elasticsearch, for example ``.kibana``.
-2. Identifiers prefixed by at sign ``@``: this is common for meta fields generated in Logstash ingestion.
-3. Identifiers with ``-`` in the middle: this is mostly the case for index name with date information.
-4. Identifiers with star ``*`` present: this is mostly an index pattern for wildcard match.
+A delimited identifier is an identifier enclosed in back ticks `````. In this case, the identifier enclosed is not necessarily a regular identifier. In other words, it can contain any special character not allowed by regular identifier.

Use Cases
---------
@@ -67,7 +69,7 @@ Examples

Here are examples for quoting an index name by back ticks::

-od> source=`acc*` | fields `account_number`;
+od> source=`accounts` | fields `account_number`;
fetched rows / total rows = 4/4
+------------------+
| account_number |
@@ -79,17 +79,18 @@ public void testStatsMax() throws IOException {
   @Test
   public void testStatsNested() throws IOException {
     JSONObject response =
-        executeQuery(String.format("source=%s | stats avg(abs(age)*2) as AGE", TEST_INDEX_ACCOUNT));
+        executeQuery(String.format("source=%s | stats avg(abs(age) * 2) as AGE",
+            TEST_INDEX_ACCOUNT));
     verifySchema(response, schema("AGE", null, "double"));
     verifyDataRows(response, rows(60.342));
   }

   @Test
   public void testStatsNestedDoubleValue() throws IOException {
     JSONObject response =
-        executeQuery(String.format("source=%s | stats avg(abs(age)*2.0)",
+        executeQuery(String.format("source=%s | stats avg(abs(age) * 2.0)",
             TEST_INDEX_ACCOUNT));
-    verifySchema(response, schema("avg(abs(age)*2.0)", null, "double"));
+    verifySchema(response, schema("avg(abs(age) * 2.0)", null, "double"));
     verifyDataRows(response, rows(60.342));
   }

4 changes: 1 addition & 3 deletions ppl/src/main/antlr/OpenDistroPPLLexer.g4
@@ -230,13 +230,11 @@ CONCAT_WS: 'CONCAT_WS';
LENGTH: 'LENGTH';
STRCMP: 'STRCMP';

// LITERALS AND VALUES
//STRING_LITERAL: DQUOTA_STRING | SQUOTA_STRING | BQUOTA_STRING;
ID: ID_LITERAL;
INTEGER_LITERAL: DEC_DIGIT+;
DECIMAL_LITERAL: (DEC_DIGIT+)? '.' DEC_DIGIT+;

-fragment ID_LITERAL: [A-Z_]+[A-Z_$0-9@\-]*;
+fragment ID_LITERAL: [@*A-Z]+?[*A-Z_\-0-9]*;
DQUOTA_STRING: '"' ( '\\'. | '""' | ~('"'| '\\') )* '"';
SQUOTA_STRING: '\'' ('\\'. | '\'\'' | ~('\'' | '\\'))* '\'';
BQUOTA_STRING: '`' ( '\\'. | '``' | ~('`'|'\\'))* '`';
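
As a rough illustration of what the new `ID_LITERAL` fragment accepts, the sketch below translates it into a `java.util.regex` pattern and checks a few identifiers that appear in this commit. It is not part of the change: the class name `IdLiteralSketch` and the method `isIdLiteral` are hypothetical, and `CASE_INSENSITIVE` assumes the lexer reads its input case-insensitively. It also matches whole strings, whereas the real lexer tokenizes a character stream, so treat it as an approximation only.

```java
import java.util.regex.Pattern;

public class IdLiteralSketch {

  // Java translation of the new fragment: [@*A-Z]+?[*A-Z_\-0-9]*
  // CASE_INSENSITIVE is an assumption standing in for a case-insensitive
  // character stream in front of the lexer.
  private static final Pattern ID_LITERAL =
      Pattern.compile("[@*A-Z]+?[*A-Z_\\-0-9]*", Pattern.CASE_INSENSITIVE);

  // Returns true when the whole string has the shape of an ID_LITERAL.
  public static boolean isIdLiteral(String text) {
    return ID_LITERAL.matcher(text).matches();
  }

  public static void main(String[] args) {
    String[] samples = {
        "accounts", "log-2020", "log-2020-10-*", "@timestamp",
        ".kibana", "log.2020.10.10"
    };
    for (String sample : samples) {
      System.out.println(sample + " -> " + isIdLiteral(sample));
    }
  }
}
```

Under these assumptions, `.kibana` is rejected by the pattern alone, so the leading dot is presumably handled outside the `ID_LITERAL` fragment, in a part of the change not shown in this diff.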
@@ -50,12 +50,18 @@

 import com.amazon.opendistroforelasticsearch.sql.ast.Node;
 import com.amazon.opendistroforelasticsearch.sql.ast.tree.RareTopN.CommandType;
+import com.amazon.opendistroforelasticsearch.sql.common.antlr.SyntaxCheckException;
 import com.amazon.opendistroforelasticsearch.sql.ppl.antlr.PPLSyntaxParser;
 import org.junit.Ignore;
+import org.junit.Rule;
 import org.junit.Test;
+import org.junit.rules.ExpectedException;

 public class AstBuilderTest {

+  @Rule
+  public ExpectedException exceptionRule = ExpectedException.none();
+
   private PPLSyntaxParser parser = new PPLSyntaxParser();

   @Test
@@ -366,6 +372,40 @@ public void testIndexName() {
         ));
   }

+  @Test
+  public void testIdentifierAsIndexNameStartWithDot() {
+    assertEqual("source=.kibana",
+        relation(".kibana"));
+  }
+
+  @Test
+  public void identifierAsIndexNameWithDotInTheMiddleThrowException() {
+    exceptionRule.expect(SyntaxCheckException.class);
+    plan("source=log.2020.10.10");
+  }
+
+  @Test
+  public void testIdentifierAsIndexNameWithSlashInTheMiddle() {
+    assertEqual("source=log-2020",
+        relation("log-2020"));
+  }
+
+  @Test
+  public void testIdentifierAsIndexNameContainStar() {
+    assertEqual("source=log-2020-10-*",
+        relation("log-2020-10-*"));
+  }
+
+  @Test
+  public void testIdentifierAsFieldNameStartWithAt() {
+    assertEqual("source=log-2020 | fields @timestamp",
+        projectWithArg(
+            relation("log-2020"),
+            defaultFieldsArgs(),
+            field("@timestamp")
+        ));
+  }
+
   @Test
   public void testRareCommand() {
     assertEqual("source=t | rare a",