
Support accessing JSON fields with periods (. dot notation) #1033

Closed
RickyUlrich opened this issue Mar 23, 2018 · 6 comments

RickyUlrich commented Mar 23, 2018

In the KSQL CLI, when I create streams over JSON data whose field names contain periods, I am not able to query those fields.

Here is an example:

# from command line, load json data into kafka
host$ cat test.period.json | ./kafkacat -P -t test_json
host$ ./kafkacat -C -t test_json -u
{"timestamp":1520949575.820367,"id":2,"foo.bar":1}
{"timestamp":1520949580.817974,"id":1,"foo.bar":1}
{"timestamp":1520949875.920827,"id":3,"foo.bar":3}
% Reached end of topic test_json [0] at offset 3
host$

# from ksql-cli

ksql> CREATE STREAM test_json1 (id varchar, foo.bar varchar, timestamp BIGINT) with (VALUE_FORMAT = 'JSON', KAFKA_TOPIC = 'test_json', TIMESTAMP = 'timestamp');
line 1:41: extraneous input '.' expecting {'ADD', . . . }
ksql> 
ksql> CREATE STREAM test_json1 (id varchar, `foo.bar` varchar, timestamp BIGINT) with (VALUE_FORMAT = 'JSON', KAFKA_TOPIC = 'test_json', TIMESTAMP = 'timestamp');

 Message
----------------
 Stream created
----------------
ksql> describe test_json1;

 Field     | Type
---------------------------------------
 ROWTIME   | BIGINT           (system)
 ROWKEY    | VARCHAR(STRING)  (system)
 ID        | VARCHAR(STRING)
 foo.bar   | VARCHAR(STRING)
 TIMESTAMP | BIGINT
---------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED Stream,Table;
ksql> select * from test_json1;
Code generation failed for SelectValueMapper
Caused by: java.lang.String cannot be cast to io.confluent.ksql.function.udf.Kudf
ksql> CREATE STREAM test_json2 (id varchar, timestamp BIGINT) with (VALUE_FORMAT = 'JSON', KAFKA_TOPIC = 'test_json', TIMESTAMP = 'timestamp');

 Message
----------------
 Stream created
----------------
ksql> SET 'auto.offset.reset' = 'earliest';
Successfully changed local property 'auto.offset.reset' from 'earliest' to 'earliest'
ksql> select * from test_json2;
1520949575 | null | 2 | 1520949575
1520949580 | null | 1 | 1520949580
1520949875 | null | 3 | 1520949875
^CQuery terminated                
ksql> 

As you can see above, I can create streams over data whose fields contain periods, but I cannot select those fields without hitting the SelectValueMapper code generation error.

Furthermore, when I create the stream with foo.bar surrounded by single quotes, I get a syntax error. When I use double quotes, the stream "compiles" but I get the same Code generation failed for SelectValueMapper error.

Thank you for your time!

OneCricketeer commented Mar 23, 2018

Keep in mind: How would you differentiate queries for "foo.bar": "value" and

"foo": {
  "bar": "value"
}

? Related - #638
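
For what it's worth, JSONPath runs into the same ambiguity and resolves it with bracket versus dot notation. A minimal illustration (generic JSONPath notation, not a claim about what KSQL's parser accepts):

$['foo.bar']    selects the top-level member literally named "foo.bar"
$.foo.bar       selects the member "bar" nested inside the object "foo"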

RickyUlrich (Author) commented:

@Cricket007 That is a very good point! I'm not sure how we would resolve that. Maybe something like:
SELECT "foo.bar" ... for your example, and
SELECT 'foo.bar' ... for the case where the JSON field is literally called foo.bar.
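
Spelled out, that proposal would look something like this (purely hypothetical syntax, not something KSQL accepts; note that SQL normally treats single-quoted text as a string literal):

-- hypothetical: quote style for the nested {"foo": {"bar": ...}} case
SELECT "foo.bar" FROM test_json1;
-- hypothetical: quote style for a top-level field literally named foo.bar
SELECT 'foo.bar' FROM test_json1;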

big-andy-coates (Contributor) commented:

Have you tried escaping the period? e.g.

CREATE STREAM test_json1 (id varchar, foo\.bar varchar, timestamp BIGINT) ...

miguno changed the title from "KSQL, JSON fields with periods" to "Support accessing JSON fields with periods (. dot notation)" on Mar 27, 2018

RickyUlrich commented Apr 5, 2018

Apologies for the delayed response @big-andy-coates. When I run that, I get this error:

test_json', TIMESTAMP = 'timestamp');
line 1:59: extraneous input '.' expecting {'ADD', 'APPROXIMATE', 'AT', 'CONFIDENCE', 'NO', 'SUBSTRING', 'POSITION', 'TINYINT', 'SMALLINT', 'INTEGER', 'DATE', 'TIME', 'TIMESTAMP', 'INTERVAL', 'YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'ZONE', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'VIEW', 'REPLACE', 'GRANT', 'REVOKE', 'PRIVILEGES', 'PUBLIC', 'OPTION', 'EXPLAIN', 'ANALYZE', 'FORMAT', 'TYPE', 'TEXT', 'GRAPHVIZ', 'LOGICAL',
'DISTRIBUTED', 'TRY', 'SHOW', 'TABLES', 'SCHEMAS', 'CATALOGS', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'TO', 'SYSTEM', 'BERNOULLI', 'POISSONIZED', 'TABLESAMPLE', 'RESCALED', 'ARRAY', 'MAP', 'SET', 'RESET', 'SESSION', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'WORK', 'ISOLATION', 'LEVEL', 'SERIALIZABLE', 'REPEATABLE', 'COMMITTED', 'UNCOMMITTED', 'READ', 'WRITE', 'ONLY', 'CALL', 'NFD', 'NFC', 'NFKD', 'NFKC', 'IF', 'NULLIF', 'COALESCE', IDENTIFIER, DIGIT_IDENTIFIER, QUOTED_IDENTIFIER, BACKQUOTED_IDENTIFIER}
Caused by: line 1:59: extraneous input '.' expecting {'ADD', 'APPROXIMATE', 'AT', 'CONFIDENCE', 'NO', 'SUBSTRING', 'POSITION', 'TINYINT', 'SMALLINT', 'INTEGER', 'DATE', 'TIME', 'TIMESTAMP', 'INTERVAL', 'YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'ZONE', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'VIEW', 'REPLACE', 'GRANT', 'REVOKE', 'PRIVILEGES', 'PUBLIC', 'OPTION', 'EXPLAIN', 'ANALYZE', 'FORMAT', 'TYPE', 'TEXT', 'GRAPHVIZ',
'LOGICAL', 'DISTRIBUTED', 'TRY', 'SHOW', 'TABLES', 'SCHEMAS', 'CATALOGS', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'TO', 'SYSTEM', 'BERNOULLI', 'POISSONIZED', 'TABLESAMPLE', 'RESCALED', 'ARRAY', 'MAP', 'SET', 'RESET', 'SESSION', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'WORK', 'ISOLATION', 'LEVEL', 'SERIALIZABLE', 'REPEATABLE', 'COMMITTED', 'UNCOMMITTED', 'READ', 'WRITE', 'ONLY', 'CALL', 'NFD', 'NFC', 'NFKD', 'NFKC', 'IF', 'NULLIF', 'COALESCE',
IDENTIFIER, DIGIT_IDENTIFIER, QUOTED_IDENTIFIER, BACKQUOTED_IDENTIFIER}
ksql>

I should add that I am using the ksql-cli docker image: confluentinc/ksql-cli:0.5.

I have also found a workaround for this problem: making sure the JSON fields don't contain periods in the first place. But still, thanks for the help.
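
For reference, a minimal sketch of that workaround, using jq to rename the dotted field before producing to Kafka (the new field name foo_bar and the topic test_json_no_dots are illustrative, not from this thread):

# illustrative only: copy "foo.bar" into "foo_bar", drop the original, then produce
host$ cat test.period.json \
        | jq -c '. + {"foo_bar": .["foo.bar"]} | del(.["foo.bar"])' \
        | ./kafkacat -P -t test_json_no_dots

A stream declared with foo_bar varchar should then be queryable without any quoting or escaping.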


smil2k commented Oct 7, 2019

@Cricket007:
You can differentiate the same way any old SQL interpreter does: "schema"."property" versus "property.with.dots".
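
A minimal sketch of that convention in standard SQL, where double quotes delimit identifiers (the source name some_stream is illustrative, and this describes standard SQL quoting rather than what KSQL supported at the time):

-- qualified reference: member "bar" inside source/struct "foo"
SELECT "foo"."bar" FROM some_stream;
-- single identifier whose name literally contains a dot
SELECT "foo.bar" FROM some_stream;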

@MoRezaeirad

Use foo\.bar instead of foo.bar in your CREATE STREAM test_json1 ... statement.
