KTable showing duplicate entries #4251

Open
eeepmb opened this issue Jan 9, 2020 · 0 comments

eeepmb commented Jan 9, 2020

Hi guys,
I just started getting my hands on the latest release of ksqlDB and testing it.
Right now I'm facing a problem: in contrast to the small example on your landing page, a table shows duplicate entries when the same message is inserted repeatedly. According to issue #530, this should have been fixed.

I set up a test environment following this tutorial and ran the following commands in the CLI:

CREATE STREAM input_stream_json (id STRING, diff STRING, name STRING, date STRING)
    WITH (VALUE_FORMAT='JSON', KAFKA_TOPIC='input', KEY='id', PARTITIONS=1, REPLICAS=1);

CREATE TABLE dedup_table (id STRING, diff STRING)
    WITH (
       kafka_topic = 'input',
       key = 'id',
       value_format = 'json'
);

CREATE STREAM output_stream AS
  SELECT s.id, s.diff, s.name, s.date
  FROM input_stream_json s
  WHERE s.diff != 'dedup_table.diff';

The desired behavior is that output_stream keeps only messages with a unique diff value. However, it contains all messages (and so does the table that is supposed to filter out the duplicated entries).
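The following minimal plain-Python model (not ksqlDB code) makes the intended filtering concrete, using the three messages shown in the `print 'input'` output. One detail is worth noting. In SQL, single quotes make `'dedup_table.diff'` a string literal, and `dedup_table` does not appear in the `FROM` clause. As a result, the `WHERE` clause compares `s.diff` with that fixed text, and every row passes. The second function sketches the behavior described here. It assumes deduplication is meant per `id`, so a message is kept only if its `diff` differs from the last `diff` seen for the same `id`. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List

# The three messages printed from the 'input' topic in this issue.
MESSAGES = [
    {"ROWTIME": 1578562247160, "ID": "1", "DIFF": "10", "NAME": "me", "DATE": "2018"},
    {"ROWTIME": 1578562247775, "ID": "1", "DIFF": "10", "NAME": "me", "DATE": "2018"},
    {"ROWTIME": 1578562248333, "ID": "1", "DIFF": "10", "NAME": "me", "DATE": "2018"},
]


def filter_as_written(messages: Iterable[dict]) -> List[dict]:
    """Mimics WHERE s.diff != 'dedup_table.diff': a comparison with a string literal."""
    return [m for m in messages if m["DIFF"] != "dedup_table.diff"]


def dedup_by_last_diff(messages: Iterable[dict]) -> List[dict]:
    """Hypothetical intended filter: keep a message only if its DIFF differs
    from the last DIFF seen for the same ID."""
    last_diff: Dict[str, str] = {}
    kept: List[dict] = []
    for m in messages:
        if last_diff.get(m["ID"]) != m["DIFF"]:
            kept.append(m)
        last_diff[m["ID"]] = m["DIFF"]  # remember the latest DIFF for this ID
    return kept


if __name__ == "__main__":
    print(len(filter_as_written(MESSAGES)))   # 3: the literal never matches "10"
    print(len(dedup_by_last_diff(MESSAGES)))  # 1: only the first message survives
```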

print 'input';
Format:JSON
{"ROWTIME":1578562247160,"ROWKEY":"1","ID":"1","DIFF":"10","NAME":"me","DATE":"2018"}
{"ROWTIME":1578562247775,"ROWKEY":"1","ID":"1","DIFF":"10","NAME":"me","DATE":"2018"}
{"ROWTIME":1578562248333,"ROWKEY":"1","ID":"1","DIFF":"10","NAME":"me","DATE":"2018"}

select * from dedup_table;
|1578562244536          |1                      |1                      |10                     |
|1578562246424          |1                      |1                      |10                     |
|1578562248333          |1                      |1                      |10                     |
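One possible reading of the table output above is that a table built on a topic behaves like a changelog, where each record upserts (replaces) the value for its key. The sketch below is an illustrative plain-Python model of that idea, not ksqlDB's implementation, and `apply_changelog` is a hypothetical name. Feeding it the three rows printed for key `1` leaves a single row in the materialized state. The list of change events, however, still has three entries, and these would appear as repeated rows whenever every change is printed.

```python
from typing import Dict, List, Tuple

# (ROWTIME, ROWKEY, DIFF) for the rows shown by `select * from dedup_table;`.
UPDATES: List[Tuple[int, str, str]] = [
    (1578562244536, "1", "10"),
    (1578562246424, "1", "10"),
    (1578562248333, "1", "10"),
]


def apply_changelog(updates: List[Tuple[int, str, str]]):
    """Conceptual model of a table: each record upserts the value for its key.

    Returns the final state (latest value per key) and the list of change
    events, one per incoming record."""
    state: Dict[str, Tuple[int, str]] = {}
    changes: List[Tuple[int, str, str]] = []
    for rowtime, key, diff in updates:
        state[key] = (rowtime, diff)          # replace any previous value for this key
        changes.append((rowtime, key, diff))  # every upsert is also emitted as a change
    return state, changes


if __name__ == "__main__":
    state, changes = apply_changelog(UPDATES)
    print(len(state))    # 1 row per key in the materialized state
    print(len(changes))  # 3 change events, one per record
    print(state["1"])    # (1578562248333, '10')
```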

select * from output_stream;
|1578562247160      |1                  |1                  |10                 |me              |2018               |
|1578562247775      |1                  |1                  |10                 |me              |2018               |
|1578562248333      |1                  |1                  |10                 |me              |2018               |

Any help or advice would be highly appreciated. Thanks in advance.
