Support point-in-time queries for Tables (cf. queryable state) #530

joewood · 2017-12-14T14:20:16Z

Related to #528.
Executing a SELECT on a table via the REST interface streams the results back. It would be useful to provide a single-shot HTTP query of a table that simply performs a distributed KTable query without any streaming. Right now this doesn't seem to be possible.

miguno · 2017-12-19T08:35:26Z

TL;DR: This feature request is primarily about adding functionality to KSQL so that it is possible to do a SELECT * FROM <table> with similar semantics as in the RDBMS world (MySQL, Postgres, Oracle, etc.).

Today, however, KSQL only supports streaming queries, so a SELECT query against a TABLE will continue to run, and will also include the same key multiple times in the query output if there were subsequent changes to the key.

As already said via Slack: Thanks for reporting @joewood. :)

I assume what you mean by this is the following? Imagine you have a users TABLE with 10 records (= 10 unique users). In this case, you want a way to query the table's current data via e.g. SELECT * FROM users, which should return 10 output records and then terminate (rather than keeping the query running, waiting for changes to the table that will then be outputted, too).
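The difference between the two semantics can be sketched in a few lines of Python. This is a minimal model, not KSQL's implementation: it assumes a table is represented by its changelog of `(key, value)` updates, with `None` as a delete (tombstone), and the function names `streaming_select` and `point_in_time_select` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# One changelog record: (key, value). A value of None is a tombstone (delete).
Record = Tuple[str, Optional[str]]


def streaming_select(changelog: Iterable[Record]) -> Iterator[Record]:
    """Push semantics: emit every change, so a key can appear many times.

    Over a finite list this generator ends; against a live topic it would
    keep waiting for new changes instead of terminating.
    """
    for key, value in changelog:
        yield key, value


def point_in_time_select(changelog: Iterable[Record]) -> List[Tuple[str, str]]:
    """Snapshot semantics: fold the changelog into current state, return once."""
    state: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            state.pop(key, None)  # a tombstone removes the row
        else:
            state[key] = value    # a later update overwrites the earlier one
    return sorted(state.items())


changelog = [
    ("alice", "region=EU"),
    ("bob", "region=US"),
    ("alice", "region=APAC"),  # update to an existing key
    ("carol", "region=EU"),
    ("bob", None),             # bob is deleted
]

print(list(streaming_select(changelog)))  # 5 records; alice appears twice
print(point_in_time_select(changelog))    # [('alice', 'region=APAC'), ('carol', 'region=EU')]
```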

joewood · 2017-12-19T14:15:11Z

@miguno yes, here are a couple more suggestions (based on how we currently use the Streams API for distributed queries):

Allow simple key-based queries, e.g. SELECT * FROM users WHERE key="<value>". This would map to store(name).get(key).
Possibly allow key range queries, e.g. SELECT * FROM users WHERE key BETWEEN "<start>" AND "<end>", mapping to store(name).range(start, end).
Possibly also include pagination support, like SELECT * FROM users OFFSET 10 ROWS FETCH NEXT 5 ROWS ONLY, or via an implicit ROW_INDEX. Right now we do pagination by performing a range query and limiting the iteration loop, which isn't ideal. (Related to Suggestion - Add Pagination to Table Queries #528.)
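To illustrate how the three suggestions map onto a key-ordered store, here is a hedged Python sketch. `SortedKeyValueStore` is a hypothetical in-memory stand-in for a state store, not the Kafka Streams API; its `range` treats both bounds as inclusive to match SQL `BETWEEN`, and the hypothetical `page` helper paginates the way the comment describes, by cutting a range scan short.

```python
import bisect
from itertools import islice
from typing import Dict, Iterator, List, Optional, Tuple

Row = Tuple[str, str]


class SortedKeyValueStore:
    """Hypothetical in-memory stand-in for a key-ordered state store."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self._keys = sorted(self._data)  # keys kept in sorted order for scans

    def get(self, key: str) -> Optional[str]:
        # SELECT * FROM users WHERE key = '<value>'  ->  point lookup
        return self._data.get(key)

    def range(self, start: str, end: str) -> Iterator[Row]:
        # SELECT * FROM users WHERE key BETWEEN '<start>' AND '<end>'
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)  # inclusive upper bound
        for key in self._keys[lo:hi]:
            yield key, self._data[key]


def page(rows: Iterator[Row], offset: int, limit: int) -> List[Row]:
    # OFFSET <offset> ROWS FETCH NEXT <limit> ROWS ONLY: skip `offset` rows of
    # the scan, then stop after `limit` rows. Skipped rows are still read.
    return list(islice(rows, offset, offset + limit))


store = SortedKeyValueStore({f"user{i:02d}": f"name{i}" for i in range(20)})
print(store.get("user03"))                                         # name3
print(list(store.range("user05", "user07")))                       # user05..user07
print(page(store.range("user00", "user99"), offset=10, limit=5))   # user10..user14
```

Because `page` still iterates over every skipped row, a deep offset costs as much as reading those rows, which is one reason this pattern is not ideal.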

The above would need to be executed in a distributed fashion across the KSQL processes. Simple key-based queries could be smart and route the query to the owning node based on partition assignment (we don't do that now, but it's a future optimization).
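The routing idea can be sketched as follows, under loudly stated assumptions: the node names, the partition assignment, and the in-memory stores are all hypothetical, and `zlib.crc32` is only a stand-in hash. Kafka's default partitioner hashes the serialized key with murmur2, so a real router would have to use the same partitioning function as the producer, otherwise it would look on the wrong node. A key lookup contacts a single node, while a query without a key has to fan out to all of them.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

NUM_PARTITIONS = 4

# Hypothetical partition assignment: which node hosts each partition's store.
ASSIGNMENT: Dict[int, str] = {0: "node-a", 1: "node-b", 2: "node-a", 3: "node-c"}

# Hypothetical local state: node -> partition -> {key: value}.
NODE_STORES: Dict[str, Dict[int, Dict[str, str]]] = defaultdict(lambda: defaultdict(dict))


def partition_for(key: str) -> int:
    # Stand-in hash only; must match the producer's partitioner in practice.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def write(key: str, value: str) -> None:
    p = partition_for(key)
    NODE_STORES[ASSIGNMENT[p]][p][key] = value


def key_query(key: str) -> Tuple[str, Optional[str]]:
    """Route the lookup to the single node that owns the key's partition."""
    p = partition_for(key)
    node = ASSIGNMENT[p]
    return node, NODE_STORES[node][p].get(key)


def scan_query() -> Tuple[List[str], List[Tuple[str, str]]]:
    """No key to route on: ask every node and merge the partial results."""
    nodes = sorted(set(ASSIGNMENT.values()))
    rows: List[Tuple[str, str]] = []
    for node in nodes:
        for store in NODE_STORES[node].values():
            rows.extend(store.items())
    return nodes, sorted(rows)


for i in range(10):
    write(f"user{i}", f"name{i}")

print(key_query("user7"))  # the node owning user7's partition, and 'name7'
print(scan_query()[0])     # ['node-a', 'node-b', 'node-c']
```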

apurvam · 2018-02-14T00:40:49Z

Hi @joewood, can you share details about the use cases that essentially require KSQL tables to look like 'point in time snapshots' of your stream?

Is it to use KSQL with another system which expects the regular SQL syntax and semantics? Some details on the context of the problem you are trying to solve may suggest workarounds. At the very least it will give us some data which would help in prioritization.

This type of feature is a big investment, so more data points will definitely help moving it forward.

Thanks!

joewood · 2018-02-14T12:48:36Z

Hi @apurvam, the use case is to be able to access the state stores that KSQL is using. This allows external clients to view the state of the KSQL stores like a data repository. For example, I may have a KSQL statement that creates an aggregation of products based on orders placed. I may want a simple UI to show the current point-in-time aggregation by using a simple REST call (similar to the distributed queries in the Confluent Examples). The query could simply be by store key (or a range of store keys). I don't think a full SQL expression evaluation over every row is required (essentially query by key only).
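A rough Python model of that use case, assuming an aggregate of total quantity ordered per product (the `Order` and `ProductTotals` names are hypothetical): every order updates the aggregate and appends a record to its changelog, which is roughly what a streaming query would emit, while the UI only needs `get` for the current value of one key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Order:
    product_id: str
    quantity: int


class ProductTotals:
    """Hypothetical materialized aggregate: total quantity ordered per product."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = defaultdict(int)
        # Every aggregate update, in order: roughly what a push query streams out.
        self.changelog: List[Tuple[str, int]] = []

    def apply(self, order: Order) -> None:
        self._totals[order.product_id] += order.quantity
        self.changelog.append((order.product_id, self._totals[order.product_id]))

    def get(self, product_id: str) -> Optional[int]:
        # Point-in-time lookup by key; .get() does not trigger the defaultdict.
        return self._totals.get(product_id)


totals = ProductTotals()
for order in [Order("p1", 2), Order("p2", 1), Order("p1", 3)]:
    totals.apply(order)

print(totals.changelog)  # [('p1', 2), ('p2', 1), ('p1', 5)]
print(totals.get("p1"))  # 5
```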

I think the bigger question here is how KSQL is designed to be used. The current REST interface is an HTTP stream, which is fine, but it doesn't scale very well to multiple consumers. I don't see an easy way to use KSQL programmatically. I may be missing something, as I haven't checked back since December.

apurvam · 2018-02-15T01:42:29Z

> I think the bigger question here is how KSQL is designed to be used. The current REST interface is an HTTP stream, which is fine, but it doesn't scale very well to multiple consumers. I don't see an easy way to use KSQL programmatically.

Not sure I follow. KSQL is a simpler way to do stream processing, i.e., read from Kafka topic(s) and write to Kafka topic(s). The output in the Kafka topics can be consumed in a variety of ways by multiple consumers, like dumping it to an external store using Kafka Connect, feeding real-time dashboards, etc.

For your example above, you could just write the generated aggregates to an external store and use that to build your UI dashboard. There are already tools to facilitate this (like a variety of sink connectors). What advantage do you see in querying state stores directly?

joewood · 2018-02-15T19:45:28Z

Hi @apurvam, sure, I could write a separate Streams app that takes the backing topic of an aggregate store in KSQL, repopulates the store, and exposes it over HTTP. That just feels like a lot of additional resources, given that the store already exists in the KSQL processes.

instantdelay · 2018-02-20T23:26:02Z

What is the difference between a table and a stream if the current state of a table is not accessible?

Copying from the terminology documentation:

> A table is a view of a stream, or another table, and represents a collection of evolving facts. For example, we could have a table that contains the latest financial information such as "Bob’s current account balance is $150". It is the equivalent of a traditional database table but enriched by streaming semantics such as windowing. Facts in a table are mutable, which means new facts can be inserted to the table, and existing facts can be updated or deleted. Tables can be created from a Kafka topic or derived from existing streams and tables. In both cases, a table's underlying data is durably stored (persisted) within a Kafka topic on the Kafka brokers.

I'm confused as to how that example would actually be implemented. How can I turn a stream of account balance updates into a table where I can retrieve a user's current balance?

My initial impression was that a SELECT on a table would return the "point-in-time" results this issue is requesting. I stumbled upon this issue while trying to figure out why that isn't the case.

apurvam · 2018-02-21T07:03:54Z

Thanks for your comment, @instantdelay. Those are reasonable questions. I think there are two separate issues brought up in your post.

The notion of a table still makes sense even if we don't currently support point-in-time queries on these tables. For instance, we support joins between streams and tables today. For this operation, the definition of a table above makes sense: when a table is joined with a stream, the latest value for a given key in the table is what gets joined with the incoming message in the stream.
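The join behavior described above can be modeled in a few lines of Python, using the account-balance example from the quoted documentation. This is a simplified sketch that assumes events are processed strictly in arrival order and models a left join; the event encoding and the `stream_table_join` name are hypothetical, and a real engine also has to deal with timestamps and partitioning.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical encoding: (source, key, value) with source "table" or "stream".
Event = Tuple[str, str, str]


def stream_table_join(events: Iterable[Event]) -> List[Tuple[str, str, Optional[str]]]:
    table: Dict[str, str] = {}  # latest value per key, updated in place
    joined: List[Tuple[str, str, Optional[str]]] = []
    for source, key, value in events:
        if source == "table":
            table[key] = value
        else:
            # Left join: enrich the stream record with the table's current value.
            joined.append((key, value, table.get(key)))
    return joined


events = [
    ("table", "bob", "balance=100"),
    ("stream", "bob", "purchase#1"),
    ("table", "bob", "balance=150"),
    ("stream", "bob", "purchase#2"),
    ("stream", "eve", "purchase#3"),  # no table row for eve yet
]
for row in stream_table_join(events):
    print(row)
```

In this model a table update on its own produces no output; only stream records trigger the join, and each sees whatever balance was current when it arrived.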

This is not to say that we won't support point-in-time queries. But even without them, the table semantics you laid out can still be used, just not through simple SELECT statements.

Hope this makes sense.

tobihofmann · 2018-02-21T07:25:04Z

We would require point-in-time queries for our use case, too. We built several microservices which store their data in global KTables. Currently we use a workaround to query the global KTables for attributes of the stored data (e.g. SELECT * FROM users WHERE logged_in = true).
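A predicate on a non-key attribute such as `logged_in` cannot be answered with a key lookup; without a secondary index, every row of the table has to be visited. A minimal Python sketch of that full scan, where the `users` data and the `select_where` helper are hypothetical:

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def select_where(table: Dict[str, Row], predicate: Callable[[Row], bool]) -> Tuple[List[str], int]:
    """Full scan: test every row, return matching keys and rows visited."""
    matches: List[str] = []
    visited = 0
    for key in sorted(table):
        visited += 1
        if predicate(table[key]):
            matches.append(key)
    return matches, visited


users: Dict[str, Row] = {
    "u1": {"name": "Ann", "logged_in": True},
    "u2": {"name": "Ben", "logged_in": False},
    "u3": {"name": "Cid", "logged_in": True},
}

# SELECT * FROM users WHERE logged_in = true
print(select_where(users, lambda row: row["logged_in"] is True))  # (['u1', 'u3'], 3)
```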

PRanadheer · 2018-04-26T05:28:49Z

We too have a use case for point-in-time query support. Is there any ETA for when this will be supported?

tobihofmann · 2018-05-16T06:13:47Z

Are there any updates on the point-in-time queries? Do you have an ETA?

miguno · 2018-05-16T07:09:38Z

No ETA yet, @tobihofmann.

dschnek · 2018-06-15T17:41:29Z

We have use cases for KSQL point-in-time queries in our application as well.

codestoned1 · 2018-06-19T16:18:35Z

We would also like KSQL point-in-time queries. It is a much simpler solution than funneling the data to an external datastore, and it allows the retrieval to remain in KSQL. I am building an API that lets people run KSQL statements directly, and it would be much more intuitive to simply run a SELECT statement and be guaranteed to receive all the data currently in a table at that time (or get an error), as in a standard SQL database. The most useful aspect is knowing when all the data in the table has actually been read. Currently, there is no good way to guarantee that all rows of a KTable have been read unless one sets a long timeout, which is definitely not something I want to do.
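One common way to know when a snapshot read is complete, without a timeout, is to record each partition's end offset when the query starts and stop once every partition has been consumed up to that offset. The sketch below simulates this in plain Python with a hypothetical in-memory topic in which list indices stand in for offsets; with the Java consumer, the end offsets could be fetched via `KafkaConsumer.endOffsets`. The `snapshot_end_offsets` and `bounded_read` names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory topic: partition -> records; list index = offset.
Topic = Dict[int, List[Tuple[str, str]]]


def snapshot_end_offsets(topic: Topic) -> Dict[int, int]:
    # Offset one past the last record in each partition at query start.
    return {p: len(records) for p, records in topic.items()}


def bounded_read(topic: Topic, end: Dict[int, int]) -> Dict[str, str]:
    """Consume each partition up to its recorded end offset, then stop."""
    position = {p: 0 for p in end}
    state: Dict[str, str] = {}
    while any(position[p] < end[p] for p in end):
        for p in end:
            if position[p] < end[p]:
                key, value = topic[p][position[p]]
                state[key] = value  # latest value per key wins
                position[p] += 1
    return state  # every record up to the snapshot has been read


topic: Topic = {0: [("a", "1"), ("a", "2")], 1: [("c", "1")]}
end = snapshot_end_offsets(topic)
topic[1].append(("d", "late"))  # arrives after the query started

print(bounded_read(topic, end))  # {'a': '2', 'c': '1'}
```

The record that arrives after the snapshot is deliberately excluded, which is what gives the read a well-defined end instead of relying on a timeout.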

codestoned1 · 2018-07-05T21:59:59Z

Does anyone know a good workaround to achieve point-in-time queries? Thanks!

sjs7007 · 2018-08-11T02:16:03Z

I am looking for the same as well. Any updates on this?

On a side note, is something like this available in the Kafka Streams API itself?

gjimher · 2018-09-21T18:30:51Z

> I am looking for the same as well. Any updates on this?
>
> On a side note, is something like this available in the Kafka Streams API itself?

"Interactive queries". But some pieces are missing.

https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html
https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/

nishutayal · 2018-10-18T08:58:00Z

I also have a use case for this in my application, which uses the Streams API. I'm wondering whether the Streams API provides support for this, as well as for pagination and limits.

sachnk · 2019-02-08T14:18:48Z

We're also looking for something like this. Any updates here?

MichaelDrogalis · 2019-06-26T16:04:54Z

This is being queued up to be worked on shortly. ❤️

ghost · 2019-06-28T09:49:52Z

> This is being queued up to be worked on shortly. ❤️

Made my day! Any ETA yet?

MichaelDrogalis · 2019-06-28T14:17:15Z

@desixma :) Not yet, but we'll keep this issue posted with all the updates. All of the design work will happen on GitHub.

apurvam · 2019-07-26T04:55:59Z

The design discussion for this is at #3117

MichaelDrogalis · 2019-11-21T18:31:04Z

This is finally available in ksqlDB 0.6.0! :)

Release: https://ksqldb.io/quickstart.html
Docs: https://docs.ksqldb.io/en/latest/concepts/queries/pull/

miguno added the enhancement label Dec 19, 2017

apurvam changed the title from "Single shot HTTP Query for Tables?" to "Support Single shot HTTP Query (aka point in time queries) for Tables?" Feb 14, 2018

tobihofmann mentioned this issue Feb 21, 2018: Support using KSQL as library to write streaming applications (aka KSQL embedded mode) #734 (Closed)

miguno changed the title from "Support Single shot HTTP Query (aka point in time queries) for Tables?" to "Support point-in-time queries for Tables (aka single-shot HTTP query)" Feb 21, 2018

miguno changed the title from "Support point-in-time queries for Tables (aka single-shot HTTP query)" to "Support point-in-time queries for Tables" Mar 19, 2018

apurvam mentioned this issue Apr 12, 2018: Define start timestamp and end timestamp (time range) of input data for queries #1144 (Open)

miguno mentioned this issue Aug 16, 2018: Kafka Streams support? confluentinc/confluent-kafka-go#181 (Open)

miguno mentioned this issue Sep 14, 2018: KSQL tables are not grouping data by key #1876 (Closed)

miguno mentioned this issue Nov 26, 2018: Support querying a table's current value for a key #2192 (Closed)

miguno changed the title from "Support point-in-time queries for Tables" to "Support point-in-time queries for Tables (cf. queryable state)" Jan 15, 2019

miguno mentioned this issue Feb 25, 2019: GROUP BY is not working as expected #2477 (Closed)

MichaelDrogalis added this to the 5.4 milestone Jun 26, 2019

derekjn mentioned this issue Jul 23, 2019: docs(klip): klip-8: Interactive Queries #3117 (Merged)

apurvam mentioned this issue Aug 23, 2019: docs(klip): klip-8: Syntax for Point-in-Time Queries #3242 (Merged)

MichaelDrogalis closed this as completed Nov 21, 2019

eeepmb mentioned this issue Jan 9, 2020: KTable showing duplicate entries #4251 (Open)

ShahOdin mentioned this issue Feb 27, 2020: (pull)-query a joined table #4493 (Closed)
