Support using KSQL as library to write streaming applications (aka KSQL embedded mode) #734
Comments
We currently have a streaming ETL system implemented using the Kafka Streams API in microservice Spring Boot applications. There is a lot of general replication of SQL-like logic in this Java code which seems like it could potentially be replaced by some straightforward KSQL statements. The general use cases are reading from a raw stream of activity events, de-duplicating them, joining them with other KTable data from lookup tables, and then writing them to processed topics for re-use by other modules or for eventual loading into the data warehouse. |
We are building a set of microservices using the Streams-API. If KSQL could be used as a library (like the Streams-API) we would be able to simplify our code. If KSQL will offer point in time queries (see #530) it would be good to have that available as an API / library, too. |
FYI: You can do that already today by reading the (KSQL) table's backing Kafka topic into a KTable in KStreams. |
It'd be really great if it were possible to access the state stores backing KSQL queries and do interactive queries on them, the way we already can for kstreams apps. AFAIK that's currently not possible ... or actually only with an additional kstreams app. |
We are building a log monitoring system with Kafka, and want to use KSQL to filter the log data continuously. However, we doubt the reliability of a monitoring system that filters data through the REST API. If we could use KSQL as a library rather than through HTTP requests, it would be more reliable. |
We want to use Kafka in a multi-tenant way with many customers having their own login credentials with ACLs defined for a limited amount of topics on the Kafka cluster. We'd like to only allow customers to access Kafka with their own credentials, but instantiating and running whole KSQL-server instances for every customer and also limiting access to those instances would get very cumbersome. Also, queries are made dynamically and starting a headless server with a predefined query file would lead to a lot of overhead. So this embedded mode sounds like a great feature for us. We'd be using it from Java (not Scala) for now, and for our use case there would not be a great need to mix the Streams API, as another app could just read the resulting topic for us. It would be ideal if we can read SELECT/non-persistent queries directly from the API though. |
We want to make kafka the source of truth for all our data. And since we are generating a great deal of data, we want to query it in real time. We can use kafka streams, but ksql has semantics which abstracts the underlying streams API and has a programming model which could make it easier to adopt. I vote for KSQL as a Java Library. |
We have SPaaS (Stream-Processing as a Service) built using the Kafka Streams API, and running/supporting a separate KSQL server just to run KSQL queries seems like extra overhead. We would love to use KSQL in our already existing streaming apps though. In addition, if running KSQL in Java streaming apps is possible, it'll give us the possibility to enhance our SPaaS platform so that engineers, data scientists, and analysts don't have to write Java code at all when they require new aggregations and/or pipelines. They can submit a pull request with KSQL statements and the rest of the magic can be done inside the platform 🚀 🚀 🚀 Looking forward to this!!! |
We're not big fans of having platforms to run things like KSQL and Kafka Connect - we'd rather embed them in a jar which can be run through our normal development cycle and pipelines. Having this stuff on a 'platform' requires someone to manage that platform, which means yet another deployment burden and the risk of impacting unrelated jobs. We'd prefer embedded, even if it's just a wrapper jar. |
We have a lot of SQL developers who are uncomfortable with working on KStream and KTable directly. Allowing them to use KSQL in their spring boot micro services would really increase the adoption rate. |
I am developing a near-real-time architecture with Kafka Streams, KSQL, and the schema registry. Our API reads near real time off of Kafka topics using Spring Boot Flux and a reactive Kafka consumer. It would be nice if I could convert that to KSQL. I know I can post to the KSQL interface, which I am doing in some cases. A client lib would greatly simplify things overall. |
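For readers unfamiliar with the "post to the KSQL interface" workaround mentioned above, here is a minimal sketch of what it can look like using only the Python standard library. It assumes a KSQL server reachable at a URL such as `http://localhost:8088` and the server's `/ksql` statement endpoint with the `application/vnd.ksql.v1+json` content type; the helper names `build_ksql_request` and `run_statement` are made up for illustration and are not part of any KSQL client.

```python
import json
import urllib.request

# Content type used by the KSQL server's REST API (assumed here; check your server version).
KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json; charset=utf-8"


def build_ksql_request(server_url, statement, streams_properties=None):
    """Build (but do not send) a POST request for the server's /ksql endpoint.

    Hypothetical helper: server_url is something like "http://localhost:8088".
    """
    statement = statement.rstrip()
    if not statement.endswith(";"):
        statement += ";"  # KSQL statements are terminated by a semicolon
    payload = {
        "ksql": statement,
        "streamsProperties": streams_properties or {},
    }
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/ksql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": KSQL_CONTENT_TYPE},
        method="POST",
    )


def run_statement(server_url, statement):
    """Send the statement to a running KSQL server and return the decoded JSON reply."""
    request = build_ksql_request(server_url, statement)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `run_statement("http://localhost:8088", "LIST STREAMS")` would submit the statement over HTTP. Every statement takes this round trip through a separately deployed server, which is the overhead several commenters here would like an embedded library to remove.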
We were struggling with the adoption of application logic written in plain streams, so we went with refactoring in KSQL. However, due to the complexity of the business logic and the lack of some features in KSQL atm, some of the parts are still in streams, in a separate module, which creates a bit of a discrepancy in the sense that an otherwise singular flow is split between two modules. Having an embedded KSQL server could've helped solve this problem and enabled us to build one Java app. Another thing is that we'd like to run it in integration tests, similar to |
Is Confluent building this request? Has anyone tried this: https://github.com/mmolimar/ksql-jdbc-driver |
I would like to use KSQL directly (particularly for its time alignment capabilities when crossing streams, e.g. with sliding window or session window) as a step in the middle of my KStreams processing. I'd like to add my own custom logic before and after the KStreams query. |
Spark Streaming provides a very easy-to-use interface, SparkSQL, for Java, Python, and Scala, so why isn't Kafka coming up with such libraries so that we can also leverage this feature? |
I think this is a great example of why this should be made available as a jar. With ksql+kstreams you can avoid complexity until you need it, and build only the complex parts in kstreams/code. As it is today, we need to deal with a large, complex kstreams API, rather than being able to put the simple cases in SQL and the complex ones in code. Offering this as a library is also a large advantage, as you are probably already monitoring your services closely. The cost here really isn't "you can just start this service"; you also need your company's monitoring frameworks to be integrated, healthchecks working, metrics working, alerts working. These are already things you've integrated with the kafka standard + your service... |
I'd love to see a fs2-based client for Scala code. |
KSQL uses Kafka Streams and the major advantage of the latter is that it is not a cluster but a library. This is a selling point for Confluent. It is therefore strange that KSQL moves away from that. |
Wasn't it planned to offer KSQL as an embedded library as well? Both the KSQL paper and some slides from Confluent specifically highlight the different deployment modes of KSQL, including 'embedded'. Confluent advertises the benefits of Kafka Streams as an embedded stream processing solution; wouldn't this also apply to KSQL for the same reasons? KSQL's DSL has advantages over the streams Java DSL, like ease of use. |
For those of you wanting to use ksqlDB as a library: while this is not the same functionality as in this feature request, ksqlDB 0.10+ now ships with a native Java client:
This might cover some of the needs of the people in this thread. Feedback is of course welcome! |
Disclaimer, I work for EsperTech. |
This issue will never be solved, since an embedded library for the truly open source kafka-streams contradicts Confluent's earnings model. |
Closing this issue. If there are future requests similar to this, please reopen or create a new issue. |
Some users have expressed interest in leveraging KSQL as a Java library to write stream processing applications (on the JVM, i.e. primarily Java/Scala), similar to how Kafka's Streams API is used for developing such applications. Sometimes this has been called "KSQL embedded mode".
If you are interested in this functionality, please: