Support non-key joins. #4424
Comments
@big-andy-coates, @vpapavas Is support for non-key table joins coming to ksqlDB in the 0.8.0 release? We have a project going into production in the second quarter of this year, and it's heavily dependent on joins between tables that don't always share the same primary key. Without this feature, GlobalKTable and custom Kafka Streams code may be our only option (which we were trying to avoid by using the ksql platform).
Hi @entechlog, unfortunately foreign key joins are not coming in 0.8.0. How about creating additional tables that have as their primary key the join key you are interested in? Basically, simulate the behavior of secondary indexes by hand-creating extra tables that have the schema you need for each join you are interested in.
@vpapavas, here I need to join the orders table with the user table every time an order comes in, and the two have different PKs. I can't make USER_ID the PK of the orders table, because it would then retain only one order record per user. Unless the PKs are the same, ksqlDB does not let me join the tables. TBL_ORDERS TBL_USER
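The point about losing orders can be illustrated with a minimal sketch, assuming a table behaves like a map with one row per primary key. This is plain Python, not ksqlDB, and the order data and field names are hypothetical.

```python
# Hypothetical order events: (order_id, user_id, item). Illustrative data only.
order_events = [
    ("o1", "u1", "book"),
    ("o2", "u1", "pen"),
    ("o3", "u2", "lamp"),
]

# Model a table as a dict: one row per primary key, later events overwrite earlier ones.
orders_by_order_id = {
    oid: {"user_id": uid, "item": item} for oid, uid, item in order_events
}
orders_by_user_id = {
    uid: {"order_id": oid, "item": item} for oid, uid, item in order_events
}

print(len(orders_by_order_id))              # 3: every order survives
print(len(orders_by_user_id))               # 2: one row per user
print(orders_by_user_id["u1"]["order_id"])  # o2: order o1 was overwritten
```

Keyed by order ID, all three orders are retained; keyed by user ID, the second order for `u1` replaces the first.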
@entechlog you have to convert your orders table to a stream partitioned by user ID; then you can perform a stream-table join on user ID, repartition back to order ID, and convert back to a table.
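The workaround above can be sketched in plain Python as a model of the data flow. This is not ksqlDB syntax; `enrich_orders`, the field names, and the sample data are assumptions made for illustration.

```python
def enrich_orders(order_events, users):
    """Model: re-key orders by user ID, join against the users table,
    then re-key by order ID and materialize the result as a table."""
    enriched = {}  # resulting table, keyed by order_id
    for order_id, user_id, item in order_events:
        # Steps 1 and 2: treat the order as a stream event keyed by user_id
        # and look up the matching row in the users table.
        user = users.get(user_id)
        if user is None:
            continue  # inner join: drop orders with no matching user
        # Step 3: re-key by order_id and upsert into the result table.
        enriched[order_id] = {
            "user_id": user_id,
            "item": item,
            "user_name": user["name"],
        }
    return enriched


# Hypothetical sample data.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}
order_events = [("o1", "u1", "book"), ("o2", "u1", "pen"), ("o3", "u3", "lamp")]

result = enrich_orders(order_events, users)
print(sorted(result))             # ['o1', 'o2']
print(result["o2"]["user_name"])  # Ann
```

One caveat of this approach: in a stream-table join only the stream side triggers output, so a later change to a user row would not update orders that were already joined.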
@PeterLindner, thanks for the input. We are trying several things and will keep you posted on what we end up doing. The example was just to share the idea; we are actually dealing with much more complex cases, with multiple self-joins and self-joins based on results from aggregating all historical data.
@big-andy-coates I was actually just about to email you about this, as I see you've done a lot of work on the joins to date. I contributed the PR (with some help from many fine people) for KIP-213 mentioned above, and I've been looking at how to get this into KSQL. I've been looking under the hood of KSQL and I admit that I don't have a firm handle on it yet, and was wondering where this sort of work sits in terms of priority for the Confluent folks working on this product. I'm interested in helping if I can, but I admit that I know next to nothing about KSQL's engine. Thanks Andy!
Bumping the question @big-andy-coates
@PeterLindner collect_set has a limit of just 1000 elements in the array. Did you implement non-key joins with any other workaround?
@bhamur unfortunately not; for my use case I knew that I'd have at most 3 records in the set.
@entechlog did you find a solution for the foreign-key join? I have the same problem and was wondering if you had found a solution yet. :)
Hi @bellemare, sorry for the delay in answering your question! To be honest, I'm not 100% sure where this comes in terms of priority. I know it's not part of the next quarter's roadmap. Beyond that, I'm not sure. @MichaelDrogalis or @derekjn may be able to comment more. It would be great to have you contribute, if you feel able! The first step would be to write up a design proposal: https://github.com/confluentinc/ksql/blob/master/design-proposals/README.md. Andy
A KIP/patch for this would be 😍
Unfortunately, I don't have the time to devote to this in the next 3 months due to personal obligations. We also aren't using KSQL where I work at the moment, so I can't reasonably get cycles there to address it.
Understandable. We appreciate your contribution to Streams, and we will leverage your work by exposing it in ksqlDB within the foreseeable future; there is just no firm date as yet. Watch this space!
I'll keep my eyes and ears open! That being said, if something changes, this would probably be one of the first things I would work on :)
Are there any updates on this feature?
Hey @rtrive, nothing yet. But patches are welcome from anyone who wants to work on this. ❤️
@yuranos We're working on this right now. :)
Howdy! Is there a place where we could track code progress for this issue?
There is no PR yet... Stay tuned. Happy to link PRs to this ticket when available.
Glad the one-to-many join is getting worked on. This specific issue is keeping us from treating ksql as anything other than a toy to play around with. Hoping this gets over the finish line soon... it could offset a whole lot of consumer code we are currently having to develop.
First PR: #7452 |
Second PR: #7491 |
Adding more tests: #7526 |
Required refactoring: #7499 |
@mjsax I am very pleased to see your work on this! |
Logical to Physical plan translation (and trying to get the tests green): #7543 |
More tests: #7537 |
Some more bug fixes: #7576 |
Add missing test: #7588 |
Another bug fix: #7592 |
Enable FK-joins: #7591 |
Enable more tests: #7593 |
Updating docs: #7628 |
Hey @mjsax, thanks for posting these updates! Do I read it right that it's actually an N:1 join, so a row from the left table is joined with just one row from the right, but one and the same row from the right table can be joined to multiple different rows from the left? For example with the tables users and orders
That is correct. If you think in DW terms, the left table would be your "fact table" and you can do FK-lookups into the right "dimension table". It would be possible, of course, to enhance ksqlDB to also allow the second query (and I am sure we will do this at some point). It's similar to the missing "RIGHT JOIN" support -- we don't support
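The N:1 semantics discussed in the two comments above can be modeled with a small sketch. It assumes plain Python dicts standing in for tables; `fk_join`, the field names, and the data are hypothetical and do not reflect ksqlDB's implementation.

```python
def fk_join(left, right, fk_of):
    """Inner foreign-key join: each left row looks up at most one right row.
    The result keeps the left table's primary key."""
    result = {}
    for left_key, left_row in left.items():
        right_row = right.get(fk_of(left_row))
        if right_row is not None:
            result[left_key] = {**left_row, **right_row}
    return result


# Left "fact table" keyed by order ID; right "dimension table" keyed by user ID.
orders = {
    "o1": {"user_id": "u1", "item": "book"},
    "o2": {"user_id": "u1", "item": "pen"},
    "o3": {"user_id": "u2", "item": "lamp"},
}
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}}

joined = fk_join(orders, users, fk_of=lambda order: order["user_id"])
print(len(joined))           # 3: one result row per order
print(joined["o1"]["name"])  # Ann
print(joined["o2"]["name"])  # Ann: the same user row serves two orders
```

Each order matches exactly one user, while a single user row can appear in many result rows, which is the N:1 shape described above.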
With https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable complete, we should be able to enhance ksqlDB to support non-key joins!