Don't check copartitioning for lookup topics #109

Closed
burdiyan opened this issue Mar 15, 2018 · 4 comments

@burdiyan
Contributor

Right now Goka checks that all topics in the graph are copartitioned. I think it should skip that check for lookup topics, since these are fully materialized in every instance of the application.

We have a use case where we want 10 partitions for the input topic, but also want to do lookups from topics that have only 1 partition.
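To make the proposal concrete, here is a minimal sketch of a copartitioning check that ignores lookup topics. It is not Goka's actual implementation: the `topic` type, its `lookup` flag, and `checkCopartitioned` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// topic is a hypothetical description of one topic in the processor graph.
type topic struct {
	name       string
	partitions int
	lookup     bool // true if the topic is only read through lookups
}

// checkCopartitioned returns an error if the non-lookup topics disagree on
// their partition count. Lookup topics are skipped, because every instance
// materializes them completely.
func checkCopartitioned(topics []topic) error {
	want := -1 // partition count of the first non-lookup topic seen
	for _, t := range topics {
		if t.lookup {
			continue
		}
		if want == -1 {
			want = t.partitions
			continue
		}
		if t.partitions != want {
			return fmt.Errorf("topic %q has %d partitions, expected %d",
				t.name, t.partitions, want)
		}
	}
	return nil
}

func main() {
	topics := []topic{
		{name: "input", partitions: 10},
		{name: "group-table", partitions: 10},
		{name: "users-table", partitions: 1, lookup: true},
	}
	if err := checkCopartitioned(topics); err != nil {
		fmt.Println("check failed:", err)
		return
	}
	fmt.Println("ok")
}
```

With the `lookup` flag set, the 1-partition table passes next to the 10-partition input. Without the flag, the same graph would be rejected.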

Do you agree, or am I missing some situation where it would not be possible?

@db7
Collaborator

db7 commented Mar 15, 2018

Well, I don't really agree on this one.

The loop topic should be copartitioned. The rationale behind loopback is to let one key "communicate" with another, not to repartition a topic.

Let me give you the typical use case for loopback: There is an event saying A sent a message to B. We want a processor to keep a counter for how many messages a user sent and received. With that we can calculate ratios between both values and perhaps send alert messages. The group table would have values like this:

type value struct {
   sent int
   received int
}

The processor does the following: it consumes the "messages" topic, which has the sender A as key. Once it receives the message, it increments sent for that user and loops the message back to B (the ID of B is inside the message). Once key B receives the message in the loopback callback, it increments B's received counter.
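The flow described above can be sketched with an in-memory stand-in for the group table and the loopback topic. This is a simulation using only the standard library, not Goka's API: `processor`, `message`, `loopMsg`, `onInput`, `onLoop`, and `drain` are hypothetical names.

```go
package main

import "fmt"

// value mirrors the group table entry from the comment above.
type value struct {
	sent     int
	received int
}

// message is a hypothetical event: A sent a message to B.
type message struct {
	from, to string
}

// loopMsg is a message queued on the simulated loopback topic.
type loopMsg struct {
	key string
	msg message
}

// processor is an in-memory stand-in for a processor with a group table
// and a loopback topic.
type processor struct {
	table map[string]value
	loop  []loopMsg
}

func newProcessor() *processor {
	return &processor{table: make(map[string]value)}
}

// onInput handles an event from the "messages" topic, keyed by sender.
func (p *processor) onInput(key string, m message) {
	v := p.table[key]
	v.sent++
	p.table[key] = v
	// Re-key the message to the recipient through the loopback queue.
	p.loop = append(p.loop, loopMsg{key: m.to, msg: m})
}

// onLoop handles a looped-back message, keyed by recipient.
func (p *processor) onLoop(key string, _ message) {
	v := p.table[key]
	v.received++
	p.table[key] = v
}

// drain delivers every queued loopback message.
func (p *processor) drain() {
	for len(p.loop) > 0 {
		next := p.loop[0]
		p.loop = p.loop[1:]
		p.onLoop(next.key, next.msg)
	}
}

func main() {
	p := newProcessor()
	p.onInput("A", message{from: "A", to: "B"})
	p.onInput("A", message{from: "A", to: "C"})
	p.drain()
	fmt.Printf("A=%+v B=%+v C=%+v\n", p.table["A"], p.table["B"], p.table["C"])
}
```

Each callback only touches the table entry for its own key, which is why the loopback topic has to be copartitioned with the group table.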

We also have scenarios where the message is looped back yet another time. We do some online collaborative filtering, and there the flow is something like this:

  • event arrives on A and A sends its model to B
  • B updates its model with A's model and sends its model to A
  • A updates its model with B's.
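The three steps above can be sketched as a two-hop loopback. The thread does not say what a model is or how it is updated, so the `model` type and the averaging in `merge` below are placeholders, and `filter`, `loopMsg`, `onEvent`, `onLoop`, and `run` are hypothetical names, not Goka's API.

```go
package main

import "fmt"

// model is a placeholder; the real model is not described in the thread.
type model struct {
	weight float64
}

// merge is an assumed update rule (plain averaging), for illustration only.
func merge(own, other model) model {
	return model{weight: (own.weight + other.weight) / 2}
}

// loopMsg carries a sender's model to the receiving key.
type loopMsg struct {
	key    string // receiving key
	sender string
	m      model
	hop    int // 1: A to B, 2: B back to A
}

// filter is an in-memory stand-in for the processor and its loopback topic.
type filter struct {
	models map[string]model
	queue  []loopMsg
}

// onEvent handles an event arriving on key a: a sends its model to b.
func (f *filter) onEvent(a, b string) {
	f.queue = append(f.queue, loopMsg{key: b, sender: a, m: f.models[a], hop: 1})
}

// onLoop updates the receiver's model. On the first hop the receiver also
// sends its updated model back to the original sender.
func (f *filter) onLoop(msg loopMsg) {
	f.models[msg.key] = merge(f.models[msg.key], msg.m)
	if msg.hop == 1 {
		f.queue = append(f.queue, loopMsg{
			key: msg.sender, sender: msg.key, m: f.models[msg.key], hop: 2,
		})
	}
}

// run delivers queued loopback messages until none are left.
func (f *filter) run() {
	for len(f.queue) > 0 {
		msg := f.queue[0]
		f.queue = f.queue[1:]
		f.onLoop(msg)
	}
}

func main() {
	f := &filter{models: map[string]model{
		"A": {weight: 1},
		"B": {weight: 3},
	}}
	f.onEvent("A", "B")
	f.run()
	fmt.Printf("A=%v B=%v\n", f.models["A"].weight, f.models["B"].weight)
}
```

The hop counter is what ends the exchange after the second loopback. Without it, A and B would keep sending models to each other.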

When we need to repartition a topic, we consume from an N-partitioned topic, emit into an M-partitioned topic, and then consume that topic in another processor. Is that OK for your use case?
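As a rough illustration of why emitting into a second topic repartitions the data: the partition of a key is derived from the key and the partition count of the target topic, so the same key can land on a different partition when that count changes. The sketch below assumes an FNV-1a hash modulo the partition count; `partitionFor` and `repartition` are hypothetical helpers, not part of any library.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to a partition, assuming an FNV-1a hash of the
// key modulo the partition count of the target topic.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

// repartition groups keys by the partition they would get in a topic
// with m partitions, as if each key were emitted into that topic.
func repartition(keys []string, m int) map[int][]string {
	out := make(map[int][]string)
	for _, k := range keys {
		p := partitionFor(k, m)
		out[p] = append(out[p], k)
	}
	return out
}

func main() {
	keys := []string{"alice", "bob", "carol", "dave"}
	for _, k := range keys {
		fmt.Printf("%s: N=10 -> %d, M=1 -> %d\n",
			k, partitionFor(k, 10), partitionFor(k, 1))
	}
	fmt.Println(repartition(keys, 3))
}
```

A second processor consuming the M-partitioned topic then sees all messages for a key on one partition of that topic, whatever partition the key had in the source topic.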

@db7 db7 added the question label Mar 15, 2018
@burdiyan
Contributor Author

@db7 I totally get what you mean and fully agree on that! But I was talking about lookup, not loopback :)

@db7
Collaborator

db7 commented Mar 15, 2018

@burdiyan Oh, I see... sorry for that :$

I fully agree with your point too: lookup tables can have any number of partitions. I see that the current processor implementation does not allow that. This is definitely a bug.

@db7
Collaborator

db7 commented Mar 16, 2018

This should be fixed now.

2 participants