Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design persistent log to support input connector fault tolerance #13

Open
blp opened this issue Apr 26, 2024 · 0 comments
Open

Design persistent log to support input connector fault tolerance #13

blp opened this issue Apr 26, 2024 · 0 comments

Comments

@blp
Copy link
Member

blp commented Apr 26, 2024

For a fault tolerant (FT) Feldera pipeline, the division of input into steps must be durable (see https://github.com/feldera/dist-design?tab=readme-ov-file#inputoutput-synchronization). To achieve that, we require that each FT input connector have its own way for it to record the division into steps persistently. We only have a single FT input connector now, for Kafka.

Problems:

  • This design, where we write to one Kafka topic whenever we read data, seems to cut against the grain for Kafka (at least with Redpanda, which is what we're using for testing) in that, while it seems like we're using it according to the documentation, we get lots of mysterious failures in CI (see Kafka test failures for "Partition queues should be split off for all subscribed partitions" feldera#1331, Kafka test failures setting up the initial connection feldera#1469, Kafka test failure for proptest_kafka_input feldera#1470). The failures are hard to reproduce on demand and the logging that we've added so far doesn't produce much extra insight.
  • Speculatively, we think that it will be difficult for some users to be able to add extra Kafka topics for recording input division. There can be a separate team that admins the Kafka broker, which might need to approve new topics.
  • In the presence of multiple FT pipelines that use the same Kafka topics for input, we will need some way to invent unique names for recording the divisions. (Probably a minor problem on its own but it would exacerbate the problem if it's difficult to add Kafka topics at all.)
  • This design suits the Kafka FT input connector, since Kafka provides write access, but it doesn't extend to read-only input connectors like the URL connector. For FT support, those connectors would need to add some log on the side.

The proposal, then, is to add a new database or log to record the division of input. The goal of this issue is for the design of such a log. See https://github.com/feldera/dist-design?tab=readme-ov-file#inputoutput-synchronization for further background and contraints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant