create CDC event stream CRD #1570

Merged
merged 55 commits into from
Feb 28, 2022

@FxKu FxKu commented Jul 27, 2021

This PR introduces a new operator feature to enable change data capture (CDC) streams into Zalando’s distributed event broker Nakadi. It uses an internal Zalando operator to create Debezium workflows that read changes from Postgres’ logical replication slots and distribute them to Nakadi (or AWS SQS).

From a new section in the manifest that lists streams, the operator will create or update a custom resource that can be picked up by the CDC operator. The Postgres Operator user must specify the database tables for which logical decoding is enabled, as well as the event types that are defined in the sink (Nakadi). The CDC operator is built around the outbox pattern. Its idea is to create a de-normalized table in Postgres that resembles the structure of the event sink and to perform logical decoding only on this “outbox” table. Either the app or the database is then responsible for copying change data from the original tables to the outbox table. The Postgres Operator assumes that the outbox table is named after the specified table, with the event type and an _outbox suffix appended.
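The naming assumption for the outbox table can be illustrated with a small sketch. The helper `outboxTableName` and the underscore separator between table name and event type are hypothetical, chosen only for illustration; the description above states just that the event type and an `_outbox` suffix are appended to the table name.

```go
package main

import "fmt"

// outboxTableName is a hypothetical helper, not taken from the PR.
// It appends the event type and the "_outbox" suffix to the table name.
// The "_" separator between table and event type is an assumption.
func outboxTableName(table, eventType string) string {
	return fmt.Sprintf("%s_%s_outbox", table, eventType)
}

func main() {
	// Table and event type names here are made up for the example.
	fmt.Println(outboxTableName("orders", "order_completed"))
}
```

Under these assumptions, logical decoding would be enabled only for the derived outbox table, not for the original `orders` table.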

ToDo:

  • Fix code generation; at the moment the code has been created manually. Running ./hack/update-codegen.sh only updates the acid API.
  • Do we need to sync DB resources? It might be the job of the CDC operator to report if specified databases and tables do not exist. The operator should not deal with creating tables.
  • Should we patch the Postgres manifest, setting wal_level to logical and creating a logical replication slot under the Patroni section? Or is it up to the user to configure this before enabling streams? Yes and no: the operator changes the Postgres config without the need to prepare the manifest.
  • The operator should create a replication user to be used for all event streams.

@FxKu FxKu added this to the 1.7 milestone Jul 27, 2021
@FxKu FxKu added the zalando label Jul 27, 2021
completed codegen manually
provide update and compare code for fes
add db check for specified databases and tables
add unit test
@FxKu FxKu modified the milestones: 1.7, 1.8 Aug 12, 2021
}

tableNames := make([]string, len(tables))
i := 0
Member

Why is this counter needed?

Member Author

To fill the tableNames slice a bit quicker. I read that append (which we usually use 😃) is slower.
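For comparison, here is a minimal sketch of the two ways to fill the slice. The map value type `string` is a stand-in (the actual type of `tables` is not shown in the excerpt), the function names are hypothetical, and the sort is added only to make the output deterministic. Note that `append` avoids reallocation too when the slice is created with its final capacity, so the difference between the two variants is likely small.

```go
package main

import (
	"fmt"
	"sort"
)

// tableNamesByIndex allocates the slice once at its final length and
// fills it through an explicit counter, as in the reviewed code.
func tableNamesByIndex(tables map[string]string) []string {
	names := make([]string, len(tables))
	i := 0
	for name := range tables {
		names[i] = name
		i++
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	return names
}

// tableNamesByAppend reserves the capacity up front (length 0), so
// append never has to grow the backing array.
func tableNamesByAppend(tables map[string]string) []string {
	names := make([]string, 0, len(tables))
	for name := range tables {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Example data; the table names and values are made up.
	tables := map[string]string{
		"data.orders_outbox": "order-completed",
		"data.links_outbox":  "link-created",
	}
	fmt.Println(tableNamesByIndex(tables))
	fmt.Println(tableNamesByAppend(tables))
}
```

Both functions return the same result; the counter is only needed in the first variant because the slice already has its full length and elements are assigned by position.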

@sdudoladov
Member

👍

@sdudoladov
Member

👍

@FxKu
Member Author

FxKu commented Feb 28, 2022

👍

@FxKu FxKu merged commit d8a159e into master Feb 28, 2022
@achrafsahnoun1

Hey, I want to include the streams section in the Postgres cluster manifest, but I didn't find any example of how to write the tables that we want to include. This is one of my attempts:

    # Enables change data capture streams for defined database tables
    streams:
      - applicationId: test-app
        database: postgres
        tables:
          data.state_pending_outbox:
            eventType: test-app.status-pending
          data.state_approved_outbox:
            eventType: test-app.status-approved
          data.orders_outbox:
            eventType: test-app.order-completed
            idColumn: o_id
            payloadColumn: o_payload
          data.links_outbox:
            eventType: test-app.order-completed
            idColumn: id
            payloadColumn: payload

In this case I want to include the links table, which is in the test database. I checked whether the operator had created a publication for that table, since it should create one automatically, but none was created. Thank you in advance.


6 participants