diff --git a/docs/content/preview/architecture/docdb-replication/change-data-capture.md b/docs/content/preview/architecture/docdb-replication/change-data-capture.md
index bf7d65d29285..b783a90ee1e7 100644
--- a/docs/content/preview/architecture/docdb-replication/change-data-capture.md
+++ b/docs/content/preview/architecture/docdb-replication/change-data-capture.md
@@ -16,16 +16,14 @@ type: docs
 
 ## Architecture
 
-![Stateless CDC Service](/images/architecture/stateless_cdc_service.png)
-
 Every YB-TServer has a `CDC service` that is stateless. The main APIs provided by the CDC service are the following:
 
 - `createCDCSDKStream` API for creating the stream on the database.
 - `getChangesCDCSDK` API that can be used by the client to get the latest set of changes.
 
-## CDC streams
+![Stateless CDC Service](/images/architecture/stateless_cdc_service.png)
 
-Creating a new CDC stream returns a stream UUID. This is facilitated via the [yb-admin](../../../admin/yb-admin/#change-data-capture-cdc-commands) tool.
+## CDC streams
 
 YugabyteDB automatically splits user tables into multiple shards (also called tablets) using either a hash- or range-based strategy. The primary key for each row in the table uniquely identifies the location of the tablet in the row.
 
@@ -39,11 +37,13 @@ The Debezium YugabyteDB connector captures row-level changes in the schemas of a
 
 ![How does CDC work](/images/explore/cdc-overview-work.png)
 
-The connector produces a change event for every row-level insert, update, and delete operation that was captured, and sends change event records for each table in a separate Kafka topic. Client applications read the Kafka topics that correspond to the database tables of interest, and can react to every row-level event they receive from those topics. For each table, the default behavior is that the connector streams all generated events to a separate Kafka topic for that table. Applications and services consume data change event records from that topic.
+The core primitive of CDC is the _stream_. Streams can be enabled and disabled on databases. You can specify which tables to include or exclude. Every change to a watched database table is emitted as a record in a configurable format to a configurable sink. Streams scale to any YugabyteDB cluster independent of its size and are designed to impact production traffic as little as possible.
+
+Creating a new CDC stream returns a stream UUID. This is facilitated via the [yb-admin](../../../admin/yb-admin/#change-data-capture-cdc-commands) tool. A stream ID is created first, per database. You configure the maximum batch size in YugabyteDB, while the polling frequency is configured on the connector side.
 
-The core primitive of CDC is the _stream_. Streams can be enabled and disabled on databases. Every change to a watched database table is emitted as a record in a configurable format to a configurable sink. Streams scale to any YugabyteDB cluster independent of its size and are designed to impact production traffic as little as possible.
+Connector tasks can consume changes from multiple tablets. At-least-once delivery is guaranteed. In turn, connector tasks write to the Kafka cluster, and tasks don't need to match Kafka partitions. Tasks can be independently scaled up or down.
 
-![How does CDC work](/images/explore/cdc-overview-work3.png)
+The connector produces a change event for every row-level insert, update, and delete operation that was captured, and sends change event records for each table in a separate Kafka topic. Client applications read the Kafka topics that correspond to the database tables of interest, and can react to every row-level event they receive from those topics. For each table, the default behavior is that the connector streams all generated events to a separate Kafka topic for that table. Applications and services consume data change event records from that topic.
 All changes for a row (or rows in the same tablet) are received in the order in which they happened. A checkpoint per stream ID and tablet is updated in a state table after a successful write to Kafka brokers.
 
 ## CDC guarantees
diff --git a/docs/static/images/architecture/cdc-logical-replication-architecture.png b/docs/static/images/architecture/cdc-logical-replication-architecture.png
index 558d7967d17c..9f2ce6d53033 100644
Binary files a/docs/static/images/architecture/cdc-logical-replication-architecture.png and b/docs/static/images/architecture/cdc-logical-replication-architecture.png differ
diff --git a/docs/static/images/explore/cdc-overview-work.png b/docs/static/images/explore/cdc-overview-work.png
index 48f0220a44b7..6dffb3e19011 100644
Binary files a/docs/static/images/explore/cdc-overview-work.png and b/docs/static/images/explore/cdc-overview-work.png differ
diff --git a/docs/static/images/explore/cdc-overview-work2.png b/docs/static/images/explore/cdc-overview-work2.png
index e089ca2bdbe1..690050b63aee 100644
Binary files a/docs/static/images/explore/cdc-overview-work2.png and b/docs/static/images/explore/cdc-overview-work2.png differ