[Improve][Connector-V2][Clickhouse] Reconstruct the clickhouse connector document
chenzy15 committed Jul 14, 2023
1 parent 309b58d commit eb1413a
Showing 2 changed files with 81 additions and 119 deletions.
110 changes: 34 additions & 76 deletions docs/en/connector-v2/sink/Clickhouse.md
@@ -2,91 +2,49 @@

> Clickhouse sink connector
## Description
## Support Those Engines

Used to write data to Clickhouse.
> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>
## Key features
## Key Features

- [ ] [exactly-once](../../concept/connector-v2-features.md)

The Clickhouse sink plug-in can achieve exactly-once semantics through idempotent writes, which requires a table engine that supports deduplication, such as AggregatingMergeTree.

- [x] [cdc](../../concept/connector-v2-features.md)
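
Why idempotent writes plus a deduplicating engine give exactly-once results can be shown with a small model. The Python sketch below is a simplified illustration, not SeaTunnel or ClickHouse code. It assumes each row is identified by a primary key and models the engine with ReplacingMergeTree-style behavior (one row per key, last write wins); AggregatingMergeTree aggregates rather than replaces, so this is only an approximation. The helper `merge_dedup` and the row shape are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (primary key, value).
Row = Tuple[int, str]


def merge_dedup(parts: Iterable[List[Row]]) -> Dict[int, str]:
    """Collapse inserted parts into one row per key, last write wins.

    This approximates the state a deduplicating engine converges to after
    background merges; it is not how ClickHouse is implemented.
    """
    state: Dict[int, str] = {}
    for part in parts:
        for key, value in part:
            state[key] = value
    return state


batch = [(1, "a"), (2, "b")]
# The batch is written, the job fails before the checkpoint completes,
# and the same batch is written again on recovery.
replayed = merge_dedup([batch, batch])
once = merge_dedup([batch])
print(replayed == once)  # True
```

With a plain MergeTree table, the replay would instead leave four physical rows rather than two.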

## Options

| name | type | required | default value |
|---------------------------------------|---------|----------|---------------|
| host | string | yes | - |
| database | string | yes | - |
| table | string | yes | - |
| username | string | yes | - |
| password | string | yes | - |
| clickhouse.config | map | no | |
| bulk_size                             | int     | no       | 20000         |
| split_mode                            | boolean | no       | false         |
| sharding_key | string | no | - |
| primary_key | string | no | - |
| support_upsert | boolean | no | false |
| allow_experimental_lightweight_delete | boolean | no | false |
| common-options | | no | - |

### host [string]

`ClickHouse` cluster address in `host:port` format. Multiple hosts can be specified as a comma-separated list, such as `"host1:8123,host2:8123"`.
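
To illustrate the `host` format, here is a minimal Python sketch that splits the comma-separated value into `(host, port)` pairs. The function name `parse_hosts` is hypothetical and not part of SeaTunnel, and the sketch assumes every entry carries an explicit port.

```python
from typing import List, Tuple


def parse_hosts(hosts: str) -> List[Tuple[str, int]]:
    """Split a value like "host1:8123,host2:8123" into (host, port) pairs."""
    pairs: List[Tuple[str, int]] = []
    for entry in hosts.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


print(parse_hosts("host1:8123,host2:8123"))  # [('host1', 8123), ('host2', 8123)]
```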

### database [string]

The `ClickHouse` database

### table [string]

The table name

### username [string]

The `ClickHouse` username

### password [string]

`ClickHouse` user password
> The Clickhouse sink plug-in can achieve exactly-once semantics through idempotent writes, which requires a table engine that supports deduplication, such as AggregatingMergeTree.
### clickhouse.config [map]

In addition to the mandatory parameters above, users can pass any of the optional [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) supported by `clickhouse-jdbc`.

### bulk_size [int]

The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch. The default is `20000`. If checkpoints are enabled, buffered rows are also written whenever a checkpoint is triggered.
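
The interaction between `bulk_size` and checkpoints can be pictured as a buffer that flushes either when it is full or when a checkpoint is triggered. The Python class below is a hypothetical model of that behavior, not the connector's implementation; `BulkBuffer`, `write`, and `on_checkpoint` are illustrative names.

```python
from typing import Any, Callable, List


class BulkBuffer:
    """Hypothetical model of batched sink writes controlled by bulk_size."""

    def __init__(self, write: Callable[[List[Any]], None], bulk_size: int = 20000) -> None:
        self.write = write
        self.bulk_size = bulk_size
        self.rows: List[Any] = []

    def add(self, row: Any) -> None:
        self.rows.append(row)
        # A full batch is written as soon as bulk_size rows are buffered.
        if len(self.rows) >= self.bulk_size:
            self.flush()

    def on_checkpoint(self) -> None:
        # A checkpoint forces out whatever is buffered, even a partial batch.
        self.flush()

    def flush(self) -> None:
        if self.rows:
            self.write(self.rows)
            self.rows = []  # start a new list so the written batch is not mutated


batches: List[List[Any]] = []
buf = BulkBuffer(batches.append, bulk_size=3)
for i in range(7):
    buf.add(i)
buf.on_checkpoint()
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```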

### split_mode [boolean]

This mode only supports ClickHouse tables whose engine is `Distributed`, and the `internal_replication` option must be `true`. SeaTunnel splits the distributed table's data itself and writes directly to each shard, taking the shard weights defined in ClickHouse into account.

### sharding_key [string]

When `split_mode` is enabled, SeaTunnel has to decide which shard each row is sent to. By default the shard is selected at random, but the `sharding_key` parameter can specify the field used by the sharding algorithm instead. This option only takes effect when `split_mode` is `true`.
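
One way to picture weighted shard selection is the rule ClickHouse documents for the `Distributed` engine: take the sharding value modulo the total shard weight and send the row to the shard whose weight interval contains the remainder. The Python sketch below applies that rule, assuming CRC32 as the hash for `sharding_key` values and a weighted random choice when no key is set. SeaTunnel's actual hash function and random strategy may differ, and `ShardRouter` is a hypothetical name.

```python
import bisect
import random
import zlib
from typing import List, Optional


class ShardRouter:
    """Hypothetical weighted shard selector for split_mode."""

    def __init__(self, weights: List[int], seed: Optional[int] = None) -> None:
        if not weights or sum(weights) <= 0:
            raise ValueError("need at least one shard with positive weight")
        # Cumulative upper bounds: shard i owns remainders from the previous
        # bound (0 for the first shard) up to, but excluding, bounds[i].
        self.bounds: List[int] = []
        total = 0
        for weight in weights:
            total += weight
            self.bounds.append(total)
        self.total = total
        self.rng = random.Random(seed)

    def shard_for(self, key: Optional[str]) -> int:
        if key is None:
            # No sharding_key: weighted random choice (assumed strategy).
            remainder = self.rng.randrange(self.total)
        else:
            # sharding_key set: deterministic hash, so equal keys hit the same shard.
            remainder = zlib.crc32(key.encode("utf-8")) % self.total
        return bisect.bisect_right(self.bounds, remainder)


router = ShardRouter([1, 2])  # shard 0 has weight 1, shard 1 has weight 2
print(router.shard_for("user-42") == router.shard_for("user-42"))  # True
```

Hashing the key rather than choosing randomly means rows with the same `sharding_key` value always land on the same shard.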

### primary_key [string]

The primary key column of the ClickHouse table. INSERT, UPDATE, and DELETE statements are executed against the table based on this key.

### support_upsert [boolean]

Whether to upsert rows by querying the primary key.
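
The query-then-write flow behind `primary_key` and `support_upsert` can be sketched with an in-memory table. This is a hypothetical Python model (`UpsertingTable` is an illustrative name) that only records which kind of statement would be issued; it does not reproduce the SQL SeaTunnel sends.

```python
from typing import Any, Dict, List


class UpsertingTable:
    """In-memory stand-in for a ClickHouse table keyed by primary_key."""

    def __init__(self, primary_key: str) -> None:
        self.primary_key = primary_key
        self.rows: Dict[Any, Dict[str, Any]] = {}
        self.log: List[str] = []

    def upsert(self, row: Dict[str, Any]) -> None:
        key = row[self.primary_key]
        # Step 1: query by primary key to see whether the row already exists.
        exists = key in self.rows
        # Step 2: record an UPDATE for existing keys, otherwise an INSERT.
        self.log.append("UPDATE" if exists else "INSERT")
        self.rows[key] = dict(row)


t = UpsertingTable("id")
t.upsert({"id": 1, "name": "a"})
t.upsert({"id": 1, "name": "b"})
t.upsert({"id": 2, "name": "c"})
print(t.log)  # ['INSERT', 'UPDATE', 'INSERT']
```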

### allow_experimental_lightweight_delete [boolean]

Whether to allow experimental lightweight deletes on tables with a `*MergeTree` engine.
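
ClickHouse offers two ways to delete rows from a `*MergeTree` table: the lightweight `DELETE FROM ... WHERE ...` statement and the older `ALTER TABLE ... DELETE WHERE ...` mutation. The Python sketch below shows how a sink could choose between them for a primary-key delete. That the connector switches statements this way is an assumption, and the function `delete_sql` and the `?` placeholder style are illustrative only.

```python
def delete_sql(table: str, primary_key: str, lightweight: bool) -> str:
    """Build a parameterized delete for a single primary-key value.

    Identifiers are interpolated unquoted for brevity; real code should
    quote or validate them.
    """
    condition = f"{primary_key} = ?"
    if lightweight:
        # Lightweight delete: rows are marked as deleted and hidden from queries.
        # On ClickHouse versions where it is experimental, the setting
        # allow_experimental_lightweight_delete must be enabled.
        return f"DELETE FROM {table} WHERE {condition}"
    # Mutation: affected data parts are rewritten in the background.
    return f"ALTER TABLE {table} DELETE WHERE {condition}"


print(delete_sql("test_table", "id", lightweight=True))
# DELETE FROM test_table WHERE id = ?
print(delete_sql("test_table", "id", lightweight=False))
# ALTER TABLE test_table DELETE WHERE id = ?
```
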
## Description

### common options
Used to write data to Clickhouse.

Sink plugin common parameters. Please refer to [Sink Common Options](common-options.md) for details.
## Supported DataSource Info

In order to use the Clickhouse connector, the following dependencies are required.
They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|------------|--------------------|------------------------------------------------------------------------------------------------------------------|
| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) |

## Sink Options

| Name | Type | Required | Default | Description |
|---------------------------------------|---------|----------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| host                                  | String  | Yes      | -       | `ClickHouse` cluster address in `host:port` format. Multiple hosts can be specified as a comma-separated list, such as `"host1:8123,host2:8123"`. |
| database                              | String  | Yes      | -       | The `ClickHouse` database. |
| table                                 | String  | Yes      | -       | The table name. |
| username                              | String  | Yes      | -       | The `ClickHouse` username. |
| password                              | String  | Yes      | -       | The `ClickHouse` user password. |
| clickhouse.config                     | Map     | No       |         | In addition to the mandatory parameters above, users can pass any of the optional [parameters](https://github.com/ClickHouse/clickhouse-jdbc/tree/master/clickhouse-client#configuration) supported by `clickhouse-jdbc`. |
| bulk_size                             | Int     | No       | 20000   | The number of rows written through [Clickhouse-jdbc](https://github.com/ClickHouse/clickhouse-jdbc) in each batch. The default is `20000`. |
| split_mode                            | Boolean | No       | false   | This mode only supports ClickHouse tables whose engine is `Distributed`, and the `internal_replication` option must be `true`. SeaTunnel splits the distributed table's data itself and writes directly to each shard, taking the shard weights defined in ClickHouse into account. |
| sharding_key                          | String  | No       | -       | When `split_mode` is enabled, the target shard for each row is selected at random by default; `sharding_key` specifies the field used by the sharding algorithm instead. This option only takes effect when `split_mode` is `true`. |
| primary_key                           | String  | No       | -       | The primary key column of the ClickHouse table. INSERT, UPDATE, and DELETE statements are executed against the table based on this key. |
| support_upsert                        | Boolean | No       | false   | Whether to upsert rows by querying the primary key. |
| allow_experimental_lightweight_delete | Boolean | No       | false   | Whether to allow experimental lightweight deletes on tables with a `*MergeTree` engine. |
| common-options                        |         | No       | -       | Sink plugin common parameters. Please refer to [Sink Common Options](common-options.md) for details. |

## Examples

90 changes: 47 additions & 43 deletions docs/en/connector-v2/source/Clickhouse.md
@@ -2,61 +2,66 @@

> Clickhouse source connector
## Description
## Support Those Engines

Used to read data from Clickhouse.
> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>
## Key features
## Key Features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [column projection](../../concept/connector-v2-features.md)

Supports query SQL, which can be used to achieve column projection.

- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|------------------|--------|----------|------------------------|
| host | string | yes | - |
| database | string | yes | - |
| sql | string | yes | - |
| username | string | yes | - |
| password | string | yes | - |
| server_time_zone | string | no | ZoneId.systemDefault() |
| common-options | | no | - |

### host [string]

`ClickHouse` cluster address in `host:port` format. Multiple hosts can be specified as a comma-separated list, such as `"host1:8123,host2:8123"`.

### database [string]

The `ClickHouse` database

### sql [string]
> Supports query SQL, which can be used to achieve column projection.
The SQL query used to read data from the ClickHouse server.

### username [string]

The `ClickHouse` username

### password [string]

`ClickHouse` user password

### server_time_zone [string]

The session time zone of the database server. If not set, `ZoneId.systemDefault()` is used to determine the server time zone.
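
The fallback rule for `server_time_zone` can be expressed as a small Python sketch: use the configured zone if present, otherwise the local zone of the machine. Python has no exact equivalent of Java's `ZoneId.systemDefault()`, so the sketch uses the current local UTC offset, which does not follow daylight-saving changes. The helper names are hypothetical, and treating zone-less `DateTime` values as session-local is an assumption about how the setting is applied.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo  # Python 3.9+


def session_zone(server_time_zone: Optional[str]) -> tzinfo:
    """Return the configured zone, or the machine's local zone if unset."""
    if server_time_zone:
        return ZoneInfo(server_time_zone)
    # Fixed offset for the current moment; unlike ZoneId.systemDefault(),
    # it does not track daylight-saving transitions.
    return datetime.now().astimezone().tzinfo


def attach_zone(naive: datetime, zone: tzinfo) -> datetime:
    """Interpret a zone-less DateTime value read from the server in `zone`."""
    return naive.replace(tzinfo=zone)


value = attach_zone(datetime(2023, 7, 14, 12, 0), timezone.utc)
print(value.isoformat())  # 2023-07-14T12:00:00+00:00
```

With `server_time_zone = "UTC"`, `session_zone` would return `ZoneInfo("UTC")`, which requires time zone data to be available on the machine.
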
## Description

### common options
Used to read data from Clickhouse.

Source plugin common parameters. Please refer to [Source Common Options](common-options.md) for details.
## Supported DataSource Info

In order to use the Clickhouse connector, the following dependencies are required.
They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|------------|--------------------|------------------------------------------------------------------------------------------------------------------|
| Clickhouse | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2/connector-clickhouse) |

## Data Type Mapping

| Clickhouse Data type | SeaTunnel Data type |
|--------------------------------------------------------|---------------------|
| String / IP / UUID / Enum                              | STRING              |
| UInt8 | BOOLEAN |
| FixedString | BINARY |
| Int32 / UInt16 / Interval | INTEGER |
| Int8 | TINYINT |
| Int64 | BIGINT |
| Int16 / UInt8 | SMALLINT |
| Float64 | DOUBLE |
| Decimal / Int128 / Int256 / UInt64 / UInt128 / UInt256 | DECIMAL |
| Float32 | FLOAT |
| Date                                                   | DATE                |
| Timestamp                                              | TIMESTAMP           |
| DateTime                                               | TIME                |
| Array | ARRAY |

## Source Options

| Name | Type | Required | Default | Description |
|------------------|--------|----------|------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| host             | String | Yes      | -                      | `ClickHouse` cluster address in `host:port` format. Multiple hosts can be specified as a comma-separated list, such as `"host1:8123,host2:8123"`. |
| database         | String | Yes      | -                      | The `ClickHouse` database. |
| sql              | String | Yes      | -                      | The SQL query used to read data from the ClickHouse server. |
| username         | String | Yes      | -                      | The `ClickHouse` username. |
| password         | String | Yes      | -                      | The `ClickHouse` user password. |
| server_time_zone | String | No       | ZoneId.systemDefault() | The session time zone of the database server. If not set, `ZoneId.systemDefault()` is used to determine the server time zone. |
| common-options   |        | No       | -                      | Source plugin common parameters. Please refer to [Source Common Options](common-options.md) for details. |

## Examples

@@ -72,7 +77,6 @@ source {
server_time_zone = "UTC"
result_table_name = "test"
}
}
```
