[Bug][Improve][LocalFileSink]Fix LocalFile Sink file_format_type. #5118

Merged
merged 3 commits into from
Aug 2, 2023
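
The diff below renames the documented option `file_format` to `file_format_type` across several file sinks. As a rough illustration of what that means for an existing job config, here is a minimal sketch that rewrites the old key in config text. The helper name `rename_file_format_key` is hypothetical and not part of SeaTunnel, and this page does not say whether the old key is still accepted at runtime.

```python
import re

# Matches the old option name only when it is the whole key at the start of a
# line. `file_format_type = ...` is left alone, because `_type` (not `=`)
# follows `file_format` there.
_OLD_KEY = re.compile(r"^([ \t]*)file_format([ \t]*=)", re.MULTILINE)


def rename_file_format_key(config_text: str) -> str:
    """Return config_text with `file_format = ...` renamed to `file_format_type = ...`.

    Hypothetical helper for illustration; indentation and spacing are preserved.
    """
    return _OLD_KEY.sub(r"\g<1>file_format_type\g<2>", config_text)


if __name__ == "__main__":
    old_config = 'LocalFile {\n  path = "/tmp/hive/warehouse/test2"\n  file_format = "orc"\n}\n'
    print(rename_file_format_key(old_config))
```

Running the helper a second time leaves the text unchanged, so it should be safe to apply to configs that already use the new key.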
18 changes: 9 additions & 9 deletions docs/en/connector-v2/sink/FtpFile.md
@@ -40,9 +40,9 @@ By default, we use 2PC commit to ensure `exactly-once`
| custom_filename | boolean | no | false | Whether you need custom the filename |
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| file_format_type | string | no | "csv" | |
+| field_delimiter | string | no | '\001' | Only used when file_format_type is text |
+| row_delimiter | string | no | "\n" | Only used when file_format_type is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
@@ -52,8 +52,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| batch_size | int | no | 1000000 | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
+| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
+| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |

### host [string]

@@ -103,13 +103,13 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
| m | Minute in hour |
| s | Second in minute |

-### file_format [string]
+### file_format_type [string]

We supported as the following file types:

`text` `json` `csv` `orc` `parquet` `excel`

-Please note that, The final file name will end with the file_format's suffix, the suffix of the text file is `txt`.
+Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.

### field_delimiter [string]

@@ -198,7 +198,7 @@ FtpFile {
username = "username"
password = "password"
path = "/data/ftp"
-file_format = "text"
+file_format_type = "text"
field_delimiter = "\t"
row_delimiter = "\n"
sink_columns = ["name","age"]
@@ -216,7 +216,7 @@ FtpFile {
username = "username"
password = "password"
path = "/data/ftp"
-file_format = "text"
+file_format_type = "text"
field_delimiter = "\t"
row_delimiter = "\n"
have_partition = true
12 changes: 6 additions & 6 deletions docs/en/connector-v2/sink/HdfsFile.md
@@ -41,8 +41,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
| file_format_type | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| field_delimiter | string | no | '\001' | Only used when file_format_type is text |
+| row_delimiter | string | no | "\n" | Only used when file_format_type is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
@@ -55,8 +55,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| kerberos_keytab_path | string | no | - | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
+| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
+| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |

### fs.defaultFS [string]

@@ -104,7 +104,7 @@ We supported as the following file types:

`text` `json` `csv` `orc` `parquet` `excel`

-Please note that, The final file name will end with the file_format's suffix, the suffix of the text file is `txt`.
+Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.

### field_delimiter [string]

@@ -198,7 +198,7 @@ For orc file format simple config
HdfsFile {
fs.defaultFS = "hdfs://hadoopcluster"
path = "/tmp/hive/warehouse/test2"
-file_format = "orc"
+file_format_type = "orc"
}

```
24 changes: 12 additions & 12 deletions docs/en/connector-v2/sink/LocalFile.md
@@ -20,7 +20,7 @@ If you use SeaTunnel Engine, It automatically integrated the hadoop jar when you

By default, we use 2PC commit to ensure `exactly-once`

-- [x] file format
+- [x] file format type
- [x] text
- [x] csv
- [x] parquet
@@ -36,9 +36,9 @@ By default, we use 2PC commit to ensure `exactly-once`
| custom_filename | boolean | no | false | Whether you need custom the filename |
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
-| file_format | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| file_format_type | string | no | "csv" | |
+| field_delimiter | string | no | '\001' | Only used when file_format_type is text |
+| row_delimiter | string | no | "\n" | Only used when file_format_type is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
@@ -48,8 +48,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| batch_size | int | no | 1000000 | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
+| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
+| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |

### path [string]

@@ -83,13 +83,13 @@ When the format in the `file_name_expression` parameter is `xxxx-${now}` , `file
| m | Minute in hour |
| s | Second in minute |

-### file_format [string]
+### file_format_type [string]

We supported as the following file types:

`text` `json` `csv` `orc` `parquet` `excel`

-Please note that, The final file name will end with the file_format's suffix, the suffix of the text file is `txt`.
+Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.

### field_delimiter [string]

@@ -174,7 +174,7 @@ For orc file format simple config

LocalFile {
path = "/tmp/hive/warehouse/test2"
-file_format = "orc"
+file_format_type = "orc"
}

```
@@ -185,7 +185,7 @@ For parquet file format with `sink_columns`

LocalFile {
path = "/tmp/hive/warehouse/test2"
-file_format = "parquet"
+file_format_type = "parquet"
sink_columns = ["name","age"]
}

@@ -197,7 +197,7 @@ For text file format with `have_partition` and `custom_filename` and `sink_colum

LocalFile {
path = "/tmp/hive/warehouse/test2"
-file_format = "text"
+file_format_type = "text"
field_delimiter = "\t"
row_delimiter = "\n"
have_partition = true
@@ -224,7 +224,7 @@ LocalFile {
partition_dir_expression="${k0}=${v0}"
is_partition_field_write_in_file=true
file_name_expression="${transactionId}_${now}"
-file_format="excel"
+file_format_type="excel"
filename_time_format="yyyy.MM.dd"
is_enable_transaction=true
}
10 changes: 5 additions & 5 deletions docs/en/connector-v2/sink/OssFile.md
@@ -44,8 +44,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
| file_format_type | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| field_delimiter | string | no | '\001' | Only used when file_format_type is text |
+| row_delimiter | string | no | "\n" | Only used when file_format_type is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
@@ -55,8 +55,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| batch_size | int | no | 1000000 | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
+| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
+| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |

### path [string]

@@ -112,7 +112,7 @@ We supported as the following file types:

`text` `json` `csv` `orc` `parquet` `excel`

-Please note that, The final file name will end with the file_format's suffix, the suffix of the text file is `txt`.
+Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.

### field_delimiter [string]

10 changes: 5 additions & 5 deletions docs/en/connector-v2/sink/OssJindoFile.md
@@ -44,8 +44,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| file_name_expression | string | no | "${transactionId}" | Only used when custom_filename is true |
| filename_time_format | string | no | "yyyy.MM.dd" | Only used when custom_filename is true |
| file_format_type | string | no | "csv" | |
-| field_delimiter | string | no | '\001' | Only used when file_format is text |
-| row_delimiter | string | no | "\n" | Only used when file_format is text |
+| field_delimiter | string | no | '\001' | Only used when file_format_type is text |
+| row_delimiter | string | no | "\n" | Only used when file_format_type is text |
| have_partition | boolean | no | false | Whether you need processing partitions. |
| partition_by | array | no | - | Only used then have_partition is true |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" | Only used then have_partition is true |
@@ -55,8 +55,8 @@ By default, we use 2PC commit to ensure `exactly-once`
| batch_size | int | no | 1000000 | |
| compress_codec | string | no | none | |
| common-options | object | no | - | |
-| max_rows_in_memory | int | no | - | Only used when file_format is excel. |
-| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. |
+| max_rows_in_memory | int | no | - | Only used when file_format_type is excel. |
+| sheet_name | string | no | Sheet${Random number} | Only used when file_format_type is excel. |

### path [string]

@@ -112,7 +112,7 @@ We supported as the following file types:

`text` `json` `csv` `orc` `parquet` `excel`

-Please note that, The final file name will end with the file_format's suffix, the suffix of the text file is `txt`.
+Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.

### field_delimiter [string]

2 changes: 1 addition & 1 deletion docs/en/connector-v2/sink/S3-Redshift.md
@@ -124,7 +124,7 @@ We supported as the following file types:

`text` `csv` `parquet` `orc` `json`

-Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.
+Please note that, The final file name will end with the file_format_type's suffix, the suffix of the text file is `txt`.

### filename_time_format [string]

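
Each of the docs touched above notes that the final file name ends with the suffix of the chosen `file_format_type`, and that `text` maps to `txt`. A minimal sketch of that rule follows. Only the `text` to `txt` mapping and the `csv` default come from the docs above. The other suffixes, the `.` separator, and the names `SUFFIX_BY_FORMAT` and `final_file_name` are assumptions made for illustration, not SeaTunnel's actual implementation.

```python
# Only "text" -> "txt" is stated in the docs; every other entry is an assumption.
SUFFIX_BY_FORMAT = {
    "text": "txt",
    "json": "json",        # assumed
    "csv": "csv",          # assumed
    "orc": "orc",          # assumed
    "parquet": "parquet",  # assumed
    "excel": "xlsx",       # assumed
}


def final_file_name(base_name: str, file_format_type: str = "csv") -> str:
    """Append the suffix for file_format_type to base_name (hypothetical helper).

    The "csv" default mirrors the option table; the "." separator is assumed.
    """
    suffix = SUFFIX_BY_FORMAT[file_format_type.lower()]
    return f"{base_name}.{suffix}"


if __name__ == "__main__":
    print(final_file_name("T_1234", "text"))
```

An unsupported format raises `KeyError` here, which keeps the sketch short; a real validator would likely report the allowed values instead.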