Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: DELIMITED should match more closely the functionality of mysql's LOAD INTO #40959

Closed
spaskob opened this issue Sep 21, 2019 · 1 comment · Fixed by #40960
Closed
Assignees
Labels
A-disaster-recovery A-sql-semantics C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@spaskob
Copy link
Contributor

spaskob commented Sep 21, 2019

Currently DELIMITED mode for importing text files has limited support for handling unmatched quotes in enclosed fields. Mysql supports a broad range of use cases and since DELIMITED is meant as our analog of Mysql's LOAD INTO command we should get them to par. Also this is currently causing problems with on-boarding enterprise clients. See #40374 which deals with mishandling of a few rows in large csv-like files.

This what one can achieve with MySql today when importing file quotes.csv with separator |:

"Iam"|quoted field.
I"a"m|non-quoted 2 quotes in middle.
"I"a"m"|quoted 2 quotes in middle.
I"am|non-quoted quote in middle.
"I"am"|quoted quote in middle.

Our current implementation crashed on this input, while in MySql this is treated quite sanely:

mysql> LOAD DATA INFILE 'quotes.csv'  into table baz FIELDS TERMINATED BY '|' ENCLOSED BY '"';
Query OK, 5 rows affected (0.03 sec)
Records: 5  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from baz;
+-------+--------------------------------+
| s1    | s2                             |
+-------+--------------------------------+
| Iam   | quoted field.                  |
| I"a"m | non-quoted 2 quotes in middle. |
| I"a"m | quoted 2 quotes in middle.     |
| I"am  | non-quoted quote in middle.    |
| I"am  | quoted quote in middle.        |
+-------+--------------------------------+
5 rows in set (0.01 sec)

Describe the solution you'd like
Add support for the above use case in Cockroah DELIMITED import mode.

Additional context
See #40374

@spaskob spaskob added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-semantics A-disaster-recovery labels Sep 21, 2019
@spaskob spaskob self-assigned this Sep 21, 2019
@spaskob
Copy link
Contributor Author

spaskob commented Sep 21, 2019

@robert-s-lee @rolandcrosby
Our DELIMITED mode is lacking compared to MySql and this new feature should virtually eliminate csv import problems.

spaskob pushed a commit to spaskob/cockroach that referenced this issue Sep 23, 2019
… fields

Previously DELIMITED import mode will complain about importing field
`"This a nested single quote " inside quoted field"`
when importing using option `WITH fields_enclosed_by '"'`.
MySql implementation handles this case correctly.
For more info and examples see cockroachdb#40959.

Touches cockroachdb#40374.
Fixes cockroachdb#40959.

Release note (enterprise change): Fixes incorrect behavior for enclosed fields.

Release justification: This is low risk since only business logic of
import has changed. Moreover this functionality has been requested
from a potential client.
craig bot pushed a commit that referenced this issue Sep 24, 2019
40960: importccl: DELIMITED should handle delimiters only at end of fields r=spaskob a=spaskob

Previously DELIMITED import mode will complain about importing field
`"This a nested single quote " inside quoted field"`
when importing using option `WITH fields_enclosed_by '"'`.
Mysql implementation handles this case correctly.
For more info and examples see #40959.

Touches #40374.
Fixes #40959.

Release note (enterprise change): Fixes incorrect behavior for enclosed fields.

Release justification: This is low risk since only business logic of
import has changed. Moreover this functionality has been requested
from a potential client.

Co-authored-by: spaskob <[email protected]>
@craig craig bot closed this as completed in 9f0f622 Sep 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery A-sql-semantics C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant