Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: DELIMITED should handle delimiters only at end of fields #40960

Merged
merged 1 commit into from
Sep 24, 2019

Conversation

spaskob
Copy link
Contributor

@spaskob spaskob commented Sep 21, 2019

Previously DELIMITED import mode will complain about importing field
"This a nested single quote " inside quoted field"
when importing using option WITH fields_enclosed_by '"'.
Mysql implementation handles this case correctly.
For more info and examples see #40959.

Touches #40374.
Fixes #40959.

Release note (enterprise change): Fixes incorrect behavior for enclosed fields.

Release justification: This is low risk since only business logic of
import has changed. Moreover this functionality has been requested
from a potential client.

@spaskob spaskob requested a review from dt September 21, 2019 16:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@spaskob spaskob force-pushed the improve-delimited-quote-handling branch 3 times, most recently from 85cccf6 to 2940d75 Compare September 23, 2019 00:34
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doc'ing DELIMITED ?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, @rolandcrosby, and @spaskob)


pkg/ccl/importccl/read_import_mysqlout.go, line 67 at r1 (raw file):

}

func (d *mysqloutfileReader) readFile(

nit (and doesn't need to go in this PR): we should rename this struct/file/etc to delimited and expunge "mysqlout" from the codebase.


pkg/ccl/importccl/read_import_mysqlout.go, line 75 at r1 (raw file):

	var row []tree.Datum
	// the current field being read.

this comment could help explain why it is a []string


pkg/ccl/importccl/read_import_mysqlout.go, line 88 at r1 (raw file):

	var readingField bool
	// If we have just encountered a potential encloser symbol.
	// That means if a end of filed or line is next we should honor it.

nit: s/filed/field/


pkg/ccl/importccl/read_import_mysqlout.go, line 95 at r1 (raw file):

	reader := bufio.NewReaderSize(input, 1024*64)
	addField := func() error {
		thefield := strings.Join(field, "")

nit: don't love the name / doesn't really tell me what it and "field" are. suggestion: field -> fieldParts, thefield -> field.


pkg/ccl/importccl/read_import_mysqlout.go, line 238 at r1 (raw file):

				} else {
					// Unexpected since we did not
					// see one at start of field.

any reason not to wrap at the usual 50?


pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):

		}

		field = append(field, string(c))

will an o(n) slice of 1-byte strings get expensive? how hard would it be to append to field[-1]?

@rolandcrosby
Copy link

We are doc'ing DELIMITED now, yes. It solves some customer problems, is not tied to the little-known CSV RFC, and is more general than its old name implied. @spaskob - LGTM once you get through David's suggestions.

@spaskob spaskob force-pushed the improve-delimited-quote-handling branch from 2940d75 to 1b944df Compare September 23, 2019 20:27
Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, and @rolandcrosby)


pkg/ccl/importccl/read_import_mysqlout.go, line 67 at r1 (raw file):

Previously, dt (David Taylor) wrote…

nit (and doesn't need to go in this PR): we should rename this struct/file/etc to delimited and expunge "mysqlout" from the codebase.

good point, should have done it in the original DELIMITED PR. I will do it in a follow-up PR.


pkg/ccl/importccl/read_import_mysqlout.go, line 75 at r1 (raw file):

Previously, dt (David Taylor) wrote…

this comment could help explain why it is a []string

done


pkg/ccl/importccl/read_import_mysqlout.go, line 88 at r1 (raw file):

Previously, dt (David Taylor) wrote…

nit: s/filed/field/

done


pkg/ccl/importccl/read_import_mysqlout.go, line 95 at r1 (raw file):

Previously, dt (David Taylor) wrote…

nit: don't love the name / doesn't really tell me what it and "field" are. suggestion: field -> fieldParts, thefield -> field.

thanks, that was WIP name and forgot to change it


pkg/ccl/importccl/read_import_mysqlout.go, line 238 at r1 (raw file):

Previously, dt (David Taylor) wrote…

any reason not to wrap at the usual 50?

Do you mean the comment? Lmk if this what you expected.


pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):

Previously, dt (David Taylor) wrote…

will an o(n) slice of 1-byte strings get expensive? how hard would it be to append to field[-1]?

In could be in come corner case but usually this is limited by the line length which should something reasonable. It's harder to reason about which is the last symbol in the presence of unicode symbols. Alternatively, we can have field be []rune aka []int32 and convert to string only in the end when we join the parts. Lmk if you like this approach.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, @rolandcrosby, and @spaskob)


pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

In could be in come corner case but usually this is limited by the line length which should something reasonable. It's harder to reason about which is the last symbol in the presence of unicode symbols. Alternatively, we can have field be []rune aka []int32 and convert to string only in the end when we join the parts. Lmk if you like this approach.

if []rune would work just as well, that'd certainly be my preference.

@spaskob spaskob force-pushed the improve-delimited-quote-handling branch from 1b944df to a62c037 Compare September 24, 2019 15:58
Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, and @rolandcrosby)


pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):

Previously, dt (David Taylor) wrote…

if []rune would work just as well, that'd certainly be my preference.

yes that worked really nicely, turns out one can join a slice of runes by calling string(runes)

… fields

Previously DELIMITED import mode will complain about importing field
`"This a nested single quote " inside quoted field"`
when importing using option `WITH fields_enclosed_by '"'`.
MySql implementation handles this case correctly.
For more info and examples see cockroachdb#40959.

Touches cockroachdb#40374.
Fixes cockroachdb#40959.

Release note (enterprise change): Fixes incorrect behavior for enclosed fields.

Release justification: This is low risk since only business logic of
import has changed. Moreover this functionality has been requested
from a potential client.
@spaskob spaskob force-pushed the improve-delimited-quote-handling branch from a62c037 to 9f0f622 Compare September 24, 2019 16:33
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @robert-s-lee and @rolandcrosby)

@spaskob
Copy link
Contributor Author

spaskob commented Sep 24, 2019

bors r+

craig bot pushed a commit that referenced this pull request Sep 24, 2019
40960: importccl: DELIMITED should handle delimiters only at end of fields r=spaskob a=spaskob

Previously DELIMITED import mode will complain about importing field
`"This a nested single quote " inside quoted field"`
when importing using option `WITH fields_enclosed_by '"'`.
Mysql implementation handles this case correctly.
For more info and examples see #40959.

Touches #40374.
Fixes #40959.

Release note (enterprise change): Fixes incorrect behavior for enclosed fields.

Release justification: This is low risk since only business logic of
import has changed. Moreover this functionality has been requested
from a potential client.

Co-authored-by: spaskob <[email protected]>
@craig
Copy link
Contributor

craig bot commented Sep 24, 2019

Build succeeded

@craig craig bot merged commit 9f0f622 into cockroachdb:master Sep 24, 2019
@spaskob spaskob deleted the improve-delimited-quote-handling branch September 27, 2019 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

importccl: DELIMITED should match more closely the functionality of mysql's LOAD INTO
4 participants