-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importccl: DELIMITED should handle delimiters only at end of fields #40960
importccl: DELIMITED should handle delimiters only at end of fields #40960
Conversation
85cccf6
to
2940d75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we doc'ing DELIMITED
?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, @rolandcrosby, and @spaskob)
pkg/ccl/importccl/read_import_mysqlout.go, line 67 at r1 (raw file):
} func (d *mysqloutfileReader) readFile(
nit (and doesn't need to go in this PR): we should rename this struct/file/etc to delimited and expunge "mysqlout" from the codebase.
pkg/ccl/importccl/read_import_mysqlout.go, line 75 at r1 (raw file):
var row []tree.Datum // the current field being read.
this comment could help explain why it is a []string
pkg/ccl/importccl/read_import_mysqlout.go, line 88 at r1 (raw file):
var readingField bool // If we have just encountered a potential encloser symbol. // That means if a end of filed or line is next we should honor it.
nit: s/filed/field/
pkg/ccl/importccl/read_import_mysqlout.go, line 95 at r1 (raw file):
reader := bufio.NewReaderSize(input, 1024*64) addField := func() error { thefield := strings.Join(field, "")
nit: don't love the name / doesn't really tell me what it and "field" are. suggestion: field -> fieldParts, thefield -> field.
pkg/ccl/importccl/read_import_mysqlout.go, line 238 at r1 (raw file):
} else { // Unexpected since we did not // see one at start of field.
any reason not to wrap at the usual 50?
pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):
} field = append(field, string(c))
will an o(n) slice of 1-byte strings get expensive? how hard would it be to append to field[-1]?
We are doc'ing DELIMITED now, yes. It solves some customer problems, is not tied to the little-known CSV RFC, and is more general than its old name implied. @spaskob - LGTM once you get through David's suggestions. |
2940d75
to
1b944df
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, and @rolandcrosby)
pkg/ccl/importccl/read_import_mysqlout.go, line 67 at r1 (raw file):
Previously, dt (David Taylor) wrote…
nit (and doesn't need to go in this PR): we should rename this struct/file/etc to delimited and expunge "mysqlout" from the codebase.
good point, should have done it in the original DELIMITED PR. I will do it in a follow-up PR.
pkg/ccl/importccl/read_import_mysqlout.go, line 75 at r1 (raw file):
Previously, dt (David Taylor) wrote…
this comment could help explain why it is a []string
done
pkg/ccl/importccl/read_import_mysqlout.go, line 88 at r1 (raw file):
Previously, dt (David Taylor) wrote…
nit: s/filed/field/
done
pkg/ccl/importccl/read_import_mysqlout.go, line 95 at r1 (raw file):
Previously, dt (David Taylor) wrote…
nit: don't love the name / doesn't really tell me what it and "field" are. suggestion: field -> fieldParts, thefield -> field.
thanks, that was WIP name and forgot to change it
pkg/ccl/importccl/read_import_mysqlout.go, line 238 at r1 (raw file):
Previously, dt (David Taylor) wrote…
any reason not to wrap at the usual 50?
Do you mean the comment? Lmk if this what you expected.
pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):
Previously, dt (David Taylor) wrote…
will an o(n) slice of 1-byte strings get expensive? how hard would it be to append to field[-1]?
In could be in come corner case but usually this is limited by the line length which should something reasonable. It's harder to reason about which is the last symbol in the presence of unicode symbols. Alternatively, we can have field be []rune aka []int32 and convert to string only in the end when we join the parts. Lmk if you like this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, @rolandcrosby, and @spaskob)
pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
In could be in come corner case but usually this is limited by the line length which should something reasonable. It's harder to reason about which is the last symbol in the presence of unicode symbols. Alternatively, we can have field be []rune aka []int32 and convert to string only in the end when we join the parts. Lmk if you like this approach.
if []rune
would work just as well, that'd certainly be my preference.
1b944df
to
a62c037
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @robert-s-lee, and @rolandcrosby)
pkg/ccl/importccl/read_import_mysqlout.go, line 272 at r1 (raw file):
Previously, dt (David Taylor) wrote…
if
[]rune
would work just as well, that'd certainly be my preference.
yes that worked really nicely, turns out one can join a slice of runes by calling string(runes)
… fields Previously DELIMITED import mode will complain about importing field `"This a nested single quote " inside quoted field"` when importing using option `WITH fields_enclosed_by '"'`. MySql implementation handles this case correctly. For more info and examples see cockroachdb#40959. Touches cockroachdb#40374. Fixes cockroachdb#40959. Release note (enterprise change): Fixes incorrect behavior for enclosed fields. Release justification: This is low risk since only business logic of import has changed. Moreover this functionality has been requested from a potential client.
a62c037
to
9f0f622
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @robert-s-lee and @rolandcrosby)
bors r+ |
40960: importccl: DELIMITED should handle delimiters only at end of fields r=spaskob a=spaskob Previously DELIMITED import mode will complain about importing field `"This a nested single quote " inside quoted field"` when importing using option `WITH fields_enclosed_by '"'`. Mysql implementation handles this case correctly. For more info and examples see #40959. Touches #40374. Fixes #40959. Release note (enterprise change): Fixes incorrect behavior for enclosed fields. Release justification: This is low risk since only business logic of import has changed. Moreover this functionality has been requested from a potential client. Co-authored-by: spaskob <[email protected]>
Build succeeded |
Previously DELIMITED import mode will complain about importing field
"This a nested single quote " inside quoted field"
when importing using option
WITH fields_enclosed_by '"'
.Mysql implementation handles this case correctly.
For more info and examples see #40959.
Touches #40374.
Fixes #40959.
Release note (enterprise change): Fixes incorrect behavior for enclosed fields.
Release justification: This is low risk since only business logic of
import has changed. Moreover this functionality has been requested
from a potential client.