fix: replace unprintable and invalid characters in errors #23387

Merged: 3 commits, Jun 1, 2022

davidby-influx
Contributor

Replace unprintable and invalid characters with '?'
in logged errors. Truncate consecutive runs of them to
only 3 repeats of '?'

closes #23386

@davidby-influx davidby-influx self-assigned this Jun 1, 2022
@lesam lesam self-requested a review June 1, 2022 16:52
@davidby-influx davidby-influx marked this pull request as ready for review June 1, 2022 16:55
tsdb/shard.go Outdated
b.Grow(len(s))
for _, r := range strings.ToValidUTF8(s, string(unPrintReplRune)) {
	if !unicode.IsPrint(r) || r == unicode.ReplacementChar {
		b.WriteRune(unPrintReplRune)
Contributor

If you kept the 3-byte counter here instead, you wouldn't compress valid unicode WHAT???? to WHAT???

Contributor Author

I also want to compress runs generated by strings.ToValidUTF8 which in the customer scenario have run lengths in the hundreds.

Contributor Author

@davidby-influx davidby-influx Jun 1, 2022

I also want to elide runs of '?' generated by strings.ToValidUTF8, which in the customer scenario are in the hundreds of characters range (they put thousand-character runs of invalid Unicode characters in their line protocol). The only complete solution for perfect operation I saw was to re-implement strings.ToValidUTF8 with run-length detection, and that seemed overkill.

Contributor Author

Now fixed with unicode.ReplacementChar and a single replacement elision loop.
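
The approach described in this comment can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the merged code: the name `makePrintable` is taken from the test hunk further down this thread, while `replRune` and `maxReplRune` are hypothetical stand-ins for `unPrintReplRune` and `unPrintMaxReplRune`, whose definitions are not shown here. The idea is to mark invalid byte sequences with U+FFFD rather than with '?', so one loop can treat invalid and unprintable runes alike without touching question marks that were already in the input.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

const (
	replRune    = '?' // hypothetical stand-in for unPrintReplRune
	maxReplRune = 3   // hypothetical stand-in for unPrintMaxReplRune
)

// makePrintable is an illustrative sketch, not the implementation merged
// in this pull request.
func makePrintable(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	run := 0 // length of the current run of replaced runes
	// Invalid bytes become U+FFFD first, so a literal '?' in the input is
	// never mistaken for a replacement.
	for _, r := range strings.ToValidUTF8(s, string(unicode.ReplacementChar)) {
		if !unicode.IsPrint(r) || r == unicode.ReplacementChar {
			if run < maxReplRune {
				b.WriteRune(replRune)
			}
			run++
		} else {
			b.WriteRune(r)
			run = 0 // a printable rune ends the run
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", makePrintable("WHAT????"))                   // "WHAT????"
	fmt.Printf("%q\n", makePrintable("bad\x00\x01\x02\x03\x04key")) // "bad???key"
	fmt.Printf("%q\n", makePrintable("bad\xffkey"))                 // "bad?key"
}
```

With this shape, the valid input `WHAT????` raised earlier in the review is left alone, because '?' is printable and is not U+FFFD.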

@davidby-influx davidby-influx requested a review from lesam June 1, 2022 17:11
		if c < unPrintMaxReplRune {
			b.WriteRune(unPrintReplRune)
		}
		c++
	} else {
		b.WriteRune(r)
Contributor

Should c be reset to 0 here, so that each run is limited to 3 replacements rather than 3 in total?

Contributor

Should have a test for that case too I guess.
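
To make the difference concrete, here is a small sketch comparing a counter that is never reset with one that is reset after each printable rune. The function `capRuns` and its `resetPerRun` flag are hypothetical and exist only for this illustration; the limit of 3 follows the PR description.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// capRuns replaces unprintable runes with '?' and writes at most limit of
// them. With resetPerRun false the limit applies to the whole string; with
// resetPerRun true it applies to each consecutive run separately.
func capRuns(s string, limit int, resetPerRun bool) string {
	var b strings.Builder
	c := 0
	for _, r := range s {
		if !unicode.IsPrint(r) {
			if c < limit {
				b.WriteByte('?')
			}
			c++
			continue
		}
		b.WriteRune(r)
		if resetPerRun {
			c = 0
		}
	}
	return b.String()
}

func main() {
	in := "a\x00\x00\x00\x00b\x01\x01c"
	fmt.Printf("%q\n", capRuns(in, 3, false)) // "a???bc"
	fmt.Printf("%q\n", capRuns(in, 3, true))  // "a???b??c"
}
```

Without the reset, the second run disappears entirely once the first run has used up the limit, which is the case the requested test would need to cover.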

Contributor Author

I am clearly not awake today

Contributor

@lesam lesam left a comment

See comment

f := makePrintable(s)
require.True(t, models.ValidKeyToken(f))
c := 0
for _, r := range f {
Contributor

You could just write the expected output for each input, instead of this loop - might be easier to do the new test requested above.

Contributor Author

I wrote a sequence counter, but I can change it if you prefer. Take a look at your convenience, no rush.

@davidby-influx davidby-influx requested a review from lesam June 1, 2022 18:52
Contributor

@lesam lesam left a comment

I prefer explicit tests (like assert.Equal(t, expectedValue, actualValue)) over re-writing part of the test logic in the test. But this works and I don't want to block it a second time.
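
For illustration, the explicit style preferred here might look like the table below, where each input is paired with its exact expected output. This is only a sketch: it uses the standard library alone, so a plain comparison stands in for `assert.Equal`, and the `makePrintable` shown is an assumed stand-in with a hard-coded limit of 3, not the function from `tsdb/shard.go`. The case names and the `failures` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// makePrintable is an assumed stand-in for the function under test.
func makePrintable(s string) string {
	var b strings.Builder
	run := 0
	for _, r := range strings.ToValidUTF8(s, string(unicode.ReplacementChar)) {
		if unicode.IsPrint(r) && r != unicode.ReplacementChar {
			b.WriteRune(r)
			run = 0
			continue
		}
		if run < 3 {
			b.WriteByte('?')
		}
		run++
	}
	return b.String()
}

// cases pairs each input with the exact output expected for it.
var cases = []struct{ name, in, want string }{
	{"printable input is unchanged", "cpu,host=a", "cpu,host=a"},
	{"literal question marks are kept", "WHAT????", "WHAT????"},
	{"single control character", "a\x00b", "a?b"},
	{"single invalid byte", "a\xffb", "a?b"},
	{"long run is capped at 3", "a" + strings.Repeat("\x00", 200) + "b", "a???b"},
	{"each run gets its own cap", "a\x00\x00\x00\x00b\x01\x01c", "a???b??c"},
}

// failures returns one message per case whose output differs from want.
func failures() []string {
	var out []string
	for _, c := range cases {
		if got := makePrintable(c.in); got != c.want {
			out = append(out, fmt.Sprintf("%s: got %q, want %q", c.name, got, c.want))
		}
	}
	return out
}

func main() {
	// Prints nothing when every case matches its expected output.
	for _, f := range failures() {
		fmt.Println(f)
	}
}
```

Adding the per-run reset case is then a one-line change to the table rather than a change to counting logic inside the test.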

@davidby-influx davidby-influx merged commit 0ae0bd6 into master-1.x Jun 1, 2022
@davidby-influx davidby-influx deleted the DSB_bad_bytes branch June 1, 2022 20:45
davidby-influx added a commit that referenced this pull request Jun 1, 2022
(cherry picked from commit 0ae0bd6)
davidby-influx added a commit that referenced this pull request Jun 1, 2022
(cherry picked from commit 0ae0bd6)

closes #23389
davidby-influx added a commit that referenced this pull request Jun 1, 2022
…23394)

(cherry picked from commit 0ae0bd6)

closes #23391
davidby-influx added a commit that referenced this pull request Jun 1, 2022
…23395)

(cherry picked from commit 0ae0bd6)

closes #23389
davidby-influx added a commit that referenced this pull request Jun 8, 2022
(cherry picked from commit 0ae0bd6)

closes #23390
davidby-influx added a commit that referenced this pull request Jun 9, 2022
…23418)

(cherry picked from commit 0ae0bd6)

closes #23390
chengshiwen pushed a commit to chengshiwen/influxdb that referenced this pull request Aug 27, 2024
…#23387)

closes influxdata#23386
2 participants