-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: replace unprintable and invalid characters in errors #23387
Conversation
Replace unprintable and invalid characters with '?' in logged errors. Truncate consecutive runs of them to only 3 repeats of '?' closes #23386
tsdb/shard.go
Outdated
b.Grow(len(s)) | ||
for _, r := range strings.ToValidUTF8(s, string(unPrintReplRune)) { | ||
if !unicode.IsPrint(r) || r == unicode.ReplacementChar { | ||
b.WriteRune(unPrintReplRune) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you kept the 3-byte counter here instead, you wouldn't compress valid unicode WHAT????
to WHAT???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also want to compress runs generated by strings.ToValidUTF8
which in the customer scenario have run lengths in the hundreds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also want to elide runs of '?' generated by strings.ToValidUTF8
, which in the customer scenario are in the hundreds of characters range (they put thousand-character runs of invalid Unicode characters in their line protocol). The only complete solution for perfect operation I saw was to re-implement strings.ToValidUTF8
with run-length detection, and that seemed overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now fixed with unicode.ReplacementChar
and a single replacement elision loop.
if c < unPrintMaxReplRune { | ||
b.WriteRune(unPrintReplRune) | ||
} | ||
c++ | ||
} else { | ||
b.WriteRune(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c=0
to reset here, so each group is limited to 3 instead of 3 total?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have a test for that case too I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am clearly not awake today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment
f := makePrintable(s) | ||
require.True(t, models.ValidKeyToken(f)) | ||
c := 0 | ||
for _, r := range f { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just write the expected output for each input, instead of this loop - might be easier to do the new test requested above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a sequence counter, but I can change it if you prefer. Take a look at your convenience, no rush.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer explicit tests (like assert.Equal(t, expectedValue, actualValue)
) over re-writing part of the test logic in the test. But this works and I don't want to block it a second time.
…#23387) Replace unprintable and invalid characters with '?' in logged errors. Truncate consecutive runs of them to only 3 repeats of '?' closes influxdata#23386
Replace unprintable and invalid characters with '?'
in logged errors. Truncate consecutive runs of them to
only 3 repeats of '?'
closes #23386