-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support handle invalid logs #19
Support handle invalid logs #19
Conversation
ba6e2e8
to
dab3e40
Compare
f82c3b0
to
e1084ad
Compare
@breeswish @crazycs520 @lonng PTAL, thanks! |
search_log.go
Outdated
@@ -136,20 +175,22 @@ func readLine(reader *bufio.Reader) (string, error) { | |||
return string(line), nil | |||
} | |||
|
|||
func readLastLine(file *os.File) string { | |||
// Read a line from the end of a file. | |||
// TODO: read byte by byte is low efficiency, we can improve it, for example, read 1024 bytes one time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refine by this way? (not sure whether it has some pitfalls.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can add a todo here.
search_log.go
Outdated
stat, _ := file.Stat() | ||
filesize := stat.Size() | ||
for { | ||
cursor -= 1 | ||
cursor-- | ||
file.Seek(cursor, io.SeekEnd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the log has appended some new logs after the file is opened when calling file.seek(0, io.SeekEnd)
, does it work same as file.seek(filesize, io.SeekStart)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, I think we should add some tests for these scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will do some tests to check it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @lonng , add a test method TestReadAndAppendLogFile
to test this case, it seems we will read the unexcepted result from the end when new contents is appending to the file. But we can try to resolve it in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will continue to fix it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it by file.Seek(cursor, io.SeekStart)
.
Maybe we can set a threshold for this behaviour, i.e. maximum look ahead xxx bytes, in case of the whole big file is invalid logs and causing performance degrade
Maybe we can keep the same log time, log level as the previous one |
There is a
ok, and updated it. |
search_log.go
Outdated
stat, _ := file.Stat() | ||
filesize := stat.Size() | ||
fmt.Println("current filesize:", filesize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove it later
search_log_test.go
Outdated
fmt.Println("final content:\n", string(content)) | ||
} | ||
|
||
// result: (may be different every time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove later
search_log_test.go
Outdated
|
||
stat, _ := file.Stat() | ||
filesize := stat.Size() | ||
fmt.Println("initial filesize: ", filesize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this?
search_log_test.go
Outdated
line := fmt.Sprintf("[2020/07/07 06:%d:17.011 -04:00] [INFO] appended logs\n", i) | ||
_, err := file_append.WriteString(line) | ||
c.Assert(err, IsNil, Commentf("write %s failed", line)) | ||
fmt.Print("write line: ", line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto?
@@ -515,3 +534,102 @@ func (s *searchLogSuite) TestParseLogItem(c *C) { | |||
c.Assert(item.Message, Equals, cas.message) | |||
} | |||
} | |||
|
|||
func (s *searchLogSuite) TestReadAndAppendLogFile(c *C) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test output is not expected. I think we need to re-implement the readLineReverse
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I decided to continue to re-implement it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
- read from file io.SeekEnd will read incorrect content when other threads append logs
|
||
cursor-- | ||
file.Seek(cursor, io.SeekStart) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace file.Seek(cursor, io.SeekEnd)
by file.Seek(cursor, io.SeekStart)
, the former way may read incorrect content from the end of a file when some others append new logs to this file.
- read 512 bytes once instead of just 1 byte
Addressed all comments, polished by the following work:
@breeswish @lonng @crazycs520 PTAL, thanks! |
if cursor == -filesize { // stop if we are at the begining | ||
break | ||
|
||
var size int64 = 512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read 512 bytes once instead of 1 byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change very much, would you like to write a benchmark for this optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison:
Old way:
searchLogSuite.BenchmarkReadLastLine 10000 124423 ns/op
searchLogSuite.BenchmarkReadLastLine 10000 126135 ns/op
New way:
searchLogSuite.BenchmarkReadLastLines 1000000 1892 ns/op
searchLogSuite.BenchmarkReadLastLines 1000000 1920 ns/op
Conclusion:
the new implementation is 65x (126135/1920) as fast as the old way. (the result should depend on the length of the last line. In this case, the last line is 76 bytes long. If we double the last line length, the old way cost time will be doubled too, see details in PR #20 )
search_log.go
Outdated
break | ||
} | ||
endCursor -= int64(len(lines)) | ||
linesSlice := strings.FieldsFunc(lines, func(c rune) bool { return c == '\n' || c == '\r' }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there will be problems when line ending is \r\n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I thought that, let me try to polish it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polish it by strings.Split(strings.ReplaceAll(finalStr, "\r\n", "\n"), "\n")
and put it inside the readLastLines
method.
cursor -= size | ||
|
||
file.Seek(cursor, io.SeekStart) | ||
chars := make([]byte, size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move this line out of the loop and reuse the buffer to improve performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! update it soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried but found it doesn't work well, including keep size value not change. Because file.Read(chars)
reads len(chars)
bytes content, so we need to make sure the chars is allocated by the size we need to read, instead of fixed 512 bytes.
lines = append(chars, lines...) | ||
|
||
// find first '\n' or '\r' | ||
for i := 0; i < len(chars); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to check \r
and \n
in the scanned bytes from this round, instead of previous rounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works as you said, len(chars)
is the length of content just read in this round.
Very cool! |
search_log.go
Outdated
filesize := stat.Size() | ||
// Read lines from the end of a file | ||
// endCursor initial value should be the filesize | ||
func readLastLines(file *os.File, endCursor int64) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readLastLines
look like maybe returns multiple lines, but the return parameter is string
, should be []string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to get read bytes by len(string)
in its caller method, return []string
will miss the length info, else I need to add an extra readSize
return value. Which way is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to return lines []string, size int
, it's more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
Co-authored-by: Wenxuan <[email protected]>
This reverts commit 650740f.
508ad40
to
56008cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest LGTM. @crazycs520 PTAL
Co-authored-by: Wenxuan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
close #17 , close #16
What did: