-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support remote log level #4413
Support remote log level #4413
Conversation
the build is gonna break of course because the dependencies are missing in pillar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch in general looks good to me, I have some comments
- in collect-info.sh for 'newlog' directory, we should skip the 'devUpload' directory, since those not yet uploaded files will be the duplicates for the keep directory
- similar in the edgeview in walkLogDirs() function, we need to skip the 'devUpload' directory to avoid duplicates
- in loguploader.go, there is 'failSendDir' and logic to move the failed to send to controller log files and keep them there. With this change, i don't think it is needed anymore, since everything will be in the keep directory. this can simplify some logic
- when doing kibana search, i do search for 'Alpine' string for locate when the device has rebooted, now we may not have this. maybe to always log at least the first 5 device log entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the description in https://github.com/lf-edge/eve/blob/master/docs/LOGGING.md with the intended new state of things?
580384e
to
c894bec
Compare
@eriknordmark I made the changes to the documentation, please have a look. I'm not sure if it's an overkill with the |
c894bec
to
5b2cfcf
Compare
5b2cfcf
to
8e0fafa
Compare
@naiming-zededa
but I would like to address the following in the next PR to not bloat the scope of this one.
|
@eriknordmark I think the architecture should be right from our discussions. Please give this PR another look now that I adapted the implementation accordingly. |
Btw before merging this I'll need to change the EVE API and pillar first, so that I can update the dependencies here. Please let me know if you are satisfied with the changes, then I'll prepare the respective PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks fine to me so please proceeed generating the other PRs/commits.
@naiming-zededa can you take a look at the newlogd code changes?
pkg/edgeview/src/edgeview_test.go
Outdated
"testing" | ||
"time" | ||
|
||
"github.com/stretchr/testify/assert" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we want testify? Can't we use gomega for tests like we do in other golang tests in EVE repository?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3435 (comment) ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's because I know how to use testify
and I don't know how to use gomega
😄
but if you want I'll learn how to use gomega
and switch to it. is it better than testify
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brining go-chi was useful, because they have route parameter parser built-in, so we didn't have to build our own. The only reason why we left parts of Gorilla then was because we used ws communication (happy to restart the conversation of refactoring it). Rest was completely replaced by go-chi. Basically I'm asking the same question here: what's the benefit of testify over gomega and why do we want to have two testing frameworks in the repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/gorilla/mux/blob/main/README.md?plain=1#L303 - isn't that the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/gorilla/mux/blob/main/README.md?plain=1#L303 - isn't that the same?
We didn't go for gorilla-mux, because at the time it didn't have maintainers, now they do have them. We could have just used standard http mux from golang, but then we would have to write our own route parameter parser
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uncleDecart rewrote the tests with gomega
- it's not much of a difference, so it makes sense to me to stick with one testing framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw
grep -r "github.com/stretchr/testify/assert" pkg/ | grep "_test.go" | wc -l
26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for change! Missed the fact that you bring testify to edge_view, not pillar, didn't know we were using testify, I think it's a good raise on LF-Edge call
8e0fafa
to
846b26b
Compare
d622fc0
to
aca592a
Compare
@eriknordmark could you re-run Eden tests? I setup the workflow to use the newest Eden release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-run eden tests
Eden tests won’t be executed as the changes are in WF. Irrespective of that, GHA needs the workflow/changes to be already part of the repo. Perhaps, it would be better to first point the eden to use the latest version and then rebase this branch to test the new WF. @europaul If it helps, I can create a new PR to update the eden test. |
Thank you @yash-zededa! I forgot about that. I put this commit in #4464 now. |
aca592a
to
48dcf52
Compare
@yash-zededa could you please also kick-off Eden tests again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run eden tests.
Once the PR build is completed, I will trigger the eden tests |
It would be very nice to backport this to 13.4-stable so that it will be in that LTS release, since it gives us a much better tool to limit both network and controller load due to lots of log noise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit to resolve conflicts, but that might need to be squashed away before merging.
4fe28ee
to
32b615f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kick off eden tests
As opposed to the default log level, which was setting which logs are produced by EVE's microservices, the remote log level will set which device logs are uploaded to the cloud. So it's assumed that the remote log level is always higher than the default log level. There are no changes in how newlogd collects the logs, only in how it handles the log files. For the logs that are uploaded: - we create a separate file with prefix dev.log.upload in collect - that file is gzipped when it reaches a certain size or by timer - the same as before - once gzipped the file is moved to devUpload - the same as before - once the file is uploaded successfully, it's deleted instead of being moved to keepSentQueue For the logs that stay on the device: - we create a separate file with prefix dev.log.keep in collect - that file is gzipped when it reaches a certain size - no timer - once gzipped the file is moved to keepSentQueue - the name of the folder is preserved The commit also adds log levels "all" and "none" as well as remote log levels for kernel and syslog logs: as the names suggest, "all" will log everything and "none" will suppress all logs. The levels can be set in the global configuration for - debug.default.loglevel - debug.default.remote.loglevel - agent.*agentname*.debug.loglevel - agent.*agentname*.debug.remote.loglevel - debug.syslog.loglevel - debug.syslog.remote.loglevel - debug.kernel.loglevel - debug.kernel.remote.loglevel Signed-off-by: Paul Gaiduk <[email protected]>
Add new metrics for newlogd: - OldestSavedLog - the timestamp of the oldest log available on the device - TotalSizeLogs - the total size of all logs on the device Signed-off-by: Paul Gaiduk <[email protected]>
In case the total size of gzipped logs exceeds the limit they should be removed in the following order by directory: - `keepSentQueue` - `failedUpload` - `devUpload` - `appUpload` Signed-off-by: Paul Gaiduk <[email protected]>
Since keepSentQueue contains the same or more logs than devUpload, it's enough to only use those logs when searching with edge-view. Signed-off-by: Paul Gaiduk <[email protected]>
Add a chapter to LOGGING.md explaining how different log levels are handled in the system. Updated CONFIG-PROPERTIES.md to clarify remote log level settings and provide full list of log level parameters Signed-off-by: Paul Gaiduk <[email protected]>
Currently memlogd has set a max-line-len of 8192. If the log line is longer than this, it will be truncated. Since we use structured logs for EVE microservices, truncating logs will result in malformed JSON, which is not parseable by newlogd. This commit removes logs that have a potential to be longer than that limit, because they log whole configs that tend to get very long. In future developers should be aware of this limit and avoid logging big chunks of data. Signed-off-by: Paul Gaiduk <[email protected]>
32b615f
to
9ed9171
Compare
Rebased to include Eden 1.0.1 with fixed tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run again
This PR is a work in progress, but I appreciate if somebody could take a look at the code and the general approach and give me feedback.
Please see the commit messages (especially the second commit) for the overview of the changes.
Before merging this I need to update the pillar first to include the dependency in newlogd. I also need to update eve-api to include the new metrics.
I'm also planning to add more metrics as well as proper tests for newlogd to at least ensure that there is no regression.