Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support remote log level #4413

Merged
merged 6 commits into from
Dec 9, 2024

Conversation

europaul
Copy link
Contributor

This PR is a work in progress, but I appreciate if somebody could take a look at the code and the general approach and give me feedback.

Please see the commit messages (especially the second commit) for the overview of the changes.

Before merging this I need to update the pillar first to include the dependency in newlogd. I also need to update eve-api to include the new metrics.

I'm also planning to add more metrics as well as proper tests for newlogd to at least ensure that there is no regression.

@europaul
Copy link
Contributor Author

the build is gonna break of course because the dependencies are missing in pillar

Copy link
Contributor

@naiming-zededa naiming-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch in general looks good to me, I have some comments

  • in collect-info.sh for 'newlog' directory, we should skip the 'devUpload' directory, since those not yet uploaded files will be the duplicates for the keep directory
  • similar in the edgeview in walkLogDirs() function, we need to skip the 'devUpload' directory to avoid duplicates
  • in loguploader.go, there is 'failSendDir' and logic to move the failed to send to controller log files and keep them there. With this change, i don't think it is needed anymore, since everything will be in the keep directory. this can simplify some logic
  • when doing kibana search, i do search for 'Alpine' string for locate when the device has rebooted, now we may not have this. maybe to always log at least the first 5 device log entries?

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the description in https://github.com/lf-edge/eve/blob/master/docs/LOGGING.md with the intended new state of things?

@europaul
Copy link
Contributor Author

europaul commented Nov 6, 2024

Can you update the description in https://github.com/lf-edge/eve/blob/master/docs/LOGGING.md with the intended new state of things?

@eriknordmark I made the changes to the documentation, please have a look. I'm not sure if it's an overkill with the debug.local.loglevel and is a feature that nobody wants that only makes the system harder to understand?

@europaul europaul force-pushed the support-remote-log-level branch from c894bec to 5b2cfcf Compare November 18, 2024 10:31
@europaul europaul force-pushed the support-remote-log-level branch from 5b2cfcf to 8e0fafa Compare November 25, 2024 21:36
@europaul europaul marked this pull request as ready for review November 25, 2024 21:37
@europaul
Copy link
Contributor Author

@naiming-zededa
Thank you very much for the detailed comments ❤️
I addressed this comment

* similar in the edgeview in walkLogDirs() function, we need to skip the 'devUpload' directory to avoid duplicates

but I would like to address the following in the next PR to not bloat the scope of this one.

* in collect-info.sh for 'newlog' directory, we should skip the 'devUpload' directory, since those not yet uploaded files will be the duplicates for the keep directory

* in loguploader.go, there is 'failSendDir' and logic to move the failed to send to controller log files and keep them there. With this change, i don't think it is needed anymore, since everything will be in the keep directory. this can simplify some logic

* when doing kibana search, i do search for 'Alpine' string for locate when the device has rebooted, now we may not have this. maybe to always log at least the first 5 device log entries?

@europaul
Copy link
Contributor Author

@eriknordmark I think the architecture should be right from our discussions. Please give this PR another look now that I adapted the implementation accordingly.
I would like to include proper testing in another PR, as this will require big structural changes to newlog, which would make this PR harder to follow.

@europaul
Copy link
Contributor Author

Btw before merging this I'll need to change the EVE API and pillar first, so that I can update the dependencies here. Please let me know if you are satisfied with the changes, then I'll prepare the respective PRs.

docs/LOGGING.md Outdated Show resolved Hide resolved
docs/LOGGING.md Outdated Show resolved Hide resolved
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach looks fine to me so please proceeed generating the other PRs/commits.

@naiming-zededa can you take a look at the newlogd code changes?

"testing"
"time"

"github.com/stretchr/testify/assert"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we want testify? Can't we use gomega for tests like we do in other golang tests in EVE repository?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because I know how to use testify and I don't know how to use gomega 😄
but if you want I'll learn how to use gomega and switch to it. is it better than testify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brining go-chi was useful, because they have route parameter parser built-in, so we didn't have to build our own. The only reason why we left parts of Gorilla then was because we used ws communication (happy to restart the conversation of refactoring it). Rest was completely replaced by go-chi. Basically I'm asking the same question here: what's the benefit of testify over gomega and why do we want to have two testing frameworks in the repo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/gorilla/mux/blob/main/README.md?plain=1#L303 - isn't that the same?

We didn't go for gorilla-mux, because at the time it didn't have maintainers, now they do have them. We could have just used standard http mux from golang, but then we would have to write our own route parameter parser

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@europaul europaul Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uncleDecart rewrote the tests with gomega - it's not much of a difference, so it makes sense to me to stick with one testing framework

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw

grep -r "github.com/stretchr/testify/assert" pkg/ | grep "_test.go" | wc -l
26

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for change! Missed the fact that you bring testify to edge_view, not pillar, didn't know we were using testify, I think it's a good raise on LF-Edge call

@europaul europaul force-pushed the support-remote-log-level branch from 8e0fafa to 846b26b Compare November 26, 2024 15:06
@europaul
Copy link
Contributor Author

europaul commented Dec 6, 2024

@eriknordmark could you re-run Eden tests? I setup the workflow to use the newest Eden release

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-run eden tests

@yash-zededa
Copy link
Collaborator

yash-zededa commented Dec 6, 2024

Re-run eden tests

Eden tests won’t be executed as the changes are in WF. Irrespective of that, GHA needs the workflow/changes to be already part of the repo.

Perhaps, it would be better to first point the eden to use the latest version and then rebase this branch to test the new WF.

@europaul If it helps, I can create a new PR to update the eden test.

@europaul
Copy link
Contributor Author

europaul commented Dec 6, 2024

Thank you @yash-zededa! I forgot about that. I put this commit in #4464 now.

@europaul europaul force-pushed the support-remote-log-level branch from aca592a to 48dcf52 Compare December 6, 2024 13:24
@europaul
Copy link
Contributor Author

europaul commented Dec 6, 2024

@yash-zededa could you please also kick-off Eden tests again?

Copy link
Collaborator

@yash-zededa yash-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run eden tests.

@github-actions github-actions bot requested a review from eriknordmark December 6, 2024 13:44
@yash-zededa
Copy link
Collaborator

@yash-zededa could you please also kick-off Eden tests again?

Once the PR build is completed, I will trigger the eden tests

@eriknordmark eriknordmark added the stable Should be backported to stable release(s) label Dec 6, 2024
@eriknordmark
Copy link
Contributor

It would be very nice to backport this to 13.4-stable so that it will be in that LTS release, since it gives us a much better tool to limit both network and controller load due to lots of log noise.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a commit to resolve conflicts, but that might need to be squashed away before merging.

@github-actions github-actions bot requested a review from eriknordmark December 7, 2024 20:53
@europaul europaul force-pushed the support-remote-log-level branch from 4fe28ee to 32b615f Compare December 9, 2024 10:19
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kick off eden tests

As opposed to the default log level, which was setting which logs are
produced by EVE's microservices, the remote log level will set which
device logs are uploaded to the cloud. So it's assumed that the remote
log level is always higher than the default log level.

There are no changes in how newlogd collects the logs, only in how it
handles the log files.
For the logs that are uploaded:
- we create a separate file with prefix dev.log.upload in collect
- that file is gzipped when it reaches a certain size or by timer - the
  same as before
- once gzipped the file is moved to devUpload - the same as before
- once the file is uploaded successfully, it's deleted instead of being
  moved to keepSentQueue

For the logs that stay on the device:
- we create a separate file with prefix dev.log.keep in collect
- that file is gzipped when it reaches a certain size - no timer
- once gzipped the file is moved to keepSentQueue - the name of the
  folder is preserved

The commit also adds log levels "all" and "none" as well as remote log
levels for kernel and syslog logs:
as the names suggest, "all" will log everything and "none" will suppress
all logs. The levels can be set in the global configuration for
- debug.default.loglevel
- debug.default.remote.loglevel
- agent.*agentname*.debug.loglevel
- agent.*agentname*.debug.remote.loglevel
- debug.syslog.loglevel
- debug.syslog.remote.loglevel
- debug.kernel.loglevel
- debug.kernel.remote.loglevel

Signed-off-by: Paul Gaiduk <[email protected]>
Add new metrics for newlogd:
- OldestSavedLog - the timestamp of the oldest log available on the
    device
- TotalSizeLogs - the total size of all logs on the device

Signed-off-by: Paul Gaiduk <[email protected]>
In case the total size of gzipped logs exceeds the limit
they should be removed in the following order by directory:
- `keepSentQueue`
- `failedUpload`
- `devUpload`
- `appUpload`

Signed-off-by: Paul Gaiduk <[email protected]>
Since keepSentQueue contains the same or more logs than
devUpload, it's enough to only use those logs when
searching with edge-view.

Signed-off-by: Paul Gaiduk <[email protected]>
Add a chapter to LOGGING.md explaining how different log levels are
handled in the system.

Updated CONFIG-PROPERTIES.md to clarify remote log level settings and
provide full list of log level parameters

Signed-off-by: Paul Gaiduk <[email protected]>
Currently memlogd has set a max-line-len of 8192. If the log line is
longer than this, it will be truncated. Since we use structured logs
for EVE microservices, truncating logs will result in malformed JSON,
which is not parseable by newlogd.

This commit removes logs that have a potential to be longer than that
limit, because they log whole configs that tend to get very long.

In future developers should be aware of this limit and avoid logging
big chunks of data.

Signed-off-by: Paul Gaiduk <[email protected]>
@europaul europaul force-pushed the support-remote-log-level branch from 32b615f to 9ed9171 Compare December 9, 2024 18:10
@europaul
Copy link
Contributor Author

europaul commented Dec 9, 2024

Rebased to include Eden 1.0.1 with fixed tests.

@github-actions github-actions bot requested a review from eriknordmark December 9, 2024 18:30
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run again

@eriknordmark eriknordmark merged commit e872c81 into lf-edge:master Dec 9, 2024
40 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stable Should be backported to stable release(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants