# Propose log tracking KEP #1961
Conversation
Hi @hase1128. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test; once the patch is verified, the new status will be reflected accordingly. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This KEP isn't finished yet (most of it is copied from the discussion in Slack), but I'm letting you know now that I've created it.
/assign @dashpole

@hase1128 I believe this KEP is based on an older template that is missing a number of questions for production-readiness that need to be answered. Can you please rebase this based on the new template?
/ok-to-test

/cc @44past4

/retitle [WIP] Propose log tracking KEP

Please remove WIP when the KEP is ready for review, thanks!
Some format issues here:

- the dir should be `keps/sig-instrumentation/1668-log-tracking/`
- this file (`20200901-log-tracking.md`) should be renamed `README.md`
- a `kep.yaml` file, which will contain milestones, status, and other info about the KEP, should also be included in this dir

The templates that you'd use for your KEP can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/NNNN-kep-template
Also, as a reminder to be included in a release:
- The KEP must be merged in an implementable state
- The KEP must have test plans
- The KEP must have graduation criteria.
As an example please see: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging
PRR approval is now required for the 1.21 milestone. Without it, this KEP cannot merge. This is new, and I don't think anyone has the correct OWNERS/directory set up; I will follow up with the release team. We will also need at least one non-Google approver for this PR; currently all listed reviewers/approvers appear to be employed at Google. Can you please work to identify one?
Thanks for the comments and for kindly following up.
I would be very happy if you could be one of the approvers.
I suppose it has to be either myself or @brancz :) I sent an email about the new mandatory PRR review to k-dev; it seems to have flown under people's radars. We should be discussing it at the leads meetings next week.
Thank you. At the final approval of the KEP design, let me ask one of you for approval.
I will continue to pay attention to the PRR situation.
Signed-off-by: Li Zhijian <[email protected]>
update log example
@hase1128: The following test failed:

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
| Metadata | Description |
| ------ | ------ |
| traceid | spans an user request. unique for user's request |
| spanid | spans a controller action. unique for controller actions |
| objsetid | spans the entire object lifecycle. unique for a related object set |
What do you mean by an *object set* here? The first sentence refers to the object lifecycle, but we already have a unique identifier for objects, `uid`. What's the difference here?
One related object set may include a Deployment, a ReplicaSet, and some Pods; these objects share the same objsetid. Below is an example of a group of related objects:

```yaml
deployment:
  uid: D123456
  annotations:
    - objsetid: D123456
    - traceid: t1234567890
    - spanid: s123456
---
replicaset:
  uid: R123456
  annotations:
    - objsetid: D123456   # <<< same as the deployment's objsetid
    - traceid: t0123456789
    - spanid: s012345
---
pod:
  uid: P123456
  annotations:
    - objsetid: D123456   # <<< same as the deployment's objsetid
    - traceid: t9012345678
    - spanid: s501234
```
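The inheritance rule in the example above (a child object carries its owner's objsetid, while a root object uses its own UID) could be sketched roughly as follows. This is a hypothetical illustration, not code from the KEP; the helper name and annotation key are assumptions:

```go
package main

import "fmt"

// objsetidKey is a hypothetical annotation key; the KEP does not fix a name.
const objsetidKey = "objsetid"

// inheritObjsetid copies the owner's objsetid annotation onto the child,
// falling back to the child's own UID when there is no owner objsetid
// (i.e. the child is the root of a new object set).
func inheritObjsetid(childUID string, childAnnotations, ownerAnnotations map[string]string) map[string]string {
	if childAnnotations == nil {
		childAnnotations = map[string]string{}
	}
	if id, ok := ownerAnnotations[objsetidKey]; ok {
		childAnnotations[objsetidKey] = id // same objsetid as the owner
	} else {
		childAnnotations[objsetidKey] = childUID // root object: objsetid == own UID
	}
	return childAnnotations
}

func main() {
	deploy := inheritObjsetid("D123456", nil, nil)
	rs := inheritObjsetid("R123456", nil, deploy)
	pod := inheritObjsetid("P123456", nil, rs)
	fmt.Println(deploy[objsetidKey], rs[objsetidKey], pod[objsetidKey]) // all D123456
}
```

Under this rule, the Deployment, ReplicaSet, and Pod in the example all end up with `objsetid: D123456`, matching the YAML above.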
Should it be a `rootid`? I have never heard the term *object set*, so correct me if I'm wrong. I think we should look into other relations between objects that we also want to trace, for example PV/PVC. In that case the term *object set* is hard to apply.
| spanid | spans a controller action. unique for controller actions |
| objsetid | spans the entire object lifecycle. unique for a related object set |
### Opt-in for Components
I don't think we should develop features that don't have a dedicated minimal target adoption. We should define the set of components we plan to change, so that the changes are useful for users.
Do you mean that we should list the components which will be changed in this KEP doc?
Our purpose is to identify the specific logs related to a user request and object, so I think the commands/components are `kubectl apply/create/delete`, deployment/replicaset/pod, and so on.
### Opt-in for Components
Updating/changing the logs is outside the scope of this KEP. To actually add metadata to K8s component logs, the following procedures are needed in addition.
- Open issues for each component, and discuss them with the SIGs that own that component.
We cannot push work onto other SIGs and expect that they will do it on the basis that it's important for sig-instrumentation. We should scope the work to a set of components that will allow it to be beneficial and finishable by sig-instrumentation.
For our logging implementation, I think we can just open issues for each component and discuss them with the SIGs that own that component. Do you think this plan counts as "pushing work onto other SIGs"?
Should we open a new logging KEP and discuss each component there?
It's generally easier to target SIGs individually, since KEPs can easily balloon (causing unnecessary bikeshedding). Insofar as practically applying @serathius' suggestion, it may make sense to separate 'core functionality', i.e. library code for tracing, from the component integration story.
Tracking logs among each Kubernetes component related to a specific user operation and objects is very tough work.
It is necessary to match logs by basically using timestamps and object's name as hints.
If multiple users throw many API requests at the same time, it is very difficult to track logs across each Kubernetes component log.
I understand that this is a currently unsolved problem, but could you provide some info on why this problem is worth solving? Why is tracking user operations important? Examples: debugging, auditing, reproducing problems?
### Goals

- Implement method which propagates new logging metadata among each K8s components
- Store and update logging metadata into object annotation through mutating admission controller (aka Webhook)
You put some implementation details (like *object annotation*, *mutating admission controller*) in the goals. I understand that we would like to use them to achieve the goal of adding metadata, but should their usage be a goal per se?
I think the goals should stay at a level that describes what result we expect from the user/developer point of view.
Thanks, we will update.
### Non-Goals

- Add new logging metadata into actual K8s component logs
If the goal of this KEP is introducing trace information into logs, then we skip this process. If the main benefit of the KEP is introducing this information, then the KEP should include this work, possibly in reduced scope, but enough to provide a benefit.
For example, the structured logging KEP's goal was not to rewrite all logs, but to provide an interface and use it in enough places to provide benefit (25 log line changes were enough to improve 99.9% of log output). Maybe the same approach would work here?
As you said, we are now considering separating tracing and logging. If we separate them, I think "Add new logging metadata into actual K8s component logs" becomes a non-goal; we want to realize that work in the future.
Thank you for your suggestion about the method you used in structured logging. Maybe it will help our work.
- Add new logging metadata into actual K8s component logs
  - This task will be done by opening issues after completing this KEP
- To centrally manage the logs of each Kubernetes component with objsetid (this can be realized with existing OSS such as Kibana, so there is no need to implement it in Kubernetes components).
I don't understand this point. What do you mean by *manage the logs of each component*?
This means that we will not implement the features (log analysis/query/visualization) which are realized with existing OSS such as ELK. We should write this in more detail.
- This proposal does not add additional telemetry to any components, just context-based metadata to existing logs. This KEP doesn't require running any additional OpenTelemetry components (such as the OpenTelemetry collector, which the [API Server Tracing](https://github.com/kubernetes/enhancements/issues/647) KEP uses)

## Background

It's known that all requests to K8s reach the API server first. In order to propagate this metadata to the mutating admission controller, we require the API server to support propagating it as well. Fortunately there is already a KEP, [API Server Tracing](https://github.com/kubernetes/enhancements/issues/647), to do this.
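For context, the request trace metadata that API Server Tracing propagates between components travels in the W3C Trace Context `traceparent` HTTP header, which packs a version, trace ID, parent span ID, and flags into one hyphen-separated string. A minimal illustrative parser, not code from either KEP:

```go
package main

import (
	"fmt"
	"strings"
)

// parseTraceparent splits a W3C Trace Context "traceparent" header
// ("version-traceid-spanid-flags") into its trace ID and parent span ID.
// The trace ID is 32 hex characters; the span ID is 16.
func parseTraceparent(h string) (traceID, spanID string, err error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", fmt.Errorf("malformed traceparent: %q", h)
	}
	return parts[1], parts[2], nil
}

func main() {
	// Example header value taken from the W3C Trace Context specification.
	tid, sid, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tid, sid, err)
}
```

This header-based mechanism carries per-request context; it is not aware of K8s objects, which is exactly the gap the objsetid annotation discussion above is about.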
Is this up to date with the latest changes to the API Server Tracing KEP?
That KEP covers passing request trace information, but from your description you still assume object tracing. Are we sure that this is compatible? I expect OpenTelemetry to use HTTP request headers to pass request trace information; that approach is not K8s-object aware.
@dashpole to confirm
API Server tracing will propagate trace context from incoming requests to outgoing requests, including to etcd, re-entrant apiserver calls, and to webhooks. This section isn't saying that the API server will propagate context to objects, just that the API server will propagate context to a webhook.
Sure, I was just confused, as the Summary section states *This KEP proposes a method for adding new three unique logging metadata into annotation of objects and propagating them across components*, which is not compatible with what's described here.
In this case, the volume generation on the storage side is OK, and there is a possibility that the mount process to the container in the pod is NG.
In order to identify the cause, it is necessary to look for the problem area while checking the component (kubelet) log as well as the system-side syslog and mount-related settings.

This log tracking feature is useful to identify the logs related to specific user operation and cluster processing, and can reduce investigation cost in such cases.
This story describes the need to identify all logs related to a PV in multiple Kubernetes components. Is object tracing needed in such a case? I think having an annotation/label referencing the PV name would be enough to easily find kubelet logs referring to the PV, wouldn't it? Adding trace information would allow us to separate requests from each other, but I don't think that's needed in this case.
This log tracking feature is useful to identify the logs related to specific user operation and cluster processing, and can reduce investigation cost in such cases.

### Summary of Cases

- Given a component log (such as an error log), find the API request that caused this (error) log.
I don't think you can get back the API request this way; you will only get a request id. Possibly it would be useful to be able to correlate component logs to audit logs, which store the original request content.
I'm sorry, the current sentence seems to have been misleading.
It meant that we can easily trace back to the log messages closest to the API request, and then find the API request via audit logs (as you said) or some other way.
I see two main threads in this KEP:
- object tracing, picked up after the original Tracing KEP reduced its scope
- adding context information to logs

Both are deep topics that can lead to a lot of discussion, making it hard to move forward with the design.
I think focusing this KEP on only one of those problems would help it move forward. The object tracing problem is mainly related to api-machinery: storing, passing, and retrieving trace information (complicated design, small implementation). On the other hand, context logging is mostly the design of a golang interface for the logging library and changing logging library handling (from global klog methods to ones based on context): simple design, but complicated implementation/code migration.
As this KEP has put the logging implementation out of scope ("Add new logging metadata into actual K8s component logs" is in the non-goals), I would propose to focus this KEP on object tracing.
Once @serathius's feedback is addressed, I will give this a review.

We have rewritten the KEP and opened a new PR#2312 to focus only on trace propagation, as Marek suggested.
This KEP proposes a method for adding new three unique logging meta-data into K8s component logs.
It makes us more easy to identify specific logs related to an user request (such as `kubectl apply`) and object (such as Pod, Deployment).
It is expected to reduce investigation cost greatly when trouble shoothing.
s/trouble shoothing/troubleshooting/
Thanks for the review, we have sent the new PR as @serathius suggested. PTAL, any comments would be appreciated.

Does that mean this should be closed in favour of #2312?

Okay, I will close this PR.
I created a new KEP instead of the Request-ID KEP.
The design of the new KEP was discussed in the Slack thread.