Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] log correlation on host.name is showing incorrect logs #148788

Closed
sorenlouv opened this issue Jan 12, 2023 · 10 comments · Fixed by #150005
Closed

[APM] log correlation on host.name is showing incorrect logs #148788

sorenlouv opened this issue Jan 12, 2023 · 10 comments · Fixed by #150005
Labels
apm:logs apm:test-plan-done Pull request that was successfully tested during the test plan bug Fixes for quality problems that affect the customer experience Team:APM All issues that need APM UI Team support

Comments

@sorenlouv
Copy link
Member

sorenlouv commented Jan 12, 2023

APM UI shows log lines for a service via the logs component

image

Logs annotated with a matching service.name are displayed . Log lines without service.name will be matched using container.id and then host.name (in that order).

Problem

Scenario: A customer runs multiple APM services on a single host. Some of the services use log correlation and therefore emit service.name-annotated logs. Other services don't have log correlation enabled. These logs will therefore not be annotated with service.name and will show up in APM UI for every service on that host. In this case it's better not to show those log lines anywhere, than to show them everywhere.

Option A

Remove (or make is possible to disable) correlation on host.name. This means remove host.name from here:

https://github.com/elastic/kibana/blob/ae5594849c26775e5a6207258f5fe5139bd3e5a0/x-pack/plugins/apm/public/components/app/service_logs/index.tsx#L67-L86

Option B

EDIT: Option A was chosen.

Only correlate by host.name if there are no logs in the selected time range with service.name correlation.

We can check for this like:

GET logs/_search?terminate_after=1
{
  "track_total_hits": 1,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "service.name"
          }
        },
        {
          "range": {
            "@timestamp": {
              "lte": "...",
              "gte": "..."
            }
          }
        }
      ]
    }
  }
}

We can get the log indices to query using getResolvedLogView:

const res = await logViews.getScopedClient(request).getResolvedLogView('default')

Related

@sorenlouv sorenlouv added the Team:APM All issues that need APM UI Team support label Jan 12, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:APM)

@kpatticha
Copy link
Contributor

In the meeting, another solution was suggested, @sqren could you please add it to the ticket? (I can't recall what was it :( )

@sorenlouv
Copy link
Member Author

In the meeting, another solution was suggested, @sqren could you please add it to the ticket? (I can't recall what was it :( )

@dgieselaar suggested that we we only correlate by host.name if there are no logs in the selected time range with service name correlation.

Concretely we'd have to query the log index like:

GET logs/_search?terminate_after=1
{
  "track_total_hits": 1,
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "service.name"
          }
        },
        {
          "range": {
            "@timestamp": {
              "lte": "...",
              "gte": "..."
            }
          }
        }
      ]
    }
  }
}

We haven't queried the logs index directly before. We'd have to talk to the logs team how we can get the user-configured logs indices.

@sorenlouv sorenlouv changed the title [APM] Make log correlation by host.name and container.id configurable [APM] Remove log correlation on host.name Jan 19, 2023
@sorenlouv sorenlouv changed the title [APM] Remove log correlation on host.name [APM] log correlation on host.name is showing incorrect logs Jan 19, 2023
@sorenlouv
Copy link
Member Author

I've updated the issue with "Option B"

@sorenlouv sorenlouv added bug Fixes for quality problems that affect the customer experience 8.8 candidate labels Jan 23, 2023
@felixbarny
Copy link
Member

I'd prefer option A. The issue with B is that it's difficult to reason about why logs are shown or not shown in the service logs tab. It would show host logs if there's no service that has correctly set up log correlation. But if there's any service, that may be owned by a completely different team, that has set up log correlation, it would suddenly be empty. Even more confusingly, the behavior may flip back-and-forth depending on the selected time range and whether that application has emitted any logs during that time.

I'd rather have an empty logs tab with a clear description on what users need to do in order to populate the screen: to set up log collection and to configure the service.name and/or the container.id in the logs.

Let's keep the logic simple so that our users and we can more easily understand and troubleshoot log correlation.

@sorenlouv
Copy link
Member Author

I'd rather have an empty logs tab with a clear description on what users need to do in order to populate the screen: to set up log collection and to configure the service.name and/or the container.id in the logs.

This is fine by me. Just to be sure we're essentially removing the logs feature for users that have services running on different hosts (rather than in containers) and doesn't have log correlation setup.

@felixbarny
Copy link
Member

I think we should add release notes about this change.

Are there any docs we need to change as a result of this?
/cc @SylvainJuge and @bmorelli25 who have been working on general logging docs.

@sorenlouv
Copy link
Member Author

sorenlouv commented Feb 1, 2023

I think we should add release notes about this change.

Are there any docs we need to change as a result of this? /cc @SylvainJuge and @bmorelli25 who have been working on general logging docs.

@felixbarny I've already added the release_note:fix label (to the PR) so it will be included in the release notes

@sorenlouv
Copy link
Member Author

sorenlouv commented Feb 1, 2023

I've created a separate PR (#150065) that adds log correlation on service.environment.

@felixbarny @AlexanderWert Btw. I noticed we have had a bug since 8.1 (introduced by yours truly in #120694) that makes the host.name correlation problem much worse.

The bug is even embedded in the unit tests:

'service.name: "opbeans-node" or (not service.name and (host.name: "baz" or host.name: "quz"))'

service.name: "opbeans-node" or (not service.name and (host.name: "baz" or host.name: "quz"))

not service.name and... is supposed to only match logs by host.name if they are not annotated with service.name. Unfortunately the syntax used is not correct. It should have been not service.name : * and.... This means that even log lines annotated with service.name shows up for every other service on that host.

I've fixed that in #150065.

kqualters-elastic pushed a commit to kqualters-elastic/kibana that referenced this issue Feb 6, 2023
@sorenlouv
Copy link
Member Author

Implementing #117772 will make testing issues like this much easier.

@sorenlouv sorenlouv added the apm:test-plan-done Pull request that was successfully tested during the test plan label Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apm:logs apm:test-plan-done Pull request that was successfully tested during the test plan bug Fixes for quality problems that affect the customer experience Team:APM All issues that need APM UI Team support
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants