[Elastic Agent] Agent datastreams are conflicting with Filebeat setup #19369

Closed
mostlyjason opened this issue Jun 24, 2020 · 29 comments
Labels
bug Ingest Management:beta1 Group issues for ingest management beta1

Comments

@mostlyjason

mostlyjason commented Jun 24, 2020

While trying to set up Filebeat in my 8.0 snapshot cluster, I got this error message. Is it possible that Filebeat modules are conflicting with Agent?

jason@jason-VB-Development:~/filebeat-8.0.0-SNAPSHOT-linux-x86_64$ ./filebeat setup
Overwriting ILM policy is disabled. Set `setup.ilm.overwrite:true` for enabling.
Exiting: failed to check for alias 'filebeat-8.0.0': (status=400) {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}: 400 Bad Request: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}

For confirmed bugs, please report:

  • Version: 8.0 snapshot
  • Operating System: Ubuntu
  • Steps to Reproduce:
    1. Install Elastic Agent on host, enroll into fleet and run it
    2. Install filebeat
    3. ./filebeat modules enable system
    4. ./filebeat setup

@EricDavisX is it worth having a test case to make sure that Elastic Agent is compatible with Filebeat running on the same cluster? I imagine a large % of customers use Beats.

@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@ruflin
Contributor

ruflin commented Jun 25, 2020

@michalpristas The error above probably comes from the Agent, or from Filebeat run through the Agent. Is Agent somehow interfering with other Filebeat binaries?

@ruflin ruflin added the Ingest Management:beta1 Group issues for ingest management beta1 label Jun 25, 2020
@ph
Contributor

ph commented Jun 25, 2020

I am confused by this.

@mostlyjason Could you share the YML configuration that you are using for filebeat?

@mostlyjason
Author

mostlyjason commented Jun 25, 2020

I'm just using the default filebeat.yml with the cloud.id and cloud.auth pasted in. It seems to be reproducible, since I tried it with a fresh 8.0 cluster on staging and fresh directories for Filebeat and Elastic Agent. The error message is generated by standalone Filebeat. You can see I'm running the setup command in the directory where I extracted the Filebeat tar.gz.

@michalpristas
Contributor

@mostlyjason i'm confused by this as well. are you running agent + filebeat (independent of the filebeat included in agent) on the same machine and then trying to configure the standalone one?

@ph
Contributor

ph commented Jun 26, 2020

Exiting: failed to check for alias 'filebeat-8.0.0': (status=400) {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}: 400 Bad Request: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}

So, taking a step back and looking more closely at the error messages:

  1. Overwriting ILM policy is disabled: this means that there is already a matching ILM policy on the remote cluster.
  2. failed to check for alias 'filebeat-8.0.0': Filebeat is indeed running its ILM setup and checking that the write alias filebeat-8.0.0 exists.
  3. logs-agent-default: this is weird, because this new indexing strategy only concerns Agent; Filebeat has no knowledge of that concept at all.

I do have a theory: there is a conflict between the ILM policy in Ingest Manager and the one shipped with Filebeat. What points me to this is the data stream error mentioning logs-agent-default.
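To make the third point concrete, here is a minimal Python sketch that pulls the offending expression out of the Elasticsearch error body quoted above. The body is copied from the log line in this issue; `offending_expressions` is a hypothetical helper name, and the regular expression assumes the reason text keeps the wording seen here.

```python
import json
import re
from typing import List

# JSON error body copied from the log line in this issue
# (the part that follows "(status=400) ").
ERROR_BODY = (
    '{"error":{"root_cause":[{"type":"illegal_argument_exception",'
    '"reason":"The provided expression [logs-agent-default] matches a data '
    'stream, specify the corresponding concrete indices instead."}],'
    '"type":"illegal_argument_exception",'
    '"reason":"The provided expression [logs-agent-default] matches a data '
    'stream, specify the corresponding concrete indices instead."},'
    '"status":400}'
)


def offending_expressions(body: str) -> List[str]:
    """Return the bracketed expression named in each root cause (hypothetical helper)."""
    doc = json.loads(body)
    found = []
    for cause in doc["error"]["root_cause"]:
        # Assumes the reason keeps the wording seen in this issue.
        match = re.search(r"expression \[([^\]]+)\] matches a data stream", cause["reason"])
        if match:
            found.append(match.group(1))
    return found


print(offending_expressions(ERROR_BODY))
```

The name it reports belongs to an Agent data stream, not to anything Filebeat asked for by name, which is what makes the error surprising.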

I am not sure this is an error in Agent / Filebeat; it may instead be in how we deal with packages. @michalpristas Can you try to reproduce the above use case?

@mostlyjason
Author

Thanks @ph, that's a helpful breakdown. @michalpristas yes, what you said is correct.

@ph
Contributor

ph commented Jun 26, 2020

@michalpristas Let's try to reproduce it on our side, but I suspect the problem isn't in Agent or Filebeat but in EPM.

@jsoriano
Member

jsoriano commented Jul 6, 2020

I have found this issue with Metricbeat too:

2020-07-06T18:23:26.628+0200	ERROR	[publisher_pipeline_output]	pipeline/output.go:154	Failed to connect to backoff(elasticsearch(http://127.0.0.1:9200)): Connection marked as failed because the onConnect callback failed: failed to check for alias 'metricbeat-8.0.0': (status=400) {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}: 400 Bad Request: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}

Once this error appears, Metricbeat stops ingesting, so it could affect working deployments where someone gives the Agent a try.

Steps to reproduce:

  • Have Metricbeat running.
  • Start ingesting with an agent in the same cluster.
  • Restart the original Metricbeat.

Then the original Metricbeat cannot ingest anymore. To reproduce this, adding a configuration is not enough; an agent needs to actually ingest data.

I can reproduce it with this metricbeat configuration:

metricbeat.modules:
- module: zookeeper
  metricsets: [mntr, server, connection]
  hosts:
  - localhost:2181

output.elasticsearch:
  hosts: [127.0.0.1:9200]
  username: elastic
  password: changeme

And for Elastic Agent in standalone mode (not managed by Fleet), this configuration seems to be enough:

outputs:
  default:
    type: elasticsearch
    hosts: [127.0.0.1:9200]
    username: elastic
    password: changeme

logging.to_stderr: true

inputs:
  - type: system/metrics
    dataset.namespace: default
    use_output: default
    streams:
      - metricset: cpu
        dataset.name: system.cpu
      - metricset: memory
        dataset.name: system.memory
      - metricset: network
        dataset.name: system.network
      - metricset: filesystem
        dataset.name: system.filesystem
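As an aside on where names like logs-agent-default come from: Agent writes to data streams, and to my knowledge these are named `<type>-<dataset>-<namespace>`. A minimal Python sketch under that assumption; `data_stream_name` is a hypothetical helper (not an Agent API), and using "metrics" as the type for the `system/metrics` input is also an assumption.

```python
def data_stream_name(stream_type, dataset, namespace="default"):
    """Join type, dataset and namespace with dashes (hypothetical helper)."""
    return "-".join([stream_type, dataset, namespace])


# Dataset names and namespace taken from the standalone config above;
# "metrics" as the type is an assumption.
for dataset in ["system.cpu", "system.memory", "system.network", "system.filesystem"]:
    print(data_stream_name("metrics", dataset, "default"))

# The name from the error message fits the same pattern.
print(data_stream_name("logs", "agent"))
```

Under this scheme the name in the error would be a logs-type stream with dataset `agent` in the `default` namespace, that is, one created for Agent rather than for the standalone Beat.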

@ph ph assigned ph and unassigned michalpristas Jul 6, 2020
@ph
Contributor

ph commented Jul 6, 2020

@jsoriano I am going to try to reproduce it, thanks for the config.

@ph
Contributor

ph commented Jul 8, 2020

@jsoriano The error that you see:

2020-07-06T18:23:26.628+0200	ERROR	[publisher_pipeline_output]	pipeline/output.go:154	Failed to connect to backoff(elasticsearch(http://127.0.0.1:9200)): Connection marked as failed because the onConnect callback failed: failed to check for alias 'metricbeat-8.0.0': (status=400) {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}: 400 Bad Request: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}

This is in the metricbeat log?

@jsoriano
Member

jsoriano commented Jul 8, 2020

Yes, this is in the Metricbeat log.

@ph
Contributor

ph commented Jul 8, 2020

@jsoriano Just to clarify the steps.

In your reproducible case, did you ever run Kibana / Ingest Manager?

@jsoriano
Member

jsoriano commented Jul 8, 2020

Yes, I run Kibana and ES with the 8.0.0-SNAPSHOTs, and Metricbeat and Agent built from master. Not sure though if I opened the ingest manager UI, I think I didn't do it on my last try with the posted configurations, but I could try again to confirm.

Are you having problems reproducing this?

@ph
Contributor

ph commented Jul 14, 2020

Checking with @blakerouse, we aren't sure how this is possible yet.

@michalpristas
Contributor

trying to repro filebeat issue from both masters i had no issues
image

i will try metricbeat as described above

@michalpristas
Contributor

@jsoriano do you have metricbeat logs available from both the original metricbeat and the one run by agent (in data/logs/default/metricbeat)?
i'm using the kibana module rather than zookeeper, and i'm still ingesting

@ph were you able to repro?

@michalpristas michalpristas assigned michalpristas and unassigned ph and blakerouse Jul 15, 2020
@michalpristas
Contributor

assigning me as i'm playing with it

@ph
Contributor

ph commented Jul 15, 2020

I haven't been able to reproduce it. Looking at the error, it's maybe a package that we install via EPM. :(

@jsoriano
Member

An update on this: I have tried to reproduce it again with master and with 7.x. The good news is that with 7.x this issue doesn't happen for me. So maybe this is caused by some breaking change in Beats/ES/Kibana for 8.x and we are good with 7.x, but we should be careful in case we backport whatever is causing this.

To reproduce this I am using Elasticsearch and Kibana from the scenario in https://github.com/elastic/integrations/tree/master/testing/environments. I don't open Kibana or install any package at any point.

For master:

  • I run the scenario as is in the integrations repository, that starts the stack using 8.0 snapshots.
  • I run metricbeat built from master branch.
  • I run elastic-agent built from master branch (with mage package) and with the reference config file included in the tar.gz.
  • After running elastic agent, if I restart metricbeat it cannot ingest anymore, with this error:
2020-07-28T11:07:49.883+0200	ERROR	[publisher_pipeline_output]	pipeline/output.go:154	Failed to connect to backoff(elasticsearch(http://127.0.0.1:9200)): Connection marked as failed because the onConnect callback failed: failed to check for alias 'metricbeat-8.0.0': (status=400) {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-elastic.agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-elastic.agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}: 400 Bad Request: {"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The provided expression [logs-elastic.agent-default] matches a data stream, specify the corresponding concrete indices instead."}],"type":"illegal_argument_exception","reason":"The provided expression [logs-elastic.agent-default] matches a data stream, specify the corresponding concrete indices instead."},"status":400}

Note: running PLATFORMS=linux/amd64 mage package fails for me with current master with the following error, but tar.gz seems to be correctly generated:

COPY failed: stat /var/lib/docker/tmp/docker-builder878593990/beat: no such file or directory
...
Error: failed building elastic-agent type=docker for platform=linux/amd64: failed to build docker: running "docker build -t docker.elastic.co/beats/elastic-agent:8.0.0 build/package/elastic-agent/elastic-agent-linux-amd64.docker/docker-build" failed with exit code 1

For 7.x:

  • I run the scenario as in the integrations repository, but replacing the versions with 7.9.0-SNAPSHOT.
  • I run metricbeat built from 7.x branch.
  • I run elastic-agent built from 7.x branch (with mage package) and with the reference config file included in the tar.gz.
  • Metricbeat has no problem ingesting even after restarting it.

@michalpristas
Contributor

michalpristas commented Jul 28, 2020

not sure what changed but today i'm hitting some race on ^^^ beat build dir. sometimes it gets built completely, sometimes partially, and sometimes not at all before proceeding to composing the docker image.

edit: i may have a clue actually

@michalpristas
Contributor

@jsoriano try latest master for build issue, should be ok now

@jsoriano
Member

> @jsoriano try latest master for build issue, should be ok now

It builds now, yes, thanks!

@ph
Contributor

ph commented Jul 28, 2020

Can we close this?

@jsoriano
Member

> Can we close this?

This issue still happens with master, are we ok with this?

It doesn't seem to affect 7.9/beta1, so in any case I think we can remove the Ingest Management:beta1 label.

@ruflin
Contributor

ruflin commented Jul 28, 2020

If it still happens in master, we should keep it open and continue investigating. I would really like to understand why it happens so we can figure out if 7.x is also affected.

@ph
Contributor

ph commented Jul 28, 2020

Let's keep it open.

I am worried, are we missing a commit in 7.9?

@michalpristas
Contributor

tried to repro today without luck (using cloud 8.0.0 snapshot for ES and kibana) but i found this which looks exactly like the same issue

elastic/kibana#69061

and their fix here

elastic/kibana#68794

not sure what we can fix on our side, as this appears to manifest during ILM setup, while trying to check whether the alias exists
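For context, a rough sketch of what an alias existence check of this kind can look like, written in Python against the Elasticsearch `GET /_alias/<name>` endpoint. It assumes the check maps 200 to "exists" and 404 to "missing" and treats any other status as a failure, which would be consistent with a 400 from the cluster surfacing as "failed to check for alias". The helper names (`alias_url`, `interpret_status`, `has_alias`) are hypothetical, authentication is left out, and this is not the actual Beats code.

```python
import urllib.error
import urllib.request


def alias_url(base_url, alias):
    """Build the URL for the Elasticsearch get-alias endpoint."""
    return base_url.rstrip("/") + "/_alias/" + alias


def interpret_status(status):
    """Map an HTTP status to an answer (assumed mapping, see lead-in)."""
    if status == 200:
        return True   # alias exists
    if status == 404:
        return False  # alias does not exist
    # Anything else (such as the 400 in this issue) leaves the question unanswered.
    raise RuntimeError("failed to check for alias: status=%d" % status)


def has_alias(base_url, alias):
    """Ask the cluster whether the alias exists (no authentication in this sketch)."""
    try:
        with urllib.request.urlopen(alias_url(base_url, alias)) as resp:
            return interpret_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status is in err.code.
        return interpret_status(err.code)
```

With the error from this issue, the 400 falls into the last branch of `interpret_status`, so the caller cannot tell whether the alias exists and gives up, even though the data stream named in the response has nothing to do with the alias being checked.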

@jsoriano
Member

jsoriano commented Aug 4, 2020

I can confirm that I cannot reproduce this issue anymore with latest 8.0 snapshots. And I agree that elastic/kibana#69061 looks like the same issue. So let's close this one. Thanks for the investigation!

@jsoriano jsoriano closed this as completed Aug 4, 2020