This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

feat: initial specs for ingest management #126

Merged: 17 commits, Jun 8, 2020

mdelapenya
Contributor

What does this PR do?

It adds the initial specs for the Ingest management project.

Why is it important?

We should start a discussion around them to make them perfect and totally understandable by anybody in the team: product owners, developers, testers, consumers, etc.

Related issues

Comment on lines 37 to 39
Then filebeat is started
And metricbeat is started
And endpoint is started
Contributor Author

The BDD step is the same, so we could write just one implementation method, with an input parameter (the process to be present in the target)
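
A minimal sketch of that idea, in Python for illustration only. The suite's real step bindings are not shown in this thread, so `STEP_PATTERN` and `process_is_started` are hypothetical names, and the host check is replaced by a caller-supplied set of running process names:

```python
import re

# Hypothetical pattern: one expression covers "filebeat is started",
# "metricbeat is started" and "endpoint is started".
STEP_PATTERN = re.compile(r"^(?P<process>[\w-]+) is started$")


def process_is_started(step_text, running_processes):
    """Return True when the process named in the step is in running_processes."""
    match = STEP_PATTERN.match(step_text)
    if match is None:
        raise ValueError(f"step does not match: {step_text!r}")
    return match.group("process") in running_processes


# The same function serves all three steps; only the parameter changes.
running = {"filebeat", "metricbeat"}
for step in ("filebeat is started", "metricbeat is started", "endpoint is started"):
    print(step, "->", process_is_started(step, running))
```

The step keyword (Then/And) is left out of the pattern on the assumption that the BDD framework strips it before matching.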

And the "Fleet" Kibana setup has been created
And the agent binary is installed in the target host
When the agent is un-enrolled from Kibana
Then no new data shows up in Elasticsearch locations using the enrollment token
Contributor Author

I added using the enrollment token to match an existing step below. Is this assumption correct?

Contributor

I'm not sure I would phrase it as 'using' the enrollment token, but it's not entirely wrong. I'd phrase it as: the host / agent is no longer able to send documents into ES (it will still be attempting to send them, running on the host).

Member

Here I think you should say "using the access token": when an agent enrolls into Fleet, we exchange an enrollment token for an access token (one per agent).
Once you invalidate an enrollment token, agents already enrolled should continue to work, but you cannot enroll more agents with that enrollment token.
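
The token semantics described here can be illustrated with a toy in-memory model. This is only a sketch of the behaviour as stated in the comment, not Fleet's implementation; the class and method names are hypothetical:

```python
import secrets


class FleetTokens:
    """Toy model of enrollment vs. access tokens (hypothetical, in-memory)."""

    def __init__(self):
        self.enrollment_tokens = set()
        self.access_tokens = {}  # agent id -> access token (one per agent)

    def create_enrollment_token(self):
        token = secrets.token_hex(8)
        self.enrollment_tokens.add(token)
        return token

    def enroll(self, agent_id, enrollment_token):
        # Enrolling exchanges a valid enrollment token for a per-agent access token.
        if enrollment_token not in self.enrollment_tokens:
            raise PermissionError("enrollment token is not valid")
        self.access_tokens[agent_id] = secrets.token_hex(8)
        return self.access_tokens[agent_id]

    def revoke_enrollment_token(self, enrollment_token):
        # Revoking only removes the enrollment token; access tokens are untouched.
        self.enrollment_tokens.discard(enrollment_token)

    def can_send_data(self, agent_id, access_token):
        return self.access_tokens.get(agent_id) == access_token


fleet = FleetTokens()
enrollment = fleet.create_enrollment_token()
access = fleet.enroll("agent-1", enrollment)
fleet.revoke_enrollment_token(enrollment)
print(fleet.can_send_data("agent-1", access))  # the enrolled agent keeps working
```

In this model, a second call to `enroll` with the revoked token raises `PermissionError`, which corresponds to "you cannot enroll more agents with that enrollment token".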

Contributor Author

Thanks for the clarification, Nicolas! Please look at L27:33. There is a specific scenario for revoking the enrollment token for an agent. Is that what you mean?

Contributor Author

Mmm, reading your comment, I'd rephrase this second scenario (the one revoking the token) to this:

Scenario: Revoking the enrollment token for an agent
  Given there is a "Fleet" user in Kibana
    And the "Fleet" Kibana setup has been created
    And the agent binary is installed in the target host
    And the agent is un-enrolled from Kibana
  When the enrollment token is revoked
  Then no new data shows up in Elasticsearch locations using the enrollment token
    And the enrolled agent continues to work

And I'd create another use case:

Scenario: A revoked enrollment token cannot enroll more agents
  Given there is an enrollment token
  When the enrollment token is revoked
  Then it's not possible to use the token to enroll more agents

Does it make sense to you?

Contributor Author

BTW, we should clarify what "the enrolled agent continues to work" means: e.g. it sends data to Elasticsearch, there is an endpoint we can query, a process is running on the host, etc.

Contributor Author

Combining above two scenarios into one:

Scenario: Revoking the enrollment token for an agent
  Given there is an agent enrolled with an enrollment token
  When the enrollment token is revoked
  Then it's not possible to use the token to enroll more agents
    And the enrolled agent continues to work

Contributor

Thanks so much Nicolas and Manu, I'm learning here too! Knowing what I know now, I'd suggest we really have only one distinct case to test, and I'd phrase it as:

Scenario: Revoking an enrollment token 
  Given the Fleet user is set up and a valid enrollment token exists
  When the enrollment token is revoked
  Then an attempt to enroll a new agent fails

The prerequisite for the test changes such that the agent is NOT running and is NOT already enrolled.
@mdelapenya what do you think? Honestly, if you can get us the first, more straightforward case, I'm happy to work on this one with the code snippets we have and the infrastructure you provide. We need not stress about completing this one case now; the team is fine to take it over.

Contributor Author

I like this scenario because it's straightforward and simple. I'd replace what we had. wdyt about rephrasing the "Given..." to "Given an agent is enrolled"? Or do we want to make it clear for this scenario that we need the Fleet user and a valid token to exist?

Contributor Author

BTW, in what state would the existing agent be? Will it pause? Will it continue to send data?

Given there is a "Fleet" user in Kibana
And the "Fleet" Kibana setup has been created
When the agent binary is installed in the target host
Then the dashboards for the agent are present in Elasticsearch
Contributor Author

I'd like to know the exact data needed here: the ES query

Contributor

the command to run the agent is:
./elastic-agent run

after this command is executed, we can wait a matter of seconds (5-20 seconds?) and then verify the existence of certain folders / data on the host as evidence of it working.
The logs we can check for are relative to the path where the agent was installed, so it would be, for example with a 7.8 agent:
elastic-agent-7.8.0-darwin-x86_64-BC5/data/logs/default/filebeat
elastic-agent-7.8.0-darwin-x86_64-BC5/data/logs/default/metricbeat

and from here:
elastic-agent-7.8.0-darwin-x86_64-BC5/data/run/default/metricbeat--7.8.0/meta.json

  • any non-empty file will suffice for all 3 assertions
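
A rough sketch of those three checks, in Python for illustration. The helper names are hypothetical, and it is an assumption that the two log locations are directories while `meta.json` is a single file; the paths themselves are passed in by the caller rather than hard-coded, since they depend on the installed version:

```python
from pathlib import Path


def dir_has_non_empty_file(directory):
    """Return True if the directory holds at least one file larger than 0 bytes."""
    root = Path(directory)
    if not root.is_dir():
        return False
    return any(p.is_file() and p.stat().st_size > 0 for p in root.rglob("*"))


def file_is_non_empty(path):
    """Return True if path is a regular file larger than 0 bytes."""
    candidate = Path(path)
    return candidate.is_file() and candidate.stat().st_size > 0


def agent_left_evidence(log_dirs, meta_file):
    """Combine the checks: every log directory and the meta file must be non-empty."""
    return all(dir_has_non_empty_file(d) for d in log_dirs) and file_is_non_empty(meta_file)
```

A caller would wait the few seconds mentioned above after `./elastic-agent run` and then call `agent_left_evidence` with the filebeat and metricbeat log paths and the `meta.json` path.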

Contributor

And for the dashboards, let's actually use the API from Kibana, and even the Ingest one, to assess this:
/api/ingest_manager/data_streams

  • if you call it prior to any Agent being deployed, it should return an empty list of data streams:
    {
      "data_streams": []
    }

When called after the Agent is running, it will return a list of (currently in 7.8) 20 streams, in this format:
{
  "data_streams": [
    {},
    {
      "index": "metrics-system.load-default",
      "dataset": "system.load",
      "namespace": "default",
      "type": "metrics",
      "package": "system",
      "package_version": "0.1.0",
      "last_activity": "2020-06-04T18:59:29.693Z",
      "size_in_bytes": 42605308,
      "dashboards": [
        {
          "id": "79ffd6e0-faa0-11e6-947f-177f697178b8-ecs",
          "title": "[Metrics System] Host overview ECS"
        },
        ...
        {
          "id": "5517a150-f9ce-11e6-8115-a7c18106d86a-ecs",
          "title": "[Logs System] SSH login attempts ECS"
        },
        {
          "id": "Filebeat-syslog-dashboard-ecs",
          "title": "[Logs System] Syslog dashboard ECS"
        }
      ]
    },
    ...
    {},
    {}
  ]
}

Let's assert the following:

  • the data_streams call returns more than 1 element in its list.
  • the data_streams call returns a list element with an "index" of "metrics-system.process-default"
  • the list element with "index": "metrics-system.process-default" has a sibling key holding a list called 'dashboards'
  • the 'dashboards' list has an element with a title of "[Metrics System] Host overview ECS"

I don't think we should walk the whole list here: I understand there is separate automation to confirm this, and walking it would make the test brittle to changes. How does that sound?
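
The four assertions above could be sketched like this, working on the already-parsed JSON response so no HTTP client is assumed. `data_stream_failures` is a hypothetical helper name; the field names come from the response sample in this comment:

```python
def data_stream_failures(response, index, dashboard_title):
    """Return a list of human-readable failures; an empty list means success."""
    streams = response.get("data_streams", [])
    failures = []
    if len(streams) <= 1:
        failures.append(f"expected more than 1 data stream, got {len(streams)}")
    # Look up the single stream of interest instead of walking every entry.
    stream = next((s for s in streams if s.get("index") == index), None)
    if stream is None:
        failures.append(f"no data stream with index {index!r}")
        return failures
    titles = [d.get("title") for d in stream.get("dashboards", [])]
    if dashboard_title not in titles:
        failures.append(f"dashboard {dashboard_title!r} not found for {index!r}")
    return failures


sample = {
    "data_streams": [
        {},
        {
            "index": "metrics-system.process-default",
            "dashboards": [{"id": "x", "title": "[Metrics System] Host overview ECS"}],
        },
    ]
}
print(data_stream_failures(
    sample, "metrics-system.process-default", "[Metrics System] Host overview ECS"
))
```

Returning a list of failures rather than raising on the first one is a design choice for the sketch: a failing scenario can then report everything that was missing in one run.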

And the "Fleet" Kibana setup has been created
And the agent binary is installed in the target host
When the agent is un-enrolled from Kibana
Then no new data shows up in Elasticsearch locations using the enrollment token
Contributor Author

What data is not present here? It'd be great to understand more about its nature, to identify when it shows up and when it does not.

Contributor

updated:
a query you can use is as follows:
query the metrics* index with the equivalent of this KQL:
host.name:"7exl-w10x64l6-d" and @timestamp >= "2020-06-06T01:30:00.948Z"
where the hostname is replaced correctly and the timestamp in question is captured 2 seconds after the unenroll call.

translated into an ES query (forgive me if this is terrible, it's a hacked version from dev tools and I didn't take the time to re-work it much):

  • the same find/replace of the hostname and timestamp values is needed, of course:

GET _search
{
  "version": true,
  "size": 500,
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "system.process.cpu.start_time",
      "format": "date_time"
    },
    {
      "field": "system.service.state_since",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "host.name": "7exl-w10x64l6-d"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "2020-06-06T01:50:00.948Z",
                          "time_zone": "America/New_York"
                        }
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-06T01:36:29.564Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
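
For the find/replace of hostname and timestamp, a small builder may be easier to maintain than the dev-tools export. The sketch below is a simplified query, not a byte-for-byte equivalent of the one above: it keeps only the `match_phrase` on `host.name` and a single `range` on `@timestamp`. The function names are hypothetical, and the "no new data" check only looks at the `hits.hits` list of a parsed search response:

```python
from datetime import datetime, timedelta, timezone


def timestamp_after(unenroll_time, seconds=2):
    """Format a timezone-aware datetime, shifted by `seconds`, as UTC with milliseconds."""
    shifted = (unenroll_time + timedelta(seconds=seconds)).astimezone(timezone.utc)
    return shifted.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def build_no_new_data_query(hostname, since):
    """Build a search body matching documents from `hostname` at or after `since`."""
    return {
        "size": 500,
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"host.name": hostname}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
    }


def no_new_documents(search_response):
    """True when the parsed search response contains no hits."""
    return len(search_response.get("hits", {}).get("hits", [])) == 0


unenrolled_at = datetime(2020, 6, 6, 1, 29, 58, 948000, tzinfo=timezone.utc)
print(build_no_new_data_query("7exl-w10x64l6-d", timestamp_after(unenrolled_at)))
```

The body would be sent to the metrics* index; how it is sent (client library, plain HTTP) is left open here.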

Contributor Author

This query is perfect! :)

And the agent is un-enrolled from Kibana
When the agent is re-enrolled from the host
And the agent runs from the host
Then the agent shows up in Kibana
Contributor Author

Here we will need the exact thing to check: an API call, an XPath element in the UI...

Contributor

we can absolutely get you the API calls and expectations. I don't know all of them off hand and am still digging thru 7.8 testing finding odd bugs, but I will work with the team tomorrow to fill in all of these with haste. we don't have the api documented yet either, so we'll get specifics for this and all similar requests in the branch

Contributor

The re-enroll call is exactly the same as it was before, and the asserts are the same, except that we can also check the timestamps on the metricbeat and filebeat files to see that they are newer. Newer than exactly what, I'm not 100% sure (there is some period where the Agent is in a state of transition). We could put in a short pause, wait for it to finish unenrolling, then capture that time and use it in the next step?
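
One way to express the "newer than the captured time" check is to compare each file's modification time against a reference captured after the un-enroll has settled. This is only a sketch; `modified_after` is a hypothetical helper and the reference is assumed to be epoch seconds, for example from `time.time()`:

```python
import os


def modified_after(path, reference_epoch_seconds):
    """True if the file's modification time is strictly later than the reference."""
    return os.path.getmtime(path) > reference_epoch_seconds


def all_modified_after(paths, reference_epoch_seconds):
    """True only if every listed file was modified after the reference."""
    return all(modified_after(p, reference_epoch_seconds) for p in paths)
```

A scenario would record the reference once the pause after un-enrolling has elapsed, re-enroll and run the agent, and then call `all_modified_after` on the metricbeat and filebeat files.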

@apmmachine
Contributor

apmmachine commented Jun 3, 2020

💔 Tests Failed


Build stats

  • Build Cause: [Pull request #126 updated]

  • Start Time: 2020-06-08T16:53:20.697+0000

  • Duration: 19 min 29 sec

Test stats 🧪

Test Results

  • Failed: 1
  • Passed: 43
  • Skipped: 13
  • Total: 57

Test errors


  • Name: Initializing / Tests / Sanity checks / checkgherkinlint – pre_commit.lint

    • Age: 4
    • Duration: 0
    • Error Details: error

Log output (last 100 lines)

[2020-06-08T17:09:28.326Z] </testsuites>+ sed -e 's/^[ \t]*//; s#>.*failed$#>#g' outputs/TEST-metricbeat-vsphere
[2020-06-08T17:09:28.326Z] + grep -E '^<.*>$'
[2020-06-08T17:09:28.326Z] + exit 0
[2020-06-08T17:09:28.367Z] Recording test results
[2020-06-08T17:09:28.882Z] Archiving artifacts
[2020-06-08T17:09:30.277Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:09:30.277Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:09:30.277Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:09:30.277Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:09:29Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=5.7.12 variant=MySQL
[2020-06-08T17:09:36.815Z] <?xml version="1.0" encoding="UTF-8"?>
[2020-06-08T17:09:36.815Z] <testsuites name="main" tests="0" skipped="0" failures="0" errors="0" time="252.963730352">
[2020-06-08T17:09:36.815Z]   <testsuite name="The Helm chart is following product recommended configuration for Kubernetes" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:09:36.815Z]   <testsuite name="The Helm chart is following product recommended configuration for Kubernetes" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:09:36.815Z]   <testsuite name="The Helm chart is following product recommended configuration for Kubernetes" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:09:36.815Z] </testsuites>+ grep -E '^<.*>$'
[2020-06-08T17:09:36.815Z] + sed -e 's/^[ \t]*//; s#>.*failed$#>#g' outputs/TEST-helm-metricbeat
[2020-06-08T17:09:36.815Z] + exit 0
[2020-06-08T17:09:36.857Z] Recording test results
[2020-06-08T17:09:37.232Z] None of the test reports contained any result
[2020-06-08T17:09:37.246Z] Archiving artifacts
[2020-06-08T17:09:52.283Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:09:52.283Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:09:52.283Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:09:52.283Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:09:53.684Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:09:53.684Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:09:54.259Z] Pulling mysql (docker.elastic.co/integrations-ci/beats-mysql:mysql-8.0.13-1)...
[2020-06-08T17:09:54.831Z] mysql-8.0.13-1: Pulling from integrations-ci/beats-mysql
[2020-06-08T17:10:04.865Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:10:04.865Z] Creating metricbeat_mysql_1 ... 
[2020-06-08T17:10:28.265Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:10:28.265Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:10:28.265Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:10:28.265Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:10:27Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=8.0.13 variant=MySQL
[2020-06-08T17:10:50.268Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:10:50.268Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:10:50.268Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:10:50.268Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:10:51.670Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:10:51.670Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:10:52.242Z] Pulling mysql (docker.elastic.co/integrations-ci/beats-mysql:percona-5.7.24-1)...
[2020-06-08T17:10:52.817Z] percona-5.7.24-1: Pulling from integrations-ci/beats-mysql
[2020-06-08T17:10:59.419Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:10:59.419Z] Creating metricbeat_mysql_1 ... 
[2020-06-08T17:11:23.339Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:11:23.339Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:11:23.339Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:11:23.339Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:11:22Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=5.7.24 variant=Percona
[2020-06-08T17:11:45.334Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:11:45.334Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:11:45.334Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:11:45.334Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:11:46.732Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:11:46.732Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:11:47.309Z] Pulling mysql (docker.elastic.co/integrations-ci/beats-mysql:percona-8.0.13-4-1)...
[2020-06-08T17:11:48.256Z] percona-8.0.13-4-1: Pulling from integrations-ci/beats-mysql
[2020-06-08T17:11:56.415Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:11:56.415Z] Creating metricbeat_mysql_1 ... 
[2020-06-08T17:12:20.971Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:12:20.971Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:12:20.971Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:12:20.971Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:12:20Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=8.0.13-4 variant=Percona
[2020-06-08T17:12:42.954Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:12:42.954Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:12:42.954Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:12:42.954Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:12:45.503Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:12:45.503Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:12:45.765Z] Stopping metricbeat_elasticsearch_1 ... 
[2020-06-08T17:12:46.709Z] 
Stopping metricbeat_elasticsearch_1 ... done
Removing metricbeat_elasticsearch_1 ... 
[2020-06-08T17:12:46.709Z] 
Removing metricbeat_elasticsearch_1 ... done
Removing network metricbeat_default
[2020-06-08T17:12:46.710Z] <?xml version="1.0" encoding="UTF-8"?>
[2020-06-08T17:12:46.710Z] <testsuites name="main" tests="7" skipped="0" failures="0" errors="0" time="443.881419428">
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that default configuration works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the Apache module works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the MySQL module works as expected" tests="7" skipped="0" failures="0" errors="0" time="395.220570912">
[2020-06-08T17:12:46.710Z]     <testcase name="Check MariaDB-10.2.23 is sending metrics to Elasticsearch without errors" status="passed" time="58.907072381"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MariaDB-10.3.14 is sending metrics to Elasticsearch without errors" status="passed" time="49.532053793"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MariaDB-10.4.4 is sending metrics to Elasticsearch without errors" status="passed" time="52.041878602"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MySQL-5.7.12 is sending metrics to Elasticsearch without errors" status="passed" time="53.058912659"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MySQL-8.0.13 is sending metrics to Elasticsearch without errors" status="passed" time="54.354813698"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check Percona-5.7.24 is sending metrics to Elasticsearch without errors" status="passed" time="51.521304203"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check Percona-8.0.13-4 is sending metrics to Elasticsearch without errors" status="passed" time="53.670707572"></testcase>
[2020-06-08T17:12:46.710Z]   </testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the Redis module works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the vSphere module works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z] </testsuites>+ sed -e 's/^[ \t]*//; s#>.*failed$#>#g' outputs/TEST-metricbeat-mysql
[2020-06-08T17:12:46.710Z] + grep -E '^<.*>$'
[2020-06-08T17:12:46.710Z] + exit 0
[2020-06-08T17:12:46.749Z] Recording test results
[2020-06-08T17:12:47.221Z] Archiving artifacts
[2020-06-08T17:12:48.468Z] Stage "Release" skipped due to when conditional
[2020-06-08T17:12:48.783Z] Running on worker-854309 in /var/lib/jenkins/workspace/stack_e2e-testing-mbp_PR-126
[2020-06-08T17:12:48.822Z] [INFO] getVaultSecret: Getting secrets
[2020-06-08T17:12:48.883Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-06-08T17:12:50.782Z] + chmod 755 generate-build-data.sh
[2020-06-08T17:12:50.782Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7 UNSTABLE 1168686
[2020-06-08T17:12:50.782Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7/steps/?limit=10000 -o steps-info.json
[2020-06-08T17:12:51.485Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7/tests/?status=FAILED -o tests-errors.json
[2020-06-08T17:12:52.188Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7/log/ -o pipeline-log.txt

And the "Fleet" Kibana setup has been created
When the agent binary is installed in the target host
Then the dashboards for the agent are present in Elasticsearch
And the agent shows up in Kibana
Contributor Author

@mdelapenya mdelapenya Jun 3, 2020

Is it possible to get this without checking the UI, maybe an API call? I'd like to avoid any UI/DOM interaction if possible

Contributor

Yes it is. I was using very 'loose' language: 'shows up' and 'in Kibana' can be translated to the API as:
Request URL, GET: /api/ingest_manager/fleet/agents?page=1&perPage=20&showInactive=false
Presuming there were zero agents when we started, there should be one item in the list[] that is returned. Response snippet we can use to assert:
{
  "list": [
    {
      "id": "0a17686e-40c5-4a81-86ae-fb41ddd7ea96",
      "active": true,
      "config_id": "f1a077d0-a688-11ea-b905-bd56f880a400",
      "type": "PERMANENT",
      "enrolled_at": "2020-06-04T18:10:49.376Z",
      "user_provided_metadata": {},
      "local_metadata": {},
      "access_api_key_id": "m7SHgHIBm78rI0UKTW-D",
      "current_error_events": [],
      "last_checkin": "2020-06-04T18:34:30.949Z",
      "config_revision": 3,
      "status": "online"
    }
  ],
  "success": true,
  "total": 1,
  "page": 1,
  "perPage": 20
}

I suggest we check only that the ID exists and that the current_error_events[] list is empty.
Asserting status: 'online' would be good, but be aware that the status is likely to be 'error' after the agent is enrolled but before it is 'run'.
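
A sketch of that assertion against the parsed response. `agent_list_failures` is a hypothetical helper; it deliberately does not assert on `status`, given the 'error' window described above, and it assumes the test started with zero agents:

```python
def agent_list_failures(response, expected_total=1):
    """Return a list of failures for the agents response; empty means success."""
    agents = response.get("list", [])
    failures = []
    if len(agents) != expected_total:
        failures.append(f"expected {expected_total} agent(s), got {len(agents)}")
    for agent in agents:
        if not agent.get("id"):
            failures.append("agent entry has no id")
        if agent.get("current_error_events"):
            failures.append(f"agent {agent.get('id')} reports error events")
    return failures


sample = {
    "list": [
        {
            "id": "0a17686e-40c5-4a81-86ae-fb41ddd7ea96",
            "current_error_events": [],
            "status": "online",
        }
    ],
    "total": 1,
}
print(agent_list_failures(sample))
```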

Member

You can call GET /api/ingest_manager/fleet/agents

@nchaulet
Member

nchaulet commented Jun 4, 2020

Putting this here, I think it could help you later when implementing the steps.
https://github.com/elastic/kibana/blob/master/x-pack/test/api_integration/apis/fleet/agent_flow.ts

Scenario: Un-enrolling an agent
Given an agent is deployed to Fleet
When the agent is un-enrolled
Then the agent is not listed as online in Fleet
Contributor Author

Hey @EricDavisX, could we reuse here the step the agent is listed in Fleet as "online"?

Therefore we would have:

the agent is listed in Fleet as "online"
the agent is listed in Fleet as "offline"

which would be one single step. wdyt?
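
Sketched in Python, a single step with the status as its parameter could look like this. The names are hypothetical; the comparison uses the `status` field seen in the agents response earlier in this conversation, and whether "offline" is a value the API actually returns is an assumption taken from the step wording:

```python
import re

# One pattern for both 'listed in Fleet as "online"' and '... as "offline"'.
LISTED_AS = re.compile(r'^the agent is listed in Fleet as "(?P<status>\w+)"$')


def agent_is_listed_as(step_text, agents):
    """Return True if any agent in the parsed list has the status named in the step."""
    match = LISTED_AS.match(step_text)
    if match is None:
        raise ValueError(f"step does not match: {step_text!r}")
    expected = match.group("status")
    return any(agent.get("status") == expected for agent in agents)


agents = [{"id": "0a17686e", "status": "online"}]
print(agent_is_listed_as('the agent is listed in Fleet as "online"', agents))
print(agent_is_listed_as('the agent is listed in Fleet as "offline"', agents))
```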

Scenario: Re-enrolling an agent
Given an agent is enrolled
And the agent is un-enrolled
And the Agent is stopped on the host
Contributor Author

Is this one automatically inferred from un-enrolling the agent, or must be done as a separate action?

If the latter, I would keep it as is, although from the user's perspective it seems like more (unproductive?) work.

When the agent is re-enrolled on the host
And the agent is run on the host
Then the agent is listed in Fleet as online
And new documents are inserted into Elasticsearch
Contributor Author

I'd like to abstract this step to a more product-related level, as I see it very technical.

Contributor

what about something like

index `xxx` is created 
And index `xxx` has more than 123 documents

Contributor Author

I like it! What if the number of documents is not there after an amount of time (minutes)?

Contributor

yes exactly this should fail then
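
The "fail if the documents are not there after some time" behaviour amounts to polling with a deadline. A minimal sketch, with a hypothetical `wait_until` helper and a caller-supplied condition (for instance, a function that fetches the document count and compares it with the threshold):

```python
import time


def wait_until(condition, timeout_seconds, interval_seconds=1.0):
    """Poll `condition` until it returns True or the timeout elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)


# Simulated document counts standing in for successive queries against the index.
counts = iter([10, 50, 200])
reached = wait_until(lambda: next(counts) > 123, timeout_seconds=5, interval_seconds=0.01)
print(reached)
```

The step would then fail the scenario whenever `wait_until` returns False.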

Comment on lines 47 to 48
Then the agent is un-enrolled
And the agent is stopped on the host
Contributor Author

Scenario: Stopping the agent stops backend processes
Given an agent is deployed to Fleet
When the agent is stopped on the host
Then filebeat is stopped
Contributor

We would probably need something more like "Then there are '2' metricbeat processes", as we will need to check both the monitoring and the ingesting beats.
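
Counting processes by name can be sketched as a pure function over a list of command lines. How that list is obtained from the host is left out, and `count_processes` is a hypothetical helper; the example paths and flags are made up for illustration:

```python
import os
import shlex


def count_processes(command_lines, name):
    """Count command lines whose executable's base name equals `name`."""
    count = 0
    for line in command_lines:
        parts = shlex.split(line)
        if parts and os.path.basename(parts[0]) == name:
            count += 1
    return count


# Illustrative process list: two metricbeat and two filebeat processes.
running = [
    "/opt/agent/metricbeat -c ingest.yml",
    "/opt/agent/metricbeat -c monitoring.yml",
    "/opt/agent/filebeat -c ingest.yml",
    "/opt/agent/filebeat -c monitoring.yml",
]
print(count_processes(running, "metricbeat"))
```

With this shape, "filebeat is stopped" becomes a count of 0 and the started case a count of 2, so one parametrized step could cover both.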

@EricDavisX
Contributor

This is so great - moving fast and getting better! I agree with Michal: we could enhance the 'stopped on the host' step to indicate more accurately that the Agent (with defaults set) will start 2 Metricbeat and 2 Filebeat processes. Sorry, I forgot that nuance. Still, for the first version we can leave it as implied and handle it in the implementation assertion (and correct it in a coming PR).

  • I say that so we can truly iterate on only what is most critical to getting a test in and running fast.

@EricDavisX
Contributor

quick comment on the step:
And new documents are inserted into Elasticsearch

@mdelapenya
mdelapenya 7 hours ago Author Member
I'd like to abstract this step to a more product-related level, as I see it very technical.

@michalpristas
michalpristas 28 minutes ago Member
what about something like

index xxx is created
And index xxx has more than 123 documents
@mdelapenya
mdelapenya 24 minutes ago Author Member
I like it! What if the number of documents is not there after an amount of time (minutes)?

from @EricDavisX: I don't mind any rework we want to do in elaborating this. I would like to suggest we keep it really stable and simple, however, and I don't know if a given number of documents over a given amount of time would be. The Filebeat / Metricbeat info sent is based on host VM activity, right? I suggest that if we have control over the environment and agents, we should be able to wait seconds (not minutes) and confirm changes in the docs the Agent is sending in. It might be good to take this offline and discuss in a quick call, if we have this and any other 'final' items to settle before we can get further into the implementation.

@mdelapenya
Contributor Author

Cool! Let's discuss the specific implementation details in a follow-up iteration. For now I'd keep that step as "And there is data for Ingest Manager in the index". How it is checked is then an implementation detail.

@EricDavisX
Contributor

that sounds great to me. thanks Manu

@mdelapenya mdelapenya marked this pull request as ready for review June 8, 2020 16:19
@mdelapenya
Contributor Author

@EricDavisX @michalpristas I think we are on the right track! Please let me know if the requirements are ready to be merged, so I can continue with the implementation.

Thanks!

Contributor

@EricDavisX EricDavisX left a comment

I love this - it looks ready to merge and we can iterate on it.

e2e/_suites/ingest-manager/features/ingest-manager.feature Outdated Show resolved Hide resolved
And the "Fleet" Kibana setup has been created
And the agent binary is installed in the target host
When the agent is un-enrolled from Kibana
Then no new data shows up in Elasticsearc locations using the enrollment token
Contributor

I'm not sure I would phrase it as 'using' the enrollment token, but it's not entirely wrong. I'd phrase it as: the host / agent is no longer able to send documents into ES (it will still be attempting to send them, running on the host).

e2e/_suites/ingest-manager/features/ingest-manager.feature
And the agent is un-enrolled from Kibana
When the agent is re-enrolled from the host
And the agent runs from the host
Then the agent shows up in Kibana
Contributor

We can absolutely get you the API calls and expectations. I don't know all of them offhand and am still digging through 7.8 testing finding odd bugs, but I will work with the team tomorrow to fill in all of these with haste. We don't have the API documented yet either, so we'll get specifics for this and all similar requests in the branch.

e2e/_suites/ingest-manager/features/ingest-manager.feature
Given there is a "Fleet" user in Kibana
And the "Fleet" Kibana setup has been created
And the agent binary is installed in the target host
When the agent is un-enrolled from Kibana
Contributor

I forgot to mention that we'll have to manually terminate the shell / process running on the host as part of the 'tear down' of this scenario, in order to test the re-enrolling and re-starting of the Agent.

e2e/_suites/ingest-manager/features/ingest-manager.feature
And the "Fleet" Kibana setup has been created
And the agent binary is installed in the target host
When the agent is un-enrolled from Kibana
Then no new data shows up in Elasticsearc locations using the enrollment token
Contributor

Thanks so much Nicolas and Manu, I'm learning here too! Knowing now what I do, I'd suggest we really only have one distinct case to test, and I'd phrase it as:

Scenario: Revoking an enrollment token 
  Given the Fleet user is set up and a valid enrollment token exists
  When the enrollment token is revoked
  Then an attempt to enroll a new agent fails

The prerequisite for the test changes such that the agent is NOT running and is NOT already enrolled.
@mdelapenya what do you think? Honestly, if you can get us the first, more straightforward case, I'm happy to work this with the code snippets we have and the infrastructure you provide. We need not stress about completing this one case now; the team is fine to take it over.
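To make the expected behaviour of that scenario concrete, here is a small Go sketch run against an in-memory fake. Everything in it (fakeFleet, revoke, enroll, errTokenRevoked) is hypothetical and only models the Given/When/Then; the real steps would go through the Kibana Fleet API, whose calls are not documented in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errTokenRevoked is the error the fake returns for an unknown or revoked token.
var errTokenRevoked = errors.New("enrollment token is not valid")

// fakeFleet is a hypothetical in-memory stand-in for Fleet.
type fakeFleet struct {
	tokens map[string]bool // enrollment token -> still valid
	agents []string        // names of enrolled agents
}

// newFakeFleet creates a fake with the given valid enrollment tokens.
func newFakeFleet(tokens ...string) *fakeFleet {
	f := &fakeFleet{tokens: map[string]bool{}}
	for _, t := range tokens {
		f.tokens[t] = true
	}
	return f
}

// revoke marks an enrollment token as no longer valid.
func (f *fakeFleet) revoke(token string) {
	f.tokens[token] = false
}

// enroll registers an agent only if the token is known and still valid.
func (f *fakeFleet) enroll(agent, token string) error {
	if !f.tokens[token] {
		return errTokenRevoked
	}
	f.agents = append(f.agents, agent)
	return nil
}

func main() {
	// Given the Fleet user is set up and a valid enrollment token exists
	fleet := newFakeFleet("token-1")

	// When the enrollment token is revoked
	fleet.revoke("token-1")

	// Then an attempt to enroll a new agent fails
	err := fleet.enroll("new-agent", "token-1")
	fmt.Println("enroll failed:", err != nil, "enrolled agents:", len(fleet.agents))
}
```

Note that the fake starts with no enrolled agents, matching the prerequisite that the agent is not running and not already enrolled.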

And the "Fleet" Kibana setup has been created
And the agent binary is installed in the target host
When the agent is un-enrolled from Kibana
Then no new data shows up in Elasticsearc locations using the enrollment token
Contributor

updated:
a query you can use is as follows:
query the metrics* index and hit the equivalent of KQL:
host.name:"7exl-w10x64l6-d" and @timestamp >= "2020-06-06T01:30:00.948Z"
where the hostname is replaced correctly and the timestamp in question is captured 2 seconds after the unenroll call.

Translated into an ES query (forgive me if this is terrible, it's a hacked version from Dev Tools and I didn't take the time to rework it much):

  • the same find/replace of the hostname and timestamp values is needed, of course:

GET _search
{
  "version": true,
  "size": 500,
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "system.process.cpu.start_time",
      "format": "date_time"
    },
    {
      "field": "system.service.state_since",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "host.name": "7exl-w10x64l6-d"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "2020-06-06T01:50:00.948Z",
                          "time_zone": "America/New_York"
                        }
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-06T01:36:29.564Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
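Since the hostname and timestamp have to be swapped in for every run, the find/replace could be done in code. The Go sketch below is an assumption-laden simplification, not the query above verbatim: it keeps only the host.name match and a single @timestamp range, and the names postUnenrollQuery and sinceAfterUnenroll are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// sinceAfterUnenroll returns the lower bound for the range filter: the moment
// of the un-enroll call plus two seconds, as a UTC timestamp with milliseconds.
func sinceAfterUnenroll(unenrolledAt time.Time) string {
	return unenrolledAt.Add(2 * time.Second).UTC().Format("2006-01-02T15:04:05.000Z07:00")
}

// postUnenrollQuery builds a pared-down search body that matches documents
// from the given host with @timestamp >= since. It drops the docvalue_fields
// and the duplicated range filter from the hand-written query.
func postUnenrollQuery(hostname, since string) (string, error) {
	body := map[string]interface{}{
		"size": 500,
		"query": map[string]interface{}{
			"bool": map[string]interface{}{
				"filter": []interface{}{
					map[string]interface{}{
						"match_phrase": map[string]interface{}{"host.name": hostname},
					},
					map[string]interface{}{
						"range": map[string]interface{}{
							"@timestamp": map[string]interface{}{"gte": since},
						},
					},
				},
			},
		},
	}
	raw, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	// Hypothetical un-enroll time; in a real run this is captured when the call is made.
	unenrolledAt := time.Date(2020, time.June, 6, 1, 29, 58, 948000000, time.UTC)

	body, err := postUnenrollQuery("7exl-w10x64l6-d", sinceAfterUnenroll(unenrolledAt))
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The resulting body would be sent to the metrics* index; zero hits after the un-enroll timestamp is the condition the step is checking for.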

@mdelapenya mdelapenya merged commit 66a4a28 into elastic:master Jun 8, 2020
@mdelapenya
Contributor Author

@EricDavisX @michalpristas merged!

I'm going to send a PR with the Go code scaffolding, so please feel free to contribute to it in the way you prefer
