feat: initial specs for ingest management #126

mdelapenya · 2020-06-03T18:41:27Z

What does this PR do?

It adds the initial specs for the Ingest management project.

Why is it important?

We should start a discussion around them to make them perfect and totally understandable by anybody in the team: product owners, developers, testers, consumers, etc.

Related issues

Relates to: Create a PoC for ingest management #124

mdelapenya · 2020-06-03T18:43:07Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+  Then filebeat is started
+    And metricbeat is started
+    And endpoint is started


The BDD step is the same, so we could write just one implementation method, with an input parameter (the process to be present in the target)
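A minimal sketch of that idea, using only the Go standard library: one regular expression captures the process name, and a single function serves every "<process> is started" step. The `runningProcesses` map and the `runStep` helper are hypothetical stand-ins; a real suite would register the pattern with the BDD runner and inspect the target host instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// startedStep matches "filebeat is started", "metricbeat is started", etc.
// and captures the process name as the step's single parameter.
var startedStep = regexp.MustCompile(`^(\w+) is started$`)

// runningProcesses is a hypothetical stand-in for whatever inspects the
// target host; a real implementation would query the host instead.
var runningProcesses = map[string]bool{"filebeat": true, "metricbeat": true}

// processIsStarted is the single implementation shared by every
// "<process> is started" step.
func processIsStarted(process string) error {
	if !runningProcesses[process] {
		return fmt.Errorf("process %q is not running in the target host", process)
	}
	return nil
}

// runStep extracts the process name from the step text and delegates to
// processIsStarted.
func runStep(text string) error {
	m := startedStep.FindStringSubmatch(text)
	if m == nil {
		return fmt.Errorf("no step definition matches %q", text)
	}
	return processIsStarted(m[1])
}

func main() {
	steps := []string{"filebeat is started", "metricbeat is started", "endpoint is started"}
	for _, step := range steps {
		fmt.Printf("%s: %v\n", step, runStep(step))
	}
}
```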

mdelapenya · 2020-06-03T18:45:16Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+    And the "Fleet" Kibana setup has been created
+    And the agent binary is installed in the target host
+  When the agent is un-enrolled from Kibana
+  Then no new data shows up in Elasticsearc locations using the enrollment token


I added "using the enrollment token" to match an existing step below. Is this assumption correct?

I'm not sure I would phrase it as 'using' the enrollment token, but it's not entirely wrong. I'd phrase it as: the host / agent is no longer able to send documents into ES (it will still be running on the host and attempting to send them).

Here I think you should say "using the access token": when an agent enrolls into Fleet, we exchange an enrollment token for an access token (one per agent).
Once you invalidate an enrollment token, the agents already enrolled should continue to work, but you cannot enroll more agents with that enrollment token.
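To make the token semantics concrete, here is a toy in-memory model written for this thread. It is not the Fleet API: the `fleet` type, its methods and the token strings are all hypothetical, and it only illustrates that revoking an enrollment token blocks new enrollments while access tokens already issued keep working.

```go
package main

import (
	"errors"
	"fmt"
)

// fleet is a toy in-memory model, not the real Fleet API.
type fleet struct {
	revoked map[string]bool   // revoked enrollment tokens
	access  map[string]string // access token -> agent ID, one per agent
}

func newFleet() *fleet {
	return &fleet{revoked: map[string]bool{}, access: map[string]string{}}
}

// enroll exchanges an enrollment token for a per-agent access token.
func (f *fleet) enroll(enrollmentToken, agentID string) (string, error) {
	if f.revoked[enrollmentToken] {
		return "", errors.New("enrollment token has been revoked")
	}
	accessToken := "access-" + agentID
	f.access[accessToken] = agentID
	return accessToken, nil
}

// revoke invalidates an enrollment token; issued access tokens are untouched.
func (f *fleet) revoke(enrollmentToken string) {
	f.revoked[enrollmentToken] = true
}

// checkIn reports whether an agent holding accessToken can still talk to Fleet.
func (f *fleet) checkIn(accessToken string) bool {
	_, ok := f.access[accessToken]
	return ok
}

func main() {
	f := newFleet()
	token, _ := f.enroll("enroll-1", "agent-a")
	f.revoke("enroll-1")
	_, err := f.enroll("enroll-1", "agent-b")
	fmt.Println("existing agent still works:", f.checkIn(token))
	fmt.Println("new enrollment rejected:", err != nil)
}
```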

Thanks for the clarification Nicolas! Please look at L27:33. There is a specific scenario for revoking the enrollment token for an agent. Is that what you mean?

Mmm, reading your comment, I'd rephrase this second scenario (the one revoking the token) to this:

Scenario: Revoking the enrollment token for an agent
  Given there is a "Fleet" user in Kibana
    And the "Fleet" Kibana setup has been created
    And the agent binary is installed in the target host
    And the agent is un-enrolled from Kibana
  When the enrollment token is revoked
  Then no new data shows up in Elasticsearch locations using the enrollment token
    And the enrolled agent continues to work

And I'd create another use case:

Scenario: A revoked enrollment token cannot enroll more agents
  Given there is an enrollment token
  When the enrollment token is revoked
  Then it's not possible to use the token to enroll more agents

Does it make sense to you?

BTW, we should clarify what "the enrolled agent continues to work" means: e.g. it sends data to Elasticsearch, there is an endpoint we can query, a process is running in the host, etc.

Combining above two scenarios into one:

Scenario: Revoking the enrollment token for an agent
  Given there is an agent enrolled with an enrollment token
  When the enrollment token is revoked
  Then it's not possible to use the token to enroll more agents
    And the enrolled agent continues to work

Thanks so much Nicolas and Manu, I'm learning here too! Knowing what I do now, I'd suggest we really only have one distinct case to test, and I'd phrase it as:

Scenario: Revoking an enrollment token
  Given the Fleet user is set up and a valid enrollment token exists
  When the enrollment token is revoked
  Then an attempt to enroll a new agent fails

The prerequisite for the test changes such that the agent is NOT running and is NOT already enrolled.
@mdelapenya what do you think? Honestly, if you can get us the first, more straightforward case, I'm happy to work on this one with the code snippets we have and the infrastructure you provide. We need not stress about completing this one case now; the team is fine to take it over.

I like this scenario, because it's very straightforward and simple at the same time. I'd replace what we had. wdyt about rephrasing the Given... to Given an agent is enrolled? Or do we want to make it clear for this scenario that we need the Fleet user and the existence of a valid token?

BTW, in what state would the existing agent be? Will it pause? Will it continue to send data?

mdelapenya · 2020-06-03T18:48:27Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+  Given there is a "Fleet" user in Kibana
+    And the "Fleet" Kibana setup has been created
+  When the agent binary is installed in the target host
+  Then the dashboards for the agent are present in Elasticsearch


I'd like to know the exact data needed here: the ES query

the command to run the agent is:
./elastic-agent run

after this command is executed, we can wait a matter of seconds (5-20 seconds?) and then verify the existence of certain folders / data on the host as evidence of it working.
The logs we can check for are relative to the path where the agent was installed, so it would be, for example with a 7.8 agent:
elastic-agent-7.8.0-darwin-x86_64-BC5/data/logs/default/filebeat
elastic-agent-7.8.0-darwin-x86_64-BC5/data/logs/default/metricbeat

and from here:
elastic-agent-7.8.0-darwin-x86_64-BC5/data/run/default/metricbeat--7.8.0/meta.json

any non-empty file will suffice for all 3 assertions

And for the Dashboards, let's actually use the Kibana API, specifically the Ingest Manager one, to assess this:
/api/ingest_manager/data_streams

if you call it prior to any Agent being deployed it should return a list of zero data streams as:
{
  "data_streams": []
}

when called after the Agent is running, it will return a list of (currently in 7.8) 20 streams, with a format as:
{
  "data_streams": [
    {},
    {
      "index": "metrics-system.load-default",
      "dataset": "system.load",
      "namespace": "default",
      "type": "metrics",
      "package": "system",
      "package_version": "0.1.0",
      "last_activity": "2020-06-04T18:59:29.693Z",
      "size_in_bytes": 42605308,
      "dashboards": [
        {
          "id": "79ffd6e0-faa0-11e6-947f-177f697178b8-ecs",
          "title": "[Metrics System] Host overview ECS"
        },
        ...
        {
          "id": "5517a150-f9ce-11e6-8115-a7c18106d86a-ecs",
          "title": "[Logs System] SSH login attempts ECS"
        },
        {
          "id": "Filebeat-syslog-dashboard-ecs",
          "title": "[Logs System] Syslog dashboard ECS"
        }
      ]
    },
    ...
    {},
    {}
  ]
}

Let's assert the following:

the data_streams call returns more than 1 element in its list.

the data_streams call returns a list element with an "index" of "metrics-system.process-default"

the list element with "index": "metrics-system.process-default" has a sibling list called 'dashboards'

the 'dashboards' list has an element with a title of "[Metrics System] Host overview ECS"

I don't think we should walk the whole list here: I understand there is separate automation to confirm it, and walking it would make the test brittle to changes. How does that sound?
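The assertions above could be sketched in Go as follows. The struct types model only the fields the assertions touch, the function name is hypothetical, and the sample body in `main` is hand-written and abbreviated rather than captured from a real Kibana.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type dashboard struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

type dataStream struct {
	Index      string      `json:"index"`
	Dashboards []dashboard `json:"dashboards"`
}

type dataStreamsResponse struct {
	DataStreams []dataStream `json:"data_streams"`
}

// checkDataStreams verifies that the response lists more than one data stream
// and that the stream named index has a dashboard with the given title.
func checkDataStreams(body []byte, index, title string) error {
	var resp dataStreamsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decoding response: %w", err)
	}
	if len(resp.DataStreams) <= 1 {
		return fmt.Errorf("expected more than 1 data stream, got %d", len(resp.DataStreams))
	}
	for _, ds := range resp.DataStreams {
		if ds.Index != index {
			continue
		}
		for _, d := range ds.Dashboards {
			if d.Title == title {
				return nil
			}
		}
		return fmt.Errorf("data stream %q has no dashboard titled %q", index, title)
	}
	return fmt.Errorf("no data stream with index %q", index)
}

func main() {
	// Abbreviated, hand-written sample body for illustration only.
	body := []byte(`{"data_streams":[
		{"index":"metrics-system.load-default","dashboards":[]},
		{"index":"metrics-system.process-default","dashboards":[
			{"id":"79ffd6e0-faa0-11e6-947f-177f697178b8-ecs","title":"[Metrics System] Host overview ECS"}]}]}`)
	err := checkDataStreams(body, "metrics-system.process-default", "[Metrics System] Host overview ECS")
	fmt.Println("assertion error:", err)
}
```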

mdelapenya · 2020-06-03T18:49:30Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+    And the "Fleet" Kibana setup has been created
+    And the agent binary is installed in the target host
+  When the agent is un-enrolled from Kibana
+  Then no new data shows up in Elasticsearc locations using the enrollment token


What data is not present here? It'd be great to understand more about its nature to identify when it shows up and when it does not.

updated:
a query you can use is as follows:
query the metrics* index with the equivalent of this KQL:
host.name:"7exl-w10x64l6-d" and @timestamp >= "2020-06-06T01:30:00.948Z"
where the hostname is replaced correctly and the timestamp in question is captured 2 seconds after the unenroll call.

translated into an ES query (forgive me if this is terrible, it's a hacked version from dev tools and I didn't take the time to re-work it much):

the same find/replace of the hostname and timestamp values is needed, of course:

GET _search
{
  "version": true,
  "size": 500,
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "system.process.cpu.start_time",
      "format": "date_time"
    },
    {
      "field": "system.service.state_since",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "host.name": "7exl-w10x64l6-d"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "2020-06-06T01:50:00.948Z",
                          "time_zone": "America/New_York"
                        }
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-06T01:36:29.564Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

This query is perfect! :)

mdelapenya · 2020-06-03T18:50:05Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+    And the agent is un-enrolled from Kibana
+  When the agent is re-enrolled from the host
+    And the agent runs from the host
+  Then the agent shows up in Kibana


We will need the exact thing to check here: an API call, an XPath element in the UI...

We can absolutely get you the API calls and expectations. I don't know all of them offhand and am still digging through 7.8 testing, finding odd bugs, but I will work with the team tomorrow to fill in all of these with haste. We don't have the API documented yet either, so we'll get specifics for this and all similar requests in the branch.

The re-enroll call is exactly the same as it was prior, and the asserts are the same, with the exception that we can check the timestamps on the metricbeat and filebeat files to see that they are newer. Newer than exactly what, I'm not 100% sure (there is some period where the Agent is in a state of transition). We could put in a short pause, wait for it to finish unenrolling, then capture that time and use it in the next step?

apmmachine · 2020-06-03T19:02:01Z

💔 Tests Failed

Build stats

Build Cause: [Pull request #126 updated]
Start Time: 2020-06-08T16:53:20.697+0000
Duration: 19 min 29 sec

Test stats 🧪

Test	Results
Failed	1
Passed	43
Skipped	13
Total	57

Test errors

Name: Initializing / Tests / Sanity checks / checkgherkinlint – pre_commit.lint
- Age: 4
- Duration: 0
- Error Details: error

Log output (last 100 lines)

[2020-06-08T17:09:28.326Z] </testsuites>+ sed -e 's/^[ \t]*//; s#>.*failed$#>#g' outputs/TEST-metricbeat-vsphere
[2020-06-08T17:09:28.326Z] + grep -E '^<.*>$'
[2020-06-08T17:09:28.326Z] + exit 0
[2020-06-08T17:09:28.367Z] Recording test results
[2020-06-08T17:09:28.882Z] Archiving artifacts
[2020-06-08T17:09:30.277Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:09:30.277Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:09:30.277Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:09:30.277Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:09:29Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=5.7.12 variant=MySQL
[2020-06-08T17:09:36.815Z] <?xml version="1.0" encoding="UTF-8"?>
[2020-06-08T17:09:36.815Z] <testsuites name="main" tests="0" skipped="0" failures="0" errors="0" time="252.963730352">
[2020-06-08T17:09:36.815Z]   <testsuite name="The Helm chart is following product recommended configuration for Kubernetes" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:09:36.815Z]   <testsuite name="The Helm chart is following product recommended configuration for Kubernetes" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:09:36.815Z]   <testsuite name="The Helm chart is following product recommended configuration for Kubernetes" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:09:36.815Z] </testsuites>+ grep -E '^<.*>$'
[2020-06-08T17:09:36.815Z] + sed -e 's/^[ \t]*//; s#>.*failed$#>#g' outputs/TEST-helm-metricbeat
[2020-06-08T17:09:36.815Z] + exit 0
[2020-06-08T17:09:36.857Z] Recording test results
[2020-06-08T17:09:37.232Z] None of the test reports contained any result
[2020-06-08T17:09:37.246Z] Archiving artifacts
[2020-06-08T17:09:52.283Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:09:52.283Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:09:52.283Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:09:52.283Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:09:53.684Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:09:53.684Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:09:54.259Z] Pulling mysql (docker.elastic.co/integrations-ci/beats-mysql:mysql-8.0.13-1)...
[2020-06-08T17:09:54.831Z] mysql-8.0.13-1: Pulling from integrations-ci/beats-mysql
[2020-06-08T17:10:04.865Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:10:04.865Z] Creating metricbeat_mysql_1 ... 
[2020-06-08T17:10:28.265Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:10:28.265Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:10:28.265Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:10:28.265Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:10:27Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=8.0.13 variant=MySQL
[2020-06-08T17:10:50.268Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:10:50.268Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:10:50.268Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:10:50.268Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:10:51.670Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:10:51.670Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:10:52.242Z] Pulling mysql (docker.elastic.co/integrations-ci/beats-mysql:percona-5.7.24-1)...
[2020-06-08T17:10:52.817Z] percona-5.7.24-1: Pulling from integrations-ci/beats-mysql
[2020-06-08T17:10:59.419Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:10:59.419Z] Creating metricbeat_mysql_1 ... 
[2020-06-08T17:11:23.339Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:11:23.339Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:11:23.339Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:11:23.339Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:11:22Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=5.7.24 variant=Percona
[2020-06-08T17:11:45.334Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:11:45.334Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:11:45.334Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:11:45.334Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:11:46.732Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:11:46.732Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:11:47.309Z] Pulling mysql (docker.elastic.co/integrations-ci/beats-mysql:percona-8.0.13-4-1)...
[2020-06-08T17:11:48.256Z] percona-8.0.13-4-1: Pulling from integrations-ci/beats-mysql
[2020-06-08T17:11:56.415Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:11:56.415Z] Creating metricbeat_mysql_1 ... 
[2020-06-08T17:12:20.971Z] 
Creating metricbeat_mysql_1 ... done
Found orphan containers (metricbeat_mysql_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
[2020-06-08T17:12:20.971Z] metricbeat_elasticsearch_1 is up-to-date
[2020-06-08T17:12:20.971Z] Creating metricbeat_metricbeat_1 ... 
[2020-06-08T17:12:20.971Z] 
Creating metricbeat_metricbeat_1 ... done
time="2020-06-08T17:12:20Z" level=info msg="Metricbeat is running configured for the service" metricbeatVersion=7.7.0 service=mysql serviceVersion=8.0.13-4 variant=Percona
[2020-06-08T17:12:42.954Z] Stopping metricbeat_metricbeat_1 ... 
[2020-06-08T17:12:42.954Z] 
Stopping metricbeat_metricbeat_1 ... done
Removing metricbeat_metricbeat_1 ... 
[2020-06-08T17:12:42.954Z] 
Removing metricbeat_metricbeat_1 ... done
Going to remove metricbeat_metricbeat_1
[2020-06-08T17:12:42.954Z] Stopping metricbeat_mysql_1 ... 
[2020-06-08T17:12:45.503Z] 
Stopping metricbeat_mysql_1 ... done
Removing metricbeat_mysql_1 ... 
[2020-06-08T17:12:45.503Z] 
Removing metricbeat_mysql_1 ... done
Going to remove metricbeat_mysql_1
[2020-06-08T17:12:45.765Z] Stopping metricbeat_elasticsearch_1 ... 
[2020-06-08T17:12:46.709Z] 
Stopping metricbeat_elasticsearch_1 ... done
Removing metricbeat_elasticsearch_1 ... 
[2020-06-08T17:12:46.709Z] 
Removing metricbeat_elasticsearch_1 ... done
Removing network metricbeat_default
[2020-06-08T17:12:46.710Z] <?xml version="1.0" encoding="UTF-8"?>
[2020-06-08T17:12:46.710Z] <testsuites name="main" tests="7" skipped="0" failures="0" errors="0" time="443.881419428">
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that default configuration works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the Apache module works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the MySQL module works as expected" tests="7" skipped="0" failures="0" errors="0" time="395.220570912">
[2020-06-08T17:12:46.710Z]     <testcase name="Check MariaDB-10.2.23 is sending metrics to Elasticsearch without errors" status="passed" time="58.907072381"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MariaDB-10.3.14 is sending metrics to Elasticsearch without errors" status="passed" time="49.532053793"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MariaDB-10.4.4 is sending metrics to Elasticsearch without errors" status="passed" time="52.041878602"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MySQL-5.7.12 is sending metrics to Elasticsearch without errors" status="passed" time="53.058912659"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check MySQL-8.0.13 is sending metrics to Elasticsearch without errors" status="passed" time="54.354813698"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check Percona-5.7.24 is sending metrics to Elasticsearch without errors" status="passed" time="51.521304203"></testcase>
[2020-06-08T17:12:46.710Z]     <testcase name="Check Percona-8.0.13-4 is sending metrics to Elasticsearch without errors" status="passed" time="53.670707572"></testcase>
[2020-06-08T17:12:46.710Z]   </testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the Redis module works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z]   <testsuite name="As a Metricbeat developer I want to check that the vSphere module works as expected" tests="0" skipped="0" failures="0" errors="0" time="0"></testsuite>
[2020-06-08T17:12:46.710Z] </testsuites>+ sed -e 's/^[ \t]*//; s#>.*failed$#>#g' outputs/TEST-metricbeat-mysql
[2020-06-08T17:12:46.710Z] + grep -E '^<.*>$'
[2020-06-08T17:12:46.710Z] + exit 0
[2020-06-08T17:12:46.749Z] Recording test results
[2020-06-08T17:12:47.221Z] Archiving artifacts
[2020-06-08T17:12:48.468Z] Stage "Release" skipped due to when conditional
[2020-06-08T17:12:48.783Z] Running on worker-854309 in /var/lib/jenkins/workspace/stack_e2e-testing-mbp_PR-126
[2020-06-08T17:12:48.822Z] [INFO] getVaultSecret: Getting secrets
[2020-06-08T17:12:48.883Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-06-08T17:12:50.782Z] + chmod 755 generate-build-data.sh
[2020-06-08T17:12:50.782Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7 UNSTABLE 1168686
[2020-06-08T17:12:50.782Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7/steps/?limit=10000 -o steps-info.json
[2020-06-08T17:12:51.485Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7/tests/?status=FAILED -o tests-errors.json
[2020-06-08T17:12:52.188Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/stack/e2e-testing-mbp/PR-126/runs/7/log/ -o pipeline-log.txt

mdelapenya · 2020-06-03T22:51:38Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+    And the "Fleet" Kibana setup has been created
+  When the agent binary is installed in the target host
+  Then the dashboards for the agent are present in Elasticsearch
+    And the agent shows up in Kibana


Is it possible to get this without checking the UI, maybe an API call? I'd like to avoid any UI/DOM interaction if possible

Yes, it is. I was using very 'loose' language: 'shows up' and 'in Kibana' can be translated to the API as:
Request URL, GET: /api/ingest_manager/fleet/agents?page=1&perPage=20&showInactive=false
With the presumption that there were zero agents when we started, there should be one item in the list[] that is returned. Response snippet we can use to assert:
{
  "list": [
    {
      "id": "0a17686e-40c5-4a81-86ae-fb41ddd7ea96",
      "active": true,
      "config_id": "f1a077d0-a688-11ea-b905-bd56f880a400",
      "type": "PERMANENT",
      "enrolled_at": "2020-06-04T18:10:49.376Z",
      "user_provided_metadata": {},
      "local_metadata": {},
      "access_api_key_id": "m7SHgHIBm78rI0UKTW-D",
      "current_error_events": [],
      "last_checkin": "2020-06-04T18:34:30.949Z",
      "config_revision": 3,
      "status": "online"
    }
  ],
  "success": true,
  "total": 1,
  "page": 1,
  "perPage": 20
}

I suggest we check only that the ID exists and that the current_error_events[] list is empty.
The status: 'online' would be good to check too, but note that it is likely to be 'error' after the agent is enrolled but before it is 'run'; just be aware of that nuance.

You can call GET /api/ingest_manager/fleet/agents
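A sketch of that assertion against the response body, assuming there were zero agents before the test started. The struct types cover only the fields being checked, the function name is hypothetical, and the status is returned to the caller rather than asserted because of the 'error' nuance mentioned above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type agent struct {
	ID                 string            `json:"id"`
	Status             string            `json:"status"`
	CurrentErrorEvents []json.RawMessage `json:"current_error_events"`
}

type agentsResponse struct {
	List []agent `json:"list"`
}

// checkEnrolledAgent verifies that exactly one agent is listed, that it has an
// ID, and that it reports no error events. The status is not asserted.
func checkEnrolledAgent(body []byte) (agent, error) {
	var resp agentsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return agent{}, fmt.Errorf("decoding response: %w", err)
	}
	if len(resp.List) != 1 {
		return agent{}, fmt.Errorf("expected exactly 1 agent, got %d", len(resp.List))
	}
	a := resp.List[0]
	if a.ID == "" {
		return agent{}, errors.New("agent has no ID")
	}
	if len(a.CurrentErrorEvents) != 0 {
		return agent{}, fmt.Errorf("agent %s has %d error events", a.ID, len(a.CurrentErrorEvents))
	}
	return a, nil
}

func main() {
	// Trimmed version of the response snippet above.
	body := []byte(`{"list":[{"id":"0a17686e-40c5-4a81-86ae-fb41ddd7ea96","current_error_events":[],"status":"online"}],"total":1}`)
	a, err := checkEnrolledAgent(body)
	fmt.Println(a.ID, a.Status, err)
}
```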

nchaulet · 2020-06-04T18:51:33Z

Putting this here, I think it could help you later when implementing the steps.
https://github.com/elastic/kibana/blob/master/x-pack/test/api_integration/apis/fleet/agent_flow.ts

This new step will combine the others

Running "godog -t stop-agent" will filter the execution to those scenarios using the "@stop-agent" annotation. See https://github.com/cucumber/godog#tags

gherkin syntax changes and steps rework

mdelapenya · 2020-06-08T07:37:05Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+Scenario: Un-enrolling an agent
+  Given an agent is deployed to Fleet
+  When the agent is un-enrolled
+  Then the agent is not listed as online in Fleet


Hey @EricDavisX, could we reuse here the step the agent is listed in Fleet as "online"?

Therefore we would have:

the agent is listed in Fleet as "online"
the agent is listed in Fleet as "offline"

which would be one single step. wdyt?
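A rough illustration of how one step could cover both statuses, using only the standard library. The `fetchStatus` callback is a hypothetical stand-in for a query to Fleet, and the pattern would be registered with the BDD runner in a real suite.

```go
package main

import (
	"fmt"
	"regexp"
)

// listedStep captures the expected status from steps such as
// `the agent is listed in Fleet as "online"`.
var listedStep = regexp.MustCompile(`^the agent is listed in Fleet as "([^"]+)"$`)

// checkListedStep compares the status captured from the step text with the
// one returned by fetchStatus, a hypothetical callback that queries Fleet.
func checkListedStep(step string, fetchStatus func() (string, error)) error {
	m := listedStep.FindStringSubmatch(step)
	if m == nil {
		return fmt.Errorf("step %q does not match", step)
	}
	actual, err := fetchStatus()
	if err != nil {
		return err
	}
	if actual != m[1] {
		return fmt.Errorf("agent is listed as %q, expected %q", actual, m[1])
	}
	return nil
}

func main() {
	offline := func() (string, error) { return "offline", nil }
	fmt.Println(checkListedStep(`the agent is listed in Fleet as "offline"`, offline))
	fmt.Println(checkListedStep(`the agent is listed in Fleet as "online"`, offline))
}
```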

mdelapenya · 2020-06-08T07:38:57Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+Scenario: Re-enrolling an agent
+  Given an agent is enrolled
+    And the agent is un-enrolled
+    And the Agent is stopped on the host


Is this one automatically inferred from un-enrolling the agent, or must it be done as a separate action?

If the latter, I would keep it as is (although from the user's perspective it seems like more -unproductive?- work)

mdelapenya · 2020-06-08T07:40:17Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+  When the agent is re-enrolled on the host
+    And the agent is run on the host
+  Then the agent is listed in Fleet as online
+    And new documents are inserted into Elasticsearch


I'd like to abstract this step to a more product-related level, as I see it very technical.

what about something like

index `xxx` is created
And index `xxx` has more than 123 documents

I like it! What if the number of documents is not there after an amount of time (minutes)?

Yes, exactly: this should fail then.

mdelapenya · 2020-06-08T07:41:54Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+  Then the agent is un-enrolled
+    And the agent is stopped on the host


See https://github.com/elastic/e2e-testing/pull/126/files#r436508010

michalpristas · 2020-06-08T13:04:46Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+Scenario: Stopping the agent stops backend processes
+  Given an agent is deployed to Fleet
+  When the agent is stopped on the host
+  Then filebeat is stopped


We would probably need something more like Then there are '2' metricbeat processes, as we will need to check both the monitoring and the ingesting beats.

EricDavisX · 2020-06-08T13:11:53Z

This is so great - moving fast and getting better! I agree with Michal: we could enhance the 'stopped on the host' step to indicate more accurately that the Agent (with defaults set) will start 2 Metricbeat and 2 Filebeat processes. Sorry, I forgot that nuance. Still, for the first version we can leave it as implied and handle it in the implementation assertion (and correct it in a coming PR).

I say that so we can truly iterate on only what is most critical to getting a test in and running fast.
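The process-count assertion could look roughly like this. How the list of running process names is gathered from the host is left out, and the names in `main` are sample data, not output from a real agent.

```go
package main

import "fmt"

// checkProcessCount verifies that name appears exactly expected times in
// running, a list of process names gathered from the host by other means.
func checkProcessCount(running []string, name string, expected int) error {
	count := 0
	for _, p := range running {
		if p == name {
			count++
		}
	}
	if count != expected {
		return fmt.Errorf("expected %d %s processes, found %d", expected, name, count)
	}
	return nil
}

func main() {
	// Sample data: with defaults, 2 Metricbeat and 2 Filebeat processes.
	running := []string{"filebeat", "filebeat", "metricbeat", "metricbeat"}
	fmt.Println(checkProcessCount(running, "metricbeat", 2))
	// After the agent is stopped, no beats should remain.
	fmt.Println(checkProcessCount([]string{}, "filebeat", 0))
}
```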

EricDavisX · 2020-06-08T14:38:41Z

quick comment on the step:
And new documents are inserted into Elasticsearch

mdelapenya: I'd like to abstract this step to a more product-related level, as I see it very technical.

michalpristas: what about something like

index xxx is created
And index xxx has more than 123 documents

mdelapenya: I like it! What if the number of documents is not there after an amount of time (minutes)?

I don't mind any rework we want to do in elaborating this. I would like to suggest we keep it really stable and simple, however, and I don't know if a given number of documents over a given amount of time would be. The Filebeat / Metricbeat info sent is based on host VM activity, right? I suggest that if we have control over the environment and agents, then we should be able to wait seconds (not minutes) and confirm changes regarding which docs the Agent is sending in. It might be good to take this offline and discuss it in a quick call, if we have this and any other 'final' items, before we get further into the implementation.

mdelapenya · 2020-06-08T14:43:27Z

Cool! Let's discuss the specific implementation details in a follow-up iteration. For now I'd keep that step as And there is data for Ingest Manager in the index; how we check it is then an implementation detail.

EricDavisX · 2020-06-08T15:14:51Z

that sounds great to me. thanks Manu

mdelapenya · 2020-06-08T16:22:01Z

@EricDavisX @michalpristas I think we are on the right track! Please let me know if the requirements are ready to be merged, so I can continue with the implementation.

Thanks!

EricDavisX

I love this - it looks ready to merge and we can iterate on it.

EricDavisX · 2020-06-04T21:13:18Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+  Given there is a "Fleet" user in Kibana
+    And the "Fleet" Kibana setup has been created
+    And the agent binary is installed in the target host
+  When the agent is un-enrolled from Kibana


I forgot to mention that we'll have to manually terminate the shell / process running on the host as part of the 'tear down' of this scenario, in order to test the re-enrolling and re-starting of the Agent.

e2e/_suites/ingest-manager/features/ingest-manager.feature

EricDavisX · 2020-06-05T12:46:53Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+    And the "Fleet" Kibana setup has been created
+    And the agent binary is installed in the target host
+  When the agent is un-enrolled from Kibana
+  Then no new data shows up in Elasticsearc locations using the enrollment token


Thanks so much Nicolas and Manu, I'm learning here too! Knowing now what I do, I'd suggest we really only have 1 distinct different case to test and I'd phrase it as:

Scenario: Revoking an enrollment token Given the Fleet user is set up and a valid enrollment token exists When the enrollment token is revoked Then an attempt to enroll a new agent fails

the pre-requisite for the test changes such that the agent is NOT running and is NOT already enrolled.
@mdelapenya what do you think? Honestly, if you can get us the first more straight-forward case I'm happy to work this with the code snippets we have and infrastructure you provide. We need not stress about completing this one case now, the team is fine to take it over.

EricDavisX · 2020-06-05T12:48:39Z

e2e/_suites/ingest-manager/features/ingest-manager.feature

+    And the "Fleet" Kibana setup has been created
+    And the agent binary is installed in the target host
+  When the agent is un-enrolled from Kibana
+  Then no new data shows up in Elasticsearc locations using the enrollment token


updated:
a query you can use is as follows:
query the metrics* index and hit the equivalent of KQL:
host.name:"7exl-w10x64l6-d" and @timestamp >= "2020-06-06T01:30:00.948Z"
where the hostname is replaced correctly and the timestamp in question is captured 2 seconds after the unenroll call.

translated into an ES query (forgive me if this is terrible, its a hacked version from dev tools and I didn't take the time to re-work it much:

the same find/replace of the hostname and timestamp values is needed of coruse:

GET _search
{
  "version": true,
  "size": 500,
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "system.process.cpu.start_time",
      "format": "date_time"
    },
    {
      "field": "system.service.state_since",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "filter": [
              {
                "bool": {
                  "should": [
                    {
                      "match_phrase": {
                        "host.name": "7exl-w10x64l6-d"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "2020-06-06T01:50:00.948Z",
                          "time_zone": "America/New_York"
                        }
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ]
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-06T01:36:29.564Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}


mdelapenya · 2020-06-08T17:57:21Z

@EricDavisX @michalpristas merged!

I'm going to send a PR with the Go code scaffolding, so please feel free to contribute to it in the way you prefer

feat: initial specs for ingest management

85351cb

mdelapenya requested review from EricDavisX, michalpristas and kuisathaverat June 3, 2020 18:41

mdelapenya self-assigned this Jun 3, 2020

mdelapenya added the ingest-manager label Jun 3, 2020

mdelapenya commented Jun 3, 2020


mdelapenya and others added 9 commits June 4, 2020 21:20

chore: rewrite revoking scenario

1123809

chore: group common steps into one

4c55304

This new step will combine the others

chore: pair step with existing one

b7355e2

Merge branch 'master' into 124-ingest-management

cf41253

chore: add tags supporting filtering the execution

778e4d9

Running "godog -t stop-agent" will filter the execution to those scenarios using the "@stop-agent" annotation. See https://github.com/cucumber/godog#tags

gherkin syntax changes and steps rework

4395d0a

updates from pr discussion for e2e Ingest test

c56b8bf

Merge pull request #2 from EricDavisX/patch-3

c57fa4d

gherkin syntax changes and steps rework

chore: apply proper given/when/then pattern

b588a63

mdelapenya commented Jun 8, 2020

michalpristas reviewed Jun 8, 2020

mdelapenya added 2 commits June 8, 2020 17:05

chore: be more generic about data in the index

3359246

chore: combine started/stopped steps into just one

8b27389

mdelapenya added 4 commits June 8, 2020 17:16

chore: use existing step for a running process in a host

974b2e7

chore: lowercase agent to match existing steps

e74ee66

chore: convert process name into an input argument for the step

8ed05c6

chore: address pre-commit lint issues

6fef267

mdelapenya marked this pull request as ready for review June 8, 2020 16:19

EricDavisX approved these changes Jun 8, 2020


chore: simplify revoke token scenario

4096531

mdelapenya merged commit 66a4a28 into elastic:master Jun 8, 2020

mdelapenya deleted the 124-ingest-management branch June 8, 2020 17:58

jfsiii mentioned this pull request Jun 9, 2020

[Ingest] OpenAPI spec file elastic/kibana#68323

Merged

mdelapenya mentioned this pull request Jun 18, 2020

[Ingest-Manager] Implementation for the Deploy scenario of the stand-alone mode #140

Merged

		Then the agent is un-enrolled
		And the agent is stopped on the host


apmmachine commented Jun 3, 2020

💔 Tests Failed
