filter_ecs: new filter for AWS ECS Metadata #5898

Merged: 3 commits into fluent:1.9 on Nov 18, 2022

@PettitWesley commented Aug 16, 2022

Signed-off-by: Wesley Pettit [email protected]


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@PettitWesley
Contributor Author

Running on an instance inside of an ECS cluster:

$ docker ps
CONTAINER ID   IMAGE                                                             COMMAND                CREATED      STATUS                PORTS     NAMES
de7fbb1b66db   111111111111.dkr.ecr.us-west-2.amazonaws.com/better-json-logger   "python ./logger.py"   3 days ago   Up 3 days                       ecs-fb-daemon-demo-1-app-c0d3dccbb0fdcd820400
c5d660dc5642   amazon/amazon-ecs-agent:latest                                    "/agent"               4 days ago   Up 4 days (healthy)             ecs-agent

The first container is part of an ECS Task, and the filter's use case is to attach metadata to its logs. I could set up Fluent Bit to actually collect those logs, but for testing the easiest approach is a config that mimics the tag a real task's logs would carry:

[SERVICE]
    Log_Level info
    Grace 1

[INPUT]
    Name dummy
    Tag prefix.de7fbb1b66db


[INPUT]
    Name        forward
    Listen      0.0.0.0
    Port        24224

[FILTER]
    Name ecs
    Match *
    ECS_Tag_Prefix prefix.
    ecs_meta_cache_ttl 6h
#    Cluster_Metadata_Only On
    ADD THE_CLUSTER_IS $ClusterName
    ADD THE_CONTAINER_INSTANCE_ARN_IS $ContainerInstanceArn
    ADD THE_CONTAINER_INSTANCE_ID_IS $ContainerInstanceID
    ADD THE_ECS_AGENT_VERSION_IS $ECSAgentVersion
    ADD THE_TASK_ID_IS $TaskID
    ADD THE_TASK_ARN_IS $TaskARN
    ADD THE_TASK_DEF_CONTAINER_NAME_IS $ContainerName
    ADD THE_DOCKER_CONTAINER_NAME_IS $DockerContainerName
    ADD THE_DOCKER_ID_IS $ContainerID
    ADD THE_TASK_DEF_FAMILY_IS $TaskDefFamily
    ADD THE_TASK_DEF_VERSION_IS $TaskDefVersion

[OUTPUT]
    Name stdout
    Format json_lines
    Match *

Even though we use the dummy input, the configured tag and ECS_Tag_Prefix make the filter think the logs are coming from the task:

Fluent Bit v1.9.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/08/16 14:51:53] [ info] [fluent bit] version=1.9.7, commit=3d57a63e54, pid=24547
[2022/08/16 14:51:53] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/08/16 14:51:53] [ info] [cmetrics] version=0.3.5
[2022/08/16 14:51:53] [ info] [input:forward:forward.1] listening on 0.0.0.0:24224
[2022/08/16 14:51:54] [ info] [sp] stream processor started
[2022/08/16 14:51:54] [ info] [output:stdout:stdout.0] worker #0 started
{
  "date": 1660661514.303914,
  "message": "dummy",
  "THE_CLUSTER_IS": "fb-daemon-project",
  "THE_CONTAINER_INSTANCE_ARN_IS": "arn:aws:ecs:us-west-1:111111111111:container-instance/fb-daemon-project/c7e5e34d4157429c90337d8e6f130612",
  "THE_CONTAINER_INSTANCE_ID_IS": "c7e5e34d4157429c90337d8e6f130612",
  "THE_ECS_AGENT_VERSION_IS": "Amazon ECS Agent - v1.61.3 (63f97f40)",
  "THE_TASK_ID_IS": "bf3152cb-08c8-4f76-b974-0ad5b2993f9d",
  "THE_TASK_ARN_IS": "arn:aws:ecs:us-west-1:111111111111:task/bf3152cb-08c8-4f76-b974-0ad5b2993f9d",
  "THE_TASK_DEF_CONTAINER_NAME_IS": "app",
  "THE_DOCKER_CONTAINER_NAME_IS": "ecs-fb-daemon-demo-1-app-c0d3dccbb0fdcd820400",
  "THE_DOCKER_ID_IS": "de7fbb1b66db297c51aff04e3ca90d2a9df690bb79cd5eadc9ccfa4bf02c6779",
  "THE_TASK_DEF_FAMILY_IS": "fb-daemon-demo",
  "THE_TASK_DEF_VERSION_IS": "1"
}
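
The mapping from tag to metadata in the run above can be sketched roughly as follows. This is a minimal illustration, not the filter's actual C implementation: the helper names `short_container_id` and `render` are hypothetical, and the assumption that the 12 characters after the prefix are the short container ID is inferred from the `prefix.de7fbb1b66db` tag used in the config. The sample values are copied from the output above.

```python
import re
from typing import Dict, Optional

SHORT_ID_LEN = 12  # assumption: length of a Docker short container ID


def short_container_id(tag: str, tag_prefix: str) -> Optional[str]:
    """Strip the prefix and return the next 12 characters, or None if the tag does not fit."""
    if not tag.startswith(tag_prefix):
        return None
    start = len(tag_prefix)
    candidate = tag[start:start + SHORT_ID_LEN]
    return candidate if len(candidate) == SHORT_ID_LEN else None


def render(template: str, metadata: Dict[str, str]) -> str:
    """Replace each $Name with metadata[Name]; unknown names are left as written."""
    return re.sub(r"\$(\w+)", lambda m: metadata.get(m.group(1), m.group(0)), template)


# Values copied from the test run above (a subset of the ADD rules).
metadata = {
    "ClusterName": "fb-daemon-project",
    "TaskDefFamily": "fb-daemon-demo",
    "TaskDefVersion": "1",
}
add_rules = {
    "THE_CLUSTER_IS": "$ClusterName",
    "THE_TASK_DEF_FAMILY_IS": "$TaskDefFamily",
    "THE_TASK_DEF_VERSION_IS": "$TaskDefVersion",
}

record = {"message": "dummy"}
# Only enrich the record when the tag yields a usable container ID.
if short_container_id("prefix.de7fbb1b66db", "prefix.") is not None:
    for key, template in add_rules.items():
        record[key] = render(template, metadata)
```

In this sketch a tag that does not start with the configured prefix yields no container ID, so the record is left unchanged.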

No leaks:

==24642== HEAP SUMMARY:
==24642==     in use at exit: 80 bytes in 1 blocks
==24642==   total heap usage: 5,803 allocs, 5,802 frees, 832,891 bytes allocated
==24642==
==24642== LEAK SUMMARY:
==24642==    definitely lost: 0 bytes in 0 blocks
==24642==    indirectly lost: 0 bytes in 0 blocks
==24642==      possibly lost: 0 bytes in 0 blocks
==24642==    still reachable: 80 bytes in 1 blocks
==24642==         suppressed: 0 bytes in 0 blocks
==24642== Rerun with --leak-check=full to see details of leaked memory
==24642==
==24642== For lists of detected and suppressed errors, rerun with: -s
==24642== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
SUCCESS: All unit tests have passed.

@PettitWesley added this to the Fluent Bit v1.9.9 milestone Sep 16, 2022
@PettitWesley added the AWS label (Issues with AWS plugins or experienced by users running on AWS) and removed the docs-required label Oct 10, 2022
@PettitWesley
Contributor Author

Doc PR: fluent/fluent-bit-docs#925

@PettitWesley
Contributor Author

[ec2-user@ip-10-192-11-106 build]$ valgrind ./bin/flb-rt-filter_ecs
==8030== Memcheck, a memory error detector
==8030== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==8030== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==8030== Command: ./bin/flb-rt-filter_ecs
==8030==
Test flb_test_ecs_filter...                     [2022/10/10 23:02:54] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8031
[2022/10/10 23:02:54] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:02:54] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:02:54] [ info] [sp] stream processor started
==8031== Warning: client switching stacks?  SP change: 0x910a778 --> 0x8437f20
==8031==          to suppress, use: --max-stackframe=13445208 or greater
==8031== Warning: client switching stacks?  SP change: 0x8437e98 --> 0x910a778
==8031==          to suppress, use: --max-stackframe=13445344 or greater
==8031== Warning: client switching stacks?  SP change: 0x910a998 --> 0x8437e98
==8031==          to suppress, use: --max-stackframe=13445888 or greater
==8031==          further instances of this message will not be shown.
[2022/10/10 23:02:56] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:02:57] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8031==
==8031== HEAP SUMMARY:
==8031==     in use at exit: 80 bytes in 1 blocks
==8031==   total heap usage: 5,817 allocs, 5,816 frees, 892,886 bytes allocated
==8031==
==8031== LEAK SUMMARY:
==8031==    definitely lost: 0 bytes in 0 blocks
==8031==    indirectly lost: 0 bytes in 0 blocks
==8031==      possibly lost: 0 bytes in 0 blocks
==8031==    still reachable: 80 bytes in 1 blocks
==8031==         suppressed: 0 bytes in 0 blocks
==8031== Rerun with --leak-check=full to see details of leaked memory
==8031==
==8031== For lists of detected and suppressed errors, rerun with: -s
==8031== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_no_prefix...           [2022/10/10 23:02:57] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8034
[2022/10/10 23:02:57] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:02:57] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:02:58] [ info] [sp] stream processor started
==8034== Warning: client switching stacks?  SP change: 0x910a778 --> 0x8437ec0
==8034==          to suppress, use: --max-stackframe=13445304 or greater
==8034== Warning: client switching stacks?  SP change: 0x8437e38 --> 0x910a778
==8034==          to suppress, use: --max-stackframe=13445440 or greater
==8034== Warning: client switching stacks?  SP change: 0x910a998 --> 0x8437e38
==8034==          to suppress, use: --max-stackframe=13445984 or greater
==8034==          further instances of this message will not be shown.
[2022/10/10 23:03:00] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:00] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8034==
==8034== HEAP SUMMARY:
==8034==     in use at exit: 80 bytes in 1 blocks
==8034==   total heap usage: 5,817 allocs, 5,816 frees, 892,808 bytes allocated
==8034==
==8034== LEAK SUMMARY:
==8034==    definitely lost: 0 bytes in 0 blocks
==8034==    indirectly lost: 0 bytes in 0 blocks
==8034==      possibly lost: 0 bytes in 0 blocks
==8034==    still reachable: 80 bytes in 1 blocks
==8034==         suppressed: 0 bytes in 0 blocks
==8034== Rerun with --leak-check=full to see details of leaked memory
==8034==
==8034== For lists of detected and suppressed errors, rerun with: -s
==8034== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_cluster_metadata_only... [2022/10/10 23:03:00] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8037
[2022/10/10 23:03:00] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:03:00] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:03:01] [ info] [sp] stream processor started
==8037== Warning: client switching stacks?  SP change: 0x910a778 --> 0x842cf50
==8037==          to suppress, use: --max-stackframe=13490216 or greater
==8037== Warning: client switching stacks?  SP change: 0x842cec8 --> 0x910a778
==8037==          to suppress, use: --max-stackframe=13490352 or greater
==8037== Warning: client switching stacks?  SP change: 0x910a998 --> 0x842cec8
==8037==          to suppress, use: --max-stackframe=13490896 or greater
==8037==          further instances of this message will not be shown.
[2022/10/10 23:03:03] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:03] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8037==
==8037== HEAP SUMMARY:
==8037==     in use at exit: 80 bytes in 1 blocks
==8037==   total heap usage: 5,800 allocs, 5,799 frees, 848,966 bytes allocated
==8037==
==8037== LEAK SUMMARY:
==8037==    definitely lost: 0 bytes in 0 blocks
==8037==    indirectly lost: 0 bytes in 0 blocks
==8037==      possibly lost: 0 bytes in 0 blocks
==8037==    still reachable: 80 bytes in 1 blocks
==8037==         suppressed: 0 bytes in 0 blocks
==8037== Rerun with --leak-check=full to see details of leaked memory
==8037==
==8037== For lists of detected and suppressed errors, rerun with: -s
==8037== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_cluster_error...       [2022/10/10 23:03:03] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8041
[2022/10/10 23:03:03] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:03:03] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:03:04] [ warn] [filter:ecs:ecs.0] Failed to get metadata from /v1/metadata, will retry
[2022/10/10 23:03:04] [ info] [sp] stream processor started
[2022/10/10 23:03:04] [ warn] [filter:ecs:ecs.0] Failed to get metadata from /v1/metadata, will retry
[2022/10/10 23:03:04] [error] [filter:ecs:ecs.0] Could not retrieve cluster metadata from ECS Agent
==8041== Warning: client switching stacks?  SP change: 0x910a778 --> 0x841dae0
==8041==          to suppress, use: --max-stackframe=13552792 or greater
==8041== Warning: client switching stacks?  SP change: 0x841da58 --> 0x910a778
==8041==          to suppress, use: --max-stackframe=13552928 or greater
==8041== Warning: client switching stacks?  SP change: 0x910a998 --> 0x841da58
==8041==          to suppress, use: --max-stackframe=13553472 or greater
==8041==          further instances of this message will not be shown.
[2022/10/10 23:03:06] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:06] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8041==
==8041== HEAP SUMMARY:
==8041==     in use at exit: 80 bytes in 1 blocks
==8041==   total heap usage: 5,770 allocs, 5,769 frees, 788,438 bytes allocated
==8041==
==8041== LEAK SUMMARY:
==8041==    definitely lost: 0 bytes in 0 blocks
==8041==    indirectly lost: 0 bytes in 0 blocks
==8041==      possibly lost: 0 bytes in 0 blocks
==8041==    still reachable: 80 bytes in 1 blocks
==8041==         suppressed: 0 bytes in 0 blocks
==8041== Rerun with --leak-check=full to see details of leaked memory
==8041==
==8041== For lists of detected and suppressed errors, rerun with: -s
==8041== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Test flb_test_ecs_filter_task_error...          [2022/10/10 23:03:06] [ info] [fluent bit] version=1.9.9, commit=9200384f8b, pid=8044
[2022/10/10 23:03:06] [ info] [storage] version=1.3.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/10/10 23:03:06] [ info] [cmetrics] version=0.3.7
[2022/10/10 23:03:07] [ info] [sp] stream processor started
[2022/10/10 23:03:07] [ warn] [filter:ecs:ecs.0] Failed to get metadata from /v1/tasks?dockerid=79c796ed2a7f, will retry
[2022/10/10 23:03:07] [error] [filter:ecs:ecs.0] Requesting metadata from ECS Agent introspection endpoint failed
[2022/10/10 23:03:07] [error] [filter:ecs:ecs.0] Failed to get ECS Task metadata for 79c796ed2a7f, falling back to process cluster metadata only. If this is intentional, set `Cluster_Metadata_Only On`
==8044== Warning: client switching stacks?  SP change: 0x910a778 --> 0x8428f20
==8044==          to suppress, use: --max-stackframe=13506648 or greater
==8044== Warning: client switching stacks?  SP change: 0x8428e98 --> 0x910a778
==8044==          to suppress, use: --max-stackframe=13506784 or greater
==8044== Warning: client switching stacks?  SP change: 0x910a998 --> 0x8428e98
==8044==          to suppress, use: --max-stackframe=13507328 or greater
==8044==          further instances of this message will not be shown.
[2022/10/10 23:03:09] [ warn] [engine] service will shutdown in max 1 seconds
[2022/10/10 23:03:09] [ info] [engine] service has stopped (0 pending tasks)
[ OK ]
==8044==
==8044== HEAP SUMMARY:
==8044==     in use at exit: 80 bytes in 1 blocks
==8044==   total heap usage: 5,798 allocs, 5,797 frees, 832,641 bytes allocated
==8044==
==8044== LEAK SUMMARY:
==8044==    definitely lost: 0 bytes in 0 blocks
==8044==    indirectly lost: 0 bytes in 0 blocks
==8044==      possibly lost: 0 bytes in 0 blocks
==8044==    still reachable: 80 bytes in 1 blocks
==8044==         suppressed: 0 bytes in 0 blocks
==8044== Rerun with --leak-check=full to see details of leaked memory
==8044==
==8044== For lists of detected and suppressed errors, rerun with: -s
==8044== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
SUCCESS: All unit tests have passed.
==8030==
==8030== HEAP SUMMARY:
==8030==     in use at exit: 0 bytes in 0 blocks
==8030==   total heap usage: 6 allocs, 6 frees, 2,837 bytes allocated
==8030==
==8030== All heap blocks were freed -- no leaks are possible
==8030==
==8030== For lists of detected and suppressed errors, rerun with: -s
==8030== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

… not from a task

The filter is built primarily for a daemon deployment mode where Fluent Bit
runs once per node/instance and collects all log files on that host. The
filter can attach metadata to these logs. But what if a container running
on the host is not part of an ECS Task? The json-file log driver writes
files to disk with only the container ID to distinguish where they came
from, so the daemon still collects logs from containers that are not part
of a task, and there is no easy way to ignore those logs. Without this
patch, the Fluent Bit output would be spammed with constant errors from
this filter in that case. This patch suppresses failures after 2 failed
attempts.

Signed-off-by: Wesley Pettit <[email protected]>
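
The suppression described in the commit message above could look roughly like the sketch below. It is an illustration under stated assumptions, not the patch itself: the class `TaskMetadataLookup`, its `fetch` and `log_error` callbacks, and the choice to stop both querying and logging after the limit are all hypothetical. Only the limit of 2 failed attempts comes from the commit message.

```python
from typing import Callable, Dict, Optional

MAX_FAILED_ATTEMPTS = 2  # from the commit message: suppress after 2 failed attempts

Metadata = Dict[str, str]


class TaskMetadataLookup:
    """Look up task metadata per container ID and go quiet after repeated failures."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[Metadata]],  # hypothetical: returns None on failure
        log_error: Callable[[str], None],            # hypothetical logging callback
    ) -> None:
        self._fetch = fetch
        self._log_error = log_error
        self._failures: Dict[str, int] = {}

    def get(self, container_id: str) -> Optional[Metadata]:
        failures = self._failures.get(container_id, 0)
        if failures >= MAX_FAILED_ATTEMPTS:
            # Assumption: this container is probably not part of a task,
            # so skip the query and the error message entirely.
            return None
        result = self._fetch(container_id)
        if result is None:
            self._failures[container_id] = failures + 1
            self._log_error(f"Failed to get ECS Task metadata for {container_id}")
            return None
        # A successful lookup clears any earlier failures for this container.
        self._failures.pop(container_id, None)
        return result


# Usage: a container that never resolves produces only two error messages.
errors = []
lookup = TaskMetadataLookup(lambda container_id: None, errors.append)
for _ in range(5):
    lookup.get("79c796ed2a7f")
```

Keeping the counter per container ID means one container outside any task does not silence errors for containers that do belong to a task.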
@edsiper merged commit 0f46cf3 into fluent:1.9 Nov 18, 2022