
[Filebeat] Netflow data indexing failures after upgrading to 8.13 #38703

Closed
etigervaise opened this issue Apr 2, 2024 · 10 comments
Labels
bug, Team:Security-Deployment and Devices (Deployment and Devices Team in Security Solution)

@etigervaise

Please post all questions and issues on https://discuss.elastic.co/c/beats
before opening a Github Issue. Your questions will reach a wider audience there,
and if we confirm that there is a bug, then you can open a new issue.

For security vulnerabilities please only send reports to [email protected].
See https://www.elastic.co/community/security for more information.

Please include configurations and logs if available.

For confirmed bugs, please report:

# Module: netflow
# Docs: https://www.elastic.co/guide/en/beats/filebeat/main/filebeat-module-netflow.html

- module: netflow
  log:
    enabled: true
    var:
      netflow_host: 0.0.0.0
      netflow_port: 2055
      # internal_networks specifies which networks are considered internal or private
      # you can specify either a CIDR block or any of the special named ranges listed
      # at: https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html#condition-network
      internal_networks:
        - private

4 - Send netflow to Filebeat (an example invocation is shown after the warning below). For debugging, I have used the following project: https://github.com/nerdalert/nflow-generator/tree/master
5 - Notice the following warning:

Apr 02 13:57:11 tv filebeat[597]: {"log.level":"warn","@timestamp":"2024-04-02T13:57:11.590Z","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails","file.name":"elasticsearch/client.go","file.line":454},"message":"Cannot index event (status=400): dropping event! Enable debug logs to view the event and cause.","service.name":"filebeat","ecs.version":"1.6.0"}
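
For reference, the generator from step 4 can typically be invoked along these lines (the flag names are assumed from that project's README and may differ by version; <filebeat-host> is a placeholder):

./nflow-generator -t <filebeat-host> -p 2055

where -p matches the netflow_port configured above.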

Note that this follows the breaking change in #37901

@botelastic botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) Apr 2, 2024
@lucabelluccini lucabelluccini added the Team:Security-Deployment and Devices label (Deployment and Devices Team in Security Solution) Apr 2, 2024
@elasticmachine
Collaborator

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

@botelastic botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Apr 2, 2024
@andrewkroh
Member

Cannot index event (status=400): dropping event! Enable debug logs to view the event and cause.

Can you please enable debug logging so that you can see the raw event and the Elasticsearch error message? Then please share that with us (masking any sensitive details); it will help figure out the cause of the rejection.
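
For example, something along these lines in filebeat.yml should surface the full event and the Elasticsearch response (the selector name here is my assumption, based on the log.logger field in the warning above):

logging.level: debug
logging.selectors: ["elasticsearch"]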

@pkoutsovasilis
Contributor

Hello @etigervaise 👋 At a quick glance, it may be that the event normalisation we deemed safe to disable affects the data produced by the generator, or it could be something else. If you are able to provide what @andrewkroh asked for above, it will help us a lot in finding the actual issue 🙂

@etigervaise
Author

Is this what you need:

tv filebeat[2101]: {"log.level":"debug","@timestamp":"2024-03-30T15:34:11.368Z","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).bulkCollectPublishFails","file.name":"elasticsearch/client.go","file.line":455},"message":"Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2024, time.March, 30, 15, 34, 1, 0, time.UTC), Meta:{\"pipeline\":\"filebeat-8.13.0-netflow-log-pipeline\"}, Fields:{\"agent\":{\"ephemeral_id\":\"03a72809-448e-480e-b803-107a716d60b5\",\"id\":\"75009873-0b59-468c-af43-a32121fbc9f4\",\"name\":\"tv\",\"type\":\"filebeat\",\"version\":\"8.13.0\"},\"destination\":{\"ip\":\"13.109.185.170\",\"locality\":\"external\",\"port\":443},\"ecs\":{\"version\":\"1.12.0\"},\"event\":{\"action\":\"netflow_flow\",\"category\":[\"network\"],\"created\":\"2024-03-30T15:34:02.010420583Z\",\"dataset\":\"netflow.log\",\"duration\":0,\"end\":\"2024-03-30T14:33:30.341Z\",\"kind\":\"event\",\"module\":\"netflow\",\"start\":\"2024-03-30T14:33:30.341Z\",\"type\":[\"connection\"]},\"fileset\":{\"name\":\"log\"},\"flow\":{\"id\":\"fs64I72dWmc\",\"locality\":\"external\"},\"input\":{\"type\":\"netflow\"},\"netflow\":{\"destination_ipv4_address\":\"13.109.185.170\",\"destination_transport_port\":443,\"egress_interface\":0,\"exporter\":{\"address\":\"192.168.1.1:55242\",\"source_id\":0,\"timestamp\":\"2024-03-30T15:34:01Z\",\"uptime_millis\":12494239,\"version\":9},\"flow_end_sys_up_time\":8863580,\"flow_start_sys_up_time\":8863580,\"ingress_interface\":0,\"ip_class_of_service\":0,\"ip_version\":4,\"octet_delta_count\":83,\"packet_delta_count\":1,\"protocol_identifier\":6,\"source_ipv4_address\":\"192.168.1.226\",\"source_transport_port\":57936,\"tcp_control_bits\":24,\"type\":\"netflow_flow\"},\"network\":{\"bytes\":83,\"community_id\":\"1:21jePJZ+BagWDCl5Gcgjfvs8UME=\",\"direction\":\"unknown\",\"iana_number\":6,\"packets\":1,\"transport\":\"tcp\"},\"observer\":{\"ip\":\"192.168.1.1\"},\"related\":{\"ip\":[\"13.109.185.170\",\"192.168.1.226\"]},\"service\":{\"type\":\"netflow\"},\"source\":{\"bytes\":83,\"ip\":\"192.168.1.226\",\"locality\":\"internal\",\"packets\":1,\"port\":57936},\"tags\":[\"forwarded\"]}, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {\"type\":\"document_parsing_exception\",\"reason\":\"[1:191] failed to parse field [destination.ip] of type [ip] in document with id 'IcL_j44BMPHi30hENqqB'. Preview of field's value: '13'\",\"caused_by\":{\"type\":\"illegal_argument_exception\",\"reason\":\"'13' is not an IP string literal.\"}}, dropping event!","service.name":"filebeat","ecs.version":"1.6.0"}

@andrewkroh
Member

It does appear to be related to the removal of the event normalization. Until we release a fix, please use 8.12. Options for fixing would be:

  • Revert that change
  • Try registering an encoder for net.IP (I don't think it will work for *net.IP)
  • Don't put net.IP in events (a rough sketch of this option is below)
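
As a rough sketch of that last option (purely illustrative, using a hypothetical helper; this is not the actual netflow input code): render IP values as strings before they are placed into the event, so the output encoder never sees a raw net.IP ([]byte) that it could serialize as an array of numbers.

package main

import (
	"fmt"
	"net"
)

// toEventValue is a hypothetical helper that converts net.IP values to their
// string form; all other values pass through unchanged.
func toEventValue(v interface{}) interface{} {
	switch ip := v.(type) {
	case net.IP:
		return ip.String()
	case *net.IP:
		if ip == nil {
			return nil
		}
		return ip.String()
	default:
		return v
	}
}

func main() {
	fields := map[string]interface{}{"destination.ip": net.ParseIP("13.109.185.170")}
	for k, v := range fields {
		fields[k] = toEventValue(v)
	}
	fmt.Println(fields) // map[destination.ip:13.109.185.170]
}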

I wanted to understand how we missed this in all of our tests. Here are a few areas where I think we can improve:

  1. Our Fleet integration system tests should have been failing, but there are no checks for _ignored fields during the netflow integration system tests. With such a check we would have observed test failures for the integration (see "Fail tests if documents contained _ignored fields", elastic-package#1276).

    This is being worked on now from what I see, so that gap should be closed soon.
GET logs-netflow*/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".ds-logs-netflow.log-ep-2024.04.02-000001",
        "_id": "aoHsoI4B_m33RHDxiqbG",
        "_score": 1,
        "_ignored": [
          "netflow.source_ipv4_address",
          "related.ip",
          "netflow.destination_ipv4_address",
          "source.ip",
          "destination.ip"
        ],
        "_source": {
          "agent": {
            "name": "docker-fleet-agent",
            "id": "bc14f86e-0545-4ef6-aa88-976e6585c46f",
            "type": "filebeat",
            "ephemeral_id": "7d0d2a51-3830-4d1c-832f-e7a840d9ad02",
            "version": "8.13.0"
          },
          "destination": {
            "port": 80,
            "ip": [
              159,
              65,
              125,
              168
            ],
            "locality": "external"
          },
          ...

  2. Event serialization used in unit tests differs from the serialization used by the outputs. The outputs use https://github.com/elastic/go-structform while the tests use encoding/json. They behave differently, so the golden tests pass, but when you route to Elasticsearch the data is broken. We should change the tests to use the same encoding as the output.

  3. The "Cannot index event" log message is not helpful in troubleshooting indexing failures because it does not contain the actual JSON data that was sent to Elasticsearch; it contains the Go string representation. It should be noted that logs from the debugPrintProcessor do contain the event encoded using the same encoder as the outputs.
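
To illustrate point 2 with a minimal standalone snippet (my own repro sketch using only the standard library, not Beats code): encoding/json honors net.IP's TextMarshaler and emits a string, whereas an encoder that treats net.IP as a plain []byte emits an array of numbers, which is what ended up in the _bulk request.

package main

import (
	"encoding/json"
	"fmt"
	"net"
)

func main() {
	ip := net.ParseIP("13.109.185.170")

	// encoding/json uses net.IP's MarshalText, so the golden tests see a string.
	b, _ := json.Marshal(ip)
	fmt.Println(string(b)) // "13.109.185.170"

	// An encoder without TextMarshaler support effectively sees the raw bytes.
	fmt.Println([]byte(ip.To4())) // [13 109 185 170]
}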

@andrewkroh andrewkroh changed the title Netflow get rejected or incorrectly parsed after breaking change 37901 [Filebeat] Netflow data indexing failures after upgrading to 8.13 Apr 2, 2024
@andrewkroh andrewkroh added the bug label Apr 2, 2024
@111andre111
Contributor

111andre111 commented Apr 3, 2024

At least for point 3 I slightly disagree, as the whole doc can be extracted fairly easily with the following steps:

  1. First replace all single quotes in the message with underscores, which can be done in a text editor (underscores are used here purely for rendering reasons; just don't use double quotes).
  2. Save the whole message resulting from step 1 into a variable: STRING='<message-transformed-in-step-1>'
  3. Then output the whole result:

meta=$(echo $STRING | sed 's/.*Meta:/"/g' | sed 's/, Fields:.*/"/g')
fields=$(echo $STRING | sed 's/.*Fields:/"/g' | sed 's/, Private:interface.*/"/g')
error=$(echo $STRING | sed 's/.*status=400): /"/g' | sed 's/, dropping event.*/"/g')
echo "bulk:"
echo -n "POST /_bulk?pipeline="
echo $meta | jq -r | jq -r .pipeline
echo '{ "index" : { "_index" : "test" } }'
echo "$fields" | jq -r
echo ""
echo ""
echo "doc:"
echo "$fields" | jq -r | jq
echo "error:"
echo $error | jq -r | jq

The output format is roughly:

bulk:
POST /_bulk?pipeline=...
...
...

doc:
...
<the whole document to be ingested again>

error:
...
<the exact error rendering reason>
...

I used sed and jq.

@andrewkroh
Member

andrewkroh commented Apr 3, 2024

At least for point 3 I slightly disagree, as the whole doc can be extracted fairly easily with the following steps:

@111andre111 Extracting the information is not the problem. The problem is that the information is not sufficient in some cases to debug the reason the message failed to index in ES. Yes, you can extract a document from that message with some CLI magic, but it is not the same content that was sent to Elasticsearch.

Take #38703 (comment) as an example; it contains:

"destination":{"ip":"13.109.185.170..."

but if you were to observe the JSON that was sent to Elasticsearch in the _bulk request, you would have seen that the problem was

{"destination":{"ip":[13,109,185,170]...
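
For completeness, the rejection itself is easy to reproduce directly in Kibana Dev Tools against any index with an ip-mapped field (the index name here is just an example):

PUT ip-repro-test
{
  "mappings": {
    "properties": {
      "destination": {
        "properties": {
          "ip": { "type": "ip" }
        }
      }
    }
  }
}

POST ip-repro-test/_doc
{
  "destination": { "ip": [13, 109, 185, 170] }
}

The second request should fail with the same document_parsing_exception seen above: "'13' is not an IP string literal."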

@111andre111
Contributor

Ah, I see, you mean the array kind of things. Got you. That makes sense. Thanks for pointing that out.

@etigervaise
Author

Hello,

I'm trying to understand how to fix this, and I don't know where to start.

I've set up a basic developer environment and I'm trying to understand how to register an encoder for net.IP, since it seems to be the most promising solution to me.

Does it have to be a new codec, which I would develop in this file and then add into enc.go at the given line?

Is there any reference I should look at before starting?

Thanks

@norrietaylor
Member

Closing this for now. @pkoutsovasilis will create a follow-up issue for further automated testing steps.
