
Figure out how to route logs in the Azure Logs integration #92

Open
zmoog opened this issue Aug 27, 2024 · 10 comments
Assignees: zmoog
Labels: documentation (Improvements or additions to documentation)


zmoog commented Aug 27, 2024

Situation

The Azure Logs integration allows multiple log categories to be collected from a single event hub.

At a high level, users (1) define the event hub name and settings, and (2) all the integrations share that same event hub.

[screenshot: event hub settings in the Azure Logs integration]

Problem

This setup is inefficient: each enabled integration runs its own input against the same event hub, duplicating connections and storage account API calls. We plan to change it in future releases.

Solutions

The only option is to use the generic integration and route logs to the right data stream using the reroute processor.

[diagram: routing logs to data streams with the reroute processor]
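For illustration, a single routing rule looks roughly like this (a minimal sketch; a full pipeline with the complete rule set appears later in this thread):

{
  "reroute": {
    "dataset": "azure.signinlogs",
    "if": "ctx.routing_category == 'SignInLogs'"
  }
}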

@zmoog zmoog self-assigned this Aug 27, 2024
@zmoog zmoog added the documentation Improvements or additions to documentation label Aug 27, 2024
@zmoog zmoog added this to Notes Aug 27, 2024
@zmoog zmoog moved this to In Progress in Notes Aug 27, 2024

zmoog commented Aug 28, 2024

With one input + routing, we can reduce user errors to zero and make the fewest storage account API calls possible.

Here's a diagram of the routing approach:

[diagram: one input routing log events from logs-azure.eventhub-default to the target data streams]

  1. One input receives log events from the event hub.
  2. The input publishes the log events to the logs-azure.eventhub-default data stream.
  3. The logs-azure.eventhub-default data stream runs the logs-azure.eventhub@custom custom pipeline, which contains rules to route log events based on the log category.
  4. Each log event lands in the target data stream.

If the routing rules cover all incoming log categories, the logs-azure.eventhub-default data stream will be empty. However, we can set up an alerting rule to trigger a notification whenever a log event has no matching routing rule, so that we can iterate and update the logs-azure.eventhub@custom custom pipeline.
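For example, a quick way to spot unrouted events is to search the catch-all data stream directly (a sketch, assuming the default namespace; an alerting rule can be built on the same query):

GET logs-azure.eventhub-default/_search
{
  "size": 10,
  "sort": [
    { "@timestamp": "desc" }
  ]
}

Any hits here are events whose category has no matching routing rule yet.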

The routing option is probably the most efficient method.

Here's the source code of the logs-azure.eventhub@custom pipeline I am testing:

PUT _ingest/pipeline/logs-azure.eventhub@custom
{
  "processors": [
    {
      "json": {
        "field": "message",
        "target_field": "tmp_json"
      }
    },
    {
      "set": {
        "field": "routing_category",
        "copy_from": "tmp_json.category",
        "ignore_empty_value": true
      }
    },
    {
      "remove": {
        "field": "tmp_json",
        "ignore_missing": true
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.signinlogs"
        ],
        "if": "ctx.routing_category == \"SignInLogs\" || ctx.routing_category == \"NonInteractiveUserSignInLogs\" || ctx.routing_category == \"ServicePrincipalSignInLogs\" || ctx.routing_category == \"ManagedIdentitySignInLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.identity_protection"
        ],
        "if": "ctx.routing_category == \"RiskyUsers\" || ctx.routing_category == \"UserRiskEvents\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.provisioning"
        ],
        "if": "ctx.routing_category == \"ProvisioningLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.auditlogs"
        ],
        "if": "ctx.routing_category == \"AuditLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.activitylogs"
        ],
        "if": "ctx.routing_category == \"Administrative\" || ctx.routing_category == \"Security\" || ctx.routing_category == \"ServiceHealth\" || ctx.routing_category == \"Alert\" || ctx.routing_category == \"Recommendation\" || ctx.routing_category == \"Policy\" || ctx.routing_category == \"Autoscale\" || ctx.routing_category == \"ResourceHealth\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.graphactivitylogs"
        ],
        "if": "ctx.routing_category == \"MicrosoftGraphActivityLogs\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.firewall_logs"
        ],
        "if": "ctx.routing_category == \"AzureFirewallApplicationRule\" || ctx.routing_category == \"AzureFirewallNetworkRule\" || ctx.routing_category == \"AzureFirewallDnsProxy\" || ctx.routing_category == \"AZFWApplicationRule\" || ctx.routing_category == \"AZFWNetworkRule\" || ctx.routing_category == \"AZFWNatRule\" || ctx.routing_category == \"AZFWDnsQuery\""
      }
    },
    {
      "reroute": {
        "dataset": [
          "azure.application_gateway"
        ],
        "if": "ctx.routing_category == \"ApplicationGatewayFirewallLog\" || ctx.routing_category == \"ApplicationGatewayAccessLog\""
      }
    }
  ]
}
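To exercise the routing rules end to end without sending real data, something like the simulate ingest API should work (a sketch, assuming a stack version recent enough to ship POST _ingest/_simulate, which, unlike the classic pipeline simulate API, follows reroute processors):

POST _ingest/_simulate
{
  "docs": [
    {
      "_index": "logs-azure.eventhub-default",
      "_source": {
        "message": "{\"category\": \"SignInLogs\"}"
      }
    }
  ]
}

The response should show the document rerouted to the logs-azure.signinlogs-default data stream.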

@zmoog zmoog moved this from In Progress to In Review in Notes Aug 28, 2024
@nicpenning

That seems clean to me.

FYI: your second diagram is showing as missing.
[screenshot: broken image placeholder]


zmoog commented Aug 28, 2024

> FYI: your second diagram is showing as missing.

Ouch, I probably copied and pasted an expiring URL from GitHub. Checking!


zmoog commented Aug 28, 2024

It should be fixed now.

@nicpenning

How does this model work if you wanted more than one agent for redundancy and improved performance?


zmoog commented Sep 9, 2024

> How does this model work if you wanted more than one agent for redundancy and improved performance?

Good question! I should update the note to add this detail.

Here is a diagram showing how the two inputs work together to achieve improved redundancy and performance.

[diagram: two inputs sharing the event hub partitions]

Users set up diagnostic settings to send data to an event hub (1). The two (or more) inputs start and claim an equal share of the partitions; with a four-partition event hub, two inputs usually get two partitions each. Each input processes messages and sends them to the data stream in Elasticsearch.

The routing (2) happens in Elasticsearch at the data stream level, so it works with one or multiple event hubs.

@nicpenning

This sounds great. Unfortunately the graphic won't load for me.

@nicpenning

I can zoom in here, looks awesome!


zmoog commented Sep 10, 2024

> I can zoom in here, looks awesome!

Yeah, GitHub image URLs expire quickly. I usually reload the page and click on the image to see the whole picture. Let me know if you have difficulty opening it.

@nicpenning

Works great now.
