Kubernetes integration #260

Closed · 2 of 6 tasks
lukesteensen opened this issue Apr 8, 2019 · 23 comments
Labels: have: must · type: feature
@lukesteensen (Member) commented Apr 8, 2019

Description

We should have first-class support for using Vector in infrastructures running k8s. This will involve a combination of good documentation and potentially a few k8s-specific parsers/transforms/sources/etc. This should include both ingesting and processing data from applications running in k8s, as well as best practices for running Vector itself in k8s.


Prior Art


Behavior

Requirements

  • A Vector image is available in some repository.
  • A vector.yaml file is served at some URL; call it yaml_url.
  • kubectl is installed.

Installation/Running

To install/run Vector with the default configuration, run:

kubectl apply -f yaml_url

Configuration

To configure Vector, download or copy the vector.yaml file, for example with:

wget yaml_url

Edit the TOML part of vector.yaml to configure Vector.
Let yaml_path be the path to the edited vector.yaml, then run:

kubectl apply -f yaml_path

which will install/run Vector with the edited configuration.

Reconfiguration

Edit the TOML part of vector.yaml, then run:

kubectl apply -f yaml_path

vector.yaml

All Kubernetes and Vector configuration lives in this one file. The vector.toml that is usually a separate file is embedded, and clearly documented, inside vector.yaml.

Benefits of a single YAML file:

  • Having one configuration file improves UX in at least the cases mentioned above.
  • The Vector configuration can pull values from the Kubernetes part of the file.
  • Easy to create/share configurations. This would allow supporting a lot of common use cases out of the box with minimal effort. For example: aggregating all logs and exposing them on one public HTTP endpoint. This will also empower users to do the same.

kubernetes source

The kubernetes source ingests log data from the local Kubernetes node and outputs log events.

[sources.my_source_id]
  # REQUIRED - General
  type = "kubernetes" # must be: "kubernetes"

  # (EDIT: out of scope)
  # Collect logs from Kubernetes node components: kubelet, container runtime, kube-proxy,
  # and also from master components, if Kubernetes is configured to run user containers on the master machine.
  log_system = false # default: false

  # OPTIONAL

  # (EDIT: covered by #1059)
  # Collect logs from these pods.
  named = ["pod_name"]

  # (EDIT: more info in todo section)
  # And collect logs from pods with all of these requirements.
  # Requirements are defined as in https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
  # Example: ["environment = production","tier notin (frontend, backend)"]
  match = ["kubernetes_requirement"]

If named and match are empty, the kubernetes source will collect logs from all applicable pods, except itself.


Implementation

Kubernetes defines the CRI (Container Runtime Interface), which all container runtimes for Kubernetes should implement. Docker implements it fully, while OCI, rkt, Frakti, containerd, and Singularity are an active work in progress.

CRI defines how and where log files are to be stored. The kubernetes source can read those files to get logs from all containers on its node. This can be done with the file source, which has already been demonstrated to work by @LucioFranco.
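
As an illustration only (not the finalized implementation), the underlying collection could look roughly like a file source pointed at the conventional kubelet log location; the source name and the glob below are assumptions:

[sources.pod_logs]
  # Rough sketch of the idea, not the actual kubernetes source.
  type = "file"
  # /var/log/containers/*.log conventionally holds (symlinks to) the per-container
  # log files on a node; the exact layout depends on the runtime and version.
  include = ["/var/log/containers/*.log"]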

The Kubernetes documentation defines where Kubernetes node components keep their logs. These are also collectable with the file and journald sources.

Applicable pods, that is, pods from which this implementation is capable of collecting logs, are those configured to log to a file. Docker does this by default, and Kubernetes highly recommends it since it also uses those logs for its own features. Therefore, if this implementation doesn't have access to some logs, then neither does Kubernetes. And since Kubernetes assumes that logging is then handled in some other way by the user, this implementation assumes the same.

Communication between Vector nodes can be done with the vector source/sink pair.

Enrichment

Besides:

  • message
  • timestamp
  • stream
  • pod_uid
  • container_name
  • instance_number
  • labels (Edit: will be part of later enrichment issue)

which are almost freely available, other information could be pulled over the Kubernetes API to enrich the event. But I would delay this for now, as it can be added later. My main reason is that I expect testing this properly will take most of the time, and adding/testing things after that will be much easier once the base is already added and tested.
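
For illustration only, an enriched event might then carry fields along these lines (all values made up):

message: "GET /metrics HTTP/1.1 200"
timestamp: "2019-09-01T12:34:56Z"
stream: "stdout"
pod_uid: "example-pod-uid"
container_name: "nginx"
instance_number: 0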


Topologies

There are two base topologies that are to be supported from the start, each with a dedicated vector.yaml file.

Distributed

Matches the Distributed topology in the Vector docs.
The vector.yaml file for this topology would have:

  • One DaemonSet with the Vector agent template.
  • The TOML configuration inside it would have a pre-added default kubernetes source configuration.

This configuration is a base for almost all other configurations/deployments.

Centralized (EDIT: delayed for now)

Matches the Centralized topology in the Vector docs.
This is an upgrade to the Distributed topology, with Vector also on the downstream end of things. As such, the vector.yaml for this is based on the Distributed version with these additions:

  • The TOML configuration of the Vector agent would have a pre-added vector sink configuration.
  • One Deployment with the Vector master template.
  • The TOML configuration of the Vector master would have a pre-added vector source configuration.
  • The necessary parts in the Vector agent and Vector master templates to facilitate networking between them.

This configuration is a base for all configurations/deployments with the Centralized topology.


Alternatives

  • The kubernetes source implementation could spin up Vector sources dedicated to each present container runtime and aggregate logs from them. This could also be a fallback for the original implementation.
  • The kubernetes source implementation could have only one master agent, which would collect logs from pods over the Kubernetes API logs command.

logging operator

This specification is compatible with the idea of a logging operator, so if one were ever implemented, it could be built upon this specification.


This section should describe:

  1. How Vector will be installed (ex: daemonset)
  2. How Vector will ingest data and the structure of the event leaving the source.
  3. How Vector will enrich the event with k8s metadata.
  4. How users would go about filtering specific services via labels and selectors.

Requirements

  • Users should be able to install Vector with a single kubectl [apply|create] command.
  • Vector should be able to collect logs for all services by default.
  • Users should be able to filter the logs collected via labels and selectors (ex: app=nginx).
  • Each event should be enriched with k8s metadata.

Todo

lukesteensen added the type: enhancement and UX: Docs labels on Apr 8, 2019
@LucioFranco (Contributor):

@lukesteensen as for this, do we think leveldb and rdkafka support is needed for this? I imagine we can produce a very small container that can run as a sidecar in kube without those two.

@lukesteensen (Member, Author):

Our release binary is 25MB with those two and 19MB without them, so I don't think there's any good reason to exclude them. If anything it'd be confusing to have a k8s build that supported fewer features.

@derekperkins commented Jul 3, 2019

I would expect it to work the same way that fluentbit does in terms of log enrichment via the k8s api.
https://docs.fluentbit.io/manual/installation/kubernetes/

binarylogic pinned this issue on Jul 11, 2019
binarylogic added the needs: outside help label on Aug 7, 2019
binarylogic changed the title from "Solid kubernetes integration" to "Kubernetes integration" on Aug 10, 2019
binarylogic added the type: new feature label and removed the Core: Docs and type: enhancement labels on Aug 10, 2019
@tlvenn commented Aug 19, 2019

Created this issue as well: #768. I believe having an operator would be the way to go.

binarylogic unpinned this issue on Aug 23, 2019
binarylogic added the needs: requirements label on Aug 27, 2019
@binarylogic (Contributor):

@ktff I've assigned this issue to you. As a first step, I'd like to finish the spec. Could you fill in the "Behavior" and "Requirements" sections above? Feel free to expand out as much as you'd like, whatever you need to describe how this will work.

Note: the first version of this can be simple; it does not need to include every feature. We are big fans of shipping in small incremental changes. Ex: maybe it makes sense to separate out the metadata enrichment as a follow-up PR.

@ktff (Contributor) commented Sep 1, 2019

I have filled in the Behavior section.

@sciyoshi (Contributor) commented Sep 1, 2019

Prior art also includes Filebeat, which has a processor that adds K8s metadata: https://www.elastic.co/guide/en/beats/filebeat/master/add-kubernetes-metadata.html

binarylogic added the needs: approval label and removed the needs: requirements label on Sep 1, 2019
@LucioFranco (Contributor):

@ktff thank you for writing this up!

I think this approach sounds generally pretty good!

vector.yaml

All Kubernetes and Vector configuration lives in this one file. The vector.toml that is usually a separate file is embedded, and clearly documented, inside vector.yaml.

Do you have an example of what this would look like? This kinda sounds a bit messy and something we may not want to do. Even for IDEs this will make the formatting harder.

Easy to create/share configurations. This would allow supporting a lot of common use cases out of the box with minimal effort.

I think actually embedding the toml within the yaml will make it less sharable since many users will share their configs as direct toml files, not as yaml.

The other option is to provide some packing tool that will generate a daemon set yaml with the provided toml embedded within it.

This also leads me to think that we should 100% provide a way to load a config via http and/or grpc. This would even allow in a centralized setup to only need one config since the master/primary can then supply a subset of that config to the agents. This also would allow us to uncouple the deployment of vector with its config. Aka introduces a kinda control layer. I will defer on this for now but its something we should think of as we introduce more complex setups like k8.

 # Collect logs from Kubernetes node components: kubelet, container runtime, kube-proxy,
 # and also from master components, if Kubernetes is configured to run user containers on the master machine.
 log_system = false # default: false

Does it make sense for the initial version to just support the container runtime api and defer this extra collecting to either a transform or a second version of the kube api? This seems somewhat out of scope.

  # Collect logs from these pods.
 named = ["pod_name"]

Do we want to think about possibly supporting the kube selector api? I'm not sure how much work this would be but it could add a lot of value.

which are almost freely available, other information could be pulled over the Kubernetes API to enrich the event.

This is 👍 I think we would probably want to do this as a separate component anyways.

Topologies

There are two base topologies that are to be supported from the start, each with a dedicated vector.yaml file.

As for the topologies, we should 100% start with the decentralized version. I think there are still many questions about how we will do the centralized version. Like do we support service discovery via the k8 api? etc

Overall, I think this approach is good! We should also think about supporting pulling the logs via file and supporting pulling logs via the docker.sock.

@ktff (Contributor) commented Sep 4, 2019

@LucioFranco thank you for the detailed feedback.

Do you have an example of what this would look like? This kinda sounds a bit messy and something we may not want to do. Even for IDE's this will make the formatting harder.

Here is an example of how it looks:

# Vector master
apiVersion: apps/v1
kind: Deployment
metadata:
  name: master-vector
  namespace: default
spec:
  selector:
    matchLabels:
      name: master_vector
  template:
    metadata:
      labels:
        name: master_vector
    spec:
      containers:
      - name: vector
        image: timberio/vector
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        env:
        - name: CONFIG
          value: |
                # VECTOR.TOML

                # Set global options
                data_dir = "/var/lib/vector"

                [sources.agents]
                  type = "vector" 
                  address = "0.0.0.0:$(MASTER_VECTOR_SERVICE_PORT)"

                  shutdown_timeout_secs = 30 # default, seconds
   

        # This line is not in VECTOR.TOML  
---
# Vector master service
apiVersion: v1
kind: Service
metadata:
  name: master-vector
spec:
  selector:
    name: master_vector
  ports:
    - protocol: TCP
      port: 9000
---
# Vector agent
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
  namespace: default
spec:
  selector:
    matchLabels:
      name: vector
  template:
    metadata:
      labels:
        name: vector
    spec:
      containers:
      - name: vector
        image: timberio/vector
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        env:
        - name: CONFIG
          value: |
                # VECTOR.TOML

                # Set global options
                data_dir = "/var/lib/vector"

                # Ingest logs from Kubernetes
                [sources.kubernetes_logs]
                  type         = "kubernetes"
                  log_system = false
        
                  match = ["environment = production","tier notin (frontend, backend)"]   

                [sinks.my_sink_id]
                  # REQUIRED - General
                  type = "vector" # must be: "vector"
                  inputs = ["kubernetes_logs"]
                  address = "$(MASTER_VECTOR_SERVICE_HOST):$(MASTER_VECTOR_SERVICE_PORT)"

                  # OPTIONAL - General
                  healthcheck = true # default

        # This line is not in VECTOR.TOML       

Obviously there are things missing, but this is only an example.

IDEs should format YAML correctly, but yes, the TOML part probably won't get special syntax highlighting. I have tried it out in online YAML formatters, and they deal with it nicely and recognize the TOML as a string. Editing is also nice. Try it out in an online YAML formatter; Visual Studio Code has almost the same behavior.

I think actually embedding the toml within the yaml will make it less sharable since many users will share their configs as direct toml files, not as yaml.

Generally in the Vector ecosystem, yes. But among those using Kubernetes, I suspect YAML will be more convenient, since various vector.toml configurations also require some coordinating configuration in the Kubernetes YAML file. Examples are:

  • Centralized topology, which requires networking configuration coordinated with the vector source/sink
  • An HTTP server, built on the Centralized topology with logs served on an HTTP endpoint, which requires networking configuration coordinated with the http sink

The other option is to provide some packing tool that will generate a daemon set yaml with the provided toml embedded within it.

The third option is to use the ConfigMap feature, like the Fluent Bit logging operator does. But in that case, the above example would require three files in total.
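
For comparison, a rough sketch of what the ConfigMap variant could look like (all names hypothetical, not an agreed-on layout):

apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
  namespace: default
data:
  vector.toml: |
    # Set global options
    data_dir = "/var/lib/vector"

    # Ingest logs from Kubernetes
    [sources.kubernetes_logs]
      type = "kubernetes"

The DaemonSet would then mount this ConfigMap as a volume (for example at /etc/vector) instead of passing the configuration through the CONFIG env var.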

This also leads me to think that we should 100% provide a way to load a config via http and/or grpc. This would even allow in a centralized setup to only need one config since the master/primary can then supply a subset of that config to the agents. This also would allow us to uncouple the deployment of vector with its config. Aka introduces a kinda control layer. I will defer on this for now but its something we should think of as we introduce more complex setups like k8.

In any setup, only one YAML file is necessary. The reason is YAML's ability to have multiple documents in one file. The above configuration example uses this.

Not all configurations can be achieved by only changing the TOML. For example, if a sink that serves data is added where there was none, a public IP address needs to be associated with the pod, and that requires configuration through Kubernetes, which can be done through YAML.
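
For example, exposing such a sink outside the cluster would need something like a LoadBalancer Service on the Kubernetes side (a sketch only; names and ports are taken from the example above or made up):

apiVersion: v1
kind: Service
metadata:
  name: vector-public
spec:
  type: LoadBalancer   # asks the cloud provider for an external IP
  selector:
    name: master_vector
  ports:
    - protocol: TCP
      port: 80          # external port
      targetPort: 9000  # port the sink listens on inside the pod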

Does it make sense for the initial version to just support the container runtime api and defer this extra collecting to either a transform or a second version of the kube api? This seems somewhat out of scope.

I agree. This seems out of scope. I am for this being a separate feature of the kube API.

Do we want to think about possibly supporting the kube selector api? I'm not sure how much work this would be but it could add a lot of value.

Do you mean label selectors? If yes, they are present.

  # And collect logs from pods with all of these requirements.
  # Requirements are defined as in https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors 
  # Example: ["environment = production","tier notin (frontend, backend)"]
  match = ["kubernetes_requirement"]

As for the topologies, we should 100% start with the decentralized version. I think there are still many questions about how we will do the centralized version. Like do we support service discovery via the k8 api? etc

Yes, it is doable with a Service. And fetching its IP is just a matter of using the right env var. The above configuration example shows this.

We should also think about supporting pulling the logs via file and supporting pulling logs via the docker.sock.

I agree. And this can be added afterwards, so it can be a separate issue for each container runtime source that Vector supports/will support.

@binarylogic (Contributor) commented Sep 4, 2019

I don't have all of the context here, but embedding TOML in the YAML is perfectly fine as a first step. I've seen this done before (ex: Elastic Beanstalk configuration). I don't think it's a blocker for the first version of this unless we have a lightweight alternative.

@lukesteensen (Member, Author):

I don't have all of the context here, but embedding TOML in the YAML is perfectly fine as a first step.

Agreed. A potential next step that would be pretty simple could be a very basic "fetch config over http" feature.

@LucioFranco (Contributor):

@ktff

Ok, I think the embedded is fine for now but it seems like we will have to build a way to load the config via env var as well? Which I think is totally fine for now!

As for the config coordination, how do you expect that a vector to vector sink might find each other? I assume in a centralized setup we would have many agents to one server/master/primary. This master would live as some sort of pod that is discoverable through the k8 service discovery api. It looks like it can inject env vars for the destination so we should be able to set that up via env var injection into the config. 👍

In any setup, only one YAML file is necessary. The reason is YAML's ability to have multiple documents in one file. The above configuration example uses this.

👍 This actually follows k8 config guidelines so that is good.

I agree. And this can be added afterwards, so it can be a separate issue for each container runtime source that Vector supports/will support.

Agreed, this should be pretty easy to do!

I am on board with all this, thanks for explaining!

@ktff (Contributor) commented Sep 5, 2019

@LucioFranco

Ok, I think the embedded is fine for now but it seems like we will have to build a way to load the config via env var as well?

Yes.

It looks like it can inject env vars for the destination so we should be able to set that up via env var injection into the config.

Yes, in the above example that is visible as $(MASTER_VECTOR_SERVICE_HOST) and $(MASTER_VECTOR_SERVICE_PORT) env var. That feature will save us from a lot of issues.

binarylogic added this to the Initial containers support milestone on Sep 7, 2019
binarylogic removed the needs: outside help and needs: approval labels on Sep 10, 2019
@binarylogic (Contributor):

It sounds like we're in agreement with the above spec. Nice work @ktff! I think we're ready to proceed with the work, unless you have any outstanding issues you'd like to discuss?

Before we dive in, how do you want to break this up across pull requests? Do you want to address this in a single PR or break it up into steps?

@ktff (Contributor) commented Sep 10, 2019

Excellent.

There are a lot of moving parts in the specification, and around it, so going with smaller steps is the way to go.

I see three PRs:

  1. Decentralized configuration, without kubernetes source optional configuration.
    • This will also serve as a proof of concept.
  2. Centralized configuration.
    • Major networking questions are contained in this.
  3. kubernetes source optional configuration.
    • Proper Kubernetes API <--> Vector communication is contained in this.

@LucioFranco (Contributor):

@ktff 1. sounds 👍 to me, 2. I think maybe we can do last, I do feel like it is one area we have not spent much time on anyways. 3. Curious what you see this containing? Is this more related to adding additional k8 metadata to events or is there something else?

@ktff (Contributor) commented Sep 10, 2019

@LucioFranco 3. will need to fetch additional info on pods it encounters in the log folder. More specifically, name and label-value pairs. They are needed to support kubernetes source optional configuration.

Alright, we will do 2. last, so the order is 1, 3, 2.

@LucioFranco (Contributor):

@ktff sounds good 👍. Do you know if this pod-level info is available on disk, will it require hooking into k8's API, or is it fetchable via env var?

@ktff (Contributor) commented Sep 10, 2019

@LucioFranco I know that it's available via the k8s API, and how to hook into it. That's the worst-case scenario, but it's doable. I haven't encountered better ways of getting this information, but I also didn't specifically search for one. That was enough for the specification; I plan to address it when it's 3.'s turn to be implemented.

@ktff (Contributor) commented Dec 15, 2019

A note:

There is a peculiarity around testing new Kubernetes features.
That is, since testing is conducted in a Kubernetes cluster, an image of Vector is necessary. That image should be built from the branch with the new Kubernetes features being tested. The PR can be merged with that, but after its merge a separate PR should switch the pulled image from the custom one back to timberio/vector:latest-alpine once a version with the change has been released.
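
Concretely, that just means the image field in the DaemonSet/Deployment template changes (repository and tag below are placeholders):

containers:
- name: vector
  # While the branch is under test, pull the custom image built from it:
  image: some-registry/vector:feature-branch
  # After the release containing the change, a follow-up PR switches back to:
  # image: timberio/vector:latest-alpine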

@alexgavrisco (Contributor):

Is there currently a way to use a stable version of Vector within a Kubernetes cluster and have at least basic info as event attributes (at least pod/container name, and ideally namespace)?
I'd love to switch from the fluent* stack (or at least test it), since it has given me quite a headache lately. I don't really need advanced filtering for Kubernetes logs; most of my filtering is based on the log entry itself.

@ktff (Contributor) commented Mar 22, 2020

@Alexx-G there is. The current stable Vector contains the kubernetes source; it's alpha, but the only thing that remains to be changed is field naming. Its documentation is in the works, but here is the initial guide that works.

binarylogic added the have: should label on Mar 30, 2020
binarylogic assigned MOZGIII and unassigned ktff on Apr 4, 2020
binarylogic added the have: must label and removed the have: should label on Apr 4, 2020
@MOZGIII (Contributor) commented Apr 24, 2020

Superseded by #2222.

MOZGIII closed this as completed on Apr 24, 2020
binarylogic added the type: feature label and removed the type: new feature label on Jun 16, 2020
binarylogic removed this from the Initial Containers Support milestone on Jul 26, 2020