Kubernetes integration #260
Comments
@lukesteensen as for this, do we think …
Our release binary is 25MB with those two and 19MB without them, so I don't think there's any good reason to exclude them. If anything it'd be confusing to have a k8s build that supported fewer features.
I would expect it to work the same way that fluentbit does in terms of log enrichment via the k8s api.
Created this issue as well: #768. I believe having an operator would be the way to go.
@ktff I've assigned this issue to you. As a first step, I'd like to finish the spec. Could you fill in the "Behavior" and "Requirements" sections above? Feel free to expand out as much as you'd like, whatever you need to describe how this will work. Note: the first version of this can be simple; it does not need to include every feature. We are big fans of shipping in small incremental changes. Ex: maybe it makes sense to separate out the metadata enrichment as a follow-up PR.
I have filled in the Behavior section.
Prior art also includes Filebeat, which has a processor that adds K8s metadata: https://www.elastic.co/guide/en/beats/filebeat/master/add-kubernetes-metadata.html
@ktff thank you for writing this up! I think this approach sounds generally pretty good!
Do you have an example of what this would look like? This kinda sounds a bit messy and something we may not want to do. Even for IDEs this will make the formatting harder.
I think actually embedding the toml within the yaml will make it less shareable, since many users will share their configs as direct toml files, not as yaml. The other option is to provide some packing tool that will generate a daemon set yaml with the provided toml embedded within it. This also leads me to think that we should 100% provide a way to load a config via http and/or grpc. This would even allow, in a centralized setup, only needing one config, since the master/primary can then supply a subset of that config to the agents. This would also allow us to decouple the deployment of Vector from its config, aka introduce a kind of control layer. I will defer on this for now but it's something we should think of as we introduce more complex setups like k8s.
Does it make sense for the initial version to just support the container runtime api and defer this extra collecting to either a transform or a second version of the kube api? This seems somewhat out of scope.
Do we want to think about possibly supporting the kube selector api? I'm not sure how much work this would be but it could add a lot of value.
This is 👍 I think we would probably want to do this as a separate component anyways.
As for the topologies, we should 100% start with the decentralized version. I think there are still many questions about how we will do the centralized version, like do we support service discovery via the k8s api? etc. Overall, I think this approach is good! We should also think about supporting pulling the logs via file and supporting pulling logs via the …
@LucioFranco thank you for the detailed feedback.
Here is an example of how it looks:
Obviously there are things missing, but this is only an example. IDEs should format …
Generally in the Vector ecosystem, yes. But among those using Kubernetes, I suspect …
The third option is to use …
In any setup, only one … Not all configurations can be achieved by only changing …
I agree, this seems out of scope. I am for this being a separate feature of the kube api.
Do you mean label selectors? If yes, they are present.
Yes, it is doable with Service. And fetching its IP is just a matter of using the right env var. The above configuration example has this.
I agree. And this is addable afterwards, so this can be a separate issue for each …
I don't have all of the context here, but embedding TOML in the YAML is perfectly fine as a first step. I've seen this done before (ex: Elasticbeanstalk configuration). I don't think it's a blocker for the first version of this unless we have a lightweight alternative.
Agreed. A potential next step that would be pretty simple could be a very basic "fetch config over http" feature.
Ok, I think the embedded config is fine for now, but it seems like we will have to build a way to load the config via env var as well? Which I think is totally fine for now! As for the config coordination, how do you expect that a vector-to-vector sink might find each other? I assume in a centralized setup we would have many agents to one server/master/primary. This master would live as some sort of pod that is discoverable through the k8s service discovery api. It looks like it can inject env vars for the destination, so we should be able to set that up via env var injection into the config. 👍
👍 This actually follows k8s config guidelines, so that is good.
Agreed, this should be pretty easy to do! I am on board with all this, thanks for explaining!
Yes.
Yes, in the above example that is visible as …
It sounds like we're in agreement with the above spec. Nice work @ktff! I think we're ready to proceed unless you have any outstanding issues you'd like to discuss? Before we dive in, how do you want to break this up across pull requests? Do you want to address this in a single PR or break it up into steps?
Excellent. There are a lot of moving parts in the specification, and around it, so going with smaller steps is the way. I see three PRs:
…
@ktff 1. sounds 👍 to me. 2. I think maybe we can do last; I do feel like it is one area we have not spent much time on anyways. 3. Curious what you see this containing? Is this more related to adding additional k8s metadata to events, or is there something else?
@LucioFranco 3. will need to fetch additional info on the pods it encounters in the log folder. More specifically, name and label-value pairs. They are needed to support … Alright, we will do 2. last. So 1. 3. 2. is the order.
@ktff sounds good 👍. Do you know if this pod-level info is available on disk, will it require hooking into k8s's api, or is it fetchable via env var?
@LucioFranco I know that it's available on k8s's api, and how to hook into it. That's the worst case scenario, but it's doable. I haven't encountered any better ways of getting them, but I also didn't specifically search for that. That was enough for the specification; I plan to address it when it's 3.'s turn to be implemented.
A note: there is a peculiarity around testing new Kubernetes features.
Is there currently a way to use a stable version of Vector within a Kubernetes cluster and have at least basic info as event attributes (at least pod/container name, and ideally namespace)?
@Alexx-G there is. Current stable Vector contains …
Superseded by #2222.
Description
We should have first-class support for using Vector in infrastructures running k8s. This will involve a combination of good documentation and potentially a few k8s-specific parsers/transforms/sources/etc. This should include both ingesting and processing data from applications running in k8s, as well as best practices for running Vector itself in k8s.
Prior Art
Behavior
Requirements

- `vector` is available in some repository.
- `vector.yaml` file is served on some url. Let's call it `yaml_url`.
- `kubectl` is installed.

Installation/Running
To install/run Vector with default configuration run:
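The command itself is missing from the extracted text; judging by the Requirements section below ("`kubectl [apply|create]` command"), it is presumably a `kubectl apply` against the served file. The URL here is a placeholder, not the real location:

```shell
# Hypothetical install/run command; the URL stands in for yaml_url.
kubectl apply -f https://example.com/vector.yaml
```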
Configuration

To configure Vector, download/copy-paste the `vector.yaml` file, with, for example:

Edit the `toml` part of `vector.yaml` to configure Vector.

Let the path to the edited `vector.yaml` be `yaml_path`, then run:

which will install/run Vector with the edited configuration.
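The run command is missing from the extract; presumably it is a `kubectl apply` pointed at the local file. The path is a placeholder for `yaml_path`:

```shell
# Hypothetical command to install/run Vector with the edited configuration.
kubectl apply -f ./vector.yaml
```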
Reconfiguration

Edit the `toml` part of `vector.yaml`, then re-run the install command.

vector.yaml
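The spec never shows the combined file, but based on the discussion above (a DaemonSet with the TOML embedded in the YAML), it might be sketched like this. Every name, image, and value here is illustrative only, not the actual manifest:

```yaml
# Hypothetical vector.yaml: Kubernetes objects plus the embedded
# Vector toml in one file. All names/values are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
data:
  vector.toml: |
    # --- edit this toml part to configure Vector ---
    [sources.kubernetes]
      type = "kubernetes"

    [sinks.console]
      type = "console"
      inputs = ["kubernetes"]
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector-agent
spec:
  selector:
    matchLabels: { app: vector }
  template:
    metadata:
      labels: { app: vector }
    spec:
      containers:
        - name: vector
          image: timberio/vector:latest   # placeholder tag
          volumeMounts:
            - { name: config, mountPath: /etc/vector }
      volumes:
        - name: config
          configMap: { name: vector-config }
```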
All Kubernetes and Vector configurations are in this one file. `vector.toml`, which is usually a separate file, is now embedded, and clearly documented, inside `vector.yaml`.

Benefits of a single `yaml` file:

- … `http` endpoint. This will also empower users in the same way.

kubernetes source
The `kubernetes source` ingests log data from the local Kubernetes node and outputs log events.

If `named` and `match` are empty, the `kubernetes source` will collect logs from all applicable pods, except from itself.

Implementation
Kubernetes has `CRI` (Container Runtime Interface), which all `container runtimes` for Kubernetes should implement. Docker implements it fully, while OCI, rkt, Frakti, Containerd, and Singularity are an active work in progress. `CRI` defines how and where log files are to be stored. The `kubernetes source` can read those files to get logs from all containers on its node. This can be done with the `file source`, which has already been demonstrated to work by @LucioFranco.

Kubernetes documentation defines where Kubernetes node components keep their logs. This is also collectable with the `file source` and `journald source`.

Applicable pods, that is, pods from which this implementation is capable of collecting logs, are those that have logging to a file configured. Docker has this as the default, and Kubernetes highly recommends it, as it also uses those logs for its own features. Therefore, if this implementation doesn't have access to some logs, then neither does Kubernetes. And as Kubernetes assumes that logging is then done in some other way by the user, this implementation assumes the same.
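As a sketch of how this might surface in the embedded configuration, using the `named` and `match` options mentioned above — the spec does not define their exact syntax, so everything below is hypothetical:

```toml
# Hypothetical kubernetes source configuration; option names follow
# the spec's wording, the exact syntax is illustrative.
[sources.k8s_logs]
  type = "kubernetes"
  # Pods to collect from; empty means all applicable pods
  # except Vector itself.
  named = []
  # Label selectors, e.g. ["app=nginx"]; empty means no filtering.
  match = []
```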
Communication between Vector nodes can be done with the `vector source`/`vector sink` pair.

Enrichment
Besides:

- `message`
- `timestamp`
- `stream`
- `pod_uid`
- `container_name`
- `instance_number` (Edit: will be part of later enrichment issue)
- `labels`

which are almost freely available, other information could be pulled over the `Kubernetes API` to enrich the `Event`. But I would delay this for now, as it can be added later. My main reason is that I expect testing this properly will take most of the time, and adding/testing things after that will be much easier once the base is already added/tested.

Topologies
There are two base topologies that are to be supported from the start by having a dedicated `vector.yaml` file.

Distributed

Matches the Distributed topology in Vector Docs. The `vector.yaml` file for this topology would have:

- `DaemonSet` with the template of the Vector agent.
- `Toml` configuration inside of it would have a pre-added default `kubernetes source` configuration.

This configuration is a base for almost all other configurations/deployments.
Centralized (EDIT: delayed for now)

Matches the Centralized topology in Vector Docs.

This is an upgrade on the Distributed topology, with Vector also being on the down-stream end of things. As such, `vector.yaml` for this is based on the Distributed version with these additions:

- `Toml` configuration of the Vector agent would have a pre-added `vector sink` configuration.
- `Deployment` with the template of the Vector master.
- `Toml` configuration of the Vector master would have a pre-added `vector source` configuration.

This configuration is a base for all configurations/deployments with the Centralized topology.
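For illustration, the pre-added `vector sink`/`vector source` pair could look roughly like this. Component names and addresses are placeholders, not part of the spec; as discussed above, the real address would come from k8s Service env var injection:

```toml
# Agent side (DaemonSet): forward events to the master.
[sinks.to_master]
  type = "vector"
  inputs = ["kubernetes"]          # hypothetical source id
  address = "vector-master:9000"   # placeholder; would come from env var

# Master side (Deployment): receive events from the agents.
[sources.from_agents]
  type = "vector"
  address = "0.0.0.0:9000"
```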
Alternatives

- The `kubernetes source` implementation could spin up Vector sources dedicated to each present `container runtime`, and aggregate logs from them. This could also be a fallback for the original implementation.
- The `kubernetes source` implementation could have only one master agent, which would collect logs from pods over the `Kubernetes API` `logs` command.

logging operator

This specification is compatible with the idea of a `logging operator`. So if it were ever to be implemented, it could be built upon this specification.
Requirements

- `kubectl [apply|create]` command.
- … (`app=nginx`).

Todo
- `match` filtering. Requires New `kubernetes_pod_metadata` transform #1072. But where to put it? In that transform?