Enforce data streams #69

Open
jalvz opened this issue Oct 22, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@jalvz
Contributor

jalvz commented Oct 22, 2020

As of now, data streams and inputs can always be enabled/disabled via a Kibana toggle switch.

This makes sense for existing integrations, but not so much for APM. E.g., APM records and ingests traces, so if you disable the traces data stream, APM won't work.

We need a way in the spec to define that a data stream is always enabled, so that Kibana doesn't even show a toggle for it.

I suggest a simple boolean attribute force_enabled in the data stream manifest.yml, defaulting to false.

@jalvz jalvz added the enhancement New feature or request label Oct 22, 2020
@jalvz
Contributor Author

jalvz commented Oct 22, 2020

@ruflin does this look reasonable and doable?

@ruflin
Contributor

ruflin commented Oct 22, 2020

SGTM. Will this specific data stream have any configs, or should the configs be skipped as well?

@jalvz
Contributor Author

jalvz commented Oct 22, 2020

So I imagine something like this in manifest.yml:

policy_templates:
  - name: apm
    title: Elastic APM Integration
    inputs:
      - type: traces
        title: Collect application traces
        force_enabled: true
        ...

force_enabled would instruct Kibana not to render the toggle and to treat the input internally as enabled.

Other than that, I don't think there is anything else required to make this work for us.
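To make the proposed semantics concrete, here is a minimal Python sketch of how a UI could interpret `force_enabled` on one parsed `policy_templates[].inputs` entry. It assumes the manifest has already been parsed into plain dicts; `InputSpec`, `parse_input`, `show_toggle` and `effective_enabled` are hypothetical names, and this is not Kibana's actual implementation. Treating a toggle the user never touched as enabled is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputSpec:
    # Hypothetical mirror of one policy_templates[].inputs entry.
    # force_enabled is the attribute proposed in this issue; it defaults
    # to False when the manifest omits it.
    type: str
    title: str
    force_enabled: bool = False


def parse_input(raw: dict) -> InputSpec:
    # Assumes the YAML manifest entry is already loaded into a dict.
    return InputSpec(
        type=raw["type"],
        title=raw["title"],
        force_enabled=bool(raw.get("force_enabled", False)),
    )


def show_toggle(inp: InputSpec) -> bool:
    """Return True if the UI should render an enable/disable switch."""
    return not inp.force_enabled


def effective_enabled(inp: InputSpec, user_choice: Optional[bool]) -> bool:
    """Resolve whether the input ends up enabled in the generated policy.

    A force-enabled input ignores any user choice. Otherwise the user's
    toggle wins, and an untouched toggle (None) is assumed to mean enabled.
    """
    if inp.force_enabled:
        return True
    return True if user_choice is None else user_choice


traces = parse_input(
    {"type": "traces", "title": "Collect application traces", "force_enabled": True}
)
logs = parse_input({"type": "logs", "title": "Collect logs"})

print(show_toggle(traces), effective_enabled(traces, False))  # False True
print(show_toggle(logs), effective_enabled(logs, False))      # True False
```

The key design point is that `force_enabled` is resolved before the user's choice is consulted, so even a stale "disabled" value stored from an earlier policy cannot turn the input off.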

@jalvz
Contributor Author

jalvz commented Oct 22, 2020

Wait, hold on. There are still many things I don't understand... :(
It just occurred to me that if there are no inputs, Kibana will not install any templates or anything. Fields are defined per input, not per data stream. Is that correct?

If so, we must treat apm-server as an input, so that Kibana actually installs the APM templates (meaning: install the APM Server "input" templates).
If my assumptions are right, we will have the same requirement on inputs as well, that is, the ability to enforce them. It wouldn't make any sense for a user to disable the apm-server input when installing the APM integration...


A better alternative for us would be to bypass the "input" concept completely: define all the assets at the top level, and internally generate a default/fake/placeholder input holding all the top-level configuration... But I guess this is easier said than done.

@ruflin
Contributor

ruflin commented Oct 26, 2020

Inputs and data streams are not directly attached to each other. You can install a package without ever setting up a policy for it. In this case, only the data_stream assets are installed. What we need to validate is what happens in the UI if a data_stream does not contain any input. In the best case, it should just skip it, but so far we have not had such an example.

In the APM case, I agree it probably makes most sense to specify it all directly in the package manifest. This should already be possible.

@jalvz
Contributor Author

jalvz commented Oct 26, 2020

Inputs and data streams are not directly attached to each other.

How come? If I am reading the spec right, inputs are properties of streams:

streams:
  description: Streams offered by data stream.
  type: array
  items:
    type: object
    additionalProperties: false
    properties:
      input:

I know it is possible to define an input in the top level manifest file (here), but that input defines a type attribute that must be linked in some stream's input...

What we need to validate is on what happens, if a data_stream does not contain any input in the UI. In the best case, it should just skip

If a data stream does not contain any input (or all its inputs are disabled), the vars it defines are not propagated (because there is no input to copy those settings into) and are simply ignored. So yes, it just skips, but that is not what we need.

So, what I was trying to ask above is: if there are no inputs for a data stream, will Kibana still install the templates/assets defined for that data stream? My assumption is no.

So I think that, in addition to the enhancement request here (enforce data streams) we have 2 other needs:

  • Force Kibana to install a data stream's assets, even if it has no inputs.
  • Make sure that the generated policy includes any stream vars, even if the stream has no inputs.
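The two requests above can be sketched in Python, assuming data streams and inputs are represented as plain dicts. `install_assets` and `build_input` are hypothetical helpers, the `type-dataset` asset naming is only illustrative, and storing the vars under a `vars` key on the input is an assumption about where they could live, not the actual Fleet policy format.

```python
from typing import Any, Dict, List


def install_assets(data_streams: List[Dict[str, Any]]) -> List[str]:
    """Install templates/pipelines for every data stream, regardless of inputs.

    Returns the (illustrative) names of the assets that were installed.
    """
    installed = []
    for ds in data_streams:
        # Deliberately no check on ds.get("streams"): assets are installed
        # even when the data stream declares no inputs (first bullet).
        installed.append(f"{ds['type']}-{ds['dataset']}")
    return installed


def build_input(
    input_id: str, input_vars: Dict[str, Any], streams: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build one policy input that keeps its vars even if streams is empty."""
    policy_input: Dict[str, Any] = {"id": input_id, "streams": streams}
    if input_vars:
        # Second bullet: the vars are attached to the input itself, so they
        # survive when there are no streams to copy them into.
        policy_input["vars"] = dict(input_vars)
    return policy_input


# A data stream with no "streams" key, i.e. no inputs at all.
data_streams = [{"type": "traces", "dataset": "apm.my_stream"}]
print(install_assets(data_streams))  # ['traces-apm.my_stream']

policy = build_input("apm-input", {"top_level_var": "my top level var"}, [])
print(policy)
```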

Do you agree with this? If not, what am I missing?

@ruflin
Contributor

ruflin commented Oct 26, 2020

If there is no input for a data stream, I expect that all assets like ingest pipelines, templates, etc. are still installed. If not, I consider it a bug. Did you try just leaving it out?

For the vars, these can be defined on the package level too. So my expectation is that all of them are defined on the package level. Taking nginx as an example, in the case of apm-server the streams key would be completely missing at the data stream level: https://github.com/elastic/package-storage/blob/production/packages/nginx/0.2.4/data_stream/access/manifest.yml#L4 All the vars would then only show up here, in the policy_templates: https://github.com/elastic/package-storage/blob/production/packages/nginx/0.2.4/manifest.yml#L29 This is where I expect Kibana does not support it. Even if you set all the variables here, Kibana will perhaps not show them (but will still install all the assets).

The last part I didn't fully get: Why do you still need the vars from the data_stream? Why not all global?

@jalvz
Contributor Author

jalvz commented Oct 26, 2020

If there is no input for a data stream, I expect that all assets like ingest pipeline, templates etc are still installed

Ok, that answers my main question :)
Still, fields are required in data streams as per the spec; what are they for if Kibana can install templates defined at the package level anyway?

For the vars, these can be defined on the package level too

Maybe I am too dense, but in that Nginx example, vars are defined under inputs in L32, not just at the policy level. I don't see anywhere in the spec where vars can be defined at the top level.

If I have, e.g., 1 stream with 1 input, the generated policy will look like:

  - id: 7d251e90-1796-11eb-b40f-6db605b21013
    streams:
      - id: traces-apm.my_stream
        data_stream:
          dataset: apm.my_stream
          type: logs
        apm:
          top_level_var: my top level var

If the data stream has no streams key, the policy generated will look like:

inputs:
  - id: 85cc00b0-1794-11eb-b310-3fd923b16d02
    streams: []

So my top-level var is ignored; I guess that is what you meant by "not supported"?
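The behavior observed in the two policies above can be mimicked with a small Python sketch in which vars have nowhere to go except into each stream, so an input with no streams silently drops them. `compile_input` is a hypothetical name; the sketch only reproduces the shape of the quoted policies and is not Kibana's policy compiler.

```python
import copy
from typing import Any, Dict, List


def compile_input(
    input_id: str,
    stream_defs: List[Dict[str, Any]],
    top_level_vars: Dict[str, Any],
) -> Dict[str, Any]:
    """Sketch of the observed behavior: vars are only emitted inside streams."""
    streams = []
    for s in stream_defs:
        stream: Dict[str, Any] = {
            "id": s["id"],
            "data_stream": {"dataset": s["dataset"], "type": s["type"]},
        }
        # Vars are copied into each stream; the input itself has no slot
        # for them. deepcopy keeps streams from sharing nested dicts.
        stream.update(copy.deepcopy(top_level_vars))
        streams.append(stream)
    # With an empty stream_defs list, top_level_vars never reach the output.
    return {"id": input_id, "streams": streams}


vars_ = {"apm": {"top_level_var": "my top level var"}}

with_stream = compile_input(
    "7d251e90-1796-11eb-b40f-6db605b21013",
    [{"id": "traces-apm.my_stream", "dataset": "apm.my_stream", "type": "logs"}],
    vars_,
)
without = compile_input("85cc00b0-1794-11eb-b310-3fd923b16d02", [], vars_)

print(with_stream["streams"][0]["apm"])  # {'top_level_var': 'my top level var'}
print(without)  # {'id': '85cc00b0-1794-11eb-b310-3fd923b16d02', 'streams': []}
```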

Why do you still need the vars from the data_stream

Yeah, sorry, I meant policy vars; still grokking the terminology...

@ruflin
Contributor

ruflin commented Oct 28, 2020

Even if you have a single global config, the data_streams are important. APM will have multiple data_streams, and the templates, pipelines, etc. to create them should still be defined in the data_streams directory. I think only the config/policy part is special.

So my top level var is ignored, I guess that is what you meant it is not supported?

Exactly. In the case of APM, I assume there should not even be a streams block.

rw-access pushed a commit to rw-access/package-spec that referenced this issue Mar 23, 2021
* Rename cluster to stack

* Sorting imports