
[Discuss] Indexing permissions as part of the Elastic Agent policy #101

Closed
ruflin opened this issue Feb 23, 2021 · 25 comments
Labels
Team:Elastic-Agent Label for the Agent team

Comments


ruflin commented Feb 23, 2021

Currently all Elastic Agents enrolled into the fleet-server get the same permissions. To cover all the use cases, these are the most permissive permissions. This proposal is to reduce the permissions given to an Elastic Agent based on its policy. With this, each Elastic Agent would receive an API Key with the minimal permissions needed to get its job done.

Elastic Agent Policy contains permissions block

The fleet-server creates the API Keys for the Elastic Agents. Which permissions an Elastic Agent requires is based on the content of the policy. Because of this the policy should contain a section with the permissions it requires. This section could look similar to the following (inspired by Elasticsearch API Key permissions):

policy: A
permissions: [
  {
    "names": ["logs-*", "metrics-*"],
    "privileges": ["create_doc"]
  }
]

The above would give create_doc permissions for the logs-* and metrics-* data streams. If an Elastic Agent is enrolled in policy A, an Elasticsearch output API Key would be created with only the above permissions and added to the policy.

The above model also works for more complex cases. Let's assume we have a case where only two indices should be allowed to be written to, and one index should also have read permissions. This could look like the following:

policy: B
permissions: [
  {
    "names": ["logs-nginx.access-default", "logs-nginx.error-default"],
    "privileges": ["create_doc"]
  },
  {
    "names": ["state-docs"],
    "privileges": ["create_doc", "read"]
  }
]

If an Elastic Agent is enrolled in policy B, it is given permission to write to the nginx data streams and to read from and write to the state-docs index.

In general, this model can be used to be as permissive or restrictive as needed based on the Elasticsearch permission model. The maximum permissions that can be given to an Elastic Agent are limited by the permissions the fleet-server user itself has.
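
To make the mapping from a permissions block to an Elasticsearch API key concrete, here is a minimal sketch in Go (fleet-server's language) with hypothetical type and function names. It builds the body of an Elasticsearch Create API key request (`POST /_security/api_key`), embedding the policy permissions as the index privileges of a single role descriptor; the key and role names are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Permission mirrors one entry of the policy's permissions block
// (hypothetical struct; the real policy schema may differ).
type Permission struct {
	Names      []string `json:"names"`
	Privileges []string `json:"privileges"`
}

// apiKeyRequestBody builds the JSON body for POST /_security/api_key,
// embedding the policy permissions as the index privileges of a single
// role descriptor.
func apiKeyRequestBody(agentID string, perms []Permission) ([]byte, error) {
	body := map[string]interface{}{
		"name": "agent-output-" + agentID, // naming convention is illustrative only
		"role_descriptors": map[string]interface{}{
			"output": map[string]interface{}{
				"indices": perms,
			},
		},
	}
	return json.MarshalIndent(body, "", "  ")
}

func main() {
	perms := []Permission{
		{Names: []string{"logs-*", "metrics-*"}, Privileges: []string{"create_doc"}},
	}
	out, err := apiKeyRequestBody("agent-1", perms)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```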

Change of policy

A change to a policy can mean the permissions required by the Elastic Agent change. For example, at first only nginx access logs were collected, but now error logs are collected as well, which requires additional privileges for the error log data stream. The fleet-server must be able to hand out new API Keys with increased / reduced permissions when a policy changes. In addition, old API Keys have to be invalidated.
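
For the invalidation part, a sketch of what the call could look like, assuming fleet-server talks to the Elasticsearch invalidate API key endpoint (`DELETE /_security/api_key`) directly over HTTP; authentication and the decision of when it is safe to invalidate (see the ACK discussion further down) are left out.

```go
package keys

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// invalidateAPIKeys asks Elasticsearch to invalidate the previous output
// API keys after a policy change has been rolled out. esURL and the key
// IDs are placeholders; authentication and retries are omitted for brevity.
func invalidateAPIKeys(esURL string, keyIDs []string) error {
	body, err := json.Marshal(map[string]interface{}{"ids": keyIDs})
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodDelete, esURL+"/_security/api_key", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("invalidating API keys: %s", resp.Status)
	}
	return nil
}
```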

Same permissions for all processes per Elastic Agent

The above assumes there are no sub-permissions per input in an Elastic Agent. Whatever runs in an Elastic Agent, the input that requires the most permissions will define the permissions of the API Key. If, for example, the APM integration is run together with the Endpoint integration and the APM process requires read access to certain indices, the Endpoint process would also get the same permissions. This simplifies the permissions model.

On the policy creation side, it is important to notify the user about potential issues through concepts like Trusted / Untrusted integrations or similar, but this is not part of the privileges concept itself.

Fleet

The permissions which are part of the policy need to be created somewhere. It is expected that these are created in Fleet. Every integration should have the option to specify which permissions it needs. Based on this information and user input such as the namespace, Fleet needs to generate the permissions block. The UX and parts needed in Fleet should be discussed separately.
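
The generation itself would live in Fleet (Kibana), so the snippet below is only an illustration in Go, with hypothetical types, of how a permissions entry could be derived from an integration's declared data streams and the user-selected namespace, following the `<type>-<dataset>-<namespace>` naming scheme.

```go
package fleet

import "fmt"

// DataStream describes one data stream declared by an integration
// (hypothetical type; the real definition lives in the package manifest).
type DataStream struct {
	Type    string // e.g. "logs" or "metrics"
	Dataset string // e.g. "nginx.access"
}

// permissionEntry builds one entry of the policy permissions block for an
// integration, scoping it to the user-selected namespace following the
// <type>-<dataset>-<namespace> naming scheme.
func permissionEntry(streams []DataStream, namespace string) map[string]interface{} {
	names := make([]string, 0, len(streams))
	for _, s := range streams {
		names = append(names, fmt.Sprintf("%s-%s-%s", s.Type, s.Dataset, namespace))
	}
	return map[string]interface{}{
		"names":      names,
		"privileges": []string{"create_doc"},
	}
}
```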

@scunningham

To the extent that can be achieved, it would be desirable for the fleet server to limit the amount of interpretation it has to do. Ideally, the permissions required would be listed explicitly in a section of the policy, or if necessary, multiple sections that the fleet server has to combine. It's not yet clear where in the system these explicit permissions will be enumerated and what support is necessary to do so.

Also, there will be some race conditions to deal with here. Ideally, the fleet server would not revoke the previous ingest key until the policy was ACK'd by the agent. This would largely prevent the scenario where the key was delayed in rolling out, or we lost comms to the agent immediately after the old key was revoked, but before the new policy rolled out. That said, this will require some stateful book-keeping in the fleet server.

@scunningham

Also, Fleet-Server should have the permission to create these data streams. Basically, for any explicitly enumerated data stream that does not have the auto_configure privilege, Fleet-Server will need to attempt to create the data stream (and no-op if already there). This is to avoid having to provide auto_configure privileges to untrusted endpoints, which is what we have right now.
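
A sketch of that create-if-missing behaviour, assuming the data stream is created via `PUT /_data_stream/<name>` and that an already-existing data stream surfaces as a `resource_already_exists_exception`; which component runs this (Fleet-Server or Kibana) is discussed further down, and names and error handling here are illustrative only.

```go
package setup

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// ensureDataStream creates a data stream if it does not exist yet, and
// treats an "already exists" response as a no-op. esURL is a placeholder;
// authentication and retries are omitted for brevity.
func ensureDataStream(esURL, name string) error {
	req, err := http.NewRequest(http.MethodPut, esURL+"/_data_stream/"+name, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusOK {
		return nil // created
	}
	body, _ := io.ReadAll(resp.Body)
	if strings.Contains(string(body), "resource_already_exists_exception") {
		return nil // already there, nothing to do
	}
	return fmt.Errorf("creating data stream %s: %s", name, resp.Status)
}
```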

@ph ph added the Team:Fleet Label for the Fleet team label Feb 23, 2021

ph commented Feb 23, 2021

The data stream fields are already present in each input, or default values already exist. If we add a "permission" key under the "input", the "stream", or both, the inputs can be traversed and an aggregated set of permissions can be generated.

@ruflin Where would the required permissions be defined? Would the package be responsible for defining the permissions? The system integration should only be able to define permissions for `logs-system.*` and `metrics-system.*`.
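
A rough sketch of the traversal/aggregation idea described above, with hypothetical field names since the exact schema for per-input permissions is still open.

```go
package fleetserver

import "strings"

// InputPermission is a hypothetical per-input permissions entry; the real
// schema for attaching permissions to inputs/streams is still to be decided.
type InputPermission struct {
	Names      []string
	Privileges []string
}

// aggregatePermissions walks the permission entries of all inputs in a
// policy and merges them into one set for the output API key, dropping
// exact duplicates. Smarter merging (e.g. collapsing overlapping index
// patterns) could be added later.
func aggregatePermissions(inputs [][]InputPermission) []InputPermission {
	seen := map[string]bool{}
	var merged []InputPermission
	for _, perms := range inputs {
		for _, p := range perms {
			k := strings.Join(p.Names, ",") + "|" + strings.Join(p.Privileges, ",")
			if !seen[k] {
				seen[k] = true
				merged = append(merged, p)
			}
		}
	}
	return merged
}
```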

@ph ph added the Team:Elastic-Agent Label for the Agent team label Feb 23, 2021

ph commented Feb 23, 2021

@andresrc FYI, I am adding this item to both the Agent and Fleet boards; once the design is done we will surely have 2 (or more) issues.


ph commented Feb 23, 2021

This is the design issue to address the different permissions needed by APM-Server.

With the above proposal the Elastic Agent would still have a single API key to talk to Elasticsearch; that API key will have the correct permissions defined on it based on the configuration. Based on this discussion I will close elastic/kibana#91704


axw commented Feb 24, 2021

Thanks for filing this @ruflin. This sounds similar to what I've had in mind, like an app store type approach: install an app (integration), and the store (Fleet/Integrations UI) prompts you to grant additional privileges. I guess we can skip this until untrusted integrations need the feature.

In an email thread @simitt raised the principle of least privilege. I tend to agree that in an ideal world we would restrict each integration's process from obtaining another integration's privileges that it doesn't need access to; but there's no practical issue with this for APM's foreseeable future.


ruflin commented Feb 24, 2021

@scunningham My assumption was that Fleet (Kibana) would be the tool that sets up all the data streams if needed, as setup of the stack already happens there. fleet-server just takes the permissions and creates API keys based on them.

@ph I assume the definition of the permissions needed should happen where the data streams are defined. So this would be in the manifest of the data_stream directory, for example. But we should have a separate discussion around this in the package-spec repo.

@scunningham

@ruflin I would prefer if Kibana creates the data streams as well, assuming the kibana_system user has permissions to do that.

@scunningham

I should clarify that, to the extent we can, we should not be giving wildcarded privileges to the endpoints. Even an append-only logs-* privilege will allow an attacker to scribble over any log data stream in the system. We should try to limit permissions as specifically as possible to the target data streams.


ruflin commented Feb 24, 2021

I don't think the kibana_system user has, or should have, that permission. But the user that creates the policy should have sufficient permissions, be it in the UI or via an API call. The creation of the data streams happens as soon as the user hits save for a policy.

With the above model, we delegate the responsibility for how many privileges an integration needs to the creator of the integration. The model would still allow someone to use logs-*-* if really needed; fleet-server would just create the API keys. Instead, I think we should have additional features in Fleet where certain flags can be set on policies. For example, only if a user sets a trusted flag on a policy would * or dynamic fields be allowed in the policy permissions. Otherwise Fleet would overwrite certain flags from the integrations and hopefully tell the user that not all the permissions are ok for this policy.
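
As a purely hypothetical illustration of such a trusted flag, Fleet could filter wildcard patterns out of untrusted policies before generating the permissions block; the flag name and the behaviour below are not decided anywhere yet.

```go
package fleet

import "strings"

// validatePermissionNames drops wildcard index patterns unless the policy
// is flagged as trusted. Both the flag and this check are hypothetical;
// the real UX and flags are to be designed in Fleet.
func validatePermissionNames(names []string, trusted bool) []string {
	if trusted {
		return names
	}
	var allowed []string
	for _, n := range names {
		if strings.Contains(n, "*") {
			continue // dropped; the UI should warn the user about this
		}
		allowed = append(allowed, n)
	}
	return allowed
}
```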

@scunningham

Yep, that sounds right. Customers should not be able to implicitly create data streams by adding an integration to a policy without having the privileges to do so. For the exceptional cases, and there will be some, the user adding the integration should have those privileges as well. Exceptional cases should be called out in the UX by marking a policy as untrusted or with some other obvious treatment.


ruflin commented Feb 25, 2021

Do we foresee any other permissions which might be given out to Agents besides read / write index permissions? The reason I'm asking is that if we only need index permissions, we could just take the ES format that exists today and add it to the policy. So if new permissions are introduced, only Kibana would need to be updated; fleet-server would just forward the request to ES and not know about the details. One additional permission I have in mind is around pulling artifacts. That is not directly defined in the API key itself, but fleet-server should know whether agents are allowed to pull artifacts or not.


ruflin commented Feb 25, 2021

Thinking out loud about a potential implementation: we add this feature to fleet-server and Fleet at first, but just keep shipping down what we have defined today as defaults. This makes it possible to start iterating on this feature in Kibana without requiring changes to fleet-server.

@ruflin ruflin mentioned this issue Mar 8, 2021
@ph ph unassigned ruflin Mar 8, 2021
@ph ph added Team:Elastic-Agent Label for the Agent team and removed Team:Elastic-Agent Label for the Agent team Team:Fleet Label for the Fleet team labels Mar 8, 2021

ph commented Mar 8, 2021

@ruflin looking at #101 (comment) I think @aleksmaus could take that action item?


ruflin commented Mar 9, 2021

I filed elastic/kibana#94058 on the Kibana side as a follow up.

There was an offline discussion around how we detect changes to the permissions so that we don't have to give out new API keys all the time. The idea was to potentially use a hash that is persisted and used for comparison. @nchaulet This might also solve elastic/kibana#85777?


ph commented Mar 9, 2021

@ruflin Concerning the hash, is this something that fleet-server would compute? We have code in place in the Elastic Agent that can be reused to generate a predictable hash for a specific tree structure.


ruflin commented Mar 10, 2021

The hash could be calculated by Fleet or fleet-server. I was thinking of fleet-server as it needs to read and store the config anyway. Someone should think through the pros / cons here.

@scunningham

Fleet Server should normalize the input and calculate a SHA-2 hash of the normalized input. This is the natural place for the calculation, as the Fleet Server will be making the decision whether or not it needs to regenerate an output API key based on this input. SHA-2 is pretty brute force; a more subtle method could be employed later; however, I think it is sufficient to handle the 99% use case here.

By normalized, I mean pull in the JSON, pretty print it, and hash that. We want to avoid the situation where the upstream processor (Kibana) generates a random key order, breaking the hash on each policy update.
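
A minimal sketch of that normalize-then-hash idea, assuming the permissions section arrives as JSON: unmarshalling into a generic value and re-marshalling gives a stable key order (Go's encoding/json writes map keys sorted), so upstream key ordering no longer affects the hash. SHA-256 stands in for "sha2" here.

```go
package fleetserver

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// permissionsHash normalizes a JSON permissions section and returns a
// SHA-256 hash of the normalized form. Round-tripping through a generic
// value makes the hash independent of the key order produced upstream,
// since encoding/json writes map keys in sorted order.
func permissionsHash(raw []byte) (string, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	norm, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(norm)
	return hex.EncodeToString(sum[:]), nil
}
```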


ph commented Mar 10, 2021

This is what I had in mind. We have something in elastic-agent that does the normalization to decide if the configuration changed, so we could use the same logic there.


ruflin commented Mar 17, 2021

One issue to think through is how and when we revoke the old API keys.


aleksmaus commented Mar 25, 2021

For the very first-cut PR that works against elastic/kibana#94591, we are not invalidating the API keys.

In the next iteration we could:
- keep an array of old key ids (maybe with a TTL, possibly with coordinator/revision idxs) within the agent document
- revoke the keys as soon as we get the policy coordinator/revision past the values that are kept, or expire them within a reasonable time (1 week)?
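
A sketch of what that bookkeeping could look like, with hypothetical field names for the records kept on the agent document.

```go
package fleetserver

import "time"

// retiredKey is a hypothetical record kept on the agent document for an
// output API key that has been superseded by a policy change.
type retiredKey struct {
	ID             string
	PolicyRevision int64
	RetiredAt      time.Time
}

// keysToInvalidate returns the retired keys that are safe to invalidate:
// either the agent has ACKed a newer policy revision, or the key has
// outlived a grace period (TTL).
func keysToInvalidate(retired []retiredKey, ackedRevision int64, ttl time.Duration, now time.Time) []string {
	var ids []string
	for _, k := range retired {
		if ackedRevision > k.PolicyRevision || now.Sub(k.RetiredAt) > ttl {
			ids = append(ids, k.ID)
		}
	}
	return ids
}
```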


ruflin commented Mar 25, 2021

If we get the ACK from all Agents for a policy, does it mean all sub-processes are already shipping data with the new API Keys, or can there be a certain delay, meaning some of the events are still sent with the old API Keys? A TTL of 1 week sounds pretty long; assuming all Agents have acked, we could probably choose something shorter?


aleksmaus commented Mar 25, 2021

Just had a conversation with @blakerouse over Slack. As far as I understood, the Beats are not restarted and will continue using the old key.
Here is the issue about that:
elastic/beats#24538

So it's not clear at the moment when it is safe to invalidate the key.

It would be nice to be able to update the output key without restarting the Beats, though.


ruflin commented Mar 26, 2021

@urso Would it be possible to exchange API Keys in the output without a restart of the Beat?


ruflin commented Mar 31, 2021

Closing as #187 was merged.
