Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOCS] Improvements to core Salt doc topics requested by VMware customers #64473

Open
3 tasks
barbaricyawps opened this issue Jun 12, 2023 · 4 comments
Open
3 tasks
Labels
Documentation Relates to Salt documentation severity-high 2nd top severity, seen by most users, causes major problems time-estimate-sprint VMware

Comments

@barbaricyawps
Copy link
Contributor

The VMware customer support team requested some improvements to core Salt documentation that is causing a lot of confusion with VMware customers. They wanted a Salt engineer to review a few of these docs to let us know which elements are out of date, contain missing information, or that are wrong:

  • Troubleshooting the Salt Master
    • We just need a quick pass on this to validate that the content is still accurate.
    • To give an example of something that we think is maybe outdated or no longer necessary, the section about too many files is still a concern but not as important as a concern as it used to be.
  • High availability features in Salt
    • This could be good for a general review.
    • One thing is that we probably just need to add that minions actually send events back to all masters now. This is relatively new functionality and it needs to be called out specifically since it appears to be undocumented.
  • Using Salt at scale
    • This also just needs a general review for validation.
    • For example, there are recommended configuration options related to SOC that might be irrelevant.

After a Salt engineer did a review, he made these recommendations:

  • For Troubleshooting The Salt Master I think the entire "Too Many Files" section can get axed. I also think the section below it "Master Stops Responding" can get axed. Especially if other team members agree they can go away.
  • The high availability features of salt section is okay for now. As far as events getting sent to both masters, That might be a bug, not a feature or if it is a feature it's one I'm not aware of. The rest of that doc looks accurate to me.
  • Scaling doc looks okay but I'm sure it can be improved. I think we'll have some more options on both those in a future release.
@barbaricyawps barbaricyawps added Documentation Relates to Salt documentation needs-triage labels Jun 12, 2023
@barbaricyawps barbaricyawps self-assigned this Jun 12, 2023
@barbaricyawps barbaricyawps added severity-high 2nd top severity, seen by most users, causes major problems time-estimate-sprint VMware and removed needs-triage labels Jun 12, 2023
@OrangeDog
Copy link
Contributor

Just having a quick look myself and there are immediate problems. For example, Using Salt at scale says:

If many pillars are encrypted using [the] gpg renderer, it is possible to cache GPG data. To do this, see the set of master configuration options which are prefixed with gpg_cache.

Yet there are none listed at Configuring the Salt Master.

@barbaricyawps
Copy link
Contributor Author

Thanks for the great feedback, @OrangeDog . If you go through those docs and notice anything else, please let me know. I plan to actively work on this fairly soon.

@OrangeDog
Copy link
Contributor

OrangeDog commented Jun 14, 2023

It is actually covered in the example config. Though note it's largely a copy-paste of the pillar_cache docs so has a bunch of errors like claiming "a fresh pillar is recompiled and stored". That example is also not up-to-date with other fixes in the main docs, e.g. it says "Same defaults are provided" instead of "Sane defaults are provided". When it's documented correctly it should be clear what the cache is keyed on (I assume it's the cyphertext, so the same secret only needs to be decrypted once if it's used in multiple minions' pillars, and changing the secret doesn't have any stale data issues).

And speaking of pillar_cache, the security warnings seem to be written by someone lacking understanding:

Secondly, and perhaps more importantly, this means that unencrypted pillars will be accessible to any process which can examine the memory of the salt-master! This may represent a substantial security risk.

That is true even if a cache is not being used. Something with access to Salt's memory can read any and all unencrypted data that Salt is currently using - including all pillars and all private keys. This documentation seems to be trying to present the memory cache as fundamentally less secure than the file cache, when in reality the file cache will be less secure in most models.

Further, as the cyphertext and the decryption key are both going to be on the same disk in most cases, neither caching method makes much difference at all to overall security. Similarly, for remote pillar data (e.g. sdb, vault), the credentials needed to read the secrets are already on the disk. Anyone who can read the master's disk or memory can exfiltrate all pillar data, without needing a cache to be used. It only makes it slightly easier.

@barbaricyawps
Copy link
Contributor Author

As always, this is great feedback, @OrangeDog . Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Relates to Salt documentation severity-high 2nd top severity, seen by most users, causes major problems time-estimate-sprint VMware
Projects
None yet
Development

No branches or pull requests

2 participants