Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Salt Managment Inter VM Communication #1541

Closed
nrgaway opened this issue Dec 25, 2015 · 37 comments
Closed

Salt Managment Inter VM Communication #1541

nrgaway opened this issue Dec 25, 2015 · 37 comments
Assignees
Labels
C: mgmt P: major Priority: major. Between "default" and "critical" in severity. release notes This issue should be mentioned in the release notes. T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality.
Milestone

Comments

@nrgaway
Copy link

nrgaway commented Dec 25, 2015

On 25 December 2015 at 07:40, Marek Marczykowski-Górecki [email protected] wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Fri, Dec 25, 2015 at 03:58:47AM -0800, Jason M wrote:
> BTW, going to play around with bombshell next week :)

That would be probably the easiest thing to do, but not sure if the most
secure. 

I totally agree. I was thinking of bombshell for vm to vm communication where there could be a SaltVM for user formulas where the SaltVM would have no network access or communication to dom0. That still may be risky though, but I wanted to at least experiment with a couple of different ideas to at least eliminate some of them.

I think the way to go, would be to package states/formulas/etc
from dom0, send them to the VM using _simple_ mechanism
(qvm-copy-to-vm?), then execute states there. And the most important
part: do not parse anything coming from that VM back. This means for
example do not render the output. Maybe only exit code
("success"/"failure") and point the user where to look for the logs.

I totally agree. What about dom0 sending configurations to a SaltVM master, which could communicate with TemplateVM's? Or would you rather not introducing a SaltVM in case it gets infected.

As for the state execution, I see state.pkg:
https://docs.saltstack.com/en/latest/ref/modules/all/salt.modules.state.html#salt.modules.state.pkg

But I can't find an easy way to create such package (other than digging
deep into salt-ssh...). Any ideas?

I have not considered salt.modules.state.pkg, but will look into it as well. I was thinking of using either a modified salt-ssh or creating a simple client on the TemplateVM. The salt-master can reside in dom0 that will create specific states based on pillar configuration and tops, etc. Based on your comment, I will look into the best way of creating the required data and sending it to TemplateVM without any parsing. So, the salt-master will create some type of state package, upload it only, the templateVM will unpack and execute it and can notify Salt Master when completed with a result code.

Another thing we can consider is using git to receive formulas from web which the TemplateVM can download and install which means we don't need to package it for each distribution. The formulas can be downloaded as a git repo or salt package (one file). Both methods have no existing security other than checksums, but we could verify git repos in a similar fashion to what qubes-builder does, but adding requirement and verification on each commit as well as tag and possibly a checksum on each file and overall?

I have copied this reply to qubes-issues to allow further discussions.

@marmarek
Copy link
Member

On Fri, Dec 25, 2015 at 08:44:01AM -0800, nrgaway wrote:

I totally agree. What about dom0 sending configurations to a SaltVM master, which could communicate with TemplateVM's? Or would you rather not introducing a SaltVM in case it gets infected.

If SaltVM (which is generally a good idea) would have power to control
(some) VMs, it also should be protected. So process as little as
possible untrusted data.

I have not considered salt.modules.state.pkg, but will look into it as well. I was thinking of using either a modified salt-ssh or creating a simple client on the TemplateVM. The salt-master can reside in dom0 that will create specific states based on pillar configuration and tops, etc. Based on your comment, I will look into the best way of creating the required data and sending it to TemplateVM without any parsing. So, the salt-master will create some type of state package, upload it only, the templateVM will unpack and execute it and can notify Salt Master when completed with a result code.

Yes, this is exactly the idea.

Another thing we can consider is using git to receive formulas from web which the TemplateVM can download and install which means we don't need to package it for each distribution. The formulas can be downloaded as a git repo or salt package (one file). Both methods have no existing security other than checksums, but we could verify git repos in a similar fashion to what qubes-builder does, but adding requirement and verification on each commit as well as tag and possibly a checksum on each file and overall?

Not sure how proceed with this. But I think for the first iteration we
can skip solving this problem - leaving formula installation (in SaltVM
or dom0) up to the user. Then use some mechanism (as discussed above) to
transfer them to the target (Template)VM.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@nrgaway
Copy link
Author

nrgaway commented Dec 25, 2015

Okay, thanks for the feedback. I won't dive too deep into this until I can present a recommendation or two for further analysis based on some of my research and experiments I will be performing over the next couple of weeks.

My main goal will be as requested; to process the minimum amount of untrusted data.

@Rudd-O
Copy link

Rudd-O commented Dec 26, 2015

I recommend not running the bombshell client in dom0. It's possible, but I would advise running it on a non-networked, domU AppVM instead. You can then use nice editors and whatnot to manage your formulas and playbooks, without contaminating dom0 with extra software.

Always treat your VM as you would treat any VM that is going to be running qubes.VMShell commands, because that is what bombshell is -- a proper stdin/stdout/stderr proxy that uses qubes.VMShell to provide for batch and interactive sessions across VMs (and even across computers, with the right by-hand network setup).

@Rudd-O
Copy link

Rudd-O commented Dec 26, 2015

Note that, when used in conjunction with the Qubes Ansible plugin, or as a salt-ssh backend, bombshell-client is almost certainly going to be a return transport for JSON or YAML data from the modules.

While it's conceivable that a malicious VM could fudge the Ansible / Salt modules copied into the VM for execution, such that those malicious'd modules would return bogus data:

  1. reading and parsing that data back is unlikely to result in execution of anything at the calling VM, unless there's code in Ansible or Salt that will execute Python when fed JSON / YAML (not really the case),
  2. nor are terminal codes spat out to the running terminal when executing an Ansible or Salt call, so your ANSI terminal should be safe.

Key point: at no point in the use of Ansible or salt-ssh is there any way for the target hosts to call back into the manager host.

@marmarek
Copy link
Member

On Fri, Dec 25, 2015 at 08:51:34PM -0800, Rudd-O wrote:

Note that, when used in conjunction with the Qubes Ansible plugin, or as a salt-ssh backend, bombshell-client is almost certainly going to be a return transport for JSON or YAML data from the modules.

While it's conceivable that a malicious VM could fudge the Ansible / Salt modules copied into the VM for execution, such that those malicious'd modules would return bogus data:

  1. reading and parsing that data back is unlikely to result in execution of anything at the calling VM, unless there's code in Ansible or Salt that will execute Python when fed JSON / YAML (not really the case),
  2. nor are terminal codes spat out to the running terminal when executing an Ansible or Salt call, so your ANSI terminal should be safe.

That's true in theory, but we still want to be on a safe side. What is
user will install some custom renderer? Or there will be (or is) a bug
in the current renderer? I don't think it is totally impossible (but
granted - unlikely). We are trying to have as small attack surface as
possible - parsing output from a VM (even with very simple code)
enlarges it.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@Rudd-O
Copy link

Rudd-O commented Dec 26, 2015

At least Ansible explicitly removes all nonprintable characters before passing results to the renderer.

Reducing the risk is why I recommend running the bombshell and its Ansible / Salt connection modules in a non-networked domU.

@nrgaway
Copy link
Author

nrgaway commented Dec 26, 2015

On 26 December 2015 at 09:03, Rudd-O [email protected] wrote:

At least Ansible explicitly removes all nonprintable characters before
passing results to the renderer.

Would it be overkill to use a DisposibleVM, maybe a special type of
DisposibleVM, that either dom0 or a SaltManagementVM would send a master
configuration to for one VM at a time. Basically, the /srv salt
directory would be transfered to the DisposibleVM. Then SaltManagementVM
triggers the DisposibleVM to contact one AppVM to update it.

In this senerio, the SaltManagementVM stay safe since it never needs to
parse anything. The TemplateVM's remain safe since the DisposibleVM is
starting from a known state and only ever updates one TemplateVM per run.

The DisposibleVM can then parse output from updating TemplateVM and save
the logs. If TemplateVM were to infect DisposibleVM, it would not matter
since it would not infect SaltManagementVM or other VMs.

Using a process like this means we would not have to write too much custom
code to ensure SaltManagementVM remains safe

Any thoughts?

@marmarek
Copy link
Member

On Sat, Dec 26, 2015 at 03:42:22PM -0800, nrgaway wrote:

On 26 December 2015 at 09:03, Rudd-O [email protected] wrote:

At least Ansible explicitly removes all nonprintable characters before
passing results to the renderer.

Would it be overkill to use a DisposibleVM, maybe a special type of
DisposibleVM, that either dom0 or a SaltManagementVM would send a master
configuration to for one VM at a time. Basically, the /srv salt
directory would be transfered to the DisposibleVM. Then SaltManagementVM
triggers the DisposibleVM to contact one AppVM to update it.

In this senerio, the SaltManagementVM stay safe since it never needs to
parse anything. The TemplateVM's remain safe since the DisposibleVM is
starting from a known state and only ever updates one TemplateVM per run.

The DisposibleVM can then parse output from updating TemplateVM and save
the logs. If TemplateVM were to infect DisposibleVM, it would not matter
since it would not infect SaltManagementVM or other VMs.

Using a process like this means we would not have to write too much custom
code to ensure SaltManagementVM remains safe

Any thoughts?

That's a good idea in terms of security. Some technical detail would be
to properly set qrexec policies, so that DisposableVM really have
access only to that one selected TemplateVM (so even if compromised by
one TemplateVM, it would not have access to another). Also that
DisposableVM should be carefully selected - probably based on the same
template as the target VM. But generally it is a good idea.

It isn't optimal for performance or resources, but if it would be much
easier to implement than the alternative discussed previously, IMO it
would be good enough.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@Rudd-O
Copy link

Rudd-O commented Dec 27, 2015

I think the dispvm is overkill, frankly. A regular VM with no network access should be more than okay.

the ansible and salt connection plugins fill a great niche that the qubes salt system cannot by itself fill -- the actual setup and configuration of the contents of the VMs -- because it would be too risky to have dom0 reach into the VMs and run commands, then parse the outputs.

@marmarek marmarek added T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality. P: major Priority: major. Between "default" and "critical" in severity. release notes This issue should be mentioned in the release notes. C: mgmt labels Jan 6, 2016
@marmarek marmarek added this to the Release 3.2 milestone Jan 6, 2016
@marmarek
Copy link
Member

marmarek commented Feb 5, 2016

I've created major hack for salt-ssh qrexec transport (something like bombshell):
https://gist.github.com/marmarek/bf01ac51bb53cb767297

Differences vs bombshell:

  • uses qvm-copy-to-vm for "scp" service
  • does not pass output from the VM back to salt-ssh, instead construct some dummy data

Usage - in some "MgmtVM" (I've tested it from a VM, but it's trivial to use the same from dom0):

  1. Place as ssh somewhere in $PATH, then create a symlink scp->ssh.
  2. Add target VM to /etc/salt/roster, for example like this:
d8test: 
   # target VM name
   host: d8test
   sudo: True

It is PoC only, do not use for anything else than testing. It is also ugly, sorry. There is also extensive logging to /tmp - there will be a lot of files.

General salt-ssh workflow:

  1. Create an archive with required state files (salt_state.tgz), send it through scp
  2. Connect through ssh (replaced by my script here)
  3. Send a (shell one-liner) shim, which search for python executable, then execute salt-call wrapper.
  4. That wrapper outputs a delimiter string - everything after that is interpreted by salt-ssh (calling site)
  5. If wrapper doesn't find salt thin agent installed, it prints deploy and exit with code 11 - in such case, salt-ssh uses scp to send thin agent (salt-thin.tgz file) and retry
  6. If wrapper doesn't find required external module(s), it prints ext_mods and exit with code 13 - salt-ssh will send required files and retry
  7. In any other case it simply execute requested command (which may imply executing some state package, sent using scp)
  8. Thin agent prints command output - JSON formatted data.
  9. salt-ssh loads that data and display a summary

Conclusions:

  • it is (mostly) possible to use salt that way, it transfer all required files (states, pillars, modules etc)
  • it works without any prior provisioning in the VM - just bare qubes.FileCopy and qubes.VMShell services are enough
  • thanks to exit code, I can artificially output appropriate request (deploy, ext_mods)
  • limitations of not having feedback channel:
    • no feedback whether state succeeded or failed (that could be probably carried through exit code),
    • no details what was done in the target VM (salt normally provide information what was changed, and what was already in place) - this can be probably logged inside the target VM,
    • and the most important: not possible to use grains from the target VM

The last one is particularly important, because a lot of states depends on it, at least to have os_family grain (distribution name), to adjust paths, etc. For example pkg.installed state (installation of some package) doesn't work without this grain. In my PoC I've hardcoded it to Debian.

I see two options here:

  1. Keep strict requirement to not parse anything returned from a VM, and add some heuristic for a few required grains (at least os_family - for example based on template name...)
  2. Add a simple filtering (whitelist approach) for grains and allow a few required values. If going that way, I would use python json module (hopefully it is secure enough), then extract few white-listed values. If not using full json parser, then it would be some pattern matching script - IMHO much more error prone.

I'm slightly for the second option, although it has slightly bigger attack surface.
Example full input for such parser: https://gist.github.com/marmarek/4f4fa732a02ff3073ee4
Example white-listed output: https://gist.github.com/marmarek/3741e5551d33ba9aaf82

@rootkovska any strong opinion here?

@marmarek
Copy link
Member

marmarek commented Feb 5, 2016

BTW @Rudd-O I was able to use qrexec-client-vm directly, without need for any serializer like yout bombshell. Probably signals are not carried through, but it doesn't matter for Salt, fortunately.

@Rudd-O
Copy link

Rudd-O commented Feb 8, 2016

The serializer is necessary to multiplex stdout and stderr in the same channel. If you don't care about stderr (I do), you don't need bombshell.

I fail to see what the problem is with parsing the returned data. It's JSON. If you don't trust what salt-ssh does with the returned data, because you feel that the returned data can be leveraged into dom0 execution (I don't see how based on your own gists) -- then you can write your own custom Salt runner which takes the data returned by the VMs verbatim and raw, then ditches all of them to a file or something. How to do this is well-documented by the SaltStack folks. Needless to say, the returned data is fundamental to using Salt state modules and their highstates.

Also grains break with the mechanism you outlined here. :-(

Note that I'm not using bombshell in dom0 at all. I use bombshell from a non-networked trusted VM, and I point it at other VMs and to dom0 (to configure hardware settings).

@ag4ve
Copy link

ag4ve commented Feb 8, 2016

On Feb 7, 2016 11:18 PM, "Rudd-O" [email protected] wrote:

The serializer is necessary to multiplex stdout and stderr in the same
channel. If you don't care about stderr (I do), you don't need bombshell.

I fail to see what the problem is with parsing the returned data. It's
JSON.

You show me a deserialization library, I'm about 90% sure I can find you a
CVE (or a commit directly related to someone else's). Do I agree something
like this needs to be implemented - yes. Do I agree being flip about the
risks helps anyone - nope.

We all agree an issue here is small. But I posit that with millions of eyes
on JS it still took years to find JSONP. You find the same user base/time
with salt and (partially due to much less complexity) and I think we'll
all be ~ok with salt being a first class citizen in Qubes.

@marmarek
Copy link
Member

marmarek commented Feb 8, 2016

On Sun, Feb 07, 2016 at 08:18:38PM -0800, Rudd-O wrote:

The serializer is necessary to multiplex stdout and stderr in the same channel. If you don't care about stderr (I do), you don't need bombshell.

Raw qrexec can pass stderr as a separate stream. In case of RPC
services, it is redirected to syslog, but when using it from dom0, you
can use raw qrexec, so you'd get stderr too. Not very helpful in your
use case.

Anyway, I don't care about stderr here (as long as it is available
somewhere, for debugging purposes only). Ideally I would not care about
having stdout too...

I fail to see what the problem is with parsing the returned data. It's JSON. If you don't trust what salt-ssh does with the returned data, because you feel that the returned data can be leveraged into dom0 execution (I don't see how based on your own gists)

We don't want to think about it, that's the point. If salt-ssh does not
receive any untrusted input, we can be sure that it won't be exploited
that way. Otherwise, we'd need to analyze that risk. And at least add
salt-ssh to the TCB (a software where one bug can break the whole
system).

Some example attack vectors would be:

  • json parser (rather unlikely)
  • some state template would do something unexpected when get weird
    string in grains or other returned data; code execution is also
    unlikely, but some other errors are possible - like path traversal
    (which could lead to stealing arbitrary private.img for example)
  • some bug in outputter
  • ...

-- then you can write your own custom Salt runner which takes the data returned by the VMs verbatim and raw, then ditches all of them to a file or something. How to do this is well-documented by the SaltStack folks.

I'm not sure if the "runner" is what we need here. It looks like it
isn't about communication channel, but just running some commands on
salt master. But I may be wrong.
It looks to me that in terms of communication layer there are three
options:

  • salt native protocol - highly interactive, so having it without
    feedback channel would be at least very hard if possible at all
  • salt-ssh - described above
  • proxy minions - not running minion at the remote side at all, only
    executing selected operations (using whatever mechanism, like SNMP)
    from some other minion - much less flexible than other methods

Needless to say, the returned data is fundamental to using Salt state modules and their highstates.

Fortunately it isn't true, as I've outlined above. salt-ssh collect all
the needed files, creates a state package (tgz) and execute it on the
minion side using state.pkg function. The only problem is that states
(and pillars etc) are rendered (jinja2 templates are evaluated) on
"master" side and that may need access to some grains. In examples I've
tried, at least 'os' and 'os_family'.

Also grains break with the mechanism you outlined here. :-(

Yes, this is the problem we need to solve somehow.

Note that I'm not using bombshell in dom0 at all. I use bombshell from a non-networked trusted VM, and I point it at other VMs and to dom0 (to configure hardware settings).

If that "non-networked trusted VM" has ability to run any command in any
VM including dom0, it doesn't matter it isn't dom0. It have exactly the
same power as dom0.

As said before, in the feature we want to add an API to allow having
such management VM, but with somehow limited. To not give full control
over the whole system (and user data) to such VM. For example allow only
to manage VMs created by that management VM. And obviously do not allow
direct dom0 command execution.

Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

@rootkovska
Copy link
Member

I would think that - ultimately - we are only interested in Succeed/Fail response from the VM machinery, so a single 0 or 1 bit, no? I really, really don't like parsing of all these stuff you linked to :(

@marmarek
Copy link
Member

marmarek commented Feb 9, 2016

The point is salt needs some information about the target VM to send right states. At least target distribution to select appropriate modules (for example whether to use apt-get or yum).
If that information isn't going to be retrieved from a VM, then user will need to provide it manually, somehow undermine the whole purpose of having management stack...

If you don't like parsing that salt-minion output directly, we can think of what informations is really needed, extract them from a VM using for example separate qrexec service, using some Simple Representation (TM), then provide it to salt the way I've described (my ugly PoC shell script does that - but instead of extracting that info, use hardcoded values in there).

@rootkovska
Copy link
Member

How about creating a DispVM, with a copy of all the info from Dom0, and then let the salt in the target VM talk to the DispVM only? Then destroy the DispVM (after retrieving 0 or 1 from it)?

@marmarek
Copy link
Member

marmarek commented Feb 9, 2016

That DispVM would have control over everything that salt may control, right? Or at least will have all the salt configuration - which can for example contain VPN credentials the user want to configure in some selected ProxyVM.
I don't see a way how to do that, to not:

  • reimplement "everything", or
  • make that DispVM compromise (almost?) as fatal as dom0 compromise

I think the information needed to extract from a VM (using whatever mechanism) is a really small set. In my PoC it is:

            "grains": {
                "os": "Linux",
                "os_family": "Debian"
            },

I guess in practice, maybe two or three more values are needed, but that would be all.

@rootkovska
Copy link
Member

Ok, if there is no easy way to create a subset with all the info for the specific VM, which I understand is not trivial on Salt, then perhaps just have the parser of the return channel in the DispVM? So, the DispVM in this picture doesn't hold any config, only is tasked with simplification of the output returned from the VM.

@marmarek
Copy link
Member

marmarek commented Feb 9, 2016

It may be an option. But I think it can be also done in the managed VM itself - simply put salt-minion output through a "converter" there and return to dom0 simplified version. Dom0 would need to validate that simplified output anyway regardless it comes from the target VM or DispVM, right?

@rootkovska
Copy link
Member

Yeah, right. So the point is not to run json parser and the like in Dom0. Perhaps return some very simplified strings and reconstruct JSON in Dom0 from it?

@marmarek
Copy link
Member

marmarek commented Feb 9, 2016

Exactly.

@marmarek
Copy link
Member

marmarek commented Feb 9, 2016

I think also about extracting the info using other mechanism - simplar to qubes.NotityTools (used by Windows Tools for now, but planning to extend also for Linux - #1637). It already covers remote OS info. That way, hopefully, we could construct the most important grains without using that JSON output in any way (not even in VM, simply redirect to /dev/null there). Will look into it later.

@marmarek
Copy link
Member

No, we don't want to trust behavior of some particular version of salt.
But you are already trusting that behavior as updates happen.

There is a big difference between:

  • in newer version some states do not work because of some missing grains or whatever else there was expected
  • in newer version is it possible to take over the whole system from some VM

It needs to use the returndata to show to the user what the hell changed on the VMs that were modified

It can be simply logged somewhere in the VM. For debugging purposes (if that 0/1 flag show you there was an error). You can tail -f on that file when debugging configuration.

If I were to run a crippled version of salt-ssh that would, e.g., not show me a diff of the file changes it has made, then that's a useless product which I decidedly will not use.

Of course you can do that if you want. You can also configure salt/ansible whatever way you want.
But in default Qubes OS installation we want as small attack surface as possible. If management stack would enlarge it significantly (to our discretion), we'd rather not have management stack at all.

I think further discussion on this matter, here is senseless. If you want to continue, switch to mailing list.

@Rudd-O
Copy link

Rudd-O commented Feb 10, 2016

There is a big difference between:

This sort of argument can be applied to pretty much every software package shipping in Qubes, Xen, kernel, coreutils, Python, anything used in the Qubes utilities, whatever. Under your rationale, the problem isn't Salt itself -- the problem is that software updates to code that has already been reviewed happen, and those updates can bring security holes.

Philosophically speaking, your argument qualifies as what's called an "isolated demand for rigor". http://slatestarcodex.com/2014/08/14/beware-isolated-demands-for-rigor/

Note that I do really understand this: future Salt updates can introduce problems processing with minion returndata and minion grains. I do not deny this. I'm merely saying that's just a cost of doing business, just as much as it is a cost of doing business when you update libxenvchan, xen-core, or any of the myriad packages that qubes-dom0-* depend (I believe there's an icon converter in the pipeline, and that could totally be exploited if the libraries used to convert icons somehow gained a security vulnerability via updates...).

It can be simply logged somewhere in the VM. For debugging purposes

That's effectively unusable because such a user interface is that much harder to do compared to vanilla Salt. It's actually less usable than writing a dirty shell script, truly. I know this from experience because I developed bombshell doing exactly that kind of "tail across VMs" crap, and it was not pleasant, and it gave me many headaches. Avoiding that sort of terrible user experience is in fact the initial motivation for my writing of bombshell.

I predict this sort of crippled user interface will doom the product to be entirely unused, as people will naturally default to using something else entirely to automate their needs. Even shell scripts! At least with shell scripts piped through qvm-run, the user gets the chance to 2>&1 and see stdout+stderr.

Then, at this point, you might begin to hear about people getting their dom0s exploited by ANSI escape sequences from a compromised VM, because they understandably forgot -- unlike the Salt developers -- to filter them out of the output printed in dom0. After all, no one single developer who wants something automated quickly has the time to reimplement an ANSI escape filter on every one of their shell scripts.

The general rule is: that you can do something doesn't mean people are going to tolerate it.

But in default Qubes OS installation we want as small attack surface as possible. If management stack would enlarge it significantly (to our discretion), we'd rather not have management stack at all.

It's probably a good idea to avoid having any management stack then. Salt is a complex beast as it is right now, you can guarantee that it will become more complex over time, and its direction within the Qubes ecosystem doesn't seem to be defined to be user-friendly or usable like the actual SaltStack product is -- in fact, right now, it seems to be an option only for the hardest of hardcore geeks who do inter-VM tail -f for sport. That relegates it to be an extremely niche tool intended for people unwilling to write automation shell scripts but also unwilling to diagnose what the effects of their automation are.

Given those minimal benefits, it looks like stripping Salt from the Qubes TCB has the undeniable advantage that the TCB would measurably and objectively shrink.

This is all in the context of us all waiting for capability OSes like Genode to take over and provide better security by compartmentation. It would be wise to choose well what we all invest our efforts in.

@marmarek
Copy link
Member

Related salt-users topic on salt-ssh data flow: https://groups.google.com/d/msg/salt-users/gDvuc-5v8Hw/dmniH1eQBAAJ

In short:

  1. salt-ssh sends only states targeted to the given host, not all of them (as in traditional ZeroMQ based setup). This includes also additional files referenced through salt://. But not additional modules - in that case (if needed at all) all modules are sent.
  2. States are rendered on master side and "lowstate" data is sent (in contrast to traditional ZeroMQ based setup). This means master really needs grains for more complex states.
  3. It is highly nontrivial to move state rendering to minion side in salt-ssh architecture.

Given the above confirmation, here is what we discussed internally recently:

  1. For every VM managed by Salt (from dom0):
    1. Start target VM.
    2. Have dom0 to create DispVM.
    3. Send all the Salt configuration there.
    4. Grant it qubes.VMShell access to that selected VM only
    5. Run salt-ssh (over qrexec) from the DispVM, targeting that single VM. Do not filter return channel there - so for example all the grains will be available to salt-ssh during state rendering.
    6. Collect output back to dom0 (success/failure flag, optionally logging full output to some file)
    7. Destroy DispVM
    8. Shutdown target VM (opt-out? only when wasn't running at the start?).
  2. Repeat above for next VM. Multiple instances can be running at the same time, if there is enough memory.

This gives us the following properties:

  • (+) Full set of Salt states features does work
  • (+) Exploiting hypothetical bug in salt-ssh doesn't give control over the whole system
  • (-) Exploiting hypothetical bug in salt-ssh does leak whole Salt configuration, including the ones dedicated to other VMs
  • (-) Have some major performance impact (DispVM creation, memory overhead)

Configuration leak (which would require salt-ssh exploit in the first place) can be even further limited using additional steps:

  • Do not store sensitive data (credentials) in Salt configuration at all. Instead store them in vault VM and grant access to some qrexec service for it ("split keepassx"?)
  • Do not send all the pillars to the DispVM, only those targeted at particular VM. While for states it would be non-trivial, for pillars it should be doable (as long as grains are not used to target them, which is discouraged anyway).

Some implementation details/ideas:

  • It would be useful to have pillar external module exporting VM properties. For example to target some state to all the templates. Requires more thinking which properties to export that way (all of them? only user defined tags (Containers tags/attributes #865)? a configurable list of them?). Probably worth a separate ticket.
  • It is easy to check if some VM have any state targeted at it at all (unless grains are used to target states...): salt-call --local --id=VMNAME state.show_lowstate. Note that resulting lowstate can be inaccurate because of missing grains. But should be enough to detect whether call all that machinery (DispVM, start VM at all etc) or not.
  • Consider implementing the whole dom0 part as Salt runner module - it would make easier (faster) to access above mentioned data. But it may have also some downsides. Just an idea to consider.
  • Copying full Salt configuration to DispVM should be as simple as copying /srv.

@marmarek
Copy link
Member

One unanswered question: what template should be used for that DispVM? Having that template compromised means compromise of every salt-managed VM (probably all the templates or so). Also choosing "DispVM based on the same template as the target VM" is not good, because of configuration leak ("red-crappy-install-every-curl-pipe-bash" template based VM would learn all the Salt configuration, even that targeting "secret-project-xyz" VM).
So we need "the most trusted" template here... Maybe a separate one dedicated for that purpose? In that case it would be based on some minimal template.

@rootkovska
Copy link
Member

Yeah, I think some kind of minimal template would be good to have anyway (e.g. for the vault service).

Also, FWIW, attaching my original arch sktech from the meeting :)
salt-dom0-disp-sketch

marmarek added a commit to marmarek/old-qubes-core-agent-linux that referenced this issue Mar 4, 2016
In case of some services it makes much sense for caller to receive also
stderr in addition to stdout. For example:
 - qubes.VMShell (stderr required for salt-ssh over qrexec)
 - qubes.OpenInVM - especially when called to DispVM - otherwise
 diagnosing errors can be hard

And generally all sort of error reporting (the purpose of stderr). It
would ease debugging - instead of message "error occurred, check here and
there for more details", it could be "error occurred: the reason".

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt-base-config that referenced this issue Mar 5, 2016
Targeting with grains (especially pillars) is insecure by design,
because grains are provided by target system. In our case it doesn't
matter in theory since dom0 doesn't receive VM grains. But this have
another problem - such targeting is inaccurate then.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt-base-config that referenced this issue Mar 5, 2016
We plan to use salt-ssh based approach to manage VMs, so no salt
configuration will be present there.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt-base-topd that referenced this issue Mar 5, 2016
All the configuration is managed from dom0 - here top.* commands are
useful. But in VMs we have simplified jinja-based version to only render
top.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt that referenced this issue Mar 5, 2016
Implement simplified version of top.get_top in pure jinja and use it
when plain list of tops to include (`tops.yaml`) is present.
This is to avoid using topd module, which isn't available over salt-ssh
(because 'utils' directory required by it isn't copied over by
 salt-ssh).

Such list of tops can be easily generated using `find` or similar tool.
For example:
cd /srv/salt; find _tops ! -type d -printf '- %p\n' > tops.yaml

QubesOS/qubes-issues#1541
@marmarek
Copy link
Member

marmarek commented Mar 5, 2016

Implemented in marmarek/qubes-mgmt-salt-connector repository. It depends on minor changes in:

This implementation is based on core2 API. But migration to core3 API should be easy (DispVM creation)

@marmarek
Copy link
Member

marmarek commented Mar 5, 2016

Todo:

  • wrap the code into proper classes, install using setuptools
  • integrate with qubesctl
  • parallel execution

@woju is it possible (and if so, is it a good idea?) to install additional ("external") subpackages into the same package as core (qubes)?
If not, we need some convention on python package naming. Currently we already have qubes, qubesdb, splitgpg, imgconverter. And we need to move out remaining parts from qubes (https://github.com/QubesOS/qubes-mgmt-salt-base-overrides/tree/master/src/qubes/mgmt, probably more)

@woju
Copy link
Member

woju commented Mar 5, 2016

Combining one python package from different distro packages is generally a bad idea. I wouldn't discount it altogether, however the need is very rare. Note that there are similar solutions like drop-in directories for plugins (vide /srv/salt/_modules), those however aren't proper python packages. In case of qubes I strongly advise against.

I also don't think we need a "naming policy" precisely because the names aren't changing anything (this wouldn't be the case if we put those modules in single package). The gain is purely cosmetic. That said, we could name all our packages as qubes*, i.e. qubessplitgpg, qubesimgconverter, and qubesmanager for that matter.

@marmarek marmarek self-assigned this Apr 24, 2016
marmarek added a commit to QubesOS/qubes-mgmt-salt-base that referenced this issue May 1, 2016
It will be really useful for fine-grained targeting.

QubesOS/qubes-issues#1541
@marmarek
Copy link
Member

marmarek commented May 1, 2016

Finally integrated into qubes-mgmt-salt repository (where qubesctl tool live). Will remove qubes-mgmt-salt-connector repository.
There are two required packages to use this feature:

  • qubes-mgmt-salt-vm-connector package installed in default template
  • openssh-clients (for scp binary) installed in every managed VM (it isn't used, but salt-ssh checks for its existence)

Then, qubesctl --all state.highstate can be used to apply configuration everywhere. More details in documentation.

marmarek added a commit to marmarek/qubes-mgmt-salt that referenced this issue May 1, 2016
Enable usage of Debian template for "DispVMs" from which salt-ssh is
called.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt-dom0-virtual-machines that referenced this issue May 1, 2016
This will make it easier to reuse states where both dom0 and VMs are
configured from the same formula.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt-dom0-virtual-machines that referenced this issue May 1, 2016
This will make it easier to reuse states where both dom0 and VMs are
configured from the same formula.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-builder-rpm that referenced this issue May 25, 2016
marmarek added a commit to marmarek/qubes-mgmt-salt that referenced this issue May 25, 2016
Do this using python API instead of separate process. This saves time
spent on importing all that modules, which is about half of all the time
spent on `state.show_top` call.

QubesOS/qubes-issues#1541
marmarek added a commit to marmarek/qubes-mgmt-salt that referenced this issue Jun 7, 2016
Even in fc23-based dom0 it is needed to apply them (see
QubesOS/qubes-issues#2048), to correctly render top file.

Otherwise qubesctl doesn't correctly detect which VMs have any
configuration to apply.

QubesOS/qubes-issues#1541
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: mgmt P: major Priority: major. Between "default" and "critical" in severity. release notes This issue should be mentioned in the release notes. T: enhancement Type: enhancement. A new feature that does not yet exist or improvement of existing functionality.
Projects
None yet
Development

No branches or pull requests

6 participants