Signed work submissions #414

Merged: fosterseth merged 2 commits into ansible:devel from feat_signed_work_requests on Sep 22, 2021

@fosterseth fosterseth commented Sep 3, 2021

Overview

Adds ability for work submissions to be digitally signed.

#377

Details

An RSA private key is added to any node authorized to submit secure work.

---
- node:
    id: foo
- work-signing: 
    privatekey: /home/sbf/sockceptor/certs/signworkprivate.pem
    tokenexpiration: 10h30m

The corresponding RSA public key is added to any node that receives secure work.

---
- node:
    id: bar
- work-verification:
    publickey: /home/sbf/sockceptor/certs/verifyworkpublic.pem

Any work command in bar.yml that expects secure work submissions can set verifysignature to be true.

- work-command:
    workType: echosleep
    command: bash
    params: "-c \"while read -r line; do echo $line; sleep 1; done\""
    verifysignature: true

Tell receptor to sign the work submission using the --signwork parameter.

(on foo) receptorctl work submit echosleep --node bar --signwork

Only nodes with the correct work-signing key are able to start echosleep on bar.

Notes

receptorctl status now lists secure work types. Nodes must use --signwork to run these work units.

Node         Work Types
bar          sleepcat, 100cat

Node         Secure Work Types
bar          echosleep

Verification occurs for locally submitted work as well. This is because the receptor control service currently does not know the origin of an incoming connection, so it cannot distinguish between commands submitted from remote nodes and commands submitted locally via the unix socket.

Each generated JSON Web Token is set to expire in 5 minutes.
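
For readers unfamiliar with token expiry, here is a minimal Python sketch of how an `exp` claim can be computed and checked. It is not receptor's implementation. The function names are hypothetical, and the sketch assumes `tokenexpiration` values such as `10h30m` use hour/minute/second suffixes. How the 5 minute figure relates to the configured `tokenexpiration` is not spelled out in this description, so the sketch accepts either as input.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: only h, m and s suffixes are supported in this sketch.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"(\d+)([hms])")


def parse_duration(text):
    """Parse a duration such as '10h30m' into a timedelta."""
    pos, total = 0, 0
    for match in _DURATION_RE.finditer(text):
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=total)


def expiry_claim(issued_at, lifetime):
    """Return a JWT 'exp' value: whole seconds since the Unix epoch."""
    return int((issued_at + lifetime).timestamp())


def is_expired(exp, now):
    """A token must be rejected on or after its 'exp' time."""
    return int(now.timestamp()) >= exp


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    exp = expiry_claim(issued, parse_duration("5m"))
    print(is_expired(exp, issued))                         # still valid
    print(is_expired(exp, issued + timedelta(minutes=5)))  # expired
```

A verifying node would run a check like `is_expired` against its own clock, so clock skew between the submitting and receiving nodes eats into the token lifetime.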

@fosterseth fosterseth marked this pull request as ready for review September 8, 2021 16:25
@jladdjr jladdjr self-requested a review September 9, 2021 15:57
@AlanCoding
Member

https://github.com/ansible/awx/blob/devel/docs/receptor_mesh.md

With this change, our instructions for AWX execution nodes would be modified slightly, to something like:

- verifyingkeypublic: /var/run/awx-receptor/verifyworkpublic.pem

- work-command:
    worktype: ansible-runner
    command: ansible-runner
    params: worker
    allowruntimeparams: true
    verifysignature: true

Eventually @tchellomello and others will come up with a final location for the files; I just made this one up.

The public key here corresponds to what the main cluster nodes set up. Among the other people making downstream changes, @thenets is working on some ephemeral node tooling.

@yagomarques

Hello @fosterseth, I have already tested 3 scenarios:

  1. auth correctly - it's working
  2. auth with an invalid public RSA key
  3. auth with a wrong public RSA key

For the invalid public RSA key scenario, I created an invalid PEM like:

-----BEGIN PUBLIC KEY-----
invalidMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCUB3gvzciIrXtCTZl7h/grsj1w
KNnVga96+n1zMyHezDvsnCoKT9SFVhWrjT1t36QKAreImtmceDQuOPnbo03w9b0u
CAIQY1slwcwyaRakbzTGlSKKd6oKAro8RyLfGUJRwAhkD2Ag0BPDRWw5eT15M5Az
FnWK4pYXG2fxcYzzowIDAQAB
-----END PUBLIC KEY-----

When I ran the work on the node
seq 100 | receptorctl --socket /tmp/receptor.sock work submit rsa-validate --node executionnode --payload - -f --signwork

an unexpected error occurred:
Error: Remote error: ERROR: read error reading from executionnode: INTERNAL_ERROR: no connection to next hop

and the node stopped with an error:

INFO 2021/09/09 15:17:49    controlnode via controlnode
DEBUG 2021/09/09 15:17:49 Received routing update 15w6eGs0 from controlnode via controlnode
DEBUG 2021/09/09 15:17:49 Sending routing update wkJcfLJO. Connections: controlnode(1.00)
INFO 2021/09/09 15:17:56 Client connected to control service
INFO 2021/09/09 15:17:56 Client disconnected from control service
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x96e162]

goroutine 116 [running]:
github.com/ansible/receptor/pkg/certificates.LoadFromPEMFile(0xc000041c80, 0x2b, 0x2, 0xc00041a368, 0xc0000acb70, 0xc00041a340, 0x30)
	/home/yagosilva/projects/receptor/pkg/certificates/ca.go:49 +0x122
github.com/ansible/receptor/pkg/certificates.LoadPublicKey(0xc000041c80, 0x2b, 0xc000046db0, 0xc, 0xc00041a3d8)
	/home/yagosilva/projects/receptor/pkg/certificates/ca.go:213 +0x39
github.com/ansible/receptor/pkg/workceptor.(*Workceptor).AllocateUnit(0xc00040bf80, 0xc000046db0, 0xc, 0xc0001e6000, 0xe9, 0x0, 0xc0000acb70, 0x0, 0x0, 0x0, ...)
	/home/yagosilva/projects/receptor/pkg/workceptor/workceptor.go:163 +0x37f
github.com/ansible/receptor/pkg/workceptor.(*workceptorCommand).ControlFunc(0xc0004a66c0, 0xc000415ba0, 0x1880db8, 0xc0000ae690, 0x0, 0x0, 0x0)
	/home/yagosilva/projects/receptor/pkg/workceptor/controlsvc.go:248 +0xfc5
github.com/ansible/receptor/pkg/controlsvc.(*Server).RunControlSession(0xc000410870, 0x188d150, 0xc00038e0f0)
	/home/yagosilva/projects/receptor/pkg/controlsvc/controlsvc.go:223 +0xacb
github.com/ansible/receptor/pkg/controlsvc.(*Server).RunControlSvc.func2.1(0x188d150, 0xc00038e0f0, 0xc00007a000, 0xc000410870)
	/home/yagosilva/projects/receptor/pkg/controlsvc/controlsvc.go:361 +0x6e
created by github.com/ansible/receptor/pkg/controlsvc.(*Server).RunControlSvc.func2
	/home/yagosilva/projects/receptor/pkg/controlsvc/controlsvc.go:335 +0x5e

For the wrong public RSA key scenario, I created a valid PEM whose key does not match the private key, like:

-----BEGIN PUBLIC KEY-----
MIGeMA0GCSqGSIb3DQEBAQUAA4GMADCBiAKBgFm5UU7Xji42S4pFKlhTobsntzCp
90TKXLrTHZlqATSckpETC97m0TJOwvx/Aw8wAACA3J6sbhPYHscn8iuPsdn8mRhL
NnMXflBwT9XoOk0Au4dGZLxmPe26W6t5tA0VqFzqY17PF/FKz61ZFVtkpL5qAt8V
luHM/QNqV51Z4RPpAgMBAAE=
-----END PUBLIC KEY-----

When I ran the work on the node
seq 100 | receptorctl --socket /tmp/receptor.sock work submit rsa-validate --node executionnode --payload - -f --signwork

this error occurred:
Error: Remote error: ERROR: could not parse response: ERROR: could not verify signature: crypto/rsa: verification error

On the node side, nothing is logged about it:

DEBUG 2021/09/09 15:33:45 Sending service advertisement: &{executionnode control 2021-09-09 15:33:45.360870106 -0300 -03 m=+245.286055909 1 map[type:Control Service] [{"WorkType":"echosleep","Secure":false},{"WorkType":"rsa-validate","Secure":true}]}
DEBUG 2021/09/09 15:33:47 Received routing update RKvua2BM from controlnode via controlnode
DEBUG 2021/09/09 15:33:47 Sending routing update FzX9AqSu. Connections: controlnode(1.00)
DEBUG 2021/09/09 15:33:57 Received routing update 0ENwxo2F from controlnode via controlnode
DEBUG 2021/09/09 15:33:57 Sending routing update Flf4cbde. Connections: controlnode(1.00)
DEBUG 2021/09/09 15:33:58 Received service advertisement &{0xc000090720 false}

Should we include more logs on this feature?

@fosterseth fosterseth force-pushed the feat_signed_work_requests branch from 55fc36c to ed12f53 Compare September 9, 2021 19:03
@fosterseth
Member Author

@yagomarques thank you. I have now added proper error handling for the case where an invalid PEM is used:

Error: Remote error: ERROR: could not parse response: ERROR: failed to decode PEM block
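
The panic reported earlier is consistent with a PEM decoder yielding no block and the caller using the result anyway. The actual fix lives in receptor's Go code and is not shown in this thread. As an illustration only, here is a small Python sketch of the same defensive pattern: reject the file with a clear error before any key parsing is attempted. `decode_pem_block` and `PEMDecodeError` are hypothetical names, and the sketch checks only the armor and the base64 body, not whether the bytes form a valid RSA key.

```python
import base64
import binascii
import re

# Matches one armored block and requires the END label to equal the BEGIN label.
_PEM_RE = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----\s+(?P<body>.*?)\s*-----END (?P=label)-----",
    re.DOTALL,
)


class PEMDecodeError(ValueError):
    """Raised when text does not contain a decodable PEM block."""


def decode_pem_block(text):
    """Return (label, der_bytes) for the first PEM block in text."""
    match = _PEM_RE.search(text)
    if match is None:
        raise PEMDecodeError("failed to decode PEM block: no BEGIN/END armor found")
    # Join the wrapped base64 lines into one string before decoding.
    body = "".join(match.group("body").split())
    if not body:
        raise PEMDecodeError("failed to decode PEM block: empty body")
    try:
        der = base64.b64decode(body, validate=True)
    except binascii.Error as exc:
        raise PEMDecodeError(f"failed to decode PEM block: {exc}") from exc
    return match.group("label"), der
```

Prefixing extra characters to the first base64 line, as in the invalid key above, typically breaks the padding, so the decode step fails and the caller gets an error instead of an unusable value.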

@AlanCoding
Member

receptorctl status now lists secure work types

Following up from discussion: either I, @beeankha, or @jladdjr will need to update AWX to accommodate the changed structure.

Current output looks like:

bash-4.4$ receptorctl status --json
{"Advertisements": [{"NodeID": "awx_1", "Service": "control", "Time": "2021-09-21T14:41:33.346577282Z", "ConnType": 1, "Tags": null, "WorkCommands": ["local", "kubernetes-runtime-auth", "kubernetes-incluster-auth"]}], "Connections": [{"NodeID": "receptor-hop", "Cost": 1}], "KnownConnectionCosts": {"awx_1": {"receptor-hop": 1}, "receptor-1": {"receptor-hop": 1}, "receptor-2": {"receptor-hop": 1}, "receptor-hop": {"awx_1": 1, "receptor-1": 1, "receptor-2": 1}}, "NodeID": "awx_1", "RoutingTable": {"receptor-1": "receptor-hop", "receptor-2": "receptor-hop", "receptor-hop": "receptor-hop"}, "SystemCPUCount": 8, "SystemMemoryMiB": 31855, "Version": "1.0.0.0a2"}

Can you give an example output after this change with some secure work types enabled? With that, we will update:

https://github.com/ansible/awx/blob/0ac3a377fdb74acb2ecbeb286bd1998d6d72a42a/awx/main/tasks.py#L477

That way, we will look for the "ansible-runner" work type as either a work type or a secure work type. We would like to err on the side of adding the node, because this discovery process needs to be pretty stable/reliable.
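
A tolerant lookup along these lines could be sketched as follows. This is not the AWX code. The function names are hypothetical, and the post-change shape of `WorkCommands` (objects with `WorkType` and `Secure` keys) is an assumption inferred from the service advertisement in the debug log earlier in this thread, not a confirmed `receptorctl status --json` format. The sketch accepts both that shape and the current list of plain strings.

```python
import json


def advertised_work_types(status, node_id):
    """Return (plain, secure) sets of work type names advertised by node_id."""
    plain, secure = set(), set()
    for ad in status.get("Advertisements") or []:
        if ad.get("NodeID") != node_id:
            continue
        for entry in ad.get("WorkCommands") or []:
            if isinstance(entry, str):
                # Current shape: a bare work type name.
                plain.add(entry)
            elif isinstance(entry, dict):
                # Assumed new shape: {"WorkType": ..., "Secure": ...}.
                name = entry.get("WorkType")
                if name is None:
                    continue
                (secure if entry.get("Secure") else plain).add(name)
    return plain, secure


def node_runs(status, node_id, work_type="ansible-runner"):
    """True if node_id advertises work_type, whether secure or not."""
    plain, secure = advertised_work_types(status, node_id)
    return work_type in plain or work_type in secure


if __name__ == "__main__":
    raw = '{"Advertisements": [{"NodeID": "awx_1", "WorkCommands": ["local"]}]}'
    print(node_runs(json.loads(raw), "awx_1", "local"))
```

Treating a secure and a non-secure advertisement alike matches the stated preference for adding the node whenever the work type is present in either form.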

Remote work submissions can be digitally signed by the sender. The
target node will verify the signature of the work command before
starting the work unit.

A pair of RSA public and private keys are created offline and
distributed to the nodes. The public key should be on the node receiving
work (PKIX format). The private key should be on the node submitting
work (PKCS1 format).
@fosterseth fosterseth force-pushed the feat_signed_work_requests branch from 8a58483 to 6d12cfe Compare September 21, 2021 15:24
@fosterseth fosterseth merged commit 156e6e2 into ansible:devel Sep 22, 2021