Certificate-less bootstrap tokens #93

Merged

1 commit merged into main from certificate-less-bootstrap-tokens on Jun 22, 2023

Conversation

edigaryev
Collaborator

In #86, Orchard started creating certificate-less contexts for Controllers that use PKI-compatible certificates.

However, I've overlooked the fact that we also need to add certificate-less support to the bootstrap tokens.

Resolves #86.
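
Purely for illustration (the field names below are made up and are not Orchard's actual token layout), a bootstrap token conceptually bundles everything a worker needs to reach and trust the Controller, and this change makes the trust-anchor part optional:

```
# Illustrative sketch only -- not Orchard's real token encoding or field names.
controller-url: https://orchard.example.com:6120
service-account-name: worker-mac-1
service-account-token: <opaque-secret>
# With a self-signed Controller certificate, the token also has to embed that
# certificate so that the worker knows which certificate to trust:
controller-certificate: |
  -----BEGIN CERTIFICATE-----
  ...
  -----END CERTIFICATE-----
# With a PKI-compatible (publicly trusted) certificate, this field can be
# omitted and the worker falls back to the system trust store instead --
# that is the "certificate-less" case this PR adds support for.
```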

@edigaryev edigaryev requested a review from fkorotkov June 22, 2023 19:01
@edigaryev edigaryev marked this pull request as ready for review June 22, 2023 19:01
@edigaryev edigaryev merged commit c4c1851 into main Jun 22, 2023
@edigaryev edigaryev deleted the certificate-less-bootstrap-tokens branch June 22, 2023 20:53
@ruimarinho

ruimarinho commented Jun 22, 2023

Great work @edigaryev - the worker has now been able to re-register. I did a quick test and everything seems to be working so far, but there is a recurring warning about a 400 error:

```
orchard@mac % sudo launchctl load -w /Library/LaunchDaemons/org.cirruslabs.orchard.worker.plist
orchard@mac % tail -f /tmp/orchard-worker.log
{"level":"info","ts":1687469693.550679,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469698.553354,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469703.5502238,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"warn","ts":1687469707.148829,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469708.546686,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469713.624796,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469718.548105,"msg":"syncing 1 local VMs against 1 remote VMs..."}
{"level":"info","ts":1687469745.880018,"msg":"registered worker mac-M2GVQ20L75"}
{"level":"info","ts":1687469745.9966872,"msg":"syncing on-disk VMs..."}
{"level":"warn","ts":1687469746.322099,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469746.995243,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469747.868917,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469749.248595,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469751.3179488,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"warn","ts":1687469754.977362,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469755.83937,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469756.161123,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469760.6640599,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"warn","ts":1687469761.893548,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469765.731574,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469770.6702971,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"warn","ts":1687469775.540767,"msg":"failed to watch RPC: rpc error: code = Internal desc = unexpected HTTP status code received from server: 400 (Bad Request); transport: received unexpected content-type \"text/plain; charset=utf-8\""}
{"level":"info","ts":1687469775.749318,"msg":"syncing 1 local VMs against 0 remote VMs..."}
{"level":"info","ts":1687469780.666985,"msg":"syncing 1 local VMs against 0 remote VMs..."}```

I've already deleted all VMs and restarted orchard. Any idea what could be causing this behaviour?

@ruimarinho

Also having issues with VNC and SSH:

forwarding 127.0.0.1:64247 -> ventura-xcode-new:5900...
no credentials specified or found, trying default admin:admin credentials...opening vnc://[email protected]:64247...
failed to forward port: websocket.Dial wss://orchard.example.internal:443/v1/vms/ventura-xcode/port-forward?port=5900&wait=60: bad status
^C2023/06/22 22:31:15 context canceled

@edigaryev
Collaborator Author

@ruimarinho can you check if the following ingress configuration works for you:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchard-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orchard
                port:
                  number: 6120
  ingressClassName: nginx
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchard-ingress-grpc
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
spec:
  rules:
    - http:
        paths:
          - path: /Controller
            pathType: Prefix
            backend:
              service:
                name: orchard
                port:
                  number: 6120
  ingressClassName: nginx

It will most certainly need to be adapted for your environment, but the main idea is that without the nginx.ingress.kubernetes.io/backend-protocol: "GRPCS" treatment for the /Controller path, gRPC (which we use for port forwarding) won't work.

I've tried this on a local Kubernetes cluster and port-forwarding/SSH seem to work just fine.

@ruimarinho

@edigaryev I've tested your suggestion but I'm getting a 504 timeout:

2023/06/27 12:30:38 [error] 2282#2282: *83097928 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 10.10.10.100, server: orchard.example.internal, request: "POST /Controller/Watch HTTP/2.0", upstream: "grpcs://10.10.10.100:443", host: "orchard.example.internal:443"

I'm using 443 for the PORT environment variable, but I've also tested forwarding /Controller to 6120, just in case the gRPC server was listening on a different port (not that the code suggests this...), and then I got a connection refused.

Theoretically, it's being forwarded correctly, because nginx is complaining about a grpcs:// upstream; now I just need to figure out why it is timing out. The ingress is behind an AWS NLB.

If you have any suspicion, let me know, otherwise I'll keep digging. Thanks!

@ruimarinho

ruimarinho commented Jun 28, 2023

@edigaryev after testing with a few more settings (grpc_connect_timeout, grpc_read_timeout, grpc_send_timeout), the best outcome I've come across is getting a 499 status code instead of a 504 (gateway timeout). It seems like occasionally I was able to get a 502 too:

ingress-nginx-controller-6c48cbfb6f-2czfc controller 2023/06/28 11:51:03 [error] 1253#1253: *1194409 no connection data found for keepalive http2 connection while sending request to upstream, client: 10.10.10.100, server: orchard.example.internal, request: "POST /Controller/Watch HTTP/2.0", upstream: "grpcs://10.10.10.91:443", host: "orchard.example.internal:443"

After some investigation, it seems like nginx has an issue multiplexing HTTP/1.1 and gRPC, although I'm not entirely sure that's what is happening here.

My suggestion would be to add a flag -- even in a test build -- to run the gRPC server on a different port to see if that helps. There is nothing in the controller logs related to POST /Controller/Watch.

Any other ideas you may have?
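
In case it helps anyone following along, one way to apply the grpc_* timeouts mentioned above is through ingress-nginx's configuration-snippet annotation. This is just a sketch using the names from my environment; snippet annotations also have to be allowed by the ingress-nginx installation, and the timeout values are arbitrary:

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: controller-ingress-grpc
  namespace: orchard
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
    # The Watch RPC is a long-lived stream; nginx's grpc_read_timeout defaults
    # to 60s and is separate from the proxy_*_timeout directives that
    # ingress-nginx emits, so it has to be raised explicitly.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_connect_timeout 30s;
      grpc_send_timeout 3600s;
      grpc_read_timeout 3600s;
spec:
  rules:
    - http:
        paths:
          - path: /Controller
            pathType: Prefix
            backend:
              service:
                name: controller-lb
                port:
                  name: https
  ingressClassName: nginx
```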

Below is the nginx configuration block generated for /Controller:

```
	location = /Controller {
		set $namespace      "orchard";
		set $ingress_name   "controller-ingress-grpc";
		set $service_name   "controller-lb";
		set $service_port   "https";
		set $location_path  "/Controller";
		set $global_rate_limit_exceeding n;

		rewrite_by_lua_block {
			lua_ingress.rewrite({
				force_ssl_redirect = true,
				ssl_redirect = true,
				force_no_ssl_redirect = false,
				preserve_trailing_slash = false,
				use_port_in_redirects = false,
				global_throttle = { namespace = "", limit = 0, window_size = 0, key = { }, ignored_cidrs = { } },
			})
			balancer.rewrite()
			plugins.run()
		}

		# be careful with `access_by_lua_block` and `satisfy any` directives as satisfy any
		# will always succeed when there's `access_by_lua_block` that does not have any lua code doing `ngx.exit(ngx.DECLINED)`
		# other authentication method such as basic auth or external auth useless - all requests will be allowed.
		#access_by_lua_block {
		#}

		header_filter_by_lua_block {
			lua_ingress.header()
			plugins.run()
		}

		body_filter_by_lua_block {
			plugins.run()
		}

		log_by_lua_block {
			balancer.log()

			monitor.call()

			plugins.run()
		}

		port_in_redirect off;

		set $balancer_ewma_score -1;
		set $proxy_upstream_name "orchard-controller-lb-https";
		set $proxy_host          $proxy_upstream_name;
		set $pass_access_scheme  $scheme;

		set $pass_server_port    $server_port;

		set $best_http_host      $http_host;
		set $pass_port           $pass_server_port;

		set $proxy_alternative_upstream_name "";

		client_max_body_size                    1m;

		# Pass the extracted client certificate to the backend

		# Allow websocket connections
		grpc_set_header                        Upgrade           $http_upgrade;

		grpc_set_header                        Connection        $connection_upgrade;

		grpc_set_header X-Request-ID           $req_id;
		grpc_set_header X-Real-IP              $remote_addr;

		grpc_set_header X-Forwarded-For        $remote_addr;

		grpc_set_header X-Forwarded-Host       $best_http_host;
		grpc_set_header X-Forwarded-Port       $pass_port;
		grpc_set_header X-Forwarded-Proto      $pass_access_scheme;
		grpc_set_header X-Forwarded-Scheme     $pass_access_scheme;

		grpc_set_header X-Scheme               $pass_access_scheme;

		# Pass the original X-Forwarded-For
		grpc_set_header X-Original-Forwarded-For $http_x_forwarded_for;

		# mitigate HTTPoxy Vulnerability
		# https://www.nginx.com/blog/mitigating-the-httpoxy-vulnerability-with-nginx/
		grpc_set_header Proxy                  "";

		# Custom headers to proxied server

		proxy_connect_timeout                   180s;
		proxy_send_timeout                      60s;
		proxy_read_timeout                      180s;

		proxy_buffering                         off;
		proxy_buffer_size                       16k;
		proxy_buffers                           4 16k;

		proxy_max_temp_file_size                1024m;

		proxy_request_buffering                 on;
		proxy_http_version                      1.1;

		proxy_cookie_domain                     off;
		proxy_cookie_path                       off;

		# In case of errors try the next upstream server before returning an error
		proxy_next_upstream                     error timeout;
		proxy_next_upstream_timeout             0;
		proxy_next_upstream_tries               3;

		grpc_pass grpcs://upstream_balancer;

		proxy_redirect                          off;

	}
```
