Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tinkerbell GRPC Insecure Mode #567

Merged
merged 10 commits into from
Feb 11, 2022
Merged

Conversation

nshalman
Copy link
Member

@nshalman nshalman commented Dec 6, 2021

Description

This PR, along with some minor cleanup, enables tink-server to run without a TLS certificate and enables tink-cli to connect to it without needing TLS to be set up.

At the time of opening the PR, this change is intended to only extend the capabilities of tink-server and tink-cli rather than fundamentally altering them, though I am open to the latter before this lands if that's the right answer.

Why is this needed

This will simplify development and testing of the tinkerbell stack, and could theoretically be used in the sandbox.
Alternately, the sandbox could include TLS termination similar to how many applications use Kubernetes ingress for TLS termination.

How Has This Been Tested?

Template creation and retrieval

cat focal.yaml | docker exec -i tink_tink-cli_1 tink template create
docker exec -i tink_tink-cli_1 tink template get
  • against make run-stack
  • with the sandbox

How are existing users impacted? What migration steps/scripts do we need?

This is intended to not impact

Checklist:

I have:

  • updated the documentation and/or roadmap (if required)
  • added unit or e2e tests
  • provided instructions on how to upgrade

@codecov
Copy link

codecov bot commented Dec 6, 2021

Codecov Report

Merging #567 (8282ae7) into main (fc76c03) will increase coverage by 0.02%.
The diff coverage is 71.42%.

❗ Current head 8282ae7 differs from pull request most recent head ae0df5f. Consider uploading reports for the commit ae0df5f to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #567      +/-   ##
==========================================
+ Coverage   43.39%   43.41%   +0.02%     
==========================================
  Files          51       51              
  Lines        3104     3107       +3     
==========================================
+ Hits         1347     1349       +2     
- Misses       1671     1673       +2     
+ Partials       86       85       -1     
Impacted Files Coverage Δ
grpc-server/grpc_server.go 36.48% <71.42%> (+1.27%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc76c03...ae0df5f. Read the comment docs.

@nshalman nshalman force-pushed the insecure-mode branch 3 times, most recently from 28cdc1e to 0a19809 Compare December 6, 2021 21:16
@nshalman nshalman marked this pull request as ready for review December 6, 2021 21:16
@nshalman nshalman requested a review from stephen-fox December 6, 2021 21:17
@ScottGarman ScottGarman self-requested a review December 6, 2021 21:30
conn, err := grpc.Dial(opt.GRPCAuthority,
method,
grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
grpc.WithStreamInterceptor(otelgrpc.StreamClientInterceptor()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent :)

Might want to add a client option to print the trace id in this case e.g. in the logs, so it's easy to take to a search box.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example elsewhere of how to do that I can crib from?

client/main.go Outdated
method := grpc.WithInsecure()
insecure := os.Getenv("TINKERBELL_INSECURE")

if insecure == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no big deal but I usually make flags like insecure here go through strconv.ParseBool, technically safer than allowing even " " to get interpreted as true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also implies TINKERBELL_INSECURE=0 means to run in insecure mode...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to being more explicit.

client/main.go Outdated Show resolved Hide resolved
@@ -93,6 +95,11 @@ func (c *DaemonConfig) PopulateFromLegacyEnvVar() {
if basicAuthPass := os.Getenv("TINK_AUTH_PASSWORD"); basicAuthPass != "" {
c.HTTPBasicAuthPassword = basicAuthPass
}
if insecure, isSet := os.LookupEnv("INSECURE"); isSet {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use TINKERBELL_INSECURE for consistency, and to avoid possibly conflicting with another application.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that should be TINK_INSECURE instead of TINKERBELL_?

@@ -93,6 +95,11 @@ func (c *DaemonConfig) PopulateFromLegacyEnvVar() {
if basicAuthPass := os.Getenv("TINK_AUTH_PASSWORD"); basicAuthPass != "" {
c.HTTPBasicAuthPassword = basicAuthPass
}
if insecure, isSet := os.LookupEnv("INSECURE"); isSet {
if b, err := strconv.ParseBool(insecure); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be err == nil?

Or, should this result in a panic?

You could also ignore the error I suppose, as that will result in what I consider to be a decent default: that a TLS listener gets created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I cribbed this from ONLY_MIGRATION above. a bug in both locations? Did any of this ever work??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably time to collapse the two different codepaths for env vars and command line flags into one piece of correct code... :sigh:

if config.Insecure {
rpcServer.SetupGRPC(ctx, logger, &rpcServer.ConfigGRPCServer{
Facility: config.Facility,
TLSCert: "insecure",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave this field as unspecified / default value (empty string)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"insecure" is being used as a sentinel value to skip tls stuff in rpcServer.SetupGRPC. I'll be cleaning that up next and this will all go away then. In the meantime I don't think its worth re-working this + setupGRPC if its going to be deleted soon.

docker-compose-insecure.yaml Outdated Show resolved Hide resolved
client/main.go Outdated
}

func NewClientConn(opt *ConnOptions) (*grpc.ClientConn, error) {
resp, err := http.Get(opt.CertURL)
func fetchCert(url string) (credentials.TransportCredentials, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function name could be a bit more descriptive. Maybe grpcCredentialFromCertEndpoint, or trustedCertsFromCertEndpoint.

Either that, or a comment would be helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whynotboth.gif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

food for thought on naming: https://talks.golang.org/2014/names.slide#3

@@ -90,8 +71,23 @@ func NewClientConn(opt *ConnOptions) (*grpc.ClientConn, error) {
return nil, errors.Wrap(err, "parse cert")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we didn't create this issue, but I just realized that err here is nil.

An improvement would be decoding the certificate's PEM and parsing the resulting ASN.1 DER using the x509 library. That, or creating a new error that says something like failed to parse cert - cert pool append certs from pem failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree, but I'm actually looking to drop the /cert completely so this will go away anyway.

@@ -60,14 +60,19 @@ func SetupGRPC(ctx context.Context, logger log.Logger, config *ConfigGRPCServer,
dbReady: true,
logger: logger,
}
if cert := config.TLSCert; cert != "" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a field to ConfigGRPCServer that specified no TLS?

The original if statement might work as-is... But it does leave some ambiguity about what should happen if the field is an empty string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be reworking this soon and should do a better job of dealing with tls certs. Lets ignore this for now please.

Copy link
Contributor

@micahhausler micahhausler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to better understand the motivations for this PR. Is it that "generating TLS serving certs is hard"? Or is passing arounds certs trusted by the server is difficult?

My read of this PR is that we want to provide Tink server operators the ability to turn off client TLS authentication on the server.

Can the docs be updated with this new mode and the flags to turn it on?

Insecure can have multiple meanings:

  • No TLS at all (client or server or both)
  • Still use mTLS, but trust any TLS certificate

Can the "insecure" parameters be named more specifically? Maybe something like insecureNoTLSAuthentication

client/main.go Outdated
method := grpc.WithInsecure()
insecure := os.Getenv("TINKERBELL_INSECURE")

if insecure == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to being more explicit.

client/main.go Outdated
creds := credentials.NewClientTLSFromCert(cp, "")

method := grpc.WithInsecure()
insecure := os.Getenv("TINKERBELL_INSECURE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're using pflag above, can we use Viper to get the env var?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be addressed

PGUSER: tinkerbell
TINKERBELL_GRPC_AUTHORITY: :42113
TINKERBELL_HTTP_AUTHORITY: :42114
TINKERBELL_INSECURE: "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a full new docker-compoose? Can this not default to false and accept an override for true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I'll give this a shot

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gotten rid of the extra/unnecessary docker-compose file

@nshalman
Copy link
Member Author

nshalman commented Dec 7, 2021

My read of this PR is that we want to provide Tink server operators the ability to turn off client TLS authentication on the server.

We want to turn off TLS as the transport, largely to simplify development, but possibly also for production uses.
On the server side we don't want to use TLS.
On the client side we want the GRPC client connection to be created using the "WithInsecure" DialOption which "disables transport security" (see https://pkg.go.dev/google.golang.org/grpc#WithInsecure)

Can the docs be updated with this new mode and the flags to turn it on?

If you mean docs within this repo, that's certainly a reasonable thing to request before this is approved, though we should probably resolve the naming issues first.
I'd rather not write docs elsewhere before this is approved, though it seems reasonable to me that they be written before it lands.

Insecure can have multiple meanings:

  • No TLS at all (client or server or both)
  • Still use mTLS, but trust any TLS certificate

Can the "insecure" parameters be named more specifically? Maybe something like insecureNoTLSAuthentication

I chose "insecure" because that's what GRPC calls the DialOption. I guess "insecureNoTLS" could work, but at that rate we could call it "noTLS" too.

@nshalman nshalman added the do-not-merge Signal to Mergify to block merging of the PR. label Dec 7, 2021
@micahhausler
Copy link
Contributor

micahhausler commented Dec 7, 2021

We want to turn off TLS as the transport, largely to simplify development, but possibly also for production uses.
On the server side we don't want to use TLS.

Introducing a new mode where users can hurt themselves is a bit scary to me, even if it is prefixed with "insecure." Can we simplify development and deployment with certificate generating scripts/more documentation?

Edit: If the primary motivation is development, can we use build tags to make development easier, but release artifacts are don't use the insecure mode?

I chose "insecure" because that's what GRPC calls the DialOption. I guess "insecureNoTLS" could work, but at that rate we could call it "noTLS" too.

If we do go down this route, I'm a fan of insecureNoTLS/TINKERBELL_INSECURE_NO_TLS/--insecure-no-tls. If/when other security features are added (ex: authorization), you'll want to differentiate them and still make it explicit that the configuration option is insecure.

@nshalman nshalman marked this pull request as draft December 8, 2021 19:12
@nshalman nshalman force-pushed the insecure-mode branch 4 times, most recently from a1926dd to 0c2f2f4 Compare December 10, 2021 16:44
@mmlb
Copy link
Contributor

mmlb commented Jan 18, 2022

I'm going to take over this PR so will be dusting this off and responding to feedback.

@mmlb
Copy link
Contributor

mmlb commented Jan 18, 2022

I have 2 primary reasons for wanting to see this PR through. First is to enable a simpler dev mode where we don't paper over the insecure setup necessary for sandbox (https://github.com/tinkerbell/sandbox/blob/9a04338168efd85be173100eb165bd2d3c6d09c2/deploy/compose/docker-compose.yml#L78). The second reason is to do away with the whole /cert thing. /cert came around from when cacher was first developed and had to deal with internal stuff. It should have never made its way into tinkerbell. The only reason it worked in cacher and sort of works in tinkerbell is by piggy-backing on tls to fetch /cert. We can't do that in sandbox with the self-signed certs and so the one safety anchor goes away.

IMO tinkerbell should support tls via an ingress or tls-terminating proxy (and thus runs in insecure mode itself, so the traffic is in the clear once "on-prem") or if you want to have tls for tinkerbell all the way through we should pass in valid/trusted-ca certs and do away with self-signed certs all together. The former is a nice middle ground of safe-enough for many use cases (remember we do want to be homelab-ish friendly too) and the latter could still be used to support self-signed tls anyway, its just harder. I'm ok with stating "if you already want to deal with the hard work of a self-signed pki infrastructure, then you should be ready to roll your own live environment to handle your 'untrsuted CA' (or embed the CA cert into hook.. somehow".

@mmlb
Copy link
Contributor

mmlb commented Jan 18, 2022

We want to turn off TLS as the transport, largely to simplify development, but possibly also for production uses.
On the server side we don't want to use TLS.

Introducing a new mode where users can hurt themselves is a bit scary to me, even if it is prefixed with "insecure." Can we simplify development and deployment with certificate generating scripts/more documentation?

This should be doable I think in sandbox code, by repackaging hook's initrd to include the generated CA's cert into the trust store, but I don't want to do it myself. I'd rather spend our time on better tinkerbell features than supporting self-signed certs easiliy.

Edit: If the primary motivation is development, can we use build tags to make development easier, but release artifacts are don't use the insecure mode?

Its not just development, that was my primary want but after thinking some more on this I think "insecure" mode is actually secure enough for many users. Standing up a tls terminating grpc proxy that does clear-text to its backends is very common, especially in homelab/messing-around scenarios.

I chose "insecure" because that's what GRPC calls the DialOption. I guess "insecureNoTLS" could work, but at that rate we could call it "noTLS" too.

If we do go down this route, I'm a fan of insecureNoTLS/TINKERBELL_INSECURE_NO_TLS/--insecure-no-tls. If/when other security features are added (ex: authorization), you'll want to differentiate them and still make it explicit that the configuration option is insecure.

I like this, but would rather have it be --no-tls. We can be secure w/o tinkerbell server running tls itself.

@mmlb mmlb force-pushed the insecure-mode branch 2 times, most recently from ee9088f to 8246789 Compare January 18, 2022 21:23
@mmlb mmlb marked this pull request as ready for review January 19, 2022 16:03
@mmlb mmlb force-pushed the insecure-mode branch 2 times, most recently from f8d8636 to 9039869 Compare January 24, 2022 18:23
@mmlb
Copy link
Contributor

mmlb commented Jan 27, 2022

Alright this should be the last of the changes. I've changed the args and code from negative to positive --no-tls/TINKERBELL_NO_TLS (default false) -> --tls/TINKERBELL_TLS (default true). This avoided needing to have double negatives in the code (if !noTLS for example). Still depends on #583 and as such has its commits, but should be cleared up soon.

mergify bot added a commit that referenced this pull request Feb 10, 2022
## Description

Drop all code dealing with json rpc.

## Why is this needed

Its likely unused code which has some weird code (:eye: runtime.String) that make other PRs more tedious (#567, also I want to drop the self-signed `/cert` stuff too) to implement.

## How Has This Been Tested?

`go build` and CI.

## How are existing users impacted? What migration steps/scripts do we need?

No impact I suspect (and hope). I know of 3 other tink clients and they all use grpc:
 - https://github.com/swills/py-tink-cli
 - https://github.com/micahhausler/tink/tree/feature/k8s-mode/cmd/virtual-worker
 - an Equinix Metal internal service
nshalman and others added 10 commits February 10, 2022 10:06
This is useful for local development and should never be used in production
environments.

Signed-off-by: Nahum Shalman <[email protected]>
Signed-off-by: Manuel Mendez <[email protected]>
Helps for visually scanning.

Signed-off-by: Manuel Mendez <[email protected]>
We already import packethost/pkg might as well make the most of it.

Signed-off-by: Manuel Mendez <[email protected]>
NewClientConn already exists to make the grpc connection, no reason not to use it.

Signed-off-by: Manuel Mendez <[email protected]>
Do not setup tls in the grpc dial opts and do not setup /cert path.

Signed-off-by: Manuel Mendez <[email protected]>
Sort services.
Sort env bars.
Sort depends_on services.
Reorder defs so they are ordered in->out, e.g.:
 image/build first
 followed by entrypoint/cmd
 finishing with things needed to talk to it
idk just wanted to have a similar order to compare services.

Signed-off-by: Manuel Mendez <[email protected]>
/cert does not exist when in non-tls mode. Also, why use /cert when
/healthz exists and is made for exactly for this.

Signed-off-by: Manuel Mendez <[email protected]>
We already import packethost/pkg might as well make the most of it.

Signed-off-by: Manuel Mendez <[email protected]>
Signed-off-by: Nahum Shalman <[email protected]>
Signed-off-by: Manuel Mendez <[email protected]>
@mmlb mmlb added do-not-merge Signal to Mergify to block merging of the PR. ready-to-merge Signal to Mergify to merge the PR. and removed do-not-merge Signal to Mergify to block merging of the PR. labels Feb 10, 2022
@mmlb
Copy link
Contributor

mmlb commented Feb 10, 2022

Alrighty then, now that #583 is merged I've rebased this so the diff is minimal again. Looking from some review from @micahhausler @nshalman @stephen-fox 🙏

@mmlb mmlb self-assigned this Feb 10, 2022
@mmlb mmlb removed the do-not-merge Signal to Mergify to block merging of the PR. label Feb 10, 2022
@mergify mergify bot merged commit 1337c42 into tinkerbell:main Feb 11, 2022
mmlb added a commit to mmlb/tinkerbell-hook that referenced this pull request Feb 14, 2022
mergify bot added a commit that referenced this pull request Apr 18, 2022
## Description

- Stops serving the gRPC cert via http and stops fetching it.
- Cleans up a ton of code that was confusing to read in the clients due to dealing with the gRPC certificate.
- Removes a few Config structs that are passed into funcs
  These were used to get around having lots of parameters and the long lines that caused, but now are just one or two values.

## Why is this needed

The whole serving of the gRPC certificate via http as `/cert` should never have made it to tinkerbell.
It's too easy to use incorrectly and fall into a sense of security that may not be there.
Its also a pain to actually use in production when following modern best practices of short lived TLS certificates.
Clients can't use gRPC's/Go's builtin certificate handling of when certs are rotated.

This is not a lot of good, yet a bunch of bad only to make a corner case easier at the expense of normal route.
`/cert` makes self-sigend certs "easy"ish (see cert rotation issue still) but it's just not worth it.
If someone can handle the implications of self-signed certs in production (#567 can be used for dev envs) then they can figure out how to embed their CA into hook or roll their own tink-worker environment.

## How Has This Been Tested?

Compiles and tests pass.

## How are existing users impacted? What migration steps/scripts do we need?

This is a breaking change for out-of-tree clients, I think its worth it compared to the ops / security benefits.

Fixes #324
@displague displague added this to the 0.7.0 milestone Aug 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Signal to Mergify to merge the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants