Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ui: require web login for secure clusters #28207

Closed
wants to merge 2 commits into from

Conversation

couchand
Copy link
Contributor

@couchand couchand commented Aug 2, 2018

Closes: #6307
Closes: #18206
Closes: #26518
Release note (admin ui change): Login is now required for secure clusters.
Users log in with a regular database username, so that user must already have a
password set. Insecure clusters do not require login, and have a visual
indicator showing that they are insecure.

@couchand couchand requested a review from a team August 2, 2018 16:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

couchand added a commit to couchand/cockroach that referenced this pull request Aug 2, 2018
Once cockroachdb#28207 lands, all of these endpoints will be protected by login already,
so we can skip the check for the setting remote_debugging.mode.

Closes: cockroachdb#24992
Release note (admin ui change): The cluster setting remote_debugging.mode no
longer controls access to any web ui API endpoints, since they are already
protected behind user login.
couchand added a commit to couchand/cockroach that referenced this pull request Aug 2, 2018
Once cockroachdb#28207 lands, all of these endpoints will be protected by login already,
so we can skip the check for the setting remote_debugging.mode.

Closes: cockroachdb#24992
Release note (admin ui change): The cluster setting remote_debugging.mode no
longer controls access to any web ui API endpoints, since they are already
protected behind user login.
@vilterp vilterp force-pushed the feature/enable-login branch from f12d895 to fa5ff70 Compare August 7, 2018 19:39
@couchand couchand requested a review from a team August 7, 2018 19:39
Closes: cockroachdb#6307
Closes: cockroachdb#18206
Closes: cockroachdb#26518
Release note (admin ui change): Login is now required for secure clusters.
Users log in with a regular database username, so that user must already have a
password set.  Insecure clusters do not require login, and have a visual
indicator showing that they are insecure.
@vilterp vilterp force-pushed the feature/enable-login branch from fa5ff70 to d90517d Compare August 9, 2018 19:22
Customers should have login on by default for secure clusters, but may
need to turn it off if there is some bug with login that we don't know
about, or if they really feel they'd rather leave security of the admin
UI to auth proxys, network firewalls, etc.

This commit provides that escape hatch, in a way that does not encourage
the customer to use it: it's not very discoverable, and requires a
restart to take effect.

Release note (admin ui change): add cluster setting to disable web ui
authentication on secure clusters
@mberhault
Copy link
Contributor

LGTM. We'll want some way to disable login for 2.1, but not so official as to make it a cluster setting. I think an environment variable strikes the right balance of 1) ease of use to disable in case of issues and 2) debug-level feature that can go away in 2.2.

@vilterp
Copy link
Contributor

vilterp commented Aug 9, 2018

I spoke to @BramGruneir, @mberhault and @vivekmenezes, who all thought that there should be a way to disable it. @piyush-singh was more on the side of not making it disable-able. We need a decision here.

@mberhault
Copy link
Contributor

The main thinking behind the environment variable is that login can be disabled in case of issues (which is not entirely out of the realm of possibilities).

Unlike cluster settings, environment variables are about deeper settings that shouldn't be widely toggled. It also provides no guarantees about future support, meaning we can (and should) remove it in 2.2 after a full cycle without issues.

It absolutely should not be documented or advertised as a way to easily disable login. Instead, we should make users aware of it if they run into issues. We can re-evaluate the need for a cluster setting to disable login if we get a lot of push back in 2.1.

@vilterp
Copy link
Contributor

vilterp commented Aug 9, 2018

Copied my commit to a separate PR (#28416) to avoid the confusion of a PR that hard codes it and then introduces the env var.

@vilterp
Copy link
Contributor

vilterp commented Aug 9, 2018

Just bors r+d the other PR. There's still time to remove the env var before 2.1 if we so decide.

@vivekmenezes
Copy link
Contributor

vivekmenezes commented Aug 10, 2018 via email

@BramGruneir
Copy link
Member

BramGruneir commented Aug 10, 2018 via email

@couchand
Copy link
Contributor Author

Closing, since #28416 was merged despite what seem to be reasonable, unresolved objections. Oh well.

@couchand couchand closed this Aug 11, 2018
@couchand
Copy link
Contributor Author

couchand commented Aug 11, 2018

This PR itself isn't the problem anymore, but I guess this is where the discussion is happening.

I have a number of concerns here. First and foremost, I don't understand why this is coming up again now. We've already discussed each and every point raised here, yet the prior decision (as far as I was aware) was to not go down this path. That decision seems to have been revisited and then changed without any compelling reason to revisit the decision and without any new arguments made in favor.

Generally, I think it's a bad idea to relitigate decisions made previously without presenting new information. Down that path lies madness. We need some amount of stability, and if we're just randomly changing our minds based on whomever happens to be around, that seems like the opposite of stability. It will be hard to pay down a bug list and deliver new features if we're also freely revisiting every decision we've made.

Specifically, I don't think any of the arguments made on this thread hold water. I don't know if there were other points made IRL that weren't recorded here. Since it seems like we're just going over this discussion once again, I'll add my own perspective. Let's go ahead and consider each of the arguments made on this thread. This is something I'm loathe to do at all, since I'm supposed to be on vacation right now, but because it's critical to get this issue right, I will take the time to help with the analysis. But first, some context.

We discussed previously the idea that we might allow some users to turn off authentication for the web UI. We decided that any such option would need to also disable endpoints that return sensitive information, and the UI elements for the same. We termed this feature "anonymous access" and all agreed that, while it was a good idea, it would be too expensive.

Ok, here we go with addressing the comments in this thread.

@mberhault is arguing that we need to hedge against potential bugs. A bit conservative in my mind (we should probably just invest in sufficient testing!), but not on the whole a crazy argument. He says we should just not document it and plan to remove it in 2.2. The only thing lacking is actually making the short-term nature of the switch clear. According to our recent decisionmaking around unsupported features, such a feature should be clearly indicated by adding EXPERIMENTAL or some such to the environment variable name. Also notably, Marc's position is mutually inconsistent with everyone else's.

@vivekmenezes suggests that we have a variety of users, but without more details we can't really be sure what he means or why it would matter if there's such a switch.

@BramGruneir argues that we aren't going far enough with an environment variable, and the switch should rather be implemented as a cluster setting, so as to not get in the way of our users. Notably, he's the only one to have (explicitly) considered a threat model, though his argument is that we should just expect our users to worry about it.

There are a few problems with this position. Most critically, taken to its logical conclusion, this argument would suggest we should allow users to deploy a "secure" cluster with no authentication even for SQL access. After all, why would we want to get in our users' way? "Requiring login... is just punishing." So why do we not have such a switch to disable all authentication?

Well, I think the arguments in favor of requiring authentication for secure clusters are quite valid: don't give your users a footgun. This is just basic security practice, if you give users a choice they will choose the wrong option.

Moreover, it disrespects the vast majority of our users that don't have sufficient resources to audit the information exposed through the web UI. We don't even have those resources ourselves! (If we did, we would just implement anonymous access...) Exposing this switch actually imposes a significant burden on our users, and it's irresponsible to do so.

A concern that no one on this thread raised (but which was raised in our previous discussions) regards single sign-on: many (most? maybe all?) of our users would actually prefer to authenticate with an external system rather than CRDB's internal user table. This somewhat dovetails into the fact that everywhere in our documentation we advise users against setting passwords (rightly! passwords are broken!), but the only form of authentication we currently support for the web UI is passwords. These arguments are sound, but only really suggest that we need to prioritize SSO support ASAP.

So I'll just wrap up this diatribe with: we shouldn't do this. We agreed not to do this and we don't have a good reason to do this. But I guess it was done anyway.

@vivekmenezes
Copy link
Contributor

I'm happy with this on by default. I'm okay taking the approach that we wait until we hear users complain about this and they will but they do have the workaround of setting a username:password

@BramGruneir
Copy link
Member

So I'd like to rebut a few points you made @couchand.

To your point that we shouldn't relitigate a decision that's already been made, in general, I think that goes against general engineering principles. We often revisit arguments with new information and change prior decisions. To be clear, I don't know where these previous discussions took place. Where did this happen and as this clearly is an impactful change that affects everyone that's using the product, why was this not an RFC? It's a lot easier to defend if we can just point back to the RFC and say this was already settled there. Especially, in the case where these arguments have already been considered. Doing so would have avoided the whole "randomly changing our minds" problem that I think everyone here wants to avoid.

Since I was not really privy to the discussion beforehand, these arguments are all new to me. Including the proposed "anonymous access" mode, which I think would be a great idea.

Now I'd like to rebut against the points you made towards my comments.

To start, I agree entirely with your concern about having only one means of authentication. It is a massive hole in our security model. Not being able to use LDAP or some SSO system yet is frustrating.

To the other points you raise:

Most critically, taken to its logical conclusion, this argument would suggest we should allow users to deploy a "secure" cluster with no authentication even for SQL access. After all, why would we want to get in our users' way? "Requiring login... is just punishing." So why do we not have such a switch to disable all authentication?

To start, I'm not arguing for that. And to be clear, there is right now a huge difference between what can be done in SQL and what can be done via the UI and even the endpoints. We do not require passwords (and even suggest against them) for secure cluster access. So I'd argue that this is materially different.
That being said, I also think your argument against disabling all authentication, while being clearly taken as an extreme example, might actually be useful for some security minded users. It's common practice to block access to the machines already using some other means. So why require yet another level of security on top of that? This is doubly so when dealing with our certs and trying to get a SQL connection established via drivers and ORMs. We only recently got a lot of those examples working in our docs and that was not a simple task and is quite brittle.

Well, I think the arguments in favor of requiring authentication for secure clusters are quite valid: don't give your users a footgun. This is just basic security practice, if you give users a choice they will choose the wrong option.

I think our stance should always be this: Let's default to the most secure option, but allow user to disable it when they choose to. To not give users options is to disrespect them. And to assume that they don't know their own systems and security model is misguided at best.

If we're really being security conscious here, requiring login is asking for the UI to be exposed to the internet. Have we tested this? Have we tried to see if the endpoints can withstand a DDOS attack or other such vectors? Have we tested timed attacks against our login with password system? Do we have an audit trail for all UI logins? Do we have different permissions for UI/endpoint access vs DB access? Do we limit access based on db and table grants in the UI? If not we're clearly giving the wrong impressions to our users on multiple fronts.

I'd like to go back to my original argument which I don't think you've adequately addressed.
Most customers lock down access to their DB machines based on IP, ssh cert, or some other means. Requiring login would be a significant burden on them that in those cases is unnecessary. Why would we not have a cluster setting to turn off login in the UI and let users decide access on their own?

@couchand
Copy link
Contributor Author

Thanks for taking the time to review my comment @BramGruneir, I really appreciate that you care enough to engage with this issue critically.

we shouldn't relitigate a decision that's already been made, in general, I think that goes against general engineering principles

We should absolutely revisit something if we get new information that invalidates an assumption we've made. But I don't think that's happened in this case, the only change (as far as I can tell) was a rotation of personalities.

I agree with your point that the documentation here seems scarce, @piyush-singh has mentioned several times that he wishes we had gone full-RFC on the login question. I totally agree with that point; it seems organizationally we've been shying away from RFCs, because they keep us from moving fast "enough". That seems really unfortunate to me, since careful analysis and documentation has the potential to significantly reduce churn.

there is right now a huge difference between what can be done in SQL and what can be done via the UI and even the endpoints

True, but maybe a bit myopic. I argued very early in this release cycle that the introduced pain of authentication needed to be offset by additional functionality to make it seem worth it. We also are definitely moving in the direction of exposing more sensitive details as well as administrative knobs, and I don't want to get in a situation where we are constantly looking over our shoulder to make sure we're remembering to keep individual pieces of functionality safe -- I'd much rather have the luxury that SQL does: if they have access, they can do stuff.

It's common practice to block access to the machines already using some other means. So why require yet another level of security on top of that?

Onion security. Don't just rely on a single wall repelling all invaders and trust everything inside. That's the sort of thinking that led to https://www.androidauthority.com/wp-content/uploads/2014/06/SSL-Added-and-Removed-Here.jpg

Let's default to the most secure option, but allow user to disable it when they choose to.

I'd suggest that the model should be: default to the most secure option, and allow the user to relax that to any other option we deem to also be secure. Don't provide a footgun. Again, we don't allow a user to run a secure cluster without SQL authentication, why would we allow one without web UI authentication? Given the direction we want things to go, they should really be considered to be the same class of access.

login is asking for the UI to be exposed to the internet. Have we tested this?

Great point! Let's test it, then! (oh, and we should probably also hold SQL to the same standard, which as far as I know has undergone none of the testing or auditing that you recommend, which is unfortunate)

Do we limit access based on db and table grants in the UI? If not we're clearly giving the wrong impressions to our users on multiple fronts.

I've made this argument over and over, you won't get any disagreement from me. Unfortunately, that's been deprioritized.

I'd like to go back to my original argument which I don't think you've adequately addressed.
Most customers lock down access to their DB machines based on IP, ssh cert, or some other means. Requiring login would be a significant burden on them that in those cases is unnecessary. Why would we not have a cluster setting to turn off login in the UI and let users decide access on their own?

I do think I've addressed that argument, by proxy. I think the argument can identically be made regarding a SQL connection. Unless we're willing to allow unfettered access universally (and we absolutely should not), we shouldn't allow unfettered access to the web UI.

@timaa2k
Copy link

timaa2k commented Jun 12, 2019

We at Mesosphere are currently relying on the undocumented environment variable COCKROACH_DISABLE_WEB_LOGIN to disable the Admin UI login. Our own Identity and Access Management component takes care of guarding the CockroachDB UI but we still require a secure cluster for TLS support between CockroachDB nodes.

In the future we'd really like to avoid scripting our way to do an automated CockroachDB login procedure in the current form with gRPC + protobuf. So please keep this option around until there are better alternatives for an automated login procedure.

@couchand
Copy link
Contributor Author

We at Mesosphere are currently relying on the undocumented environment variable COCKROACH_DISABLE_WEB_LOGIN to disable the Admin UI login.

I was deathly afraid something like that might happen. Oh well, I do hope nothing bad comes of it.

Of course the proper answer here is single-sign on, but it looks like there's not many resources going to addressing the problem now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants