ui: race condition during initialization with login enabled #25771
We might be opening up the web endpoint too early in …
This used to be easy to repro, but strangely now I can't, with …
@couchand did a bisect and thinks this was fixed by ee6d448 in #26329, but we don't understand why. We thought that this crash was happening because the web authentication code was running SQL before the SQL system was initialized. It's not evident to me how that PR would have fixed this, but we didn't completely understand the bug in the first place.
This bug was introduced in bedcbed, when @vilterp added … (bedcbed#diff-09856fe9becddf0199651f451409356aR1187). Node initialization winds up setting the version cluster setting, so only accesses after that initialization are legal. When a request is received after the root route is available but before the version cluster setting has been initialized, the process panics.

The bug was fixed by @nvanbenschoten in ee6d448. That change removes the method … (ee6d448#diff-ed00a68165bb66ba6524401660df29aaL419).

It's great that this specific panic is no longer an issue, but it is a bit troubling that it was so easy to get into this situation and then so difficult to debug and identify the root cause. Is there something that can be done to prevent this sort of thing from happening again? @tschottdorf @nvanbenschoten @vilterp
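To make the failure mode concrete, here is a minimal, hypothetical sketch (not CockroachDB's actual code) of a settings container whose `Version()` panics when read before initialization -- the same shape as the crash described above, where a request arrives after routes are registered but before node initialization has set the cluster version:

```go
package main

import (
	"fmt"
	"sync"
)

// Settings is an illustrative stand-in for a cluster settings container.
type Settings struct {
	mu          sync.RWMutex
	initialized bool
	version     string
}

// Version panics when called before Initialize, mirroring the
// "Version() was called before having been initialized" crash.
func (s *Settings) Version() string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if !s.initialized {
		panic("Version() was called before having been initialized")
	}
	return s.version
}

// Initialize is what node startup would eventually call.
func (s *Settings) Initialize(v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.version, s.initialized = v, true
}

func main() {
	s := &Settings{}
	// Simulates a request handler touching the settings after the HTTP routes
	// are registered but before node initialization has run Initialize.
	defer func() { fmt.Println("recovered:", recover()) }()
	_ = s.Version() // panics
}
```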
I just ran into this locally on a 1-node cluster on top of a recent master (61325cd):
I had a couple of logged-in tabs of the admin UI already open from before restarting the process. I also had a tab trying to load the …
Attempting to repro and bisect. May have to do with #29230.
Nope, much older than that. Bisect seems to show that the first commit with this bug is bedcbed.
That's unfortunate. Perhaps we should just live with a round trip for an API call as a short-term fix?
Well, that's exactly what I mean -- how can we make explicit which components are safe to touch and when? I don't think the answer can just be … Currently, in a single 500-line function, we create a bunch of shared mutable state (…
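One hedged sketch of what "explicit" could look like, using invented names (`SQLExecutor`, `AuthServer`, `fakeExecutor`) rather than the real types in `pkg/server`: construct each component only from dependencies that already exist, and register its routes only after construction, so a half-initialized handler can never be reachable:

```go
package main

import (
	"fmt"
	"net/http"
)

// SQLExecutor and ClusterSettings stand in for the internal executor and the
// cluster settings; all names here are illustrative, not CockroachDB's.
type SQLExecutor interface {
	Query(q string) error
}

type ClusterSettings struct{ Version string }

// AuthServer can only be constructed from already-initialized dependencies,
// so there is no window in which it is registered but half-initialized.
type AuthServer struct {
	exec     SQLExecutor
	settings *ClusterSettings
}

func NewAuthServer(exec SQLExecutor, settings *ClusterSettings) *AuthServer {
	return &AuthServer{exec: exec, settings: settings}
}

func (a *AuthServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Safe by construction: settings were initialized before this handler existed.
	fmt.Fprintf(w, "version %s\n", a.settings.Version)
	_ = a.exec.Query("SELECT 1")
}

type fakeExecutor struct{}

func (fakeExecutor) Query(string) error { return nil }

func main() {
	// Initialization happens first; routes are registered only afterwards.
	settings := &ClusterSettings{Version: "2.1"}
	auth := NewAuthServer(fakeExecutor{}, settings)

	mux := http.NewServeMux()
	mux.Handle("/", auth)
	_ = http.ListenAndServe("127.0.0.1:8080", mux)
}
```

The trade-off in this shape is that route registration has to wait for initialization, which delays the point at which the server answers any HTTP request at all.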
Oh, I somehow thought you meant something about the …
Occurred again today on …
@vilterp what's the status here? Have you made any progress on this?
Not yet, besides identifying the commit that seems to have introduced it. Hoping to look at it today.
Friendly ping. I see this in crashes every now and then, and presumably this was already released in betas? It's relatively urgent to fix.
Hey Tobi, @couchand and I were debugging this yesterday and will be again today. We have a solid repro. The basic problem is that the authentication mux is running SQL with the internal executor before the Version cluster setting is initialized. The initialization of all the HTTP stuff is pretty spread out in …
Sounds good, thanks! FWIW, it'd be fine by me if you just added a way to check if the thing is initialized before you use it (or even just wait for it). That's a wart, but as long as we're calling it that, I think it's defensible at this point in the cycle. Plus, the version thing is basically impossible to use correctly. Maybe all callers should implicitly wait until there's a sane way to write code that doesn't accidentally trigger this.
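A rough sketch of the "just wait for it" option, assuming an invented `versionHolder` type rather than the settings package's real API: callers block on a channel that is closed once initialization completes, instead of panicking on an early read:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// versionHolder is a hypothetical wrapper: callers block on WaitForVersion
// until Initialize has run, instead of panicking on an early read.
type versionHolder struct {
	mu      sync.Mutex
	ready   chan struct{}
	version string
}

func newVersionHolder() *versionHolder {
	return &versionHolder{ready: make(chan struct{})}
}

// Initialize records the version and unblocks all waiters exactly once.
func (v *versionHolder) Initialize(version string) {
	v.mu.Lock()
	v.version = version
	v.mu.Unlock()
	close(v.ready)
}

// WaitForVersion blocks until Initialize has been called.
func (v *versionHolder) WaitForVersion() string {
	<-v.ready
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.version
}

func main() {
	v := newVersionHolder()
	go func() {
		time.Sleep(100 * time.Millisecond) // simulate slow node initialization
		v.Initialize("2.1")
	}()
	// An early request handler would block here rather than crash the process.
	fmt.Println("cluster version:", v.WaitForVersion())
}
```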
Hoping not to have to do something like that. Btw, noticed this method: cockroach/pkg/settings/cluster/settings.go, lines 141 to 145 in fcc4eb4 …
This fixes cockroachdb#25771, in which the auth mux was using the internal executor to run SQL before the cluster version, which the internal executor is dependent upon, was initialized.

A preferable solution would be to not register any handlers using the auth mux until the cluster setting is initialized, but it's not immediately apparent what reordering of things in Server.Start would achieve this without breaking anything else.

Release note: None
OK, minimal fix in #30640. It would be better to reorder things such that that check is not necessary, but I'm a little afraid of rejiggering a bunch of things in Server.Start at this point in the cycle. Open to better fixes if we can think of them.
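For illustration, a hedged approximation of the shape such a minimal check could take (the names and wiring here are invented, not the actual diff in #30640): the auth mux answers 503 until initialization has completed, so early requests fail retryably instead of crashing the process:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// authenticationMux guards its inner handler: until initialized is set it
// answers 503 instead of touching the executor or the cluster settings.
type authenticationMux struct {
	initialized atomic.Bool
	inner       http.Handler
}

func (am *authenticationMux) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if !am.initialized.Load() {
		// Still starting up: fail the request cheaply and retryably rather
		// than running SQL against an uninitialized cluster version.
		http.Error(w, "node is still initializing", http.StatusServiceUnavailable)
		return
	}
	am.inner.ServeHTTP(w, r)
}

func main() {
	am := &authenticationMux{inner: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})}
	// Flip the flag once node initialization (version setting, executor) is done.
	go func() { /* ... after startup finishes init ... */ am.initialized.Store(true) }()
	_ = http.ListenAndServe("127.0.0.1:8080", am)
}
```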
Discovered, while working on a unit test to repro this, that the crash only happens when the request contains a cookie; otherwise there is no session to look up and the auth mux never runs any SQL. So the user has to log in, then restart a node (or start a new node at the same address) and send it a request containing that login cookie to trigger this. That doesn't really change the severity, since it's quite likely that someone will be logged in and looking at the admin UI while restarting or adding nodes.
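A small sketch of that repro from the client side, with a placeholder endpoint path and cookie name/value: send a request carrying a login cookie, as a previously logged-in admin UI tab would, to a node that has not finished initializing:

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Endpoint and cookie are illustrative; adjust for your cluster.
	req, err := http.NewRequest("GET", "http://localhost:8080/_admin/v1/health", nil)
	if err != nil {
		panic(err)
	}
	// Without a cookie there is no session to look up and the auth mux never
	// runs SQL; with one, the lookup races against cluster-version init.
	req.AddCookie(&http.Cookie{Name: "session", Value: "stale-session-from-before-restart"})

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```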
Teach the cluster-init test to reproduce the crash during bootup described in issue cockroachdb#25771. The trick is to send requests with a session cookie, as the admin UI would, while the cluster is waiting for init.

Note that the crash was caused by the authenticationMux, which is used to protect secure clusters, but this test uses an insecure cluster. This relies on a wart: an authenticationMux is installed for UI assets whether or not the cluster is secure. This could change in the future. A better test would use a secure cluster; however, roachprod/roachtest do not easily support that at this time, nor is it easy to write a unit test that exercises the crash more robustly.

Release note: None
30746: server/ui: fix crash caused by race in server initialization r=vilterp a=vilterp

Fixes #25771. See commits for details.

Co-authored-by: Pete Vilter <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
When login is enabled, there seems to be a race condition during initialization, manifesting as the error "settings/cluster/settings.go:164 Version() was called before having been initialized". It seems to occur when an existing admin UI browser tab is open during node startup, issuing requests such as health checks. It was previously theorized that #24945 fixed this issue, but it looks like it has come up since.
Stack trace for my repro just now: