
Concurrent map writes panic #2917

Closed
infa-bsurber opened this issue Mar 16, 2020 · 1 comment
Comments

@infa-bsurber
Contributor

infa-bsurber commented Mar 16, 2020

Describe the bug

As of version 1.17.1, we are seeing occasional panics in flux, with the following trace of the panicking goroutine:

fatal error: concurrent map writes                                                                                                                                     
                                                                                                                                                                       
goroutine 71 [running]:                                                                                                      
runtime.throw(0x1acef2f, 0x15)                                                                                                                                        
        /usr/local/go/src/runtime/panic.go:774 +0x72 fp=0xc0025e4ad0 sp=0xc0025e4aa0 pc=0x42f302                                                                       
runtime.mapassign_faststr(0x18506e0, 0xc000446b40, 0xc0001462d2, 0x25, 0x0)                                                                                            
        /usr/local/go/src/runtime/map_faststr.go:291 +0x3fe fp=0xc0025e4b38 sp=0xc0025e4ad0 pc=0x41340e                                                                
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*Cluster).getAllowedAndExistingNamespaces(0xc0000ea700, 0x1e52680, 0xc0000a8058, 0xc000728b40, 0x31, 0x4f5d47, 0xc0018c2f54, 0x7)
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/kubernetes.go:314 +0x561 fp=0xc0025e4c18 sp=0xc0025e4b38 pc=0x1582cb1
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*Cluster).listAllowedResources(0xc0000ea700, 0x1, 0xc0018c2f54, 0x4, 0xc0018c2f59, 0x2, 0xc0018c3160, 0xc, 0x0, 0x0, ...)
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/sync.go:298 +0x68 fp=0xc0025e4d20 sp=0xc0025e4c18 pc=0x1592158
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*Cluster).getAllowedResourcesBySelector(0xc0000ea700, 0x0, 0x0, 0x2, 0x2, 0x1e0cb80)                                    
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/sync.go:255 +0x71f fp=0xc0025e5188 sp=0xc0025e4d20 pc=0x1591c0f                            
github.com/fluxcd/flux/pkg/cluster/kubernetes.(*Cluster).Sync(0xc0000ea700, 0xc0022488a0, 0x2b, 0xc00072cf00, 0x9, 0x10, 0x0, 0x0)                                     
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/cluster/kubernetes/sync.go:57 +0x14e fp=0xc0025e5568 sp=0xc0025e5188 pc=0x158f26e                             
github.com/fluxcd/flux/pkg/sync.Sync(0xc0022488a0, 0x2b, 0xc00246e8a0, 0x7feab739cec0, 0xc0000ea700, 0x0, 0xc00044c000)                                                
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/sync/sync.go:16 +0x9f fp=0xc0025e55e0 sp=0xc0025e5568 pc=0x15c931f                                            
github.com/fluxcd/flux/pkg/daemon.doSync(0x1e52680, 0xc0000a8058, 0x1e3c8c0, 0xc0011dd680, 0x1e6a420, 0xc0000ea700, 0xc0022488a0, 0x2b, 0x1e0cb80, 0xc0004465d0, ...)  
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/sync.go:157 +0xd5 fp=0xc0025e5720 sp=0xc0025e55e0 pc=0x15d5ad5
github.com/fluxcd/flux/pkg/daemon.(*Daemon).Sync(0xc00016cb40, 0x1e52680, 0xc0000a8058, 0x214f7d10, 0xed601b176, 0x0, 0xc0012b83f0, 0x28, 0x1e1d380, 0xc0000833c0, ...)
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/sync.go:74 +0x37f fp=0xc0025e5bc8 sp=0xc0025e5720 pc=0x15d4cef                                         
github.com/fluxcd/flux/pkg/daemon.(*Daemon).Loop(0xc00016cb40, 0xc000148780, 0xc0003d5030, 0x1e0cb80, 0xc000446630)
        /home/circleci/go/src/github.com/fluxcd/flux/pkg/daemon/loop.go:96 +0x507 fp=0xc0025e5fb8 sp=0xc0025e5bc8 pc=0x15d2f87
runtime.goexit()                                                                                                                                                       
        /usr/local/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc0025e5fc0 sp=0xc0025e5fb8 pc=0x45ecb1                                                                    
created by main.main                                                                                                                                                   
        /home/circleci/go/src/github.com/fluxcd/flux/cmd/fluxd/main.go:752 +0x5a9b 

This points to this line:

c.loggedAllowedNS[name] = false // reset, so if the namespace goes away we'll log it again

It seems that concurrent goroutines are calling getAllowedAndExistingNamespaces on the same Cluster object, and the panic occurs when they write to the same key of the same map without any synchronization.
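
For context, the Go runtime aborts the whole process when it detects unsynchronized map writes. A tiny standalone program (not flux code) that writes one map from two goroutines will usually die with the same "fatal error: concurrent map writes":

package main

// Standalone reproduction of the failure mode, unrelated to the flux codebase:
// two goroutines write the same map key with no synchronization, which the Go
// runtime usually detects and aborts with "fatal error: concurrent map writes".
func main() {
	m := map[string]bool{}
	done := make(chan struct{})

	go func() {
		for i := 0; i < 100000; i++ {
			m["ns"] = false // unsynchronized write from a second goroutine
		}
		close(done)
	}()

	for i := 0; i < 100000; i++ {
		m["ns"] = true // concurrent write from the main goroutine
	}
	<-done
}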

This should be pretty easy to fix with a sync.RWMutex (or a map of them), moving the map reads & writes into Cluster helper methods, or by switching to a sync.Map: https://golang.org/pkg/sync/#Map
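
A minimal sketch of the mutex approach (aside from loggedAllowedNS, the field and method names below are illustrative, not taken from the flux codebase):

package kubernetes

import "sync"

// Sketch only: a Cluster whose loggedAllowedNS map is guarded by an RWMutex.
// Only loggedAllowedNS comes from the real struct; everything else is assumed.
type Cluster struct {
	loggedAllowedNSMu sync.RWMutex
	loggedAllowedNS   map[string]bool
	// ... other fields elided
}

// setLoggedAllowedNS serialises writes, so concurrent calls to
// getAllowedAndExistingNamespaces no longer race on the map.
func (c *Cluster) setLoggedAllowedNS(name string, logged bool) {
	c.loggedAllowedNSMu.Lock()
	defer c.loggedAllowedNSMu.Unlock()
	c.loggedAllowedNS[name] = logged
}

// loggedAllowedNSFor takes a read lock for lookups.
func (c *Cluster) loggedAllowedNSFor(name string) bool {
	c.loggedAllowedNSMu.RLock()
	defer c.loggedAllowedNSMu.RUnlock()
	return c.loggedAllowedNS[name]
}

A sync.Map would remove the explicit locking at the cost of the map's static typing; either way the point is that all access to loggedAllowedNS goes through one synchronized path.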

To Reproduce

Unknown; it just started happening, though it is likely only reproducible when allowed-namespaces is provided.

Expected behavior

No panics

Logs

Logs are included in the description above.

Additional context

  • Flux version: 1.17.1
  • Kubernetes version:
  • Git provider:
  • Container registry provider:
@infa-bsurber added the blocked-needs-validation and bug labels on Mar 16, 2020
@stefanprodan removed the blocked-needs-validation label on Mar 30, 2020
@stefanprodan
Member

Fixed by #2926
