
Backport deltaFIFO race to 1.3? #10826

Closed · smarterclayton opened this issue Sep 6, 2016 · 16 comments

@smarterclayton (Contributor) commented Sep 6, 2016

kubernetes/kubernetes#32125

This could result in incorrect decisions being made based on a "sync", which is bad. We need to triage which of our controllers depend on List() from a shared informer or a DeltaFIFO being consistent and correct, in order to decide whether to backport the fix to 1.3.
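For context on why a consistent List() matters, here is a minimal, hypothetical sketch (not Origin or Kubernetes code; the snapshotStore type and reconcileAllocations function are illustrative stand-ins) of the controller pattern under triage: rebuilding an allocation table from a snapshot of an informer-style cache. If the snapshot is momentarily wrong because of the sync race, the controller can release an allocation that is still in use.

package main

import (
	"fmt"
	"sync"
)

// snapshotStore is a stand-in for the cache a shared informer maintains;
// List() returns whatever the cache currently believes exists.
type snapshotStore struct {
	mu    sync.Mutex
	items map[string]struct{}
}

func (s *snapshotStore) List() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]string, 0, len(s.items))
	for k := range s.items {
		out = append(out, k)
	}
	return out
}

// reconcileAllocations rebuilds an allocation table from a snapshot list.
// If the snapshot omits an object that actually still exists (the race this
// issue is about), the allocator wrongly frees a value that is in use.
func reconcileAllocations(store *snapshotStore, allocations map[string]int) {
	live := map[string]struct{}{}
	for _, name := range store.List() {
		live[name] = struct{}{}
	}
	for name := range allocations {
		if _, ok := live[name]; !ok {
			fmt.Printf("releasing allocation for %s (not in snapshot)\n", name)
			delete(allocations, name)
		}
	}
}

func main() {
	// svc-b is missing from a bad snapshot, so its allocation is released.
	store := &snapshotStore{items: map[string]struct{}{"svc-a": {}}}
	allocations := map[string]int{"svc-a": 1, "svc-b": 2}
	reconcileAllocations(store, allocations)
}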

@smarterclayton added this to the 1.3.0 milestone Sep 6, 2016
@smarterclayton (Contributor, Author) commented Sep 6, 2016

@ncdc @liggitt @deads2k @bparees @knobunc which controllers use DeltaFIFO or a shared informer and depend on a consistent list (i.e. a snapshot list used to recalculate a state table, drive an allocator, or decide what to remove) to make decisions about deletion or other critical data?

@bparees (Contributor) commented Sep 6, 2016

The build pod delete controller and the build delete controller use it to keep track of builds or pods that have been deleted, so the corresponding pod or build can be deleted/marked appropriately.

@deads2k (Contributor) commented Sep 7, 2016

> The build pod delete controller and the build delete controller use it to keep track of builds or pods that have been deleted, so the corresponding pod or build can be deleted/marked appropriately.

Does order matter for that?

I can't think of an order-dependent controller. Many already process in multiple threads.

@bparees (Contributor) commented Sep 7, 2016

The order of seeing "delete for pod1" vs. "delete for pod2"? No, that should not matter.

@ncdc (Contributor) commented Sep 7, 2016

I'm going to have to defer to the others on this one - I haven't looked at or touched the Origin controllers in ages.

@knobunc (Contributor) commented Sep 7, 2016

@dcbw and I are investigating the impact on networking.

@deads2k (Contributor) commented Sep 7, 2016

The changes look reasonable. We'll need kubernetes/kubernetes#30948 and kubernetes/kubernetes#32125

@smarterclayton (Contributor, Author) commented via email

Hrm. Maybe we can live with it...

On Wed, Sep 7, 2016 at 12:50 PM, David Eads wrote:

> The changes look reasonable. We'll need kubernetes/kubernetes#30948 and kubernetes/kubernetes#32125



@dcbw (Contributor) commented Sep 7, 2016

Networking is not affected by this change since we don't use DeltaFIFO yet (we're waiting on that until 3.4).

Services that only care about deletion (like the build controller) shouldn't be affected either, since DeltaFIFO already discards Sync events when a previous Deleted event is in the queue.

The service controller double-checks the apiserver for the service's state, so it doesn't care either, though it would log errors if the Update->Sync race happened.

But other users (like pkg/controller/framework/ and its Informer/SharedInformer) will be affected by the race, and that means a lot of existing code that uses informers.
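As a rough illustration of the rule described above, a simplified sketch (assumed behavior, not the actual delta_fifo.go source): a Sync for an object whose newest queued delta is a Deleted gets dropped, so deletion-only consumers never see a stale Sync after a delete.

package main

import "fmt"

// DeltaType mirrors the kinds of events a DeltaFIFO-style queue records.
type DeltaType string

const (
	Sync    DeltaType = "Sync"
	Deleted DeltaType = "Deleted"
)

type Delta struct {
	Type   DeltaType
	Object string
}

// queue maps object key -> pending deltas, oldest first.
type queue map[string][]Delta

// queueSync appends a Sync delta unless the newest pending delta for the
// object is a Deleted, in which case the Sync is discarded. This is the
// rule referred to above: a pending deletion is never overwritten by a
// stale sync.
func (q queue) queueSync(key string) {
	deltas := q[key]
	if n := len(deltas); n > 0 && deltas[n-1].Type == Deleted {
		return // drop the Sync; a deletion is already pending
	}
	q[key] = append(deltas, Delta{Type: Sync, Object: key})
}

func main() {
	q := queue{"pod-1": {{Type: Deleted, Object: "pod-1"}}}
	q.queueSync("pod-1") // discarded: pod-1 already has a Deleted pending
	q.queueSync("pod-2") // queued normally
	fmt.Println(q)
}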

@smarterclayton (Contributor, Author) commented Sep 7, 2016 via email

@tracyrankin modified the milestones: 1.3.1, 1.3.0 Sep 9, 2016
@ichekrygin commented Sep 26, 2016

I am hitting this issue in v1.3.6:

fatal error: concurrent map read and map write

goroutine 140812 [running]:
runtime.throw(0x1b9f100, 0x21)
        /usr/local/go/src/runtime/panic.go:547 +0x90 fp=0xc82059eaf8 sp=0xc82059eae0
runtime.mapaccess1_faststr(0x148a4a0, 0xc820317f50, 0xc8208fd0e0, 0x15, 0x1)
        /usr/local/go/src/runtime/hashmap_fast.go:202 +0x5b fp=0xc82059eb58 sp=0xc82059eaf8
k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).queueActionLocked(0xc8200e5b80, 0x1a3e8c8, 0x4, 0x19dde80, 0xc820d01348, 0x0, 0x0)
        /usr/local/google/home/beeps/goproj/src/k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:305 +0x1dd fp=0xc82059ecc0 sp=0xc82059eb58
k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*DeltaFIFO).Resync(0xc8200e5b80, 0x0, 0x0)
        /usr/local/google/home/beeps/goproj/src/k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache/delta_fifo.go:511 +0x4f7 fp=0xc82059ee38 sp=0xc82059ecc0
k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch.func1(0xc8200c01e0, 0xc82008ec00, 0xc820612000, 0xc820cf04e0, 0xc8200c01e8)
        /usr/local/google/home/beeps/goproj/src/k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:289 +0x252 fp=0xc82059ef88 sp=0xc82059ee38
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82059ef90 sp=0xc82059ef88
created by k8s.io/contrib/ingress/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch

This issue impacts the ingress controller, resulting in frequent restarts:

nginx-ingress-4280m             1/1       Running   25         5d
nginx-ingress-5dv2y             1/1       Running   25         5d
nginx-ingress-6pwtz             1/1       Running   16         3d
nginx-ingress-vt8cr             1/1       Running   24         5d
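The fatal error above is the Go runtime detecting unsynchronized access to a map: in the trace, the resync path reads the DeltaFIFO's items map while another goroutine writes to it. A minimal, standalone repro of that failure mode (illustrative only, not the vendored code) typically aborts with the same "concurrent map read and map write" message:

package main

// Two goroutines touch the same map without a lock; the Go runtime aborts
// with "fatal error: concurrent map read and map write" (detection is
// best-effort, but in practice this trips almost immediately).
func main() {
	items := map[string]int{}

	go func() {
		for i := 0; ; i++ {
			items["key"] = i // writer: mutates the map with no lock held
		}
	}()

	for {
		_ = items["key"] // reader: analogous to the resync path walking the items map
	}
}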

@smarterclayton (Contributor, Author)

Looks like contrib/ingress vendors the kube libraries and would need to be updated. Is there an issue for that in contrib already?

@ichekrygin

@smarterclayton I didn't create one; could you please help me create one?

@smarterclayton (Contributor, Author)

Opened kubernetes-retired/contrib#1790

@ncdc (Contributor) commented Oct 27, 2016

@smarterclayton do you still want this in 1.3.x or can we close and it'll be in 1.4?

@smarterclayton (Contributor, Author) commented via email

It's in 1.4; we can live without it in 1.3.

On Thu, Oct 27, 2016 at 2:14 PM, Andy Goldstein wrote:

> @smarterclayton do you still want this in 1.3.x or can we close and it'll be in 1.4?



@ncdc closed this as completed Oct 27, 2016